...
- Create a new snapshot harvest. Set 'Max number of bytes per domain' to 1,000,000 bytes (1 MB).
- Check that the job has started correctly under 'Harvest status'->'All Jobs' in the left menu and that no errors or warnings are present in the system overview.
Check that at least one file has been uploaded.
...
- Stop the system after the first ARC file has been uploaded.
- Go to the harvest status page at http://kb-test-adm-001.kb.dk:8076/HarvestDefinition and find the job for kum.dk.
- In the system overview, find the harvester running the job. The information appears in the log column once the job has started.
- Run the attached script to stop the test system after the first ARC file has been uploaded. Note that the script must first be updated with the relevant job number and harvester.
- Check that the correct file has been generated.
- Log on to the harvester, e.g. ssh kb-test-har-001.
- Verify that a metadata file exists at ~/TEST?/harvester_low/{crawldir}/metadata/
- Copy the file to /tmp
- Create a fake crawl dir
- ssh sb-test-har-001.statsbiblioteket.dk
- cd TEST6/harvester_high
- cp -r ~netarkiv/testdata/TEST6/23-fakejobdir .
- Restart the test system
- Verify the restarted system on kb-test-adm-001.
Check the log for warnings and errors.
    cd /home/test/$TESTX/log/
    grep SEVERE *.log.0
    grep WARNING *.log.0
The following entries are normal:
    arcrepositoryapplication0.log.0:WARNING: AdminDataFile (./admin.data) was not found.
    guiapplication0.log.0:WARNING: Refusing to schedule harvest definition 'netarkivet' in the past. Skipped 18 events. Old nextDate was Mon Dec 18 14:29:30 CET 2006 new nextDate is Tue Dec 19 09:29:30 CET 2006
    GUIApplication0.log.0:WARNING: Job 2 failed: HarvestErrors = dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController
The following warning may occur after a while:
    WARNING: Error processing message '
    Class: com.sun.messaging.jmq.jmsclient.ObjectMessageImpl
    getJMSMessageID(): ID:40-130.225.27.140(d2:1:3:b1:10:de)-46478-1197902260630
    getJMSTimestamp(): 1197902260630
    getJMSCorrelationID(): null
    JMSReplyTo: null
    JMSDestination: TEST6_COMMON_THE_SCHED
    getJMSDeliveryMode(): PERSISTENT
    getJMSRedelivered(): false
    getJMSType(): null
    getJMSExpiration(): 0
    getJMSPriority(): 4
    Properties: null'
    dk.netarkivet.common.exceptions.UnknownID: Job id 23 is not known in persistent storage
        at dk.netarkivet.harvester.datamodel.JobDBDAO.read(JobDBDAO.java:294)
        at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.processCrawlStatusMessage(HarvestSchedulerMonitorServer.java:103)
        at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.visit(HarvestSchedulerMonitorServer.java:285)
        at dk.netarkivet.harvester.harvesting.distribute.CrawlStatusMessage.accept(CrawlStatusMessage.java:133)
        at dk.netarkivet.harvester.distribute.HarvesterMessageHandler.onMessage(HarvesterMessageHandler.java:67)
        at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.deliverAndAcknowledge(MessageConsumerImpl.java:330)
        at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.onMessage(MessageConsumerImpl.java:265)
        at com.sun.messaging.jmq.jmsclient.SessionReader.deliver(SessionReader.java:102)
        at com.sun.messaging.jmq.jmsclient.ConsumerReader.run(ConsumerReader.java:174)
        at java.lang.Thread.run(Thread.java:595)
- Go to the system overview page and check that all the expected applications are listed and show no warnings or errors.
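The log check in the verification step above can be partly automated. The following is a minimal sketch, not part of the test system: the function name `unexpected_log_lines` and its directory argument are hypothetical, and the filter strings are simply copied from the entries listed as normal. It prints every SEVERE or WARNING line in the *.log.0 files that does not match one of those entries.

```shell
#!/bin/sh
# Sketch (assumption): list SEVERE/WARNING lines that are NOT among the
# entries described as normal in this test. The function name and its
# argument are illustrative only.

unexpected_log_lines() {
    # $1: log directory, e.g. /home/test/$TESTX/log
    log_dir=$1
    [ -d "$log_dir" ] || { echo "no such directory: $log_dir" >&2; return 2; }

    # /dev/null is added so grep always prefixes matches with the file name,
    # even when only one *.log.0 file exists.
    grep -E 'SEVERE|WARNING' "$log_dir"/*.log.0 /dev/null |
        grep -v -F \
            -e 'AdminDataFile (./admin.data) was not found' \
            -e 'Refusing to schedule harvest definition' \
            -e 'Crawl probably interrupted by shutdown of HarvestController' || true
}

# Example call (uncomment to run against the test installation):
# unexpected_log_lines "/home/test/$TESTX/log"
```

An empty result means only the known entries were found. The 'Error processing message' warning that may occur after a while is deliberately not filtered here, so it will still be reported; add a further `-e` pattern if it should be ignored as well.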
...