This test requires restarting infrastructure components (database and network). These steps must be coordinated with the other testers.
The test covers resubmission of jobs after a restart, restart of failed jobs, upload of old files at harvester restart, and the scheduler skipping old jobs.
Under migration from the old TWiki test, TEST6: Err-1 (resubmission of jobs after shutdown, restart of failed jobs, upload of old files at harvester restart, scheduler skips old jobs).
Uses heritrix3 templates default_order_xml
On devel@kb-prod-udv-001.kb.dk:
export TESTX=TEST6
export PORT=807?
export MAILRECEIVERS=foo@bar.dk
export VERSION=????????????????
all_test.sh
Check that the GUI is available and that the System Status does not show any startup problems.
Start an hourly selective harvest for the 'netarkivet.dk' domain.
metadata.jobName=default_orderxml_smallwarcs
metadata.description=Default Profile generating small warc-files (5000 bytes)
warcWriter.maxFileSizeBytes = 5000
ssh netarkdv@sb-test-har-001.statsbiblioteket.dk
cd TEST6/harvester_high
cp -r ~netarkdv/testdata-h3/TEST6/23-fakejobdir .
mkdir 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs
touch 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs/crawl.log
touch 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs/progress-statistics.log
Wait 3 hours, then restart the system.
Check the log for warnings and errors.
cd /home/devel/$TESTX/log/
grep ERROR *.log | grep -v COMMON_ERROR
grep WARN *.log
The following entries are normal:
arcrepositoryapplication0.log.0:WARNING: AdminDataFile (./admin.data) was not found.
guiapplication0.log.0:WARNING: Refusing to schedule harvest definition 'netarkivet' in the past. Skipped 18 events. Old nextDate was Mon Dec 18 14:29:30 CET 2006 new nextDate is Tue Dec 19 09:29:30 CET 2006
GUIApplication0.log.0:WARNING: Job 2 failed: HarvestErrors = dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController
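To make the remaining warnings easier to review, the entries listed above as normal can be filtered out of the grep output. The following is a minimal sketch, not part of the test suite: the function name `filter_expected_warnings` is hypothetical, and the fixed strings are copied from the normal entries above, so they may need adjusting if the log wording differs in your version.

```shell
# Hypothetical helper: reads log lines on stdin and drops the three
# warnings that this test describes as normal; everything else is printed.
filter_expected_warnings() {
    grep -v -F \
        -e 'AdminDataFile (./admin.data) was not found' \
        -e 'Refusing to schedule harvest definition' \
        -e 'Crawl probably interrupted by shutdown of HarvestController'
}

# Usage (run in /home/devel/$TESTX/log/):
#   grep WARN *.log | filter_expected_warnings
```

Any line that is still printed should be inspected manually. The filter matches fixed strings only, so it does not hide warnings that merely resemble the expected ones.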
The following warning may occur after a while:
WARNING: Error processing message '
Class: com.sun.messaging.jmq.jmsclient.ObjectMessageImpl
getJMSMessageID(): ID:40-130.225.27.140(d2:1:3:b1:10:de)-46478-1197902260630
getJMSTimestamp(): 1197902260630
getJMSCorrelationID(): null
JMSReplyTo: null
JMSDestination: TEST6_COMMON_THE_SCHED
getJMSDeliveryMode(): PERSISTENT
getJMSRedelivered(): false
getJMSType(): null
getJMSExpiration(): 0
getJMSPriority(): 4
Properties: null'
dk.netarkivet.common.exceptions.UnknownID: Job id 23 is not known in persistent storage
    at dk.netarkivet.harvester.datamodel.JobDBDAO.read(JobDBDAO.java:294)
    at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.processCrawlStatusMessage(HarvestSchedulerMonitorServer.java:103)
    at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.visit(HarvestSchedulerMonitorServer.java:285)
    at dk.netarkivet.harvester.harvesting.distribute.CrawlStatusMessage.accept(CrawlStatusMessage.java:133)
    at dk.netarkivet.harvester.distribute.HarvesterMessageHandler.onMessage(HarvesterMessageHandler.java:67)
    at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.deliverAndAcknowledge(MessageConsumerImpl.java:330)
    at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.onMessage(MessageConsumerImpl.java:265)
    at com.sun.messaging.jmq.jmsclient.SessionReader.deliver(SessionReader.java:102)
    at com.sun.messaging.jmq.jmsclient.ConsumerReader.run(ConsumerReader.java:174)
    at java.lang.Thread.run(Thread.java:595)
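One way to confirm that this warning only concerns the fake job directory is to list the job ids mentioned in UnknownID messages. This is a rough sketch under two assumptions: that the exception text appears in the logs exactly as shown above, and that job id 23 (the copied 23-fakejobdir) is the only id expected. The function name `unknown_job_ids` is hypothetical.

```shell
# Hypothetical helper: reads log lines on stdin and prints each distinct
# job id reported as "not known in persistent storage".
unknown_job_ids() {
    sed -n 's/.*UnknownID: Job id \([0-9][0-9]*\) is not known in persistent storage.*/\1/p' | sort -u
}

# Usage (run in /home/devel/$TESTX/log/):
#   cat *.log | unknown_job_ids
```

If the output contains any id other than 23, the corresponding warning is probably not caused by the fake job directory and should be investigated.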
Use a browser configured to connect through the viewerproxy for this test. Select any completed job and click on the "Browse reports for jobs" link.
You should see a list like:
metadata://netarkivet.dk/crawl/setup/duplicatereductionjobs?majorversion=1&minorversion=0&harvestid=1&harvestnum=0&jobid=1
metadata://netarkivet.dk/crawl/setup/crawl-manifest.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/setup/harvestInfo.xml?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/setup/order.xml?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/setup/seeds.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/reports/arcfiles-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/reports/crawl-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/reports/frontier-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/reports/hosts-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/reports/mimetype-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/reports/responsecode-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/reports/seeds-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/logs/crawl.log?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/logs/heritrix.out?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/logs/heritrix_dmesg.log?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/logs/local-errors.log?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/logs/progress-statistics.log?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/logs/runtime-errors.log?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/logs/uri-errors.log?heritrixVersion=1.14.4&harvestid=1&jobid=1
metadata://netarkivet.dk/crawl/index/cdx?majorversion=1&minorversion=0&harvestid=1&jobid=1&timestamp=20130814073013&serialno=00000
Check that all the entries are present and browse each in turn. (Note that the heritrixVersion, harvestid, and jobid values will differ.) Some of the error logs might be empty.
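If the list shown in the browser is copied into a text file, the presence check can be done mechanically. The sketch below is only an aid, assuming the expected entries are exactly those in the example list above (a heritrix3 job may produce a different set, in which case `EXPECTED_ENTRIES` must be adjusted). The names `EXPECTED_ENTRIES` and `missing_entries` are hypothetical, and query strings are ignored because their values differ between jobs.

```shell
# Expected entry paths, taken from the example list above (assumption:
# the job under test produces the same set of entries).
EXPECTED_ENTRIES='crawl/setup/duplicatereductionjobs
crawl/setup/crawl-manifest.txt
crawl/setup/harvestInfo.xml
crawl/setup/order.xml
crawl/setup/seeds.txt
crawl/reports/arcfiles-report.txt
crawl/reports/crawl-report.txt
crawl/reports/frontier-report.txt
crawl/reports/hosts-report.txt
crawl/reports/mimetype-report.txt
crawl/reports/processors-report.txt
crawl/reports/responsecode-report.txt
crawl/reports/seeds-report.txt
crawl/logs/crawl.log
crawl/logs/heritrix.out
crawl/logs/heritrix_dmesg.log
crawl/logs/local-errors.log
crawl/logs/progress-statistics.log
crawl/logs/runtime-errors.log
crawl/logs/uri-errors.log
crawl/index/cdx'

# Hypothetical helper: reads the copied list on stdin and prints every
# expected entry path that does not occur in it.
missing_entries() {
    listing=$(cat)
    printf '%s\n' "$EXPECTED_ENTRIES" | while IFS= read -r entry; do
        case "$listing" in
            *"metadata://netarkivet.dk/$entry?"*) ;;   # entry found
            *) printf '%s\n' "$entry" ;;               # entry missing
        esac
    done
}

# Usage, with the copied list saved in a file of your choice:
#   missing_entries < reports.txt
```

Empty output means every expected entry is present. The check does not replace browsing each entry, since it cannot tell whether an entry opens correctly.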