This test requires restart of infrastructure components (database and network) - these steps must be coordinated with the other testers. Only the steps under "Database crash test" and "Network recovery test" below need to be coordinated.
Resubmit jobs after restart, restart of failed jobs, upload of old files at harvester restart, scheduler skips old jobs. |
Uses heritrix3 templates default_order_xml |
On devel@kb-prod-udv-001.kb.dk, run the tests as specified under "Release Test Information" on 5.4 Release Test, and with TESTX set to TEST6
Check that the GUI is available and that the System Status does not show any startup problems.
Start a hourly selective harvest for the 'netarkivet.dk' domain.
metadata.jobName=default_orderxml_smallwarcs
metadata.description=Default Profile generating small warc-files (5000 bytes)
warcWriter.maxFileSizeBytes = 5000
ssh netarkdv@sb-test-har-001.statsbiblioteket.dk cd TEST6/harvester_high cp -r ~netarkdv/testdata-h3/TEST6/23-fakejobdir . mkdir 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs touch 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs/crawl.log touch 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs/progress-statistics.log |
Wait 3 Hours then Restart the System
Check the log for warnings and errors.
cd /home/devel/$TESTX/log/ grep ERROR *.log | grep -v COMMON_ERROR grep WARN *.log |
The following entries are normal:
arcrepositoryapplication0.log.0:WARNING: AdminDataFile (./admin.data) was not found. guiapplication0.log.0:WARNING: Refusing to schedule harvest definition 'netarkivet' in the past. Skipped 18 events. Old nextDate was Mon Dec 18 14:29:30 CET 2006 new nextDate is Tue Dec 19 09:29:30 CET 2006 GUIApplication0.log.0:WARNING: Job 2 failed: HarvestErrors = dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController |
The following warning may occur after a while:
WARNING: Error processing message ' Class: com.sun.messaging.jmq.jmsclient.ObjectMessageImpl getJMSMessageID(): ID:40-130.225.27.140(d2:1:3:b1:10:de)-46478-1197902260630 getJMSTimestamp(): 1197902260630 getJMSCorrelationID(): null JMSReplyTo: null JMSDestination: TEST6_COMMON_THE_SCHED getJMSDeliveryMode(): PERSISTENT getJMSRedelivered(): false getJMSType(): null getJMSExpiration(): 0 getJMSPriority(): 4 Properties: null' dk.netarkivet.common.exceptions.UnknownID: Job id 23 is not known in persistent storage at dk.netarkivet.harvester.datamodel.JobDBDAO.read(JobDBDAO.java:294) at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.processCrawlStatusMessage(HarvestSchedulerMonitorServer.java:103) at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.visit(HarvestSchedulerMonitorServer.java:285) at dk.netarkivet.harvester.harvesting.distribute.CrawlStatusMessage.accept(CrawlStatusMessage.java:133) at dk.netarkivet.harvester.distribute.HarvesterMessageHandler.onMessage(HarvesterMessageHandler.java:67) at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.deliverAndAcknowledge(MessageConsumerImpl.java:330) at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.onMessage(MessageConsumerImpl.java:265) at com.sun.messaging.jmq.jmsclient.SessionReader.deliver(SessionReader.java:102) at com.sun.messaging.jmq.jmsclient.ConsumerReader.run(ConsumerReader.java:174) at java.lang.Thread.run(Thread.java:595) |
Use a browser set up as a viewerproxy connection for this test. Select any completed job and click on the "Browse reports for jobs" link.
You should see a list like
metadata://netarkivet.dk/crawl/setup/duplicatereductionjobs?majorversion=1&minorversion=0&harvestid=1&harvestnum=10&jobid=14 metadata://netarkivet.dk/crawl/setup/crawler-beans.cxml?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/setup/harvestInfo.xml?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/setup/seeds.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/archivefiles-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/crawl-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/frontier-summary-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/hosts-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/mimetype-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/responsecode-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/seeds-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/source-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/reports/threads-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/alerts.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/crawl.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/heritrix3_err.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/heritrix3_out.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/heritrix_out.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/job.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/nonfatal-errors.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/progress-statistics.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/runtime-errors.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/scope.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/logs/uri-errors.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14 metadata://netarkivet.dk/crawl/index/cdx?majorversion=2&minorversion=0&harvestid=1&jobid=14&filename=14-1-20161101215537865-00000-ciblee_2015_sb-test-har-001.statsbiblioteket.dk.warc |
Check that all the entries are present and browse each in turn. (Note that the HeritrixVersion, harvestIf, and jobId will differ). Some of the entries might be empty