...
Stop the Test Automatically During Upload
- Using the GUI, go to "Harvest status"→"All Jobs", and by clicking each Job ID for the snapshot harvest in turn, find the job ID for the job in which kum.dk is being harvested.
- Go back to "Harvest status"→"All Jobs", and reload the page until the job you just identified has status "Started". Then immediately go to "Harvest status"→"H3 Remote Access", keep reloading the page until the job ID found above appears, and click the job ID (this may take several tries until it is ready). Then immediately pause the job.
- Go to "Harvest status"→"H3 Remote Access" and click the job ID you identified, then click "View/Search in cached Crawllog", then "Update cache".
- Go to "Harvest status"→"All Running Jobs" and search for "kum.dk" to find the job, then note down the name of the harvest machine (Host) for that job.
- Download the attached script and modify it to point at the correct harvester and job number
- Copy the script to kb-prod-udv-001.kb.dk:/home/devel/, give it a "chmod 755", then run it. It monitors the "warcs" directory, and as soon as the first warcfile is uploaded it detects that uploading has started and shuts down the test instance.
...
- Go to "Harvest status"→"H3 Remote Access"→identified Job ID, and unpause the job (no explicit logout is necessary)
- Wait for the job to complete, after which the TEST6 instance is stopped, starting with the apps on the machine harvesting kum.dk.
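The attached script is not reproduced on this page. Purely as an illustration, the polling it is described as doing could be sketched roughly as follows. The function name `wait_for_first_warc`, its parameters, and the assumption that the "warcs" directory is reachable as a local path are all hypothetical; the real script is pointed at a harvester and job number and may work quite differently.

```shell
#!/bin/bash
# Hypothetical sketch (not the attached script): wait until the first WARC
# file shows up in a directory.
# Usage: wait_for_first_warc DIR [INTERVAL_SECONDS] [MAX_POLLS]
# Returns 0 as soon as a *.warc or *.warc.gz file exists in DIR,
# 1 if MAX_POLLS polls pass without finding one.
wait_for_first_warc() {
    local dir="$1" interval="${2:-5}" max_polls="${3:-720}" i
    for ((i = 0; i < max_polls; i++)); do
        # -print -quit stops find after the first matching file
        if [ -n "$(find "$dir" -maxdepth 1 -type f \
                \( -name '*.warc' -o -name '*.warc.gz' \) \
                -print -quit 2>/dev/null)" ]; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}
```

A caller could then chain the shutdown onto it, for example `wait_for_first_warc /path/to/warcs && ./stop_test.sh`, where the path is a placeholder for wherever the job's "warcs" directory actually lives.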
Save the Metadata Warcfile
- From kb-prod-udv-001.kb.dk, log into the harvester where kum.dk was being harvested (with user netarkdv if the harvester is in Aarhus, and user devel if the harvester is in Copenhagen (Kbh)).
- Find the crawldir in TEST6/harvester_low.
- Find the metadata warcfile in the metadata subdirectory and copy it to TEST6/
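The copy step can be sketched as a small helper. This is only an illustration: the function name `copy_metadata_warc` is hypothetical, and the file name pattern `<jobid>-metadata-<n>.warc[.gz]` is an assumption based on the filename pattern that appears in the batch-job warning elsewhere on this page.

```shell
#!/bin/bash
# Hypothetical helper: copy the metadata WARC file(s) found in the
# "metadata" subdirectory of a crawl directory to a destination directory.
# Usage: copy_metadata_warc CRAWLDIR DESTDIR
# Returns 0 if at least one file was copied, 1 if none was found,
# 2 if a copy failed.
copy_metadata_warc() {
    local crawldir="$1" dest="$2" found=1 f
    for f in "$crawldir"/metadata/*-metadata-*.warc \
             "$crawldir"/metadata/*-metadata-*.warc.gz; do
        [ -f "$f" ] || continue   # skip glob patterns that matched nothing
        cp -p "$f" "$dest"/ || return 2
        found=0
    done
    return "$found"
}
```

For example, `copy_metadata_warc <crawldir> ~/TEST6` run on the harvester, with `<crawldir>` replaced by the directory found in the previous step.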
Create a Fake Crawl Dir
From kb-prod-udv-001.kb.dk do:
Code Block
ssh netarkdv@sb-test-har-001.statsbiblioteket.dk
cd TEST6/harvester_high
cp -r ~netarkdv/testdata-h3/TEST6/23-fakejobdir .
mkdir 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs
touch 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs/crawl.log
touch 23-fakejobdir/heritrix3/jobs/23-fakejobdir/logs/progress-statistics.log
...
Wait at Least 3 Hours, then Restart the System (by running stop_test.sh, then start_test.sh)
Verify the restarted system. On devel@kb-test-adm-001
...
Check the log for warnings and errors.
Code Block
cd /home/devel/$TESTX/log/
grep ERROR *.log | grep -v COMMON_ERROR
grep WARN *.log
When checking for warnings/errors, be sure to ignore any warnings/errors that happened before the above restart. Also, the following kinds of entries are normal/known, and can be ignored:
Code Block
arcrepositoryapplication0.log.0:WARNING: AdminDataFile (./admin.data) was not found.
HarvestJobManagerApplication.log:13:12:05.567 WARN d.n.h.s.jobgen.AbstractJobGenerator.generateJobs - Refusing to schedule harvest definition 'netarkivet-selective-harvest-HOURLY' in the past. Skipped 187 events. Old nextDate was Fri Apr 13 13:59:19 CEST 2018 new nextDate is Mon Apr 16 13:59:19 CEST 2018
HarvestJobManagerApplication.log:13:12:20.959 WARN d.n.h.s.HarvestSchedulerMonitorServer.processCrawlStatusMessage - Job 124 failed: HarvestErrors = dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController
HarvestJobManagerApplication.log:13:13:17.710 WARN d.n.h.s.HarvestSchedulerMonitorServer.processCrawlStatusMessage - Received unexpected CrawlStatusMessage for job 23 with new status FAILED, current state is DONE. Marking job as DONE. Reported harvestErrors on job: dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController
HarvestJobManagerApplication.2018-04-09.0.log:15:49:59.836 WARN d.n.h.datamodel.H3HeritrixTemplate.insertAttributes - Placeholder '%{MAX_HOPS}' not found in template. Therefore not substituted by '10' in this template
HarvestJobManagerApplication.2018-04-09.0.log:15:49:59.837 WARN d.n.h.datamodel.H3HeritrixTemplate.insertAttributes - Placeholder '%{HONOR_ROBOTS_DOT_TXT}' not found in template. Therefore not substituted by 'ignore' in this template
HarvestJobManagerApplication.2018-04-09.0.log:15:49:59.837 WARN d.n.h.datamodel.H3HeritrixTemplate.insertAttributes - Placeholder '%{EXTRACT_JAVASCRIPT}' not found in template. Therefore not substituted by 'true' in this template
HarvestJobManagerApplication.2018-04-09.0.log:14:59:59.489 WARN d.n.h.datamodel.HeritrixTemplate.editOrderXMLAddPerDomainCrawlerTraps - Found empty trap for domain netarkivet.dk
ArcRepositoryApplication.log:13:11:49.119 WARN d.n.a.arcrepository.ArcRepository.startUpload - Trying to upload file '123-9-20180413105219139-00029-kb-test-har-004.kb.dk.warc.gz' that already has state UPLOAD_COMPLETED for this replica
BitarchiveMonitorApplication_KBBM.2018-04-10.0.log:13:41:05.321 WARN d.n.a.bitarchive.BitarchiveMonitor.updateWithBitarchiveReply - Received batch reply with error: Batch job failed on 1 files. at BA monitor from bitarchive 10.17.0.56_BitApp_1
BitarchiveMonitorApplication_KBBM.2018-04-11.0.log:11:09:47.037 WARN d.n.c.distribute.JMSConnectionSunMQ.onException - JMSException with errorcode 'C4056' encountered:
HarvestJobManagerApplication.2018-04-09.0.log:15:02:01.877 WARN d.n.h.s.HarvestSchedulerMonitorServer.processCrawlStatusMessage - Job 2 failed: HarvestErrors = java.lang.RuntimeException: Exception during crawl
GUIApplication.2018-04-10.0.log:11:05:28.412 WARN dk.netarkivet.common.utils.DBUtils.setStringMaxLength - lastPeekUri of dk.netarkivet.harvester.harvesting.frontier.FrontierReportLine@96f6d5e3 is longer than the allowed 1000 characters. The contents is truncated to length 1000. The untruncated contents was: https://www.firstpost.com/%22data:image/jpeg;base64...
GUIApplication.2018-04-10.0.log:13:41:05.350 WARN d.n.a.a.d.JMSArcRepositoryClient.batch - The batch job 'ID:59980-130.226.228.6(f0:ef:fc:a:6:4d)-40252-1523360465135: To TEST6_COMMON_THE_REPOS ReplyTo TEST6_COMMON_THIS_REPOS_CLIENT_130_226_228_6_GUIWS OK Job: dk.netarkivet.viewerproxy.webinterface.CrawlLogLinesMatchingRegexp, on filename-pattern: 31-metadata-[0-9]+\.(w)?arc(\.gz)?, for replica: KB' resulted in the following error: Batch job failed on 1 files.
The following kind of warning can be ignored, unless it appears repeatedly:
Code Block
GUIApplication.2018-04-09.0.log:15:01:57.609 WARN d.n.monitor.jmx.HostForwarding.registerRemoteMbeans - Failure connecting to remote JMX MBeanserver (Host=kb-test-acs-001.kb.dk, JMXport=8150, RMIport=8250, last seen live at Mon Apr 09 15:01:47 CEST 2018). Creating an error MBean
The following warning may occur after a while, and can be ignored as well:
Code Block
WARNING: Error processing message '
Class: com.sun.messaging.jmq.jmsclient.ObjectMessageImpl
getJMSMessageID(): ID:40-130.225.27.140(d2:1:3:b1:10:de)-46478-1197902260630
getJMSTimestamp(): 1197902260630
getJMSCorrelationID(): null
JMSReplyTo: null
JMSDestination: TEST6_COMMON_THE_SCHED
getJMSDeliveryMode(): PERSISTENT
getJMSRedelivered(): false
getJMSType(): null
getJMSExpiration(): 0
getJMSPriority(): 4
Properties: null'
dk.netarkivet.common.exceptions.UnknownID: Job id 23 is not known in persistent storage
at dk.netarkivet.harvester.datamodel.JobDBDAO.read(JobDBDAO.java:294)
at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.processCrawlStatusMessage(HarvestSchedulerMonitorServer.java:103)
at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.visit(HarvestSchedulerMonitorServer.java:285)
at dk.netarkivet.harvester.harvesting.distribute.CrawlStatusMessage.accept(CrawlStatusMessage.java:133)
at dk.netarkivet.harvester.distribute.HarvesterMessageHandler.onMessage(HarvesterMessageHandler.java:67)
at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.deliverAndAcknowledge(MessageConsumerImpl.java:330)
at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.onMessage(MessageConsumerImpl.java:265)
at com.sun.messaging.jmq.jmsclient.SessionReader.deliver(SessionReader.java:102)
at com.sun.messaging.jmq.jmsclient.ConsumerReader.run(ConsumerReader.java:174)
at java.lang.Thread.run(Thread.java:595)
Any other warning should be considered a release test failure.
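To make the manual check less error-prone, the grep output can be passed through a filter that drops the known entries. The following is a minimal sketch under stated assumptions: the function name `filter_known_warnings` is hypothetical, the patterns cover only some of the known entries listed above, and it does not handle the rule about ignoring entries from before the restart.

```shell
#!/bin/bash
# Hypothetical filter: read log lines on stdin and print only those that do
# NOT match one of the known, harmless warnings. The pattern list is partial
# and should be extended to match the known entries for the release at hand.
filter_known_warnings() {
    grep -v -E \
        -e 'Refusing to schedule harvest definition .* in the past' \
        -e 'Crawl probably interrupted by shutdown of HarvestController' \
        -e "Placeholder '.*' not found in template" \
        -e 'Found empty trap for domain' \
        -e 'AdminDataFile .* was not found'
    local rc=$?
    # grep exits with 1 when every line was filtered out; that is not an error here
    [ "$rc" -le 1 ]
}
```

It would be used as `grep WARN *.log | filter_known_warnings`; whatever is printed still has to be judged by the tester.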
- Go to the system overview page and check that all the expected applications are listening and are up without warnings or errors.
If there is a warning of this kind:
Remote JMX bean generated exception: javax.management.InstanceNotFoundException: dk.netarkivet.common.logging:applicationinstanceid=,name=error_host_kb-test-har-004.kb.dk_8150,httpport=8076,machine=kb-test-adm-001.kb.dk,applicationname=dk.netarkivet.common.webinterface.GUIWebServer,index=0,channel=,replicaname=KBN,hostname=kb-test-har-004.kb.dk,location=K
then refresh the system state overview page. The warning should disappear.
- Check that the scheduler schedules only one job for the hourly selective harvest.
Check that a job can be resubmitted
- Go to "Harvest status"→"All Jobs", select job status "Failed", and press "Show". Check that you can reject a job for resubmission using the "Reject?" button so that it is no longer visible when you list failed jobs.
- Check that you can see the rejected job when you now list all jobs.
- Click on one or more "Genstart"/"Restart?" buttons to resubmit. Note that you can only resubmit jobs that failed due to harvesting errors, not due to upload errors.
- Check that the job status changes to "resubmitted" and that a new job is made from the same harvest definition with the same configurations.
- Check that resubmitted jobs contain information about which job they were resubmitted from (NAS-1466).
Check Report Generation
Use a browser set up as a viewerproxy connection for this test (see https://kb-dk.atlassian.net/wiki/pages/viewpage.action?pageId=12225230#TheNetarkivetDistributedTest/DevelEnvironment-ViewerProxyUsage ). Select any completed job and click on the "Browse reports for jobs" link.
You should see a list like
Code Block
metadata://netarkivet.dk/crawl/setup/duplicatereductionjobs?majorversion=1&minorversion=0&harvestid=1&harvestnum=10&jobid=14
metadata://netarkivet.dk/crawl/setup/crawler-beans.cxml?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/setup/harvestInfo.xml?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/setup/seeds.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/archivefiles-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/crawl-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/frontier-summary-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/hosts-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/mimetype-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/responsecode-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/seeds-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/source-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/reports/threads-report.txt?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/alerts.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/crawl.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/heritrix3_err.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/heritrix3_out.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/heritrix_out.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/job.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/nonfatal-errors.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/progress-statistics.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/runtime-errors.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/scope.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/logs/uri-errors.log?heritrixVersion=3.3.0-LBS-2016-02&harvestid=1&jobid=14
metadata://netarkivet.dk/crawl/index/cdx?majorversion=2&minorversion=0&harvestid=1&jobid=14&filename=14-1-20161101215537865-00000-ciblee_2015_sb-test-har-001.statsbiblioteket.dk.warc
Check that a few (about three) of the entries are present and browse each of them in turn. (Note that the heritrixVersion, harvestid, and jobid values will differ.) Some of the entries might be empty.
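If the list from the browser is pasted into a text file, the entry names can be pulled out of the metadata URLs so they are easier to compare by eye. This is a small sketch only; the function name `metadata_entry_names` is hypothetical, and it assumes one URL per line in the form shown in the list.

```shell
#!/bin/bash
# Hypothetical helper: for each metadata URL on stdin, print the part between
# "/crawl/" and the "?" that starts the query string, e.g. "logs/crawl.log".
# Lines that do not look like metadata URLs are skipped.
metadata_entry_names() {
    sed -n -E 's#^metadata://[^/]+/crawl/([^?]+)\?.*$#\1#p'
}
```

For example, `metadata_entry_names < pasted-list.txt | sort` gives a sorted list of names such as `reports/hosts-report.txt`, where `pasted-list.txt` is a placeholder file name.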
The following two tests ("Database crash test" and "Network recovery test") must be coordinated with the other testers.
Database crash test
Tests that the system can survive a database crash/stop and resume operation after the database is restarted.
Log in as root on kb-test-adm-001:
ssh test@kb-test-adm-001
su
Stop the postgresdb and wait a couple of minutes.
/etc/init.d/postgresql stop
- Verify that the GUI has lost the connection to the database by listing domains or harvest definitions.
Restart the database
/etc/init.d/postgresql start
- Check that the different GUI pages work as usual.
- Create a new active selective harvest definition and verify that a job is created and started.
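Waiting for the database to come back can be scripted with a generic retry loop. The sketch below is an illustration, not part of the test procedure: the function name `retry_until_ok` is hypothetical, and the check command to run is left to the tester.

```shell
#!/bin/bash
# Hypothetical helper: run a check command repeatedly until it succeeds.
# Usage: retry_until_ok MAX_TRIES INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND exits with 0, 1 if it never does.
retry_until_ok() {
    local max_tries="$1" interval="$2" i
    shift 2
    for ((i = 1; i <= max_tries; i++)); do
        "$@" && return 0
        # sleep between attempts, but not after the last one
        [ "$i" -lt "$max_tries" ] && sleep "$interval"
    done
    return 1
}
```

One possible check is `retry_until_ok 30 10 pg_isready -q`, assuming the `pg_isready` utility that ships with PostgreSQL 9.3 and later is installed on kb-test-adm-001; this only shows that the server accepts connections, so the GUI pages still need to be checked by hand.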
Network recovery test
Tests that the system can survive a network crash/stop and resume operation after the network becomes available again.
Disable the network/switch/DC for some minutes and check that all batch processes reconnect and continue after the restart.
Log on to kb-test-adm-001 as root and stop the network interface by installing a cron job that does this for you:
Install the script restartNetworkWithWait.sh as a root cron job (add 0 17 * * * (/root/restartNetworkWithWait.sh) to restart the network at 5 PM):
#!/bin/bash
# stopping network
/etc/init.d/network stop
# waiting 3 minutes
/bin/sleep 3m
# starting network
/etc/init.d/network start
- Check that the connection to the GUI is lost.
- After 5 minutes verify the system comes back online
- Verify that the GUI pages are working properly.
- Create a new active selective harvest definition and verify that a new job is created and started.
- Run a batch job or two and verify these work correctly.
Shutdown the system