...
Check the log for warnings and errors.
cd /home/devel/$TESTX/log/
grep ERROR *.log | grep -v COMMON_ERROR
grep WARN *.log
When checking for warnings/errors, be sure to ignore any warnings/errors that occurred before the above restart. The following kinds of entries are also normal/known and can be ignored:
arcrepositoryapplication0.log.0:WARNING: AdminDataFile (./admin.data) was not found.
HarvestJobManagerApplication.log:13:12:05.567 WARN d.n.h.s.jobgen.AbstractJobGenerator.generateJobs - Refusing to schedule harvest definition 'TEST6-selective-harvest-HOURLY' in the past. Skipped 71 events. Old nextDate was Fri Apr 13 13:59:19 CEST 2018 new nextDate is Mon Apr 16 13:59:19 CEST 2018
HarvestJobManagerApplication.log:13:12:20.959 WARN d.n.h.s.HarvestSchedulerMonitorServer.processCrawlStatusMessage - Job 124 failed: HarvestErrors = dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController
HarvestJobManagerApplication.log:13:13:17.710 WARN d.n.h.s.HarvestSchedulerMonitorServer.processCrawlStatusMessage - Received unexpected CrawlStatusMessage for job 23 with new status FAILED, current state is DONE. Marking job as DONE. Reported harvestErrors on job: dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController
HarvestJobManagerApplication.2018-04-09.0.log:15:49:59.836 WARN d.n.h.datamodel.H3HeritrixTemplate.insertAttributes - Placeholder '%{MAX_HOPS}' not found in template. Therefore not substituted by '10' in this template
HarvestJobManagerApplication.2018-04-09.0.log:15:49:59.837 WARN d.n.h.datamodel.H3HeritrixTemplate.insertAttributes - Placeholder '%{HONOR_ROBOTS_DOT_TXT}' not found in template. Therefore not substituted by 'ignore' in this template
HarvestJobManagerApplication.2018-04-09.0.log:15:49:59.837 WARN d.n.h.datamodel.H3HeritrixTemplate.insertAttributes - Placeholder '%{EXTRACT_JAVASCRIPT}' not found in template. Therefore not substituted by 'true' in this template
HarvestJobManagerApplication.2018-04-09.0.log:14:59:59.489 WARN d.n.h.datamodel.HeritrixTemplate.editOrderXMLAddPerDomainCrawlerTraps - Found empty trap for domain netarkivet.dk
ArcRepositoryApplication.log:13:11:49.119 WARN d.n.a.arcrepository.ArcRepository.startUpload - Trying to upload file '123-9-20180413105219139-00029-kb-test-har-004.kb.dk.warc.gz' that already has state UPLOAD_COMPLETED for this replica
BitarchiveMonitorApplication_KBBM.2018-04-10.0.log:13:41:05.321 WARN d.n.a.bitarchive.BitarchiveMonitor.updateWithBitarchiveReply - Received batch reply with error: Batch job failed on 1 files. at BA monitor from bitarchive 10.17.0.56_BitApp_1
BitarchiveMonitorApplication_KBBM.2018-04-11.0.log:11:09:47.037 WARN d.n.c.distribute.JMSConnectionSunMQ.onException - JMSException with errorcode 'C4056' encountered:
HarvestJobManagerApplication.2018-04-09.0.log:15:02:01.877 WARN d.n.h.s.HarvestSchedulerMonitorServer.processCrawlStatusMessage - Job 2 failed: HarvestErrors = java.lang.RuntimeException: Exception during crawl
GUIApplication.2018-04-10.0.log:11:05:28.412 WARN dk.netarkivet.common.utils.DBUtils.setStringMaxLength - lastPeekUri of dk.netarkivet.harvester.harvesting.frontier.FrontierReportLine@96f6d5e3 is longer than the allowed 1000 characters. The contents is truncated to length 1000. The untruncated contents was: https://www.firstpost.com/%22data:image/jpeg;base64...
GUIApplication.2018-04-10.0.log:13:41:05.350 WARN d.n.a.a.d.JMSArcRepositoryClient.batch - The batch job 'ID:59980-130.226.228.6(f0:ef:fc:a:6:4d)-40252-1523360465135: To TEST6_COMMON_THE_REPOS ReplyTo TEST6_COMMON_THIS_REPOS_CLIENT_130_226_228_6_GUIWS OK Job: dk.netarkivet.viewerproxy.webinterface.CrawlLogLinesMatchingRegexp, on filename-pattern: 31-metadata-[0-9]+\.(w)?arc(\.gz)?, for replica: KB' resulted in the following error: Batch job failed on 1 files.
The following kind of warning can be ignored, unless it appears repeatedly:
GUIApplication.2018-04-09.0.log:15:01:57.609 WARN d.n.monitor.jmx.HostForwarding.registerRemoteMbeans - Failure connecting to remote JMX MBeanserver (Host=kb-test-acs-001.kb.dk, JMXport=8150, RMIport=8250, last seen live at Mon Apr 09 15:01:47 CEST 2018). Creating an error MBean
The following warning may occur after a while, and can be ignored as well:
WARNING: Error processing message '
Class: com.sun.messaging.jmq.jmsclient.ObjectMessageImpl
getJMSMessageID(): ID:40-130.225.27.140(d2:1:3:b1:10:de)-46478-1197902260630
getJMSTimestamp(): 1197902260630
getJMSCorrelationID(): null
JMSReplyTo: null
JMSDestination: TEST6_COMMON_THE_SCHED
getJMSDeliveryMode(): PERSISTENT
getJMSRedelivered(): false
getJMSType(): null
getJMSExpiration(): 0
getJMSPriority(): 4
Properties: null'
dk.netarkivet.common.exceptions.UnknownID: Job id 23 is not known in persistent storage
    at dk.netarkivet.harvester.datamodel.JobDBDAO.read(JobDBDAO.java:294)
    at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.processCrawlStatusMessage(HarvestSchedulerMonitorServer.java:103)
    at dk.netarkivet.harvester.scheduler.HarvestSchedulerMonitorServer.visit(HarvestSchedulerMonitorServer.java:285)
    at dk.netarkivet.harvester.harvesting.distribute.CrawlStatusMessage.accept(CrawlStatusMessage.java:133)
    at dk.netarkivet.harvester.distribute.HarvesterMessageHandler.onMessage(HarvesterMessageHandler.java:67)
    at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.deliverAndAcknowledge(MessageConsumerImpl.java:330)
    at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.onMessage(MessageConsumerImpl.java:265)
    at com.sun.messaging.jmq.jmsclient.SessionReader.deliver(SessionReader.java:102)
    at com.sun.messaging.jmq.jmsclient.ConsumerReader.run(ConsumerReader.java:174)
    at java.lang.Thread.run(Thread.java:595)
Any other warning should be considered a release test failure.
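The manual filtering above can be partly automated. The following is a minimal sketch, not part of the official test procedure: the function name filter_unknown_warnings is hypothetical, and the fixed substrings are copied from the known entries listed above. It does not account for entries logged before the restart, nor for the JMX connection warning that is only acceptable when it does not repeat, so its output still needs a manual look.

```shell
#!/bin/bash
# Hypothetical helper: read log lines on stdin and print only the WARN/ERROR
# lines that do NOT match one of the known, ignorable entries listed above.
# The patterns are fixed substrings (grep -F), not regular expressions.
filter_unknown_warnings() {
    grep -E 'WARN|ERROR' \
        | grep -v 'COMMON_ERROR' \
        | grep -v -F \
            -e 'AdminDataFile (./admin.data) was not found' \
            -e 'Refusing to schedule harvest definition' \
            -e 'Crawl probably interrupted by shutdown of HarvestController' \
            -e 'not found in template' \
            -e 'Found empty trap for domain' \
            -e 'already has state UPLOAD_COMPLETED' \
            -e 'Batch job failed on 1 files' \
            -e "JMSException with errorcode 'C4056'" \
            -e 'Exception during crawl' \
            -e 'is longer than the allowed 1000 characters'
}

# Usage (assumes $TESTX is set as in the step above):
#   cd /home/devel/$TESTX/log/
#   grep -E 'WARN|ERROR' *.log | filter_unknown_warnings
```

Any line this prints is a candidate for a release test failure. Some of the substrings are fairly broad (for example 'Exception during crawl'), so tighten them if they hide too much.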
- Go to the system overview page and check that all the expected applications are listening and are up without warnings or errors.
If there is a warning of this kind:
Remote JMX bean generated exception:
javax.management.InstanceNotFoundException: dk.netarkivet.common.logging:applicationinstanceid=,name=error_host_kb-test-har-004.kb.dk_8150,httpport=8076,machine=kb-test-adm-001.kb.dk,applicationname=dk.netarkivet.common.webinterface.GUIWebServer,index=0,channel=,replicaname=KBN,hostname=kb-test-har-004.kb.dk,location=K
then refresh the system state overview page. The warning should disappear.
- Check that the scheduler schedules only one job for the hourly selective harvest.
...
- Go to "Harvest status"→"All Jobs", select job status "Failed", and press "Show". Check that you can reject a job for resubmission using the "Reject?" button so that it is no longer visible when you list failed jobs.
- Check that you can see the rejected job when you now list all jobs.
- Click on one or more "Genstart"/"Restart?" buttons to resubmit. Note that you can only resubmit jobs that failed due to harvesting errors, not jobs that failed due to upload errors.
- Check that the job status changes to "resubmitted" and that a new job is created from the same harvest definition with the same configurations.
- Check that resubmitted jobs contain information about which job they were resubmitted from (NAS-1466).
Check Report Generation
Use a browser set up as a viewerproxy connection for this test (see https://sbprojectskb-dk.statsbiblioteketatlassian.dknet/wiki/pages/viewpage.action?pageId=37597440#TheNetarkivetDistributedTest12225230#TheNetarkivetDistributedTest/DevelEnvironment-ViewerProxyUsage ). Select any completed job and click on the "Browse reports for jobs" link.
...
Tests that the system can survive a database crash/stop and resume operation after the database is restarted
Log in as root on kb-test-adm-001:
kbssh test@kb-test-adm-001
su
Stop the PostgreSQL database and wait a couple of minutes.
/etc/init.d/postgresql stop
- Verify that the GUI has lost the connection to the database by listing domains or harvest definitions.
Restart the database
/etc/init.d/postgresql start
- Check that the different GUI pages work as usual.
- Create a new active selective harvest definition and verify that a job is created and started.
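To confirm from the shell that the database is really down after the stop and up again after the start, something like the sketch below may help. It assumes the standard PostgreSQL client tool pg_isready is installed on the database host, which the procedure above does not state; describe_pg_status is a hypothetical helper that only translates the documented pg_isready exit statuses into text.

```shell
#!/bin/bash
# Hypothetical helper: translate a pg_isready exit status into a description.
# Exit statuses as documented for pg_isready:
#   0 = accepting connections, 1 = rejecting connections (e.g. during startup),
#   2 = no response, 3 = no attempt made (e.g. invalid parameters).
describe_pg_status() {
    case "$1" in
        0) echo "accepting connections" ;;
        1) echo "rejecting connections" ;;
        2) echo "no response" ;;
        3) echo "no attempt made" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Usage on the database host (assumes pg_isready is on the PATH):
#   pg_isready --quiet; describe_pg_status $?
```

After the stop the expected description is "no response", and after the restart "accepting connections". This only checks the database itself; the GUI checks in the steps above are still needed.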
...
Log in to kb-test-adm-001 as root and stop the network interface by installing a cron job that does this for you:
Install script restartNetworkWithWait.sh as root cronjob (Add 0 17 * * * (/root/restartNetworkWithWait.sh) to restart network at 5 PM)
#!/bin/bash
# stopping network
/etc/init.d/network stop
# waiting 3 minutes
/bin/sleep 3m
# starting network
/etc/init.d/network start
- Check that the connection to the GUI is lost.
- After 5 minutes verify the system comes back online
- Verify that the GUI pages are working properly.
- Create a new active selective harvest definition and verify that a new job is created and started.
- Run a batch job or two and verify these work correctly.
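The "comes back online" check above can also be polled from another machine instead of reloading the GUI by hand. The following is a rough sketch under stated assumptions: wait_until is a hypothetical helper, and GUI_URL stands for whatever address the GUI is served on in your test environment (it is not defined anywhere in this procedure).

```shell
#!/bin/bash
# Hypothetical helper: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL_SECONDS until it succeeds.
# Returns 0 on success, or 1 once TIMEOUT_SECONDS have passed without success.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is a bash built-in counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Usage (GUI_URL is an assumption; set it to the GUI address of your test system):
#   wait_until 300 10 curl --silent --fail --output /dev/null "$GUI_URL" \
#       && echo "GUI is back online" || echo "GUI did not come back within 5 minutes"
```

The 300-second timeout mirrors the 5 minutes mentioned above. A successful HTTP response only shows that the web server answers, so the remaining GUI, harvest, and batch checks still apply.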
...