Excerpt |
---|
Test basic start and stop of selective, event and snapshot harvesting, scheduling and deduplication. |
Table of Contents |
---|
Check of monitoring and basic settings
...
Verify that the harvest is activated and done
- Click 'Harvest status'->'All Jobs' in the left menu
- Select "All" under "Only display job status", to the right of the menu
- Click the "Show" button repeatedly until the jobs have stepped through the statuses "NEW", "SUBMITTED", "STARTED" and "DONE"
- Wait until all jobs have status "DONE"
- Check the following for the domains '''raeder.dk''' and '''kb.dk''': (Using page Harvest Status -> All jobs per domain)
- Check that the domain has been harvested by one job of the name <eh. name>
- Check that this job has configuration <eh. name>_frontpages__
- Check that there is a number for 'Run number' and 'Job ID'
- Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
- Check the following job details for the domain '''netarkivet.dk''': (Using page SelectiveHarvests->History->Run Number 0 ->JobID 1)
- Check that the 'Submit time', 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Click on "Browse reports for jobs"
- Check that you don't get any errors when you click on some of the links
- Click on "Browse harvest files for job"
- Check that you don't get any errors when you click on some of the links
- Click on "Browse only relevant crawl-log lines for domain netarkivet.dk"
- Check that you don't get any errors when you click on some of the links
- Check the following for the domain '''netarkivet.dk''': (Using page Harvest Status -> All jobs per domain)
- Check that the domain has been harvested by 2 jobs of the name <eh. name>
- Check that one of the jobs has configuration <eh. name>_frontpages__
- Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Check that one of the jobs has configuration <eh. name>_frontpages_plus_2levels__
- Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Check that the 'Run number' and 'Job ID' columns contain positive numbers
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
- Check the following for the domain '''kaarefc.dk''': (Using page Harvest Status -> All jobs per domain)
- Check that the domain has been harvested by 1 job of the name <eh. name>
- Check that the job has configuration <eh. name>_frontpages_plus_2levels__
- Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Check that the 'Run number' and 'Job ID' columns contain positive numbers
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
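The per-domain checks above are mechanical enough to script once the rows have been copied out of the 'All jobs per domain' page. The following is a minimal sketch, assuming each job row has been transcribed by hand into a plain dictionary. The field names (`run_number`, `bytes_harvested` and so on), the helper names and the 30-minute slack are illustrative assumptions and not part of NetarchiveSuite; only the status order and the "Domain Completed" text are taken from the steps above.

```python
from datetime import datetime, timedelta

# Status order as listed in the "Show" button step above.
EXPECTED_STATUS_ORDER = ["NEW", "SUBMITTED", "STARTED", "DONE"]


def statuses_in_order(observed):
    """Return True if the observed statuses never move backwards and end in DONE."""
    if not observed or any(s not in EXPECTED_STATUS_ORDER for s in observed):
        return False
    positions = [EXPECTED_STATUS_ORDER.index(s) for s in observed]
    return positions == sorted(positions) and observed[-1] == "DONE"


def check_job_row(row, test_start, test_end, slack=timedelta(minutes=30)):
    """Return a list of problems found in one job row (hypothetical dict layout)."""
    problems = []
    # 'Run number' and 'Job ID' must be present as numbers.
    for field in ("run_number", "job_id"):
        if not isinstance(row.get(field), int):
            problems.append(f"{field} is missing or not a number")
    # 'Bytes Harvested' and 'Documents Harvested' must be positive.
    for field in ("bytes_harvested", "documents_harvested"):
        value = row.get(field)
        if not (isinstance(value, int) and value > 0):
            problems.append(f"{field} is not a positive number")
    # 'Stopped due to' must read "Domain Completed".
    if row.get("stopped_due_to") != "Domain Completed":
        problems.append("stopped_due_to is not 'Domain Completed'")
    # Start and end times must fall roughly within the test run.
    for field in ("start_time", "end_time"):
        value = row.get(field)
        if value is None or not (test_start - slack <= value <= test_end + slack):
            problems.append(f"{field} is outside the test window")
    return problems
```

An empty list from `check_job_row` means the row passes every check. The slack is deliberately generous because the procedure only asks for times that approximately match the test run.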
Browse data from the first event harvest only
- Click 'Definitions'->'Selective Harvests' in the left menu
- Click 'History' in column 7 on the line with the event harvest <eh. name>
- Click 'Show jobs' in column 'Total number of jobs' on the line with 'Run number' 0
- Click 'Select these jobs for QA with viewerproxy' (it may take some time to create the page)
- Check the following in the 'Current Viewerproxy status'
- No errors are reported
- Check that the text "Currently does _not_ collect missing URLs." appears
- Check that the text "Current list of missing URLs contains 0 URLs." appears
- Check that there is a line stating that the index in use comes from harvest <eh. name>, run 0, and is built on the jobs being viewed.
- Open a new tab or window in the browser (or, optionally, in another browser of the same kind)
- Go to page http://www.netarkivet.dk
- Check that an error occurs saying that www.netarkivet.dk was not found (DOES NOT WORK: NAS-2076)
- Go to page http://www.kaarefc.dk
- Check that this page contains data
- Go to page http://www.kaarefc.dk/wop/
- Check that this page exists.
- Go to page http://indvandrerbiblioteket.dk
- Check that an error occurs saying that www.indvandrerbiblioteket.dk was not found
- Go to page http://kb-prod-udv-001.kb.dk/netarchivesuite/clock.php
- Check that a page containing date and time of the first harvest appears
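The page checks above could also be run from a script instead of a browser tab. The following is a minimal sketch under several assumptions: the proxy address `http://localhost:8070` is a placeholder for wherever the viewerproxy listens, and a missing page is assumed to show up as an HTTP status of 400 or above or as an empty body, which may not match how the viewerproxy actually reports it. The www.netarkivet.dk step is left out because the procedure marks it as not working (NAS-2076). The function names are hypothetical.

```python
import urllib.error
import urllib.request

# URLs from the steps above, mapped to whether the page is expected to exist.
EXPECTATIONS = {
    "http://www.kaarefc.dk": True,
    "http://www.kaarefc.dk/wop/": True,
    "http://indvandrerbiblioteket.dk": False,
    "http://kb-prod-udv-001.kb.dk/netarchivesuite/clock.php": True,
}


def fetch_via_proxy(url, proxy="http://localhost:8070"):
    """Fetch url through an HTTP proxy and return (status, body).

    The default proxy address is a placeholder, not a documented setting.
    """
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=30) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error responses still carry a status code and possibly a body.
        return err.code, err.read().decode("utf-8", errors="replace")


def check_pages(fetch, expectations=EXPECTATIONS):
    """Return the URLs whose outcome differs from the expectation."""
    failures = []
    for url, should_exist in expectations.items():
        status, body = fetch(url)
        # Assumption: "exists" means a non-error status and a non-empty body.
        exists = status < 400 and bool(body.strip())
        if exists != should_exist:
            failures.append(url)
    return failures
```

Calling `check_pages(fetch_via_proxy)` returns an empty list when every page behaves as the procedure expects. The clock.php step still needs a manual look, since the script only checks that a page comes back and does not compare the date and time shown with the first harvest.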