Test basic start and stop of selective, event and snapshot harvesting, scheduling and deduplication.

Check of monitoring and basic settings

...

Verify that the harvest is activated and done

  1. Click 'Harvest status'->'All Jobs' in the left menu
  2. Select "All" in "Only display job status" to the right of the menu
  3. Click the "Show" button repeatedly until the jobs have stepped through the statuses "NEW", "SUBMITTED", "STARTED", "DONE"
  4. Wait until all jobs have status "DONE"
  5. Check the following for the domains 'raeder.dk' and 'kb.dk' (using page Harvest Status -> All jobs per domain):
  6. Check that the domain has been harvested by one job of the name <eh. name>
  7. Check that this job has configuration <eh. name>_frontpages__
  8. Check that there is a number for 'Run number' and 'Job ID'
  9. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  10. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  11. Check that the 'Stopped due to' columns contain "Domain Completed"
  12. Check the following job details for the domain 'netarkivet.dk' (using page SelectiveHarvests->History->Run Number 0 ->JobID 1):
  13. Check that the 'Submit time', 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  14. Click on "Browse reports for jobs"
  15. Check that you don't get any errors when you click on some of the links
  16. Click on "Browse harvest files for job"
  17. Check that you don't get any errors when you click on some of the links
  18. Click on "Browse only relevant crawl-log lines for domain netarkivet.dk"
  19. Check that you don't get any errors when you click on some of the links
  20. Check the following for the domain 'netarkivet.dk' (using page Harvest Status -> All jobs per domain):
  21. Check that the domain has been harvested by 2 jobs of the name <eh. name>
  22. Check that one of the jobs has configuration <eh. name>_frontpages__
  23. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  24. Check that one of the jobs has configuration <eh. name>_frontpages_plus_2levels__
  25. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  26. Check that the 'Run number' and 'Job ID' columns contain positive numbers
  27. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  28. Check that the 'Stopped due to' column contains "Domain Completed"
  29. Check the following for the domain 'kaarefc.dk' (using page Harvest Status -> All jobs per domain):
  30. Check that the domain has been harvested by 1 job of the name <eh. name>
  31. Check that the job has configuration <eh. name>_frontpages_plus_2levels__
  32. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  33. Check that the 'Run number' and 'Job ID' columns contain positive numbers
  34. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  35. Check that the 'Stopped due to' column contains "Domain Completed"
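
The per-job checks in the steps above can also be written down as a small script, which may help when the same columns have to be verified for several domains. The following is only a sketch: the dictionary layout of `row`, the field names and the one-hour tolerance are assumptions made for illustration and are not part of NetarchiveSuite. Because the steps refer to 'Run number' 0, the sketch accepts 0 as a valid run number.

```python
from datetime import datetime, timedelta

# Status order as listed in the "Show" button step.
STATUS_ORDER = ["NEW", "SUBMITTED", "STARTED", "DONE"]


def statuses_in_order(observed):
    """Return True if the observed statuses never move backwards."""
    ranks = [STATUS_ORDER.index(status) for status in observed]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def check_job_row(row, expected_config, test_time, tolerance=timedelta(hours=1)):
    """Return a list of failed checks for one job row.

    `row` is a hypothetical dict holding the values read from the
    'All jobs per domain' page; an empty list means every check passed.
    """
    problems = []
    if row["configuration"] != expected_config:
        problems.append("configuration")
    # Run number 0 is accepted here (assumption, see lead-in).
    if row["run_number"] < 0 or row["job_id"] <= 0:
        problems.append("run number / job id")
    # Times should lie close to the time the test was carried out.
    for key in ("start_time", "end_time"):
        if abs(row[key] - test_time) > tolerance:
            problems.append(key)
    if row["bytes_harvested"] <= 0 or row["documents_harvested"] <= 0:
        problems.append("harvested counts")
    if row["stopped_due_to"] != "Domain Completed":
        problems.append("stopped due to")
    return problems
```

A tester would fill in one `row` per job by hand (or from an export, if one is available) and treat any non-empty result as a failed step.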

Browse in data from the first event harvest only

  1. Click 'Definitions'->'Selective Harvests' in the left menu
  2. Click 'History' in column 7 on the line with the event harvest <eh. name>
  3. Click 'Show jobs' in column 'Total number of jobs' on the line with 'Run number' 0
  4. Click 'Select these jobs for QA with viewerproxy' (it may take some time to create the page)
  5. Check the following in the 'Current Viewerproxy status':
  6. No errors are reported
  7. Check that the message "Currently does _not_ collect missing URLs." appears
  8. Check that the message "Current list of missing URLs contains 0 URLs." appears
  9. Check that there is a line stating that the index in use comes from harvest <eh. name>, run 0, and is built on the jobs being viewed.
  10. Open a new tab or window (optional), using the same kind of browser
  11. Go to page http://www.netarkivet.dk
  12. Check that an error occurs saying that www.netarkivet.dk was not found (DOES NOT WORK: NAS-2076)
  13. Go to page http://www.kaarefc.dk
  14. Check that this page contains data
  15. Go to page http://www.kaarefc.dk/wop/
  16. Check that this page exists.
  17. Go to page http://indvandrerbiblioteket.dk
  18. Check that an error occurs saying that www.indvandrerbiblioteket.dk was not found
  19. Go to page http://kb-prod-udv-001.kb.dk/netarchivesuite/clock.php
  20. Check that a page containing date and time of the first harvest appears
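
The browsing steps above amount to a table of URLs and expected outcomes, which can be recorded as data and compared against what is observed. This is a sketch only: `is_found` is a hypothetical callable supplied by the tester (for example a manual yes/no per URL, or a function that fetches the page through the viewerproxy), and how the viewerproxy signals "not found" is deliberately left out because this page does not specify it.

```python
# Expected result per URL when browsing the first event harvest:
# True means the page should show data, False means a "not found" error.
EXPECTED = {
    "http://www.netarkivet.dk": False,  # noted above as not working: NAS-2076
    "http://www.kaarefc.dk": True,
    "http://www.kaarefc.dk/wop/": True,
    "http://indvandrerbiblioteket.dk": False,
    "http://kb-prod-udv-001.kb.dk/netarchivesuite/clock.php": True,
}


def mismatches(expected, is_found):
    """Return the URLs whose observed result differs from the expectation.

    `is_found` is a hypothetical callable taking a URL and returning
    True if the page was shown and False if an error page appeared.
    """
    return [url for url, want in expected.items() if is_found(url) != want]
```

An empty list from `mismatches(EXPECTED, is_found)` corresponds to all browsing checks passing, except that the clock page still needs a manual look to confirm that it shows the date and time of the first harvest.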