...

Code Block
http://www.kaarefc.dk/

Verify that the harvest is activated and done

...

This page describes how to verify that a harvest is carried out correctly.

  1. Click 'Harvest status'->'All Jobs' in the left menu
  2. Select "All" in the "Only display job status" menu on the right
  3. Click the "Show" button repeatedly until the jobs have stepped through the statuses "NEW", "SUBMITTED", "STARTED" and "DONE"
  4. Wait until all jobs have the status "DONE"
  5. Check that you can search by harvest name, start date and end date
  6. Check that you can change the number of rows displayed per page, e.g. to 1
  7. Check that you can go to the next and the previous page
  8. Check that the reset button resets all changes to their defaults (note that the displayed value is also blanked, but it is 100 by default)
  9. Check the following for the domains '''raeder.dk''' and '''kb.dk''' (using the page Harvest Status -> All jobs per domain):
  10. Check that the domain has been harvested by one job of the name <eh. name>
  11. Check that this job has the configuration <eh. name>_frontpages
  12. Check that there is a number for 'Run number' and 'Job ID'
  13. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  14. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  15. Check that the 'Stopped due to' column contains "Domain Completed"
  16. Check the following job details for the domain '''netarkivet.dk''' (using the page SelectiveHarvests->History->Run Number 0->JobID 1):
  17. Check that the 'Submit time', 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  18. Click on "Browse reports for jobs"
  19. Check that you don't get any errors when you click on some of the links
  20. Click on "Browse harvest files for job"
  21. Check that you don't get any errors when you click on some of the links
  22. Click on "Browse only relevant crawl-log lines for domain netarkivet.dk"
  23. Check that you don't get any errors when you click on some of the links
  24. Check the following for the domain '''netarkivet.dk''' (using the page Harvest Status -> All jobs per domain):
  25. Check that the domain has been harvested by 2 jobs of the name <eh. name>
  26. Check that one of the jobs has the configuration <eh. name>_frontpages
  27. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  28. Check that one of the jobs has the configuration <eh. name>_frontpages_plus_2levels
  29. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  30. Check that the 'Run number' and 'Job ID' columns contain positive numbers
  31. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  32. Check that the 'Stopped due to' column contains "Domain Completed"
  33. Check the following for the domain '''kaarefc.dk''' (using the page Harvest Status -> All jobs per domain):
  34. Check that the domain has been harvested by 1 job of the name <eh. name>
  35. Check that the job has the configuration <eh. name>_frontpages_plus_2levels
  36. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  37. Check that the 'Run number' and 'Job ID' columns contain positive numbers
  38. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  39. Check that the 'Stopped due to' column contains "Domain Completed"
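
The per-job checks above (positive counters, the stop reason, and times close to the time of the test) can be written down as a small predicate, which may help when the values are copied out of the 'All jobs per domain' table by hand. This is only a sketch: the `JobRow` fields, the `check_job_row` helper and the 30-minute tolerance are assumptions made for illustration and are not part of the harvesting software.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRow:
    """One table row copied by hand (hypothetical shape, not a real API)."""
    job_id: int
    start_time: datetime
    end_time: datetime
    bytes_harvested: int
    documents_harvested: int
    stopped_due_to: str


def check_job_row(row, test_time, tolerance=timedelta(minutes=30)):
    """Return a list of problems found in the row; an empty list means it passes."""
    problems = []
    if row.bytes_harvested <= 0:
        problems.append("'Bytes Harvested' is not positive")
    if row.documents_harvested <= 0:
        problems.append("'Documents Harvested' is not positive")
    if row.stopped_due_to != "Domain Completed":
        problems.append("'Stopped due to' is not \"Domain Completed\"")
    # "Approximately corresponds" is interpreted here as "within the tolerance".
    for label, value in (("Start time", row.start_time), ("End time", row.end_time)):
        if abs(value - test_time) > tolerance:
            problems.append(f"'{label}' is not close to the time of the test")
    if row.end_time < row.start_time:
        problems.append("'End time' is before 'Start time'")
    return problems
```

A row passes when `check_job_row(row, test_time)` returns an empty list; the tolerance should be adjusted to however long the test harvest is expected to take.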

Follow the schedule of the next job

  1. Click 'Definitions'->'Selective Harvests' in the left menu
  2. Check that the selective harvest <sh. name> is scheduled to start in a week
  3. Check that the event harvest <eh. name> is scheduled to start in approx. an hour
  4. Click 'edit' on the event harvest and override the next run time with the current date and time + 5 min
  5. Click 'save'
  6. After 5 min, click 'Harvest status'->'All Jobs' in the left menu
  7. Check that two new event harvest <eh. name> jobs have been generated
  8. Check that NO new selective harvest <sh. name> job has been generated
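
The override value in step 4 is simply the current time plus five minutes. A minimal sketch of that arithmetic follows; the helper name `next_run_override` and the printed date format are assumptions, so the value should be entered in whatever format the edit form actually expects.

```python
from datetime import datetime, timedelta


def next_run_override(now=None, delay_minutes=5):
    """Return the current time plus the delay, truncated to whole minutes."""
    if now is None:
        now = datetime.now()
    # Drop seconds and microseconds so the value is easy to type into a form.
    return (now + timedelta(minutes=delay_minutes)).replace(second=0, microsecond=0)


# The format string below is an assumption, not the format required by the UI.
print(next_run_override().strftime("%d/%m/%Y %H:%M"))
```

Because the seconds are truncated, the job may become due slightly less than five minutes after the value is computed.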

Verify that the harvest is activated and done

  1. Click 'Harvest status'->'All Jobs' in the left menu
  2. Select "All" in the "Only display job status" menu on the right
  3. Click the "Show" button repeatedly until the jobs have stepped through the statuses "NEW", "SUBMITTED", "STARTED" and "DONE"
  4. Wait until all jobs have the status "DONE"
  5. Check the following for the domains '''raeder.dk''' and '''kb.dk''' (using the page Harvest Status -> All jobs per domain):
  6. Check that the domain has been harvested by one job of the name <eh. name>
  7. Check that this job has the configuration <eh. name>_frontpages
  8. Check that there is a number for 'Run number' and 'Job ID'
  9. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  10. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  11. Check that the 'Stopped due to' column contains "Domain Completed"
  12. Check the following job details for the domain '''netarkivet.dk''' (using the page SelectiveHarvests->History->Run Number 0->JobID 1):
  13. Check that the 'Submit time', 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  14. Click on "Browse reports for jobs"
  15. Check that you don't get any errors when you click on some of the links
  16. Click on "Browse harvest files for job"
  17. Check that you don't get any errors when you click on some of the links
  18. Click on "Browse only relevant crawl-log lines for domain netarkivet.dk"
  19. Check that you don't get any errors when you click on some of the links
  20. Check the following for the domain '''netarkivet.dk''' (using the page Harvest Status -> All jobs per domain):
  21. Check that the domain has been harvested by 2 jobs of the name <eh. name>
  22. Check that one of the jobs has the configuration <eh. name>_frontpages
  23. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  24. Check that one of the jobs has the configuration <eh. name>_frontpages_plus_2levels
  25. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  26. Check that the 'Run number' and 'Job ID' columns contain positive numbers
  27. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  28. Check that the 'Stopped due to' column contains "Domain Completed"
  29. Check the following for the domain '''kaarefc.dk''' (using the page Harvest Status -> All jobs per domain):
  30. Check that the domain has been harvested by 1 job of the name <eh. name>
  31. Check that the job has the configuration <eh. name>_frontpages_plus_2levels
  32. Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
  33. Check that the 'Run number' and 'Job ID' columns contain positive numbers
  34. Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  35. Check that the 'Stopped due to' column contains "Domain Completed"
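
The expected configuration per domain in the checklist above can be summarised as a small lookup table and compared against what the 'All jobs per domain' page shows. The sketch below is illustrative only: `EXPECTED_SUFFIXES` is transcribed from the steps above, while `check_configurations` and the shape of `observed` are assumptions, and `observed` is assumed to hold only the jobs of the run being verified.

```python
from collections import Counter

# Expected configuration suffixes per domain, transcribed from the checklist.
EXPECTED_SUFFIXES = {
    "raeder.dk": ["_frontpages"],
    "kb.dk": ["_frontpages"],
    "netarkivet.dk": ["_frontpages", "_frontpages_plus_2levels"],
    "kaarefc.dk": ["_frontpages_plus_2levels"],
}


def check_configurations(harvest_name, observed):
    """Return the domains whose observed configurations differ from the expected ones.

    `observed` maps a domain to the list of configuration names noted down
    for that domain (hypothetical input, typed in by the tester).
    """
    mismatches = []
    for domain, suffixes in EXPECTED_SUFFIXES.items():
        expected = Counter(harvest_name + suffix for suffix in suffixes)
        # Counter comparison ignores order but catches missing or extra jobs.
        if Counter(observed.get(domain, [])) != expected:
            mismatches.append(domain)
    return mismatches
```

An empty result means every domain was harvested by exactly the expected number of jobs with the expected configurations.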