...
Seed list "<eh. name>_frontpages" for domain '''netarkivet.dk'''
Code Block |
---|
http://netarkivet.dk/index-en.php http://netarkivet.dk/adgang-for-forskere/ |
Seed list "<eh. name>_frontpages_plus_2levels"
Code Block |
---|
http://netarkivet.dk/index-en.php |
...
Code Block |
---|
http://www.kaarefc.dk/ |
Verify that the harvest is activated and done
This page describes how to verify that a harvest has been carried out correctly.
- Click 'Harvest status'->'All Jobs' in the left menu
- Select "All" under "Only display job status" to the right of the menu
- Click the "Show" button repeatedly until the jobs have stepped through the statuses "NEW", "SUBMITTED", "STARTED" and "DONE"
- Wait until all jobs have status "DONE"
- Check that you can search by harvest name, start date and end date
- Check that you can change number of rows to be displayed per page e.g. 1 and
- Check that you can press next and previous page and
- Check that the reset button resets all changes to their defaults (note that the display value is also blanked, but is 100 by default)
- Check the following for the domains '''raeder.dk''' and '''kb.dk''': (Using page Harvest Status -> All jobs per domain)
- Check that the domain has been harvested by one job of the name <eh. name>
- Check that this job has configuration <eh. name>_frontpages
- Check that there is a number for 'Run number' and 'Job ID'
- Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
- Check the following job details for the domain '''netarkivet.dk''': (Using page SelectiveHarvests->History->Run Number 0 ->JobID 1)
- Check that the 'Submit time', 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Click on "Browse reports for jobs"
- Check that you don't get any errors when you click on some of the links
- Click on "Browse harvest files for job"
- Check that you don't get any errors when you click on some of the links
- Click on "Browse only relevant crawl-log lines for domain netarkivet.dk"
- Check that you don't get any errors when you click on some of the links
- Check the following for the domain '''netarkivet.dk''': (Using page Harvest Status -> All jobs per domain)
- Check that the domain has been harvested by 2 jobs of the name <eh. name>
- Check that one of the jobs has configuration <eh. name>_frontpages
- Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Check that one of the jobs has configuration <eh. name>_frontpages_plus_2levels
- Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Check that the 'Run number' and 'Job ID' columns contain positive numbers
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
- Check the following for the domain '''kaarefc.dk''': (Using page Harvest Status -> All jobs per domain)
- Check that the domain has been harvested by 1 job of the name <eh. name>
- Check that the job has configuration <eh. name>_frontpages_plus_2levels
- Check that the 'Start time' and 'End time' columns approximately correspond to the time of the test with the <eh. name> harvest
- Check that the 'Run number' and 'Job ID' columns contain positive numbers
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
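
The per-job checks above are performed by hand in the GUI. For testers who note down the values they see, the same criteria can be written as a small script. The following is only a sketch under stated assumptions: `JobRow`, `statuses_in_order` and `check_job_row` are hypothetical names invented for illustration, not part of NetarchiveSuite, and the 10 minute slack used for "approximately" is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Status sequence named in the checklist above.
EXPECTED_STATUSES = ["NEW", "SUBMITTED", "STARTED", "DONE"]


@dataclass
class JobRow:
    """Hypothetical record of one row noted down from 'All jobs per domain'."""
    configuration: str
    run_number: int
    job_id: int
    start_time: datetime
    end_time: datetime
    bytes_harvested: int
    documents_harvested: int
    stopped_due_to: str


def statuses_in_order(observed):
    """True if the observed statuses never step backwards and end at DONE."""
    if not observed or any(s not in EXPECTED_STATUSES for s in observed):
        return False
    ranks = [EXPECTED_STATUSES.index(s) for s in observed]
    return ranks == sorted(ranks) and observed[-1] == "DONE"


def check_job_row(row, expected_configuration, test_start, test_end,
                  slack=timedelta(minutes=10)):
    """Return the names of the failed checks (an empty list means all passed)."""
    failures = []
    if row.configuration != expected_configuration:
        failures.append("configuration")
    # Assumption: run number 0 is accepted, since the page path above
    # refers to "Run Number 0"; job IDs are assumed to start at 1.
    if row.run_number < 0 or row.job_id < 1:
        failures.append("run number / job id")
    # "Approximately corresponds" is modelled as a window with some slack.
    if not (test_start - slack <= row.start_time <= row.end_time
            <= test_end + slack):
        failures.append("start / end time")
    if row.bytes_harvested <= 0 or row.documents_harvested <= 0:
        failures.append("bytes / documents harvested")
    if row.stopped_due_to != "Domain Completed":
        failures.append("stopped due to")
    return failures
```

A row that passes every check yields an empty list, so any non-empty result names exactly the checklist items that need a second look in the GUI.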