...
- Click 'Definitions'->'Find Domain(s)'
- Search for =netarkivet.dk= by entering the domain name and clicking 'Search'
- Check that the GUI returns exactly one result, namely the domain =netarkivet.dk=
- Click on the link =netarkivet.dk= and check that the page for the domain =netarkivet.dk= is shown without errors
- Click 'Edit' on the configuration line for defaultconfig
- Check that Name is "defaultconfig"
- Check that Harvest template is "default_orderxml"
- Check that Maximum number of objects is "2,000" (in some languages, e.g. Danish, this is shown as "2.000")
- Check that Maximum number of bytes is "500,000,000" (in some languages, e.g. Danish, this is shown as "500.000.000")
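The GUI shows large counts with a locale-dependent thousands separator, so "2,000" and "2.000" denote the same value. If you compare these values outside the GUI (for example when copying them into a test log), a minimal Python sketch like the one below can normalise both forms first. It assumes the values are whole numbers, so every ',' or '.' is a grouping separator and never a decimal point; the function name `parse_count` is hypothetical.

```python
def parse_count(text: str) -> int:
    """Parse a whole-number count that uses ',', '.' or a space as thousands separator.

    Assumption: the value is an integer, so ',' and '.' are always grouping
    separators, never a decimal point.
    """
    cleaned = text.strip()
    for sep in (",", ".", " ", "\u00a0"):  # \u00a0 = non-breaking space
        cleaned = cleaned.replace(sep, "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"not a whole-number count: {text!r}")
    return int(cleaned)


# The defaultconfig values from the checks above, in both notations.
assert parse_count("2,000") == parse_count("2.000") == 2000
assert parse_count("500,000,000") == parse_count("500.000.000") == 500_000_000
```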
...
- Make a new selective (event) harvest definition with a name you can remember
  - Click 'Definitions'->'Selective Harvests' in the left menu
  - Click 'Create new harvestdefinition' at the bottom of the main window
  - Fill in the Harvest name and note it for later use (from now on referred to as <eh. name> EH)
  - Choose '''Once_an_hour''' in the drop-down list for 'Schedule'
  - Click 'Save' (DO NOT CLICK 'Activate' YET)
- Add seeds to the selective (event) harvest
  - Click 'Edit' in column 6 on the line with the <eh. name> EH
  - Write the domain list from 'Seed list 1' (given below) to a file on your desktop (e.g. using Notepad)
  - Click 'Add seeds from a file' at the bottom of the main page
  - Click 'Browse' and select the seed file you just created
  - Choose '''frontpages''' in the drop-down list for 'Harvest template' (set max objects per domain to 500 and max bytes to 400.000.000)
  - Click 'Insert'
  - Now click 'Add seeds'
  - Choose '''frontpages_plus_2levels''' in the drop-down list for 'Harvest template'
  - Write the domain list from 'Seed list 2' (given below; you can cut and paste from this page) and set max objects per domain to 300 and max bytes to 500.000.000
  - Click 'Insert'
  - Click 'Save'
- Check that the seed lists for the domains in Seed list 1 have changed correspondingly (you have to click 'Show unused configurations'/'Show unused seedlists' to show all)
  - For each of the domains =raeder.dk= and =netarkivet.dk=, do:
    - Click 'Definitions'->'Find Domain(s)'
    - Search for the domain by entering its name and clicking 'Search'
    - Check that there exists a configuration with the name "<eh. name>EH_frontpages"
    - Check that there exists a seed list with the name "<eh. name>EH_frontpages"
    - Click 'Edit' on the line with the seed list "<eh. name>EH_frontpages"
    - Check that the seed list shown corresponds to the seed list for the domain (see below)
- Check that the seed lists for the domains in Seed list 2 have changed correspondingly (you have to click 'Show unused configurations'/'Show unused seedlists' to show all)
  - For each of the domains =kaarefc.dk= and =netarkivet.dk=, do:
    - Click 'Definitions'->'Find Domain(s)'
    - Search for the domain by entering its name (either kaarefc.dk or netarkivet.dk) and clicking 'Search'
    - Check that there exists a configuration with the name "<eh. name>EH_frontpages_plus_2levels"
    - Check that there exists a seed list with the name "<eh. name>EH_frontpages_plus_2levels"
    - Click 'Edit' on the line with the seed list "<eh. name>EH_frontpages_plus_2levels"
    - Check that the seed list shown corresponds to the seed list for the domain (see below)
- Activate the harvest
  - Click 'Definitions'->'Selective Harvests' in the left menu
  - Click 'Activate' in column 5 on the line with the <eh. name> EH
- Check the harvest status of the event harvest using the "All Jobs" menu
  - Click 'Harvest status'->'All Jobs' in the left menu
  - Select "All" in "Only display job status" to the right of the menu
  - Click the "Show" button repeatedly until the <eh. name> EH appears in a new job line (after approx. one minute)
  - Check that two jobs appear and that both have the Harvest name <eh. name> EH
  - Check in the "Running jobs" menu that the jobs appear and that you can open the Heritrix GUI by clicking on the host link and logging in with login/password "admin"/"adminPassword"; then close the window again
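The naming checks above follow one pattern: each harvest template used in the event harvest gives a configuration and a seed list named after the harvest plus the template, and a domain that occurs in both seed lists (here =netarkivet.dk=) gets one configuration per template, and therefore one job per template. The Python sketch below derives the names to look for. It assumes the names are simply the harvest name, '_' and the template name, as in the checks above; it covers only the domains named in this section, not the full seed lists given below; and `expected_configs` and the example name "myEH" are hypothetical.

```python
from collections import defaultdict

# Domains named in the checks above, grouped by the harvest template chosen
# for their seed list. The complete seed lists are given further down the page.
SEED_LIST_TEMPLATES = {
    "frontpages": ["raeder.dk", "netarkivet.dk"],
    "frontpages_plus_2levels": ["kaarefc.dk", "netarkivet.dk"],
}


def expected_configs(harvest_name: str) -> dict[str, list[str]]:
    """Map each domain to the configuration/seed-list names it should get.

    Assumption: names have the form '<harvest name>_<template name>'.
    """
    result: dict[str, list[str]] = defaultdict(list)
    for template, domains in SEED_LIST_TEMPLATES.items():
        for domain in domains:
            result[domain].append(f"{harvest_name}_{template}")
    return dict(result)


if __name__ == "__main__":
    for domain, names in expected_configs("myEH").items():
        # One configuration per template, hence one job per configuration.
        print(f"{domain}: {', '.join(names)} -> expect {len(names)} job(s)")
```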
...
- Click 'Harvest status'->'All Jobs' in the left menu
- Select "All" in "Only display job status" to the right of the menu
- Click the "Show" button repeatedly until the jobs have stepped through the statuses "NEW", "SUBMITTED", "STARTED" and "DONE"
- Wait until all jobs have the status "DONE"
- Check that you can search on Harvest name, start date and end date
- Check that you can change the number of rows displayed per page (e.g. to 1)
- Check that you can go to the next and the previous page
- Check that the reset button resets all changes to the defaults (note that the display value is also blanked, although its default is 100)
- Check the following for the domain '''raeder.dk''' (using the page 'Harvest Status' -> 'All jobs per domain'):
  - Check that the domain has been harvested by one job with the name <eh. name> EH
  - Check that this job has the configuration <eh. name>EH_frontpages
  - Check that there is a number for 'Run number' and 'Job ID'
  - Check that the 'Start time' and 'End time' columns approximately correspond to the time of the <eh. name> EH harvest
  - Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  - Check that the 'Stopped due to' column contains "Domain Completed"
- Check the following job details for the domain '''netarkivet.dk''' (using the page SelectiveHarvests -> History -> Run Number 0 -> JobID 1):
  - Check that the 'Submit time', 'Start time' and 'End time' columns approximately correspond to the time of the <eh. name> EH harvest
  - Click on "Browse reports for jobs"
    - Check that you don't get any errors when you click on some of the links
  - Click on "Browse harvest files for job"
    - Check that you don't get any errors when you click on some of the links
  - Click on "Browse only relevant crawl-log lines for domain netarkivet.dk"
    - Check that you don't get any errors when you click on some of the links
- Check the following for the domain '''netarkivet.dk''' (using the page 'Harvest Status' -> 'All jobs per domain'):
  - Check that the domain has been harvested by 2 jobs with the name <eh. name> EH
  - Check that one of the jobs has the configuration <eh. name>EH_frontpages
    - Check that its 'Start time' and 'End time' columns approximately correspond to the time of the <eh. name> EH harvest
  - Check that the other job has the configuration <eh. name>EH_frontpages_plus_2levels
    - Check that its 'Start time' and 'End time' columns approximately correspond to the time of the <eh. name> EH harvest
  - Check that the 'Run number' and 'Job ID' columns contain valid numbers (the 'Run number' may be 0 for the first run)
  - Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  - Check that the 'Stopped due to' column contains "Domain Completed"
- Check the following for the domain '''kaarefc.dk''' (using the page 'Harvest Status' -> 'All jobs per domain'):
  - Check that the domain has been harvested by 1 job with the name <eh. name> EH
  - Check that the job has the configuration <eh. name>EH_frontpages_plus_2levels
  - Check that the 'Start time' and 'End time' columns approximately correspond to the time of the <eh. name> EH harvest
  - Check that the 'Run number' and 'Job ID' columns contain valid numbers (the 'Run number' may be 0 for the first run)
  - Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
  - Check that the 'Stopped due to' column contains "Domain Completed"
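The per-domain checks above repeat the same conditions for each domain. If you transcribe the rows of 'All jobs per domain' into a script, a sketch like the one below can apply them in one go. The `JobRow` fields mirror the column names used above, but the record type, the `check_domain` function and the way the rows are obtained are all hypothetical; the expected configurations per domain come from the checks above (one for raeder.dk and kaarefc.dk, two for netarkivet.dk), and the start/end time comparison is left to the tester.

```python
from dataclasses import dataclass


@dataclass
class JobRow:
    """One row of 'All jobs per domain', as transcribed by the tester (hypothetical)."""
    harvest_name: str
    configuration: str
    run_number: int
    job_id: int
    bytes_harvested: int
    documents_harvested: int
    stopped_due_to: str


def check_domain(rows: list[JobRow], harvest_name: str,
                 expected_configs: set[str]) -> list[str]:
    """Return the problems found; an empty list means every check passed."""
    problems = []
    ours = [r for r in rows if r.harvest_name == harvest_name]
    # One job per expected configuration.
    if len(ours) != len(expected_configs):
        problems.append(f"expected {len(expected_configs)} job(s), found {len(ours)}")
    found = {r.configuration for r in ours}
    if found != expected_configs:
        problems.append(f"configurations {sorted(found)} != {sorted(expected_configs)}")
    for r in ours:
        # The run number may be 0 for the first run; job IDs start at 1.
        if r.run_number < 0 or r.job_id <= 0:
            problems.append(f"job {r.job_id}: invalid run number or job ID")
        if r.bytes_harvested <= 0 or r.documents_harvested <= 0:
            problems.append(f"job {r.job_id}: nothing harvested")
        if r.stopped_due_to != "Domain Completed":
            problems.append(f"job {r.job_id}: stopped due to {r.stopped_due_to!r}")
    return problems
```

For =netarkivet.dk=, for example, `expected_configs` would be the set of the two names ending in `_frontpages` and `_frontpages_plus_2levels`.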
...