...
- Make a new selective (event) harvest definition with a name you can remember
- Click 'Definitions'->'Selective Harvests' in the left menu
- Click 'Create new harvestdefinition' at the bottom of the main window
- Fill in the harvest name and note it for later use (from now on referred to as EH)
- Choose '''Once_an_hour''' in the drop down list for 'Schedule'
- Click Save (DO NOT CLICK ACTIVATE YET)
- Add seeds to the selective (event) harvest
- Click 'Edit' in column 6 on the line with the EH
- Write the domain list from 'Seed list 1' (given below) to a file on your desktop (e.g. using Notepad)
- Click 'Add seeds from a file' at the bottom of the main page
- Click 'Browse' and select the seed file you just created
- Choose default_orderxml in the drop-down list for 'Harvest template' (set maxobjects per domain to 500, max bytes to 400.000.000 and maxhops to 0; leave 'obey robots.txt?' unchecked and extract_javascript checked) [previously used template frontpages]
- Click 'Insert'
- Now click 'Add seeds'
- Choose default_orderxml in the drop-down list for 'Harvest template'
- Enter the domain list from 'Seed list 2' given below (you can cut and paste from this page) and set maxobjects per domain to 300, max bytes to 500.000.000 and maxhops to 2; leave 'obey robots.txt?' unchecked and extract_javascript checked [previously used template frontpages_2levels]
- Click 'Insert'
- Click 'Save'
- Check that the seed lists for the domains in Seed list 1 have changed accordingly
- For each of the domains raeder.dk, netarkivet.dk do:
- Click 'Definitions'->'Find Domain(s)'
- Search for the domain by entering its name and clicking 'Search'
- Click the domain name link and, on the resulting 'Edit domain' page, make sure to click 'show unused configurations' and 'show unused seedlists'
- Check that there exists a configuration with the name "EH_default_orderxml_400000000Bytes_500Objects" (verify that the config has maxHops=0, obey robots unchecked, extract javascript checked)
- Check that there exists a seed list with the name "EH_default_orderxml_400000000Bytes_500Objects"
- Click 'Edit' in the line with seed list "EH_default_orderxml_400000000Bytes_500Objects"
- Check that the seed list shown corresponds to the seed list for the domain (see below)
- Check that the seed lists for the domains in Seed list 2 have changed accordingly (you have to click 'show unused configurations' and 'show unused seedlists' to show all of them)
- For the domains kaarefc.dk, netarkivet.dk do:
- Click 'Definitions'->'Find Domain(s)'
- Search for the domain by entering its name (either kaarefc.dk or netarkivet.dk) and clicking 'Search'
- Check that there exists a configuration with name EH_default_orderxml_500000000Bytes_300Objects (verify that the config has maxHops=2)
- Check that there exists a seed list with the name EH_default_orderxml_500000000Bytes_300Objects
- Click 'Edit' in the line with seed list EH_default_orderxml_500000000Bytes_300Objects
- Check that the seed list shown corresponds to the seed list for the domain (see below)
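The configuration and seed list names checked above appear to follow the pattern harvest name, template, max bytes and max objects. As a rough aid for working out which name to look for, a minimal sketch is given below. The pattern is inferred only from the two names in this procedure, and the function `expected_config_name` is a hypothetical helper, not part of NetarchiveSuite.

```python
def expected_config_name(harvest_name: str, template: str,
                         max_bytes: int, max_objects: int) -> str:
    """Build the name to look for on the 'Edit domain' page.

    Assumption: names have the form
    <harvest name>_<template>_<max bytes>Bytes_<max objects>Objects,
    as suggested by the two examples in this test description.
    """
    return f"{harvest_name}_{template}_{max_bytes}Bytes_{max_objects}Objects"


# "EH" stands for the harvest name chosen earlier in the procedure.
print(expected_config_name("EH", "default_orderxml", 400_000_000, 500))
print(expected_config_name("EH", "default_orderxml", 500_000_000, 300))
```

Replace "EH" with the harvest name you actually entered; if the names shown in the GUI differ from the printed ones, trust the GUI and treat the pattern as unconfirmed.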
- Activate the harvest
- Click 'Definitions'->'Selective Harvests' in the left menu
- Click 'Activate' in column 5 on the line with the <eh. name>
- Check harvest status of the event harvest using menu "All Jobs"
- Click 'Harvest status'->'All Jobs' in the left menu
- Select "All" under "Only display job status", to the right of the menu
- Click the "Show" button repeatedly until the <eh. name> appears in a new job line (after approx. one minute)
- Check that two jobs appear and that both have the harvest name <eh. name>
- Check in the "Running jobs" menu that the jobs appear and that you can go to H3 Remote Access and monitor the jobs' progress, e.g. by viewing the cached crawl-log.
...