...
- Check that netarkivet.dk and sulnudu.dk are not listed as being harvested.
- Check that neither domain appears in the order templates for any of the jobs with the (possible) exception of the following lines:
    <map name="http-headers">
    .
    <string name="user-agent">Mozilla/5.0 (compatible; heritrix/1.5.0-200506132127+metadata.operatorContactUrl=http://netarkivet.dk/websitewebcrawler/info.html)</string>
    <string name="from"> netarkivet-svar@netarkivet.dk </string>
    </map>
    metadata.operatorFrom=info@netarkivet.dk
This can be done by grepping with a command like
...
or by copying the metadata file to kb-prod-udv-001 with scp and inspecting it with "less". Alternatively, display the order template in the NAS GUI and search it there.
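The same check can also be sketched in Python once the order template text has been copied somewhere it can be read. This is only an illustration, not part of the test tooling: the function name `find_unexpected_mentions`, the list of allowed markers (guessed from the exception lines quoted above), and the sample template fragment are all assumptions made for the example.

```python
# Domains that must not appear in the order template (taken from the checklist).
FORBIDDEN_DOMAINS = ("netarkivet.dk", "sulnudu.dk")

# Substrings marking lines where a mention is tolerated. These are assumptions
# derived from the exception lines quoted above; adjust to the real template.
ALLOWED_MARKERS = (
    'name="user-agent"',
    'name="from"',
    "metadata.operatorFrom",
    "metadata.operatorContactUrl",
)


def find_unexpected_mentions(template_text):
    """Return (line_number, line) pairs that mention a forbidden domain
    on a line that is not one of the tolerated exception lines."""
    hits = []
    for number, line in enumerate(template_text.splitlines(), start=1):
        if not any(domain in line for domain in FORBIDDEN_DOMAINS):
            continue  # no forbidden domain on this line
        if any(marker in line for marker in ALLOWED_MARKERS):
            continue  # mention is inside a tolerated header/metadata line
        hits.append((number, line.strip()))
    return hits


# Hypothetical template fragment, used only to exercise the function.
SAMPLE = """<map name="http-headers">
<string name="from">netarkivet-svar@netarkivet.dk</string>
</map>
<string name="seed">http://www.sulnudu.dk/</string>
"""

if __name__ == "__main__":
    for number, line in find_unexpected_mentions(SAMPLE):
        print(f"line {number}: {line}")
```

The check is line-based, so it assumes each tolerated mention sits on the same line as its marker. If the template wraps the address onto a line of its own, that line would be reported and needs a manual look.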
Check Byte Limits for the Second Harvest
- Confirm that the stop reason "Max Bytes limit reached" or "Domain Completed" is given for all the domains included.
- Confirm that oernehoej.dk and statsbiblioteket.dk are not found in the "Domain" column in any of the jobs for the second run.
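The two confirmations above can be expressed as a small Python check, for instance when the job rows have been copied out of the GUI by hand. This is a sketch under assumptions: the function name `check_second_harvest`, the `(domain, stop_reason)` row shape, and the sample rows are hypothetical and do not come from NetarchiveSuite itself.

```python
# Stop reasons the checklist accepts for the second harvest.
ACCEPTED_STOP_REASONS = {"Max Bytes limit reached", "Domain Completed"}

# Domains that must not show up in the "Domain" column for the second run.
EXCLUDED_DOMAINS = {"oernehoej.dk", "statsbiblioteket.dk"}


def check_second_harvest(rows):
    """Check (domain, stop_reason) pairs against the two rules above.

    Returns a list of human-readable problems; an empty list means
    both confirmations pass for the given rows.
    """
    problems = []
    for domain, stop_reason in rows:
        if domain in EXCLUDED_DOMAINS:
            problems.append(f"{domain}: should not be part of the second run")
        elif stop_reason not in ACCEPTED_STOP_REASONS:
            problems.append(f"{domain}: unexpected stop reason {stop_reason!r}")
    return problems


# Hypothetical rows, invented for illustration only.
SAMPLE_ROWS = [
    ("example.dk", "Max Bytes limit reached"),
    ("example.org", "Domain Completed"),
    ("oernehoej.dk", "Domain Completed"),
    ("example.net", "Some other reason"),
]

if __name__ == "__main__":
    for problem in check_second_harvest(SAMPLE_ROWS):
        print(problem)
```

Stop reasons are compared as exact strings, so any difference in capitalisation or wording in the GUI would be reported as unexpected.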
[ No longer valid. We now include DeDuplication in TEST2.
Check that there was no Deduplication
- Go to the Job details page for the newly finished job by clicking the link in the JobID column.
- Click the "Browse reports for jobs" link.
- Confirm that there was no DeDuplicator report, e.g. verify that the string duplicatereductionjob doesn't appear in the listed reports. ]
Stop the Test and Clean-Up
...