...

  1. Check that netarkivet.dk and sulnudu.dk are not listed as being harvested.
  2. Check that neither domain appears in the order templates for any of the jobs with the (possible) exception of the following lines:
Code Block
<map name="http-headers">
  <string name="user-agent">Mozilla/5.0 (compatible; heritrix/1.5.0-200506132127+metadata.operatorContactUrl=http://netarkivet.dk/websitewebcrawler/info.html)</string>
  <string name="from"> netarkivet-svar@netarkivet.dk </string>
</map>
metadata.operatorFrom=info@netarkivet.dk

This can be done by grepping with a command like

...

or by scp'ing the metadata file to kb-prod-udv-001 and inspecting it with "less". (Or just by displaying the order template in the NAS GUI and searching.)
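
The same check can also be scripted once the order template or metadata file has been saved locally as plain text. The following is a minimal sketch in Python, not part of NetarchiveSuite: the function name find_unexpected_mentions and the list of allowed markers are assumptions made for illustration, derived from the exception lines listed above.

```python
# Domains that must not appear in the order templates.
DOMAINS = ("netarkivet.dk", "sulnudu.dk")

# Substrings identifying the lines where a mention is expected
# (assumption: taken from the exception lines in the procedure).
ALLOWED_MARKERS = (
    'name="user-agent"',
    'name="from"',
    "metadata.operatorContactUrl",
    "metadata.operatorFrom",
)


def find_unexpected_mentions(text, domains=DOMAINS, allowed=ALLOWED_MARKERS):
    """Return (line_number, line) pairs mentioning a domain outside the allowed lines."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        mentions_domain = any(domain in line for domain in domains)
        is_allowed = any(marker in line for marker in allowed)
        if mentions_domain and not is_allowed:
            hits.append((number, line.strip()))
    return hits
```

An empty result means that the only mentions are on the excepted lines. Any returned pair gives the line number and content to inspect by hand.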

Check Byte Limits for the Second Harvest

  1. Confirm that the stop reason "Max Bytes limit reached" or "Domain Completed" is given for all the domains included.
  2. Confirm that oernehoej.dk and statsbiblioteket.dk are not found in the "Domain" column in any of the jobs for the second run.
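
If the domain and stop-reason columns are copied out of the job pages, the two confirmations above can be expressed as a small check. This is only a sketch under assumptions: the (domain, stop_reason) row format and the function name check_second_harvest are hypothetical and do not correspond to any NetarchiveSuite export.

```python
# Stop reasons accepted in step 1.
ACCEPTED_STOP_REASONS = {"Max Bytes limit reached", "Domain Completed"}

# Domains that must be absent from the second run (step 2).
EXCLUDED_DOMAINS = {"oernehoej.dk", "statsbiblioteket.dk"}


def check_second_harvest(rows):
    """Take (domain, stop_reason) pairs and return a list of problem descriptions."""
    problems = []
    for domain, stop_reason in rows:
        if domain in EXCLUDED_DOMAINS:
            problems.append(f"{domain}: should not appear in the second run")
        elif stop_reason not in ACCEPTED_STOP_REASONS:
            problems.append(f"{domain}: unexpected stop reason {stop_reason!r}")
    return problems
```

An empty list means both confirmations pass for the rows supplied.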

[ No longer valid. DeDuplication is now included in TEST2.

Check That There Was No Deduplication

  1. Go to the Job details page for the newly finished job by clicking the link in the JobID column.
  2. Click the "Browse reports for jobs" link.
  3. Confirm that there was no DeDuplicator report, e.g. verify that the string duplicatereductionjob does not appear in the listed reports.  ]

Stop the Test and Clean-Up

...