...
- In the Heritrix GUI, change some parameters for the domain netarkivet.dk, e.g. set max-hops to 15 and delay-factor to 1.5.
- Click "Resume" in the Heritrix Console.
- Confirm that the job is running again in the NAS System overview.
Restart The System
- Stop and restart NAS. After some time, a job should appear in the state "Failed".
- Resubmit the job. A new job should be created.
- Wait for the job to finish.
Check the Overrides are Applied
For the failed job, once it is finished, go to the QA interface and check the order template for the job as listed in the reports (or log in to the bitarchive and look directly in the metadata arcfile). Check that the overrides are visible. The easiest way to do this is from test@kb-prod-udv-001:
Code Block |
---|
[test@kb-prod-udv-001 ~]$ ssh netarkiv@sb-test-bar-001 grep max-hops /netarkiv/0001/TEST2/filedir/<jobno>-metadata-1.arc |
[test@kb-prod-udv-001 ~]$ ssh netarkiv@sb-test-bar-001 grep delay-factor /netarkiv/0001/TEST2/filedir/<jobno>-metadata-1.arc |
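The two greps above can be combined into one pass that reports each setting explicitly. This is a minimal sketch, not part of the test procedure: the function name `check_overrides` is hypothetical, and it assumes the metadata file is readable at the path given as the first argument (on the bitarchive host itself, or as a local copy).

```shell
# Hypothetical helper: report whether each named setting occurs in a metadata file.
# Usage: check_overrides <file> <setting>...
check_overrides() {
    file="$1"
    shift
    status=0
    for setting in "$@"; do
        # grep -q only sets the exit status; "--" guards settings that start with "-".
        if grep -q -- "$setting" "$file"; then
            echo "found: $setting"
        else
            echo "MISSING: $setting"
            status=1
        fi
    done
    # Non-zero when at least one setting was not found.
    return "$status"
}
```

Called as `check_overrides <file> max-hops delay-factor`, it prints one line per setting and returns a non-zero status if any override is absent. It only confirms that the setting names occur in the file; the values (15 and 1.5) still need to be read from the grep output.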
Restart The System
...
...
Check that Alias Domains are not Harvested
...
Code Block |
---|
<map name="http-headers"> . <string name="user-agent">Mozilla/5.0 (compatible; heritrix/1.5.0-200506132127+http://netarkivet.dk/website/info.html)</string> <string name="from"> netarkivet-svar@netarkivet.dk </string> </map> |
This can be done by grepping with a command like:
Code Block |
---|
[test@kb-prod-udv-001 ~]$ ssh netarkiv@sb-test-bar-001 grep netarkivet.dk /netarkiv/0001/TEST2/filedir/*-metadata-1.arc | grep -v 'metadata:' |
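To illustrate what the second `grep` in that pipeline does, here is a small local sketch. The sample lines and the function name `filter_hits` are invented for illustration. That the excluded lines are record headers containing `metadata:` is an assumption inferred from the filter itself.

```shell
# Hypothetical sample standing in for a metadata ARC file; all lines are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
metadata://netarkivet.dk/crawl/setup/order.xml text/xml 1234
<string name="from">netarkivet-svar@netarkivet.dk</string>
http://www.example.org/ some unrelated line
EOF

# Keep lines mentioning netarkivet.dk, then drop those that contain 'metadata:'.
filter_hits() {
    grep netarkivet.dk "$1" | grep -v 'metadata:'
}

# Prints only the <string name="from"> line of the sample.
filter_hits "$sample"
```

The first line of the sample matches netarkivet.dk but is removed by `grep -v 'metadata:'`, and the third line never matches, so only the `from` header remains.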
Check Byte Limits for the Second Harvest
...