...

  • Progression/Queues
  • Crawllog: check cache update, filtering, paging
  • Reports: click on at least two of them. Check that the Processors report shows that deduplication has been disabled.
  • Show/delete frontier: delete some items from the frontier
  • Add RejectRules: add a new rule
  • Modify budget: add an object limit to some domain or subdomain

Now restart (unpause) the job. Then immediately ...

...

  1. Check that netarkivet.dk and sulnudu.dk are not listed as being harvested.
  2. Check that neither domain appears in the harvest template's crawler beans for any of the jobs, with the (possible) exception of the following lines:

...

Code Block
[devel@kb-prod-udv-001 ~]$ ssh netarkdv@sb-test-bar-001.statsbiblioteket.dk grep netarkivet.dk /netarkiv/0001/TEST2/filedir/*-metadata-1.warc | grep -v 'metadata:'
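
The check above can be repeated for both domains with a small shell helper. This is only a sketch: the function names `domain_hits` and `review_excluded_domains` are hypothetical, it assumes it is run on the machine that holds the metadata files, and it uses `grep -F` (fixed-string matching) instead of the regex match in the command above. It prints the remaining lines for manual comparison with the permitted exceptions and does not itself decide pass or fail.

```shell
# Hypothetical helper: print lines mentioning a domain, skipping any line
# that contains 'metadata:' (the same filter as the command above).
domain_hits() {
    domain=$1
    shift
    # -F matches the domain as a fixed string, so '.' is not a wildcard.
    grep -F -- "$domain" "$@" | grep -v 'metadata:'
}

# Hypothetical wrapper: show the remaining lines for both domains.
review_excluded_domains() {
    for domain in netarkivet.dk sulnudu.dk; do
        echo "== $domain =="
        domain_hits "$domain" "$@" || echo "(no lines found)"
    done
}
```

For example, `review_excluded_domains /netarkiv/0001/TEST2/filedir/*-metadata-1.warc` lists the remaining lines for each domain in turn.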

...