Goals

 

Test snapshot harvesting in detail and subsequent follow-up harvesting.

Prerequisites

None special.

...

  1. Download the file http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain and upload it as a global-crawler-trap list.
  2. Download the list again from the GUI and compare it with the original list. If necessary, run "sort -u" on both files to remove duplicates and to ensure that the two versions are ordered identically before comparing them.
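The comparison in step 2 can also be scripted. The following is a minimal sketch, not part of the test procedure itself: it treats each list as a set of lines, which has roughly the same effect as running "sort -u" on both files first. The function names are illustrative assumptions.

```python
def normalise(lines):
    """Return the sorted unique non-empty lines, roughly like `sort -u`.

    Unlike `sort -u`, surrounding whitespace is stripped and blank lines
    are dropped (an assumption made to ignore trailing-newline noise).
    """
    return sorted({line.strip() for line in lines if line.strip()})


def compare_trap_lists(original_lines, downloaded_lines):
    """Return (missing, extra).

    missing: entries in the original list but not in the downloaded one.
    extra:   entries in the downloaded list but not in the original one.
    """
    original = set(normalise(original_lines))
    downloaded = set(normalise(downloaded_lines))
    return sorted(original - downloaded), sorted(downloaded - original)
```

Both arguments can be any iterable of lines, for example two open file objects. If both returned lists are empty, the two versions contain the same entries.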

Update Byte Limits

Update the "Maximum number of bytes" for the defaultconfig for six domains as follows:

 kb.dk                100000
 statsbiblioteket.dk  100001
 netarkivet.dk        100002
 dbc.dk               100003
 bs.dk                100004
 sulnudu.dk           100005

Add Alias Domain

  1. Using the GUI, set netarkivet.dk to be an alias of kb.dk. Confirm that it is listed on the "Alias Summary" page.
  2. Now try to make dbc.dk an alias of netarkivet.dk. This should fail because chains of aliases are not allowed.
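The rule behind step 2 can be illustrated with a small model. This is a sketch under stated assumptions, not the actual NetarchiveSuite implementation: aliases are held in a plain dictionary, and add_alias is a hypothetical helper. It rejects an alias whose target is itself an alias, and (as an additional assumption) an alias for a domain that other aliases already point at, since either would form a chain.

```python
def add_alias(aliases, domain, target):
    """Record `domain` as an alias of `target`, rejecting alias chains.

    `aliases` maps an alias domain to the domain it is an alias of.
    """
    if domain == target:
        raise ValueError(f"{domain} cannot be an alias of itself")
    if target in aliases:
        # target -> aliases[target] already exists, so domain -> target
        # would create a chain of aliases.
        raise ValueError(f"{target} is already an alias of {aliases[target]}")
    if domain in aliases.values():
        # Assumption: a domain that has aliases may not become an alias.
        raise ValueError(f"{domain} already has aliases pointing at it")
    aliases[domain] = target


aliases = {}
add_alias(aliases, "netarkivet.dk", "kb.dk")       # step 1: accepted
try:
    add_alias(aliases, "dbc.dk", "netarkivet.dk")  # step 2: rejected
except ValueError as err:
    rejected = str(err)
```

After this runs, only netarkivet.dk is registered as an alias, which matches what the "Alias Summary" page is expected to show.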

Start a Snapshot Harvest

  1. Create and activate a snapshot harvest with a "Max number of bytes per domain" of 1000000.
  2. Wait until HarvestJobManager on the Status page shows that jobs have been created for the harvest.

Check that Alias Domain is not Harvested

For each job generated, check that netarkivet.dk is not included in the domains harvested.
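If the domain lists of the generated jobs are copied out of the GUI, this check can be done in one pass. The sketch below assumes a hypothetical mapping from job ID to the list of domains in that job; how that mapping is obtained is left open.

```python
def jobs_containing(domain, jobs):
    """Return the IDs of all jobs whose domain list includes `domain`.

    `jobs` maps a job ID to the list of domains harvested by that job.
    """
    return [job_id for job_id, domains in jobs.items() if domain in domains]
```

The check passes when jobs_containing("netarkivet.dk", jobs) returns an empty list.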

Check that Jobs Complete as Expected

Use the Harvest Status section of the GUI to monitor the jobs. When all jobs have finished, check each job in turn to see that the domains report their "Stopped due to" as follows:

 
 bs.dk                   Domain Completed
 kum.dk                  Max Bytes limit reached
 oernhoej.dk             Domain Completed
 drive-badmintonklub.dk  Max Bytes limit reached
 sulnudu.dk              Domain-config limit reached
 statsbiblioteket.dk     Domain-config limit reached
 kb.dk                   Domain-config limit reached
 raeder.dk               Max Bytes limit reached
 trinekc.dk              Max Bytes limit reached
 kaarefc.dk              Domain Completed
 kaareogtrine.dk         Max Bytes limit reached
 dbc.dk                  Max Bytes limit reached
 olsen2.dk               Domain Completed
 slothchristensen.dk     Max Bytes limit reached
 pligtaflevering.dk      Max Bytes limit reached
 trineogkaare.dk         Max Bytes limit reached
 sy-jonna.dk             Max Bytes limit reached

If any of these have a different reason, investigate to see if the new stop reason makes sense.
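To make deviations easier to spot, the expected values above can be held in a lookup table and compared with what the GUI reports. This is a minimal sketch: the expected reasons are copied from the list above, while the observed mapping and the function name are assumptions for illustration.

```python
EXPECTED_STOP_REASONS = {
    "bs.dk": "Domain Completed",
    "kum.dk": "Max Bytes limit reached",
    "oernhoej.dk": "Domain Completed",
    "drive-badmintonklub.dk": "Max Bytes limit reached",
    "sulnudu.dk": "Domain-config limit reached",
    "statsbiblioteket.dk": "Domain-config limit reached",
    "kb.dk": "Domain-config limit reached",
    "raeder.dk": "Max Bytes limit reached",
    "trinekc.dk": "Max Bytes limit reached",
    "kaarefc.dk": "Domain Completed",
    "kaareogtrine.dk": "Max Bytes limit reached",
    "dbc.dk": "Max Bytes limit reached",
    "olsen2.dk": "Domain Completed",
    "slothchristensen.dk": "Max Bytes limit reached",
    "pligtaflevering.dk": "Max Bytes limit reached",
    "trineogkaare.dk": "Max Bytes limit reached",
    "sy-jonna.dk": "Max Bytes limit reached",
}


def unexpected_stop_reasons(observed, expected=EXPECTED_STOP_REASONS):
    """Return {domain: (expected_reason, observed_reason)} for mismatches.

    A domain missing from `observed` is reported with None as its
    observed reason.
    """
    return {
        domain: (reason, observed.get(domain))
        for domain, reason in expected.items()
        if observed.get(domain) != reason
    }
```

An empty result means every domain stopped for the expected reason. Any entry that is returned is a candidate for the investigation described above.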