Goals

Test snapshot harvesting in detail, along with subsequent follow-up harvesting.

Prerequisites

None special.

Procedure

Prepare Installation

On test@kb-prod-udv-001.kb.dk:

export TESTX=TEST2
export PORT=807?
export MAILRECEIVERS=foo@bar.dk
stop_test.sh
cleanup_all_test.sh
prepare_test.sh deploy_config_dedup_disabled.xml
install_test.sh
start_test.sh

Check Domain Statistics

Go to the GUI and

  1. Check that the installation initially has 17 domains loaded
  2. Check that you can search for domains by name or wildcard
  3. Check that you can create a new domain
  4. Add a new configuration to an existing domain. Confirm that it is listed in the domain definition.
  5. Add to a domain the list of crawlertraps at http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain . There should be no errors.

Global Crawler Traps

  1. Download the file http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain and upload it as a global-crawler-trap list.
  2. Download the list again from the GUI and compare it with the original list. If necessary, use "sort -u" to remove any duplicates and ensure that the two versions are ordered identically.
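
The comparison in step 2 could be scripted roughly as follows. This is only a sketch: the function name `compare_trap_lists` and the file names in the usage line are hypothetical, and it assumes both lists have been saved locally as plain text with one trap expression per line.

```shell
# Compare two crawler-trap lists, ignoring line order and duplicate lines.
# Usage: compare_trap_lists ORIGINAL DOWNLOADED
# (function name and arguments are illustrative, not part of the test suite)
compare_trap_lists() {
    original_sorted=$(mktemp)
    downloaded_sorted=$(mktemp)
    # sort -u orders the lines and drops duplicates in one pass.
    sort -u "$1" > "$original_sorted"
    sort -u "$2" > "$downloaded_sorted"
    # diff prints nothing and returns 0 when the two lists are identical.
    status=0
    diff "$original_sorted" "$downloaded_sorted" || status=$?
    rm -f "$original_sorted" "$downloaded_sorted"
    return "$status"
}
```

For example, `compare_trap_lists original.txt downloaded.txt && echo "lists match"` prints "lists match" only when the two files contain the same set of lines.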

Update Byte Limits

Update the "Maximum number of bytes" for the defaultconfig for six domains as follows:

 kb.dk                   100000
 statsbiblioteket.dk     100001
 netarkivet.dk           100002
 dbc.dk                  100003
 bs.dk                   100004
 sulnudu.dk              100005

Add Alias Domain

  1. Using the GUI, set netarkivet.dk to be an alias of kb.dk. Confirm that it is listed on the "Alias Summary" page.
  2. Now try to make dbc.dk an alias of netarkivet.dk. This should fail because chains of aliases are not allowed.

Start a Snapshot Harvest

  1. Create and activate a snapshot harvest with a "Max number of bytes per domain" of 1000000.
  2. Wait until HarvestJobManager on the Status page shows that jobs have been created for the harvest.

Check that Alias Domain is not Harvested

For each job generated, check that netarkivet.dk is not included in the domains harvested.
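
If the domain list of each job is copied from the GUI into a text file (one domain per line), the check can be sketched as below. The function name `check_alias_absent` and the per-job file names are assumptions made for illustration; the GUI itself remains the authoritative source.

```shell
# Report whether a domain appears in any saved per-job domain list.
# Usage: check_alias_absent DOMAIN FILE...
# (function name and file layout are hypothetical)
check_alias_absent() {
    domain=$1
    shift
    # -x matches whole lines only, -F treats the dots literally,
    # -l prints just the names of the files that contain a match.
    if grep -lxF -- "$domain" "$@"; then
        echo "FAIL: $domain found in the files listed above" >&2
        return 1
    fi
    echo "OK: $domain not found in any job"
}
```

A call such as `check_alias_absent netarkivet.dk job1.txt job2.txt` returns a non-zero status and names the offending files if the alias domain was scheduled for harvesting.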

Check that Jobs Complete as Expected

Use the Harvest Status section of the GUI to monitor the jobs. When all jobs have finished, check each job in turn to see that the domains report their "Stopped due to" as follows:

 
 bs.dk                   Domain Completed
 kum.dk                  Max Bytes limit reached
 oernhoej.dk             Domain Completed
 drive-badmintonklub.dk  Max Bytes limit reached
 sulnudu.dk              Domain-config limit reached
 statsbiblioteket.dk     Domain-config limit reached
 kb.dk                   Domain-config limit reached
 raeder.dk               Max Bytes limit reached
 trinekc.dk              Max Bytes limit reached
 kaarefc.dk              Domain Completed
 kaareogtrine.dk         Max Bytes limit reached
 dbc.dk                  Max Bytes limit reached
 olsen2.dk               Domain Completed
 slothchristensen.dk     Max Bytes limit reached
 pligtaflevering.dk      Max Bytes limit reached
 trineogkaare.dk         Max Bytes limit reached
 sy-jonna.dk             Max Bytes limit reached

If any of these have a different reason, investigate to see if the new stop reason makes sense.
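
To make deviations easier to spot, the expected table above and the observed results can be compared mechanically. The sketch below assumes both have been saved as text files in the same "domain, then stop reason" layout; the function names and file names are hypothetical.

```shell
# Normalise a "domain  stop reason" listing: collapse runs of whitespace,
# drop blank lines, and sort so that ordering differences do not matter.
normalize_stop_reasons() {
    awk 'NF { $1 = $1; print }' "$1" | sort
}

# Usage: compare_stop_reasons EXPECTED OBSERVED
# Prints a diff of the domains whose stop reason differs; returns 0 if none.
compare_stop_reasons() {
    expected=$(mktemp)
    observed=$(mktemp)
    normalize_stop_reasons "$1" > "$expected"
    normalize_stop_reasons "$2" > "$observed"
    status=0
    diff "$expected" "$observed" || status=$?
    rm -f "$expected" "$observed"
    return "$status"
}
```

Running `compare_stop_reasons expected.txt observed.txt` lists only the domains that need investigation; an empty output means every domain stopped for the expected reason.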

 
