Goals
Test snapshot harvesting in detail, together with the subsequent follow-up harvesting.
Prerequisites
None in particular.
...
- Download the file http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain and upload it as a global-crawler-trap list.
- Download the list again from the GUI and compare it with the original list. If necessary, run "sort -u" on both files first, so that duplicates are removed and the two versions are in the same order before they are compared.
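The round-trip comparison can also be scripted. The following is a minimal sketch, assuming both lists have been saved locally as UTF-8 text files. It treats each list as a set of lines, which gives the same result as running "sort -u" on both files and diffing them, and it does not depend on the locale-specific ordering that "sort" uses. Dropping blank lines and trailing whitespace is an assumption about which differences do not matter.

```python
from pathlib import Path


def normalize(lines):
    """Roughly what `sort -u` does: remove duplicate lines and sort them.

    Blank lines and trailing whitespace are also dropped (an assumption about
    which differences are insignificant for a crawler-trap list).
    """
    return sorted({line.rstrip() for line in lines if line.strip()})


def compare_trap_lists(original, downloaded):
    """Return (missing, extra) for two crawler-trap lists given as lines.

    missing: traps in the original that are absent from the downloaded copy
    extra:   traps in the downloaded copy that are absent from the original
    """
    orig = set(normalize(original))
    down = set(normalize(downloaded))
    return sorted(orig - down), sorted(down - orig)


def read_lines(path):
    """Read a text file as a list of lines (UTF-8 encoding assumed)."""
    return Path(path).read_text(encoding="utf-8").splitlines()
```

For example, `compare_trap_lists(read_lines("crawlertrapsCollection.txt"), read_lines("downloaded-traps.txt"))` compares the two files, where "downloaded-traps.txt" is a hypothetical name for the copy saved from the GUI. Two empty lists mean the upload preserved every trap.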
Update Byte Limits
Set the "Maximum number of bytes" in the defaultconfig of each of the following six domains:
Domain | Maximum number of bytes |
---|---|
kb.dk | 100000 |
statsbiblioteket.dk | 100001 |
netarkivet.dk | 100002 |
dbc.dk | 100003 |
bs.dk | 100004 |
sulnudu.dk | 100005 |
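To double-check the values after entering them, the planned limits can be kept as data and compared with what the GUI shows afterwards. This is a sketch only. The `observed` input is assumed to be typed in by hand from each domain's defaultconfig page, since no programmatic interface is assumed here.

```python
# Planned "Maximum number of bytes" per domain defaultconfig (from the table above).
PLANNED_BYTE_LIMITS = {
    "kb.dk": 100000,
    "statsbiblioteket.dk": 100001,
    "netarkivet.dk": 100002,
    "dbc.dk": 100003,
    "bs.dk": 100004,
    "sulnudu.dk": 100005,
}


def limit_problems(observed):
    """Compare limits read back from the GUI with the plan.

    `observed` maps domain -> the "Maximum number of bytes" value shown for
    its defaultconfig (entered by hand; this input format is an assumption).
    Returns a list of human-readable problems, empty if everything matches.
    """
    problems = []
    for domain, planned in PLANNED_BYTE_LIMITS.items():
        actual = observed.get(domain)
        if actual is None:
            problems.append(f"{domain}: no value recorded (expected {planned})")
        elif actual != planned:
            problems.append(f"{domain}: found {actual}, expected {planned}")
    return problems
```

Because every planned value differs from the others, a value copied into the wrong domain shows up as a mismatch rather than passing unnoticed.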
Add Alias Domain
- Using the GUI, set netarkivet.dk to be an alias of kb.dk. Confirm that the alias is listed on the "Alias Summary" page.
- Now try to make dbc.dk an alias of netarkivet.dk. This should fail because chains of aliases are not allowed.
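The expected behaviour of these two steps can be captured in a toy model of the "no chains" rule. This is an illustrative sketch, not NetarchiveSuite's implementation. The class and method names are hypothetical. The model assumes that several domains may share one target. It also takes the rule to forbid two cases: making a domain an alias of a domain that is already an alias, and making a domain that already has aliases into an alias itself.

```python
class AliasRegistry:
    """Toy model of the alias rule: one level of aliasing, no chains.

    Illustrative only; this is not NetarchiveSuite's implementation.
    """

    def __init__(self):
        self._target = {}  # alias domain -> domain it is an alias of

    def problem(self, domain, target):
        """Return why `domain` cannot become an alias of `target`, or None."""
        if domain == target:
            return f"{domain} cannot be an alias of itself"
        if target in self._target:
            # domain -> target -> ... would be a chain of aliases
            return f"{target} is itself an alias of {self._target[target]}"
        if domain in self._target.values():
            # existing aliases of `domain` would end up two steps from `target`
            return f"{domain} already has aliases pointing to it"
        return None

    def set_alias(self, domain, target):
        """Record the alias, or raise ValueError if it would form a chain."""
        reason = self.problem(domain, target)
        if reason is not None:
            raise ValueError(reason)
        self._target[domain] = target

    def summary(self):
        """Alias -> target pairs, roughly what an alias summary would list."""
        return dict(self._target)
```

After `set_alias("netarkivet.dk", "kb.dk")`, the model rejects `set_alias("dbc.dk", "netarkivet.dk")`. This matches the failure the test expects from the GUI.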
Start a Snapshot Harvest
- Create and activate a snapshot harvest with a "Max number of bytes per domain" of 1000000.
- Wait until HarvestJobManager on the Status page shows that jobs have been created for the harvest.
Check that Alias Domain is not Harvested
For each generated job, check that netarkivet.dk is not among the harvested domains.
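If the domain lists of the jobs have been collected, this check reduces to a simple lookup. The sketch below assumes a mapping from job ID to the domains shown for that job, for example copied from the job details in the GUI. The function name and input shape are hypothetical. The check passes when the result is an empty list.

```python
def jobs_containing(domain, job_domains):
    """Return the IDs of jobs whose domain list includes `domain`.

    `job_domains` maps a job ID to the domains listed for that job (the input
    shape is an assumption). Matching ignores letter case.
    """
    wanted = domain.lower()
    return sorted(
        job_id
        for job_id, domains in job_domains.items()
        if wanted in {d.lower() for d in domains}
    )
```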
Check that Jobs Complete as Expected
Use the Harvest Status section of the GUI to monitor the jobs. When all jobs have finished, check each job in turn and verify that every domain reports the following "Stopped due to" value:
Domain | Stopped due to |
---|---|
bs.dk | Domain Completed |
kum.dk | Max Bytes limit reached |
oernhoej.dk | Domain Completed |
drive-badmintonklub.dk | Max Bytes limit reached |
sulnudu.dk | Domain-config limit reached |
statsbiblioteket.dk | Domain-config limit reached |
kb.dk | Domain-config limit reached |
raeder.dk | Max Bytes limit reached |
trinekc.dk | Max Bytes limit reached |
kaarefc.dk | Domain Completed |
kaareogtrine.dk | Max Bytes limit reached |
dbc.dk | Max Bytes limit reached |
olsen2.dk | Domain Completed |
slothchristensen.dk | Max Bytes limit reached |
pligtaflevering.dk | Max Bytes limit reached |
trineogkaare.dk | Max Bytes limit reached |
sy-jonna.dk | Max Bytes limit reached |
If any domain reports a different reason, investigate whether the new stop reason makes sense.
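To narrow down what needs investigating, the expected results listed above can be compared with the observed ones. The following is a minimal sketch. It assumes the observed "Stopped due to" values have been collected into a domain-to-reason mapping, and it leaves open how they are collected. The function name is hypothetical. Besides domains whose reason differs, it also reports expected domains that were never seen and domains that were not expected at all.

```python
# Expected "Stopped due to" value per domain, transcribed from the list above.
EXPECTED_STOP_REASONS = {
    "bs.dk": "Domain Completed",
    "kum.dk": "Max Bytes limit reached",
    "oernhoej.dk": "Domain Completed",
    "drive-badmintonklub.dk": "Max Bytes limit reached",
    "sulnudu.dk": "Domain-config limit reached",
    "statsbiblioteket.dk": "Domain-config limit reached",
    "kb.dk": "Domain-config limit reached",
    "raeder.dk": "Max Bytes limit reached",
    "trinekc.dk": "Max Bytes limit reached",
    "kaarefc.dk": "Domain Completed",
    "kaareogtrine.dk": "Max Bytes limit reached",
    "dbc.dk": "Max Bytes limit reached",
    "olsen2.dk": "Domain Completed",
    "slothchristensen.dk": "Max Bytes limit reached",
    "pligtaflevering.dk": "Max Bytes limit reached",
    "trineogkaare.dk": "Max Bytes limit reached",
    "sy-jonna.dk": "Max Bytes limit reached",
}


def stop_reasons_to_investigate(observed, expected=EXPECTED_STOP_REASONS):
    """Sort domains into the cases that need a closer look.

    `observed` maps each harvested domain to its "Stopped due to" text.
    Returns a dict with three keys:
      "different":  {domain: (expected, observed)} where the reasons disagree
      "missing":    expected domains that were not reported at all
      "unexpected": reported domains that are not in the expected list
    """
    different = {
        d: (expected[d], observed[d])
        for d in expected
        if d in observed and observed[d] != expected[d]
    }
    missing = sorted(set(expected) - set(observed))
    unexpected = sorted(set(observed) - set(expected))
    return {"different": different, "missing": missing, "unexpected": unexpected}
```

If the alias netarkivet.dk were harvested after all, it would appear under "unexpected". Only the entries under "different" correspond directly to the "investigate" step above.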