Check that the modularisation of the software allows for separate installation of the components.


  • Test and confirm the modularisation


This test needs the following deploy configurations:

  • kb-prod-udv-001:/home/devel/devel-config/deployment-configs/default_deploy_config_arcrepository.xml
  • kb-prod-udv-001:/home/devel/devel-config/deployment-configs/default_deploy_config_harvester.xml
  • deploy_config_viewerproxy.xml


Start a pure ArcRepository (default config with the harvester-apps deleted)

On devel@kb-prod-udv-001.kb.dk:

## Change to correct version as required
export VERSION=5.4-RC1
export H3ZIP=/home/devel/nas_versions/bundler/NetarchiveSuite-heritrix3-bundler-$VERSION.zip
export TESTX=TEST9
## Change to correct port 
export PORT=807?
## Change to correct mail-address
export MAILRECEIVERS=foo@bar.dk
export CONF=default_deploy_config_arcrepository.xml 
prepare_test.sh -d $CONF -v $VERSION -3 $H3ZIP
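
The PORT value above is a placeholder (807?) that must be edited by hand, so a small guard before deployment can catch an unedited value. The following is a minimal sketch only; the function name check_port is hypothetical and not part of NetarchiveSuite or prepare_test.sh.

```shell
# Hypothetical helper (not part of NetarchiveSuite): succeeds only if the
# argument consists purely of digits, so the placeholder "807?" is rejected.
check_port() {
  case "$1" in
    ''|*[!0-9]*) echo "PORT is not numeric: '$1'" >&2; return 1 ;;
    *) return 0 ;;
  esac
}
```

It could be used as: check_port "$PORT" && prepare_test.sh -d $CONF -v $VERSION -3 $H3ZIP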


Check that the ArcRepository is Running

Go into the GUI at http://kb-test-adm-001.kb.dk:$PORT/BitPreservation . There should only be two site sections in the leftmost menu.

Click on "Bitpreservation" and update all filelists and checksums. There should be no errors.

Click "System State" and check that there are no warnings or errors.

Upload a file to the ArcRepository

On devel@kb-test-adm-001:

export TESTX=TEST9
scp devel@kb-prod-udv-001.kb.dk:bitarchive_testdata/arcfiles/1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc /tmp
export CLASSPATH=$CLASSPATH:$HOME/$TESTX/lib/netarchivesuite-archive-core.jar
export CLASSPATH=$CLASSPATH:$HOME/$TESTX/lib/netarchivesuite-monitor-core.jar
java -Ddk.netarkivet.settings.file=$HOME/$TESTX/conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=upload dk.netarkivet.archive.tools.Upload /tmp/1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc 

The last lines written to the screen should include:

Uploading file '/tmp/1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc' succeeded
All files processed, closing connection to ArcRepository
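
If the Upload output is captured to a file, the check for the two messages above can be scripted. This is a sketch under assumptions: the helper name upload_succeeded and the idea of saving the output to a log file are illustrative and not part of the NetarchiveSuite tools.

```shell
# Hypothetical helper: reads the Upload tool's output on stdin and succeeds
# only if both expected closing messages are present somewhere in it.
upload_succeeded() {
  out=$(cat)
  printf '%s\n' "$out" | grep -q "Uploading file '.*' succeeded" &&
  printf '%s\n' "$out" | grep -q "All files processed, closing connection to ArcRepository"
}
```

For example, after redirecting the java command's output to /tmp/upload.log: upload_succeeded < /tmp/upload.log && echo "upload OK"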

Run a Batch Job

  • In the GUI, go to BitPreservation -> Batchjob Overview
  • Select "Checksum Job", Replica "KBN BITARCHIVE", Job Id = 1, and "Both", then click the "Execute batchjob" button

    You will see a page with the header "Executing batchjob" and the following contents:
    Executing batchjob with the following parameters. 
    BatchJob name: dk.netarkivet.common.utils.batch.ChecksumJob
    Replica: KBN
    Regular expression: .*1-.*.*
  • Return to the Batchjob Overview. The job has probably already completed. Otherwise just refresh the page until it is done.
  • Download the output file and the error file.
  • The output file should look like

  • and the error file should look like

    Starting batchjob 'ChecksumJob' at time '1376383883971' on replica 'KB' with pattern '1-.*'.
    Successfully finished BatchJob 'ChecksumJob' on 1 files.
    BatchJob 'ChecksumJob' has failed on '0' files and has gotten '0' exceptions.
    Number of exceptions: 0
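
The downloaded error file can also be checked mechanically for the zero-failure lines shown above. A minimal sketch, assuming the file has been saved locally; the helper name batch_errors_clean is hypothetical.

```shell
# Hypothetical helper: succeeds if the downloaded error file (path in $1)
# reports zero failed files, zero exceptions, and an exception count of 0.
batch_errors_clean() {
  grep -q "has failed on '0' files and has gotten '0' exceptions" "$1" &&
  grep -Eq "Number of exceptions: 0[[:space:]]*$" "$1"
}
```

Usage: batch_errors_clean /path/to/downloaded-error-file || echo "batch job reported errors"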

[This step is deprecated - csr] Download a Record from the ArcRepository

On kb-test-adm-001

export TESTX=TEST9
mkdir /tmp/$TESTX 
cd /tmp/$TESTX
scp -r test@kb-prod-udv-001.kb.dk:test-data/1-cache.tar.bz2 .
mkdir test-index
cd test-index
tar -xjf ../1-cache.tar.bz2
java -Ddk.netarkivet.settings.file=$HOME/$TESTX/conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=record dk.netarkivet.archive.tools.GetRecord /tmp/$TESTX/test-index http://www.pligtaflevering.dk/online/vejledning.pdf > x.pdf

Check that the file x.pdf contains an http header followed by a pdf.

Cleanup the ArcRepository

On devel@kb-prod-udv-001.kb.dk:


Install a pure Harvester

On kb-prod-udv-001.kb.dk as user 'devel'

## Change to correct version as required
export VERSION=5.4-RC1
export H3ZIP=/home/devel/nas_versions/bundler/NetarchiveSuite-heritrix3-bundler-$VERSION.zip
export TESTX=TEST9
## Change to correct port 
export PORT=807?
## Change to correct mail-address
export MAILRECEIVERS=foo@bar.dk
export CONF=default_deploy_config_harvester.xml 
prepare_test.sh -d $CONF -v $VERSION -3 $H3ZIP

Check that the web GUI shows sections for Harvest Definition, History, and Status and that the Status section shows no obvious errors.

Create a Pseudo-Cache for Job 1

Reason: This adds a pseudo-cache for the single harvester so that it does not require an IndexServer. It also tests our simple implementation of the JobIndexCache interface.

On netarkdv@sb-test-har-001:

export TESTX=TEST9
scp devel@kb-prod-udv-001.kb.dk:test-data/1-cache.tar.bz2 /tmp
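# create the cache directory inside the test installation, matching the path used below
mkdir -p $HOME/$TESTX/cache/TrivialJobIndexCache
cd $HOME/$TESTX/cache/TrivialJobIndexCache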

# extract 1-cache.tar.bz2 into dummy-cache
mkdir -p dummy-cache
cd dummy-cache
tar -xjf /tmp/1-cache.tar.bz2
cd $HOME/$TESTX/cache/TrivialJobIndexCache

# make symbolic links to the dummy index
# adds the possibility for running more than one job  
export ID="empty 1 2 3 4 "

for num in $ID; do
  ln -vs dummy-cache $num-DEDUP_CRAWL_LOG-cache
done
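
The loop should leave one symbolic link per entry in $ID, each pointing at dummy-cache. The following sketch verifies that; the function name check_cache_links is hypothetical, and it assumes it is run from inside the TrivialJobIndexCache directory.

```shell
# Hypothetical helper: run inside the TrivialJobIndexCache directory.
# Succeeds only if <id>-DEDUP_CRAWL_LOG-cache is a symlink to dummy-cache
# for every id given as an argument.
check_cache_links() {
  for num in "$@"; do
    link="$num-DEDUP_CRAWL_LOG-cache"
    if [ ! -L "$link" ] || [ "$(readlink "$link")" != "dummy-cache" ]; then
      echo "missing or wrong link: $link" >&2
      return 1
    fi
  done
}
```

Usage: check_cache_links $ID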

Perform a Harvest

Define and perform a harvest of netarkivet.dk with a 5MB limit.

Check the Harvest

There should be at least two WARC files in netarkdv@sb-test-har-001.statsbiblioteket.dk:/home/netarkiv/$TESTX/localarchive .
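
Counting the files can be scripted on the harvester machine. This is a sketch only: the helper name count_warcs is hypothetical, and it assumes the harvested files end in .warc or .warc.gz, which this procedure does not state.

```shell
# Hypothetical helper: prints the number of regular files under directory $1
# whose names end in .warc or .warc.gz (the suffixes are an assumption).
count_warcs() {
  find "$1" -type f \( -name '*.warc' -o -name '*.warc.gz' \) | wc -l | tr -d ' '
}
```

Usage: [ "$(count_warcs /home/netarkiv/$TESTX/localarchive)" -ge 2 ] && echo "harvest produced enough WARC files"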

Shutdown the Test

On kb-prod-udv-001 as user devel


Start a pure Viewerproxy

On kb-prod-udv-001:

## Change to correct version as required
export VERSION=5.4-RC1
export H3ZIP=/home/devel/nas_versions/bundler/NetarchiveSuite-heritrix3-bundler-$VERSION.zip
export TESTX=TEST9
## Change to correct port 
export PORT=807?
## Change to correct mail-address
export MAILRECEIVERS=foo@bar.dk
export CONF=default_deploy_config_viewerproxy.xml
prepare_test.sh -d $CONF -v $VERSION -3 $H3ZIP

Ignore errors during the database-setup process. The pure ViewerProxy doesn't use or need a database.

Add some data to the Viewerproxy

On kb-test-acs-001 as user devel:

export TESTX=TEST9
mkdir -p $HOME/$TESTX/cache/TrivialJobIndexCache/1-FULL_CRAWL_LOG-cache/
scp kb-prod-udv-001.kb.dk:test-data/1-cache.tar.bz2 $HOME/$TESTX/cache/TrivialJobIndexCache/1-FULL_CRAWL_LOG-cache/
cd $HOME/$TESTX/cache/TrivialJobIndexCache/1-FULL_CRAWL_LOG-cache/
tar -xjf 1-cache.tar.bz2
mkdir -p localarchive/
cd localarchive/
scp kb-prod-udv-001.kb.dk:test-data/1-1-* .

Test that the Viewerproxy works

On kb-prod-udv-001 as user devel:

wget --execute "http_proxy = http://kb-test-acs-001.kb.dk:$PORT" "http://netarchivesuite.viewerproxy.invalid/changeIndex?jobID=1&label=dummy&returnURL=http://localhost"

(Ignore the 404 error)
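
The changeIndex request above carries the job ID and a label as query parameters. To point the Viewerproxy at a different job, the URL can be built with a small helper. This is a sketch; the function name change_index_url is hypothetical, and only the URL shape shown in the wget call above is taken from this procedure.

```shell
# Hypothetical helper: prints the changeIndex command URL for job id $1 and
# label $2, mirroring the URL used in the wget call above.
change_index_url() {
  printf 'http://netarchivesuite.viewerproxy.invalid/changeIndex?jobID=%s&label=%s&returnURL=http://localhost\n' "$1" "$2"
}
```

Usage: wget --execute "http_proxy = http://kb-test-acs-001.kb.dk:$PORT" "$(change_index_url 1 dummy)"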

wget --execute "http_proxy = http://kb-test-acs-001.kb.dk:$PORT" "http://www.netarkivet.dk"

Check that this downloads a file which shows the front page of netarkivet.dk.

Close down the Test
