TEST 9
Check that the modularisation of the software allows for separate installation of the components.
Goals
- Test and confirm the modularisation
Prerequisites
The test needs the following deploy configurations:
- kb-prod-udv-001:/home/devel/devel-config/deployment-configs/default_deploy_config_arcrepository.xml
- kb-prod-udv-001:/home/devel/devel-config/deployment-configs/default_deploy_config_harvester.xml
- deploy_config_viewerproxy.xml
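Before starting, you can confirm that the deploy configurations are present. This is a minimal sketch with two assumptions: all three files sit in one directory (for the first two, /home/devel/devel-config/deployment-configs/ on kb-prod-udv-001), and the viewerproxy file is named default_deploy_config_viewerproxy.xml, as in the Viewerproxy step. The function name check_deploy_configs is hypothetical.

```shell
# Hypothetical helper: report which deploy configurations are missing.
# Assumes every config file lives in the same directory DIR.
# Usage: check_deploy_configs DIR FILE...
# Returns 0 if every FILE exists in DIR, 1 otherwise.
check_deploy_configs() {
  local dir missing f
  dir="$1"
  shift
  missing=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "Missing deploy config: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example: check_deploy_configs /home/devel/devel-config/deployment-configs default_deploy_config_arcrepository.xml default_deploy_config_harvester.xml default_deploy_config_viewerproxy.xml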
Procedure
Start a pure ArcRepository (default config with the harvester-apps deleted)
On devel@kb-prod-udv-001.kb.dk:
## Change to correct version as required
export VERSION=5.4-RC1
export H3ZIP=/home/devel/nas_versions/bundler/NetarchiveSuite-heritrix3-bundler-$VERSION.zip
export TESTX=TEST9
## Change to correct port
export PORT=807?
## Change to correct mail-address
export MAILRECEIVERS=foo@bar.dk
export CONF=default_deploy_config_arcrepository.xml
prepare_test.sh -d $CONF -v $VERSION -3 $H3ZIP
install_test.sh
start_test.sh
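The value 807? for PORT is a placeholder, and it is easy to forget to replace it. A small guard can catch this if you run it after the exports and before prepare_test.sh. This is a sketch: the function name require_numeric_port is hypothetical, and it only checks that PORT is a plain number, not that the port is free.

```shell
# Hypothetical guard: fail if PORT is empty or still contains a
# non-digit character (such as the "?" in the 807? placeholder).
# Usage: require_numeric_port "$PORT"
require_numeric_port() {
  case "$1" in
    ''|*[!0-9]*)
      echo "PORT must be a number, got '$1'" >&2
      return 1
      ;;
  esac
  return 0
}
```

Example: require_numeric_port "$PORT" && prepare_test.sh -d $CONF -v $VERSION -3 $H3ZIP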
Check that the ArcRepository is Running
Go to the GUI at http://kb-test-adm-001.kb.dk:$PORT/BitPreservation . The leftmost menu should contain only two site sections.
Click on "Bitpreservation" and update all filelists and checksums. There should be no errors.
Click "System State" and check that there are no warnings or errors.
Upload a file to the ArcRepository
On devel@kb-test-adm-001:
export TESTX=TEST9
scp devel@kb-prod-udv-001.kb.dk:bitarchive_testdata/arcfiles/1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc /tmp
cd $HOME/$TESTX
export CLASSPATH=$CLASSPATH:$HOME/$TESTX/lib/netarchivesuite-archive-core.jar
export CLASSPATH=$CLASSPATH:$HOME/$TESTX/lib/netarchivesuite-monitor-core.jar
java -Ddk.netarkivet.settings.file=$HOME/$TESTX/conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=upload dk.netarkivet.archive.tools.Upload /tmp/1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc
One of the last lines written to screen should be
Uploading file '/tmp/1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc' succeeded
All files processed, closing connection to ArcRepository
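You can capture the upload output and search it for the success lines instead of checking them by eye. This sketch assumes you save the output of the java Upload command by appending 2>&1 | tee upload.log to it. The helper name upload_succeeded is hypothetical.

```shell
# Hypothetical helper: succeed only if the captured Upload output contains
# the success line for the given file and the closing-connection line.
# Usage: upload_succeeded LOGFILE ARCFILE
upload_succeeded() {
  grep -F -q "Uploading file '$2' succeeded" "$1" &&
    grep -F -q "All files processed, closing connection to ArcRepository" "$1"
}
```

Example: upload_succeeded upload.log /tmp/1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc && echo "Upload OK"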
Run a Batch Job
- In the GUI, go to BitPreservation -> Batchjob Overview.
- Select "Checksum Job", replica "KBN BITARCHIVE", Job Id = 1 and "Both", then click the "Execute batchjob" button.
- You will see a page with the header "Executing batchjob" and the following contents:
  Executing batchjob with the following parameters.
  BatchJob name: dk.netarkivet.common.utils.batch.ChecksumJob
  Replica: KBN
  Regular expression: .*1-.*.*
- Return to the Batchjob Overview. The job has probably already completed. Otherwise just refresh the page until it is done.
- Download the output file and the error file.
The output file should look like
1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc##b898dc8bc244f8d5e65400b0c96ab5f2
and the error file should look like
Starting batchjob 'ChecksumJob' at time '1376383883971' on replica 'KB' with pattern '1-.*'.
Successfully finished BatchJob 'ChecksumJob' on 1 files.
BatchJob 'ChecksumJob' has failed on '0' files and has gotten '0' exceptions.
Number of exceptions: 0
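You can also check both downloaded files on kb-test-adm-001, where the uploaded ARC file is still in /tmp. This sketch assumes the checksum in the output file is an MD5 digest (the 32-hex-digit value suggests so). The helper names checksum_matches and batch_clean are hypothetical, as are the local file names used in the example.

```shell
# Hypothetical helper: compare the "filename##checksum" line in the batch
# output file with a locally computed checksum of the same file.
# Assumes the batch job reports MD5 checksums.
# Usage: checksum_matches OUTPUTFILE LOCALFILE
checksum_matches() {
  local name expected actual
  name=$(basename "$2")
  # Take the checksum recorded for this file name (the text after "##").
  expected=$(grep -F "$name##" "$1" | head -n 1 | sed 's/.*##//')
  actual=$(md5sum "$2" | cut -d ' ' -f 1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Hypothetical helper: succeed only if the batch error file reports
# zero failed files and zero exceptions.
# Usage: batch_clean ERRORFILE
batch_clean() {
  grep -F -q "has failed on '0' files and has gotten '0' exceptions" "$1" &&
    grep -F -q "Number of exceptions: 0" "$1"
}
```

Example, with the downloads saved as output.txt and error.txt: checksum_matches output.txt /tmp/1-1-20130110140720-00000-kb-test-har-001.kb.dk.arc && batch_clean error.txt && echo "Batch job OK"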
[This step is deprecated - csr] Download a Record from the ArcRepository
On kb-test-adm-001:
export TESTX=TEST9
mkdir /tmp/$TESTX
cd /tmp/$TESTX
scp -r test@kb-prod-udv-001.kb.dk:test-data/1-cache.tar.bz2 .
mkdir test-index
cd test-index
tar -xjf ../1-cache.tar.bz2
cd $HOME/$TESTX
java -Ddk.netarkivet.settings.file=$HOME/$TESTX/conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=record dk.netarkivet.archive.tools.GetRecord /tmp/$TESTX/test-index http://www.pligtaflevering.dk/online/vejledning.pdf > x.pdf
Check that the file x.pdf contains an HTTP header followed by a PDF.
Cleanup the ArcRepository
On devel@kb-prod-udv-001.kb.dk:
cleanup_all_test.sh
Install a pure Harvester
On kb-prod-udv-001.kb.dk as user 'devel'
## Change to correct version as required
export VERSION=5.4-RC1
export H3ZIP=/home/devel/nas_versions/bundler/NetarchiveSuite-heritrix3-bundler-$VERSION.zip
export TESTX=TEST9
## Change to correct port
export PORT=807?
## Change to correct mail-address
export MAILRECEIVERS=foo@bar.dk
export CONF=default_deploy_config_harvester.xml
prepare_test.sh -d $CONF -v $VERSION -3 $H3ZIP
install_test.sh
start_test.sh
Check that the web GUI shows sections for Harvest Definition, History, and Status and that the Status section shows no obvious errors.
Create a Pseudo-Cache for Job 1
Reason: to add a pseudo-cache for the single harvester so that it doesn't require an index server, and to test our simple implementation of the JobIndexCache interface.
On netarkdv@sb-test-har-001:
export TESTX=TEST9
scp devel@kb-prod-udv-001.kb.dk:test-data/1-cache.tar.bz2 /tmp
cd $HOME/$TESTX/
mkdir -p cache
cd cache
mkdir -p TrivialJobIndexCache
cd TrivialJobIndexCache
# extract 1-cache.tar.bz2 into dummy-cache
mkdir -p dummy-cache
cd dummy-cache
tar -xjf /tmp/1-cache.tar.bz2
cd $HOME/$TESTX/cache/TrivialJobIndexCache
# make symbolic links to the dummy index
# adds the possibility for running more than one job
export ID="empty 1 2 3 4 "
for num in $ID
do
  ln -vs dummy-cache $num-DEDUP_CRAWL_LOG-cache
done
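After you run the commands above, each link should point to dummy-cache. Run this minimal check in $HOME/$TESTX/cache/TrivialJobIndexCache, passing the same job IDs as the ID variable. The function name links_ok is hypothetical.

```shell
# Hypothetical check: every "<id>-DEDUP_CRAWL_LOG-cache" entry must be a
# symbolic link whose target is "dummy-cache", and that target must be
# an existing directory.
# Usage: links_ok ID...   (run inside the TrivialJobIndexCache directory)
links_ok() {
  local num link
  for num in "$@"; do
    link="$num-DEDUP_CRAWL_LOG-cache"
    if [ ! -L "$link" ] || [ "$(readlink "$link")" != "dummy-cache" ] || [ ! -d "$link" ]; then
      echo "Bad or missing link: $link" >&2
      return 1
    fi
  done
  return 0
}
```

Example: links_ok $ID (leave $ID unquoted so the shell splits it into separate IDs).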
Perform a Harvest
Define and perform a harvest of netarkivet.dk with a 5MB limit.
Check the Harvest
There should be at least two WARC files in netarkdv@sb-test-har-001.statsbiblioteket.dk:/home/netarkiv/$TESTX/localarchive .
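You can count the WARC files on the harvester machine with find. This is a minimal sketch: the helper name enough_warcs is hypothetical, and it assumes the harvested files end in .warc or .warc.gz.

```shell
# Hypothetical check: succeed if DIR holds at least MIN WARC files
# (plain or gzip-compressed), counting only regular files directly in DIR.
# Usage: enough_warcs DIR MIN
enough_warcs() {
  local count
  count=$(find "$1" -maxdepth 1 -type f \( -name '*.warc' -o -name '*.warc.gz' \) | wc -l)
  echo "Found $count WARC file(s) in $1"
  [ "$count" -ge "$2" ]
}
```

Example: enough_warcs /home/netarkiv/$TESTX/localarchive 2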
Shutdown the Test
On kb-prod-udv-001 as user devel
cleanup_all_test.sh
Start a pure Viewerproxy
On kb-prod-udv-001:
## Change to correct version as required
export VERSION=5.4-RC1
export H3ZIP=/home/devel/nas_versions/bundler/NetarchiveSuite-heritrix3-bundler-$VERSION.zip
export TESTX=TEST9
## Change to correct port
export PORT=807?
## Change to correct mail-address
export MAILRECEIVERS=foo@bar.dk
export CONF=default_deploy_config_viewerproxy.xml
prepare_test.sh -d $CONF -v $VERSION -3 $H3ZIP
install_test.sh
start_test.sh
Ignore errors during the database-setup process. The pure ViewerProxy doesn't use or need a database.
Add some data to the Viewerproxy
On kb-test-acs-001 as user devel:
export TESTX=TEST9
mkdir -p $HOME/$TESTX/cache/TrivialJobIndexCache/1-FULL_CRAWL_LOG-cache/
scp kb-prod-udv-001.kb.dk:test-data/1-cache.tar.bz2 $HOME/$TESTX/cache/TrivialJobIndexCache/1-FULL_CRAWL_LOG-cache/
cd $HOME/$TESTX/cache/TrivialJobIndexCache/1-FULL_CRAWL_LOG-cache/
tar -xjf 1-cache.tar.bz2
cd $HOME/$TESTX
mkdir -p localarchive/
cd localarchive/
scp kb-prod-udv-001.kb.dk:test-data/1-1-* .
Test that the Viewerproxy works
On kb-prod-udv-001 as user devel:
wget --execute "http_proxy = http://kb-test-acs-001.kb.dk:$PORT" "http://netarchivesuite.viewerproxy.invalid/changeIndex?jobID=1&label=dummy&returnURL=http://localhost"
(Ignore the 404 error)
wget --execute "http_proxy = http://kb-test-acs-001.kb.dk:$PORT" "http://www.netarkivet.dk"
Check that this downloads a file showing the front page of netarkivet.dk.
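You can run a quick sanity check before opening the file. This sketch assumes you save the page under a known name by adding wget's -O option (for example -O frontpage.html) to the second wget command. It only confirms that the file is a non-empty HTML document, so viewing the page is still the real test. The function name looks_like_html is hypothetical.

```shell
# Hypothetical check: succeed if FILE is non-empty and contains an "<html"
# tag (case-insensitive). This does not prove it is netarkivet.dk's page.
# Usage: looks_like_html FILE
looks_like_html() {
  [ -s "$1" ] && grep -i -q '<html' "$1"
}
```

Example: looks_like_html frontpage.html && echo "Got an HTML page; now inspect it"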
Close down the Test
cleanup_all_test.sh