Note that this documentation is for the old 5.55 release.
For the newest documentation, please see the current release documentation.

Appendix A - How-To Examples


Install the QuickStart according to the Quick Start Manual, e.g. in /home/test/QUICKSTART

  • Add some domains to harvest using the ADMGUI, e.g. netarkivet.dk, kb.dk, statsbiblioteket.dk
  • Create and run a snapshot harvest with a byte limit of 100,000
  • Wait until the job is done
  • Set up your browser for browsing and index your harvest job

    cd /home/test/QUICKSTART/bitarkiv/filedir
    export CLASSPATH=/home/test/QUICKSTART/lib/netarchivesuite-common-core.jar
    export LOG=-Dlogback.configurationFile=/path/to/logback.xml
    ls
    # lists the harvested files, e.g. 1-1-20090519083732-00002-dia-test-int-01.kb.dk.warc

Extract CDX:

export FILEONE=1-1-20090519083732-00002-dia-test-int-01.kb.dk.warc
java $LOG dk.netarkivet.common.tools.ArchiveExtractCDX $FILEONE > output.cdx 
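
To index every archive file in the directory rather than a single one, the command above can be wrapped in a loop. The following is only a minimal sketch: the function name extract_cdx_all and the JAVA_CMD variable (which allows a dry run with "echo java") are hypothetical helpers and not part of NetarchiveSuite, and it assumes the tool is called with one file name at a time, as in the example above. The CDX files are written to a separate directory so that nothing is added to the bitarchive file directory.

```shell
# Sketch: run ArchiveExtractCDX once per ARC/WARC file found in a directory.
# Usage: extract_cdx_all SOURCE_DIR OUTPUT_DIR
# JAVA_CMD is a hypothetical override, e.g. JAVA_CMD="echo java" for a dry run.
# The body runs in a subshell, so the caller's working directory is unchanged.
extract_cdx_all() (
    out=$(cd "$2" && pwd) || exit 1      # absolute path of the output directory
    cd "$1" || exit 1
    for f in *.arc *.warc; do
        [ -e "$f" ] || continue          # skip a pattern that matched no file
        ${JAVA_CMD:-java} $LOG dk.netarkivet.common.tools.ArchiveExtractCDX "$f" \
            > "$out/$f.cdx"
    done
)
```

For example, with CLASSPATH and LOG set as above: mkdir /tmp/cdx, then extract_cdx_all /home/test/QUICKSTART/bitarkiv/filedir /tmp/cdx.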

Get Record using Lucene:

# e.g. a URI from the harvest, found in your "viewerproxy"
export URI=http://netarkivet.dk/index-da.php
cd /home/test/QUICKSTART/cache/fullcrawllogindex
cp -r 1-cache 1-cache.unzip
cd 1-cache.unzip/
ls
gunzip *
export SETTINGSFILE=/home/test/QUICKSTART/conf/settings.xml
export LUCENE_INDEX=/home/test/QUICKSTART/cache/fullcrawllogindex/1-cache.unzip
export OPTS="-Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000"
java $LOG $OPTS dk.netarkivet.archive.tools.GetRecord $LUCENE_INDEX $URI
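
Because OPTS carries two -D options separated by a space, its value has to be quoted when it is assigned. Otherwise the shell passes the second option to export as a separate argument. The sketch below shows a quoted assignment together with a small pre-flight check. require_path is a hypothetical helper introduced here for illustration, and the paths are the ones used in this example.

```shell
# Hypothetical helper: report a missing file or directory and return non-zero.
require_path() {
    if [ ! -e "$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
}

# Keep an already exported SETTINGSFILE, otherwise use the path from this example.
SETTINGSFILE=${SETTINGSFILE:-/home/test/QUICKSTART/conf/settings.xml}

# The quotes keep both -D options inside the single variable OPTS.
OPTS="-Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000"

# Usage (not executed here):
# require_path "$SETTINGSFILE" && require_path "$LUCENE_INDEX" &&
#     java $LOG $OPTS dk.netarkivet.archive.tools.GetRecord "$LUCENE_INDEX" "$URI"
```

$OPTS is deliberately left unquoted on the java command line so that the shell splits it back into two separate options.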

Upload:

cd /home/test/QUICKSTART
cp /home/test/QUICKSTART/bitarkiv/filedir/resulting.arc new_resulting.arc
export SETTINGSFILE=/home/test/QUICKSTART/settings.xml
export OPTS="-Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000"
java $LOG $OPTS -cp /home/test/QUICKSTART/lib/netarchivesuite-archive-core.jar dk.netarkivet.archive.tools.Upload new_resulting.arc
# Press <CTRL-C> to stop the job

Batch, e.g. with a checksum job:

cd /home/test/QUICKSTART
mkdir batchprogs

# Copy the example batch program ChecksumJob.java to batchprogs/
cd batchprogs
javac ChecksumJob.java
# Return to /home/test/QUICKSTART so that the relative paths below resolve
cd ..
export SETTINGSFILE=/home/test/QUICKSTART/settings.xml
java -cp lib/netarchivesuite-archive-core.jar -Dsettings.common.remoteFile.port=5000 \
$LOG -Ddk.netarkivet.settings.file=$SETTINGSFILE dk.netarkivet.archive.tools.RunBatch \
-Cbatchprogs/ChecksumJob.class -Ooutput.checksum
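
The resulting output.checksum can be compared with checksums computed locally. The sketch below rests on two assumptions that should be checked against your own output file: that each line has the form filename##checksum, and that the checksum is an MD5 digest. The function verify_checksums is a hypothetical helper, and md5sum from GNU coreutils is assumed to be installed.

```shell
# Sketch: compare "name##md5" lines (ASSUMED output format) with local MD5 sums.
# Usage: verify_checksums CHECKSUM_FILE FILE_DIR
# Prints one MISMATCH line per differing file; returns 1 if any file differs.
verify_checksums() {
    rc=0
    while IFS= read -r line; do
        [ -n "$line" ] || continue           # ignore empty lines
        name=${line%%##*}                    # text before the first "##"
        expected=${line##*##}                # text after the last "##"
        actual=$(md5sum "$2/$name" | cut -d' ' -f1)
        if [ "$actual" != "$expected" ]; then
            echo "MISMATCH $name"
            rc=1
        fi
    done < "$1"
    return $rc
}
```

For example: verify_checksums output.checksum /home/test/QUICKSTART/bitarkiv/filedir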

ChecksumJob.java
