Appendix A - How-To Examples
Install the QuickStart according to the Quick Start Manual, e.g. in /home/test/QUICKSTART
- Add some domains to harvest using the ADMGUI, e.g. netarkivet.dk, kb.dk, statsbiblioteket.dk
- Create and run a snapshot harvest with a byte limit of 100,000
- Wait until the job is done
Set up your browser for browsing, and index your harvest job
cd /home/test/QUICKSTART/bitarkiv/filedir
export CLASSPATH=/home/test/QUICKSTART/lib/netarchivesuite-common-core.jar
export LOG=-Dlogback.configurationFile=/path/to/logback.xml
ls
e.g.
Extract CDX:
export FILEONE=1-1-20090519083732-00002-dia-test-int-01.kb.dk.warc
java $LOG dk.netarkivet.common.tools.ArchiveExtractCDX $FILEONE > output.cdx
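Before moving on, it can be worth confirming that the extraction actually wrote something. The following is a minimal sketch, not part of NetarchiveSuite: the function name cdx_line_count is hypothetical, and it assumes only that the CDX output is a text file with one entry per line.

```shell
#!/bin/sh
# Hypothetical helper (not a NetarchiveSuite tool): print the number of
# lines in a CDX file, or fail if the file is missing or empty.
cdx_line_count() {
    cdx_file=$1
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$cdx_file" ]; then
        echo "CDX file '$cdx_file' is missing or empty" >&2
        return 1
    fi
    # Reading from stdin makes wc print only the count; tr strips any padding
    wc -l < "$cdx_file" | tr -d ' '
}
```

After defining the function, calling cdx_line_count output.cdx prints the line count. A non-zero exit status suggests the extraction step should be rerun.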
Get Record using Lucene:
# e.g. a URI from the harvest found in your "viewerproxy"
export URI=http://netarkivet.dk/index-da.php
cd /home/test/QUICKSTART/cache/fullcrawllogindex
cp -r 1-cache 1-cache.unzip
cd 1-cache.unzip/
ls
gunzip *
export SETTINGSFILE=/home/test/QUICKSTART/conf/settings.xml
export LUCENE_INDEX=/home/test/QUICKSTART/cache/fullcrawllogindex/1-cache.unzip
# the quotes keep both -D options in the single OPTS variable
export OPTS="-Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000"
java $LOG $OPTS dk.netarkivet.archive.tools.GetRecord $LUCENE_INDEX $URI
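Several of these steps pass the same two -D options to java. Because the value contains a space, it has to be quoted when it is assigned to one variable, otherwise the shell treats the second option as a separate argument to export. The sketch below illustrates one way to build that value; the function name build_opts is hypothetical, and the default port of 5000 is simply taken from the commands in this appendix.

```shell
#!/bin/sh
# Hypothetical helper: print the two JVM options used in this appendix
# as a single space-separated string.
build_opts() {
    settings_file=$1
    port=${2:-5000}   # default taken from the examples in this appendix
    printf '%s %s\n' \
        "-Ddk.netarkivet.settings.file=$settings_file" \
        "-Dsettings.common.remoteFile.port=$port"
}

# The double quotes keep both options inside the one OPTS variable.
OPTS="$(build_opts /home/test/QUICKSTART/conf/settings.xml)"
export OPTS
echo "$OPTS"
```

When the variable is later used unquoted, as in java $LOG $OPTS, the shell splits it back into two arguments. This only works as long as the settings path contains no spaces.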
Upload:
cd /home/test/QUICKSTART
cp /home/test/QUICKSTART/bitarkiv/filedir/resulting.arc new_resulting.arc
export SETTINGSFILE=/home/test/QUICKSTART/settings.xml
export OPTS="-Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000"
java $LOG $OPTS -cp /home/test/QUICKSTART/lib/netarchivesuite-archive-core.jar dk.netarkivet.archive.tools.Upload new_resulting.arc
# just press <CTRL-C> to stop the job
Run a batch job, e.g. a checksum job:
cd /home/test/QUICKSTART
mkdir batchprogs
# copy the example batch program ChecksumJob.java to batchprogs/
cd batchprogs
javac ChecksumJob.java
# return to the QuickStart directory; the paths below are relative to it
cd /home/test/QUICKSTART
export SETTINGSFILE=/home/test/QUICKSTART/settings.xml
java -cp lib/netarchivesuite-archive-core.jar -Dsettings.common.remoteFile.port=5000 \
 $LOG -Ddk.netarkivet.settings.file=$SETTINGSFILE dk.netarkivet.archive.tools.RunBatch \
 -Cbatchprogs/ChecksumJob.class -Ooutput.checksum
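To cross-check the batch result by hand, a checksum can also be computed locally for a file in the filedir. This is only a sketch under two assumptions: that the checksum job reports MD5 digests, and that the GNU md5sum utility is installed. The function name local_md5 is hypothetical, and nothing is assumed here about the layout of output.checksum.

```shell
#!/bin/sh
# Hypothetical helper: print only the MD5 digest of a file.
# Assumes GNU md5sum is available on the machine.
local_md5() {
    # Reading from stdin makes md5sum print "<digest>  -";
    # cut keeps the digest and drops the rest.
    md5sum < "$1" | cut -d ' ' -f 1
}
```

For example, local_md5 /home/test/QUICKSTART/bitarkiv/filedir/$FILEONE prints a digest that can be compared by eye with the corresponding entry in output.checksum.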