...
Log in to the index server (kb-test-acs-001.kb.dk)
```shell
$ ssh devel@kb-test-acs-001
$ export TESTX=TEST11A
$ cd ${TESTX}
```
Create a directory for the batch programs and copy the contents of /home/devel/backup_TEST11A/batchprogs into it (/home/devel/TEST11A/batchprogs):
```shell
$ mkdir batchprogs
# copy the contents, not the directory itself, so the jars land directly in batchprogs/
$ cp -av /home/devel/backup_TEST11A/batchprogs/* batchprogs/
```
Run batch programs on archived files
...
This is run on the KBN replica, which has four bitapps according to the default configuration file 'deploy_config_multi_bitapps.xml'. The word 'Legal' should therefore be written four times, once per bitapp.
...
This is run on the SBN replica, which has only one bitapp according to the default configuration file 'deploy_config_multi_bitapps.xml'. The word 'Legal' should therefore be written once.
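If the batch output is saved to a file (the file name is your choice; it is not given above), the expected count can be checked mechanically. This is a minimal sketch: check_legal_count is a hypothetical helper, not part of NetarchiveSuite, and it simply counts whole-word occurrences of 'Legal'.

```shell
# Hypothetical helper (not part of NetarchiveSuite): count the word "Legal"
# in a saved batch output file and compare it with the expected bitapp count.
check_legal_count() (
    output_file=$1   # file given to RunBatch with -O (name is your choice)
    expected=$2      # 4 for KBN, 1 for SBN with deploy_config_multi_bitapps.xml
    # -o prints each match on its own line, -w matches whole words only
    actual=$(grep -ow 'Legal' "$output_file" | wc -l | tr -d ' ')
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: 'Legal' found $actual time(s)"
    else
        echo "MISMATCH: expected $expected, found $actual" >&2
        exit 1   # exits the subshell body, so the function returns 1
    fi
)
```

For example, check_legal_count kbn.output 4 for the KBN run and check_legal_count sbn.output 1 for the SBN run, where kbn.output and sbn.output are hypothetical file names.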
...
```
org.archive.io.arc.ARCRecord@f16070 available: q:\bitarkiv\JOLF\filedir\1-1-20090316092641-00001-kb-test-har-002.kb.dk.arc: {ip-address=0.0.0.0, content-type=text/plain, absolute-offset=0, subject-uri=filedesc://1-1-20090316092641-00001-kb-test-har-002.kb.dk.arc.open, length=1343, creation-date=20090316092641, version=1.1}
```
Furthermore, a list of failed files should be printed to stdout (all of them WARC files).
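The exact format of the failed-file list on stdout is not shown here, so the following is only a sketch under an assumption: the failed file names have been copied into a text file, one name per line. The hypothetical helper check_failed_are_warc then reports any name that does not end in .warc (or .warc.gz).

```shell
# Hypothetical helper: given a file holding the failed-file names (one per
# line, copied from the RunBatch stdout), report any that are not WARC files.
check_failed_are_warc() (
    list=$1
    # keep lines that do NOT end in .warc/.warc.gz, ignoring blank lines
    non_warc=$(grep -v -E '\.warc(\.gz)?$' "$list" | grep -v '^[[:space:]]*$')
    if [ -z "$non_warc" ]; then
        echo "OK: all failed files are WARC files"
    else
        echo "Unexpected non-WARC failures:" >&2
        printf '%s\n' "$non_warc" >&2
        exit 1
    fi
)
```

The same check applies wherever this section expects the failed files to be WARC files.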
Running methods from jar files, including Jhove-based methods
Before running any Jhove methods, run the following command:
```shell
$ cp -av /home/devel/backup_TEST11A/lib_jhove/* /home/devel/TEST11A/lib
# alternatively, fetch the libraries from kb-prod-udv-001:
# $ scp -r test@kb-prod-udv-001.kb.dk:lib/jhove lib/.
```
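Before starting the Jhove job, it can help to confirm that every jar it needs is where the command line expects it. This sketch takes the paths from the -cp and -J arguments of the Jhove RunBatch example; check_jhove_jars is a hypothetical helper, and the base directory argument is an assumption (it defaults to the current directory).

```shell
# Hypothetical pre-flight check: verify that the jars used on the -cp and -J
# arguments of the Jhove RunBatch example exist under the test directory.
check_jhove_jars() (
    base=${1:-.}   # e.g. /home/devel/TEST11A; defaults to the current directory
    missing=0
    for jar in lib/netarchivesuite-archive-core.jar \
               batchprogs/eu.planets.batch.jar \
               lib/jhove/jhove.jar \
               lib/jhove/jhove-module.jar; do
        if [ ! -f "$base/$jar" ]; then
            echo "missing: $base/$jar" >&2
            missing=1
        fi
    done
    exit "$missing"   # 0 only if every jar was found
)
```

A "missing" line for lib/jhove/... after the copy step would suggest that the Jhove jars ended up directly in lib/ rather than in lib/jhove/.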
eu.planets.batch.jar -> eu.planets.JhoveArcJob
...
```shell
java -cp lib/netarchivesuite-archive-core.jar -Dsettings.common.applicationInstanceId=JHOVE -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Jbatchprogs/eu.planets.batch.jar,lib/jhove/jhove.jar,lib/jhove/jhove-module.jar -Neu.planets.JhoveArcJob -R'.*\.arc' -Ooutput.jhove.arc
```
...
```
0,HTML,null,text/html
null,null,null,image/png
null,null,null,text/css
0,GIF,null,image/gif
0,JPEG,null,image/jpeg
.....
```
Furthermore, a list of failed files should be printed to stdout (all of them WARC files).
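To get an overview of output.jhove.arc rather than scrolling through it, the records can be tallied by MIME type. This sketch assumes each record is a comma-separated line whose fourth field is the MIME type, as in the sample output for this job; summarize_jhove_output is a hypothetical helper.

```shell
# Hypothetical helper: tally the MIME types in the Jhove output, assuming each
# record is a comma-separated line whose fourth field is the MIME type.
# Lines with fewer than four fields (such as ".....") are skipped.
summarize_jhove_output() (
    awk -F, 'NF >= 4 { count[$4]++ }
             END { for (t in count) print count[t], t }' "${1:-output.jhove.arc}" |
        sort -rn   # most frequent MIME type first
)
```

Running summarize_jhove_output in the test directory prints lines such as "2 text/html", one per MIME type.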
Check the content of metadata files and 'content'
...
This value should be approximately the same as the combined size of all the harvests.
Furthermore, a list of failed files should be printed to stdout (all of them WARC files).
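"Approximately the same" can be made explicit with a tolerance. This is a sketch only: sizes_roughly_equal is a hypothetical helper, both numbers are assumed to be byte counts, and the default 10% tolerance is an arbitrary choice, not a value from this test plan.

```shell
# Hypothetical helper: succeed if two sizes (in bytes) differ by at most
# the given percentage (default 10%, an arbitrary choice) of the larger one.
sizes_roughly_equal() (
    a=$1 b=$2 pct=${3:-10}
    awk -v a="$a" -v b="$b" -v pct="$pct" 'BEGIN {
        a += 0; b += 0; pct += 0          # force numeric comparison
        diff = a - b; if (diff < 0) diff = -diff
        max = (a > b) ? a : b
        exit !(max == 0 || diff * 100 <= pct * max)
    }'
)
```

For example, sizes_roughly_equal 1000 1050 succeeds, while sizes_roughly_equal 1000 1500 fails unless a larger tolerance such as 50 is passed as the third argument.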
Test a WARC Batch Job
```shell
java -Dsettings.common.applicationInstanceId=DEDUP -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml -cp lib/netarchivesuite-archive-core.jar dk.netarkivet.archive.tools.RunBatch -Ndk.netarkivet.common.utils.cdx.WARCExtractCDXJob -Jlib/netarchivesuite-wayback-indexer.jar -R'.*dk.*\.warc' -BSBN -Ocdx.warc.all.output
```
Check that the output file (cdx.warc.all.output) has a significant amount of content.
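A quick way to make this check concrete is to require the file to be non-empty and to have at least some minimum number of lines. In this sketch, check_cdx_output is a hypothetical helper, and the minimum line count (default 1) is an assumed threshold that should be chosen to fit the test data.

```shell
# Hypothetical helper: fail if the CDX output is missing or empty, or has
# fewer lines than a chosen minimum; otherwise print the line count.
check_cdx_output() (
    file=${1:-cdx.warc.all.output}
    min_lines=${2:-1}   # assumed threshold; pick one that suits the test data
    if [ ! -s "$file" ]; then
        echo "empty or missing: $file" >&2
        exit 1
    fi
    lines=$(wc -l < "$file" | tr -d ' ')
    echo "$file: $lines lines"
    [ "$lines" -ge "$min_lines" ]   # exit status of the function
)
```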
eu.planets.batch.jar -> eu.planets.CopyArcContent: 'metadata'
...