...
eu.planets.batch.jar -> eu.planets.JhoveArcJob
Get some metadata from all records within the arc-files through Jhove.
...
For data formats that are not handled, these values are replaced by 'null', except for the mimetype,
which is replaced by the mimetype of the record.
Run
Code Block |
---|
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.applicationInstanceId=JHOVE -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Jbatchprogs/eu.planets.batch.jar,lib/jhove/jhove.jar,lib/jhove/jhove-module.jar -Neu.planets.JhoveArcJob -R.*.arc -Ooutput.jhove.arc |
...
The output.jhove.arc file should be in the following format:
Code Block |
---|
0,HTML,null,text/html
null,null,null,image/png
null,null,null,text/css
0,GIF,null,image/gif
0,JPEG,null,image/jpeg
..... |
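A quick way to inspect such a file is to tally its records. The following is only a minimal sketch: it assumes one record per line with exactly four comma-separated fields, of which the last is the mimetype. The helper names `parse_jhove_output` and `summarize` are made up for this illustration and are not part of NetarchiveSuite or Jhove.

```python
from collections import Counter


def parse_jhove_output(text):
    """Split each non-empty line into its four comma-separated fields.

    Assumption: one record per line, four fields, mimetype last.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        records.append(tuple(fields))
    return records


def summarize(records):
    """Count records per mimetype and records whose first three fields are all 'null'."""
    by_mimetype = Counter(rec[3] for rec in records)
    unhandled = sum(1 for rec in records if rec[:3] == ("null", "null", "null"))
    return by_mimetype, unhandled


# Sample taken from the expected output format shown above.
sample = """0,HTML,null,text/html
null,null,null,image/png
null,null,null,text/css
0,GIF,null,image/gif
0,JPEG,null,image/jpeg
"""

records = parse_jhove_output(sample)
by_mimetype, unhandled = summarize(records)
print(by_mimetype)
print("records with unhandled format:", unhandled)
```

For a real run, the sample string would be replaced by the contents of output.jhove.arc.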
...
This value should be approximately the same as the combined size of all the harvests.
eu.planets.batch.jar -> eu.planets.CopyArcContent: 'metadata'
Copy the content of the metadata files only.
Run
Code Block |
---|
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.applicationInstanceId=META -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/CopyArcContent.class -Ooutput.copy.meta -R'.*-metadata-.*' -BKBN |
This should give the following output in the console:
Code Block |
---|
Running batch job 'batchprogs/CopyArcContent.class' on files matching '.*-metadata-.*' on replica 'KBN', output written to file 'output.copy.meta', errors written to stderr
Processed 2 files with 0 failures |
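If the console output is captured, the summary line can also be checked automatically. This is a minimal sketch that assumes the summary always has the form 'Processed N files with M failures' as shown above; the function name `parse_summary` is hypothetical.

```python
import re

# Pattern for the summary line printed at the end of the console output.
SUMMARY = re.compile(r"Processed (\d+) files with (\d+) failures")


def parse_summary(console_output):
    """Return (processed, failures) from the summary line of the console output."""
    match = SUMMARY.search(console_output)
    if match is None:
        raise ValueError("no 'Processed N files with M failures' line found")
    return int(match.group(1)), int(match.group(2))


# Console output copied from the expected result above.
console = (
    "Running batch job 'batchprogs/CopyArcContent.class' on files matching "
    "'.*-metadata-.*' on replica 'KBN', output written to file "
    "'output.copy.meta', errors written to stderr\n"
    "Processed 2 files with 0 failures\n"
)

processed, failures = parse_summary(console)
print("processed:", processed, "failures:", failures)
```

The same check applies to the other RunBatch invocations on this page, with a different expected file count.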
eu.planets.batch.jar -> eu.planets.CopyArcContent: 'content'
Copy only the content of the files collected in the harvest. This test assumes that the harvesters in the test system are in the '.dk' domain.
Run
Code Block |
---|
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.applicationInstanceId=CONTENT -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/CopyArcContent.class -Ooutput.copy.content -R'.*.dk.*' -BKBN |
The regular expression should match every file except the metadata files, since the metadata files do not contain the sequence '.dk' in their names. The job therefore handles all the other files, the 'content' files.
This should give the following output in the console:
Code Block |
---|
Running batch job 'batchprogs/CopyArcContent.class' on files matching '.*.dk.*' on replica 'KBN', output written to file 'output.copy.content', errors written to stderr
Processed 9 files with 0 failures |
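How the two file-name patterns used in the CopyArcContent tests split the files can be tried out separately. The sketch below uses Python's `re` module and assumes that the pattern must match the whole file name; how RunBatch applies the `-R` expression internally is not confirmed here. The content file names are taken from the example output further down this page, and the metadata file name is a hypothetical example.

```python
import re

CONTENT_PATTERN = re.compile(r".*.dk.*")           # as passed with -R in the 'content' test
METADATA_PATTERN = re.compile(r".*-metadata-.*")   # as passed with -R in the 'metadata' test
STRICT_CONTENT_PATTERN = re.compile(r".*\.dk.*")   # variant with the dot escaped

content_files = [
    "1-1-20130114144130-00000-kb-test-har-001.kb.dk.arc",
    "1-1-20130114144130-00001-kb-test-har-001.kb.dk.arc",
]
metadata_file = "1-metadata-1.arc"  # hypothetical metadata file name


def matches(pattern, name):
    """True if the pattern matches the complete file name."""
    return pattern.fullmatch(name) is not None


for name in content_files + [metadata_file]:
    print(name,
          "content:", matches(CONTENT_PATTERN, name),
          "metadata:", matches(METADATA_PATTERN, name))

# The unescaped dot matches any character, so a name without '.dk' can match too.
loose_name = "abcdk.arc"
print(loose_name, matches(CONTENT_PATTERN, loose_name),
      matches(STRICT_CONTENT_PATTERN, loose_name))
```

Because the dot before 'dk' is not escaped, the pattern is slightly broader than "contains '.dk'". That makes no difference for the file names in this test.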
Check that the system properties can be read
Run
Code Block |
---|
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.applicationInstanceId=PROPS -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/SystemReaderJob.class -Ooutput.system -Eerror.system -R'.*.dk.*' -BSBN |
Be aware that all metadata files are excluded.
Check that the error.system file is empty.
The output.system file should contain system properties and a list of files: first the Java version, then the operating system name, the operating system architecture and the operating system version. The list of files follows, then a count of the files and the user name system property. It could look something like this:
Code Block |
---|
System properties!
java version: 1.6.0_24
os name: Linux
os architecture: i386
os version: 2.6.32-220.4.1.el6.x86_64
File: 1-1-20130114144130-00001-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00002-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00003-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00000-kb-test-har-001.kb.dk.arc
File count: 4
User: netarkiv |
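The consistency of such an output can be checked with a few lines of code. This is a minimal sketch that assumes the line prefixes 'File: ' and 'File count: ' exactly as in the example above; the function name `check_system_output` is hypothetical.

```python
def check_system_output(text):
    """Compare the reported file count with the listed files and look for metadata files."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    files = [line[len("File: "):] for line in lines if line.startswith("File: ")]
    count_lines = [line for line in lines if line.startswith("File count: ")]
    if len(count_lines) != 1:
        raise ValueError("expected exactly one 'File count:' line")
    reported = int(count_lines[0][len("File count: "):])
    return {
        "files": files,
        "reported": reported,
        "consistent": reported == len(files),
        # Metadata files should be excluded, so this list is expected to be empty.
        "metadata": [name for name in files if "-metadata-" in name],
    }


# Sample copied from the example output above.
sample = """System properties!
java version: 1.6.0_24
os name: Linux
os architecture: i386
os version: 2.6.32-220.4.1.el6.x86_64
File: 1-1-20130114144130-00001-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00002-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00003-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00000-kb-test-har-001.kb.dk.arc
File count: 4
User: netarkiv
"""

result = check_system_output(sample)
print("count consistent:", result["consistent"])
print("metadata files listed:", result["metadata"])
```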