...

eu.planets.batch.jar -> eu.planets.JhoveArcJob 

 

Extract metadata from every record in the arc files using Jhove.

...

For data formats that are not handled, these values are replaced by 'null', except for the mimetype, which is taken from the record itself.

Run 

Code Block
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.applicationInstanceId=JHOVE -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Jbatchprogs/eu.planets.batch.jar,lib/jhove/jhove.jar,lib/jhove/jhove-module.jar -Neu.planets.JhoveArcJob -R'.*.arc' -Ooutput.jhove.arc

...

The output.jhove.arc file should be in the following format:

 

Code Block
0,HTML,null,text/html
null,null,null,image/png
null,null,null,text/css
0,GIF,null,image/gif
0,JPEG,null,image/jpeg
.....
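
The format above can be checked mechanically. The following is a minimal sketch, assuming that every line of output.jhove.arc has exactly four comma-separated fields, as in the sample lines, and that no field contains a comma itself. The function name check_jhove_output is hypothetical and not part of the batch tooling.

```shell
# Report every line that does not have exactly four comma-separated fields.
# Assumption: four fields per line, as in the sample output shown above.
check_jhove_output() {
    # $1: path to the batch output file, e.g. output.jhove.arc
    awk -F',' '
        NF != 4 {
            printf "line %d: expected 4 fields, got %d: %s\n", NR, NF, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Calling `check_jhove_output output.jhove.arc` prints the offending lines, if any, and returns a non-zero status when at least one line has the wrong number of fields.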

...

This value should be approximately the same as the combined size of all the harvests.

eu.planets.batch.jar -> eu.planets.CopyArcContent: 'metadata'

Copy the content of the metadata files only.

 

Run

 

Code Block
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.applicationInstanceId=META -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/CopyArcContent.class -Ooutput.copy.meta -R'.*-metadata-.*' -BKBN

 

This should give the following output in the console:

Code Block
Running batch job 'batchprogs/CopyArcContent.class' on files matching '.*-metadata-.*' on replica 'KBN', output written to file 'output.copy.meta', errors written to stderr
Processed 2 files with 0 failures

eu.planets.batch.jar -> eu.planets.CopyArcContent: 'content'

Copy only the content of the files collected in the harvest. This test assumes that the harvesters in the test system are in the '.dk' domain.

 

Run

 

Code Block
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.applicationInstanceId=CONTENT -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/CopyArcContent.class -Ooutput.copy.content -R'.*.dk.*' -BKBN

 

The regular expression should match every file except the metadata files, since the metadata files do not contain the sequence '.dk' in their names. The job therefore processes all the other files, the 'content' files.
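
The behaviour of the pattern can be tried out locally before running the batch job. This is only a sketch: it uses grep rather than the batch framework, and it assumes that the pattern is applied to the whole file name. The first file name is a typical content file from a '.dk' harvester; the metadata file name is a hypothetical example of a name containing '-metadata-', and the helper matches_content_pattern is likewise made up for this illustration.

```shell
# Succeeds when the whole file name matches the pattern passed to -R.
matches_content_pattern() {
    printf '%s\n' "$1" | grep -Eqx '.*.dk.*'
}

# One content file name and one (hypothetical) metadata file name.
for name in \
    '1-1-20130114144130-00001-kb-test-har-001.kb.dk.arc' \
    '1-metadata-1.arc'
do
    if matches_content_pattern "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

Note that the dots in the pattern are not escaped, so it matches any character followed by 'dk', which is slightly broader than a literal '.dk'.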

This should give the following output in the console:

 

Code Block
Running batch job 'batchprogs/CopyArcContent.class' on files matching '.*.dk.*' on replica 'KBN', output written to file 'output.copy.content', errors written to stderr
Processed 9 files with 0 failures
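
If the console output of a run has been saved to a file, the summary line can be checked with a small helper. This is a sketch under that assumption; check_batch_summary is a hypothetical name, and the expected file count is whatever the test step calls for (2 for the metadata run, 9 for the content run).

```shell
# Succeeds when the saved console output contains a summary line with the
# expected number of processed files and zero failures.
check_batch_summary() {
    # $1: file holding the captured console output
    # $2: expected number of processed files
    grep -Eq "^Processed $2 files with 0 failures" "$1"
}
```

For example, `check_batch_summary console.log 9` returns a zero status only if console.log (an assumed file name) contains the line 'Processed 9 files with 0 failures'.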

Check that the system properties can be read

 

Run

 

Code Block
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.applicationInstanceId=PROPS -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/SystemReaderJob.class -Ooutput.system -Eerror.system -R'.*.dk.*' -BSBN

 

Be aware that all metadata files are excluded.

Check that the error.system file is empty.

The output.system file should contain system properties and a list of files. It starts with the Java version, followed by the operating system name, architecture and version. The list of files comes next, followed by a count of the files and the user name system property. It could look something like this:

 

Code Block
System properties!
java version: 1.6.0_24
os name: Linux
os architecture: i386
os version: 2.6.32-220.4.1.el6.x86_64
File: 1-1-20130114144130-00001-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00002-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00003-kb-test-har-001.kb.dk.arc
File: 1-1-20130114144130-00000-kb-test-har-001.kb.dk.arc
File count: 4
User: netarkiv
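
The consistency of such a file can be checked with a few lines of shell. The sketch below assumes the 'File: ' and 'File count: ' line prefixes shown in the sample; check_system_output is a hypothetical helper name.

```shell
# Succeeds when the reported "File count" equals the number of "File:" lines.
check_system_output() {
    # $1: path to the batch output file, e.g. output.system
    listed=$(grep -c '^File: ' "$1" || true)
    reported=$(sed -n 's/^File count: //p' "$1")
    [ -n "$reported" ] && [ "$listed" -eq "$reported" ]
}
```

`check_system_output output.system` returns a non-zero status if the count line is missing or disagrees with the listing. The empty error file can be verified in the same spirit with `[ ! -s error.system ]`.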