...
Excerpt |
---|
Verifies the Batch GUI functionality |
Standard functionality
- Go to the 'Bitpreservation' -> 'Batchjob Overview' page.
- Run the FilelistJob for 'JobID = 1' and 'filetype = Both'. Verify that only filenames starting with 1- are included in the output.
- Run the ChecksumJob for 'JobID = .*' and 'filetype = Both'. Verify that both arc and warc files are included in the output.
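The two checks above can be scripted once the job output has been saved to a file. The following is a minimal sketch, assuming the output has one entry per line and that each line begins with the filename. The helper names and the sample filenames are hypothetical and only illustrate the checks.

```shell
# count_not_matching FILE REGEX: print the number of lines in FILE that do NOT match REGEX.
count_not_matching() {
    grep -vcE -- "$2" "$1" || true
}

# count_matching FILE REGEX: print the number of lines in FILE that match REGEX.
count_matching() {
    grep -cE -- "$2" "$1" || true
}

# Hypothetical sample data standing in for saved job output; real filenames will differ.
sample=$(mktemp)
printf '%s\n' '1-1-20240101000000-00000-host.arc' '1-1-20240101000000-00001-host.warc' > "$sample"

# FilelistJob check: every line should start with "1-", so this count should be zero.
echo "lines not starting with 1-: $(count_not_matching "$sample" '^1-')"

# ChecksumJob check: both counts should be greater than zero.
# '\.arc' does not match '.warc', so the two counts are independent.
echo "arc lines: $(count_matching "$sample" '\.arc')"
echo "warc lines: $(count_matching "$sample" '\.warc')"
rm -f "$sample"
```

To use this on real output, point the helpers at the saved result file instead of the generated sample.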
Adding new BatchJobs
Install new BatchJobs on kb-test-adm-001.kb.dk. Build them using the recipe in kb-test-adm-001.kb.dk:/home/devel/batch/NetarchiveSuiteBatchprograms/src/README.txt:
Code Block |
---|
ssh kb-test-adm-001.kb.dk
export TESTX=TEST11A
# Build BatchJobs.jar using the recipe in kb-test-adm-001.kb.dk:/home/devel/batch/NetarchiveSuiteBatchprograms/src/README.txt
# Replace <timestamp> with the timestamp of the jar you built; the quotes keep the shell from treating < and > as redirections.
export BATCHJOBS_JAR="/home/devel/batch/NetarchiveSuiteBatchprograms/BatchJobs-<timestamp>.jar"
cd "/home/devel/${TESTX}/"
cp -pv "$BATCHJOBS_JAR" "/home/devel/${TESTX}/BatchJobs.jar" |
Add the following to conf/settings_GUIApplication.xml in the common section:
Code Block |
---|
<settings>
  <common>
    <batch>
      <batchjobs>
        <batchjob>
          <class>dk.netarkivet.common.utils.batch.ChecksumJob</class>
          <jarfile/>
        </batchjob>
        <batchjob>
          <class>dk.netarkivet.common.utils.batch.FileListJob</class>
          <jarfile/>
        </batchjob>
        <batchjob>
          <class>batchjobs.MimeSearch</class>
          <jarfile>BatchJobs.jar</jarfile>
        </batchjob>
        <batchjob>
          <class>batchjobs.URLsearch</class>
          <jarfile>BatchJobs.jar</jarfile>
        </batchjob>
        <batchjob>
          <class>batchjobs.ContentSearch</class>
          <jarfile>BatchJobs.jar</jarfile>
        </batchjob>
        <batchjob>
          <class>batchjobs.UrlAndMimeSearch</class>
          <jarfile>BatchJobs.jar</jarfile>
        </batchjob>
      </batchjobs>
    </batch>
  </common>
</settings> |
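Before restarting, it may help to confirm that the edited settings file references the jar as often as expected and that the jar is in place. This is a minimal sketch, assuming the current directory is /home/devel/${TESTX}/ and that the relative `BatchJobs.jar` path in the settings is resolved against that directory. The helper name `count_jar_refs` is hypothetical.

```shell
# count_jar_refs FILE JAR: print how many lines in FILE contain <jarfile>JAR</jarfile>.
count_jar_refs() {
    grep -cF -- "<jarfile>$2</jarfile>" "$1" || true
}

# Assumption: run from /home/devel/${TESTX}/ so that both relative paths resolve.
settings=conf/settings_GUIApplication.xml
if [ -f "$settings" ]; then
    # The configuration above has four batch jobs that load from BatchJobs.jar.
    echo "entries using BatchJobs.jar: $(count_jar_refs "$settings" BatchJobs.jar)"
    [ -f BatchJobs.jar ] || echo "BatchJobs.jar not found in $(pwd)"
fi
```

With the configuration shown above, the count should be 4. A lower number suggests a missing or mistyped `<jarfile>` entry.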
Restart the GUI:
Code Block |
---|
conf/restart.sh |
Go to the GUI and verify that the new batch jobs are available on the 'Batchjob Overview' page.
Run all the BatchJobs on a snapshot harvest (setting the Job ID).
- Run the MimeSearch BatchJob with argument 'text/html' and verify that the result is a list of HTML pages.
- Run the URLsearch BatchJob with argument '.*kb.*'. This should generate a list of the harvested kb resources.
- Run the ContentSearch BatchJob with MimeType argument 'text/html' and TextPattern '.*statsbiblioteket.*'. This should generate a list of HTML resources containing the word 'statsbiblioteket'. Note: this operation will take a while to finish (about 10 minutes).
- Run the UrlAndMimeSearch BatchJob with argument 'image/.*' for mimetype and '.*kb\.dk/.*' for url. Verify that only images from the kb domain are listed.
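To get a feel for what the URL patterns above accept, they can be tried against sample strings before running the jobs. This is a rough sketch, assuming the batch jobs match the pattern against the whole value. It uses `grep -E -x` as a stand-in for the jobs' own regex engine, which may differ in details. The helper name `matches_fully` and the URLs are hypothetical.

```shell
# matches_fully REGEX STRING: succeed if the whole STRING matches the extended REGEX.
matches_fully() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Hypothetical URLs, not taken from a real harvest.
for url in 'http://www.kb.dk/index.html' 'http://example.org/kbase/' 'http://example.org/'; do
    for pattern in '.*kb.*' '.*kb\.dk/.*'; do
        if matches_fully "$pattern" "$url"; then result='match'; else result='no match'; fi
        echo "$pattern  $url  $result"
    done
done
```

Under these assumptions, '.*kb.*' accepts any URL containing the letters "kb", including ones outside the kb domain, whereas '.*kb\.dk/.*' requires the literal text "kb.dk/". This is worth keeping in mind when judging the URLsearch result list.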