...
Add the following to conf/settings_GUIApplication.xml in the common section:
    <settings>
      <common>
        <batch>
          <batchjobs>
            <batchjob>
              <class>dk.netarkivet.common.utils.batch.ChecksumJob</class>
              <jarfile/>
            </batchjob>
            <batchjob>
              <class>dk.netarkivet.common.utils.batch.FileListJob</class>
              <jarfile/>
            </batchjob>
            <batchjob>
              <class>batchjobs.MimeSearch</class>
              <jarfile>BatchJobs.jar</jarfile>
            </batchjob>
            <batchjob>
              <class>batchjobs.URLsearch</class>
              <jarfile>BatchJobs.jar</jarfile>
            </batchjob>
            <batchjob>
              <class>batchjobs.ContentSearch</class>
              <jarfile>BatchJobs.jar</jarfile>
            </batchjob>
            <batchjob>
              <class>batchjobs.UrlAndMimeSearch</class>
              <jarfile>BatchJobs.jar</jarfile>
            </batchjob>
          </batchjobs>
        </batch>
      </common>
    </settings>
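Before restarting the GUI, it can help to confirm that the edited settings are still well-formed XML and list the expected jobs. The following is a minimal sketch in Python, assuming the `settings/common/batch/batchjobs` layout shown above. The inline `SETTINGS_XML` string (a shortened copy of the snippet) and the helper names `local_name` and `list_batch_jobs` are illustrative only and are not part of NetarchiveSuite.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the snippet above; in practice, read the real settings file instead.
SETTINGS_XML = """
<settings>
  <common>
    <batch>
      <batchjobs>
        <batchjob>
          <class>dk.netarkivet.common.utils.batch.ChecksumJob</class>
          <jarfile/>
        </batchjob>
        <batchjob>
          <class>batchjobs.MimeSearch</class>
          <jarfile>BatchJobs.jar</jarfile>
        </batchjob>
      </batchjobs>
    </batch>
  </common>
</settings>
"""


def local_name(tag):
    """Drop a '{namespace}' prefix so the check also works on namespaced files."""
    return tag.rsplit("}", 1)[-1]


def list_batch_jobs(xml_text):
    """Return (class name, jar file or None) for every <batchjob> element."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    jobs = []
    for element in root.iter():
        if local_name(element.tag) != "batchjob":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in element}
        jobs.append((fields.get("class"), fields.get("jarfile") or None))
    return jobs


for class_name, jar_file in list_batch_jobs(SETTINGS_XML):
    print(class_name, jar_file or "(no jar file)")
```

A parse error here points to a malformed edit (for example an unclosed `<batchjob>`), which is easier to catch at this stage than after the restart.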
Restart the GUI:
    conf/restart.sh
Go to the GUI and verify that the new batch jobs are available on the Batch Overview page.
Run all the BatchJobs on a snapshot harvest (setting the Job ID).
- Run the MimeSearch BatchJob with argument 'text/html' and verify that the result is a list of HTML pages.
- Run the URLsearch BatchJob with argument '.*kb.*'. This should generate a list of the harvested kb resources.
- Run the ContentSearch BatchJob with MimeType argument 'text/html' and TextPattern '.*statsbiblioteket.*'. This should generate a list of HTML resources containing the word 'statsbiblioteket'. Note: this operation takes a while to finish (about 10 minutes).
- Run the UrlAndMimeSearch BatchJob with argument 'image/.*' for the MIME type and '.*kb\.dk/.*' for the URL. Verify that only images from the kb domain are listed.
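To get a feel for what the patterns in the steps above should select, the sketch below tries them against a few made-up records. It assumes the batch jobs require the whole value to match the pattern (as Java's `String.matches` does), which would explain why every pattern is wrapped in `.*`; Python's `re.fullmatch` stands in for that behaviour here. The sample records and the helper `url_and_mime_match` are hypothetical and only illustrate the expected filtering.

```python
import re

# Hypothetical (URL, MIME type) records standing in for harvested resources.
RECORDS = [
    ("http://www.kb.dk/images/logo.png", "image/png"),
    ("http://www.kb.dk/index.html", "text/html"),
    ("http://example.org/photo.jpg", "image/jpeg"),
    ("http://www.statsbiblioteket.dk/", "text/html"),
]


def url_and_mime_match(url, mime, url_pattern, mime_pattern):
    """True when both values match their pattern in full (assumed semantics)."""
    return (re.fullmatch(url_pattern, url) is not None
            and re.fullmatch(mime_pattern, mime) is not None)


# Patterns from the UrlAndMimeSearch step.
images_from_kb = [
    url for url, mime in RECORDS
    if url_and_mime_match(url, mime, r".*kb\.dk/.*", r"image/.*")
]
print(images_from_kb)

# Pattern from the URLsearch step.
kb_urls = [url for url, _ in RECORDS if re.fullmatch(r".*kb.*", url)]
print(kb_urls)
```

Note that '.*kb.*' is looser than '.*kb\.dk/.*': it accepts any URL containing the letters "kb", so the URLsearch result may include resources outside the kb.dk domain.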