Add the following to conf/settings_GUIApplication.xml, inside the common section:

Code Block
<settings>
<common>
<batch>
  <batchjobs>
    <batchjob>
      <class>dk.netarkivet.common.utils.batch.ChecksumJob</class>
      <jarfile/>
    </batchjob>
    <batchjob>
      <class>dk.netarkivet.common.utils.batch.FileListJob</class>
      <jarfile/>
    </batchjob>
    <batchjob>
      <class>batchjobs.MimeSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.URLsearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.ContentSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.UrlAndMimeSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
  </batchjobs>
</batch>
</common>
</settings>
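
Before restarting, it can help to confirm that the edited file is still well-formed and that all six batch jobs are present. The following is a minimal sketch using only the JDK's DOM parser; the class name BatchJobSettingsCheck and the helper listBatchJobs are hypothetical and are not part of NetarchiveSuite.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of NetarchiveSuite.
public class BatchJobSettingsCheck {

    /**
     * Parses the settings XML and returns one "class -> jarfile" entry per
     * <batchjob> element. Parsing throws an exception if the XML is not
     * well-formed, which is the main point of the check.
     */
    static List<String> listBatchJobs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Matches <batchjob> elements only; the <batchjobs> wrapper has a different name.
        NodeList jobs = doc.getElementsByTagName("batchjob");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < jobs.getLength(); i++) {
            Element job = (Element) jobs.item(i);
            String cls = job.getElementsByTagName("class").item(0).getTextContent().trim();
            String jar = job.getElementsByTagName("jarfile").item(0).getTextContent().trim();
            // An empty <jarfile/> is shown as "(no jar)".
            result.add(cls + " -> " + (jar.isEmpty() ? "(no jar)" : jar));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Pass the path to the settings file, e.g. conf/settings_GUIApplication.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String entry : listBatchJobs(xml)) {
            System.out.println(entry);
        }
    }
}
```

Run against the edited settings file, this should print six entries, four of which refer to BatchJobs.jar.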

Restart the GUI: 

Code Block
conf/restart.sh

Go to the GUI and verify that the new batch jobs are available on the Batch Overview page.

Run all the BatchJobs on a snapshot harvest (setting the Job ID).

  1. Run the MimeSearch BatchJob with the argument 'text/html' and verify that the result is a list of HTML pages.
  2. Run the URLsearch BatchJob with the argument '.*kb.*'. This should generate a list of the harvested kb resources.
  3. Run the ContentSearch BatchJob with the MimeType argument 'text/html' and the TextPattern argument '.*statsbiblioteket.*'. This should generate a list of HTML resources containing the word statsbiblioteket. Note: this operation takes a while to finish (about 10 minutes).
  4. Run the UrlAndMimeSearch BatchJob with the argument 'image/.*' for mimetype and '.*kb\.dk/.*' for url. Verify that only images from the kb domain are listed.
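
When judging the results, note that the URL patterns in steps 2 and 4 differ in strictness. The sketch below shows this with plain java.util.regex. It assumes the batch jobs treat their arguments as Java regular expressions that must match the whole URL or MIME type; these instructions do not state that, so treat it as an assumption. The class name, the helper method and the sample URLs are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not part of BatchJobs.jar.
public class BatchArgumentCheck {

    /**
     * Returns true if both the URL and the MIME type are fully matched by
     * their regular expressions (assumed full-match semantics).
     */
    static boolean matchesUrlAndMime(String urlRegex, String mimeRegex,
                                     String url, String mime) {
        return Pattern.matches(urlRegex, url) && Pattern.matches(mimeRegex, mime);
    }

    public static void main(String[] args) {
        String looseUrl = ".*kb.*";         // step 2: any URL containing "kb"
        String strictUrl = ".*kb\\.dk/.*";  // step 4: URL must contain "kb.dk/"
        String imageMime = "image/.*";      // step 4: any image type

        // Made-up sample records.
        String kbImage = "http://www.kb.dk/images/logo.png";
        String otherSite = "http://example.org/kbase/photo.png";

        // The loose pattern accepts both URLs, the strict one only the kb.dk URL.
        System.out.println(Pattern.matches(looseUrl, otherSite));
        System.out.println(matchesUrlAndMime(strictUrl, imageMime, kbImage, "image/png"));
        System.out.println(matchesUrlAndMime(strictUrl, imageMime, otherSite, "image/png"));
        // Same URL but an HTML MIME type is rejected by the image filter.
        System.out.println(matchesUrlAndMime(strictUrl, imageMime, kbImage, "text/html"));
    }
}
```

Under that assumption, the step 2 result may include resources from other domains whose URLs merely contain "kb", while step 4 should list only kb.dk images.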