...

Note that a batchjob will only be sent to one bitarchive replica!

It is not possible to send batchjobs to checksum replicas, as only bitarchive replicas can
handle batchjobs.

Prerequisites for running a batch job

...

  • A settings file must be present and must include declarations of at least the following settings:
      • Replica settings, to identify the replica you want to communicate with:
          • settings.common.replicas, so that the batch program can identify the replicas and send messages to the bitarchive.
          • settings.common.useReplicaId, to determine the default bitarchive replica to use.
      • Channel settings, to construct the channel names used to communicate with the running system:
          • settings.common.environmentName (typically PROD)
          • settings.common.applicationName (RunBatchApplication, but currently set automatically)
      • Other communication-related settings where the running system's settings differ from the defaults.
  • Batch program: The batch program must be designed as a Java class that extends ARCBatchJob or FileBatchJob, depending on whether the batch program operates on arc records or on whole files.
  • Call location: The RunBatch program can be started from any of the machines in the distributed system where the system runs.
  • Disk space requirement on bitarchive: The disk space needed depends on the batch program concerned. As an example, the ChecksumJob produces about 100 bytes per arc-file, whereas a batch program writing the full contents of arc-files would require as much space as the archive itself.
  • Class Path: Running RunBatch requires *lib/dk.netarkivet.archive.jar* in the class path.
  • Memory space on bitarchive: The memory needed depends on the batch program. If the batch program uses a lot of jar files, these files must be kept in memory while the batch program is running, on top of the memory requirements of the batch job itself.
  • Timeout on bitarchive monitor: To set a specific timeout for a particular batch job, it is necessary to override 'protected long batchJobTimeout = -1;' from FileBatchJob.java. Otherwise the default timeout is 14 days.
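
The timeout prerequisite can be sketched as follows. A Java field cannot be overridden in the strict sense, so a subclass would typically assign a new value to the inherited field, for example in its constructor. The sketch is self-contained and does not use the NetarchiveSuite classes: StandInFileBatchJob, ShortTimeoutJob and TimeoutSketch are hypothetical names, and only the batchJobTimeout field is taken from the text above. That the value is in milliseconds, and that a non-positive value means "use the default", are assumptions not confirmed by this page.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for FileBatchJob; only the field quoted above is mirrored.
abstract class StandInFileBatchJob {
    // Assumption: a non-positive value means "no specific timeout, use the default".
    protected long batchJobTimeout = -1;

    long getBatchJobTimeout() {
        return batchJobTimeout;
    }
}

// A job that requests a two-hour timeout instead of the default.
class ShortTimeoutJob extends StandInFileBatchJob {
    ShortTimeoutJob() {
        // Assumption: the timeout is expressed in milliseconds.
        batchJobTimeout = TimeUnit.HOURS.toMillis(2);
    }
}

class TimeoutSketch {
    // The 14-day default mentioned above, in the assumed unit (milliseconds).
    static final long DEFAULT_TIMEOUT_MS = TimeUnit.DAYS.toMillis(14);

    // Returns the job's own timeout if it set one, otherwise the default.
    static long effectiveTimeout(StandInFileBatchJob job) {
        long timeout = job.getBatchJobTimeout();
        return timeout > 0 ? timeout : DEFAULT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimeout(new ShortTimeoutJob()));
    }
}
```

In a real batch program the same assignment would go into the class that extends FileBatchJob or ARCBatchJob; how the bitarchive monitor reads the value is not described here.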

Execution and Arguments

The execution of a batch program is done by calling the *dk.netarkivet.archive.tools.RunBatch* program with the following arguments:

If the batch program is given in a single class file, this must be specified in the parameter:

  • *-C<classfile>* is a file containing a FileBatchJob/ARCBatchJob implementation

If the batch program is given in one or more jar files, this must be specified in the parameters:

  • *-N<className>* is the name of the primary class to be loaded and executed as a FileBatchJob/ARCBatchJob implementation
  • *-J<jarfile>* is one or more files containing all the classes needed by the primary class. The files must be separated by commas.

To specify which files the batch program must be executed on, the following parameters may optionally be set:

  • *-B<replica>* is the name of the bitarchive replica on which the batchjob must be executed. The default is the name of the bitarchive replica identified by the setting *settings.common.useReplicaId*. Note that it is the replica name, not the replica id, that is referred to here. It cannot be the name of a checksum replica, since batchjobs can only be executed on bitarchive replicas.
  • *-R<regexp>* is a regular expression that will be matched against file names in the archive. The default is *.**, which means the batch program will be executed on all files in the bitarchive replica.
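
As an illustration of the parameters above, the following sketch assembles the argument list for a jar-based batch program and checks which file names a given *-R* expression would select. The class and method names (RunBatchArgsSketch, jarJobArgs, selects), the example class name, jar names and replica name are all hypothetical. Two assumptions are not confirmed by this page: that each value is appended directly to its flag, as the *-C<classfile>* notation suggests, and that the regular expression must match the whole file name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RunBatchArgsSketch {
    // Builds the arguments for a batch program delivered in one or more jar files.
    // replicaName and regexp are optional; pass null to rely on the defaults.
    static List<String> jarJobArgs(String className, List<String> jarFiles,
                                   String replicaName, String regexp) {
        List<String> args = new ArrayList<>();
        args.add("-N" + className);
        // The jar files are separated by commas.
        args.add("-J" + String.join(",", jarFiles));
        if (replicaName != null) {
            args.add("-B" + replicaName);
        }
        if (regexp != null) {
            args.add("-R" + regexp);
        }
        return args;
    }

    // Assumption: the expression has to match the entire file name.
    static boolean selects(String regexp, String fileName) {
        return Pattern.matches(regexp, fileName);
    }

    public static void main(String[] unused) {
        List<String> args = jarJobArgs("my.batch.MyJob",
                Arrays.asList("myjob.jar", "helpers.jar"), "ReplicaA", ".*\\.arc");
        System.out.println(String.join(" ", args));
        System.out.println(selects(".*\\.arc", "example-00001.arc"));
    }
}
```

Leaving out *-R* corresponds to the default expression, which selects every file; a narrower expression such as the one in the sketch restricts the job to file names ending in .arc.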

To specify output files from the batch program, the following parameters may optionally be set:

...