...

(In the above cases, <redirect-url> is empty; it is always empty unless the response has a 3xx HTTP status code.)

Run CDX Job On WARC Files

java -Dsettings.common.applicationInstanceId=DEDUP -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml -cp lib/dk.netarkivet.archive.jar dk.netarkivet.archive.tools.RunBatch -Ndk.netarkivet.wayback.batch.DeduplicationCDXExtractionBatchJob -Jlib/dk.netarkivet.wayback.jar -R'.*metadata.*\.warc' -BSBN -Ocdx.warc.output

Check that the output file cdx.warc.output is non-empty. This verifies that CDX deduplication extraction also works on WARC files.
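
The non-empty check can be scripted rather than done by eye. The following is a minimal sketch, assuming the batch job wrote its result to cdx.warc.output in the current directory (the name given to -O above); the helper name check_nonempty is hypothetical and not part of NetarchiveSuite.

```shell
# Hypothetical helper: succeeds only if the given file exists and has size > 0.
check_nonempty() {
    if [ -s "$1" ]; then
        # -s is true when the file exists and is not empty
        echo "OK: $1 is non-empty"
    else
        echo "FAIL: $1 is missing or empty" >&2
        return 1
    fi
}
```

After the batch job has finished, run check_nonempty cdx.warc.output. A non-zero exit status means the file is missing or the job produced no CDX lines.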

ExceptionBatchInit 

This method should fail before it starts processing the files, so no files should be processed.

...