...
(In the above cases, <redirect-url> is empty, as it always is unless the response has a 3xx HTTP status code.)
Run CDX Job On WARC Files
Code Block |
---|
java -Dsettings.common.applicationInstanceId=DEDUP -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml -cp lib/dk.netarkivet.archive.jar dk.netarkivet.archive.tools.RunBatch -Ndk.netarkivet.wayback.batch.DeduplicationCDXExtractionBatchJob -Jlib/dk.netarkivet.wayback.jar -R'.*metadata.*\.warc' -BSBN -Ocdx.warc.output |
Check that the output is non-empty. This verifies that deduplication CDX extraction also works on WARC files.
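The non-empty check can also be scripted. The following is a minimal sketch, assuming the batch job wrote its result to the file named by the -O option above (cdx.warc.output) in the current directory. The helper name check_nonempty is hypothetical and not part of NetarchiveSuite.

```shell
# Hypothetical helper (not part of NetarchiveSuite):
# succeeds only if the given file exists and has a size greater than zero.
check_nonempty() {
    if [ -s "$1" ]; then
        echo "OK: $1 is non-empty"
    else
        echo "FAIL: $1 is missing or empty" >&2
        return 1
    fi
}
```

After the batch job has finished, run `check_nonempty cdx.warc.output`. A non-zero exit status means the job produced no CDX output.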
ExceptionBatchInit
This method should fail before starting to process the files, and thus no files should be processed.
...