Running JWAT-Tools

Installing and running

To install JWAT-Tools, simply unpack the archive.

To run JWAT-Tools, use the Windows or Linux scripts included in the package.

The scripts can be called from any location.

Windows scripts

jwattools.cmd

jwattools_debug.cmd

jwattools_debug_suspended.cmd

Linux scripts

jwattools.sh

jwattools_debug.sh

jwattools_debug_suspended.sh

Options

The command-line interface changed again in v0.5.6.

The main help page lists only the commands and global options.

Use jwattools help <command> to show a command's usage.
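The scripts and the help command described above can be combined in a small wrapper. The following is a minimal sketch, not part of JWAT-Tools: the function names launcher and help_command, the install directory /opt/jwat-tools, and the choice to fall back to the .sh script on every non-Windows platform are all assumptions made for illustration.

```python
import platform
from pathlib import Path

# Script names come from the lists above. Using the .sh script on every
# non-Windows platform is an assumption of this sketch.
WINDOWS_SCRIPT = "jwattools.cmd"
DEFAULT_SCRIPT = "jwattools.sh"


def launcher(install_dir, system=None):
    """Return the path of the launcher script for the given platform."""
    system = system or platform.system()
    name = WINDOWS_SCRIPT if system == "Windows" else DEFAULT_SCRIPT
    return Path(install_dir) / name


def help_command(install_dir, command, system=None):
    """Build the argument list for 'jwattools help <command>'."""
    return [str(launcher(install_dir, system)), "help", command]


if __name__ == "__main__":
    # "/opt/jwat-tools" is a hypothetical location of the unpacked archive.
    cmd = help_command("/opt/jwat-tools", "test")
    print(" ".join(cmd))
    # To actually run it once the archive is unpacked there:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Because the scripts can be called from any location, the wrapper only needs the install directory and does not change the working directory.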

Commandline options (v0.5.6)
    C:\Java\workspace\jwat-tools>target\jwat-tools-0.5.6-SNAPSHOT\jwattools.cmd
    JWATTools v0.5.6
    usage: JWATTools <command> [<args>]

    Commands:
       arc2warc     convert ARC to WARC
       cdx          create a CDX index (unsorted)
       compress     compress
       decompress   decompress
       extract      extract ARC/WARC record(s)
       interval     interval extract
       pathindex    create a heritrix path index (unsorted)
       test         test validity of ARC/WARC/GZip file(s)
       unpack       unpack multifile GZip

    See 'jwattools help <command>' for more information on a specific command.

    C:\Java\workspace\jwat-tools>

For comparison, the command-line interface for v0.5.5 is shown below.

Commandline options (v0.5.5)
    C:\Java\workspace\jwat-tools>target\jwat-tools-0.5.5-SNAPSHOT\jwattools.cmd
    JWATTools v0.5.5
    Usage: JWATTools [-dte19] command [file ...]

    Commands:
       arc2warc     convert ARC to WARC
       cdx          create a CDX index (unsorted)
       compress     compress
       decompress   decompress
       extract      extract ARC/WARC record(s)
       interval     interval extract
       pathindex    create a heritrix path index (unsorted)
       test         test validity of ARC/WARC/GZip file(s)
       unpack       unpack multifile GZip

    Options:
       -r       recursive (currently has no effect)
       -w<x>    set the amount of worker thread(s) (defaults to 1)

    Test options:
       -e       show errors
       -l       relaxed URL URI validation
       -x       to validate text/xml payload (eg. mets)

    Compress options:
       -1, --fast   compress faster
       -9, --slow   compress better

    C:\Java\workspace\jwat-tools>

You can supply one or more files. Each file argument can contain * and/or ? wildcards, but only in the filename part of the path, not in the directory part. Several wildcards can be used in the same filename.
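The wildcard rule can be illustrated with a short sketch. This is not the JWAT-Tools implementation. It only models the behaviour described above, assuming that * matches any run of characters, that ? matches exactly one character, and that matching is case-sensitive. The function names wildcard_to_regex and expand are hypothetical.

```python
import os
import re
import tempfile


def wildcard_to_regex(name_pattern):
    """Translate * and ? into a regex; every other character is literal."""
    parts = []
    for ch in name_pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def expand(pattern):
    """Expand wildcards in the filename part only; the directory is literal."""
    directory, name_pattern = os.path.split(pattern)
    directory = directory or "."
    regex = wildcard_to_regex(name_pattern)
    return [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if regex.fullmatch(name)
        and os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with made-up file names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a-1.warc.gz", "a-2.warc.gz", "b-1.arc"):
            open(os.path.join(tmp, name), "w").close()
        for path in expand(os.path.join(tmp, "a-?.warc*")):
            print(os.path.basename(path))
```

Splitting the path first means that a * or ? in a directory name is never treated as a wildcard, which mirrors the restriction stated above.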