Instructions on how to run JWAT-Tools.
The command line interface is a work in progress, so the arguments and options will be refactored at some point.
Unfortunately, I also have a small command line package that requires refactoring.
Options
The following options are currently available in JWAT-Tools.
...
Installing and running
To install JWAT-Tools simply unpack the archive.
To run JWAT-Tools use the Windows or Linux scripts included in the package.
The scripts can be called from any location.
Windows scripts: jwattools.cmd, jwattools_debug.cmd, jwattools_debug_suspended.cmd
Linux scripts: jwattools.sh, jwattools_debug.sh, jwattools_debug_suspended.sh
Options
The command line interface has changed yet again for v0.5.6.
The main help page only lists commands and global options.
Use jwattools help <command> to show a command's usage.
Code Block | ||||
---|---|---|---|---|
| ||||
C:\Java\workspace\jwat-tools>target\jwat-tools-0.5.6-SNAPSHOT\jwattools.cmd
JWATTools v0.5.6
usage: JWATTools <command> [<args>]
Commands:
   arc2warc     convert ARC to WARC
   cdx          create a CDX index (unsorted)
   compress     compress
   decompress   decompress
   extract      extract ARC/WARC record(s)
   interval     interval extract
   pathindex    create a heritrix path index (unsorted)
   test         test validity of ARC/WARC/GZip file(s)
   unpack       unpack multifile GZip
See 'jwattools help <command>' for more information on a specific command.
C:\Java\workspace\jwat-tools>
Command line interface for v0.5.5.
Code Block | ||||
---|---|---|---|---|
| ||||
C:\Java\workspace\jwat-tools>target\jwat-tools-0.5.5-SNAPSHOT\jwattools.cmd
JWATTools v0.5.5
Usage: JWATTools [-dte19] command [file ...]
Commands:
   arc2warc     convert ARC to WARC
   cdx          create a CDX index (unsorted)
   compress     compress
   decompress   decompress
   extract      extract ARC/WARC record(s)
   interval     interval extract
   pathindex    create a heritrix path index (unsorted)
   test         test validity of ARC/WARC/GZip file(s)
   unpack       unpack multifile GZip
Options:
   -r           recursive (currently has no effect)
   -w<x>        set the amount of worker thread(s) (defaults to 1)
Test options:
   -e           show errors
   -l           relaxed URI validation
   -x           to validate text/xml payload (eg. mets)
Compress options:
   -1, --fast   compress faster
   -9, --slow   compress better
C:\Java\workspace\jwat-tools>
You can supply one or more files. Each file can contain * and/or ? wildcards, but only in the filename part of the path. Several wildcards can be combined in the same filename.
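To illustrate the wildcard rule, here is a minimal Python sketch of how such a pattern could be expanded. It is not the JWAT-Tools implementation; the function name `expand_pattern` is hypothetical, and Python's `fnmatch` also treats `[...]` as special, which goes beyond the * and ? described here.

```python
import fnmatch
import os


def expand_pattern(pattern):
    """Return sorted paths matching a pattern whose wildcards sit in the filename part only."""
    directory, name_pattern = os.path.split(pattern)
    directory = directory or "."
    # Wildcards in the directory part are rejected, mirroring the rule described in the text.
    if any(ch in directory for ch in "*?"):
        raise ValueError("wildcards are only allowed in the filename part")
    matches = []
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        # fnmatchcase compares case-sensitively; * matches any run, ? a single character.
        if os.path.isfile(full) and fnmatch.fnmatchcase(entry, name_pattern):
            matches.append(full)
    return matches
```

With this sketch, a pattern such as `data/a-?.warc.*` would select `a-1.warc.gz` and `a-2.warc.gz` in `data`, while a wildcard in the directory part raises an error.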
-t (test)
Reads and validates all the files supplied. Files which are not recognized as either GZip, ARC or WARC are skipped. If wildcards are used, files that do not match are also skipped.
Use -e to show the individual errors instead of only a summary.
-d (decompress)
Decompress one or more (multi-part) GZip files and write the decompressed data to a new file, one for each input file.
Useful for decompressing ARC and/or WARC files.
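For comparison, the same kind of decompression can be sketched in a few lines of Python. This only illustrates what decompressing a multi-part GZip file means and is not how JWAT-Tools does it; the function name `decompress_file` is hypothetical, and the output path is left to the caller because the naming of the new file is not specified here.

```python
import gzip
import shutil


def decompress_file(src_path, dst_path):
    """Decompress a (possibly multi-part) GZip file into dst_path."""
    # gzip.open reads concatenated GZip members back to back as one stream,
    # so a compressed ARC/WARC file comes out as a single uncompressed file.
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```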
-r (recursive)
This option is currently ignored. All operations are currently recursive.
-1..-9 (compress)
Compress normal and/or WARC files.
-i (interval extract)
Extract an interval from a given file. The interval can be expressed as offset,offset2 or offset,+length. Offset and length can be expressed in hex by prefixing them with "$" or "0x".
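The interval syntax can be made concrete with a small Python sketch. It assumes the two values are separated by a comma and does not decide whether the second offset is inclusive or exclusive, since that is not stated here. The names `parse_number` and `parse_interval` are hypothetical and not part of JWAT-Tools.

```python
def parse_number(text):
    """Parse a decimal number, or a hex number prefixed with "$" or "0x"."""
    text = text.strip()
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    return int(text, 10)


def parse_interval(spec):
    """Return (offset, offset2) for "offset,offset2" or "offset,+length"."""
    first, sep, second = spec.partition(",")
    if not sep:
        raise ValueError("expected two comma-separated values")
    start = parse_number(first)
    second = second.strip()
    if second.startswith("+"):
        # "+length" is relative to the first offset.
        end = start + parse_number(second[1:])
    else:
        end = parse_number(second)
    if start < 0 or end < start:
        raise ValueError("invalid interval: %r" % spec)
    return start, end
```

Under these assumptions, `parse_interval("100,200")` gives `(100, 200)` and `parse_interval("$10,+0x10")` gives `(16, 32)`.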
-u (unpack)
Unpack a (multi-file) GZip and save each entry as an individual file.
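As a rough illustration of what unpacking involves, the Python sketch below splits a multi-file GZip into its entries and writes each one to its own file. It is a sketch only, not the JWAT-Tools code: the function names and the `entry-00000` output naming are hypothetical, and the whole input is read into memory for simplicity.

```python
import os
import zlib


def split_gzip_members(data):
    """Return the decompressed payload of each GZip entry in data, in order."""
    entries = []
    while data:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(31)
        payload = decoder.decompress(data) + decoder.flush()
        if not decoder.eof:
            raise ValueError("truncated GZip entry")
        entries.append(payload)
        # Bytes after the end of this entry belong to the next one.
        data = decoder.unused_data
    return entries


def unpack(src_path, out_dir):
    """Write every entry of src_path to its own file in out_dir and return the paths."""
    with open(src_path, "rb") as f:
        entries = split_gzip_members(f.read())
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, payload in enumerate(entries):
        path = os.path.join(out_dir, "entry-%05d" % index)  # hypothetical naming scheme
        with open(path, "wb") as out:
            out.write(payload)
        paths.append(path)
    return paths
```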
-c (convert)
Convert ARC files to WARC.
-C (output CDX)
Index one or more ARC/WARC files and output the result in CDX format.