Documentation
The pages below describe the individual packages as well as the process by which ARC and WARC files are read and validated.
If you cannot find the information you need, try the Javadocs or the source code. As a last resort, you are welcome to email me.
Package layout
This toolkit includes the following packages:
- jwat-common: General-purpose classes, including specialized streams, binary-to-string encoding, and HTTP response/payload code shared by ARC and WARC.
- jwat-gzip: GZip reader/validator/writer classes, including input/output streams for the data.
- jwat-arc: ARC-specific reader/validator/writer classes.
- jwat-warc: WARC-specific reader/validator/writer classes.
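The package split roughly follows the input format. As an illustration only, this sketch guesses which package would handle a given input from its first bytes. The class `FormatSniffer` and its methods are hypothetical and not part of JWAT; the magic values are the GZip member signature 0x1f 0x8b (RFC 1952), the `WARC/` version line that opens every WARC record, and the `filedesc://` URL of the version block that opens an ARC file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper, NOT part of the JWAT API: guesses which package
 * would handle an input by looking at its first few bytes.
 */
final class FormatSniffer {

    enum Format { GZIP, WARC, ARC, UNKNOWN }

    // GZip members start with the two magic bytes 0x1f 0x8b (RFC 1952).
    private static final byte[] GZIP_MAGIC = { (byte) 0x1f, (byte) 0x8b };
    // WARC records start with a version line such as "WARC/1.0".
    private static final byte[] WARC_MAGIC = "WARC/".getBytes(StandardCharsets.US_ASCII);
    // ARC files start with a version block whose URL is "filedesc://...".
    private static final byte[] ARC_MAGIC = "filedesc://".getBytes(StandardCharsets.US_ASCII);

    private FormatSniffer() {}

    // True if data begins with every byte of prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static Format sniff(byte[] head) {
        if (startsWith(head, GZIP_MAGIC)) return Format.GZIP; // jwat-gzip territory
        if (startsWith(head, WARC_MAGIC)) return Format.WARC; // jwat-warc territory
        if (startsWith(head, ARC_MAGIC)) return Format.ARC;   // jwat-arc territory
        return Format.UNKNOWN;
    }
}
```

A compressed `.warc.gz` or `.arc.gz` file sniffs as GZIP; the decompressed member data would then have to be sniffed again to choose between the ARC and WARC packages.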
jwat-common
Describes the common package.
jwat-gzip
Describes the gzip package.
jwat-arc
Describes the arc package.
jwat-warc
Describes the warc package.
ARC reader process
Describes the steps taken to read and validate an ARC record.
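The first thing an ARC record reader has to deal with is the record header line. A minimal sketch, assuming the five-field version 1 layout `URL IP-address Archive-date Content-type Archive-length` separated by single spaces; version 2 header lines carry additional fields and are not handled. The class `ArcHeaderLineSketch` is hypothetical and does not reflect how JWAT actually parses ARC records. Problems are collected in a list instead of being thrown, which is one way to let a validator report every issue in a record.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, NOT JWAT's implementation: parses an ARC v1 header line. */
final class ArcHeaderLineSketch {

    /** Parsed fields of a version 1 ARC record header line. */
    static final class Header {
        String url, ipAddress, archiveDate, contentType;
        long archiveLength = -1;
        final List<String> errors = new ArrayList<>();
        boolean isValid() { return errors.isEmpty(); }
    }

    private ArcHeaderLineSketch() {}

    static Header parse(String line) {
        Header h = new Header();
        // Assumes fields are separated by single spaces (ARC v1 layout).
        String[] fields = line.trim().split(" ");
        if (fields.length != 5) {
            h.errors.add("expected 5 fields, found " + fields.length);
            return h;
        }
        h.url = fields[0];
        h.ipAddress = fields[1];
        h.archiveDate = fields[2];
        h.contentType = fields[3];
        // Archive-date is expected as YYYYMMDDhhmmss (14 digits).
        if (!h.archiveDate.matches("\\d{14}")) {
            h.errors.add("archive-date is not YYYYMMDDhhmmss: " + h.archiveDate);
        }
        // Archive-length is the number of content bytes following the header line.
        try {
            h.archiveLength = Long.parseLong(fields[4]);
            if (h.archiveLength < 0) h.errors.add("negative archive-length");
        } catch (NumberFormatException e) {
            h.errors.add("archive-length is not a number: " + fields[4]);
        }
        return h;
    }
}
```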
WARC reader process
Describes the steps taken to read and validate a WARC record.
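A WARC record starts with a version line, followed by `Name: value` header fields and an empty line, after which `Content-Length` bytes of content block follow. The hypothetical `WarcHeaderSketch` parses such a header block from a string and checks for the four fields WARC 1.0 marks as mandatory (`WARC-Record-ID`, `Content-Length`, `WARC-Date`, `WARC-Type`). It is an illustration under those assumptions, not JWAT's implementation; it ignores folded continuation lines and does not validate field values beyond `Content-Length`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch, NOT JWAT's implementation: parses a WARC header block. */
final class WarcHeaderSketch {

    // Fields the WARC 1.0 specification lists as mandatory in every record.
    private static final String[] MANDATORY = {
        "WARC-Record-ID", "Content-Length", "WARC-Date", "WARC-Type"
    };

    static final class Header {
        String version;
        final Map<String, String> fields = new LinkedHashMap<>(); // keys lower-cased
        final List<String> errors = new ArrayList<>();
        long contentLength = -1;
        boolean isValid() { return errors.isEmpty(); }
    }

    private WarcHeaderSketch() {}

    /** Version line, then "Name: value" lines up to the first empty line. */
    static Header parse(String block) {
        Header h = new Header();
        String[] lines = block.split("\r\n", -1);
        h.version = lines[0];
        if (!h.version.startsWith("WARC/")) {
            h.errors.add("missing WARC version line: " + h.version);
        }
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                h.errors.add("malformed header line: " + lines[i]);
                continue;
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            h.fields.put(name, lines[i].substring(colon + 1).trim());
        }
        for (String name : MANDATORY) {
            if (!h.fields.containsKey(name.toLowerCase(Locale.ROOT))) {
                h.errors.add("missing mandatory field: " + name);
            }
        }
        // Content-Length tells the reader how many block bytes follow the header.
        String len = h.fields.get("content-length");
        if (len != null) {
            try {
                h.contentLength = Long.parseLong(len);
                if (h.contentLength < 0) h.errors.add("negative Content-Length");
            } catch (NumberFormatException e) {
                h.errors.add("Content-Length is not a number: " + len);
            }
        }
        return h;
    }
}
```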
Caveat user
Known side effects and pitfalls of the current reading/validation strategy.