...
If you want to use JWAT from any build environment that supports Maven artifacts, the following dependencies can be used.

    <dependencies>
      <dependency>
        <groupId>org.jwat</groupId>
        <artifactId>jwat-common</artifactId>
        <version>1.0.4</version>
      </dependency>
      <dependency>
        <groupId>org.jwat</groupId>
        <artifactId>jwat-gzip</artifactId>
        <version>1.0.4</version>
      </dependency>
      <dependency>
        <groupId>org.jwat</groupId>
        <artifactId>jwat-arc</artifactId>
        <version>1.0.4</version>
      </dependency>
      <dependency>
        <groupId>org.jwat</groupId>
        <artifactId>jwat-warc</artifactId>
        <version>1.0.4</version>
      </dependency>
      <dependency>
        <groupId>org.jwat</groupId>
        <artifactId>jwat-archive-common</artifactId>
        <version>1.0.4</version>
      </dependency>
    </dependencies>

...
The ARC/WARC factory and reader methods support both compressed and uncompressed files in varying combinations.
You can create a reader explicitly for compressed or for uncompressed files, but the factory also includes an auto-detection mechanism that determines the correct reader on a per-file basis.
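One common way to auto-detect compression is to peek at the first two bytes of the stream and compare them with the GZip magic number (0x1f 0x8b). The sketch below uses only the Java standard library to illustrate that idea. It is not JWAT's implementation, and the class and method names (`GzipSniffer`, `wrap`, `isGzipped`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JWAT: detects GZip input by its magic bytes.
class GzipSniffer {

    /** Length of the GZip magic number (0x1f 0x8b). */
    private static final int MAGIC_LENGTH = 2;

    /** Wraps a stream so that the peeked bytes can be pushed back. */
    static PushbackInputStream wrap(InputStream in) {
        return new PushbackInputStream(in, MAGIC_LENGTH);
    }

    /**
     * Reads up to two bytes, pushes them back, and reports whether they
     * match the GZip magic number. The stream position is left unchanged.
     */
    static boolean isGzipped(PushbackInputStream in) throws IOException {
        byte[] magic = new byte[MAGIC_LENGTH];
        int read = 0;
        while (read < MAGIC_LENGTH) {
            int n = in.read(magic, read, MAGIC_LENGTH - read);
            if (n < 0) {
                break; // stream ended before two bytes were available
            }
            read += n;
        }
        if (read > 0) {
            in.unread(magic, 0, read); // restore the bytes for the real reader
        }
        return read == MAGIC_LENGTH
                && (magic[0] & 0xff) == 0x1f
                && (magic[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "WARC/1.0\r\n".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = wrap(new ByteArrayInputStream(plain));
        System.out.println(isGzipped(in) ? "use compressed reader" : "use uncompressed reader");
    }
}
```

Because the bytes are pushed back, the same stream can then be handed to whichever reader matches the result.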
Writing compressed ARC/WARC files is also possible through the use of different methods in the writer factories.
GZip compression is supported only for ARC/WARC files in which each record is compressed individually and the compressed records are concatenated into one file. The case where the whole ARC/WARC file, with all its records, is GZip'ed as a single stream is not supported. The latter is excluded mostly because it makes random access to individual records highly inefficient.
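The following sketch shows why per-record compression keeps random access cheap. It uses only `java.util.zip` rather than the JWAT API, and the class, method names and sample records are hypothetical. Each record is written as its own GZip member and its start offset is remembered, so a single record can be decompressed later without touching the others.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo, not part of JWAT: one GZip member per record.
class PerRecordGzipDemo {

    /** Appends each record to out as a separate GZip member; returns the member start offsets. */
    static List<Integer> writeRecords(List<String> records, ByteArrayOutputStream out) throws IOException {
        List<Integer> offsets = new ArrayList<>();
        for (String record : records) {
            offsets.add(out.size());
            // Closing the GZIPOutputStream writes the member trailer. Closing the
            // underlying ByteArrayOutputStream is a no-op, so it stays usable.
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(record.getBytes(StandardCharsets.UTF_8));
            }
        }
        return offsets;
    }

    /** Decompresses only the member at the given index, using the offset list to find it. */
    static String readRecord(byte[] file, List<Integer> offsets, int index) throws IOException {
        int start = offsets.get(index);
        int end = (index + 1 < offsets.size()) ? offsets.get(index + 1) : file.length;
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(file, start, end - start))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = writeRecords(Arrays.asList("record one", "record two", "record three"), out);
        // Jump straight to the second record; the first is never decompressed.
        System.out.println(readRecord(out.toByteArray(), offsets, 1));
    }
}
```

If the whole file were instead compressed as one stream, reaching the second record would require decompressing everything before it, which is the inefficiency described above.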
...