common package
org.jwat.common:
As the package name indicates, this package includes various classes of general use, as well as some more specific ARC/WARC classes. The classes can be grouped as follows.
String encoding
The base encodings are defined in various RFCs (Base64, Base32 and Base16 in RFC 4648) and are used across the internet in many different contexts. In short, an input array of 8-bit values (bytes) is converted into a string of printable characters.
- Base64: Uses an alphabet of 64 characters and is the most widely used.
- Base32: Uses an alphabet of 32 characters and appears to be the de facto encoding for WARC digests.
- Base16: Uses an alphabet of 16 characters and is more commonly known as hexadecimal, or hex for short.
- Base2: Uses only the characters 0 and 1, representing each 8-bit value as a string of eight binary digits.
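To make the four encodings concrete, here is a minimal Java sketch, assuming the RFC 4648 alphabets with '=' padding for Base64 and Base32 and uppercase digits for Base16. The class BaseNSketch and its methods are hypothetical illustrations, not the org.jwat.common API: Base64 delegates to the JDK's java.util.Base64, while the other three encoders are hand-written to show how the bits are regrouped (8 bits per input byte, 5 bits per Base32 character, 4 bits per hex digit, 1 bit per binary digit).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical helper (not part of org.jwat.common) illustrating the
 * Base64, Base32, Base16 and Base2 encodings of a byte array.
 */
public class BaseNSketch {

    private static final char[] BASE32_ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();
    private static final char[] HEX_ALPHABET = "0123456789ABCDEF".toCharArray();

    /** Base64 (RFC 4648) using the JDK encoder, with '=' padding. */
    public static String base64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Base32 (RFC 4648): 5 bits per output character, padded with '=' to a multiple of 8. */
    public static String base32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0; // pending bits; at most 12 are ever needed
        int bits = 0;   // number of pending bits in the buffer
        for (byte b : data) {
            buffer = ((buffer << 8) | (b & 0xFF)) & 0xFFFF;
            bits += 8;
            while (bits >= 5) {
                bits -= 5;
                sb.append(BASE32_ALPHABET[(buffer >> bits) & 0x1F]);
            }
        }
        if (bits > 0) {
            // Left-align the leftover bits in a final 5-bit group.
            sb.append(BASE32_ALPHABET[(buffer << (5 - bits)) & 0x1F]);
        }
        while (sb.length() % 8 != 0) {
            sb.append('=');
        }
        return sb.toString();
    }

    /** Base16 (hex): each byte becomes two uppercase hex digits. */
    public static String base16(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(HEX_ALPHABET[(b >> 4) & 0x0F]);
            sb.append(HEX_ALPHABET[b & 0x0F]);
        }
        return sb.toString();
    }

    /** Base2: each byte becomes eight binary digits, most significant bit first. */
    public static String base2(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 8);
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                sb.append(((b >> i) & 1) == 1 ? '1' : '0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] input = "foobar".getBytes(StandardCharsets.US_ASCII);
        System.out.println(base64(input)); // Zm9vYmFy
        System.out.println(base32(input)); // MZXW6YTBOI======
        System.out.println(base16(input)); // 666F6F626172
        System.out.println(base2("Hi".getBytes(StandardCharsets.US_ASCII))); // 0100100001101001
    }
}
```

The "foobar" values match the test vectors published in RFC 4648, which is a convenient way to check any Base64/Base32/Base16 implementation. Note how the output length grows as the alphabet shrinks: 8 characters for Base64, 16 for Base32 (including padding), 12 for Base16, and 48 for Base2.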
InputStream / StringReader