Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

Known side-effects from the current reading/validating strategy

Side-effects:

The payload is inaccessible when the "Content-Length" is absent or invalid.

Warc-Payload-Digest header is computed only on defined record payloads where the leading header has been read. This makes it a requirement for the WARC parser to identify and always parse the http response and not make it optional.

For GZip'ed records this is not a big problem since we know the record ends when the GZip entry ends.

For uncompressed records the payload input stream would have to look ahead for a valid WARC version line at which point the payload stream should be closed and the bytes read beyond that pushed back onto the internal streams.

  • No labels