Resolution: Fixed

Details
Assignee: Nicholas Clarke
Reporter: M
Priority: Major
Accuracy of estimate: Rough
Time tracking: No time logged, 1w 2d remaining
Created September 29, 2011 at 1:35 PM
Updated February 16, 2016 at 5:28 PM
Resolved September 5, 2012 at 2:06 PM
The CDX-generating code must work for both ARC and WARC files. Currently the method dk.netarkivet.common.utils.cdx.ExtractCDX.generateCDX() ignores every file whose name does not end with .arc. This method is used in the harvest documentation phase to generate CDX files for the ARC files produced by Heritrix.
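A minimal sketch of a file-name filter that would let both formats through, shown in Java to match the codebase. The class name `ArchiveFileFilter` is hypothetical and this is not the actual code inside `ExtractCDX.generateCDX()`; accepting the gzip-compressed variants (`.arc.gz`, `.warc.gz`) is an assumption about what Heritrix may write.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Locale;

/**
 * Hypothetical filter accepting ARC and WARC files.
 * The .gz suffixes are an assumption, not taken from the issue.
 */
public final class ArchiveFileFilter implements FilenameFilter {
    private static final String[] SUFFIXES = {".arc", ".arc.gz", ".warc", ".warc.gz"};

    @Override
    public boolean accept(File dir, String name) {
        // Compare case-insensitively so "FOO.WARC" is also accepted.
        String lower = name.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ArchiveFileFilter filter = new ArchiveFileFilter();
        String[] names = {"example.arc", "example.warc.gz", "example.cdx"};
        for (String n : names) {
            // Prints true, true, false for the three example names.
            System.out.println(n + " -> " + filter.accept(null, n));
        }
    }
}
```

Such a filter could be passed to `File.listFiles(FilenameFilter)` in place of a hard-coded `.arc` check.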
When a single CDX entry is generated for a URL request, information from several WARC records is combined.
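One way to picture this combination is sketched below in Java (16+, for `record`). It assumes the records belonging to one capture are linked through the WARC `WARC-Concurrent-To` header pointing at the response record's `WARC-Record-ID`. The `WarcRecord` type, the field keys (`mime`, `status`, `digest`) and the simplified output column order (URL, date, MIME type, status, digest, offset, file name) are all hypothetical. This is not the CDX format NetarchiveSuite or Wayback actually writes, and the way fields are split across records in the example is invented purely to show the merge.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: fold several WARC records for one capture into one CDX-like line. */
public final class WarcCdxSketch {

    /** Simplified, hypothetical view of an already-parsed WARC record. */
    record WarcRecord(String type, String recordId, String concurrentTo,
                      String targetUri, String date, long offset,
                      Map<String, String> fields) {}

    /** Emits one line per response record, filling gaps from records concurrent to it. */
    static List<String> toCdx(List<WarcRecord> records, String fileName) {
        // Group records by the record ID they declare themselves concurrent to.
        Map<String, List<WarcRecord>> linked = new LinkedHashMap<>();
        for (WarcRecord r : records) {
            if (r.concurrentTo() != null) {
                linked.computeIfAbsent(r.concurrentTo(), k -> new ArrayList<>()).add(r);
            }
        }
        List<String> lines = new ArrayList<>();
        for (WarcRecord r : records) {
            if (!"response".equals(r.type())) {
                continue; // only response records produce an entry
            }
            // The response's own fields win; linked records only fill missing ones.
            Map<String, String> merged = new LinkedHashMap<>(r.fields());
            for (WarcRecord other : linked.getOrDefault(r.recordId(), List.of())) {
                other.fields().forEach(merged::putIfAbsent);
            }
            lines.add(String.join(" ",
                    r.targetUri(),
                    r.date(),
                    merged.getOrDefault("mime", "-"),
                    merged.getOrDefault("status", "-"),
                    merged.getOrDefault("digest", "-"),
                    Long.toString(r.offset()),
                    fileName));
        }
        return lines;
    }

    public static void main(String[] args) {
        WarcRecord response = new WarcRecord("response", "<urn:uuid:1>", null,
                "http://example.org/", "20110929133500", 0L,
                Map.of("mime", "text/html", "status", "200"));
        WarcRecord metadata = new WarcRecord("metadata", "<urn:uuid:2>", "<urn:uuid:1>",
                "http://example.org/", "20110929133500", 4096L,
                Map.of("digest", "sha1:ABC"));
        for (String line : toCdx(List.of(response, metadata), "example.warc")) {
            System.out.println(line);
        }
    }
}
```

Running `main` prints `http://example.org/ 20110929133500 text/html 200 sha1:ABC 0 example.warc`: one entry, keyed on the response record's offset, with a field taken from the linked metadata record.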
Note that Wayback already contains code for generating CDX files from WARC files:
https://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/resourcestore/indexer/