ONB Wayback usage
Wayback Usage at ONB
Wayback Indexing Process
Build Pathindex
- Loops over all files in all mounted data segments (/mnt/wa001 to /mnt/waNNN), writes one tab-separated line per file (Filename\tAbsoluteFilename), and creates one CSV file per segment
- Merges and sorts all segment files into one pathindex file
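
The two pathindex steps above can be sketched roughly as follows. This is a minimal illustration, not the script used at ONB: the function names build_segment_pathindex and merge_pathindexes are hypothetical, and the sketch assumes that every regular file under a segment directory belongs in the index and that the merged index is small enough to sort in memory.

```python
import os


def build_segment_pathindex(segment_dir, out_file):
    """Write one 'Filename<TAB>AbsoluteFilename' line per file under segment_dir."""
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for root, _dirs, files in os.walk(segment_dir):
            for name in files:
                abs_path = os.path.abspath(os.path.join(root, name))
                out.write(f"{name}\t{abs_path}\n")
                count += 1
    return count


def merge_pathindexes(segment_files, merged_file):
    """Merge all per-segment index files into one sorted pathindex file."""
    lines = []
    for path in segment_files:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    # Normalise the line ending so entries never run together.
                    lines.append(line.rstrip("\n") + "\n")
    # Assumption: in-memory sort; a very large index would need an external sort.
    lines.sort()
    with open(merged_file, "w", encoding="utf-8") as out:
        out.writelines(lines)
    return len(lines)
```

The per-segment index files should be written outside the segment directories, otherwise a later run would index them as well.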
Generate CDX
- Loops over the pathindex file of each segment and generates a CDX file for each ARC that does not already have one, calling dk.netarkivet.wayback.batch.ExtractWaybackCDXBatchJob for data ARCs and dk.netarkivet.wayback.batch.ExtractDeduplicateCDXBatchJob for metadata ARCs
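
The selection logic of this step might look like the sketch below. It only plans the work and does not invoke the NetarchiveSuite batch jobs themselves. Several details are assumptions that the description above does not state: that metadata ARCs can be recognised by "metadata" in the file name, that each CDX file is named after its ARC with a ".cdx" suffix, and that all CDX files of a segment live in one directory. The function names are hypothetical.

```python
from pathlib import Path

DATA_JOB = "dk.netarkivet.wayback.batch.ExtractWaybackCDXBatchJob"
METADATA_JOB = "dk.netarkivet.wayback.batch.ExtractDeduplicateCDXBatchJob"


def select_batch_job(arc_filename):
    """Pick the batch job class name for an ARC file."""
    # Assumption: metadata ARCs carry "metadata" in their file name.
    if "metadata" in arc_filename.lower():
        return METADATA_JOB
    return DATA_JOB


def plan_cdx_jobs(pathindex_file, cdx_dir):
    """Return (arc_path, job_class, cdx_path) for every ARC without a CDX file."""
    plan = []
    with open(pathindex_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            name, abs_path = line.split("\t", 1)
            # Assumption: the CDX file is named <ARC file name>.cdx.
            cdx_path = Path(cdx_dir) / (name + ".cdx")
            if cdx_path.exists():
                continue  # CDX already generated, nothing to do
            plan.append((abs_path, select_batch_job(name), str(cdx_path)))
    return plan
```

Skipping ARCs that already have a CDX file makes the step safe to rerun after an interruption, since only the missing files are regenerated.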
Merge CDX
- Merges all single CDX files into one large CDX file per segment
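
A per-segment merge can be as simple as concatenating the individual files. The sketch below assumes that the single CDX files of a segment sit in one directory with a ".cdx" extension and that they carry no header lines that would need to be dropped; merge_segment_cdx is a hypothetical name. Files are handled as bytes so that unusual URL encodings pass through unchanged.

```python
from pathlib import Path


def merge_segment_cdx(cdx_dir, merged_file):
    """Concatenate all *.cdx files in cdx_dir into merged_file; return line count."""
    merged_file = Path(merged_file)
    count = 0
    with open(merged_file, "wb") as out:
        for cdx in sorted(Path(cdx_dir).glob("*.cdx")):
            # Skip the output file itself if it lives in the same directory.
            if cdx.resolve() == merged_file.resolve():
                continue
            with open(cdx, "rb") as fh:
                for line in fh:
                    if line.strip():
                        # Guarantee a newline so records never run together.
                        out.write(line.rstrip(b"\n") + b"\n")
                        count += 1
    return count
```

The merged file is not sorted at this point; ordering is left to the sort step.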
Sort CDX
- Sorts the CDX files of all segments into one large sorted CDX file using the Linux sort command
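
The final sort could be driven as in the sketch below, which shells out to the standard sort command with its -o (output file) and -T (temporary directory) options. The function names and the optional temporary directory are illustrative. Setting LC_ALL=C is an assumption not stated above: it forces byte-order sorting, which Wayback's binary search over CDX files expects.

```python
import os
import subprocess


def build_sort_command(segment_cdx_files, output_file, tmp_dir=None):
    """Build the argument list for sorting all segment CDX files into one file."""
    cmd = ["sort", "-o", str(output_file)]
    if tmp_dir is not None:
        # Large sorts spill to disk; -T selects where the temp files go.
        cmd += ["-T", str(tmp_dir)]
    cmd += [str(f) for f in segment_cdx_files]
    return cmd


def sort_cdx(segment_cdx_files, output_file, tmp_dir=None):
    """Run sort with a byte-order locale; raises CalledProcessError on failure."""
    # Assumption: byte-order collation (LC_ALL=C) is wanted for the CDX index.
    env = dict(os.environ, LC_ALL="C")
    cmd = build_sort_command(segment_cdx_files, output_file, tmp_dir)
    subprocess.run(cmd, check=True, env=env)
```

Passing several input files to a single sort call lets sort handle the external merge itself, so the per-segment files never have to fit in memory.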