Loops over all files in all mounted data segments (/mnt/wa001 to /mnt/waNNN), writes one tab-separated line per file (Filename\tAbsoluteFilename), and creates one such CSV file per segment.
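A minimal Python sketch of the per-segment listing step. It assumes that each segment is a directory under a common mount root whose name starts with `wa` (as in /mnt/wa001) and that the output file for a segment is named after it (for example `wa001.csv`). The naming scheme and the helper names `write_segment_listing` and `write_all_listings` are hypothetical.

```python
import os
from pathlib import Path


def write_segment_listing(segment_dir, out_file):
    """Write one 'Filename<TAB>AbsoluteFilename' line per file under segment_dir.

    Returns the number of lines written.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        # os.walk descends into subdirectories, so nested files are included too.
        for root, _dirs, files in os.walk(segment_dir):
            for name in files:
                abs_path = os.path.abspath(os.path.join(root, name))
                out.write(f"{name}\t{abs_path}\n")
                count += 1
    return count


def write_all_listings(mount_root, out_dir, prefix="wa"):
    """Create one listing file per mounted segment (hypothetical: /mnt/wa001 -> wa001.csv).

    Returns a dict mapping segment name to the number of files listed.
    """
    results = {}
    for entry in sorted(Path(mount_root).iterdir()):
        # Assumption: only directories named like 'wa001' are data segments.
        if entry.is_dir() and entry.name.startswith(prefix):
            out_file = Path(out_dir) / f"{entry.name}.csv"
            results[entry.name] = write_segment_listing(entry, out_file)
    return results
```

Calling `write_all_listings("/mnt", "/tmp/listings")` would then produce one listing file per segment, and a per-segment file count that can be used as a quick sanity check.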
Merges all segment files and sorts them into one pathindex file.
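The merge-and-sort step can be sketched as follows, assuming the per-segment listings fit in memory. `build_pathindex` is a hypothetical name. Sorting on the whole line orders the index primarily by Filename, since that is the first tab-separated column.

```python
def build_pathindex(segment_files, pathindex_file):
    """Merge per-segment 'Filename<TAB>AbsoluteFilename' files into one sorted pathindex.

    Blank lines are skipped; duplicate lines are kept. Returns the line count.
    """
    lines = []
    for seg in segment_files:
        with open(seg, encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f if line.strip())
    # Python compares str by code point, which matches UTF-8 byte order.
    lines.sort()
    with open(pathindex_file, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")
    return len(lines)
```

For listings too large for memory, handing the segment files to an external sort would be the usual alternative.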
Generate CDX
Loops over the pathindex files of each segment and generates one CDX file per ARC file, skipping every ARC file for which a CDX file already exists. Data ARC files are processed with dk.netarkivet.wayback.batch.ExtractWaybackCDXBatchJob, and metadata ARC files with dk.netarkivet.wayback.batch.ExtractDeduplicateCDXBatchJob.
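The per-ARC decision can be sketched as a planning function. It assumes that metadata ARC files can be recognised by the substring `metadata` in their filename and that each CDX file is stored as `<ARC filename>.cdx` in a single output directory; both are illustrative assumptions, not confirmed conventions. The hypothetical `plan_cdx_jobs` only decides which batch job class would run on which ARC file. It does not start the Java batch jobs.

```python
import os

DATA_JOB = "dk.netarkivet.wayback.batch.ExtractWaybackCDXBatchJob"
METADATA_JOB = "dk.netarkivet.wayback.batch.ExtractDeduplicateCDXBatchJob"


def is_metadata_arc(filename):
    # Assumption: metadata ARC files contain "metadata" in their name.
    return "metadata" in filename


def plan_cdx_jobs(pathindex_file, cdx_dir):
    """Return (arc_path, batch_job_class, cdx_path) for every ARC that still lacks a CDX file."""
    plan = []
    with open(pathindex_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            filename, abs_path = line.split("\t", 1)
            # Hypothetical naming scheme: one '<ARC filename>.cdx' per ARC in cdx_dir.
            cdx_path = os.path.join(cdx_dir, filename + ".cdx")
            if os.path.exists(cdx_path):
                continue  # CDX already generated; skip this ARC
            job = METADATA_JOB if is_metadata_arc(filename) else DATA_JOB
            plan.append((abs_path, job, cdx_path))
    return plan
```

Checking for an existing CDX file before planning makes the step restartable: a rerun after an interruption only picks up the ARC files that were not yet indexed.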
Merge CDX
Merges all individual CDX files of a segment into one large CDX file per segment.
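Merging a segment's CDX files amounts to concatenating them. Below is a minimal sketch using the hypothetical helper `merge_segment_cdx`. It also makes sure every file ends with a newline, so that the last record of one file and the first record of the next are never joined on one line. The order of records is left untouched; sorting is the job of the following step.

```python
def merge_segment_cdx(cdx_files, merged_file):
    """Concatenate the individual CDX files of one segment into a single CDX file.

    Empty files are skipped. Returns the number of files that contributed data.
    """
    merged = 0
    with open(merged_file, "wb") as out:
        for path in cdx_files:
            with open(path, "rb") as src:
                data = src.read()
            if not data:
                continue
            out.write(data)
            if not data.endswith(b"\n"):
                out.write(b"\n")  # keep the next file's first record on its own line
            merged += 1
    return merged
```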
Sort CDX
Sorts the CDX files of all segments into one large sorted CDX file using the Linux sort command.
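A sketch of the sort step as a Python wrapper around the external command, assuming GNU coreutils `sort` (the `-T` option for a temporary directory is a GNU/BSD extension, not POSIX). Setting `LC_ALL=C` makes `sort` compare raw bytes instead of using locale-aware collation. Byte order is the ordering generally recommended for Wayback CDX files, since Wayback binary-searches them. The helper name `sort_cdx` is hypothetical.

```python
import os
import subprocess


def sort_cdx(segment_cdx_files, sorted_file, tmp_dir=None):
    """Sort the per-segment CDX files into one file with the external 'sort' command.

    LC_ALL=C forces plain byte-order comparison, so the result does not depend
    on the machine's locale.
    """
    cmd = ["sort"]
    if tmp_dir is not None:
        cmd += ["-T", tmp_dir]  # where sort may spill temporary files for large inputs
    cmd += ["-o", sorted_file] + list(segment_cdx_files)
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(cmd, env=env, check=True)  # raises CalledProcessError on failure
```

From a shell, the equivalent is `LC_ALL=C sort -T <tmpdir> -o <sorted.cdx> <segment CDX files>`. Under the C locale an uppercase `Z` (0x5A) sorts before a lowercase `a` (0x61), which a locale-aware sort may not do.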