WARC support in NetarchiveSuite

Implement support for the WARC format in the main NetarchiveSuite project.

See NAS-1720 (Enable WARC file writing and handling in the NetarchiveSuite) for the specific list of tasks.

Suggestion for appending the contents of harvestInfo.xml to the existing Heritrix warcinfo record (3rd December 2012):

WARC/1.0
WARC-Type: warcinfo
WARC-Date: 2012-11-23T17:32:58Z
WARC-Filename: 1-1-20121123173258-00000-kb-test-har-002.kb.dk.warc
WARC-Record-ID: <urn:uuid:c01abb4a-44ef-4ab7-9d35-69a64066a107>
Content-Type: application/warc-fields
Content-Length: 872

software: Heritrix/1.14.4 http://crawler.archive.org
ip: 130.226.228.8
hostname: kb-test-har-002.kb.dk
format: WARC File Format 1.0
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
operator: Admin
isPartOf: default_orderxml
description: Default Profile
robots: ignore
http-header-user-agent: Mozilla/5.0 (compatible; heritrix/1.14.4 +http://netarkivet.dk)
http-header-from: svc@kb.dk

harvestInfo.version: 0.4
harvestInfo.jobId: 1
harvestInfo.priority: HIGHPRIORITY
harvestInfo.harvestNum: 0
harvestInfo.origHarvestDefinitionID: 1
harvestInfo.maxBytesPerDomain: 500000000
harvestInfo.maxObjectsPerDomain: 2000
harvestInfo.orderXMLName: default_orderxml
harvestInfo.origHarvestDefinitionName: netarkivet-harvest
harvestInfo.scheduleName: Once_a_week
harvestInfo.harvestFilenamePrefix: 1-1
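
A minimal sketch of how such a record could be assembled, included only to illustrate the layout above. It is not the NetarchiveSuite or Heritrix implementation; the function names `warc_fields` and `build_warcinfo` are hypothetical. It assumes the harvestInfo values are available as name/value pairs, writes them after the crawler's fields with a `harvestInfo.` prefix, and sets Content-Length to the byte length of the payload. The sketch writes all payload fields contiguously; whether to keep the blank line that separates the two groups in the example above is left open.

```python
import uuid
from datetime import datetime, timezone

CRLF = "\r\n"


def warc_fields(pairs):
    """Serialize (name, value) pairs as 'name: value' lines ending in CRLF."""
    return "".join(f"{name}: {value}{CRLF}" for name, value in pairs)


def build_warcinfo(filename, crawler_fields, harvest_info, date=None, record_id=None):
    """Return one complete warcinfo record as bytes (hypothetical helper)."""
    # Payload: the crawler's own fields first, then the harvestInfo.* fields.
    prefixed = [(f"harvestInfo.{name}", value) for name, value in harvest_info]
    payload = warc_fields(list(crawler_fields) + prefixed).encode("utf-8")

    date = date or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    record_id = record_id or f"<urn:uuid:{uuid.uuid4()}>"
    header = warc_fields([
        ("WARC-Type", "warcinfo"),
        ("WARC-Date", date),
        ("WARC-Filename", filename),
        ("WARC-Record-ID", record_id),
        ("Content-Type", "application/warc-fields"),
        # Content-Length counts the bytes of the payload only.
        ("Content-Length", len(payload)),
    ])
    # Version line, headers, blank line, payload, then two CRLFs end the record.
    head = f"WARC/1.0{CRLF}{header}{CRLF}".encode("utf-8")
    return head + payload + (CRLF * 2).encode("utf-8")


record = build_warcinfo(
    "1-1-20121123173258-00000-kb-test-har-002.kb.dk.warc",
    [("software", "Heritrix/1.14.4 http://crawler.archive.org"),
     ("format", "WARC File Format 1.0")],
    [("version", "0.4"), ("jobId", 1), ("harvestFilenamePrefix", "1-1")],
    date="2012-11-23T17:32:58Z",
)
print(record.decode("utf-8"))
```

Computing Content-Length from the encoded payload, rather than from the character count, keeps the header correct if a field value ever contains non-ASCII characters.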

Suggestion for the warcinfo record in the NetarchiveSuite metadata WARC files (3rd December 2012):

WARC/1.0
WARC-Type: warcinfo
WARC-Date: 2012-11-30T11:50:47Z
WARC-Filename: 1-metadata-1.warc
WARC-Record-ID: <urn:uuid:a1d642da-b00b-471b-8313-990b9e80d921>
Content-Type: application/warc-fields
Content-Length: 332

software: NetarchiveSuite/Version: 4.0.0 status UNSTABLE (r2560)/https://kb-dk.atlassian.net/wiki/display/NAS/
ip: 130.226.228.8
hostname: kb-test-har-002.kb.dk
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
isPartOf: harvestname:test_harvest/harvestID:1/harvestnum:0/job-id: 1
format: WARC File Format 1.0
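
The isPartOf value above packs several harvest identifiers into one string. A minimal sketch of how a reader might split it back into its parts, assuming the `key:value/key:value` layout shown in the example; the function name `parse_is_part_of` is hypothetical, and the sketch would not cope with a harvest name that itself contains a `/`.

```python
def parse_is_part_of(value):
    """Split 'key:value/key:value/...' into a dict of stripped strings."""
    result = {}
    for part in value.split("/"):
        # Split on the first ':' only, so a ':' inside a value is preserved.
        key, sep, val = part.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in component {part!r}")
        result[key.strip()] = val.strip()
    return result


info = parse_is_part_of(
    "harvestname:test_harvest/harvestID:1/harvestnum:0/job-id: 1"
)
print(info["harvestname"], info["job-id"])
```

All values are returned as strings; converting the numeric identifiers is left to the caller.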