harvestInfo.performer in warcinfo records is not included in harvestInfo.xml
Description
In 5.1, the harvestInfo.performer field included in warcinfo records is empty, and the field is not included in harvestInfo.xml at all.
#added by NetarchiveSuite Version: 5.1 (<a href="https://github.com/netarchivesuite/netarchivesuite/commit/cde61d78299cabccae6195908b81ef77c84a76b9">cde61d7829</a>)
harvestInfo.version: 0.5
harvestInfo.jobId: 20
harvestInfo.channel: CIBLEE
harvestInfo.harvestNum: 3
harvestInfo.origHarvestDefinitionID: 2
harvestInfo.maxBytesPerDomain: -1
harvestInfo.maxObjectsPerDomain: 50
harvestInfo.orderXMLName: default_NAS5_1_KLM
harvestInfo.origHarvestDefinitionName: test saa
harvestInfo.scheduleName: annuelle
harvestInfo.harvestFilenamePrefix: BnF-20-2
harvestInfo.jobSubmitDate: Mon Jul 25 13:08:46 CEST 2016
harvestInfo.performer:
harvestInfo.audience: champ public
<?xml version="1.0" encoding="UTF-8"?>
<harvestInfo>
<version>0.5</version>
<jobId>20</jobId>
<channel>CIBLEE</channel>
<harvestNum>3</harvestNum>
<origHarvestDefinitionID>2</origHarvestDefinitionID>
<maxBytesPerDomain>-1</maxBytesPerDomain>
<maxObjectsPerDomain>50</maxObjectsPerDomain>
<orderXMLName>default_NAS5_1_KLM</orderXMLName>
<origHarvestDefinitionName>test saa</origHarvestDefinitionName>
<origHarvestDefinitionComments>Collecte réalisée avec NAS 5.1 pour contrôler les WARC de données et de métadonnées.</origHarvestDefinitionComments>
<scheduleName>annuelle</scheduleName>
<harvestFilenamePrefix>BnF-20-2</harvestFilenamePrefix>
<jobSubmitDate>2016-07-25T11:08:46Z</jobSubmitDate>
<audience>champ public</audience>
</harvestInfo>
Activity
Sara Aubry, October 10, 2016 at 1:58 PM
Tested: if settings.harvester.performer is not declared, the performer appears neither in harvestInfo.xml nor in the warcinfo metadata of the data files.
Sara Aubry, September 28, 2016 at 6:55 AM
Great, we'll test it.
Sr, September 27, 2016 at 4:02 PM
NAS is now consistent. It no longer adds empty performer values to the warcinfo metadata.
Sr, September 21, 2016 at 1:50 PM
So you probably just need to override the empty settings value.
What we should probably do is avoid inserting an empty performer into the warcinfo metadata.
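The idea of skipping an empty performer can be illustrated with a minimal Java sketch. This is not NetarchiveSuite's actual code: the class HarvestInfoSketch and the method renderWarcInfoFields are hypothetical names introduced only for illustration, and the "harvestInfo.<key>: <value>" line format is taken from the sample in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of NetarchiveSuite.
class HarvestInfoSketch {

    /**
     * Renders each field as a "harvestInfo.<key>: <value>" line,
     * skipping fields whose value is null or blank.
     */
    public static String renderWarcInfoFields(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // omit undeclared fields such as an empty performer
            }
            sb.append("harvestInfo.")
              .append(entry.getKey())
              .append(": ")
              .append(value)
              .append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so output order matches the input.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("version", "0.5");
        fields.put("jobId", "20");
        fields.put("performer", "");          // not declared in the settings
        fields.put("audience", "champ public");
        System.out.print(renderWarcInfoFields(fields));
    }
}
```

With a check like this, the warcinfo metadata would carry no harvestInfo.performer line when the setting is left empty, which matches what harvestInfo.xml already does.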
Sara Aubry, September 21, 2016 at 1:48 PM
Yes, we saw that by declaring a performer in the settings. But to be consistent, if the performer is not declared in the settings, maybe the harvestInfo.performer should not be inserted with an empty value in the warcinfo records of the WARC data files. It is currently not inserted in the harvestInfo.xml.