Activating a snapshot harvest on page HarvestDefinition/Definitions-snapshot-harvests.jsp calls the SnapshotHarvestDefinition#flipActive() method.
This method has the following logic if deduplication is enabled:
    log.info("Snapshot harvest #{} activated. Requesting preparation of deduplicationIndex before jobgeneration can commence", harvestId);
    Set<Long> jobSet = hdDaoProvider.get().getJobIdsForSnapshotDeduplicationIndex(harvestId);
    jobIndexCache.requestIndex(jobSet, harvestId);
This sends an IndexRequestMessage to the IndexServer, or more specifically to the IndexRequestServer, asking for a deduplicationIndex for the given list of jobs.
After processing the request, the IndexRequestServer sends an IndexReadyMessage to the HarvestJobManager with either indexOK=true (the index is ready) or indexOK=false (the server failed to generate the index).
(See dk.netarkivet.harvester.indexserver.distribute.IndexRequestServer#doProcessIndexRequestMessage(), ll. 416-423)
The HarvestJobManager receives the IndexReadyMessage in the method HarvestSchedulerMonitorServer#processIndexReadyMessage().
Here the 'isindexready' field in the table 'fullharvests' is set to true if the 'indexOK' field in the IndexReadyMessage is true; otherwise it is set to false.
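The round trip described above can be sketched as a small stand-alone model. This is not NetarchiveSuite code: the class IndexHandshakeSketch, the ReadyMessage type, the in-memory map standing in for the 'isindexready' column, and the mayGenerateJobs() check are all hypothetical names introduced for illustration. The sketch also assumes that a missing flag is treated as "not ready", which the text above does not state.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Toy model of the index request/ready round trip (hypothetical, not NetarchiveSuite code). */
final class IndexHandshakeSketch {

    /** Stand-in for IndexReadyMessage: carries the harvest id and the outcome. */
    static final class ReadyMessage {
        final long harvestId;
        final boolean indexOK;

        ReadyMessage(long harvestId, boolean indexOK) {
            this.harvestId = harvestId;
            this.indexOK = indexOK;
        }
    }

    /** Stand-in for the 'isindexready' column of the 'fullharvests' table. */
    static final Map<Long, Boolean> isIndexReady = new HashMap<>();

    /** Index server side: try to build the index and report the outcome. */
    static ReadyMessage processIndexRequest(Set<Long> jobIds, long harvestId,
                                            Consumer<Set<Long>> indexBuilder) {
        boolean ok;
        try {
            indexBuilder.accept(jobIds); // hypothetical index generation step
            ok = true;
        } catch (RuntimeException e) {
            ok = false; // generation failed, so the reply carries indexOK=false
        }
        return new ReadyMessage(harvestId, ok);
    }

    /** Job manager side: copy indexOK into the per-harvest flag. */
    static void processIndexReady(ReadyMessage msg) {
        isIndexReady.put(msg.harvestId, msg.indexOK);
    }

    /** Assumption: a harvest without a stored flag counts as not ready. */
    static boolean mayGenerateJobs(long harvestId) {
        return isIndexReady.getOrDefault(harvestId, false);
    }

    public static void main(String[] args) {
        Set<Long> jobIds = new HashSet<>(Arrays.asList(101L, 102L));

        // Successful index generation for harvest 7.
        processIndexReady(processIndexRequest(jobIds, 7L, ids -> { }));
        System.out.println("harvest 7 ready: " + mayGenerateJobs(7L));

        // Failed index generation for harvest 8.
        processIndexReady(processIndexRequest(jobIds, 8L, ids -> {
            throw new IllegalStateException("index generation failed");
        }));
        System.out.println("harvest 8 ready: " + mayGenerateJobs(8L));
    }
}
```

In this model the flag is written on both outcomes, mirroring the description that 'isindexready' is set to false when indexOK is false rather than left untouched.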
Selecting the list of jobs included in the deduplicationIndex
The method HarvestDefinitionDBDAO#getJobIdsForSnapshotDeduplicationIndex is responsible for computing the list of jobs included in the deduplication index.
It uses the getPreviousFullHarvests() method to determine which earlier full harvests the jobs are taken from.
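A minimal sketch of this two-step selection follows, using in-memory maps instead of the database. It is an illustration under assumptions, not the actual DAO code: the text only says that getPreviousFullHarvests() is used, so the idea that previous harvests are found by following a "previous harvest" link from one full harvest to the next is assumed here, and the names DedupJobSelectionSketch, previousOf and harvestOfJob are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the job selection for a deduplication index (hypothetical). */
final class DedupJobSelectionSketch {

    /**
     * Assumption: each full harvest may point to one predecessor.
     * Follows that link until it ends; the extra checks stop on cyclic data.
     */
    static List<Long> previousFullHarvests(Map<Long, Long> previousOf, long harvestId) {
        List<Long> result = new ArrayList<>();
        Long current = previousOf.get(harvestId);
        while (current != null
                && current.longValue() != harvestId
                && !result.contains(current)) {
            result.add(current);
            current = previousOf.get(current);
        }
        return result;
    }

    /** Collects the ids of all jobs that belong to one of the previous harvests. */
    static Set<Long> jobIdsForDedupIndex(Map<Long, Long> previousOf,
                                         Map<Long, Long> harvestOfJob,
                                         long harvestId) {
        List<Long> harvests = previousFullHarvests(previousOf, harvestId);
        Set<Long> jobIds = new TreeSet<>(); // sorted only to make the output stable
        for (Map.Entry<Long, Long> entry : harvestOfJob.entrySet()) {
            if (harvests.contains(entry.getValue())) {
                jobIds.add(entry.getKey());
            }
        }
        return jobIds;
    }

    public static void main(String[] args) {
        // Harvest 3 follows harvest 2, which follows harvest 1.
        Map<Long, Long> previousOf = new HashMap<>();
        previousOf.put(3L, 2L);
        previousOf.put(2L, 1L);

        // Job id -> harvest id.
        Map<Long, Long> harvestOfJob = new HashMap<>();
        harvestOfJob.put(10L, 1L);
        harvestOfJob.put(11L, 1L);
        harvestOfJob.put(20L, 2L);
        harvestOfJob.put(30L, 3L);

        // Jobs of harvests 2 and 1 are selected; job 30 belongs to harvest 3 itself.
        System.out.println(jobIdsForDedupIndex(previousOf, harvestOfJob, 3L)); // prints [10, 11, 20]
    }
}
```

For a harvest with no predecessor the sketch returns an empty set, so the index request would be made with an empty job list.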
Classes involved in this workflow:
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/webinterface/SnapshotHarvestDefinition.java, ll. 251-299 (esp. 267-282)
- harvester/harvest-scheduler/src/main/java/dk/netarkivet/harvester/scheduler/HarvestSchedulerMonitorServer.java, ll. 196-224
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/distribute/IndexRequestServer.java, ll. 416-423
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/datamodel/HarvestDefinitionDBDAO.java, ll. 1167-1187, 1189-1233