Activating a snapshot harvest on page HarvestDefinition/Definitions-snapshot-harvests.jsp calls the SnapshotHarvestDefinition#flipActive() method.
This method has the following logic if deduplication is enabled:
    log.info("Snapshot harvest #{} activated. Requesting preparation of deduplicationIndex before jobgeneration can commence", harvestId);
    Set<Long> jobSet = hdDaoProvider.get().getJobIdsForSnapshotDeduplicationIndex(harvestId);
    jobIndexCache.requestIndex(jobSet, harvestId);
This sends a message to the IndexServer.
// fhd.setIndexReady(true); // will be set to true later, when Indexserver announces, that deduplication index is ready
// by sending a IndexReadyMessage back to the scheduler (i.e. the HarvestJobManager).
// See dk.netarkivet.harvester.indexserver.distribute.IndexRequestServer#doProcessIndexRequestMessage(), ll. 416-423
The HarvestJobManager receives the IndexReadyMessage in the method HarvestSchedulerMonitorServer#processIndexReadyMessage().
Here the 'isindexready' field in the table 'fullharvests' is set to true if the 'indexOK' field in the IndexReadyMessage is true; otherwise it is set to false.
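The request/reply handshake described above can be illustrated with a minimal, self-contained sketch. It is not NetarchiveSuite code: the class IndexReadySketch, the nested IndexReadyReply and the methods activate(), onIndexReady() and mayGenerateJobs() are hypothetical names, and an in-memory map stands in for the 'isindexready' column of 'fullharvests'. It also assumes that the flag is false from activation until the reply arrives and that job generation is gated on it, which the log message and the commented-out setIndexReady(true) call suggest but do not prove.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the index-ready handshake; not NetarchiveSuite code. */
public class IndexReadySketch {

    /** Hypothetical stand-in for the reply the index server sends back. */
    public static final class IndexReadyReply {
        final long harvestId;
        final boolean indexOk;

        public IndexReadyReply(long harvestId, boolean indexOk) {
            this.harvestId = harvestId;
            this.indexOk = indexOk;
        }
    }

    /** In-memory stand-in for the 'isindexready' column of 'fullharvests'. */
    private final Map<Long, Boolean> isIndexReady = new HashMap<>();

    /** Activation (assumed): the flag stays false until a reply arrives. */
    public void activate(long harvestId) {
        isIndexReady.put(harvestId, false);
    }

    /** Reply handling: copy the indexOK value of the reply into the flag. */
    public void onIndexReady(IndexReadyReply reply) {
        isIndexReady.put(reply.harvestId, reply.indexOk);
    }

    /** Job generation (assumed) is allowed only once the flag is true. */
    public boolean mayGenerateJobs(long harvestId) {
        return isIndexReady.getOrDefault(harvestId, false);
    }

    public static void main(String[] args) {
        IndexReadySketch sketch = new IndexReadySketch();
        sketch.activate(42L);
        System.out.println(sketch.mayGenerateJobs(42L)); // false
        sketch.onIndexReady(new IndexReadyReply(42L, true));
        System.out.println(sketch.mayGenerateJobs(42L)); // true
    }
}
```

In this sketch a reply with indexOK set to false leaves the harvest blocked, mirroring the rule that 'isindexready' simply takes the value of 'indexOK'.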
The method HarvestDefinitionDBDAO#getJobIdsForSnapshotDeduplicationIndex is responsible for computing the list of jobs included in the deduplication index.
It uses the getPreviousFullHarvests() method.
Classes involved in this workflow:
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/webinterface/SnapshotHarvestDefinition.java, ll. 251-299 (esp. 267-282)
- harvester/harvest-scheduler/src/main/java/dk/netarkivet/harvester/scheduler/HarvestSchedulerMonitorServer.java, ll. 196-224
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/distribute/IndexRequestServer.java, ll. 416-423
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/datamodel/HarvestDefinitionDBDAO.java, ll. 1167-1187, 1189-1233