...
This sends an IndexRequestMessage to the IndexServer (more specifically, the IndexRequestServer), requesting a deduplication index for the given list of jobs. (See IndexRequestClient.)
After processing the request, the server sends an IndexReadyMessage to the HarvestJobManager with either indexOK=true (the index is ready) or indexOK=false (the server failed to generate the index).
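The request/reply exchange described above can be sketched roughly as follows. This is a simplified illustration, not the actual NetarchiveSuite code: the classes IndexRequest, IndexReady and IndexGenerator are hypothetical stand-ins for IndexRequestMessage, IndexReadyMessage and the index generation done by the IndexRequestServer, and the JMS transport is left out entirely.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified sketch of the index request/reply flow; all names here are hypothetical. */
class IndexRequestFlowSketch {

    /** Stand-in for IndexRequestMessage: asks for an index covering the given jobs. */
    static final class IndexRequest {
        final long harvestId;
        final List<Long> jobIds;

        IndexRequest(long harvestId, List<Long> jobIds) {
            this.harvestId = harvestId;
            this.jobIds = jobIds;
        }
    }

    /** Stand-in for IndexReadyMessage: reports whether the index could be generated. */
    static final class IndexReady {
        final long harvestId;
        final boolean indexOK;

        IndexReady(long harvestId, boolean indexOK) {
            this.harvestId = harvestId;
            this.indexOK = indexOK;
        }
    }

    /** Stand-in for the index generation step on the server; it may fail. */
    interface IndexGenerator {
        void generate(List<Long> jobIds) throws Exception;
    }

    /** Processes a request and builds the reply that would go to the HarvestJobManager. */
    static IndexReady handle(IndexRequest request, IndexGenerator generator) {
        try {
            generator.generate(request.jobIds);
            return new IndexReady(request.harvestId, true);
        } catch (Exception e) {
            // Generation failed: the reply still goes out, but with indexOK=false.
            return new IndexReady(request.harvestId, false);
        }
    }

    public static void main(String[] args) {
        IndexRequest request = new IndexRequest(42L, Arrays.asList(1L, 2L, 3L));
        IndexReady ok = handle(request, jobIds -> { /* pretend the index was built */ });
        IndexReady failed = handle(request, jobIds -> {
            throw new IllegalStateException("index generation failed");
        });
        System.out.println("indexOK=" + ok.indexOK);
        System.out.println("indexOK=" + failed.indexOK);
    }
}
```

The point of the sketch is that a reply is sent in both cases, so the receiver can distinguish a ready index from a failed one by inspecting indexOK.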
...
The method HarvestDefinitionDBDAO#getJobIdsForSnapshotDeduplicationIndex is responsible for computing the list of jobs included in the deduplication index.
It relies on the rather complex (and possibly incorrect) getPreviousFullHarvests() method, also in HarvestDefinitionDBDAO:
Code Block |
---|
/**
 * Get list of harvests previous to this one.
 *
 * @param thisHarvest The id of this harvestdefinition
 * @return a list of IDs belonging to harvests previous to this one.
 */
private List<Long> getPreviousFullHarvests(Long thisHarvest) {
    List<Long> results = new ArrayList<Long>();
    try (Connection c = HarvestDBConnection.get();) {
        // Follow the chain of originating IDs back
        for (Long originatingHarvest = thisHarvest; originatingHarvest != null;
        // Compute next originatingHarvest
        originatingHarvest = DBUtils.selectFirstLongValueIfAny(c, "SELECT previoushd FROM fullharvests"
                + " WHERE fullharvests.harvest_id=?", originatingHarvest)) {
            if (!originatingHarvest.equals(thisHarvest)) {
                results.add(originatingHarvest);
            }
        }
        // Find the first harvest in the chain (but last in the list).
        Long firstHarvest = thisHarvest;
        if (!results.isEmpty()) {
            firstHarvest = results.get(results.size() - 1);
        }
        // Find the last harvest in the chain before
        Long olderHarvest = DBUtils.selectFirstLongValueIfAny(c, "SELECT fullharvests.harvest_id"
                + " FROM fullharvests, harvestdefinitions," + " harvestdefinitions AS currenthd"
                + " WHERE currenthd.harvest_id=?" + " AND fullharvests.harvest_id "
                + "= harvestdefinitions.harvest_id"
                + " AND harvestdefinitions.submitted " + "< currenthd.submitted"
                + " ORDER BY harvestdefinitions.submitted " + HarvestStatusQuery.SORT_ORDER.DESC.name(),
                firstHarvest);
        // Follow the chain of originating IDs back
        for (Long originatingHarvest = olderHarvest; originatingHarvest != null; originatingHarvest = DBUtils
                .selectFirstLongValueIfAny(c, "SELECT previoushd FROM fullharvests"
                        + " WHERE fullharvests.harvest_id=?", originatingHarvest)) {
            results.add(originatingHarvest);
        }
    } catch (SQLException e) {
        log.warn("Exception thrown while updating fullharvests.isindexready field: {}",
                ExceptionUtils.getSQLExceptionCause(e), e);
    }
    return results;
} |
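To make the logic of getPreviousFullHarvests() easier to follow, here is an in-memory sketch of the same steps with the database replaced by two maps. It is an illustration under assumptions, not the production code: the class PreviousFullHarvestsSketch, its addFullHarvest helper and the use of plain long values for the submitted timestamp are all hypothetical, and every harvest in the example is assumed to be a full harvest.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of the chain-following logic; all names here are hypothetical. */
class PreviousFullHarvestsSketch {
    /** harvest_id -> previoushd (null ends the chain); models the fullharvests table. */
    private final Map<Long, Long> previousHd = new HashMap<>();
    /** harvest_id -> submission time; models harvestdefinitions.submitted. */
    private final Map<Long, Long> submitted = new HashMap<>();

    void addFullHarvest(long id, Long previous, long submittedAt) {
        previousHd.put(id, previous);
        submitted.put(id, submittedAt);
    }

    List<Long> getPreviousFullHarvests(long thisHarvest) {
        List<Long> results = new ArrayList<>();
        // Step 1: follow the previoushd chain back from this harvest (excluding itself).
        for (Long h = previousHd.get(thisHarvest); h != null; h = previousHd.get(h)) {
            results.add(h);
        }
        // Step 2: the first harvest in the chain is the last element of the list.
        Long firstHarvest = results.isEmpty() ? Long.valueOf(thisHarvest)
                : results.get(results.size() - 1);
        // Step 3: find the most recently submitted full harvest older than firstHarvest.
        Long cutoff = submitted.get(firstHarvest);
        Long olderHarvest = null;
        if (cutoff != null) {
            for (Map.Entry<Long, Long> e : submitted.entrySet()) {
                long t = e.getValue();
                if (t < cutoff && (olderHarvest == null || t > submitted.get(olderHarvest))) {
                    olderHarvest = e.getKey();
                }
            }
        }
        // Step 4: follow the previoushd chain back from that older harvest (including it).
        for (Long h = olderHarvest; h != null; h = previousHd.get(h)) {
            results.add(h);
        }
        return results;
    }

    public static void main(String[] args) {
        PreviousFullHarvestsSketch dao = new PreviousFullHarvestsSketch();
        dao.addFullHarvest(1L, null, 100L); // start of an older chain
        dao.addFullHarvest(2L, 1L, 200L);   // continues harvest 1
        dao.addFullHarvest(3L, null, 300L); // start of a newer chain
        dao.addFullHarvest(4L, 3L, 400L);   // continues harvest 3
        System.out.println(dao.getPreviousFullHarvests(4L)); // prints [3, 2, 1]
    }
}
```

In this example the result for harvest 4 contains its own chain (3) followed by the whole chain of the most recent older full harvest (2, then 1). Note that the result therefore spans two separate chains, which may be part of why the method is considered suspect.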
Classes involved in this workflow:
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/webinterface/SnapshotHarvestDefinition.java, ll. 251-299 (esp. 267-282)
- harvester/harvest-scheduler/src/main/java/dk/netarkivet/harvester/scheduler/HarvestSchedulerMonitorServer.java, ll. 196-224
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/distribute/IndexRequestServer.java, ll. 416-423
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/datamodel/HarvestDefinitionDBDAO.java, ll. 1167-1187, 1189-1233
- harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/distribute/IndexRequestClient.java, ll. 358-383