...

This sends an IndexRequestMessage to the IndexServer, or more specifically to the IndexRequestServer, requesting a deduplication index for the given list of jobs (see IndexRequestClient).

After processing the request, the server sends an IndexReadyMessage to the HarvestJobManager with either indexOK=true (the index is ready) or indexOK=false (the server failed to generate the index).
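
The request/reply exchange described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the records IndexRequest and IndexReady, their fields, and the handle method are hypothetical stand-ins, not the real IndexRequestMessage and IndexReadyMessage classes, and no JMS transport is involved. The point it shows is that the server always answers, and that a failure to build the index is reported as indexOK=false rather than as a missing reply.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these types are the real NetarchiveSuite classes.
class IndexHandshakeSketch {

    /** Assumed shape of a request: a harvest ID plus the jobs the index should cover. */
    record IndexRequest(long harvestId, List<Long> jobIds) {}

    /** Assumed shape of the reply sent back to the HarvestJobManager. */
    record IndexReady(long harvestId, boolean indexOK) {}

    /**
     * Server side: try to build the index and always produce a reply.
     * indexBuilder is a hypothetical callback that returns true on success.
     */
    static IndexReady handle(IndexRequest request, Predicate<List<Long>> indexBuilder) {
        boolean ok;
        try {
            ok = indexBuilder.test(request.jobIds());
        } catch (RuntimeException e) {
            // A failure to generate the index is reported, not swallowed silently.
            ok = false;
        }
        return new IndexReady(request.harvestId(), ok);
    }

    public static void main(String[] args) {
        IndexRequest request = new IndexRequest(42L, List.of(1L, 2L, 3L));
        // Successful index generation.
        System.out.println(handle(request, jobs -> true).indexOK());
        // Index generation fails with an exception.
        System.out.println(handle(request, jobs -> {
            throw new IllegalStateException("index generation failed");
        }).indexOK());
    }
}
```

On the receiving side, the HarvestJobManager would then branch on indexOK to decide whether jobs waiting for this index can proceed.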

...

The method HarvestDefinitionDBDAO#getJobIdsForSnapshotDeduplicationIndex is responsible for computing the list of jobs included in the deduplication index.
It relies on the rather complex (and possibly incorrect) getPreviousFullHarvests() method, which is also in HarvestDefinitionDBDAO.

Code Block
    /**
     * Get list of harvests previous to this one.
     *
     * @param thisHarvest The id of this harvestdefinition
     * @return a list of IDs belonging to harvests previous to this one.
     */
    private List<Long> getPreviousFullHarvests(Long thisHarvest) {
        List<Long> results = new ArrayList<Long>();
        try (Connection c = HarvestDBConnection.get();) {
            // Follow the chain of originating IDs back
            for (Long originatingHarvest = thisHarvest; originatingHarvest != null;
                // Compute next originatingHarvest
                 originatingHarvest = DBUtils.selectFirstLongValueIfAny(c, "SELECT previoushd FROM fullharvests"
                         + " WHERE fullharvests.harvest_id=?", originatingHarvest)) {
                if (!originatingHarvest.equals(thisHarvest)) {
                    results.add(originatingHarvest);
                }
            }

            // Find the first harvest in the chain (but last in the list).
            Long firstHarvest = thisHarvest;
            if (!results.isEmpty()) {
                firstHarvest = results.get(results.size() - 1);
            }

            // Find the most recent full harvest submitted before the first harvest in this chain
            Long olderHarvest = DBUtils.selectFirstLongValueIfAny(c, "SELECT fullharvests.harvest_id"
                            + " FROM fullharvests, harvestdefinitions," + "  harvestdefinitions AS currenthd"
                            + " WHERE currenthd.harvest_id=?" + " AND fullharvests.harvest_id "
                            + "= harvestdefinitions.harvest_id"
                            + " AND harvestdefinitions.submitted " + "< currenthd.submitted"
                            + " ORDER BY harvestdefinitions.submitted " + HarvestStatusQuery.SORT_ORDER.DESC.name(),
                    firstHarvest);
            // Follow the chain of originating IDs back
            for (Long originatingHarvest = olderHarvest; originatingHarvest != null; originatingHarvest = DBUtils
                    .selectFirstLongValueIfAny(c, "SELECT previoushd FROM fullharvests"
                            + " WHERE fullharvests.harvest_id=?", originatingHarvest)) {
                results.add(originatingHarvest);
            }
        } catch (SQLException e) {
            log.warn("Exception thrown while retrieving previous full harvests: {}",
                    ExceptionUtils.getSQLExceptionCause(e), e);
        }
        return results;
    }
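
To make the traversal easier to follow, here is a minimal in-memory sketch of the same steps. It is not the DAO code: the class name, the method name, and the two maps are assumptions made for illustration. previousHd stands in for the previoushd column of fullharvests, and submitted stands in for the submission time of each full harvest (it is assumed to contain full harvests only, mirroring the join in the SQL above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the traversal; not the real DAO.
class PreviousFullHarvestsSketch {

    static List<Long> previousFullHarvests(long thisHarvest,
            Map<Long, Long> previousHd, Map<Long, Long> submitted) {
        List<Long> results = new ArrayList<>();

        // Step 1: follow the previoushd chain back from this harvest (excluding itself).
        for (Long h = previousHd.get(thisHarvest); h != null; h = previousHd.get(h)) {
            results.add(h);
        }

        // Step 2: the first harvest of the chain is the last element found, if any.
        long firstHarvest = results.isEmpty() ? thisHarvest : results.get(results.size() - 1);

        // Step 3: find the most recently submitted full harvest that is strictly
        // older than the first harvest of the chain.
        Long firstSubmitted = submitted.get(firstHarvest);
        Long olderHarvest = submitted.entrySet().stream()
                .filter(e -> firstSubmitted != null && e.getValue() < firstSubmitted)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);

        // Step 4: follow the previoushd chain back from that older harvest (including it).
        for (Long h = olderHarvest; h != null; h = previousHd.get(h)) {
            results.add(h);
        }
        return results;
    }

    public static void main(String[] args) {
        // Two chains: 10 <- 11 (older) and 20 <- 21 (newer).
        Map<Long, Long> previousHd = Map.of(11L, 10L, 21L, 20L);
        Map<Long, Long> submitted = Map.of(10L, 100L, 11L, 200L, 20L, 300L, 21L, 400L);
        System.out.println(previousFullHarvests(21L, previousHd, submitted)); // prints [20, 11, 10]
    }
}
```

In this model the result for harvest 21 contains its own chain (20) followed by the whole preceding chain (11, 10). Like the method above, the sketch assumes the previoushd links contain no cycles.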



Classes involved in this workflow:

  • harvester/harvester-core/src/main/java/dk/netarkivet/harvester/webinterface/SnapshotHarvestDefinition.java, ll. 251-299 (esp. 267-282)
  • harvester/harvest-scheduler/src/main/java/dk/netarkivet/harvester/scheduler/HarvestSchedulerMonitorServer.java, ll. 196-224
  • harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/distribute/IndexRequestServer.java, ll. 416-423
  • harvester/harvester-core/src/main/java/dk/netarkivet/harvester/datamodel/HarvestDefinitionDBDAO.java, ll. 1167-1187, 1189-1233
  • harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/distribute/IndexRequestClient.java, ll. 358-383