dk.netarkivet.common.distribute.indexserver.JobIndexCache (interface):
import java.util.Set;

/**
 * An interface to a cache of data for jobs.
 */
public interface JobIndexCache {

    /**
     * Get an index for the given list of job IDs. The resulting file contains a suitably sorted list. This method
     * should always be safe for asynchronous calling. This method may use a cached version of the file.
     *
     * @param jobIDs Set of job IDs to generate index for.
     * @return An index, consisting of a file and the set this is an index for. This file must not be modified or
     * deleted, since it is part of the cache of data.
     */
    Index<Set<Long>> getIndex(Set<Long> jobIDs);

    /**
     * Request an index from the indexserver. Prepare the index but don't give it to me.
     *
     * @param jobSet Set of job IDs to generate index for.
     * @param harvestId Harvestdefinition associated with this set of jobs
     */
    void requestIndex(Set<Long> jobSet, Long harvestId);
}
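To make the contract concrete, here is a minimal in-memory sketch of something shaped like JobIndexCache. It is not NetarchiveSuite code: SimpleIndex and SimpleJobIndexCache are hypothetical stand-ins for Index<Set<Long>> and the real interface, the index "file" is reduced to a string, and harvestId is ignored. It illustrates two properties the javadoc asks for: repeated requests for the same set of job IDs reuse the cached result, and concurrent callers are safe because ConcurrentHashMap.computeIfAbsent applies the build function at most once per key.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class InMemoryJobIndexCacheSketch {

    /** Hypothetical stand-in for Index<Set<Long>>: the index "file" is just a string here. */
    static final class SimpleIndex {
        final String indexData;
        final Set<Long> indexSet;

        SimpleIndex(String indexData, Set<Long> indexSet) {
            this.indexData = indexData;
            this.indexSet = indexSet;
        }
    }

    /** Hypothetical mirror of JobIndexCache using the simplified index type. */
    interface SimpleJobIndexCache {
        SimpleIndex getIndex(Set<Long> jobIDs);

        void requestIndex(Set<Long> jobSet, Long harvestId);
    }

    static final class InMemoryJobIndexCache implements SimpleJobIndexCache {
        private final Map<Set<Long>, SimpleIndex> cache = new ConcurrentHashMap<>();
        // Counts how many indexes were actually built, to show cache hits.
        final AtomicInteger builds = new AtomicInteger();

        @Override
        public SimpleIndex getIndex(Set<Long> jobIDs) {
            // Copy into an immutable key so later changes by the caller cannot corrupt the cache.
            Set<Long> key = Set.copyOf(jobIDs);
            // computeIfAbsent builds each index at most once, even with concurrent callers.
            return cache.computeIfAbsent(key, this::buildIndex);
        }

        @Override
        public void requestIndex(Set<Long> jobSet, Long harvestId) {
            // Prepare (and cache) the index without returning it; harvestId is unused in this sketch.
            getIndex(jobSet);
        }

        private SimpleIndex buildIndex(Set<Long> jobIDs) {
            builds.incrementAndGet();
            // "Suitably sorted" is modelled as sorted job IDs; a real index holds CDX or crawl-log data.
            return new SimpleIndex("index-of-" + new TreeSet<>(jobIDs), jobIDs);
        }
    }

    public static void main(String[] args) {
        InMemoryJobIndexCache cache = new InMemoryJobIndexCache();
        cache.requestIndex(Set.of(2L, 1L), 42L);            // prepares the index
        SimpleIndex index = cache.getIndex(Set.of(1L, 2L));  // served from the cache
        System.out.println(index.indexData + " builds=" + cache.builds.get());
        // prints "index-of-[1, 2] builds=1"
    }
}
```

Because the cache key is a copy of the set, {1, 2} and {2, 1} map to the same entry, in the same spirit as the set-based file naming used by getCacheFile.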
The relevant implementations of JobIndexCache are:
- dk.netarkivet.harvester.indexserver.CDXIndexCache
- dk.netarkivet.harvester.indexserver.CrawlLogIndexCache (abstract class), with two subclasses:
  - DedupCrawlLogIndexCache (extends CrawlLogIndexCache)
  - FullCrawlLogIndexCache (extends CrawlLogIndexCache)
- dk.netarkivet.harvester.indexserver.distribute.IndexRequestClient
- dk.netarkivet.common.distribute.indexserver.TrivialJobIndexCache
The types of index you can request are defined by the enum dk.netarkivet.common.distribute.indexserver.RequestType:
public enum RequestType { CDX, DEDUP_CRAWL_LOG, FULL_CRAWL_LOG }
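Each request type naturally pairs with one of the cache classes listed above. The sketch below shows one way such a dispatch could look, using an EnumMap so every constant has a slot. The pairing itself, the class RequestTypeDispatchSketch, and the use of class names as strings (instead of JobIndexCache instances) are assumptions for illustration, not taken from the NetarchiveSuite sources.

```java
import java.util.EnumMap;
import java.util.Map;

public class RequestTypeDispatchSketch {

    /** Same constants as dk.netarkivet.common.distribute.indexserver.RequestType. */
    enum RequestType { CDX, DEDUP_CRAWL_LOG, FULL_CRAWL_LOG }

    // Assumed pairing of request types with cache implementations; a real server would hold instances.
    static final Map<RequestType, String> HANDLERS = new EnumMap<>(RequestType.class);

    static {
        HANDLERS.put(RequestType.CDX, "CDXIndexCache");
        HANDLERS.put(RequestType.DEDUP_CRAWL_LOG, "DedupCrawlLogIndexCache");
        HANDLERS.put(RequestType.FULL_CRAWL_LOG, "FullCrawlLogIndexCache");
    }

    static String handlerFor(RequestType type) {
        String handler = HANDLERS.get(type);
        // Fail loudly if a new enum constant is added without registering a cache for it.
        if (handler == null) {
            throw new IllegalArgumentException("No cache registered for " + type);
        }
        return handler;
    }

    public static void main(String[] args) {
        for (RequestType type : RequestType.values()) {
            System.out.println(type + " -> " + handlerFor(type));
        }
    }
}
```

An EnumMap keeps lookups cheap and iterates in declaration order; the explicit null check turns a forgotten registration into a clear error rather than a NullPointerException later.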
Cache files are named by the MultiFileBasedCache#getCacheFile(Set<T>) method, which delegates to FileUtils.generateFileNameFromSet with the suffix "-cache":
/**
 * Get the filename for the file containing the combined data for a set of IDs.
 *
 * @param ids A set of IDs to generate a filename for
 * @return A filename that uniquely identifies this set of IDs within the cache. It is considered acceptable to have
 * collisions at a likelihood on the order of 1/2^128 (i.e. use MD5 to abbreviate long lists).
 */
public File getCacheFile(Set<T> ids) {
    String fileName = FileUtils.generateFileNameFromSet(ids, "-cache");
    return new File(getCacheDir(), fileName);
}
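The javadoc only promises a name that is unique per set of IDs, with MD5 used to abbreviate long lists. The sketch below is a guess at that behaviour, not FileUtils.generateFileNameFromSet itself: it assumes the IDs are sorted and joined with "-", and that joined lists longer than a hypothetical 50-character threshold (MAX_RAW_LENGTH) are replaced by their 128-bit MD5 digest in hex, which is where the 1/2^128 collision figure comes from.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class CacheFileNameSketch {

    // Hypothetical threshold: above this length the ID list is abbreviated with MD5.
    static final int MAX_RAW_LENGTH = 50;

    static String fileNameFromSet(Set<Long> ids, String suffix) {
        // Sort so the same set always yields the same name, whatever its iteration order.
        String joined = new TreeSet<>(ids).stream()
                .map(String::valueOf)
                .collect(Collectors.joining("-"));
        if (joined.length() <= MAX_RAW_LENGTH) {
            return joined + suffix;
        }
        // Long lists: a 32-character MD5 hex digest stands in for the full ID list.
        return md5Hex(joined) + suffix;
    }

    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            // Zero-pad to 32 hex digits so leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be supported by every Java platform", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromSet(Set.of(3L, 1L, 2L), "-cache")); // prints "1-2-3-cache"
    }
}
```

Short ID lists stay human-readable in the cache directory, while long ones get a fixed-length name that avoids file-system limits on name length.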