Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

The current HARVEST_CONTROLLER_PRIORITY setting implies that the harvest controller architecture is based on a number of harvest controller instances, each with a specific priority. This dos does not reflect the current setup, where there only exists one 'HarvestControllerClient' which dispatches HarvestJob to the harvester via. 2 message queues. The HarvestControllerClient choses which queue to use based on the JobPriority of the concrete job. This means that the HARVEST_CONTROLLER_PRIORITY setting isn't used on the client side. On the server side the setting is used to find the relevant queue to consume messages from. So if we wish to use the name of the setting to describe the settings purpose, is should be called something like: HARVEST_SERVER_JOB_QUEUE.CHANNEL and the type should change to a ChannelID.

The current code is ridelled with code like:

Code Block

String priority = Settings.get(
                HarvesterSettings.HARVEST_CONTROLLER_PRIORITY);
  ChannelID result;
  if (priority.equals(JobPriority.LOWPRIORITY.toString())) {
    result = Channels.getAnyLowpriorityHaco();
  } 
  else {
    if (priority.equals(JobPriority.HIGHPRIORITY.toString())) {
      result = Channels.getAnyHighpriorityHaco();
    } 
    else throw new UnknownID(priority + " is not a valid priority");
  }
NetarkivetMessage naMsg = new DoOneCrawlMessage(theJob, result, TestInfo.emptyMetadata);

Which could be replaced by the much shorter and clearer:

Code Block

NetarkivetMessage naMsg = new DoOneCrawlMessage(
    theJob, Settings.getChannel(HARVEST_SERVER_JOB_CHANNEL), TestInfo.emptyMetadata);

Another renaming we might consider is renaming the HarvestControllerClient to HarvestJobDispatcher to better describe what this class is doing. It really isn't controlling anything, and using 'Client' as a marker in a distributed architecture doesn't say much, many components are acting as clients and servers, often both at the same time. The HarvestControllerServer could then be renamed to just HarvestServer. Colin mentioned that the Controller part may have come from the classes responsibility of controlling a Heritrix instance, but in my opinion this is just a ordinary delegation, which shouldn't be reflected in the classes public name (a classes name should primarily reflect the services it exposes, not how it implements them, eg. controls Heritrix to do the actual crawls).