Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 2 Next »

The current HARVEST_CONTROLLER_PRIORITY setting implies that the harvest controller architecture is based on a number of harvest controller instances, each with a specific priority. This does not reflect the current setup, where there only exists one 'HarvestControllerClient' which dispatches HarvestJob to the harvester via. 2 message queues. The HarvestControllerClient choses which queue to use based on the JobPriority of the concrete job. This means that the HARVEST_CONTROLLER_PRIORITY setting isn't used on the client side. On the server side the setting is used to find the relevant queue to consume messages from. So if we wish to use the name of the setting to describe the settings purpose, is should be called something like: HARVEST_SERVER_JOB_QUEUE.

Another renaming we might consider is renaming the HarvestControllerClient to HarvestJobDispatcher to better describe what this class is doing. It really isn't controlling anything, and using 'Client' as a marker in a distributed architecture doesn't say much, many components are acting as clients and servers, often both at the same time. The HarvestControllerServer could then be renamed to just HarvestServer. Colin mentioned that the Controller part may have come from the classes responsibility of controlling a Heritrix instance, but in my opinion this is just a ordinary delegation, which shouldn't be reflected in the classes public name (a classes name should primarily reflect the services it exposes, not how it implements them, eg. controls Heritrix to do the actual crawls).

  • No labels