This patch makes it possible to run a script that cleans up an umbra instance before a new harvest starts on any umbra-enabled harvester. To enable the patch, replace the two (identical) jar-files <blah> and <blah> with the jarfile and restart the HarvestControllerApplication instance. The git source for this patch is commit 13599b1.
The settings for umbra in HarvestControllerApplication have an extra optional element
The default value is "drain-queue", but this can be replaced with the path to a more sophisticated script, for example one that also restarts umbra. In tests we have used a script that enables the specific Python version under which umbra runs in the Netarkivet installation.
Create the script you wish to use and make it executable
Modify the settings to point to the script
Restart the HarvestControllerApplication
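As an illustration of what such a hook script might look like, here is a minimal sketch in Python. It is not the script used in the Netarkivet tests. It assumes that "drain-queue" (the documented default) is on the PATH of the user running the harvester; the names CLEANUP_STEPS and run_steps are hypothetical, and any extra step such as restarting umbra is installation specific and left as a placeholder comment.

```python
#!/usr/bin/env python3
"""Hypothetical pre-harvest hook: run cleanup steps in order, stop at the first failure."""
import subprocess
import sys

# Assumption: "drain-queue" is reachable on PATH without arguments.
# Append installation-specific commands (for example an umbra restart) here.
CLEANUP_STEPS = [
    ["drain-queue"],
]


def run_steps(steps):
    """Run each command in turn; return 0 if all succeed, else the first failing exit code."""
    for cmd in steps:
        print("running: " + " ".join(cmd), flush=True)
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            # Mirror the shell convention of exit code 127 for a missing command.
            print("command not found: " + cmd[0], file=sys.stderr, flush=True)
            return 127
        if result.returncode != 0:
            print(f"step failed with exit code {result.returncode}",
                  file=sys.stderr, flush=True)
            return result.returncode
    return 0
```

Saved as an executable file, the script would end with a call such as sys.exit(run_steps(CLEANUP_STEPS)). How the HarvestControllerApplication reacts to a non-zero exit code is not stated here, so the exit code should be treated as informational until verified.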
Side effects
All script output is logged in the HarvestControllerApplication log
Remember that the default implementation "drain-queue" will empty the entire umbra input queue. It is therefore highly inadvisable to have more than one HarvestControllerApplication using the same umbra instance
The call to execute the hook script is blocking, so Heritrix will not start until the script has finished
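The blocking and logging behaviour listed above can be pictured with a short Python sketch. This is only an illustration of the semantics, not the NetarchiveSuite implementation (which is written in Java); the function name run_hook_blocking and the logger name are hypothetical.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("hook-demo")


def run_hook_blocking(cmd):
    """Run cmd, log each output line, and return (exit_code, lines) once it has exited."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so all script output is logged
        text=True,
    )
    lines = []
    for line in proc.stdout:  # iteration ends only when the script closes its output
        line = line.rstrip()
        lines.append(line)
        log.info("hook: %s", line)
    exit_code = proc.wait()  # the caller does not continue until the script ends
    return exit_code, lines


# Example: a stand-in "hook" that takes a moment to finish.
code, output = run_hook_blocking(
    [sys.executable, "-c", "import time; time.sleep(0.2); print('queue drained')"]
)
```

Because the caller waits in this way, a hook script that hangs would delay the harvest indefinitely, so it is worth keeping the script short and making sure every command in it terminates.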
Highlights in 5.5
NetarchiveSuite now supports browser-based harvesting using Internet Archive Umbra
Improved stability in Heritrix MatchesRegexListDecideRule
Improved handling of queue-assignments in Heritrix
Upgrading From Previous NetarchiveSuite Releases
There are no special requirements involved in the upgrade. It should be sufficient to replace all .jar files in the lib directory of your installation with those from the new release, and to replace the Heritrix bundler zip file on your HarvestController machines with the new bundler.
Enabling Umbra integration requires some reconfiguration. This is described in the documentation. Note that if enabling Umbra, you should define the new queue for Umbra jobs in the NetarchiveSuite GUI before you start any new HarvestController instances to listen to the queue. (See NAS-2794.)