Heritrix 3 Scripts

The Heritrix 3 Web GUI has a scripting console where complex procedures can be run on a running Heritrix job. The developers have begun writing scripts based on curator requests. These can be downloaded here.

To use the script, open the Scripting Console in the GUI for the current job:

Open the nas.groovy file and paste the contents into the script console. Select "Groovy" as the script engine:

Now, at the top of the script, enter the regular expression to search for in either the frontier or the crawl log. Comment out the line for the other command (using //).
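Because the script runs against a live job, it can help to check the regular expression against a few sample URIs before pasting it into the console. The following is a minimal sketch in plain Java, not part of nas.groovy: the class name RegexPreview, the method matching, and the sample URIs are all hypothetical. It is not known here whether the script matches the whole line or only part of it, so the example pattern is wrapped in .* to give the same result either way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for trying out a regular expression offline.
// It does not talk to Heritrix; it only filters a list of sample strings.
class RegexPreview {

    // Returns the lines in which the regular expression matches somewhere.
    static List<String> matching(String regex, List<String> lines) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up URIs standing in for frontier entries or crawl log lines.
        List<String> sample = List.of(
            "http://example.org/calendar/2024/01/",
            "http://example.org/about.html",
            "http://example.org/calendar/2024/02/");

        // Print only the sample lines that the expression would select.
        for (String hit : matching(".*/calendar/.*", sample)) {
            System.out.println(hit);
        }
    }
}
```

Once the expression selects only the intended lines, paste it into the top of the script as described above.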

Press "execute" and: