Heritrix 3 Scripts

The Heritrix 3 web GUI has a scripting console in which complex procedures can be run against a running Heritrix job. The developers have begun writing scripts based on curator requests. These can be downloaded here.

To use the script, open the Scripting Console in the GUI for the current job:

Open the nas.groovy file and paste the contents into the script console. Select "Groovy" as the script engine:

At the top of the script, enter the regular expression to search for, in either the frontier or the crawl log. Comment out the line for the other command (using //).
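Groovy scripts run on the JVM and use java.util.regex, so the expression entered at the top of the script follows Java regular expression syntax. This page does not say whether nas.groovy requires the expression to match a whole line or only part of it, so the following standalone Java sketch simply shows the difference between the two. The class name, method names and sample URIs are hypothetical and are not taken from nas.groovy or Heritrix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexSearchSketch {

    // Keep only the lines that the expression matches in full.
    static List<String> fullMatches(String regex, List<String> lines) {
        Pattern p = Pattern.compile(regex);
        return lines.stream()
                .filter(line -> p.matcher(line).matches())
                .collect(Collectors.toList());
    }

    // Keep the lines in which the expression matches anywhere.
    static List<String> partialMatches(String regex, List<String> lines) {
        Pattern p = Pattern.compile(regex);
        return lines.stream()
                .filter(line -> p.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample URIs standing in for frontier or crawl log entries.
        List<String> lines = Arrays.asList(
                "http://example.org/index.html",
                "http://example.org/images/logo.png",
                "http://other.example.net/");

        // A bare fragment never matches a whole line.
        System.out.println(fullMatches("example\\.org", lines).size());
        // Wrapping the fragment in .* makes it match whole lines.
        System.out.println(fullMatches(".*example\\.org.*", lines).size());
        // A partial search finds the fragment without the wrapping.
        System.out.println(partialMatches("example\\.org", lines).size());
    }
}
```

If a search in the scripting console unexpectedly returns nothing, wrapping the expression in `.*` on both sides is a cheap thing to try, since it behaves the same under either kind of matching.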

Press "execute" and:

 

 
