NetarchiveSuite 5.1 Release Notes

Release Date: 24th February 2016

Highlights

Heritrix 3

This is the first production release of NetarchiveSuite to support harvesting with Heritrix 3. Harvesting with Heritrix 1 is not supported in this version, but may be supported in later versions.

Added new attributes tables

We have added two new tables (eav_attribute, eav_type_attribute) that by default add three extra attributes to the configurations: MAX_HOPS, EXTRACT_JAVASCRIPT, and HONOR_ROBOTS_DOT_TXT.

The eav_type_attribute table contains the name of each attribute, which is also its placeholder name in the H3 templates.

By default, MAX_HOPS is set to 20, EXTRACT_JAVASCRIPT is set to 1 (true), and HONOR_ROBOTS_DOT_TXT is set to 0 (ignore).
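
As a rough illustration of how an attribute name can double as a template placeholder, the fragment below sketches a hop-limit rule in an H3 template. The bean class and its maxHops property come from Heritrix 3. The %{...} delimiter syntax is an assumption made for this sketch and is not confirmed by these notes, so check the templates shipped with the release for the actual form.

```xml
<!-- Hypothetical excerpt from an H3 template (crawler-beans.cxml style). -->
<!-- The %{...} delimiters are an assumed placeholder syntax. -->
<bean class="org.archive.modules.deciderules.TooManyHopsDecideRule">
  <!-- Replaced at job creation with the configuration's MAX_HOPS value (20 by default) -->
  <property name="maxHops" value="%{MAX_HOPS}" />
</bean>
```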

Java 7

NetarchiveSuite now requires Java 7 to run. The GUIApplication does not run on Java 8; the other applications also run on Java 8.

More detailed jar file structure

The jar files now fit the application structure better. This means the deploy script should be updated to the following application <-> jar file classpath definitions. See Updating deploy file jar definitions to 5.0.

Heritrix3 bundler zip now needs to be supplied in the deploy

The Heritrix3 crawler code has been moved into a separate zip file to allow more flexibility in choosing a concrete H3 version to use. This means that the HarvesterControllers deployment now needs to be configured with an extra H3 bundler zip. This can be done either as an argument to the deploy call or by defining a bundler for each HarvestControllerApplication in the deploy configuration.

Switch to Maven as build tool

The project is now built with Maven. This means that the jars, source jars and javadoc jars can be found in the https://sbforge.org/nexus/content/groups/public/ repository, and that NetarchiveSuite modules can be added as Maven dependencies in other projects, for example:

<dependency>
  <groupId>org.netarchivesuite</groupId>
  <artifactId>nas-module</artifactId>
  <version>${nas.version}</version>
</dependency>
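
If Maven cannot resolve these artifacts from the repositories it already knows about, the repository above probably needs to be declared in the consuming project's pom.xml. A minimal sketch, in which the id sbforge-nexus is an arbitrary name chosen for illustration and the nas.version property is assumed to be defined elsewhere in the POM:

```xml
<repositories>
  <repository>
    <!-- The id is a hypothetical label; any unique identifier works -->
    <id>sbforge-nexus</id>
    <url>https://sbforge.org/nexus/content/groups/public/</url>
  </repository>
</repositories>
```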

Source moved to GitHub

The NetarchiveSuite source code is now located on GitHub at https://github.com/netarchivesuite/netarchivesuite.

Full list of issues resolved in this release


Known issues
