Pluggable parts

Several components of NetarchiveSuite are pluggable: each can be swapped out for another implementation, in a way similar to what Heritrix uses.
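
As a rough illustration of what "swapping out an implementation" can look like in Java, the sketch below picks an implementation class by name from a settings map and instantiates it by reflection. This is an assumption about the general technique, not a description of how NetarchiveSuite itself loads plug-ins. Plugin, DefaultPlugin, AlternativePlugin, PluginLoader and the settings key are all hypothetical names invented for this example.

```java
import java.util.Map;

/** Hypothetical plug-in point; not a NetarchiveSuite type. */
interface Plugin {
    String name();
}

/** Hypothetical default implementation, used when no class is configured. */
class DefaultPlugin implements Plugin {
    public String name() { return "default"; }
}

/** Hypothetical alternative implementation, selected through the settings. */
class AlternativePlugin implements Plugin {
    public String name() { return "alternative"; }
}

/** Loads an implementation whose class name is given in a settings map. */
final class PluginLoader {
    private PluginLoader() { }

    static <T> T load(Class<T> type, Map<String, String> settings,
                      String key, String defaultClassName) {
        // Fall back to the default class when the setting is absent.
        String className = settings.getOrDefault(key, defaultClassName);
        try {
            // asSubclass rejects classes that do not implement the plug-in type.
            Class<? extends T> impl = Class.forName(className).asSubclass(type);
            // Requires a no-argument constructor (an assumption of this sketch).
            return impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                    "Cannot load " + className + " as " + type.getName(), e);
        }
    }
}
```

With an empty settings map the loader returns the default implementation; putting a different class name under the chosen key switches the implementation without touching the calling code.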

RemoteFile

The RemoteFile interface defines how large chunks of data are transferred between machines in a NetarchiveSuite installation. This is necessary because JMS has a relatively low limit on the size of messages, well below the several hundred megabytes to over a gigabyte that an ARC or WARC file can easily hold.

The RemoteFile interface is defined by the Java interface RemoteFile.
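
The idea of sending a small handle in the message and moving the bulk data separately can be sketched as follows. This is a minimal illustration under stated assumptions, not the real RemoteFile API: BulkFileHandle, SharedDiskHandle and BulkFileDemo are hypothetical names, and the sketch assumes that sender and receiver can both reach the same file system, which a real installation may not allow.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for a remote-file handle; not the real RemoteFile API. */
interface BulkFileHandle {
    String getName();
    long getSize();
    /** Fetches the data behind the handle into a local destination file. */
    void copyTo(Path destination);
}

/** Simplest possible implementation: both ends see the same disk (an assumption). */
final class SharedDiskHandle implements BulkFileHandle {
    private final Path source;

    SharedDiskHandle(Path source) { this.source = source; }

    public String getName() { return source.getFileName().toString(); }

    public long getSize() {
        try {
            return Files.size(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void copyTo(Path destination) {
        try {
            Files.copy(source, destination, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Writes a payload, hands it over via a handle, and reads back the copy. */
final class BulkFileDemo {
    private BulkFileDemo() { }

    static String roundTrip(String payload) {
        try {
            Path src = Files.createTempFile("sketch", ".arc");
            Path dst = Files.createTempFile("sketch-copy", ".arc");
            Files.write(src, payload.getBytes(StandardCharsets.UTF_8));
            // Only this small handle would travel in the message, not the data.
            BulkFileHandle handle = new SharedDiskHandle(src);
            handle.copyTo(dst);
            String result = new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
            Files.delete(src);
            Files.delete(dst);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Other implementations of the same hypothetical interface could fetch the data over FTP or HTTP instead; the code that receives the handle would not need to change.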

JMSConnection

The JMSConnection provides access to a specific JMS connection. The default NetarchiveSuite distribution contains only one implementation, JMSConnectionSunMQ, which uses Sun's OpenMQ. We recommend using this implementation, as other implementations have previously been found to violate some assumptions that NetarchiveSuite depends on.

The JMSConnection interface is defined by the abstract class JMSConnection.

Implementations need to implement the four abstract methods of this class: getConnectionFactory(), getDestination(String destinationName), onException(JMSException e), and getQueueSession().

ArcRepositoryClient

The ArcRepositoryClient handles access to the Archive module, both upload and low-level access.

The ArcRepositoryClient interface is defined by the Java interface ArcRepositoryClient.

IndexClient

The IndexClient provides the Lucene indices that are used for deduplication and for viewerproxy access. It uses the ArcRepositoryClient to fetch data from the archive, and implements several layers of caching, both of this data and of the Lucene indices created from it. It is advisable to clean up the cache directories regularly.

The IndexClient interface is defined by the Java interface JobIndexCache.

Archive Admin DBSpecifics

Defines functionality specific to the type of database used for the Archive Admin database. See the Javadoc for details.

Harvester DBSpecifics

Defines functionality specific to the type of database used for the Harvester module. See the Javadoc for details.

Notifications

The Notifications interface lets you choose how you want important error notifications to be handled in your system. Two implementations exist, one to send emails, and one to print the messages to System.err. Adding more specialised plugins should be easy.

The Notifications interface is defined by the abstract class Notifications.
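
To illustrate how a more specialised notification plug-in might be structured, here is a minimal sketch. It does not reproduce the real Notifications class: SketchNotifications, its errorEvent methods, StderrNotifications and CollectingNotifications are hypothetical names and signatures chosen for this example. The first implementation mirrors the System.err variant described above, and the second simply collects messages in memory, which can be handy in tests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the abstract plug-in class; method names are assumptions. */
abstract class SketchNotifications {
    /** Reports an important error, optionally with its cause. */
    abstract void errorEvent(String message, Throwable cause);

    /** Convenience overload for errors without an exception. */
    void errorEvent(String message) {
        errorEvent(message, null);
    }
}

/** Prints notifications to System.err. */
final class StderrNotifications extends SketchNotifications {
    @Override
    void errorEvent(String message, Throwable cause) {
        System.err.println("[ERROR] " + message);
        if (cause != null) {
            cause.printStackTrace(System.err);
        }
    }
}

/** Keeps notifications in memory so that they can be inspected later. */
final class CollectingNotifications extends SketchNotifications {
    private final List<String> messages = new ArrayList<>();

    @Override
    void errorEvent(String message, Throwable cause) {
        // Append the cause's message, if any, after a colon.
        messages.add(cause == null ? message : message + ": " + cause.getMessage());
    }

    List<String> messages() {
        return messages;
    }
}
```

Code that reports errors only depends on the abstract class, so switching from one handler to another is a matter of choosing which subclass to instantiate.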

HeritrixController

The HeritrixController interface defines how we initialize a running Heritrix instance and communicate with it. Two implementations (JMXHeritrixController and BnfHeritrixController) start Heritrix as a separate process and communicate with it using JMX. A third, deprecated implementation (DirectHeritrixController) embeds Heritrix inside NetarchiveSuite and controls it through a CrawlController instance.

The HeritrixController interface is defined by the Java interface HeritrixController.

ActiveBitPreservation

The ActiveBitPreservation interface defines our interface for initiating bit-preservation actions from our GUI. We have a file-based implementation (now deprecated) and a database-based implementation. Both communicate with the archive through the ArcRepository interface.

The ActiveBitPreservation interface is defined by the Java interface ActiveBitPreservation.