Pluggable parts of NetarchiveSuite

Note that this documentation is for the upcoming release, NetarchiveSuite 7.4,
and is still a work in progress.

For documentation on the released versions, please view the previous versions of the NetarchiveSuite documentation and select the relevant version.

Some parts of NetarchiveSuite can be swapped out for other implementations of the same interface.
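
As a rough illustration of what "swapping out an implementation" can look like in Java, the sketch below loads an implementation of an interface from a class name given in a system property. This is a generic pattern, not NetarchiveSuite's actual loading code: the Greeter interface, both implementations, the load helper and the property name sketch.greeter.class are all hypothetical names invented for this example.

```java
import java.lang.reflect.Constructor;

public class PluginLoaderSketch {

    /** Hypothetical plug-in contract; stands in for an interface such as RemoteFile. */
    public interface Greeter {
        String greet(String name);
    }

    /** Hypothetical default implementation. */
    public static class PlainGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    /** Hypothetical alternative implementation that can be swapped in. */
    public static class LoudGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "HELLO, " + name.toUpperCase() + "!";
        }
    }

    /**
     * Instantiates the class with the given name through its no-argument
     * constructor and checks that it implements the expected contract.
     */
    public static <T> T load(String className, Class<T> contract) {
        try {
            Class<? extends T> impl = Class.forName(className).asSubclass(contract);
            Constructor<? extends T> ctor = impl.getDeclaredConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load plug-in " + className, e);
        }
    }

    public static void main(String[] args) {
        // The property name is an assumption made for this sketch only.
        String configured = System.getProperty(
                "sketch.greeter.class", PlainGreeter.class.getName());
        Greeter greeter = load(configured, Greeter.class);
        System.out.println(greeter.greet("archive"));
    }
}
```

Code written against the contract (here Greeter) never names a concrete class, so changing the configured class name is enough to change behaviour.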

RemoteFile

The RemoteFile interface defines how large chunks of data are transferred between machines in a NetarchiveSuite installation. This is necessary because JMS has a relatively low limit on message size, well below the several hundred megabytes to over a gigabyte that an ARC or WARC file can easily contain.

The RemoteFile plug-in is defined by the Java interface RemoteFile.

JMSConnection

The JMSConnection provides access to a specific JMS connection. The default NetarchiveSuite distribution contains only one implementation, namely JMSConnectionSunMQ which uses Sun's OpenMQ. We recommend using this implementation, as other implementations have previously been found to violate some assumptions that NetarchiveSuite depends on, especially that a JMS queue X is created in the broker when an application begins listening to queue X.

The JMSConnection interface is defined by the abstract class JMSConnection.

Implementations must implement the four abstract methods of this class: getConnectionFactory(), getDestination(String destinationName), onException(JMSException e), and getQueueSession().

ArcRepositoryClient

The ArcRepositoryClient handles access to the Archive module, both upload and low-level access.

The ArcRepositoryClient interface is defined by the Java interface ArcRepositoryClient.

IndexClient

The IndexClient provides the Lucene indices that are used for deduplication and for viewerproxy access. It uses the ArcRepositoryClient to fetch data from the archive and implements several layers of caching, both of these data and of the Lucene indices created from them. This is particularly important for deduplication during snapshot harvesting, as many harvest jobs need to reuse the same large deduplication index. It is advisable to clean up the cache directories regularly.

The IndexClient interface is defined by the Java interface JobIndexCache.

Archive Admin DBSpecifics

Defines functionality specific to the type of database used for the Archive Admin database. See the Javadoc for details.

Harvester DBSpecifics

Defines functionality specific to the type of database used for the Harvester module. See the Javadoc for details.

Notifications

The Notifications interface lets you choose how important error notifications are handled in your system. Two implementations exist: one that sends emails, and one that prints the messages to System.err. Adding more specialised plug-ins should be easy.

The Notifications interface is defined by the abstract class Notifications.
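
To give an idea of what a notification plug-in might look like, here is a minimal sketch of an abstract class with one stream-printing subclass. It is not the real Notifications class: the names SketchNotifications, StreamNotifications and sendNotification are hypothetical, and the actual method signatures should be taken from the Javadoc.

```java
import java.io.PrintStream;

public class NotificationsSketch {

    /** Hypothetical stand-in for the abstract Notifications class. */
    public abstract static class SketchNotifications {

        /** Reports a problem, optionally with the exception that caused it. */
        public abstract void sendNotification(String message, Throwable cause);

        /** Convenience overload for problems without an exception. */
        public void sendNotification(String message) {
            sendNotification(message, null);
        }
    }

    /** Prints notifications to a stream; pass System.err to mimic printing to System.err. */
    public static class StreamNotifications extends SketchNotifications {
        private final PrintStream out;

        public StreamNotifications(PrintStream out) {
            this.out = out;
        }

        @Override
        public void sendNotification(String message, Throwable cause) {
            out.println("[NOTIFICATION] " + message);
            if (cause != null) {
                out.println("  caused by: " + cause);
            }
        }
    }

    public static void main(String[] args) {
        SketchNotifications notifications = new StreamNotifications(System.err);
        notifications.sendNotification("Example problem report");
        notifications.sendNotification("Example failure", new IllegalStateException("demo"));
    }
}
```

A more specialised plug-in, for instance one that posts to a chat system, would only need to extend the abstract class and override the single abstract method.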

IHeritrixController

The IHeritrixController interface defines how we initialize a running Heritrix instance and communicate with it. In NetarchiveSuite 5.0 and later, only one implementation is supported: the one for Heritrix 3.

ActiveBitPreservation

The ActiveBitPreservation interface defines how bit preservation actions are initiated from our GUI. We have a file-based implementation (now deprecated) and a database-based implementation. Both communicate with the archive through the ArcRepository interface.

The ActiveBitPreservation interface is defined by the Java interface ActiveBitPreservation.