Pluggable parts
Some points in NetarchiveSuite can be swapped out for other implementations, in a way similar to what Heritrix uses.
RemoteFile
The RemoteFile interface defines how large chunks of data are transferred between machines in a NetarchiveSuite installation. This is necessary because JMS has a relatively low limit on the size of messages, well below the several hundred megabytes to over a gigabyte that an ARC or WARC file can easily hold.
The RemoteFile interface is defined by the Java interface RemoteFile.
JMSConnection
The JMSConnection provides access to a specific JMS connection. The default NetarchiveSuite distribution contains only one implementation, JMSConnectionSunMQ, which uses Sun's OpenMQ. We recommend using this implementation, as other implementations have previously been found to violate some assumptions that NetarchiveSuite depends on.
The JMSConnection interface is defined by the abstract class JMSConnection.
Implementations need to implement the four abstract methods of this class: getConnectionFactory(), getDestination(String destinationName), onException(JMSException e), and getQueueSession().
ArcRepositoryClient
The ArcRepositoryClient handles access to the Archive module, both upload and low-level access.
The ArcRepositoryClient interface is defined by the Java interface ArcRepositoryClient.
IndexClient
The IndexClient provides the Lucene indices that are used for deduplication and for viewerproxy access. It uses the ArcRepositoryClient to fetch data from the archive, and implements several layers of caching for this data and for the Lucene indices created from it. It is advisable to clean up the cache directories regularly.
The IndexClient interface is defined by the Java interface JobIndexCache.
Archive Admin DBSpecifics
Defines functionality specific to the type of database used for the Archive Admin database; see the Javadoc for details.
Harvester DBSpecifics
Defines functionality specific to the type of database used for the Harvester module; see the Javadoc for details.
Notifications
The Notifications interface lets you choose how you want important error notifications to be handled in your system. Two implementations exist, one to send emails, and one to print the messages to System.err. Adding more specialised plugins should be easy.
The Notifications interface is defined by the abstract class Notifications.
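To illustrate the general shape of such a plug-in, here is a minimal, self-contained sketch. It does not reproduce the real NetarchiveSuite API: the names NotificationsSketch, Notifier, PrintNotifier, notifyError and load are all hypothetical, and loading the implementation by class name with a no-argument constructor is an assumption about how a pluggable class could be selected, not a description of how NetarchiveSuite does it.

```java
import java.io.PrintStream;

public class NotificationsSketch {

    /** Hypothetical stand-in for the abstract plug-in class (not the real API). */
    public abstract static class Notifier {
        /** Report an important error; cause may be null. */
        public abstract void notifyError(String message, Throwable cause);

        /** Convenience overload for errors without an exception. */
        public void notifyError(String message) {
            notifyError(message, null);
        }
    }

    /** Hypothetical implementation that prints to a stream, System.err by default. */
    public static class PrintNotifier extends Notifier {
        private final PrintStream out;

        public PrintNotifier() {
            this(System.err);
        }

        public PrintNotifier(PrintStream out) {
            this.out = out;
        }

        @Override
        public void notifyError(String message, Throwable cause) {
            out.println("[notification] " + message);
            if (cause != null) {
                cause.printStackTrace(out);
            }
        }
    }

    /** Assumed loading scheme: instantiate the named class via its no-arg constructor. */
    public static Notifier load(String className) throws ReflectiveOperationException {
        Class<? extends Notifier> cls =
                Class.forName(className).asSubclass(Notifier.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The class name would normally come from configuration; here it is hard-coded.
        Notifier notifier = load(PrintNotifier.class.getName());
        notifier.notifyError("Harvest job failed");
    }
}
```

Under these assumptions, a more specialised plug-in would only need to extend the abstract class and be named in the configuration; the calling code stays unchanged.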
HeritrixController
The HeritrixController interface defines our interface for initializing a running Heritrix instance and communicating with it. We have two implementations that start Heritrix as its own process and then communicate with it using JMX (JMXHeritrixController, BnfHeritrixController), and a deprecated implementation with Heritrix embedded inside NetarchiveSuite (DirectHeritrixController), which controls Heritrix using a CrawlController instance.
The HeritrixController interface is defined by the Java interface HeritrixController.
ActiveBitPreservation
The ActiveBitPreservation interface defines our interface for initiating bit preservation actions from our GUI. We have a file-based implementation (now deprecated) and a database-based implementation. Both implementations communicate with the archive through the ArcRepository interface.
The ActiveBitPreservation interface is defined by the Java interface ActiveBitPreservation.