(IN PROGRESS)
At the KB-Denmark Netarkiv we are working on some quite radical changes to our backend architecture: replacing our ArcRepository storage with bitrepository.org software and implementing a new mass-processing architecture, probably based on Hadoop. As part of this process, we would like to know which parts of NAS are actually in use at our partner institutions so that we can develop a strategy for future support.
NAS Applications
Which of the following NAS applications (services) are in use in your production environment?
Application | Denmark | France | Austria | Spain | Sweden | Comments |
---|---|---|---|---|---|---|
HarvestControllerServer | y | y | | | | |
GUIWebServer | y | y | | | | |
HarvestJobManager | y | y | | | | |
ChecksumFileServer | y | n | | | | |
ViewerProxy | y | y (but only to access data and metadata files) | | | | |
WaybackIndexer | y | n | | | | |
AggregationWorker | y | n | | | | |
IndexServer | y | y | | | | |
ArcRepository | y | y | | | | |
BitarchiveServer | y | n | | | | |
BitarchiveMonitorServer | y | n | | | | |
AccessBitarchiveServer | y/n | n | | | | This is a special read-only server which is used in a specific data-extraction system in DK, outside the main Netarkivet installation. |
Plugins
Which of the following plugins are used in your production setup? Those marked with a are default values set in the packaged settings file.
Interface | Implementation | Denmark | France | Austria | Spain | Sweden | |
---|---|---|---|---|---|---|---|
AbstractRemoteFile | HTTPRemoteFile | | | | | | |
AbstractRemoteFile | HTTPSRemoteFile | | | | | | |
AbstractRemoteFile | FTPRemoteFile | y | | | | | |
ActiveBitPreservation | DatabaseBasedActiveBitPreservation | | | | | | |
ActiveBitPreservation | FileBasedActiveBitPreservation | y | | | | | |
Admin | UpdateableAdminData | | | | | | |
Admin | DatabaseAdmin | y | | | | | |
arcrepositoryadmin.DBSpecifics | DerbyServerSpecifics | | | | | | |
arcrepositoryadmin.DBSpecifics | DerbyEmbeddedSpecifics | | | | | | |
arcrepositoryadmin.DBSpecifics | MySQLSpecifics | | | | | | |
arcrepositoryadmin.DBSpecifics | PostgreSQLSpecifics | y | | | | | |
ChecksumArchive | FileChecksumArchive | y | | | | | |
ChecksumArchive | DatabaseChecksumArchive | | | | | | |
JMSConnection | JMSConnectionSunMQ | y | | | | | |
ArcRepositoryClient | JMSArcRepositoryClient | y | | | | | |
ArcRepositoryClient | LocalArcRepositoryClient | | | | | | |
MonitorRegistryClient | PrintMonitorRegistryClient | | | | | | |
MonitorRegistryClient | JMSMonitorRegistryClient | y | | | | | |
JobIndexCache | IndexRequestClient | y | | | | | |
Notifications | EMailNotifications | y | | | | | |
Notifications | PrintNotifications | | | | | | |
FreeSpaceProvider | DefaultFreeSpaceProvider | y | | | | | |
FreeSpaceProvider | FreeSpaceProvider | | | | | | |
FreeSpaceProvider | OnbFreeSpaceProvider | | | | | | |
datamodel.DBSpecifics | DerbyServerSpecifics | | | | | | |
datamodel.DBSpecifics | DerbyEmbeddedSpecifics | | | | | | |
datamodel.DBSpecifics | MySQLSpecifics | | | | | | |
datamodel.DBSpecifics | PostgreSQLSpecifics | y | | | | | |
JobGenerator | DefaultJobGenerator | y | | | | | |
JobGenerator | FixedDomainConfigurationCountJobGenerator | | | | | | |
ArchiveFileNaming | LegacyNamingConvention | y | | | | | |
ArchiveFileNaming | CollectionPrefixNamingConvention | | | | | | |
FrontierReportFilter | TopTotalEnqueuesFilter | y | | | | | |
FrontierReportFilter | ExhaustedQueuesFilter | | | | | | |
FrontierReportFilter | MaxSizeFrontierReportExtract | | | | | | |
FrontierReportFilter | RetiredQueuesFilter | | | | | | |
HeritrixLauncherAbstract | HeritrixLauncher | y | | | | | |
IHeritrixController | HeritrixController | y | | | | | |
HarvestReport | LegacyHarvestReport | y | | | | | |
HarvestReport | BnFHarvestReport | | | | | | |
IndexRequestServerInterface | IndexRequestServer | y | | | | | |
Command Line Tools
Over the years, the NetarchiveSuite codebase has accumulated a lot of command-line utilities. Some of these were probably developed for a single specialised use case or for test purposes, but others may have become part of the normal workflow at the various repositories. Here is a partial list of those that look most likely to be of general interest. Please mark any that you know are used as part of your workflows.
Tool | Purpose | Denmark | France | Austria | Spain | Sweden | |
---|---|---|---|---|---|---|---|
DeployApplication | | y | | | | | |
HarvestdatabaseUpdateApplication | | y | | | | | |
BuildCompleteSettings | | y | | | | | |
GetFile | Retrieves a file via the ArcRepository interface | | | | | | |
GetRecord | Retrieves a (w)arc-record via the ArcRepository interface | | | | | | |
LoadDatabaseChecksumArchive | Migration tool from file-based checksums to database-based checksums | | | | | | |
ReestablishAdminDatabase | For reestablishing the admin database from an 'admin.data' file | | | | | | |
RunBatch | Runs a batch job from the command line | | | | | | |
Upload | Uploads a file to the ArcRepository from the command line. (Handy for test data.) | y | | | | | |
ReestablishAdminDatabase | | | | | | | |
ClassDependencies | | | | | | | |
CreateIndex | | | | | | | |
RunChecksum | | | | | | | |
SendDedupIndexRequestToIndexserver | | | | | | | |
MakeIndex | | | | | | | |
FindRelevantCrawllogLines | | | | | | | |
Heritrix1Constants | | | | | | | |
JMXProxy | | | | | | | |
DeduplicateToCDXApplication | | | | | | | |
ResetFailedFiles | | | | | | | |
ARCReaderUtils | | | | | | | |
TestBitrepository | | | | | | | |
ArcWrap | | | | | | | |
ExtractCDX | | | | | | | |
JMSBroker | | | | | | | |
WriteBytesToFile | | | | | | | |
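FTPValidator | | | | | | | |
SimpleCmdlineTool | | | | | | | |
ArcMerge | | | | | | | |
ArchiveExtractCDX | | | | | | | |
WARCExtractCDX | | | | | | | |
ReformatTranslationFile | | | | | | | |
MailValidator | | | | | | | |
DigestIndexer | | | | | | | |
MakeNewMetadataFile | | | | | | | |
FindDomainsForCrawllogExtraction | | | | | | | |
CheckDuplicateReduction | | | | | | | |
StandaloneApplicationReduced | | | | | | | |
SchedulerDatabaseBuilder | | | | | | | |
MigrateDefaultHarvestDatabase | | | | | | | |
CreateCDXMetadataFile | | | | | | | |
Heritrix3ControllerTest | | | | | | | |
H3LaunchTest | | | | | | | |
HarvesterQueueControl | | | | | | | |
HarvestDatabaseValidator | | | | | | | |
HarvestTemplateApplication | | | | | | | |
CheckDomainCrawltraps | | | | | | | |
CheckTrapsInFile | | | | | | | |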