
(IN PROGRESS)

At the KB-Denmark Netarkiv we are working on some quite radical changes to our backend architecture: replacing our ArcRepository storage with bitrepository.org software, and implementing a new mass-processing architecture, probably based on Hadoop. As part of this process, we would like to know which parts of NetarchiveSuite (NAS) are actually in use at our partner institutions so that we can develop a strategy for future support.

NAS Applications

Which of the following NAS applications (services) are in use in your production environment?

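| Application | Denmark | France | Austria | Spain | Sweden | Comments |
|---|---|---|---|---|---|---|
| HarvestControllerServer | y | y | | | | |
| GUIWebServer | y | y | | | | |
| HarvestJobManager | y | y | | | | |
| ChecksumFileServer | y | n | | | | |
| ViewerProxy | y | y (but only to access data and metadata files) | | | | |
| WaybackIndexer | y | n | | | | |
| AggregationWorker | y | n | | | | |
| IndexServer | y | y | | | | |
| ArcRepository | y | y | | | | |
| BitarchiveServer | y | n | | | | |
| BitarchiveMonitorServer | y | n | | | | |
| AccessBitarchiveServer | y/n | n | | | | This is a special read-only server which is used in a specific data-extraction system in DK, outside the main Netarkivet installation. |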

Plugins

Which of the following plugins are used in your production setup? Those marked with a (star) are default values set in the packaged settings file.

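| Interface | Implementation | Denmark | France | Austria | Spain | Sweden |
|---|---|---|---|---|---|---|
| AbstractRemoteFile | HTTPRemoteFile | | | | | |
| | HTTPSRemoteFile | | | | | |
| | FTPRemoteFile (star) | y | | | | |
| ActiveBitPreservation | DatabaseBasedActiveBitPreservation | | | | | |
| | FileBasedActiveBitPreservation (star) | y | | | | |
| Admin | UpdateableAdminData | | | | | |
| | DatabaseAdmin | y | | | | |
| arcrepositoryadmin.DBSpecifics | DerbyServerSpecifics | | | | | |
| | DerbyEmbeddedSpecifics | | | | | |
| | MySQLSpecifics | | | | | |
| | PostgreSQLSpecifics | y | | | | |
| ChecksumArchive | FileChecksumArchive | y | | | | |
| | DatabaseChecksumArchive | | | | | |
| JMSConnection | JMSConnectionSunMQ (star) | y | | | | |
| ArcRepositoryClient | JMSArcRepositoryClient | y | | | | |
| | LocalArcRepositoryClient | | | | | |
| MonitorRegistryClient | PrintMonitorRegistryClient | | | | | |
| | JMSMonitorRegistryClient (star) | y | | | | |
| JobIndexCache | IndexRequestClient (star) | y | | | | |
| Notifications | EMailNotifications (star) | y | | | | |
| | PrintNotifications | | | | | |
| FreeSpaceProvider | DefaultFreeSpaceProvider (star) | y | | | | |
| | FreeSpaceProvider | | | | | |
| | OnbFreeSpaceProvider | | | | | |
| datamodel.DBSpecifics | DerbyServerSpecifics (star) | | | | | |
| | DerbyEmbeddedSpecifics | | | | | |
| | MySQLSpecifics | | | | | |
| | PostgreSQLSpecifics | y | | | | |
| JobGenerator | DefaultJobGenerator (star) | y | | | | |
| | FixedDomainConfigurationCountJobGenerator | | | | | |
| ArchiveFileNaming | LegacyNamingConvention (star) | y | | | | |
| | CollectionPrefixNamingConvention | | | | | |
| FrontierReportFilter | TopTotalEnqueuesFilter (star) | y | | | | |
| | ExhaustedQueuesFilter | | | | | |
| | MaxSizeFrontierReportExtract | | | | | |
| | RetiredQueuesFilter | | | | | |
| HeritrixLauncher | AbstractHeritrixLauncher (star) | y | | | | |
| IHeritrixController | HeritrixController (star) | y | | | | |
| HarvestReport | LegacyHarvestReport (star) | y | | | | |
| | BnFHarvestReport | | | | | |
| IndexRequestServerInterface | IndexRequestServer (star) | y | | | | |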

Command Line Tools

Over the years, the NetarchiveSuite codebase has accumulated a lot of command-line utilities. Some of these were probably developed for a single specialised use case or for test purposes, but others may have become part of the normal workflow at the various repositories. Here is a partial list of those that look most likely to be of general interest. Please mark any that you know are used as part of your workflows.

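| Tool | Purpose | Denmark | France | Austria | Spain | Sweden |
|---|---|---|---|---|---|---|
| DeployApplication | | y | | | | |
| HarvestdatabaseUpdateApplication | | y | | | | |
| BuildCompleteSettings | | y | | | | |
| GetFile | Retrieves a file via the ArcRepository interface | | | | | |
| GetRecord | Retrieves a (w)arc-record via the ArcRepository interface | | | | | |
| LoadDatabaseChecksumArchive | Migration tool from file-based checksums to database-based checksums | | | | | |
| ReestablishAdminDatabase | For reestablishing the admin database from an 'admin.data' file | | | | | |
| RunBatch | Runs a batch job from the command line | | | | | |
| Upload | Uploads a file to the ArcRepository from the command line. (Handy for test data.) | y | | | | |
| ReestablishAdminDatabase | | | | | | |
| ClassDependencies | | | | | | |
| CreateIndex | | | | | | |
| RunChecksum | | | | | | |
| SendDedupIndexRequestToIndexserver | | | | | | |
| MakeIndex | | | | | | |
| FindRelevantCrawllogLines | | | | | | |
| Heritrix1Constants | | | | | | |
| JMXProxy | | | | | | |
| DeduplicateToCDXApplication | | | | | | |
| ResetFailedFiles | | | | | | |
| ARCReaderUtils | | | | | | |
| TestBitrepository | | | | | | |
| ArcWrap | | | | | | |
| ExtractCDX | | | | | | |
| JMSBroker | | | | | | |
| WriteBytesToFile | | | | | | |
| FTPValidator | | | | | | |
| SimpleCmdlineTool | | | | | | |
| ArcMerge | | | | | | |
| ArchiveExtractCDX | | | | | | |
| WARCExtractCDX | | | | | | |
| ReformatTranslationFile | | | | | | |
| MailValidator | | | | | | |
| DigestIndexer | | | | | | |
| MakeNewMetadataFile | | | | | | |
| FindDomainsForCrawllogExtraction | | | | | | |
| CheckDuplicateReduction | | | | | | |
| StandaloneApplicationReduced | | | | | | |
| SchedulerDatabaseBuilder | | | | | | |
| MigrateDefaultHarvestDatabase | | | | | | |
| CreateCDXMetadataFile | | | | | | |
| Heritrix3ControllerTest | | | | | | |
| H3LaunchTest | | | | | | |
| HarvesterQueueControl | | | | | | |
| HarvestDatabaseValidator | | | | | | |
| HarvestTemplateApplication | | | | | | |
| CheckDomainCrawltraps | | | | | | |
| CheckTrapsInFile | | | | | | |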



