...
Application | Denmark | France | Austria | Spain | Sweden | Comments |
---|---|---|---|---|---|---|
HarvestControllerServer | y | y | y | y | y | |
GUIWebServer | y | y | y | n | y | |
HarvestJobManager | y | y | y | y | y | |
ChecksumFileServer | y | n | n | n | y | |
ViewerProxy | y | y | n | y | y (but only to access data and metadata files) | BnF: we only use the ViewerProxy to get access to warc files Sweden: currently not working due to storage requirements |
WaybackIndexer | y | n | n | n | ||
AggregationWorker | y | n | n | n | ||
IndexServer | y | y | y | y | y | |
ArcRepository | y | y | y | y | y | |
BitarchiveServer | y | n | y | n | y | |
BitarchiveMonitorServer | y | n | y | n | y | |
AccessBitarchiveServer | y/n | n | n | n | This is a special read-only server which is used in a specific data-extraction system in DK, outside the main Netarkivet installation. |
...
Which of the following plugins are used in your production setup? Those marked with a are default values set in the packaged settings file.
Interface | Implementation | Denmark | France | Austria | Spain | Sweden | |
---|---|---|---|---|---|---|---|
AbstractRemoteFile | HTTPRemoteFile | y | |||||
HTTPSRemoteFile | |||||||
FTPRemoteFile | y | y | |||||
ActiveBitPreservation | DatabaseBasedActiveBitPreservation | FileBasedActiveBitPreservation y | |||||
FileBasedActiveBitPreservation | y | y | |||||
Admin | UpdateableAdminData | ||||||
DatabaseAdmin | y | y | y | ||||
arcrepositoryadmin.DBSpecifics | DerbyServerSpecifics | ||||||
DerbyEmbeddedSpecifics | |||||||
MySQLSpecifics | |||||||
PostgreSQLSpecifics | y | y | y | ||||
ChecksumArchive | FileChecksumArchive | y | y | ||||
DatabaseChecksumArchive | |||||||
JMSConnection | JMSConnectionSunMQ | y | y | y | y | y | |
ArcRepositoryClient | JMSArcRepositoryClient | y | |||||
LocalArcRepositoryClient | y | ||||||
MonitorRegistryClient | PrintMonitorRegistryClient | ||||||
JMSMonitorRegistryClient | y | y | y | y | |||
JobIndexCache | IndexRequestClient | y | y | y | y | ||
Notifications | EMailNotifications | y | y | ||||
PrintNotifications | y | ||||||
FreeSpaceProvider | DefaultFreeSpaceProvider | y | y | ||||
FreeSpaceProvider | |||||||
OnbFreeSpaceProvider | y | ||||||
datamodel.DBSpecifics | DerbyServerSpecifics | ||||||
DerbyEmbeddedSpecifics | |||||||
MySQLSpecifics | |||||||
PostgreSQLSpecifics | y | y | y | ||||
JobGenerator | DefaultJobGenerator | y | y | ||||
FixedDomainConfigurationCountJobGenerator | y | y | |||||
ArchiveFileNaming | LegacyNamingConvention | y | y | y | |||
CollectionPrefixNamingConvention | y | ||||||
FrontierReportFilter | TopTotalEnqueuesFilter | y | y | y | y | ||
ExhaustedQueuesFilter | |||||||
MaxSizeFrontierReportExtract | |||||||
RetiredQueuesFilter | y | ||||||
HeritrixLauncherAbstract | HeritrixLauncher | y | y | y | y | ||
IHeritrixController | HeritrixController | y | y | y | y | ||
HarvestReport | LegacyHarvestReport | y | y | y | |||
BnFHarvestReport | y | ||||||
IndexRequestServerInterface | IndexRequestServer | y | y | y | y |
Command Line Tools
Over the years, the NetarchiveSuite codebase has accumulated a lot of command line utilities. Some of these were probably developed for a single specialised use-case or for test purposes, but others may have become part of the normal workflow at the various repositories. Here is a partial list of those that look most likely to be of general interest. Please mark any of those you know of that are used as part of your workflows.
Tool | Purpose | Denmark | France | Austria | Spain | Sweden | |
---|---|---|---|---|---|---|---|
DeployApplication | Creates deploy scripts from a deploy-config | y | y | y | n | ||
HarvestdatabaseUpdateApplication | Updates HarvestDB schema | y | y | y | |||
BuildCompleteSettings | Merges module settings files in NAS to one large global default settings file. Run as part of release process. | y | n | ||||
GetFile | Retrieves a file via the ArcRepository interface | y | n | ||||
GetRecord | Retrieves a (w)arc-record via the ArcRepository interface | y | n | ||||
LoadDatabaseChecksumArchive | Migration tool from file-based checksums to database-based checksums | n(?) | |||||
ReestablishAdminDatabase | For reestablishing the admin database from a 'admin.data' file | ||||||
RunBatch | Runs a batch job from the command line | y | |||||
Upload | Uploads a file to the ArcRepository from the command line. (Handy for testdata.) | y | y | ||||
ReestablishAdminDatabase | Should be deprecated |
Reads old admin.data file. | |||||||
ClassDependencies | Non NAS Utility (license is not ours) | ||||||
CreateIndex | CLI to talk to IndexServer via IndexClient | ||||||
RunChecksum | CLI to get all checksums from a Bitarchive (deprecated) | n(?) | |||||
SendDedupIndexRequestToIndexserver | Asynchronously starts a dedup indexing on an IndexServer and then exits. Tue Hejlskov Larsen is this what you use to generate deduplication indexes? | i don't know ... | |||||
MakeIndex | Runs a CDX extraction on a single file in a remote ArcRepository | ||||||
FindRelevantCrawllogLines | Finds crawl-log lines matching a given domain name in a local metadata file | ||||||
JMXProxy | "This tool will simply reregister all MBeans that matches the given query from the JMX hosts read in settings, using* its own platformmbeanserver. It will then wait forever." | ||||||
DeduplicateToCDXApplication | Extracts CDX records for deduplicate annotations from a local crawl log file | ||||||
ResetFailedFiles | Utility for WaybackIndexer to reset files that have failed more than 3 times so they can be retried | n(?) | |||||
ARCReaderUtils | Splits an arcfile (not warc) and dumps results to a directory | ||||||
ArcWrap | Creates an arcfile by wrapping a file | ||||||
ExtractCDX | Extracts CDX records, unsorted, from a list of local input arcfiles (not warcs) | ||||||
JMSBroker | Checks that a JMS broker (as specified in NAS settings) is up and running. | ||||||
WriteBytesToFile | Just creates large files full of null bytes | ||||||
FTPValidator | Tests if an ftp server configuration in a NAS settings file points to a NAS-compliand ftp server. | ||||||
ArcMerge | Merges several arcfiles into one arcfile | ||||||
ArchiveExtractCDX | Extracts CDX records, unsorted, from a list of local input (w)arcfiles | ||||||
WARCExtractCDX | Extracts CDX records, unsorted, from a list of local input warcfiles | ||||||
ReformatTranslationFile | i) reorders a translation file so keys are in the same order as a reference file, and ii) allows the encoding of the output file to be changed | ||||||
MailValidator | Checks the validity of a mail-server configured in NAS settings by sending a test-mail | ||||||
MakeNewMetadataFile | Creates a metadata file. For use when postprocessing fails. Is this used? | ||||||
FindDomainsForCrawllogExtraction | ? | ||||||
CheckDuplicateReduction | Validates deduplication by comparing a crawl log with a collection of arcfiles. (not warc) | ||||||
StandaloneApplicationReduced | Creates a standalone NetarchiveSuite in a single JVM | ||||||
MigrateDefaultHarvestDatabase | This just initialises a SiteSection object which is supposed to upgrade the harvest database as a side-effect | ||||||
CreateCDXMetadataFile | Complex tool that takes a set of filenames and runs a batch job to extracts the cdx'es from each files and pack them in a metadata arc or warc file, one record per input file | ||||||
HarvesterQueueControl | Tool to count the number of messages in a given JMS queue | ||||||
HarvestDatabaseValidator | Validates whether you can connect to the harvest database with the settings in a given settings file | ||||||
HarvestTemplateApplication | Utility for uploading and updating heritrix templates | y (in test) | |||||
CheckDomainCrawltraps | Runs through all domains in the harvest database and checks whether each crawlertrap regexp can validly be included as text-content in an xml document | y | |||||
CheckTrapsInFile | Runs through a list of crawler-trap regexes in a fileand checks whether each crawlertrap regex can validly be included as text-content in an xml documen | y(?) |