...
Tool | Purpose | Denmark | France | Austria | Spain | Sweden | |
---|---|---|---|---|---|---|---|
DeployApplication | Creates deploy scripts from a deploy-config | y | |||||
HarvestdatabaseUpdateApplication | Updates HarvestDB schema | y | |||||
BuildCompleteSettings | Merges module settings files in NAS to one large global default settings file. Run as part of release process. | y | |||||
GetFile | Retrieves a file via the ArcRepository interface | ||||||
GetRecord | Retrieves a (w)arc-record via the ArcRepository interface | ||||||
LoadDatabaseChecksumArchive | Migration tool from file-based checksums to database-based checksums | ||||||
ReestablishAdminDatabase | For reestablishing the admin database from an 'admin.data' file | ||||||
RunBatch | Runs a batch job from the command line | ||||||
Upload | Uploads a file to the ArcRepository from the command line. (Handy for testdata.) | y | |||||
ReestablishAdminDatabase | Reads an old admin.data file. Should be deprecated. | ||||||
ClassDependencies | Non-NAS utility (license is not ours) | ||||||
CreateIndex | CLI to talk to IndexServer via IndexClient | ||||||
RunChecksum | CLI to get all checksums from a Bitarchive (deprecated) | ||||||
SendDedupIndexRequestToIndexserver | Asynchronously starts a dedup indexing on an IndexServer and then exits. (Question for Tue Hejlskov Larsen: is this what you use to generate deduplication indexes?) | ||||||
MakeIndex | Runs a CDX extraction on a single file in a remote ArcRepository | ||||||
FindRelevantCrawllogLines | Finds crawl-log lines matching a given domain name in a local metadata file | ||||||
JMXProxy | "This tool will simply reregister all MBeans that matches the given query from the JMX hosts read in settings, using its own platformmbeanserver. It will then wait forever." | ||||||
DeduplicateToCDXApplication | Extracts CDX records for deduplicate annotations from a local crawl log file | ||||||
ResetFailedFiles | Utility for WaybackIndexer to reset files that have failed more than 3 times so they can be retried | ||||||
ARCReaderUtils | Splits an arcfile (not warc) and dumps results to a directory | ||||||
ArcWrap | Creates an arcfile by wrapping a file | ||||||
ExtractCDX | Extracts CDX records, unsorted, from a list of local input arcfiles (not warcs) | ||||||
JMSBroker | Checks that a JMS broker (as specified in NAS settings) is up and running. | ||||||
WriteBytesToFile | Just creates large files full of null bytes | ||||||
FTPValidator | Tests whether an FTP server configuration in a NAS settings file points to a NAS-compliant FTP server. | ||||||
ArcMerge | Merges several arcfiles into one arcfile | ||||||
ArchiveExtractCDX | Extracts CDX records, unsorted, from a list of local input (w)arcfiles | ||||||
WARCExtractCDX | Extracts CDX records, unsorted, from a list of local input warcfiles | ||||||
ReformatTranslationFile | i) reorders a translation file so keys are in the same order as a reference file, and ii) allows the encoding of the output file to be changed | ||||||
MailValidator | Checks the validity of a mail-server configured in NAS settings by sending a test-mail | ||||||
MakeNewMetadataFile | Creates a metadata file. For use when postprocessing fails. Is this used? | ||||||
FindDomainsForCrawllogExtraction | ? | ||||||
CheckDuplicateReduction | Validates deduplication by comparing a crawl log with a collection of arcfiles (not warc) | ||||||
StandaloneApplicationReduced | Creates a standalone NetarchiveSuite in a single JVM | ||||||
MigrateDefaultHarvestDatabase | This just initialises a SiteSection object which is supposed to upgrade the harvest database as a side-effect | ||||||
CreateCDXMetadataFile | Complex tool that takes a set of filenames and runs a batch job to extract the CDX records from each file and pack them into a metadata arc or warc file, one record per input file | ||||||
HarvesterQueueControl | Tool to count the number of messages in a given JMS queue | ||||||
HarvestDatabaseValidator | Validates whether you can connect to the harvest database with the settings in a given settings file | ||||||
HarvestTemplateApplication | Utility for uploading and updating heritrix templates | y (in test) | |||||
CheckDomainCrawltraps | Runs through all domains in the harvest database and checks whether each crawlertrap regexp can validly be included as text-content in an xml document | y | |||||
CheckTrapsInFile | Runs through a list of crawler-trap regexes in a file and checks whether each crawlertrap regex can validly be included as text-content in an xml document | y(?) |