Note that this documentation is for the coming release, NetarchiveSuite 7.4, and is still a work in progress.
For documentation on the released versions, please view the previous versions of the NetarchiveSuite documentation and select the relevant version.
System Design
- Colin Samuel Rosenthal
This document describes the underlying design of the NetarchiveSuite software. It does not describe how to install, run, or use NetarchiveSuite; for that, see the Installation Manual and the User Manual.
The first section gives an overview, and the remainder of the document gives more details about the design.
The code is available through our releases and download page or from our GitHub repository.
Contents
- Overall Systems Design
- Settings
- Localization
- JMS Channels
- JSP
- Pluggable parts of NetarchiveSuite
- XML handling by Deploy
- Archive Design: An overview of how the archive works, including indexing and caching and the CrawlLogIndexCache.
- Harvester design
- ViewerProxy Design: Describes the viewerproxy control resolver, the special viewerproxy access via URLs, and the observer resolver.
- Index Server Design
Audience
The reader is expected to be familiar with Java programming and have an understanding of the core issues involved in large-scale web harvesting. Previous use of Heritrix is a definite plus, and an elementary understanding of SQL databases is required for some parts.