This is a quick two-page overview of the software and its major features. It is intended for anyone interested in the software.

Introduction

The primary function of the NetarchiveSuite is to plan, schedule and archive web harvests of parts of the internet. We use Heritrix3 as our web crawler.
NetarchiveSuite was released in July 2007 as Open Source under the LGPL license and is used by the Danish organization Netarkivet.dk. This organization has been using NetarchiveSuite since July 2005 to harvest Danish websites, as authorized by the latest Danish Legal Deposit Act.

...

The NetarchiveSuite is split into four main modules: one module with common functionality, and three modules corresponding to the processes of harvesting, archiving and accessing, respectively.

[Diagram: NetarchiveSuite overview]

The Common Module

The Common module contains the framework and utilities used by the whole suite, such as exceptions, settings, messaging, file transfer (RemoteFile), and logging. It also defines the Java interfaces used for communication between the different modules, so that alternative implementations can be substituted.
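
The idea of defining interfaces in a common module, so that implementations can be swapped, can be illustrated with a minimal sketch. Everything named here (FileTransfer, LocalCopyTransfer, NetworkTransfer, fromSetting, and the setting values) is hypothetical and chosen only for illustration; it is not the actual NetarchiveSuite API, and the real RemoteFile interface may look quite different.

```java
import java.util.Map;
import java.util.function.Supplier;

public class CommonModuleSketch {

    /** Hypothetical interface; NOT the actual NetarchiveSuite RemoteFile API. */
    interface FileTransfer {
        /** Returns a description of how the named file would be transferred. */
        String describe(String fileName);
    }

    /** Hypothetical implementation that would copy files on a shared file system. */
    static class LocalCopyTransfer implements FileTransfer {
        @Override
        public String describe(String fileName) {
            return "local-copy:" + fileName;
        }
    }

    /** Hypothetical implementation that would send files over the network. */
    static class NetworkTransfer implements FileTransfer {
        @Override
        public String describe(String fileName) {
            return "network:" + fileName;
        }
    }

    /** Maps a (hypothetical) setting value to the implementation it selects. */
    static final Map<String, Supplier<FileTransfer>> IMPLEMENTATIONS = Map.of(
            "local", LocalCopyTransfer::new,
            "network", NetworkTransfer::new);

    /** Picks an implementation by setting value; callers see only the interface. */
    static FileTransfer fromSetting(String settingValue) {
        Supplier<FileTransfer> supplier = IMPLEMENTATIONS.get(settingValue);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown implementation: " + settingValue);
        }
        return supplier.get();
    }

    public static void main(String[] args) {
        // The calling code depends only on FileTransfer, not on a concrete class.
        FileTransfer transfer = fromSetting("local");
        System.out.println(transfer.describe("example.warc"));
    }
}
```

Because callers depend only on the interface, changing the setting value would switch the implementation without touching the calling module, which is the kind of decoupling the paragraph above describes.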

...