Comment: Reverted from v. 4

We have prepared a bash shell script that starts all the necessary components on one machine. We will use this script throughout this quickstart manual to allow you to get a feel for what the system can do and how it works without having to deal with issues of distributing to other servers.

Base system required

For the quick startup, NetarchiveSuite requires:

  • A Linux system with a minimum of 2GB free disk space. (The minimum disk space can be configured, but this is a reasonable minimum amount of space in which to store the harvested data.) Note that for the quickstart, you must be able to run a browser on the machine that you run the system on. This is an artifact of the quickstart system and is not the case in the full system.
  • Sun/Oracle Java SE (Standard Edition) JDK version 1.6.0_19 (or later) running on the Linux system (32-bit or 64-bit). Newer versions of Sun Java 1.6 will probably work, but have not been tested. Other Java versions such as OpenJDK and Oracle Java 7 are neither tested nor recommended. The latest downloadable version of Sun Java 6 SE is "JDK 6 Update 43" (May 2013).
  • The standard Quickstart setup assumes that there are at least two users defined on the Linux machine. One is your own normal login, and the additional user is named "test". The commands to install NetarchiveSuite are run from your own login. These commands install and run the NetarchiveSuite software under the user "test". This simulates the more realistic production situation where the software runs under various logins on one or more machines in a distributed network. For convenience, it is a good idea to configure the test user to have password-free ssh access, i.e. you should be able to execute "ssh test@localhost" in a shell without entering the test user's password.
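
The disk-space and ssh requirements above can be checked before installing. The following is a minimal sketch and not part of NetarchiveSuite: the function names free_kb, has_free_space and can_ssh_without_password are made up for this illustration, and it assumes the standard df, awk and ssh tools are available.

```shell
# Hypothetical helpers for illustration; not shipped with NetarchiveSuite.

# Print the free space (in kilobytes) of the filesystem holding directory $1.
free_kb() {
    # -P forces the one-line-per-filesystem POSIX format, -k reports kilobytes.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if directory $1 has at least $2 gigabytes free.
has_free_space() {
    [ "$(free_kb "$1")" -ge $(( $2 * 1024 * 1024 )) ]
}

# Succeed if "ssh $1" works without prompting for a password.
can_ssh_without_password() {
    # BatchMode makes ssh fail instead of asking for a password.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}
```

For example, `has_free_space "$HOME" 2 && can_ssh_without_password test@localhost` should succeed on a machine that meets these two requirements.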

Setup JMS

NetarchiveSuite uses Java Messaging Service (JMS) for communication between the different components.

To download and install it, do the following:

Install the OpenMQ broker with:

Code Block
sh mq.sh install

This will download OpenMQ, install it and start it.

By default, OpenMQ is installed to ~/openmq4.5. An alternative installation directory can be chosen by setting the installdir variable before running the mq script, e.g.

Code Block
export installdir="netarchive/openmq" 
sh mq.sh start 

...

  • Root-access is not required to install and run NetarchiveSuite (although you will need root access to create the test-user.)

To check that you have the right version of Java, do the following:

  • Start a terminal and log in to the Linux system as an ordinary user.
  • Check that the Java version is 1.6.0_19 (or higher) by typing:

    Code Block
    $ java -version

    You should then see something like:

    Code Block
    linux>java -version
    java version "1.6.0_19"
    Java(TM) SE Runtime Environment (build 1.6.0_19-b04)
    Java HotSpot(TM) Server VM (build 16.2-b04, mixed mode)
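
The version check above can also be scripted. This is a sketch under the assumption that the first line of the java -version output contains the version in double quotes, as in the sample output; the helper names parse_java_version and version_at_least are hypothetical and not part of NetarchiveSuite.

```shell
# Hypothetical helpers for illustration; not part of NetarchiveSuite.

# Read "java -version" output on stdin and print the version string,
# e.g. 1.6.0_19.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed if version $1 is at least version $2, comparing the numeric
# fields separated by '.' or '_' from left to right.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, "[._]")
        m = split(want, w, "[._]")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? w[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}
```

Since java -version writes to standard error, a possible use is `version_at_least "$(java -version 2>&1 | parse_java_version)" 1.6.0_19`.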
    

Downloading

Download of the newest release is described here

...

  • and put it in the netarchive directory you created earlier.

Note: Instead of downloading a NetarchiveSuite.zip you can also build it yourself from the SVN trunk:

Code Block
$ svn export https://sbforge.org/svn/netarchivesuite/trunk trunk
$ cd trunk
$ ant releasezipball
$ mv NetarchiveSuite.zip ../NetarchiveSuite.zip

Setup JMS

NetarchiveSuite uses JMS for inter-process communication. JMS is the Java Messaging Service, which provides asynchronous communication between processes. You do not need any knowledge of JMS to use NetarchiveSuite. However, you need to make sure that no JMS broker is already running on your system using port 7676.
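
One way to check whether port 7676 is already taken is to try connecting to it. The sketch below is an illustration only: the helper name port_in_use is hypothetical, and it assumes a bash build with /dev/tcp support. If bash is missing or lacks that feature, the function will report the port as free even when it is not.

```shell
# Hypothetical helper for illustration; not part of NetarchiveSuite.

# Succeed if something accepts TCP connections on host $1, port $2.
port_in_use() {
    # /dev/tcp/HOST/PORT is a bash feature, so bash is invoked explicitly.
    # Inside the bash -c script, $0 is the host and $1 is the port.
    bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, `port_in_use localhost 7676 && echo "port 7676 is already in use"` prints a warning when another process is listening there.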

Currently only the open-source version of Sun's JMS implementation is supported, since some functionality of other implementations does not match our assumptions well.

To download and install it, do the following:

  • Create a new directory for the message bus broker in the netarchive directory:

    Code Block
    $ cd ~/netarchive 
    $ mkdir broker
  • Open this link in a browser window: http://mq.java.net/downloads.html
  • Click the Linux link under version 4.5 binary downloads to download the file openmq4_5-binary-Linux_X86.zip (or a later version).
  • Save the downloaded file to the broker directory.
  • Unpack the zip file.
  • Set the necessary environment variables IMQ_HOME, IMQ_VARHOME and IMQ_ETCHOME:

    Code Block
    $ export IMQ_HOME=$HOME/netarchive/broker/mq
    $ export IMQ_VARHOME=$IMQ_HOME/var
    $ export IMQ_ETCHOME=$IMQ_HOME/etc

     

  • Run imqbrokerd in order to create the settings file:

    Code Block
    $ chmod +x $IMQ_HOME/bin/imqbrokerd
    $ $IMQ_HOME/bin/imqbrokerd 
  • Check that imqbrokerd starts and that the last message is "Broker <localhost>:7676 ready".
  • Stop the broker by pressing Control-C.
  • Edit the settings to allow enough listeners on a queue. The settings file is:

    Code Block
    $IMQ_VARHOME/instances/imqbroker/props/config.properties
    
  • Uncomment the listener limit and set it to 20 by changing the line

    Code Block
    #            imq.autocreate.queue.maxNumActiveConsumers

    to

    Code Block
                imq.autocreate.queue.maxNumActiveConsumers=20
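
The edit of config.properties described above could also be done non-interactively. The following is a sketch, assuming the property appears in the file on a line of its own, commented out or not, as in the example; the helper name set_max_consumers is hypothetical and not part of OpenMQ or NetarchiveSuite. If the line is absent, the file is left unchanged.

```shell
# Hypothetical helper for illustration; not part of OpenMQ or NetarchiveSuite.

# Set imq.autocreate.queue.maxNumActiveConsumers to $2 in properties file $1,
# uncommenting the line if needed. A backup is kept as "$1.bak".
set_max_consumers() {
    sed -i.bak \
        "s/^[#[:space:]]*imq\.autocreate\.queue\.maxNumActiveConsumers.*/imq.autocreate.queue.maxNumActiveConsumers=$2/" \
        "$1"
}
```

A possible call, with the broker stopped, is `set_max_consumers "$IMQ_VARHOME/instances/imqbroker/props/config.properties" 20`.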

To start it, do the following:

Code Block
$ cd netarchive
$ $IMQ_HOME/bin/imqbrokerd &

Installation

Download the following files to the netarchive directory:

The first script is a simple script for doing all the steps during deployment. It takes a NetarchiveSuite package ('.zip'), a configuration file (the second file), and a temporary installation directory as arguments (in the given order). The different ports used by the application for communication are included in the deploy_standalone_example.xml file.

In the configuration file all the applications are placed on one machine, the current machine (localhost).

When the installation script is run, it will unpack the installation files into the netarchive/deploy directory and install NetarchiveSuite into the /home/test/QUICKSTART directory (using ssh). Remember to check that a Sun JVM is in the path for the test user. If you already have a Quickstart installation, the existing bitarchive, database and admin.data files will be left untouched. If you want a clean, empty installation, you must explicitly remove any previous installation.

Code Block
$ chmod +x RunNetarchiveSuite.sh
$ ./RunNetarchiveSuite.sh NetarchiveSuite.zip deploy_standalone_example.xml deploy/

Note that if you have not set up automatic (key-based) ssh login for the test user, you will need to enter the test user's password several times before the installation finishes successfully. You must also have permission to ssh and scp to test@localhost (try e.g. ssh test@localhost).

The script creates a deployment folder named "QUICKSTART", e.g. /home/test/QUICKSTART, which contains scripts for starting and stopping NetarchiveSuite, and then starts the whole NetarchiveSuite. The files used to run the installation are placed in the ~/netarchive/deploy directory.

  • Start a web browser. It is important that the browser is started on the same machine that the simple harvest script is run on.
  • Set up the browser to use a proxy on port 8070 and to exclude localhost and the hostname (used by the Heritrix GUI), e.g. in Firefox:
Code Block
Choose in the Firefox toolbar:
Edit->Preferences->Advanced->Network->Settings
Checkmark:
Manual Proxy Configuration
and add:
Proxy: localhost
Port: 8070
No Proxy for: localhost
  • Enter the following URL in the browser: http://localhost:8074/HarvestDefinition
  • You can now see the web interface in the browser. You can now create, run and browse according to the following sections or the User Manual.
  • You can stop and start the entire NAS system with:
Code Block
ssh test@localhost
cd QUICKSTART
./conf/killall.sh
./conf/startall.sh
  • If you want to try other deploy examples, then go to "Examples of deploy configuration files" in the Installation Manual.