Test basic start and stop of selective, event and snapshot harvesting, scheduling and deduplication.

Check of monitoring and basic settings

Check that:

  • No error messages are received at start up
  • The monitoring works
  • The database works and contains test data

Do the following in a browser:

Start Program

  1. Go to http://$GUIadminserver:$http-port/HarvestDefinition/ where GUIadminserver and http-port are specified in the deploy configuration file under the application named dk.netarkivet.common.webinterface.GUIApplication. In the one-machine setup (deploy_example_one_machine.xml) the link will be: http://localhost:8074
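
As a minimal sketch of how the address in the step above is put together, the helper below builds the GUI URL from a host name and an HTTP port. The function name `build_gui_url` and the port range check are assumptions made for illustration; they are not part of NetarchiveSuite or its deploy tooling.

```python
def build_gui_url(host: str, port: int, path: str = "/HarvestDefinition/") -> str:
    """Return the admin GUI URL for a given host and HTTP port.

    Hypothetical helper: host and port stand in for the GUIadminserver and
    http-port values read from the deploy configuration file.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid HTTP port: {port}")
    return f"http://{host}:{port}{path}"


if __name__ == "__main__":
    # One-machine setup values taken from the step above.
    print(build_gui_url("localhost", 8074))
```

For the one-machine setup this yields the localhost address on port 8074 with the HarvestDefinition path appended.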

Check that JMX works (Partially tested by SystemOverviewTest#generalTest)

  • Click 'Systemstate'->'Overview of the system state'
  • Check that all internally developed applications are up and running
  • Check that the last status message for each application does not contain errors or warnings
  • Check that there are no empty log messages
  • Click on a physical location in the 'Location' column, e.g. "K"
  • Check that you now only see relevant SW applications for the chosen organisation
  • Click 'Show all' in the 'Location' header
  • Check that you return to the full listing again
  • Repeat the above 4 steps for "Machine", "HTTP Port", "Application", "Instance Id", "Priority", "Use Replica" and "Index" ("Index" shows all log lines in the given application log)

The Instance ID column in the System overview GUI is a technical suffix to the Application column that distinguishes multiple applications of the same type on the same server. If there is only one such application on a server, it will normally be empty. If there is more than one application of the same type on the same server, a suffix (an Instance ID) must be added. It is user-defined (in the deploy script) and must be unique.

  • Click 'Systemstate' -> 'Overview of the system state'
  • Check that you are back to the full overview with log line 0
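
The uniqueness rule for Instance IDs described above can be sketched as a small check over the rows of the system overview. This is only an illustration: the function `find_instance_id_clashes`, the tuple layout and the sample machine name are assumptions, not an existing NetarchiveSuite API.

```python
from collections import Counter


def find_instance_id_clashes(rows):
    """Return the (machine, application, instance_id) keys that occur more than once.

    rows is an iterable of (machine, application, instance_id) tuples, with an
    empty string where no Instance ID is set. Hypothetical layout, for illustration.
    """
    counts = Counter(rows)
    return sorted(key for key, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    # Hypothetical overview rows: two applications of the same type on one
    # machine must be told apart by their Instance ID.
    rows = [
        ("example-machine", "GUIApplication", ""),
        ("example-machine", "ExampleApplication", "high"),
        ("example-machine", "ExampleApplication", "low"),
    ]
    print(find_instance_id_clashes(rows))
```

An empty result means every application instance on each machine can be identified unambiguously.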

Check that basic database data is present

  • Click 'Definitions'->'Find Domain(s)'
  • Search for =netarkivet.dk= by writing this text and click 'Search'
  • Check that the GUI returns a result-set of one, namely the domain =netarkivet.dk=
  • Click on the link =netarkivet.dk=, and the page for domain =netarkivet.dk= should be shown without errors
  • Click 'Edit' on the configuration line for defaultconfig
  • Check that Name is "defaultconfig"
  • Check that Harvest template is "default_orderxml"
  • Check that Maximum number of objects is "2.000"
  • Check that Maximum number of bytes is "500.000.000"
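
The two limits above are shown with "." between groups of three digits. If the values ever need to be compared with plain numbers, a sketch like the following could convert them. It assumes that "." is used purely as a thousands separator in the GUI; `parse_grouped_int` is a hypothetical helper name.

```python
def parse_grouped_int(text: str, separator: str = ".") -> int:
    """Convert a digit-grouped string such as "500.000.000" to an int.

    Assumption: the separator only ever marks groups of three digits.
    """
    groups = text.strip().split(separator)
    if not all(g.isascii() and g.isdigit() for g in groups):
        raise ValueError(f"not a grouped integer: {text!r}")
    if len(groups) > 1:
        # The first group has 1 to 3 digits; every later group has exactly 3.
        if len(groups[0]) > 3 or any(len(g) != 3 for g in groups[1:]):
            raise ValueError(f"bad digit grouping: {text!r}")
    return int("".join(groups))


if __name__ == "__main__":
    print(parse_grouped_int("2.000"))
    print(parse_grouped_int("500.000.000"))
```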

Check that mail recipients specified in the start of this test receive no error mails

  • Check that there are no mails with error messages about non-existing files
  • Check that there are no mails with error messages about applications that could not be started

Running selective harvest

Partially tested by SelectiveHarvestTest

  1. Go to http://$GUIadminserver:$http-port/HarvestDefinition/ where GUIadminserver and http-port are specified in the deploy configuration file under the application named dk.netarkivet.common.webinterface.GUIApplication. In the one-machine setup (deploy_example_one_machine.xml) the link will be: http://localhost:8074
  2. Make a new selective harvest definition with a name you can remember
    1. Click 'Definitions'->'Selective Harvests' in the left menu
    2. Click 'Create new harvest definition' at the bottom of the main window
    3. Fill in the Harvest name and note the name for later use (from now on referred to as <sh. name>)
    4. Choose "Once_a_week" in the drop down list for 'Schedule'
    5. Write =netarkivet.dk= in the 'Enter Domain...' window and click 'Add domains'
    6. If =netarkivet.dk= is unknown (i.e. not registered in the domain table), the button "Create and add to the harvest definition" is added to the page, and you then need to click on this button.
    7. Click 'Save'
  3. Activate the selective harvest
    1. Click 'Activate' in column 5 on the line with the <sh. name>
    2. Check that the time in the 'Next Run' column on the line with the <sh. name> is now.
  4. Check harvest status of the selective harvest
    1. Click 'Harvest status'->'All Jobs' in the left menu
    2. Select "All" in the "Only display job status" menu to the right
    3. Click the "Show" button until the <sh. name> appears in a new job line (after approx. one minute)
    4. Check that the job has status "NEW"; it may already have changed to status "SUBMITTED" or "STARTED" before you see it.
  5. Check job creation in the system status for the selective harvest
    1. Click 'Systemstate'->'Overview of the system state'
    2. Find and click 'HarvestJobManagerApplication' in the 'Application' column for the KB kb-test-adm-001
    3. Click 'show all' in the ‘Index’ header
    4. Check that there exists a line with a message starting with "INFO: Created 1 jobs for harvest definition", a later line with "INFO: Job #1 submitted", and after that the line "INFO: Job #1 has been started by the harvester."
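
The last check above is about the order in which the three messages appear in the log. Should the log lines be available as text, the same check could be sketched as follows. The function name and the sample harvest name are made up for illustration; only the message fragments come from the step above.

```python
def messages_in_order(log_lines, expected_fragments):
    """Return True if every fragment is found, each on a later line than the previous one."""
    lines = iter(log_lines)
    # The shared iterator is consumed as we go, so each fragment is only
    # searched for in the lines after the previous match.
    return all(
        any(fragment in line for line in lines)
        for fragment in expected_fragments
    )


EXPECTED = [
    "INFO: Created 1 jobs for harvest definition",
    "INFO: Job #1 submitted",
    "INFO: Job #1 has been started by the harvester.",
]

if __name__ == "__main__":
    # Hypothetical log excerpt; 'my harvest' is a placeholder name.
    sample_log = [
        "INFO: Created 1 jobs for harvest definition 'my harvest'",
        "INFO: Job #1 submitted",
        "INFO: Job #1 has been started by the harvester.",
    ]
    print(messages_in_order(sample_log, EXPECTED))
```

Unrelated lines between the three messages do not affect the result; only their relative order matters.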

Define and run an event harvest

This page describes how to define and run an event harvest. It also tests that seed lists are created from first- and second-level definitions of the domain names.

  1. Make a new selective (event) harvest definition with a name you can remember
    1. Click 'Definitions'->'Selective Harvests' in the left menu
    2. Click 'Create new harvest definition' at the bottom of the main window
    3. Fill in the Harvest name and note the name for later use (from now on referred to as <eh. name>)
    4. Choose "Once_an_hour" in the drop down list for 'Schedule'
    5. Click 'Save' (DO NOT CLICK ACTIVATE YET)
  2. Add seeds to the selective (event) harvest
    1. Click 'Edit' in column 6 on the line with the <eh. name>
    2. Write the list from 'Seed list 1' given below to a file on your desktop (e.g. using Notepad)
    3. Click 'Add seeds from a file' at the bottom of the main page
    4. Click 'Browse' and pick the file with seeds that you just created
    5. Choose "frontpages" in the drop-down list for 'Harvest template'
    6. Click 'Insert'
    7. Now click 'Add seeds'
    8. Choose "frontpages_plus_2levels" in the drop-down list for 'Harvest template'
    9. Write the list from 'Seed list 2' given below (you can cut and paste from this page)
    10. Click 'Insert'
    11. Click 'Save'
  3. Check that the seed lists for the domains in Seed list 1 have changed correspondingly (you have to click on Show unused configurations/seedlists show all)
    1. For each of the domains =raeder.dk=, =statsbiblioteket.dk=, =netarkivet.dk= do:
    2. Click 'Definitions'->'Find Domain(s)'
    3. Search for domain by writing its name as text and click 'Search'
    4. Check that there exists a configuration with the name "<eh. name>_frontpages"
    5. Check that there exists a seed list with the name "<eh. name>_frontpages"
    6. Click 'Edit' in the line with seed list "<eh. name>_frontpages"
    7. Check that the seed list shown corresponds to the seed list for the domain (see below)
    8. Check that the seed lists for the domains in Seed list 2 have changed correspondingly (you have to click on Show unused configurations/seedlists show all)
    9. For the domains =kaarefc.dk=, =netarkivet.dk= do:
    10. Click 'Definitions'->'Find Domain(s)'
    11. Search for the domain by writing its name as text and click 'Search'
    12. Check that there exists a configuration with the name "<eh. name>_frontpages_plus_2levels"
    13. Check that there exists a seed list with the name "<eh. name>_frontpages_plus_2levels"
    14. Click 'Edit' in the line with seed list "<eh. name>_frontpages_plus_2levels"
    15. Check that the seed list shown corresponds to the seed list for the domain (see below)
  4. Activate the harvest
    1. Click 'Definitions'->'Selective Harvests' in the left menu
    2. Click 'Activate' in column 5 on the line with the <eh. name>
  5. Check harvest status of the event harvest using menu "All Jobs"
    1. Click 'Harvest status'->'All Jobs' in the left menu
    2. Select "All" in the "Only display job status" menu to the right
    3. Click the "Show" button until the <eh. name> appears in a new job line (after approx. one minute)
    4. Check that two jobs appear and that they both have Harvest name <eh. name>
    5. Check in the menu "Running jobs" that the jobs appear and that you can go to the Heritrix GUI by clicking on the host link and using the login/password "admin"/"adminPassword"; then close the window again.

Seed list 1 (Harvest template "frontpages"):

http://netarkivet.dk/adgang-for-forskere/
http://netarkivet.dk/index-en.php
http://www.raeder.dk/
http://kb-prod-udv-001.kb.dk/netarchivesuite/clock.php

Seed list 2 (Harvest template "frontpages_plus_2levels"):

http://netarkivet.dk/index-en.php
http://www.kaarefc.dk/
http://www.kaarefc.dk/private/
http://www.kaarefc.dk/wop/

Seed list "<eh. name>_frontpages" for domain =raeder.dk=

http://www.raeder.dk/

Seed list "<eh. name>_frontpages"

http://kb-prod-udv-001.kb.dk/netarchivesuite/clock.php

Seed list "<eh. name>_frontpages" for domain =netarkivet.dk=

http://netarkivet.dk/index-en.php
http://netarkivet.dk/adgang-for-forskere/

Seed list "<eh. name>_frontpages_plus_2levels"

http://netarkivet.dk/index-en.php

Seed list "<eh. name>_frontpages_plus_2levels" for domain =kaarefc.dk=

http://www.kaarefc.dk/
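
The per-domain seed lists above follow from splitting each harvest seed list by the domain of its URLs. A rough sketch of that grouping is shown below. It assumes that the domain is simply the last two labels of the host name, which holds for the .dk hosts used here but not in general, and that may differ from how NetarchiveSuite itself derives domains. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def domain_of(seed_url: str) -> str:
    """Return the last two labels of the URL's host name (simplifying assumption)."""
    host = urlsplit(seed_url).hostname
    if host is None:
        raise ValueError(f"no host in seed: {seed_url!r}")
    return ".".join(host.split(".")[-2:])


def group_seeds_by_domain(seeds):
    """Group seed URLs by domain, keeping the input order within each domain."""
    grouped = {}
    for seed in seeds:
        grouped.setdefault(domain_of(seed), []).append(seed)
    return grouped


SEED_LIST_1 = [
    "http://netarkivet.dk/adgang-for-forskere/",
    "http://netarkivet.dk/index-en.php",
    "http://www.raeder.dk/",
    "http://kb-prod-udv-001.kb.dk/netarchivesuite/clock.php",
]

if __name__ == "__main__":
    for domain, seeds in group_seeds_by_domain(SEED_LIST_1).items():
        print(domain, seeds)
```

The sketch keeps the seeds in input order within each domain, so only the membership of each list, not its order, should be compared with what the GUI shows.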