

It is advisable to investigate one domain at a time (unless several domains belong to the same website complex).

Quality assurance is an essential activity in the curation of any digital collection. In NetarchiveSuite, the main tool for manual QA is the Viewerproxy. With the Viewerproxy one can:

  1. Browse harvested material from a single harvest job or from a single run of a harvest definition
  2. Collect a list of URLs missed by the harvest, which can be added as seeds to a future harvest

In order to use the Viewerproxy, your web browser needs to be set up to read data from the NetarchiveSuite archive rather than from the live web. The details of the setup depend on precisely how your installation of NetarchiveSuite is configured. You will need to know i) the machine on which the Viewerproxy application is installed, and ii) the port number the Viewerproxy uses. Your web browser should then be set to use this machine and port as a proxy for all requests, except those to the machine where the NetarchiveSuite GUI is running. Most browsers have plugins which can help with managing proxy settings.
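One way to express the "proxy everything except the GUI host" rule is a proxy auto-config (PAC) file, which most browsers accept as an alternative to manual proxy settings. This is only a sketch: the hostnames and port below are placeholders, and you must substitute the Viewerproxy machine, its port, and the GUI host from your own installation.

```javascript
// Hypothetical PAC file for Viewerproxy browsing. All hostnames and the
// port are placeholders -- replace them with the values from your own
// NetarchiveSuite installation.
function FindProxyForURL(url, host) {
  // Requests to the machine running the NetarchiveSuite GUI must go
  // directly to the live web, not through the Viewerproxy.
  if (host === "gui.example.org") {
    return "DIRECT";
  }
  // Everything else is answered from the archive via the Viewerproxy.
  return "PROXY viewerproxy.example.org:8070";
}
```

Pointing the browser at a PAC file keeps the exception for the GUI machine in one place, instead of repeating it in each browser's proxy-bypass list.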

To use the Viewerproxy, select any run number or job ID from the list of jobs or harvests and click the link "Select these jobs for QA with viewerproxy". You should see a page which looks like this:

[Screenshot: initial Viewerproxy status page]

If you don't see this page, your proxy setup is not correct. After a while, the page should redirect to something like this:

[Screenshot: Viewerproxy status page after redirect]


Quality assurance is done by browsing the archive for selected domains. For example, simply open another tab and type a URL you expect to be present in the harvest you are studying. The page should be visible and navigable.

The various links under "Missing URL collection" can then be used to help you find material missed during harvesting.

Start collecting URL’s: Starts the collection of URLs. The "Current Viewerproxy status" textbox shows whether the system is currently collecting URLs, and how many URLs have been collected so far.

Stop collecting URL’s: Stops the collection of further URLs. Collection can be stopped at any time.

Clear collected URL’s: Clears the list of collected URLs at any time, e.g. when starting to investigate a new domain. NB! This action cannot be undone.

Show collected URL’s: Displays the list of collected URLs at any time. The list can be copied and manually added to the relevant seed lists for the relevant domains in the harvesting system.
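Before pasting the collected URLs into seed lists, it usually helps to deduplicate them and group them by host, so that each group goes to the matching domain. The sketch below assumes you have saved the "Show collected URL’s" output as plain text, one URL per line; neither the export format nor the function name comes from NetarchiveSuite itself.

```javascript
// Hypothetical post-processing of a collected-URL export: deduplicate the
// URLs and group them by host, ready for pasting into each domain's seed
// list. The one-URL-per-line export format is an assumption.
function groupUrlsByHost(urls) {
  const byHost = new Map();
  for (const raw of urls) {
    const trimmed = raw.trim();
    let host;
    try {
      host = new URL(trimmed).host; // throws on blank or malformed lines
    } catch {
      continue; // skip anything that is not a parseable URL
    }
    if (!byHost.has(host)) byHost.set(host, new Set());
    byHost.get(host).add(trimmed);
  }
  // Convert to a plain object of {host: sorted unique URLs}.
  const result = {};
  for (const [host, set] of byHost) result[host] = [...set].sort();
  return result;
}

// Example with a hypothetical collected-URL list (note the duplicate):
const grouped = groupUrlsByHost([
  "http://example.org/style.css",
  "http://example.org/style.css",
  "http://img.example.org/logo.png",
]);
console.log(grouped);
```

Each resulting group can then be copied into the seed list of the corresponding domain in the harvesting system.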