
Tools in Wayback Module

In addition to the tools described here, the NetarchiveSuite Java applications for continuous indexing of an ArcRepository are described in the Configuration Manual.

...

Wayback is a tool for browsing web archives. It can be downloaded from http://archive-access.sourceforge.net/projects/wayback/. The NetarchiveSuite plugin for wayback is the class NetarchiveResourceStore, which implements org.archive.wayback.ResourceStore. NetarchiveResourceStore opens a connection to a NetarchiveSuite ArcRepository and retrieves archive data from it through NetarchiveSuite.
To use the plugin, you must:

  • Copy the required jar files into the lib directory of your wayback installation.
  • Ensure that wayback has access to a NetarchiveSuite settings file containing the necessary connection information.
  • Configure wayback to use NetarchiveResourceStore.

The lib directory for wayback will be under

...

Code Block
<bean id="localcdxcollection" class="org.archive.wayback.webapp.WaybackCollection">
    <!-- Fetch archived records from a NetarchiveSuite ArcRepository -->
    <property name="resourceStore">
        <bean class="dk.netarkivet.wayback.NetarchiveResourceStore">
        </bean>
    </property>

    <!-- Look up captures in local, sorted CDX files -->
    <property name="resourceIndex">
        <bean class="org.archive.wayback.resourceindex.LocalResourceIndex">
            <property name="source">
                <!-- Combine several CDX files into one index source -->
                <bean class="org.archive.wayback.resourceindex.CompositeSearchResultSource">
                    <property name="CDXSources">
                        <list>
                            <value>index1.cdx</value>
                            <value>index2.cdx</value>
                        </list>
                    </property>
                </bean>
            </property>
            <property name="maxRecords" value="40000" />
        </bean>
    </property>
</bean>

This example configures a local CDX collection, but the plugin should also work with other types of wayback collection.
An Ant build file can be used to repack the wayback war file with the NetarchiveSuite plugin added. The Ant tasks to unpack and repack the wayback war file are in wayback.build.xml, and sample settings are in ''examples/wayback/*''.
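What repacking amounts to can be pictured with a minimal Java sketch. It assumes that repacking means copying every entry of the original war file and adding the plugin jars under WEB-INF/lib/, the standard library directory of a Java web application. The class WarRepack and its repack method are hypothetical and are not part of NetarchiveSuite or its Ant build. The Ant tasks in wayback.build.xml remain the documented route.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of NetarchiveSuite: rebuilds a war file with
// extra jars added under WEB-INF/lib/ (assumed target directory).
public class WarRepack {

    private static final String LIB_DIR = "WEB-INF/lib/";

    public static void repack(Path sourceWar, Path targetWar, List<Path> jars) throws IOException {
        // Names of the entries the new jars will occupy in the target war.
        Set<String> jarEntries = new HashSet<>();
        for (Path jar : jars) {
            jarEntries.add(LIB_DIR + jar.getFileName());
        }
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(sourceWar));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(targetWar))) {
            // Copy the original entries, skipping any that a new jar replaces.
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (jarEntries.contains(entry.getName())) {
                    continue;
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                in.transferTo(out); // reads only the current entry's data
                out.closeEntry();
            }
            // Add the plugin jars.
            for (Path jar : jars) {
                out.putNextEntry(new ZipEntry(LIB_DIR + jar.getFileName()));
                Files.copy(jar, out);
                out.closeEntry();
            }
        }
    }
}
```

Rebuilding the archive entry by entry leaves the original war untouched. An older copy of a plugin jar with the same name is also replaced, not duplicated.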

...

Note the syntax of the regular expression, which selects all arcfiles generated by job 1042 ''except'' the metadata arcfiles. The generated CDX files are unsorted. For use in wayback they must be sorted and merged, e.g. using unix sort:

...
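The sort-and-merge step can also be sketched in Java, for example where unix sort is not at hand. This is a minimal illustration, not part of NetarchiveSuite: the class CdxSortMerge and its sortAndMerge method are hypothetical. The sketch assumes the CDX files are small enough to sort in memory and contain plain ASCII lines. For such lines, Java's String ordering matches the byte order of LC_ALL=C sort.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NetarchiveSuite: merges unsorted CDX
// files into one sorted CDX file, holding all lines in memory.
public class CdxSortMerge {

    public static void sortAndMerge(List<Path> inputs, Path output) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path input : inputs) {
            for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
                if (!line.isEmpty()) { // drop blank lines
                    lines.add(line);
                }
            }
        }
        // Natural String order compares UTF-16 code units; for ASCII lines this
        // is the same as the byte order used by LC_ALL=C sort.
        lines.sort(null);
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.write('\n'); // unix line endings, as sort would produce
            }
        }
    }
}
```

For large indexes unix sort is the better choice. It can spill to temporary files instead of holding every line in memory.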

Code Block
java -cp dk.netarkivet.wayback.jar dk.netarkivet.wayback.DeduplicateToCDXApplication crawl1.log crawl2.log crawl3.log > out.cdx