Tools in Wayback Module
In addition to the tools described here, the NetarchiveSuite Java applications for continuous indexing of an arcrepository are described in the Configuration Manual.
...
Wayback is a tool for browsing web archives. It can be downloaded from http://archive-access.sourceforge.net/projects/wayback/. The NetarchiveSuite plugin for wayback is a class, NetarchiveResourceStore, which implements org.archive.wayback.ResourceStore. NetarchiveResourceStore instantiates a connection to a NetarchiveSuite ArcRepository and retrieves archive data from it via NetarchiveSuite.
In order to make use of the plugin, it is necessary to:
- Copy the required jar files into the lib directory of your wayback installation.
- Ensure that wayback has access to a NetarchiveSuite settings file with the necessary connection information.
- Configure wayback to use NetarchiveResourceStore.
The lib directory for wayback will be under
...
Code Block |
---|
<bean id="localcdxcollection" class="org.archive.wayback.webapp.WaybackCollection"> <property name="resourceStore"> <bean class="dk.netarkivet.wayback.NetarchiveResourceStore"> </bean> </property> <property name="resourceIndex"> <bean class="org.archive.wayback.resourceindex.LocalResourceIndex"> <property name="source"> <bean class="org.archive.wayback.resourceindex.CompositeSearchResultSource"> <property name="CDXSources"> <list> <value>index1.cdx</value> <value>index2.cdx</value> </list> </property> </bean> </property> <property name="maxRecords" value="40000" /> </bean> </property> </bean> |
but should work with other types of wayback collection.
An ant build file, wayback.build.xml, provides tasks to unpack the wayback war-file and repack it with the NetarchiveSuite plugin added. Sample settings are in examples/wayback/*.
...
Note the syntax of the regular expression, which selects all arcfiles generated by job 1042 except for metadata arcfiles. The generated cdx files are unsorted. For use in wayback they must be sorted and merged, e.g. using unix sort:
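As a minimal sketch of such a sort-and-merge step, the following uses hypothetical file names (part1.cdx, part2.cdx, index.cdx) and simplified placeholder lines rather than real cdx records. It also assumes that byte-order (C locale) collation is what the wayback index lookup expects, which should be checked against your wayback version.

```shell
# Hypothetical sample data: two small unsorted files standing in for generated cdx files.
# Real cdx lines carry more fields; these placeholders only illustrate the ordering.
printf 'b.example/ 20090101000000\na.example/ 20090101000000\n' > part1.cdx
printf 'c.example/ 20090101000000\na.example/ 20080101000000\n' > part2.cdx

# Passing several files to sort both sorts and merges them into one output.
# LC_ALL=C forces byte-order collation (an assumption about what wayback needs),
# so the result does not depend on the locale of the machine running the command.
LC_ALL=C sort part1.cdx part2.cdx > index.cdx
```

If the individual files are already sorted, `sort -m` merges them without re-sorting, which may be faster for large indexes.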
...
Code Block |
---|
java -cp dk.netarkivet.wayback.jar dk.netarkivet.wayback.DeduplicateToCDXApplication crawl1.log crawl2.log crawl3.log > out.cdx |