Goals
The goals of this mini-project were:
- To test openwayback (beta) to see if it actually functions with our archive
- To learn how to configure and deploy openwayback to operate with a locally-mounted archive
- To test the performance of openwayback on a locally-mounted archive
Installation
Cloning and building wayback from its GitHub distribution (git@github.com:iipc/openwayback.git) was painless. As with older distributions of wayback, the accepted installation procedure is:
- Deploy the wayback war-file on a running tomcat and wait for it to be unpacked
- Shut tomcat down
- Replace wayback's config files with your own
- Restart tomcat
This is no more acceptable as a deployment procedure now than it ever was. However, there is now an example using maven overlays (https://github.com/iipc/openwayback-sample-overlay), which should be investigated as a better way to create a directly deployable package.
Wayback was deployed on a tomcat 7.0.52 instance running at scape@iapetus:csr/tomcat on port 6051.
Initially I tried deploying to the web-context "/wayback" but was unable to get it to work, so I switched to deploying to the ROOT context by renaming wayback.war to ROOT.war in the webapps directory.
Configuration
A wayback AccessPoint requires various elements, of which the most important is a WaybackCollection. This in turn consists of a ResourceIndex, a ResourceStore, and one or more Shutdownables.
ResourceIndex
Fortunately, we already had about 6TB of cdx index files available on isilon, covering netarkivet up to early 2011. I set up an rsync process to copy the remaining files over. (See http://noerdroid.blogspot.dk/2013/06/getting-from-to-c-via-b-with-ssh.html for more info.) The full index is now around 8.3TB. The files are in scape@iapetus:/home/scape/scape-hdfs/csr/
(Copying the files over highlighted a problem with our naming procedure for wayback index files. We use rollover naming, as one does for logfiles. But this means that filenames are reused, so when one returns to the copy after a period of time it can be difficult to know which files have already been copied over.)
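One way around the reused-filename problem would be to track copied files by content rather than by name. The following is only a sketch in Python under that assumption, not something we ran: the function names and the idea of keeping a set of SHA-256 digests are hypothetical, and hashing several terabytes of index files would itself take considerable time.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_still_to_copy(source_dir: Path, copied_digests: set[str]) -> list[Path]:
    """List files in source_dir whose content has not been copied yet.

    copied_digests holds the digests of files copied on earlier runs, so a
    file that has merely been renamed by rollover is not reported again.
    """
    pending = []
    for path in sorted(source_dir.iterdir()):
        if path.is_file() and file_digest(path) not in copied_digests:
            pending.append(path)
    return pending
```

With this approach a file that has rolled over from one name to the next is still recognised as already copied, whereas a file that is still being appended to will show up as pending until it stops changing.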
ResourceStore
This is the main difference from the current production installation of wayback. The entire netarkiv arcrepository is mounted under /home/scape/netarkiv. One defines a ListFactoryBean containing a DirectoryResourceFileSource pointing to this directory. The ResourceStore used is a LocationDBResourceStore, which has a reference to this ListFactoryBean.
Shutdownables
The Shutdownables to be configured are just those which monitor the ResourceStore for new archive files and add them to the relevant database.
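To make the idea concrete, the monitoring can be pictured as a periodic scan that records any archive file not yet known. The sketch below is a conceptual model in Python, not wayback's actual implementation: the function name, the list of file suffixes, and the assumption that the database maps a file name to its full path are all mine.

```python
from pathlib import Path

# Assumed archive file suffixes; adjust to whatever the archive contains.
ARCHIVE_SUFFIXES = (".arc", ".arc.gz", ".warc", ".warc.gz")


def scan_for_new_files(root: Path, location_db: dict[str, str]) -> list[str]:
    """Add archive files under root that are not yet in location_db.

    location_db maps a file name to its full path (a stand-in for the
    real database). Returns the names added during this scan.
    """
    added = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name.endswith(ARCHIVE_SUFFIXES):
            if path.name not in location_db:
                location_db[path.name] = str(path)
                added.append(path.name)
    return added
```

The first scan of a large mounted archive has to visit every file, so it takes far longer than later scans, which only add whatever has appeared since.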
Startup
When wayback is restarted, the ResourceIndex is immediately functional and it is possible to test it by searching in the wayback web-UI. This worked out of the box. Clicking on a link to a search result initially failed, simply because the ResourceStore had not had time to build up its initial database of all archive files. After a few hours this was complete. At that point, loading the page in wayback gave a server error, with a stack trace in the tomcat logs.
Bugfixing