...
Fortunately, we already had about 6TB of cdx index files available on isilon, covering netarkivet up to early 2011. I set up an rsync process to copy the remaining files over. (See http://noerdroid.blogspot.dk/2013/06/getting-from-to-c-via-b-with-ssh.html for more info.) The full index is now around 8.3TB. The files are in scape@iapetus:/home/scape/scape-hdfs/csr/
(Copying the files over highlighted a problem with our naming procedure for wayback index files. We use rollover naming, as one does for logfiles, but this means that filenames are reused. When one returns to the copy after a period of time, it can be difficult to know which files have already been copied over.)
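One way around the reused filenames is to compare files by content rather than by name. The following is a minimal sketch of that idea, not the procedure we actually used: the function names and the two-directory layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_not_yet_copied(source_dir, copied_dir):
    """List source files whose content has no match in copied_dir.

    File names are ignored on purpose, because rollover naming reuses them.
    (Hypothetical helper; both directories are assumed to be flat.)
    """
    copied = {file_digest(p) for p in Path(copied_dir).iterdir() if p.is_file()}
    return sorted(
        p.name
        for p in Path(source_dir).iterdir()
        if p.is_file() and file_digest(p) not in copied
    )
```

Hashing every file is expensive at this scale, so in practice one would probably store the digests alongside the copies rather than recompute them on each run.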
ResourceStore
This is the main difference from the current production installation of wayback. The entire netarkiv arcrepository is mounted under /home/scape/netarkiv. One defines a ListFactoryBean containing a DirectoryResourceFileSource pointing to this directory. The ResourceStore used is a LocationDBResourceStore which has a reference to this ListFactoryBean.
Shutdownables
The Shutdownables to be configured are just those which monitor the ResourceStore for new archive files and add them to the relevant database.
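To make the idea concrete, here is a rough sketch of what one such monitoring pass amounts to. It is not wayback's implementation: a plain dict stands in for the location database, and the function name and the list of file suffixes are assumptions.

```python
import os

# Assumed set of archive file suffixes; adjust to whatever the repository holds.
ARCHIVE_SUFFIXES = (".arc", ".arc.gz", ".warc", ".warc.gz")


def scan_for_new_archives(root, location_db):
    """Walk root once and record any archive file not yet in location_db.

    location_db maps a file name to its full path; a plain dict stands in
    for the real database. Returns the sorted names added during this pass.
    """
    added = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(ARCHIVE_SUFFIXES) and name not in location_db:
                location_db[name] = os.path.join(dirpath, name)
                added.append(name)
    return sorted(added)
```

The first pass has to visit every file under the root, while later passes only add whatever has appeared since, so the initial scan of a large repository is by far the slowest.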
Startup
When wayback is restarted, the ResourceIndex is immediately functional and it is possible to test it by searching in the wayback web-UI. This worked out of the box. Clicking on a link to a search result initially failed, simply because the ResourceStore had not had time to build up its initial database of all archive files. After a few hours this was complete. At that point, loading the page in wayback gave a server error, with a stacktrace in the tomcat logs.
Bugfixing