2011 Workshop Technical Track

Schedule

Day 1 (Thursday 24) - 14:00 - 17:00 Technical Discussions

Location: Tower 3, Level 4, Meeting Room // Chair: Mikis

  • See the ideas for technical discussions below.

Day 2 (Friday 25) - 09:00 - 12:30 Common Curator/Technical Discussions

Location: Tower 3, Level 4, Meeting Room // Chairs: Mikis, Sara

Ideas for Technical Discussions

Using the Bit Repository as archive (Mikis)

In DK we are hoping to move to the Bit Repository project; Mikis will talk a bit about this.

First point of the afternoon, common with Jhonas Track.

Possibilities from Spring (Nicolas)

As part of the work on the BnF curator front-end, Nicolas has worked with the Spring Framework. Are there any obvious places in the NetarchiveSuite system where we could benefit from using Spring?

Nicolas gave an inspiring walkthrough of the Spring experience gained in the BCWeb project, and had a number of proposals for how Spring could be introduced in NAS, thereby improving some of the overly complicated parts of the NAS system.

Nicolas will send Mikis the presentation, perhaps with examples of BCWeb code. Mikis will then add this to the current NAS Spring documentation.

Slides from Nicolas.

Wayback (Mikis)

Let's share any Wayback experiences

  • Are BnF or ONB using Wayback for access?
  • Are they using the NAS Wayback module?

We discovered that we have a lot of experience with the usage of Wayback. To improve the sharing of knowledge about NAS technical aspects in general, Mikis will create a number of wiki pages where usage of NAS can be documented and discussed. This could be a per-module page for each organization describing what is used, why some parts aren't used, experiences, extensions, etc. Mikis will request input when the structure for this has been created.

Free text search (Mikis)

In Denmark we are starting to look into using SOLR for free text searching of our web archives. Do BnF or ONB have any experience with free text searching?

Here again, the different organizations have experiences and plans which might be nice to share, so Mikis will create a wiki context for this and request input from the potential contributors.

Development process (Mikis)

We have had some problems in the run-up to code freezes, where it has been difficult to establish a properly QA'ed codebase. How do we work towards a more robust code freeze?

We skipped this point due to running short on time. The work continues, though, as part of the ongoing development (see Roadmap).

Roadmap (Mikis)

Let's try to define a prioritized list of improvements we would like to include in the development in the near future.

Candidates are:

  • Introduction of Spring could be combined with concrete redesign issues (NAS-1829 or NAS-1859), but this very much hinges on dedicated resources being available (Nicolas).
  • Automatic system testing and Maven migration are planned for the coming iteration 50.

Virtualisation (Bert, Christophe)

At BnF, we have set up virtual servers to run harvesters and indexers. Do DK and ONB have experience with, or want to move to, virtualisation?

Virtualisation is not used at the other institutions.

BnF had a number of improvements which it would be nice to have addressed in the development project. Some of these are already solved and will be part of the 3.18.0 release. The last couple of issues should be entered into JIRA.

The BnF presentation can be found here: BnF Crawling Architecture.

Deduplication (Bert, Christophe)

Deduplication processes have been hard to work through and are still a black box at BnF. We would like to exchange experiences about it.

  • BnF has upgraded the hardware used for indexing, which has significantly increased performance.
  • A number of performance improvements in the indexing functionality will be included in the 3.18.0 release.
  • BnF would like to be able to create indexes after a job has finished, instead of the current functionality, where indexes are created when needed. A JIRA issue should be created by BnF for this.

NAS_Preload (Nicolas)

At BnF, we have developed a tool to create a domain/seed list for our snapshot harvest, which analyses DNS and redirections. Is it of interest to anyone? Could it officially be part of NetarchiveSuite?

NAS_Preload could be included in the External Tools. If BnF accepts that NAS_Preload can be included in the NAS OSS, Nicolas can add it. Slides from Nicolas.