2012-06-26 Status meeting

Agenda for the joint BNF, ONB, SB and KB NetarchiveSuite tele-conference on June 26th 2012, 13:00-14:00.

Practical information

  • BNF: Nicholas
  • ONB: Michaela and Andreas
  • KB: Tue, Søren and Nicholas
  • SB: Colin, Mikis and Sabine
  • Any other issues to be discussed on today's tele-conference?

Heritrix 3 in NetarchiveSuite

  • Issue for planning: NAS-2066 Heritrix roadmap Workshop.
  • How do we coordinate the joint effort?
  • When do we participate and where?
    Shortly after the NAS workshop at SB in September?

Sara and Mikis will discuss this further.

JhoNAS status

I accidentally made this project status some days ago.

Iteration 51 (3.20 Production release) (Mikis)

We are planning a code freeze in a day or two, with the release at the end of next week.

Everybody is committed to the release test.

BnF would like to have two extra features included in the 3.20 release (NAS-2069 Allow an alternative job generation algorithm & Retired queues). It will not be possible to include these features in the initial 3.20 release, as this would prevent us from having the release ready before the summer vacation sets in. We will instead plan for a 3.20.1 release that includes these two features.

Status of the production sites

  • Netarkivet:

We are using 3.18.3 and have almost finished the second broad crawl of 2012.
About 28 TB has been harvested and about 18 TB uploaded this time (about 500-900 GB/day).
We have activated 30 concurrent broad crawl harvesters during the crawl.
After the manual startup I have had no issues with the system.
We plan to upgrade to NAS 3.20 in mid-August.
We now also harvest facebook.com comments, even though we cannot see them in viewer-proxy or in wayback.
We have started our own event harvest on the Olympics 2012. At the same time we are participating in nominating URLs for the IIPC Olympics harvest.

  • BNF:
Peter (July 2012)

- We're looking at ways of doing a crawl of blog platforms to complement our domain crawl. Since we only take 10 000 URLs per domain in the domain crawl, these platforms are under-represented. We're just beginning tests on identifying and collecting blogs on a selection of the most important platforms, and we'll let you know how it goes.

- In July we have three thematic crawls: on French government publications, US government publications (as part of the IDEA project to "dematerialise" exchanges of these publications between libraries) and the Olympic Games (as part of the IIPC collaborative crawl).

  • ONB:
Currently we are testing a NAS installation on behalf of our IT department, using version 3.18.3. After these tests we will start to crawl all academic and government websites.

Date for NAS workshop at SB

Beginning of September.

Mikis will send a mail requesting preferred dates.

Date for next joint tele-conference.

  • August 14th, 13:00-14:00.


Any other business