2014-01-14 Statusmeeting

Agenda for the joint BNF, ONB, SB and KB NetarchiveSuite tele-conference on January 21st 2014, 13:00-14:00.

Practical information

Skype-conference

  • TDC tele-conference:
    • Dial in number (+45) 70 26 50 45
    • Dial in code 9064479#
  • Bridgit: The Bridgit conference will be available about 5 minutes before the start of the meeting. The Bridgit URL is konf01.statsbiblioteket.dk. The Bridgit password is sbview.

Participants

  • BNF: Sara and Nicolas
  • ONB: Andreas
  • KB: Tue, Søren and Nicholas
  • SB: Colin, Mikis and Sabine
  • Any other issues to be discussed on today's tele-conference?

Development

  • Planning of developer resources 2014:
    • DK: 0.8 SB (Colin + Mikis), 1.0 KB from 3.3 (Nicholas) + SVC in support. 1 associated SB resource supporting SB-related activities (H3?).
    • BnF:??
    • ONB:??
  • 4.4 release status 

Curator roadmap

Status of the production sites

Netarkivet
  • We finished our fourth broad crawl for 2013 on 2013-12-27
  • We started an event harvest of MGP (Melodi Grand Prix, Eurovision de la Chanson), which will take place in Denmark in 2014
  • We are harvesting very big Danish sites, such as the sites of the Danish broadcasters DR and TV2. Our broad crawls do not capture these sites completely because of the domain limits, so we crawl them separately about four times a year.
  • We are harvesting ministries and government administration sites for the same reason as the very big sites
  • We are looking forward to trying out the extended fields for our documentation :-)

 

BnF
We finished our 2013 broad crawl at the beginning of January, with a total of 1,628 jobs in NetarchiveSuite. We obtained a volume of 56.2 TB, which is 70% more than last year: there were no elections in 2013, so we were able to allocate some of the budget for focused crawls to the broad crawl. We obtained 1.7 billion harvested URLs. We had some difficulties with the technical infrastructure, but we maintained a good harvesting speed of 8 URLs per second.

 

ONB
  • The domain crawl was finished at the end of November. We are now working on the CDX index and reports
  • Running Collection on Media & Politics Sites

Next meeting

 

Any other business?