2013-05-14 Statusmeeting

Agenda for the joint BNF, ONB, SB and KB NetarchiveSuite tele-conference on May 14th, 2013, 13:00-14:00.

Practical information

Skype-conference

  • TDC tele-conference:
    • Dial in number (+45) 70 26 50 45
    • Dial in code 9064479#
  • BridgeIT: The BridgeIT conference will be available about 5 minutes before the start of the meeting. The Bridgit URL is konf01.statsbiblioteket.dk. The Bridgit password is sbview.

Participants

  • BNF: Sara
  • ONB: Andreas
  • KB: Tue, Søren and Nicholas
  • SB: Colin, Mikis and Sabine
  • Any other issues to be discussed on today's tele-conference?

Development

  • 4.1 release test in progress.
  • Are there plans at BnF for implementing HTTPS support in Wayback?
  • Heritrix3 investigation by Søren.

Curator roadmap

 

IIPC GA

Summary (Sara and Tue).

Status of the production sites

Netarkivet

Since the end of March, our main focus has been an event crawl of one of the biggest lockouts in Danish history: all Danish school teachers were locked out from the 1st of April. This event had a rather big impact on Danish society. As school children could not go to school, their parents had to take care of them. Some were able to take their children to their workplace; others took holidays. Our government has just passed a law that ended the lockout.

One of the big issues in the event crawl was harvesting YouTube videos, which is still a fairly manual process. Our documentation of the procedure and the tool used is not finished; we will put it on the wiki when it is ready.

Furthermore, we finished our first broad crawl for 2013 in the middle of April. We harvested about 32 TB/740 million objects. The 2nd broad crawl has been started, this time with WARC.

BNF

During March we finished our first semestrial crawl of the year. This represents the second-largest part of our focused crawls after the annual crawl, which will take place in May.

We have also been working to improve our crawl of videos on Dailymotion by stopping Heritrix from collecting multiple copies of the same videos. We will let you know the results once the crawl is complete.

ONB

  • We have been working on our politics collection, because we have already had some regional elections this year and will have parliamentary elections in fall 2013. After attending the Twittervane workshop at the IIPC GA, I wanted to use Twittervane for the selection of sites. My experience was that the tool works, but the Austrian Twitter community was not very active prior to the elections and only started tweeting on the day of the event. It was interesting to analyze the tweets afterwards, but it was not helpful for selecting sites before the event.
  • Domain crawl stage 1 has been finished; stage 2 will be started soon.
  • We got new hardware and switched to NAS 4.01 for the new servers.

 

Next meeting

June 18th 13-14??

Any other business?