2012-03-20 Status meeting

Agenda for the joint BNF, ONB, SB and KB NetarchiveSuite tele-conference on March 20th, 2012, 13:00-14:00.

Søren is on vacation.

Practical information

  • Skype conference
    Mikis will set up the Skype conference at 13:00.
  • TDC tele-conference (if a Skype conference cannot be established):
    The BridgeIT conference will be available about 5 minutes before the start of the meeting.


  • BNF: Sara and Nicholas
  • ONB: Michaela and Andreas
  • KB: Tue, Søren and Jonas
  • SB: Colin and Mikis
  • Any other issues to be discussed in today's tele-conference?

Follow-up to workshop

  • Actions from last meeting.
    • Sara will post to the curator mailing list about the new content.
    • Mikis and Sara will look into defining general Jira usage. Karen will be contacted afterwards for an analysis of NASC-17@jira.
    • (Sara) The wiki content on templates and crawler traps isn't finished yet, although some content has been added.

IIPC GA in Washington

  • Contributions:
    • NetarchiveSuite session, Thursday 9:00-12:30.
      • The agenda will be sent to IIPC on Friday so that Birgit and Bjarne have time to comment.
      • Mikis will be responsible for the first bullet.
      • Sara will take care of the second.
      • In the third bullet:
        • Bjarne will represent DK
        • Michaela will probably represent ONB
        • BnF will be represented by ???
    • Jhonas presentation, Tuesday 15:00-15:15

      Nicholas will be responsible for the presentation. A more informative title is needed for the agenda; Mikis will send one to Sara, who will forward it to IIPC.

Jhonas workshop at KB in April

  • The BnF team will arrive on April 1st and will therefore be ready for an early start on Monday. Let's start at 9:00.
  • The BnF team will leave in the early afternoon on April 3rd.
  • We'll try to arrange a social dinner on Monday evening.

See the NAS WARC workshop agenda.

Iteration 50 (3.19 Development release) (Mikis)

  • 3.19.0 release test
    • Code freeze is in effect.
    • Sanity test is in progress.
    • Full test should be ready to start in a day or two.

Status of the production sites

  • Netarchive - technical update (Mikis):
    • Step 2 of the current broad crawl has been stopped because of a critical bottleneck in the 3.18 system (NAS-2051@jira). The 3.18.3 release should fix this.
  • Netarchive - curator update (Sabine):
    • We started broad crawl no. 1/2012 on 2012-02-22; the first step has a limit of 10 MB per domain.
    • We are experimenting with downloading and archiving YouTube videos using Greasemonkey for Firefox. We downloaded the user script ‘Download YouTube Videos as MP4’ from http://userscripts.org/scripts/show/25105. The “operation” seems to be successful. The big challenge is how users can view these video files from the archive.
    • We have started an event harvest covering a right-wing extremist demonstration that will take place in Aarhus on 2012-03-31.
  • BNF - curator update (Peter):

    Harvest for the 2012 Presidential and Parliamentary Elections

    • We have already started harvesting websites for the elections. As the candidates make heavy use of social networks, we carried out special analyses of Facebook and Twitter. We had no problems with Facebook, but Twitter was a real nightmare because of redirections and the # in its URLs. Fortunately, we managed to harvest it four times per day thanks to a special profile (without the string "Mozilla" in the user agent). However, we have not yet solved the problem of accessing it in the Wayback Machine.
    • We have also focused on videos, especially on the Dailymotion platform. Using a BeanShell script, we succeeded in crawling more than 17,000 videos in two days. We'll use the same solution for our big Dailymotion harvest at the end of the month.
  • ONB:
    • The domain crawl has finished. We will now recrawl some jobs that broke because of server crashes. Afterwards we will start analysing the data.
    • The literature crawl is still on hold following complaints and discussion about the legal deposit legislation (for political reasons).
    • Michaela is back from holiday and will seek permission to travel to the IIPC GA.

Date for next joint tele-conference

  • April 24th, 13:00-14:00.

Any other business