2014-03-04 Statusmeeting

Agenda for the joint BNF, ONB, SB and KB NetarchiveSuite tele-conference on March 4th 2014, 13:00-14:00.

Practical information

Skype-conference

  • TDC tele-conference:
    • Dial in number (+45) 70 26 50 45
    • Dial in code 9064479#
  • Bridgit: The Bridgit conference will be available about 5 minutes before the start of the meeting. The Bridgit URL is konf01.statsbiblioteket.dk. The Bridgit password is sbview.


  • BNF: Sara
  • ONB: Andreas
  • KB: Tue, Søren and Nicholas
  • SB: Colin, Mikis and Sabine
  • Any other issues to be discussed on today's tele-conference?


  • Plans for Nicolas's last-minute refactoring on the 4.4_branch.
  • 4.4 release status

Planning for GA in Paris (19th to the 23rd May)

How is NAS going to be presented and who will be attending the GA?

NetarchiveSuite workshop 2014

When and where

Status of the production sites


We have almost finished our first broad crawl for 2014. We are busy solving problems that our Heritrix crawls cause for a few website owners.

Often the problem is that Heritrix aggressively invents URLs because of some JavaScript on the given webpage that Heritrix doesn't understand.

We participated in the IIPC event harvest of the Winter Olympics in Sochi, and at the same time we ran our own event harvest of the Olympics.

We are also busy with an event harvest of the Eurovision Song Contest, which will take place in Denmark this year. For this event harvest we are assisted by two researchers who are conducting a research project on the contest. This is the first time researchers have been directly involved in an event harvest from the very beginning.


On the crawling side: we are very active on different selective crawls: ongoing biannual crawls, Winter Olympics, local elections, personal diaries and literature blogs. About 20 jobs are running in parallel.

On the development side, Nicolas left BnF on February 17th. But we already have someone to replace him: Lam Mai, who was already working at BnF on another project. He is currently working on NetarchiveSuite. His first project is our complete move to WARC, from crawling to access.


We also participated in the IIPC event harvest of the Winter Olympics in Sochi and ran our own event harvest of the Olympics. The Paralympics event is also upcoming.

Next meeting


Any other business?