2019-11-05 Statusmeeting

Agenda for the joint NetarchiveSuite tele-conference 2019-11-05, 13:00-14:00.


  • BNF: Clara, Sara
  • ONB: Andreas
  • KB/DK - Copenhagen: Tue, Stephen, Anders, Kristian
  • KB/DK - Aarhus: Colin, Sabine, Knud Åge
  • BNE: Alicia, María, Manuel, José
  • KB/Sweden: Par, Thomas, Peter

Update on NAS latest tests and developments

Feedback on usage / tests on NetarchiveSuite 5.6 release: see NetarchiveSuite 5.6 Release Notes

Feedback on tests on BnF test NAS 6.0 + IIPC H3 release: see presentation

Status of the production sites


Because of a political decision to change its terms and conditions, a broadcast station (radio 24/syv) decided to close down on 31 October. The announcement that this popular station would close raised a storm of reactions on social media, and people asked whether KB DK was going to keep the station's archive. We tried to capture as many podcasts as possible with Umbra. For QA of the harvested content we had to wait for an index to be generated (there was a long queue in the index generator), and it was not ready until 4 November.

As Yahoo Groups is closing down as well, we crawled the Danish Yahoo Groups over the last couple of days.

The fourth broad crawl for 2019 is in preparation: step 1 will have a domain limit of 50 MB and step 2 a domain limit of 16 GB. Together with this broad crawl we will run the following selective crawls: Research databases, Municipalities and regions, Ministries and Government Agencies, YouTube.
We will crawl with NAS 5.5 and expect step 1 to last about 2 weeks and step 2 about 6-8 weeks.

Other projects keeping us busy:
• Work on risk assessment
• Implementation of SolrWayback
• Consolidation of BCWeb (build up a community)
• Revision of collection strategies
• Capture of content behind paywalls: the never-ending story



  • We finished our 7th domain crawl, which was again done in a single stage. We crawled 150 million objects totalling almost 7 TB (about 3 TB on disk).
  • We upgraded to NAS 5.6 and are using it in production.
  • We moved the database to a more powerful server, and since then the duplicate job generation error has disappeared. Currently we are running only selective crawls, so we will see how it behaves during a domain crawl.


We have finished our annual broad crawl:

  • It covered about 2 million websites
  • We configured it with a limit of 150 MB per domain
  • It lasted 29 days
  • We crawled 88% of the domains completely

We have some problems with the index, and we have not been able to do QA on some of our content since February.

We are going to install version 6.1 of BCWeb in the coming weeks. After that, we will send you our changes to BCWeb so that you can review them and consider including them in future versions.


Next meetings

  • December 3
  • January 7, 2020

