2020-10-06 Status meeting

Agenda for the joint NetarchiveSuite teleconference 2020-10-06, 13:00-14:00.


  • BNF: Clara, Sara
  • ONB: Andreas
  • KB/DK - Copenhagen: Tue, Stephen, Anders 
  • KB/DK - Aarhus: Kristian, Colin
  • BNE: Alicia, José Carlos, María
  • KB/Sweden: Pär, Peter

Join from PC, Mac, Linux, iOS or Android:


Or an H.323/SIP room system:

    Meeting ID: 104 443 571

    SIP: 104443571@

Or Skype for Business (Lync):


Or Telephone:

Denmark: +45 89 88 37 88 or +45 32 71 31 57
United Kingdom: +44 203 051 2874 or +44 203 481 5237 or +44 203 966 3809 or +44 131 460 1196
Finland: +358 9 4245 1488 or +358 3 4109 2129
Sweden: +46 850 539 728 or +46 8 4468 2488
Norway: +47 7349 4877 or +47 2396 0588
US: +1 669 900 6833 or +1 646 558 8656
    Meeting ID: 104 443 571

    International numbers available: https://zoom.us/u/acRu0MV3xJ

You can join the meeting using the apps on a PC, tablet or smartphone, or use the browser-based version (it works with recent versions of Chrome or Firefox).

Update on the latest NAS tests and developments

Any feedback on NAS 6.0?

Status of the production sites


Broad crawl
Step 2 is proceeding very well.

Event crawl
We decided to continue the event crawl on the coronavirus in Denmark, but with a lower frequency, a greatly reduced number of 0-hop sites, and minimal curatorial activity.

Alexandre, trainee
Alexandre has arrived and is up and running, working remotely from Copenhagen with the rest of the team. We have almost finished the introduction programme and are looking into which tasks will bring the most value to Alexandre, Netarkivet and BnF.

IT University of Copenhagen
The collaboration with the IT University of Copenhagen is moving forward.

We have experimented with capturing embedded video content, and so far the results are very good (except that WARC validation fails on revisit records).

We are working on finalizing a workflow from Webrecorder/Conifer.org to Netarkivet. Being able to validate WARC files correctly is a big part of achieving the right level of preservation (we use JWAT for this), but it is somewhat complicated; see for instance this OPF blog post by Remco van Veenendaal from the Netherlands: https://openpreservation.org/blogs/warc-validation-tool-experiences/

(How) are you validating WARC files? And what are your plans for this going forward?


Our annual broad crawl will be launched this Tuesday, 6 October. This will be our first broad crawl with the new NAS version, which includes the official IIPC Heritrix 3, and we expect more efficient crawling. We have reduced the maximum number of URLs per domain to 2000 (down from 2500 last year) and expect to harvest between 110 and 115 TB.

Next week, we will put into production a new version of our public GUI "Archives de l'internet", making the video channels harvested in July available to readers. These channels cover the COVID-19 outbreak and the French local elections. The new version also includes a mechanism for displaying Instagram account pages in our web archive and browsing the posts with Picuki directly from the Instagram page. Finally, this new version gives access to 3 paid online newspaper titles, harvested with authentication.



Event Crawl

We are still working on the coronavirus collection, and are now focusing on audio and video content.

Serials Broad Crawl

At the beginning of the year we launched a broad crawl of open-access electronic serials, and we are now focusing on quality assurance of this collection.


We are having some problems with the old version of BCWeb, so we hope this will speed up the installation of the new version, 6.1, which we already have available.

IT Department

The new head of the IT department has just arrived.


Next meetings

  • November 3, 2020
  • December 8, 2020
  • January 5, 2021

Any other business?
