2024-07-02 Status meeting

Agenda for the joint NetarchiveSuite teleconference 2024-07-02, 13:00-14:00.


  • BnF: Auriane, Sara, Haja, Nola
  • ONB: Andreas, Antares
  • KB/DK - Copenhagen: Anders, Thomas, Stephen, Tue
  • KB/DK - Aarhus: Colin
  • BNE: José, Miguel, Eva
  • KB/Sweden: Peter, Pär

Update on NAS latest tests and developments

Status of the production sites


  • 3rd broad crawl 2024 - step 1 - started
  • Browsertrix
  • Data delivery - a few projects at the moment
  • Web Archives: An Untapped Source Of Smart Data - https://londondataweek.org/events/#web-archives-an-untapped-source-of-smart-data
    • The message we want to convey is that web archives are useful sources of ‘data’. I am aware that ‘data’ means different things to different people. For me, for instance, web archives are a great way to access past web data at scale, that is, large amounts of web data rather than small, well-curated collections. I use such data to answer socio-economic research questions about cities and regions. I am aware that data might have a different meaning for different organisations, and we want to highlight these dimensions. We want to present web archives to a non-specialist audience from this perspective.
      I see these presentations as an opportunity to showcase web archives, their value, and their utility to a lay audience. I am particularly interested in highlighting their untapped potential as a data source. I am also aware that ‘data’ means different things to different people, so I will leave this open to your interpretation.

    • Any input on why web archives are smart data that you would like to pass on via me?


At the BnF, harvesting activity has been particularly valuable and intense since June.
The Elections harvest, which started in March, is still ongoing. Following the dissolution of the National Assembly, a new harvest covering the legislative elections was quickly organized and is due to continue until the end of July.
The Olympic Games harvest started in June and will run until the end of September.
In addition, Olympics and Elections Instagram crawls are launched every week, and an Olympics and Elections Videos crawl has just been launched.

The Auction Houses harvest was launched last week, and our biannual selective harvest will be launched this week.
Finally, we are still preparing our broad crawl and resolving the 404 errors on URLs.



We have started preparing the annual broad crawl of the .es domain. This year the seed list will contain more than 2,100,000 domains. The parameters will be: 500 seeds per job, unlimited space per domain, and a time limit of 36 hours per job.
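
As a rough sizing aid, the parameters above can be turned into a job count. This is only a sketch: the function and constant names are illustrative and not part of NetarchiveSuite. Because the seed list will contain "more than" 2,100,000 domains, the result is a lower bound.

```python
# Rough sizing sketch for the broad crawl parameters quoted above.
# All names here are hypothetical; nothing below is NetarchiveSuite API.

SEED_COUNT = 2_100_000        # lower bound: the list will hold "more than" this
SEEDS_PER_JOB = 500           # seeds per job
JOB_TIME_LIMIT_HOURS = 36     # time limit per job


def jobs_needed(seed_count: int, seeds_per_job: int) -> int:
    """Return the number of jobs needed to cover all seeds (ceiling division)."""
    return (seed_count + seeds_per_job - 1) // seeds_per_job


jobs = jobs_needed(SEED_COUNT, SEEDS_PER_JOB)

# Upper bound on cumulative job-hours if every job runs to its time limit.
# Wall-clock duration depends on how many jobs run in parallel,
# which is not stated here.
max_job_hours = jobs * JOB_TIME_LIMIT_HOURS

print(f"At least {jobs} jobs, at most {max_job_hours} job-hours")
```

With the figures given, this yields at least 4,200 jobs.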

This month, all our efforts will be focused on catalogue and repository migration. We are developing new processes and workflows. We expect to have everything up and running smoothly by September.


Next meetings

  • September 3rd
  • October 1st
  • November 5th
  • December 3rd
  • January 7th 2025

Any other business?