2025-09-02 Statusmeeting
Agenda for the joint NetarchiveSuite teleconference 2025-09-02, 13:00-14:00.
Participants
BNF: Sara, Leslie, Auriane
ONB: Andreas, Antares
KB/DK - Copenhagen: Thomas, Stephen, Tue
KB/DK - Aarhus: Colin
BNE: José, Miguel, Eva
KB/Sweden: Peter, Pär
Update on NAS latest tests and developments
Release of NetarchiveSuite 7.8 (integrating Heritrix 3.10 - June 12th 2025): https://kb-dk.atlassian.net/wiki/spaces/NAS/pages/21664563
Ongoing work at the BnF:
Removal of the ViewerProxy module : https://kb-dk.atlassian.net/browse/NAS-2898
Change default sort order of domain search results : https://kb-dk.atlassian.net/browse/NAS-2899
Sort seed lists alphabetically in the Domain configuration entry/edit form : https://kb-dk.atlassian.net/browse/NAS-2902
Sort the configuration drop-down menu alphabetically in the Focused crawl creation form : https://kb-dk.atlassian.net/browse/NAS-2903
Fix question mark displayed instead of thousands separator in harvest history : https://kb-dk.atlassian.net/browse/NAS-2904
Remote H3 Crawllog viewer getting a 404 with Heritrix 3.10 : https://kb-dk.atlassian.net/browse/NAS-2900
kB vs. URIs labels in Job budget change page : https://kb-dk.atlassian.net/browse/NAS-2901
Question about integrating Heritrix 3.10.1 (https://github.com/internetarchive/heritrix3/releases/tag/3.10.1, July 21st) and 3.10.2 (https://github.com/internetarchive/heritrix3/releases/tag/3.10.2, August 29th)
Status of the production sites
Netarkivet
3rd Broadcrawl 2025 - step 2 started Jun 10, 2025
Quite a few issues with the NAS GUI. A quick fix has been made.
Some of the bigger selective crawls started
Browsertrix
Waiting on MongoDB database size fix in https://github.com/webrecorder/browsertrix/releases/tag/v1.18.1
Facebook-behaviour hopefully coming up
Experimenting with behaviour crawls and a YouTube crawl (logged in; it semi-works, but embedded playback does not always work)
Outreach and more
Netarkivet turns 20 - https://docs.google.com/forms/d/1eMYPX91QnAkjxchddAc2E1Pz0hiLhluVFUSvxzKOWIY/edit You are all welcome.
Brewster Kahle visits KB on Sep 4, 2025 and will give a public lecture organised by the University of Copenhagen and us: https://artsandculturalstudies.ku.dk/research/daloss/events/2025/universal-access-to-all-knowledge-in-europe/
BnF
We're resuming the preparations for our 2025 broad crawl after the summer break. Several tasks are underway, such as updating the NAS and Heritrix versions and setting up the Icelandic JS extractor and the HTTP/2 fetch module.
During July, an IT incident affected our entire production activity. All our crawls were interrupted, and some had to be restarted. This was the case for the Auction houses and the biannual harvests, which were delayed. As a result, the Auction houses harvest ran from mid-June to early August, and the biannual harvest from mid-July to mid-August. The Auction houses harvest archived 88,579,685 URLs and 1.50 TiB of data; the biannual crawl archived 106,307,921 URLs and 4.94 TiB.
ONB
BNE
Over the past two months, we have completed the broad crawls of the .es domain and the co-official languages domains, .gal (Galicia), .cat (Catalonia) and .eus (Basque Country), with the following configuration:
10 hops
Leaving the domain is not allowed
Maximum size per domain: 300 MB
Duration of work: 36 hours
This summer, Spain has experienced an incredible number of fires in different regions, with more than 20 burning at the same time. It has been decided not to create a specific event; instead, the fires are being covered in different collections such as press, environment and climate change, and national politics.
KB-Sweden
Next meetings
October 7th
November 4th
December 2nd (is it OK to postpone this to Tuesday 9th?)
January 6th 2026