2026-01-13 Status meeting
Agenda for the joint NetarchiveSuite teleconference 2026-01-13, 13:00-14:00.
Participants
BNF: Sara, Leslie, Auriane
ONB: Andreas, Antares
KB/DK - Copenhagen: Thomas, Stephen, Tue, Anders
KB/DK - Aarhus: Colin
BNE: José, Miguel, Eva, Alberto
KB/Sweden: Peter, Pär
Newcomers to the NAS meetings from KB/DK
Ergin Kilic, erki@kb.dk
Lars Næsbye Christensen, lnch@kb.dk
Halina Holst, halh@kb.dk
Bolette Ammitzbøl Jurik, baj@kb.dk
Rasmus Pihl, rapi@kb.dk
Update on the latest NAS tests and developments
Latest release: NetarchiveSuite 8.0, including Heritrix 3.12.1, released 2025-11-05: https://kb-dk.atlassian.net/wiki/x/AYBsPg
Latest BnF fix to NAS: https://kb-dk.atlassian.net/browse/NAS-2905
BnF code contributions to the main Heritrix repository (including the JSON parser) are included in Heritrix 3.13.0: https://github.com/internetarchive/heritrix3/releases/tag/3.13.0
Status of the production sites
Netarkivet
The first broad crawl of 2026 will start soon
NAS/Heritrix
Looking into how a huge list of domains/IPs can be used: https://ip.thc.org/
Mostly for finding relevant domains outside of .dk
We have updated the default template with many patterns that prevent 404s. Thanks to Kris for starting this list and making it possible.
SolrWayback 5.4.0 release: https://github.com/netarchivesuite/solrwayback/releases/tag/5.4.0
Greenland event harvest started (Facebook, X, Sotwe, foreign news and more)
Browsertrix
A general epic is on the way for the collection team:
As Netarkivet, we want Browsertrix in production with an end-to-end workflow, so that Netarkivet can collect and preserve dynamic web data stably and scalably.
Looking at performance and transfer issues
Focus on custom behaviours
Outreach and more
WEB CHILD
WARC files for AU
Two former KB employees will work on the project, including SolrWayback efforts
3 workshops during 2026
First release of data from KB for training Danish language models
Announcement on sprogteknologi.dk: "First release of data for training Danish language models" (Danish: "Første udgivelse af data til træning af danske sprogmodeller")
Next up: more web archive data. The project's aim is 300 billion words (Netarkivet contains around 8,000 billion words).
BnF
There is a change in the team: our colleague Anaïs Crinière-Boizet becomes harvesting manager and assistant head of the digital legal deposit service.
Our 2025 broad crawl ended in mid-December. We archived nearly 3 billion URLs over five and a half weeks, for a total size of 185.11 TiB.
After a break of almost a year due to being blacklisted, we tried a new video harvest for the AlgoJO research project, which aims to measure YouTube's algorithmic recommendations during the Olympic Games. 2,614 channels and more than 5,643 videos were archived, for a total size of 500 GB.
ONB
BNE
A new event collection has been created to harvest the elections to the Assembly of Aragon (northeast Spain), in collaboration with a web archivist from the region.
In coordination with the BNE, the Castilla-La Mancha Library has launched a pilot project to introduce the Spanish Web Archive in public schools and universities. The first session took place in December at a secondary school in Toledo (Biblioteca Nacional de España Facebook post: "¡Muchas gracias,..." ["Thank you very much,..."]). In spring 2026 we plan to hold a workshop at the University of Castilla-La Mancha.
KB-Sweden
Next meetings
February 3rd
March 3rd
April 7th
May 5th
June 2nd
July 7th
September 1st
October 6th
November 3rd
December 1st
January 5th 2027