2025-11-04 Statusmeeting

Agenda for the joint NetarchiveSuite teleconference 2025-11-04, 13:00-14:00.

Participants

  • BNF:  Sara, Leslie, Auriane

  • ONB: Andreas, Antares

  • KB/DK - Copenhagen: Thomas, Stephen, Tue

  • KB/DK - Aarhus: Colin

  • BNE: José, Miguel, Eva

  • KB/Sweden: Peter, Pär

Update on NAS latest tests and developments

Status of the production sites

Netarkivet

  • 1st broad crawl 2025: step 1 is finished; step 2 will start as soon as possible

  • Event crawl on the municipal elections (Kommunalvalg 2025)

  • Browsertrix

    • Working on API transfer - also from cloud

    • Focus on custom behaviours

  • Data extraction

    • WARC files with some domains excluded, produced via https://github.com/iipc/jwarc

    • Researchers will build their own SolrWayback using the WARC-files

    • Early ARC files are being used: with JWARC and some fixes, more content is available. Indexing of some ARC files stopped due to errors in WARC-indexer.

  • Outreach and more

BnF

We are completing the final steps before the launch of our 2025 broad crawl, which is expected to start this week. We will crawl 2,500 URLs per domain, and the projected budget is 189 TiB.

Following the shutdown of the http://typepad.fr servers, the harvest ended on October 14th. At that point, 118 queues remained to be crawled for the http://typepad.com domain and 6 queues had not finished for the http://typepad.fr domain. In total, the harvest lasted one month, and we archived over 1 million URLs, totalling 154 GB of data.

ONB

 

BNE

On November 13 and 14, at the 1st Virtual Conference on Legal Deposit Experiences in the National Libraries of Ibero-America, organized by ABINIA, we managed to include a roundtable titled "Challenges and Opportunities of Legal Deposit in the Digital Environment" and a workshop on "Implementing a Web Archive from Scratch". In the first sessions, we will discuss the lessons learned from the Spanish Web Archive; afterwards, there will be two sessions, one focused on Heritrix and the other on Browsertrix.

We have completed the annual broad crawl of open-access journals and will now begin analysing the results. The crawl produced around 5 TB of data, and we are currently preparing for quality assurance (QA).

KB-Sweden

 

Next meetings

  • December 9th

  • January 6th 2026

Any other business?

  •