2025-10-07 Status meeting
Agenda for the joint NetarchiveSuite teleconference 2025-10-07, 13:00-14:00.
Participants
BNF: Sara, Leslie, Auriane
ONB: Andreas, Antares
KB/DK - Copenhagen: Thomas, Stephen, Tue
KB/DK - Aarhus: Colin
BNE: José, Miguel, Eva
KB/Sweden: Peter, Pär
Update on NAS latest tests and developments
Release of Heritrix 3.11 - September 22nd 2025 : https://github.com/internetarchive/heritrix3/releases/tag/3.11.0
Status of the production sites
Netarkivet
3rd broad crawl 2025: finished Sep 29, 2025
Browsertrix
Our local system runs stably after a fix from Webrecorder
Facebook behaviour works partially:
4000+ pages crawled without being logged out
Doesn't go beyond "see more"/scroll on the timeline
No setting for "All comments", as far as I can see
Scoping is a bit tricky on Facebook/groups etc.
No reindexing of JSON as text yet
Will check status with Webrecorder
Experimenting with behaviour crawls and a YouTube crawl (logged in… semi-works, but embedded playback does not always work)
Many of our wishes have already been implemented
Outreach and more
IIPC WAC - Anders will try to do something, maybe a 15 min presentation + Q&A regarding the reactions to climate change project: https://merevandisystemet.dk/
Anders was guest teacher at a class on Digital Heritage at Copenhagen University, Oct 3, 2025
Netarkivet turns 20
Joint presentation: Netarkivet 20 år - Fortid, nutid & fremtid (Netarkivet at 20: past, present & future) - Right click and translate to English or other languages works great :-)
Niels Brügger's presentation:
Photos:
BnF
Following the announcement of the closure of the Typepad blogging platform on September 30th, we launched an emergency crawl for French blogs. We compiled a list of 652 starting URLs from Google search results. The harvest was launched on September 13th. We developed a special system that enables us to successively crawl packets of 100 URLs for each host. Of these, 116 on typepad.com could not be fully archived because access was closed, and 28 on typepad.fr are still being archived.
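As an illustration of the packet approach described above, here is a minimal sketch in Python. It is not the BnF system: the function name packets_by_host and the sample hosts are hypothetical, and it assumes that "packets of 100 URLs for each host" means each host's URLs are split into consecutive groups of at most 100.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def packets_by_host(urls, packet_size=100):
    """Group URLs by host, then split each host's list into packets.

    Hypothetical helper: returns a list of (host, [urls]) tuples, each
    holding at most `packet_size` URLs, in first-seen host order.
    """
    by_host = defaultdict(list)
    for url in urls:
        # hostname is None for malformed URLs; bucket those under "".
        host = urlsplit(url).hostname or ""
        by_host[host].append(url)

    packets = []
    for host, host_urls in by_host.items():
        for start in range(0, len(host_urls), packet_size):
            packets.append((host, host_urls[start:start + packet_size]))
    return packets


# Made-up sample data: 250 URLs on one host, 30 on another.
sample = [f"https://a.typepad.com/post/{i}" for i in range(250)]
sample += [f"https://b.typepad.fr/post/{i}" for i in range(30)]
sample_packets = packets_by_host(sample)
```

Each packet could then be submitted as one crawl job, with the next packet for the same host started only after the previous one finishes.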
We are continuing preparations for our upcoming 2025 broad crawl. We have just launched our test broad crawl this week with 2600 URLs per domain for a little more than 6 million starting domains. At this stage the latest technical tests are successful.
After 8 months, the last harvest around the 80th anniversary of the French liberation was launched on September 1st and ended on September 23rd. We have a total of 362 selections. 23,251,453 URLs were archived for 1.26 TiB.
ONB
BNE
We have completed the preprocessing and preparation of URLs and are now ready to launch the electronic serials crawl. The crawler will use a maximum crawl depth of 10 hops and a time limit of 36 hours per domain. Each domain will have a maximum size limit of 1.5 GB. A total of 10 000 online journals will be harvested.
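The limits above give a rough upper bound on storage, sketched below in Python. The constant and function names are hypothetical, and the sketch assumes one domain per journal and decimal units (1 TB = 1000 GB), neither of which is stated in the notes.

```python
# Limits as stated in the notes; the names are illustrative only.
MAX_HOPS = 10            # maximum crawl depth
TIME_LIMIT_HOURS = 36    # time limit per domain
SIZE_LIMIT_GB = 1.5      # size limit per domain
JOURNALS = 10_000        # online journals to harvest


def worst_case_storage_tb(domains, size_limit_gb=SIZE_LIMIT_GB):
    """Upper bound on storage in TB if every domain hits its size limit.

    Assumption: one domain per journal, and 1 TB = 1000 GB.
    """
    return domains * size_limit_gb / 1000


upper_bound_tb = worst_case_storage_tb(JOURNALS)
```

Under these assumptions the bound is 15 TB; the actual harvest will most likely be smaller, since many journals will not reach the per-domain limit.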
KB-Sweden
Next meetings
November 4th
December 9th
January 6th 2026