Agenda for the joint NetarchiveSuite teleconference 2025-03-04, 13:00-14:00.
Participants
BNF: Sara
ONB: Andreas, Antares
KB/DK - Copenhagen: Anders, Thomas, Stephen, Tue
KB/DK - Aarhus: Colin
BNE: José, Miguel, Eva
KB/Sweden: Peter, Pär, Johan
Update on NAS latest tests and developments
Preparation of NAS Workshop in Oslo
Agenda proposal: https://docs.google.com/document/d/1eO2cgcfwQ-7BOxrjzSApjvTEgXc-VfCPEgmqyVOriic/edit?usp=sharing
Decision on times: 13:00 - 17:00
Location: at the Conference venue: Henrik Ibsens gate 110, 0255 OSLO.
Status of the production sites
Netarkivet
1st Broadcrawl 2025, step 2: almost finished. Smoother crawl than ever.
Data delivery of all text from the archive, plus some metadata, for a research project is finished: 32 TB compressed.
“Mere vand i systemet” (“More water in the system”) climate change debate project
Proceeding as planned:
Using Browsertrix Cloud to crawl hard-to-get content such as video (YouTube and LinkedIn, logged in) and more.
Waiting on results of Webrecorder's development of a Facebook behaviour (expand comments, view reels/content, etc.) for logged-in crawls.
Lots of experience and findings from using Browsertrix, including live exclusions (text, regex, etc.).
Browsertrix
Lots of updates from Webrecorder, which means issues on local installs. Swift reactions from Webrecorder.
We have 3 instances:
Local:
Devel
Prod (with IP mapped to reach content behind paywalls)
Cloud:
3 TB Pro Plan. The monthly crawl time is a bit challenging.
Solr index: update to new SSD drives.
Outreach and more
BnF
On the occasion of the exhibition "Apocalypse, yesterday and tomorrow", which runs until June 8th at the BnF, we published a new homepage for our Archives de l'internet presenting selections on the Apocalypse and the end of the world on the web. Here is a preview: https://x.com/dlwebbnf/status/1886717418245967885?s=46&t=jbG3gmDk9NL-WihrmL3kRA
We are currently running tests for our next podcast harvest. According to the first estimates, the budget should reach around 20 TB for over 13,500 podcasts.
On the occasion of the launch of the biannual selective harvest, we launched a special crawl of some blog platforms (canalblog, over-blog, etc.). These blogs must be archived according to a particular configuration in order to avoid being dynamically blocked. 263 blogs are archived in this way.
Finally, we also launched a harvest of sites of learned societies of local history. 520 sites from all the French regions are currently being archived.
ONB
BNE
KB-Sweden
Next meetings
March 4th
April 7th (IRL!)
May 6th
June 3rd
July 8th
September 2nd
October 7th
November 4th
December 2nd
January 6th 2026