2024-12-03 Statusmeeting
Agenda for the joint NetarchiveSuite teleconference 2024-12-03, 13:00-14:00.
Participants
- BNF: Auriane, Sara, Nola
- ONB: Andreas, Antares
- KB/DK - Copenhagen: Anders, Thomas, Stephen, Tue
- KB/DK - Aarhus: Colin
- BNE: Miguel, Alberto
- KB/Sweden: Peter, Pär
Update on the latest NAS tests and developments
Status of the production sites
Netarkivet
- 4th broad crawl of 2024: step 2 is almost finished and will conclude before the holidays.
- Focus on Browsertrix, paywall sites and Bluesky
- IA-purchased content
- Since March 2024, we have been working on the Danish domains and subdomains we purchased from IA, and the work is almost finished.
- We acquired 700,000 domains/subdomains and have checked them for Danish relevance.
- This check identified 9,676 clean domains and 26,820 subdomains relevant to Denmark.
- We have fed the 26,820 subdomains into Browsertrix via its API. Although a little work remains, this has so far yielded 2.35 TB and over 1 million pages.
- Our proposals for IIPC WAC 2025 in Oslo on NAS/Netarkivet/Nettarkivet & more have been rewritten.
- In the news/seminars/webinars:
- Article about Netarkivet: https://dm.dk/digi/artikler/forskning/netarkivet-indsamler-vores-internethistorie/
- Webinar about Netarkivet, open to all, Jan. 15, 2025: https://dm.dk/digi/arrangementer/0003257/
- DISAPPEARANCE Workshop, Royal Danish Library, December 11-12, 2024: 20 researchers will visit Netarkivet to see the servers and attend a short presentation. Related: https://artsandculturalstudies.ku.dk/research/daloss/
- Presentation at the ODA (Organisation of Danish Archives) annual meeting: The Web Archive at the Royal Library - Background, status quo & insights from an archive with over 46 billion objects
- Pilot project investigating Danes' reactions to climate change: Mere vand i systemet (More water in the system)
- Site launched: https://merevandisystemet.dk/
- Google Doc for nominations, open to all: More Water in the System - Pilot Project Investigating Danes' Reactions to Climate Change - Web Archive - The Royal Library.
BnF
Our broad crawl is still ongoing. The first estimates indicate that the budget should reach around 152 TiB of data.
Our internal harvesting workshop on podcasts started in mid-November. Its goals are to improve the existing process, explore the possibility of adding a new platform to the harvest, and think about a simplified access model for podcasts (in particular through a virtual guided tour).
This week we will launch the "Social movements" and "Solidarity" harvests, with projected budgets of 1.50 TB and 0.80 TB respectively.
Finally, our last Instagram harvest of the year was launched last week. This is the 21st launch of this harvest since the beginning of the year.
ONB
BNE
Last month, two broad crawls were completed: the .gal (Galicia) and .cat (Catalonia) domains, with 8,000 and 84,000 domains collected, respectively. With these two crawls, we have achieved the goal of archiving all four national domains in 2024.
A new collection, focusing on computing and the Internet, has been created to gather information on key topics such as artificial intelligence and cybersecurity. Three universities have collaborated with us to create this collection as part of the agreement we have with REBIUN, the organization that represents university and scientific libraries in Spain.
KB-Sweden
Next meetings
- January 7th 2025
Any other business?
- WAC 2025 proposals: "Past, Present and Future of KB + NB" and "18 years with NetArchiveSuite" have been merged into "Past, Present & Future Of Cross-Institutional Collaboration In Web Archiving: Insights From The Norwegian And Danish Web Archive, The NetArchiveSuite Community & Beyond".
- NAS community meeting in Oslo before the GA+WAC (8-10 April): Monday, April 7th? Full day or afternoon?
- Working group meeting on "Collecting a national domain: practices, limits and challenges" during the GA. Is anyone interested in contributing?