2025-10-07 Statusmeeting

Agenda for the joint NetarchiveSuite teleconference 2025-10-07, 13:00-14:00.

Participants

  • BNF:  Sara, Leslie, Auriane

  • ONB: Andreas, Antares

  • KB/DK - Copenhagen: Thomas, Stephen, Tue

  • KB/DK - Aarhus: Colin

  • BNE: José, Miguel, Eva

  • KB/Sweden: Peter, Pär

Update on NAS latest tests and developments

Status of the production sites

Netarkivet

  • 3rd Broadcrawl 2025 - finished Sep 29, 2025

  • Browsertrix

    • Our local system runs stable after fix from Webrecorder

    • Facebook-behaviour working in some ways:

      • +4000 pages crawled without being logged out

      • Doesn't go beyond “see more”/scroll on the timeline

      • No setting of All comments - as far as I can see

      • Scoping is a bit tricky on Facebook/groups etc.

      • No reindexing of JSON as text yet

      • Will check status with Webrecorder

    • Experimenting with behaviour crawls and a YouTube crawl (logged in; semi-works, but embedded playback does not always work)

    • Many of our wishes have already been implemented

  • Outreach and more

    • IIPC WAC - Anders will try to do something, maybe a 15 min presentation + Q&A regarding the reactions to climate change project: https://merevandisystemet.dk/

    • Anders was guest teacher at a class on Digital Heritage at Copenhagen University, Oct 3, 2025

    • Netarkivet turns 20

      • Joint presentation: Netarkivet 20 år - Fortid, nutid & fremtid - Right click and translate to English or other languages works great:-)

      • Niels Brügger's presentation:

      • Photos:

        IMG_2025.JPEG
        IMG_2009.JPEG
        IMG_1955.JPEG
        IMG_1933.JPEG
        IMG_1909.JPEG
        IMG_1915.JPEG
        IMG_1897.JPG
        IMG_1892.JPG
        IMG_1977.JPEG
        IMG_1896.JPEG
        2 (1).jpg
        11 (1).jpg
        10 (1).jpg
        5 (1).jpg

BnF

Following the announcement that the Typepad blogging platform would close on September 30th, we launched an emergency crawl of French blogs. We compiled a list of 652 starting URLs from Google search results, and the harvest was launched on September 13th. We developed a special system that lets us crawl each host successively in packets of 100 URLs. 116 http://typepad.com blogs could not be fully archived because access was closed, and 28 typepad.fr blogs are still being archived.
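As an illustration only, the packet approach could look roughly like the sketch below. It is not BnF's actual system: the function name `packets_by_host` and the reading "group URLs by host, then cut each host's list into packets of at most 100" are assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

PACKET_SIZE = 100  # packet size mentioned in the minutes


def packets_by_host(urls, packet_size=PACKET_SIZE):
    """Yield (host, packet) pairs, each packet holding at most packet_size URLs.

    Hypothetical helper: URLs are grouped by host first, then each host's
    list is cut into consecutive packets, preserving the input order.
    """
    by_host = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or ""
        by_host[host].append(url)
    for host, host_urls in by_host.items():
        for start in range(0, len(host_urls), packet_size):
            yield host, host_urls[start:start + packet_size]
```

Each yielded packet could then be handed to the crawler in turn, so that no host receives more than 100 URLs per run.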

We are continuing preparations for our upcoming 2025 broad crawl. This week we launched our test broad crawl, with 2600 URLs per domain for a little more than 6 million starting domains. At this stage, the latest technical tests are successful.

After 8 months, the last harvest around the 80th anniversary of the French liberation was launched on September 1st and ended on September 23rd. We have a total of 362 selections. In all, 23,251,453 URLs were archived, amounting to 1.26 TiB.

ONB

 

BNE

We have completed the preprocessing and preparation of URLs and are now ready to launch the electronic serials crawl. The crawler will use a maximum crawl depth of 10 hops and a time limit of 36 hours per domain. Each domain will have a maximum size limit of 1.5 GB. A total of 10 000 online journals will be harvested.
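For readers who want the three per-domain limits in one place, here is a minimal sketch. It is not the actual crawler configuration: `DomainLimits` and `within_limits` are hypothetical names, and 1.5 GB is assumed to mean decimal gigabytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainLimits:
    """Per-domain limits as stated in the minutes (names are hypothetical)."""
    max_hops: int = 10               # maximum crawl depth in hops
    max_hours: float = 36.0          # time limit per domain
    max_bytes: int = 1_500_000_000   # 1.5 GB, assuming decimal gigabytes


def within_limits(limits, hops, elapsed_hours, bytes_fetched):
    """Return True while a domain is still inside all three limits."""
    return (
        hops <= limits.max_hops
        and elapsed_hours < limits.max_hours
        and bytes_fetched < limits.max_bytes
    )
```

A domain stops being crawled as soon as any one of the three limits is reached.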

KB-Sweden

 

Next meetings

  • November 4th

  • December 9th

  • January 6th 2026

Any other business?
