
Agenda for the joint NetarchiveSuite tele-conference 2023-03-07, 13:00-14:00.

Participants

  • BnF: Auriane, Sara, Clara, Nola
  • ONB: Andreas
  • KB/DK - Copenhagen: Anders, Thomas, Stephen, Tue
  • KB/DK - Aarhus: Colin
  • BNE: José, Miguel
  • KB/Sweden: Peter, Pär, Jonas

Update on NAS latest tests and developments


Status of the production sites

Netarkivet

  • Broad Crawl going great
  • In March and April we will focus on Browsertrix.
  • A data dump of all text from Netarkivet, for a research project on building a new Danish language model, is in the works.
  • Small organisational change in Copenhagen: the Section Manager at Digital Cultural Heritage changes to Head of Department at the Digital Transformation department.
  • Anders visited Nettarkivet in Oslo to see their world premiere of researcher access to web archive data. https://dhnb.eu/conferences/dhnb2023/workshops/the-norwegian-web-archive-searching-and-examining-the-web-of-the-past/
    • They used Pywb 2.6 but will move to the much better 2.7.x soon.
    • They had a prototype free-text search based on natural language extracted from HTML: https://github.com/nlnwa/fulltekstsok
    • I showed the organisers SolrWayback. It will fulfil many of the researchers' wishes that came up during the workshop and save them development time. They need to index 1.8 PB of data, though.
    • Nettarkivet uses the browser-based crawler Veidemann for all their crawls, but I'm not sure of the scale (I will check). They have a legal deposit law but don't get a complete TLD list as KB does from DK Hostmaster.
    • They want to work together more.
  • ...

BnF

First of all, this week we are launching our first internal harvesting workshop of the year. Until March 31st, our team will experiment with Browsertrix on different types of websites. Within this framework we will also test the harvesting of social networks.

Following the TikTok crawl launched in 2022 on the theme of the elections, we are going to launch our first current TikTok harvest this month.
198 TikTok accounts or tags have been selected so far.

On March 13, there will be an exchange day on the results and future prospects of the ResPaDon project, the aim of which is "to set up a network about web archives". The day will be held at the BnF and broadcast live on YouTube.

ONB


BNE


KB-Sweden


Next meetings

  • April 11th
  • May 9th
  • June 6th
  • July 4th
  • September 5th
  • October 3rd
  • November 7th
  • December 5th
  • January 9th 2024

Any other business?

