Agenda for the joint NetarchiveSuite teleconference 2024-05-07, 13:00-14:00.

Participants

  • BnF: Auriane, Nola, Sara, Haja
  • ONB: Andreas, Antares
  • KB/DK - Copenhagen: Anders, Thomas, Stephen, Tue
  • KB/DK - Aarhus: Colin
  • BNE: José, Miguel, Eva
  • KB/Sweden: Peter, Pär

Update on NAS latest tests and developments

Everybody should check that they have access to the new wiki and that their login works.

Status of the production sites

Netarkivet

  • Toke
  • Letting all the great knowledge and impressions from IIPC WAC 2024 in Paris sink in.
  • 2nd broad crawl 2024, step 2
    • Running as planned
    • Investigating status code 555, mainly from 2023 and 2024
      • "555 Security Incident Detected Your request was blocked. If you are the owner of the website: The Website Application Firewall that is protecting your website has blocked this request for being suspicious. You can see the detailed reason for this in your webserver logs. If you are the visitor: The public IP address assigned to you, by your internet provider, might be suffering from poor reputation: Look up IP reputation here. IP addresses from VPN providers or public networks often have poor reputation."
  • Still testing an on-site installation of Browsertrix. Upgraded to the latest version but forgot to update the crawler to a version later than 1.1.0 (necessary for the QA functions).
  • SolrWayback as a search & discovery tool for researchers working with web archive collections: workshop at DHNB 2024, Iceland, with Jon from Nettarkivet, Norway https://www.conftool.org/dhnb2024/index.php?page=browseSessions&form_session=94&presentations=sho
  • https://github.com/netarchivesuite/solrwayback/releases/tag/5.1.0 
    • Substantial speed-up when exporting (CSV, WARC, etc.) from large multi-sharded collections; see #329. This feature still needs a little more testing, so feedback is welcome.
  • Progress on data delivery and legal matters.
  • Part of a project with KU and others: "På randen af litteraturhistoriens digitale afgrund" (translated: "On the brink of the digital abyss of literary history")
    • Crowdsourcing some parts (donations? crawls?) may be an option; this needs further investigation.
  • New consortium accelerates Danish language models

BnF

The main news at the BnF in April was the organization of the IIPC WAC 2024 conference at the end of the month. Through about fifty presentations, the event covered a wide range of subjects: digital preservation, tools and workflows, search and access, and artificial intelligence and machine learning.
The conference opened with a panel devoted to the archiving of Skyblogs by the BnF and INA, and the closing keynote on artificial intelligence was given by Benoît Sagot, holder of the annual chair of Informatics and Digital Sciences at the Collège de France.

The event was a great success and helped to strengthen the links between the members of the consortium.

Our annual selective harvest is still in progress and should last until the end of May. We set up special queue management for the blogs (Over-blog, Canalblog) because we usually run into blacklisting problems with them.

The Olympic and Paralympic Games will take place in Paris this year, so we plan to launch a dedicated harvest in June that will run until mid-September. The crawl frequency will be monthly at first, then bimonthly, with weekly crawls.

Finally, we have started preparing our 2024 broad crawl and have requested the various lists from the registrars.

ONB


BNE


KB-Sweden


Next meetings

  • June 4th
  • July 2nd
  • September 3rd
  • October 1st
  • November 5th
  • December 3rd
  • January 7th 2025

Any other business?

