Panel
  • Toke
  • Letting all the great knowledge and impressions from IIPC WAC 2024 in Paris sink in.
  • 2nd Broadcrawl 2024, step 2
    • Running as planned
    • Investigating status code 555, mainly seen in 2023 and 2024
      • "555 Security Incident Detected Your request was blocked. If you are the owner of the website: The Website Application Firewall that is protecting your website has blocked this request for being suspicious. You can see the detailed reason for this in your webserver logs. If you are the visitor: The public IP address assigned to you, by your internet provider, might be suffering from poor reputation: Look up IP reputation here. IP addresses from VPN providers or public networks often have poor reputation."
  • Still testing the on-site installation of Browsertrix. Upgraded to the latest version but forgot to update the crawler to a version later than 1.1.0 (necessary for the QA functions).
  • "SolrWayback as a search & discovery tool for researchers to work with web archive collections": workshop at DHNB 2024, Iceland, with Jon from Nettarkivet, Norway https://www.conftool.org/dhnb2024/index.php?page=browseSessions&form_session=94&presentations=sho
  • https://github.com/netarchivesuite/solrwayback/releases/tag/5.1.0 
    • Substantial speed-up when exporting (CSV, WARC, etc.) from large multi-shard collections. See #329. This feature still needs a little more testing. Feedback is welcome.
  • Progress on data delivery and legal matters.
  • Part of a project with KU and others: "På randen af litteraturhistoriens digitale afgrund" (translated as "On the brink of the digital abyss of literary history")
    • Crowdsourcing some parts (donations? crawls?) may be an option; this needs further investigation.
  • New consortium accelerates Danish language models

...

Panel

The main news at the BnF in April was the organization of the IIPC WAC 2024 conference at the end of the month. Across about fifty presentations, the event addressed a range of subjects: digital preservation, tools and workflows, search and access, and artificial intelligence and machine learning.
The conference opened with a panel devoted to the archiving of Skyblogs by the BnF and INA, and the closing keynote on artificial intelligence was given by Benoît Sagot, holder of the annual chair in Informatics and Digital Sciences at the Collège de France.


The event was a great success and helped to strengthen the links between the members of the consortium.

Our annual selective harvest is still in progress and should last until the end of May. We decided to set up dedicated queue management for the blog platforms (over-blog, canalblog) because we usually run into blacklisting problems with them.

The Olympic and Paralympic Games take place in Paris this year, so we plan to launch the corresponding harvest in June; it will run until mid-September. The frequency will be monthly at first, then bimonthly, with weekly crawls.

Finally, we have started the first preparations for our 2024 broad crawl and have requested the various lists from the registrars.

ONB

Panel


BNE

Panel

This month, we are working on three election events at the same time, including the European Parliament election.

The broad crawl of the .eus domain (Basque Country) has finished, and we are about to start the harvest of open access reviews that we carry out annually. The idea is to add the OpenWayback links to the catalog each year for the reviews that do not yet have them.

We are preparing an in-person workshop for collaborating web curators from the different regional conservation centres, which will be held at the BNE in September. It will focus on legal deposit, and especially on web archiving; it is the first one we have held since before the pandemic.

KB-Sweden

Panel


Next meetings

...