2021-04-06 Statusmeeting

Agenda for the joint NetarchiveSuite tele-conference 2021-04-06, 13:00-14:00.

Participants

  • BNF: Auriane, Clara
  • ONB: Andreas
  • KB/DK - Copenhagen: Tue, Stephen, Anders
  • KB/DK - Aarhus: Colin
  • BNE: José, Alicia
  • KB/Sweden: Pär, Peter

Update on NAS latest tests and developments

NetarchiveSuite 7.0 has been released: NetarchiveSuite 7.x Release Notes

For the rest of the spring, the Core Development Team (i.e. Colin + Rasmus) will concentrate on support tasks connected with the migration to and deployment of NetarchiveSuite 7.0, so there will be very limited resources for development work on the NetarchiveSuite codebase.

Status of the production sites

Netarkivet

BnF

We are pleased to announce that last month we published the seed lists of our selective crawls on the new version of the BnF website dedicated to APIs and datasets. These lists are created from BCWeb exports and include some crawl settings and descriptive elements such as themes and keywords.
In 2020, three new crawls were launched and added to the website: Instagram, Artificial Intelligence and Environmental Issues.
You can consult all these lists at this address: https://api.bnf.fr/fr/liste-des-adresses-url-des-collectes-ciblees-du-web-francais-par-la-bnf
Another page, focused on Covid-19 selections, can be consulted at this address: https://api.bnf.fr/fr/node/176

For the second consecutive year, we launched an Instagram crawl. We plan to run five Instagram crawls, some of them on specific subjects such as the Olympic Games or the regional and departmental elections in France.
Just like last year, we had to crawl picuki.com: in spite of many tests, we always end up being blocked by Instagram.

Finally, our in-house harvesting workshop on Flash is coming to an end. It was difficult to find a way to automatically harvest some of the websites with Flash animations, because some URLs are dynamically generated or relative and are therefore inaccessible to Heritrix. We will try to discover all the URLs manually and then launch the harvest in a second step.
Even when a crawl succeeds, we will sometimes have compatibility issues with the Flash plugin used with the Wayback.

ONB


BNE

  • New contribution to coronavirus international crawl with a selection of 200 seeds
  • Last month we had two meetings with our regional web curators from different parts of Spain. We worked on the selection of seeds.
  • This month we are working with our collaborators on a new event collection for the regional election in Madrid
  • We continue to work on the regional election in Catalonia

KB-Sweden


Next meetings

  • May 4th
  • June 8th
  • July 6th
  • September 7th
  • October 5th
  • November 2nd
  • December 14th
  • January 11th, 2022

Any other business?
