...
- New Heritrix release (3.4.0-20240909) made by Alex Osborne (NLA): https://github.com/internetarchive/heritrix3/releases/tag/3.4.0-20240909 which includes our PR on alternative resolution images: https://github.com/internetarchive/heritrix3/issues/604
- Proposal on "18 years of NetarchiveSuite collaborative use and development" has been submitted as a 15-min presentation for IIPC WAC 2025 with Colin, Anders, José and Sara as presenters: https://docs.google.com/document/d/1U1GUvqb31cqAz1egrWBjKuYHUW1sBeW_onKB2yJZF2k/edit
- Colin intends to use next week to try to create a new NAS release using all the latest Heritrix updates. The first priority will be to make sure that the dockerised Quickstart can run the newest versions.
Status of the production sites
Netarkivet
Panel |
---|
|
BnF
Panel |
---|
We are continuing preparations for our upcoming 2024 broad crawl. The technical tests were successful, and last week we launched our test broad crawl at 2300 URLs per domain for 5.9 million starting domains. The Olympic and Paralympic Games harvests are now complete. In total, 15 weekly and twice-monthly crawls were carried out between the beginning of June and mid-September. 1095 seeds were selected for the harvest; in addition, 59 YouTube channels, 340 Instagram accounts and several press sections dealing with the subject were harvested daily. A virtual guided tour on environmental issues is being prepared. It comprises 14 themes, currently being drafted, which cover the various issues surrounding climate change, the preservation of biodiversity, health, agriculture, public policies and the energy transition. It should be published at the end of 2024 or the beginning of 2025. |
ONB
Panel |
---|
BNE
Panel |
---|
Last week, we completed the broad crawl of the .es domain. I do not have the results yet, but I will present them at the next meeting. This week, we are starting the broad crawl of the .gal domain, the regional domain of Galicia. The .gal domain includes more than 7,000 websites, and we have already completed the preparation work needed to begin harvesting. We are also working on a project to recover missing e-journals. So far, we have identified more than 500 e-journals published between 2011 and 2023. Our goal is to restore access to these e-journals by including links to the Spanish Web Archive in the catalogue, using the new field 857 (MARC 21: https://www.loc.gov/marc/bibliographic/bd857.html). |
KB-Sweden
Panel |
---|
Next meetings
...