Table of Contents |
---|
...
Panel |
---|
Broad crawl. Corona event harvest. Personnel: Allan Christophersen <ALCH@kb.dk> has joined as a project employee and spends 20% of his time on Netarkivet. SolrWayback: https://github.com/netarchivesuite/solrwayback/releases/tag/4.0.5, https://github.com/netarchivesuite/solrwayback, http://webadmin.oszk.hu/solrwayback/ (Hungarian Archive) |
BnF
Panel |
---|
Our annual broad crawl ended on 7 November. It lasted 32 days, executed 1037 jobs, and crawled 2.455 billion URLs for a total size of 117.59 TB (compressed). The French newspaper Liberation contacted our team to inform us that its blog platform (https://www.liberation.fr/blogs,26) would be closed in the course of December. The platform hosts more than 300 blogs. Last week we launched an emergency crawl to capture and preserve these blogs. We are working on full-text indexing (with Solr) of our COVID-19 crawl, which was performed between February and July 2020 and covers the first wave of the pandemic. The size of this collection is about 15 TB (compressed). The new collection will be put into production during December and will be available to readers through the GUI Archives de l'internet Labs. |
...
- January 5, 2021
Any other business?