...
Update on NAS latest tests and developments
NAS 7.4.4 was released over the New Year. It fixes a bug affecting the download of very large crawl logs via HDFS.
If anybody wants to read a day-in-the-life story, I spent literally one day trying to a) learn Kubernetes and b) use it to create a Netarchive Suite deployment. Not surprisingly, it wasn't a 100% success, but I learned a lot from the attempt: https://sbprojects.statsbiblioteket.dk/pages/viewpage.action?pageId=141575534
Status of the production sites
...
Panel |
---|
First of all, this week we are launching our first internal harvesting workshop of 2022. Until March 31, our team will experiment with Browsertrix on different types of websites. Within this framework we will also test the harvesting of social networks. Following the TikTok crawl launched in 2022 on the theme of the elections, we are going to launch our first current TikTok harvest this month. On March 13, there will be an exchange day on the results and future prospects of the ResPaDon project, the aim of which is "to set up a network about web archives". The day will be held at the BnF and will be broadcast live on YouTube. |
ONB
Panel |
---|
BNE
Panel |
---|
We are continuing the tests to harvest Twitter. We have avoided the 429 error by launching the crawl overnight. We have a new error, but Miguel thinks it is a problem with the operating system, which is not sufficiently up to date, rather than a Twitter problem. We cannot harvest hashtags or trending topics: we get a 404 error and do not know how to avoid it. For the special crawl for International Women's Day, we are harvesting more than 170 websites and more than 100 Twitter profiles and accounts daily. We are preparing the broad crawl of open-access magazines, which covers more than 10,000 titles. We plan to launch it in March or early April. |
KB-Sweden
Panel |
---|
Next meetings
...