2020-07-07 Statusmeeting
Agenda for the joint NetarchiveSuite tele-conference 2020-07-07, 13:00-14:00.
Participants
- BNF: Clara, Sara, Alexandre
- ONB: Andreas
- KB/DK - Copenhagen: Tue, Stephen, Anders
- KB/DK - Aarhus: Sabine, Kristian, Colin
- BNE: Alicia
- KB/Sweden: Pär, Peter
Join from PC, Mac, Linux, iOS or Android:
https://kbdk.zoom.us/j/104443571
Or an H.323/SIP room system:
H.323: 109.105.112.236
Meeting ID: 104 443 571
SIP: 104443571@109.105.112.236
Or Skype for Business (Lync):
https://kbdk.zoom.us/skype/104443571
Or Telephone:
Denmark: +45 89 88 37 88 or +45 32 71 31 57
United Kingdom: +44 203 051 2874 or +44 203 481 5237 or +44 203 966 3809 or +44 131 460 1196
Finland: +358 9 4245 1488 or +358 3 4109 2129
Sweden: +46 850 539 728 or +46 8 4468 2488
Norway: +47 7349 4877 or +47 2396 0588
US: +1 669 900 6833 or +1 646 558 8656
Meeting ID: 104 443 571
International numbers available: https://zoom.us/u/acRu0MV3xJ
You can join a meeting using the app on a PC, tablet or smartphone, or you can use the browser-based version (it works with newer versions of Chrome or Firefox).
Update on NAS latest tests and developments
NAS 6.0 has been released: https://kb-dk.atlassian.net/wiki/pages/viewpage.action?pageId=38897345
It includes the latest IIPC H3 release: https://github.com/internetarchive/heritrix3/releases/tag/3.4.0-20200518
Any feedback/questions?
Our next NAS workshop was originally scheduled for the end of 2020 in Sweden. Do we keep it or postpone it? Do we want to replace it with a virtual meeting? What would be the hot topics?
Status of the production sites
Netarkivet
We are working on upgrading our BCWeb installation to 7.2.0.
BnF
At the end of June, we put the new version of NAS (6.0.0) into production, with the official IIPC version of Heritrix (3.4.0-20200518). With this upgrade, we intend to improve the quality and completeness of our crawls. The new version of Heritrix includes contributions from the developers in BnF's IT team: handling of the "data" attribute in picture tags, and harvesting of files hosted on servers secured by SFTP, not only on FTP servers. With the new JavaScript extractor and the inclusion of "data" attributes, we expect a significant improvement in the harvesting of pictures, especially for responsive websites. In addition, the new version of Heritrix allows parallelization of queues, and we expect faster and more complete harvesting of social network accounts, particularly Twitter. In the coming weeks, we plan to compare jobs run with the previous and the new versions of Heritrix to assess whether these improvements materialize.
The second round of the local elections was held on 28 June. Since the beginning of June, our election crawl has resumed its initial schedule: social networks are crawled twice a day and other websites twice a month. The crawl will continue until mid-July to cover the setup of the new city councils and the investiture of the mayors.
ONB
We have put version 6.0 of NAS into production.
BNE
Event crawl
We have reactivated two event crawls about local elections. We had to stop them in March because the elections were postponed.
Broad crawl
We have started our annual broad crawl. It is not the best time because the Library has not yet returned to normal operations, but we expect that many websites will disappear due to the crisis caused by the coronavirus.
NAS and BCWeb
No news. We will have to wait for a new head of the IT team.
KB-Sweden
Next meetings
- September 8, 2020
- October 6, 2020
- November 3, 2020
- December 8, 2020
- January 5, 2021
Any other business?