Agenda for the joint BNF, ONB, SB, KB and BNE NetarchiveSuite tele-conference 2016-09-20, 13:00-14:00.
Practical information
- Go to https://c.deic.dk/netarkivstyregruppe
- Login as guest
- Write your name
- Insert password: wayback
Participants
- BNF: Lam, Annick, Sara
- ONB: Michaela, Andreas
- KB/DK: Søren, Stephen, Nicholas
- SB: Sabine, Colin, Niels
- BNE: Juan Carlos, Fernando, Elena
- KB/Sweden: Bengt
IIPC crawler hackathon in London
September 22-23. Søren, Colin and Bert will attend.
Topics, attendees: https://drive.google.com/drive/folders/0BwTi-qdD0KvdNEE4Qmpaa2dJeHM
Common questions/interests to bring?
NAS 5.2 Development Update
https://sbforge.org/jira/secure/RapidBoard.jspa?rapidView=8
On the BnF side, some bugfixes:
- NAS-2544
- NAS-2545
- NAS-2546
- NAS-2553
Translation of new keys in French and German.
Considering the adoption of WARC revisit records for duplicates.
NAS workshop in Vienna
January 30th 2017 - February 1st 2017 - Vienna
Participants: http://doodle.com/poll/nk6dfc3kav4a4hs8
NetarchiveSuite Curator Issues
Should we "reanimate" our curator roadmap/backlog, revise it and discuss it in Vienna?
Status of the production sites
Netarkivet
Broad crawl
- Last week we launched the third broad crawl of 2016. The crawl limit per domain will be at most 100 MB. There will be special crawls for ministries and government bodies, and for very large sites (e.g. dr.dk).
- We will try to get in touch with the webpage owners/web hotels that are blocking our crawler (about 11% are blocking us).
Event crawl
- The event collection for the Olympics in Rio 2016 will go on until the end of the Paralympics 2016
Selective crawls
- We are working on the configuration of the regional/local news media crawls.
- Facebook
- We have test-crawled about 60 Danish Facebook profiles with Archive-IT. We are analyzing how much we get from the profiles. We have to renew our account with Archive-IT after the end of November and we are trying to negotiate a good price.
- We made a special crawl of Prime Minister Lars Løkke's Facebook profile on 2016.08.30, the day he published his 2025 plan.
Compression of the archive
- We are preparing for the compression, but this awaits NAS release 5.2
Last but not least
Last week we learned that the Ministry of Culture wants KB and SB to merge: from January 2017 we will be “Nationalbiblioteket” with two locations, in Copenhagen and Aarhus.
BnF
ONB
- We have already switched to NAS 5.2 because we had severe problems with HTTPS websites under the former version. These problems are now fixed by using H3, which runs under Java 1.8.0_77 with the following disabled TLS algorithms set in /opt/jdk1.8.0_77/jre/lib/security/java.security:
jdk.tls.disabledAlgorithms=SSLv3, DHE, ECDHE, RC4, MD5withRSA, DH keySize < 768
It has gone smoothly so far. We are still using the ARC format, because we have to refactor all our tools before we switch to WARC.
- The crawl of our presidential elections is still running. We have a new election date at the beginning of December and hope to be able to finish the crawl soon.
- Apart from one small, additional thematic crawl we will only have ongoing crawls until the end of the year. Next domain crawl is scheduled for 2017.
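For sites that want to compare their own setup with the jdk.tls.disabledAlgorithms line in the ONB notes, the following is a minimal sketch that extracts that property from the text of a java.security file. The function name disabled_tls_algorithms is hypothetical and not part of NetarchiveSuite or Heritrix, and the parser only assumes the usual properties conventions (comments starting with "#", backslash line continuation, last definition wins).

```python
def disabled_tls_algorithms(text):
    """Return the entries of jdk.tls.disabledAlgorithms found in java.security text.

    Hypothetical helper for illustration only; it is not part of NAS or H3.
    """
    # Join physical lines that end in a backslash into one logical line.
    logical_lines, buffer = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            buffer += line[:-1]
            continue
        logical_lines.append(buffer + line)
        buffer = ""
    if buffer:
        logical_lines.append(buffer)

    result = []
    for line in logical_lines:
        # Skip comments and lines that are not key=value pairs.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "jdk.tls.disabledAlgorithms":
            # A later definition overrides an earlier one.
            result = [item.strip() for item in value.split(",") if item.strip()]
    return result


# Example with the line quoted in the ONB notes.
onb_line = ("jdk.tls.disabledAlgorithms=SSLv3, DHE, ECDHE, RC4, "
            "MD5withRSA, DH keySize < 768")
for entry in disabled_tls_algorithms(onb_line):
    print(entry)
```

To check a real installation, the contents of the java.security file (for example the path quoted by ONB) can be read with open(...).read() and passed to the function.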
BNE
KB/Sweden
Next meetings
- October 25
- November 29
- January 3, 2017
Any other business?