2016-10-25 Status meeting
Agenda for the joint BNF, ONB, SB, KB and BNE NetarchiveSuite tele-conference 2016-10-25, 13:00-14:00.
Practical information
- Go to https://c.deic.dk/netarkivstyregruppe
- Login as guest
- Write your name
- Insert password: wayback
Participants
- BNF: Lam, Annick, Sara
- ONB: Michaela, Andreas
- KB/DK: Søren, Stephen, Tue
- SB: Sabine, Colin, Niels
- BNE: Mar
- KB/Sweden: -
NAS 5.2 Release Update
https://kb-dk.atlassian.net/secure/RapidBoard.jspa?rapidView=8
What is the status of 5.2?
NAS workshop in Vienna
Postponed. Please see http://doodle.com/poll/mvgm5w2v3bk6dsc7
Wiki page and inventory of topics to be discussed: 2017 NAS workshop (add details and new topics before November 29)
Status of the production sites
Netarkivet
We have finished our “little” broad crawl:
Name | Start time | Stop time | Bytes | Documents |
2016-3-100MB | 13.09.2016 | 30.09.2016 | 11.654.316.537.483 | 252.238.201 |
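For a sense of scale, the totals in the table above can be converted into more readable units. The following is only a rough sketch: the variable and function names are illustrative, and because the table gives dates without times, the 17-day duration and the per-day figure are approximations.

```python
from datetime import date

# Figures copied from the broad-crawl table above.
TOTAL_BYTES = 11_654_316_537_483
DOCUMENTS = 252_238_201
START = date(2016, 9, 13)
STOP = date(2016, 9, 30)


def summarize(total_bytes: int, documents: int, start: date, stop: date) -> dict:
    """Derive rough size and throughput figures for one crawl."""
    days = (stop - start).days  # whole days only; times of day are unknown
    return {
        "terabytes": total_bytes / 10**12,   # decimal TB
        "tebibytes": total_bytes / 2**40,    # binary TiB
        "avg_bytes_per_document": total_bytes / documents,
        "days": days,
        "terabytes_per_day": total_bytes / 10**12 / days,
    }


summary = summarize(TOTAL_BYTES, DOCUMENTS, START, STOP)
for key, value in summary.items():
    print(f"{key}: {value:,.2f}")
```

In round numbers this is about 11.65 TB, or roughly 46 KB per harvested document.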
We have nearly finished the reorganization of our selective crawls according to the new strategy:
- Daily crawls of all national news sites
- Daily crawls of all regional news sites
- Weekly crawls of all local news sites
- Monthly crawls of political parties’ sites
- Trimonthly crawls of ministries’ and administrative bodies’ sites
- “Streamlining” of Twitter crawls
- Analysis of depth and frequency for a crawl of organizations and associations
We renewed our account at Archive-IT; it is intended to be used for Facebook crawls.
NAS 5.2 has been released for developer testing. Testing by curators is planned for the end of October.
We are upgrading the Citrix installation, which gives access to Wayback.
We have tested Ilya Kraemer's W/ARC player for displaying https pages: it works fine, but there are some security issues to be fixed.
BnF
- Our 2017 broad crawl started on October 10th (4.4 million domains, 3500 URLs per domain, 40 crawlers, each running 10 threads during the day and 30 threads at night because of bandwidth constraints). We expect to harvest 80 TB of data.
- We have redesigned our Wayback and will give access to our full-text-indexed 1996-2000 collection with Shine in November.
- On November 23, we are celebrating two anniversaries: the 20th anniversary of the very first web archives and the 10th anniversary of the law that gave us our legal mandate. More information on the event: http://www.bnf.fr/fr/professionnels/anx_journees_pro_2016/a.jp_161122_23_archivage_web.html
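The BnF broad-crawl parameters reported above imply a few derived figures that may be useful for comparison between sites. This is a back-of-the-envelope sketch only: the names are illustrative, the URL figure is an upper bound (most domains will not reach the per-domain cap), and 80 TB is assumed to mean decimal terabytes.

```python
# Parameters as reported by BnF for the 2017 broad crawl.
DOMAINS = 4_400_000
URL_CAP_PER_DOMAIN = 3500
CRAWLERS = 40
THREADS_DAY = 10
THREADS_NIGHT = 30
EXPECTED_BYTES = 80 * 10**12  # assumption: 80 TB read as decimal terabytes

# Total concurrent threads across all crawlers.
total_threads_day = CRAWLERS * THREADS_DAY
total_threads_night = CRAWLERS * THREADS_NIGHT

# Upper bound on URLs if every domain reached its cap.
max_urls = DOMAINS * URL_CAP_PER_DOMAIN

# Average expected volume per domain, in megabytes.
avg_mb_per_domain = EXPECTED_BYTES / DOMAINS / 10**6

print(f"threads (day/night): {total_threads_day} / {total_threads_night}")
print(f"URL budget (upper bound): {max_urls:,}")
print(f"average expected volume per domain: {avg_mb_per_domain:.1f} MB")
```

Under these assumptions the crawl runs 400 threads by day and 1,200 by night, with an average of roughly 18 MB expected per domain.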
ONB
- The crawl of our presidential elections is still running.
- We are currently compressing all metadata ARC files using the JWAT tools. We hope to gain between 1 and 2 TB of extra disk space from this task.
BNE
- Our General Elections crawl is still running. We hope to close it by the end of the week, provided the Prime Minister is voted in by the Parliament. For this last week, the crawl is running daily, in collaboration with regional web curators who have been adding, reviewing and quality-assuring the collection of seeds.
- Our main tasks at the moment relate to the working meeting we are preparing with the regional web curators, here at the Library, on November 7th.
- We have scheduled a short workshop on BCWeb, which web curators have so far been using in a preproduction environment. They are, however, already building their own web collections with BCWeb.
- We are also working on secure access to our web collections (and the non-print legal deposit in general) via Remote Desktop, so that the Regional Libraries can remotely access the non-print legal deposit, including the web archive and deposited publications.
- Elena is no longer working with us. She has moved to the European Commission in Luxembourg.
KB/Sweden
Next meetings
- November 29
- January 3, 2017
Any other business?