2016-10-25 Status meeting

Agenda for the joint BNF, ONB, SB, KB and BNE NetarchiveSuite tele-conference 2016-10-25, 13:00-14:00.

Practical information

Participants

  • BNF: Lam, Annick, Sara
  • ONB: Michaela, Andreas
  • KB/DK: Søren, Stephen, Tue
  • SB: Sabine, Colin, Niels
  • BNE: Mar
  • KB/Sweden: -

NAS 5.2 Release Update

https://kb-dk.atlassian.net/secure/RapidBoard.jspa?rapidView=8

What is the status of 5.2?

Latest features and issues.

Release tests.

NAS workshop in Vienna

Postponed. Please see http://doodle.com/poll/mvgm5w2v3bk6dsc7

Wiki page and inventory of topics to be discussed: 2017 NAS workshop (add details and new topics before November 29)

Status of the production sites

Netarkivet

We have finished our “little” broad crawl:

Name          Start time  Stop time   Bytes               Documents
2016-3-100MB  13.09.2016  30.09.2016  11.654.316.537.483  252.238.201

We have nearly finished the reorganization of our selective crawls according to the new strategy:

  • Daily crawls of all national news sites
  • Daily crawls of all regional news sites
  • Weekly crawls of all local news sites
  • Monthly crawls of political parties’ sites
  • Trimonthly crawls of ministries’ and administrative bodies’ sites
  • “Streamlining” of Twitter crawls
  • Analysis of depth and frequency for a crawl of organizations and associations

We renewed our account at Archive-It; it is intended to be used for Facebook crawls.

NAS 5.2 has been released for developer testing. Curator testing is planned for the end of October.

We are upgrading the Citrix installation that provides access to Wayback.

We have tested Ilya Kreymer's W/ARC player for displaying HTTPS pages: it works fine, but there are some security issues to be fixed.

BnF

  • Our 2017 broad crawl started on October 10th (4.4 million domains, 3500 URLs per domain, 40 crawlers running 10 threads each during the day and 30 threads each at night because of bandwidth constraints). We expect to harvest 80 TB of data.
  • We have redesigned our Wayback and will give access to our full-text-indexed 1996-2000 collection with Shine in November.

ONB

  • The crawl of our presidential elections is still running.
  • We are currently compressing all metadata ARC files using the JWAT tools. We hope to gain between 1 and 2 TB of extra disk space this way.

BNE

  • Our General Elections crawl is still running. We hope to close it by the end of the week, provided the Prime Minister is voted in by the Parliament. For this last week the crawl is running daily, in collaboration with regional web curators who have been adding and reviewing seeds and doing quality assurance on the seed collection.
  • Our main tasks at the moment relate to the working meeting we are preparing with the regional web curators, here at the Library, on November 7th.
    • We have scheduled a short workshop on BCWeb, which the web curators have so far been using only in a preproduction environment, although they are already building their own web collections with it.
    • We are also working on secure access to our web collections (and the non-print legal deposit in general) via Remote Desktop, so that the Regional Libraries can remotely access the non-print legal deposit, including the web archive and deposited publications.
  • Elena no longer works with us. She has moved to the European Commission in Luxembourg.

KB/Sweden

 

Next meetings

  • November 29
  • January 3, 2017

Any other business?