2019-11-05 Status meeting

Agenda for the joint NetarchiveSuite tele-conference 2019-11-05, 13:00-14:00.

Participants

  • BNF: Clara, Sara
  • ONB: Andreas
  • KB/DK - Copenhagen: Tue, Stephen, Anders, Kristian
  • KB/DK - Aarhus: Colin, Sabine, Knud Åge
  • BNE: Alicia, María, Manuel, José
  • KB/Sweden: Par, Thomas, Peter

Join from PC, Mac, Linux, iOS or Android:

    https://kbdk.zoom.us/j/104443571

Or an H.323/SIP room system:

    H.323: 109.105.112.236
    Meeting ID: 104 443 571

    SIP: 104443571@109.105.112.236

Or Skype for Business (Lync):

    https://kbdk.zoom.us/skype/104443571

Or Telephone:

Denmark: +45 89 88 37 88 or +45 32 71 31 57
United Kingdom: +44 203 051 2874 or +44 203 481 5237 or +44 203 966 3809 or +44 131 460 1196
Finland: +358 9 4245 1488 or +358 3 4109 2129
Sweden: +46 850 539 728 or +46 8 4468 2488
Norway: +47 7349 4877 or +47 2396 0588
US: +1 669 900 6833 or +1 646 558 8656
    Meeting ID: 104 443 571

    International numbers available: https://zoom.us/u/acRu0MV3xJ

You can join a meeting using the apps for PC, tablet or smartphone, or you can use the browser-based version (it works with newer versions of Chrome or Firefox).


Update on NAS latest tests and developments

Feedback on usage / tests on NetarchiveSuite 5.6 release: see NetarchiveSuite 5.6 Release Notes

Feedback on tests on BnF test NAS 6.0 + IIPC H3 release: see presentation

Status of the production sites

Netarkivet

Because of a political decision to change its terms and conditions, a broadcast station (Radio 24/syv) decided to close down on 31 October. The announcement that this popular station would close raised a storm of reactions on social media. People asked whether KB DK was going to keep the station's archive. We tried to capture as many podcasts as possible with Umbra. For the QA of the harvested content, we had to wait for an index to be generated (there was a long queue in the index generator), and it was not ready until 4 November.

As Yahoo Groups is also going to close down, we crawled Danish Yahoo Groups over the last couple of days.

The fourth broad crawl for 2019 is in preparation: there will be a step 1 with a domain limit of 50 MB and a step 2 with a domain limit of 16 GB. Together with this broad crawl we will run the following selective crawls: Research databases, Municipalities and regions, Ministries and Government Agencies, YouTube.
We will crawl with NAS 5.5 and expect step 1 to last about 2 weeks and step 2 about 6-8 weeks.

Other projects keeping us busy:
• Work on risk assessment
• Implementation of SolrWayback
• Consolidation of BCWeb (build up a community)
• Revision of collection strategies
• Capture of content behind paywalls – the never-ending story

BnF


ONB

  • We finished our 7th domain crawl, which was again done in a single stage. We crawled 150 million objects totalling almost 7 TB (about 3 TB on disk).
  • We upgraded to NAS 5.6 and are using it in production.
  • We moved the database to a more powerful server, and since then the duplicate job generation error has disappeared. Currently we are doing only selective crawls, so we will see how it behaves during a domain crawl.


BNE

We have finished our annual broad crawl:

  • There were about 2 million websites
  • We configured it with a limit of 150 MB per domain
  • It lasted 29 days
  • We crawled 88% of the domains completely

We have some problems with the index, and we have not been able to do QA on some of our content since February.

We are going to install version 6.1 of BCWeb in the coming weeks. After that, we will send you our changes to BCWeb so that you can analyze them and consider whether to include them in future versions.

KB-Sweden


Next meetings

  • December 3
  • January 7, 2020

Any other business?
