2019 NAS workshop

The 2019 NAS workshop will take place on February 20-22 and will be hosted by the National Library of Spain in Madrid.

Location: National Library of Spain (entrance on the ground floor)

Address: Paseo de Recoletos, 20-22 - 28071-Madrid

Map: https://www.google.com/maps/place/Biblioteca+Nacional+de+Espa%C3%B1a/@40.4235049,-3.6916211,17z/data=!3m1!4b1!4m5!3m4!1s0xd4228907e039627:0xd5b5764d8a0a53f7!8m2!3d40.4235049!4d-3.6894324

Tourist information: recommended things to see in Madrid

Hotels: see below

Participants:

Organization

Technical

Curator 

Netarkivet

Colin Samuel Rosenthal

Knud Aage Hansen 

Tue Hejlskov Larsen

Kristian Bak

Anders Klindt Myrvoll

Sabine Schostag

Stephen Hunt

ONB (via Skype)

Andreas P

Michaela

BnF

Sara Aubry

Clara Wiatrowski

Géraldine Camile

BNE

Juan Carlos García

José María Martín

Fernando Monzón

Luis Sánchez

Nuria Serrano

Alicia Pastrana

María Bueno

María Ezquerra

Yasmín Rommaneh

Mar Pérez

NL of Sweden

Thomas Roos

Pär Nilsson

Peter Svanberg

Topics to be discussed:

Technical

  • State of the art of current bugs and possible fixes
  • State of the art of current and upcoming developments in NetarchiveSuite
  • Integration of latest H3 stable release
  • Videos, social media, Umbra (installation, configuration tests and usage)
  • Introducing WARC 1.1
  • Brainstorming on priorities for future developments
  • Brief state of the art on access tools: use and perspectives in the different institutions
  • SolrWayback demo
  • What does your NAS deployment/configuration look like (settings, hardware)?
  • Oracle Java is subject to a fee. ONB's IT department would like to switch completely to OpenJava if possible. Does NAS work as usual on OpenJava? What do you think? What will you do?

Curatorial

  • NAS missing features
  • Brainstorming on priorities for future developments
  • Scheduling harvests at a precise date or period
  • Presentation of the latest BCweb release and future evolutions
  • Presentation of how we collect and crawl YouTube and give access to it
  • Coordination of external selections
  • What documentation shall we provide for researchers?
  • How do we make workspaces for researchers - tools, limits?
  • Capturing social media
  • Webarchives and digital preservation
  • Which browsers and versions do we support today and in the near future in harvester requests, Umbra and archive access? Umbra usage and experiences
  • OpenWayback and CDX creation issues and development, and experiences with other tools, e.g. pywb, SolrWayback?
  • Broad crawls
    • How do we monitor jobs during broad or big “deep” crawls?
    • How do we manage huge webhotels?
    • How do we manage byte/object limits for different groups of domains?


Agenda

Schedule for 20.02.2019 (12:30-17:30)

12:30 - 14:00 Arrival, sandwiches and coffee

14:00 - 14:15 Welcome (Ana Santos Aramburo, director of BNE)

14:15 - 14:30 Workshop introduction (Mar, Sara)

14:30 - 16:00 Institution updates and plans for 2019 (15 min each)

16:00 - 16:15 Coffee break

16:15 - 17:30 NetarchiveSuite 5.5: demo and discussion of latest features including Umbra (Colin), Umbra usage and experiences, Feedback on tests with input from Clara (ppt), Tue (ppt)

20:00 Dinner (at own expense). Inclan Brutal Bar (Calle Álvarez Gato, 4)

Schedule for 21.02.2019 (9:00-17:00)

09:00 - 12:00

Technical track:

  • Share NAS deployment and configuration in our institutions to identify used/unused components: See form.
  • Discuss state of the art of current bugs and possible fixes
  • Review lists of NAS bugs and missing features, and internal lists: NAS curator roadmap (NASC), BnF 2019 list
  • JIRA issues labelled "Madrid"
  • Discuss possible integration of OpenJava, latest H3 stable release and WARC 1.1
  • Brainstorm on priorities and NAS codebase evolution for future developments
  • Discuss the possibility to submit an IIPC project

Curator track:

  • Review and update the NAS curator roadmap (NASC)
  • Brainstorm on priorities for future developments from a curatorial perspective
  • Discuss practices and challenges in coordinating external selections (Géraldine, Sabine, Mar)

10:30 - 10:45 Coffee break

12:00 - 12:30 Summary of curatorial and technical priorities

12:30 - 14:00 Lunch

14:00 - 17:00 Complex harvesting

Share experiences, practices and questions in the management of broad crawls:

  • How do we monitor jobs during broad or big “deep” crawls?
  • How do we manage huge webhotels (companies that host many websites)?
  • How do we track parked domains ("web parkings")?
  • How do we manage byte/object limits for different groups of domains?

Share experiences and practices in crawling and giving access to YouTube videos (Sara)

Share experiences in crawling social media (Facebook, Twitter, SlideShare, Flickr, Instagram)

Discuss possible further cooperation on these topics and the integration of common tools

15:30 - 15:45 Coffee break

17:00 - 19:00 Guided tour of the BNE

Schedule for 22.02.2019 (9:00-14:30)

09:00 - 10:30 Update on BCweb (Géraldine, Clara) - CSV sample

  • Demo of BCweb new functionalities
  • Update on BnF current and upcoming developments
  • Update on open source status
  • Discuss interest in upgrading and possible community developments

10:30 - 10:45 Coffee break

10:45 - 12:30 Access tools to webarchives

12:30 - 13:00 Community next steps

13:00 - 14:30 Lunch and goodbye




Hotels nearby: We suggest looking for offers on www.booking.com for the hotels below, as the Library cannot provide special rates.