Participants

  • BNF: Sara, Annick, Lam
  • ONB: Michaela, Andreas
  • KB/DK: Søren, Tue, Jonas, Stephen, Nicholas
  • SB: Colin, Sabine, Niels
  • BNE: Mar
  • KB/SE: Bengt, Stewart

NAS 5.1 Update

Now in use in the Netarkivet production environment.

IIPC GA (all)

Feedback and important information from GA

NAS workshop (Sara)

Topics

1) Share experience with NAS 5 and Heritrix 3

2) Discuss challenges with specific types of sites (news, social media)

3) Discuss collection strategies

4) Discuss features/a GUI to handle the harvester

5) Look into the possibility of integrating another crawler into NAS (Colin offered to bring a prototype that uses a headless browser)

Schedule from DK on running NAS 5 + Heritrix 3.

KB/SE joining the NAS meetings

The Royal Library of Sweden is going to use NAS.

Bengt Neiss and Stewart Rutledge are joining the teleconferences.

NAS workshop

End of January 2017 - 2.5 days - in Vienna

Please complete Michaela's poll: http://doodle.com/poll/nk6dfc3kav4a4hs8

Status of the production sites

Netarkivet

 
Broad crawl
We started the second broad crawl of 2016, with a limit of 100 MB per domain.

Event crawls
We stopped the refugee crisis crawl. We did a smaller event crawl for the “Eurovision Song Contest”, where we focused on the Danish participants' presence on Twitter and on thematic news sections. We are preparing for a crawl of the Olympics in Rio.

Selective crawls
We started the implementation of our revised collection strategy. We have almost established the new selective crawls of national news sites.

One of the first social media platforms, arto.com, closed on 1 June. We had problems with our last complete crawl before the closing. With a specially developed module, in which the FetchDNS method is changed, we hope to be able to get all content directly from their server.

Potential collaboration project
The Parliamentary Library gives in-house access to historical (archived) versions of the political parties’ websites. They are not quite satisfied with their solution. Netarchive and the Parliamentary Library are looking at potential future cooperation on this subject.

Internal
Niels Bønding is project lead for curation now.

BnF

 
This month we're opening an experimental access interface, Archives de l'internet Labs. This interface provides full-text searching of a small part of our collections, with the possibility to export results and save searches and selections in a personal workspace. It also provides access to statistics and metadata on the collections.

This interface builds on the work we have done over the past year or so on data mining and full-text indexing. It is part of a four-year project at the BnF studying the creation of a service to provide researchers with corpora from the digital collections of the BnF; the web archives were chosen as the case study for the first year of the project. For the moment this interface will only be available to researchers working on two specific projects who have signed an agreement with the BnF, but as part of the overall project we will be looking at how this kind of service can be offered to more researchers.

ONB


BNE


Next meeting

2016-07-14

Any other business?