Agenda for the joint BNF, ONB, SB and KB NetarchiveSuite tele-conference September 9th 2014, 13:00-14:00.

Practical information

...

  • BNF: Sara and Lam
  • ONB: Michaela and Andreas
  • KB: Tue, Søren and Nicholas
  • SB: Colin, Sabine and Mikis

Any other issues to be discussed on today's tele-conference?

Development

  • Planning of the next iterations 

Planning for GA in Paris (19th to the 23rd May)

Who will be attending the GA?

  • KB: Birgit, Tue, Mads (22-23), Eld (19-21).
  • SB: Sabine, Ditte (19-21), Colin, Mikis
  • ONB: Andreas, Michaela
  • BnF: Sara, Lam, Bert, Sébastien, Clément, Annick, Géraldine, Peter, Sophie

Internal NetarchiveSuite discussion, see 2014-05-23 IIPC GA in Paris.

NetarchiveSuite workshop 2014-2015

When and where

Proposal from The National Library of Estonia (Jaanus Kõuts) in Tallinn

  • 28.01.2015 International seminar on web archiving
  • 29.-30.01.2015 NAS meeting (Thursday-Friday)

Contributions to the international seminar on the 28th?

Status of the production sites

Netarkivet

  • Event harvests: In May we finished one of our largest event harvests, the ESC 2014 event harvest. For the first time, researchers participated almost from the beginning. We ran two more event harvests: one on the European elections in May, and one in May/June on the use of illegal methods for journalistic research by the Danish tabloid magazine “Se og Hør”.
  • Documentation: We are still planning the migration of our documentation from the old-fashioned MoinMoin Wiki to a system that meets the requirements of both curators and users/researchers. We have nearly finished our requirements specification, and we are testing the usability of the extended fields in NAS on part of the documentation.
  • Access: We are working on Citrix-login-based access for our users. So far it is open to employees only.
  • Technical issues: We successfully upgraded our test environment to NAS 4.4, and we plan to upgrade our production environment in August.

BnF

This month we thought we'd give you an overview of all the project crawls we are running this year, as several of them have taken place during the past month.

 

We have several crawls relating to events and anniversaries in 2014:

- The centenary of the First World War - this is a project that began last November and will continue until 2018 with three or four crawls per year.

- The 250th anniversary of the death of Jean-Philippe Rameau (covered in our last monthly update).

- Local and European elections - the French local elections took place last month and we are preparing the crawls in the lead-up to the European elections in May.

- Winter Olympic and Paralympic Games - as part of the IIPC project.

 

There are also project crawls on specific themes or types of document (these are all continued from previous years):

- News and subscription news sites - crawled every day.

- Online personal and literary journals - the first crawl took place in March, the second will be in August.

- Solidarity and social movements - planned for May and June.

- Travel journals - planned for June.

- Auction catalogues - planned for July.

- French and American official publications - two separate crawls both planned for July.

- Dailymotion videos - planned for August.

 

In addition, we maintain our "ongoing crawls", i.e. all the sites selected by BnF departments according to their collection policies, which are collected at different frequencies: once a year, twice a year, monthly or weekly.

 

Since our storage budget is the same in 2014 as in 2013, the number of project crawls and the increase in the number of domains in our broad crawl mean we are trying to optimise our ongoing crawls. We are working with the librarians who select sites to limit the number of sites that are included in multiple crawls, and to make sure that the sites collected more frequently than once a year change often enough to justify this. We've also removed the largest budget from the twice-yearly crawl, and we've changed the way Heritrix handles queues for sites with a "domain" depth. Previously we had one queue per host, so the budget allocated was multiplied by the number of hosts. We now have a single queue, and therefore a single budget, for each domain. This doesn't seem to have had an impact on the speed of crawls.
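The effect of the queue change on the budget can be illustrated with a minimal sketch. This is not Heritrix configuration or BnF's actual setup: the function names, the example URLs and the budget figure are all hypothetical, and the domain is approximated naively as the last two labels of the host name.

```python
from urllib.parse import urlsplit

def host_key(url):
    # One queue per host, e.g. "www.example.fr" (hypothetical grouping rule).
    return urlsplit(url).hostname

def domain_key(url):
    # One queue per domain, approximated here as the last two host labels.
    # Assumption: a real crawler would need a proper public-suffix lookup.
    return ".".join(urlsplit(url).hostname.split(".")[-2:])

def effective_budget(urls, budget_per_queue, queue_key):
    # Each distinct queue receives its own budget, so the total budget
    # spent on a site grows with the number of queues it is split into.
    queues = {queue_key(u) for u in urls}
    return len(queues) * budget_per_queue

# Illustrative seed list: three hosts under a single domain.
urls = [
    "http://www.example.fr/",
    "http://blog.example.fr/",
    "http://images.example.fr/",
]

per_host = effective_budget(urls, 1000, host_key)
per_domain = effective_budget(urls, 1000, domain_key)
```

With per-host queues the three hosts each get the full budget, whereas with a single per-domain queue the whole site shares one budget.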

ONB

  • Olympics and Paralympics crawl finished
  • Preparing for EU elections (starting in May) and WWI crawl (starting in June)

           We have finished our 2nd broad crawl of 2014 and will start the 3rd one at the end of August.

           At the end of July Netarchive surpassed 500 TB.

           A Citrix access solution to our wayback is almost in place; we are doing the final tests and bug fixes.

           We are still working on requirements for a new platform for our documentation. A Confluence wiki will probably be part of the solution.

    BnF

    We have just launched the third capture of our crawl on the centenary of the First World War. This project started in November 2013 and will continue until 2018; there are currently around 500 sites that have been selected by BnF librarians and partner institutions. This crawl is linked to a research project in which we will work with a researcher to develop tools and approaches for text and data mining on our collections.

    We are also pleased to welcome a new member of the team - Ange Aniesa joined us at the beginning of July, and he'll be working in particular on cooperation with institutions in France.

    ONB

    We are working on a new user interface for the webarchive. It will include full-text search (with Elasticsearch) for a part of our collections. It will be opened to the public soon and will be accessible online. The archived data will still be available only on site.

    The webarchive had its 5th birthday this year and will soon reach 2 billion archived objects. ONB might release a press statement about this in conjunction with the opening of the new interface.

    We are preparing crawls about WWI and regional elections.

    Next meeting

     

    Any other business?

    ...