A quick summary of the different selective crawls we are doing this year.

We distinguish between "ongoing crawls", in which librarians in the different BnF departments select seeds based on their department's collection policy, and "project crawls", which are collaborations between two or more departments, sometimes with external partners, built around a particular theme or event.

For ongoing crawls there is a choice of four depths, four frequencies and three budgets. The use of budgets (small, medium or large) allows us to plan and monitor the crawls more efficiently. In terms of harvest definitions in NAS, we create a harvest definition for each budget for the twice-yearly and annual crawls, while weekly and monthly crawls are only given a "small" budget. Project crawls can use a different range of technical settings where there are specific reasons to do so.

The harvest definitions for "ongoing crawls" are as follows:

- weekly - launched every Monday at noon

- monthly - launched the first of each month

- twice-yearly (small, medium and large budgets) - the first crawl took place in February/March, and the second will be launched in August

- annual (small, medium and large budgets) - the yearly crawl was launched this week.
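
As an illustration only, the combination of frequencies and budgets described above can be enumerated in a few lines. This is a minimal sketch, not the actual NAS configuration: the `frequency-budget` naming scheme and the function name are hypothetical, and it assumes exactly one harvest definition per frequency and budget pair.

```python
# Frequencies and budgets as listed in the text above.
FREQUENCIES = ["weekly", "monthly", "twice-yearly", "annual"]
BUDGETS = ["small", "medium", "large"]

# Weekly and monthly crawls are only given a "small" budget.
SMALL_ONLY = {"weekly", "monthly"}


def harvest_definition_names():
    """Return hypothetical names, one per (frequency, budget) pair in use."""
    names = []
    for frequency in FREQUENCIES:
        budgets = ["small"] if frequency in SMALL_ONLY else BUDGETS
        for budget in budgets:
            # The "frequency-budget" naming is an assumption for illustration.
            names.append(f"{frequency}-{budget}")
    return names


if __name__ == "__main__":
    for name in harvest_definition_names():
        print(name)
```

Under these assumptions the ongoing crawls correspond to eight harvest definitions: one each for weekly and monthly, and three each for twice-yearly and annual.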

 

The list of "project crawls" for 2013 is as follows:

- news sites - around 100 sites crawled every day, at a depth of homepage plus 1 click.

- subscription news sites - we are progressively adding titles to our crawl to collect subscription editions of news sites (5 at the moment).

- online journals - personal and literary blogs, crawled twice a year. The first crawl was completed in March and the second will be held in August.

- videos - once a year, currently limited to Dailymotion. We have just finished this crawl and will give more details in next month's update.

- solidarity and social movements - two project crawls on social issues in France, to be launched in May and June.

- blogs - once a year, to improve the collection of blog platforms that are poorly covered in the broad crawl. The crawl will be launched in June.

- auction houses - annual crawl of auction catalogues, to be launched in June.

- travel journals - a crawl of online travel journals, also in June.

- official publications - annual crawl of government websites and publications; takes place in July.

- US official publications - crawl of US government publications under the IDEA agreement to replace exchanges of paper documents with electronic versions; also takes place in July.

- Jean-Philippe Rameau - a crawl for next year's 250th anniversary of the death of this French composer; the crawl is planned for September.

 

ONB:
  • Started 2nd stage of domain crawl 2013 using NAS 4.01

Next meeting

August 20th 13-14??
