2011 Workshop Curator Track


Day 1 (Thursday 24) - 14:00 - 17:00

Location: Tower 1, Level 4, Annick and Géraldine's office // Chair: Sara

Curator tool developed by BnF (Peter, Sara)

Presentation of the curator tool currently being developed by BnF: features, usage, and how it interfaces with NetarchiveSuite.

We discussed the possibility of sharing the curator tool that BnF has developed on top of NetarchiveSuite. It could suit any organisation that has a broader and less-trained network of partners selecting websites. Michaela, Karen and Sabine would like to test it. Sara will check whether this is possible.

Selecting websites in cooperation with external partners (Michaela)

The discussion was about sharing practices and tips for raising awareness and getting librarians or researchers to collaborate in selecting websites for web archiving.
Slides from ONB (also including examples for metrics, rich/social media, news and media harvesting).
Slides from BnF.

Monitoring of broad crawls (Karen)

How to get it all, without getting too much. Ways of improving harvesting, setting up filters.

The discussion was about monitoring practices with NetarchiveSuite and Heritrix.
Slides from KB.
Slides from BnF.

Metrics (Annick)

Presenting the five main metrics used by each institution and sharing how we communicate them to partners and researchers.

There was no time to cover this point.

Back to the curator roadmap (Sara)

Sabine: Revision of the curator wish list (wish list with time estimates from the KB/SB developers: NAS-curator-wishlist-2011-2012v2_commentary_v2_SAS.docx)

We reviewed and updated the curator wish list to identify where to focus development efforts in the coming months.

Day 2 (Friday 25) - 09:00 - 12:30 Common Curator/Technical Track

Location: Tower 3, Level 4, Meeting Room // Chair: Mikis, Sara

Sharing experience on harvesting and accessing rich and social media websites (all)

All (curators and developers) prepare and give short presentations on use cases (examples of websites using video, Flash, Facebook, Twitter...) that are difficult to harvest and/or to access for research or even QA purposes. Presentations should include potential tricks and technical solutions for troubleshooting the problems.

The presentations on these topics revealed areas where the institutions could collaborate more on harvesting challenges.
Examples from Netarkivet (video, audio, login-content, Facebook, Twitter)
Examples from BnF on Facebook, Twitter, Dailymotion.

News and media websites harvesting (Géraldine, Sabine)

Same as above. We will use the wiki to share templates, scripts or other tricks used to harvest these websites.
Examples from BnF on Ouest France, a French regional newspaper.


Curator roadmap

Exchange of experiences

  • harvesting special characters (#, ?, etc.)
  • harvesting dynamic URLs
  • harvesting JavaScript (e.g. browse buttons…)
  • Flash pages
  • NAS performance
  • …?