Comment: Migrated to Confluence 5.3

...

Curator roadmap

IIPC GA

Summary (Sara and Tue).

...

  • We are more than halfway through our second broad crawl for 2013 and have harvested about 15000 GB so far. Unfortunately there is an unsolved bug (https://sbforge.org/jirakb-dk.atlassian.net/browse/NAS-2198): we can't create WARC files larger than 100 MB.
  • We have started harvesting the mobile version of Facebook, which lets us harvest all comments. This is done by encoding every Facebook profile to be harvested into the harvest definition.
  • We are preparing a corpus from the archive for teaching purposes. Under a new interpretation of our personal data protection law, we will give access to part of our archived websites (the event harvest on the 2011 parliamentary elections) via Wayback and full-text search (SOLR).
  • We are running parallel tests on Wayback 1.7 / 1.8 while we wait for BNF’s solution for Wayback support of https in proxy mode :-)
  • We have harvested more YouTube videos on the following topics: Grand Prix Eurovision de la Chanson in a historical perspective, television and commercials, Bruce Springsteen in Denmark, Danish Jazz
  • We are still working on a general solution for harvesting content behind paywalls on news sites.

...