- We are more than halfway through our second broad crawl of 2013, and so far we have harvested about 15,000 GB. Unfortunately, there is an unresolved bug (https://sbforge.org/jirakb-dk.atlassian.net/browse/NAS-2198): we cannot create WARC files larger than 100 MB.
- We have started harvesting the mobile version of Facebook, which lets us capture all comments. Every Facebook profile to be harvested is encoded in the harvest definition.
- We are preparing a corpus from the archive for teaching purposes. Under a new interpretation of our personal data protection law, we will give access to part of our archived websites (the event harvest of the 2011 parliamentary elections) via Wayback and full-text search (Solr).
- We are running parallel tests of Wayback 1.7 and 1.8 while we wait for BnF's solution for HTTPS support in Wayback proxy mode.
- We have harvested more YouTube videos on the following topics: the Grand Prix Eurovision de la Chanson in a historical perspective, television and commercials, Bruce Springsteen in Denmark, and Danish jazz.
- We are still working on a general solution for harvesting content behind paywalls on news sites.