Status of the production sites

Netarkivet

We have finished our “little” broad crawl:

Name          Start time  Stop time   Bytes               Documents
2016-3-100MB  13.09.2016  30.09.2016  11.654.316.537.483  252.238.201
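
As a rough sanity check on the figures reported above, the following sketch derives the crawl duration and the average document size. It assumes the dots in the byte and document counts are thousands separators, that the dates are day.month.year, and that a terabyte is the decimal 10^12 bytes; the variable names are illustrative only.

```python
from datetime import date

# Figures copied from the report; dots are assumed to be thousands separators.
bytes_harvested = int("11.654.316.537.483".replace(".", ""))
documents = int("252.238.201".replace(".", ""))

# Dates assumed to be day.month.year.
start = date(2016, 9, 13)
stop = date(2016, 9, 30)

days = (stop - start).days                      # elapsed days between start and stop
terabytes = bytes_harvested / 1e12              # decimal terabytes (assumption)
avg_bytes_per_document = bytes_harvested / documents

print(f"{terabytes:.2f} TB harvested over {days} days")
print(f"{avg_bytes_per_document:,.0f} bytes per document on average")
```

Under these assumptions the average comes out at roughly 46 KB per document.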

We have nearly finished the reorganization of our selective crawls according to the new strategy:

  • Daily crawls of all national news sites
  • Daily crawls of all regional news sites
  • Weekly crawls of all local news sites
  • Monthly crawls of political parties’ sites
  • Trimonthly crawls of ministries’ and administrative bodies’ sites
  • “Streamlining” of Twitter crawls
  • Analysis of depth and frequency for a crawl of organizations and associations

We renewed our account at Archive-It; it is intended for Facebook crawls.

NAS 5.2 has been released for developer testing. Curator testing is planned for the end of October.

We are upgrading the Citrix installation that provides access to Wayback.

We have tested Ilya Kreymer's W/ARC player for displaying HTTPS pages: it works fine, but there are some security issues to be fixed.

 

BnF

Start of our 2017 broad crawl: October 10th (4.4 million domains, 3500 URLs per domain, 40 crawlers running 10 threads each during the day and 30 threads each at night because of bandwidth constraints). We expect to harvest 80 TB of data.
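
The crawl parameters above imply a few derived figures, sketched here as simple arithmetic. The constant names are illustrative, and the URL total is only a ceiling: it would be reached only if every domain hit its per-domain limit, which is unlikely in practice.

```python
# Parameters as stated in the report; names are illustrative only.
DOMAINS = 4_400_000
URLS_PER_DOMAIN = 3_500
CRAWLERS = 40
THREADS_PER_CRAWLER_DAY = 10
THREADS_PER_CRAWLER_NIGHT = 30

# Upper bound on harvested URLs, assuming every domain reaches its limit.
max_urls = DOMAINS * URLS_PER_DOMAIN

# Total concurrent threads across all crawlers.
threads_day = CRAWLERS * THREADS_PER_CRAWLER_DAY
threads_night = CRAWLERS * THREADS_PER_CRAWLER_NIGHT

print(f"URL ceiling: {max_urls:,}")
print(f"Threads: {threads_day} by day, {threads_night} by night")
```

The night-time setting triples the total thread count relative to the day-time setting.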

We redesigned our Wayback and will give access to our full-text-indexed 1996-2000 collection with Shine in November.
