...

    • 3rd broadcrawl of 2024: step 2 started
    • Browsertrix
      • Local production instance will be ready this week; the existing test platform will be used as a staging platform
      • Exploring whether we can support Webrecorder through a subscription that gives us an extra crawling instance plus support
      • 500K+ front pages harvested, 3.5 TB, 8 days?
    • We are participating in a broad national collection and methodological effort on climate change, the climate debate and related topics. It is called "More water in the system": https://www.rigsarkivet.dk/nyheder/dokumentation-af-reaktioner-paa-klimaforandringer/
    • Crawling in parallel with curl and Brave and comparing the extracted links to find gaps in the NAS/Heritrix crawls.
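The link comparison in the last bullet could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the team's actual tooling: it assumes the links extracted from the curl and Brave fetches and the URLs recorded by the NAS/Heritrix crawl are already available as plain lists of strings, and the helper names `normalize` and `missing_from_crawl` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host, default an empty path to "/", and drop the
    #fragment so trivially different spellings of a URL compare equal.
    (Hypothetical helper; real canonicalization rules may differ.)"""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def missing_from_crawl(reference_links, crawled_urls):
    """Return the normalized URLs found by the reference fetch (e.g. links
    extracted via curl or Brave) that never appear in the crawl's URL list."""
    crawled = {normalize(u) for u in crawled_urls}
    return sorted({normalize(u) for u in reference_links} - crawled)


if __name__ == "__main__":
    # Hypothetical sample data standing in for browser-extracted links
    # and URLs taken from a Heritrix crawl log.
    browser_links = ["https://Example.dk/a#top", "https://example.dk/b", "https://example.dk/c?x=1"]
    heritrix_urls = ["https://example.dk/a", "https://example.dk/c?x=1"]
    print(missing_from_crawl(browser_links, heritrix_urls))  # ['https://example.dk/b']
```

Normalizing before the set difference keeps differences in host case or `#fragments` from showing up as false gaps; a production comparison would probably need further canonicalization (trailing slashes, tracking parameters, redirects).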

BnF

We are continuing preparations for our 2024 broadcrawl, which involves both significant changes to the technical environment (new storage bay, new OS, relocation of the network hardware to another BnF site) and development work aimed at fixing the problem of 404 errors. This harvest will be launched in October.

The European and legislative election harvests are now finished. They ran for 5 months, from March to July 2024, for a total size of 5.9 TB. Instagram harvests on the theme of the elections were also carried out between June and July, totalling 0.16 TB, as well as video crawls totalling nearly 3 TB.

Finally, the Olympic Games harvest is still ongoing, with a new stage crawl for the Paralympic Games. An Instagram harvest is also underway, and a video crawl should be launched after the games end.

...