Status of the production sites

Netarkivet

  • 4th broad crawl 2023, step 2: all jobs are running and are expected to finish before the Christmas holidays.
  • Still testing the on-site installation of Browsertrix Cloud as well as the beta installation (Webrecorder's installation). Testing will intensify over the coming month. The ability to give WARC files custom names is a must.
    • Lots of great possibilities.
  • Focus on data delivery for researchers
  • PyWb and CDX indexing done.
    • PyWb instance on SolrWayback Stage (alternative playback).
    • Waiting to move the CDX index to the CPH servers to improve performance. The index needs to be close to the WARC files, not separated from them by two firewalls.
    • Crawling and playback of advanced sites such as Instagram and Facebook is still an issue. Earlier tests indicated that our PyWb installation would play back Instagram well, but something seems to be wrong in PyWb playback with OutbackCDX (using jwarc to convert WARC to CDX). We might raise these issues with the community. This is also relevant to the PyWb roadmap, the differences between PyWb and ReplayWeb.page (Browsertrix Cloud), and more. It would be great to have the same playback in PyWb as in ReplayWeb.page (native or in Browsertrix Cloud).
  • Finished the week 46 special crawls of radio/TV websites.
  • New Citrix/VLAN for Netarkivet is ready to go into production (it will be able to replay approx. 70 million Flash sites).
  • Data delivery: at the moment we have to exclude many sites where the content can be obtained elsewhere as a service.

BnF



ONB


BNE



KB-Sweden

...