1st Broadcrawl 2025 - step 2 almost finished - smoother crawl than ever.
Data delivery for a research project finished: all text from the archive plus some metadata, 32 TB compressed.
“Mere vand i systemet/More water in the system” climate change debate project proceeding as planned: using Browsertrix Cloud to crawl hard-to-get content such as video (YouTube and LinkedIn, logged in) and more.
Waiting on results from Webrecorder's development of a Facebook behaviour (expand comments, view reels/content etc.), logged in.
Lots of experience and findings from using Browsertrix, including live exclusions (text/regex etc.).
Browsertrix Solr index - new SSD drives update.
Outreach and more.