Status of the production sites

Netarkivet

  • Broad crawl
    • 3rd broad crawl of 2022 finished at the end of October (2-3 weeks later than anticipated)
    • 4th broad crawl of 2022 started Nov 1st (4 broad crawls per year is the norm)
    • We expect around 110 TB of data for 2022.

  • Event harvest of the general election, including TikTok content, using both Heritrix and archiveweb.page. Still running, but will end soon.

  • IIPC WAC 2023
    • 4 proposals submitted


      Submission Type / Conference Track: IN PERSON: 60 minute panel

      SolrWayback: Best practice, community usage and engagement

      Egense, Thomas (1); Toth, Laszlo (2); Eldakar, Youssef (3); Aubry, Sara (4); Klindt Myrvoll, Anders (1)
      Organization(s): 1: Royal Danish Library (KB); 2: National Library of Luxembourg (BnL); 3: Bibliotheca Alexandrina (BA); 4: National Library of France (BnF)


      Submission Type / Conference Track: IN PERSON: 60, 90, or 120-minute conference-themed workshop

      Run your own full stack SolrWayback

      Egense, Thomas; Eskildsen, Toke; Thøgersen, Jørn; Klindt Myrvoll, Anders

      Organization(s): Royal Danish Library, Denmark


      Submission Type / Conference Track: IN PERSON: 60, 90, or 120-minute conference-themed workshop

      Browser-Based Crawling For All: Getting Started with Browsertrix Cloud 

      Jackson, Andrew N. (1); Klindt Myrvoll, Anders (2); Kreymer, Ilya (3)
      Organization(s): 1: The British Library, United Kingdom; 2: Royal Danish Library; 3: Webrecorder


      Submission Type / Conference Track: ONLINE: 45 minute panel

      Browser-Based Crawling For All: The Story So Far


      Klindt Myrvoll, Anders (1); Jackson, Andrew (2); Bingham, Nicola (2); Lelkes-Rarugal, Carlos (2); O'Brien, Ben (3); Duncan, Sholto (3); Kreymer, Ilya (4); Ko, Lauren (5); Mulliken, Jasmine (6)
      Organization(s): 1: Royal Danish Library; 2: The British Library, United Kingdom; 3: National Library of New Zealand | Te Puna Mātauranga o Aotearoa; 4: Webrecorder; 5: UNT; 6: Stanford




  • Still almost finished with the updated JWAT for validation of WARC files - awaiting a build for Java 8

  • Quite a few enquiries from researchers about our Facebook content. We have a lot of old content, but curated new content is very sparse. There's no good way to get Facebook content, because our account is quickly recognized as a robot (e.g. when using Browsertrix Cloud) and then blocked or logged out. We are testing the limits with browser profiles in Browsertrix Cloud and logged-in crawling of Facebook. It is possible, but scoping will be important.

  • NAS 7.4.3 in production

  • SolrWayback updated 4 days ago - https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md


BnF




ONB


BNE




KB-Sweden

...