Agenda for the joint NetarchiveSuite tele-conference 2022-11-08, 13:00-14:00.
Participants
- BNF: Auriane, Clara, Sara
- ONB: Andreas
- KB/DK - Copenhagen: Anders, Thomas, Stephen, Tue
- KB/DK - Aarhus: Colin
- BNE: Alicia, Miguel, José
- KB/Sweden: Peter, Pär, Jonas
Update on NAS latest tests and developments
Status of the production sites
Netarkivet
- Broad crawl
- 3rd broad crawl of 2022 finished at the end of October (2-3 weeks longer than anticipated)
- 4th broad crawl for 2022 started Nov 1st (4 broad crawls is the norm)
- We expect around 110 TB of data for 2022.
- Event harvest on the general election, including TikTok content, using both Heritrix and archiveweb.page. Still running, but will end soon.
- IIPC WAC 2023
- 4 proposals submitted
Submission Type / Conference Track: IN PERSON: 60, 90, or 120-minute conference-themed workshop
Browser-Based Crawling For All: Getting Started with Browsertrix Cloud
Jackson, Andrew N. (1); Klindt Myrvoll, Anders (2); Kreymer, Ilya (3)
Organization(s): 1: The British Library, United Kingdom; 2: Royal Danish Library; 3: Webrecorder
Submission Type / Conference Track: ONLINE: 45 minute panel
Browser-Based Crawling For All: The Story So Far
Klindt Myrvoll, Anders (1); Jackson, Andrew (2); Bingham, Nicola (2); Lelkes-Rarugal, Carlos (2); O'Brien, Ben (3); Duncan, Sholto (3); Kreymer, Ilya (4); Ko, Lauren (5); Mulliken, Jasmine (6)
Organization(s): 1: Royal Danish Library; 2: The British Library, United Kingdom; 3: National Library of New Zealand | Te Puna Mātauranga o Aotearoa; 4: Webrecorder; 5: UNT; 6: Stanford
- The updated JWAT for validation of WARC files is still almost finished; awaiting a build for Java 8.
- Quite a few enquiries from researchers about our Facebook content. We have a lot of old content, but curated new content is very sparse. There is no good way to get Facebook content, because our account is quickly recognized as a robot (e.g. when using Browsertrix Cloud) and then blocked or logged out. We are testing the limits of browser profiles and logged-in crawling of Facebook in Browsertrix Cloud. It is possible, but scoping will be important.
- NAS 7.4.3 in production
- SolrWayback updated 4 days ago - https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md
BnF
ONB
BNE
KB-Sweden
Next meetings
- December 6th
- January 10th, 2023