Agenda for the joint NetarchiveSuite tele-conference 2022-12-06, 13:00-14:00.
...
Panel |
---|
- Broad crawl
- Step 2 of the 4th broad crawl of 2022 started a few weeks ago. More than 100 harvesters are in use concurrently (capacity of 120 harvesters, 77 of them broad crawlers)
- Also working on the other part of the broad crawl with selective harvesters.
- Byte limit downgraded for 61K shops to 10K maxobjects and 499 MB maxbyte
- Event harvest
- General election still running but will end soon
- World Championship Soccer in Qatar: needs more seeds and then to be ended
- IIPC WAC 2023
- 4 proposals approved:
SolrWayback: Best practice, community usage and engagement
Run your own full stack SolrWayback
Browser-Based Crawling For All: Getting Started with Browsertrix Cloud
Browser-Based Crawling For All: The Story So Far
- JWAT for validation of WARC files has been updated; there might be some more work on documentation.
- The IIPC project Browser-Based Crawling For All is proceeding. A UX update will come soon, with enhanced exclusions and more explanations for each step/input.
December update for Browsertrix Cloud: https://docs.google.com/document/d/1G7cy8mebn8mDA5w1rFn0mQ4QEN_B3Ajte3C28ChneEs/edit#
IIPC: Just launched our new docs for Browsertrix Cloud at: http://docs.browsertrix.cloud/
|
BnF
ONB
BNE
KB-Sweden
...