Location: National Library of Estonia, Eesti Rahvusraamatukogu / Tõnismägi 2, Tallinn (http://www.nlib.ee); meeting point: main entrance
Participants:
Organization | Technical | Curator |
---|---|---|
Clement Oury, Annick Le Follic, Géraldine Camile | ||
Andreas Predikaka (participating via Skype) | Michaela Mayr (participating via Skype) | |
DK | ||
Estonia | Meelis Mihhailov, Rando Rostok | Jaanus Kõuts, Tiiu Daniel, Elis Karpov, Liina Abner (ebooks/newspapers discussion) |
Spain | Juan Carlos García Arratia | Mar Pérez Morillo |
The day before the workshop, six NAS participants will give talks at the International Seminar on Web Archiving in Tallinn; see "2015-01-28 International seminar on web archiving in Estonia" for details.
Topics to be discussed:
Heritrix 3 - technical | Heritrix 3 - curatorial | NetarchiveSuite |
---|---|---|
Agenda
Schedule for Day 1 (Thursday 29 January)
Location: Cupola Hall
09:00 - 09:30 Welcome and coffee
09:30 - 10:00 Workshop introduction (Sara)
Summary of the agenda, including any last-minute additions.
10:00 - 11:15 Institution updates (one person each from ONB, Estonia, BNE, BnF, KB/SB)
Each institution presents its main work topics and developments for 2014/2015.
11:15 - 11:30 Coffee break
11:30 - 13:00 Statistics on web archives using ISO metrics (Annick; SB/KB speaker to be confirmed)
BnF presents its current statistics, tools and workflow; KB/SB presents its thoughts and decisions; followed by discussion and possible joint actions.
13:00 - 14:00 Lunch
14:00 - 14:10 Introducing Heritrix 3 in NetarchiveSuite: NAS 5.0 status and plans (Mikis)
14:10 - 14:30 Quick demonstration of Heritrix 3 (Søren)
14:30 - 14:45 Introducing Heritrix 3 in practice: the BnF approach (Sara)
14:45 - 17:00 Heritrix 3 WARC/coders track (location: seminar room): WARC usage in NAS compared to Archive-It (1h); NAS 5.0 code redesign and collaborative development possibilities (1h) (Mikis)
14:30 - 17:00 Heritrix 3 curator track: monitoring and QA crawls with Heritrix 3, identification of missing features (Annick/Géraldine)
Important: if possible, all participants should prepare for this discussion by running some preliminary tests with H3 as a standalone application.
15:30 - 15:45 Coffee break
17:00 - 17:30 Tour of web archiving activities in Estonia (Jaanus)
19:00 - Dinner
Schedule for Day 2 (Friday 30 January)
Location: Cupola Hall
09:00 - 09:30 Harvesting complex websites: experiments with Archive-It 4.9/5.0, using Heritrix 3.3.0 with Umbra (Tue)
Experience with IA's Umbra (Colin)
09:30 - 11:15 Digging in the data mines of the Net Archive (Per)
Per presents a study he has just run on the DK collections; all participants then present their current practices and questions.
Details on the file identification experiment using Nanite: "A Weekend With Nanite"
Details on the "can we trust the MIME type as it was reported by the web server" experiment: http://rpubs.com/perdalum/de-dup1
Details on the comparison of the domains of the two broadcast companies: "Comparing the domains of two Danish broadcast companies"
The easiest way to get started with R: RStudio
Per's fork of JWAT-tools for easy extraction of crawl.log files: https://bitbucket.org/perdalum/jwat-tools/branch/netarkivet
11:15 - 11:30 Coffee break
11:30 - 13:00 Summary of the H3 tracks, review of the NAS curator roadmap, and next steps for the community
13:00 - 14:00 Lunch
14:00 - 15:30 Ebooks/newspapers: deposit or FTP harvesting (Tue, Liina, Géraldine)
15:30 - 15:45 Coffee break
15:45 - 17:00 Open space for an additional topic, individual discussions or free time