...
Participants
- BNF: Sara, Annick, Lam
- ONB: Michaela, Andreas
- KB/DK: Søren, Tue, Jonas, Stephen, Nicholas
- SB: Colin, Sabine, Niels
- BNE: Mar
- KB/SE: Bengt Neiss, Stewart Rutledge
Introducing new members
Niels Bønding from Netarkivet.
The Royal Library of Sweden is going to use NAS. Bengt Neiss and Stewart Rutledge are joining the teleconferences.
NAS 5.1 Update
...
Now in use in the Netarkivet production environment
IIPC GA (all)
Feedback and important information from GA
NAS workshop (Sara)
Topics
1) Share experience with NAS 5 and Heritrix 3
2) Discuss challenges with specific types of sites (news, social media)
3) Discuss collection strategies
4) Discuss features/a GUI to handle the harvester
5) Look into the possibility of integrating another crawler into NAS (Colin offered to bring a prototype using a headless browser)
Schedule
From DK on running NAS 5 + Heritrix 3:
A property in H3 makes the crawler respect the crawl-delay given in robots.txt, by default up to a limit of 300 seconds.
To disable this behaviour, set the property to 0; the robots.txt crawl-delay is then ignored.
See the property marked with yellow:
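As a rough sketch of the behaviour described above (this is not Heritrix code), the following Python function shows how a capped robots.txt crawl-delay could be combined with the crawler's own politeness delay. The function and parameter names are hypothetical. The 300-second default and the meaning of the value 0 come from the note above; that the longer of the two delays wins is an assumption.

```python
from typing import Optional


def effective_delay(politeness_delay_s: float,
                    robots_crawl_delay_s: Optional[float],
                    respect_up_to_s: float = 300.0) -> float:
    """Return the wait in seconds before the next request to the same host.

    politeness_delay_s   -- delay the crawler would apply on its own (hypothetical name)
    robots_crawl_delay_s -- Crawl-delay from robots.txt, or None if the site sets none
    respect_up_to_s      -- upper limit for honouring the robots.txt value; 0 ignores it
    """
    # A limit of 0 (or no Crawl-delay in robots.txt) leaves the crawler's own delay untouched.
    if respect_up_to_s <= 0 or robots_crawl_delay_s is None:
        return politeness_delay_s
    # Honour the robots.txt value, but never beyond the configured limit.
    capped = min(robots_crawl_delay_s, respect_up_to_s)
    # Assumption: the longer of the two delays applies.
    return max(politeness_delay_s, capped)
```

For example, a site asking for a 600-second crawl-delay would be throttled to 300 seconds under the default, and not throttled by robots.txt at all when the limit is set to 0.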
NAS workshop
End of January 2017 - 2.5 days - in Vienna
Poll from Michaela: please complete the poll at http://doodle.com/poll/nk6dfc3kav4a4hs8
Status of the production sites
Netarkivet
Broad crawl
Event crawls
Selective crawls
One of the first social media platforms, arto.com, closed on 1 June. We had problems with our last complete crawl before the closure. With a specially developed module, in which the FetchDNS method is changed, we hope to be able to get all content directly from their server.
Potential collaboration project
Internal
BnF
This month we're opening an experimental access interface, Archives de l'internet Labs. This interface provides full-text searching of a small part of our collections, with the possibility to export results and save searches and selections in a personal workspace. It also provides access to statistics and metadata on the collections. This interface builds on the work we have done over the past year or so on data mining and full text indexing. It is part of a four-year project at the BnF studying the creation of a service to provide researchers with corpora from the digital collections of the BnF, the web archives having been chosen as the case study for the first year of the project. For the moment this interface will only be available to researchers working on two specific projects who have signed a convention with the BnF, but as part of the overall project we will be looking at how this kind of service can be offered to more researchers.
ONB
BNE
Next meeting
2016-07-14
Any other business?