IIPC-GA April 2016
We (well, some of us) will be participating in the workshop ‘Building Better Tools, Together’ (Tuesday 12th, 9:30-11), and we have been asked to talk for a couple of minutes about the work we are doing on tools, APIs, etc. The aim of the workshop is to foster future collaboration. Since this is only a 2-3 minute presentation, we shouldn't try to say too much. Some points:
- What is NAS? Who are "we"? Who might benefit from learning more about it?
- NAS has now been "modernised" to use H3 (Heritrix 3.3.0-LBS-2014-03) and WARC, which means the core code base is likely to remain fairly stable for a while. This makes it easier for others to contribute to NAS.
- Foreseeable areas for contribution:
  - Custom Heritrix processors
  - Support for Heritrix scripting
  - Finer control of harvesting from the NAS GUI
  - Further i18n (internationalisation)
  - Integration of NAS with other harvesters, especially browser-based ones
Other areas of interest
Do we (as NAS) want to talk about other focus areas, or should we present them as individual organisations? I'm thinking of issues such as:
- Full-text indexing and presentation
- Tool-support for mass-processing
- Corpus extraction
- Derived formats
- Analysis + visualisation
- Index-server API
- Harvesting API
- Discovery API + services
- WARC standard + usage
- Deduplication/revisits
- Standards and tools for metadata + provenance
- Integration of web and non-web collections
- More automation of QA (crawl.log analysis)