...
- What is NAS? Who are "we"? Who might benefit from learning more about it?
- NAS has now been "modernised" to use H3 (3.3.0-LBS-2014-03) and WARC, so the core code base is likely to stay fairly stable for a while. This makes contributing to NAS easier.
- Foreseeable areas for contribution
- Custom Heritrix processors
- Support for Heritrix scripting
- Finer control of harvesting from NAS GUI
- Further i18n
- Integration of NAS with other harvesters, especially browser-based ones
...
- Full-text indexing and presentation
- Tool-support for mass-processing
- Corpus extraction
- Derived formats
- Analysis + visualisation
- Index-server API
- Harvesting API
- Discovery API + Services
- WARC standard + usage
- Deduplication/revisits
- Standards and tools for metadata + provenance
- Integration of web and non-web collections
- More automation of QA (crawl.log analysis)
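
As an illustration of what automated crawl.log analysis for QA might look like, here is a minimal sketch in Python. It assumes the usual Heritrix crawl.log layout, where fields are whitespace-separated and the second field is the fetch status code (HTTP codes are positive, Heritrix's own failure codes are zero or negative). The function name `summarise_crawl_log` and the sample lines are hypothetical and are not part of NAS.

```python
from collections import Counter


def summarise_crawl_log(lines):
    """Count fetch status codes in Heritrix crawl.log lines.

    Assumption: fields are whitespace-separated and the second
    field is the fetch status code.
    """
    statuses = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        try:
            status = int(fields[1])
        except ValueError:
            continue  # skip lines whose second field is not a number
        statuses[status] += 1
    return statuses


# Invented sample lines, for illustration only.
sample = [
    "2014-03-01T12:00:00.000Z   200   5120 http://example.org/ - - text/html #001 20140301120000000+15 sha1:AAAA - -",
    "2014-03-01T12:00:01.000Z   404    312 http://example.org/missing L http://example.org/ text/html #002 20140301120001000+9 sha1:BBBB - -",
    "2014-03-01T12:00:02.000Z    -6      - http://unreachable.example/ L http://example.org/ unknown #003 - - - -",
]

counts = summarise_crawl_log(sample)
# Treat HTTP errors (>= 400) and non-positive Heritrix codes as failures.
failed = sum(n for code, n in counts.items() if code <= 0 or code >= 400)
print(dict(counts), "failed:", failed)
```

A real QA check would probably go further, for example grouping failures per domain or comparing counts against a previous harvest of the same job, but the parsing step would look much the same.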