...

  • What is NAS? Who are "we"? Who might benefit from learning more about it?
  • NAS has now been "modernised" to use H3 (3.3.0-LBS-2014-03) and WARC, which means that the core code base is likely to remain fairly stable for a while. This makes contributing to NAS easier.
  • Foreseeable areas for contribution
    • Custom Heritrix processors
    • Support for Heritrix scripting
    • Finer control of harvesting from NAS GUI
    • Further i18n
    • Integration of NAS with other harvesters - especially browser-based

...

  • Full-text indexing and presentation
  • Tool-support for mass-processing
    • Corpus extraction
    • Derived formats
    • Analysis + visualisation
  • Index-server API
  • Harvesting API
  • Discovery API + Services
  • WARC standard + usage
    • Deduplication/revisits
    • Standards and tools for metadata + provenance
  • Integration of web- and nonweb- collections
  • More automation of QA (crawl.log analysis)
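
As an illustration of the last item, here is a minimal sketch of what automated crawl.log analysis could look like. It assumes the standard whitespace-separated Heritrix crawl.log layout (fetch status code in the second field, MIME type in the seventh). The function names `summarise_crawl_log` and `failure_ratio` are hypothetical and are not part of NAS or Heritrix.

```python
from collections import Counter


def summarise_crawl_log(lines):
    """Tally fetch status codes and MIME types from crawl.log lines.

    Assumes the usual Heritrix layout: field 2 is the fetch status
    code and field 7 is the MIME type. Lines that do not fit are skipped.
    """
    status_counts = Counter()
    mime_counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # blank or truncated line
        try:
            status = int(fields[1])
        except ValueError:
            continue  # status field is not numeric
        status_counts[status] += 1
        mime_counts[fields[6]] += 1
    return status_counts, mime_counts


def failure_ratio(status_counts):
    """Share of entries with a Heritrix error (<= 0) or an HTTP 4xx/5xx status."""
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    failed = sum(n for code, n in status_counts.items() if code <= 0 or code >= 400)
    return failed / total
```

A QA step could, for example, read a job's crawl.log with `open(path, encoding="utf-8")`, pass the file object to `summarise_crawl_log`, and flag the job for manual review when `failure_ratio` exceeds a locally chosen threshold.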