IIPC-GA April 2016

We (well, some of us) will be participating in the workshop ‘Building Better Tools, Together’ (Tuesday 12th, 9:30-11), and we have been asked to talk for a couple of minutes about the work we are doing on tools, APIs etc. The idea of the workshop is to foster future collaborations. Since this is only a 2-3 minute presentation, we shouldn't try to say too much. Some points:

  • What is NAS? Who are "we"? Who might benefit from learning more about it?
  • NAS is now "modernised" to use H3 (3.3.0-LBS-2014-03) and WARC, which means that the core code base is likely to be pretty stable for a while. This makes contributing to NAS easier.
  • Foreseeable areas for contribution
    • Custom Heritrix processors
    • Support for Heritrix scripting
    • Finer control of harvesting from NAS GUI
    • Further i18n
    • Integration of NAS with other harvesters - especially browser-based

 

Other areas of interest

Do we (as NAS) want to talk about other focus areas? Or should we rather present them as individual organisations? I'm thinking about issues like:

  • Full-text indexing and presentation
  • Tool-support for mass-processing
    • Corpus extraction
    • Derived formats
    • Analysis + visualisation
  • Index-server API
  • Harvesting API
  • Discovery API + Services
  • WARC standard + usage
    • Deduplication/revisits
    • Standards and tools for metadata + provenance
  • Integration of web and non-web collections
  • More automation of QA (crawl.log analysis)