Why?

  • The code is still a mess even though it has been converted from an Ant project to a Maven project.

  • The Harvester-core module is just a trashcan location for anything remotely harvester related.
  • Unit tests are separated into their own modules, which makes measuring code coverage close to impossible.

  • It is impractical to spend 1-2 weeks running test cases every time anything major is released. Those cases should be covered much better by unit testing.

  • Registering/deregistering JMS listeners just seems wrong, and it is surprising that it has worked so far. Nothing in the JMS specification suggests that this is even a good idea.

Overall solutions

  • Isolate storage implementation completely. It is going to happen sooner or later anyway.
  • Remove JMS as much as possible, except from legacy code like the bitarchive implementation.
  • Split the different parts of NAS into separate modules with little or no dependencies.

Reasoning

  • It is easier to build/test/deploy. With improved and localized unit testing you could most likely...
    • release much quicker without having to use 1-2 weeks testing.
    • release only the module(s) you changed.
  • People who do not want to use all of NAS could instead use just the parts that fit their needs.
    • For example people could use the harvest controllers and nothing else.

Harvest Control Managers
------------------------

- Remove the JMS dependency from the controller.

  - Instead use a REST interface or some other means of exposing an API.

- Remove the notion of channels from the controller.

  - Organizing controllers into groups is left to the users of these APIs.

- Make the code independent of the rest of NAS so it can be used not only by NAS.

  - Controllers can be deployed independently of the rest of NAS.

- Use plugins for core functionality. (Use classloaders)

  - build progress reports

  - build metadata files when the job is complete

  - upload data

- A controller is built for a specific harvester: H1, H3, API

- The API should include all required functions to control the harvest manager.

  - Extendable using custom commands that the plugins add to the controller. (Thinking beyond H3...)

  - Offer a base client implementation. (Used by a job manager/monitor)

  - Submit job.

    - Upload configuration files.

    - Upload additional files; indexes etc.

  - Start job.

  - Get progress/report.

  - Stop job.

  - Initiate metadata generation.

  - Initiate upload.

    - Try to avoid harvests being ... manual bla.
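
The operations listed above could be captured in a small Java interface, roughly as sketched below. Everything here is an assumption made for illustration: the names `HarvestController`, `InMemoryHarvestController` and `JobState`, the state names, the strict ordering of the steps, and the file name used in the demo are all hypothetical. The in-memory implementation only tracks state transitions and starts no harvester. A REST layer, or any other transport, could map its calls onto these methods.

```java
import java.util.Collections;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Lifecycle states of a job; the names are assumptions for this sketch. */
enum JobState { SUBMITTED, RUNNING, STOPPED, METADATA_GENERATED, UPLOADED }

/** Hypothetical controller API mirroring the operations in the list above. */
interface HarvestController {
    String submitJob(Map<String, byte[]> configFiles, Map<String, byte[]> additionalFiles);
    void startJob(String jobId);
    String getProgress(String jobId);
    void stopJob(String jobId);
    void generateMetadata(String jobId);
    void upload(String jobId);
}

/** In-memory stand-in: it records state changes only and starts no harvester. */
final class InMemoryHarvestController implements HarvestController {
    private final Map<String, JobState> jobs = new ConcurrentHashMap<>();

    @Override
    public String submitJob(Map<String, byte[]> configFiles, Map<String, byte[]> additionalFiles) {
        if (configFiles.isEmpty()) {
            throw new IllegalArgumentException("at least one configuration file is required");
        }
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, JobState.SUBMITTED);
        return jobId;
    }

    @Override public void startJob(String jobId) { transition(jobId, JobState.SUBMITTED, JobState.RUNNING); }
    @Override public void stopJob(String jobId) { transition(jobId, JobState.RUNNING, JobState.STOPPED); }
    @Override public void generateMetadata(String jobId) { transition(jobId, JobState.STOPPED, JobState.METADATA_GENERATED); }
    @Override public void upload(String jobId) { transition(jobId, JobState.METADATA_GENERATED, JobState.UPLOADED); }

    @Override
    public String getProgress(String jobId) {
        return jobId + ": " + state(jobId);
    }

    /** Returns the current state, or fails for an unknown job id. */
    JobState state(String jobId) {
        JobState current = jobs.get(jobId);
        if (current == null) {
            throw new IllegalArgumentException("unknown job: " + jobId);
        }
        return current;
    }

    private void transition(String jobId, JobState expected, JobState next) {
        state(jobId); // rejects unknown job ids first
        // replace(key, old, new) only succeeds if the job is still in the expected state.
        if (!jobs.replace(jobId, expected, next)) {
            throw new IllegalStateException("job " + jobId + " is not in state " + expected);
        }
    }
}

final class HarvestControllerDemo {
    public static void main(String[] args) {
        InMemoryHarvestController controller = new InMemoryHarvestController();
        String jobId = controller.submitJob(
                Collections.singletonMap("crawler-config.xml", new byte[0]), // hypothetical file name
                Collections.<String, byte[]>emptyMap());
        controller.startJob(jobId);
        System.out.println(controller.getProgress(jobId));
        controller.stopJob(jobId);
        controller.generateMetadata(jobId);
        controller.upload(jobId);
        System.out.println(controller.getProgress(jobId));
    }
}
```

Keeping the interface free of any transport detail is one way to let a job manager/monitor use the same base client regardless of how the controller is exposed.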

 

Netarkivet would then migrate its existing code into plugins, and other users could use these as a reference to adapt them to their own infrastructure.
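
As a rough illustration of the classloader-based plugin idea, the sketch below loads plugin classes by name through a `URLClassLoader`. The names `ControllerPlugin`, `MetadataPlugin` and `PluginLoader` are hypothetical, and the contract (a name plus a single `run` hook) is an assumption, not an existing NAS interface. In the demo no jar URLs are passed, so the example plugin is resolved through the parent class loader. A real deployment would pass the URLs of the plugin jars.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical plugin contract for controller tasks such as reports, metadata or upload. */
interface ControllerPlugin {
    String name();
    String run(String jobId);
}

/** Example plugin; a real one would wrap code migrated from the existing implementation. */
class MetadataPlugin implements ControllerPlugin {
    public String name() { return "metadata"; }
    public String run(String jobId) { return "metadata generated for job " + jobId; }
}

final class PluginLoader {
    /** Creates a loader for the given plugin jars; the parent loader supplies the shared interface. */
    static ClassLoader pluginClassLoader(URL... pluginJars) {
        return new URLClassLoader(pluginJars, PluginLoader.class.getClassLoader());
    }

    /** Instantiates each named class via its no-argument constructor. */
    static List<ControllerPlugin> load(ClassLoader loader, List<String> classNames) {
        List<ControllerPlugin> plugins = new ArrayList<>();
        for (String className : classNames) {
            try {
                Class<? extends ControllerPlugin> type =
                        Class.forName(className, true, loader).asSubclass(ControllerPlugin.class);
                plugins.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("cannot load plugin " + className, e);
            }
        }
        return plugins;
    }

    public static void main(String[] args) {
        List<ControllerPlugin> plugins = load(
                pluginClassLoader(),
                Collections.singletonList(MetadataPlugin.class.getName()));
        for (ControllerPlugin plugin : plugins) {
            System.out.println(plugin.name() + ": " + plugin.run("job-1"));
        }
    }
}
```

Sharing the plugin interface through the parent class loader is what lets the controller call into classes that come from separately deployed jars.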