Why?
The code is still a mess even though it has been converted from an Ant project to a Maven project.
- The Harvester-core module is just a trashcan location for anything remotely harvester related.
- Unit tests are separated into their own modules, making code coverage close to impossible to measure.
- It is impractical to spend 1-2 weeks running test cases every time anything major is released. This should be covered much better by unit testing.
- Registering/deregistering JMS listeners just seems wrong, and it is surprising that it has worked so far. Nothing in the JMS specification suggests that this is even a good idea.
Overall solutions
- Isolate storage implementation completely. It is going to happen sooner or later anyway.
- Remove JMS as much as possible. (Except for legacy code like the bitarchive implementation.)
- Split the different parts of NAS into separate modules with little or no dependencies.
Reasoning
- It is easier to build/test/deploy. With improved and localized unit testing you could most likely:
  - release much quicker without having to spend 1-2 weeks on testing.
  - release only the module(s) you changed.
- People who do not want to use all of NAS could instead just use the parts that fit their needs.
  - For example, people could use the harvest controllers and nothing else.
Harvest Control Managers
-------------------------
- Remove the JMS dependency from the controller.
  - Instead use a REST interface or some other means of exposing an API.
- Remove the notion of channels from the controller.
  - The management of organizing controllers into groups is left to the user of these APIs.
- Make the code independent of the rest of NAS so it can be used not only by NAS.
  - Controllers can be deployed independently of the rest of NAS.
- Use plugins for core functionality. (Use classloaders)
  - build progress reports
  - build metadata files when the job is complete
  - upload data
- A controller is built for a specific harvester; H1, H3, API
- The API should include all required functions to control the harvest manager.
  - Extendable using custom commands that the plugins add to the controller. (Thinking beyond H3...)
- Offer a base client implementation. (Used by a job manager/monitor)
  - Submit job.
  - Upload configuration files.
  - Upload additional files; indexes etc.
  - Start job.
  - Get progress/report.
  - Stop job.
  - Initiate metadata generation.
  - Initiate upload.
- Try to avoid harvests being ... manual bla.
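
The client operations listed above could be captured in a small Java interface. The following is only a rough sketch to make the idea concrete: every name in it (HarvestControllerClient, JobState, InMemoryHarvestController, the method signatures and the strict state order) is an assumption made for illustration and is not existing NAS code. The in-memory class stands in for a real controller; a REST-backed client would implement the same interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical job states; the names and their order are illustrative only. */
enum JobState { SUBMITTED, RUNNING, STOPPED, METADATA_GENERATED, UPLOADED }

/** Hypothetical client API, one method per operation in the list above. */
interface HarvestControllerClient {
    String submitJob(String jobName);
    void uploadConfiguration(String jobId, String fileName, byte[] content);
    void uploadAdditionalFile(String jobId, String fileName, byte[] content);
    void startJob(String jobId);
    JobState getProgress(String jobId);
    void stopJob(String jobId);
    void initiateMetadataGeneration(String jobId);
    void initiateUpload(String jobId);
}

/** In-memory stand-in for a controller; no JMS and no network involved. */
class InMemoryHarvestController implements HarvestControllerClient {
    private final Map<String, JobState> states = new HashMap<>();
    private final Map<String, List<String>> files = new HashMap<>();
    private int nextId = 1;

    @Override public String submitJob(String jobName) {
        String jobId = jobName + "-" + nextId++;
        states.put(jobId, JobState.SUBMITTED);
        files.put(jobId, new ArrayList<>());
        return jobId;
    }

    // The sketch only records the file name; the content is ignored.
    @Override public void uploadConfiguration(String jobId, String fileName, byte[] content) {
        storeFile(jobId, "config/" + fileName);
    }

    @Override public void uploadAdditionalFile(String jobId, String fileName, byte[] content) {
        storeFile(jobId, "extra/" + fileName);
    }

    @Override public void startJob(String jobId) {
        transition(jobId, JobState.SUBMITTED, JobState.RUNNING);
    }

    @Override public JobState getProgress(String jobId) {
        return currentState(jobId);
    }

    @Override public void stopJob(String jobId) {
        transition(jobId, JobState.RUNNING, JobState.STOPPED);
    }

    @Override public void initiateMetadataGeneration(String jobId) {
        transition(jobId, JobState.STOPPED, JobState.METADATA_GENERATED);
    }

    @Override public void initiateUpload(String jobId) {
        transition(jobId, JobState.METADATA_GENERATED, JobState.UPLOADED);
    }

    /** Helper outside the interface: lists the files recorded for a job. */
    List<String> filesFor(String jobId) {
        currentState(jobId);
        return new ArrayList<>(files.get(jobId));
    }

    private void storeFile(String jobId, String path) {
        currentState(jobId); // fails for unknown jobs
        files.get(jobId).add(path);
    }

    private JobState currentState(String jobId) {
        JobState state = states.get(jobId);
        if (state == null) {
            throw new IllegalArgumentException("Unknown job: " + jobId);
        }
        return state;
    }

    // Moves a job to the next state, rejecting calls made out of order.
    private void transition(String jobId, JobState expected, JobState next) {
        JobState current = currentState(jobId);
        if (current != expected) {
            throw new IllegalStateException(
                    "Job " + jobId + " is " + current + ", expected " + expected);
        }
        states.put(jobId, next);
    }
}
```

A job manager/monitor would only depend on the interface, so the transport behind it (REST or something else) can change without touching the caller.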
Netarkivet would then migrate existing code into plugins, and other users could use these as a reference to adapt them to their own infrastructure.
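
The plugin idea (core functionality such as metadata generation and upload loaded through classloaders) could look roughly like the sketch below. All names here (ControllerPlugin, PluginRegistry, MetadataPlugin, UploadPlugin) are hypothetical and exist only for illustration. In a real deployment the ClassLoader passed to load() would presumably be a URLClassLoader pointing at plugin jars; that part is an assumption and is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical plugin contract: a named command the controller can run for a job. */
interface ControllerPlugin {
    String name();
    String execute(String jobId);
}

/** Illustrative plugin: would build metadata files when the job is complete. */
class MetadataPlugin implements ControllerPlugin {
    @Override public String name() { return "metadata"; }
    @Override public String execute(String jobId) { return "built metadata for " + jobId; }
}

/** Illustrative plugin: would upload the harvested data. */
class UploadPlugin implements ControllerPlugin {
    @Override public String name() { return "upload"; }
    @Override public String execute(String jobId) { return "uploaded data for " + jobId; }
}

/** Loads plugin classes by name through a given ClassLoader and dispatches commands to them. */
class PluginRegistry {
    private final Map<String, ControllerPlugin> plugins = new LinkedHashMap<>();

    void load(ClassLoader loader, String className) {
        try {
            // Resolve the class through the supplied loader and check its type.
            Class<? extends ControllerPlugin> type =
                    Class.forName(className, true, loader).asSubclass(ControllerPlugin.class);
            ControllerPlugin plugin = type.getDeclaredConstructor().newInstance();
            plugins.put(plugin.name(), plugin);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load plugin " + className, e);
        }
    }

    /** Runs the command contributed by the named plugin. */
    String run(String pluginName, String jobId) {
        ControllerPlugin plugin = plugins.get(pluginName);
        if (plugin == null) {
            throw new IllegalArgumentException("No such plugin: " + pluginName);
        }
        return plugin.execute(jobId);
    }

    Set<String> names() {
        return plugins.keySet();
    }
}
```

Because the controller only knows the plugin contract, the Netarkivet plugins could serve as reference implementations while other installations swap in their own upload or metadata logic.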