Why?
The code is still a mess even though it has been converted from an Ant project to a Maven project.
- The Harvester-core module is just a trashcan location for anything remotely harvester related.
- Unit tests are separated into their own modules, making code coverage measurement close to impossible.
- It is impractical to spend 1-2 weeks running test cases every time anything major is released. Much of this should be covered far better by unit testing.
- Registering/deregistering JMS listeners just seems wrong, and it is surprising it has worked so far. Nothing in the JMS specification suggests that this is even a good idea.
Overall solutions
- Isolate storage implementation completely. It is going to happen sooner or later anyway.
- Remove JMS as much as possible (except for legacy code like the bitarchive implementation).
- Split the different parts of NAS into separate modules with little or no dependencies.
Reasoning
- It is easier to build/test/deploy. With improved and localized unit testing you could most likely:
  - release much quicker without having to spend 1-2 weeks testing.
  - release only the module(s) you changed.
- People who do not want to use all of NAS could instead just use the parts that fit their needs.
  - For example, people could use the harvest controllers and nothing else.
Harvest Control Manager Component
- Remove the JMS dependency from the controller.
- Instead use a REST interface or some other means of exposing a simple extendable API.
- Remove the notion of channels from the controller.
- The management of organizing controllers into groups is left to the user of these APIs.
- Make the code independent of the rest of NAS so it can be used not only by NAS.
- Controllers should be deployed independently of the rest of NAS.
- Use a plugin architecture for core functionality (using classloaders):
  - configure harvester
  - build progress reports
  - build metadata files when the job is complete
  - upload data to persistent storage
- A controller is built for a specific harvester: H1, H3, API.
- Extendable using custom commands that the plugins add to the controller. (Thinking beyond H3...)
- The API should include all functions required to control the harvest manager:
  - Submit job.
  - Upload configuration files.
  - Upload additional files; indexes etc.
  - Start job.
  - Get progress/report.
  - Stop job.
  - Initiate metadata generation.
  - Initiate upload.
- Offer a base client implementation. (Used by a job manager/monitor.)
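
To make the list of API functions concrete, here is a minimal sketch of what such a controller interface could look like in Java. Everything in it is an assumption made for illustration: the names HarvestController, InMemoryHarvestController and JobState, the method signatures, and the strict order of states do not come from the existing NAS code. The in-memory class only tracks state and does not drive a real harvester; a REST layer could expose the same operations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical job states; the names are illustrative only. */
enum JobState { SUBMITTED, RUNNING, STOPPED, METADATA_DONE, UPLOADED }

/** Hypothetical controller API, one method per operation in the list above. */
interface HarvestController {
    String submitJob(String jobName);
    void uploadConfiguration(String jobId, String fileName, byte[] content);
    void uploadAdditionalFile(String jobId, String fileName, byte[] content);
    void startJob(String jobId);
    String getProgress(String jobId);
    void stopJob(String jobId);
    void generateMetadata(String jobId);
    void upload(String jobId);
}

/** In-memory stand-in that only tracks state; file content is ignored. */
class InMemoryHarvestController implements HarvestController {
    private static final class Job {
        final String name;
        final List<String> files = new ArrayList<>();
        JobState state = JobState.SUBMITTED;
        Job(String name) { this.name = name; }
    }

    private final Map<String, Job> jobs = new HashMap<>();
    private int nextId = 1;

    /** Looks up a job and, if expected is non-null, checks its current state. */
    private Job job(String jobId, JobState expected) {
        Job job = jobs.get(jobId);
        if (job == null) throw new IllegalArgumentException("Unknown job: " + jobId);
        if (expected != null && job.state != expected) {
            throw new IllegalStateException(jobId + " is " + job.state + ", expected " + expected);
        }
        return job;
    }

    @Override public String submitJob(String jobName) {
        String id = "job-" + nextId++;
        jobs.put(id, new Job(jobName));
        return id;
    }
    @Override public void uploadConfiguration(String jobId, String fileName, byte[] content) {
        job(jobId, JobState.SUBMITTED).files.add(fileName);
    }
    @Override public void uploadAdditionalFile(String jobId, String fileName, byte[] content) {
        job(jobId, JobState.SUBMITTED).files.add(fileName);
    }
    @Override public void startJob(String jobId) {
        job(jobId, JobState.SUBMITTED).state = JobState.RUNNING;
    }
    @Override public String getProgress(String jobId) {
        Job job = job(jobId, null);
        return job.name + " " + job.state + " files=" + job.files.size();
    }
    @Override public void stopJob(String jobId) {
        job(jobId, JobState.RUNNING).state = JobState.STOPPED;
    }
    @Override public void generateMetadata(String jobId) {
        job(jobId, JobState.STOPPED).state = JobState.METADATA_DONE;
    }
    @Override public void upload(String jobId) {
        job(jobId, JobState.METADATA_DONE).state = JobState.UPLOADED;
    }
}
```

A base client would call these methods in order (submit, upload files, start, poll progress, stop, generate metadata, upload). Calling an operation out of order is rejected with an IllegalStateException in this sketch; a real controller might allow more flexible transitions.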
The API should make it possible to redo certain operations which occasionally fail and require manual intervention.
Only when the worst happens should it be necessary for a person to fiddle with the server.
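
One way to support redoing failed operations through the API is to record the outcome of each post-harvest step and allow a step to be re-run only while it is marked as failed. The sketch below is a hypothetical illustration of that idea: the names RedoableJob, Step and StepStatus are invented here, and a real controller would also persist this state so it survives a restart.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical steps that may need to be redone after a failure. */
enum Step { METADATA, UPLOAD }

enum StepStatus { PENDING, DONE, FAILED }

/** Tracks the outcome of each step so a failed one can be redone via the API. */
class RedoableJob {
    private final Map<Step, StepStatus> status = new EnumMap<>(Step.class);
    private final Map<Step, Runnable> actions = new EnumMap<>(Step.class);

    /** Associates a step with the code that performs it. */
    void register(Step step, Runnable action) {
        actions.put(step, action);
        status.put(step, StepStatus.PENDING);
    }

    /** Runs a step and records whether it succeeded or failed. */
    StepStatus run(Step step) {
        Runnable action = actions.get(step);
        if (action == null) throw new IllegalArgumentException("No action for " + step);
        try {
            action.run();
            status.put(step, StepStatus.DONE);
        } catch (RuntimeException e) {
            // The failure is recorded instead of propagated, so it can be redone later.
            status.put(step, StepStatus.FAILED);
        }
        return status.get(step);
    }

    /** Re-runs a step, but only if its last attempt failed. */
    StepStatus redo(Step step) {
        if (status.get(step) != StepStatus.FAILED) {
            throw new IllegalStateException(step + " has not failed, nothing to redo");
        }
        return run(step);
    }

    StepStatus status(Step step) {
        return status.get(step);
    }
}
```

With something like this behind the API, an operator or a job monitor could trigger redo for a failed upload remotely once the storage is reachable again, without logging in to the server.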
Netarkivet would then migrate existing code into plugins, and other users could use these as a reference to adapt them to their own infrastructure.
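
As a rough illustration of the plugin idea, the sketch below loads a plugin class by name through a class loader and lets it add custom commands to the controller. The names ControllerPlugin, MetadataPlugin and PluginHost, and the choice to model a command as a function from a job id to a result string, are assumptions made for this example only. In a real deployment the loader would more likely be a java.net.URLClassLoader pointing at the plugin jars, or the standard java.util.ServiceLoader mechanism could be used instead.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical plugin contract: a plugin contributes named commands to the controller. */
interface ControllerPlugin {
    void register(Map<String, Function<String, String>> commands);
}

/** Example plugin; a migrated NAS plugin would wrap the existing metadata code here. */
class MetadataPlugin implements ControllerPlugin {
    @Override
    public void register(Map<String, Function<String, String>> commands) {
        commands.put("build-metadata", jobId -> "metadata built for " + jobId);
    }
}

/** Controller side: loads plugins by class name and dispatches their commands. */
class PluginHost {
    private final Map<String, Function<String, String>> commands = new TreeMap<>();

    /** Loads the named class through the given loader and lets it register its commands. */
    void load(String className, ClassLoader loader) {
        try {
            Class<?> type = Class.forName(className, true, loader);
            ControllerPlugin plugin = type.asSubclass(ControllerPlugin.class)
                    .getDeclaredConstructor()
                    .newInstance();
            plugin.register(commands);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load plugin " + className, e);
        }
    }

    /** Runs a command contributed by a plugin. */
    String execute(String command, String jobId) {
        Function<String, String> handler = commands.get(command);
        if (handler == null) throw new IllegalArgumentException("Unknown command: " + command);
        return handler.apply(jobId);
    }
}
```

The controller only knows the ControllerPlugin interface, so another installation could replace MetadataPlugin with its own class without touching the controller code.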