Why?

  • The code is still a mess even though it has been converted from an Ant to a Maven project.
  • The Harvester-core module is just a trashcan location for anything remotely harvester related.
  • Unit tests are separated into their own modules, making code coverage close to impossible.
  • It is impractical to spend 1-2 weeks running test cases every time anything major is released, when that ground should be covered much better by unit testing.
  • Registering/deregistering JMS listeners just seems wrong, and it is surprising that it has worked so far. Nothing in the JMS specification suggests that this is even a good idea.

Overall solutions

  • Isolate storage implementation completely. It is going to happen sooner or later anyway.
  • Remove JMS as much as possible. (Except for legacy code such as the bitarchive implementation.)
  • Split the different parts of NAS into separate modules with little or no dependencies.

Reasoning

  • It is easier to build/test/deploy. With improved and localized unit testing you could most likely...
    • release much quicker without having to spend 1-2 weeks on testing.
    • release only the module(s) you changed.
  • People who do not want to use the whole of NAS could instead just use the parts that fit their needs.
    • For example people could use the harvest controllers and nothing else.

Harvest Control Manager Component

  • Remove the JMS dependency from the controller.
    • Instead use a REST interface or some other means of exposing a simple extendable API.
  • Remove the notion of channels from the controller.
    • Organizing controllers into groups is left to the user of these APIs.
  • Make the code independent of the rest of NAS so that it can also be used outside NAS.
  • Controllers should be deployed independently of the rest of NAS.
  • Use a plugin architecture for core functionality. (Use classloaders)
    • configure harvester
    • build progress reports
    • build metadata files when the job is complete
    • upload data to persistent storage
  • A controller is built for a specific harvester: H1, H3, API.
  • Extendable using custom commands that the plugins add to the controller. (Thinking beyond H3...)
  • The API should include all required functions to control the harvest manager
    • Submit job.
    • Upload configuration files.
    • Upload additional files; indexes etc.
    • Start job.
    • Get progress/report.
    • Stop job.
    • Initiate metadata generation.
    • Initiate upload.
  • Offer a base client implementation. (Used by a job manager/monitor.)
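
To make the list above more concrete, here is a minimal Java sketch of how the controller API and the plugin hooks could fit together. Everything in it is an assumption made for illustration: the names `HarvestController`, `HarvesterPlugin`, `JobState` and `RecordingPlugin`, the method signatures and the state order are hypothetical and are not taken from the NAS code base. A REST layer (or another transport) would simply map its endpoints onto these methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical job states; the names are illustrative, not taken from NAS. */
enum JobState { SUBMITTED, RUNNING, STOPPED, METADATA_BUILT, UPLOADED }

/** Hypothetical plugin contract covering the four plugin duties listed above. */
interface HarvesterPlugin {
    void configureHarvester(String jobId, Map<String, byte[]> files);
    String buildProgressReport(String jobId);
    void buildMetadata(String jobId);
    void uploadToStorage(String jobId);
}

/** Hypothetical controller: tracks job state and delegates the real work to a plugin. */
final class HarvestController {
    private final HarvesterPlugin plugin;
    private final Map<String, JobState> states = new LinkedHashMap<>();
    private final Map<String, Map<String, byte[]>> files = new LinkedHashMap<>();
    private int nextId = 1;

    HarvestController(HarvesterPlugin plugin) { this.plugin = plugin; }

    /** Submit job: registers a new job and returns its id. */
    String submitJob() {
        String jobId = "job-" + nextId++;
        states.put(jobId, JobState.SUBMITTED);
        files.put(jobId, new LinkedHashMap<>());
        return jobId;
    }

    /** Upload configuration or additional files (indexes etc.) before the job starts. */
    void uploadFile(String jobId, String name, byte[] content) {
        requireState(jobId, JobState.SUBMITTED);
        files.get(jobId).put(name, content.clone());
    }

    /** Start job: lets the plugin configure the harvester, then marks the job as running. */
    void startJob(String jobId) {
        requireState(jobId, JobState.SUBMITTED);
        plugin.configureHarvester(jobId, files.get(jobId));
        states.put(jobId, JobState.RUNNING);
    }

    /** Get progress/report: the report itself is built by the plugin. */
    String getProgress(String jobId) {
        getState(jobId); // throws if the job is unknown
        return plugin.buildProgressReport(jobId);
    }

    void stopJob(String jobId) {
        requireState(jobId, JobState.RUNNING);
        states.put(jobId, JobState.STOPPED);
    }

    /** Initiate metadata generation; the state only advances if the plugin succeeds. */
    void generateMetadata(String jobId) {
        requireState(jobId, JobState.STOPPED);
        plugin.buildMetadata(jobId);
        states.put(jobId, JobState.METADATA_BUILT);
    }

    /** Initiate upload; the state only advances if the plugin succeeds. */
    void upload(String jobId) {
        requireState(jobId, JobState.METADATA_BUILT);
        plugin.uploadToStorage(jobId);
        states.put(jobId, JobState.UPLOADED);
    }

    JobState getState(String jobId) {
        JobState state = states.get(jobId);
        if (state == null) throw new IllegalArgumentException("Unknown job: " + jobId);
        return state;
    }

    private void requireState(String jobId, JobState expected) {
        JobState actual = getState(jobId);
        if (actual != expected) {
            throw new IllegalStateException(jobId + " is " + actual + ", expected " + expected);
        }
    }
}

/** Stand-in plugin that only records which hooks were called. */
final class RecordingPlugin implements HarvesterPlugin {
    final List<String> calls = new ArrayList<>();
    public void configureHarvester(String jobId, Map<String, byte[]> files) { calls.add("configure:" + files.size()); }
    public String buildProgressReport(String jobId) { return jobId + ": " + calls.size() + " plugin call(s) so far"; }
    public void buildMetadata(String jobId) { calls.add("metadata"); }
    public void uploadToStorage(String jobId) { calls.add("upload"); }
}
```

A job manager would call `submitJob()`, `uploadFile(...)`, `startJob(...)`, `stopJob(...)`, `generateMetadata(...)` and `upload(...)` in that order. How a real plugin is discovered and loaded (for example with `java.util.ServiceLoader` and a dedicated class loader) is left open here.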

The API should make it possible to redo certain operations which occasionally fail and require manual intervention.

Only when the worst happens should it be necessary for a person to fiddle with the server.
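
One possible way to support redoing a failed operation through the API is to keep each operation as a step that remembers its outcome and can be run again. The sketch below is only an assumption about how this could look; `RedoableStep` and `StepStatus` are hypothetical names and the failure in the usage note is simulated.

```java
/** Hypothetical status of a single operation such as metadata generation or upload. */
enum StepStatus { PENDING, FAILED, DONE }

/** Hypothetical wrapper that lets an operation be retried after a failure. */
final class RedoableStep {
    private final String name;
    private final Runnable action;
    private StepStatus status = StepStatus.PENDING;
    private String lastError;
    private int attempts;

    RedoableStep(String name, Runnable action) {
        this.name = name;
        this.action = action;
    }

    /** Runs the step, or re-runs it after a failure; a finished step is never repeated. */
    StepStatus run() {
        if (status == StepStatus.DONE) {
            return status;
        }
        attempts++;
        try {
            action.run();
            status = StepStatus.DONE;
            lastError = null;
        } catch (RuntimeException e) {
            // Keep the error so a client can show why the step needs to be redone.
            status = StepStatus.FAILED;
            lastError = e.getMessage();
        }
        return status;
    }

    String name() { return name; }
    StepStatus status() { return status; }
    String lastError() { return lastError; }
    int attempts() { return attempts; }
}
```

A client would call `run()` once, inspect `status()` and `lastError()` if the step failed, and call `run()` again when the underlying problem (for example unavailable storage) has been resolved, all without logging in to the server.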

Netarkivet would then migrate existing code into plugins, and other users could use these as a reference and adapt them to their own infrastructure.