
...

Throughout the process, the state of a Batch can be tracked through the Surveillance Interface.

Comment: What matters is not that there is a surveillance interface, but that there is a monitoring component that records a persistent state for each batch. - CSR

Add something like: "The state of each batch is stored in DOMS and accessed through an API which queries a caching layer (e.g. a Lucene index of batch objects in DOMS). There will therefore be some latency between updates to DOMS batch objects and the results of API queries."
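As a minimal sketch of how such an API might look (all type and method names here are hypothetical illustrations, not the actual DOMS/SBOI interface):

```java
import java.util.List;

/** Minimal batch representation: an id, a round-trip number, and the
 *  events recorded for it so far (hypothetical shape). */
record Batch(String batchId, int roundTripNumber, List<String> events) {}

/** Hypothetical batch-state API backed by a caching layer. */
interface BatchEventClient {
    /** Batches that carry 'pastEvent' but not yet 'futureEvent'.
     *  Served from a cache (e.g. a Lucene index of DOMS batch objects),
     *  so results may lag slightly behind the DOMS state. */
    List<Batch> getBatches(String pastEvent, String futureEvent);
}
```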

Insert excerpt: Newspaper Digitisation Process Monitor

Each step in the process is handled by an autonomous component. Are these what are commonly called Autonomous Agents? - CSR

Include Page: Autonomous Components

...

  • Digitize a batch of newspaper microfilm
  • Upload the batch to our servers (by rsync)
  • Notify us that the receipt process can begin for this batch (the "Batch Object Creation" event)

There are some minor questions of detail here about who creates the initial object (e.g. a record in state "NEW" in a database). Is it:

  • Us, when we send the batch out to Ninestars
  • Ninestars, after they have successfully uploaded the files to us
  • Us, after we have received a message (in some form ...) from Ninestars that the files have been uploaded

Any of these could be made to work. - CSR

The first real robot is the "Autonomous Bitrepository Ingester". It polls for "Batch Object Creation" events, so it will receive batches right after Ninestars have uploaded them. For each batch, it will iterate over the jpeg2000 files and, for each file:

...
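All of these robots share the same event-driven shape: poll for batches that have the trigger event but not yet this robot's own event, do the work, and record the outcome. A minimal sketch of that shared loop, reusing the hypothetical Batch and BatchEventClient types above (the per-file work is deliberately left as a placeholder):

```java
import java.util.function.Consumer;

/** Sketch of the loop shared by the autonomous components. */
class AutonomousComponentLoop {
    private final BatchEventClient client;

    AutonomousComponentLoop(BatchEventClient client) { this.client = client; }

    /** One polling pass: work on every batch that has seen 'triggerEvent'
     *  but not yet 'resultEvent', then record 'resultEvent' on it. */
    void runOnce(String triggerEvent, String resultEvent, Consumer<Batch> work) {
        for (Batch batch : client.getBatches(triggerEvent, resultEvent)) {
            work.accept(batch);              // e.g. ingest each jpeg2000 file
            batch.events().add(resultEvent); // in reality: update the DOMS batch object
        }
    }
}
```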

The next robot is the "Autonomous Doms Ingester". It polls for the "Bitrepository Ingest" event, so it will always run on batches after they have been ingested into the bit repository. It will create the metadata structure (batch -> reel -> newspaper -> page) in DOMS with all the supplied metadata. When this task is done, we have no further need of the data in the Scratch storage, as it should all have been ingested into our preservation platform. Finally, it will add the "Metadata Ingest" event to the batch object. What is the story with content models? Does metadata-ingest include content-model validation? - CSR
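To make the tree-building concrete, here is a hedged sketch of the nested structure; createObject(...) is a stand-in for whatever the real DOMS client offers, and the record shapes are invented for illustration:

```java
import java.util.List;

/** Hypothetical sketch of creating the batch -> reel -> newspaper -> page
 *  structure in DOMS. */
class DomsIngesterSketch {
    record Page(String id) {}
    record Newspaper(String title, List<Page> pages) {}
    record Reel(String id, List<Newspaper> newspapers) {}

    void ingest(String batchId, List<Reel> reels) {
        String batchPid = createObject("batch:" + batchId, null);
        for (Reel reel : reels) {
            String reelPid = createObject("reel:" + reel.id(), batchPid);
            for (Newspaper paper : reel.newspapers()) {
                String paperPid = createObject("newspaper:" + paper.title(), reelPid);
                for (Page page : paper.pages()) {
                    createObject("page:" + page.id(), paperPid);
                }
            }
        }
    }

    /** Stand-in for the real DOMS object creation and parent linking. */
    private String createObject(String pid, String parentPid) {
        return pid;
    }
}
```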

Robots can occupy the same location on the assembly line, and here we have the first example of this. The "Autonomous JPylyzer" is a robot that, like the "Autonomous Doms Ingester", polls for "Bitrepository Ingest" events. The task of this robot is to run jpylyzer on the jpeg2000 files in the batch. The task will be done as a Hadoop job (sketched after the list below). This assumes that the ABI ensures that the jp2 files are in HDFS, or that the Autonomous JPylyzer can bring them in if necessary. - CSR

  • As the map step, run JPylyzer on each jpeg2000 file
  • As the reduce step, add the output of this process to the file object in DOMS
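A hedged sketch of the map step under those assumptions (the jp2 files are reachable from HDFS and jpylyzer is installed on the worker nodes); the input is taken to be one jpeg2000 path per line, which is an assumption, not a settled design:

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Map step: shell out to jpylyzer for each jpeg2000 path and emit
 *  (path, XML report). The reduce step would attach each report to the
 *  corresponding file object in DOMS. */
public class JpylyzerMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String jp2Path = value.toString().trim();
        Process jpylyzer = new ProcessBuilder("jpylyzer", jp2Path).start();
        String report = new String(jpylyzer.getInputStream().readAllBytes());
        context.write(new Text(jp2Path), new Text(report));
    }
}
```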

...

We have two robots that might end up working concurrently. The first is the "Autonomous Batch Structure Checker". This robot might in fact be a set of robots (TO BE DECIDED), but for now we can think of it as a single step. It polls for "Metadata Ingest" events in the SBOI (Batch Status API - CSR). It will perform a series of checks of the metadata in DOMS, as sketched below. When done, it will add the "Batch Structure Checked" event to the Batch object.
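One way to keep the single-step view even if the checker becomes a set of robots is to compose the individual checks, roughly like this (the check signature is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch: run a series of metadata checks and collect failure messages.
 *  A check returns null on success; an empty result means the
 *  "Batch Structure Checked" event can be recorded as a success. */
class BatchStructureCheckerSketch {
    List<String> runChecks(Batch batch, List<Function<Batch, String>> checks) {
        List<String> failures = new ArrayList<>();
        for (Function<Batch, String> check : checks) {
            String failure = check.apply(batch);
            if (failure != null) failures.add(failure);
        }
        return failures;
    }
}
```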

...

  • Query the bit repository for any files pertaining to previous versions of the batch
  • If any are found, request the delete key
  • Delete the files
  • Query DOMS for any previous run-number objects. If any are found, purge them and their subtree (see the sketch after this list).
  • Need to clarify what metadata we want to keep from failed batches, since it could be very useful for comparison with the new version. - CSR
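A hedged sketch of this cleanup sequence; every client type and call below is a hypothetical placeholder for the real bit repository and DOMS interfaces:

```java
import java.util.List;

/** Sketch of rolling back a superseded batch round-trip. */
class OldBatchCleanupSketch {
    interface BitRepository {
        List<String> findFiles(String batchId, int roundTripNumber);
        String requestDeleteKey(String fileId);
        void delete(String fileId, String deleteKey);
    }

    interface Doms {
        List<String> findRunObjects(String batchId, int roundTripNumber);
        void purgeSubtree(String pid);
    }

    void cleanUp(BitRepository bitRepo, Doms doms, String batchId, int oldRoundTrip) {
        for (String fileId : bitRepo.findFiles(batchId, oldRoundTrip)) {
            String key = bitRepo.requestDeleteKey(fileId); // deletes are gated by a key
            bitRepo.delete(fileId, key);
        }
        for (String pid : doms.findRunObjects(batchId, oldRoundTrip)) {
            // NB: may discard metadata useful for comparison (see the CSR note above)
            doms.purgeSubtree(pid);
        }
    }
}
```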


...