0-effort Ingest

Problem: Getting stuff recognized by our systems takes way too long

Current state of most collections: Limbo

Files on disk

Disk is backed up

Files are checksummed

Metadata resides somewhere and is referenced in the collection document

All known collections are created as Bitrepository collections right now.

This should be easy, as the Digital Preservation Group have catalogued these and decided on preservation levels already.

Files are put in the Bitrepository, rather than Limbo

This requires a generic tool, but such a tool has already been developed for the Yousee workflow.

In addition to being checksummed, the files are analysed with Tika, FITS, or a similar tool.

This should also be extremely easy, if we do not attempt to validate the files, just characterise them.
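A minimal sketch of what "characterise, do not validate" could look like. It uses only the Python standard library as a stand-in for Tika or FITS: `mimetypes` guesses a type from the file name alone, which is far weaker than what those tools report. The function name `characterise`, the record fields, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import mimetypes
from pathlib import Path


def characterise(path: Path) -> dict:
    """Record basic facts about a file without judging whether it is valid."""
    digest = hashlib.sha256()  # assumed algorithm; use whatever the collection already uses
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    # Stand-in for Tika/FITS: a guess based on the file extension only.
    mime_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "name": path.name,
        "size": path.stat().st_size,
        "sha256": digest.hexdigest(),
        "mime_type": mime_type or "application/octet-stream",
    }
```

The function never rejects a file. An unknown type is simply recorded as `application/octet-stream`, so ingest is not blocked by files the tool cannot identify.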

Create a DOMS collection corresponding to the Bitrepository collection

Attach metadata sources to the DOMS collection

We need to determine where to store the metadata. If it is something simple, store it in DOMS. If it is big, we should probably store it in the Bitrepository and reference it from DOMS.
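The storage rule above could be expressed as a simple size check. This is only a sketch: the threshold `INLINE_LIMIT_BYTES`, the `MetadataPlacement` type, and the function name are hypothetical, and what actually counts as "simple" or "big" has not been decided yet.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the real limit still has to be agreed on.
INLINE_LIMIT_BYTES = 1_000_000


@dataclass
class MetadataPlacement:
    store: str            # "doms" or "bitrepository"
    reference_only: bool  # True when DOMS only holds a reference to the metadata


def place_metadata(size_bytes: int, limit: int = INLINE_LIMIT_BYTES) -> MetadataPlacement:
    """Decide where a metadata source of the given size should live."""
    if size_bytes <= limit:
        # Small metadata is stored directly in DOMS.
        return MetadataPlacement(store="doms", reference_only=False)
    # Large metadata goes in the Bitrepository and is referenced from DOMS.
    return MetadataPlacement(store="bitrepository", reference_only=True)
```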

Create file object for each file in the collection

That is, create the simple data model, which is as follows. The only objects are file objects. Each must have a content datastream and a datastream for the characterisation information. If we have separate metadata for each file (i.e. separate XML files), it should be added to the file objects.
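As an illustration, the data model described above might be sketched like this. The class name, field names, and datastream identifiers (`CONTENT`, `CHARACTERISATION`, `METADATA`) are assumptions and not existing DOMS names. Treating the content datastream as a reference to the file in the Bitrepository is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileObject:
    """The only object type in the simple data model."""
    content_ref: str                    # content datastream (assumed: reference to the file)
    characterisation: str               # characterisation datastream (tool output)
    metadata_xml: Optional[str] = None  # present only if a separate per-file XML exists

    def datastreams(self) -> List[str]:
        """List the datastreams this object carries (hypothetical names)."""
        names = ["CONTENT", "CHARACTERISATION"]
        if self.metadata_xml is not None:
            names.append("METADATA")
        return names
```

The two required datastreams are mandatory constructor arguments, so a file object cannot be created without them, while per-file metadata stays optional.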

End result