The Bitrepository platform is used for long-term preservation of the newspaper data.

The bitrepository for newspapers will consist of:

  • A nearline pillar, managed by JHLJ. The pillar's cache will act as the processing area for the jpeg2000 data files.
  • An offline pillar, managed by LFM.
  • A checksum pillar, managed by ???

In-house wiki description of the Bitrepository: https://sbprojects.statsbiblioteket.dk/display/DIGSAM/4.5+Bitbevaring+Avis

Ingester

The bitrepository ingester archives the jp2 files in the bitrepository archive. It does this by traversing the batch structure and performing the following steps for each jp2 file:

  1. Generate a unique FileID identifying the file in the repository. The FileID is constructed from the file name, including its path in the batch structure, with the batch number as the root element. The path separator is the '_' (underscore) character. The maximum length allowed is 250.
  2. Ingest the file into the bitrepository, verifying the ingest using the file's checksum.
  3. Register the archived file in the DOMS system.
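
The FileID construction in step 1 can be sketched roughly as follows. This is a minimal illustration, not the ingester's actual code. The function name `make_file_id`, the assumption that the path is given relative to the batch root with '/' separators, the assumption that the limit of 250 counts characters, and the choice to reject (rather than truncate) an over-long FileID are all hypothetical. The batch and file names in the usage example are made up.

```python
from pathlib import PurePosixPath

MAX_FILE_ID_LENGTH = 250  # maximum FileID length stated in step 1


def make_file_id(batch_number: str, relative_path: str) -> str:
    """Build a FileID from the batch number and the file's path in the batch.

    Assumption: relative_path is the file's path below the batch root,
    written with '/' separators (no leading '/').
    """
    # The batch number is the root element, followed by each path component.
    parts = [batch_number, *PurePosixPath(relative_path).parts]
    file_id = "_".join(parts)
    if len(file_id) > MAX_FILE_ID_LENGTH:
        # Assumption: an over-long FileID is rejected rather than truncated.
        raise ValueError(
            f"FileID exceeds {MAX_FILE_ID_LENGTH} characters: {file_id!r}"
        )
    return file_id
```

For example, `make_file_id("batch1", "film1/edition1/page-0001.jp2")` yields `batch1_film1_edition1_page-0001.jp2`.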

Data processing

As part of the general architecture, files are processed once they have been ingested into the repositories, i.e. the Bitrepository for data files (jpeg2000) and DOMS for metadata.

A tape-based pillar normally gives no guarantee of when files are online, which poses a problem when using it for processing. To solve this problem, the tape backend API (layer 2, Python) is being extended with additional methods so that:

  • We can control which files are kept in cache while processing.
  • We can minimize the number of rejected files that end up on tape storage.

The API methods for the tape backend are:

  • force-online <prefix>
    • Method to keep files with <prefix> online. The method should be called before ingesting any files with the prefix, to prevent them from being rolled out to tape.
    • A call to the method may fail if the quota for online files has been exceeded.
    • In the future, a call to this method with a prefix whose files are already on tape may bring those files back online.
  • release-online <prefix>
    • Method to release files with <prefix> so that they can be rolled out to tape.
  • status-online
    • Method to list which prefixes are kept online.
    • May in the future provide more information about the online/offline state.
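
To make the intended semantics of the three methods concrete, here is a toy in-memory model. It is only a sketch and not the real tape backend. The class name `TapeBackendModel`, the exception `QuotaExceeded`, the helper `is_kept_online`, and above all the modelling of the quota as a cap on the number of pinned prefixes are assumptions made for illustration. The source does not say how the quota is measured.

```python
from typing import List


class QuotaExceeded(Exception):
    """Raised when a prefix cannot be kept online (hypothetical name)."""


class TapeBackendModel:
    """Toy stand-in for force-online / release-online / status-online."""

    def __init__(self, max_online_prefixes: int) -> None:
        # Assumption: the quota is modelled as a cap on pinned prefixes.
        self._max = max_online_prefixes
        self._online = set()

    def force_online(self, prefix: str) -> None:
        """Keep files with this prefix online; may fail on quota."""
        if prefix in self._online:
            return  # already pinned, nothing to do
        if len(self._online) >= self._max:
            raise QuotaExceeded(prefix)
        self._online.add(prefix)

    def release_online(self, prefix: str) -> None:
        """Allow files with this prefix to be rolled out to tape."""
        self._online.discard(prefix)

    def status_online(self) -> List[str]:
        """List the prefixes currently kept online."""
        return sorted(self._online)

    def is_kept_online(self, file_id: str) -> bool:
        """Hypothetical helper: does any pinned prefix cover this FileID?"""
        return any(file_id.startswith(p) for p in self._online)
```

Because FileIDs start with the batch number, pinning a batch-number prefix in this model covers every file of that batch.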

Additionally, the contract is that if a force-online call fails (i.e. the quota for online files is exceeded), it is still possible to ingest files into the bitrepository; they will, however, be rolled out to tape according to the usual policy.
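
A caller could follow this contract along the lines of the sketch below. It assumes that a failed force-online call surfaces as an exception (here the hypothetical `ForceOnlineFailed`). The function `ingest_batch` and the two callables passed to it are placeholders for illustration, not part of the actual API.

```python
from typing import Callable, Iterable


class ForceOnlineFailed(Exception):
    """Hypothetical error for a failed force-online call (e.g. quota exceeded)."""


def ingest_batch(
    prefix: str,
    file_ids: Iterable[str],
    force_online: Callable[[str], None],
    ingest: Callable[[str], None],
) -> bool:
    """Ingest all files for a prefix; return True if they are kept online.

    Assumption: force_online raises ForceOnlineFailed when it cannot pin
    the prefix.
    """
    try:
        # Pin the prefix before ingesting any file that carries it.
        force_online(prefix)
        kept_online = True
    except ForceOnlineFailed:
        # Per the contract, ingest is still possible; the files will simply
        # be rolled out to tape by the usual policy.
        kept_online = False
    for file_id in file_ids:
        ingest(file_id)
    return kept_online
```

The return value lets the caller decide whether processing can rely on the cache, and whether a later release-online call for the prefix is needed once processing is finished.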
