...

When requesting a Blob from the BlobstoreConnection, a Blob is returned, even if it does not exist. Like a File object, it has an exists() method. You can then open input and output streams on the blob.
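As an illustration of the "handle first, existence second" behaviour described above, here is a minimal in-memory sketch. SketchConnection and SketchBlob are hypothetical stand-ins invented for this example, not the real blobstore classes; the only thing taken from the text is that a blob handle is always returned and that it offers an existence check plus input and output streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a blob store connection. */
final class SketchConnection {
    private final Map<String, byte[]> stored = new ConcurrentHashMap<>();

    /** Always returns a handle, whether or not any data exists for the id. */
    SketchBlob getBlob(String id) {
        return new SketchBlob(id, stored);
    }
}

/** Hypothetical blob handle; existence is only checked on demand. */
final class SketchBlob {
    private final String id;
    private final Map<String, byte[]> stored;

    SketchBlob(String id, Map<String, byte[]> stored) {
        this.id = id;
        this.stored = stored;
    }

    /** Mirrors java.io.File.exists(): true only once data has been stored. */
    boolean exists() {
        return stored.containsKey(id);
    }

    /** The data becomes visible when the returned stream is closed. */
    OutputStream openOutputStream() {
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                stored.put(id, toByteArray());
            }
        };
    }

    InputStream openInputStream() throws IOException {
        byte[] data = stored.get(id);
        if (data == null) {
            throw new FileNotFoundException(id);
        }
        return new ByteArrayInputStream(data);
    }
}
```

A caller would typically call getBlob, check exists(), and only then decide whether to read or write.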

Tapes, the basic design

Invariant: No data will ever be overwritten. This is the fundamental invariant in the tape design. Every write creates a new instance of an object.

(If Fedora writes two changes to an object it could happen that only the latest write is "taped". More about the deferred writing later.)

The tapes are tar files. They can be said to exist in a long chain. Each tape is named according to the time it was created. Only the newest tape can be written. When the newest tape reaches a certain size, it is closed, and a new tape is started. This new tape is now the newest tape.

...

Each object instance is named as "<objectId>#<currentTimeMillis>" in the tape. The naming is not really important. The name should contain the objectId, so that reindexing of the tapes is possible. To help people reading the tapes with normal tar tools, the names should be unique (inside the tape, if not globally), as extracting the content becomes annoying otherwise. The system itself, however, does not depend on the names being unique.

The tapes themselves are named "tape<currentTimeMillis>.tar"
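The two naming patterns can be sketched as small helper functions. The class and method names below are hypothetical; only the patterns "<objectId>#<currentTimeMillis>" and "tape<currentTimeMillis>.tar" come from the text. The sketch assumes the timestamp follows the last "#", so that an objectId which itself contains "#" can still be recovered.

```java
/** Hypothetical helpers for the naming patterns described above. */
final class TapeNames {
    private TapeNames() {
    }

    /** Builds the tar entry name for one instance of an object. */
    static String entryName(String objectId, long millis) {
        return objectId + "#" + millis;
    }

    /** Recovers the objectId from an entry name, as reindexing would need to. */
    static String objectIdOf(String entryName) {
        int hash = entryName.lastIndexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("no '#' in entry name: " + entryName);
        }
        return entryName.substring(0, hash);
    }

    /** Builds the file name of a tape created at the given time. */
    static String tapeName(long millis) {
        return "tape" + millis + ".tar";
    }
}
```

In real use the millis argument would come from System.currentTimeMillis() at the time of writing.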

When reindexing the tape, the offset of the object in the tar file is used to determine which object is newer.
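The offset rule can be illustrated for a single tape as follows. This is a sketch under assumptions: TapeEntry and newestOffsets are hypothetical names, the entries are assumed to have been listed from one tar file already, and the objectId is taken to be everything before the last "#" in the entry name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: within one tape, the instance at the highest offset wins. */
final class ReindexSketch {
    /** One tar entry as seen while scanning a tape. */
    static final class TapeEntry {
        final String name;
        final long offset;

        TapeEntry(String name, long offset) {
            this.name = name;
            this.offset = offset;
        }
    }

    private ReindexSketch() {
    }

    /** Maps each objectId to the offset of its newest instance in the tape. */
    static Map<String, Long> newestOffsets(List<TapeEntry> entries) {
        Map<String, Long> newest = new HashMap<>();
        for (TapeEntry entry : entries) {
            int hash = entry.name.lastIndexOf('#');
            String objectId = hash < 0 ? entry.name : entry.name.substring(0, hash);
            // A later offset means the instance was written later, so keep the maximum.
            newest.merge(objectId, entry.offset, Long::max);
        }
        return newest;
    }
}
```

Because the comparison uses offsets rather than the timestamps embedded in the names, the result does not depend on the entry names being unique.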

[Diagram: tapesForFedora]

TapeArchive

TapeArchive is the component that handles the tar files and the index.

When an output stream is opened to a blob, the calling thread acquires the global write lock. As Fedora does not tell the blob how much data it is going to write, the output stream buffers the written data until the stream is closed. When the stream is closed, the buffer is written to the newest tape as a new tar entry, and the object instance is registered in the index. Lastly, the write lock is released.
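The open, buffer, and close sequence described above could look roughly like this. It is a minimal sketch, not the actual TapeArchive code: BufferedTapeStream and the TapeSink callback are hypothetical, and the sink stands in for "append a tar entry to the newest tape and register it in the index". A ReentrantLock is assumed for the global write lock, which requires that the same thread opens and closes the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of an output stream that tapes its content on close. */
final class BufferedTapeStream extends OutputStream {
    /** Stand-in for writing the tar entry and updating the index. */
    interface TapeSink {
        void append(String objectId, byte[] data) throws IOException;
    }

    private final String objectId;
    private final ReentrantLock writeLock;
    private final TapeSink sink;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed;

    BufferedTapeStream(String objectId, ReentrantLock writeLock, TapeSink sink) {
        this.objectId = objectId;
        this.writeLock = writeLock;
        this.sink = sink;
        // The lock is taken at open time, as described in the text.
        writeLock.lock();
    }

    @Override
    public void write(int b) {
        buffer.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        buffer.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        try {
            // Only now is the total size known, so the entry can be written.
            sink.append(objectId, buffer.toByteArray());
        } finally {
            // Release the lock even if taping fails.
            writeLock.unlock();
        }
    }
}
```

Buffering in memory is what makes it possible to write the tar header, which carries the entry size, before the data.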

...

In principle, it is not necessary to acquire the write lock until the stream is closed, but it is acquired when the stream is opened. If the write lock were acquired on closing, the stream would need to determine which tape is the newest at that time. By acquiring the lock at "open" time, the stream can be given this information up front. Since Fcrepo seems to burst-write to the disk, neither deadlocks nor slowdowns have been seen.

Reading is done by querying the index for the tape name and offset. With this information, an input stream can be opened to the exact entry in the relevant tape. No locking is necessary for reading. After skipping to the correct offset in the tape, we read until we get to the tar record header, which ought to be after 0 bytes. We then read the tar record header to get the size of the record, and return an input stream that starts at this position and stops when the entire record has been read (to prevent the user from reading into the next record). At no time in this process do we examine the name of the tar record we are reading.
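The read path can be sketched as follows. This is an illustration under assumptions, not the actual implementation: TapeReader is a hypothetical name, the header is assumed to sit exactly at the indexed offset, and the size is parsed from the standard tar header layout (a 512-byte header with a 12-byte octal size field at byte 124). The base-256 size encoding some tar variants use for very large entries is not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of reading one record from a tape at a known offset. */
final class TapeReader {
    static final int HEADER_SIZE = 512;
    static final int SIZE_OFFSET = 124;
    static final int SIZE_LENGTH = 12;

    private TapeReader() {
    }

    /** Parses the octal size field of a tar header. */
    static long parseSize(byte[] header) {
        String field = new String(header, SIZE_OFFSET, SIZE_LENGTH, StandardCharsets.US_ASCII);
        // The field is padded or terminated with NUL bytes or spaces.
        String digits = field.replace('\0', ' ').trim();
        return digits.isEmpty() ? 0L : Long.parseLong(digits, 8);
    }

    /** Returns a stream over exactly one record; the entry name is never inspected. */
    static InputStream open(final RandomAccessFile tape, long offset) throws IOException {
        byte[] header = new byte[HEADER_SIZE];
        tape.seek(offset);
        tape.readFully(header);
        final long size = parseSize(header);
        return new InputStream() {
            private long remaining = size;

            @Override
            public int read() throws IOException {
                if (remaining <= 0) {
                    return -1;
                }
                int b = tape.read();
                if (b >= 0) {
                    remaining--;
                }
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                if (len == 0) {
                    return 0;
                }
                if (remaining <= 0) {
                    return -1;
                }
                // Never read past the end of this record into the next header.
                int n = tape.read(buf, off, (int) Math.min(len, remaining));
                if (n > 0) {
                    remaining -= n;
                }
                return n;
            }
        };
    }
}
```

Each reader would need its own RandomAccessFile, since the returned stream relies on the file position staying where it left it.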

A tape is marked as indexed (in the index below) when it is closed and a new tape is started. As will be explained, tapes that are marked as indexed are not re-read upon server startup.

Taper

Taper is the component that handles the deferred writing. It is a singleton object which, upon creation, starts a timer task. Every few milliseconds, it examines all changed objects, and tapes all objects whose changes are more than "tapeDelay" seconds old.

The procedure is as follows:

  1. Acquire a lock on the tapingDir
  2. Tape all that is in the tapingDir
  3. Acquire a lock on the cacheDir
  4. Iterate through all files in the cacheDir
    1. If the file is older than tapeDelay
      1. Move it to the tapingDir
  5. Release the lock on the cacheDir
  6. Tape all that is in the tapingDir
  7. Release the lock on the tapingDir
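The steps above can be sketched as a single pass over two directories. This is a hypothetical illustration: TaperSketch, the Archive callback, and the use of in-process ReentrantLocks for the directory locks are all assumptions, as is using the file's last-modified time as the age of a change and deleting a file once it has been taped. Step 2 is read here as picking up anything left in the tapingDir by an earlier pass.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of one pass of the deferred-writing procedure. */
final class TaperSketch {
    /** Stand-in for writing a file to the newest tape. */
    interface Archive {
        void tape(Path file) throws IOException;
    }

    private final Path cacheDir;
    private final Path tapingDir;
    private final long tapeDelaySeconds;
    private final Archive archive;
    private final ReentrantLock cacheLock = new ReentrantLock();
    private final ReentrantLock tapingLock = new ReentrantLock();

    TaperSketch(Path cacheDir, Path tapingDir, long tapeDelaySeconds, Archive archive) {
        this.cacheDir = cacheDir;
        this.tapingDir = tapingDir;
        this.tapeDelaySeconds = tapeDelaySeconds;
        this.archive = archive;
    }

    void runOnce(long nowMillis) throws IOException {
        tapingLock.lock();                          // step 1
        try {
            tapeAll();                              // step 2
            cacheLock.lock();                       // step 3
            try {
                for (Path file : list(cacheDir)) {  // step 4
                    long ageMillis = nowMillis - Files.getLastModifiedTime(file).toMillis();
                    if (ageMillis > tapeDelaySeconds * 1000L) {
                        Files.move(file, tapingDir.resolve(file.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            } finally {
                cacheLock.unlock();                 // step 5
            }
            tapeAll();                              // step 6
        } finally {
            tapingLock.unlock();                    // step 7
        }
    }

    private void tapeAll() throws IOException {
        for (Path file : list(tapingDir)) {
            archive.tape(file);
            Files.delete(file); // assumption: a taped file is no longer needed here
        }
    }

    /** Snapshots the directory so files can be moved or deleted while looping. */
    private static List<Path> list(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path file : stream) {
                files.add(file);
            }
        }
        return files;
    }
}
```

Holding the cacheDir lock only while moving files keeps writers blocked for as short a time as possible; the slower taping work happens under the tapingDir lock alone.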

The Index

The index is kept in a separate system, Redis (http://redis.io/).

...