Threads

To understand how the system works, threads need to be understood. Whenever a user/client performs a request to the Fedora web service, a new thread is started to handle this request. This thread will then attempt to handle the request and, when finished, send the result back to the user. The user is thus "blocked" while waiting for their thread to return. Many users can, however, work concurrently, so many threads can be executing concurrently in this system.

When an object is requested for reading, the thread goes to the XmlTapesBlobStore. XmlTapesBlobStore asks Cache, which holds recently changed objects in the cacheDir. If the object is not found there, Cache asks Taper, which holds objects about to be taped in the tapingDir. If the object is not found there, Taper asks the TapeArchive, which finds the object in the tar tapes.
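The lookup order described above can be illustrated with a minimal sketch. The three layers are modelled as plain dictionaries, which is an assumption made purely for illustration; the names `make_reader`, `cache`, `taping` and `tapes` are hypothetical and do not come from the actual implementation.

```python
from typing import Callable, Dict, Optional

Layer = Dict[str, bytes]


def make_reader(cache: Layer, taping: Layer, tapes: Layer) -> Callable[[str], Optional[bytes]]:
    """Return a read function that consults Cache, then Taper, then TapeArchive."""

    def read(object_id: str) -> Optional[bytes]:
        # The first layer that knows the object wins, so the most recently
        # changed version is always the one served.
        for layer in (cache, taping, tapes):
            if object_id in layer:
                return layer[object_id]
        return None

    return read


# Hypothetical contents of the three layers.
cache = {"obj:1": b"newest"}
taping = {"obj:1": b"older", "obj:2": b"being taped"}
tapes = {"obj:1": b"oldest", "obj:2": b"archived", "obj:3": b"on tape"}

read = make_reader(cache, taping, tapes)
```

Here `read("obj:1")` is answered by the cache, `read("obj:2")` by the taping layer, and `read("obj:3")` only by the tapes.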

When a user operation deletes an object, the thread goes to Cache. Cache immediately goes to Taper, which immediately goes to TapeArchive to delete the object. 

When a user operation changes or creates an object, the thread also goes to Cache. Here the new version of the object is written to the cacheDir. The thread then returns. As can be seen, changes to the objects are written asynchronously to the tapes. 

As will be explained in detail later, Taper holds a timer thread which constantly archives changed objects from the cacheDir.

Locks

Each of the three storage directories (cacheDir, tapingDir, tapeFolder) is protected by a lock. In order to change the content of one of these dirs, the lock must be acquired. Only one thread can hold the lock at any one time, but that thread can acquire the lock multiple times. The lock is not released until the thread has released every acquisition it holds. Acquiring the lock is a blocking operation.
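The lock behaviour described above matches what is usually called a re-entrant lock. The following sketch shows the semantics with Python's `threading.RLock`; the name `cache_dir_lock` is hypothetical, and the non-blocking `acquire` in the second thread is used only so the demonstration terminates (the system described here blocks instead).

```python
import threading

cache_dir_lock = threading.RLock()  # hypothetical lock guarding the cacheDir


def inner() -> str:
    # Second acquisition by the same thread: succeeds without blocking.
    with cache_dir_lock:
        return "inner done"


def outer() -> str:
    with cache_dir_lock:
        return inner()


result = outer()

# A different thread cannot take the lock while this thread holds it.
acquired_by_other = []


def try_acquire() -> None:
    got = cache_dir_lock.acquire(blocking=False)
    acquired_by_other.append(got)
    if got:
        cache_dir_lock.release()


with cache_dir_lock:
    worker = threading.Thread(target=try_acquire)
    worker.start()
    worker.join()
```

After this runs, `acquired_by_other` holds a single `False`: the worker thread was refused the lock because the main thread still held it.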

Locks are only used for writing operations, never for reading. Due to the way filesystems work, a file that has already been opened can still be read even if it is deleted or changed afterwards. 

 

TapeArchive

TapeArchive is the component that handles the tar files and the index.

When an outputstream is opened to a blob, the global write lock is acquired by this thread. As Fedora does not tell the blob how much data it is going to write, the outputstream will buffer the written data until the stream is closed. When the stream is closed, the buffer will be written to the newest tape as a new tar entry. The object instance will be registered in the index. Lastly, the write lock will be released. 
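A minimal sketch of this open, buffer, close cycle follows. It assumes the tape is a seekable binary file object and the index is a plain dictionary; the class name `TapeOutputStream` and the tape name `tape-0.tar` are hypothetical. The tar entry is built by hand from a ustar header produced by `tarfile.TarInfo.tobuf`, followed by the data and zero padding to the 512-byte block size.

```python
import io
import tarfile
import threading

BLOCK = 512  # tar block size in bytes


class TapeOutputStream:
    """Buffers all writes and appends a single tar entry when closed."""

    def __init__(self, object_id, tape, index, lock, tape_name="tape-0.tar"):
        self.object_id = object_id
        self.tape = tape            # seekable binary file object (the newest tape)
        self.index = index          # dict: object_id -> (tape name, offset)
        self.lock = lock
        self.tape_name = tape_name
        self.buffer = io.BytesIO()
        self.lock.acquire()         # the write lock is taken when the stream is opened

    def write(self, data: bytes) -> None:
        # The total size is unknown up front, so data is only buffered here.
        self.buffer.write(data)

    def close(self) -> None:
        try:
            data = self.buffer.getvalue()
            info = tarfile.TarInfo(name=self.object_id)
            info.size = len(data)
            offset = self.tape.seek(0, io.SEEK_END)
            self.tape.write(info.tobuf(format=tarfile.USTAR_FORMAT))
            self.tape.write(data)
            self.tape.write(b"\0" * (-len(data) % BLOCK))  # pad to a block boundary
            self.index[self.object_id] = (self.tape_name, offset)
        finally:
            self.lock.release()     # released last, after the index is updated


tape, index, lock = io.BytesIO(), {}, threading.RLock()
for object_id, chunks in [("obj:1", [b"hello ", b"world"]), ("obj:2", [b"x"])]:
    out = TapeOutputStream(object_id, tape, index, lock)
    for chunk in chunks:
        out.write(chunk)
    out.close()
```

Each entry here occupies one header block plus one data block, so the second object lands at offset 1024.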

Each outputstream has a 1MB buffer by default. If the system attempts to write content exceeding the remaining bytes in the buffer, the buffer is marked as finished and a new buffer of size max(1MB,sizeNeeded) is allocated. So, if you write 1 byte, 1 byte of the 1MB buffer is used. If you then write 1MB in one operation, the default buffer is finished and a new 1MB buffer is allocated. This new buffer is then filled with the 1MB written, so almost 1MB is wasted here. 
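The allocation rule and the resulting waste can be checked with a small model that only tracks buffer sizes, not the data itself. The class `ChunkedBuffer` and its methods are hypothetical names used for this illustration.

```python
DEFAULT_BUFFER = 1024 * 1024  # 1MB


class ChunkedBuffer:
    """Tracks capacity and fill level of each allocated buffer."""

    def __init__(self):
        self.capacities = [DEFAULT_BUFFER]
        self.used = [0]

    def write(self, n: int) -> None:
        """Record one write operation of n bytes."""
        remaining = self.capacities[-1] - self.used[-1]
        if n > remaining:
            # Current buffer is finished; allocate max(1MB, sizeNeeded).
            self.capacities.append(max(DEFAULT_BUFFER, n))
            self.used.append(0)
        self.used[-1] += n

    def wasted(self) -> int:
        """Unused bytes in finished buffers (every buffer except the last)."""
        return sum(c - u for c, u in zip(self.capacities[:-1], self.used[:-1]))


buf = ChunkedBuffer()
buf.write(1)               # uses 1 byte of the default buffer
buf.write(DEFAULT_BUFFER)  # does not fit in the remaining 1MB - 1 bytes
```

After these two writes, `buf.wasted()` is 1048575 bytes, which is the "almost 1MB" mentioned above.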

In principle, it is not necessary to acquire the write lock until the stream is closed, but it is acquired when the stream is opened. If the write lock were acquired on closing, the stream would need to determine which tape is the newest at that time. By acquiring the lock at "open" time, the stream can be given this information up front. Since Fcrepo seems to burst-write to the disk, deadlocks or even slowdowns have not been seen.

Reading is done by querying the index for the tape name and offset. With this information, an inputstream can be opened to the exact entry in the relevant tape. No locking is necessary for reading. After skipping to the correct offset in the tape, we read until we get to the tar record header, which ought to be after 0 bytes. We then read the tar record header to get the size of the record, and return an inputstream that starts at this position and stops when the entire record has been read (to prevent the user from reading into the next record). At no point in this process do we examine the name of the tar record we are reading.
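The read path can be sketched as follows. The example first builds a small in-memory tape with Python's `tarfile` module and records the header offset of each entry, then reads one entry back using only the offset. The helper names `build_tape` and `read_entry` are hypothetical, and for brevity the sketch returns the record as bytes rather than a bounded inputstream. It assumes the size field of the 512-byte header (bytes 124 to 135) is stored as octal text, which holds for ordinary ustar entries.

```python
import io
import tarfile

BLOCK = 512  # tar header size in bytes


def build_tape(entries):
    """Write (name, data) pairs to an in-memory tar; return (bytes, offsets)."""
    raw = io.BytesIO()
    offsets = {}
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in entries:
            offsets[name] = raw.tell()  # position of this entry's header
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return raw.getvalue(), offsets


def read_entry(tape, offset: int) -> bytes:
    """Read one record given only its header offset; the name is never examined."""
    tape.seek(offset)
    header = tape.read(BLOCK)
    # Size field: octal ASCII digits, terminated by NUL or space.
    size = int(header[124:136].rstrip(b"\0 ").decode("ascii"), 8)
    # Read exactly `size` bytes so the caller cannot run into the next record.
    return tape.read(size)


tape_bytes, offsets = build_tape([("obj:1", b"first"), ("obj:2", b"second record")])
record = read_entry(io.BytesIO(tape_bytes), offsets["obj:2"])
```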

A tape is marked as indexed (in the index below) when it is closed and a new tape is started. As will be explained, tapes that are marked as indexed will not be re-read upon server start-up. 

Taper

Taper is the component that handles the deferred writing. It is a singleton object which, upon creation, starts a timer task. Every few milliseconds, it examines all changed objects and tapes all objects whose changes are more than tapeDelay seconds old. 

The procedure is as follows:

  1. Acquire a write-lock on the tapingDir
  2. Tape all that is in the tapingDir
  3. Acquire a write-lock on the cacheDir
  4. Iterate through all files in the cacheDir
    1. If the file is older than tapeDelay
      1. Move it to the tapingDir
  5. Release the write-lock on the cacheDir
  6. Tape all that is in the tapingDir
  7. Release the write-lock on the tapingDir

Taping all that is in the tapingDir proceeds as follows:

  1. Acquire a write-lock on the tapingDir
  2. Iterate through all the files in the tapingDir
    1. Forward the create/remove operation to the TapeArchive
  3. Release the write-lock on the tapingDir
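The two procedures above can be sketched as a single timer run. This is an illustration under several assumptions: the TapeArchive is replaced by a plain dictionary named `archive`, file age is taken from the modification time, and a file is deleted from the tapingDir once it has been forwarded (the text does not say when files leave the tapingDir). The function names `tape_all` and `run_once` are hypothetical. The re-entrant lock matters here, because `tape_all` acquires the tapingDir lock that `run_once` already holds.

```python
import os
import shutil
import tempfile
import threading
import time

cache_lock = threading.RLock()
taping_lock = threading.RLock()


def tape_all(taping_dir, archive):
    """Forward every file in taping_dir to the archive stand-in."""
    with taping_lock:  # re-entrant: the caller may already hold this lock
        for name in sorted(os.listdir(taping_dir)):
            path = os.path.join(taping_dir, name)
            with open(path, "rb") as f:
                archive[name] = f.read()  # stand-in for the TapeArchive call
            os.remove(path)               # assumption: taped files are removed


def run_once(cache_dir, taping_dir, archive, tape_delay, now=None):
    """One run of the timer task, following steps 1 to 7 above."""
    now = time.time() if now is None else now
    with taping_lock:                                   # steps 1 and 7
        tape_all(taping_dir, archive)                   # step 2
        with cache_lock:                                # steps 3 and 5
            for name in sorted(os.listdir(cache_dir)):  # step 4
                path = os.path.join(cache_dir, name)
                if now - os.path.getmtime(path) > tape_delay:
                    shutil.move(path, os.path.join(taping_dir, name))
        tape_all(taping_dir, archive)                   # step 6


# Usage: one file older than tapeDelay, one newer.
cache_dir, taping_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
archive = {}
now = time.time()
for name, age in [("old-object", 120), ("fresh-object", 1)]:
    path = os.path.join(cache_dir, name)
    with open(path, "wb") as f:
        f.write(name.encode())
    os.utime(path, (now - age, now - age))

run_once(cache_dir, taping_dir, archive, tape_delay=60, now=now)
```

Only `old-object` ends up in `archive`; `fresh-object` stays in the cacheDir until a later run.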

If the Fedora system requests to read an object not in the Cache, the request will be forwarded to the Taper. If the object is in the tapingDir, it will be served from that location, otherwise the request will be forwarded to the TapeArchive. 

Removal operations are handled separately from this procedure, as we want the removal to happen instantly. When the Cache receives a removal request, it immediately forwards this request to the Taper. This operation then blocks until a write-lock can be acquired on both the tapingDir and the cacheDir. This, of course, makes it impossible for the removal to run concurrently with the taper timer task. As the timer task runs more or less constantly, it will finish its current run and release the locks. The delete thread will then hopefully acquire the locks. The timer thread will thus be unable to start its next run until the delete thread has completed. 

Cache

The cache is the place where changed objects are written. Objects are served from the cache first, so that changed objects always get served in the changed version. The cache does not cache objects retrieved from the tapes, only changed objects submitted by the user.

The Index

For the index, a separate system, Redis (http://redis.io/), is used.

...

The index implementation has to provide the following methods:

  • (tape, offset) getLocation(objectID)

...

  • void setLocation(objectID, tape, offset)
  • iterator<objectID> list(idPrefix)
  • remove(objectID)
  • boolean isTapeIndexed(tape)
  • void setTapeIndexed(tape)
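To make the contract of the methods listed above concrete, here is a minimal in-memory stand-in. It is not the Redis-backed implementation and makes no claim about how that one stores its data; the class name `InMemoryIndex` and the snake_case method names (including `list_ids` for `list`) are hypothetical adaptations for Python.

```python
from typing import Dict, Iterator, Optional, Set, Tuple


class InMemoryIndex:
    """Dictionary-backed stand-in for the index interface."""

    def __init__(self) -> None:
        self._locations: Dict[str, Tuple[str, int]] = {}
        self._indexed_tapes: Set[str] = set()

    def get_location(self, object_id: str) -> Optional[Tuple[str, int]]:
        """Return (tape, offset) for the object, or None if unknown."""
        return self._locations.get(object_id)

    def set_location(self, object_id: str, tape: str, offset: int) -> None:
        # A later call for the same object replaces the earlier location.
        self._locations[object_id] = (tape, offset)

    def list_ids(self, id_prefix: str) -> Iterator[str]:
        """Iterate over all object IDs starting with id_prefix, sorted."""
        return iter(sorted(k for k in self._locations if k.startswith(id_prefix)))

    def remove(self, object_id: str) -> None:
        self._locations.pop(object_id, None)

    def is_tape_indexed(self, tape: str) -> bool:
        return tape in self._indexed_tapes

    def set_tape_indexed(self, tape: str) -> None:
        self._indexed_tapes.add(tape)


index = InMemoryIndex()
index.set_location("doms:1", "tape-0.tar", 0)
index.set_location("doms:2", "tape-0.tar", 1024)
index.set_location("other:1", "tape-1.tar", 0)
index.set_location("doms:1", "tape-1.tar", 512)  # newer version of doms:1
index.remove("doms:2")
index.set_tape_indexed("tape-0.tar")
```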

...

The Redis instance holds a number of keys and sets.

...