NetarchiveSuite archive POC

As of April 2022 this proof of concept seems to have been abandoned.


This proof of concept (POC) covers the NetarchiveSuite (NAS) project using the BitRepository as its archive.
The NAS POC implementation for the BitRepository should contain only the simplest elements:


  • Harvesting through the NAS software.
  • Implementing a new ArcRepository for communicating with the BitRepository.
  • Bit preservation is handled by the BitRepository.
  • No batchjobs, hence no index handling, hence no deduplication, no ViewerProxy and no Wayback.

ArcRepository (aka BitArcRepository)

The following NAS messages are to be handled by this BitArcRepository.

Store

This message signals that a harvester (or any other user) wants the ArcRepository to store a file.

Convert the NAS StoreMessage into a call through the PutFileClient:

  • Receive the NAS StoreMessage
  • Create an EventHandler for receiving progress updates on the operation.
  • Use the BR reference PutFileClient with the following arguments:
    • The URL should be taken from the RemoteFile in the StoreMessage.
    • The FileID is also present in the RemoteFile in the StoreMessage.
    • The FileSize is also present in the RemoteFile in the StoreMessage.
    • The EventHandler is the one just created.
  • If the EventHandler receives a 'Failed', then reply 'Store NotOk' to the StoreMessage.
  • If the EventHandler on the other hand receives a 'Complete', then reply 'Store Ok'.
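
The Store steps above can be sketched as follows. This is a minimal illustration only, not the real NAS or BitRepository API: `RemoteFileInfo`, `PutEvent`, `StoreReply`, `PutEventHandler` and `PutClient` are hypothetical stand-ins for the RemoteFile, the events, the reply to the StoreMessage, the EventHandler and the PutFileClient, and their signatures are assumptions.

```java
import java.net.URI;
import java.util.function.Consumer;

/** Hypothetical stand-in for the data carried by the RemoteFile in a StoreMessage. */
final class RemoteFileInfo {
    final URI url;
    final String fileId;
    final long fileSize;

    RemoteFileInfo(URI url, String fileId, long fileSize) {
        this.url = url;
        this.fileId = fileId;
        this.fileSize = fileSize;
    }
}

/** Hypothetical events; the text only names 'Failed' and 'Complete'. */
enum PutEvent { PROGRESS, FAILED, COMPLETE }

/** Hypothetical reply to the StoreMessage. */
enum StoreReply { STORE_OK, STORE_NOT_OK }

/** Hypothetical callback standing in for the BitRepository EventHandler. */
interface PutEventHandler {
    void handle(PutEvent event);
}

/** Hypothetical stand-in for the PutFileClient; the real signature may differ. */
interface PutClient {
    void putFile(URI url, String fileId, long fileSize, PutEventHandler handler);
}

final class StoreSketch {
    private final PutClient client;

    StoreSketch(PutClient client) {
        this.client = client;
    }

    /** Handles one StoreMessage: creates the handler, then calls the put client. */
    void onStoreMessage(RemoteFileInfo file, Consumer<StoreReply> reply) {
        PutEventHandler handler = event -> {
            switch (event) {
                case FAILED:
                    reply.accept(StoreReply.STORE_NOT_OK);
                    break;
                case COMPLETE:
                    reply.accept(StoreReply.STORE_OK);
                    break;
                default:
                    // Progress update: nothing to reply yet.
                    break;
            }
        };
        client.putFile(file.url, file.fileId, file.fileSize, handler);
    }
}
```

Passing the reply as a callback keeps the sketch independent of how NAS actually sends replies; exactly one reply is produced per terminal event.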

RemoveAndGetFile

This is for deleting a file in NAS.

Should be replaced by a call through the GetFileClient followed by a call through the DeleteClient.

Is not required for this setup.
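
Although not needed for the POC, the get-then-delete sequence could look roughly like the sketch below. `FileGetter` and `FileDeleter` are hypothetical stand-ins for the GetFileClient and the DeleteClient, and the rule that nothing is deleted unless the file was first retrieved is an assumption of this sketch, not something stated above.

```java
import java.util.Optional;

/** Hypothetical stand-in for the GetFileClient. */
interface FileGetter {
    Optional<byte[]> getFile(String pillarId, String fileId);
}

/** Hypothetical stand-in for the DeleteClient; returns true on success. */
interface FileDeleter {
    boolean deleteFile(String pillarId, String fileId);
}

final class RemoveAndGetSketch {
    /** Retrieves the file first and deletes it only if the retrieval succeeded. */
    static Optional<byte[]> removeAndGet(String pillarId, String fileId,
                                         FileGetter getter, FileDeleter deleter) {
        Optional<byte[]> content = getter.getFile(pillarId, fileId);
        if (!content.isPresent()) {
            return content; // Nothing retrieved, so nothing is deleted.
        }
        if (!deleter.deleteFile(pillarId, fileId)) {
            throw new IllegalStateException("file retrieved but not deleted: " + fileId);
        }
        return content;
    }
}
```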

GetFileMessage

Retrieves a file from a specific archive/pillar.

Can be converted into a call through the GetFileClient.

GetAllFilenamesMessage

Retrieves a list of the names of the files in a specific archive/pillar.

Can be converted into a call through the GetFileIDsClient.

Is not required for this setup.

GetAllChecksumsMessage

Retrieves the checksums of all the files in a specific archive/pillar.

Can be converted into a call through the GetChecksumsClient (or, if it is not possible to define 'all' as fileIDs, into a combination of the GetFileIDsClient and the GetChecksumsClient).

Is not required for this setup.
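
The fallback mentioned above, combining the two clients, might be sketched like this. `FileIdLister` and `ChecksumFetcher` are hypothetical stand-ins for the GetFileIDsClient and the GetChecksumsClient; the one-checksum-per-call shape is an assumption made to keep the example small.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the GetFileIDsClient. */
interface FileIdLister {
    List<String> listFileIds(String pillarId);
}

/** Hypothetical stand-in for the GetChecksumsClient, one file per call. */
interface ChecksumFetcher {
    String checksumOf(String pillarId, String fileId);
}

final class AllChecksumsSketch {
    /** Lists every file ID on the pillar, then fetches the checksum of each. */
    static Map<String, String> allChecksums(String pillarId,
                                            FileIdLister lister,
                                            ChecksumFetcher fetcher) {
        Map<String, String> checksums = new LinkedHashMap<>();
        for (String fileId : lister.listFileIds(pillarId)) {
            checksums.put(fileId, fetcher.checksumOf(pillarId, fileId));
        }
        return checksums;
    }
}
```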

GetChecksumMessage

Retrieves the checksum of a single file in a specific archive/pillar.

Can be converted into a call through the GetChecksumsClient.

CorrectMessage

Replaces bad files in the archives.

Can be converted into a call through the ReplaceClient.

Is not required for this setup.

GetMessage

Retrieves a specific part of a file.

The whole file has to be retrieved through the GetFileClient, after which the requested part is extracted and returned.
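
Once the whole file has been fetched to local disk, extracting the requested part could look like the sketch below. Describing the part by a byte offset and a length is an assumption of this example; the text above does not say how the part is specified, and `FilePartSketch` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

final class FilePartSketch {
    /** Reads 'length' bytes starting at 'offset' from an already retrieved file. */
    static byte[] readPart(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (offset < 0 || length < 0 || offset + length > raf.length()) {
                throw new IllegalArgumentException("requested part lies outside the file");
            }
            byte[] part = new byte[length];
            raf.seek(offset);     // Jump to the start of the requested part.
            raf.readFully(part);  // Read exactly 'length' bytes.
            return part;
        }
    }
}
```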

The following messages cannot be handled by the BitRepository:

  • AdminDataMessage
    • Is obsolete; it was used for changing the state of the files in AdminData (the predecessor of the AdminDatabase).
  • BatchReplyMessage
    • Handles the replies of a batchjob.
  • BatchMessage
    • Initiates a batchjob.
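
The message-to-client mapping described on this page can be summarised as a simple lookup table. This is only an illustrative sketch: the message and client names are taken from the text above, while `MessageMappingSketch` and the use of plain strings as keys are hypothetical choices. An empty list means the message cannot be handled by the BitRepository.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MessageMappingSketch {
    private static final Map<String, List<String>> CLIENTS = new LinkedHashMap<>();

    static {
        CLIENTS.put("StoreMessage", Arrays.asList("PutFileClient"));
        CLIENTS.put("RemoveAndGetFile", Arrays.asList("GetFileClient", "DeleteClient"));
        CLIENTS.put("GetFileMessage", Arrays.asList("GetFileClient"));
        CLIENTS.put("GetAllFilenamesMessage", Arrays.asList("GetFileIDsClient"));
        CLIENTS.put("GetAllChecksumsMessage", Arrays.asList("GetChecksumsClient"));
        CLIENTS.put("GetChecksumMessage", Arrays.asList("GetChecksumsClient"));
        CLIENTS.put("CorrectMessage", Arrays.asList("ReplaceClient"));
        CLIENTS.put("GetMessage", Arrays.asList("GetFileClient"));
        // Messages that cannot be handled map to no client at all.
        CLIENTS.put("AdminDataMessage", Collections.<String>emptyList());
        CLIENTS.put("BatchReplyMessage", Collections.<String>emptyList());
        CLIENTS.put("BatchMessage", Collections.<String>emptyList());
    }

    /** Returns the clients used for a message, in call order; empty if unsupported or unknown. */
    static List<String> clientsFor(String messageType) {
        return CLIENTS.getOrDefault(messageType, Collections.<String>emptyList());
    }
}
```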