DOMS object model creation from tree

Description of the general method of creating the initial object hierarchy in DOMS based on a tree structure in a file system

General description

The goal of this document is to describe the work of transforming data files and accompanying metadata files in a tree structure on disk to a Content-Model-less object tree in DOMS.

For this to be generic some assumptions needs to be taken:

Data files and metadata files that belongs together have the same prefix
Data files can be recognized by their file suffix.
Checksum files is not supposed to be represented directly, but rather as a property of the data the are associated with. They will thus be skipped as objects in the tree, but used in the ingest of the data they belong to.

General rules:

Fedora Objects correspond to file system directories
Sub-directories are represented by a "hasPart" relation to the subdirectory object.
Each object will, as an identifier, have the file system path to the directory
A data file is a file containing data. The actual data is stored outside DOMS. The data file is represented as a file object in doms.
A metadata file is a file containing metadata. The file is stored inside DOMS.
A grouping of files (files having a common prefix) is represented as a object with:
1. Datastreams for each metadata file
2. "hasFile" relations to data files (hasFile is a specialization of hasPart)

Pseudo code expressing the above rules

The following pseudo code is meant to express the above rules on a more formal basis.

In the codes, the methods:

groupByPrefix(): returns a list of lists of files, grouped by their common prefix.
isDataFile(): returns a boolean telling if the given file is a datafile.

void handleDir(myDir, domsParentObject) {
  thisDirObject = new Object(identifier = myDir.getPath());
  domsParentObject.addHasPart(object = thisDirObject);

  handleFiles(myDir.getFiles(), thisDirObject);
 
  for(dir in myDir.getDirectories()) {

    handleDir(dir, thisDirObject);
  }
}

void handleFiles(myFiles, dirParentObject) {
  groupedByPrefix = groupByPrefix(myFiles) //groupedByPrefix is a set of groups. A group is a prefix and a set of files
  if (groupedByPrefix.size == 1){ //There is only one group, so add them as datastreams to the current
      group = groupedByPrefix.get(0)
      for (file in group){
      	handleFile(file, dirParentObject);
	  }
  } else { // there is more than one group, so introduce sub directories
    for(group in groupedByPrefix) {
      addPart(group, dirParentObject);
    }
  }
}

void handleFile(file, parentObject) {
  if(isDataFile(file)) {
    addFile(file, parentObject);
  } else {
    addDataStream(file, parentObject);
  }
}

void addPart(fileGroup, dirParentObject) {
  thisPartObject = new Object(identifier = fileGroup.getPrefix());
  dirParentObject.addHasPart(object = thisPartObject);
  
  for(file in fileGroup) {
    handleFile(file, thisPartObject);
  }
}
 
void addFile(file, parentObject) {
   thisFileObject = new Object(identifier = file.getName);
   parentObject.addHasFile(object = thisFileObject);
}