Structure checks done

Below are lists of the checks done by each checker. Each check is prefixed with an error number (in boldface) that should appear in the error message the checker returns.

The format of the error number is: "2F-" (for appendix 2F of the specification to Ninestars, which specifies the requirements for the file structure), followed by a letter specifying which checker checks for the given error, followed by a number unique within that checker. The letters for the different checkers are:

S - schematron file structure checks

Q - sequence number checks (java)

M - mfpak-related checks

O - other checks
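As an illustration of the error number format described above, the following is a minimal Python sketch that splits an error number into its checker letter and number. The names `CHECKERS`, `ERROR_NUMBER` and `parse_error_number` are hypothetical and not part of the checkers themselves.

```python
import re

# Checker letters as listed above (assumption: no other letters are in use).
CHECKERS = {
    "S": "schematron file structure checks",
    "Q": "sequence number checks",
    "M": "mfpak-related checks",
    "O": "other checks",
}

# "2F-", one checker letter, then a number unique within that checker.
ERROR_NUMBER = re.compile(r"2F-([SQMO])([0-9]+)")


def parse_error_number(text):
    """Return (letter, checker description, number), or None if malformed."""
    match = ERROR_NUMBER.fullmatch(text)
    if match is None:
        return None
    letter = match.group(1)
    number = int(match.group(2))
    return letter, CHECKERS[letter], number
```

For example, `parse_error_number("2F-S12")` yields the letter `S` and the number 12, while a string with an unknown letter such as `"2F-X1"` yields `None`.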

Schematron simple file structure checks

Each of these is prefixed with a name (like "batchNode") which shows which node of the batch structure is checked.

2F-S1 batchNode: Form of name: B<batchID>-RT<Roundtrip>

2F-S2 batchNode: Existence of WORKSHIFT-ISO-TARGET

2F-S3 batchNode: All folders except WORKSHIFT-ISO-TARGET have form <batchID>-[0-9]{2}. No other files/folders.

2F-S4 workshiftIsoTarget: Existence of nodes in WORKSHIFT-ISO-TARGET, i.e. Target-files

2F-S5 workshiftIsoTarget: Names (nodes) in WORKSHIFT-ISO-TARGET must be of the right format: Target-[0-9]{6}-[0-9]{4}

2F-S6 workshiftIsoTarget: No other files or folders

2F-S7 workshiftImage: Form of names: Target-<targetSerialisedNumber>-<billedID>.(jp2|mix)

2F-S8 workshiftImage: One mix-file per jp2-file

2F-S9 workshiftImage: 6-digit targetSerialisedNumber

2F-S10 workshiftImage: 4-digit billedID

2F-S11 workshiftImage: There must exist a file in each WORKSHIFT-ISO-TARGET/Target-[0-9]{6}-[0-9]{4} called Target-[0-9]{6}-[0-9]{4}.mix.xml

2F-S12 workshiftImage: There must exist a jp2-node in each WORKSHIFT-ISO-TARGET/Target-[0-9]{6}-[0-9]{4} called Target-[0-9]{6}-[0-9]{4}.jp2 containing a contents attribute

2F-S13 film: Any folder in BATCH not called WORKSHIFT-ISO-TARGET must have name of format <batchID>-[0-9]{2} (a FILM folder) with batchID as in BATCH folder

2F-S14 film: Existence of film.xml

2F-S15 film: Existence of edition-folder(s) with name of form [12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])-[0-9]{2}

2F-S16 film: Only FILM-ISO-target, UNMATCHED, and folders of form [12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])-[0-9]{2} are allowed

2F-S17 film: Existence of file with name: [avisID]-[batchID]-[filmSuffix].film.xml (batchID as in parent dir FilmNodeChecker, filmSuffix as in parent dir FilmNodeChecker). No other files/folders.

2F-S18 unmatched: Nodes in UNMATCHED must have format [avisID]-[filmID]-[0-9]{4}[A-Z]? where [avisID]-[filmID] is as found in the film metadata file for this film.

2F-S19 filmIsoTarget: nodes have form: [avisID]-[filmID]-ISO-[1-9] where [avisID]-[filmID] is as in film-xml of parent directory

2F-S20 filmIsoTargetFile: If there is a FILM-ISO-target folder, it must contain at least one file (node)

2F-S21 edition: folder name has form: [dato]-[udgaveLbNummer] i.e. [12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])-[0-9]{2}

2F-S22 edition: at least one node (i.e. newspaper page scan) must exist in edition folder

2F-S23 edition: a file exists with name [avisID]-[editionID].edition.xml where avisID is as in the film-xml and editionID is as in our parent folder name

2F-S24 edition: If there is an attribute (file) in the edition directory, it must have name [avisID]-[editionID].edition.xml where avisID is as in the film-xml and editionID is as in our parent folder name

2F-S25 editionPage: Any node not ending in .brik must have name of the form [avisID]-[editionID]-[0-9]{4}[A-Z]? where avisID is as in film-xml and editionID is as parent directory name

2F-S26 <functionality moved to Mfpak-checks>

2F-S27 <functionality moved to Mfpak-checks>

2F-S28 editionPage: Any node not ending in .brik must contain a .mods.xml attribute with name prefix as that of parent node

2F-S29 editionPage: Any node not ending in .brik must contain a .mix.xml attribute with name prefix as that of parent node

2F-S30 editionPage: Any node not ending in .brik must contain a .jp2 node with name prefix as that of parent node

2F-S31 editionPage: Any node not ending in .brik can only contain attributes ending in .mix.xml, .mods.xml, or .alto.xml

2F-S32 editionPage: Any node not ending in .brik can only contain nodes ending in .jp2 (no other nodes)

2F-S33 editionPage: For any node not ending in .brik, any sub-node must contain an attribute called "contents"

2F-S34 unmatchedPage: Any node in UNMATCHED must contain an attribute with name ending in .mix.xml

2F-S35 unmatchedPage: Any node in UNMATCHED must contain a node with name ending in .jp2

2F-S36 unmatchedPage: Any node in UNMATCHED can only contain attributes with names ending in .mix.xml, .mods.xml, or .alto.xml

2F-S37 unmatchedPage: Any node in UNMATCHED can only contain nodes with names ending in .jp2

2F-S38 unmatchedPage: Any node under a node in UNMATCHED must contain an attribute called "contents"

2F-S39 brik: Any node in an edition, with a name X ending in -brik must contain an attribute with name X.mix.xml

2F-S40 brik: Any node in an edition, with a name X ending in -brik must contain a node with name X.jp2

2F-S41 brik: For any node in an edition, with a name X ending in -brik, any contained attribute must have name X.mix.xml

2F-S42 brik: For any node in an edition, with a name X ending in -brik, any contained node must have name X.jp2

2F-S43 brik: For any node in an edition, with a name X ending in -brik, any contained node must contain an attribute called "contents"

2F-S44 filmIsoTargetScan: Any node in FILM-ISO-target with a name X must contain an attribute with name X.mix.xml

2F-S45 filmIsoTargetScan: Any node in FILM-ISO-target with a name X must contain a node with name X.jp2

2F-S46 filmIsoTargetScan: For any node in FILM-ISO-target with a name X, any contained attribute must have name X.mix.xml

2F-S47 filmIsoTargetScan: For any node in FILM-ISO-target with a name X, any contained node must have name X.jp2

2F-S48 filmIsoTargetScan: For any node in FILM-ISO-target with a name X, any contained node must contain an attribute called "contents"

2F-S49 checksumExistence: Every attribute (file) must have a checksum
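The checks themselves are written in schematron. Purely to illustrate the name patterns quoted in 2F-S5, 2F-S21 and 2F-S25, here is a minimal Python sketch. The function names and the example avisID are hypothetical, and the sketch assumes a name must match the pattern in full.

```python
import re

# Pattern from 2F-S5: Target-[0-9]{6}-[0-9]{4}
TARGET_NAME = re.compile(r"Target-[0-9]{6}-[0-9]{4}")

# Pattern from 2F-S15 / 2F-S21: [dato]-[udgaveLbNummer]
EDITION_NAME = re.compile(
    r"[12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])-[0-9]{2}"
)


def is_target_name(name):
    """True if the whole name has the form required by 2F-S5."""
    return TARGET_NAME.fullmatch(name) is not None


def is_edition_name(name):
    """True if the whole name has the form required by 2F-S21."""
    return EDITION_NAME.fullmatch(name) is not None


def is_page_name(name, avis_id, edition_id):
    """2F-S25 sketch: [avisID]-[editionID]-[0-9]{4}[A-Z]?

    avis_id and edition_id are escaped so they are matched literally.
    """
    pattern = (
        re.escape(avis_id) + "-" + re.escape(edition_id) + r"-[0-9]{4}[A-Z]?"
    )
    return re.fullmatch(pattern, name) is not None
```

Note that the edition pattern only checks the form of the date: a name such as `1795-02-31-01` matches even though 31 February is not a real date.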

Sequence number checks (java)

2F-Q1 Page numbering: The page numbers within a film, across editions and the UNMATCHED folder, must be sequential without holes, starting with 0001.

2F-Q2 Partial page subnumbering: For each node (in an edition) that corresponds to a scanned page, if the [billedID] is followed by a letter, these letters must be sequential "within" the [billedID].

2F-Q3 Workshift-iso-target numbering: For a given targetSerialisedNumber, the billedID numbering (last number NNNN) must be sequential without holes and start with 1.

2F-Q4 Film-suffix numbering: Inside of a Batch node, the film sequence numbers (last number in filmID) must be without holes starting with 1.

2F-Q5 Edition numbering: For the edition nodes for a given date, the numbering (the number after the date) must be in sequence without holes starting with 1.
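The sequence checks above share one idea: a set of numbers must run from a start value upwards without holes. The actual checker is written in Java; the following Python sketch only illustrates the rule. Both function names are hypothetical. It assumes that repeated numbers count once (as when one page number carries several letter suffixes), that an empty collection passes, and, for 2F-Q2, that letters start at A.

```python
def is_sequential(numbers, start=1):
    """True if the distinct numbers are start, start+1, ... with no holes.

    Assumption: duplicates are collapsed before checking, and an empty
    collection is treated as sequential.
    """
    distinct = sorted(set(numbers))
    return distinct == list(range(start, start + len(distinct)))


def letters_are_sequential(suffixes):
    """2F-Q2 sketch: suffix letters for one billedID must be A, B, C, ...

    Assumption: the sequence starts at "A" and each letter occurs once.
    """
    expected = [chr(ord("A") + i) for i in range(len(suffixes))]
    return sorted(suffixes) == expected
```

With these helpers, page numbers 1, 2, 3 pass, while 1, 3 (a hole) and 2, 3 (wrong start) fail; suffixes A, B pass, while A, C fail.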

MF-PAK related checks (java)

2F-M1 film: Validate that all film filenames contain the avisID from the MFPak database

2F-M2 batchNode: The batch contains the correct number of films

2F-M3 film/edition: Each film only contains editions from dates that are expected from MFpak

2F-M4 editionPage: Any node not ending in .brik must contain a .alto.xml attribute with name prefix as that of parent node (if option B1, B2, or B9 is "on")

2F-M5 editionPage: Any node not ending in .brik must not contain a .alto.xml attribute with name prefix as that of parent node (if all of options B1, B2, and B9 are "off")
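Checks 2F-M4 and 2F-M5 are two halves of one rule: ALTO must be present when any of options B1, B2, or B9 is on, and absent when all three are off. A minimal Python sketch of that decision follows. It assumes the options are available as a dictionary of booleans and that the attribute is named after the page node plus `.alto.xml`; how the real checker reads the options from MFPak is not shown here, and all names are hypothetical.

```python
ALTO_OPTIONS = ("B1", "B2", "B9")


def alto_required(options):
    """True if any of options B1, B2 or B9 is on (missing options count as off)."""
    return any(options.get(name, False) for name in ALTO_OPTIONS)


def check_alto(page_name, attribute_names, options):
    """Return "2F-M4", "2F-M5", or None if the page node passes.

    Assumption: the ALTO attribute is called <page_name>.alto.xml.
    """
    if page_name.endswith(".brik"):
        # Both checks only apply to nodes not ending in .brik.
        return None
    has_alto = (page_name + ".alto.xml") in attribute_names
    required = alto_required(options)
    if required and not has_alto:
        return "2F-M4"
    if not required and has_alto:
        return "2F-M5"
    return None
```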



Other checks

2F-O1 Checksums: Check that all checksums are correct
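A checksum check of this kind amounts to recomputing the digest of each file's contents and comparing it with the recorded value. The list above does not say which algorithm is used; the Python sketch below assumes MD5 purely for illustration, and the function name is hypothetical.

```python
import hashlib


def checksum_matches(data, expected_hex, algorithm="md5"):
    """Recompute the digest of data and compare it with the recorded value.

    Assumption: the recorded checksum is a hex string; MD5 is only a
    placeholder default, not a statement about the real checker.
    """
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected_hex.strip().lower()
```

The comparison ignores surrounding whitespace and letter case in the recorded value, which is a design choice of this sketch rather than a documented behaviour.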