What should I keep in mind when preparing data for preservation and re-use?
Preparing data for preservation and re-use is not a single stage, but an ongoing part of the research process
- Archives and repositories require clarity on who owns data and that permission for preservation and re-use is granted
- Data containing direct identifiers, or a number of significant indirect identifiers, will not be accepted unless the data are anonymised or the identifiers removed
- Data requires good explanatory contextual material and information to be accepted into an archive or repository
- Data may need to be converted or migrated to another format to make them preservable for the long term
...
At some point during your research you may need to convert or migrate your data files from one format to another, for example because the place chosen for long-term preservation cannot handle the current format. Other reasons include a new computer, new software, sharing with someone who uses different software, working on a shared platform instead of your own PC, or simply ensuring that your data can be read and used in the future. The safest way to guarantee long-term access to usable data is to convert the data to standard formats that most software can interpret and that are suitable for data interchange and transformation.
Some “lossiness” (i.e. reduction in quality) may occur when migrating from one file format to another. It is important for you to understand what is at risk for the type of data you are working with.
Potential risks of loss or corruption on conversion or migration to new media include the following:
- Textual data: editing such as highlighting, bold text or headers/footers may be lost
- Data held in statistical packages, spreadsheets or databases: some data or internal metadata, such as missing value definitions, decimal places, formulas or variable labels, may be lost during conversion to another format, or data may be truncated
- Image files: loss of layers, color fidelity, resolution etc.
- Multimedia: as for image files, with additional attention needed to frame rates, sound quality, codecs and wrappers.
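
To make the risk for tabular data concrete, here is a minimal sketch in Python of a cell-by-cell comparison between the values before and after a conversion. Everything in it is an assumption made for illustration: the function name `find_value_changes`, the column names, the use of `-9` as a missing-value code, and the "converted" rows, which simply imitate a conversion that rounds decimals and blanks out the missing-value code.

```python
def find_value_changes(original_rows, converted_rows):
    """Return (row_index, column, before, after) for every cell that changed."""
    changes = []
    for index, (before, after) in enumerate(zip(original_rows, converted_rows)):
        for column, old_value in before.items():
            new_value = after.get(column)  # None if the column disappeared
            if old_value != new_value:
                changes.append((index, column, old_value, new_value))
    return changes


# Hypothetical rows as read from the original file (all values kept as text).
original_rows = [
    {"id": "1", "income": "2500.456", "age": "34"},
    {"id": "2", "income": "1800.5", "age": "-9"},  # -9: assumed missing-value code
]

# The same rows after an imagined lossy conversion.
converted_rows = [
    {"id": "1", "income": "2500.46", "age": "34"},  # decimals rounded
    {"id": "2", "income": "1800.5", "age": ""},     # missing-value code blanked
]

for change in find_value_changes(original_rows, converted_rows):
    print(change)
```

Values are compared as text on purpose, so that a change in the number of decimal places is reported even when the two numbers are close. Note that `zip` stops at the shorter list, so a check of this kind does not detect dropped rows and should be combined with a row count.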
It is worth briefing yourself on the formats you are converting from and to before you begin; at the very least, look them up on the web.
Check the integrity of converted files as thoroughly as possible immediately afterwards, e.g. by counting rows and columns, testing functionality, testing export, etc.
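
Counting rows and columns can be automated. The sketch below uses Python's standard `csv` module and assumes, purely for illustration, that both the original and the converted data are delimited text; the helper names `table_shape` and `same_shape` and the sample data are hypothetical. For other formats the counts would have to come from whatever software reads them.

```python
import csv
import io


def table_shape(text, delimiter=","):
    """Return (number of rows, sorted list of distinct column counts)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return len(rows), sorted({len(row) for row in rows})


def same_shape(original, converted, original_delimiter=",", converted_delimiter=","):
    """True if both tables have the same row count and column counts."""
    return (table_shape(original, original_delimiter)
            == table_shape(converted, converted_delimiter))


# Hypothetical original export (semicolon-delimited) and two conversions.
original = "id;age;income\n1;34;2500.50\n2;;1800.00\n"
converted_ok = "id,age,income\n1,34,2500.50\n2,,1800.00\n"
converted_bad = "id,age,income\n1,34,2500.50\n"  # last row lost in conversion

print(same_shape(original, converted_ok, ";", ","))
print(same_shape(original, converted_bad, ";", ","))
```

A matching shape only shows that no rows or columns went missing; it says nothing about whether individual values survived, so it complements rather than replaces the other checks mentioned above.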