What should I keep in mind when preparing data for preservation and re-use?

Preparing data for preservation and re-use is not a single stage but an ongoing part of the research process.

  • Archives and repositories require clarity on who owns the data and confirmation that permission for preservation and re-use has been granted
  • Data containing direct identifiers, or a significant number of indirect identifiers, will not be accepted unless those identifiers are anonymised or removed
  • Data requires good explanatory contextual material and information to be accepted into an archive or repository
  • Data may need to be converted or migrated to a format that can be preserved for the long term

Data Ownership  

Is it clear who owns the data? Archives and repositories are unable to accept data where ownership is unclear or permission for preservation and re-use has not been given. Clarify ownership early in the research in case there is a problem. If data cannot be shared because of ownership issues, the research funder must be informed during the funding application stage. Funders will hold principal investigators liable where data cannot be shared because ownership rights were not resolved or permission to deposit data had not been sought.

...

At some point during your research you may need to convert or migrate your data files from one format to another, perhaps because the place chosen for long-term preservation cannot handle the current format. Other reasons include a new computer, new software, sharing with someone who uses different software, working on a shared platform instead of your own PC, or simply ensuring that your data can be read and used in the future. The safest way to guarantee long-term access to usable data is to convert it to standard formats that most software can interpret and that are suitable for data interchange and transformation.

Some “lossiness” (i.e. reduction in quality) may occur when migrating from one file format to another. It is important for you to understand what is at risk for the type of data you are working with.

Potential risks for loss or corruption on conversion or migration to new media include the following:

  • Textual data: editing such as highlighting, bold text or headers/footers may be lost
  • Data held in statistical packages, spreadsheets or databases: some data or internal metadata, such as missing value definitions, decimal precision, formulae or variable labels, may be lost during conversion to another format, or data may be truncated
  • Image files: loss of layers, color fidelity, resolution etc.
  • Multimedia: as above, but attention to frame rates, sound quality, codecs and wrappers is needed.

It is worth briefing yourself on the formats you are converting from and to before you begin; at the very least, look them up on the web.

Check the integrity of converted files as thoroughly as possible immediately after conversion, e.g. by counting rows and columns, testing functionality and testing export.
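
As an illustration of the row and column check, the following is a minimal Python sketch, assuming the data is tabular and that both the original and the converted file can be exported as delimited text. The function names (read_table, compare_tables) and the sample data are hypothetical and chosen for this example only; they are not part of any repository's deposit tooling.

```python
import csv
import io


def read_table(text, delimiter=","):
    """Parse delimited text into a list of rows (each a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def compare_tables(original, converted):
    """Return a list of human-readable differences between two tables."""
    problems = []
    if len(original) != len(converted):
        problems.append(f"row count: {len(original)} vs {len(converted)}")
    # Compare row by row over the rows that both tables have.
    for i, (a, b) in enumerate(zip(original, converted), start=1):
        if len(a) != len(b):
            problems.append(f"row {i}: column count {len(a)} vs {len(b)}")
        elif a != b:
            problems.append(f"row {i}: cell values differ")
    return problems


# Hypothetical example: a CSV export and its tab-separated conversion,
# where the conversion truncated a decimal value.
original = read_table("id,score\n1,3.14159\n2,2.71828\n")
converted = read_table("id\tscore\n1\t3.14\n2\t2.71828\n", delimiter="\t")

for problem in compare_tables(original, converted):
    print(problem)
```

With real files, the text would be read from disk rather than written inline. A check like this only catches differences that are visible in the exported values, so it complements, rather than replaces, testing the converted file in the software that will use it.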