When it is time to preserve your data, you will need to consider carefully which components of your research need to be preserved. Answering "yes" to any of the following questions about a particular data file or data set may indicate that those data should be preserved for the long term.
- Do the data support published research?
- Are the data vulnerable?
- Are the data required for your research but from another source (i.e., not your original research data)?
- If so, is the future availability of those data from the original source uncertain?
- Do you wish, or are you required, to share your data?
- Are the data historically significant?
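If you keep an inventory of your data files, the questions above can be recorded per data set and checked automatically. The following is only a minimal sketch: the class and field names are hypothetical labels for the six questions, not part of any repository's tooling, and a "yes" still calls for your own judgment rather than an automatic decision.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PreservationChecklist:
    """One yes/no answer per checklist question (field names are illustrative)."""
    supports_published_research: bool = False
    is_vulnerable: bool = False
    from_another_source: bool = False
    original_source_availability_uncertain: bool = False
    sharing_wished_or_required: bool = False
    historically_significant: bool = False


def reasons_to_preserve(answers: PreservationChecklist) -> List[str]:
    """Return the names of all questions answered 'yes'."""
    return [f.name for f in fields(answers) if getattr(answers, f.name)]


def may_need_preservation(answers: PreservationChecklist) -> bool:
    """A single 'yes' is enough to flag the data set for a closer look."""
    return bool(reasons_to_preserve(answers))
```

For example, `may_need_preservation(PreservationChecklist(is_vulnerable=True))` flags the data set, while a checklist with every answer left at "no" does not.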
You should also consider whether you need to preserve multiple versions of a file or whether the most recent version is sufficient. It may also be important to consider whether the project is still in progress or is complete. Long-term projects, such as those that involve sampling a single site repeatedly over months or years, may require periodic preservation of data before the project is actually considered "finished."
Different disciplines conduct research in different ways and produce content in different forms, so we have provided some general guidelines below that apply to most researchers, as well as specific examples for various fields of study, including:
- science & engineering
- social sciences
- humanities.
GENERAL GUIDELINES
- Definitely deposit:
  - original data sets, original software code, raw data obtained from analysis of physical samples, observational data that cannot be regenerated
  - data sets that are not original but that are not easily available online and that you have permission to share
  - for social science data, include study descriptions, codebooks, and summary statistics
- Maybe deposit:
  - intermediate versions of analyses or code if they are potentially useful to others or were used in publications or theses
- Not necessary to deposit:
  - incomplete, non-functional, or intermediate versions of code that would be of marginal usefulness to others
  - output files from analyses if 1) the data set and code used to generate the output are deposited and 2) regenerating the output from the deposited files is fairly easy to do
  - data sets that are preserved and accessible via other institutions or organizations
  - graphs or charts created from the original data that could easily be regenerated
- Do not deposit:
  - any data that contain personal identifying information for human subjects
- Exceptions:
  - Output files from analyses may be deposited if they are time-intensive to regenerate or are not excessively large, or cannot be easily recreated from the deposited data set and code.
SCIENCE & ENGINEERING RESEARCH
Example 1: Measurement of the size of stars and planets based on images from the Kepler satellite.
- Definitely deposit:
  - analytical files and software code
  - metadata that identifies exactly which Kepler data were used for your analysis
- Maybe deposit:
  - Kepler mission data, as these are managed by NASA
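One possible way to produce metadata that identifies exactly which externally managed files were used in an analysis is a manifest listing each file with its size and a checksum. The sketch below assumes the files you used sit in a local directory; the function name `build_manifest` and the manifest fields are illustrative choices, not a format required by any repository or by NASA.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(data_dir: str) -> List[Dict[str, object]]:
    """List every file under data_dir with its relative path, size, and SHA-256."""
    entries = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            entries.append({
                # Relative path keeps the manifest portable across machines.
                "file": path.relative_to(data_dir).as_posix(),
                "bytes": path.stat().st_size,
                # The checksum lets others confirm they have the identical file.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return entries
```

The resulting list can be written out with `json.dump` and deposited alongside your analytical files and code. Very large files would be better hashed in chunks than read into memory in one call.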
Example 2: To be announced
- Definitely deposit:
- Not necessary to deposit:
Example 3: To be announced
- Definitely deposit:
- Maybe deposit:
SOCIAL SCIENCES RESEARCH
Example 4: How can developing research documentation during the research process strengthen researchers' work and, at the same time, strengthen data sharing between researchers, preservation of research data, citation, and comparison of data over time?
- Definitely deposit:
- Maybe deposit:
Example 5: How much do different languages, and teachers' skill in these languages, influence the efficiency of teaching?
- Definitely deposit:
- Do not deposit:
Example 6: To be announced
- Definitely deposit:
- Do not deposit:
HUMANITIES PROJECTS
Example 7: Research and publication regarding Søren Kierkegaard's writings
- Definitely deposit:
- Maybe deposit:
Example 8: Get an overview of the radio data that are accessed through LARM.fm and establish a solution for long-term preservation and use of these data
- Definitely deposit:
- Do not deposit:
Example 9: The development through time of .dk domains from 2005 to 2015
- Definitely deposit:
- Do not deposit:
ABOUT IDENTIFYING INFORMATION, especially for Biomedical or Social Science research
- Direct identifiers should be removed or masked prior to depositing. These include names, addresses (including zip codes), phone numbers, social security numbers, driver's license numbers, certification numbers, etc.
- Indirect identifiers, such as occupation, dates of significant events, job history, educational institutions, rare diseases, place of medical treatment or doctor giving care, and other types of information that could be used in conjunction with other information to identify individuals may need to be recoded in order to minimize the risk of disclosure.
- Respondent identifiers used for the study should also be removed before depositing data.
- If it is not possible to remove or recode all identifying information without significantly impacting the usability of the data, you may not be able to deposit or share the data.
- You should also be aware of and conform to any policies or procedures set forth by your local IRB (Institutional Review Board).
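The points above can be illustrated with a small sketch that drops direct identifiers and respondent identifiers and recodes one indirect identifier. Everything specific here is an assumption made for illustration: the column names, the choice to reduce a date to its year, and the function names are hypothetical. Running code like this does not by itself make a data set safe to share; disclosure risk still has to be assessed, and your IRB's requirements still apply.

```python
from typing import Callable, Dict, List

# Hypothetical column names for direct identifiers and study respondent IDs.
DIRECT_IDENTIFIERS = {
    "name", "address", "zip_code", "phone",
    "ssn", "drivers_license", "respondent_id",
}


def generalize_to_year(iso_date: str) -> str:
    """Recode an ISO date (YYYY-MM-DD) to its year only."""
    return iso_date[:4]


# Hypothetical indirect identifiers mapped to the recoding applied to each.
INDIRECT_RECODERS: Dict[str, Callable[[str], str]] = {
    "event_date": generalize_to_year,
}


def deidentify(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the records with identifiers removed or recoded."""
    cleaned = []
    for record in records:
        row = {}
        for key, value in record.items():
            if key in DIRECT_IDENTIFIERS:
                continue  # remove direct identifiers entirely
            recoder = INDIRECT_RECODERS.get(key)
            row[key] = recoder(value) if recoder else value
        cleaned.append(row)
    return cleaned
```

The input records are left unchanged, so the original file can be kept under restricted access while only the cleaned copy is considered for deposit.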