DTU data management plan

Description:
This is a template for research data management plans at DTU.
A data management plan (DMP) describes the data that are being collected in a research project and how they are used. It should cover the whole data life cycle.
The DMP helps to ensure that data are handled securely and efficiently and that external requirements, e.g. from funding agencies or publishers, are fulfilled. It also helps to identify potential problems before the start of the research project.
Proper data management allows for reproducibility of scientific results, as required by the Code of Conduct for Research Integrity.
While making a DMP, keep in mind that:

It is not a fixed document, but should be updated whenever necessary.
Additional resources might need to be assigned for data management.
Responsibilities for data management need to be defined.

The template is divided into 5 sections with one or two general questions each. Additional questions, suggested answers, further information and useful links are given for inspiration in the corresponding guidelines. Not all questions are equally relevant for all areas of research.
For more information about the DMPonline tool, example DMPs and general information about data management, visit the website of the Digital Curation Centre.
This template has been developed by the Office for Bibliometrics and Data Management at DTU.
Contact: Falco Hüser (falh@dtu.dk)

Data Collection

Research data can be many different things, from a table of numbers or a document describing a physical sample to a video interview or a complete database. Methods, algorithms and software can also be considered data, depending on the type of research they are used for. In many cases, a distinction is made between raw data, temporary data and results, each of which needs to be handled differently. Additionally, access to and use of data can be restricted for various reasons.

Describe the data that will be collected.

Questions to consider:

What type of data will be collected?

e.g. observational data, experimental data, simulation data, data products.

How will the data be collected?

e.g. laboratory equipment, surveys, software.

Which file formats are the data in?

e.g. open or proprietary formats.

What are the estimated amounts of data?

e.g. in GB or TB.

How will the data be structured?

e.g. as data sets, naming conventions, ID numbers.

How will the data be versioned?

e.g. version control.
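
As an illustration of the questions on structure and versioning, the following is a minimal sketch of a file naming convention that encodes a project name, a sample ID, a date and a version number. The pattern and the function names (build_filename, parse_filename) are assumptions made for this example, not a DTU requirement; adapt them to the conventions of your own field.

```python
import re
from datetime import date

# Hypothetical convention: <project>_S<4-digit sample number>_<YYYYMMDD>_v<NN>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<sample>S\d{4})_(?P<date>\d{8})"
    r"_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_filename(project: str, sample_number: int, day: date,
                   version: int, ext: str) -> str:
    """Compose a file name that follows the convention above."""
    return (
        f"{project.lower()}_S{sample_number:04d}_{day:%Y%m%d}"
        f"_v{version:02d}.{ext.lower()}"
    )


def parse_filename(name: str) -> dict:
    """Split a file name into its parts, or raise ValueError if it does not conform."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    name = build_filename("Demo", 12, date(2024, 3, 5), 1, "CSV")
    print(name)                  # demo_S0012_20240305_v01.csv
    print(parse_filename(name))  # parts as a dictionary of strings
```

A convention that can be parsed by a script makes it possible to check whole folders for non-conforming names. A version number in the file name does not replace a version control system, but it keeps successive versions of a data file distinguishable.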

Describe any restrictions to the data.

Questions to consider:

Are there any limitations on the use of existing data?

e.g. commercial databases or software, licenses.

Are there any ethical or legal issues to be considered?

e.g. sensitive data, personal data (see regulations from the Danish Data Protection Agency), Intellectual Property Rights, liability to patents.
Contact the legal advisors at DTU for additional help:
Susanne Schultz (sus@dtu.dk) and Ane Sandager (anesa@dtu.dk),
or the DTU Office for Bibliometrics and Data Management.

Are there other external requirements?

e.g. demands from funders.

Data Storage

Depending on the type, amount and intended use of data, special infrastructures might be required for storing and sharing data. Compliance with institutional and departmental guidelines needs to be accounted for.

Describe the IT infrastructure to be used.

Questions to consider:

Where are the raw data and results stored?

e.g. M- and O-drives, department server, lab notebooks.

How are the data backed up?

e.g. AIT services, department IT.

How is access control managed?

e.g. user groups.

How are data shared within the project?

e.g. common file server.

How is security for sensitive data guaranteed?

e.g. anonymization of personal data, restricted access.
Ask the local IT support or the DTU Office for Bibliometrics and Data Management for help with security standards in compliance with ISO 27001.
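
As an illustration of the question on sensitive data, the following is a minimal sketch of replacing direct identifiers with keyed hashes before data are shared within a project. Note that this is pseudonymization rather than full anonymization: whoever holds the key can re-create the mapping, and pseudonymized data may still count as personal data in legal terms. The function names, the field name and the key handling are assumptions made for this example; the key itself would have to be stored separately under restricted access.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Shortened for readability; keep the full digest if collisions are a concern.
    return digest.hexdigest()[:16]


def pseudonymize_records(records: list, id_field: str, secret_key: bytes) -> list:
    """Return copies of the records with the identifier field replaced."""
    return [
        {**record, id_field: pseudonymize(record[id_field], secret_key)}
        for record in records
    ]


if __name__ == "__main__":
    # Hypothetical example data and key; never hard-code a real key.
    key = b"replace-with-a-secret-key"
    rows = [{"participant": "Jane Doe", "score": 42}]
    print(pseudonymize_records(rows, "participant", key))
```

The same identifier always maps to the same pseudonym under a given key, so records belonging to one person can still be linked across files without exposing the name.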

Documentation

Documentation of data is often considered time-consuming and costly. However, documentation also adds value to the data and makes them usable in a broader sense. Research integrity demands that scientific claims are traceable and thus that the underlying data are well documented.

Describe the metadata to be associated with the data.

Questions to consider:

Are there metadata standards?

e.g. disciplinary standards.

What metadata will be included?

e.g. title, timestamp, location, sample ID, creator, version, parameters.

How will the metadata be generated?

e.g. instrument or software logfiles.
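
As an illustration of how metadata could be generated automatically, the following is a minimal sketch that writes a machine-readable JSON file next to each data file. The field names follow the examples listed above (title, timestamp, sample ID, creator, version, parameters); the function name, the ".meta.json" suffix and the flat layout are assumptions made for this example and do not correspond to a particular metadata standard.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_metadata_sidecar(data_file: Path, title: str, creator: str,
                           sample_id: str, version: str,
                           parameters: dict) -> Path:
    """Write a JSON metadata file next to the data file and return its path."""
    metadata = {
        "title": title,
        "creator": creator,
        "sample_id": sample_id,
        "version": version,
        "parameters": parameters,
        # Time at which the metadata record was created, in UTC.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "file_name": data_file.name,
        "file_size_bytes": data_file.stat().st_size,
    }
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Hypothetical data file and values, for demonstration only.
    example = Path("example_measurement.csv")
    example.write_text("time_s,voltage_V\n0,1.2\n", encoding="utf-8")
    print(write_metadata_sidecar(
        example, "Example measurement", "A. Researcher",
        "S0012", "1", {"sampling_rate_hz": 100},
    ))
```

If a disciplinary metadata standard exists, its field names and structure should take precedence over an ad hoc layout like this one.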

Describe the types of documentation that will accompany the data.

Questions to consider:

How will data be documented?

e.g. electronic lab notebooks, accompanying ReadMe files, publications.

How will the data be understandable for secondary users?

e.g. definitions of variables, vocabularies, units of measurement, any assumptions made.

How will reproducibility of results be ensured?

e.g. description of the methodology, analytical and procedural information.

Data Sharing

Research data are valuable and of high interest to others in the scientific community. When research is funded by public money, its methods should be transparent and its outcomes should be made available to everyone. Sharing data enables reuse and stimulates new research projects. Published data can, in the same way as regular articles, be acknowledged and cited, and thereby increase the visibility of the scientists' work.

Describe which data will be shared.

Questions to consider:

Which data will be shared?

e.g. describe the value of the data for possible reuse.

Which tools/software are needed to view/visualize/analyze the data?

e.g. freely available, open-source.

Which data cannot be shared?

e.g. due to sensitivity, classification, commercial interests.

Who will have access to the data?

e.g. restricted access, use of licenses.

Describe how the data will be shared for possible reuse.

Questions to consider:

When will data be shared?

e.g. along with a scientific publication, embargo periods.

Where will data be shared?

e.g. in a public repository (see the Registry of Research Repositories for examples), data journal, Supplementary Material.

How will the data be made discoverable?

e.g. Digital Object Identifiers, machine-readable metadata.

Long-term Preservation

There is a high risk of data getting lost once a project finishes or the researcher who collected the data leaves the institution. This would mean a big waste of time, money and knowledge. Choosing which data should be preserved for a longer time, and making sure that they remain readable and understandable, is a major challenge but also a rewarding investment.

Describe how data will be archived beyond the scope of the research project.

Questions to consider:

Which criteria will be used to select the data that should be archived for preservation and long-term access?

e.g. value for the scientific community or the public.

Where will data be archived?

e.g. repository, National Archive.

How will readability of the data be guaranteed?

e.g. long-lived file formats.

Which data have to be destroyed?

e.g. for contractual, legal or regulatory reasons.

Who will be responsible for long-term preservation?

e.g. IT services.

How long should the data be preserved?

e.g. 5 years, 10 years, permanently.

How will long-term preservation be financed?

e.g. repository fees.

DTU_template

DTU data management plan

Data Collection

Describe the data that will be collected.

Describe any restrictions to the data.

Data Storage

Describe the IT infrastructure to be used.

Documentation

Describe the metadata to be associated with the data.

Describe the types of documentation that will accompany the data.

Data Sharing

Describe which data will be shared.

Describe how the data will be shared for possible reuse.

Long-term Preservation

Describe how data will be archived beyond the scope of the research project.