NetLab: Managing active data checklist
During the project | ||
---|---|---|
During the research project, data will be collected, and the methods used have to be documented. | ||
Active data | How are collections and methods organised? (Hint: Organizing files and folders) | |
How much storage space will the data consume? | ||
Which data formats are used (ARC, WARC, CDX, PDF, CSV, ...)? It is strongly recommended that material intended for long-term preservation is kept in a durable format, which is
| ||
Version Control | Which files are under version control? (Sensitive data must not be placed under version control.) | |
Which version control system is used? | ||
Which version of the version control system is used? | ||
Data validation and authentication | Current data volume - total size in MB/GB/TB - and likely rate of growth | |
Number of files and folders, and how they are organised | ||
Platform: Mac/Windows/Linux | ||
Applications used to access and work with your data | ||
Frequency of update, e.g. working data that changes daily, or data from a project that needs to be retained but will not be used often | ||
Data type(s): spreadsheets, database, documents, images, datasets, etc. | ||
Any special security needs, e.g. personal data, commercial potential | ||
Access control: Who needs access to which areas? Do they have access to Netarkivet? If not, where are they from and who are they, e.g. journalists, lawyers, journals, etc.? | ||
Backup | Is there a backup strategy? | |
How many copies are there? | ||
Are they stored in a different location from the main data storage? | ||
Are they stored securely (particularly important for sensitive data)? | ||
On which devices are they placed? | ||
How prone is the device to write errors? | ||
Is there a plan to periodically 'refresh' the data (i.e. copy it to a new disk, USB stick, or portable drive)? | ||
Organizing and documenting data | ||
You should create and maintain sufficient documentation or metadata (i.e. structured information about the data) to enable research data to be identified, discovered, associated with its owners and creators, linked to other related data or publications, contextualised in time and space, and to have the quality of the data assessed and research results validated. If your data is poorly documented, it will be difficult (or impossible) to find and manage it in the longer term. Even if you (or others, in future) can find the data, its value will be diminished if it is hard to interpret. You should always ensure that protocols are agreed early in the project and adopted consistently by all researchers. | ||
File naming | Digital file names can be important for identifying and finding digital files. You should develop file naming conventions early in a research project, and agree on these with colleagues and collaborators before data is created. (Hint: Organising data: file naming) | |
Controlled vocabularies | What vocabulary is used? A vocabulary sets out the common language a discipline has agreed to use to refer to concepts of interest in that discipline. It models the concepts in a discipline by applying labels to the concepts and relating the concepts to each other in a formal structure. Vocabularies take many forms. They include glossaries, dictionaries, gazetteers, code lists, taxonomies, subject headings, thesauri, semantic networks and ontologies. Wherever possible, you should use an existing controlled vocabulary. Even if you need to adapt or customise an existing standard, this is preferable to creating something from scratch. (Vocabularies and research data) |
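
The file naming and controlled vocabulary points above can be checked automatically once a convention has been agreed. The following is a minimal sketch in Python, not a NetLab-prescribed tool. The naming pattern `<project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>`, the vocabulary in `DATA_TYPES`, and the function name `check_file_name` are all assumptions made for illustration; replace them with whatever your project agrees on.

```python
import re
from datetime import datetime

# Hypothetical controlled vocabulary for the "data type" part of a file name.
DATA_TYPES = {"crawl", "index", "report", "table"}

# Hypothetical convention: <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<datatype>[a-z]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def check_file_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name is acceptable."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return ["name does not match <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>"]

    problems = []
    # The data type must be a term from the agreed vocabulary.
    if match["datatype"] not in DATA_TYPES:
        problems.append(f"unknown data type: {match['datatype']}")
    # The date must be a real calendar date, not just eight digits.
    try:
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        problems.append(f"invalid date: {match['date']}")
    return problems


if __name__ == "__main__":
    for name in ["netlab_crawl_20240131_v01.warc", "Final version (2).csv"]:
        print(name, "->", check_file_name(name) or "ok")
```

Running such a check before files are added to shared storage is one way to keep the convention consistent across researchers. Taking the data type from a fixed set of terms, rather than free text, is a small-scale instance of using a controlled vocabulary.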