NetLab: Managing active data checklist

 

 

During the project

 
During the research project, data will be collected, and the methods used must be recorded and documented.
Active data
 How are collections and methods organised? (Hint: Organizing files and folders)
 How much storage space will the data consume?
 

Which data formats are used (ARC, WARC, CDX, PDF, CSV, ...)?

It is strongly recommended that material intended for long-term preservation is kept in a durable format, which is
 
Version control
 Which files are version controlled? (Sensitive data must not be placed under version control.)
 Which version control system is used?
 Which version of the version control system is used?
Data validation and authentication
 Current data volume (total size in MB/GB/TB) and likely rate of growth
 Number of files and folders, and how they are organised
 Platform: Mac/Windows/Linux
 Applications used to access and work with your data
 Frequency of update, e.g. working data that changes daily, or data from a project that needs to be retained but will not be used often
 Data type(s): spreadsheets, databases, documents, images, datasets, etc.
 Any special security needs, e.g. personal data, commercial potential
 Access control: Who needs access to which areas? Do they have access to Netarkivet? If not, where are they from and who are they, e.g. journalists, lawyers, journals, etc.
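The volume and file-count questions above can be answered with a short script rather than by hand. The following is a minimal sketch in Python, assuming the active data lives under a single root folder; the function names inventory and human_size are illustrative and not part of any NetLab tooling.

```python
import os
from collections import Counter
from pathlib import Path


def inventory(root):
    """Summarise file count, folder count, total size and file extensions under `root`."""
    total_bytes = 0
    n_files = 0
    n_dirs = 0
    by_ext = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)  # subfolders found at this level
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # do not count linked files
            total_bytes += path.stat().st_size
            n_files += 1
            # Group by lower-cased extension; files without one go under "(none)".
            by_ext[path.suffix.lower() or "(none)"] += 1
    return {
        "files": n_files,
        "folders": n_dirs,
        "bytes": total_bytes,
        "by_extension": dict(by_ext),
    }


def human_size(n_bytes):
    """Format a byte count in binary units (1 KB = 1024 bytes in this sketch)."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

Running inventory on the project folder at regular intervals and keeping the results gives a rough estimate of the rate of growth.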
Backup
 Is there a backup strategy?
 How many copies are there?
 Are they kept in a different location from the main data storage?
 Are they stored securely (for instance, sensitive data)?
 On which devices are they stored?
 How prone is the device to write errors?
 Is there a plan to periodically 'refresh' the data (i.e. copy it to a new disk, USB stick, or portable drive)?
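One way to check whether a backup copy or a refreshed copy still matches the main data storage is to compare checksums. The sketch below assumes both copies are reachable as ordinary folders and uses SHA-256 from the Python standard library; the function names are illustrative, and this is not a prescribed NetLab procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path (relative to `root`) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(original_root, backup_root):
    """Return (missing, changed): files absent from the backup, and files whose content differs."""
    original = manifest(original_root)
    backup = manifest(backup_root)
    missing = sorted(set(original) - set(backup))
    changed = sorted(
        name for name in original.keys() & backup.keys()
        if original[name] != backup[name]
    )
    return missing, changed
```

If both lists come back empty, every file in the main storage has an identical copy in the backup. A non-empty changed list may point to write errors on the backup device, or simply to files edited since the last backup.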

Organizing and documenting data

 
 
You should create and maintain sufficient documentation or metadata (i.e. structured information about the data) to enable research data to be identified, discovered, associated with its owners and creators, linked to other related data or publications, contextualised in time and space, and to have the quality of the data assessed and research results validated.
If you document your data poorly, it will be difficult (or impossible) to find and manage it in the longer term. Even if you (or others, in future) can find the data, its value will be diminished if it is hard to interpret. You should always ensure that protocols are agreed early in the project and adopted consistently by all researchers.

File naming

Digital file names can be important for identifying and finding digital files. You should develop file naming conventions early in a research project, and agree on these with colleagues and collaborators before data is created. (Hint: Organising data: file naming) 
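A naming convention is easier to follow when file names are generated and checked by a small helper. The sketch below assumes a hypothetical convention of the form project_YYYY-MM-DD_description_vNN.ext; the pattern, the project name and the date in the usage note are purely illustrative and should be replaced with whatever the project team agrees on.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<YYYY-MM-DD>_<description>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})"
    r"_(?P<description>[a-z0-9-]+)"
    r"_v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, day, description, version, ext):
    """Build a file name that follows the assumed convention."""
    # Lower-case the description and replace runs of other characters with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{project.lower()}_{day.isoformat()}_{slug}_v{version:02d}.{ext.lower()}"


def is_valid_name(filename):
    """Check whether a file name matches the assumed convention."""
    return NAME_PATTERN.match(filename) is not None
```

For example, build_name("netlab", date(2015, 3, 1), "Crawl log summary", 2, "csv") returns "netlab_2015-03-01_crawl-log-summary_v02.csv", while is_valid_name("Final version (2).csv") returns False. Putting the date in ISO order (year, month, day) means that sorting by name also sorts by date.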
Controlled vocabularies

What vocabulary is used?

A vocabulary sets out the common language a discipline has agreed to use to refer to concepts of interest in that discipline. It models the concepts in a discipline by applying labels to the concepts and relating the concepts to each other in a formal structure. Vocabularies take many forms. They include glossaries, dictionaries, gazetteers, code lists, taxonomies, subject headings, thesauri, semantic networks and ontologies. Wherever possible, you should use an existing controlled vocabulary. Even if you need to adapt or customise an existing standard, this is preferable to creating something from scratch. (Vocabularies and research data)
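In practice, a controlled vocabulary can be enforced when metadata is entered. The sketch below is a minimal Python illustration, assuming a small hypothetical vocabulary for a "data type" field based on the data types listed earlier in this checklist; the synonym table is invented for the example, and a real project would load the terms from an existing published vocabulary instead.

```python
# Hypothetical controlled vocabulary for a "data type" metadata field.
DATA_TYPE_VOCABULARY = {"spreadsheet", "database", "document", "image", "dataset"}

# Illustrative mapping from common variants to the preferred term.
SYNONYMS = {
    "spreadsheets": "spreadsheet",
    "databases": "database",
    "documents": "document",
    "images": "image",
    "picture": "image",
    "datasets": "dataset",
}


def normalise_term(raw):
    """Return the preferred term for `raw`, or raise ValueError if it is not in the vocabulary."""
    term = raw.strip().lower()
    term = SYNONYMS.get(term, term)
    if term not in DATA_TYPE_VOCABULARY:
        raise ValueError(f"{raw!r} is not in the controlled vocabulary")
    return term
```

With this in place, normalise_term(" Images ") returns "image", and an unknown value such as "misc" raises a ValueError instead of entering the metadata unnoticed.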