Preparation/Input from cases

For each case: input on 'Select data for archiving' and 'Repository/Evaluation points'.
Concerns the preservation of TEI files. The TEI files are self-describing (metadata in the TEI header), so there is not much fuss about them as preservation objects (I think). Preservation is expected to take place in KB's repository/Bitmagasinet.
Hum_SB_LARM
The repository is expected to be SB's “Research Data Repository”.
Hum_SB_Netlab

Procedures for creating indexes, i.e. the algorithms, must be preserved and shared.

Corpus metadata in CSV format must be preserved and shared.

The description of the index file and the description of the procedure for how the data are displayed must be preserved and shared.
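The indexing procedure and the corpus metadata CSV mentioned above are the shareable parts, even where the resulting index is not. Below is a purely illustrative sketch in Python, assuming hypothetical CSV column names ('document_id', 'keywords'), since the actual corpus metadata layout is not described in these notes:

    import csv
    from collections import defaultdict

    # Hypothetical column names; the actual corpus metadata CSV layout is not
    # specified in the case notes.
    METADATA_FILE = "corpus_metadata.csv"

    def load_corpus_metadata(path):
        """Read the corpus metadata CSV into a list of dictionaries."""
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    def build_index(records, key_field="document_id", term_field="keywords"):
        """Build a simple inverted index: term -> list of document ids.

        The procedure (this function) is what gets preserved and shared;
        the resulting index may contain personal data and is deleted.
        """
        index = defaultdict(list)
        for record in records:
            for term in record.get(term_field, "").split(";"):
                term = term.strip().lower()
                if term:
                    index[term].append(record[key_field])
        return dict(index)

    if __name__ == "__main__":
        records = load_corpus_metadata(METADATA_FILE)
        index = build_index(records)
        print(f"Indexed {len(index)} distinct terms from {len(records)} records")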

Indexes over the corpus may contain personally identifiable information and can therefore not be shared. Indexes must instead be deleted.

How long should the data be preserved? 10 years.

The repository is expected to be SB's “Research Data Repository”.
NatKB_Kepler
The data produced and corrected by the KASC members and workgroups should remain accessible for research until at least 2065. To ensure that the data remain accessible for reuse for that entire period, independently of the continued availability of the KASOC website and database, datasets of suitable structure, content and documentation will be extracted from KASOC and deposited in a long-term preservation repository.
Research data/data products that should be considered for preservation include
  • Kepler original data
  • Concatenated time-series
  • Power spectra
  • Ground-based observations
Documentation for individual files includes
  • Data Release Notes
  • Data file dependency information (dependencies on other Kepler data products)
  • Processing information/algorithms
  • Metadata for the specific observation in the FITS file header (see the sketch after these lists)
General information includes
  • Target selection criteria for the entire operation
  • Calibration data from the basic processing of spacecraft data
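The observation-level metadata referred to in the documentation list above sit in the FITS file headers. Here is a minimal sketch for inspecting them, assuming the astropy library and a locally downloaded Kepler FITS product; the file name is a placeholder, and keyword sets vary between data products:

    from astropy.io import fits

    # Placeholder file name; any Kepler FITS product downloaded from KASOC works.
    FITS_FILE = "kplr_example_llc.fits"

    with fits.open(FITS_FILE) as hdul:
        # The primary header carries observation-level metadata (target,
        # mission and processing keywords); exact keywords differ between
        # data products, so the sketch simply dumps them all.
        for card in hdul[0].header.cards:
            print(f"{card.keyword:8s} = {card.value!r}  / {card.comment}")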
The purpose of the preservation exercise is to provide for reuse of the Kepler data currently stored in KASOC. It is envisioned that the data will be valuable for researchers working on research questions quite different from the ongoing KASC research. Preservation datasets should present themselves to a future researcher as a self-contained package, offering all the information necessary to assess the relevance of the data and to utilise them for research. Therefore, the preservation datasets must contain all the necessary metadata and documentation, either physically as part of the package or through links to persistent identifiers of externally hosted items (such as journal articles).
Furthermore, as all Kepler data are open data, the preservation datasets must support discovery based on the criteria considered most relevant to future researchers. This means that metadata holding information likely to be used as search criteria must be secured and structured in a way that enables exposure for harvesting and indexing.
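As an illustration of what exposure for harvesting can mean in practice, here is a minimal sketch assuming the chosen repository exposes a standard OAI-PMH endpoint serving Dublin Core records; the endpoint URL is a placeholder, and the case does not prescribe any particular protocol:

    import requests
    import xml.etree.ElementTree as ET

    # Placeholder endpoint; the actual repository and protocol are not yet chosen.
    OAI_ENDPOINT = "https://repository.example.org/oai"
    DC_NS = "http://purl.org/dc/elements/1.1/"

    def list_record_titles(endpoint):
        """Harvest one page of Dublin Core records and return their titles.

        Pagination via resumptionToken is omitted from this sketch.
        """
        response = requests.get(
            endpoint,
            params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
            timeout=30,
        )
        response.raise_for_status()
        root = ET.fromstring(response.content)
        return [el.text for el in root.iter(f"{{{DC_NS}}}title")]

    if __name__ == "__main__":
        for title in list_record_titles(OAI_ENDPOINT):
            print(title)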
There may be financial limitations on how much material can be long-term preserved. The criteria for selection are, in prioritized order:
  1. Data necessary for enabling reuse, as described above
  2. Data which are not already preserved elsewhere and accessible through persistent identifiers
  3. Data which may exist elsewhere, but whose inclusion will improve discoverability and help future researchers' interaction with and reuse of the data
The envisioned preservation period is 50 years, which exceeds the sustainability models of most data repositories. It is therefore necessary to look for repositories with extraordinarily solid financial and institutional support.
The time aspect also needs to be considered when evaluating which data are already being preserved elsewhere in other data or journal repositories: what is the sustainability model for these repositories or archives, and for how long is the preservation of the data guaranteed?
 
Sam/Sund_SDUB_DDA_Surveydata i eksperiment og tidsserie
  • Select data for archiving: Which 'selection criteria' do you use to select data for long-term preservation in this case?
    • Definitely deposit
      - Original datasets, survey/questionnaire, study description, codebook/variable description, summary statistics, publications, other relevant documentation/metadata (the 2009 survey as an example: http://dda.dk/catalogue/26798?lang=da)
    • Maybe deposit
      - Unoriginal datasets (from official statistics or registries – data that can be obtained from other sources), secondary publications, working documents
    • Do not deposit
      - Duplicate data (either the latest version or the raw data, e.g. only survey and background variables, will be preserved)
  • RDM Repository Evaluation: Which repository will you use for long-term preservation in this case?
    • If a repository has been chosen: which repositories have you evaluated, and how?
      - The National Archives (Rigsarkivet).
    • If a repository has not yet been decided: which evaluation points are most important for your case? Do you have any suggestions for repositories that could be relevant? Would you like your case to be able to use a new SB Research Data Repository, which will hopefully be established during early summer?
Sam/Sund_SDUB_DDA_Sundheds- og sygelighedsundersøgelsen  
HumRUB_(CALPIU’s storehouse)
CALPIU will use all recordings for long-term preservation. The program CLAN has been used for the transcription files and media files.
 

No repository has been chosen yet, but the data are currently stored in something called Storehouse.

For our case, security and access control in particular are important elements, since the data involve sensitive personal information. As soon as a repository has been found, CALPIU / RUC would like to test it with a dataset of around 500 GB.

Tek_DTUB_DTU Space 
Data must be stored on DTU servers.
Zenodo is recommended for data that are to be published (though strictly speaking this is not actual archiving).
Research disciplines that have their own data archive infrastructures should use those.
Tek_DTUB_DTU Vind [tbc] 
Data must be stored on DTU servers.
Zenodo is recommended for data that are to be published (though strictly speaking this is not actual archiving).
Research disciplines that have their own data archive infrastructures should use those.
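As an illustration of the Zenodo route recommended in both DTU cases, here is a minimal sketch of creating a deposition and attaching a file via Zenodo's public REST API; the access token, file name, and metadata values are placeholders, and this is not a prescribed workflow:

    import requests

    # Placeholders: a personal access token and a local data file.
    # Use https://sandbox.zenodo.org/api for testing.
    ZENODO_API = "https://zenodo.org/api"
    ACCESS_TOKEN = "REPLACE_WITH_TOKEN"
    DATA_FILE = "results.zip"

    params = {"access_token": ACCESS_TOKEN}

    # 1. Create an empty deposition.
    resp = requests.post(f"{ZENODO_API}/deposit/depositions", params=params, json={})
    resp.raise_for_status()
    deposition = resp.json()

    # 2. Upload the data file to the deposition's file bucket.
    with open(DATA_FILE, "rb") as fp:
        requests.put(
            f"{deposition['links']['bucket']}/{DATA_FILE}", params=params, data=fp
        ).raise_for_status()

    # 3. Attach minimal descriptive metadata (placeholder values).
    metadata = {
        "metadata": {
            "title": "Example dataset",
            "upload_type": "dataset",
            "description": "Placeholder description of the dataset.",
            "creators": [{"name": "Doe, Jane", "affiliation": "DTU"}],
        }
    }
    requests.put(
        f"{ZENODO_API}/deposit/depositions/{deposition['id']}",
        params=params,
        json=metadata,
    ).raise_for_status()

    # Publishing (POST .../actions/publish) is deliberately left out of this sketch.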