The two major use cases and drivers for what to keep are Research Integrity and Reproducibility (availability of the data supporting the findings in research); and the Potential for Reuse (availability of data for sharing with other users). | Beagrie, 2019
Can we afford not to preserve certain research data? That is the question that is central to the selection of research data for long-term archiving and data publication. Which research data do we archive for verification purposes only? And which datasets do we really make findable and reusable by publishing the (meta)data in a data archive? The criteria are discussed in this section.
Reasons for the retention of research data
There may be several reasons for retaining research data:
- The importance of the research data
Potential value for reuse and (inter)national positioning. Quality, originality, size, scale, production costs of the data or, for example, the innovative nature of the research.
- The uniqueness of the data
The data include non-repeatable observations.
- The importance of data for historical research
The data are important for historical research, in particular research into the history of science.
- Other reasons
The research data is important for non-scientific purposes (cultural heritage, museums or presentations).
In addition to these general considerations, research funders such as the Netherlands Organisation for Scientific Research (NWO, n.d.) are increasingly making it compulsory for research data to be retained in order to make reuse possible. The Netherlands Code of Conduct for Research Integrity (VSNU, 2018) also obliges researchers to retain both raw and processed data for a period appropriate to the discipline and methodologies used.
Research data are not selected on the basis of substantive arguments alone. A number of practical considerations and preconditions also feed into the final decision. Consider, for example, the following:
In which formats are the data available? Are the data format and software format usable? For (re)usability, data should preferably be stored in sustainable data formats.
What is the processing phase of the data? Raw/unprocessed, semi-processed or published?
Metadata and data documentation
Is enough metadata and data documentation available? Is the information of sufficient quality to understand what the data are about?
Is there clarity about intellectual property rights, such as copyright or database rights? Are personal data involved? Can the data be archived or published as they are, or are additional measures required?
Is there a sustainable infrastructure available for archiving or publishing the data? Think of a data archive or an institutional or thematic repository.
Are the costs of selecting, archiving, converting, storing and making data available for reuse taken into account? Whether data is archived for the long term remains a consideration of costs and benefits. How do the costs of archiving or publishing relate to the costs of reproducing the research data?
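The questions above can be recorded as a simple checklist so that unmet preconditions are visible at a glance. The following is a minimal sketch only: the class `Preconditions`, its field names and the helper `unmet` are hypothetical illustrations and are not part of any guideline or tool mentioned in this section.

```python
from dataclasses import dataclass, fields


@dataclass
class Preconditions:
    """One yes/no answer per precondition question (hypothetical field names)."""
    sustainable_format: bool        # data stored in a sustainable, usable format
    processing_phase_known: bool    # raw, semi-processed or published is recorded
    documentation_sufficient: bool  # metadata and documentation explain the data
    rights_clear: bool              # IP rights and personal data issues are settled
    infrastructure_available: bool  # a data archive or repository is available
    costs_accounted_for: bool       # archiving costs weighed against the benefits


def unmet(preconditions: Preconditions) -> list[str]:
    """Return the names of the preconditions that still need attention."""
    return [
        f.name for f in fields(preconditions)
        if not getattr(preconditions, f.name)
    ]


# Illustrative answers for an imaginary dataset.
example = Preconditions(
    sustainable_format=True,
    processing_phase_known=True,
    documentation_sufficient=False,
    rights_clear=True,
    infrastructure_available=True,
    costs_accounted_for=False,
)
print(unmet(example))
```

An empty list would suggest that the preconditions are met and the choice between archiving and publishing can be made; any remaining names point to the questions that still need an answer.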
Archive or publish
If the preconditions are met, it is important to decide whether you will:
- Archive the data for verification purposes or to keep open the possibility to use the data again in future research.
- Publish the data for reuse by (future) others in a data archive or institutional repository.
In the flow chart below, the arguments for making an informed choice are visualised in a simplified manner.
Once it has been established that a dataset will be archived or included in a data archive, it is important to determine how long it should be kept. The retention period will depend on the developments in the discipline, the costs of storage and making data accessible, and the expected (re)use potential. Datasets that are regarded as heritage, such as the results of archaeological research, are generally kept indefinitely.
If the retention period has not been determined, a decision on permanent archiving will have to be taken after a certain period of time. The DANS report 'Selection of Research Data' (Tjalsma & Rombouts, 2011) mentions a period of 10 years as the time to reconsider whether research data should still be retained or destroyed.
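As a small illustration of the 10-year reconsideration period, the review date can be derived from the archiving date. This is a sketch under stated assumptions: the function `review_date` and the constant `REVIEW_PERIOD_YEARS` are hypothetical names, and the handling of 29 February is an arbitrary choice, not something the report prescribes.

```python
from datetime import date

# Period mentioned in the report; an institution may choose a different value.
REVIEW_PERIOD_YEARS = 10


def review_date(archived_on: date, years: int = REVIEW_PERIOD_YEARS) -> date:
    """Return the date on which retention of a dataset should be reconsidered."""
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # 29 February does not exist in a non-leap year: fall back to 28 February.
        return archived_on.replace(year=archived_on.year + years, day=28)


print(review_date(date(2020, 3, 15)))  # 2030-03-15
```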
4TU.Centre for Research Data (n.d.). Atmospheric Observation Collection Cabauw. http://data.4tu.nl/repository/collection:cabauw
Beagrie, N. (2019). What to Keep: A Jisc research data study. http://repository.jisc.ac.uk/7262/1/JR0100_WHAT_RESEARCH_DATA_TO_KEEP_FEB2019_v5_WEB.pdf
CERN (n.d.). CERN Open data portal. http://opendata.cern.ch/
DANS (2012). Thematische collectie: Oral History. https://doi.org/10.17026/dans-z3c-f26d
DANS (n.d.). Collectie Tweede Wereldoorlog. https://easy.dans.knaw.nl/ui/?wicket:bookmarkablePage=:nl.knaw.dans.easy.web.search.pages.PublicSearchResultPage&q=collectie+tweede+wereldoorlog
Gibney, E. (2013, November 26). LHC Plans for open data future. Nature News. http://www.nature.com/news/lhc-plans-for-open-data-future-1.14244
NASA (2011). Astronomers find elusive planets in decade old hubble-data. http://www.nasa.gov/mission_pages/hubble/science/elusive-planets.html
NWO (n.d.). Open science. https://www.nwo.nl/en/policies/open+science
Tjalsma, H.; Rombouts, J. (2011). Selection of research data - Guidelines for appraising and selecting research data. http://www.dans.knaw.nl/nl/over/organisatie-beleid/publicaties/DANSselectionofresearchdata.pdf
Utrecht University (n.d.a.). Storing and preserving data. RDM Support. [Guide]. https://www.uu.nl/en/research/research-data-management/guides/storing-and-preserving-data
Utrecht University (n.d.b.). Publishing and sharing data. RDM Support. [Guide]. https://www.uu.nl/en/research/research-data-management/guides/publishing-and-sharing-data
VSNU (2018). The Netherlands Code of Conduct for Research Integrity. https://doi.org/10.17026/dans-2cj-nvwu