Can we afford to lose certain research data? That is the key question when selecting research data for long-term archiving.
As for the data currently being collected by the Large Hadron Collider particle accelerator, there is no doubt (1): "We cannot afford to lose it," says Cristinel Diaconu, chair of the international study group Data Preservation and Long Term Analysis in High Energy Physics (DPHEP).
A report by the European Union (2) distinguishes between classes of data: for some classes, long-term storage is more important than for others. Data that is eligible for a data archive includes:
- Data with potential for reuse, i.e. data that is (or appears to be) important to a wider community;
- Data that supports or enhances an open access publication;
- Data that must be archived because the funder requires it;
- Data produced by processes that are difficult to repeat.
The flow chart below (adapted from an illustration in the DMP template of Wageningen University) (3) shows when you should consider archiving research data for the long term. Before working through the chart, check whether all of the following pre-conditions have been met:
- Can the data format and the software format still be used?
- Is the data documentation (metadata) of sufficient quality to make clear what the data is about?
- Are there any legal obstacles (e.g. contracts, privacy) that prevent the data from being shared?
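The two checklists above can be read together as a simple decision rule. The Python sketch below is illustrative only and is not the Wageningen flow chart itself: it assumes that all three pre-conditions must hold and that meeting any one of the EU report's criteria is enough to make a data set a candidate for long-term archiving. The `DataSet` class and the function names are hypothetical, and the sketch does not replace the cost-benefit analysis.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    # Pre-conditions (hypothetical field names)
    formats_usable: bool             # can the data and software formats still be used?
    metadata_sufficient: bool        # is the documentation good enough to understand the data?
    legal_obstacles: bool            # do contracts or privacy rules prevent sharing?
    # Eligibility criteria from the EU report
    reuse_potential: bool
    supports_oa_publication: bool
    funder_requires_archiving: bool
    hard_to_reproduce: bool


def preconditions_met(ds: DataSet) -> bool:
    """All three pre-conditions must hold before consulting the flow chart."""
    return ds.formats_usable and ds.metadata_sufficient and not ds.legal_obstacles


def archiving_reasons(ds: DataSet) -> list[str]:
    """Return the eligibility criteria that this data set satisfies."""
    criteria = {
        "potential for reuse": ds.reuse_potential,
        "supports an open access publication": ds.supports_oa_publication,
        "funder requires archiving": ds.funder_requires_archiving,
        "difficult to reproduce": ds.hard_to_reproduce,
    }
    return [name for name, applies in criteria.items() if applies]


def is_archiving_candidate(ds: DataSet) -> bool:
    # Assumption: a single matching criterion is enough to make the data a candidate.
    return preconditions_met(ds) and bool(archiving_reasons(ds))


if __name__ == "__main__":
    example = DataSet(
        formats_usable=True,
        metadata_sufficient=True,
        legal_obstacles=False,
        reuse_potential=False,
        supports_oa_publication=False,
        funder_requires_archiving=True,
        hard_to_reproduce=False,
    )
    print(is_archiving_candidate(example))  # True
    print(archiving_reasons(example))       # ['funder requires archiving']
```

Listing the matching criteria, rather than returning only a yes/no answer, makes it easier to record in a data management plan why a data set was selected.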
Whether data should be archived for the long term always depends on a cost-benefit analysis: how do the costs of archiving the data and keeping it available compare with the costs of reproducing it? So far, data archives are not really able to calculate this, but projects are currently under way that are investigating the question.
Once it has been decided that a data set will be included in a data archive, the next step is to determine how long it needs to be kept. The preservation period depends on the discipline, developments in the field, the costs of storage and of keeping the data accessible, and the expected (re)use. Data sets that are considered national heritage, such as the results of archaeological research, are generally archived indefinitely.
If no preservation period has been specified, the data should be re-evaluated after a set period to decide whether it needs to be archived permanently. The report 'Selection of Research Data' (4) considers 10 years an appropriate period after which to reconsider whether research data still needs to be preserved or should be destroyed.