Data selection

   Main points

Can we afford to lose certain research data? That is the key question when selecting research data for long-term archiving. 
For the data currently being collected at the Large Hadron Collider (a particle accelerator), there is no doubt(1). "We cannot afford to lose it," says Cristinel Diaconu, chair of the international Data Preservation in Long Term Analysis in High Energy Physics (DPHEP) study group.

A report by the European Union(2) mentions data classes. Long-term storage is more important for some data classes than for others. The following data is eligible for a data archive:

  • Data with potential for reuse (data that is, or seems to be, important for a larger community);
  • Data that improves an open access publication;
  • Data that must be archived because the funder requires it;
  • Data that is produced via processes that are difficult to repeat.

The flow chart below (amended from an illustration in the DMP template of Wageningen University)(3) illustrates when you should consider archiving research data for the long term. Before going through the chart, it is important to assess whether all the pre-conditions have been met:

  • Can the data format and the associated software format still be used?
  • Is the quality of the data documentation (metadata) sufficient to understand what the data is about?
  • Is the data free of legal objections to sharing (contracts, privacy)?
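
The eligibility criteria and pre-conditions above can be read as a simple checklist. The following is a minimal sketch in Python of that reading, not a reproduction of the Wageningen flow chart. The class name, field names and function names are hypothetical, and the assumption that a single eligibility criterion is enough is an interpretation of the list above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a data set under appraisal."""

    # Eligibility criteria: assumed here that meeting any one is enough.
    reuse_potential: bool = False
    improves_open_access_publication: bool = False
    funder_requires_archiving: bool = False
    difficult_to_repeat: bool = False
    # Pre-conditions: all must be satisfied before archiving is considered.
    formats_usable: bool = False
    documentation_sufficient: bool = False
    legal_objections: bool = True  # conservative default until checked


def unmet_preconditions(ds: Dataset) -> list[str]:
    """Return a description of every pre-condition that is not met."""
    problems = []
    if not ds.formats_usable:
        problems.append("data or software format cannot be used")
    if not ds.documentation_sufficient:
        problems.append("documentation (metadata) is insufficient")
    if ds.legal_objections:
        problems.append("legal objections to sharing")
    return problems


def is_candidate_for_archiving(ds: Dataset) -> bool:
    """True if at least one criterion holds and no pre-condition fails."""
    eligible = any(
        [
            ds.reuse_potential,
            ds.improves_open_access_publication,
            ds.funder_requires_archiving,
            ds.difficult_to_repeat,
        ]
    )
    return eligible and not unmet_preconditions(ds)
```

A data set that passes this check is only a candidate: the final decision still depends on the cost-benefit considerations that a checklist like this cannot capture.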

Whether data should be archived for the long term is always subject to a cost-benefit analysis. How do the costs of archiving the data and keeping it available relate to the costs of reproducing it? To date, data archives are not really able to calculate this, but projects are currently under way to look into the subject.

Once it has been determined that a data set will be included in a data archive, it is important to determine how long it needs to be kept. The preservation period depends on the discipline and its developments, the costs of storage and accessibility, and the expected (re)use. Data sets that are considered national heritage, e.g. the results of archaeological research, are generally archived indefinitely.


If the preservation period has not been stipulated, it is important to decide after a certain period whether or not the data needs to be archived permanently. The report 'Selection of Research Data'(4) states that 10 years is an appropriate period after which to reconsider whether research data should still be preserved or should be destroyed.
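
As an illustration of the 10-year reappraisal rule, an archive could record a review date when a data set is deposited. The sketch below assumes the period is counted from the deposit date, which the text above does not specify; the function and constant names are hypothetical.

```python
from datetime import date

# Reappraisal period mentioned in 'Selection of Research Data'(4).
REVIEW_PERIOD_YEARS = 10


def reappraisal_date(deposited_on: date, years: int = REVIEW_PERIOD_YEARS) -> date:
    """Return the date on which preservation should be reconsidered."""
    try:
        return deposited_on.replace(year=deposited_on.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year,
        # so fall back to 28 February of that year.
        return deposited_on.replace(year=deposited_on.year + years, day=28)
```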


Case 1

The Cabauw Radar Data(5) in 4TU.Centre for Research Data is a clear example of data that meets the selection criteria. These data sets contain information on the influence of airborne particles on the formation of clouds. Such measurements can only be taken once, and they could provide valuable information about climate change. Both the processed data and the raw data are stored. The argument for also keeping the raw data is that it might contain information that we are not yet able to retrieve.

Case 2

A good example of new insights from old data comes from NASA. When old data from the Hubble telescope was re-analysed(6), two new planets were found. This discovery was possible because today's analysis technology is more advanced than what was available previously.

Case 3

Projects involving interviews can also be considered research that is difficult to repeat. Recording personal experiences of, for example, WW2 is now often a matter of 'now or never' because of the advanced age of the people interviewed. DANS holds a vast amount of interview data in its collections Oral History and WW2 (in Dutch), which is a valuable source of information for historical research, now and in the future. Behind the scenes, these interviews are archived in a large format and are considered 'raw data'; they are made available via EASY in MP4 format.

Case 4

The Cultural Changes study, a biennial survey of the Social Cultural Planning Office (SCP), was originally based on a replication in 1975 of 200 survey questions from some fifteen studies that were stored at what was then the Steinmetz archive. The data from all these studies, including those from the Cultural Changes research, are available via EASY.

   Sources and additional reading


  1. Gibney, E. (2013, November 26). LHC Plans for open data future. Nature News. Retrieved from
  2. Expert group on scientific data, European commission. (2010). Riding the wave. Retrieved from
  3. Wageningen Universiteit. Data Management Plans. Retrieved from
  4. Tjalsma, H.; Rombouts, J. (2011). Selection of research data - Guidelines for appraising and selecting research data. Retrieved from
  5. 4TU.Centre for Research Data. Atmospheric Observation Collection Cabauw. Retrieved from
  6. NASA. (2011). Astronomers find elusive planets in decade-old Hubble data. Retrieved from

Additional reading