Research data

"Research data are facts, observations or experiences on which an argument or theory is based" (cited in ANDS, 2017)

Essentials 4 Data Support is an introductory course for data supporters: those who support, or want to support, researchers in storing, managing, archiving and sharing their research data. But what do we actually mean by research data? In this section you will find different definitions and ways of looking at research data.

Definitions

What a researcher understands by 'research data' depends on the significance of these data in the research process, and that varies from one scientific discipline to another. Research data exist in many formats, which can be read with just as many different types of software. In the slideshow below you can see a number of definitions of research data.

Five ways 

There are roughly five ways of looking at research data (University of Southampton, 2016; CESSDA, 2017): 

1. The way in which data are collected or obtained

Data can be collected or obtained in various ways, for example through experiments, simulations or observations, by deriving them from existing data, or through the study of sources.

2. The forms that data take

Research data are often defined by the form in which they are recorded. Examples include text documents, spreadsheets, electronic lab journals, field notebooks and diaries, questionnaires, transcriptions and code books, audio and video tapes, photographs and films, artefacts, slides, database schemes, models, algorithms and scripts, workflows, protocols, metadata and other data files such as reports from literature research and e-mail archives.

3. The formats in which data are stored

A third way of thinking about data is the data format in which different data types (textual, numerical, multimedia, structured, software code, etc.) are stored. For example, statistical data can be stored as SPSS (*.sav) or Stata files, films as *.mpg or *.avi, structured data as *.xml or in a relational MySQL database, and text files as *.docx, *.pdf or *.rtf.
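As a rough illustration of thinking in formats, the following Python sketch groups files by their extension. The extensions are the ones named above; the category labels, the function names `categorise` and `summarise`, and the sample file names are assumptions made for this example only.

```python
from pathlib import Path

# Illustrative mapping only: the extensions come from the examples in the
# text, the category labels are assumptions made for this sketch.
FORMAT_CATEGORIES = {
    ".sav": "statistical",
    ".mpg": "multimedia",
    ".avi": "multimedia",
    ".xml": "structured",
    ".docx": "textual",
    ".pdf": "textual",
    ".rtf": "textual",
}


def categorise(filename: str) -> str:
    """Return the data type category for a file name, or 'unknown'."""
    suffix = Path(filename).suffix.lower()
    return FORMAT_CATEGORIES.get(suffix, "unknown")


def summarise(filenames):
    """Count how many files fall into each category."""
    counts = {}
    for name in filenames:
        category = categorise(name)
        counts[category] = counts.get(category, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical file names, used only to demonstrate the functions.
    print(summarise(["survey.sav", "interview.avi", "report.pdf", "notes.txt"]))
```

An inventory like this can give a data supporter a first impression of which formats a project holds, and which files use formats that fall outside the known list.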

4. The size of the data files

The size of the data files is important, as is their complexity. Managing a relatively small and simple dataset poses different challenges than managing large, complex databases.

5. The phase in the research lifecycle

Each stage in the life cycle of research data poses its own challenges for research data management and for those who support it.

Practice


This optional exercise is taken from RDM Rose (2015), activity sheet 5.2.2. You can do it if you want to get a better feel for the concept of research data.

Case studies

Pages 6-22 of Introducing Research Data (University of Southampton, 2016) present five case studies in the field of research data:

  • medical research
  • materials science
  • aerodynamics
  • chemistry
  • archaeology

Look at one case study in detail and then answer the following two questions:

  • Do you recognize the five ways of looking at research data? How?
  • Identify a number of possible issues that researchers may have in storing, managing, archiving and sharing their research data.

Below is the answer worked out by a former student of Essentials 4 Data Support, who looked at case study 3 (aerodynamics).

5 ways of looking at the data:

  • Collection: numerical model simulations
  • Types: models, algorithms and scripts; software configuration; post-processing files such as figures
  • Electronic storage: textual, software code, software-specific (mesh), multimedia (figures)
  • Size and complexity: large output files (hundreds of gigabytes) with corresponding additional files such as the input/configuration files and post-processing results (figures and aggregated results)
  • Life cycle: this type of numerical modelling is typically done in the research phase, where various wing shapes are “tested” with the model and their performance is compared. A subset of the simulations, with typical results that underpin the conclusions, is usually described in the publication; publishing that subset is therefore the minimum requirement.

Possible issues:

  • Storage: With data volumes of 300GB per 1 sec of simulated flow, the total data volume easily exceeds the capacity of a regular laptop’s hard drive. Network or cloud storage that also has a good connection to the high-performance computing (HPC) facility being used is recommended.
  • Manage: To keep track of a variety of simulations, some of which differ only slightly in model input or configuration, it is important to plan before starting. A clear directory structure and a sufficient description of each modification, and the reason for it, are crucial for handling the results well. For reproducibility it is important to record the software version used, all the more so if it varies between simulations. I recommend using a version control system for the model input/configuration and the pre- and post-processing scripts.
  • Archive: For archiving, the data volume and the associated costs may again play a role. It might therefore be wise to archive the simulation results only for the simulations that are discussed in the publication and support its conclusions, and to archive just the input/configuration and software version (all the information needed for reproduction) for the remaining simulations.
  • Share: When sharing model results, it is crucial that others are able to interpret and reproduce them. The remarks under “Manage” therefore apply here as well. Basically, proper data management during the research phase makes you ready for sharing at any time.
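The version-tracking advice under “Manage” could be supported by a small record stored next to each simulation's output. The Python sketch below shows one possible way to do this; it is not the student's or the case study's actual workflow. The field names, the function names `build_manifest` and `manifest_to_json`, and all sample values are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_manifest(run_id, software_version, config_text, description):
    """Build a small provenance record for one simulation run.

    The checksum of the configuration text makes it possible to check
    later whether two runs really used the same input.
    """
    return {
        "run_id": run_id,
        "software_version": software_version,
        "config_sha256": hashlib.sha256(config_text.encode("utf-8")).hexdigest(),
        "description": description,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def manifest_to_json(manifest):
    """Serialise the record as readable JSON with a stable key order."""
    return json.dumps(manifest, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical values, used only to demonstrate the functions.
    record = build_manifest(
        run_id="run-001",
        software_version="1.2.0",
        config_text="mesh=fine\n",
        description="Baseline wing shape; mesh refined compared to the pilot run",
    )
    print(manifest_to_json(record))
```

Saving such a record in each run's directory, and keeping the configuration files themselves under version control, documents which input and software version produced which output. That is the information the “Archive” and “Share” remarks rely on.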

 


Sources


ANDS (2017). ANDS Guides and Resources. What is research data. https://www.ands.org.au/guides/what-is-research-data (PDF https://www.ands.org.au/__data/assets/pdf_file/0006/731823/Whatis-research-data.pdf)

CESSDA (2017). Data Management Expert Guide. Research Data. https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/1.-Plan/Research-data 

OECD (2007). Principles and Guidelines for Access to Research Data from Public Funding, OECD Publishing, Paris. http://www.oecd.org/sti/inno/38500813.pdf

Queensland University of Technology (2013). Management of Research data. http://www.mopp.qut.edu.au/D/D_02_08.jsp

RDM Rose (2015). RDM Rose Learning Materials. http://rdmrose.group.shef.ac.uk/?page_id=10#session-51-researchers-and-their-data

Utrecht University (2016). University policy framework for research data Utrecht University. https://www.uu.nl/sites/default/files/university_policy_framework_for_research_data_utrecht_university_-_january_2016.pdf

University of Southampton (2016). Introducing Research Data. 4th Edition. https://eprints.soton.ac.uk/403440/1/introducing_research_data.pdf

Van Berchum, M. & Grootveld, M. (2017). Research data management. An overview of recent developments in the Netherlands. http://hdl.handle.net/20.500.11755/a9539a60-ecef-4e62-a998-0fda190b303b