Reproducibility is like brushing your teeth. It is good for you, but it takes time and effort. Once you learn it, it becomes a habit. | Irakli Loladze, 2016
When planning a research project, it is important to consider how integrity, reproducibility and FAIRness can be incorporated into the research design. In this section, we give a number of tips.
The research design in a nutshell
Even before a researcher starts collecting research data, he or she will start working on the research design. The research design provides answers to questions such as:
What is the research question?
What are the research questions and hypotheses and where do they follow from? What is already known in existing scientific literature?
Which research method is suitable?
Are you going to collect new research data or are you going to use existing datasets? If you are going to collect new data, what research methods will you use to do so?
The table below shows a number of simple examples of common research methods, combined with the instrument with which the research is carried out and the data documentation that can be used in this phase. A codebook is a kind of legend for the data files (Lavrakas, 2008): it tells you which variables are in the data files and what the codes used mean. In a diary or lab journal, researchers document the data collection process.
|Research method||Instrument||Data documentation|
|Case study||Combination of interviews and observations||Diary|
|Experimental research in a laboratory||Measurement or observation||(Electronic) lab journal|
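To make the idea of a codebook as a legend more concrete, here is a minimal sketch in Python. The variable names, labels, codes, and the small data file are all invented for illustration and do not come from any real study; the point is only that the codebook lets a reader translate coded values back into their meaning.

```python
import csv
import io

# Hypothetical codebook: variable names, labels and codes are invented for illustration.
CODEBOOK = {
    "age_group": {
        "label": "Age group of respondent",
        "codes": {"1": "18-34", "2": "35-64", "3": "65 or older"},
    },
    "satisfied": {
        "label": "Satisfied with the service",
        "codes": {"0": "no", "1": "yes", "9": "no answer"},
    },
}

# Invented raw data, as it might be stored in a CSV data file.
RAW_CSV = "age_group,satisfied\n1,1\n3,9\n"


def decode_rows(raw_csv, codebook):
    """Replace each coded value with its meaning according to the codebook."""
    decoded = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        decoded.append(
            {var: codebook[var]["codes"][value] for var, value in row.items()}
        )
    return decoded


if __name__ == "__main__":
    for row in decode_rows(RAW_CSV, CODEBOOK):
        print(row)
```

Without the codebook, a value such as `9` in the `satisfied` column would be impossible to interpret. Keeping the codebook next to the data files is what makes them understandable to others.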
How do you prepare your research for reproducibility and reuse?
What is the best way to organise, store and share the expected research data and code in order to achieve reproducibility goals and enable reuse? The answer to this question depends on the research discipline. Drawing on Barba (2012), Goodman (2014), Chen (2019), Stodden (2017), and the Science Code Manifesto (Barnes, n.d.), we have put together a number of tips that may be useful in the design phase:
- Define the reproducibility objectives of the research project
- Imagine what a 'reproducibility declaration' in an article could look like
More and more scientific articles contain a 'data availability statement'. This is a good step, but it does not tell another researcher how he or she could reproduce the research. A reproducibility declaration does. Thinking beforehand about what such a reproducibility declaration could look like tells you a lot about how to design and document the research in order to make reproducibility possible.
- Consider what it takes to make a workflow recipe available
How is a researcher going to make sure others understand what he or she has done? What is the best way to structure and document the research process and the research data? Good data documentation ensures that research data can be found and clearly understood and used by current and future users (including the researcher him- or herself). The publication of a workflow recipe - the description of the collection and processing/analysis of research data or software code - provides essential context for the interpretation and reuse of the data. The name for 'the workflow recipe' varies from one scientific discipline to another. Sometimes researchers include a 'methods' or 'analysis' section in their scientific output. In computer and information sciences, 'workflow' is a common term. The information recorded in the workflow shows how the data was created.
- Embrace openness where possible
If possible, opt for open source software with large user communities and store data in open data formats. Share workflow and research output as openly as possible. Publish research data, software, workflows, and anything else needed to repeat the experiment in reliable repositories and encourage data and software citation. Use a license that is as open as possible.
Simple compliance with openness is not sufficient to foster reuse and reproducibility in particle physics. Sharing data is not enough; it is also essential to capture the structured information about the research data analysis workflows and processes to ensure the usability and longevity of results | Chen, 2019
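As an illustration of the tips on workflow recipes and open formats, the sketch below records the steps of an analysis together with checksums of the data files and writes the result as JSON, an open format. It is a minimal example under stated assumptions, not a standard: the function names (`build_recipe`, `sha256_of`), the field names, and the example steps and file are all hypothetical.

```python
import hashlib
import json
from datetime import date


def sha256_of(data: bytes) -> str:
    """Return a checksum so that others can verify they use the same file."""
    return hashlib.sha256(data).hexdigest()


def build_recipe(title, steps, data_files):
    """Assemble a workflow recipe as a plain dictionary (hypothetical layout)."""
    return {
        "title": title,
        "recorded_on": date.today().isoformat(),
        # Number the steps so the order of processing is explicit.
        "steps": [
            {"order": i, "description": text}
            for i, text in enumerate(steps, start=1)
        ],
        # Map each file name to the checksum of its content.
        "data_files": {
            name: sha256_of(content) for name, content in data_files.items()
        },
    }


if __name__ == "__main__":
    recipe = build_recipe(
        title="Example survey analysis",  # invented project
        steps=[
            "Collect responses with the questionnaire",
            "Remove incomplete responses",
            "Compute summary statistics per age group",
        ],
        data_files={"responses.csv": b"age_group,satisfied\n1,1\n"},
    )
    # JSON is an open, text-based format that is readable without special software.
    print(json.dumps(recipe, indent=2))
```

In practice, disciplines often have their own conventions and tools for describing workflows; the sketch only shows the kind of structured information that helps others follow what was done.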
In 'A manifesto for reproducible science' (Munafò et al., 2017), the authors set out a series of measures that directly address specific threats to reproducible science. These include attention to and training on the robustness of the research methods used, the promotion of team science, data transparency and open science, and rewarding open and reproducible practices through measures such as the funding of replication studies.
A replication study is a study that attempts to repeat an earlier study using similar methods and under similar circumstances. Since 2016, the Dutch research funder NWO has been encouraging replication studies (NWO, n.d.). Not all replication studies succeed. This does not necessarily mean, however, that the original research was incorrect; that conclusion would be too short-sighted (Vrieze, 2019). By funding such studies, NWO demonstrates that it values the integrity of scientific research not only in words but also in deeds. KNAW also believes that replication research is an important tool for improving scientific knowledge and the functioning of scientific disciplines. Replication research should be applied more often and more systematically than is currently the case (KNAW, 2018).
In Essentials 4 Data Support we zoom in on two concrete tools for planning for scientific integrity and reproducible research (Kavanagh, 2019):
- A data management plan
A data management plan (DMP), sometimes also called an output management plan (OMP) (F1000, n.d.), is an established tool for integrity by design and for preparing research data for a future that is FAIR.
- Preregistration
With preregistration, the research question, the research design, the methodology and the intended method of data analysis are recorded, and their quality is assessed via peer review, prior to the research. The intention to preregister can also be part of a DMP.
These two tools are discussed in detail in the next two paragraphs. Integrity by design also includes privacy by design. We will go into this in chapter V.
Scientific knowledge can only grow if researchers can trust the results of earlier studies. Being able to reproduce results is important, not only because it aids scientific progress, but also because non-reproducible results waste resources, can harm individuals and society, and may erode public trust in science | KNAW, 2018
Barba, L.A. (2012). Reproducibility PI Manifesto. figshare. [Presentation]. https://doi.org/10.6084/m9.figshare.104539.v1
Barnes, N. (n.d.). Science Code Manifesto. http://sciencecodemanifesto.org/
Chen, X. et al. (2019). Open is not enough. Nature Physics 15, 113-119. https://doi.org/10.1038/s41567-018-0342-2
F1000 (n.d.). Your go-to guide to making your data Findable, Accessible, Interoperable, and Reusable (FAIR). https://f1000.com/resources/FAIR_Open_Guide.pdf
Goodman, A., Pepe, A., Blocker, A.W., Borgman, C.L., Cranmer, K., Crosas, M., et al. (2014). Ten Simple Rules for the Care and Feeding of Scientific Data. PLoS Comput Biol 10(4): e1003542. https://doi.org/10.1371/journal.pcbi.1003542
Kavanagh, C. M., & Kapitány, R. (2019, June 17). Promoting the Benefits and clarifying misconceptions about Preregistration, Preprints, and Open Science for CSR. https://doi.org/10.31234/osf.io/e9zs8
KNAW (2018). Replication studies. Improving reproducibility in the empirical sciences. https://www.knaw.nl/shared/resources/actueel/publicaties/pdf/20180115-replication-studies-web
Lavrakas, P.J. (2008). Encyclopedia of Survey Research Methods. Codebook [encyclopedia entry]. http://methods.sagepub.com/reference/encyclopedia-of-survey-research-methods/n69.xml
Loladze, I. (2016) [Quote] taken from: Baker, M. (2016, 26 May). 1500 scientists lift the lid on reproducibility. Survey sheds light on the ‘crisis’ rocking research. Nature 533, 452–454. https://doi.org/10.1038/533452a
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie Du Sert, N., ... Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 0021. https://doi.org/10.1038/s41562-016-0021
National Academies of Sciences, Engineering, and Medicine. (2018). Open Science by Design: Realizing a Vision for 21st Century Research. Washington, DC: The National Academies Press. https://doi.org/10.17226/25116
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., … DeHaven, A. C. (2015). Transparency and Openness Promotion (TOP) Guidelines. Science. Vol. 348, Issue 6242, pp. 1422-1425. https://doi.org/10.1126/science.aab2374
NWO (n.d.). Programme Replication Studies. https://www.nwo.nl/en/research-and-results/programmes/replication+studies
Stodden, V. (2017, July 20). Enhancing Reproducibility for Computational Methods. Towards an Open Science Committee. [Presentation]. http://sites.nationalacademies.org/cs/groups/pgasite/documents/webpage/pga_180684.pdf
Vrieze, J. de (2019, 1 March). Analyse herhaalstudies. Herhaalstudies zijn hip, maar wat leren we ervan? Dit onderzoek naar pupillen biedt inzicht. [News article, in Dutch]. https://www.volkskrant.nl/wetenschap/herhaalstudies-zijn-hip-maar-wat-leren-we-ervan-dit-onderzoek-naar-pupillen-biedt-inzicht~b56d74d3