Data documentation and metadata

"Scientific metadata provide the information necessary for investigators separated by time, space, institution or disciplinary norm to establish common ground." - Christine Borgman et al. (1)



   Main points

Data documentation describes the characteristics of a dataset. It takes place at various levels, such as:

  • A description of the process a researcher uses to collect data. This is recorded in, for instance, a codebook, lab journal, log or diary.
  • A description of the data itself (how much data there is, what data format it has, and what software is needed to read it).
  • A description of how the dataset changes over time. This is used to create a historical record of all uses and edits of the research data over a period of time; in data jargon, this is called data provenance. To create such a record, a description of the data collection process and of the data itself is also essential.
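To make the idea of provenance concrete, here is a minimal Python sketch of an append-only provenance log, assuming a dataset small enough to hash in memory. Each entry records when a step happened, who performed it, what was done, and SHA-256 checksums of the data before and after the step, so that the chain of edits can be checked later. The function and field names (`record_step`, `verify_chain`) and the example data are hypothetical and do not follow any particular provenance standard.

```python
import hashlib
from datetime import datetime, timezone


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a dataset's bytes."""
    return hashlib.sha256(data).hexdigest()


def record_step(log: list, before: bytes, after: bytes, actor: str, action: str) -> None:
    """Append one provenance entry describing a change from `before` to `after`."""
    log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "input_sha256": checksum(before),
        "output_sha256": checksum(after),
    })


def verify_chain(log: list, final: bytes) -> bool:
    """Check that each step starts from the previous step's output
    and that the last recorded output matches the current dataset."""
    for prev, curr in zip(log, log[1:]):
        if prev["output_sha256"] != curr["input_sha256"]:
            return False
    return bool(log) and log[-1]["output_sha256"] == checksum(final)


# Hypothetical example data: three successive states of a small CSV file.
raw = b"id,temp\n1,20.1\n2,-999\n"
cleaned = b"id,temp\n1,20.1\n2,\n"        # missing-value code -999 removed
converted = b"id,temp_K\n1,293.25\n2,\n"  # Celsius converted to Kelvin

log = []
record_step(log, raw, cleaned, "researcher A", "replace -999 with empty value")
record_step(log, cleaned, converted, "researcher A", "convert temp from C to K")
print(verify_chain(log, converted))  # True
```

Because every entry links an input checksum to an output checksum, a missing or undocumented edit shows up as a break in the chain.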

Proper data documentation ensures that research data are traceable and can be unambiguously understood and used by current and future users (including the researcher).

Because datasets are so diverse, the best way to document them is not always obvious.

It is useful to know that metadata can sometimes be derived from the data itself. Certain file formats, such as those used for digital photos, embed metadata in the file. When a photo is stored, details about the circumstances in which it was taken, such as aperture and lighting, are saved automatically.
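Digital photos usually carry this information as EXIF tags, which third-party libraries such as Pillow can read. To stay within the Python standard library, the sketch below uses WAV audio instead, a format that likewise stores technical metadata (number of channels, sample width, sample rate, length) in its file header. It builds a short silent WAV file in memory and then reads those values back without any separate documentation. The helper names `make_silent_wav` and `describe_wav` are hypothetical.

```python
import io
import wave


def make_silent_wav(seconds: float = 1.0, rate: int = 8000) -> bytes:
    """Create a mono, 16-bit WAV file of silence entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 2 bytes = 16-bit samples
        w.setframerate(rate)    # samples per second
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def describe_wav(data: bytes) -> dict:
    """Read the technical metadata stored in a WAV file's header."""
    with wave.open(io.BytesIO(data), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_s": frames / rate,
        }


meta = describe_wav(make_silent_wav(seconds=0.5))
print(meta)
# {'channels': 1, 'sample_width_bytes': 2, 'sample_rate_hz': 8000, 'frames': 4000, 'duration_s': 0.5}
```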

The function of data documentation depends on the phase of the research lifecycle. Data archives in particular should document their data according to an (international) standard so that they can connect with other archives. This is discussed under Metadata for data archives.

"We don't know when data is metadata or just data. Metadata is data that is used to describe other data, so the usage turns it into metadata." - Bargmeyer and Gillman (2)

Metadata types 

Metadata are often called data about data, or information about information. Some metadata describe the content (descriptive metadata); others help to interpret the context (contextual metadata, such as date of creation, instruments used, etc.).

Without contextual metadata, some data would appear to be no more than a random collection of numbers, images or words. Without descriptive metadata, it would be impossible to find relevant data in a data archive (see also Metadata for data archives).

The most common types of metadata, with their goal and examples, are:

  • Descriptive metadata. Goal: the minimal metadata required to find a digital object; if contextual metadata are also available, users get a better idea of how to use the data. Examples: author, title, abstract, date. Contextual metadata include, for example, location, time and data collection method (tools).
  • Structural metadata. Goal: linking the individual objects that together form a unit. Examples: links to related digital objects (e.g. the article written on the basis of the linked research data).
  • Technical metadata. Goal: information on the technical aspects of the dataset. Examples: data format, hardware/software used, calibration, version, authentication, encryption, metadata standard.
  • Administrative metadata. Goal: managing user rights and the digital objects themselves. Examples: license, possible reasons for an embargo, waivers; search logs, user tracking.
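One way to keep these types apart in practice is to group them explicitly in a metadata record. The Python sketch below does this with a dataclass and serializes the result to JSON; it also checks whether the minimal descriptive fields needed to find the dataset are present. The field names and example values are illustrative assumptions, not a metadata standard such as Dublin Core or DataCite, and contextual metadata get their own field here for clarity.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetadataRecord:
    # Descriptive: minimal information needed to find the dataset
    descriptive: dict
    # Contextual: information that helps users interpret and use the data
    contextual: dict = field(default_factory=dict)
    # Structural: links to related digital objects
    structural: dict = field(default_factory=dict)
    # Technical: format, software, version, etc.
    technical: dict = field(default_factory=dict)
    # Administrative: rights and management
    administrative: dict = field(default_factory=dict)

    def missing_descriptive(self, required=("author", "title", "abstract", "date")) -> list:
        """Return the required descriptive fields that are absent or empty."""
        return [key for key in required if not self.descriptive.get(key)]


# Hypothetical example record.
record = MetadataRecord(
    descriptive={"author": "J. Doe", "title": "Soil moisture measurements", "date": "2024-05-01"},
    contextual={"location": "field site A", "method": "TDR probe"},
    structural={"related_article": "(identifier of the article based on this dataset)"},
    technical={"format": "text/csv", "software": "R 4.3", "version": "1.0"},
    administrative={"license": "CC BY 4.0", "embargo": None},
)
print(record.missing_descriptive())  # ['abstract']
print(json.dumps(asdict(record), indent=2))
```

The check flags the missing abstract before the record is deposited, which is exactly the kind of gap that makes data hard to find later.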



Data documentation can take place at various levels. The two cases below illustrate this: one about documenting the data collection process and one about version management.

In addition, metadata tools (3) are used in a growing number of fields of study. These tools help integrate the addition of metadata into the research workflow.
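As an illustration of fitting metadata capture into the workflow, here is a minimal Python sketch that a processing script could call each time it saves a data file. It writes a "sidecar" JSON file next to the data with technical metadata: file size, SHA-256 checksum, guessed format and a timestamp. The sidecar naming convention (`<name>.metadata.json`), the function name and the field names are assumptions for this example, not those of any particular metadata tool; the format guess from Python's `mimetypes` module depends on the file extension and the system configuration.

```python
import hashlib
import json
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_file: Path) -> Path:
    """Write technical metadata for `data_file` to `<name>.metadata.json` next to it."""
    content = data_file.read_bytes()
    mime, _ = mimetypes.guess_type(data_file.name)  # guessed from the extension only
    metadata = {
        "file_name": data_file.name,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
        "format": mime or "unknown",
        "metadata_created": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Usage with a hypothetical data file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "measurements.csv"
    data_file.write_bytes(b"id,value\n1,3.2\n")
    sidecar = write_sidecar(data_file)
    meta = json.loads(sidecar.read_text(encoding="utf-8"))

print(meta["file_name"], meta["size_bytes"])  # measurements.csv 15
```

Generating the sidecar from code means the technical metadata are produced every time, rather than depending on someone remembering to fill them in.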

Open notebook science

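An example of recording the data collection process is open notebook science: the practice of making a researcher's lab notebook openly available online as the research proceeds.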

Version management

Eva Lantsoght, a researcher at Delft University of Technology, describes how researchers copy tables from one sheet to another while analysing their research data. When they later want to write an article, they cannot help but wonder: what did I edit, and why? In her blog she describes a solution (6):

 "Start by adding an extra 'version management' tab to a new spreadsheet. In this sheet, carefully write down a version name (name of the file, typically) in the first column, in the second column the date, and in a third column an explanation of all changes you made to the sheet. Carefully fill out this sheet every single time you move something around, or tinker with the sheet."
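The same practice can be automated when data are processed with scripts rather than by hand. Below is a minimal Python sketch, assuming the change log is kept as a separate CSV file with the three columns from the quote (version name, date, explanation of changes) instead of an extra spreadsheet tab. The function names, file names and log entries are hypothetical.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional

HEADER = ["version", "date", "changes"]


def log_version(log_file: Path, version: str, changes: str, when: Optional[date] = None) -> None:
    """Append one row (version name, date, explanation of changes) to a CSV change log."""
    is_new = not log_file.exists()
    with log_file.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(HEADER)  # write the column names only once
        writer.writerow([version, (when or date.today()).isoformat(), changes])


def read_log(log_file: Path) -> list:
    """Return the logged versions as a list of dicts keyed by column name."""
    with log_file.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "analysis_versions.csv"
    log_version(log, "analysis_v1.xlsx", "Imported raw measurements", date(2024, 3, 1))
    log_version(log, "analysis_v2.xlsx", "Moved summary table to new sheet; removed outlier in row 12", date(2024, 3, 4))
    rows = read_log(log)

for row in rows:
    print(row["version"], row["date"], row["changes"])
```

A plain CSV log can be opened both in spreadsheet software and by scripts; as with the manual approach, it is only as useful as the explanations entered in it.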

   Sources and additional reading

  1. Edwards, P. N., et al. (2011). Science Friction: Data, Metadata, Collaboration. Social Studies of Science, 41(5), 667-690. doi:10.1177/0306312711413314
  2. Bargmeyer, B. E., & Gillman, D. W. Metadata standards and metadata registries. Retrieved from
  3. Metadata tools. Retrieved from
  4. Stanford University Libraries; University of Southampton. Open Source Malaria. Retrieved from
  5. Bohle, S. (2014, January 1). A four-part series on open notebook science [blog]. Retrieved from
  6. Lantsoght, E. (2013, October 10). Keeping your spreadsheets under control [blog]. Retrieved from

Additional reading