Data jargon

A variety of organisations and perspectives on data has led to different definitions. In the course we use the definitions below.

Data archive

A data archive is a facility which moves data to an environment for long-term retention. A data archive is indexed and has search facilities, enabling data to be retrieved.

Data format

The way in which data or information is coded and stored. A data format (or file format) gives information on how to process the data.

Data backup

A copy of the data for the purpose of creating a duplicate dataset.

Data lab

A data lab is a virtual research environment that enables researchers to organize and share their research data and related output during their research project.

Data management plan

A written agreement describing the research project, the type and volume of data produced and stating which data will be saved, how they will be saved (file format, version control, metadata), whether and when data will be submitted to a repository and under which terms. If necessary, it describes the tools (hardware and software) that are required to (re)use the data.

Data provenance

Data provenance is the historical record of the data and its origins. It refers to the process of tracing and recording the origins of data and its movement between databases.

Data repository

A general term for a location to store data. A data repository with a policy for long-term preservation is called a data archive.

Data sharing policy



An institutional policy concerning the sharing of research data. It is often written as a letter of intent declaring that research data will be submitted to dedicated repositories as soon as possible, complying with international data and exchange formats.

Datatweeps

People tweeting about data.


Digital object identifier (DOI)

The digital object identifier is a unique and stable identifier that ensures that a digital object can be permanently found on the World Wide Web, regardless of changes in the URL where the object is found. A central registry ensures that the user of a DOI will be referred to its current location.
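As a minimal sketch of how such an identifier is used: a DOI consists of a prefix and a suffix separated by a slash, and the public resolver at https://doi.org/ redirects a request to the object's current location. The DOI used below is a made-up placeholder, and the helper names `split_doi` and `doi_to_url` are assumptions for illustration only.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its prefix (before the first slash) and its suffix."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build the URL that asks the resolver for the object's current location."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters in the suffix that are not safe in a URL path.
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/')}"


# Hypothetical DOI, used only to show the shape of the identifier.
print(doi_to_url("10.1234/example.5678"))
```

The URL stays the same when the object moves; only the registry entry behind it is updated.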

Data seal of approval

An archive holding a Data Seal of Approval (DSA) complies with requirements ensuring that in the future, research data can still be processed in a high-quality and reliable manner.

Linked data

A term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information and knowledge on the Semantic Web using RDF. Linked data refers to data published on the web in such a way that it is machine-readable, that its meaning is explicitly defined, that it is linked to other external data sets, and that in turn it can be linked to from external data sets.
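The idea can be illustrated with plain subject-predicate-object statements, the shape that RDF uses. The sketch below uses ordinary Python tuples rather than an RDF library; the example.org and example.com URIs are placeholders, and the helper `external_links` is a hypothetical name.

```python
# Each statement is a (subject, predicate, object) triple.
DATASET = "http://example.org/dataset/42"  # hypothetical dataset URI

TRIPLES = [
    (DATASET, "http://purl.org/dc/terms/title", "Rainfall measurements"),
    (DATASET, "http://purl.org/dc/terms/creator", "http://example.org/person/jansen"),
    # A link from our dataset to a description held in another data set.
    (DATASET, "http://www.w3.org/2002/07/owl#sameAs", "http://example.com/data/rain-42"),
]


def external_links(triples, own_prefix):
    """Return object URIs that point outside our own namespace."""
    return [
        obj
        for _subject, _predicate, obj in triples
        if obj.startswith("http") and not obj.startswith(own_prefix)
    ]


print(external_links(TRIPLES, "http://example.org/"))
```

Because subjects, predicates and linked objects are all URIs, another data set can refer to `DATASET` in exactly the same way.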


Metadata

Data about data. Standardised structured information explaining the purpose, origin, time references, geographic location, creator, access conditions and terms of use of a data collection.
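A structured record of this kind could look roughly as follows. This is a sketch only: the field names mirror the elements listed in the definition, the values are invented, and real metadata standards define their own element names.

```python
from dataclasses import asdict, dataclass
import json


@dataclass
class MetadataRecord:
    """One description of a data collection (field names are illustrative)."""
    purpose: str
    origin: str
    time_references: str
    geographic_location: str
    creator: str
    access_conditions: str
    terms_of_use: str

    def missing_fields(self) -> list[str]:
        """List the fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Invented example values.
record = MetadataRecord(
    purpose="Study of regional rainfall",
    origin="Weather station readings",
    time_references="2010-2015",
    geographic_location="Utrecht, the Netherlands",
    creator="J. Jansen",
    access_conditions="Open after registration",
    terms_of_use="Attribution required",
)

print(json.dumps(asdict(record), indent=2))
```

Serialising the record to a standard format such as JSON keeps it machine-readable alongside the data.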

Open data

A piece of data or content is open if anyone is free to use, reuse, and redistribute it, subject only, at most, to the requirement to attribute and/or share-alike.

Persistent identifier

A unique code that is coupled to a digital object. With this code, the object can be identified even when the object is moved to a different location. The DOI and the URN:NBN are examples of persistent identifiers.

Preferred format

A file format that has, to the best of our knowledge at this moment, the best chances of being useable in the (far) future.


Preservation

Two perspectives on preservation are distinguished:

  1. Short-term preservation: Keeping data available in its present shape.
  2. Long-term preservation: Keeping data available in a usable shape for future users.

Keeping data in its present shape means protecting data from incidental loss and making data findable through proper metadata. Long-term preservation adds the task of changing the data format in a reliable way and being accountable for all manipulations in order to keep the data in a shape that is demanded by future software or future working practices of the designated community.
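Being accountable for a manipulation can be sketched as logging every conversion together with checksums of the data before and after. In the example below a simple re-encoding of text stands in for a real file format migration; the function name `migrate_encoding` and the log layout are assumptions, not an established practice of any particular archive.

```python
import hashlib
from datetime import datetime, timezone


def sha256(data: bytes) -> str:
    """Checksum used to show exactly which bytes went in and came out."""
    return hashlib.sha256(data).hexdigest()


def migrate_encoding(data: bytes, source: str, target: str, log: list) -> bytes:
    """Re-encode text and append a record of the manipulation to the log."""
    migrated = data.decode(source).encode(target)
    log.append({
        "action": f"re-encoded {source} -> {target}",
        "when": datetime.now(timezone.utc).isoformat(),
        "sha256_before": sha256(data),
        "sha256_after": sha256(migrated),
    })
    return migrated


provenance_log = []
original = "café".encode("latin-1")
migrated = migrate_encoding(original, "latin-1", "utf-8", provenance_log)

print(provenance_log[0]["action"])
```

The bytes change while the content stays the same, and the log entry lets a future user verify which version they hold.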

RDF

RDF (Resource Description Framework) is a standard model for data interchange on the Web.

Research data

Data are facts, observations or experiences on which an argument or theory is based.


Resolver

A system that brings about the link between a persistent identifier and the location where the object is currently situated.
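The mechanism can be sketched as a lookup table that is updated whenever an object moves, so that the identifier itself never has to change. The class below is a toy illustration; the identifier `pid:0001` and the URLs are invented, and real resolvers are network services backed by a central registry.

```python
class Resolver:
    """Toy registry mapping persistent identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier: str, url: str) -> None:
        """Record where a newly identified object lives."""
        self._locations[identifier] = url

    def move(self, identifier: str, new_url: str) -> None:
        """Update the location; the identifier itself stays unchanged."""
        if identifier not in self._locations:
            raise KeyError(f"unknown identifier: {identifier}")
        self._locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        """Return the location where the object is currently situated."""
        return self._locations[identifier]


resolver = Resolver()
resolver.register("pid:0001", "http://example.org/old/report.pdf")
resolver.move("pid:0001", "http://example.org/archive/report.pdf")
print(resolver.resolve("pid:0001"))
```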

Text and data mining

The computer-based process of deriving or organising information from text or data. It works by copying large quantities of material, extracting the data, and recombining it to identify patterns, trends and hypotheses or by providing the means to organise the information mined.

Trusted digital repository (TDR)

A certified digital repository that has been set up to provide reliable, sustainable access to the data deposited. TDRs may be certified at three levels:

  1. A Basic Certification granted by the DSA (data seal of approval)
  2. An Extended Certification after the repository performs a self-audit in accordance with ISO 16363 (or DIN 31644)
  3. A Formal Certification granted on top of an Extended Certification after an additional external audit and certification according to ISO 16363 or DIN 31644



URN:NBN

A unique and stable identifier (persistent identifier) that ensures that a digital object can be permanently found on the World Wide Web, regardless of changes in the URL where the object is found. A central registry ensures that the user of a URN will be referred to its current location. This persistent identifier is based on the Uniform Resource Name (URN), the National Bibliographic Number (NBN), a country code and a unique string.