Citing data and data impact

"Data citation is the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to other scholarly resources." - ANDS(1)


   Main points

Publishing data sets is becoming increasingly important as a citable contribution to a researcher's curriculum vitae. DataCite actively supports this.

Being able to cite research data is important for:

  • Findability.
  • Giving credit (impact).
    • Giving credit means that you thoroughly document the relation between the data and the researcher who produced it.
    • Researchers can link their research data to their ORCID iD(2) (an ORCID iD(3) is a persistent author identifier).
    • Citing research data is part of the altmetrics(4) (alternative metrics) movement. This movement holds that the impact of your research is determined by references to a wide range of research outputs, such as data sets, software, blog posts, presentations and tweets. Software, for example, can be credited by citing code.

FORCE11(5) promotes data citation and has published a joint declaration of data citation principles(6), which has put the significance and the ingredients of data citation on the radar. Initiatives like this challenge the status quo and help to create a culture of data citation.

In the video below the various elements involved with data citation are reviewed.

RDNL video concerning data citation.


The table below, which lists the short-term and long-term advantages of data citation, is derived from 'DataCite Implementation Recommendations'(7), written by the DataCite Task Force.

Short-term advantages                                        | Long-term advantages
------------------------------------------------------------ | ---------------------------------------------------------------------------
Easy to locate data                                          | Creates a publication structure that enables long-term availability of data
Easy to reuse and verify data                                | Data citation makes it easier to discover and locate data
Makes it easier to give credit to the rightful data producer | Possible justification for granting subsidies
Promotes reproducible research                               | The impact of data sets and data producers can be measured

Persistent identifiers

To make data sets citable, they must have a persistent identifier. A persistent identifier is a unique, permanent label linked to a digital object. Because the identifier is resolved to the object's current location, the object can still be found even if its name or location changes. This prevents broken links and 'page not found' errors. With a persistent identifier, an object can always be uniquely identified, found and referred to (cited).
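The idea behind resolution can be illustrated with a minimal Python sketch, assuming a single in-memory lookup table. The `Resolver` class, the identifier and the URLs are hypothetical; real persistent identifier services add governance, redundancy and metadata on top of this basic idea. The point is that citations keep using the identifier, and only the resolver's record of the current location changes when the object moves.

```python
# Minimal sketch of a persistent identifier (PID) resolver.
# All names, identifiers and URLs here are hypothetical illustrations.


class Resolver:
    """Maps a stable identifier to the object's current location."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        # An identifier is assigned once and never reused.
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def update(self, pid: str, new_url: str) -> None:
        # Only the target location changes; the identifier stays the same.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


resolver = Resolver()
pid = "example-pid-0001"  # hypothetical identifier
resolver.register(pid, "https://old-server.example.org/data/set1")

# The data set moves to a new server; citations still use the same PID.
resolver.update(pid, "https://new-archive.example.org/datasets/42")
print(resolver.resolve(pid))  # https://new-archive.example.org/datasets/42
```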

There are several persistent identifier systems, e.g. URN, PURL, ARK and DOI. Depending on the purpose, an object can be assigned more than one persistent identifier.
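Because one object may carry identifiers from several schemes, it can help to tell them apart by their syntax. The Python sketch below uses simplified patterns that are assumptions for illustration only: they do not implement the full syntax rules of each scheme, and the example identifiers are hypothetical.

```python
import re

# Simplified, illustrative patterns for the identifier schemes named above.
# These are heuristics, not the complete syntax rules of each scheme.
PATTERNS = [
    ("DOI", re.compile(r"^(doi:|https?://(dx\.)?doi\.org/)?10\.\d{4,}(\.\d+)*/\S+$", re.I)),
    ("URN", re.compile(r"^urn:[a-z0-9][a-z0-9-]{1,31}:\S+$", re.I)),
    ("ARK", re.compile(r"^(https?://\S+/)?ark:/?\d+/\S+$", re.I)),
    ("PURL", re.compile(r"^https?://purl\.org/\S+$", re.I)),
]


def identifier_scheme(identifier: str) -> str:
    """Return the first scheme whose pattern matches, or 'unknown'."""
    for scheme, pattern in PATTERNS:
        if pattern.match(identifier.strip()):
            return scheme
    return "unknown"


# One hypothetical data set that has been assigned several identifiers.
dataset_pids = [
    "10.5072/example-dataset",
    "urn:nbn:nl:ui:13-example",
    "ark:/12345/x5example",
    "http://purl.org/example/dataset-1",
]
for p in dataset_pids:
    print(identifier_scheme(p), p)
```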


The following recaps what was discussed in the video about data citation.

A DOI (Digital Object Identifier) is especially suitable for making a digital object citable, and is only assigned to objects that are managed for the long term and remain accessible. DOIs are already widely used in the scientific literature to link to journal articles. Assigning a DOI to a data set makes its origin traceable and the data set itself citable.

DOIs are increasingly accepted as the preferred persistent identifier for data citation. This is reflected in the fact that services built on other persistent identifiers now also offer DOIs: the Dataverse Network initially offered only Handles and is now switching to DOIs(8), and DANS offers DOIs in addition to URNs.

DOI Structure

A DOI consists of two parts, separated by a forward slash:

  • a prefix, consisting of '10.' followed by a registrant code of four or more digits;
  • a suffix.

The registrant code in the prefix identifies the organization that registered the DOI. The suffix, after the forward slash, identifies the data set itself.

Example of a DOI: 10.4121/uuid:c1ac7344-1419-4398-ba13-c757551c303f.
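The structure described above can be checked and split with a short Python sketch. The regular expression is a simplification based only on the rules stated here (a prefix of '10.' plus a registrant code of four or more digits, then a slash and a suffix); `parse_doi` and `DOI` are hypothetical names. The `url` method builds a resolvable link through the public DOI resolver at https://doi.org/.

```python
import re
from typing import NamedTuple
from urllib.parse import quote

# Prefix: "10." + registrant code of 4+ digits (optionally with dot-separated
# subdivisions). Suffix: everything after the first forward slash.
DOI_RE = re.compile(r"^(10\.\d{4,}(?:\.\d+)*)/(.+)$")


class DOI(NamedTuple):
    prefix: str
    suffix: str

    def url(self) -> str:
        # Percent-encode unusual suffix characters, keeping '/' and ':' readable.
        return f"https://doi.org/{self.prefix}/{quote(self.suffix, safe='/:')}"


def parse_doi(text: str) -> DOI:
    match = DOI_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid DOI: {text!r}")
    return DOI(prefix=match.group(1), suffix=match.group(2))


doi = parse_doi("10.4121/uuid:c1ac7344-1419-4398-ba13-c757551c303f")
print(doi.prefix)  # 10.4121
print(doi.suffix)  # uuid:c1ac7344-1419-4398-ba13-c757551c303f
print(doi.url())   # https://doi.org/10.4121/uuid:c1ac7344-1419-4398-ba13-c757551c303f
```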


DOIs for data sets are registered through DataCite; in the Netherlands, this is done through DataCite Netherlands(9). Researchers receive a DOI for their data set when they deposit it in a data archive run by one of DataCite's clients. The institution then registers the DOI for the data set it archives. Individual researchers cannot register DOIs themselves; this is part of DataCite's general policy.

When a DOI is registered, a minimal set of metadata must be provided. All mandatory, optional and recommended metadata properties are described in the DataCite Metadata Schema(10). All submitted metadata are stored in the DataCite Metadata Store(11), where they can be searched.
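As an illustration of the minimal metadata set, the following Python sketch checks a record against the five mandatory properties of version 3.0 of the DataCite Metadata Schema, the version cited here. The flat dictionary layout, the field values and the `missing_mandatory` helper are hypothetical; this is not DataCite's actual metadata exchange format.

```python
# Mandatory properties in DataCite Metadata Schema v3.0.
# (Schema v4.0 and later also make ResourceType mandatory.)
MANDATORY = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty in `record`."""
    return [prop for prop in MANDATORY if not record.get(prop)]


# Hypothetical record; the flat dict layout is for illustration only.
record = {
    "Identifier": "10.5072/example-dataset",
    "Creator": ["Doe, Jane"],
    "Title": "Example measurement data",
    "Publisher": "Example University",
    "PublicationYear": "",  # left empty on purpose
}
print(missing_mandatory(record))  # ['PublicationYear']
```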


DataCite offers advice(12) on how to cite a data set in a publication. The recommended citation style is:

Creator (PublicationYear): Title. Publisher. Identifier
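The recommended pattern can be filled in mechanically from the metadata fields of the same names. This Python sketch assumes the values are already available as plain strings; the `DatasetCitation` class and the example values, including the placeholder DOI, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str

    def format_citation(self) -> str:
        # Pattern: Creator (PublicationYear): Title. Publisher. Identifier
        return (f"{self.creator} ({self.publication_year}): "
                f"{self.title}. {self.publisher}. {self.identifier}")


citation = DatasetCitation(
    creator="Doe, J.",
    publication_year=2013,
    title="Example measurement data",
    publisher="Example University",
    identifier="10.5072/example-dataset",
)
print(citation.format_citation())
# Doe, J. (2013): Example measurement data. Example University. 10.5072/example-dataset
```

Keeping the fields separate, rather than storing a finished string, lets the same metadata be rendered in other citation styles later.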

For instance, the citation for this dataset(13) in the 4TU.Centre for Research Data would look like this:

   Sources and additional reading



  1. ANDS. Data citation awareness. Retrieved from
  2. ORCID. (2013, June 17). Connecting research datasets and researchers: ORCID use cases and integrations [Blog post]. Retrieved from
  3. ORCID iD. Retrieved from
  4. Altmetrics: a manifesto. (2010). Retrieved from
  5. FORCE11. Retrieved from
  6. FORCE11. (2013). Data citation principles. Retrieved from
  7. DataCite Task Force. (2013). DataCite Implementation Recommendations. A report of the DataCite Task Force. Retrieved from
  8. The Dataverse Network Project. (2013). DOIs - coming to a dataverse near you. Retrieved from
  9. DataCite Netherlands. Retrieved from
  10. DataCite. DataCite Metadata Schema v3.0. Retrieved from
  11. DataCite. DataCite Metadata Store. Retrieved from
  12. DataCite. Why cite data? Retrieved from
  13. Keen, A.S. (2011). Erosive Bar Migration Using Density and Diameter Scaled Sediment Erosive Profile Set-Prototype Scale (Actual Scal 1:10). TU Delft.

Additional reading