"Not all data can be open. There maybe funding constraints, where use of data is governed by a pre-existing research agreement. The data may be confidential and as such there may be privacy issues which mean the data cannot be open." - LERU Roadmap for research(1)
A licensing agreement is a legal tool that determines beforehand under which circumstances data files may be used. With a licence, a data archive gains permission to make the data available under certain conditions.
Making data available isn't the same as allowing the research data to be “mined” and reused. In those instances, limitations in a licence can be a large stumbling block. Imagine mixing datasets with other datasets, which are then mixed again with yet other datasets, and so on. If you are obliged to credit the creator of every dataset you build on, this becomes a labour-intensive and, in practice, impossible task.
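How attribution obligations pile up when datasets are mixed can be illustrated with a small sketch. The `Dataset` class, its fields, and the lab and team names below are hypothetical and purely illustrative; a real licence attaches more conditions than a single attribution flag.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record: a dataset, its creator, and the datasets it builds on."""
    name: str
    creator: str
    requires_attribution: bool = True  # assumption: False models a CC0-style waiver
    sources: tuple[Dataset, ...] = ()


def required_credits(dataset: Dataset) -> set[str]:
    """Collect every creator who must be credited when reusing `dataset`."""
    credits = {dataset.creator} if dataset.requires_attribution else set()
    # Obligations are inherited from every dataset this one was built on.
    for source in dataset.sources:
        credits |= required_credits(source)
    return credits


# Three original datasets; the third waives attribution.
a = Dataset("A", "Lab A")
b = Dataset("B", "Lab B")
c = Dataset("C", "Lab C", requires_attribution=False)

# Two rounds of mixing.
ab = Dataset("AB", "Team X", sources=(a, b))
abc = Dataset("ABC", "Team Y", sources=(ab, c))

print(sorted(required_credits(abc)))
# → ['Lab A', 'Lab B', 'Team X', 'Team Y']
```

After only two rounds of mixing, four parties already have to be credited. The dataset that waives attribution adds nothing to the list, which is the practical argument for a waiver when data is meant to be recombined many times.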
In the SURF report 'Data in public-private projects' ('Data in publiek-private projecten'(2), in Dutch) and the Knowledge Exchange report 'Legal status of research data'(3), the authors recommend:
- Prevent rights from becoming an obstacle in sharing and reusing research data.
- Use codes of conduct instead of the available standard licences.
- Urge legislators (EU-wide) to prevent obstacles in the sharing and reusing of research data.
One possibility is to grant a so-called Creative Commons licence. CC licences(6) offer a simple, standard way to share content with permission and under certain conditions. The CC0 Public Domain Dedication(7) was specifically created to clear any legal and technological obstacles to reusing research data.
November 25, 2013, saw the launch(8) of version 4.0 of the Creative Commons licences. CC 4.0 also covers database rights.(9) This means that a database holder can use a CC 4.0 licence to permit uses that database rights would otherwise restrict.
For now, data archives will continue to use licences with restrictions, at least as long as the use of data is limited to certain groups. Privacy protection, (third-party) copyright and patent law will also continue to form obstacles.
"CC 4.0 licenses may be used for any material that has copyright, database rights, or certain other rights. We still recommend CC0 Public Domain Dedication as the default for scientific data as it essentially removes the legal requirement for attribution thereby making reuse maximally easy and flexible." - BioMed Central(9)
An in-depth look
An OpenAIRE report(15) offers recommendations on the legal aspects of open access e-infrastructures for research data, such as data archives. One of its conclusions is:
"It is fundamental that the databases used by e-infrastructures such as OpenAIRE be made available under licences such as the upcoming version 4.0 of the Creative Commons licences in their entirety, therefore, not only the data, but the databases in themselves. Only in such latter circumstance, activities such as data‐mining of the entire databases and reproduction of their contents will be in accordance with the employed licences."
Judith Gulpers - Are there also reports on how to deal with 'data issues' in research that uses data from, for example, Bloomberg or WRDS?
An example: research into American CEOs with a salary of 1 dollar per year. The data come from the ExecuComp database, but were then checked and supplemented by hand. What is a researcher then allowed to do with that data?
Marjo Bakker - I know that researchers will provide attribution through their codes of conduct, and I also understand that certain licences stand in the way of certain kinds of reuse, but CC0 still seems to conflict with data citation and credit for the researcher. Suppose a researcher points this out to you, then you say:....?