Researchers are very eager to safely store their data. | Renate Mattiszik
Where and how do researchers best store their research data during their research project? How can they best deal with backups and version management? How can they exchange research data with others? How can they protect research data against accidental loss and against unauthorised manipulation? In this section, we give a general overview of the possibilities.
The challenges of data storage
The infographic The evolution of data storage (GoCanvas, 2014) provides good insight into the transience of storage media, the carriers of information. Perhaps a researcher once thought they were doing a good job by backing up research data on a USB stick, but how long will such storage media still exist? Will you be able to retrieve the data stored on such a stick later on? Not all laptops still have a USB port, for example. And if a researcher does succeed in retrieving the data from such a stick, can it still be read by the software in use at that moment? And how do you prevent data from being lost altogether? There are plenty of data horror stories that clearly illustrate the risk of data loss (Pinboard, n.d.).
Research data can become unreadable in roughly two ways:
Loss of bits
The data carrier deteriorates to such an extent that the bits, the sequence of zeros and ones, change spontaneously. Informally this is called bit rot. Bits can also be lost through other causes, for example a virus, a fire, or the accidental deletion or loss of files.
To make sure that the sequence of zeros and ones remains intact, you can take the following measures (Netwerk Digitaal Erfgoed, n.d., in Dutch):
- Maintaining on-site and off-site backups;
- Regularly performing a virus check;
- Copying files to new storage media;
- Regularly checking the data integrity with a checksum (Digital Preservation Handbook, n.d.).
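The last measure, checking data integrity with a checksum, can be sketched in a few lines of Python. This is a minimal illustration and not a prescribed tool: the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical, and SHA-256 is assumed as the checksum algorithm. The idea is to record a checksum for every file once, and later recompute the checksums to see which files have changed or gone missing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map the relative path of every file under data_dir to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are missing or no longer match."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A possible workflow is to call `build_manifest` when the data is first stored, keep the resulting manifest together with the backups, and run `changed_files` at regular intervals or after copying the files to new storage media. An empty list means that every recorded file still has its original checksum.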
Loss of rendering capability
Research data can no longer be rendered and displayed if the appropriate combination of the operating system, the hardware and the application no longer exists, can no longer be used or cannot be imitated. The following measures, for example, can be taken to limit the risk of the loss of rendering capability:
- Store data in open data formats;
- Store the software which was used or developed together with the documentation;
- Mimic outdated software and hardware environments so that old files can still be used. This last strategy is called emulation and is a lot more complicated and expensive than the other two.
If you want to keep data readable and usable during research, it is important to think carefully about a storage strategy. The following questions are important:
- How large is the dataset?
- Does it concern 'active' data?
- For what period of time should the dataset be stored?
- Should the software also be stored?
- Is it privacy-sensitive or confidential data?
- Who needs access when? Are these datasets that several researchers from several institutions should be able to work on?
- How often should the data be backed up?
- What precautions should be taken to protect the data against loss?
- Does the data have to be encrypted?
CESSDA has compiled an extensive overview of the advantages and disadvantages of different types of storage solutions (CESSDA, n.d.a.).
Options for data storage during research in the Netherlands
For the storage and backup of individual data during a research project, most institutions offer solutions on local (network) drives. Often, however, researchers also want to share the data and/or collaborate on the data with others from outside their own institution. The overview below lists a number of cross-institutional solutions used in the Netherlands, subdivided by the goal that researchers have for the data.
- Storing data
SURFdrive (SURF, n.d.a.) is used by many researchers in the Netherlands for personal storage.
- Working on data together
- Figshare for institutions
The University of Amsterdam (UvA) and the Amsterdam University of Applied Sciences (HvA) offer their researchers Figshare (UvA, 2017). Researchers can safely store their research data in a custom-made Figshare environment (Figshare, n.d.) and share it with other researchers during research. Upon completion of their research, they can use the same system to archive and publish their research data.
- Research Drive
In the next section you can read an interview about the implementation of Research Drive from SURF (n.d.b.) at Saxion University of Applied Sciences. With Research Drive, a data steward or principal investigator manages and monitors the project environment, for example by managing users, granting rights and permissions, allocating quotas, transferring data and closing the project environment when a research project is completed. These options are not available in SURFdrive.
- DataverseNL
DataverseNL (DANS, n.d.) is used, for example, by Avans University of Applied Sciences and several universities in the Netherlands. In a case on the website of the Vrije Universiteit Amsterdam (2019), university lecturer Sander Groffen of the Functional Genome Analysis Department explains how he uses Dataverse to store, share and archive data.
- Sending data
SURFfilesender (SURF, n.d.c.) is used by many Dutch researchers for the secure transmission of data.
An advantage of the above solutions is that the data is stored in the Netherlands. The GDPR restricts the transfer of personal data to countries outside the European Economic Area unless adequate safeguards are in place (European Union, 2016). A service such as Dropbox (n.d.), where the data is stored in the U.S., does not automatically meet this requirement.
In addition to these 'national solutions', B2DROP (EUDAT, n.d.) offers cloud storage at the European level.
The solutions for long-term storage will be dealt with in chapter IV. You will see that some solutions apply both during and after the research.
4TU.Center for Research Data (n.d.). Researchers about us. https://researchdata.4tu.nl/en/about-4turesearchdata/researchers-about-us/
Apache (n.d.). Apache Subversion. https://subversion.apache.org/
Backlog (2018, April 4). Git vs. SVN: Which version control system is right for you? https://backlog.com/blog/git-vs-svn-version-control-system/
CESSDA (n.d.a.). Data Management Expert Guide. Storage. https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/4.-Store/Storage
CESSDA (n.d.b.). Data Management Expert Guide. Data authenticity. https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/3.-Process/Data-authenticity
Digital Preservation Handbook (n.d.). Fixity and checksums. https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums
DANS (n.d.). DataverseNL. https://dans.knaw.nl/nl/over/diensten/DataverseNL/DataverseNL?set_language=nl
Dropbox (n.d.). https://www.dropbox.com/
EUDAT (n.d.). B2DROP. https://eudat.eu/services/b2drop
European Union (2016). GDPR. https://eur-lex.europa.eu/eli/reg/2016/679/oj
Figshare (n.d.). Discover research from University of Amsterdam / Amsterdam University of Applied Sciences. https://uvaauas.figshare.com/
Git (n.d.). https://git-scm.com/
GitHub (n.d.). https://github.com/
GoCanvas (2014). The evolution of data storage. [Infographic]. https://www.slideshare.net/GoCanvas/historyofdatastor
Tennant, J., Worthington, S., Allard, T., Zumstein, P., Katz, D.S., Morley, A., Druskat, S., Colomb, J., Smith, A., Smith, I., Steiner, T., Vos, R., Förstner, K., Seibold, H., Saretta, A., Mayes, A.C. (2018, December 4). OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source: Third release (Version 3.0.0). Zenodo. http://doi.org/10.5281/zenodo.1937708. Also see https://eliademy.com/catalog/oer/module-5-open-research-software-and-open-source.html
Mashable (2011). The history of digital storage. Mashable Infographics. http://mashable.com/2011/10/08/digital-storage-infographic/
Netwerk Digitaal Erfgoed (n.d.). Leren Preserveren. Bit preservering [Course]. https://lerenpreserveren.nl/topic/bit-preservering/
Pinboard (n.d.). Data horror stories. https://pinboard.in/u:dsalo/t:horrorstories/t:datacuration
SURF (n.d.a.). SURFdrive. https://www.surf.nl/en/store-and-share-your-files-securely-in-the-cloud-with-surfdrive
SURF (n.d.b.). Research Drive. https://www.surf.nl/en/research-drive-securely-and-easily-store-and-share-research-data
SURF (n.d.c.). SURFfilesender. https://www.surf.nl/en/surffilesender-send-large-files-securely-and-encrypted
SURF (n.d.d.). Science Collaboration Zone Home. https://wiki.surfnet.nl/display/SCZ
SURF (n.d.e.). eduVPN. https://www.surf.nl/en/eduvpn
SURF (n.d.f.). edu.nl. De URL-shortner voor onderwijs en onderzoek met respect voor privacy. https://edu.nl/
UK Data Service (n.d.). Data encryption. https://www.ukdataservice.ac.uk/manage-data/store/encryption
Vrije Universiteit Amsterdam (2019). 'In Dataverse kan ik mijn data makkelijk opslaan, archiveren en delen'. [News item]. https://www.ub.vu.nl/nl/nieuws-agenda/nieuwsarchief/2019/jan-mrt/in-dataverse-kan-ik-mijn-data-makkelijk-opslaan-archiveren-en-delen.aspx