
Storing data

Researchers are very eager to safely store their data. | Renate Mattiszik

Where and how do researchers best store their research data during their research project? How can they best deal with backups and version management? How can they exchange research data with others? How can they protect research data against accidental loss and against unauthorised manipulation? In this section, we give a general overview of the possibilities.

The challenges of data storage

The infographic The evolution of data storage (GoCanvas, 2014) provides good insight into the transience of storage media, the carriers of information. Perhaps a researcher once thought he or she was doing a good job by backing up the research data on a USB stick, but how long will such storage media continue to exist? Will you be able to retrieve the data stored on such a stick later on? Not all laptops still have a USB port, for example. And if a researcher does succeed in retrieving the data from such a stick, can it still be read by the software in use at that moment? And how do you prevent data from being lost altogether? There are plenty of data horror stories that clearly illustrate the risk of data loss (Pinboard, n.d.).

Research data can become unreadable in roughly two ways:

Loss of bits

Bits, the sequence of zeros and ones that make up a file, can be lost or altered. This can happen through a virus, fire, the accidental deletion of files or the loss of files. It can also happen spontaneously over time, when the quality of the data carrier deteriorates to such an extent that bits change. Informally this is called bit rot.

To make sure that the sequence of zeros and ones remains intact, you can take the following measures (Netwerk Digitaal Erfgoed, n.d., in Dutch):

  • Maintaining on-site and off-site backups; 
  • Regularly performing a virus check;
  • Copying files to new storage media;
  • Regularly checking the data integrity with a checksum (Digital Preservation Handbook, n.d.).
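
The last measure can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it assumes SHA-256 as the checksum algorithm, and the function names sha256_of and is_unchanged are hypothetical. The idea is to record a checksum when a file is stored and to recompute it later, for instance after copying the file to a new storage medium.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large datasets do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_checksum: str) -> bool:
    """Compare the file's current checksum with one recorded earlier."""
    return sha256_of(path) == recorded_checksum
```

If is_unchanged returns False, at least one bit of the file differs from the version whose checksum was recorded, and the file should be restored from a backup.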

Loss of rendering capability

Research data can no longer be rendered and displayed if the appropriate combination of the operating system, the hardware and the application no longer exists, can no longer be used or cannot be imitated. The following measures, for example, can be taken to limit the risk of the loss of rendering capability:

  • Store data in open data formats;
  • Store the software which was used or developed together with the documentation;
  • Mimic outdated software and hardware environments so that old files can still be used. This last strategy is called emulation and is a lot more complicated and expensive than the other two.

Storage strategy

If you want to keep data readable and usable during research, it is important to think carefully about a storage strategy. The following questions are important:  

  • How large is the dataset?
  • Does it concern 'active' data?
  • For what period of time should the dataset be stored?  
  • Should the software also be stored? 
  • Is it privacy-sensitive or confidential data?
  • Who needs access when? Are these datasets that several researchers from several institutions should be able to work on?
  • How often should the data be backed up? 
  • What precautions should be taken to protect the data against loss? 
  • Does the data have to be encrypted? 

CESSDA has made an elaborate overview of the advantages and disadvantages of different types of storage solutions (CESSDA, n.d.a.).

Options for data storage during research in the Netherlands

For the storage and backup of individual data during a research project, solutions are available on local (network) drives within most institutions. Often, however, researchers also want to share the data and/or collaborate on the data with others from outside their own institution. The overview below shows a number of cross-institutional solutions used in the Netherlands, subdivided by the goal that researchers have for the data.

  • Storing data 
    SURFdrive (SURF, n.d.a.) is used by many researchers in the Netherlands for personal storage.
  • Working on data together
    • Figshare for institutions
      The University of Amsterdam (UvA) and the Amsterdam University of Applied Sciences (HvA) offer their researchers Figshare (UvA, 2017). Researchers can safely store their research data in a custom-made Figshare environment (Figshare, n.d.) and share it with other researchers during research. Upon completion of their research, they can use the same system to archive and publish their research data.
    • Research Drive
      In the next section you can read an interview about the implementation of Research Drive from SURF (n.d.b.) at Saxion University of Applied Sciences. With Research Drive, a data steward or principal investigator manages and monitors the project environment, such as managing users, granting rights and permissions, allocating quotas, transferring data and closing the project environment when a research project is completed. These possibilities aren't present in SURFdrive.
    • DataverseNL
      DataverseNL (DANS, n.d.) is used, for example, by Avans University of Applied Sciences and several universities in the Netherlands. In a case on the website of the Vrije Universiteit Amsterdam (2019), university lecturer Sander Groffen of the Functional Genome Analysis Department explains how he uses Dataverse to store, share and archive data.
  • Sending data
    SURFfilesender (SURF, n.d.c.) is used by many Dutch researchers for the secure transmission of data.

 

An advantage of the above solutions is that the data is stored in the Netherlands. The GDPR restricts the transfer of personal data to countries outside the European Economic Area unless specific safeguards are in place (European Union, 2016). A service such as Dropbox (n.d.), where the data is stored in the U.S., does not automatically meet this requirement.

In addition to these 'national solutions', B2DROP (EUDAT, n.d.) also offers cloud storage at the European level.

The solutions for long-term storage will be dealt with in chapter IV. You will see that some solutions apply both during and after the research.

In the spotlight


Course to teach researchers how to store and share their software code

Module 5 of the Open Science MOOC teaches researchers to save and share their software code in three steps (Tennant, 2018). 

Tips for version control

If you are constantly working on the data, it makes sense to introduce a form of version control that allows you to keep track of the changes. The simplest way of versioning is to add a number to the end of a file after every important change. For example, experiment_021213_v2.doc.
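
This numbering convention is easy to automate. The Python sketch below is only an illustration: the function name next_version_name is hypothetical, and it assumes that versions are written as a trailing _v followed by a number, as in the example above. A file name without such a suffix is assumed to become version 1.

```python
import re
from pathlib import Path

# Matches a version suffix such as "_v2" at the end of the name (before the extension).
VERSION_SUFFIX = re.compile(r"_v(\d+)$")


def next_version_name(filename: str) -> str:
    """Return the file name with its trailing _vN number raised by one.

    A name without a version suffix gets _v1 appended (an assumption of this sketch).
    """
    path = Path(filename)
    match = VERSION_SUFFIX.search(path.stem)
    if match:
        number = int(match.group(1)) + 1
        stem = path.stem[: match.start()]
    else:
        number = 1
        stem = path.stem
    return f"{stem}_v{number}{path.suffix}"
```

For example, next_version_name("experiment_021213_v2.doc") gives "experiment_021213_v3.doc". The function only handles a bare file name, not a path with folders.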

Some programs and virtual research environments have their own automatic form of version control. When working with code or software, it is useful, for example, to use a tool such as GitHub (n.d.), Git (n.d.) or SVN (Apache, n.d.). The Backlog weblog contains a comparison between Git and SVN (Backlog, 2018).

Need more tips? 

Tips to keep data safe

Data security is defined as the set of measures to protect personal data and confidential information against unauthorised manipulation or deletion of files (intentional or unintentional). Keeping data safe is usually done on several levels:

  • Policy
    Almost every organisation has an information security policy.
  • Organisational measures such as:  
    • Assign responsibilities;
    • Register who has access to the data. For example, use a tool such as SURF Research Access Management (SURF, n.d.d.) to manage access to research data or applications.
  • Technical measures such as:
    • Use a firewall and a virus scanner to protect your PC, and perform a virus scan on a regular basis. This is usually done at the central level of the institution;
    • Always install operating system and software updates;
    • Only use secure wireless networks, or use eduVPN from SURF (SURF, n.d.e.), which allows you to surf securely on unsecured wireless networks;
    • Use edu.nl (SURF, n.d.f.) to create secure short URLs;
    • Manage access to a file with a password;
    • Encrypt the files. Encryption makes files unreadable for those who don't have the 'key'. Commonly used encryption tools are showcased on the website of UK Data Service (n.d.);
    • Never send sensitive data via e-mail or FTP, but use for example SURFfilesender (SURF, n.d.c.). Files can be sent with encryption;
    • Lock the computer if it is left unattended, even if only for a moment (Windows key + L, or on a Mac: Control + ⌘ + Q);
    • Avoid overwriting or deleting a file by making it 'read only'. 
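
The last tip can also be applied from a script, for example after a dataset has been finalised. The Python sketch below is a minimal illustration using only the standard library. The function names make_read_only and is_read_only are hypothetical.

```python
import os
import stat
from pathlib import Path


def make_read_only(path: Path) -> None:
    """Remove all write permissions from a file, keeping the other permission bits."""
    current = stat.S_IMODE(path.stat().st_mode)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    os.chmod(path, current & ~write_bits)


def is_read_only(path: Path) -> bool:
    """Report whether the owner's write permission is switched off."""
    return not (path.stat().st_mode & stat.S_IWUSR)
```

Note that on Linux and macOS a read-only file is protected against overwriting, but whether it can be deleted depends on the permissions of the folder that contains it. Read-only files are therefore a supplement to backups, not a replacement.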

More tips can be found in the CESSDA Data Management Expert Guide (n.d.c).


Sources


4TU.Center for Research Data (n.d.). Researchers about us. https://researchdata.4tu.nl/en/about-4turesearchdata/researchers-about-us/

Apache (n.d.). Apache Subversion. https://subversion.apache.org/

Backlog (2018, April 4). Git vs. SVN: Which version control system is right for you? https://backlog.com/blog/git-vs-svn-version-control-system/

CESSDA (n.d.a.). Data Management Expert Guide. Storage. https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/4.-Store/Storage

CESSDA (n.d.b.). Data Management Expert Guide. Data authenticity. https://www.cessda.eu/Training/Training-Resources/Library/Data-Management-Expert-Guide/3.-Process/Data-authenticity

Digital Preservation Handbook (n.d.). Fixity and checksums. https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums

DANS (n.d.). DataverseNL. https://dans.knaw.nl/nl/over/diensten/DataverseNL/DataverseNL?set_language=nl

Dropbox (n.d.). https://www.dropbox.com/

EUDAT (n.d.). B2DROP. https://eudat.eu/services/b2drop

European Union (2016). GDPR. https://eur-lex.europa.eu/eli/reg/2016/679/oj

Figshare (n.d.). Discover research from University of Amsterdam / Amsterdam University of Applied Sciences. https://uvaauas.figshare.com/

Git (n.d.). https://git-scm.com/

GitHub (n.d.). https://github.com/

GoCanvas (2014). The evolution of data storage. [Infographic]. https://www.slideshare.net/GoCanvas/historyofdatastor

Tennant, J., Worthington, S., Allard, T., Zumstein, P., Katz, D.S., Morley, A., Druskat, S., Colomb, J., Smith, A., Smith, I., Steiner, T., Vos, R., Förstner, K., Seibold, H., Saretta, A., Mayes, A.C. (2018, December 4). OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source: Third release (Version 3.0.0). Zenodo. http://doi.org/10.5281/zenodo.1937708. Also see https://eliademy.com/catalog/oer/module-5-open-research-software-and-open-source.html

Mashable (2011). The history of digital storage. Mashable Infographics. http://mashable.com/2011/10/08/digital-storage-infographic/

Netwerk Digitaal Erfgoed (n.d.). Leren Preserveren. Bit preservering [Course]. https://lerenpreserveren.nl/topic/bit-preservering/

Pinboard (n.d.). Data horror stories. https://pinboard.in/u:dsalo/t:horrorstories/t:datacuration

SURF (n.d.a.). SURFdrive. https://www.surf.nl/en/store-and-share-your-files-securely-in-the-cloud-with-surfdrive

SURF (n.d.b.). Research Drive. https://www.surf.nl/en/research-drive-securely-and-easily-store-and-share-research-data

SURF (n.d.c.). SURFfilesender. https://www.surf.nl/en/surffilesender-send-large-files-securely-and-encrypted

SURF (n.d.d.). Science Collaboration Zone Home. https://wiki.surfnet.nl/display/SCZ

SURF (n.d.e.). eduVPN. https://www.surf.nl/en/eduvpn

SURF (n.d.f.). edu.nl. De URL-shortner voor onderwijs en onderzoek met respect voor privacy. https://edu.nl/

UK Data Service (n.d.). Data encryption. https://www.ukdataservice.ac.uk/manage-data/store/encryption

Vrije Universiteit Amsterdam (2019). ‘In Dataverse kan ik mijn data makkelijk opslaan, archiveren en delen’. [News item] https://www.ub.vu.nl/nl/nieuws-agenda/nieuwsarchief/2019/jan-mrt/in-dataverse-kan-ik-mijn-data-makkelijk-opslaan-archiveren-en-delen.aspx