Storing data

   Main points

Where, and on which medium, will a researcher store research data? How are backups and version control used (see box)? This section outlines the options.

Storage media

Information requires an information carrier: a storage medium. Over time we have learned that storage media quickly become outdated (see infographics (1) and (2)). A researcher may think that a backup on a USB flash drive is the best way to store data, but how long will such drives be around? Will it be possible to read them in the future? Will laptops (or their equivalents) still have a USB port? Will the software available at that time be able to open data stored in a particular format?

Storage strategy

If you want to keep data readable and usable for a long period of time, you need to think carefully about a strategy. The UK Data Archive includes the following points(3) in its list of best practices for data storage:

  • Store data in an open format that is not tied to a specific software supplier (see also preferred formats).

  • Use a data storage strategy that relies on two different types of storage media (for instance CD and hard disk), even for a short-term project.

  • Copy or migrate data to new storage media every two to five years. Storage media degrade, and older media may become impossible to read with future hardware and software.

  • Never overwrite an old backup with a new one. It is better to make an entirely new backup of the changed files.

  • Regularly check the data integrity, for instance with a checksum checker.(4)

  • Organize and document research data. Make digital versions of paper data documentation in PDF/A format (suitable for long-term storage).
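The integrity check in the list above can be illustrated with a short Python sketch. It is not the Checksum Checker tool cited in the text. It assumes SHA-256 as the checksum and a simple manifest file with one "digest, two spaces, relative path" entry per line. The function names and the manifest layout are hypothetical and chosen only for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_path):
    """Record one 'digest  relative/path' line for every file under data_dir."""
    data_dir = Path(data_dir)
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(data_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}")
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(data_dir, manifest_path):
    """Return the relative paths that are missing or whose digest has changed."""
    data_dir = Path(data_dir)
    problems = []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = data_dir / relative
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative)
    return problems
```

In this sketch the manifest should be kept outside the data directory (and ideally on a different medium), so that it is not listed in itself. An empty list from verify_manifest means every recorded file is still present and unchanged.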

Short-term storage

There are roughly three options available for short-term storage and backup: 

  • On the researcher's own PC or laptop.

    If a researcher works on his or her own PC or laptop, then that is where the master file is: the file that is used when the data is entered. 
    The backup is the file that is used to restore data when the master file is lost, damaged, accidentally deleted or wrongly changed. 
    Make regular backups of your master file on a USB flash drive, DVD, CD or external hard disk (disk storage). 

    Researchers often have several workstations. They work on the lab PC, or on their own laptops when they are at home or on the road, and share their research data through cloud services. Of course it is possible to copy files from one computer to another. However, that means copying them by hand, and it is very easy to lose track of which copy is the latest version of a file (see also version management). File synchronization software, e.g. Syncback(5), offers a solution in this case.
     
  • Through central storage services (network storage) at the institute the researcher works for. 

    If a researcher uses the institute's network storage facilities, backups have usually been arranged already. Often there are also restore facilities that make it possible to return to older versions of the data. 
    Some research groups install their own NAS server, which is in effect an external hard disk with network facilities. Such a NAS server can be connected to a computer network, and from that moment on every connected device has access to your files. All PCs share the same backup server. Setting up such a NAS server requires expert knowledge. 
  • On cloud storage services with synchronisation facilities such as SURFdrive(6) or Dropbox(7).

    With the emergence of cloud services the term 'master file' is gradually losing its meaning. Cloud services let you store your data in the cloud and share and synchronize it across different devices. 
    • SURFdrive: a personal cloud storage service for the Dutch education and research community, offering staff, researchers and students an easy way to store, synchronize and share files in the secure and reliable SURF community cloud. Users get 100 GB of storage capacity and can access their files at all times from any location by means of offline synchronization. Users can also grant guest users access to their personal files. All data transmitted over the network is encrypted.

    • Dropbox: also a cloud-based service, widely used for sharing small data sets between scientists. A program like Dropbox can easily be installed on your PC. All changes you make are automatically stored online. If you change the online document from another computer, those changes are also stored on your PC the next time you turn it on (provided there is internet access). The disadvantage of global services like Dropbox is that you do not know whether your data is safe or whether someone else can read your files. For this reason several Dutch research organisations prefer SURFdrive to Dropbox.

    • SURFSpace: you can also set up your own cloud storage (article in Dutch).(8)
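For the first option above (regular backups of a master file kept on your own PC), the advice never to overwrite an old backup can be followed by giving every backup copy a new, timestamped name. The Python sketch below is one possible way to do this and is not taken from the sources cited here. The function name backup_copy, the naming pattern and the optional now parameter are assumptions made for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_copy(master_file, backup_dir, now=None):
    """Copy master_file into backup_dir under a new, timestamped name.

    An existing backup is never overwritten: if the target name is
    already taken, FileExistsError is raised instead.
    """
    master_file = Path(master_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming pattern: <name>_<YYYYMMDD-HHMMSS><extension>
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{master_file.stem}_{stamp}{master_file.suffix}"
    if target.exists():
        raise FileExistsError(f"backup already exists: {target}")

    shutil.copy2(master_file, target)  # copy2 also tries to keep file metadata
    return target
```

Pointing backup_dir at an external hard disk or network drive, rather than at the disk that holds the master file, keeps the backup on a second storage medium.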

The table below lists the various options with their advantages and disadvantages. The table has been reproduced from the 'Data management plan template' by Wageningen University.(9)

| Storage solution | Advantages | Disadvantages | Suitable for |
|---|---|---|---|
| Personal computer and laptop | Always available; portable | Drive may fail; laptop may be stolen | Temporary storage |
| Networked drives (file servers managed by your university or research group, or facilities like a NAS server) | Regularly backed up; stored securely in a single place | Costs | Master copy of your data (if enough storage space is provided) |
| External storage devices (USB flash drive, DVD/CD, external hard drive) | Low cost; portability | Easily damaged or lost | Temporary storage |
| Cloud services | Automatic synchronization between folders and files; easy to access and use | It is not certain whether data security is taken care of; you have no direct influence on how often backups take place and by whom | Data sharing |

 

Mid-term storage

SURFsara offers the BeeHub service for mid-term storage and sharing of large amounts of data. Affiliates of one of the Dutch universities or Grand Technology Institutes can use the storage capacity on BeeHub to store and share their research data. Unlike SURFdrive, BeeHub does not have synchronization facilities. However, periodic backups are made of the data stored on BeeHub. The first 100 GB of storage space is free. Users who need more than 100 GB should apply for it through a SURFsara resource request form (https://e-infra.surfsara.nl/).

Long-term storage

Research projects can sometimes take many years, and researchers then need long-term storage to store and back up their data. The SURFsara Data Archive allows the user to safely store up to petabytes of valuable research data. The Data Archive uses tape library technology to store data sets for the long term and allows access at any time. It provides bit-wise preservation and periodic backups at two locations in the Netherlands. 

Version management

If data is worked on continually, it is useful to introduce some kind of version management so that changes can be followed properly. The simplest form of version management is to add a version number to the end of the file name after every important change, e.g. experiment_021213_v2.doc.
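The naming scheme in the example above can also be applied automatically. The following Python sketch assumes that the version is always written as a trailing _v followed by a number, as in experiment_021213_v2.doc, and that a file without such a suffix counts as version 1. Both conventions, and the function name next_version_name, are assumptions made for this illustration.

```python
import re
from pathlib import Path

# Matches a file stem that ends in "_v<number>", e.g. "experiment_021213_v2".
_VERSION = re.compile(r"^(?P<base>.+)_v(?P<num>\d+)$")


def next_version_name(filename):
    """Return the file name to use for the next version of filename."""
    path = Path(filename)
    match = _VERSION.match(path.stem)
    if match:
        base = match.group("base")
        number = int(match.group("num")) + 1
    else:
        # Assumption: an unnumbered file is version 1, so the next one is v2.
        base = path.stem
        number = 2
    return str(path.with_name(f"{base}_v{number}{path.suffix}"))


print(next_version_name("experiment_021213_v2.doc"))  # experiment_021213_v3.doc
```

Saving each important change under the name this function returns, instead of overwriting the previous file, keeps earlier versions available for comparison.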

Version management can also be used within a single file. In the section Data documentation you can read a case (see tab 'version management') in which a researcher includes version management in her data files by adding a 'version management' tab.

Some programs, such as Dropbox, have their own automated version management.

If the research is not too complicated, the methods mentioned above are an excellent way to manage versions. If a researcher often works on data with other people, or the same dataset is edited continuously, version management software such as Git(10) (also used by GitHub(11)) might be a solution.

   Sources

  1. Mashable. (2011). The history of digital storage. Mashable Infographics. Retrieved from http://mashable.com/2011/10/08/digital-storage-infographic/
  2. Mozy. (2011). The past, present and future of data storage. Retrieved from http://mozy.com/infographics/the-past-present-and-future-of-data-storage/
  3. UK Data Archive. (2011). Managing and sharing data. Retrieved from http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
  4. National Archives of Australia. Checksum Checker. Retrieved from http://checksumchecker.sourceforge.net/ 
  5. 2BrightSparks. Syncback: backup software. Retrieved from http://www.2brightsparks.com/syncback/
  6. SURFdrive. Retrieved from https://www.surfdrive.nl/en
  7. Dropbox. Retrieved from https://www.dropbox.com/
  8. Vanderfeesten, M. Maak je eigen cloudopslag. Retrieved from https://www.surfspace.nl/artikel/1151-maak-je-eigen-cloudopslag/
  9. Wageningen Universiteit. Data Management Plans. Retrieved from http://www.wageningenur.nl/en/Expertise-Services/Data-Management-Support-Hub/Browse-by-Subject/Storage-solutions.htm (see the DMP Template)
  10. Git, fast version control. Retrieved from git-scm.com
  11. GitHub. Retrieved from https://github.com/



