Data processing is an umbrella term for the transformations that research data can undergo at different stages. This section examines the transformations that take place from the moment the research data is included in a data archive. The Reference Model for an Open Archival Information System (OAIS)(1) distinguishes three stages for data sets that are included in a data archive:
- Submission Information Package (SIP).
- Archival Information Package (AIP).
- Dissemination Information Package (DIP).
Before, during and after these three stages, data sets can be subject to transformation. For instance:
- Conversion from one data format to another, for example to a more durable format.
- Reorganization of folders and files.
- Adding (extra) metadata and a persistent identifier.
- Zipping and compressing a data set so that it takes up less storage space.
- Data interaction, where you can, for example, download part of a data set based on a query.
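As an illustration of one of these transformations, compressing a data set can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function name `zip_dataset` and the folder layout are assumptions made for the example and do not describe how any particular archive works.

```python
import zipfile
from pathlib import Path


def zip_dataset(dataset_dir: Path, archive_path: Path) -> list[str]:
    """Compress every file under dataset_dir into one zip archive.

    Returns the relative paths that were stored in the archive.
    (Hypothetical helper for illustration only.)
    """
    stored = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(dataset_dir.rglob("*")):
            if not path.is_file():
                continue  # folders are implied by the file paths
            if path.resolve() == archive_path.resolve():
                continue  # never add the archive to itself
            # Store paths relative to the data set root, with forward slashes.
            rel = path.relative_to(dataset_dir).as_posix()
            zf.write(path, arcname=rel)
            stored.append(rel)
    return stored
```

Calling `zip_dataset(Path("my_dataset"), Path("my_dataset.zip"))` would produce one compressed file that preserves the folder structure of the data set. The folder structure itself is left unchanged; only the storage footprint is reduced.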
Case from SIP to DIP
The movie is in Dutch. Select HD quality for the best viewing experience.
The image below is an example of data processing after the data set has been deposited and before it is offered to users. On the left-hand side you see the files as the data depositor uploaded them into EASY. On the right-hand side you see how a DANS data manager reorganizes the files before making them available to EASY users:
- The pictures are no longer archived separately but are collected in the folder 'Photos'.
- The Excel file has been converted to .csv. This preferred format can easily be opened as text or as a table.
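The first of these steps, collecting the pictures in a 'Photos' folder, could look roughly like the sketch below. It is not the procedure DANS uses: the function name `collect_photos`, the list of picture extensions, and the choice to copy into a separate output folder (so the deposited files stay untouched) are all assumptions made for this example. Converting the Excel file to .csv is left out because it would need a third-party library.

```python
import shutil
from pathlib import Path

# Assumption: which extensions count as pictures is a choice made for this sketch.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def collect_photos(deposit_dir: Path, output_dir: Path) -> None:
    """Copy a flat deposit to output_dir, gathering all pictures in 'Photos'.

    Hypothetical helper for illustration only.
    """
    photos_dir = output_dir / "Photos"
    photos_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(deposit_dir.iterdir()):
        if not path.is_file():
            continue  # this sketch only handles files at the top level
        if path.suffix.lower() in PHOTO_EXTENSIONS:
            shutil.copy2(path, photos_dir / path.name)  # pictures go to 'Photos'
        else:
            shutil.copy2(path, output_dir / path.name)  # other files stay at the top
```

After `collect_photos(Path("deposit"), Path("dissemination"))`, the output folder holds the non-picture files at the top level and every picture under `Photos`, while the original deposit is unchanged.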
An in-depth look
- Read this article(3) that compares the data deposit practices of sixteen data archives.
Do you have any examples of data that has undergone a transformation? What was the result? Do you have any other observations? If so, please share them in the comments.