Data processing is an umbrella term for the transformations that research data can undergo at various stages. In this section we examine the transformations that take place from the moment research data is included in a data archive. The Reference Model for an Open Archival Information System (OAIS)(1) distinguishes three stages for data sets that are included in a data archive: 

  • Submission Information Package (SIP).
  • Archival Information Package (AIP).
  • Dissemination Information Package (DIP).

Before, during and after these three stages, data sets can be subject to transformation. For instance: 

  • Conversion from one data format to another (for example, a durable format).
  • Reorganization of folders and files.
  • Adding (extra) metadata and a persistent identifier.
  • Zipping and compressing a data set so that it takes up less storage space.
  • Data interaction, where you can, for example, download part of a data set based on a query.
  • ....
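
As an illustration of the zipping and compressing step listed above, here is a minimal Python sketch. It is not the procedure of any particular archive: the function name `zip_dataset` and the folder layout are assumptions made for this example, and only the Python standard library is used.

```python
import zipfile
from pathlib import Path


def zip_dataset(dataset_dir, zip_path):
    """Pack every file under dataset_dir into one compressed zip archive.

    Hypothetical helper for illustration; not part of any archive's tooling.
    """
    dataset_dir = Path(dataset_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(dataset_dir.rglob("*")):
            # Skip folders, and skip the archive itself if it lives inside the data set.
            if path.is_file() and path.resolve() != zip_path.resolve():
                # Store paths relative to the data set root so the folder layout is kept.
                archive.write(path, arcname=path.relative_to(dataset_dir).as_posix())
    return zip_path
```

Calling `zip_dataset("my_dataset", "my_dataset.zip")` would produce a single compressed file that keeps the original folder structure. How much space is saved depends on the file types: text and tables usually compress well, while images that are already compressed barely shrink.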

    Case from SIP to DIP

Stage 1

  • Deposit

Below is a short film about depositing an archaeological data set in EASY, DANS' online archiving system. 70% of the archaeological data sets in EASY are available via open access(2) (in Dutch).

The film is in Dutch; select HD quality for the best viewing experience.

Stage 2

  • Archiving

The image below is an example of data processing after the data set has been deposited and before it is offered to users. On the left-hand side you will see the files as the data depositor uploaded them into EASY. On the right-hand side you will see how a DANS data manager reorganizes the files before making them available to EASY users: 


  • The pictures are no longer archived separately but are collected in the folder 'Photos'.
  • The Excel file has been converted to .csv. This preferred format can easily be opened as text or as a table.
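
The first of these reorganization steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling DANS data managers actually use: the function name `collect_photos` and the list of image extensions are hypothetical. The Excel-to-.csv conversion is left out because it requires a third-party library.

```python
import shutil
from pathlib import Path

# Assumed set of image file extensions; adjust to the data set at hand.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def collect_photos(dataset_dir, folder_name="Photos"):
    """Move loose image files in dataset_dir into one subfolder.

    Hypothetical helper for illustration only.
    """
    dataset_dir = Path(dataset_dir)
    photos_dir = dataset_dir / folder_name
    moved = []
    # sorted() builds the full listing first, so moving files does not disturb the loop.
    for path in sorted(dataset_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            photos_dir.mkdir(exist_ok=True)
            target = photos_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Files that are not images, such as the table of finds, stay where they are. A real workflow would also need to decide what to do when a file with the same name already exists in the target folder.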

Stage 3

  • Presentation and Availability

The above-mentioned data set can be found here.

  An in-depth look

  • Read this article(3) that compares the data deposit practices of sixteen data archives.


