Data Archives and Repositories

What are data archives and repositories?

Data archives and (data) repositories are part of the basic information and communication technology that makes research data available to the global scientific community, or even beyond it to the general public. There is no sharp distinction between the terms data archive and repository; they are often used synonymously. We use repository as the broader term here. Repositories that refer to themselves as data archives (e.g. PsychData, ICPSR, UK Data Archive) tend to follow a more systematic and long-term approach to data storage.

Characteristics of Repositories

The DGPs (German Psychological Society) defined six criteria in its recommendations “Data Management in Psychological Science: Specification of the DFG Guidelines” that help in choosing a trustworthy repository (Schönbrodt, Gollwitzer, & Abele-Brehm, 2016, p. 3):

  • Economic and ideological autonomy and scientific professionalism of the institutional provider;

  • Persistence of data: Long-term data storage (at least 10 years, ideally substantially longer) must be guaranteed; there should be a protocol describing what happens to the data in case the repository ceases to exist;

  • Accessibility of data: It must be possible to retrieve data openly and freely; however, defining access restrictions (in terms of “Scientific Use Files”) should also be possible (for a discussion of optional access restrictions see section 5);

  • Identifiability of data: There must be a persistent data identifier (e.g. a persistent URL or, if possible, DOI);

  • Clarification of data property rights: Storing data must not imply ceding exclusive rights of use to third parties (however, simple rights of use, i.e. the right to archive and copyright, must be conferred to the operator of the repository);

  • The option to store data publicly as well as non-publicly.
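
The six criteria above can be turned into a simple checklist when comparing candidate repositories. The following Python sketch is only an illustration: the class RepositoryProfile, its field names, the function unmet_criteria and the example repository are hypothetical and not part of the DGPs recommendations. It also assumes that a succession protocol is treated as mandatory, although the recommendations only say there "should" be one.

```python
from dataclasses import dataclass

# Minimum guaranteed storage period named in the persistence criterion.
MIN_STORAGE_YEARS = 10


@dataclass(frozen=True)
class RepositoryProfile:
    """Hypothetical self-assessment of one repository (all fields are assumptions)."""
    name: str
    provider_autonomous: bool           # economic/ideological autonomy, professionalism
    guaranteed_storage_years: int       # guaranteed long-term storage period
    has_succession_protocol: bool       # plan for the data if the repository closes
    open_access_possible: bool          # data can be retrieved openly and freely
    restricted_access_possible: bool    # access restrictions can be defined
    persistent_identifier: bool         # persistent URL or DOI is assigned
    no_exclusive_rights_ceded: bool     # depositor keeps exclusive rights of use
    public_and_nonpublic_storage: bool  # both storage modes are offered


def unmet_criteria(repo: RepositoryProfile) -> list[str]:
    """Return the names of the criteria that the repository does not satisfy."""
    checks = {
        "autonomy and professionalism": repo.provider_autonomous,
        "persistence of data": (
            repo.guaranteed_storage_years >= MIN_STORAGE_YEARS
            and repo.has_succession_protocol  # treated as required here (assumption)
        ),
        "accessibility of data": (
            repo.open_access_possible and repo.restricted_access_possible
        ),
        "identifiability of data": repo.persistent_identifier,
        "data property rights": repo.no_exclusive_rights_ceded,
        "public and non-public storage": repo.public_and_nonpublic_storage,
    }
    return [criterion for criterion, satisfied in checks.items() if not satisfied]


# Made-up example: a repository without a documented succession protocol.
example = RepositoryProfile(
    name="Example Repository",
    provider_autonomous=True,
    guaranteed_storage_years=10,
    has_succession_protocol=False,
    open_access_possible=True,
    restricted_access_possible=True,
    persistent_identifier=True,
    no_exclusive_rights_ceded=True,
    public_and_nonpublic_storage=True,
)

print(unmet_criteria(example))
```

A checklist like this does not replace reading a repository's policies; it only records the outcome of that reading in a form that is easy to compare across repositories.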

Moreover, while some repositories, such as figshare or the Open Science Framework, do not actively curate the data deposited with them, others, such as Dryad or PsychData, ensure the validity of files and metadata through an active curation process. Repositories also differ in their conditions of data use. For example, archives like PsychData or the UK Data Archive stipulate terms and conditions of access and usage that explicitly prohibit, among other things, re-identification of subjects or commercial use of the data.

No such conditions apply if you upload your data to repositories that assign a CC0 license to their data (see also the knowledge base’s section on licenses) and have no additional end-user license (e.g. figshare or Dryad). In these cases, subsequent data users face no restrictions on commercial usage, redistribution or re-identification of subjects, other than those imposed by data protection laws. You should bear this in mind when writing your informed consent. To find an appropriate archive for your data, you can use re3data, the registry of research data repositories.

Further Resources