The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for ingest of research papers (and potentially other scholarly work) into repositories. Later on, the project scope widened to include identifying and disseminating projects under development at institutions that relate to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.

Sunday, 15 May 2011

A first analysis of data management

As previously mentioned in this blog, the Sonex workgroup is now trying to extend its use case scenario analysis on 'Deposit opportunities into repositories' to the realm of research data. A first meeting held at EDINA on Mar 30th served to draw a general picture of the data management landscape.

It should be stressed that the ways of handling SSH (Social Science and Humanities) and STM (Science, Technology and Medicine) data may differ substantially. Given the strong attachment of some Sonex members to IASSIST, the workgroup's initial approach to data management may be somewhat biased towards procedures in the Social Science and Humanities area. However, attention will also be paid to specific ways of dealing with STM datasets as the analysis gets fine-tuned.

Following the same approach we took for research articles, we first tackle the ACTIONS scope. Data deposit is certainly an issue, but there is more to data-related processes than deposit alone: Access to data and Data Notification/Register matter as well.

Next we move on to the WHAT and the WHO. The answer to WHAT is a data set. A previous analysis by Peter Burnhill identifies at least three different types of research data (see image below).

Because it deals mainly with the data file itself, this data type classification is somewhat narrow for the general picture of data management. Sonex would therefore rather propose a new, more generic classification to answer the question of WHAT there is to deposit:

  • Metadata record

  • Codebook or user guide, where all necessary information is provided to allow for data re-use*

  • Raw data or dataset file(s)

* See a DCMI-based description at: Inter-university Consortium for Political and Social Research (ICPSR). (2009). Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle (4th ed.). Ann Arbor, MI. Section 'Important documentation elements', p. 22

These three elements should ideally be supplied as a single package.
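As a rough illustration of the "single package" idea, the three elements above could be modelled as one object that knows which of its parts are still missing. This is only a sketch: the class name `DepositPackage`, its field names and the completeness rule are assumptions made for the example, not something Sonex or ICPSR prescribes.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class DepositPackage:
    """Hypothetical container for the three elements of a data deposit."""

    # Descriptive metadata; the keys used are illustrative, not a real schema.
    metadata_record: Dict[str, str] = field(default_factory=dict)
    # Codebook or user guide that makes the data re-usable.
    codebook: Optional[Path] = None
    # Raw data or dataset file(s).
    data_files: List[Path] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Return the names of the elements that have not been supplied."""
        missing = []
        if not self.metadata_record:
            missing.append("metadata record")
        if self.codebook is None:
            missing.append("codebook")
        if not self.data_files:
            missing.append("data files")
        return missing

    def is_complete(self) -> bool:
        """A package is complete when all three elements are present."""
        return not self.missing_elements()
```

A repository could, for instance, accept an incomplete package as a notification only and ask the depositor for whatever `missing_elements()` reports.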

As to the question of WHO performs each data-related operation (Notification, Deposit, Grant access), a handful of ongoing projects within the JISC MRD (phase I) programme should serve to test the different use cases resulting from a double-entry 'Action/What' table as featured below.
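To make the double-entry idea concrete, the Action/What grid can be sketched as the cross product of the three operations and the three deposit elements, with the WHO left empty in each cell until a project supplies it. The function names and the use of `None` for an unanswered cell are assumptions for illustration only; the actual Sonex table may be laid out differently.

```python
from itertools import product

# The three data-related operations and the three deposit elements from the text.
ACTIONS = ("Notification", "Deposit", "Grant access")
ELEMENTS = ("Metadata record", "Codebook", "Dataset file(s)")


def build_use_case_table():
    """Return one empty cell per (action, element) pair; the value is the WHO."""
    return {(action, element): None for action, element in product(ACTIONS, ELEMENTS)}


def unanswered_cells(table):
    """List the use cases for which no actor has been recorded yet."""
    return [cell for cell, who in table.items() if who is None]
```

Each cell that a project fills in answers the WHO question for one use case, and `unanswered_cells` shows which combinations still need to be covered.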

The next step in developing this preliminary analysis should be a survey to gather information on the data-handling procedures followed in specific JISC MRD projects.