The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for ingest of research papers (and potentially other scholarly work) into repositories. Later on, the project scope widened to include identifying and disseminating the various projects being developed at institutions in relation to the deposit usecases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.

Thursday, 25 November 2010

SONEX and Research Data: new deposit usecase scenarios

A SONEX meeting was held last Sat Nov 20th at the JISC Office in Brettenham House, London. The meeting was intended to produce some feedback on the RFC version of the Technology and Methodology Digital Library Cookbook. SONEX feedback on the featured interoperability solutions focused mainly on enhancing the Cookbook's description of the Sword protocol so that it covers the functionality updates in the new version of Sword.

Richard Jones (SONEX-Symplectic), Balviar Notay (JISC manager for SONEX) and Pablo de Castro (SONEX-Carlos III Univ Madrid) at SONEX meeting in Brettenham House

The second half of the SONEX meeting was devoted to a preliminary analysis of the deposit into Open Access repositories of raw research data, produced either as a specific research output or as supplementary material to research publications. Raw data as a further SONEX deposit usecase scenario was already included in the list of issues for the SONEX Birds-of-a-Feather session held at the Open Repositories Workshop (OR2010) last July in Madrid, where it was identified as 'the missing piece in the general deposit picture' at the time.

Some deposit-related projects have been running since Jul 2010 under the JISC Deposit Call (JISCdepo), but none of them so far deals with deposit of research data. However, dataset handling is already being considered as a forthcoming candidate for Sword-based transfer, and preliminary analysis of this new deposit usecase scenario may well be partially carried out under the SONEX umbrella.

Some of the discussed ideas on research data and their deposit via Sword into repositories follow:

- A JISCdepo meeting will be held in early Mar 2011 as an internal coordination event for JISC Deposit Call projects. It is a good opportunity for SONEX to fine-tune its analysis of usecase scenarios at running projects, as well as to share potential new deposit usecases arising both from the Kultivate project (digital versions of creative works, i.e. non-textual materials) and from the research data-based approach.

- Regarding deposit of research data into repositories, the DRYAD international repository of data underlying peer-reviewed articles in the basic and applied biosciences was highlighted as a pioneering implementation of infrastructure for research data filing and preservation. DRYAD acts as a kind of PubMed Central for research data: a group of 50 journals (so far) has issued an equivalent mandate requiring their authors to deposit publication-related research data into this specific repository (besides archiving them in their IR or with the publisher).

- The JISC-funded DRYAD UK project was also discussed. DRYAD UK, currently being developed within the JISC Managing Research Data (JISCMRD) programme, is planning to expand Dryad into the UK by both establishing a UK mirror site and extending service to new publishers and disciplines.

- A JISC Managing Research Data Programme (JISCMRD) International Workshop will be held in Mar 2011 to analyse and evaluate the outputs and progress of the JISCMRD Programme. There will be a place in the Workshop programme for issues related to research data, such as citation, deposit and metadata/identifier exchange with publishers. SONEX is expected to contribute input on some of those subjects.

- Regarding the creation of research data management infrastructure for collection, digital organization, metadata annotation and controlled sharing of datasets, the ADMIRAL project (A Data Management Infrastructure for Research Across the Life sciences) was identified as the main currently running initiative to follow. DataPac, an idea for a standard data shipping container for submitting research data together with identifier and other information in RDF and HTML formats, was also mentioned as a potential complementary infrastructure to ADMIRAL.

In terms of SONEX deposit usecase analysis, deposit of research data poses a double usecase framework,

  • R2R usecase scenario (IR to DRYAD, other)

  • Publisher to repository usecase scenario

as well as a set of Sword-related procedural issues to be checked from a SONEX perspective, such as:

  • metadata-related issues – very case-specific and different from metadata standards being used for research papers (previous work on the subject done by JISCMRD MRDonto Group: “Metadata for Datasets: Identifiers and Ontologies”)

  • SONEX should definitely NOT get into identification schemes for datasets – DOIs should suffice for identification purposes

  • issue of attached file sizes – should deposit by reference be considered instead of, or in addition to, binary data transfer?

  • At what point along the publication lifecycle should dataset deposit take place? Picturing the process via workflow diagrams would help

  • How should Sword deal with this particular deposit usecase?
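
The file-size question in the list above (binary transfer versus deposit by reference) can be pictured with a minimal sketch. Everything in it is hypothetical and for illustration only: the 100 MiB threshold, the `Dataset` class and the `choose_deposit_mode` and `build_deposit_request` functions are assumptions, not part of Sword or of any SONEX recommendation.

```python
from dataclasses import dataclass

# Hypothetical cut-off; a real repository would choose its own limit.
MAX_BINARY_BYTES = 100 * 1024 * 1024  # 100 MiB


@dataclass
class Dataset:
    doi: str         # dataset identifier (a DOI, as suggested in the list above)
    size_bytes: int  # size of the attached data file
    url: str         # location from which the file could be fetched


def choose_deposit_mode(dataset: Dataset, limit: int = MAX_BINARY_BYTES) -> str:
    """Return 'binary' for files up to the limit, 'by-reference' otherwise."""
    if dataset.size_bytes <= limit:
        return "binary"
    return "by-reference"


def build_deposit_request(dataset: Dataset) -> dict:
    """Assemble an illustrative deposit description (not a real Sword payload)."""
    mode = choose_deposit_mode(dataset)
    request = {"identifier": dataset.doi, "mode": mode}
    if mode == "by-reference":
        # Only a pointer to the data travels; the repository fetches it later.
        request["source_url"] = dataset.url
    return request
```

In this sketch a small supplementary file would travel with the deposit itself, while a large raw dataset would be described by its identifier and a URL only. Where the threshold sits, and whether both modes should be offered at all, is exactly the open question raised above.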

Some interesting examples of international initiatives dealing with dataset management are also being examined by SONEX, such as:

- PANGAEA [Germany]: Publishing Network for Geoscientific & Environmental Data, see example dataset with attached DOI

- [Dutch] NARCIS (National Academic Research and Collaborations Information System) FAQ page contains info on handling datasets.

Further references on submission of research data to repositories: