The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for the ingest of research papers (and potentially other scholarly work) into repositories. The project scope later widened to include identifying and disseminating projects under development at institutions that relate to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.






Sunday 13 March 2011

Strategies for research data deposit in ongoing data management projects


  Before performing pattern analysis for research data deposit into institutional or subject-based data repositories, whether or not open access, the first step for Sonex is to scope ongoing projects dealing with that kind of deposit, as well as closed projects that supplied relevant guidelines on the subject. A list of data management projects follows, each with its specific approach to data deposit as described on the project blog:


TARDIS (Monash University–Australian National Data Service).
“There is a pressing need for the archival and curation of raw X-ray diffraction data. However, the relatively large size of these datasets has presented challenges for storage in a single worldwide repository. This problem can be avoided by using a federated approach, where each institution or university utilizes its institutional repository”.


ADMIRAL: A JISC-funded data management infrastructure for research across the life sciences.
"The purpose of the ADMIRAL Project is to create a two-tier federated data management infrastructure for use by life science researchers, that will provide services (a) to meet their local data management needs for the collection, digital organization, metadata annotation and controlled sharing of biological datasets; and (b) to provide an easy and secure route for archiving annotated datasets to an institutional repository, The Oxford University Data Store, for long-term preservation and access, complete with assigned Digital Object Identifiers and Creative Commons open access licences".
(See Oxford University Library Services' Databank)


XYZ Project. “The XYZ Project will create a demonstrator of a new workflow for publishing data in support of full-text. The author prepares data for publication (if possible with validation) in a third-party trusted repository before the paper is submitted to a publisher. Our software will manage the deposition, release to reviewers, dis-embargo and for conventional publication or as a data journal. Two Open Access publishers (International Union of Crystallography and BioMed Central) are engaged with the project and will test the new workflow”.
Anticipated Outputs and Outcomes: A demonstrator repository hosted by the IUCr.


FISHnet: Freshwater information sharing network. “This project will allow researchers in multiple academic, governmental and voluntary-sector institutions to share their data. Data will be held securely in a sustainable subject repository which preserves and disseminates multiple datasets as part of the FreshwaterLife.org information portal. Data creators will be able to manage access rights to their content, from Open Access to sharing with trusted colleagues”.


DMBI: Data Management in Bio-Imaging. “The quantity of data generated by modern high-throughput bio-imaging systems presents a significant challenge in both data management and processing. Furthermore, there is no explicit system/way to record the processing algorithms and parameters that are used to produce results. Thus there is no strong link between images, software and results. This project aims to address these issues”.
Anticipated Outputs and Outcomes: Build a prototype DMBI system around OMERO.


CaiRO: Curating Artistic Research Output. “No prominent subject-based repository exists to act as the custodians of arts practice-as-research data. Where institution provision for data management is in place (for instance, an institutional repository service) the arts researcher-practitioner cannot always rely on an understanding of the special nature of arts research data. More commonly, data is retained in departmental collections, built and maintained by small teams which often include researchers themselves”.


BRIL: Biophysical Repositories in the Lab. “The BRIL project aims to enhance the repository facilities at the Randall Division of Cell and Molecular Biophysics at King’s College London. This will involve:
» Embedding the repository within the researchers’ day-to-day research and experimental practices;
» Integrating the repository into the wider King’s infrastructure”.
Example of KCL “internal” repository: Mutation Testing Repository.


ADS+: Enhancing and Sustaining the Archaeology Data Service digital repository. The project aims to “Increase the sustainability of the ADS, by implementing Fedora (Flexible Extensible Digital Object Repository Architecture). This is a world-leading open source digital repository application which will allow the automation of many ADS curatorial functions, according to the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003). This will help ensure the long term preservation of all ADS digital archives, as well as making the ADS archival procedures more cost-effective”.


IDMB (Institutional Data Management Blueprint) Project, U. Southampton.
The project’s aims are to provide the University of Southampton with a ten-year roadmap for delivery of a comprehensive data management infrastructure.

[IDMB Recommendations] The data management audit and gap analysis indicates where improvements can be made in the short, medium and long-term to improve data management practices and capabilities at the University. The following preliminary recommendations are put forward for short (one year), medium (one to three years), long (more than three years) term action.
[Short Term (1 year)] Crucial to supporting researchers is the consolidation of data management into a coherent framework that is easy to understand, use, and has a sustainable business model behind it. A number of major recommendations are put forward here for the short-term:
• Create an institutional data repository
• Develop a scalable business model
• One-stop shop for data management advice and guidance


MaDAM: Pilot data management infrastructure for biomedical researchers at University of Manchester.
A pilot infrastructure for Biomedical Researchers at the University of Manchester, which covers data capture, data storage and data curation. This infrastructure comprises procedural support, hardware and software.
[18/03/2010] The development team have built a prototype data management front end which fits a generic set of needs amongst our Life Sciences researchers. It is aimed at being flexible enough to allow researchers themselves to assign attributes (i.e. metadata) to their experiments and datasets for them to be usefully categorised and tagged. The prototype is also entirely dispensable and intended as a catalyst for feedback from our use cases on their specific functionality requirements.


DISC-UK DataShare Project. The DISC-UK DataShare project, led by EDINA National Data Centre and the Edinburgh University Data Library, with partners at the Universities of Southampton and Oxford, has advanced the current provision of repository services for accommodating datasets in the UK.
Key conclusions: 1) Data management motivation is a better bottom-up driver for researchers than data sharing but is not sufficient to create culture change, 2) Data librarians, data managers and data scientists can help bridge communication between repository managers & researchers, 3) Institutional repositories can improve impact of sharing data over the internet.
