Publishing descriptions of non-public clinical datasets: proposed guidance for researchers, repositories, editors and funding organisations

Hrynaszkiewicz, Iain; Khodiyar, Varsha; Hufton, Andrew L.; Sansone, Susanna-Assunta

doi:10.1186/s41073-016-0015-6

Methodology
Open access
Published: 22 June 2016

Publishing descriptions of non-public clinical datasets: proposed guidance for researchers, repositories, editors and funding organisations

Iain Hrynaszkiewicz¹,
Varsha Khodiyar²,
Andrew L. Hufton² &
…
Susanna-Assunta Sansone^2,3

Research Integrity and Peer Review volume 1, Article number: 6 (2016) Cite this article

6473 Accesses
13 Citations
47 Altmetric
Metrics details

Abstract

Sharing of experimental clinical research data usually happens between individuals or research groups rather than via public repositories, in part due to the need to protect research participant privacy. This approach to data sharing makes it difficult to connect journal articles with their underlying datasets and is often insufficient for ensuring access to data in the long term. Voluntary data sharing services such as the Yale Open Data Access (YODA) and Clinical Study Data Request (CSDR) projects have increased accessibility to clinical datasets for secondary uses while protecting patient privacy and the legitimacy of secondary analyses but these resources are generally disconnected from journal articles—where researchers typically search for reliable information to inform future research. New scholarly journal and article types dedicated to increasing accessibility of research data have emerged in recent years and, in general, journals are developing stronger links with data repositories. There is a need for increased collaboration between journals, data repositories, researchers, funders, and voluntary data sharing services to increase the visibility and reliability of clinical research. Using the journal Scientific Data as a case study, we propose and show examples of changes to the format and peer-review process for journal articles to more robustly link them to data that are only available on request. We also propose additional features for data repositories to better accommodate non-public clinical datasets, including Data Use Agreements (DUAs).

Peer Review reports

Background

Open access to research data that can be understood and reused by others is a means to further scientific progress and publish more reliable and reproducible research [1, 2]. However, clinical research data often include information that could potentially identify individuals, meaning datasets must be anonymised prior to being shared beyond the study for which the data were originally collected. Although guidelines and processes for anonymisation of clinical data exist [3, 4], publication of freely available clinical datasets (such as [5]) remains uncommon. As open access to clinical datasets is often unfeasible, a more felicitous and pragmatic approach may be needed.

Some clinical datasets may be made available on request from authors of journal articles or through the recent emergence of dedicated data request systems. However, as large amounts of clinical research can go unpublished [6, 7], and clinical trials unregistered [8], the discoverability of many clinical datasets is suboptimal. In this paper, we use the term “non-public clinical datasets” to mean datasets that have been generated through experimental clinical research, such as clinical trials, and which are not openly accessible, but are available on request. Clinical research involving surgical, disease-specific or epidemiologic cohort databases, and electronic health records, which can be continually updated and held by an institution, should not be excluded from data sharing but may require specific additional guidance not covered in this paper.

In consultation with relevant stakeholders representing pharmaceutical companies, research funders, researchers and data repository managers, the editors and publishers of the journal Scientific Data propose guidelines for linking peer-reviewed journal articles to non-public clinical datasets. Throughout this paper, Scientific Data is used as a case study of how these guidelines will work in practice and we also propose how similar approaches could be taken by other journals that consider manuscripts describing clinical or other data that cannot be publicly available.

Data repositories are recognised as essential for enabling reliable access to data underlying research in a number of life science disciplines. Their use is ingrained in the practices of some research communities and in the editorial policies of, for example, the Nature journals [2]. Common repositories for non-public clinical data are less well established, however. We propose implementation of specific features in data repositories for the archival of non-public clinical datasets, to enable appropriate cross-linking of these datasets to scholarly journal articles.

Summary of recommendations

Clinical researchers and their sponsors

Be prepared to share experimental data with editors, peer reviewers and other researchers in accordance with journal policies
Apply the shortest possible embargoes on data

Repositories

Develop mechanisms to host clinical research data, including:
- ○ Provide stable identifiers for metadata records about non-public dataset(s)
- ○ Implement Data Use Agreements (DUAs)
- ○ Implement a transparent system for requesting access to data and reviewing requests to access data
- ○ Allow access to data in a timely manner and include a proportionate review of the scientific rationale, without introducing unnecessary barriers

Clinical journal editors and publishers

Check compliance with their journal’s data sharing policies for every submission
For manuscripts based on secondary access to trial data (e.g. data the original trial sponsor has made available for further research), check the research is consistent with the DUA and purpose for which data access was granted or ask authors to attest that the submission is compliant with these conditions
Increase the visibility of clinical datasets to peer reviewers
Build relationships with repositories for non-public clinical datasets to support public archiving of metadata and data from clinical research
Introduce links to data, data sharing statements and transparency statements in published articles

Data journal editors and publishers

Develop data article formats (for example Scientific Data’s Data Descriptor) to permanently link articles to descriptions of non-public clinical datasets

All sponsors and funders of trials

Build partnerships with and between data sharing initiatives, trusted repositories, and peer-reviewed journals committed to data sharing
Apply the shortest possible embargoes on data and ensure that data access is subject to a proportionate review (e.g. of the scientific rationale and qualifications of the research team), without introducing unnecessary barriers

How do researchers currently access non-public clinical data?

Data sharing between researchers has traditionally occurred through direct contact between individuals and research groups [9]. Many journals have policies that require authors to share data that support their results with other researchers on request, but enforcement of such policies varies between journals (for a summary of journal policy types and approaches see [10]). Journal policies that only require data to be “available on request” without also mandating data availability statements have been found to be less effective for ensuring data access for future researchers [11–14]. However, even clinical journals with strong and enforced policies on data access (such as the BMJ and PLOS Medicine), data about identifiable human subjects will usually not be in the public domain, due to the need to protect research participants’ privacy.

Alongside changes in journal policies, initiatives from other stakeholders (regulatory agencies, the pharmaceutical industry and research groups) in clinical research have begun to facilitate greater access to non-public clinical datasets. This includes the European Medicines Agency (EMA) which has committed to providing access to individual patient data (IPD) in the future [15].

When surveyed about their data sharing attitudes and behaviours, clinical researchers express concerns about inappropriate secondary analysis of their data and patient privacy [16]. The Yale Open Data Access (YODA; http://yoda.yale.edu/) project and Clinical Study Data Request (CSDR; http://clinicalstudydatarequest.com) have since 2012 provided restricted access to non-public clinical datasets while addressing these two concerns. As of January 2016, CSDR listed more than 2800 clinical studies from 12 pharmaceutical companies, for which access to data could be requested. Researchers are also able to enquire about the availability of other non-listed studies. The project has been described as a success by its independent data access review panel [17], and 179 research proposals (data requests) were submitted between May 2013 and November 2015 (https://www.clinicalstudydatarequest.com/Metrics.aspx). As of March 2015 GlaxoSmithKline, one of the companies using CSDR, had received 99 requests (approximately four requests per month) to access data from the 1200 trials it had listed. The YODA Project has compiled data from more than 120 studies, representing two commercial data providers, and has received 39 requests (http://yoda.yale.edu/summary-data-inquiries-and-requests).

Connecting non-public clinical data with journals and repositories

With increasing numbers of open access repositories for research data for many scientific disciplines (http://www.nature.com/sdata/data-policies/repositories; http://www.re3data.org/), which provide persistent, citable links to datasets and metadata records, it is relatively easy to link journal articles to publicly accessible data. In the area of clinical trials, the concept of allowing secondary researchers to access the data is relatively new and so data are not yet widely shared. In addition, since non-public clinical datasets often have no permanent public record or identifier, links between these and the peer-reviewed literature are far less robust than links between public datasets and the literature. This could be addressed by developing both the data access policies of journals and the relationships of journals with data repositories that can archive non-public datasets. Increasing the visibility of such data could also be supported by a new type of journal article focused on describing non-public clinical datasets—particularly those that are previously unpublished. These new articles could be an adaptation of “data papers” that are already published in data journals. Data journals are a relatively new type of journal focused on data publication [18]. Below we describe the benefits of this approach for researchers, as opposed to simply posting information about non-public datasets on a website or posting summary results of clinical trials in trial registration databases.

Discoverability

YODA and CSDR provide public information about clinical studies for which data are available on request. Some of these studies may not be described in journal articles, and for studies that are published, it may not be clear in the published article that the underlying data are available. Moreover, when planning a new research project or systematic review, medical researchers primarily rely on bibliographic databases (such as PubMed/MEDLINE, Scopus, Web of Science), individual journals and clinical trial registries to find information rather than these websites. Better links with journal articles, which are more visible due to indexing in bibliographic databases, might increase the visibility and use of data on request services.

Scientific Data (http://www.nature.com/scientificdata) from the Nature Publishing Group is one example of a data journal (see [18] for others). It is an open access journal for descriptions of scientifically valuable datasets, where data articles (termed Data Descriptors) are linked to their corresponding publicly available datasets. With non-public datasets journal articles need to link to persistent and citable metadata, a “stub” record or landing page, for a clinical dataset that is available on request. The landing page for a non-public clinical dataset should contain sufficient metadata to facilitate an understanding of what the dataset is, and importantly, the conditions that must be met for access to the data (see below). Data Descriptors (and other types of research-based article) linked to the citeable landing page could then be written by those who created the non-public clinical dataset, increasing the discoverability of the dataset and providing a means to obtain scholarly credit for generating and sharing the data.

As well as increasing discoverability of content through indexing in bibliographic databases open access journals, in particular, can help ensure content is highly visible through exposure to standard internet search engines such as Google.

Quality and peer review

Peer review is central to publishing research in journals. Publication of Data Descriptors at Scientific Data (and some other data journals) involves formal peer review by independently selected experts, of both the article describing the dataset(s) and the dataset itself. The peer-review process at Scientific Data focuses on (re)usability and data integrity, rather than on the perceived importance or impact of the data (http://www.nature.com/sdata/for-referees). However, systematic peer review of underlying data is not routine in traditional research journals.

Data curation

Scientific Data’s publication process includes data curation by a dedicated Editor, in addition to peer reviewer checks. This process includes the creation of standardised, machine readable metadata for every Data Descriptor. This is intended to facilitate data discoverability and reuse by using controlled vocabulary terms to capture sample and subject provenance, and outline the experimental workflow. Datasets in curated, readily consumable formats should increase confidence in the delivered data formats, adding further value to the offerings of data journals. Use of subject-specific repositories, which tend to use community-accepted data formats and standards, and have standardised data curation processes, also increases the likelihood that archived data can be reused.

Permanence

A role of journals is to ensure the permanence and integrity of the scientific record, for example, by maintaining persistent links between articles and datasets and placing copies of content in redundant archives. Web link decay or “link rot” is well documented, and even in the peer-reviewed literature, an estimated 20 % of articles published in 2012 already suffer from broken web links when regular websites or URLs are cited [19]. This deterioration of referential integrity across corresponding data sources presents obstacles to replicating or re-analysing results underlying the scientific record. Publishers’ use of persistent identifiers for journal articles, Digital Object Identifiers (DOIs), helps ensure readers and future researchers can access content as publishers commit to keeping article links up-to-date using the DOI system. DOIs are increasingly being created for datasets and metadata records, via data repositories.

Links with data repositories

While the YODA Project and CSDR have succeeded in increasing access to clinical research data, their websites, and documents hosted therein are not citable and linkable in the same way as journal articles, and other research objects assigned DOIs. Furthermore, researchers and projects funded by organisations such as Cancer Research UK, Medical Research Council Clinical Trials Unit (MRC CTU) [20] and the Wellcome Trust may have well-managed non-public datasets available via local hosting, but these archives are unlikely to meet preservation, discoverability and linking standards of journals and publishers. Scientific Data, for example, works with trusted repositories to publish its Data Descriptors and supports data archiving policies and activities across the Nature journals [2]. Scientific Data has established criteria for assessing public research data repositories (see below) and has so far approved more than 80 repositories (http://www.nature.com/sdata/data-policies/repositories; https://biosharing.org/collection/ScientificData). Other publishers and journals also list suggested or recommended repositories for authors including BMJ Open, PLOS, and BioMed Central.

Negative, incomplete or inconclusive trial data

While a number of journals exist that explicitly encourage negative or inclusive trials to be published (such as Trials, Journal of Negative Results in Biomedicine), a type of article designed to describe non-public clinical trial data—regardless of whether discussion or analysis of the data are available—might be further incentive for researchers and their sponsors to share data.

Alternative approaches to data sharing

While this paper proposes an important role for journals and data journals in making non-public research data more widely available, other approaches exist. We summarise some of these and their advantages and disadvantages in Table 1.

Table 1 Advantages and disadvantages of alternative approaches to data sharing

Full size table

While these other models are not completely inconsistent with data publications, the involvement of journals and data journals in data sharing and publication additionally provides independent data quality evaluation, better preservation through dedicated independent repositories, and data access or request procedures that are more transparent and less susceptible to bias.

Additional considerations for journals and data repositories

Data repositories that host non-public clinical datasets which are linked to peer-reviewed articles need to provide the following additional services to those repositories hosting only publicly accessible data.

Data use agreements

An essential component for the secondary use of non-public clinical datasets is a data use agreement (DUA) between the data generator or repository and the secondary researcher(s). The purposes of DUAs include reducing risks to participants and other parties involved in the study and to ensure the scientific value of secondary analyses, which includes commitments to publishing secondary analyses in peer-reviewed journals [21].

Template DUAs are provided by the YODA Project and CSDR (https://www.clinicalstudydatarequest.com/Documents/DATA-SHARING-AGREEMENT.pdf). The Multi-regional Clinical Trials Center at Harvard University also provides a template DUA (http://mrctcenter.org/resources/?project=framework-for-data-sharing).

Recommending wording for DUAs is beyond the scope of these guidelines. However, the position of Scientific Data is that DUAs describing datasets intended for publication in journals dedicated to open science should support wider use by independent qualified researchers, including for competitive analysis, and that the authors agree to relinquish intellectual property claims stemming from the data. Mandatory requirements for co-authorship for data creators should be avoided.

Controlled access and governance

Few repositories have processes for managing and approving requests to access non-public clinical datasets but we identify some candidates and possible candidates (Table 2). Controlled access approaches for IPD have been described by Tudur Smith et al. [22]—which, for clinical trials, do not always use data repositories—can include independent advisory or data access committees. The role of a trusted intermediary and characteristics of independent review panels are highlighted by the IOM’s report [21] (chapter 5). The report recommends holders of clinical trial data implement “operational strategies that include employing data use agreements, designating an independent review panel, including members of the lay public in governance, and making access to clinical trial data transparent.” Independent panels are costly to manage—employed typically by large pharmaceutical companies or major research funders—and there is no widely established governance solution for all experimental clinical research data.

Table 2 Data repositories that meet or potentially could meet the proposed requirements (in this article) for hosting non-public clinical trial data

Full size table

Some concerns have been raised about the speed of data release and usability of data from “voluntary data-sharing portals” [23] such as YODA and CSDR, and, more broadly, clinical trial units are understandably concerned about costs of managing controlled access to data [24].

Landing pages

Repository metadata records (landing pages) for linking journal articles to non-public clinical data will share many characteristics with repository records for public datasets. Good practice for data citation (citing of datasets in the reference list/bibliography similar to citing papers) is for links to datasets to resolve to landing pages rather than the raw data files. Standards are emerging for the information that is essential and desirable to be included in landing pages [25]. These standards consist of a dataset description comprising a dataset identifier, title and brief description, creator, publisher, and publication year. Landing pages should also include persistence/permanence information and licensing information for the data. We recommend that landing pages for non-public clinical datasets also include information detailing the access controls pertinent to the data. Examples of landing pages can be found at the UK Data Archive (e.g. http://dx.doi.org/10.5255/UKDA-SN-7612-2) and the European Genome-phenome Archive (e.g. https://www.ebi.ac.uk/ega/datasets/EGAD00000000001) and in Table 2. For clinical trials, where prospective trial registration is established as ethical research practice, we recommend unique trial identifiers be included on landing pages.

Landing pages for non-public datasets in trusted repositories (Table 2) can facilitate robust citation and linking of papers reporting reanalysis of data obtained on request. Papers reporting reanalysis of clinical data accessed via data request portals, including from YODA [26] and Project Data Sphere [27] have begun to be published, but can lack robust—citable—links to data sources. A reanalysis of shared clinical data should ideally cite, in the reference list, the paper reporting the primary analysis of the data, the landing page(s) of the dataset(s) themselves (where the data have persistent links, such as DOIs) and, if applicable any data papers describing the datasets. This helps provide credit and acknowledgement to all those involved in data acquisition, sharing and analysis. Data repositories are also increasingly able to add links to citing articles as they are published.

Additional repository assessment criteria

Taking the above issues into consideration, we propose additional criteria by which journals and publishers can assess repositories for hosting of non-public clinical datasets:

Trusted repositories for non-public datasets must:

Provide stable identifiers for metadata records about non-public dataset(s)
Allow access to data with the minimum of restrictions needed to ensure protection of privacy and appropriateness of secondary analyses, codified in Data Use Agreements (DUAs)
Allow access to data in a timely manner
Provide support for users of data

Trusted repositories for non-public datasets should, ideally:

Be independent of the study sponsors and principal investigators
Have a transparent and persistent system for requesting access to data and reviewing requests to access data
Provide relevant data analysis software environments where data are not permitted to be downloaded locally
Provide public access to the metadata of archived data for third party search and discovery functionalities

These criteria are in addition to the current repository selection criteria of Scientific Data which require trusted repositories:

Be broadly supported and recognised within their scientific community
Ensure long-term persistence and preservation of datasets in their published form
Implement relevant, community-endorsed reporting requirements
Provide stable identifiers for submitted datasets
Allow public access to data without unnecessary restrictions

Since this working group was formed, at least one other group (Burton et al.) has proposed criteria for “data safe havens” in healthcare [28].

Repositories that could or may meet the current and additional requirements

Specialist trial data systems such as CSDR (and YODA) have indicated a willingness to listen to feedback from researchers and may evolve as data sharing progresses. As well as the potential for these repositories to develop to meet certain criteria, there are number of other general and specialist repositories that could or may meet the above criteria (Table 2), at least for specific types of clinical research.

There are also clinical data systems and “federations” associated with specific projects and some have played roles in disseminating data in specific communities. The ADNI portal is one such example (http://neuroinformatics.harvard.edu/gsp/loni). The group responsible for this portal have published a peer-reviewed article [29] describing and linking to non-public data in this resource.

Another Data Descriptor published at Scientific Data describes a non-public clinical dataset—of neuroimaging data from brain tumour patients—hosted at the UK Data Archive [30]. The Synapse repository has, also, enabled secure access to patient self-reported clinical observation data collected through smartphone apps—that have subsequently been described in and linked to peer-reviewed publications [31]. Scientific Data has developed bespoke workflows with Harvard Dataverse, UK Data Archive and Synpase to facilitate anonymous peer review of confidential datasets. Collaboration with the European Genome-phenome Archive (EGA) has also facilitated journal coordinated peer review of non-public human genome data [32].

Additional manuscript sections required

The format of published articles—data papers and traditional research articles—needs to be developed to accommodate links to non-public datasets.

Articles should include information about why the data are not publicly available (i.e. because they contain personally identifiable information) and describe the restrictions on accessing the dataset. They should also state if the data are subject to a DUA and where the DUA can be found—ideally, this should be hosted permanently with the landing page of the non-public dataset. In Scientific Data, the Usage Notes section would be appropriate for this information. Persistent links to landing pages should also be cited, in the same way public datasets are cited. Other journals, such as PLOS ONE, Genome Biology, Palgrave Communications, GigaScience and Royal Society Open Science are now routinely including dedicated article sections to describe and link to datasets supporting published articles—these could be adapted to meet these requirements.

Where Data Descriptors, and other articles, link to non-public clinical datasets, we recommend authors include in their articles a transparency declaration guaranteeing that their description of the dataset is an honest and accurate account. Transparency statements for regular journal articles, for other aspects of research integrity, have been implemented by the BMJ [33].

See Fig. 1 for an overview of the standard editorial workflow of Scientific Data and Fig. 2 for the proposed modified workflow to accommodate Data Descriptors of non-public clinical datasets.

Research participant consent

Part of a journal’s role is to enforce relevant ethical expectations regarding consent. An important consideration is whether participants gave appropriate consent for data to be made available to secondary researchers in the future (if data are not fully anonymised [3]). Where informed consent for data sharing was obtained, consent should include scholarly publications (peer-reviewed articles) that describe or link to datasets.

What data should be available to secondary researchers?

Different types of experiments produce different types of information in a variety of formats, leading to different minimum requirements for secondary researchers seeking to replicate or understand results. In general, reproducible medical research requires access to data, code, and study protocols [34]. CSDR has, furthermore, defined data and document types for the studies it lists, although all items are not always available for each study:

i.
Raw dataset
ii.
Blank case report forms
iii.
Annotations of blank case report forms
iv.
Dataset specifications
v.
Protocol (all versions)
vi.
Analysis-ready dataset
vii.
Reporting and analysis plan
viii.
Clinical study report

While the scope of these guidelines potentially goes beyond clinical trials—including molecular data types—this list defined by CSDR is a reasonable guide. The IOM has also described the clinical research data types that are needed for reanalysis, which differs slightly from the CSDR list (Chapter 5, p. 112) [21]. Standards for reusability for historic, non-public clinical datasets might need to be less stringent if data are only available in file formats that might not be optimised for reuse.

Who should have secondary access to data?

The consensus of regulators, industry sponsors and funders of clinical trials is for data access to be granted to suitably qualified researchers with a legitimate reanalysis proposal. A requirement of CSDR’s procedures is that a researcher with a degree in statistics or a related discipline should be part of the research team. The Independent Review Panel for CSDR had, as of February 2015, approved 71 requests and rejected or advised to resubmit 4 requests. The YODA Project’s approval committee assesses “basic information about the Principal Investigator, Key Personnel, and the project Research Proposal, including a scientific abstract and research methods” when reviewing data access requests [35].

Journal polices generally require data supporting submitted works must be accessible to peer reviewers and editors, and study sponsors should already be used to providing access to data supporting manuscripts submitted to major medical journals. These repository criteria and guidelines could make these processes more efficient if applied to clinical research journals. To publish Data Descriptors in Scientific Data, peer reviewers and editors must be given controlled access to supporting data for every article. The majority of journals operate single or double blind peer review, which means some reviewers or journals might require their anonymity to be maintained (public data repositories often support anonymous peer review, although there is increasing adoption of open peer review).

Peer review of non-public clinical datasets

Journals that implement these guidelines will be able to make non-public clinical datasets more visible to peer reviewers, potentially, as well as editors. Scientific Data is reviewing its peer review guidelines (http://www.nature.com/sdata/for-referees#writing-review) as it considers the first articles describing non-public datasets, but many journals have their own guidelines and processes for their peer reviewers. In general, however, peer review of articles describing and linking to non-public clinical datasets should include but not be limited to:

Whether the access controls on the data are warranted and if enough information is provided on how to request access to the data
Whether the data are sufficiently well described to enable independent researchers to assess the reuse value of the dataset (to help ensure data requested are reused)

When to provide secondary access to data

The IOM has recommended embargoes of up to 18 months from study completion before clinical trialists are required to share data, although this has been criticised for being too long [36]. The International Committee of Medical Journal Editors (ICMJE) in 2016 proposed a draft policy for its member journals requiring sharing of anonymised IPD within 6 months of publication [37]. In a major epidemic, a long embargo on data access and reuse could be to the detriment of fighting disease. But a reasonable period for analysis—a right of first use—is acknowledged in most research communities. Any embargo on non-public clinical dataset(s) described in and linked to a journal article would have to have expired to comply with the recommendations in this article. In general we advocate no, or short, embargoes on data release wherever feasible.

Conclusion

In consultation with a working group (see Acknowledgements), Scientific Data is developing its editorial and peer-review processes and relationships with repositories to support publication of Data Descriptors for non-public clinical datasets as we receive relevant submissions. Other journals—data journals and traditional journals—may wish to consider these repository, linking and editorial policy proposals. Some members of our working group are also helping to identify interested research teams and relevant datasets that could be part of a publication pilot. Indeed, we need real data with which to develop more robust links between non-public datasets and journal articles. We strongly encourage others to contact the editors (scientificdata@nature.com), to discuss proposals.

References

Royal Society, 2012. Science as an open enterprise. at https://royalsociety.org/~/media/policy/projects/sape/2012-06-20-saoe.pdf. Accessed 3 May 2016.
Data-access practices strengthened. Nature. 2014;515:312–312. doi:10.1038/515312a.
Hrynaszkiewicz I, Norton ML, Vickers AJ, Altman DG. Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. BMJ. 2010;340:c181.
Article Google Scholar
El Emam K, Rodgers S, Malin B. Anonymising and sharing individual patient data. BMJ. 2015;350:h1139.
Article Google Scholar
Sandercock PAG, Niewada M, Członkowska A. The International Stroke Trial database. Trials. 2011;12:101.
Article Google Scholar
Jones CW et al. Non-publication of large randomized clinical trials: cross sectional analysis. BMJ. 2013;347:f6104.
Article Google Scholar
McGauran N et al. Reporting bias in medical research—a narrative review. Trials. 2010;11:37.
Article Google Scholar
van de Wetering FT, Scholten RJPM, Haring T, Clarke M, Hooft L. Trial registration numbers are underreported in biomedical publications. PLoS One. 2012;7:e49599.
Article Google Scholar
Kratz JE, Strasser C. Researcher perspectives on publication and peer review of data. PLoS One. 2015;10:e0117619.
Article Google Scholar
Hrynaszkiewicz I, Li P, Edmunds SC. Open science and the role of publishers in reproducible research. In: Stodden V, Leisch F, Peng, RD, editors. Implementing Reproducible Research. CRC Press; 2014.
Savage CJ, Vickers AJ. Empirical study of data sharing by authors publishing in PLoS journals. PLoS One. 2009;4:e7078.
Article Google Scholar
Vines TH, et al. Mandated data archiving greatly improves access to research data. FASEB J. 2013;fj.12–218164. doi:10.1096/fj.12-218164.
Jaspers GJ, Degraeuwe PLJ. A failed attempt to conduct an individual patient data meta-analysis. Syst Rev. 2014;3:97.
Article Google Scholar
Wicherts JM, Borsboom D, Kats J, Molenaar D. The poor availability of psychological research data for reanalysis. Am Psychol. 2006;61:726–8.
Article Google Scholar
Koenig F et al. Sharing clinical trial data on patient level: opportunities and challenges. Biom J. 2015;57:8–26.
Article Google Scholar
Rathi V et al. Clinical trial data sharing among trialists: a cross-sectional survey. BMJ. 2012;345:e7570.
Article Google Scholar
Data sharing, year 1—access to data from industry-sponsored clinical trials. NEJM. at http://www.nejm.org/doi/full/10.1056/NEJMp1411794. doi:10.1056/NEJMp1411794.
Kratz J, Strasser C. Data publication consensus and controversies. F1000Res. 2014;3:94.
Google Scholar
Klein M et al. Scholarly context not found: one in five articles suffers from reference rot. PLoS One. 2014;9:e115253.
Article Google Scholar
Sydes MR et al. Sharing data from clinical trials: the rationale for a controlled access approach. Trials. 2015;16:104.
Article Google Scholar
Sharing clinical trial data: maximizing benefits, minimizing risk. Institute of Medicine. at http://www.iom.edu/Reports/2015/Sharing-Clinical-Trial-Data.aspx. Accessed 3 May 2016.
Tudur Smith C et al. How should individual participant data (IPD) from publicly funded clinical trials be shared? BMC Med. 2015;13:298.
Article Google Scholar
Geifman N, Bollyky J, Bhattacharya S, Butte AJ. Opening clinical trial data: are the voluntary data-sharing portals enough? BMC Med. 2015;13:280.
Article Google Scholar
Hopkins C, et al. UK publicly funded Clinical Trials Units supported a controlled access approach to share individual participant data but highlighted concerns. J. Clin. Epidemiol. 2015. doi:10.1016/j.jclinepi.2015.07.002.
Starr J et al. Achieving human and machine accessibility of cited data in scholarly publications. PeerJ Comput Sci. 2015;1:e1.
Article Google Scholar
Noshchenko A, Lindley EM, Burger EL, Cain CMJ, Patel VV. What is the clinical relevance of radiographic nonunion after single-level lumbar interbody arthrodesis in degenerative disc disease?: a meta-analysis of the YODA project database. Spine (Phila Pa 1976). 2016;41:9–17.
Article Google Scholar
Geifman N, Butte AJ. A patient-level data meta-analysis of standard-of-care treatments from eight prostate cancer clinical trials. Sci Data. 2016;3:160027 doi:10.1038/sdata.2016.27.
Burton PR et al. Data Safe Havens in health research and healthcare. Bioinformatics. 2015;31:3241–8.
Article Google Scholar
Holmes AJ et al. Brain Genomics Superstruct Project initial data release with structural, functional, and behavioral measures. Sci Data. 2015;2:150031.
Article Google Scholar
Pernet CR, Gorgolewski KJ, Job D, Rodriguez D, Whittle I, Wardlaw J. A structural and functional magnetic resonance imaging dataset of brain tumour patients. Sci Data. 2016;3:160003. doi:10.1038/sdata.2016.3.
Bot BM et al. The mPower study, Parkinson disease mobile data collected using ResearchKit. Sci Data. 2016;3:160011.
Article Google Scholar
Tang D et al. Reference genotype and exome data from an Australian Aboriginal population for health-based research. Sci Data. 2016;3:160023.
Article Google Scholar
Altman DG, Moher D. Declaration of transparency for each research article. BMJ. 2013;347:f4796.
Article Google Scholar
Laine C, Goodman SN, Griswold ME, Sox HC. Reproducible research: moving toward research the public can really trust. Ann Intern Med. 2007;146:450–3.
Article Google Scholar
The YODA Project. Policies & procedures to guide external investigator access to clinical trial data. at http://yoda.yale.edu/policies-procedures-guide-external-investigator-access-clinical-trial-data. Accessed 3 May 2016.
Krumholz HM. Why data sharing should be the expected norm. BMJ. 2015;350:h599.
Article Google Scholar
Taichman DB et al. Sharing clinical trial data. BMJ. 2016;532:i255.
Article Google Scholar
Hrynaszkiewicz I, Khodiyar V, Hufton A, Sansone S-A. Publishing descriptions of non-public clinical datasets: guidance for researchers, repositories, editors and funding organisations. bioRxiv. Cold Spring Harbor Labs J. 2015. doi:10.1101/021667.
Let’s be pragmatic about clinical data. Sci Data. 2015;2:150034. doi:10.1038/sdata.2015.34
Ashish N, Bhatt P, Toga AW. Global data sharing in Alzheimer disease research. Alzheimer Dis Assoc Disord. 2015. Epud ahead of print.

Download references

Acknowledgements

Thanks to the Working Group members, who attended a meeting in December in 2014 which identified many of the issues highlighted in this paper and who provided comments on earlier drafts of the paper:

Robert Frost, Policy Director, Medical Policy, GSK

Richard Lehman, on behalf of the Yale Open Data Access Project

Fiona Reddington, Head of Clinical and Population Research Funding, Cancer Research UK

Will Greenacre, Policy Officer, Wellcome Trust

Matthew Sydes, Senior Scientist and Statistician, MRC Clinical Trials Unit

John Gonzalez, Publications Director, AstraZeneca

Jay Bergeron, Director, Translational and Bioinformatics, Pfizer

Jessica Ritchie/Joseph Ross, Yale Open Data Access Project

Mark Hahnel, CEO and Founder, figshare

Thanks also to those who submitted comments on the first public version of these guidelines, which was announced in July 2015 [38, 39].

Authors’ contributions

IH wrote the first draft of the original and revised manuscript. ALH drafted Table 1, and VK drafted Table 2. ALH created the figures. ALH, VK and SS were involved in the critical revision and development of the manuscript. All authors read and approved the final manuscript.

Competing interests

IH, VK and ALH are employees of Springer Nature, which publishes Scientific Data. SS is Honorary Academic Editor of Scientific Data.

Author information

Authors and Affiliations

Springer Nature, The Campus, Trematon Walk, Wharfdale Road, London, N1 9FN, UK
Iain Hrynaszkiewicz
Scientific Data, The Campus, Trematon Walk, Wharfdale Road, London, N1 9FN, UK
Varsha Khodiyar, Andrew L. Hufton & Susanna-Assunta Sansone
Oxford e-Research Centre, University of Oxford, Oxford, OX1 3QG, UK
Susanna-Assunta Sansone

Authors

Iain Hrynaszkiewicz
View author publications
You can also search for this author in PubMed Google Scholar
Varsha Khodiyar
View author publications
You can also search for this author in PubMed Google Scholar
Andrew L. Hufton
View author publications
You can also search for this author in PubMed Google Scholar
Susanna-Assunta Sansone
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Iain Hrynaszkiewicz.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Hrynaszkiewicz, I., Khodiyar, V., Hufton, A.L. et al. Publishing descriptions of non-public clinical datasets: proposed guidance for researchers, repositories, editors and funding organisations. Res Integr Peer Rev 1, 6 (2016). https://doi.org/10.1186/s41073-016-0015-6

Download citation

Received: 17 February 2016
Accepted: 22 April 2016
Published: 22 June 2016
DOI: https://doi.org/10.1186/s41073-016-0015-6

Publishing descriptions of non-public clinical datasets: proposed guidance for researchers, repositories, editors and funding organisations

Abstract

Background

Summary of recommendations

Clinical researchers and their sponsors

Repositories

Clinical journal editors and publishers

Data journal editors and publishers

All sponsors and funders of trials

How do researchers currently access non-public clinical data?

Connecting non-public clinical data with journals and repositories

Discoverability

Quality and peer review

Data curation

Permanence

Links with data repositories

Negative, incomplete or inconclusive trial data

Alternative approaches to data sharing

Additional considerations for journals and data repositories

Data use agreements

Controlled access and governance

Landing pages

Additional repository assessment criteria

Repositories that could or may meet the current and additional requirements

Additional manuscript sections required

Research participant consent

What data should be available to secondary researchers?

Who should have secondary access to data?

Peer review of non-public clinical datasets

When to provide secondary access to data

Conclusion

References

Acknowledgements

Authors’ contributions

Competing interests

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Research Integrity and Peer Review

Contact us