Draft NIH Genomic Data Sharing Policy Request for Public Comments, 57860-57865 [2013-22941]
Federal Register / Vol. 78, No. 183 / Friday, September 20, 2013 / Notices
parameters that should be routinely
assessed in toxicology studies for INDs,
NDAs, and BLAs that are designed to
determine the potential for a drug to
disrupt the endocrine system. This draft
guidance also discusses factors that
should be considered in determining the
need for additional studies to
characterize potential endocrine
disruptor properties of a drug.
This draft guidance is being issued
consistent with FDA’s good guidance
practices regulation (21 CFR 10.115).
The draft guidance, when finalized, will
represent the Agency’s current thinking
on nonclinical evaluation of endocrine
disruption potential of drugs. It does not
create or confer any rights for or on any
person and does not operate to bind
FDA or the public. An alternative
approach may be used if such approach
satisfies the requirements of the
applicable statutes and regulations.
II. Paperwork Reduction Act of 1995
This draft guidance refers to
previously approved collections of
information that are subject to review by
the Office of Management and Budget
(OMB) under the Paperwork Reduction
Act of 1995 (44 U.S.C. 3501–3520). The
collections of information in 21 CFR
parts 312 and 314 have been approved
under OMB control numbers 0910–0014
and 0910–0001, respectively.
III. Comments
Interested persons may submit either
electronic comments regarding this
document to https://www.regulations.gov
or written comments to the Division of
Dockets Management (see ADDRESSES). It
is only necessary to send one set of
comments. Identify comments with the
docket number found in brackets in the
heading of this document. Received
comments may be seen in the Division
of Dockets Management between 9 a.m.
and 4 p.m., Monday through Friday, and
will be posted to the docket at
https://www.regulations.gov.
IV. Electronic Access
Persons with access to the Internet
may obtain the document at either
https://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/default.htm
or https://www.regulations.gov.
Dated: September 16, 2013.
Leslie Kux,
Assistant Commissioner for Policy.
[FR Doc. 2013–22864 Filed 9–19–13; 8:45 am]
BILLING CODE 4160–01–P
DEPARTMENT OF HEALTH AND
HUMAN SERVICES
National Institutes of Health
Draft NIH Genomic Data Sharing Policy
Request for Public Comments
SUMMARY: The National Institutes of
Health (NIH) is seeking public
comments on the draft Genomic Data
Sharing (GDS) Policy that promotes
sharing, for research purposes, of large-scale
human and nonhuman genomic 1
data generated from NIH-supported and
NIH-conducted research.
DATES: To ensure that your comments
will be considered, please submit your
response to this Request for Comments
no later than 60 days after publication
of this notice.
ADDRESSES: Submit comments by any of
the following methods:
• Online: https://gds.nih.gov/survey.aspx.
• Fax: 301–496–9839.
• Mail/Hand delivery/Courier (for
paper, disk, or CD–ROM submissions)
to: Genomic Data Sharing Policy Team,
Office of Science Policy, National
Institutes of Health, 6705 Rockledge
Drive, Suite 750, Bethesda, MD 20892.
FOR FURTHER INFORMATION CONTACT:
Genomic Data Sharing Policy Team,
Office of Science Policy, National
Institutes of Health, 6705 Rockledge
Drive, Suite 750, Bethesda, MD 20892,
301–496–9838, GDS@mail.nih.gov.
SUPPLEMENTARY INFORMATION:
Background
The NIH’s mission is to seek
fundamental knowledge about the
nature and behavior of living systems
and the application of that knowledge to
enhance health, lengthen life, and
reduce illness and disability. The draft
GDS Policy supports this mission by
promoting the sharing of genomic
research data, which maximizes the
knowledge gained. Not only does data
sharing allow data generated from one
research study to be used to explore a
wide range of additional research
questions, it also enables data from
multiple projects to be combined,
amplifying the scientific value of data
many times. Broad research use of the
data enhances public benefit by helping
to speed discoveries that increase the
understanding of biological processes
that affect human health and the
development of better ways to diagnose,
treat, and prevent disease.
The NIH has promoted data sharing
for many years, and in 2003, the NIH
issued a general policy for sharing
research data.2 3 In 2007, the NIH issued
a more specific policy to promote
sharing of data generated through
genome-wide association studies
(GWAS),4 5 which examine thousands of
single nucleotide polymorphisms
(SNPs) across the genome to identify
genetic variants that contribute to
human diseases, conditions, and traits.
To facilitate the sharing of genomic and
phenotypic data from GWAS, the NIH
created the database of Genotypes and
Phenotypes (dbGaP) with a two-tiered
system for distributing the data: open
access, for data that are available to the
public without restrictions, and
controlled access, for data that are made
available only for research purposes that
are consistent with the original
informed consent under which the data
were collected.
Not long after the GWAS policy was
issued, advances in DNA sequencing
and other high-throughput technologies,
and a steep drop in DNA sequencing
costs, enabled the NIH to fund research
that generated even greater volumes of
GWAS and other types of genomic data.
In 2009, the NIH announced 6 its
intention to extend the GWAS Policy to
encompass data from a wider range of
genomic research.
The draft GDS Policy applies to
research involving nonhuman genomic
data as well as human data that are
generated through array-based and high-throughput genomic technologies (e.g.,
SNP, whole-genome, transcriptomic,
epigenomic, and gene expression data).
(See section II of the draft Policy.) The
NIH considers access to such data
particularly important because of the
opportunities to accelerate research
through the power of combining such
large and information-rich datasets. The
draft GDS Policy is aligned with
Administration priorities and a recent
directive to agencies to increase access
to digital scientific data resulting from
federally funded research.7
Overview of the Policy
The draft GDS Policy describes the
responsibilities of investigators and
institutions for the submission of
nonhuman and human genomic data to
the NIH (section IV) and the use of
controlled-access data (section V). The
Policy also provides expectations
regarding intellectual property (section
VI).
When data sharing involves human
data, the protection of research
participant privacy and confidentiality
is paramount, and the Policy reflects the
NIH’s continued commitment to
responsible data stewardship, which is
essential to uphold the public trust in
biomedical research. The draft GDS
Policy, like the GWAS Policy, includes
a number of provisions to protect
research participant privacy (see section
IV.C). For example, prior to data
submission, traditional identifiers such
as name, date of birth, street address,
and social security number should be
removed. The de-identified 8 data are
coded using a random, unique code to
protect participant privacy. The NIH
also maintains the expectation
established under the GWAS Policy that
the responsible Institutional Signing
Official 9 of the submitting institution
should provide an Institutional
Certification to the funding NIH
Institute or Center prior to award. An
Institutional Certification assures that
the data have been or will be collected
in a legal and ethically appropriate
manner and have been de-identified.
The draft GDS Policy clarifies the
provisions of the Institutional
Certification for datasets submitted to
NIH-designated data repositories in
Section IV.C.5.
The NIH expects the Policy to be
effective 60 days after the publication of
the final Policy.
Request for Comments
As part of the process of developing
the GDS Policy, the NIH encourages the
public to provide comments on any
aspect of the draft GDS Policy.
Comments should be submitted
electronically to https://gds.nih.gov/survey.aspx.
Comments may also be
submitted by fax (301–496–9839), or
mailed to the Genomic Data Sharing
Policy Team, Office of Science Policy,
National Institutes of Health, 6705
Rockledge Drive, Suite 750, Bethesda,
MD 20892.
Responding to this request for
comments is voluntary. Submitted
comments are considered public
information; do not include any
information that you wish to remain
private and confidential. Comments in
their entirety will be posted along with
the submitter’s name and affiliation on
the NIH GDS Web site after the public
comment period closes. Commenters
will receive a confirmation
acknowledging receipt of comments but
will not receive individual feedback on
any suggestions. Please note that the
government will not pay for the use of
any information contained in the
response.
The NIH intends to hold one or more
public webinars on the draft Policy.
Information about the webinars will be
made available at https://gds.nih.gov.
Draft NIH Genomic Data Sharing Policy
I. Purpose
The draft Genomic Data Sharing
(GDS) Policy sets forth expectations that
ensure the broad and responsible
sharing of genomic research data.
Sharing research data supports the NIH
mission 10 and is essential to facilitate
the translation of research results into
knowledge, products, and procedures
that improve human health. The NIH
has longstanding policies to make data
publicly available in a timely manner
from the research activities that it
funds.11 12
II. Scope and Applicability
This Policy applies to all NIH-funded
research that involves large-scale human
and nonhuman genomic data produced
by array-based or high-throughput
genomic technologies, such as GWAS,13
SNP, whole-genome, transcriptomic,
epigenomic, and gene expression data,
irrespective of funding level and
funding mechanism (i.e., grant, contract,
or intramural support). Appendix A
provides examples of research that are
subject to the Policy. At appropriate
intervals, the NIH will review the types
of research to which this Policy may be
applicable, and changes to the scope
will be defined in supplementary
materials to the final GDS Policy.
Notification of any changes will be
provided to investigators and
institutions through standard NIH
communication channels (e.g., NIH
Guide for Grants and Contracts).
Compliance with this Policy will
become a special term and condition in
the Notice of Award or the Contract
Award. Failure to comply with the
terms and conditions of the funding
agreement could lead to enforcement
actions, including the withholding of
funding, consistent with 45 CFR 74.62
and/or other authorities, as appropriate.
III. Effective Date
The effective date of this Policy is [To
Be Determined], and pertains to the
following funding mechanisms:
• Competing grant applications 14 that
are submitted to the NIH as of the [TBD]
receipt date;
• Proposals for contracts that are
submitted to the NIH as of [TBD]; and
• NIH intramural research projects
that are approved as of [TBD].
IV. Responsibilities of Investigators
Submitting Genomic Data
A. Data Sharing Plans
Investigators seeking NIH funding
should contact appropriate Institute or
Center (IC) Program or Project
Officials 15 as early as possible to
discuss data sharing expectations and
timelines that would apply to their
proposed studies. Investigators and their
institutions are expected to address
plans for following this Policy in the
data sharing section of funding
applications and proposals. Any
resources needed to support a proposed
data sharing plan should be included in
the project’s budget. NIH intramural
investigators are expected to address
data sharing plans with their IC
scientific leadership prior to initiating
applicable research and are encouraged
to contact their IC leadership or the
Office of Intramural Research for
guidance.
B. Nonhuman and Model Organism
Genomic Data
1. Data Submission Expectations and
Timeline
Nonhuman data (including microbial
and microbiome data) and data from
large-scale genomic projects for model
organisms 16 are to be shared in a timely
manner. Investigators should make
nonhuman and model organism data
publicly available no later than the date
of initial publication. However, certain
data types or NIH research initiatives
may expect an earlier data release (e.g.,
microbial or microbiome data, or
projects with broad utility as a resource
for the scientific community). (See
Appendix A for specific expectations for
data submission and release.)
2. Data Repositories
Data should be made available
through any widely used data
repository, whether NIH-funded or not,
such as the Gene Expression Omnibus
(GEO),17 Sequence Read Archive
(SRA),18 Trace Archive,19 Array
Express,20 Mouse Genome Informatics
(MGI),21 WormBase,22 the Zebrafish
Model Organism Database (ZFIN),23
GenBank,24 European Nucleotide
Archive (ENA),25 or DNA Data Bank of
Japan (DDBJ).26
C. Human Genomic Data
1. Data Submission Expectations and
Timeline
Guidance to govern human genomic
data submission timelines and data
release expectations is provided in
Appendix A. The NIH will release data
submitted to NIH-designated data
repositories without restrictions on
publication or other dissemination no
later than six months after the initial
data submission to an NIH-designated
data repository,27 or at the time of
acceptance of the first publication,
whichever occurs first.
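The "whichever occurs first" rule above can be sketched as a small date calculation. This is only an illustration: the function names are hypothetical, and the treatment of "six months" as six calendar months (with the day clamped to the end of shorter months) is an assumption, since the Policy text does not define how the period is counted.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day if needed."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_release_date(submitted: date,
                        first_publication_accepted: Optional[date] = None) -> date:
    """Latest release date: six months after submission, or the acceptance
    date of the first publication, whichever occurs first (assumed reading)."""
    deadline = add_months(submitted, 6)
    if first_publication_accepted is None:
        return deadline
    return min(deadline, first_publication_accepted)
```

For example, data submitted on August 31, 2014 with no publication yet would have a latest release date of February 28, 2015 under these assumptions.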
Human data that are submitted to
NIH-designated data repositories should
be de-identified according to the
standards set forth in the HHS
Regulations for the Protection of Human
Subjects 28 and the Health Insurance
Portability and Accountability Act
(HIPAA) Privacy Rule.29 The de-identified
data should be assigned a
random, unique code, and the key should be held
by the submitting institution.
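The coding step described above (remove traditional identifiers, assign a random unique code, keep the key at the submitting institution) might look roughly like the sketch below. The field names, the `participant_code` label, and the code length are hypothetical, and removing these four fields alone is not claimed to satisfy the HHS or HIPAA de-identification standards.

```python
import secrets

# Hypothetical field names for the traditional identifiers named in this notice.
IDENTIFIER_FIELDS = {"name", "date_of_birth", "street_address", "social_security_number"}


def code_records(records):
    """Return (coded_records, key).

    coded_records: copies of the input with identifier fields removed and a
    random, unique participant code added (suitable for submission).
    key: mapping from code to the removed identifiers, to be retained by the
    submitting institution only.
    """
    coded, key = [], {}
    for rec in records:
        code = secrets.token_hex(8)          # random, not derived from the data
        while code in key:                   # regenerate on the rare collision
            code = secrets.token_hex(8)
        key[code] = {f: rec[f] for f in IDENTIFIER_FIELDS if f in rec}
        stripped = {f: v for f, v in rec.items() if f not in IDENTIFIER_FIELDS}
        stripped["participant_code"] = code
        coded.append(stripped)
    return coded, key
```

Because the code is generated randomly rather than computed from the identifiers, it cannot be reversed without the key.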
The NIH encourages researchers and
institutions submitting large-scale
genomic datasets to NIH-designated data
repositories to consider whether a
Certificate of Confidentiality could serve
as an additional safeguard to prevent
compelled disclosure of any personally
identifiable information that it may
hold.30 The NIH has obtained a
Certificate of Confidentiality for
dbGaP.31
2. Data Repositories
Applicable studies with human
genomic data should be registered in the
database of Genotypes and Phenotypes
(dbGaP) 32 no later than the time that
data cleaning and quality control
measures begin. Investigators should
submit human data to the relevant NIH-designated data repository (e.g., dbGaP,
GEO, SRA, the Cancer Genomics
Hub 33). NIH-designated data
repositories need not be the exclusive
source for facilitating the sharing of
genomic data. Investigators who elect to
submit data to a non-NIH-designated
data repository should confirm that
appropriate data security,
confidentiality, and privacy measures
are in place.
3. Tiered System for the Distribution of
Human Data
Respect for and protection of the
interests of research participants is
fundamental to the NIH’s stewardship of
human genomic data. The informed
consent under which the data or samples
were collected is the basis for the
submitting institution to determine the
appropriateness of data submission to
NIH-designated data repositories, and
whether the data should be available
through open or controlled access.
Controlled-access data in NIH-designated data repositories are made
available for secondary research only
after investigators have obtained
approval from the NIH to use the
requested data for a particular project.
Open-access data are publicly available
without restriction (e.g., The 1000
Genomes Project 34).
4. Informed Consent
Submitting institutions, through their
Institutional Review Boards (IRBs), are
to review the informed consent
materials for studies that are to be
submitted to NIH-designated data
repositories to determine whether the
data are appropriate for sharing for
secondary research use. Specific
considerations may vary with the type
of study and whether the data are
obtained through prospective or
retrospective data collections. The NIH
provides additional information on
issues related to the respect for research
participant interests in its Points To
Consider for IRBs and Institutions in
Their Review of Data Submission Plans
for Institutional Certifications.35 This
and other policy-related documents will
be updated once the Policy is final.
For studies initiated after the effective
date of this Policy, the NIH expects the
informed consent process and
documents to state that a participant’s
genomic and phenotypic data may be
shared broadly for future research
purposes and also explain whether the
data will be shared through open or
controlled access. If human genomic
data are to be shared in open-access
repositories, the NIH expects that
participants will have provided explicit
consent for sharing their data through
open-access mechanisms. For studies
proposing to use cell lines or clinical
specimens,36 the NIH expects that
informed consent for future research use
and broad data sharing will have been
obtained even if the cell lines or clinical
specimens are de-identified. If there are
compelling scientific reasons that
necessitate the use of cell lines or
clinical specimens that were created or
collected after the effective date of this
Policy and that lack consent for research
use and data sharing, investigators
should provide a justification for the use
of any such materials in the funding
request.
For studies using data or specimens
collected before the effective date of this
Policy, there may be considerable
variation in the extent to which data
sharing and future genomic research
were addressed within the informed
consent materials for the primary
research. In these cases, an assessment
by an IRB, Privacy Board, or equivalent
group is essential to ensure that data
submission is not inconsistent with the
informed consent provided by the
research participant.
The NIH will accept data derived
from cell lines or clinical specimens
lacking consent for research use that
were created or collected before the
effective date of this Policy.
Grandfathered genomic data that are
currently available through open access
may be submitted to an open-access
NIH-designated data repository;
otherwise, the data should be submitted
to a controlled-access NIH-designated
data repository.
While the NIH encourages broad
access to genomic data, in some
circumstances broad sharing may be
inconsistent with the informed consent
of the research participants whose data
are included in the dataset. In such
circumstances, institutions planning to
submit aggregate- or individual-level
data to the NIH for controlled access
should note any data use limitations in
the data sharing or data management
plan submitted as part of the funding
request. These data use limitations
should be specified in the Institutional
Certification submitted to the NIH prior
to award.
5. Institutional Certification
The responsible Institutional Signing
Official of the submitting institution
should provide an Institutional
Certification to the funding IC prior to
award. The Institutional Certification
should indicate whether the data will be
submitted to an open- or controlled-access database and assure that:
• The data submission is consistent
with applicable laws, regulations, and
institutional policies; 37
• The appropriate research uses of the
data and any uses that are specifically
excluded in the informed consent
documents are delineated; 38
• The identities of research
participants will not be disclosed to
NIH-designated data repositories; and
• An IRB, Privacy Board, and/or
equivalent body 39 has reviewed the
investigator’s proposal for data
submission and assures that:
◦ The protocol for the collection of
genomic and phenotypic data was
consistent with 45 CFR part 46;
◦ Data submission and subsequent
data sharing for research purposes are
consistent with the informed consent of
study participants from whom the data
were obtained; 40
◦ Risks to individuals and their
families associated with data submitted
to NIH-designated data repositories
were considered;
◦ To the extent relevant and possible,
risks to groups or populations
associated with data submitted to NIH-designated
data repositories were
considered; and
◦ The investigator’s plan for de-identifying
datasets is consistent with
the standards outlined in this Policy
(see section IV.C.1.).
Institutions should indicate in the
certification whether aggregate genomic
data from datasets with data use
limitations may be appropriate for
general research use (i.e., use for any
research question such as research to
understand the biological mechanisms
underlying disease, development of
statistical research methods, the study
of population origins). If so, the
aggregate genomic data will be made
available through the controlled-access
compilation of aggregate genomic data 41
to facilitate secondary research.
6. Data Withdrawal
Submitting investigators and their
institutions may request removal of data
on individual participants from NIH-designated data repositories in the event
that a research participant withdraws
his or her consent. However, data that
have been distributed for approved
research use cannot be retrieved.
7. Exceptions to Data Submission
Expectations
The NIH acknowledges that in some
cases, circumstances beyond the control
of investigators may preclude
submission of data to NIH-designated
data repositories (e.g., country or state
laws that prohibit data submission to a
U.S. federal database). In such cases,
investigators should provide a
justification for any exceptions
requested in the application or proposal.
The funding IC may grant an exception
to the submission of relevant data to the
NIH, and the investigator would be
expected to develop a plan to share data
through other mechanisms. For
transparency purposes, when
exceptions are granted, studies will still
be registered in dbGaP and the reason
for the exception will be included in the
registration record. Information about
current expectations for exception
requests will be made available on the
GDS Web site.
V. Responsibilities of Investigators
Accessing and Using Genomic Data
A. Requests for Controlled-Access Data
Access to human data is through a
two-tiered model involving open- and
controlled-data access mechanisms.
Requests for controlled-access data 42
are reviewed by NIH Data Access
Committees (DACs).43 DAC decisions
are based primarily upon conformance
of the proposed research as described in
the access request to the data use
limitations established by the
submitting institution through the
Institutional Certification. The NIH
DACs will accept requests for proposed
research uses beginning one month
prior to the anticipated data release
date. The access period for all
controlled-access data is one year; at the
end of each approved period, data users
can request an additional year of access
or close out the project.
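The two timing rules in the paragraph above (requests accepted beginning one month before the anticipated release date, and a one-year access period) can be illustrated with a short calculation. The function names are hypothetical, and counting "one month" and "one year" as calendar periods starting on the approval date is an assumption not stated in the Policy.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day if needed."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def request_window_opens(anticipated_release: date) -> date:
    """Earliest date a DAC would accept a request (assumed: one calendar month before release)."""
    return shift_months(anticipated_release, -1)


def access_period_ends(approved_on: date) -> date:
    """End of the one-year access period (assumed to start on the approval date)."""
    return shift_months(approved_on, 12)
```

At the end of the period a data user would either request another year, which restarts the same calculation from the new approval date, or close out the project.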
Investigators approved to download
controlled-access data from NIH-designated data repositories and their
institutions are expected to abide by the
NIH User Code of Conduct 44 through
their agreement to the Data Use
Certification.45 The Data Use
Certification, co-signed by the
investigators requesting the data and
their Institutional Signing Official,
specifies the terms and conditions for
the secondary research use of
controlled-access data, such as:
• Using the data only for the
approved research;
• Protecting data confidentiality;
• Following all applicable laws,
regulations, and local institutional
policies and procedures for handling
genomic data;
• Not attempting to identify
individual participants from whom the
data were obtained;
• Not selling any of the data obtained
from the NIH-designated data
repositories;
• Not sharing any of the data obtained
from the NIH-designated data
repositories with individuals other than
those listed in the data access request;
• Agreeing to the listing of a summary
of approved research uses in dbGaP
along with the investigator’s name and
organizational affiliation;
• Agreeing to report, in real time,
violations of the GDS Policy to the
appropriate DAC; and
• Providing annual updates on
research using controlled-access
datasets.
For investigators who are approved to
use the data, the NIH maintains
guidance on security practices 46 that
outlines expected data security
protections (e.g., physical security
measures and user training) to ensure
that the data are kept secure and not
released to any person not permitted to
access the data.
B. Acknowledgment Responsibilities
The NIH expects all investigators who
access genomic datasets from NIH-designated data repositories to
acknowledge in all resulting oral or
written presentations, disclosures, or
publications the contributing
investigator(s) who conducted the
original study, the funding
organization(s) that supported the work,
the specific dataset(s) and applicable
accession number(s), and the NIH-designated data repositories through
which the investigator accessed any
data.
VI. Intellectual Property
Naturally occurring DNA sequences
are not patentable in the United
States.47 Therefore, basic sequence data
and certain related information (e.g.,
genotypes, haplotypes, p values, allele
frequencies) are pre-competitive, and
such data made available through NIH-designated data repositories and all
conclusions derived directly from them
should remain freely available, without
any licensing requirements, for uses
such as markers for developing assays
and guides for identifying new potential
targets for drugs, therapeutics, and
diagnostics. In addition, the NIH
discourages the use of patents to prevent
the use of or block access to genomic or
genotype-phenotype data developed
with NIH support. The NIH encourages
broad use of NIH-funded genomic data
that is consistent with a responsible
approach to management of intellectual
property derived from downstream
discoveries, as outlined in the NIH Best
Practices for the Licensing of Genomic
Inventions 48 and Research Tools
Policy.49 The NIH encourages patenting
of technology suitable for subsequent
private investment that may lead to the
development of products that address
public needs.
Appendix A
Supplemental Information for the NIH
Genomic Data Sharing Policy
Overview
This document provides additional
guidance on the types of research projects to
which the Genomic Data Sharing (GDS)
Policy applies and the NIH’s expectations for
data submission and release.
Examples of Types of Research Covered
Under the GDS Policy
The GDS Policy is applicable to any NIH-funded research project involving nonhuman
organisms or human specimens that
produces genomic, metagenomic,
epigenomic, or transcriptomic data from
large-output sequencing instruments or
genotyping platforms, such as projects that
involve:
• Sequence data from tens of isolates from
infectious organisms.
• Sequencing more than one gene or gene-sized region in more than 100 participants.
• More than 10,000 genes or regions from
one participant (e.g., whole genome
sequencing).
• More than 100,000 variant sites in more
than 100 participants.
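As a rough aid for reading the examples above, the sketch below checks a project summary against the four listed thresholds. It is illustrative only: the class, field names, and labels are hypothetical, the list in the notice gives examples rather than an exhaustive test, and "tens of isolates" has no exact number in the text, so the default of 20 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProjectSummary:
    infectious_isolates: int = 0   # isolates from infectious organisms sequenced
    genes_or_regions: int = 0      # genes or gene-sized regions per participant
    participants: int = 0
    variant_sites: int = 0


def matching_examples(p: ProjectSummary, min_isolates: int = 20) -> list:
    """Return labels of the Appendix A examples that the project matches."""
    matches = []
    if p.infectious_isolates >= min_isolates:            # "tens of isolates" (assumed >= 20)
        matches.append("isolates")
    if p.genes_or_regions > 1 and p.participants > 100:
        matches.append("multi_gene_cohort")
    if p.genes_or_regions > 10_000 and p.participants >= 1:
        matches.append("whole_genome_scale")
    if p.variant_sites > 100_000 and p.participants > 100:
        matches.append("variant_sites")
    return matches
```

An empty result does not mean a project falls outside the Policy; investigators would still confirm applicability with the relevant IC Program or Project Official.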
Expectations for Data Submission and Data
Release
Data submitted to NIH-designated data
repositories undergo different levels of data
processing, and the expectations for data
submission and data release are based on
those levels. The table and text below
describe the expectations for each level. The
NIH will review these expectations at regular
intervals, and any updates will be published
on the GDS Web site and the research
community will be notified through
appropriate communication methods (e.g.,
The NIH Guide for Grants and Contracts).
Level 0
General description of data processing: Raw data generated directly from the instrument platform.
Example data types: Instrument image data.
Data submission expectation: Not expected.
Data release timeline: NA.
Level 1
General description of data processing: Initial sequence reads, the most fundamental form of the data after the basic translation of raw input.
Example data types: DNA sequencing reads, ChIP-Seq reads, RNA-Seq reads, SNP arrays, arrayCGH.
Data submission expectation: Not expected for human data if reads are included in Level 2 aligned sequence file (e.g., BAM). Nonhuman de novo sequence data: project specific, generally within 3 months after data generation.
Data release timeline: NA.
Level 2
General description of data processing: Data after an initial round of analysis or computation to clean the data and assess basic quality measures.
Example data types: DNA sequence alignments to a reference sequence or de novo assembly, RNA expression profiling.
Data submission expectation: Project specific, generally within 3 months after data generation.
Level 3
General description of data processing: Analysis to identify genetic variants, gene expression patterns, or other features of the dataset.
Example data types: SNP or structural variant calls, expression peaks, epigenomic features.
Level 4
General description of data processing: Final analysis that relates the genomic data to phenotype or other biological states.
Example data types: Genotype-phenotype relationships, relationships of RNA expression or epigenomic patterns to biological state.
Data submission expectation: Data submitted as analyses are completed.
Level 0 and level 1 data are the raw images
and initial sequence reads, respectively, and
have limited value to secondary data users.
NIH policy does not expect submission of
these data. An exception is made for de novo
sequencing of nonhuman organisms unless
those read data are provided within the level
2 submission. In the case of de novo
sequencing for nonhuman organisms,
investigators who are submitting level 1 data
may request a holding period, not to exceed
six months, during which the datasets will
not be released for use by other investigators.
For data submitted to NIH-designated data
repositories, provisions may be made for
creating an exchange area in which such
datasets may be shared among investigative
teams prior to general release.
Array-based data, such as
gene expression, ChIP-chip, ArrayCGH, and
SNP arrays, can be submitted to GEO as level
1 data, which will not be accessible until a
manuscript describing the data is published.
It is the submitter’s responsibility to ensure
that the data and files submitted to GEO
protect participant privacy in accordance
with all applicable laws, regulations, and
institutional policies, including the GDS
Policy.
Level 2 constitutes a computational
analysis in the form of higher order assembly
or placement of the sequencing reads on a
reference template. For human sequencing
projects, the level 2 file comprises the reads
“piled” on a reference human genome. A
submission would be a file (e.g., a binary
alignment/map (BAM) file) usually
containing the unmapped reads as well.
GWAS and other types of projects (e.g., RNA
expression profiling or de novo sequencing)
would also generate a level 2 placement or
assembly file.
Generation of data files at level 2 generally
requires substantial analysis and quality
checks relating to both breadth of coverage of
the targeted region and accuracy of assembly.
Sufficient time will be allowed to complete
the analysis and generate the assembly, up to
the coverage and quality thresholds specified
by a project or investigative team. In general,
it is anticipated that this work could
reasonably be completed within three
months, and data submission would follow
shortly thereafter. Data files may be held in
an exchange area accessible only to the
submitting investigators and collaborators for
a period not to exceed six months from the
time of submission. Following this period of
exclusivity, the data will be available for
research access without restrictions on
publication.
Phenotype or clinical data should be
submitted to the NIH-designated data
repository at the earliest opportunity, but no
later than the date of level 2 genomic data
submission (or levels 2 and 3 for GWAS
datasets), especially for studies in which all
phenotype data have already been gathered.
For studies in which phenotype data
collections are ongoing and/or may be
regularly updated, data files should be
submitted to NIH-designated data
repositories as early as possible considering
the practical needs for ensuring data
accuracy; generally speaking, this time
should not exceed six months after data
collection.
Level 3 includes analysis to identify
variants or to elucidate other features of the
genomic dataset, such as gene expression
patterns in an RNAseq assay. Level 3 data
may be generated from a single level 2 data
file (e.g., variant sites versus the human
reference genome), but will often derive from
a compilation of sequencing assemblies (e.g.,
in a genome study of a specific cancer type).
Data submission expectations for level 3 files
will vary substantially by project and
therefore will require consultation with NIH
program staff. As in level 2 data submission,
level 3 files will be date stamped and the
data producer may request a period of
exclusivity not to exceed six months, after
which time the datasets will be released
through open- or controlled-access
mechanisms as appropriate and without
publication limitations.
Level 4 constitutes the final analysis,
relating the genomic datasets to phenotype or
other biological states as pertinent to the
research objective. Data in this level are the
project findings or the publication dataset.
Investigators should submit these data prior
to publication, and the data will be released
concurrent with publication.
Data release timeline
Up to 6 months for nonhuman data.
Up to 6 months after data submission or at the time of acceptance of the first publication, whichever occurs first.
Up to 6 months after data submission or at the time of acceptance of the first publication, whichever occurs first.
Data released with publication.
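The level-by-level expectations described in this appendix can be collected into a small lookup table. The Python sketch below is illustrative only: the class name, field names, and summaries are hypothetical paraphrases of the appendix prose, not an NIH schema, and the six-month figure for level 1 applies only to the nonhuman de novo sequencing case mentioned in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LevelExpectation:
    """One data level, paraphrased from the appendix text (hypothetical structure)."""
    level: int
    summary: str
    max_hold_months: Optional[int]  # longest holding or exclusivity period stated, if any
    release_rule: str


# Values restate the appendix prose; they are assumptions for illustration.
LEVEL_EXPECTATIONS = {
    1: LevelExpectation(
        1, "read data and array-based data", 6,
        "holding period on request for nonhuman de novo sequencing only"),
    2: LevelExpectation(
        2, "assembly or placement of reads on a reference", 6,
        "available after the exclusivity period, without publication restrictions"),
    3: LevelExpectation(
        3, "variant calls and other derived features", 6,
        "released through open or controlled access after the exclusivity period"),
    4: LevelExpectation(
        4, "final analysis relating genomic data to phenotype", None,
        "released concurrent with publication"),
}


def max_hold_months(level: int) -> Optional[int]:
    """Return the longest stated holding period for a level, or None if none is stated."""
    return LEVEL_EXPECTATIONS[level].max_hold_months
```

Project-specific expectations, which the appendix says require consultation with NIH program staff, are deliberately left out of this table.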
References
1 The genome is the entire set of genetic
instructions found in a cell. See https://
ghr.nlm.nih.gov/glossary=genome.
2 Final NIH Statement on Sharing Research
Data. February 26, 2003. See https://
grants.nih.gov/grants/guide/notice-files/
NOT-OD-03-032.html.
3 NIH Intramural Policy on Large Database
Sharing. April 5, 2002. See https://
sourcebook.od.nih.gov/ethic-conduct/large-db-sharing.htm.
4 Policy for Sharing of Data Obtained in
NIH Supported or Conducted Genome-Wide
Association Studies (GWAS). August 28,
2007. See https://grants.nih.gov/grants/guide/
notice-files/NOT-OD-07-088.html.
5 A GWAS is defined as any study of
genetic variation across the entire human
genome that is designed to identify genetic
associations with observable traits (such as
blood pressure or weight), or the presence or
absence of a disease or condition.
6 Notice on Development of Data Sharing
Policy for Sequence and Related Genomic
Data. October 19, 2009. See https://
grants.nih.gov/grants/guide/notice-files/
NOT-HG-10-006.html.
7 Office of Science and Technology Policy
Memorandum, Expanding Public Access to
the Results of Federally Funded Research.
February 22, 2013. See https://
www.whitehouse.gov/blog/2013/02/22/
expanding-public-access-results-federally-funded-research.
8 ‘‘De-identified’’ refers to removing
information that could be used to associate
a dataset or record with a human individual.
Under this Policy, data should be de-identified according to the standards set forth
in the HHS Regulations for the Protection of
Human Subjects and the Health Insurance
Portability and Accountability Act (HIPAA)
Privacy Rule. The HIPAA Privacy Rule lists
18 identifiers that must be removed to
classify data as de-identified. For the full list,
see https://privacyruleandresearch.nih.gov/
pr_08.asp.
9 An Institutional Signing Official is
generally a senior official at an institution
who is credentialed through the NIH eRA
Commons system and is authorized to enter
the institution into a legally binding contract
and sign on behalf of an investigator who has
submitted data or a data access request to the
NIH.
10 The NIH’s mission is to seek
fundamental knowledge about the nature and
behavior of living systems and the
application of that knowledge to enhance
health, lengthen life, and reduce illness and
disability. See https://www.nih.gov/about/
mission.htm.
11 Final NIH Statement on Sharing
Research Data. February 26, 2003. See https://
grants.nih.gov/grants/guide/notice-files/
NOT-OD-03-032.html.
12 NIH Intramural Policy on Large Database
Sharing. April 5, 2002. See https://
sourcebook.od.nih.gov/ethic-conduct/large-db-sharing.htm.
13 GWAS has the same definition in this
policy as in the 2007 GWAS Policy: a study
in which the density of genetic markers and
the extent of linkage disequilibrium should
be sufficient to capture (by the r² parameter)
a large proportion of the common variation
in the genome of the population under study,
and the number of samples (in a case-control
or trio design) should provide sufficient
power to detect variants of modest effect.
14 Competing grant applications encompass
all activities with a research component,
including but not limited to the following:
Research Grants (Rs), Program Projects (Ps),
Cooperative Research Mechanisms (Us),
Career Development Awards (Ks), and SCORs
and other S grants with a research
component.
15 Investigators should refer to funding
announcements or IC Web sites for contact
information.
16 NIH Policy on Sharing of Model
Organisms for Biomedical Research. Release
Date May 7, 2004. See https://grants.nih.gov/
grants/guide/notice-files/NOT-OD-04-042.html.
17 Gene Expression Omnibus at https://
www.ncbi.nlm.nih.gov/geo/.
18 Sequence Read Archive at https://
www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?.
19 Trace Archive at https://
www.ncbi.nlm.nih.gov/Traces/trace.cgi.
20 Array Express at https://www.ebi.ac.uk/
arrayexpress/.
21 Mouse Genome Informatics at https://
www.informatics.jax.org/.
22 WormBase at https://www.wormbase.org.
23 The Zebrafish Model Organism Database
at https://zfin.org/.
24 GenBank at https://
www.ncbi.nlm.nih.gov/genbank/.
25 European Nucleotide Archive at https://
www.ebi.ac.uk/ena/.
26 DNA Data Bank of Japan at https://
www.ddbj.nig.ac.jp/.
27 A period for data preparation is
anticipated prior to data submission to the
NIH, and the appropriate time intervals for
that data preparation (or data cleaning) will
be subject to the particular data type and
project plans (see Appendix A). Investigators
should work with NIH Program or Project
Officials for specific guidance.
28 See 45 CFR 46.102(f) at https://
www.hhs.gov/ohrp/humansubjects/guidance/
45cfr46.html#46.102.
29 See 45 CFR 164.514(b)(2). The list of
HIPAA identifiers that must be removed is
available at: https://www.gpo.gov/fdsys/pkg/
CFR-2002-title45-vol1/pdf/CFR-2002-title45-vol1-sec164-514.pdf.
30 For additional information about
Certificates of Confidentiality, see https://
grants.nih.gov/grants/policy/coc/.
31 Confidentiality Certificate. HG–2009–01.
Issued to the National Center for
Biotechnology Information, National Library
of Medicine, NIH. See https://www.ncbi.
nlm.nih.gov/projects/gap/cgi-bin/
GetPdf.cgi?document_name=Confidentiality
Certificate.pdf.
32 Database of Genotypes and Phenotypes
at https://www.ncbi.nlm.nih.gov/gap.
33 Cancer Genomics Hub at https://
cghub.ucsc.edu/.
34 The 1000 Genomes Project at https://
www.1000genomes.org/.
35 Points to Consider for IRBs and
Institutions in their Review of Data
Submission Plans for Institutional
Certifications. See https://gwas.nih.gov/pdf/
PTC_for_IRBs_and_Institutions_revised5-31-11.pdf.
36 Clinical specimens are specimens that
have been obtained through clinical practice.
37 For the submission of data derived from
cell lines or clinical specimens lacking
research consent that were created or
collected before the effective date of this
Policy, the Institutional Certification needs to
address only this item.
38 For guidance on clearly communicating
inappropriate data uses, see NIH Points to
Consider in Drafting Effective Data Use
Limitation Statements, https://gwas.nih.gov/
pdf/NIH_PTC_in_Drafting_DUL_
Statements.pdf.
39 ‘‘Equivalent body’’ is used here to
acknowledge that some primary studies may
be conducted abroad and in such cases the
expectation is that an analogous review
committee to an IRB or Privacy Board (e.g.,
Research Ethics Committees) may be asked to
participate in the presubmission review of
proposed genomic projects.
40 As noted earlier, for studies using data
or specimens collected before the effective
date of this Policy, the IRB or Privacy Board
should review informed consent materials to
ensure that data submission is not
inconsistent with the informed consent
provided by the research participants.
41 Compilation of Aggregate Genomic Data.
dbGaP study accession: phs000501.v1.p1.
See https://www.ncbi.nlm.nih.gov/projects/
gap/cgi-bin/study501.cgi?study_
id=phs000501.v1.p1&pha=&phaf=.
42 dbGaP Authorized Access. See https://
dbgap.ncbi.nlm.nih.gov/aa/
wga.cgi?page=login.
43 For a list of NIH Data Access
Committees, see https://gwas.nih.gov/04po2_
1DAC.html.
44 User Code of Conduct. See https://
dbgap.ncbi.nlm.nih.gov/aa/GWAS_Code_of_
Conduct.html.
45 Model Data Use Certification Agreement.
See https://gwas.nih.gov/pdf/Model_DUC_7-26-13.pdf.
46 Security Best Practices. See https://
www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/
GetPdf.cgi?document_name=dbgap_2b_
security_procedures.pdf.
47 In Association for Molecular Pathology
et al. v. Myriad Genetics, Inc., et al. 569 U.S.
___ 2013. See https://www.supremecourt.gov/
opinions/12pdf/12-398_1b7d.pdf.
48 NIH Best Practices for the Licensing of
Genomic Inventions. See https://www.ott.nih.
gov/policy/genomic_invention.html.
49 Research Tools Policy. See https://
www.ott.nih.gov/policy/research_tool.aspx.
Dated: September 16, 2013.
Lawrence A. Tabak,
Deputy Director, National Institutes of Health.
[FR Doc. 2013–22941 Filed 9–19–13; 8:45 am]
BILLING CODE 4140–01–P
[Federal Register Volume 78, Number 183 (Friday, September 20, 2013)]
[Notices]
[Pages 57860-57865]
From the Federal Register Online via the Government Printing Office [www.gpo.gov]
[FR Doc No: 2013-22941]
-----------------------------------------------------------------------
DEPARTMENT OF HEALTH AND HUMAN SERVICES
National Institutes of Health
Draft NIH Genomic Data Sharing Policy Request for Public Comments
SUMMARY: The National Institutes of Health (NIH) is seeking public
comments on the draft Genomic Data Sharing (GDS) Policy that promotes
sharing, for research purposes, of large-scale human and nonhuman
genomic \1\ data generated from NIH-supported and NIH-conducted
research.
DATES: To ensure that your comments will be considered, please submit
your response to this Request for Comments no later than 60 days after
publication of this notice.
ADDRESSES: Submit comments by any of the following methods:
Online: https://gds.nih.gov/survey.aspx.
Fax: 301-496-9839.
Mail/Hand delivery/Courier (for paper, disk, or CD-ROM
submissions) to: Genomic Data Sharing Policy Team, Office of Science
Policy, National Institutes of Health, 6705 Rockledge Drive, Suite 750,
Bethesda, MD 20892.
FOR FURTHER INFORMATION CONTACT: Genomic Data Sharing Policy Team,
Office of Science Policy, National Institutes of Health, 6705 Rockledge
Drive, Suite 750, Bethesda, MD 20892, 301-496-9838, GDS@mail.nih.gov.
SUPPLEMENTARY INFORMATION:
Background
The NIH's mission is to seek fundamental knowledge about the nature
and behavior of living systems and the application of that knowledge to
enhance health, lengthen life, and reduce illness and disability. The
draft GDS Policy supports this mission by promoting the sharing of
genomic research data, which maximizes the knowledge gained. Not only
does data sharing allow data generated from one research study to be
used to explore a wide range of additional research questions, it also
enables data from multiple projects to be combined, amplifying the
scientific value of data many times. Broad research use of the data
enhances public benefit by helping to speed discoveries that increase
the understanding of biological processes that affect human health and
the development of better ways to diagnose, treat, and prevent disease.
The NIH has promoted data sharing for many years, and in 2003, the
NIH issued a general policy for sharing research data.\2\ \3\ In
2007, the NIH issued a more specific policy to promote sharing of data
generated through genome-wide association studies (GWAS),\4\ \5\
which examine thousands of single nucleotide polymorphisms (SNPs)
across the genome to identify genetic variants that contribute to human
diseases, conditions, and traits. To facilitate the sharing of genomic
and phenotypic data from GWAS, the NIH created the database of
Genotypes and Phenotypes (dbGaP) with a two-tiered system for
distributing the data: Open access, for data that are available to the
public without restrictions, and controlled access for data that are
made available only for research purposes that are consistent with the
original informed consent under which the data were collected.
Not long after the GWAS policy was issued, advances in DNA
sequencing and other high-throughput technologies, and a steep drop in
DNA sequencing costs, enabled the NIH to fund research that generated
even greater volumes of GWAS and other types of genomic data. In 2009,
the NIH announced \6\ its intention to extend the GWAS Policy
to encompass data from a wider range of genomic research.
The draft GDS Policy applies to research involving nonhuman genomic
data as well as human data that are generated through array-based and
high-throughput genomic technologies (e.g., SNP, whole-genome,
transcriptomic, epigenomic, and gene expression data). (See section II
of the draft Policy.) The NIH considers access to such data
particularly important because of the opportunities to accelerate
research through the power of combining such large and information-rich
datasets. The draft GDS Policy is aligned with Administration
priorities and a recent directive to agencies to increase access to
digital scientific data resulting from federally funded
research.\7\
Overview of the Policy
The draft GDS Policy describes the responsibilities of
investigators and institutions for the submission of nonhuman and human
genomic data to the NIH (section IV) and the use of controlled-access
data (section V). The Policy also provides expectations regarding
intellectual property (section VI).
When data sharing involves human data, the protection of research
participant privacy and confidentiality is paramount, and the Policy
reflects the NIH's continued commitment to responsible data
stewardship, which is essential to uphold the public trust in
biomedical research. The draft GDS Policy, like the GWAS Policy,
includes a number of provisions to protect
research participant privacy (see section IV.C). For example, prior to
data submission, traditional identifiers such as name, date of birth,
street address, and social security number should be removed. The
de-identified \8\ data are coded using a random, unique code to
protect participant privacy. The NIH also maintains the expectation
established under the GWAS Policy that the responsible Institutional
Signing Official \9\ of the submitting institution should
provide an Institutional Certification to the funding NIH Institute or
Center prior to award. An Institutional Certification assures that the
data have been or will be collected in a legal and ethically
appropriate manner and have been de-identified. The draft GDS Policy
clarifies the provisions of the Institutional Certification for
datasets submitted to NIH-designated data repositories in Section
IV.C.5.
The NIH expects the Policy to be effective 60 days after the
publication of the final Policy.
Request for Comments
As part of the process of developing the GDS Policy, the NIH
encourages the public to provide comments on any aspect of the draft
GDS Policy.
Comments should be submitted electronically to https://gds.nih.gov/survey.aspx. Comments may also be submitted by fax (301-496-9839), or
mailed to the Genomic Data Sharing Policy Team, Office of Science
Policy, National Institutes of Health, 6705 Rockledge Drive, Suite 750,
Bethesda, MD 20892.
Responding to this request for comments is voluntary. Submitted
comments are considered public information; do not include any
information that you wish to remain private and confidential. Comments
in their entirety will be posted along with the submitter's name and
affiliation on the NIH GDS Web site after the public comment period
closes. Commenters will receive a confirmation acknowledging receipt of
comments but will not receive individual feedback on any suggestions.
Please note that the government will not pay for the use of any
information contained in the response.
The NIH intends to hold one or more public webinars on the draft
Policy. Information about the webinars will be made available at https://gds.nih.gov.
Draft NIH Genomic Data Sharing Policy
I. Purpose
The draft Genomic Data Sharing (GDS) Policy sets forth expectations
that ensure the broad and responsible sharing of genomic research data.
Sharing research data supports the NIH mission \10\ and is
essential to facilitate the translation of research results into
knowledge, products, and procedures that improve human health. The NIH
has longstanding policies to make data publicly available in a timely
manner from the research activities that it funds.\11\ \12\
II. Scope and Applicability
This Policy applies to all NIH-funded research that involves
large-scale human and nonhuman genomic data produced by array-based or
high-throughput genomic technologies, such as GWAS,\13\ SNP,
whole-genome, transcriptomic, epigenomic, and gene expression data,
irrespective of funding level and funding mechanism (i.e., grant,
contract, or intramural support). Appendix A provides examples of
research that are subject to the Policy. At appropriate intervals, the
NIH will review the types of research to which this Policy may be
applicable, and changes to the scope will be defined in supplementary
materials to the final GDS Policy. Notification of any changes will be
provided to investigators and institutions through standard NIH
communication channels (e.g., NIH Guide for Grants and Contracts).
Compliance with this Policy will become a special term and
condition in the Notice of Award or the Contract Award. Failure to
comply with the terms and conditions of the funding agreement could
lead to enforcement actions, including the withholding of funding,
consistent with 45 CFR 74.62 and/or other authorities, as appropriate.
III. Effective Date
The effective date of this Policy is [To Be Determined], and
pertains to the following funding mechanisms:
Competing grant applications \14\ that are
submitted to the NIH as of the [TBD] receipt date;
Proposals for contracts that are submitted to the NIH as
of [TBD]; and
NIH intramural research projects that are approved as of
[TBD].
IV. Responsibilities of Investigators Submitting Genomic Data
A. Data Sharing Plans
Investigators seeking NIH funding should contact appropriate
Institute or Center (IC) Program or Project Officials \15\ as
early as possible to discuss data sharing expectations and timelines
that would apply to their proposed studies. Investigators and their
institutions are expected to address plans for following this Policy in
the data sharing section of funding applications and proposals. Any
resources needed to support a proposed data sharing plan should be
included in the project's budget. NIH intramural investigators are
expected to address data sharing plans with their IC scientific
leadership prior to initiating applicable research and are encouraged
to contact their IC leadership or the Office of Intramural Research for
guidance.
B. Nonhuman and Model Organism Genomic Data
1. Data Submission Expectations and Timeline
Nonhuman data (including microbial and microbiome data) and data
from large-scale genomic projects for model organisms \16\ are
to be shared in a timely manner. Investigators should make nonhuman and
model organism data publicly available no later than the date of
initial publication. However, certain data types or NIH research
initiatives may expect an earlier data release (e.g., microbial or
microbiome data, or projects with broad utility as a resource for the
scientific community). (See Appendix A for specific expectations for
data submission and release.)
2. Data Repositories
Data should be made available through any widely used data
repository, whether NIH-funded or not, such as the Gene Expression
Omnibus (GEO),\17\ Sequence Read Archive (SRA),\18\
Trace Archive,\19\ Array Express,\20\ Mouse Genome
Informatics (MGI),\21\ WormBase,\22\ the Zebrafish
Model Organism Database (ZFIN),\23\ GenBank,\24\
European Nucleotide Archive (ENA),\25\ or DNA Data Bank of
Japan (DDBJ).\26\
C. Human Genomic Data
1. Data Submission Expectations and Timeline
Guidance to govern human genomic data submission timelines and data
release expectations is provided in Appendix A. The NIH will release
data submitted to NIH-designated data repositories without restrictions
on publication or other dissemination no later than six months after
the initial data submission to an NIH-designated data
repository,\27\ or at the time of acceptance of the first
publication, whichever occurs first.
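The release rule above amounts to taking the earlier of two dates. The Python sketch below illustrates it under the assumption that "six months" means six calendar months, which the draft does not define; the function names are hypothetical and are not part of any NIH system.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def latest_release_date(submitted: date,
                        first_publication_accepted: Optional[date] = None) -> date:
    """Return the earlier of (submission + 6 months) and first publication acceptance.

    Assumption: six calendar months; the draft Policy does not specify the unit.
    """
    six_month_limit = add_months(submitted, 6)
    if first_publication_accepted is None:
        return six_month_limit
    return min(six_month_limit, first_publication_accepted)
```

For example, data submitted on January 15, 2014, with a first publication accepted on April 1, 2014, would be released no later than April 1, 2014 under this reading.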
Human data that are submitted to NIH-designated data repositories
should be de-identified according to the standards set forth in the HHS
Regulations for the Protection of Human
Subjects \28\ and the Health Insurance Portability and
Accountability Act (HIPAA) Privacy Rule.\29\ The de-identified
data should be assigned a random, unique code, and the key held by the
submitting institution.
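The coding step described above can be sketched as follows. This Python example is a minimal illustration, not a compliant de-identification procedure: it strips only the four traditional identifiers named in this notice (name, date of birth, street address, and social security number), whereas the HIPAA Privacy Rule lists 18. The field names and the `code_records` function are hypothetical.

```python
import secrets

# Assumption: record fields use these hypothetical names for the four
# traditional identifiers mentioned in the notice. The full HIPAA list is longer.
DIRECT_IDENTIFIERS = {
    "name", "date_of_birth", "street_address", "social_security_number",
}


def code_records(records):
    """Replace direct identifiers with a random, unique code.

    Returns (coded_records, key). The key maps each code back to the removed
    identifiers and is meant to stay with the submitting institution.
    """
    key = {}
    coded = []
    for record in records:
        code = secrets.token_hex(8)
        while code in key:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(8)
        key[code] = {f: record[f] for f in DIRECT_IDENTIFIERS if f in record}
        stripped = {f: v for f, v in record.items() if f not in DIRECT_IDENTIFIERS}
        stripped["participant_code"] = code
        coded.append(stripped)
    return coded, key
```

Only the coded records would be submitted; because the codes are random rather than derived from the identifiers, they cannot be reversed without the key.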
The NIH encourages researchers and institutions submitting large-
scale genomic datasets to NIH-designated data repositories to consider
whether a Certificate of Confidentiality could serve as an additional
safeguard to prevent compelled disclosure of any personally
identifiable information that it may hold.\30\ The NIH has
obtained a Certificate of Confidentiality for dbGaP.\31\
2. Data Repositories
Applicable studies with human genomic data should be registered in
the database of Genotypes and Phenotypes (dbGaP) \32\ no later
than the time that data cleaning and quality control measures begin.
Investigators should submit human data to the relevant NIH-designated
data repository (e.g., dbGaP, GEO, SRA, the Cancer Genomics Hub
\33\). NIH-designated data repositories need not be the
exclusive source for facilitating the sharing of genomic data.
Investigators who elect to submit data to a non-NIH-designated data
repository should confirm that appropriate data security,
confidentiality, and privacy measures are in place.
3. Tiered System for the Distribution of Human Data
Respect for and protection of the interests of research
participants is fundamental to the NIH's stewardship of human genomic
data. The informed consent under which the data or sample were
collected is the basis for the submitting institution to determine the
appropriateness of data submission to NIH-designated data repositories,
and whether the data should be available through open or controlled
access. Controlled-access data in NIH-designated data repositories are
made available for secondary research only after investigators have
obtained approval from the NIH to use the requested data for a
particular project. Open-access data are publicly available without
restriction (e.g., The 1000 Genomes Project \34\).
4. Informed Consent
Submitting institutions, through their Institutional Review Boards
(IRBs), are to review the informed consent materials for studies that
are to be submitted to NIH-designated data repositories to determine
whether the data are appropriate for sharing for secondary research
use. Specific considerations may vary with the type of study and
whether the data are obtained through prospective or retrospective data
collections. The NIH provides additional information on issues related
to the respect for research participant interests in its Points To
Consider for IRBs and Institutions in Their Review of Data Submission
Plans for Institutional Certifications.\35\ This and other
policy-related documents will be updated once the Policy is final.
For studies initiated after the effective date of this Policy, the
NIH expects the informed consent process and documents to state that a
participant's genomic and phenotypic data may be shared broadly for
future research purposes and also explain whether the data will be
shared through open or controlled access. If human genomic data are to
be shared in open-access repositories, the NIH expects that
participants will have provided explicit consent for sharing their data
through open-access mechanisms. For studies proposing to use cell lines
or clinical specimens,\36\ the NIH expects that informed consent for
future research use and broad data sharing will have been obtained even
if the cell lines or clinical specimens are de-identified. If there are
compelling scientific reasons that necessitate the use of cell lines or
clinical specimens that were created or collected after the effective
date of this Policy and that lack consent for research use and data
sharing, investigators should provide a justification for the use of
any such materials in the funding request.
For studies using data or specimens collected before the effective
date of this Policy, there may be considerable variation in the extent
to which data sharing and future genomic research were addressed within
the informed consent materials for the primary research. In these
cases, an assessment by an IRB, Privacy Board, or equivalent group is
essential to ensure that data submission is not inconsistent with the
informed consent provided by the research participant.
The NIH will accept data derived from cell lines or clinical
specimens lacking consent for research use that were created or
collected before the effective date of this Policy. Grandfathered
genomic data that are currently available through open access may be
submitted to an open-access NIH-designated data repository; otherwise,
the data should be submitted to a controlled-access NIH-designated data
repository.
While the NIH encourages broad access to genomic data, in some
circumstances broad sharing may be inconsistent with the informed
consent of the research participants whose data are included in the
dataset. In such circumstances, institutions planning to submit
aggregate- or individual-level data to the NIH for controlled access
should note any data use limitations in the data sharing or data
management plan submitted as part of the funding request. These data
use limitations should be specified in the Institutional Certification
submitted to the NIH prior to award.
5. Institutional Certification
The responsible Institutional Signing Official of the submitting
institution should provide an Institutional Certification to the
funding IC prior to award. The Institutional Certification should
indicate whether the data will be submitted to an open- or controlled-
access database and assure that:
- The data submission is consistent with applicable laws,
  regulations, and institutional policies; \37\
- The appropriate research uses of the data and any uses
  that are specifically excluded in the informed consent documents are
  delineated; \38\
- The identities of research participants will not be
  disclosed to NIH-designated data repositories; and
- An IRB, Privacy Board, and/or equivalent body \39\ has
  reviewed the investigator's proposal for data submission and assures
  that:
  - The protocol for the collection of genomic and phenotypic
    data was consistent with 45 CFR part 46;
  - Data submission and subsequent data sharing for research
    purposes are consistent with the informed consent of study participants
    from whom the data were obtained; \40\
  - Risks to individuals and their families associated with data
    submitted to NIH-designated data repositories were considered;
  - To the extent relevant and possible, risks to groups or
    populations associated with data submitted to NIH-designated data
    repositories were considered; and
  - The investigator's plan for de-identifying datasets is
    consistent with the standards outlined in this Policy (see section
    IV.C.1.).
Institutions should indicate in the certification whether aggregate
genomic data from datasets with data use limitations may be appropriate
for general research use (i.e., use for any research question such as
research to understand the biological mechanisms underlying disease,
development of statistical research methods, the study of population
origins). If so, the
aggregate genomic data will be made available through the controlled-
access compilation of aggregate genomic data \41\ to facilitate
secondary research.
6. Data Withdrawal
Submitting investigators and their institutions may request removal
of data on individual participants from NIH-designated data
repositories in the event that a research participant withdraws his or
her consent. However, data that have been distributed for approved
research use cannot be retrieved.
7. Exceptions to Data Submission Expectations
The NIH acknowledges that in some cases, circumstances beyond the
control of investigators may preclude submission of data to NIH-
designated data repositories (e.g., country or state laws that prohibit
data submission to a U.S. federal database). In such cases,
investigators should provide a justification for any exceptions
requested in the application or proposal. The funding IC may grant an
exception to the submission of relevant data to the NIH, and the
investigator would be expected to develop a plan to share data through
other mechanisms. For transparency purposes, when exceptions are
granted, studies will still be registered in dbGaP and the reason for
the exception will be included in the registration record. Information
about current expectations for exception requests will be made
available on the GDS Web site.
V. Responsibilities of Investigators Accessing and Using Genomic Data
A. Requests for Controlled-Access Data
Access to human data is through a two-tiered model involving open-
and controlled-data access mechanisms. Requests for controlled-access
data \42\ are reviewed by NIH Data Access Committees (DACs).\43\ DAC
decisions are based primarily upon conformance of the proposed research
as described in the access request to the data use limitations
established by the submitting institution through the Institutional
Certification. The NIH DACs will accept requests for proposed research
uses beginning one month prior to the anticipated data release date.
The access period for all controlled-access data is one year; at the
end of each approved period, data users can request an additional year
of access or close out the project.
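The two time windows described above can be sketched with simple date arithmetic. This is an illustrative sketch only, not an NIH tool: the function names are hypothetical, "one month" is assumed to mean one calendar month, and the one-year access period is assumed to start on the approval date, which the Policy text does not specify.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def earliest_request_date(anticipated_release: date) -> date:
    """Assumed first day a DAC would accept a request: one month before release."""
    return add_months(anticipated_release, -1)


def access_period_end(approval: date) -> date:
    """Assumed end of the one-year access period, counted from the approval date."""
    return add_months(approval, 12)
```

Under these assumptions, a dataset with an anticipated release date of January 15, 2014 could be requested from December 15, 2013, and an approval dated February 28, 2014 would come up for renewal or close-out on February 28, 2015.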
Investigators approved to download controlled-access data from NIH-
designated data repositories and their institutions are expected to
abide by the NIH User Code of Conduct \44\ through their agreement to
the Data Use Certification.\45\ The Data Use Certification, co-signed
by the investigators requesting the data and their Institutional
Signing Official, specifies the terms and conditions for the secondary
research use of controlled-access data, such as:
Using the data only for the approved research;
Protecting data confidentiality;
Following all applicable laws, regulations, and local
institutional policies and procedures for handling genomic data;
Not attempting to identify individual participants from
whom the data were obtained;
Not selling any of the data obtained from the NIH-
designated data repositories;
Not sharing any of the data obtained from the NIH-
designated data repositories with individuals other than those listed
in the data access request;
Agreeing to the listing of a summary of approved research
uses in dbGaP along with the investigator's name and organizational
affiliation;
Agreeing to report, in real time, violations of the GDS
Policy to the appropriate DAC; and
Providing annual updates on research using controlled-
access datasets.
For investigators who are approved to use the data, the NIH
maintains guidance on security practices \46\ that outlines expected
data security protections (e.g., physical security measures and user
training) to ensure that the data are kept secure and not released to
any person not permitted to access the data.
B. Acknowledgment Responsibilities
The NIH expects all investigators who access genomic datasets from
NIH-designated data repositories to acknowledge in all resulting oral
or written presentations, disclosures, or publications the contributing
investigator(s) who conducted the original study, the funding
organization(s) that supported the work, the specific dataset(s) and
applicable accession number(s), and the NIH-designated data
repositories through which the investigator accessed any data.
VI. Intellectual Property
Naturally occurring DNA sequences are not patentable in the United
States.\47\ Therefore, basic sequence data and certain related
information (e.g., genotypes, haplotypes, p values, allele frequencies)
are pre-competitive, and such data made available through NIH-
designated data repositories and all conclusions derived directly from
them should remain freely available, without any licensing
requirements, for uses such as markers for developing assays and guides
for identifying new potential targets for drugs, therapeutics, and
diagnostics. In addition, the NIH discourages the use of patents to
prevent the use of or block access to genomic or genotype-phenotype
data developed with NIH support. The NIH encourages broad use of NIH-
funded genomic data that is consistent with a responsible approach to
management of intellectual property derived from downstream
discoveries, as outlined in the NIH Best Practices for the Licensing of
Genomic Inventions \48\ and Research Tools Policy.\49\ The NIH
encourages patenting of technology suitable for subsequent private
investment that may lead to the development of products that address
public needs.
Appendix A
Supplemental Information for the NIH Genomic Data Sharing Policy
Overview
This document provides additional guidance on the types of
research projects to which the Genomic Data Sharing (GDS) Policy
applies and the NIH's expectations for data submission and release.
Examples of Types of Research Covered Under the GDS Policy
The GDS Policy is applicable to any NIH-funded research project
involving nonhuman organisms or human specimens that produces
genomic, metagenomic, epigenomic, or transcriptomic data from large-
output sequencing instruments or genotyping platforms, such as
projects that involve:
Sequence data from tens of isolates from infectious
organisms.
Sequencing more than one gene or gene-sized region in
more than 100 participants.
More than 10,000 genes or regions from one participant
(e.g., whole genome sequencing).
More than 100,000 variant sites in more than 100
participants.
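The quantified human examples above can be expressed as a simple screening check. The sketch below is illustrative only: the class and function names are hypothetical, the list above is a set of examples rather than an exhaustive definition, and the "tens of isolates" example is left out because the text gives no exact number for it.

```python
from dataclasses import dataclass


@dataclass
class HumanProject:
    """Hypothetical summary of a human sequencing or genotyping project."""
    participants: int
    genes_or_regions_per_participant: int = 0
    variant_sites: int = 0


def matches_gds_example(p: HumanProject) -> bool:
    """Return True if the project matches one of the quantified human examples."""
    # More than one gene or gene-sized region in more than 100 participants.
    multi_gene_cohort = (
        p.genes_or_regions_per_participant > 1 and p.participants > 100
    )
    # More than 10,000 genes or regions from one participant.
    deep_single = (
        p.genes_or_regions_per_participant > 10_000 and p.participants >= 1
    )
    # More than 100,000 variant sites in more than 100 participants.
    dense_genotyping = p.variant_sites > 100_000 and p.participants > 100
    return multi_gene_cohort or deep_single or dense_genotyping
```

A result of False means only that none of these three examples matched. It does not establish that a project falls outside the GDS Policy, and investigators would still need to consult the Policy text and NIH program staff.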
Expectations for Data Submission and Data Release
Data submitted to NIH-designated data repositories undergo
different levels of data processing, and the expectations for data
submission and data release are based on those levels. The table and
text below describe the expectations for each level. The NIH will
review these expectations at regular intervals, and any updates will
be published on the GDS Web site and the research community will be
notified through appropriate communication methods (e.g., The NIH
Guide for Grants and Contracts).
-----------------------------------------------------------------------------------------------------------------------------------
Level  General description of data     Example data types             Data submission expectation     Data release timeline
       processing
-----------------------------------------------------------------------------------------------------------------------------------
0      Raw data generated directly     Instrument image data.         Not expected.                   NA.
       from the instrument platform.
1      Initial sequence reads, the     DNA sequencing reads,          Not expected for human data     NA.
       most fundamental form of the    ChIP-Seq reads, RNA-Seq        if reads are included in
       data after the basic            reads, SNP arrays, arrayCGH.   Level 2 aligned sequence file
       translation of raw input.                                      (e.g., BAM).
                                                                      Nonhuman de novo sequence       Up to 6 months for nonhuman
                                                                      data.                           data.
2      Data after an initial round of  DNA sequence alignments to a   Project specific, generally     Up to 6 months after data
       analysis or computation to      reference sequence or de novo  within 3 months after data      submission or at the time of
       clean the data and assess       assembly, RNA expression       generation.                     acceptance of the first
       basic quality measures.         profiling.                                                     publication, whichever occurs
                                                                                                      first.
3      Analysis to identify genetic    SNP or structural variant      Project specific, generally     Up to 6 months after data
       variants, gene expression       calls, expression peaks,       within 3 months after data      submission or at the time of
       patterns, or other features     epigenomic features.           generation.                     acceptance of the first
       of the dataset.                                                                                publication, whichever occurs
                                                                                                      first.
4      Final analysis that relates     Genotype-phenotype             Data submitted as analyses      Data released with
       the genomic data to phenotype   relationships, relationships   are completed.                  publication.
       or other biological states.     of RNA expression or
                                       epigenomic patterns to
                                       biological state.
-----------------------------------------------------------------------------------------------------------------------------------
Level 0 and level 1 data are the raw images and initial sequence
reads, respectively, and have limited value to secondary data users.
NIH policy does not expect submission of these data. An exception is
made for de novo sequencing of nonhuman organisms unless those read
data are provided within the level 2 submission. In the case of de
novo sequencing for nonhuman organisms, investigators who are
submitting level 1 data may request a holding period, not to exceed
six months, during which the datasets will not be released for use
by other investigators. For data submitted to NIH-designated data
repositories, provisions may be made for creating an exchange area
in which such datasets may be shared among investigative teams prior
to general release.
Array-based data, such as gene expression, ChIP-chip, ArrayCGH,
and SNP arrays, can be submitted to GEO as level 1
data, which will not be accessible until a manuscript describing the
data is published. It is the submitter's responsibility to ensure
that the data and files submitted to GEO protect participant privacy
in accordance with all applicable laws, regulations, and
institutional policies, including the GDS Policy.
Level 2 constitutes a computational analysis in the form of
higher order assembly or placement of the sequencing reads on a
reference template. For human sequencing projects, the level 2 file
comprises the reads ``piled'' on a reference human genome. A
submission would be a file (e.g., a binary alignment/map (BAM)
file) usually containing the unmapped reads as well. GWAS and other
types of projects (e.g., RNA expression profiling or de novo
sequencing) would also generate a level 2 placement or assembly
file.
Generation of data files at level 2 generally requires
substantial analysis and quality checks relating to both breadth of
coverage of the targeted region and accuracy of assembly. Sufficient
time will be allowed to complete the analysis and generate the
assembly, up to the coverage and quality thresholds specified by a
project or investigative team. In general, it is anticipated that
this work could reasonably be completed within three months, and
data submission would follow shortly thereafter. Data files may be
held in an exchange area accessible only to the submitting
investigators and collaborators for a period not to exceed six
months from the time of submission. Following this period of
exclusivity, the data will be available for research access without
restrictions on publication.
Phenotype or clinical data should be submitted to the NIH-
designated data repository at the earliest opportunity, but no later
than the date of level 2 genomic data submission (or levels 2 and 3
for GWAS datasets), especially for studies in which all phenotype
data have already been gathered. For studies in which phenotype data
collections are ongoing and/or may be regularly updated, data files
should be submitted to NIH-designated data repositories as early as
possible considering the practical needs for ensuring data accuracy;
generally speaking, this time should not exceed six months after
data collection.
Level 3 includes analysis to identify variants or to elucidate
other features of the genomic dataset, such as gene expression
patterns in an RNA-Seq assay. Level 3 data may be generated from a
single level 2 data file (e.g., variant sites versus the human
reference genome), but will often derive from a compilation of
sequencing assemblies (e.g., in a genome study of a specific cancer
type). Data submission expectations for level 3 files will vary
substantially by project and therefore will require consultation
with NIH program staff. As in level 2 data submission, level 3 files
will be date stamped and the data producer may request a period of
exclusivity not to exceed six months, after which time the datasets
will be released through open- or controlled-access mechanisms as
appropriate and without publication limitations.
Level 4 constitutes the final analysis, relating the genomic
datasets to phenotype or other biological states as pertinent to the
research objective. Data in this level are the project findings or
the publication dataset. Investigators should submit these data
prior to publication, and the data will be released concurrent with
publication.
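The release timeline for level 2 and level 3 data described above (up to six months after submission, or at acceptance of the first publication, whichever occurs first) can be sketched as follows. This is an illustrative sketch with hypothetical function names, and it assumes that "six months" means six calendar months. Levels 0, 1, and 4 return None here because their release is not defined by this rule (not expected, a requested holding period, and release with publication, respectively).

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def latest_release_date(
    level: int,
    submitted: date,
    publication_accepted: Optional[date] = None,
) -> Optional[date]:
    """Latest expected release date for level 2 or 3 data, else None."""
    if level not in (2, 3):
        return None
    # End of the exclusivity period: six calendar months after submission.
    deadline = add_months(submitted, 6)
    # Acceptance of the first publication triggers release if it comes sooner.
    if publication_accepted is not None:
        return min(deadline, publication_accepted)
    return deadline
```

For example, under these assumptions a level 2 file submitted on January 31, 2014 would be released no later than July 31, 2014, or earlier if the first publication were accepted before that date.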
References
\1\ The genome is the entire set of genetic instructions found
in a cell. See https://ghr.nlm.nih.gov/glossary=genome.
\2\ Final NIH Statement on Sharing Research Data. February 26,
2003. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html.
\3\ NIH Intramural Policy on Large Database Sharing. April 5,
2002. See https://sourcebook.od.nih.gov/ethic-conduct/large-db-sharing.htm.
\4\ Policy for Sharing of Data Obtained in NIH Supported or
Conducted Genome-Wide Association Studies (GWAS). August 28, 2007.
See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html.
\5\ A GWAS is defined as any study of genetic variation across
the entire human genome that is designed to identify genetic
associations with observable traits (such as blood pressure or
weight), or the presence or absence of a disease or condition.
\6\ Notice on Development of Data Sharing Policy for Sequence
and Related Genomic Data. October 19, 2009. See https://grants.nih.gov/grants/guide/notice-files/NOT-HG-10-006.html.
\7\ Office of Science and Technology Policy Memorandum,
Expanding Public Access to the Results of Federally Funded Research.
February 22, 2013. See https://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research.
\8\ ``De-identified'' refers to removing information that could
be used to associate a dataset or record with a human individual.
Under this Policy, data should be de-identified according to the
standards set forth in the HHS Regulations for the Protection of
Human Subjects and the Health Insurance Portability and
Accountability Act (HIPAA) Privacy Rule. The HIPAA Privacy Rule
lists 18 identifiers that must be removed to classify data as de-
identified. For the full list,
see https://privacyruleandresearch.nih.gov/pr_08.asp.
\9\ An Institutional Signing Official is generally a senior
official at an institution who is credentialed through the NIH eRA
Commons system and is authorized to enter the institution into a
legally binding contract and sign on behalf of an investigator who
has submitted data or a data access request to the NIH.
\10\ The NIH's mission is to seek fundamental knowledge about
the nature and behavior of living systems and the application of
that knowledge to enhance health, lengthen life, and reduce illness
and disability. See https://www.nih.gov/about/mission.htm.
\11\ Final NIH Statement on Sharing Research Data. February 26,
2003. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html.
\12\ NIH Intramural Policy on Large Database Sharing. April 5,
2002. See https://sourcebook.od.nih.gov/ethic-conduct/large-db-sharing.htm.
\13\ GWAS has the same definition in this policy as in the 2007
GWAS Policy: a study in which the density of genetic markers and the
extent of linkage disequilibrium should be sufficient to capture (by
the r\2\ parameter) a large proportion of the common variation in
the genome of the population under study, and the number of samples
(in a case-control or trio design) should provide sufficient power
to detect variants of modest effect.
\14\ Competing grant applications encompass all activities with
a research component, including but not limited to the following:
Research Grants (Rs), Program Projects (Ps), Cooperative Research
Mechanisms (Us), Career Development Awards (Ks), and SCORs and other
S grants with a research component.
\15\ Investigators should refer to funding announcements or IC
Web sites for contact information.
\16\ NIH Policy on Sharing of Model Organisms for Biomedical
Research. Release Date May 7, 2004. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-04-042.html.
\17\ Gene Expression Omnibus at https://www.ncbi.nlm.nih.gov/geo/.
\18\ Sequence Read Archive at https://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?.
\19\ Trace Archive at https://www.ncbi.nlm.nih.gov/Traces/trace.cgi.
\20\ Array Express at https://www.ebi.ac.uk/arrayexpress/.
\21\ Mouse Genome Informatics at https://www.informatics.jax.org/.
\22\ WormBase at https://www.wormbase.org.
\23\ The Zebrafish Model Organism Database at https://zfin.org/.
\24\ GenBank at https://www.ncbi.nlm.nih.gov/genbank/.
\25\ European Nucleotide Archive at https://www.ebi.ac.uk/ena/.
\26\ DNA Data Bank of Japan at https://www.ddbj.nig.ac.jp/.
\27\ A period for data preparation is anticipated prior to data
submission to the NIH, and the appropriate time intervals for that
data preparation (or data cleaning) will be subject to the
particular data type and project plans (see Appendix A).
Investigators should work with NIH Program or Project Officials for
specific guidance.
\28\ See 45 CFR 46.102(f) at https://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.html#46.102.
\29\ See 45 CFR 164.514(b)(2). The list of HIPAA identifiers
that must be removed is available at: https://www.gpo.gov/fdsys/pkg/CFR-2002-title45-vol1/pdf/CFR-2002-title45-vol1-sec164-514.pdf.
\30\ For additional information about Certificates of
Confidentiality, see https://grants.nih.gov/grants/policy/coc/.
\31\ Confidentiality Certificate. HG-2009-01. Issued to the
National Center for Biotechnology Information, National Library of
Medicine, NIH. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=ConfidentialityCertificate.pdf.
\32\ Database of Genotypes and Phenotypes at https://www.ncbi.nlm.nih.gov/gap.
\33\ Cancer Genomics Hub at https://cghub.ucsc.edu/.
\34\ The 1000 Genomes Project at https://www.1000genomes.org/.
\35\ Points to Consider for IRBs and Institutions in their
Review of Data Submission Plans for Institutional Certifications.
See https://gwas.nih.gov/pdf/PTC_for_IRBs_and_Institutions_revised5-31-11.pdf.
\36\ Clinical specimens are specimens that have been obtained
through clinical practice.
\37\ For the submission of data derived from cell lines or
clinical specimens lacking research consent that were created or
collected before the effective date of this Policy, the
Institutional Certification needs to address only this item.
\38\ For guidance on clearly communicating inappropriate data
uses, see NIH Points to Consider in Drafting Effective Data Use
Limitation Statements, https://gwas.nih.gov/pdf/NIH_PTC_in_Drafting_DUL_Statements.pdf.
\39\ ``Equivalent body'' is used here to acknowledge that some
primary studies may be conducted abroad and in such cases the
expectation is that an analogous review committee to an IRB or
Privacy Board (e.g., Research Ethics Committees) may be asked to
participate in the presubmission review of proposed genomic
projects.
\40\ As noted earlier, for studies using data or specimens
collected before the effective date of this Policy, the IRB or
Privacy Board should review informed consent materials to ensure
that data submission is not inconsistent with the informed consent
provided by the research participants.
\41\ Compilation of Aggregate Genomic Data. dbGaP study
accession: phs000501.v1.p1. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study501.cgi?study_id=phs000501.v1.p1&pha=&phaf=.
\42\ dbGaP Authorized Access. See https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login.
\43\ For a list of NIH Data Access Committees, see https://gwas.nih.gov/04po2_1DAC.html.
\44\ User Code of Conduct. See https://dbgap.ncbi.nlm.nih.gov/aa/GWAS_Code_of_Conduct.html.
\45\ Model Data Use Certification Agreement. See https://gwas.nih.gov/pdf/Model_DUC_7-26-13.pdf.
\46\ Security Best Practices. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=dbgap_2b_security_procedures.pdf.
\47\ In Association for Molecular Pathology et al. v. Myriad
Genetics, Inc., et al. 569 U.S. ------ 2013. See https://www.supremecourt.gov/opinions/12pdf/12-398_1b7d.pdf.
\48\ NIH Best Practices for the Licensing of Genomic Inventions.
See https://www.ott.nih.gov/policy/genomic_invention.html.
\49\ Research Tools Policy. See https://www.ott.nih.gov/policy/research_tool.aspx.
Dated: September 16, 2013.
Lawrence A. Tabak,
Deputy Director, National Institutes of Health.
[FR Doc. 2013-22941 Filed 9-19-13; 8:45 am]