Draft NIH Genomic Data Sharing Policy Request for Public Comments, 57860-57865 [2013-22941]

Federal Register / Vol. 78, No. 183 / Friday, September 20, 2013 / Notices
DEPARTMENT OF HEALTH AND HUMAN SERVICES

National Institutes of Health

Draft NIH Genomic Data Sharing Policy Request for Public Comments

SUMMARY: The National Institutes of Health (NIH) is seeking public comments on the draft Genomic Data Sharing (GDS) Policy that promotes sharing, for research purposes, of large-scale human and nonhuman genomic [1] data generated from NIH-supported and NIH-conducted research.

DATES: To ensure that your comments will be considered, please submit your response to this Request for Comments no later than 60 days after publication of this notice.

ADDRESSES: Submit comments by any of the following methods:
• Online: https://gds.nih.gov/survey.aspx.
• Fax: 301–496–9839.
• Mail/Hand delivery/Courier (for paper, disk, or CD–ROM submissions) to: Genomic Data Sharing Policy Team, Office of Science Policy, National Institutes of Health, 6705 Rockledge Drive, Suite 750, Bethesda, MD 20892.

FOR FURTHER INFORMATION CONTACT: Genomic Data Sharing Policy Team, Office of Science Policy, National Institutes of Health, 6705 Rockledge Drive, Suite 750, Bethesda, MD 20892, 301–496–9838, GDS@mail.nih.gov.

SUPPLEMENTARY INFORMATION:

Background

The NIH’s mission is to seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability. The draft GDS Policy supports this mission by promoting the sharing of genomic research data, which maximizes the knowledge gained.
Not only does data sharing allow data generated from one research study to be used to explore a wide range of additional research questions, it also enables data from multiple projects to be combined, amplifying the scientific value of data many times. Broad research use of the data enhances public benefit by helping to speed discoveries that increase the understanding of biological processes that affect human health and the development of better ways to diagnose, treat, and prevent disease.

The NIH has promoted data sharing for many years, and in 2003, the NIH issued a general policy for sharing research data.[2][3] In 2007, the NIH issued a more specific policy to promote sharing of data generated through genome-wide association studies (GWAS),[4][5] which examine thousands of single nucleotide polymorphisms (SNPs) across the genome to identify genetic variants that contribute to human diseases, conditions, and traits. To facilitate the sharing of genomic and phenotypic data from GWAS, the NIH created the database of Genotypes and Phenotypes (dbGaP) with a two-tiered system for distributing the data: open access, for data that are available to the public without restrictions, and controlled access, for data that are made available only for research purposes that are consistent with the original informed consent under which the data were collected.

Not long after the GWAS Policy was issued, advances in DNA sequencing and other high-throughput technologies, and a steep drop in DNA sequencing costs, enabled the NIH to fund research that generated even greater volumes of GWAS and other types of genomic data. In 2009, the NIH announced [6] its intention to extend the GWAS Policy to encompass data from a wider range of genomic research.
The draft GDS Policy applies to research involving nonhuman genomic data as well as human data that are generated through array-based and high-throughput genomic technologies (e.g., SNP, whole-genome, transcriptomic, epigenomic, and gene expression data). (See section II of the draft Policy.) The NIH considers access to such data particularly important because of the opportunities to accelerate research through the power of combining such large and information-rich datasets. The draft GDS Policy is aligned with Administration priorities and a recent directive to agencies to increase access to digital scientific data resulting from federally funded research.[7]

Overview of the Policy

The draft GDS Policy describes the responsibilities of investigators and institutions for the submission of nonhuman and human genomic data to the NIH (section IV) and the use of controlled-access data (section V). The Policy also provides expectations regarding intellectual property (section VI). When data sharing involves human data, the protection of research participant privacy and confidentiality is paramount, and the Policy reflects the NIH’s continued commitment to responsible data stewardship, which is essential to uphold the public trust in biomedical research. The draft GDS Policy, like the GWAS Policy, includes a number of provisions to protect research participant privacy (see section IV.C). For example, prior to data submission, traditional identifiers such as name, date of birth, street address, and social security number should be removed. The de-identified [8] data are coded using a random, unique code to protect participant privacy.
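The coding step described above can be illustrated with a minimal sketch. It is not an NIH procedure: the record layout, the field names, and the use of Python's `secrets.token_hex` to generate codes are all assumptions made for illustration. Removing these four fields alone would not necessarily meet the de-identification standards the Policy cites (the HIPAA Privacy Rule lists 18 identifiers); the sketch only shows the general pattern of stripping identifiers, assigning a random, unique code, and keeping the key separate, which section IV.C.1 says is held by the submitting institution.

```python
import secrets

# Assumption: hypothetical field names. The notice names the identifier
# types (name, date of birth, street address, social security number)
# but does not define a record layout.
TRADITIONAL_IDENTIFIERS = {
    "name",
    "date_of_birth",
    "street_address",
    "social_security_number",
}


def deidentify(records):
    """Return (coded_records, key) for a list of dict records.

    coded_records: copies of the input with the traditional identifiers
    removed and a random, unique participant code added.
    key: maps each code to the removed identifiers. Under the draft
    Policy the key stays with the submitting institution.
    """
    coded_records = []
    key = {}
    for record in records:
        code = secrets.token_hex(8)
        while code in key:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(8)
        key[code] = {
            field: value
            for field, value in record.items()
            if field in TRADITIONAL_IDENTIFIERS
        }
        coded = {
            field: value
            for field, value in record.items()
            if field not in TRADITIONAL_IDENTIFIERS
        }
        coded["participant_code"] = code
        coded_records.append(coded)
    return coded_records, key
```

In this sketch only `coded_records` would be prepared for submission; `key` is returned separately so that it can be stored under the institution's own controls.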
The NIH also maintains the expectation established under the GWAS Policy that the responsible Institutional Signing Official [9] of the submitting institution should provide an Institutional Certification to the funding NIH Institute or Center prior to award. An Institutional Certification assures that the data have been or will be collected in a legal and ethically appropriate manner and have been de-identified. The draft GDS Policy clarifies the provisions of the Institutional Certification for datasets submitted to NIH-designated data repositories in section IV.C.5. The NIH expects the Policy to be effective 60 days after the publication of the final Policy.

Request for Comments

As part of the process of developing the GDS Policy, the NIH encourages the public to provide comments on any aspect of the draft GDS Policy. Comments should be submitted electronically to https://gds.nih.gov/survey.aspx. Comments may also be submitted by fax (301–496–9839), or mailed to the Genomic Data Sharing Policy Team, Office of Science Policy, National Institutes of Health, 6705 Rockledge Drive, Suite 750, Bethesda, MD 20892. Responding to this request for comments is voluntary. Submitted comments are considered public information; do not include any information that you wish to remain private and confidential. Comments in their entirety will be posted along with the submitter’s name and affiliation on the NIH GDS Web site after the public comment period closes. Commenters will receive a confirmation acknowledging receipt of comments but will not receive individual feedback on any suggestions. Please note that the government will not pay for the use of any information contained in the response.

The NIH intends to hold one or more public webinars on the draft Policy. Information about the webinars will be made available at https://gds.nih.gov.

Draft NIH Genomic Data Sharing Policy

I. Purpose

The draft Genomic Data Sharing (GDS) Policy sets forth expectations that ensure the broad and responsible sharing of genomic research data. Sharing research data supports the NIH mission [10] and is essential to facilitate the translation of research results into knowledge, products, and procedures that improve human health. The NIH has longstanding policies to make data publicly available in a timely manner from the research activities that it funds.[11][12]

II. Scope and Applicability

This Policy applies to all NIH-funded research that involves large-scale human and nonhuman genomic data produced by array-based or high-throughput genomic technologies, such as GWAS [13] SNP, whole-genome, transcriptomic, epigenomic, and gene expression data, irrespective of funding level and funding mechanism (i.e., grant, contract, or intramural support). Appendix A provides examples of research that are subject to the Policy. At appropriate intervals, the NIH will review the types of research to which this Policy may be applicable, and changes to the scope will be defined in supplementary materials to the final GDS Policy. Notification of any changes will be provided to investigators and institutions through standard NIH communication channels (e.g., NIH Guide for Grants and Contracts).

Compliance with this Policy will become a special term and condition in the Notice of Award or the Contract Award. Failure to comply with the terms and conditions of the funding agreement could lead to enforcement actions, including the withholding of funding, consistent with 45 CFR 74.62 and/or other authorities, as appropriate.

III. Effective Date

The effective date of this Policy is [To Be Determined], and pertains to the following funding mechanisms:
• Competing grant applications [14] that are submitted to the NIH as of the [TBD] receipt date;
• Proposals for contracts that are submitted to the NIH as of [TBD]; and
• NIH intramural research projects that are approved as of [TBD].

IV. Responsibilities of Investigators Submitting Genomic Data

A. Data Sharing Plans

Investigators seeking NIH funding should contact appropriate Institute or Center (IC) Program or Project Officials [15] as early as possible to discuss data sharing expectations and timelines that would apply to their proposed studies. Investigators and their institutions are expected to address plans for following this Policy in the data sharing section of funding applications and proposals. Any resources needed to support a proposed data sharing plan should be included in the project’s budget. NIH intramural investigators are expected to address data sharing plans with their IC scientific leadership prior to initiating applicable research and are encouraged to contact their IC leadership or the Office of Intramural Research for guidance.

B. Nonhuman and Model Organism Genomic Data

1. Data Submission Expectations and Timeline

Nonhuman data (including microbial and microbiome data) and data from large-scale genomic projects for model organisms [16] are to be shared in a timely manner. Investigators should make nonhuman and model organism data publicly available no later than the date of initial publication. However, certain data types or NIH research initiatives may expect an earlier data release (e.g., microbial or microbiome data, or projects with broad utility as a resource for the scientific community). (See Appendix A for specific expectations for data submission and release.)

2. Data Repositories

Data should be made available through any widely used data repository, whether NIH-funded or not, such as the Gene Expression Omnibus (GEO),[17] Sequence Read Archive (SRA),[18] Trace Archive,[19] ArrayExpress,[20] Mouse Genome Informatics (MGI),[21] WormBase,[22] the Zebrafish Model Organism Database (ZFIN),[23] GenBank,[24] European Nucleotide Archive (ENA),[25] or DNA Data Bank of Japan (DDBJ).[26]

C. Human Genomic Data

1. Data Submission Expectations and Timeline

Guidance to govern human genomic data submission timelines and data release expectations is provided in Appendix A. The NIH will release data submitted to NIH-designated data repositories without restrictions on publication or other dissemination no later than six months after the initial data submission to an NIH-designated data repository,[27] or at the time of acceptance of the first publication, whichever occurs first.

Human data that are submitted to NIH-designated data repositories should be de-identified according to the standards set forth in the HHS Regulations for the Protection of Human Subjects [28] and the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.[29] The de-identified data should be assigned a random, unique code, and the key should be held by the submitting institution.

The NIH encourages researchers and institutions submitting large-scale genomic datasets to NIH-designated data repositories to consider whether a Certificate of Confidentiality could serve as an additional safeguard to prevent compelled disclosure of any personally identifiable information that they may hold.[30] The NIH has obtained a Certificate of Confidentiality for dbGaP.[31]

2. Data Repositories

Applicable studies with human genomic data should be registered in the database of Genotypes and Phenotypes (dbGaP) [32] no later than the time that data cleaning and quality control measures begin.
Investigators should submit human data to the relevant NIH-designated data repository (e.g., dbGaP, GEO, SRA, the Cancer Genomics Hub [33]). NIH-designated data repositories need not be the exclusive source for facilitating the sharing of genomic data. Investigators who elect to submit data to a non-NIH-designated data repository should confirm that appropriate data security, confidentiality, and privacy measures are in place.

3. Tiered System for the Distribution of Human Data

Respect for and protection of the interests of research participants is fundamental to the NIH’s stewardship of human genomic data. The informed consent under which the data or samples were collected is the basis for the submitting institution to determine the appropriateness of data submission to NIH-designated data repositories, and whether the data should be available through open or controlled access. Controlled-access data in NIH-designated data repositories are made available for secondary research only after investigators have obtained approval from the NIH to use the requested data for a particular project. Open-access data are publicly available without restriction (e.g., The 1000 Genomes Project [34]).

4. Informed Consent

Submitting institutions, through their Institutional Review Boards (IRBs), are to review the informed consent materials for studies that are to be submitted to NIH-designated data repositories to determine whether the data are appropriate for sharing for secondary research use. Specific considerations may vary with the type of study and whether the data are obtained through prospective or retrospective data collections.
The NIH provides additional information on issues related to the respect for research participant interests in its Points To Consider for IRBs and Institutions in Their Review of Data Submission Plans for Institutional Certifications.[35] This and other policy-related documents will be updated once the Policy is final.

For studies initiated after the effective date of this Policy, the NIH expects the informed consent process and documents to state that a participant’s genomic and phenotypic data may be shared broadly for future research purposes and also explain whether the data will be shared through open or controlled access. If human genomic data are to be shared in open-access repositories, the NIH expects that participants will have provided explicit consent for sharing their data through open-access mechanisms. For studies proposing to use cell lines or clinical specimens,[36] the NIH expects that informed consent for future research use and broad data sharing will have been obtained even if the cell lines or clinical specimens are de-identified. If there are compelling scientific reasons that necessitate the use of cell lines or clinical specimens that were created or collected after the effective date of this Policy and that lack consent for research use and data sharing, investigators should provide a justification for the use of any such materials in the funding request.

For studies using data or specimens collected before the effective date of this Policy, there may be considerable variation in the extent to which data sharing and future genomic research were addressed within the informed consent materials for the primary research. In these cases, an assessment by an IRB, Privacy Board, or equivalent group is essential to ensure that data submission is not inconsistent with the informed consent provided by the research participant.
The NIH will accept data derived from cell lines or clinical specimens lacking consent for research use that were created or collected before the effective date of this Policy. Grandfathered genomic data that are currently available through open access may be submitted to an open-access NIH-designated data repository; otherwise, the data should be submitted to a controlled-access NIH-designated data repository.

While the NIH encourages broad access to genomic data, in some circumstances broad sharing may be inconsistent with the informed consent of the research participants whose data are included in the dataset. In such circumstances, institutions planning to submit aggregate- or individual-level data to the NIH for controlled access should note any data use limitations in the data sharing or data management plan submitted as part of the funding request. These data use limitations should be specified in the Institutional Certification submitted to the NIH prior to award.

5. Institutional Certification

The responsible Institutional Signing Official of the submitting institution should provide an Institutional Certification to the funding IC prior to award.
The Institutional Certification should indicate whether the data will be submitted to an open- or controlled-access database and assure that:
• The data submission is consistent with applicable laws, regulations, and institutional policies;[37]
• The appropriate research uses of the data and any uses that are specifically excluded in the informed consent documents are delineated;[38]
• The identities of research participants will not be disclosed to NIH-designated data repositories; and
• An IRB, Privacy Board, and/or equivalent body [39] has reviewed the investigator’s proposal for data submission and assures that:
  ○ The protocol for the collection of genomic and phenotypic data was consistent with 45 CFR part 46;
  ○ Data submission and subsequent data sharing for research purposes are consistent with the informed consent of study participants from whom the data were obtained;[40]
  ○ Risks to individuals and their families associated with data submitted to NIH-designated data repositories were considered;
  ○ To the extent relevant and possible, risks to groups or populations associated with data submitted to NIH-designated data repositories were considered; and
  ○ The investigator’s plan for de-identifying datasets is consistent with the standards outlined in this Policy (see section IV.C.1.).

Institutions should indicate in the certification whether aggregate genomic data from datasets with data use limitations may be appropriate for general research use (i.e., use for any research question such as research to understand the biological mechanisms underlying disease, development of statistical research methods, the study of population origins). If so, the aggregate genomic data will be made available through the controlled-access compilation of aggregate genomic data [41] to facilitate secondary research.

6. Data Withdrawal

Submitting investigators and their institutions may request removal of data on individual participants from NIH-designated data repositories in the event that a research participant withdraws his or her consent. However, data that have been distributed for approved research use cannot be retrieved.

7. Exceptions to Data Submission Expectations

The NIH acknowledges that in some cases, circumstances beyond the control of investigators may preclude submission of data to NIH-designated data repositories (e.g., country or state laws that prohibit data submission to a U.S. federal database). In such cases, investigators should provide a justification for any exceptions requested in the application or proposal. The funding IC may grant an exception to the submission of relevant data to the NIH, and the investigator would be expected to develop a plan to share data through other mechanisms. For transparency purposes, when exceptions are granted, studies will still be registered in dbGaP and the reason for the exception will be included in the registration record. Information about current expectations for exception requests will be made available on the GDS Web site.

V. Responsibilities of Investigators Accessing and Using Genomic Data

A. Requests for Controlled-Access Data

Access to human data is through a two-tiered model involving open- and controlled-data access mechanisms. Requests for controlled-access data [42] are reviewed by NIH Data Access Committees (DACs).[43] DAC decisions are based primarily upon conformance of the proposed research as described in the access request to the data use limitations established by the submitting institution through the Institutional Certification. The NIH DACs will accept requests for proposed research uses beginning one month prior to the anticipated data release date.
The access period for all controlled-access data is one year; at the end of each approved period, data users can request an additional year of access or close out the project.

Investigators approved to download controlled-access data from NIH-designated data repositories and their institutions are expected to abide by the NIH User Code of Conduct [44] through their agreement to the Data Use Certification.[45] The Data Use Certification, co-signed by the investigators requesting the data and their Institutional Signing Official, specifies the terms and conditions for the secondary research use of controlled-access data, such as:
• Using the data only for the approved research;
• Protecting data confidentiality;
• Following all applicable laws, regulations, and local institutional policies and procedures for handling genomic data;
• Not attempting to identify individual participants from whom the data were obtained;
• Not selling any of the data obtained from the NIH-designated data repositories;
• Not sharing any of the data obtained from the NIH-designated data repositories with individuals other than those listed in the data access request;
• Agreeing to the listing of a summary of approved research uses in dbGaP along with the investigator’s name and organizational affiliation;
• Agreeing to report, in real time, violations of the GDS Policy to the appropriate DAC; and
• Providing annual updates on research using controlled-access datasets.

For investigators who are approved to use the data, the NIH maintains guidance on security practices [46] that outlines expected data security protections (e.g., physical security measures and user training) to ensure that the data are kept secure and not released to any person not permitted to access the data.

B. Acknowledgment Responsibilities

The NIH expects all investigators who access genomic datasets from NIH-designated data repositories to acknowledge in all resulting oral or written presentations, disclosures, or publications the contributing investigator(s) who conducted the original study, the funding organization(s) that supported the work, the specific dataset(s) and applicable accession number(s), and the NIH-designated data repositories through which the investigator accessed any data.

VI. Intellectual Property

Naturally occurring DNA sequences are not patentable in the United States.[47] Therefore, basic sequence data and certain related information (e.g., genotypes, haplotypes, p values, allele frequencies) are pre-competitive, and such data made available through NIH-designated data repositories and all conclusions derived directly from them should remain freely available, without any licensing requirements, for uses such as markers for developing assays and guides for identifying new potential targets for drugs, therapeutics, and diagnostics. In addition, the NIH discourages the use of patents to prevent the use of or block access to genomic or genotype-phenotype data developed with NIH support. The NIH encourages broad use of NIH-funded genomic data that is consistent with a responsible approach to management of intellectual property derived from downstream discoveries, as outlined in the NIH Best Practices for the Licensing of Genomic Inventions [48] and Research Tools Policy.[49] The NIH encourages patenting of technology suitable for subsequent private investment that may lead to the development of products that address public needs.

Appendix A

Supplemental Information for the NIH Genomic Data Sharing Policy

Overview

This document provides additional guidance on the types of research projects to which the Genomic Data Sharing (GDS) Policy applies and the NIH’s expectations for data submission and release.
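Two of the rules stated in this appendix are numeric and can be sketched in code: the example thresholds for covered research, and the release rule of "up to 6 months after data submission or at the time of acceptance of the first publication, whichever occurs first." The sketch below is illustrative only and rests on several assumptions: the function and parameter names are hypothetical, the listed examples are treated as independent sufficient conditions, and "six months" is counted in calendar months with the day clamped to the end of the target month, which the notice does not specify. The example about "tens of isolates" is left out because it gives no exact number.

```python
import calendar
from datetime import date


def matches_listed_example(genes_or_regions=0, variant_sites=0, participants=0):
    """True if a human-data project matches one of the numeric examples.

    Assumption: each listed example is treated as sufficient on its own.
    The listed examples are illustrations, not an exhaustive test.
    """
    return (
        # more than one gene or gene-sized region in more than 100 participants
        (genes_or_regions > 1 and participants > 100)
        # more than 10,000 genes or regions from one participant
        or (genes_or_regions > 10_000 and participants >= 1)
        # more than 100,000 variant sites in more than 100 participants
        or (variant_sites > 100_000 and participants > 100)
    )


def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_release_date(submitted, first_publication_accepted=None):
    """Six months after submission or first acceptance, whichever is first."""
    six_month_limit = add_months(submitted, 6)
    if first_publication_accepted is None:
        return six_month_limit
    return min(six_month_limit, first_publication_accepted)
```

For example, under these assumptions a dataset submitted on September 20, 2013 with no accepted publication would have a latest release date of March 20, 2014; an acceptance on December 1, 2013 would move that date forward to December 1, 2013.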
Examples of Types of Research Covered Under the GDS Policy The GDS Policy is applicable to any NIHfunded research project involving nonhuman organisms or human specimens that produces genomic, metagenomic, epigenomic, or transcriptomic data from large-output sequencing instruments or genotyping platforms, such as projects that involve: • Sequence data from tens of isolates from infectious organisms. • Sequencing more than one gene or genesized region in more than 100 participants. • More than 10,000 genes or regions from one participant (e.g., whole genome sequencing). • More than 100,000 variant sites in more than 100 participants. Expectations for Data Submission and Data Release Data submitted to NIH-designated data repositories undergo different levels of data processing, and the expectations for data submission and data release are based on those levels. The table and text below describe the expectations for each level. The NIH will review these expectations at regular intervals, and any updates will be published on the GDS Web site and the research community will be notified through appropriate communication methods (e.g., The NIH Guide for Grants and Contracts). E:\FR\FM\20SEN1.SGM 20SEN1 57864 Federal Register / Vol. 78, No. 183 / Friday, September 20, 2013 / Notices Level General description of data processing Example data types Data submission expectation 0 ............... Raw data generated directly from the instrument platform. Initial sequence reads, the most fundamental form of the data after the basic translation of raw input. Instrument image data ............ Not expected ........................... NA. DNA sequencing reads, ChIPSeq reads, RNA-Seq reads, SNP arrays, arrayCGH. Not expected for human data if reads are included in Level 2 aligned sequence file (e.g., BAM). Nonhuman de novo sequence data. Project specific, generally within 3 months after data generation. NA. 1 ............... 2 ............... 3 ............... 
mstockstill on DSK4VPTVN1PROD with NOTICES 4 ............... Data after an initial round of analysis or computation to clean the data and assess basic quality measures. Analysis to identify genetic variants, gene expression patterns, or other features of the dataset. Final analysis that relates the genomic data to phenotype or other biological states. Level 0 and level 1 data are the raw images and initial sequence reads, respectively, and have limited value to secondary data users. NIH policy does not expect submission of these data. An exception is made for de novo sequencing of nonhuman organisms unless those read data are provided within the level 2 submission. In the case of de novo sequencing for nonhuman organisms, investigators who are submitting level 1 data may request a holding period, not to exceed six months, during which the datasets will not be released for use by other investigators. For data submitted to NIH-designated data repositories, provisions may be made for creating an exchange area in which such datasets may be shared among investigative teams prior to general release. Submission of array-based data, such as gene expression, ChIP-chip, ArrayCGH, and SNP arrays can be submitted to GEO as level 1 data, which will not be accessible until a manuscript describing the data is published. It is the submitter’s responsibility to ensure that the data and files submitted to GEO protect participant privacy in accordance with all applicable laws, regulations, and institutional policies, including the GDS Policy. Level 2 constitutes a computational analysis in the form of higher order assembly or placement of the sequencing reads on a reference template. For human sequencing projects, the level 2 file comprises the reads ‘‘piled’’ on a reference human genome. A submission would be a file (e.g., binary alignment matrix (BAM) files) usually containing the unmapped reads as well. 
GWAS and other types of projects (e.g., RNA expression profiling or de novo sequencing) would also generate a level 2 placement or assembly file. Generation of data files at level 2 generally requires substantial analysis and quality checks relating to both breadth of coverage of the targeted region and accuracy of assembly. Sufficient time will be allowed to complete the analysis and generate the assembly, up to the coverage and quality thresholds specified by a project or investigative team. In general, it is anticipated that this work could VerDate Mar<15>2010 17:24 Sep 19, 2013 Jkt 229001 DNA sequence alignments to a reference sequence or de novo assembly, RNA expression profiling. SNP or structural variant calls, expression peaks, epigenomic features. Genotype-phenotype relationships, relationships of RNA expression or epigenomic patterns to biological state. Project specific, generally within 3 months after data generation. Data submitted as analyses are completed. reasonably be completed within three months, and data submission would follow shortly thereafter. Data files may be held in an exchange area accessible only to the submitting investigators and collaborators for a period not to exceed six months from the time of submission. Following this period of exclusivity, the data will be available for research access without restrictions on publication. Phenotype or clinical data should be submitted to the NIH-designated data repository at the earliest opportunity, but no later than the date of level 2 genomic data submission (or levels 2 and 3 for GWAS datasets), especially for studies in which all phenotype data have already been gathered. For studies in which phenotype data collections are ongoing and/or may be regularly updated, data files should be submitted to NIH-designated data repositories as early as possible considering the practical needs for ensuring data accuracy; generally speaking, this time should not exceed six months after data collection. 
Level 3 includes analysis to identify variants or to elucidate other features of the genomic dataset, such as gene expression patterns in an RNAseq assay. Level 3 data may be generated from a single level 2 data file (e.g., variant sites versus the human reference genome), but will often derive from a compilation of sequencing assemblies (e.g., in a genome study of a specific cancer type). Data submission expectations for level 3 files will vary substantially by project and therefore will require consultation with NIH program staff. As in level 2 data submission, level 3 files will be date stamped and the data producer may request a period of exclusivity not to exceed six months, after which time the datasets will be released through open- or controlled-access mechanisms as appropriate and without publication limitations.
Level 4 constitutes the final analysis, relating the genomic datasets to phenotype or other biological states as pertinent to the research objective. Data in this level are the project findings or the publication dataset. Investigators should submit these data prior to publication, and the data will be released concurrent with publication.
Data release timeline
Up to 6 months for nonhuman data.
Up to 6 months after data submission or at the time of acceptance of the first publication, whichever occurs first.
Up to 6 months after data submission or at the time of acceptance of the first publication, whichever occurs first.
Data released with publication.

References

\1\ The genome is the entire set of genetic instructions found in a cell. See https://ghr.nlm.nih.gov/glossary=genome.
\2\ Final NIH Statement on Sharing Research Data. February 26, 2003. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html.
\3\ NIH Intramural Policy on Large Database Sharing. April 5, 2002. See https://sourcebook.od.nih.gov/ethic-conduct/largedb-sharing.htm.
\4\ Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS). August 28, 2007. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html.
\5\ A GWAS is defined as any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits (such as blood pressure or weight), or the presence or absence of a disease or condition.
\6\ Notice on Development of Data Sharing Policy for Sequence and Related Genomic Data. October 19, 2009. See https://grants.nih.gov/grants/guide/notice-files/NOT-HG-10-006.html.
\7\ Office of Science and Technology Policy Memorandum, Expanding Public Access to the Results of Federally Funded Research. February 22, 2013. See https://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research.
\8\ "De-identified" refers to removing information that could be used to associate a dataset or record with a human individual. Under this Policy, data should be de-identified according to the standards set forth in the HHS Regulations for the Protection of Human Subjects and the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. The HIPAA Privacy Rule lists 18 identifiers that must be removed to classify data as de-identified. For the full list, see https://privacyruleandresearch.nih.gov/pr_08.asp.
\9\ An Institutional Signing Official is generally a senior official at an institution who is credentialed through the NIH eRA Commons system and is authorized to enter the institution into a legally binding contract and sign on behalf of an investigator who has submitted data or a data access request to the NIH.
\10\ The NIH's mission is to seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability. See https://www.nih.gov/about/mission.htm.
\11\ Final NIH Statement on Sharing Research Data. February 26, 2003. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html.
\12\ NIH Intramural Policy on Large Database Sharing. April 5, 2002. See https://sourcebook.od.nih.gov/ethic-conduct/largedb-sharing.htm.
\13\ GWAS has the same definition in this policy as in the 2007 GWAS Policy: a study in which the density of genetic markers and the extent of linkage disequilibrium should be sufficient to capture (by the r^2 parameter) a large proportion of the common variation in the genome of the population under study, and the number of samples (in a case-control or trio design) should provide sufficient power to detect variants of modest effect.
\14\ Competing grant applications encompass all activities with a research component, including but not limited to the following: Research Grants (Rs), Program Projects (Ps), Cooperative Research Mechanisms (Us), Career Development Awards (Ks), and SCORs and other S grants with a research component.
\15\ Investigators should refer to funding announcements or IC Web sites for contact information.
\16\ NIH Policy on Sharing of Model Organisms for Biomedical Research. Release Date May 7, 2004. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-04-042.html.
\17\ Gene Expression Omnibus at https://www.ncbi.nlm.nih.gov/geo/.
\18\ Sequence Read Archive at https://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?.
\19\ Trace Archive at https://www.ncbi.nlm.nih.gov/Traces/trace.cgi.
\20\ Array Express at https://www.ebi.ac.uk/arrayexpress/.
\21\ Mouse Genome Informatics at https://www.informatics.jax.org/.
\22\ WormBase at https://www.wormbase.org.
\23\ The Zebrafish Model Organism Database at https://zfin.org/.
\24\ GenBank at https://www.ncbi.nlm.nih.gov/genbank/.
\25\ European Nucleotide Archive at https://www.ebi.ac.uk/ena/.
\26\ DNA Data Bank of Japan at https://www.ddbj.nig.ac.jp/.
\27\ A period for data preparation is anticipated prior to data submission to the NIH, and the appropriate time intervals for that data preparation (or data cleaning) will be subject to the particular data type and project plans (see Appendix A). Investigators should work with NIH Program or Project Officials for specific guidance.
\28\ See 45 CFR 46.102(f) at https://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.html#46.102.
\29\ See 45 CFR 164.514(b)(2). The list of HIPAA identifiers that must be removed is available at: https://www.gpo.gov/fdsys/pkg/CFR-2002-title45-vol1/pdf/CFR-2002-title45-vol1-sec164-514.pdf.
\30\ For additional information about Certificates of Confidentiality, see https://grants.nih.gov/grants/policy/coc/.
\31\ Confidentiality Certificate. HG–2009–01. Issued to the National Center for Biotechnology Information, National Library of Medicine, NIH. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=ConfidentialityCertificate.pdf.
\32\ Database of Genotypes and Phenotypes at https://www.ncbi.nlm.nih.gov/gap.
\33\ Cancer Genomics Hub at https://cghub.ucsc.edu/.
\34\ The 1000 Genomes Project at https://www.1000genomes.org/.
\35\ Points to Consider for IRBs and Institutions in their Review of Data Submission Plans for Institutional Certifications. See https://gwas.nih.gov/pdf/PTC_for_IRBs_and_Institutions_revised5-3111.pdf.
\36\ Clinical specimens are specimens that have been obtained through clinical practice.
\37\ For the submission of data derived from cell lines or clinical specimens lacking research consent that were created or collected before the effective date of this Policy, the Institutional Certification needs to address only this item.
\38\ For guidance on clearly communicating inappropriate data uses, see NIH Points to Consider in Drafting Effective Data Use Limitation Statements, https://gwas.nih.gov/pdf/NIH_PTC_in_Drafting_DUL_Statements.pdf.
\39\ "Equivalent body" is used here to acknowledge that some primary studies may be conducted abroad and in such cases the expectation is that an analogous review committee to an IRB or Privacy Board (e.g., Research Ethics Committees) may be asked to participate in the presubmission review of proposed genomic projects.
\40\ As noted earlier, for studies using data or specimens collected before the effective date of this Policy, the IRB or Privacy Board should review informed consent materials to ensure that data submission is not inconsistent with the informed consent provided by the research participants.
\41\ Compilation of Aggregate Genomic Data. dbGaP study accession: phs000501.v1.p1. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study501.cgi?study_id=phs000501.v1.p1&pha=&phaf=.
\42\ dbGaP Authorized Access. See https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login.
\43\ For a list of NIH Data Access Committees, see https://gwas.nih.gov/04po2_1DAC.html.
\44\ User Code of Conduct. See https://dbgap.ncbi.nlm.nih.gov/aa/GWAS_Code_of_Conduct.html.
\45\ Model Data Use Certification Agreement. See https://gwas.nih.gov/pdf/Model_DUC_726-13.pdf.
\46\ Security Best Practices. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=dbgap_2b_security_procedures.pdf.
\47\ In Association for Molecular Pathology et al. v. Myriad Genetics, Inc., et al. 569 U.S. ___ 2013. See https://www.supremecourt.gov/opinions/12pdf/12-398_1b7d.pdf.
\48\ NIH Best Practices for the Licensing of Genomic Inventions. See https://www.ott.nih.gov/policy/genomic_invention.html.
\49\ Research Tools Policy. See https://www.ott.nih.gov/policy/research_tool.aspx.

Dated: September 16, 2013.
Lawrence A.
Tabak, Deputy Director, National Institutes of Health.
[FR Doc. 2013–22941 Filed 9–19–13; 8:45 am]
BILLING CODE 4140–01–P


[Federal Register Volume 78, Number 183 (Friday, September 20, 2013)]
[Notices]
[Pages 57860-57865]
From the Federal Register Online via the Government Printing Office [www.gpo.gov]
[FR Doc No: 2013-22941]


-----------------------------------------------------------------------

DEPARTMENT OF HEALTH AND HUMAN SERVICES

National Institutes of Health


Draft NIH Genomic Data Sharing Policy Request for Public Comments

SUMMARY: The National Institutes of Health (NIH) is seeking public 
comments on the draft Genomic Data Sharing (GDS) Policy that promotes 
sharing, for research purposes, of large-scale human and nonhuman 
genomic \1\ data generated from NIH-supported and NIH-conducted 
research.

DATES: To ensure that your comments will be considered, please submit 
your response to this Request for Comments no later than 60 days after 
publication of this notice.

ADDRESSES: Submit comments by any of the following methods:
     Online: https://gds.nih.gov/survey.aspx.
     Fax: 301-496-9839.
     Mail/Hand delivery/Courier (for paper, disk, or CD-ROM 
submissions) to: Genomic Data Sharing Policy Team, Office of Science 
Policy, National Institutes of Health, 6705 Rockledge Drive, Suite 750, 
Bethesda, MD 20892.

FOR FURTHER INFORMATION CONTACT: Genomic Data Sharing Policy Team, 
Office of Science Policy, National Institutes of Health, 6705 Rockledge 
Drive, Suite 750, Bethesda, MD 20892, 301-496-9838, GDS@mail.nih.gov.

SUPPLEMENTARY INFORMATION: 

Background

    The NIH's mission is to seek fundamental knowledge about the nature 
and behavior of living systems and the application of that knowledge to 
enhance health, lengthen life, and reduce illness and disability. The 
draft GDS Policy supports this mission by promoting the sharing of 
genomic research data, which maximizes the knowledge gained. Not only 
does data sharing allow data generated from one research study to be 
used to explore a wide range of additional research questions, it also 
enables data from multiple projects to be combined, amplifying the 
scientific value of data many times. Broad research use of the data 
enhances public benefit by helping to speed discoveries that increase 
the understanding of biological processes that affect human health and 
the development of better ways to diagnose, treat, and prevent disease.
    The NIH has promoted data sharing for many years, and in 2003, the 
NIH issued a general policy for sharing research data.\2\ \3\ In 
2007, the NIH issued a more specific policy to promote sharing of data 
generated through genome-wide association studies (GWAS),\4\ \5\ 
which examine thousands of single nucleotide polymorphisms (SNPs) 
across the genome to identify genetic variants that contribute to human 
diseases, conditions, and traits. To facilitate the sharing of genomic 
and phenotypic data from GWAS, the NIH created the database of 
Genotypes and Phenotypes (dbGaP) with a two-tiered system for 
distributing the data: open access, for data that are available to the 
public without restrictions, and controlled access, for data that are 
made available only for research purposes that are consistent with the 
original informed consent under which the data were collected.
    Not long after the GWAS policy was issued, advances in DNA 
sequencing and other high-throughput technologies, and a steep drop in 
DNA sequencing costs, enabled the NIH to fund research that generated 
even greater volumes of GWAS and other types of genomic data. In 2009, 
the NIH announced \6\ its intention to extend the GWAS Policy 
to encompass data from a wider range of genomic research.
    The draft GDS Policy applies to research involving nonhuman genomic 
data as well as human data that are generated through array-based and 
high-throughput genomic technologies (e.g., SNP, whole-genome, 
transcriptomic, epigenomic, and gene expression data). (See section II 
of the draft Policy.) The NIH considers access to such data 
particularly important because of the opportunities to accelerate 
research through the power of combining such large and information-rich 
datasets. The draft GDS Policy is aligned with Administration 
priorities and a recent directive to agencies to increase access to 
digital scientific data resulting from federally funded 
research.\7\

Overview of the Policy

    The draft GDS Policy describes the responsibilities of 
investigators and institutions for the submission of nonhuman and human 
genomic data to the NIH (section IV) and the use of controlled-access 
data (section V). The Policy also provides expectations regarding 
intellectual property (section VI).
    When data sharing involves human data, the protection of research 
participant privacy and confidentiality is paramount, and the Policy 
reflects the NIH's continued commitment to responsible data 
stewardship, which is essential to uphold the public trust in 
biomedical research. The draft GDS Policy, like the GWAS Policy, 
includes a number of provisions to protect

[[Page 57861]]

research participant privacy (see section IV.C). For example, prior to 
data submission, traditional identifiers such as name, date of birth, 
street address, and social security number should be removed. The 
de-identified \8\ data are coded using a random, unique code to 
protect participant privacy. The NIH also maintains the expectation 
established under the GWAS Policy that the responsible Institutional 
Signing Official \9\ of the submitting institution should 
provide an Institutional Certification to the funding NIH Institute or 
Center prior to award. An Institutional Certification assures that the 
data have been or will be collected in a legal and ethically 
appropriate manner and have been de-identified. The draft GDS Policy 
clarifies the provisions of the Institutional Certification for 
datasets submitted to NIH-designated data repositories in Section 
IV.C.5.
    The NIH expects the Policy to be effective 60 days after the 
publication of the final Policy.

Request for Comments

    As part of the process of developing the GDS Policy, the NIH 
encourages the public to provide comments on any aspect of the draft 
GDS Policy.
    Comments should be submitted electronically to https://gds.nih.gov/survey.aspx. Comments may also be submitted by fax (301-496-9839), or 
mailed to the Genomic Data Sharing Policy Team, Office of Science 
Policy, National Institutes of Health, 6705 Rockledge Drive, Suite 750, 
Bethesda, MD 20892.
    Responding to this request for comments is voluntary. Submitted 
comments are considered public information; do not include any 
information that you wish to remain private and confidential. Comments 
in their entirety will be posted along with the submitter's name and 
affiliation on the NIH GDS Web site after the public comment period 
closes. Commenters will receive a confirmation acknowledging receipt of 
comments but will not receive individual feedback on any suggestions. 
Please note that the government will not pay for the use of any 
information contained in the response.
    The NIH intends to hold one or more public webinars on the draft 
Policy. Information about the webinars will be made available at https://gds.nih.gov.

Draft NIH Genomic Data Sharing Policy

I. Purpose

    The draft Genomic Data Sharing (GDS) Policy sets forth expectations 
that ensure the broad and responsible sharing of genomic research data. 
Sharing research data supports the NIH mission \10\ and is 
essential to facilitate the translation of research results into 
knowledge, products, and procedures that improve human health. The NIH 
has longstanding policies to make data publicly available in a timely 
manner from the research activities that it funds.\11\ \12\

II. Scope and Applicability

    This Policy applies to all NIH-funded research that involves 
large-scale human and nonhuman genomic data produced by array-based or 
high-throughput genomic technologies, such as GWAS,\13\ SNP, 
whole-genome, transcriptomic, epigenomic, and gene expression data, 
irrespective of funding level and funding mechanism (i.e., grant, 
contract, or intramural support). Appendix A provides examples of 
research that are subject to the Policy. At appropriate intervals, the 
NIH will review the types of research to which this Policy may be 
applicable, and changes to the scope will be defined in supplementary 
materials to the final GDS Policy. Notification of any changes will be 
provided to investigators and institutions through standard NIH 
communication channels (e.g., NIH Guide for Grants and Contracts).
    Compliance with this Policy will become a special term and 
condition in the Notice of Award or the Contract Award. Failure to 
comply with the terms and conditions of the funding agreement could 
lead to enforcement actions, including the withholding of funding, 
consistent with 45 CFR 74.62 and/or other authorities, as appropriate.

III. Effective Date

    The effective date of this Policy is [To Be Determined], and 
pertains to the following funding mechanisms:
     Competing grant applications \14\ that are 
submitted to the NIH as of the [TBD] receipt date;
     Proposals for contracts that are submitted to the NIH as 
of [TBD]; and
     NIH intramural research projects that are approved as of 
[TBD].

IV. Responsibilities of Investigators Submitting Genomic Data

A. Data Sharing Plans
    Investigators seeking NIH funding should contact appropriate 
Institute or Center (IC) Program or Project Officials \15\ as 
early as possible to discuss data sharing expectations and timelines 
that would apply to their proposed studies. Investigators and their 
institutions are expected to address plans for following this Policy in 
the data sharing section of funding applications and proposals. Any 
resources needed to support a proposed data sharing plan should be 
included in the project's budget. NIH intramural investigators are 
expected to address data sharing plans with their IC scientific 
leadership prior to initiating applicable research and are encouraged 
to contact their IC leadership or the Office of Intramural Research for 
guidance.
B. Nonhuman and Model Organism Genomic Data
1. Data Submission Expectations and Timeline
    Nonhuman data (including microbial and microbiome data) and data 
from large-scale genomic projects for model organisms \16\ are 
to be shared in a timely manner. Investigators should make nonhuman and 
model organism data publicly available no later than the date of 
initial publication. However, certain data types or NIH research 
initiatives may expect an earlier data release (e.g., microbial or 
microbiome data, or projects with broad utility as a resource for the 
scientific community). (See Appendix A for specific expectations for 
data submission and release.)
2. Data Repositories
    Data should be made available through any widely used data 
repository, whether NIH-funded or not, such as the Gene Expression 
Omnibus (GEO),\17\ Sequence Read Archive (SRA),\18\ 
Trace Archive,\19\ Array Express,\20\ Mouse Genome 
Informatics (MGI),\21\ WormBase,\22\ the Zebrafish 
Model Organism Database (ZFIN),\23\ GenBank,\24\ 
European Nucleotide Archive (ENA),\25\ or DNA Data Bank of 
Japan (DDBJ).\26\
C. Human Genomic Data
1. Data Submission Expectations and Timeline
    Guidance to govern human genomic data submission timelines and data 
release expectations is provided in Appendix A. The NIH will release 
data submitted to NIH-designated data repositories without restrictions 
on publication or other dissemination no later than six months after 
the initial data submission to an NIH-designated data 
repository,\27\ or at the time of acceptance of the first 
publication, whichever occurs first.
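
    As an illustration only, and not part of the Policy, the "whichever occurs first" rule above can be sketched as a date calculation. The names `add_months` and `release_date` are hypothetical, and reading "six months" as six calendar months (with the day clamped to the end of a shorter month) is an assumption of this sketch, not something the Policy specifies.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumed convention: if the target month is shorter, clamp to its last day.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def release_date(submitted: date,
                 first_publication_accepted: Optional[date] = None) -> date:
    """Latest release date: six months after initial submission, or the
    acceptance date of the first publication, whichever occurs first."""
    deadline = add_months(submitted, 6)
    if first_publication_accepted is None:
        return deadline
    return min(deadline, first_publication_accepted)
```

Under these assumptions, a dataset submitted on September 20, 2013 with no accepted publication would be released no later than March 20, 2014; an acceptance date before then would move the release forward to that date.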
    Human data that are submitted to NIH-designated data repositories 
should be de-identified according to the standards set forth in the HHS 
Regulations for the Protection of Human

[[Page 57862]]

Subjects \28\ and the Health Insurance Portability and 
Accountability Act (HIPAA) Privacy Rule.\29\ The de-identified 
data should be assigned a random, unique code, and the key held by the 
submitting institution.
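
    The coding step described above might look like the following minimal sketch. It is illustrative only: the function `deidentify`, the field names, and the four-item identifier set are hypothetical, and removing these fields alone does not satisfy the HIPAA Privacy Rule, which lists 18 identifiers.

```python
import secrets
from typing import Dict, List, Tuple

# Hypothetical field names; an illustrative subset, not the full HIPAA list.
DIRECT_IDENTIFIERS = {
    "name", "date_of_birth", "street_address", "social_security_number",
}


def deidentify(records: List[dict],
               id_field: str = "participant_id") -> Tuple[List[dict], Dict[str, str]]:
    """Strip direct identifiers and replace the local ID with a random code.

    Returns (coded_records, key). The key maps each code back to the local
    ID and is meant to stay with the submitting institution.
    """
    coded_records = []
    key: Dict[str, str] = {}
    for record in records:
        code = secrets.token_hex(8)
        while code in key:  # regenerate on the (very unlikely) collision
            code = secrets.token_hex(8)
        key[code] = record[id_field]
        cleaned = {
            field: value
            for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS and field != id_field
        }
        cleaned["code"] = code
        coded_records.append(cleaned)
    return coded_records, key
```

Only `coded_records` would be submitted in this sketch; the `key` never leaves the submitting institution, which is what keeps the code from being reversible by the repository.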
    The NIH encourages researchers and institutions submitting large-
scale genomic datasets to NIH-designated data repositories to consider 
whether a Certificate of Confidentiality could serve as an additional 
safeguard to prevent compelled disclosure of any personally 
identifiable information that it may hold.\30\ The NIH has 
obtained a Certificate of Confidentiality for dbGaP.\31\
2. Data Repositories
    Applicable studies with human genomic data should be registered in 
the database of Genotypes and Phenotypes (dbGaP) \32\ no later 
than the time that data cleaning and quality control measures begin. 
Investigators should submit human data to the relevant NIH-designated 
data repository (e.g., dbGaP, GEO, SRA, the Cancer Genomics Hub 
\33\). NIH-designated data repositories need not be the 
exclusive source for facilitating the sharing of genomic data. 
Investigators who elect to submit data to a non-NIH-designated data 
repository should confirm that appropriate data security, 
confidentiality, and privacy measures are in place.
3. Tiered System for the Distribution of Human Data
    Respect for and protection of the interests of research 
participants is fundamental to the NIH's stewardship of human genomic 
data. The informed consent under which the data or samples were 
collected is the basis for the submitting institution to determine the 
appropriateness of data submission to NIH-designated data repositories, 
and whether the data should be available through open or controlled 
access. Controlled-access data in NIH-designated data repositories are 
made available for secondary research only after investigators have 
obtained approval from the NIH to use the requested data for a 
particular project. Open-access data are publicly available without 
restriction (e.g., The 1000 Genomes Project \34\).
4. Informed Consent
    Submitting institutions, through their Institutional Review Boards 
(IRBs), are to review the informed consent materials for studies that 
are to be submitted to NIH-designated data repositories to determine 
whether the data are appropriate for sharing for secondary research 
use. Specific considerations may vary with the type of study and 
whether the data are obtained through prospective or retrospective data 
collections. The NIH provides additional information on issues related 
to the respect for research participant interests in its Points To 
Consider for IRBs and Institutions in Their Review of Data Submission 
Plans for Institutional Certifications.\35\ This and other 
policy-related documents will be updated once the Policy is final.
    For studies initiated after the effective date of this Policy, the 
NIH expects the informed consent process and documents to state that a 
participant's genomic and phenotypic data may be shared broadly for 
future research purposes and also explain whether the data will be 
shared through open or controlled access. If human genomic data are to 
be shared in open-access repositories, the NIH expects that 
participants will have provided explicit consent for sharing their data 
through open-access mechanisms. For studies proposing to use cell lines 
or clinical specimens,\36\ the NIH expects that informed consent for 
future research use and broad data sharing will have been obtained even 
if the cell lines or clinical specimens are de-identified. If there are 
compelling scientific reasons that necessitate the use of cell lines or 
clinical specimens that were created or collected after the effective 
date of this Policy and that lack consent for research use and data 
sharing, investigators should provide a justification for the use of 
any such materials in the funding request.
    For studies using data or specimens collected before the effective 
date of this Policy, there may be considerable variation in the extent 
to which data sharing and future genomic research were addressed within 
the informed consent materials for the primary research. In these 
cases, an assessment by an IRB, Privacy Board, or equivalent group is 
essential to ensure that data submission is not inconsistent with the 
informed consent provided by the research participant.
    The NIH will accept data derived from cell lines or clinical 
specimens lacking consent for research use that were created or 
collected before the effective date of this Policy. Grandfathered 
genomic data that are currently available through open access may be 
submitted to an open-access NIH-designated data repository; otherwise, 
the data should be submitted to a controlled-access NIH-designated data 
repository.
    While the NIH encourages broad access to genomic data, in some 
circumstances broad sharing may be inconsistent with the informed 
consent of the research participants whose data are included in the 
dataset. In such circumstances, institutions planning to submit 
aggregate- or individual-level data to the NIH for controlled access 
should note any data use limitations in the data sharing or data 
management plan submitted as part of the funding request. These data 
use limitations should be specified in the Institutional Certification 
submitted to the NIH prior to award.
5. Institutional Certification
    The responsible Institutional Signing Official of the submitting 
institution should provide an Institutional Certification to the 
funding IC prior to award. The Institutional Certification should 
indicate whether the data will be submitted to an open- or controlled-
access database and assure that:
     The data submission is consistent with applicable laws, 
regulations, and institutional policies; \37\
     The appropriate research uses of the data and any uses 
that are specifically excluded in the informed consent documents are 
delineated; \38\
     The identities of research participants will not be 
disclosed to NIH-designated data repositories; and
     An IRB, Privacy Board, and/or equivalent body \39\ has 
reviewed the investigator's proposal for data submission and assures 
that:
    [cir] The protocol for the collection of genomic and phenotypic 
data was consistent with 45 CFR part 46;
    [cir] Data submission and subsequent data sharing for research 
purposes are consistent with the informed consent of study participants 
from whom the data were obtained; \40\
    [cir] Risks to individuals and their families associated with data 
submitted to NIH-designated data repositories were considered;
    [cir] To the extent relevant and possible, risks to groups or 
populations associated with data submitted to NIH-designated data 
repositories were considered; and
    [cir] The investigator's plan for de-identifying datasets is 
consistent with the standards outlined in this Policy (see section 
IV.C.1.).
    Institutions should indicate in the certification whether aggregate 
genomic data from datasets with data use limitations may be appropriate 
for general research use (i.e., use for any research question such as 
research to understand the biological mechanisms underlying disease, 
development of statistical research methods, the study of population 
origins). If so, the

[[Page 57863]]

aggregate genomic data will be made available through the controlled-
access compilation of aggregate genomic data \41\ to facilitate 
secondary research.
6. Data Withdrawal
    Submitting investigators and their institutions may request removal 
of data on individual participants from NIH-designated data 
repositories in the event that a research participant withdraws his or 
her consent. However, data that have been distributed for approved 
research use cannot be retrieved.
7. Exceptions to Data Submission Expectations
    The NIH acknowledges that in some cases, circumstances beyond the 
control of investigators may preclude submission of data to NIH-
designated data repositories (e.g., country or state laws that prohibit 
data submission to a U.S. federal database). In such cases, 
investigators should provide a justification for any exceptions 
requested in the application or proposal. The funding IC may grant an 
exception to the submission of relevant data to the NIH, and the 
investigator would be expected to develop a plan to share data through 
other mechanisms. For transparency purposes, when exceptions are 
granted, studies will still be registered in dbGaP and the reason for 
the exception will be included in the registration record. Information 
about current expectations for exception requests will be made 
available on the GDS Web site.

V. Responsibilities of Investigators Accessing and Using Genomic Data

A. Requests for Controlled-Access Data
    Access to human data is through a two-tiered model involving open- 
and controlled-data access mechanisms. Requests for controlled-access 
data \42\ are reviewed by NIH Data Access Committees (DACs).\43\ DAC 
decisions are based primarily upon conformance of the proposed research 
as described in the access request to the data use limitations 
established by the submitting institution through the Institutional 
Certification. The NIH DACs will accept requests for proposed research 
uses beginning one month prior to the anticipated data release date. 
The access period for all controlled-access data is one year; at the 
end of each approved period, data users can request an additional year 
of access or close out the project.
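
    The timing rules in the paragraph above can be illustrated with a 
short date calculation. The following is a minimal sketch, not an NIH 
tool: the function names are hypothetical, and it assumes that ``one 
month'' and ``one year'' are counted in calendar months, with the day 
clamped to the end of shorter months. The Policy does not specify how 
such periods are counted.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Move a date by whole calendar months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def earliest_request_date(anticipated_release: date) -> date:
    """DACs accept requests beginning one month before the anticipated release."""
    return shift_months(anticipated_release, -1)


def access_expiry(approved_on: date) -> date:
    """Each approved access period lasts one year (assumed: 12 calendar months)."""
    return shift_months(approved_on, 12)


if __name__ == "__main__":
    print(earliest_request_date(date(2014, 1, 15)))
    print(access_expiry(date(2014, 1, 15)))
```

    At the end of each period computed this way, the data user would 
either request an additional year of access or close out the project.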
    Investigators approved to download controlled-access data from NIH-
designated data repositories and their institutions are expected to 
abide by the NIH User Code of Conduct \44\ through their agreement to 
the Data Use Certification.\45\ The Data Use Certification, co-signed 
by the investigators requesting the data and their Institutional 
Signing Official, specifies the terms and conditions for the secondary 
research use of controlled-access data, such as:
     Using the data only for the approved research;
     Protecting data confidentiality;
     Following all applicable laws, regulations, and local 
institutional policies and procedures for handling genomic data;
     Not attempting to identify individual participants from 
whom the data were obtained;
     Not selling any of the data obtained from the NIH-
designated data repositories;
     Not sharing any of the data obtained from the NIH-
designated data repositories with individuals other than those listed 
in the data access request;
     Agreeing to the listing of a summary of approved research 
uses in dbGaP along with the investigator's name and organizational 
affiliation;
     Agreeing to report, in real time, violations of the GDS 
Policy to the appropriate DAC; and
     Providing annual updates on research using controlled-
access datasets.
    For investigators who are approved to use the data, the NIH 
maintains guidance on security practices \46\ that outlines expected 
data security protections (e.g., physical security measures and user 
training) to ensure that the data are kept secure and not released to 
any person not permitted to access the data.
B. Acknowledgment Responsibilities
    The NIH expects all investigators who access genomic datasets from 
NIH-designated data repositories to acknowledge in all resulting oral 
or written presentations, disclosures, or publications the contributing 
investigator(s) who conducted the original study, the funding 
organization(s) that supported the work, the specific dataset(s) and 
applicable accession number(s), and the NIH-designated data 
repositories through which the investigator accessed any data.

VI. Intellectual Property

    Naturally occurring DNA sequences are not patentable in the United 
States.\47\ Therefore, basic sequence data and certain related 
information (e.g., genotypes, haplotypes, p values, allele frequencies) 
are pre-competitive, and such data made available through NIH-
designated data repositories and all conclusions derived directly from 
them should remain freely available, without any licensing 
requirements, for uses such as markers for developing assays and guides 
for identifying new potential targets for drugs, therapeutics, and 
diagnostics. In addition, the NIH discourages the use of patents to 
prevent the use of or block access to genomic or genotype-phenotype 
data developed with NIH support. The NIH encourages broad use of NIH-
funded genomic data that is consistent with a responsible approach to 
management of intellectual property derived from downstream 
discoveries, as outlined in the NIH Best Practices for the Licensing of 
Genomic Inventions \48\ and Research Tools Policy.\49\ The NIH 
encourages patenting of technology suitable for subsequent private 
investment that may lead to the development of products that address 
public needs.

Appendix A

Supplemental Information for the NIH Genomic Data Sharing Policy

Overview

    This document provides additional guidance on the types of 
research projects to which the Genomic Data Sharing (GDS) Policy 
applies and the NIH's expectations for data submission and release.

Examples of Types of Research Covered Under the GDS Policy

    The GDS Policy is applicable to any NIH-funded research project 
involving nonhuman organisms or human specimens that produces 
genomic, metagenomic, epigenomic, or transcriptomic data from large-
output sequencing instruments or genotyping platforms, such as 
projects that involve:
     Sequence data from tens of isolates from infectious 
organisms.
     Sequencing more than one gene or gene-sized region in 
more than 100 participants.
     More than 10,000 genes or regions from one participant 
(e.g., whole genome sequencing).
     More than 100,000 variant sites in more than 100 
participants.
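
    The example thresholds listed above can be expressed as a simple 
check. This is a minimal sketch under stated assumptions: the class 
and field names are hypothetical, and ``tens of isolates'' is read 
here as 10 or more because the Policy gives no exact number. The list 
above is illustrative rather than exhaustive, so a result of False 
does not establish that the GDS Policy does not apply.

```python
from dataclasses import dataclass


@dataclass
class ProjectScale:
    """Hypothetical summary of a project's data volume (illustrative fields)."""
    participants: int = 0
    genes_or_regions_per_participant: int = 0
    variant_sites: int = 0
    infectious_isolates: int = 0


# Assumption: "tens of isolates" is interpreted as at least 10.
MIN_ISOLATES = 10


def matches_gds_example(p: ProjectScale) -> bool:
    """Return True if the project matches any example listed in Appendix A."""
    return (
        # Sequence data from tens of isolates from infectious organisms.
        p.infectious_isolates >= MIN_ISOLATES
        # More than one gene or region in more than 100 participants.
        or (p.genes_or_regions_per_participant > 1 and p.participants > 100)
        # More than 10,000 genes or regions from one participant.
        or (p.genes_or_regions_per_participant > 10_000 and p.participants >= 1)
        # More than 100,000 variant sites in more than 100 participants.
        or (p.variant_sites > 100_000 and p.participants > 100)
    )


if __name__ == "__main__":
    whole_genome = ProjectScale(participants=1,
                                genes_or_regions_per_participant=20_000)
    print(matches_gds_example(whole_genome))
```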

Expectations for Data Submission and Data Release

    Data submitted to NIH-designated data repositories undergo 
different levels of data processing, and the expectations for data 
submission and data release are based on those levels. The table and 
text below describe the expectations for each level. The NIH will 
review these expectations at regular intervals, and any updates will 
be published on the GDS Web site and the research community will be 
notified through appropriate communication methods (e.g., The NIH 
Guide for Grants and Contracts).

----------------------------------------------------------------------------------------------------------------
                                        General
              Level                 description of    Example data types    Data submission      Data release
                                    data processing                           expectation          timeline
----------------------------------------------------------------------------------------------------------------
0...............................  Raw data generated  Instrument image    Not expected......  NA.
                                   directly from the   data.
                                   instrument
                                   platform.
1...............................  Initial sequence    DNA sequencing      Not expected for    NA.
                                   reads, the most     reads, ChIP-Seq     human data if
                                   fundamental form    reads, RNA-Seq      reads are
                                   of the data after   reads, SNP          included in Level
                                   the basic           arrays, arrayCGH.   2 aligned
                                   translation of                          sequence file
                                   raw input.                              (e.g., BAM).
                                                                          Nonhuman de novo    Up to 6 months for
                                                                           sequence data.      nonhuman data.
2...............................  Data after an       DNA sequence        Project specific,   Up to 6 months
                                   initial round of    alignments to a     generally within    after data
                                   analysis or         reference           3 months after      submission or at
                                   computation to      sequence or de      data generation.    the time of
                                   clean the data      novo assembly,                          acceptance of the
                                   and assess basic    RNA expression                          first
                                   quality measures.   profiling.                              publication,
                                                                                               whichever occurs
                                                                                               first.
3...............................  Analysis to         SNP or structural   Project specific,   Up to 6 months
                                   identify genetic    variant calls,      generally within    after data
                                   variants, gene      expression peaks,   3 months after      submission or at
                                   expression          epigenomic          data generation.    the time of
                                   patterns, or        features.                               acceptance of the
                                   other features of                                           first
                                   the dataset.                                                publication,
                                                                                               whichever occurs
                                                                                               first.
4...............................  Final analysis      Genotype-phenotype  Data submitted as   Data released with
                                   that relates the    relationships,      analyses are        publication.
                                   genomic data to     relationships of    completed.
                                   phenotype or        RNA expression or
                                   other biological    epigenomic
                                   states.             patterns to
                                                       biological state.
----------------------------------------------------------------------------------------------------------------

    Level 0 and level 1 data are the raw images and initial sequence 
reads, respectively, and have limited value to secondary data users. 
NIH policy does not expect submission of these data. An exception is 
made for de novo sequencing of nonhuman organisms unless those read 
data are provided within the level 2 submission. In the case of de 
novo sequencing for nonhuman organisms, investigators who are 
submitting level 1 data may request a holding period, not to exceed 
six months, during which the datasets will not be released for use 
by other investigators. For data submitted to NIH-designated data 
repositories, provisions may be made for creating an exchange area 
in which such datasets may be shared among investigative teams prior 
to general release.
    Array-based data, such as gene expression, ChIP-chip, arrayCGH, 
and SNP array data, can be submitted to GEO as level 1 data, which 
will not be accessible until a manuscript describing the data is 
published. It is the submitter's responsibility to ensure 
that the data and files submitted to GEO protect participant privacy 
in accordance with all applicable laws, regulations, and 
institutional policies, including the GDS Policy.
    Level 2 constitutes a computational analysis in the form of 
higher order assembly or placement of the sequencing reads on a 
reference template. For human sequencing projects, the level 2 file 
comprises the reads ``piled'' on a reference human genome. A 
submission would be a file (e.g., a Binary Alignment/Map (BAM) file) 
that usually contains the unmapped reads as well. GWAS and other 
types of projects (e.g., RNA expression profiling or de novo 
sequencing) would also generate a level 2 placement or assembly 
file.
    Generation of data files at level 2 generally requires 
substantial analysis and quality checks relating to both breadth of 
coverage of the targeted region and accuracy of assembly. Sufficient 
time will be allowed to complete the analysis and generate the 
assembly, up to the coverage and quality thresholds specified by a 
project or investigative team. In general, it is anticipated that 
this work could reasonably be completed within three months, and 
data submission would follow shortly thereafter. Data files may be 
held in an exchange area accessible only to the submitting 
investigators and collaborators for a period not to exceed six 
months from the time of submission. Following this period of 
exclusivity, the data will be available for research access without 
restrictions on publication.
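
    The release timeline for level 2 and level 3 data (up to six 
months after submission, or acceptance of the first publication, 
whichever occurs first) can be sketched as a date calculation. The 
function names below are hypothetical, and two assumptions are made 
that the Policy does not state: six months is counted in calendar 
months, and a publication accepted before submission results in 
release at the time of submission.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add whole calendar months, clamping the day to the month's length."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_release_date(submitted: date,
                        first_publication_accepted: Optional[date] = None) -> date:
    """Latest release date for a level 2 or level 3 data file."""
    hold_end = add_months(submitted, 6)  # exclusivity may not exceed six months
    if first_publication_accepted is None:
        return hold_end
    # Whichever occurs first, but never earlier than submission (assumption).
    return max(submitted, min(hold_end, first_publication_accepted))


if __name__ == "__main__":
    print(latest_release_date(date(2014, 1, 15)))
    print(latest_release_date(date(2014, 1, 15), date(2014, 4, 1)))
```

    The result is an upper bound only; a submitting investigator may 
decline the period of exclusivity, in which case release could occur 
earlier.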
    Phenotype or clinical data should be submitted to the NIH-
designated data repository at the earliest opportunity, but no later 
than the date of level 2 genomic data submission (or levels 2 and 3 
for GWAS datasets), especially for studies in which all phenotype 
data have already been gathered. For studies in which phenotype data 
collections are ongoing and/or may be regularly updated, data files 
should be submitted to NIH-designated data repositories as early as 
possible considering the practical needs for ensuring data accuracy; 
generally speaking, this time should not exceed six months after 
data collection.
    Level 3 includes analysis to identify variants or to elucidate 
other features of the genomic dataset, such as gene expression 
patterns in an RNAseq assay. Level 3 data may be generated from a 
single level 2 data file (e.g., variant sites versus the human 
reference genome), but will often derive from a compilation of 
sequencing assemblies (e.g., in a genome study of a specific cancer 
type). Data submission expectations for level 3 files will vary 
substantially by project and therefore will require consultation 
with NIH program staff. As in level 2 data submission, level 3 files 
will be date stamped and the data producer may request a period of 
exclusivity not to exceed six months, after which time the datasets 
will be released through open- or controlled-access mechanisms as 
appropriate and without publication limitations.
    Level 4 constitutes the final analysis, relating the genomic 
datasets to phenotype or other biological states as pertinent to the 
research objective. Data in this level are the project findings or 
the publication dataset. Investigators should submit these data 
prior to publication, and the data will be released concurrent with 
publication.

References

    \1\ The genome is the entire set of genetic instructions found 
in a cell. See https://ghr.nlm.nih.gov/glossary=genome.
    \2\ Final NIH Statement on Sharing Research Data. February 26, 
2003. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html.
    \3\ NIH Intramural Policy on Large Database Sharing. April 5, 
2002. See https://sourcebook.od.nih.gov/ethic-conduct/large-db-sharing.htm.
    \4\ Policy for Sharing of Data Obtained in NIH Supported or 
Conducted Genome-Wide Association Studies (GWAS). August 28, 2007. 
See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html.
    \5\ A GWAS is defined as any study of genetic variation across 
the entire human genome that is designed to identify genetic 
associations with observable traits (such as blood pressure or 
weight), or the presence or absence of a disease or condition.
    \6\ Notice on Development of Data Sharing Policy for Sequence 
and Related Genomic Data. October 19, 2009. See https://grants.nih.gov/grants/guide/notice-files/NOT-HG-10-006.html.
    \7\ Office of Science and Technology Policy Memorandum, 
Expanding Public Access to the Results of Federally Funded Research. 
February 22, 2013. See https://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research.
    \8\ ``De-identified'' refers to removing information that could 
be used to associate a dataset or record with a human individual. 
Under this Policy, data should be de-identified according to the 
standards set forth in the HHS Regulations for the Protection of 
Human Subjects and the Health Insurance Portability and 
Accountability Act (HIPAA) Privacy Rule. The HIPAA Privacy Rule 
lists 18 identifiers that must be removed to classify data as 
de-identified. For the full list, see https://privacyruleandresearch.nih.gov/pr_08.asp.
    \9\ An Institutional Signing Official is generally a senior 
official at an institution who is credentialed through the NIH eRA 
Commons system and is authorized to enter the institution into a 
legally binding contract and sign on behalf of an investigator who 
has submitted data or a data access request to the NIH.
    \10\ The NIH's mission is to seek fundamental knowledge about 
the nature and behavior of living systems and the application of 
that knowledge to enhance health, lengthen life, and reduce illness 
and disability. See https://www.nih.gov/about/mission.htm.
    \11\ Final NIH Statement on Sharing Research Data. February 26, 
2003. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html.
    \12\ NIH Intramural Policy on Large Database Sharing. April 5, 
2002. See https://sourcebook.od.nih.gov/ethic-conduct/large-db-sharing.htm.
    \13\ GWAS has the same definition in this policy as in the 2007 
GWAS Policy: a study in which the density of genetic markers and the 
extent of linkage disequilibrium should be sufficient to capture (by 
the r\2\ parameter) a large proportion of the common variation in 
the genome of the population under study, and the number of samples 
(in a case-control or trio design) should provide sufficient power 
to detect variants of modest effect.
    \14\ Competing grant applications encompass all activities with 
a research component, including but not limited to the following: 
Research Grants (Rs), Program Projects (Ps), Cooperative Research 
Mechanisms (Us), Career Development Awards (Ks), and SCORs and other 
S grants with a research component.
    \15\ Investigators should refer to funding announcements or IC 
Web sites for contact information.
    \16\ NIH Policy on Sharing of Model Organisms for Biomedical 
Research. Release Date May 7, 2004. See https://grants.nih.gov/grants/guide/notice-files/NOT-OD-04-042.html.
    \17\ Gene Expression Omnibus at https://www.ncbi.nlm.nih.gov/geo/.
    \18\ Sequence Read Archive at https://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?.
    \19\ Trace Archive at https://www.ncbi.nlm.nih.gov/Traces/trace.cgi.
    \20\ Array Express at https://www.ebi.ac.uk/arrayexpress/.
    \21\ Mouse Genome Informatics at https://www.informatics.jax.org/.
    \22\ WormBase at https://www.wormbase.org.
    \23\ The Zebrafish Model Organism Database at https://zfin.org/.
    \24\ GenBank at https://www.ncbi.nlm.nih.gov/genbank/.
    \25\ European Nucleotide Archive at https://www.ebi.ac.uk/ena/.
    \26\ DNA Data Bank of Japan at https://www.ddbj.nig.ac.jp/.
    \27\ A period for data preparation is anticipated prior to data 
submission to the NIH, and the appropriate time intervals for that 
data preparation (or data cleaning) will be subject to the 
particular data type and project plans (see Appendix A). 
Investigators should work with NIH Program or Project Officials for 
specific guidance.
    \28\ See 45 CFR 46.102(f) at https://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.html#46.102.
    \29\ See 45 CFR 164.514(b)(2). The list of HIPAA identifiers 
that must be removed is available at: https://www.gpo.gov/fdsys/pkg/CFR-2002-title45-vol1/pdf/CFR-2002-title45-vol1-sec164-514.pdf.
    \30\ For additional information about Certificates of 
Confidentiality, see https://grants.nih.gov/grants/policy/coc/.
    \31\ Confidentiality Certificate. HG-2009-01. Issued to the 
National Center for Biotechnology Information, National Library of 
Medicine, NIH. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=ConfidentialityCertificate.pdf.
    \32\ Database of Genotypes and Phenotypes at https://www.ncbi.nlm.nih.gov/gap.
    \33\ Cancer Genomics Hub at https://cghub.ucsc.edu/.
    \34\ The 1000 Genomes Project at https://www.1000genomes.org/.
    \35\ Points to Consider for IRBs and Institutions in their 
Review of Data Submission Plans for Institutional Certifications. 
See https://gwas.nih.gov/pdf/PTC_for_IRBs_and_Institutions_revised5-31-11.pdf.
    \36\ Clinical specimens are specimens that have been obtained 
through clinical practice.
    \37\ For the submission of data derived from cell lines or 
clinical specimens lacking research consent that were created or 
collected before the effective date of this Policy, the 
Institutional Certification needs to address only this item.
    \38\ For guidance on clearly communicating inappropriate data 
uses, see NIH Points to Consider in Drafting Effective Data Use 
Limitation Statements, https://gwas.nih.gov/pdf/NIH_PTC_in_Drafting_DUL_Statements.pdf.
    \39\ ``Equivalent body'' is used here to acknowledge that some 
primary studies may be conducted abroad and in such cases the 
expectation is that an analogous review committee to an IRB or 
Privacy Board (e.g., Research Ethics Committees) may be asked to 
participate in the presubmission review of proposed genomic 
projects.
    \40\ As noted earlier, for studies using data or specimens 
collected before the effective date of this Policy, the IRB or 
Privacy Board should review informed consent materials to ensure 
that data submission is not inconsistent with the informed consent 
provided by the research participants.
    \41\ Compilation of Aggregate Genomic Data. dbGaP study 
accession: phs000501.v1.p1. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study501.cgi?study_id=phs000501.v1.p1&pha=&phaf=.
    \42\ dbGaP Authorized Access. See https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login.
    \43\ For a list of NIH Data Access Committees, see https://gwas.nih.gov/04po2_1DAC.html.
    \44\ User Code of Conduct. See https://dbgap.ncbi.nlm.nih.gov/aa/GWAS_Code_of_Conduct.html.
    \45\ Model Data Use Certification Agreement. See https://gwas.nih.gov/pdf/Model_DUC_7-26-13.pdf.
    \46\ Security Best Practices. See https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=dbgap_2b_security_procedures.pdf.
    \47\ In Association for Molecular Pathology et al. v. Myriad 
Genetics, Inc., et al. 569 U.S. ------ 2013. See https://www.supremecourt.gov/opinions/12pdf/12-398_1b7d.pdf.
    \48\ NIH Best Practices for the Licensing of Genomic Inventions. 
See https://www.ott.nih.gov/policy/genomic_invention.html.
    \49\ Research Tools Policy. See https://www.ott.nih.gov/policy/research_tool.aspx.

    Dated: September 16, 2013.
Lawrence A. Tabak,
Deputy Director, National Institutes of Health.
[FR Doc. 2013-22941 Filed 9-19-13; 8:45 am]
BILLING CODE 4140-01-P