Genomic sequencing data for PNG15 and PNG16
This data set contains genomic sequencing data and variant calls for PNG15 and PNG16.
- 02/10/2025
- 2 samples
- DAC: EGAC50000000674
- Technologies: Illumina NovaSeq 6000, PromethION, Sequel II, unspecified
Papua New Guinea Pangenome Project
DATA ACCESS POLICY

Papua New Guinea Pangenome Project

These terms and conditions govern access to the managed access datasets (details of which are set out in Appendix I) to which the User Institution has requested access. The User Institution agrees to be bound by these terms and conditions.

DEFINITIONS

Authorized Personnel: The individuals at the User Institution to whom the Committee grants access to the Data. This includes the User, the individuals listed in Appendix II and any other individuals for whom the User Institution subsequently requests access to the Data. Details of the initial Authorized Personnel are set out in Appendix II.

Committee: The Papua New Guinea Pangenome Project Committee comprises the following members: Dr Matthew Leavesley from the University of Papua New Guinea, National Capital District, Papua New Guinea; Dr Nicolas Brucato from the Université Paul Sabatier – Toulouse III, Toulouse, France; Dr François-Xavier Ricaut from the Université Paul Sabatier – Toulouse III, Toulouse, France; Assoc Prof Irene Gallego Romero from St Vincent's Institute of Medical Research, Melbourne, Australia; Dr. PingHsun Hsieh from the University of Minnesota, Twin Cities, MN, USA. These members will retain their roles even if they relocate to other institutions. Membership of this Committee may change over time. For legal purposes, the Committee member at the University of Papua New Guinea has precedence on all matters relating to this document.

Data: The managed access datasets to which the User Institution has requested access.

Data Producers: The Committee and the collaborators listed in Appendix I are responsible for the development, organization, and oversight of these Data.

External Collaborator: A collaborator of the User, working for an institution other than the User Institution.

Project: The project for which the User Institution has requested access to these Data. A description of the Project is set out in Appendix II.

Publications: Includes, without limitation, articles published in print journals, electronic journals, reviews, books, posters and other written and verbal presentations of research.

Research Participant: An individual whose data form part of these Data.

Research Purposes: This shall mean research that seeks to advance the understanding of genetics and genomics, including the treatment of disorders, and work on statistical methods that may be applied to such research.

User: The principal investigator for the Project.

User Institution(s): The Institution that has requested access to the Data.

DATA ACCESS AGREEMENT

1. The User Institution agrees to only use these Data for the purpose of the Project (described in Appendix II) and only for Research Purposes. The User Institution further agrees that it will only use these Data for Research Purposes which are within the limitations (if any) set out in Appendix I.

2. The User Institution agrees to preserve, at all times, the confidentiality of these Data. In particular, it undertakes not to use, or attempt to use these Data to compromise or otherwise infringe the confidentiality of information on Research Participants. Without prejudice to the generality of the foregoing, the User Institution agrees to use at least the measures set out in Appendix I to protect these Data.

3. The User Institution agrees to protect the confidentiality of Research Participants in any research papers or publications that they prepare by taking all reasonable care to limit the possibility of identification.

4. The User Institution agrees not to link or combine these Data to other information or archived data available in a way that could re-identify the Research Participants, even if access to that data has been formally granted to the User Institution or is freely available without restriction.

5. The User Institution agrees only to transfer or disclose these Data, in whole or part, or any material derived from these Data, to the Authorized Personnel. Should the User Institution wish to share these Data with an External Collaborator, the External Collaborator must complete a separate application for access to these Data.

6. The User Institution agrees that the Data Producers, and all other parties involved in the creation, funding, or protection of these Data:
   a) make no warranty or representation, express or implied as to the accuracy, quality, or comprehensiveness of these Data;
   b) exclude to the fullest extent permitted by law all liability for actions, claims, proceedings, demands, losses (including but not limited to loss of profit), costs, awards, damages and payments made by the Recipient that may arise (whether directly or indirectly) in any way whatsoever from the Recipient’s use of these Data or from the unavailability of, or break in access to, these Data for whatever reason; and
   c) bear no responsibility for the further analysis or interpretation of these Data.

7. The User Institution agrees to follow the Fort Lauderdale Guidelines (https://www.wtccc.org.uk/wtccc/assets/wtd003207.pdf) and the Toronto Statement (http://www.nature.com/nature/journal/v461/n7261/full/461168a.html). This includes but is not limited to recognizing the contribution of the Data Producers and including a proper acknowledgment in all reports or publications resulting from the use of these Data.

8. The User Institution agrees to follow the Publication Policy in Appendix III.

9. The User Institution agrees not to make intellectual property claims on these Data and not to use intellectual property protection in ways that would prevent or block access to, or use of, any element of these Data, or conclusion drawn directly from these Data.

10. The User Institution can elect to perform further research that would add intellectual and resource capital to these data and decide to obtain intellectual property rights on these downstream discoveries. In this case, the User Institution agrees to implement licensing policies that will not obstruct further research and to follow the U.S. National Institutes of Health Best Practices for the Licensing of Genomic Inventions (2005) (https://www.icgc.org/files/daco/NIH_BestPracticesLicensingGenomicInventions_2005_en.pdf) in conformity with the Organization for Economic Co-operation and Development Guidelines for the Licensing of the Genetic Inventions (2006) (http://www.oecd.org/science/biotech/36198812.pdf).

11. The User Institution agrees to destroy/discard the Data held once it is no longer used for the Project unless obliged to retain the data for archival purposes in conformity with audit or legal requirements.

12. The User Institution will notify the Committee within 30 days of any changes or departures of Authorized Personnel.

13. The User Institution will notify the Committee prior to any significant changes to the protocol for the Project.

14. The User Institution will notify the Committee as soon as it becomes aware of a breach of the terms or conditions of this agreement.

15. The Committee may terminate this agreement by written notice to the User Institution. If this agreement terminates for any reason, the User Institution will be required to destroy any Data held, including copies and backup copies. This clause does not prevent the User Institution from retaining these data for archival purposes in conformity with audit or legal requirements.

16. The User Institution accepts that it may be necessary for the Data Producers to alter the terms of this agreement from time to time. As an example, this may include specific provisions relating to the Data required by Data Producers other than the Committee. In the event that changes are required, the Data Producers or their appointed agent will contact the User Institution to inform it of the changes and the User Institution may elect to accept the changes or terminate the agreement.

17. If requested, the User Institution will allow data security and management documentation to be inspected to verify that it is complying with the terms of this agreement.

18. The User Institution agrees to distribute a copy of these terms to the Authorized Personnel. The User Institution will procure that the Authorized Personnel comply with the terms of this agreement.

19. This agreement (and any dispute, controversy, proceedings, or claim of whatever nature arising out of this agreement or its formation) shall be construed, interpreted, and governed by the laws of Papua New Guinea and shall be subject to the exclusive jurisdiction of Papua New Guinean courts.

AGREED FOR USER INSTITUTION

- Name:
- Title:
- Date:
- Signature:

PRINCIPAL INVESTIGATOR

I confirm that I have read and understood this Agreement

- Name:
- Title:
- Date:
- Signature:

AGREED FOR THE COMMITTEE

- Name:
- Title:
- Date:
- Signature:

APPENDIX I – DATASET DETAILS

This collection contains whole-genome sequencing data, including both short- and long-read sequences, from individuals across Papua New Guinea, a region with significant cultural and genetic diversity yet underrepresented in human pangenomics. Pangenome references are essential for capturing the full spectrum of genetic variation, enabling new discoveries in genetics and medicine. To address this gap, we aim to sequence and create haplotype-phased de novo assemblies from individuals in Papua New Guinea and surrounding areas, constructing a pangenome reference that comprehensively represents the region’s genetic diversity. This effort shifts from a single reference genome to a pangenome, better reflecting genomic diversity within the region and across human populations.
Contact Person:

- Dr Nicolas Brucato from the Université Paul Sabatier – Toulouse III, Toulouse, France (nicolasbrucato@gmail.com; nicolas.brucato@univ-tlse3.fr)
- Dr. PingHsun Hsieh from the University of Minnesota, Twin Cities, MN, USA (hsiehph@umn.edu)

The Papua New Guinea Pangenome Project is a consortium comprising:

- Dr Matthew Leavesley from the University of Papua New Guinea, National Capital District, Papua New Guinea
- Dr Nicolas Brucato from the Université Paul Sabatier – Toulouse III, Toulouse, France
- Dr François-Xavier Ricaut from the Université Paul Sabatier – Toulouse III, Toulouse, France
- Assoc Prof Irene Gallego Romero, St Vincent's Institute of Medical Research, Melbourne, Australia

Names of other data producers/collaborators: Prof Murray P. Cox, Massey University, New Zealand

Specific limitations on areas of research: Users must be formally affiliated with an officially recognized Institution. The User can replicate existing studies published by the Papua New Guinea Pangenome Project research program, using similar techniques, approaches and methods, to ensure that the published science is reproducible. Approval will be automatically granted for such use. The User cannot publicly release any Data. All rights regarding data release remain with the Committee. The User cannot use any Data for for-profit purposes. The User cannot undertake studies of a medical or clinical nature without first seeking the approval of the Committee. In such cases, evidence of specific ethical approvals, including documentation from a Papua New Guinea ethics board, may be necessary for approval to be granted. Note that all uses of the data must have specific prior approval from the Committee. Evidence of ethical approvals, including documentation from a Papua New Guinea ethics board, may be necessary for approval to be granted in some cases.

Minimum protection measures required: Data can be held in unencrypted files on an institutional computer system, behind a secure firewall, with Unix user group read/write access for one or more appropriate groups but not Unix world read/write access. Laptops holding these data should have password protected logins and screen locks (set to lock after 5 min of inactivity). If held on USB keys or other portable hard drives, the data must be encrypted.

APPENDIX II – PROJECT DETAILS (to be completed by the Requestor)

Brief abstract of the Project in which the Data will be used (500 words max)

All individuals whom the User Institution wishes to be named as registered users (repeat as needed).

- Name of Registered User:
- Job Title:
- Email:
- Supervisor:

All individuals that should have an account created at the EGA (repeat as needed). Note that EGA usually requires institutional email addresses.

- Name of Registered User:
- Job Title:
- Email:

APPENDIX III – PUBLICATION POLICY

In all publications that include this dataset, the User Institution must describe how the data can be accessed, including the name of the dataset, the name of the data repository, and the data accession numbers. The User Institution must also cite the paper in which this dataset is described scientifically. All of these details are listed below.

Dataset Title: The Papua New Guinea Pangenome Project

Repository and Accession Number: European Genome-phenome Archive

Citation: Hsieh P, Soisangwan N, Gordon D, Javidh A, Harvey W, Porubsky D, Hoekzema K, Baker CA, Munson KM, Kinipi C, Leavesley M, Brucato N, Cox MP, Ricaut FX, Gallego Romero I, Eichler EE. 2025. A global map for archaic introgressed structural variation in humans. bioRxiv. doi: https://doi.org/10.1101/2025.06.24.661368
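The minimum protection measures in Appendix I allow Unix group read/write access but not Unix world read/write access. One way a User Institution might audit a data directory against that requirement is sketched below. This is an illustrative check only, not part of the agreement: the function names `world_accessible` and `find_world_accessible` are hypothetical, the script assumes a POSIX file system, and it skips symbolic links rather than following them.

```python
import os
import stat
import tempfile


def world_accessible(path: str) -> bool:
    """Return True if 'other' (world) users can read or write the given path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def find_world_accessible(root: str) -> list[str]:
    """Walk `root` and list every file or directory with world read/write bits set."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # assumption: symlinks are ignored, not followed
            if world_accessible(full):
                flagged.append(full)
    return sorted(flagged)


if __name__ == "__main__":
    # Demonstration on a throwaway directory; a real audit would pass the
    # directory that actually holds the downloaded files.
    with tempfile.TemporaryDirectory() as root:
        ok = os.path.join(root, "group_only.vcf.gz")
        bad = os.path.join(root, "world_readable.vcf.gz")
        for p in (ok, bad):
            open(p, "w").close()
        os.chmod(ok, 0o660)   # owner and group read/write, no world access
        os.chmod(bad, 0o664)  # world-readable, which the policy does not allow
        for path in find_world_accessible(root):
            print("world-accessible:", path)
```

The check covers file mode bits only. Firewalling, laptop screen locks and encryption of portable drives, which the same appendix also requires, need to be verified separately.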
Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.
| Study ID | Study Title | Study Type |
|---|---|---|
| EGAS50000001105 | | Whole Genome Sequencing |
This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.
| ID | File Type | Size |
|---|---|---|
| EGAF50000400479 | fastq.gz | 26.9 GB |
| EGAF50000400480 | fastq.gz | 16.3 GB |
| EGAF50000400481 | fastq.gz | 28.2 GB |
| EGAF50000400482 | fastq.gz | 28.9 GB |
| EGAF50000400483 | fa | 2.9 GB |
| EGAF50000400484 | fastq.gz | 32.6 GB |
| EGAF50000400485 | fastq.gz | 37.2 GB |
| EGAF50000400486 | fastq.gz | 12.3 GB |
| EGAF50000400487 | fastq.gz | 21.2 GB |
| EGAF50000400488 | fa | 3.0 GB |
| EGAF50000400489 | fq.gz | 14.6 GB |
| EGAF50000400490 | fastq.gz | 27.9 GB |
| EGAF50000400491 | fastq.gz | 11.3 GB |
| EGAF50000400492 | fastq.gz | 58.3 GB |
| EGAF50000400493 | fastq.gz | 36.6 GB |
| EGAF50000400494 | fastq.gz | 61.2 GB |
| EGAF50000400495 | fastq.gz | 131.5 GB |
| EGAF50000400496 | fq.gz | 33.7 GB |
| EGAF50000400497 | fastq.gz | 117.2 GB |
| EGAF50000400498 | fastq.gz | 26.9 GB |
| EGAF50000400499 | fastq.gz | 1.8 GB |
| EGAF50000400500 | fastq.gz | 24.4 GB |
| EGAF50000400501 | fastq.gz | 13.2 GB |
| EGAF50000400502 | fastq.gz | 19.1 GB |
| EGAF50000400503 | fastq.gz | 24.4 GB |
| EGAF50000400504 | fq.gz | 32.4 GB |
| EGAF50000400505 | fa | 3.0 GB |
| EGAF50000400506 | fa | 3.0 GB |
| EGAF50000400507 | fq.gz | 13.2 GB |
| EGAF50000400508 | fastq.gz | 28.2 GB |
| EGAF50000400509 | fastq.gz | 12.0 GB |
| EGAF50000400510 | fastq.gz | 15.9 GB |
| EGAF50000400511 | vcf.gz | 604.0 MB |
| EGAF50000400512 | vcf.gz | 756.0 MB |
| EGAF50000400513 | vcf.gz | 665.7 MB |
| EGAF50000400514 | vcf.gz | 528.8 MB |
| EGAF50000400515 | vcf.gz | 533.8 MB |
| EGAF50000400516 | vcf.gz | 364.9 MB |
| EGAF50000400517 | vcf.gz | 292.0 MB |
| EGAF50000400518 | vcf.gz | 958.1 MB |
| EGAF50000400519 | vcf.gz | 789.7 MB |
| EGAF50000400520 | vcf.gz | 1.3 GB |
| EGAF50000400521 | vcf.gz | 496.7 MB |
| EGAF50000400522 | vcf.gz | 488.7 MB |
| EGAF50000400523 | vcf.gz | 508.2 MB |
| EGAF50000400524 | vcf.gz | 1.4 GB |
| EGAF50000400525 | vcf.gz | 1.0 GB |
| EGAF50000400526 | vcf.gz | 1.0 GB |
| EGAF50000400527 | vcf.gz | 1.1 GB |
| EGAF50000400528 | vcf.gz | 737.2 MB |
| EGAF50000400529 | vcf.gz | 854.2 MB |
| EGAF50000400530 | vcf.gz | 490.4 MB |
| EGAF50000400531 | vcf.gz | 300.0 MB |
| EGAF50000400532 | vcf.gz | 3.3 MB |
| EGAF50000400533 | vcf.gz | 960.1 MB |
| EGAF50000400534 | mod.gz | 1.3 GB |

56 Files (936.8 GB)
