Khoe-San Genome Project
The dataset contains BAM files of whole genome sequencing data from an African Khoe-San population (n=169, 30X coverage). DNA extracted from blood underwent 2 × 150 bp sequencing on the Illumina HiSeq X instrument (Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research). Raw sequencing reads were aligned against the human reference hg38, including alternative contigs, using bwa (v.0.7.15). The Genome Analysis Toolkit (GATK, v.3.5-0) was used for duplicate marking, indel realignment and base quality recalibration.
- 27/11/2025
- 169 samples
- DAC: EGAC50000000798
- Technology: HiSeq X Ten
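As a rough cross-check of the sequencing figures above (30X coverage from 2 × 150 bp reads), the number of read pairs needed per sample can be estimated from coverage = sequenced bases / genome size. The Python sketch below is illustrative only: the function name is hypothetical, the 3.1 Gb genome size is an assumed round figure rather than a value from the dataset record, and the estimate ignores duplicates, unmapped reads and uneven coverage.

```python
def read_pairs_for_coverage(coverage, genome_size_bp, read_length_bp):
    """Estimate the read pairs needed to reach a mean coverage.

    Uses coverage = (read pairs * 2 * read length) / genome size,
    solved for the number of read pairs.
    """
    bases_needed = coverage * genome_size_bp
    bases_per_pair = 2 * read_length_bp  # paired-end: two reads per fragment
    return bases_needed / bases_per_pair


# Assumption: an approximate human genome size of 3.1 Gb (not stated in the record).
ASSUMED_GENOME_SIZE_BP = 3.1e9

pairs = read_pairs_for_coverage(coverage=30,
                                genome_size_bp=ASSUMED_GENOME_SIZE_BP,
                                read_length_bp=150)
print(f"about {pairs:.2e} read pairs per sample")
```

Under these assumptions the estimate is on the order of a few hundred million read pairs per sample.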
Khoe-San Genome Project (KSGP)
A subsidiary project of the Southern African Prostate Cancer Study (SAPCS) and Namibian Prostate Cancer Study (NamPCS), with the approved and funded objective to generate a population-representative panel of normal (PoN) genetic diversity and whole genome representation across the region.
Background
The Khoe-San Genome Project (KSGP), previously the Diversity inclusive Genome Study for Southern Africa (DiGS-SA) and Ubuntu Genomics Projects, was first established in 2008 with seed funding received from the University of Limpopo to Professors Philip Venter (University of Limpopo, South Africa) and Vanessa M. Hayes (University of Sydney, Australia).
In South Africa, the project formed a sub-aim of the 2008 Southern African Prostate Cancer Study (SAPCS), initiated by the National Health and Medical Research Council (NHMRC) of South Africa, with approval granted to generate a population-matched panel of normal (PoN) genomic and reference-based diversity data to achieve the required goals. The study was originally managed out of the University of Limpopo under ethics committee clearance certificate MREC/HS/001/2008 (PI-Venter) and Ubuntu Genome Project clearance MREC/HS/214/2011 (PI-Venter). Upon the retirement of Professor Venter in 2010, the study was transferred to Professor M.S. Riana Bornman (University of Pretoria, South Africa) and approved by the University of Pretoria Faculty of Health Sciences Research Ethics Committee (with US Federalwide Assurance FWA00002567 and IRB00002235 IORG0001762) under HREC/43/2010 (PI-Bornman) and HREC/280/2017 (PI-Bornman).
In Namibia, the study was first reviewed and approved by the Ministry of Health and Social Services (MoHSS) in 2008 under 17/3/3HAYES (PI-Hayes). The approval was updated in 2014 to include the Namibian SAPCS under 17/3/3HEYNS (PI Professor Christopher Heyns, University of Stellenbosch, South Africa) and, on his passing, updated to MoHSS 17/3/3HEAF003 (PI Dr Hagen E.A. Förtsch, Windhoek Central Hospital).
In 2025, NamPCS (Namibian Prostate Cancer Study) and the associated KSGP will be led by Professor Lamech Mwapagha (Namibia University of Science & Technology). All molecular-genetic/genomic data has been generated under ethics approval granted by the St. Vincent’s Human Research Ethics Committee in Sydney, Australia, #SVH/15/227 (PI-Hayes), including established Material Transfer Agreements (MTA) and Data Sharing Agreements (DSA). All data remains the property of the country from which the participants were recruited.
For the purposes of publication and cohort recognition, the Khoe-San Genome Project (KSGP) is a derivative of these funded projects. Because the data is largely derived from Namibian participants, the Data Access Committee (DAC) is primarily led out of the Namibia University of Science & Technology (NUST), with additional local, South African and data science representation.
Data generated for the KSGP has been funded by the University of Limpopo, South Africa (to the late PI-Venter and PI-Hayes); Garvan Institute of Medical Research Foundation and Medical Genome Research Biobank (MGRB) in Australia (to PI-Hayes); and U.S.A. Congressionally Directed Medical Research Programs (CDMRP) Prostate Cancer Research Program (PCRP) Health Equity Research and Outcomes Improvement Consortium (HEROIC) Award (PC210168 and PC230673, HEROIC Prostate Cancer Precision Health (PCaPH) Africa1K to PI-Hayes and PI-Bornman).
The data generated will be made accessible (at publication) to the research community on request, while maintaining ethical standards and practices related to generated research data, with priority given to safeguarding the interests of the participants. Although the data will only be supplied to successful applicants in a de-identified form to reduce the risk of re-identification, the data is deemed personal information and is highly confidential.
While donor consents do not require disclosure of research findings to individual participants, the KSGP management team is obligated to return research findings to the contributing communities via established engagement and awareness programs, as well as contributing to local policy development. It is therefore critical that all findings emanating from the KSGP research data are communicated back to the KSGP research team for community benefit. This policy may be updated from time to time.
Purpose
The Data Access Policy (DAP) outlines the membership of the KSGP Data Access Committee (DAC) and the processes and procedures used to ensure equitable, ethical and efficient access to and the release of published genomic data.
KSGP-DAC Leads
- KSGP Chair, Professor Lamech Mwapagha, Namibia University of Science & Technology (NUST), Department of Biology, Chemistry and Physics, 13 Jackson Kaujeua Street, Windhoek, Namibia; Email: lmwapagha@nust.na
- Data Science Lead, Professor Vanessa Hayes, Ancestry & Health Genomics Laboratory, Director HEROIC PCaPH Africa1K, University of Sydney, Australia; Email: vanessa.hayes@sydney.edu.au
- NamPCS Clinical Lead, Dr Hagen E.A. Förtsch, Windhoek Central Hospital, Urology Department, Urology Practice Luxury Hill, 19 Heinitzburg Street, Windhoek, Namibia; Email: uroreception@mweb.com.na
- South African Lead, Professor Riana Bornman, School of Systems Health & Public Health, University of Pretoria, South Africa; Email: Riana.Bornman@up.ac.za
Additional KSGP-DAC Members
- Mr Uvatera Maurihungirire, Namibia University of Science & Technology (NUST), Department of Biology, Chemistry and Physics; Expertise: KSGP Biobank Manager
- Dr Weerachai Jaratlerdsiri (PhD), Ancestry & Health Genomics Lab, University of Sydney, Australia; Expertise: Computational Genomics
- Mr Elton Adams, Living Museum of the Damara, Twyfelfontein, Namibia; Expertise: Community Representative
- Mr Josef /Kunta, Headman Nhoma Ju/'hoan village, Nyae Nyae Conservatory, Namibia; Expertise: Community Representative
- Ms Jue Jiang, Ancestry & Health Genomics Lab, University of Sydney, Australia; Expertise: Data Management & Security Officer
Application Procedures
The KSGP-DAC will consider requests for published data from all researchers. Data will only be released to researchers who can provide a statement of ethics approval from an IRB or Human Research Ethics Committee (HREC) of their host institution to safeguard patient rights. Before submitting a request, applicants are encouraged to confer with one of the KSGP Leads to discuss the appropriateness of the data for the proposed study and the feasibility of the request.
The following information (via a KSGP Concept Note) will be requested from applicants:
(i) the applicant's name, title, institution, country and email address (where the application is made on behalf of a consortium or collaborative group, provide the name, institution and country of each external collaborator);
(ii) the research question/hypotheses to be tested;
(iii) a list of the data required;
(iv) statistical justification for the number and types of cases required (where applicable);
(v) a brief description of the technical approach;
(vi) evidence of the proposed technical approach's prior successful use (where applicable);
(vii) the IRB/HREC approval number (and a copy of the approval if requested) to conduct the proposed research (external applicants);
(viii) a data management plan (see below); and
(ix) a signed letter of collaboration with a KSGP Lead or Community Representative.
Applicants should either submit a copy of the data management plan or provide the following information:
(i) a brief description of the technical systems, policies and processes the applicant has in place to ensure the data is secure, kept confidential and safe from unauthorised use or disclosure, including during any mandatory retention periods at the conclusion of the research;
(ii) a brief description of any third parties or service providers involved in the processing (e.g. linking, storing) of the data;
(iii) a brief description of the intended outputs from the research (e.g. statistical summaries, publications, PhD thesis, production of another data set); and
(iv) the steps the applicant will take to anonymise or pseudonymise the data output.
Decision-making
The KSGP-DAC will meet monthly (by Zoom, when required) to consider requests. Quorum for decision-making will require a minimum of five of nine members and must include the Chair, the Data Lead and both Community Representatives. Decisions will be made by the majority, and the majority must include both Community Representatives.
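The quorum and voting rules above can be written down as a small check. The following Python sketch is illustrative and not part of the KSGP policy: the `Member` type, the role labels and both function names are hypothetical, the members are placeholders, and it assumes that "majority" means more than half of the members present at the meeting.

```python
from dataclasses import dataclass

MIN_PRESENT = 5  # "a minimum of five of nine members"


@dataclass(frozen=True)
class Member:
    name: str
    role: str  # hypothetical labels: "chair", "data_lead", "community_rep", "member"


def has_quorum(present):
    """Quorum: at least five members, including Chair, Data Lead and both community reps."""
    roles = [m.role for m in present]
    return (
        len(present) >= MIN_PRESENT
        and "chair" in roles
        and "data_lead" in roles
        and roles.count("community_rep") == 2
    )


def is_approved(present, in_favour):
    """Approve only with quorum, a majority of those present, and both community reps in favour."""
    if not has_quorum(present):
        return False
    yes_votes = [m for m in in_favour if m in present]
    majority = len(yes_votes) > len(present) / 2  # assumption: majority of members present
    both_reps = sum(m.role == "community_rep" for m in yes_votes) == 2
    return majority and both_reps


# Placeholder committee members for illustration only.
chair = Member("Chair", "chair")
lead = Member("Data Lead", "data_lead")
rep_a = Member("Community Rep A", "community_rep")
rep_b = Member("Community Rep B", "community_rep")
other = Member("Member A", "member")

meeting = [chair, lead, rep_a, rep_b, other]
print(is_approved(meeting, [chair, rep_a, rep_b]))  # majority that includes both reps
print(is_approved(meeting, [chair, lead, other]))   # majority without the reps
```

In this reading, a numerical majority that lacks either Community Representative does not carry the decision.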
All data has been generated through peer-reviewed funding mechanisms and, as such, research projects may well be under way to address relevant questions. In such circumstances, priority will need to be given to already funded KSGP research efforts.
The KSGP-DAC will make decisions on approval of access to data using the following criteria:
(i) The track record of the applicants in the technical approach and the likelihood of significant outcomes from the research proposed.
(ii) Non-overlap with projects being undertaken by the KSGP team (except with compelling justification).
(iii) Alignment of the research purpose with patients' informed consent and ethics approval (e.g., restrictions as to commercial research, or other requirements needing additional approval).
(iv) Availability of a data management plan.
(v) Any other restrictions or conditions that may apply to the use or disclosure of the relevant data.
The KSGP-DAC can impose certain restrictions on all approvals (all), and/or specific restrictions on particular approvals (particular), as follows:
(i) No transfer to third parties allowed (all).
(ii) Acknowledgment of the KSGP in publications/presentations (all).
(iii) A report of the results of the research to be provided to the KSGP-DAC after completion (or when requested).
(iv) Researchers cannot utilize the data for commercial purposes (all).
(v) Approval will not be given that excludes other researchers from accessing data (all).
(vi) Approval might grant time-limited exclusive rights for a particular research development (particular).
(vii) Where a conflict of interest exists.
Approved applications will require a fully institutionally executed KSGP Data Sharing Agreement (DSA) before the recipient researcher is granted access to de-identified data through an established database. Each researcher within the recipient team needs to be registered with the KSGP-DAC prior to accessing data.
Rejected applications.
Decisions made by the KSGP-DAC will be communicated in writing to the applicant, setting out the reasons for rejecting the application.
Responsibilities
It is the responsibility of the applicants to:
(i) provide documentation of local IRB/ethics approval;
(ii) agree to make results of studies using the data available to the larger scientific community;
(iii) provide a letter of collaboration with the primary study investigator(s), thereby enhancing local inclusion;
(iv) limit use of the data to not-for-profit organisations; and
(v) use genomic data for methods development purposes only under collaborative agreement, following discussion with KSGP lead investigators.
It is the responsibility of the Data Access Committee to:
(i) review applications and approve the release of data based on the scientific value of the research proposal, provided it is not in conflict with current KSGP approved/funded studies;
(ii) coordinate data release; and
(iii) review the completed data and publications. To this end, copies of all proposed publications are to be provided to the KSGP-DAC for review and comment (prior to submission), and the KSGP is to be appropriately acknowledged in publications using relevant data.
Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.
| Study ID | Study Title | Study Type |
|---|---|---|
| EGAS50000001408 | | Whole Genome Sequencing |
This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.
| ID | File Type | Size |
|---|---|---|
| EGAF00008618693 | bam | 173.9 GB |
| EGAF00008618702 | bam | 179.2 GB |
| EGAF00008618707 | bam | 170.0 GB |
| EGAF00008618710 | bam | 183.1 GB |
| EGAF00008618716 | bam | 193.9 GB |
| EGAF00008618878 | bam | 173.9 GB |
| EGAF00008623343 | bam | 198.7 GB |
| EGAF00008624311 | bam | 178.3 GB |
| EGAF00008624312 | bam | 176.1 GB |
| EGAF00008624313 | bam | 189.2 GB |
| EGAF00008624656 | bam | 182.1 GB |
| EGAF00008624657 | bam | 184.0 GB |
| EGAF00008624658 | bam | 177.8 GB |
| EGAF00008624659 | bam | 181.0 GB |
| EGAF00008624660 | bam | 182.2 GB |
| EGAF00008624661 | bam | 178.1 GB |
| EGAF00008624663 | bam | 185.2 GB |
| EGAF00008624676 | bam | 179.6 GB |
| EGAF00008624677 | bam | 182.3 GB |
| EGAF00008624678 | bam | 178.8 GB |
| EGAF00008624679 | bam | 199.1 GB |
| EGAF00008624680 | bam | 204.4 GB |
| EGAF00008624681 | bam | 213.8 GB |
| EGAF00008624684 | bam | 192.1 GB |
| EGAF00008630301 | bam | 179.0 GB |
| EGAF00008630302 | bam | 187.1 GB |
| EGAF00008630303 | bam | 178.9 GB |
| EGAF00008630304 | bam | 197.2 GB |
| EGAF00008630305 | bam | 170.2 GB |
| EGAF00008630306 | bam | 184.3 GB |
| EGAF00008630307 | bam | 181.3 GB |
| EGAF00008630308 | bam | 212.8 GB |
| EGAF00008630744 | bam | 178.6 GB |
| EGAF00008630797 | bam | 180.6 GB |
| EGAF00008630798 | bam | 184.3 GB |
| EGAF00008630983 | bam | 179.3 GB |
| EGAF00008630984 | bam | 177.7 GB |
| EGAF00008630985 | bam | 191.8 GB |
| EGAF00008630986 | bam | 197.4 GB |
| EGAF00008630987 | bam | 193.4 GB |
| EGAF00008630988 | bam | 196.4 GB |
| EGAF00008630989 | bam | 202.6 GB |
| EGAF00008632390 | bam | 173.9 GB |
| EGAF00008632391 | bam | 187.2 GB |
| EGAF00008632392 | bam | 186.6 GB |
| EGAF00008632394 | bam | 188.0 GB |
| EGAF00008632395 | bam | 185.3 GB |
| EGAF00008632398 | bam | 182.5 GB |
| EGAF00008632399 | bam | 193.8 GB |
| EGAF00008632400 | bam | 206.2 GB |
| EGAF00008632402 | bam | 208.5 GB |
| EGAF00008632404 | bam | 203.1 GB |
| EGAF00008632405 | bam | 200.6 GB |
| EGAF00008632419 | bam | 181.5 GB |
| EGAF00008632424 | bam | 178.3 GB |
| EGAF00008632426 | bam | 186.5 GB |
| EGAF00008632428 | bam | 187.4 GB |
| EGAF00008632432 | bam | 181.4 GB |
| EGAF00008634798 | bam | 181.0 GB |
| EGAF00008634801 | bam | 197.0 GB |
| EGAF00008634827 | bam | 190.2 GB |
| EGAF00008636346 | bam | 200.3 GB |
| EGAF00008637273 | bam | 176.2 GB |
| EGAF00008637279 | bam | 172.3 GB |
| EGAF00008637289 | bam | 183.1 GB |
| EGAF00008637314 | bam | 174.5 GB |
| EGAF00008637502 | bam | 175.2 GB |
| EGAF00008637503 | bam | 181.1 GB |
| EGAF00008637504 | bam | 179.9 GB |
| EGAF00008637505 | bam | 174.1 GB |
| EGAF00008637506 | bam | 182.0 GB |
| EGAF00008637507 | bam | 197.9 GB |
| EGAF00008637508 | bam | 192.3 GB |
| EGAF00008637509 | bam | 185.9 GB |
| EGAF00008637529 | bam | 172.8 GB |
| EGAF00008637533 | bam | 182.3 GB |
| EGAF00008637604 | bam | 188.3 GB |
| EGAF00008637609 | bam | 191.2 GB |
| EGAF00008637610 | bam | 194.4 GB |
| EGAF00008637680 | bam | 188.3 GB |
| EGAF00008637681 | bam | 181.0 GB |
| EGAF00008637682 | bam | 172.4 GB |
| EGAF00008637683 | bam | 180.4 GB |
| EGAF00008637684 | bam | 186.7 GB |
| EGAF00008637685 | bam | 177.4 GB |
| EGAF00008637697 | bam | 204.5 GB |
| EGAF00008637700 | bam | 190.7 GB |
| EGAF00008637732 | bam | 188.4 GB |
| EGAF00008637889 | bam | 182.8 GB |
| EGAF00008637890 | bam | 183.0 GB |
| EGAF00008637891 | bam | 175.2 GB |
| EGAF00008637892 | bam | 195.8 GB |
| EGAF00008637893 | bam | 187.7 GB |
| EGAF00008637894 | bam | 188.8 GB |
| EGAF00008637895 | bam | 186.7 GB |
| EGAF00008637896 | bam | 192.0 GB |
| EGAF00008637897 | bam | 194.8 GB |
| EGAF00008637898 | bam | 210.1 GB |
| EGAF00008637912 | bam | 194.7 GB |
| EGAF00008637913 | bam | 187.6 GB |
| EGAF00008637914 | bam | 225.1 GB |
| EGAF00008638104 | bam | 201.4 GB |
| EGAF00008638110 | bam | 180.7 GB |
| EGAF00008638111 | bam | 185.0 GB |
| EGAF00008638114 | bam | 178.4 GB |
| EGAF00008638115 | bam | 182.9 GB |
| EGAF00008638116 | bam | 171.0 GB |
| EGAF00008638119 | bam | 179.9 GB |
| EGAF00008638120 | bam | 178.8 GB |
| EGAF00008639155 | bam | 180.9 GB |
| EGAF00008639156 | bam | 182.5 GB |
| EGAF00008639157 | bam | 177.4 GB |
| EGAF00008639158 | bam | 181.9 GB |
| EGAF00008639159 | bam | 175.2 GB |
| EGAF00008639160 | bam | 186.1 GB |
| EGAF00008639161 | bam | 186.3 GB |
| EGAF00008639162 | bam | 182.1 GB |
| EGAF00008639618 | bam | 189.2 GB |
| EGAF00008639682 | bam | 186.1 GB |
| EGAF00008639685 | bam | 174.3 GB |
| EGAF00008639743 | bam | 181.2 GB |
| EGAF00008639747 | bam | 187.0 GB |
| EGAF00008640000 | bam | 175.1 GB |
| EGAF00008640383 | bam | 181.6 GB |
| EGAF00008640384 | bam | 176.8 GB |
| EGAF00008640388 | bam | 182.3 GB |
| EGAF00008640398 | bam | 178.6 GB |
| EGAF00008640399 | bam | 176.7 GB |
| EGAF00008640400 | bam | 187.1 GB |
| EGAF00008640403 | bam | 178.1 GB |
| EGAF00008640415 | bam | 171.9 GB |
| EGAF00008640416 | bam | 182.3 GB |
| EGAF00008640419 | bam | 178.2 GB |
| EGAF00008640421 | bam | 172.1 GB |
| EGAF00008640460 | bam | 183.2 GB |
| EGAF00008640475 | bam | 172.2 GB |
| EGAF00008640481 | bam | 180.4 GB |
| EGAF00008640550 | bam | 187.0 GB |
| EGAF00008640665 | bam | 178.2 GB |
| EGAF00008640677 | bam | 187.8 GB |
| EGAF00008640694 | bam | 179.2 GB |
| EGAF00008640705 | bam | 174.9 GB |
| EGAF00008640729 | bam | 175.5 GB |
| EGAF00008640966 | bam | 169.6 GB |
| EGAF00008640983 | bam | 180.2 GB |
| EGAF00008641019 | bam | 188.2 GB |
| EGAF00008641040 | bam | 181.5 GB |
| EGAF00008641074 | bam | 178.7 GB |
| EGAF00008641219 | bam | 175.9 GB |
| EGAF00008641388 | bam | 176.7 GB |
| EGAF00008641389 | bam | 179.2 GB |
| EGAF00008641390 | bam | 182.3 GB |
| EGAF00008641391 | bam | 175.3 GB |
| EGAF00008641665 | bam | 175.7 GB |
| EGAF00008641666 | bam | 179.5 GB |
| EGAF00008641699 | bam | 182.1 GB |
| EGAF00008641706 | bam | 177.2 GB |
| EGAF00008641765 | bam | 172.1 GB |
| EGAF00008641766 | bam | 182.4 GB |
| EGAF00008641767 | bam | 178.6 GB |
| EGAF00008641865 | bam | 179.9 GB |
| EGAF00008641866 | bam | 176.7 GB |
| EGAF00008641867 | bam | 178.5 GB |
| EGAF00008641868 | bam | 176.8 GB |
| EGAF00008641903 | bam | 176.2 GB |
| EGAF00008641906 | bam | 185.7 GB |
| EGAF00008641910 | bam | 182.9 GB |
| EGAF00008641911 | bam | 177.4 GB |
| EGAF00008641936 | bam | 180.5 GB |

169 Files (31.1 TB)
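Before requesting or downloading files of this size, it can help to tally the listing above, for example to plan storage. The Python sketch below is a hedged illustration: `parse_rows` and the regular expression are hypothetical helpers, not part of any EGA tool, and the sample contains only the first three rows of the file table rather than the full listing.

```python
import re

# Matches table rows such as "| EGAF00008618693 | bam | 173.9 GB |".
ROW = re.compile(r"^\|\s*(EGAF\d+)\s*\|\s*(\w+)\s*\|\s*([\d.]+)\s*GB\s*\|")


def parse_rows(text):
    """Return (file_id, file_type, size_gb) tuples for each file row in the text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            file_id, file_type, size = match.groups()
            rows.append((file_id, file_type, float(size)))
    return rows


# First three rows of the file table, copied for illustration.
sample = """\
| ID | File Type | Size |
|---|---|---|
| EGAF00008618693 | bam | 173.9 GB |
| EGAF00008618702 | bam | 179.2 GB |
| EGAF00008618707 | bam | 170.0 GB |
"""

rows = parse_rows(sample)
total_gb = sum(size for _, _, size in rows)
print(f"{len(rows)} files, {total_gb:.1f} GB in total")
```

Header and separator rows are skipped because they do not match the `EGAF` accession pattern; running the same function over the full table would give the per-dataset total.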
