Khoe-San Genome Project

The dataset contains BAM files of whole-genome sequencing data from the African Khoe-San population (n=169, 30X coverage). DNA extracted from blood underwent 2 × 150 bp sequencing on the Illumina HiSeq X instrument (Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research). Raw sequencing reads were aligned against the human reference hg38, including alternative contigs, using bwa (v.0.7.15). The Genome Analysis Toolkit (GATK, v.3.5-0) was used for duplicate marking, indel realignment and base quality recalibration.

Khoe-San Genome Project (KSGP)

A subsidiary project of the Southern African Prostate Cancer Study (SAPCS) and Namibian Prostate Cancer Study (NamPCS), with the approved and funded objective to generate a population-representative panel of normal (PoN) genetic diversity and whole genome representation across the region.

Background

The Khoe-San Genome Project (KSGP), previously the Diversity inclusive Genome Study for Southern Africa (DiGS-SA) and Ubuntu Genomics Projects, was first established in 2008 with seed funding from the University of Limpopo to Professors Philip Venter (University of Limpopo, South Africa) and Vanessa M. Hayes (University of Sydney, Australia). In South Africa, the project formed a sub-aim of the 2008 Southern African Prostate Cancer Study (SAPCS) initiated by the National Health and Medical Research Council (NHMRC) of South Africa, with approval granted to generate a population-matched panel of normal (PoN) genomic and reference-based diversity data to achieve the required goals. The study was originally managed out of the University of Limpopo, under ethics committee clearance certificate MREC/HS/001/2008 (PI-Venter) and Ubuntu Genome Project clearance MREC/HS/214/2011 (PI-Venter). Upon the retirement of Professor Venter in 2010, the study was transferred to Professor M.S. Riana Bornman (University of Pretoria, South Africa) and approved by the University of Pretoria Faculty of Health Sciences Research Ethics Committee (with US Federalwide Assurance FWA00002567 and IRB00002235 IORG0001762) under HREC/43/2010 (PI-Bornman) and HREC/280/2017 (PI-Bornman). In Namibia, the study was reviewed and approved by the Ministry of Health and Social Services (MoHSS), first in 2008 under 17/3/3HAYES (PI-Hayes), and updated in 2014 to include the Namibian SAPCS under 17/3/3HEYNS (PI Professor Christopher Heyns, University of Stellenbosch, South Africa), which on his passing was updated to MoHSS 17/3/3HEAF003 (PI Dr Hagen E.A. Förtsch, Windhoek Central Hospital).
In 2025, NamPCS (Namibian Prostate Cancer Study) and the associated KSGP will be led by Professor Lamech Mwapagha (Namibia University of Science & Technology). All molecular-genetic/genomic data has been generated under ethics approval granted by the St. Vincent's Human Research Ethics Committee in Sydney, Australia, #SVH/15/227 (PI-Hayes), including established Material Transfer Agreements (MTA) and Data Sharing Agreements (DSA). All data remains the property of the country from which the participants were recruited. For the purposes of publication and cohort recognition, the Khoe-San Genome Project (KSGP) is a derivative of these funded projects. As the data is largely derived from Namibian participants, the Data Access Committee (DAC) is primarily led out of the Namibia University of Science & Technology (NUST), with additional local, South African and data science representation.

Data generated for the KSGP has been funded by the University of Limpopo, South Africa (to the late PI-Venter and PI-Hayes); the Garvan Institute of Medical Research Foundation and Medical Genome Research Biobank (MGRB) in Australia (to PI-Hayes); and the U.S.A. Congressionally Directed Medical Research Programs (CDMRP) Prostate Cancer Research Program (PCRP) Health Equity Research and Outcomes Improvement Consortium (HEROIC) Award (PC210168 and PC230673, HEROIC Prostate Cancer Precision Health (PCaPH) Africa1K to PI-Hayes and PI-Bornman).

The data generated will be made accessible (at publication) to the research community on request, while maintaining ethical standards and practices related to generated research data, with priority given to safeguarding the interests of the participants. Although the data will only be supplied to successful applicants in a de-identified form, to reduce the risk of re-identification, the data is deemed personal information and is highly confidential.
While donor consents do not require disclosure of research findings to individual participants, the KSGP management team is obligated to return research findings to the contributing communities via established engagement and awareness programs, as well as contributing to local policy development. It is therefore critical that all findings emanating from the KSGP research data are communicated back to the KSGP research team for community benefit. This policy may be updated from time to time.

Purpose

The Data Access Policy (DAP) outlines the membership of the KSGP Data Access Committee (DAC) and the processes and procedures used to ensure equitable, ethical and efficient access to and the release of published genomic data.

KSGP-DAC Leads

• KSGP Chair, Professor Lamech Mwapagha, Namibia University of Science & Technology (NUST), Department of Biology, Chemistry and Physics, 13 Jackson Kaujeua Street, Windhoek, Namibia; Email: lmwapagha@nust.na
• Data Science Lead, Professor Vanessa Hayes, Ancestry & Health Genomics Laboratory, Director HEROIC PCaPH Africa1K, University of Sydney, Australia; Email: vanessa.hayes@sydney.edu.au
• NamPCS Clinical Lead, Dr Hagen E.A. Förtsch, Windhoek Central Hospital, Urology Department, Urology Practice Luxury Hill, 19 Heinitzburg Street, Windhoek, Namibia; Email: uroreception@mweb.com.na
• South African Lead, Professor Riana Bornman, School of Systems Health & Public Health, University of Pretoria, South Africa; Email: Riana.Bornman@up.ac.za

Additional KSGP-DAC Members

• Mr Uvatera Maurihungirire, Namibia University of Science & Technology (NUST), Department of Biology, Chemistry and Physics; Expertise: KSGP Biobank Manager
• Dr Weerachai Jaratlerdsiri (PhD), Ancestry & Health Genomics Lab, University of Sydney, Australia; Expertise: Computational Genomics
• Mr Elton Adams, Living Museum of the Damara, Twyfelfontein, Namibia; Expertise: Community Representative
• Mr Josef /Kunta, Headman Nhoma Ju/'hoan village, Nyae Nyae Conservatory, Namibia; Expertise: Community Representative
• Ms Jue Jiang, Ancestry & Health Genomics Lab, University of Sydney, Australia; Expertise: Data Management & Security Officer

Application Procedures

The KSGP-DAC will consider requests for published data from all researchers. Data will only be released to researchers who can provide a statement of ethics approval from an IRB or Human Research Ethics Committee (HREC) of their host institution to safeguard patient rights. Before submitting a request, applicants are encouraged to confer with one of the KSGP Leads to discuss the appropriateness of the data for the proposed study and the feasibility of the request.
The following information (via a KSGP Concept Note) will be requested from applicants:
(i) the applicant's name, title, institution, country and email address; where the application is being made on behalf of a consortium or collaborative group, the name of each external collaborator, their institution and country;
(ii) the research question/hypotheses to be tested;
(iii) a list of the data required;
(iv) statistical justification for the number and types of cases required (where applicable);
(v) a brief description of the technical approach;
(vi) evidence of the proposed technical approach's prior successful use (where applicable);
(vii) the IRB/HREC approval number (and a copy of the approval if requested) to conduct the proposed research (external applicants);
(viii) a data management plan (see below); and
(ix) a signed letter of collaboration with a KSGP Lead or Community Representative.

Applicants should either submit a copy of the data management plan or provide the following information:
(i) a brief description of the technical systems, policies and processes the applicant has in place to ensure the data is secure, kept confidential and safe from unauthorised use or disclosure, including during any mandatory retention periods at the conclusion of the research;
(ii) a brief description of any third parties or service providers involved in the processing (e.g. linking, storing) of the data;
(iii) a brief description of the intended outputs from the research (e.g. statistical summaries, publications, PhD thesis, production of another data set); and
(iv) the steps the applicant will take to anonymise or pseudonymise the data output.

Decision-making

The KSGP-DAC will meet monthly (by Zoom, when required) to consider requests. Quorum for decision-making will require a minimum of five of nine members and must include the Chair, the Data Lead and both Community Representatives. Decisions will be made by the majority, and the majority must include both Community Representatives.
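The quorum and majority rules above can be expressed as a small check. The following is only an illustrative sketch, not part of the policy: the member labels (`chair`, `data_lead`, `community_rep_1`, `community_rep_2`, `member_5`) and function names are hypothetical, and it assumes that "majority" means more than half of the members present at the meeting.

```python
def has_quorum(present, required, minimum=5):
    """Return True if at least `minimum` members attend and every
    required member (Chair, Data Lead, both Community Representatives)
    is among them."""
    return len(present) >= minimum and required <= present


def is_approved(votes, required, community_reps, minimum=5):
    """Evaluate one decision.

    `votes` maps each member present to True (in favour) or False.
    Assumption: a majority is more than half of the members present.
    """
    present = set(votes)
    if not has_quorum(present, required, minimum):
        return False
    in_favour = {member for member, yes in votes.items() if yes}
    # The majority must also include both Community Representatives.
    return 2 * len(in_favour) > len(present) and community_reps <= in_favour


# Hypothetical labels, used for illustration only.
COMMUNITY_REPS = {"community_rep_1", "community_rep_2"}
REQUIRED = {"chair", "data_lead"} | COMMUNITY_REPS

example_votes = {
    "chair": True,
    "data_lead": True,
    "community_rep_1": True,
    "community_rep_2": True,
    "member_5": False,
}
print(is_approved(example_votes, REQUIRED, COMMUNITY_REPS))
```

Under these assumptions, a vote in which one Community Representative dissents is not approved even if most other members are in favour.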
All data has been generated through peer-reviewed funding mechanisms; as such, research projects may well be underway to address relevant questions. In such circumstances, priority will need to be given to already funded KSGP research efforts.

The KSGP-DAC will make decisions on approval of access to data using the following criteria:
(i) The track record of the applicants in the technical approach and the likelihood of significant outcomes from the research proposed.
(ii) Non-overlap with projects being undertaken by the KSGP team (except with compelling justification).
(iii) Alignment of the research purpose with patients' informed consent and ethics approval (e.g., restrictions as to commercial research, or other requirements needing additional approval).
(iv) Availability of a data management plan.
(v) Any other restrictions or conditions that may apply to the use or disclosure of the relevant data.

The KSGP-DAC can impose certain restrictions on all approvals, and/or specific restrictions on particular approvals, as follows:
(i) No transfer to third parties allowed (all).
(ii) Acknowledgment of the KSGP in publications/presentations (all).
(iii) A report of the results of the research to be provided to the KSGP-DAC after completion (or when requested).
(iv) Researchers cannot utilize the data for commercial purposes (all).
(v) Approval will not be given that excludes other researchers from accessing data (all).
(vi) Approval might be time-limited exclusive rights for a particular research development (particular).
(vii) Where a conflict of interest exists.

Approved applications will require a fully institutionally executed KSGP Data Sharing Agreement (DSA) prior to the recipient researcher being granted access to de-identified data through an established database. Each researcher within the recipient team needs to be registered with the KSGP-DAC prior to accessing data.

Rejected applications
Decisions made by the KSGP-DAC will be communicated in writing to the applicant, setting out the reasons for rejecting the application.

Responsibilities

It is the responsibility of the applicants to:
(i) provide documentation of local IRB/ethics approval;
(ii) agree to make results of studies using the data available to the larger scientific community;
(iii) provide a letter of collaboration with the primary study investigator(s), thereby enhancing local inclusion;
(iv) restrict use of the data to not-for-profit organisations; and
(v) discuss any use of genomic data for methods development purposes with KSGP lead investigators and conduct it under a collaborative agreement.

It is the responsibility of the Data Access Committee to:
(i) review applications and approve the release of data based on the scientific value of the research proposal, provided it is not in conflict with current KSGP approved/funded studies;
(ii) coordinate data release; and
(iii) review the completed data and publications; this includes applicants providing copies of all proposed publications to the KSGP-DAC for review and comment (prior to submission) and appropriately acknowledging the KSGP in publications using relevant data.

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID Study Title Study Type
EGAS50000001408 Whole Genome Sequencing

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size
EGAF00008618693 bam 173.9 GB
EGAF00008618702 bam 179.2 GB
EGAF00008618707 bam 170.0 GB
EGAF00008618710 bam 183.1 GB
EGAF00008618716 bam 193.9 GB
EGAF00008618878 bam 173.9 GB
EGAF00008623343 bam 198.7 GB
EGAF00008624311 bam 178.3 GB
EGAF00008624312 bam 176.1 GB
EGAF00008624313 bam 189.2 GB
EGAF00008624656 bam 182.1 GB
EGAF00008624657 bam 184.0 GB
EGAF00008624658 bam 177.8 GB
EGAF00008624659 bam 181.0 GB
EGAF00008624660 bam 182.2 GB
EGAF00008624661 bam 178.1 GB
EGAF00008624663 bam 185.2 GB
EGAF00008624676 bam 179.6 GB
EGAF00008624677 bam 182.3 GB
EGAF00008624678 bam 178.8 GB
EGAF00008624679 bam 199.1 GB
EGAF00008624680 bam 204.4 GB
EGAF00008624681 bam 213.8 GB
EGAF00008624684 bam 192.1 GB
EGAF00008630301 bam 179.0 GB
EGAF00008630302 bam 187.1 GB
EGAF00008630303 bam 178.9 GB
EGAF00008630304 bam 197.2 GB
EGAF00008630305 bam 170.2 GB
EGAF00008630306 bam 184.3 GB
EGAF00008630307 bam 181.3 GB
EGAF00008630308 bam 212.8 GB
EGAF00008630744 bam 178.6 GB
EGAF00008630797 bam 180.6 GB
EGAF00008630798 bam 184.3 GB
EGAF00008630983 bam 179.3 GB
EGAF00008630984 bam 177.7 GB
EGAF00008630985 bam 191.8 GB
EGAF00008630986 bam 197.4 GB
EGAF00008630987 bam 193.4 GB
EGAF00008630988 bam 196.4 GB
EGAF00008630989 bam 202.6 GB
EGAF00008632390 bam 173.9 GB
EGAF00008632391 bam 187.2 GB
EGAF00008632392 bam 186.6 GB
EGAF00008632394 bam 188.0 GB
EGAF00008632395 bam 185.3 GB
EGAF00008632398 bam 182.5 GB
EGAF00008632399 bam 193.8 GB
EGAF00008632400 bam 206.2 GB
EGAF00008632402 bam 208.5 GB
EGAF00008632404 bam 203.1 GB
EGAF00008632405 bam 200.6 GB
EGAF00008632419 bam 181.5 GB
EGAF00008632424 bam 178.3 GB
EGAF00008632426 bam 186.5 GB
EGAF00008632428 bam 187.4 GB
EGAF00008632432 bam 181.4 GB
EGAF00008634798 bam 181.0 GB
EGAF00008634801 bam 197.0 GB
EGAF00008634827 bam 190.2 GB
EGAF00008636346 bam 200.3 GB
EGAF00008637273 bam 176.2 GB
EGAF00008637279 bam 172.3 GB
EGAF00008637289 bam 183.1 GB
EGAF00008637314 bam 174.5 GB
EGAF00008637502 bam 175.2 GB
EGAF00008637503 bam 181.1 GB
EGAF00008637504 bam 179.9 GB
EGAF00008637505 bam 174.1 GB
EGAF00008637506 bam 182.0 GB
EGAF00008637507 bam 197.9 GB
EGAF00008637508 bam 192.3 GB
EGAF00008637509 bam 185.9 GB
EGAF00008637529 bam 172.8 GB
EGAF00008637533 bam 182.3 GB
EGAF00008637604 bam 188.3 GB
EGAF00008637609 bam 191.2 GB
EGAF00008637610 bam 194.4 GB
EGAF00008637680 bam 188.3 GB
EGAF00008637681 bam 181.0 GB
EGAF00008637682 bam 172.4 GB
EGAF00008637683 bam 180.4 GB
EGAF00008637684 bam 186.7 GB
EGAF00008637685 bam 177.4 GB
EGAF00008637697 bam 204.5 GB
EGAF00008637700 bam 190.7 GB
EGAF00008637732 bam 188.4 GB
EGAF00008637889 bam 182.8 GB
EGAF00008637890 bam 183.0 GB
EGAF00008637891 bam 175.2 GB
EGAF00008637892 bam 195.8 GB
EGAF00008637893 bam 187.7 GB
EGAF00008637894 bam 188.8 GB
EGAF00008637895 bam 186.7 GB
EGAF00008637896 bam 192.0 GB
EGAF00008637897 bam 194.8 GB
EGAF00008637898 bam 210.1 GB
EGAF00008637912 bam 194.7 GB
EGAF00008637913 bam 187.6 GB
EGAF00008637914 bam 225.1 GB
EGAF00008638104 bam 201.4 GB
EGAF00008638110 bam 180.7 GB
EGAF00008638111 bam 185.0 GB
EGAF00008638114 bam 178.4 GB
EGAF00008638115 bam 182.9 GB
EGAF00008638116 bam 171.0 GB
EGAF00008638119 bam 179.9 GB
EGAF00008638120 bam 178.8 GB
EGAF00008639155 bam 180.9 GB
EGAF00008639156 bam 182.5 GB
EGAF00008639157 bam 177.4 GB
EGAF00008639158 bam 181.9 GB
EGAF00008639159 bam 175.2 GB
EGAF00008639160 bam 186.1 GB
EGAF00008639161 bam 186.3 GB
EGAF00008639162 bam 182.1 GB
EGAF00008639618 bam 189.2 GB
EGAF00008639682 bam 186.1 GB
EGAF00008639685 bam 174.3 GB
EGAF00008639743 bam 181.2 GB
EGAF00008639747 bam 187.0 GB
EGAF00008640000 bam 175.1 GB
EGAF00008640383 bam 181.6 GB
EGAF00008640384 bam 176.8 GB
EGAF00008640388 bam 182.3 GB
EGAF00008640398 bam 178.6 GB
EGAF00008640399 bam 176.7 GB
EGAF00008640400 bam 187.1 GB
EGAF00008640403 bam 178.1 GB
EGAF00008640415 bam 171.9 GB
EGAF00008640416 bam 182.3 GB
EGAF00008640419 bam 178.2 GB
EGAF00008640421 bam 172.1 GB
EGAF00008640460 bam 183.2 GB
EGAF00008640475 bam 172.2 GB
EGAF00008640481 bam 180.4 GB
EGAF00008640550 bam 187.0 GB
EGAF00008640665 bam 178.2 GB
EGAF00008640677 bam 187.8 GB
EGAF00008640694 bam 179.2 GB
EGAF00008640705 bam 174.9 GB
EGAF00008640729 bam 175.5 GB
EGAF00008640966 bam 169.6 GB
EGAF00008640983 bam 180.2 GB
EGAF00008641019 bam 188.2 GB
EGAF00008641040 bam 181.5 GB
EGAF00008641074 bam 178.7 GB
EGAF00008641219 bam 175.9 GB
EGAF00008641388 bam 176.7 GB
EGAF00008641389 bam 179.2 GB
EGAF00008641390 bam 182.3 GB
EGAF00008641391 bam 175.3 GB
EGAF00008641665 bam 175.7 GB
EGAF00008641666 bam 179.5 GB
EGAF00008641699 bam 182.1 GB
EGAF00008641706 bam 177.2 GB
EGAF00008641765 bam 172.1 GB
EGAF00008641766 bam 182.4 GB
EGAF00008641767 bam 178.6 GB
EGAF00008641865 bam 179.9 GB
EGAF00008641866 bam 176.7 GB
EGAF00008641867 bam 178.5 GB
EGAF00008641868 bam 176.8 GB
EGAF00008641903 bam 176.2 GB
EGAF00008641906 bam 185.7 GB
EGAF00008641910 bam 182.9 GB
EGAF00008641911 bam 177.4 GB
EGAF00008641936 bam 180.5 GB
169 Files (31.1 TB)
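
For readers who want to plan storage before requesting access, the listing above can be tallied programmatically. The sketch below is only an illustration: it assumes each row has the form `<file ID> <type> <size> GB` as shown in the table, the function name `parse_listing` is hypothetical, and the sample uses just the first three rows.

```python
import re

# One row of the listing: file accession, file type, size in GB.
ROW_RE = re.compile(r"^(EGAF\d+)\s+(\w+)\s+(\d+(?:\.\d+)?)\s+GB$")


def parse_listing(text):
    """Return (file_id, file_type, size_gb) tuples for every row that
    matches the assumed layout; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows.append((match.group(1), match.group(2), float(match.group(3))))
    return rows


# First three rows of the table, copied verbatim.
SAMPLE = """\
EGAF00008618693 bam 173.9 GB
EGAF00008618702 bam 179.2 GB
EGAF00008618707 bam 170.0 GB
"""

files = parse_listing(SAMPLE)
total_gb = sum(size for _, _, size in files)
print(f"{len(files)} files, {total_gb:.1f} GB")
```

Running the same function over the full listing gives the per-dataset total to budget for, since each BAM file must be stored in full after download.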