Genentech gallbladder cancer study - exome
SPECTA Lung cancer VCF files
Transcriptome sequencing of 65 prostate cancer cases
The Young Boost Trial (YBT; YOUNG BOOST / BOOG 2004-01) investigated the optimal radiation boost dose for breast cancer patients aged 50 years or younger who were treated with breast-conserving therapy (BCT). In this randomized study, participants were assigned to receive either a 26 Gy or a 16 Gy radiation boost, with local recurrence as the primary endpoint. To explore potential predictive biomarkers of treatment response in this patient group, whole exome sequencing (WES) was performed on tumor samples to identify genetic factors that could guide personalized treatment strategies and improve clinical outcomes for these young breast cancer patients.
Molecular Cancer paper (https://doi.org/10.1186/s12943-021-01327-5): This dataset contains shallow whole-genome sequencing (sWGS) of plasma cell-free DNA from cancer patients and healthy subjects, obtained with both Nanopore and Illumina technology. A total of 6 cancer patients and 5 healthy subjects were sequenced with Nanopore; 4 of the cancer patients were also sequenced with Illumina. In addition, genomic DNA from the white blood cells of one healthy subject, as well as genomic DNA and 160 bp DNA from HEK cells, were sequenced with Nanopore.
Genome Biology paper: 3 additional healthy samples (HU) were sequenced, and two different bioinformatic pipelines were applied.
2019: FASTQ files from the Molecular Cancer paper were re-demultiplexed and adapter-trimmed (using guppy for multiplexed samples and porechop for singleplex samples), preserving 5' ends to allow fragmentomics analysis.
HAC: All samples were basecalled with the same updated High Accuracy model (the latest available at the time of the analysis) and post-processed in the same way as the 2019 dataset.
Raw FAST5 files are currently available upon request and will be uploaded soon.
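Preserving the 5' ends matters because fragmentomic features, such as fragment size and the short k-mer "end motif" at each fragment's 5' end, are read directly off the trimmed reads. The following is a minimal Python sketch of that idea, not the authors' pipeline. It assumes plain (uncompressed) 4-line FASTQ records, and that each trimmed read begins at a fragment's 5' end and spans the whole fragment, so read length approximates cfDNA fragment length. The function names and the three-read input are hypothetical, for illustration only.

```python
from collections import Counter
import io


def fastq_sequences(handle):
    """Yield read sequences from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield seq


def end_motifs_and_lengths(handle, k=4):
    """Count 5' end k-mers and read lengths.

    ASSUMES each trimmed read starts at the fragment's 5' end and covers the
    whole fragment, so read length is used as a proxy for fragment length.
    """
    motifs, lengths = Counter(), Counter()
    for seq in fastq_sequences(handle):
        lengths[len(seq)] += 1
        end = seq[:k].upper()
        # Skip reads too short for a full motif or with ambiguous bases.
        if len(end) == k and "N" not in end:
            motifs[end] += 1
    return motifs, lengths


# Synthetic three-read FASTQ for illustration only (not real data).
SAMPLE_FASTQ = ("@r1\nCCCAGTTT\n+\nIIIIIIII\n"
                "@r2\nCCCATG\n+\nIIIIII\n"
                "@r3\nNNAAG\n+\nIIIII\n")

motifs, lengths = end_motifs_and_lengths(io.StringIO(SAMPLE_FASTQ))
print(motifs.most_common(1))
print(sorted(lengths.items()))
```

Trimming that clipped even a few bases from the 5' end would shift every motif, which is why the re-trimmed 2019 FASTQ files, rather than the original ones, are the appropriate input for this kind of analysis.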
Within the framework of the NCI-sponsored Cohort Consortium, investigators from 12 prospective epidemiologic cohorts formed the Pancreatic Cancer Cohort Consortium in 2006. This study, also known as "PanScan", is funded by the National Cancer Institute (NCI) and conducts a genome-wide association study (GWAS) of common genetic variants to identify markers of susceptibility to pancreatic cancer. In 2007, the study was expanded to include 8 case-control studies. The study team includes scientists from the cohorts that make up the Consortium, the NCI and the Pancreatic Cancer Case Control Consortium (PanC4). PanScan I and II were conducted in 12 cohort studies and 8 case-control studies and led to the discovery of four novel genomic regions associated with risk of pancreatic adenocarcinoma. The third phase of PanScan (PanScan III) used recently identified incident pancreatic cancer cases drawn from fourteen cohorts in the Cohort Consortium: nine prospective cohorts that participated in PanScan I and five newly joined cohorts. The nine cohort studies that participated in PanScan I and had new genotyping of cases in PanScan III are ATBC, CPS-II, EPIC, HPFS, NHS, PHS, PLCO, SMWHS and WHI; the five newly joined cohort studies are the Agricultural Health Study (AHS), the Multiethnic Cohort Study (MEC), the Melbourne Collaborative Cohort Study (MCCS), the Vitamins and Lifestyle Study (VITAL) and the Selenium and Vitamin E Cancer Prevention Trial (SELECT). In addition to the cohort cases, we also included cases from the Gastrointestinal Cancer Clinic of Dana-Farber Cancer Institute Study (DFCI-GCC); from the University Hospital in Heidelberg, Germany, which is part of a larger European clinical case-control study (PANDoRA); and from clinic-based cases in eastern Spain (PANKRAS-II).
The available dbGaP datasets include all subjects previously released from PanScan I and II, plus 1,582 new incident pancreatic cancer cases of European descent from prospective cohorts, case-control studies or case series (genotyped as part of PanScan III). Also included are 61 pancreatic cancer cases and 67 control subjects from PanScan I, as well as 173 pancreatic cancer cases from PanScan III, of Asian ancestry from the Shanghai Men's and Women's Health Study (Supplemental Table 10, Wolpin et al. (Nat Genet, 2014)). The control population used in the analysis for the Wolpin et al. manuscript included cancer-free individuals from the prospective cohorts that contributed pancreatic cancer cases to PanScan III, and controls from the Spanish Bladder Cancer SBC/EPICURO study that were previously genotyped using the OmniExpress, Omni 1M or Omni 2.5M SNP arrays. The data from these control subjects were posted to dbGaP under the GWAS in which they were initially genotyped and will not be made available in duplicate under this dbGaP study. The summary statistics for PanScan I-III were generated as detailed in Wolpin BM. et al., Genome-wide association study identifies multiple susceptibility loci for pancreatic cancer, Nature Genetics 2014; 46(9):994-1000 (https://www.nature.com/articles/ng.3052), and Klein, A. et al., Genome-wide meta-analysis identifies five new susceptibility loci for pancreatic cancer, Nature Communications, 2018;9(1):556 (https://www.nature.com/articles/s41467-018-02942-5). The dataset includes results from an association study of 5,117 individuals diagnosed with pancreatic ductal adenocarcinoma (PDAC) and 8,845 control individuals, a total of 13,962 subjects of European ancestry. By comparison, the genotype and phenotype information under this project covers 9,437 individuals (PanScan I and II PDAC cases and controls, plus PanScan III PDAC cases only).
This difference arises because PanScan III “borrowed” GWAS data from control individuals who were genotyped separately from the PanScan GWAS project; these controls are therefore not included as raw genotypes in phs000206.v5.p3. Association analysis was performed separately for PanScan I-II and PanScan III, followed by a meta-analysis of the two datasets. Variants with a minor allele frequency (MAF) < 0.01, an imputation INFO score < 0.3 or a heterogeneity P-value < 1×10^-10 were filtered out, leaving a total of 9,758,390 variants. Columns in the summary statistics dataset are as follows:
ID: variant rsID
Chr: chromosome number
Position: position on the chromosome (genome build GRCh37/hg19)
MarkerName: variant identifier
Allele1: reference allele
Allele2: alternative allele
Freq1: allele frequency for allele2
FreqSE: standard error of the allele frequency
MinFreq: minimum allele frequency across studies
MaxFreq: maximum allele frequency across studies
Effect: effect size for allele2
StdErr: standard error of the effect size
P-value: meta-analysis P-value
Direction: summary of effect direction for each study
HetISq: I^2 statistic, which measures heterogeneity on a scale of 0-100%
HetChiSq: chi-squared statistic from a simple test of heterogeneity
HetDf: degrees of freedom for the heterogeneity statistic
HetPVal: P-value for the heterogeneity statistic
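The column set above matches the output format of the METAL meta-analysis tool. As a minimal sketch, this Python code reads the summary statistics and performs a few common downstream steps. It rests on several assumptions. The file is tab-delimited with a single header row using exactly these column names. Direction follows METAL's convention of one symbol per study, with '?' marking a study without data. Effect is a log odds ratio, which the description does not state. It also selects variants below the conventional genome-wide significance threshold of 5×10^-8. The function names and the synthetic row are hypothetical, for illustration only.

```python
import csv
import io
import math

# Column order as listed in the dataset description. Tab delimiting and a
# single header row are ASSUMPTIONS about the file layout.
COLUMNS = ["ID", "Chr", "Position", "MarkerName", "Allele1", "Allele2",
           "Freq1", "FreqSE", "MinFreq", "MaxFreq", "Effect", "StdErr",
           "P-value", "Direction", "HetISq", "HetChiSq", "HetDf", "HetPVal"]
INT_COLS = {"Position", "HetDf"}
FLOAT_COLS = {"Freq1", "FreqSE", "MinFreq", "MaxFreq", "Effect", "StdErr",
              "P-value", "HetISq", "HetChiSq", "HetPVal"}


def read_sumstats(handle):
    """Yield one dict per variant, converting numeric columns."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for col in INT_COLS:
            row[col] = int(row[col])
        for col in FLOAT_COLS:
            row[col] = float(row[col])
        yield row


def odds_ratio_ci(effect, stderr, z=1.96):
    """Odds ratio and 95% CI, ASSUMING Effect is a log odds ratio."""
    return (math.exp(effect),
            math.exp(effect - z * stderr),
            math.exp(effect + z * stderr))


def direction_counts(direction):
    """Count '+', '-' and '?' (study without data) symbols in Direction."""
    return {symbol: direction.count(symbol) for symbol in "+-?"}


def genome_wide_hits(rows, alpha=5e-8):
    """Keep variants whose meta-analysis P-value is below alpha."""
    return [row for row in rows if row["P-value"] < alpha]


# Synthetic single-variant file for illustration only (not real results).
SAMPLE = "\t".join(COLUMNS) + "\n" + "\t".join([
    "rs0000000", "1", "1000", "1:1000", "a", "g", "0.25", "0.01", "0.24",
    "0.26", "0.0", "0.05", "1", "+-", "0", "0.5", "1", "0.48"]) + "\n"

rows = list(read_sumstats(io.StringIO(SAMPLE)))
print(odds_ratio_ci(rows[0]["Effect"], rows[0]["StdErr"]))
print(direction_counts(rows[0]["Direction"]))
print(len(genome_wide_hits(rows)))
```

With two meta-analyzed datasets (PanScan I-II and PanScan III), a Direction string is expected to hold two symbols, so a value like "+-" flags a variant whose effect points in opposite directions in the two phases.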
Matched Pair Cancer Cell line Whole Genomes
ICGC prostate cancer miRNA sequencing
Genome-wide NanoRCS of cfDNA from plasma of esophageal cancer patients