Recent biotechnological advances have enabled the development of high-throughput next-generation sequencing (NGS) assays and computational techniques for evaluating nucleic acids, gene copy number, gene expression, and epigenetic silencing. These genomic technologies have significantly influenced many biological studies, especially the development of novel therapies and drugs against major diseases, including cancer, which is primarily linked to the genome. Cancer evolves and grows through the accumulation of somatic mutations, such as copy-number alterations, epigenomic variations, and structural variants, and some cancer-associated variants are heritable (germline variants). Evidence from hereditary cancer segregation and loss-of-heterozygosity studies has identified both somatic and germline mutations in a few typical tumor suppressor genes, such as TP53, RB1, and APC, in cancer tissues. Copy-number studies have likewise shown the involvement of certain oncogenes and primary oncogenic activators (HER2/ERBB2 and MYC). These oncogenic mutations have therefore become new targets for molecular therapy. In addition, specific recurrent mutations in these oncogenes are now used to support diagnosis, predict sensitivity to therapy, and select an effective treatment. In general, cancer genomics involves analyzing high-throughput genomic data with computational algorithms to correlate the disease with clinical effects, with the aim of identifying molecular targets or biological pathways that allow cancer patients to be treated with better therapies.
Currently, cancer genome profiles are analyzed comprehensively in both research and clinical settings using technologies such as whole exome sequencing (WES), RNA sequencing (RNA-Seq), targeted gene sequencing, and whole genome sequencing (WGS). To complement these, several computational tools have been developed to interpret the enormous volume of ‘omics’ data and link it to biological functions. Recent profiling technologies such as ChIA-PET and Hi-C make it possible to evaluate gene regulation at the transcriptional level. These technologies detect chromatin interactions and allow the genome to be partitioned into active and suppressive domains. To date, more than 50,000 sequenced cancer genomes are available in the database maintained by The Cancer Genome Atlas and the International Cancer Genome Consortium projects. This database will grow further as the genomes of millions of cancer patients are sequenced, which is predicted to happen by 2030. These projects mainly used the WES platform and revealed cancer genes, new pathways, and recurrent mutations in cancers. In common cancers such as lung cancer and melanoma, a high number of somatic mutations have been reported in coding regions, whereas leukemia and pediatric cancers have fewer mutations, or only a few protein-altering mutations, in their coding regions. At present, a summarized database of coding mutations is available for over 20,000 cancers. Non-coding regions constitute nearly 98% of the human genome and include introns, untranslated regions, promoters, non-coding functional RNAs, regulatory elements, and repetitive regions. Nevertheless, data on somatic mutations in these non-coding regions are limited. In addition, somatic structural variants in cancer genomes, such as inversions, deletions/insertions, translocations, duplications, and virus integrations, are yet to be explored.
In this regard, the WGS platform is a good option for exploring all kinds of mutations in both coding and non-coding regions and for gaining better insight into cancer genomes.
In WGS, DNA is fragmented randomly by physical shearing, and about 90-150 Gb of sequence is generated for each of the cancer and normal genomes, covering 99% of the entire human genome, using third-generation sequencing methodologies such as nanopore or PacBio SMRT sequencing. The cost of each WGS analysis is affordable (~US$ 1000). However, a high error rate (~5%) in each read is expected. Computational analysis of WGS data is very challenging because each case generates nearly 2 × 90-150 Gb of sequence (normal and cancer DNA), or ~1 Tb of raw data. Cloud computing systems might therefore be useful for overcoming these obstacles and for simplifying global data sharing. In general, the analysis involves aligning the normal and cancer genome sequences to the human reference sequence (3 Gb) to produce BAM files, after which PCR duplicates are removed. Somatic mutations, such as single nucleotide variants (SNVs), copy number alterations (CNAs), short indels, structural variants, and others, are detected with several types of algorithmic tools by comparing the cancer genome with the normal genome. For germline variant calling, the HaplotypeCaller tool of GATK (https://software.broadinstitute.org/gatk/) is used.
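The arithmetic and the tumor-versus-normal comparison described above can be illustrated with a minimal Python sketch. It is not the method of any particular variant caller: the names `mean_depth`, `SiteCounts`, and `is_somatic_candidate` are hypothetical, and all thresholds are assumptions chosen only for illustration. Real somatic callers use statistical models that account for sequencing error, tumor purity, and mapping quality.

```python
from dataclasses import dataclass

HUMAN_GENOME_GB = 3.0  # approximate reference size quoted in the text


def mean_depth(sequenced_gb: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Average depth of coverage = total sequenced bases / genome size."""
    return sequenced_gb / genome_gb


@dataclass
class SiteCounts:
    """Read counts at one genomic position (hypothetical container)."""
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def vaf(self) -> float:
        """Variant allele fraction; 0.0 when the site has no reads."""
        return self.alt_reads / self.depth if self.depth else 0.0


def is_somatic_candidate(
    tumor: SiteCounts,
    normal: SiteCounts,
    min_depth: int = 10,           # assumed threshold, illustration only
    min_alt_reads: int = 3,        # assumed threshold, illustration only
    min_tumor_vaf: float = 0.05,   # assumed threshold, illustration only
    max_normal_vaf: float = 0.01,  # assumed threshold, illustration only
) -> bool:
    """Flag a site whose variant is present in the tumor but absent in the normal."""
    if tumor.depth < min_depth or normal.depth < min_depth:
        return False  # too little evidence in one of the two samples
    return (
        tumor.alt_reads >= min_alt_reads
        and tumor.vaf >= min_tumor_vaf
        and normal.vaf <= max_normal_vaf
    )


if __name__ == "__main__":
    # 90-150 Gb over a 3 Gb reference corresponds to roughly 30-50x depth.
    print(mean_depth(90), mean_depth(150))
    # Variant seen only in the tumor: candidate somatic mutation.
    print(is_somatic_candidate(SiteCounts(30, 10), SiteCounts(35, 0)))
    # Variant at similar fraction in both samples: likely germline, not somatic.
    print(is_somatic_candidate(SiteCounts(20, 20), SiteCounts(18, 17)))
```

Dividing the sequenced bases by the reference size shows that 90-150 Gb corresponds to an average depth of about 30-50× per sample. The comparison function captures only the core idea of somatic calling: a variant is supported in the cancer genome and absent from the matched normal genome.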
WGS can identify somatic SNVs and short indels (1-10 bp) in splicing sites, coding regions, and intronic regions. Some studies have shown that mutations in both coding and intronic regions may alter the exonic motifs that control the splicing mechanism and, in turn, the functions of cancer-related genes. However, detecting non-coding mutations is difficult, and a systematic combined analysis by WGS and RNA-Seq is therefore recommended for their interpretation. Mutations in 5′-untranslated regions might occur in cancer-inducing genes and regulate RNA stability and protein translation through miRNA binding. Studies have shown that somatic mutations occur in the neighboring long non-coding RNAs NEAT1 and MALAT1, which are reported to be responsible for cancer invasion. WGS studies of melanoma samples have revealed mutations in the TERT promoter sequence. These promoter-specific mutations are also commonly detected in bladder cancer, glioblastoma, liver cancer, thyroid cancer, and melanoma. Likewise, WGS has provided evidence of mutations at regulatory elements or promoter sites (PLEKHS1, TFPI2, WDR74, and BCL6). WGS can also detect CNAs in oncogenes and tumor suppressor genes. About 40-70% of prostate cancers have structural variants in genes such as ERG, TMPRSS2, and other ETS-related gene family members. WGS analysis of liver cancers has also revealed integration of the hepatitis B virus DNA genome (3 kb) at the MLL4 and TERT loci. Likewise, integration of the human papillomavirus DNA genome has been detected in cervical cancer using WGS.
The reduced cost of sequencing and improved computational tools for analyzing WGS data have encouraged cancer genomics research, and clinical benefits can be expected in the future. WGS of different cancers offers abundant information for understanding cancer biology at the genomic level. It allows the functions of non-coding regions to be explored, in addition to the somatic mutations possibly involved in cancer development. However, an integrative approach using RNA-Seq and multi-omics data is required to interpret the immunobiology of cancers. Overall, WGS is very useful for identifying non-coding and structural variants, but integrated analysis of WGS data with RNA-Seq, immuno-genomics, epigenomics, and clinicopathological data is desirable. This genomic approach offers hope for understanding the dynamics of the molecular mechanisms involved in carcinogenesis.