The application of massively parallel, high-throughput next-generation DNA sequencing methodologies has revolutionized the detection of the genetic variations underlying rare Mendelian (monogenic) diseases. These technologies are increasingly available to individual research groups and clinical diagnostic labs, so whole human genomes can now be analyzed at a reasonable cost within a short time. This makes it possible to correctly diagnose patients and families affected by rare inherited disorders. With these genetic diagnosis methods, all recognized single-gene defects can be established rapidly and safely. Identifying the genes responsible for diseases with well-defined phenotypes also helps explain diseases caused by mutations. In addition, novel causes of disease can be recognized in previously unexplained cases.
Single-gene (monogenic) traits in humans follow Mendelian segregation in families, and a defect in a single gene may lead to a specific disease. To date, nearly 10,000 monogenic disorders that differ at the phenotypic level are recorded in the online database OMIM (Online Mendelian Inheritance in Man). Nevertheless, only a limited number of the genes responsible for these diseases have been identified so far. Until recently, the commonly used methods for identifying disease-causing mutations were based on physiological function, protein-level studies, recombinant DNA technologies, and human genetic linkage maps. In the last decade, there has been a paradigm shift in investigating Mendelian disorders using next-generation sequencing technologies, which are based on massively parallel sequencing (MPS), i.e., millions or billions of DNA fragments are sequenced simultaneously. Several improvements in sequencing methods have made whole-genome sequencing (WGS) a reality at a reasonably low price. Though the human genome comprises about 3 billion base pairs of DNA, only about 1 to 2% of it codes for protein. Interestingly, the majority of disease-causing mutations for monogenic disorders occur in these protein-coding regions (exons). Thus, sequencing only this selected region is a rapid and cost-effective way to analyze the coding variation in an individual genome. This method is often termed whole-exome sequencing (WES) or targeted exome sequencing (TES). WES has been successfully employed to detect the genetic causes of a variety of monogenic diseases, although many diseases linked to uncharacterized genes are yet to be fully solved. WES has undeniably changed the approach to diagnosing rare inherited diseases, and the results so far have encouraged its adoption as a frontline diagnostic tool for identifying genetic diseases.
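As a rough sense of scale, the figures quoted above (about 3 billion base pairs, of which 1 to 2% is protein-coding) can be turned into an approximate size for the exome. The snippet below is only a back-of-the-envelope sketch; the variable and function names are illustrative, and the numbers are the rounded values from the text, not measured values.

```python
# Rounded figures taken from the text; real values differ slightly.
GENOME_BP = 3_000_000_000          # approximate size of the human genome
CODING_PERCENT_RANGE = (1, 2)      # approximate protein-coding share, in percent


def coding_bp(genome_bp: int, percent: int) -> int:
    """Return the number of base pairs covered by the given whole percentage."""
    return genome_bp * percent // 100


low_bp = coding_bp(GENOME_BP, CODING_PERCENT_RANGE[0])
high_bp = coding_bp(GENOME_BP, CODING_PERCENT_RANGE[1])

# Express the result in megabases (1 Mb = 1,000,000 bp).
print(f"Approximate exome size: {low_bp // 1_000_000} to {high_bp // 1_000_000} Mb")
```

Under these rounded figures, exome sequencing targets tens of megabases rather than the roughly 3,000 Mb of the whole genome, which is the basis of its speed and cost advantage.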
In 2009, a multiple malformation disorder (Miller syndrome) was resolved using WES for the first time. The candidate gene DHODH was identified as the cause of the disorder; it encodes a crucial enzyme of the de novo pyrimidine biosynthesis pathway. Likewise, WES of four individuals with Freeman-Sheldon syndrome (OMIM 193700) revealed the causal variants in the gene MYH3. This confirms WES as a potent approach for identifying the causal variants of rare disorders. In patients with suspected Bartter syndrome, WES identified an unexpected recessive mutation at a highly conserved position in the gene SLC26A3, which is also linked to congenital chloride diarrhea (CLD). The diagnosis of CLD was later confirmed on clinical referral, providing proof of concept for using WES as a clinical approach to evaluate patients with undiagnosed genetic disorders. This clinical report is viewed as the first claim of diagnosing a patient through next-generation sequencing technology. Subsequent developments have led to the identification of 800 novel monogenic disease genes. Most of the variants in these candidate genes are loss-of-function mutations leading to recessive diseases; however, a considerable number are missense variants that allow a certain degree of residual protein function. At present, WES can identify roughly 20,000 to 60,000 single-nucleotide variants per genome. Identifying a pathogenic variant in this large quantity of data is very challenging, so the data must be filtered using parameters such as allele frequency, predicted loss of function, and the severity of missense mutations.
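The filtering step described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `Variant` record, the consequence labels (styled after Sequence Ontology terms), the set of labels treated as loss-of-function, the thresholds, and the example genes and values are hypothetical and do not come from any particular pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed set of consequence labels treated as loss-of-function in this sketch.
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


@dataclass(frozen=True)
class Variant:
    gene: str                                # placeholder gene name
    consequence: str                         # predicted functional consequence
    allele_frequency: float                  # population frequency, 0.0 to 1.0
    missense_score: Optional[float] = None   # hypothetical severity score, 0.0 to 1.0


def is_candidate(v: Variant, max_af: float = 0.01,
                 min_missense_score: float = 0.8) -> bool:
    """Keep rare variants that are loss-of-function or severe missense."""
    if v.allele_frequency > max_af:
        return False  # too common to explain a rare disorder
    if v.consequence in LOF_CONSEQUENCES:
        return True
    if v.consequence == "missense_variant":
        return v.missense_score is not None and v.missense_score >= min_missense_score
    return False


def filter_variants(variants: List[Variant], **thresholds) -> List[Variant]:
    """Return only the variants that pass is_candidate."""
    return [v for v in variants if is_candidate(v, **thresholds)]


# Made-up example records, not real data.
EXAMPLE_VARIANTS = [
    Variant("GENE_A", "stop_gained", 0.0001),
    Variant("GENE_B", "missense_variant", 0.0005, 0.95),
    Variant("GENE_C", "missense_variant", 0.0005, 0.20),
    Variant("GENE_D", "frameshift_variant", 0.15),
    Variant("GENE_E", "synonymous_variant", 0.0001),
]

candidates = filter_variants(EXAMPLE_VARIANTS)
print([v.gene for v in candidates])  # → ['GENE_A', 'GENE_B']
```

In practice the thresholds depend on the suspected inheritance pattern and disease prevalence, so they are exposed as parameters rather than fixed.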
The cost of sequencing has fallen significantly in recent times, making it feasible to sequence whole human genomes. For that reason, WGS should be encouraged, including for the detection of small structural variants. It not only provides more uniform coverage than WES but also offers data on noncoding parts of the genome at base-pair resolution, allowing genome-wide investigation of untranslated regions, introns, promoters, microRNAs, enhancers/repressors, etc. Nonetheless, WGS requires more processing power and refined tools for storing the large volume of data. The practical effect of genetic variants in both protein-coding and non-coding regions is relatively difficult to evaluate. Interpreting these variants with functional data from public databases such as the Encyclopedia of DNA Elements (ENCODE) will be crucial in deducing their role in pathogenesis. Despite several significant applications of WGS, mainly in characterizing human genetic variation, the approach has some technical limitations. For instance, in most MPS methods, library preparation requires the native DNA to be cloned and amplified before sequencing. This may bias the sequence data, with under-representation of GC-rich regions. It reduces the sensitivity of variant calling, particularly in the first exons and introns of genes, as they have a high GC content. Besides, the read length of current MPS platforms is not sufficient to detect indels larger than 500 base pairs or to bridge simple repeats, segmental duplications, and long tandem repeats.
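To make the GC-bias point concrete, the following sketch computes the GC fraction of a sequence and flags windows above a chosen threshold, which are the kind of regions where under-representation would be expected. The function names, window size, step, and threshold are illustrative assumptions, and the sequence is a toy string rather than real genomic data.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_rich_windows(seq: str, window: int = 10, step: int = 5,
                    threshold: float = 0.7) -> List[Tuple[int, float]]:
    """Return (start, gc_fraction) for each window at or above the threshold."""
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        frac = gc_content(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


# Toy sequence: an AT-only stretch followed by a GC-only stretch.
toy = "ATATATATAT" + "GCGCGCGCGC"
print(gc_content(toy))                         # → 0.5
print(gc_rich_windows(toy, window=10, step=10))  # → [(10, 1.0)]
```

A real analysis would compare such windows against observed read depth; this sketch only identifies where GC content is high.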
In clinical diagnosis, rapid pulsed WGS was effectively used to comprehensively diagnose all inborn errors of metabolism with a known genetic basis. At present, around 600 genes are manually curated and continuously updated in the databases. The steps of the clinical diagnosis workflow are summarized in Figure 1a.
Variants are scored with a weighted scoring model that uses subsets of the annotation data. In this rank score model, weights are given for the pattern of inheritance, allele frequency, annotation of genetic regions, functional consequence, and protein severity predictions (Figure 1b). All variants are thus taken into consideration, but they are prioritized by their disease-causing potential as reflected in the rank score.
Figure 1: (a) Summary of steps in the sequence data analysis workflow of the rapid, pulsed whole-genome sequencing used for clinical diagnostics. (b) Annotations used in the weighted sum model to calculate a rank score for each variant. (Source: Stranneheim and Wedell, 2016; doi: 10.1111/joim.12399)
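The weighted sum idea behind the rank score can be sketched as follows. This is a minimal illustration under stated assumptions, not the published model: the weight values, the 0-to-1 scale of the per-annotation scores, and the example variants are all hypothetical, and only the annotation categories are taken from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for the annotation categories named in the text.
WEIGHTS: Dict[str, float] = {
    "inheritance": 3.0,
    "allele_frequency": 2.0,
    "genetic_region": 1.0,
    "functional_consequence": 2.0,
    "protein_severity": 1.0,
}


def rank_score(annotation_scores: Dict[str, float],
               weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-annotation scores; missing annotations count as 0."""
    return sum(w * annotation_scores.get(name, 0.0) for name, w in weights.items())


def rank_variants(variants: List[Tuple[str, Dict[str, float]]]
                  ) -> List[Tuple[str, float]]:
    """Score every variant and sort by descending rank score (none are dropped)."""
    scored = [(variant_id, rank_score(scores)) for variant_id, scores in variants]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up variants with assumed per-annotation scores between 0 and 1.
EXAMPLE = [
    ("var_2", {"allele_frequency": 1.0, "functional_consequence": 0.5}),
    ("var_1", {"inheritance": 1.0, "allele_frequency": 1.0, "genetic_region": 1.0,
               "functional_consequence": 1.0, "protein_severity": 1.0}),
    ("var_3", {"inheritance": 1.0, "genetic_region": 1.0}),
]

ranking = rank_variants(EXAMPLE)
print(ranking)  # → [('var_1', 9.0), ('var_3', 4.0), ('var_2', 3.0)]
```

Note that the sketch ranks rather than filters: every variant stays in the output and only its priority changes, which mirrors the description above.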
Therefore, genome and exome sequencing offer the medical field an excellent opportunity, in the form of clinical sequencing, to advance clinical medicine. A large number of monogenic diseases could be diagnosed, and treatment initiated, at an early disease stage. This improves the quality of life of affected individuals and their families. If WES or WGS is used wisely and in coordination with clinical experts, preventing many monogenic diseases may become a reality in the future.