Transcriptomics evaluates the full transcriptome of an organism (the mRNAs from actively expressed genes) and its functions. It is one of the best-developed fields of the post-genomics era. Modern high-throughput technologies, such as RNA sequencing (RNA-Seq) and microarray analysis, are commonly used for transcriptomic analysis. Microarrays quantify a predefined set of sequences, whereas RNA-Seq captures all expressed gene sequences. Assessing the transcriptome makes it possible to identify genes that are differentially expressed in a cell or in response to different conditions or treatments. Comparing gene expression across tissues and conditions reveals the molecular mechanisms involved in various biological processes in an organism, and it provides evidence on how and when genes are expressed or regulated. Transcriptome analysis also supports proteomics: it helps explain the mismatch between the number of coding genes and the number of translated proteins, and it allows translational regulation to be studied. Recent discoveries have shown that large numbers of non-protein-coding RNAs (ncRNAs) also exist and play a key role in gene regulation. Transcriptomics has thus become a tool for understanding various human diseases.
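The idea of identifying differentially expressed genes can be illustrated with a minimal sketch. It assumes that expression has already been summarized as one count per gene in a control and a treated sample. The gene names, the counts, the pseudocount of 1 and the cutoff of 1 (a two-fold change) are all hypothetical choices made for illustration. Real analyses use replicates and a statistical test rather than a bare fold-change threshold.

```python
import math

def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control) per gene.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no counts in one of the conditions.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }

# Hypothetical per-gene counts for two conditions.
control = {"geneA": 99, "geneB": 49, "geneC": 7}
treated = {"geneA": 399, "geneB": 49, "geneC": 1}

lfc = log2_fold_change(control, treated)

# Flag genes whose expression changed at least two-fold in either direction.
differential = {gene for gene, value in lfc.items() if abs(value) >= 1.0}
print(sorted(differential))
```

In this toy data, geneA rises, geneC falls, and geneB is unchanged, so only geneA and geneC are flagged.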
Transcriptome analysis began in the early 1990s. An expressed sequence tag (EST) is a short nucleotide sequence produced from a single RNA transcript. The transcripts are first copied into cDNAs by the enzyme reverse transcriptase, and the cDNAs are then sequenced by the Sanger method. More recently, technological advances have led to the use of high-throughput approaches such as sequencing by synthesis (Solexa/Illumina, San Diego, CA). EST libraries generally supplied the sequence information for early microarray designs. For instance, 350,000 previously sequenced ESTs were used to design a barley GeneChip. The earliest sequencing-based transcriptomic technique was Serial Analysis of Gene Expression (SAGE), in which transcripts were randomly sequenced by Sanger’s approach and quantified by matching them with known genes. SAGE is an improved EST methodology that allows a large number of transcripts to be quantified by increasing the throughput of the tags (11 bp) generated for sequencing. When these tags are aligned with a reference genome, the corresponding gene can be identified. In the absence of a reference genome, the tags can be used directly as diagnostic markers if they are found to be differentially expressed in a disease condition. A variant of SAGE is Cap Analysis of Gene Expression (CAGE), in which tags from the 5′ end of mRNAs are sequenced. Both SAGE and CAGE yield data on more genes than the sequencing of single ESTs. However, both methods are complex and labor-intensive in sample preparation and data analysis.
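The tag-matching step of SAGE can be sketched as a simple lookup-and-count. This is a minimal illustration and not the original SAGE software: the 11 bp tag sequences, the gene names and the `tag_to_gene` table are hypothetical, and exact string matching is an assumption standing in for alignment against known genes or a reference genome.

```python
from collections import Counter

def count_sage_tags(tags, tag_to_gene):
    """Count how many sequenced tags map to each known gene.

    Tags with no entry in the lookup table are tallied separately,
    since they cannot be assigned to a known gene.
    """
    gene_counts = Counter()
    unmatched = Counter()
    for tag in tags:
        gene = tag_to_gene.get(tag)
        if gene is None:
            unmatched[tag] += 1
        else:
            gene_counts[gene] += 1
    return gene_counts, unmatched

# Hypothetical lookup table from 11 bp tags to gene names.
tag_to_gene = {"ACGTACGTACG": "geneA", "TTGCAATTGCA": "geneB"}

# Hypothetical tags read from a sequencing run.
tags = ["ACGTACGTACG", "TTGCAATTGCA", "ACGTACGTACG", "GGGGGCCCCCA"]

gene_counts, unmatched = count_sage_tags(tags, tag_to_gene)
print(dict(gene_counts), dict(unmatched))
```

The per-gene tag count serves as the abundance estimate. The unmatched tags correspond to the case where no known gene is available, in which the tags themselves may still be compared between conditions.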
A microarray consists of probes, i.e., short nucleotide oligomers arrayed on a solid substrate. Transcript abundance is determined by hybridizing fluorescently labelled transcripts to these probes; the fluorescence intensity at each probe location indicates the abundance of the transcript matching that probe. However, microarrays require prior knowledge, in the form of an annotated genome sequence or ESTs, from which the probes are designed. RNA-Seq, on the other hand, combines high-throughput sequencing with computational approaches to detect and quantify the transcripts present in RNA extracts (Figure 1).
Figure 1: Summary of RNA sequencing.
Within an organism, genes are transcribed and (in eukaryotes) spliced to produce mature mRNA transcripts (red). The mRNA is extracted from the organism, fragmented, and copied into stable double-stranded cDNA (ds-cDNA; blue). The ds-cDNA is sequenced using high-throughput, short-read sequencing methods. These sequences can then be aligned to a reference genome sequence to reconstruct which genome regions were being transcribed. These data can be used to annotate where expressed genes are, their relative expression levels, and any alternative splice variants. (Adapted from Lowe et al. 2017. https://doi.org/10.1371/journal.pcbi.1005457.g004)
RNA-Seq typically generates reads of about 100 bp, although read length can range from 30 to 10,000 bp depending on the sequencing method. The reads are aligned computationally to a reference genome to reconstruct the transcriptome. RNA-Seq has an advantage over microarrays in that it requires only nanogram quantities of input RNA, compared with the microgram quantities required for microarrays. It also allows analysis of cellular structures and examination of cDNAs at the level of individual cells. As a result, it is widely used to detect the genes within a genome and their relative expression. The invention of next-generation sequencing (NGS) has progressively changed the way genomic research is done, and NGS-based RNA-Seq is at present the method of choice for gene expression analysis. The large amounts of data generated by transcriptomics studies are usually deposited in public databases, such as Gene Expression Omnibus, ArrayExpress, Expression Atlas, and RefEx, so that they are preserved and can be reused by the scientific community.
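Relative expression from aligned reads can be sketched as follows. The example assumes that reads have already been aligned and counted per gene, and it uses transcripts per million (TPM), a common normalization that the text above does not name. The gene names, read counts and gene lengths are hypothetical.

```python
def transcripts_per_million(read_counts, gene_lengths_bp):
    """Convert per-gene read counts into TPM values.

    Each count is first divided by gene length in kilobases, so that long
    genes are not favored, and the resulting rates are then scaled so that
    they sum to one million.
    """
    rates = {
        gene: read_counts[gene] / (gene_lengths_bp[gene] / 1000.0)
        for gene in read_counts
    }
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}

# Hypothetical read counts and gene lengths (in bp).
read_counts = {"geneA": 100, "geneB": 300, "geneC": 200}
gene_lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 1000}

tpm_values = transcripts_per_million(read_counts, gene_lengths_bp)
for gene, value in tpm_values.items():
    print(gene, round(value))
```

Although geneB has three times as many reads as geneA, it is also three times as long, so the two genes receive the same relative expression value in this sketch.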
Transcriptomic approaches have broad applications in biomedical research, including disease detection, diagnosis, and profiling. They have helped to identify transcription initiation sites, alternative promoter usage, and new splicing alterations. All of these regulatory elements play a key role in human disease, so describing such sequence variants is crucial for interpreting the cause of a disease and its possible associations. RNA-Seq can identify disease-associated single nucleotide polymorphisms (SNPs) and allele-specific expression that contributes to disease. Likewise, RNA-Seq is useful for understanding immune-related diseases in patients through the identification of T cell and B cell receptors. Similarly, RNA-Seq of pathogenic microbes quantifies changes in gene expression, identifies factors responsible for virulence, predicts multi-drug resistance, and reveals host-pathogen immune relations. Transcriptomics is also useful for identifying how genes respond to biotic and abiotic environmental stresses. For example, a study of gene expression during biofilm formation by Candida albicans (a fungal pathogen) revealed that certain genes are highly co-regulated and are critical for developing and maintaining the biofilm. Transcriptomic methods are effective in recognizing gene functions and phenotypic changes, and they help to identify previously unrecognized protein-coding regions in a sequenced genome. RNA-Seq analysis has implicated the non-ribosomal transcriptome in the human chromosomal imbalance Down syndrome. A new exon in the retinitis pigmentosa GTPase regulator (RPGR) gene has been reported, when mutated, as a cause of X-linked retinitis pigmentosa in patients. Human diseases, including neurological pathologies, immunohematology disorders, and malignancies, are linked to aberrant RNA splicing.
For example, a recent evaluation of Huntington’s disease transcriptomes identified about 593 distinct alternative splicing events that differ between pathological and control brains. Transcriptomic analyses have revealed that >93% of human DNA is copied to RNA, but only 2% of it forms mRNA, while the rest comprises non-coding RNAs (ncRNAs), including ribosomal RNAs, transfer RNAs, small nuclear RNAs, guide RNAs, and other vital RNA species. More recently, a polymorphism in the long ncRNA lnc13 was shown to be linked to an intestinal autoimmune disorder.
Over the past few years, transcriptomics has been revolutionized by the use of microarrays and the application of NGS platforms to sequence RNA. It has updated our knowledge of how a genome is expressed and regulated and of how these processes are associated with disease. The field is progressing rapidly in many areas of research focused on measuring gene expression levels in diverse species. It has become a powerful tool for understanding the molecular mechanisms involved in human pathogenesis and might be applied effectively in clinical testing to identify a wide range of human diseases. Further, the reduced cost of transcriptomic approaches has encouraged even small laboratories to undertake transcriptomic analysis of different organisms and tissues under diverse environmental conditions. Overall, this trend is expected to continue as sequencing technologies improve.