About 80% of the world's population depends on plant-derived drugs to treat health problems. Herbal drugs are effective, inexpensive, and relatively safe compared with conventional synthetic drugs. Plant compounds have enormous structural diversity and hence exhibit wide-ranging biological activities and find extensive use in biomedicine. Well-known therapeutic plant-derived compounds include camptothecin, taxol, artemisinin, colchicine, quinine, morphine, quinidine, and allicin. In recent years, researchers have focused on discovering new drug leads from plant sources. In particular, secondary metabolites such as alkaloids, phenolics, and terpenoids, which arise from the enormous diversity of plant metabolism, are being explored in herbal drug discovery. Nevertheless, many plants and their metabolic pathways have yet to be examined scientifically to characterize their bioactive phytoconstituents.
The quantity and quality of phytocompounds isolated from medicinal plants usually vary with several factors, including genotype, geography, edaphic conditions, and harvesting and processing methods. These metabolites usually occur in plant parts in low quantities. Additionally, the loss of plant species has reduced production from wild plants and compelled researchers to look for alternatives, including biotechnology and bioengineering strategies. The biosynthesis of phytocompounds is controlled by several genes, control elements, enzymes, and other regulatory proteins. However, only a few plant metabolic pathways have been explored so far. In this regard, medicinal plant genomics research, genomics consortia, and innovations in omics data have increased significantly in recent times. Overall, these advances have led to an emerging field called ‘herbal genomics’, which uses high-throughput sequencing technologies to identify and explore unknown genes, enzymes, and metabolic pathways. Medicinal plant genes and their functions can be evaluated through genome sequencing, assembly, and annotation studies. So far, only a limited number of herbal genomes have been completely sequenced and assembled, owing to their complexity. The combined information from genomics, transcriptomics, proteomics, and metabolomics is used to predict the metabolic pathways of secondary metabolites in herbs.
Plants have evolved to biosynthesize various kinds of compounds through secondary biochemical pathways in response to specific environmental stimuli. The major biosynthetic pathways identified to date include those for terpenoids, alkaloids, and phenolic compounds. Sequencing whole nuclear and/or chloroplast genomes is therefore useful for understanding the complete metabolic pathways in medicinally valued plants. Several projects, including the Herb Genome Programme, the Medicinal Plant Genomics Consortium, the 1000 Green Plant Transcriptome Project, and the Medicinal Plant Transcriptome Project, have sequenced and carried out functional genomic analyses of many medicinal species such as Catharanthus roseus, Salvia miltiorrhiza, Ganoderma lucidum, and Chlorophytum borivilianum. These data have helped decipher the involvement of different secondary metabolite biosynthetic pathways and the evolution of plant species. The biosynthetic pathways for triterpenes, indole alkaloids, and diterpene quinones have been well established in some herbs.
Figure 1. Chlorophytum borivilianum unigenes involved in three secondary metabolic pathways.
C. borivilianum unigenes involved in (A) saponin biosynthesis, (B) flavonoid biosynthesis, and (C) alkaloid biosynthesis. The red number in brackets following each gene name indicates the number of corresponding unigenes. (Adapted from Kalra et al. (2013). https://doi.org/10.1371/journal.pone.0083336.g007)
To date, several genes linked to the biosynthesis of camptothecin, vindoline, and catharanthine have been characterized, and the major steps in the synthesis of morphine and taxol have been elucidated. Using bacterial artificial chromosome (BAC) sequencing, seven small clusters, each containing 2-3 genes encoding enzymes of the vincristine and vinblastine biosynthesis pathway, have been identified. The chloroplast genomes of Pogostemon cablin and Salvia miltiorrhiza (red sage), plants rich in bioactive compounds such as flavonoids, terpenoids, tanshinone, and phenolic acids, have been sequenced successfully. The S. miltiorrhiza chloroplast genome is 151,328 bp long and contains 114 unique genes coding for proteins, tRNAs, and rRNAs, whereas the P. cablin chloroplast genome is 152,460 bp long and contains 127 genes coding for proteins, rRNAs, and tRNAs. Chloroplast genome sequence data will assist phylogenetic, population genetic, and genetic engineering investigations in these herbs.
Likewise, the whole genomes of Azadirachta indica (neem), Ziziphus jujuba (jujube), and other species have been sequenced successfully. Analysis of the A. indica genome and transcriptome revealed that the genome is rich in A-T base pairs, contains nearly 20,000 genes, and has few repetitive DNA elements. Comparative transcript expression analysis identified both exclusive and enhanced expression of known genes involved in neem terpenoid biosynthesis pathways. De novo assembly of Z. jujuba genomic and transcriptomic data showed increased expression of genes encoding the enzymes GDP-L-galactose phosphorylase and GDP-D-mannose 3,5-epimerase, which contribute to sugar metabolism, and supported the L-galactose pathway as the key biosynthetic pathway for vitamin C. Whole-genome sequencing of S. miltiorrhiza revealed a genome size of ∼600 Mb containing 30,478 protein-coding genes and 1,620 genes coding for transcription factors. Many of these transcription factors have been shown to regulate the biosynthesis of phenolic acids and tanshinone. Genome sequencing and annotation studies in Ocimum species have shown increased expression of genes involved in phenylpropanoid and terpenoid biosynthesis. Studies have also shown that local (tandem) duplications (LDs) and whole-genome duplications (WGDs) play a significant role in specialized metabolism in herbs.
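The A-T richness reported for the neem genome is a simple base-composition statistic. As a minimal sketch of how such a value could be computed from an assembled sequence, the Python function below counts A and T bases relative to all unambiguous bases. The function name `at_content` and the toy sequence are hypothetical illustrations and are not taken from the cited studies, which would typically rely on established genome-analysis tools.

```python
def at_content(sequence: str) -> float:
    """Return the fraction of A and T bases among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored. This is an illustrative helper,
    not a method described in the studies cited above.
    """
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return at / total


# Toy sequence (hypothetical, for illustration only).
toy_sequence = "ATATTAGCATNNATTA"
print(f"AT content: {at_content(toy_sequence):.2f}")
```

A value well above 0.5 for a whole assembly would correspond to the kind of A-T richness described for A. indica.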
High-throughput sequencing of medicinal plant genomes can speed up the discovery of previously unidentified enzymes and pathways involved in the biosynthesis of plant metabolites. Some metabolites, such as artemisinic acid, benzylisoquinoline alkaloids, and monoterpenoid indole alkaloids, are being produced commercially in heterologous hosts using these deduced pathways. Herbal genomic databases are becoming more useful for discovering new pathways by searching for operon-like gene clusters of the kind described in rice, barley, and other plants. In Papaver somniferum (opium poppy), a 10-gene cluster spanning 401 kb of the genome has been shown to be involved in the synthesis of noscapine, an antitumor alkaloid. More recently, transcriptomic data from Podophyllum hexandrum (mayapple) have been used to select genes for combinatorial expression in tobacco to produce podophyllotoxin. Likewise, co-expression of 10 genes in tobacco resulted in the production of a naturally occurring lignan, etoposide aglycone.
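The idea of mining a genome for operon-like clusters of biosynthetic genes can be illustrated with a small sketch. Assuming a list of candidate pathway genes with genomic coordinates is already available from annotation, the Python code below groups genes on the same chromosome whose neighbours lie within a chosen distance. The `Gene` class, the `find_clusters` function, the gene names, and the thresholds (`max_gap`, `min_genes`) are all hypothetical choices for illustration; real cluster discovery usually also draws on co-expression and functional annotation evidence.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # start coordinate in bp
    end: int    # end coordinate in bp


def find_clusters(genes, max_gap=50_000, min_genes=3):
    """Group candidate genes into clusters by physical proximity.

    Genes on the same chromosome stay in one cluster while the gap between
    the furthest end seen so far and the next gene's start is at most
    max_gap bp. Clusters with fewer than min_genes genes are discarded.
    Both thresholds are arbitrary illustrative values.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    for _, chrom_genes in groupby(ordered, key=lambda g: g.chrom):
        current, current_end = [], 0
        for gene in chrom_genes:
            if current and gene.start - current_end > max_gap:
                if len(current) >= min_genes:
                    clusters.append(current)
                current = []
            current_end = gene.end if not current else max(current_end, gene.end)
            current.append(gene)
        if len(current) >= min_genes:
            clusters.append(current)
    return clusters


# Hypothetical candidate genes (names and coordinates are made up).
candidates = [
    Gene("g1", "chr1", 1_000, 3_000),
    Gene("g2", "chr1", 10_000, 12_000),
    Gene("g3", "chr1", 40_000, 42_000),
    Gene("g4", "chr1", 500_000, 502_000),
    Gene("g5", "chr2", 1_000, 2_000),
    Gene("g6", "chr2", 5_000, 6_000),
]

for cluster in find_clusters(candidates):
    names = ", ".join(g.name for g in cluster)
    span = cluster[-1].end - cluster[0].start
    print(f"{cluster[0].chrom}: {names} (span {span} bp)")
```

A distance-based pass like this only nominates candidate regions; whether the genes in a region actually act in one pathway still has to be established experimentally, as was done for the noscapine cluster.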
A thorough knowledge of the genomic data, genes, and metabolic pathway enzymes of medicinal plants can lead to new drug discovery strategies, including metabolite engineering and microbial and plant genetic engineering. The development of low-cost genome sequencing methods, together with proteomics and metabolomics information, will help identify new biosynthetic pathways and may assist in assigning gene functions in several medicinal plants. These investigations are expected to lead to the discovery of novel and chemically diverse classes of pharmacologically active secondary metabolites and to their large-scale production. In this regard, future drug discovery research should focus on herbal genomics for the rapid discovery of unidentified genes, biosynthetic pathways, and enzymes. Interdisciplinary investigations that combine herbal genomics with comprehensive proteomics and metabolomics are therefore necessary. In future, herbal genomics information is expected to yield novel bioactive plant secondary metabolites.