Integrated metabolomic and transcriptomic profiling reveals the tissue-specific flavonoid compositions and their biosynthesis pathways in Ziziphora bungeana

Background Ziziphora bungeana Juz. is a folk medicine from the Xinjiang Uygur Autonomous Region. The herb or the aerial parts of it have been used to medicinally treat cardiovascular diseases. Flavonoids are the main pharmacologically active ingredients in Z. bungeana. Identification of the tissue-specific distribution of flavonoids in Z. bungeana is crucial for effective and sustainable medicinal use of the plant. Furthermore, understanding of the biosynthesis pathways of these flavonoids in Z. bungeana is of great biological significance. Methods The flavonoids from different tissues of Z. bungeana were identified using liquid chromatography-tandem mass spectrometry (LC–MS/MS). The full-length transcriptome of Z. bungeana was determined using a strategy based on a combination of Illumina and PacBio sequencing techniques. The functions of differentially expressed unigenes were predicted using bioinformatics methods and further investigated by real-time quantitative PCR and phylogenetic relationship analysis. Results Among the 12 major flavonoid components identified from Z. bungeana extracts, linarin was the most abundant component. Nine flavonoids were identified as characteristic components of specific tissues. Transcriptome profiling and bioinformatic analysis revealed that 18 genes were putatively involved in flavonoid biosynthesis. The gene expression and phylogenetic analysis results indicated that ZbPALs, Zb4CL3, ZbCHS1, and ZbCHI1 may be involved in the biosynthesis of the main flavonoid intermediate. ZbFNSII, ZbANS, and ZbFLS may be involved in the biosynthesis of flavones, anthocyanins, and flavonols, respectively. A map of the biosynthesis pathways of the 12 major flavonoids in Z. bungeana is proposed. Conclusions The chemical constituent analysis revealed the compositions of 9 characteristic flavonoids in different tissues of Z. bungeana. Linarin can be hydrolysed into acacetin to exert a pharmaceutical role. Apigenin-7-O-rutinoside is hypothesised to be the precursor of linarin in Z. bungeana. There was greater content of linarin in the aerial parts of the plant than in the whole herb, which provides a theoretical basis for using the aerial parts of Z. bungeana for medicine. These results provide a valuable reference for further research on the flavonoid biosynthesis pathways of Z. bungeana and will be significant for the effective utilisation and ecological protection of Z. bungeana.


Background
Ziziphora bungeana Juz. is a perennial semi-shrub. It is a subspecies of Z. clinopodioides Lam. and belongs to the genus Ziziphora L. in the Labiatae family. The plant is mainly distributed in Kazakhstan, China, Central Asia, and Mongolia, and is used to treat hypertension and heart diseases [1]. In China, it only grows in the Xinjiang Uygur Autonomous Region and has long been used as a folk medicine to treat diseases such as cardiopathy, hypertension, fever, headache, insomnia, and cardiopalmus [2,3]. The main active constituents of Z. bungeana are flavonoids and essential oils [1][2][3][4][5][6]. The essential oils of Z. bungeana have antioxidant and antimicrobial activities [1,4]. The flavonoids of Z. bungeana have significant protective functions for vascular endothelial cells [5][6][7][8].
Flavonoid represents a large subgroup of plant secondary metabolites with various biological activities. The major groups of flavonoids include flavones, flavanones, isoflavones, flavonols, chalcones, and anthocyanins. The flavone group is one of the largest groups of flavonoids. Flavones have hundreds of structures with diverse pharmacological properties. In recent years, there has been rapid development of flavone-based therapeutic agents [9,10]. The flavonoid ingredients of Z. bungeana, including some flavones, have significant effects on cardiovascular diseases. Previous studies have reported that apigenin, chrysin, and ethyl 4-coumarate of Z. bungeana are effective vasorelaxant compounds [11]. Linarin (buddleoside) and acacetin can also protect myocardial tissue from ischemia-reperfusion injury [12][13][14]. Traditionally, the whole herb of Z. bungeana has been used as a medicine to treat cardiovascular diseases. In China, Z. bungeana resources are limited and have been decreasing over time. To protect the Z. bungeana resources, many areas now only use the aerial parts of the plant for medicine [2,3]. However, whether there are differences in the flavonoid composition of the aerial parts relative to the whole herb remains unknown. Recently, studies have reported that among the three active ingredients of Z. bungeana (caffeic acid, rosmarinic acid, and linarin), linarin was the most abundant active ingredient [15], with the highest contents of linarin observed in the inflorescence [16]; however, it is still not clear whether linarin is the major flavonoid constituent in Z. bungeana. Aside from linarin, understanding of the tissue-specific distribution of other flavonoids is lacking. Therefore, it is necessary to comprehensively study the major flavonoids in different tissues of Z. bungeana.
Over the past decade, studies have revealed the core biosynthesis pathways of flavonoids in various plants such as Arabidopsis [17], rice [17], lettuce [18], Camellia sinensis [19], Salvia miltiorrhiza [20], Oroxylum indicum [21], Chrysanthemum morifolium [22], Chrysanthemum indicum, and Scutellaria baicalensis [23,24]. The biosynthesis pathways of Chrysanthemum indicum's bioactive ingredients, including apigenin, luteolin, and linarin, have been elucidated. Linarin is thought to be biosynthesised from acacetin-7-O-glucoside [23]. However, it is still not known whether linarin in Z. bungeana is biosynthesised from acacetin-7-O-glucoside. In this study, metabolomic and transcriptomic profiling was used to identify the tissue-specific flavonoids in Z. bungeana and their biosynthesis pathways. A combination of Illumina-and SMRT-based sequencing techniques was used to identify the full-length unigenes [25][26][27][28]. Real-time quantitative PCR and phylogenetic relationship analyses were used to determine the functions of flavonoid biosynthesis genes in Z. bungeana. Our analysis revealed that linarin is the major flavonoid in Z. bungeana and it is most probably biosynthesised from apigenin-7-O-rutinoside. A map of the biosynthesis pathways of the major flavonoid ingredients, and the involved genes, is proposed.

Collection of plant materials
The plant materials were three-year-old Z. bungeana plants grown in the Xinjiang Institute of Materia Medica; all were of similar growth status. The plants were cultivated in plastic baskets (35 × 45 × 35 cm) with soil from the natural habitat of Z. bungeana. The root, stem, leaf, and inflorescence of Z. bungeana were collected on July 2, 2016. The infructescence was collected on July 30, 2016. Each tissue sample is with three biological replicates. All tissues were cut into small pieces, placed into liquid nitrogen for an hour, and then stored in an ultralow temperature freezer (− 80 °C) before high throughput sequencing. Meanwhile, part of each sample was dried at 60 °C for 48 h, crushed, and sifted passing a 40 mesh sieve. Each powder (approximately 1 g) was extracted twice by reflux extraction in 50 ml 70% ethanol (v/v) for 90 min. After filtering, the extracts were mixed and diluted to 100 ml for further LC-MS/MS analysis.

LC-MS/MS analysis
Caffeic acid, hyperoside, rutin, rosmarinic acid, quercetin, apigenin, chrysin, and genkwanin standards were purchased from the National Institutes for Food and Drug Control (Beijing, China). Linarin, luteolin, and kaempferol were purchased from ChromaDex (Irvine, United States). Acacetin was purchased from Spring & Autumn Biological Engineering (Nanjing, China). High-performance liquid chromatography (HPLC) was performed using an Agilent 1260 HPLC system. The separation and purification for total flavonoids of Z. bungeana was following our reported method [15,16,29]. For each sample, 2 ml of extracted solution was filtered through a 0.22 μm membrane and injected into the HPLC system. The separation was performed on a 150 × 4.6 mm extend C18 column held at 30 °C; the flow rate was 0.8 ml/min. The optimal mobile phase consisted of A (CH3COOH/H2O, 0.2:100, v/v) and B (CH3OH). The optimized HPLC gradient elution conditions were as follows: 0-8 min, 5% B; 8-60 min, 5-80% B; 60-65 min, 80-95% B; 65-70 min, 95-5% B; 70-80 min, 5% B. The injection volume was 15 μl and the detection wavelength was 340 nm. Flavonoids were quantified by calculating the area of each peak and comparing them with standard curves obtained from pure compounds.
HPLC-MS/MS was performed using an LC/MSD-Trap-SL electrospray ion trap LC/MS (1100 Series LC/ MSD Trap, a complete LC-MS/MS). All HPLC system devices were from the Agilent 1100 series and consisted of a vacuum degasser (model G1322A), a quaternary pump (model G1311A), an autosampler (model G1329A), a thermostated column compartment (model G1316A), and a diode array detector (model G1315A). The settings were as follows: ion source type, APCI (positive mode); nebulizer pressure, 40 psi; dry gas temperature, 350 °C; dry gas flow, 9.0 l/min; APCI Vap temperature, n/a; corona current, n/a; capillary voltage, 3500 V. The LC/ MSD-Trap software 5.3 was used for data analysis. The relevant literature on Z. bungeana published in CNKI, PubMed, SciFinder, and ChemSpider was used to establish the database. The chemical structures were further confirmed based on fragment ions [30][31][32][33][34].

SMRT-based RNA sequencing
For Pacbio-based RNA sequencing (RNA-seq), the samples for Pac-Bio library construction were the mixed RNAs from all five plant samples. The full-length cDNAs were synthesised using a SMARTer PCR cDNA Synthesis Kit (Clontech, USA). The cDNA library was constructed using the SMRTbell ™ Template Prep Kit 1.0, following the manufacturer's instructions. The cDNA library was sequenced using a Sequel ™ Sequencing Kit 2.0 with a PacBio Sequel system (Pacific Biosciences, USA). The raw sequence data were deposited in the SRA database and are available under accession number SRR10352319.

Sequence assembly and unigene annotation
After filtering the adaptors and low-quality reads, the clean reads generated by the PacBio Sequel system were assembled and clustered by sequence recognition and isoform-level clustering using CD-HIT software; this was used as the reference transcriptome [35]. Then, the clean reads generated by the Illumina sequencing platform were mapped to the reference transcriptome; the unmapped sequences were assembled using Trinity software [27]. Lastly, the unigenes acquired by the PacBio Sequel system and the Illumina sequencing platform were further combined and clustered using CD-HIT software to generate assembled unigenes. The unigenes were annotated based on their sequence similarity to other genes or proteins in public databases such as NCBI Non-Redundant Protein (NR), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomics (KEGG), Evolutionary Genealogy of Genes: Non-supervised Orthologous Groups (eggNOG), and Swiss-Prot.

Analysis of differentially expressed unigenes
The clean reads acquired by the Illumina platform were mapped to the assembled transcript sequences using the software RSEM [36]. The number of reads mapped to each gene was counted and normalised into an FPKM (Fragments Per Kilobase of transcript per Million fragments mapped) value using Cufflinks [37]. The unigene expression levels were evaluated based on the FPKM values. The differentially expressed genes (DEGs) between every two samples were determined using the software DEseq [38]. Significant expression differences between different samples were determined based on a |log2 (fold change)| > 1 and a significance p value < 0.05. Volcano plots and a heatmap were constructed using the ggplots2 software package and the pheatmap package of R.

Identification of genes involved in flavonoid biosynthesis
We classified the DEGs based on GO categories and KEGG pathway analysis. The DEGs enriched in the GO categories and KEGG pathways related to flavonoid metabolism were selected and analysed. We searched for unigenes involved in flavonoid biosynthesis via their sequence similarity with flavonoid biosynthesis enzymes in other plant species using the tBLASTn algorithm (E < 1e−5). These unigenes were further screened based on their ORF (open reading frame) lengths and their expression levels (FPKM values). The screening criteria are: 1. ORF > 300 bp; 2. FPKM value > 10. Primers were designed for amplifying the coding sequences of 18 unigenes that satisfied the two criteria (Additional file 1: Table S1). Then, the amplified sequences that agreed with the predictions were submitted to the Genbank group for an arrangement of accession numbers.

Sequence feature and phylogenetic relationship analysis
The coding sequences of genes and predicted proteins were acquired using the software Vector NTI 11.5 [39]. The phylogenetic tree was constructed using the neighbour-joining clustering method with 1000 bootstrapped replications [40]. The phylogenetic tree was customised and annotated using the online tool Evolview v2 [41].

Real-time quantitative PCR
Total RNA was extracted from Z. bungeana tissues from the root (Rf ), stem (Sf ), leaf (Lf ), and inflorescence (Ff ) using an RNAprep Pure Plant kit for polysaccharides and polyphenolics-rich materials (TIANGEN, Beijing, China). RNA integrity was analysed on 1% agarose gel and RNA quantity was determined using a NanoDrop 2000c spectrophotometer (Thermo Scientific, USA). Reverse transcription was carried out with PrimeScript ™ RT Master Mix (TAKARA, Dalian, China). Quantitative real-time PCR (qPCR) was carried out in triplicate reactions using a BIO-RAD CFX96 system. The qPCR experiment used the TB Green ® Premix Ex Taq ™ (Tli RNaseH Plus) reagent (TAKARA, Dalian, China) with the internal reference ZbEF1γ [42]. The online primer design tool Primer3Plus (http://www.prime r3plu s.com/ cgi-bin/dev/prime r3plu s.cgi) was used to design primers for quantitative real-time RT-PCR. For gene family members, the primers with the best specificity were chosen (Additional file 1: Table S2).

Unigene characterisation and functional annotation
Using the Illumina-and SMRT-based high-throughput sequencing platforms, we acquired the full-length transcriptome of Z. bungeana, with a total of 397,182 assembled unigenes. These unigenes had an average length of 870 bp (Additional file 1: Table S5). The number distributions of these unigenes in each 200 bp range were shown as Fig. 3a. All the 397,182 unigenes were used to search against the NCBI non-redundant protein sequences (Nr), Gene Ontology (GO), Kyoto Encyclopedia of Genes and  Table S6. The Nr annotation results showed that about 225,633 unigenes had significant similarities with other genes in Genbank, accounting for 56.81% of the total number of unigenes. As shown in Fig. 3b, BlastX against the Nr database indicated that the unigenes matched many species. The top nine hits were all plant species. Although Salvia miltiorrhiza and Z. bungeana are in the same family (Labiatae), S. miltiorrhiza was only ranked 9th among the top hit species.
The levels of gene expression among the various tissues were compared. The expression levels of the unigenes were determined based on the normalised FPKM values; 42,875 DEGs were identified. The FPKM values of these DEGs were used to construct a heatmap (Additional file 2: Figure S2). The distribution and expression fold changes in these DEGs are shown as four volcano plots (Additional file 2: Figure S3). We enriched the DEGs into the GO categories and the KEGG pathways (Additional file 3). Among the DEGs between different tissues, 7757 DEGs (Rf vs. Ff ), 5132 DEGs (stem vs. inflorescence, Sf vs. Ff ), 6975 DEGs (leaf vs. inflorescence, Lf vs. Ff ), and 7687 DEGs (infructescence vs. inflorescence, Faf vs. Ff ) were categorised by their GO terms. When comparing the enriched GO terms, we found that the 10 most redundant terms were similar. Thus, we counted the numbers of DEGs in the 10 most redundant GO terms. The enriched GO terms and the number of enriched DEGs are shown in Additional file 2: Figure S4. We used the DEGs to search the KEGG pathway database and arranged the pathways. Furthermore, the significantly enriched pathways were obtained by hypergeometric distribution computation (the total unigenes were used as the reference genome). Then, the dot plots were drawn to show the 20 most significantly enriched pathways for each comparison (Additional file 2: Figures S5-S8). Among the 20 most significantly enriched KEGG pathways for Rf vs. Ff, 'Flavonoid biosynthesis' was ranked 13th and contained 8 DEGs. For Sf vs. Ff, 'Flavonoid biosynthesis' was also ranked 13th among the 20 most enriched KEGG pathways, but it contained only 6 DEGs.

Analysis of genes involved in flavonoid biosynthesis
The search of the KEGG pathways and GO categories with enriched DEGs revealed 14 DEGs enriched in the KEGG pathways related to flavonoid biosynthesis ('ko00941 Flavonoid biosynthesis' and 'ko00944 Flavone , all of these DEGs with ORFs more than 300 bp; 23 DEGs encoded glycosyltransferase (GT) with ORFs more than 600 bp. The expression patterns of these DEGs in various tissues were analysed based on their FPKM values. As shown in Additional file 5, the unigene c334398_g1 encoding CHS had the highest FPKM value in the inflorescence. The two unigenes that encoded ANS (c312087_g1 and c332631_g1) also had high FPKM values. For these DEGs, a heatmap was constructed to show the classification of tissue-specific expression. The result indicated that 10 unigenes had inflorescence-specific expression (Fig. 4). To further analyse and identify the candidate genes that are putatively involved in flavonoid biosynthesis in Z. bungeana, the key enzymes, such as phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumarate-CoA ligase (4CL), and flavone synthase II (FNSII), from Arabidopsis and Salvia miltiorrhiza were used to search for homologous unigenes in the assembled Z. bungeana unigene set using the tBLASTn algorithm (E < 1e−5). The results revealed 18 candidate genes with ORFs > 300 bp and FPKM values > 10. The Z. bungeana genes included 3 in the PAL family, 8 in the 4CL or 4CL-like family, and 1 gene each in the C4H, CHS, CHI, FNSII, FLS, DFR (dihydroflavonol reductase), and ANS family. Interestingly, unigenes c334398_g1, c378458_g1, and c332631_g1 were specifically expressed in the inflorescence, and encoded CHS, CHI, and ANS, respectively. Specific primers were designed to amplify the cDNAs of the 18 candidate gene sequences via RT-PCR. Based on the PCR amplification and cDNA sequencing results, the coding sequences of The phylogenetic relationships of the proteins and tissue-specific expression patterns of the flavonoid biosynthesis genes 4CL (EC 6.2.1.12) is the key enzyme involved in the three steps in the common phenylpropanoid pathway and is the main branch point enzyme generating activated thioesters [56]. The 4CLs from monocot and dicot plants are separated into two distinct clades. In dicot plants, the bona fide 4CLs can be grouped into three clusters; the type I cluster is mainly involved in lignin synthesis or the production of additional phenolic compounds other than flavonoids, the type II cluster is most probably associated with flavonoid biosynthesis [57,58]; the type III cluster may be involved in converting more broad substrates, including sinapate [59]. The identified 8 genes that encode 4CL and 4CL-like proteins in Z. bungeana were used to construct a phylogenetic tree with the 4CL and 4CL-like proteins from Arabidopsis, and the bona fide 4CLs from several other plant species (Fig. 5a). The results showed that 4 of the putative Zb4CLs were clustered in 4CL-like groups, which probably have no measurable catalytic activity toward phenylpropanoid pathway intermediates, and so they were named Zb4CLK1, Zb4CLK2, Zb4CLK3, and Zb4CLK4 Fig. 4 Heatmap depicting the tissue-specific expression patterns of DEGs probably involved in flavonoid biosynthesis. The heatmap was conducted from the standardized FPKM profiles of the unigenes. Rf, Sf, Lf, Ff, and Faf represent root, stem, leaf, inflorescence, and infructescence, respectively. The unigene marked using black '#' represent DEG encoding CHS. The unigene marked using red '#' represent DEG encoding CHI. The unigene marked using blue '*' represent DEG encoding FLS. The unigene marked using green '*' represent DEG that encode ANS. The unigene marked using red '*' represent DEG that encode GT [60]. For the other bona fide Zb4CLs, Zb4Cl1 and Zb4CL2 were classified as type I 4CLs, Zb4CL3 was classified as a type II 4CL, and Zb4CL4 was classified as a type III 4CL [57,59,61]. Zb4CL1, Zb4CL2, and Zb4CL4 may also play overlapping roles in flavonoid biosynthesis, like the four isoforms of 4CLs in Arabidopsis [62]. FNSII enzymes play key roles in flavone biosynthesis in plants [63]. The FNSII-1 isoform has more broad specificity for and Zb4CLK4 (MN599463). b Tissue-specific expression patterns of ZbPALs, Zb4CLs, ZbC4H1, ZbCHS1, ZbCHI1, and ZbFNSII in the root (R), stem (S), leaf (L) and inflorescence (F). The number at the y-axis represents the fold changes of gene expression levels. The relative abundance of ZbPAL3, Zb4CL3, ZbC4H1, ZbCHS1, ZbCHI1, ZbFNSII, ZbFLS, ZbDFR and ZbANS in root is arbitrarily set to 1 in each chart, respectively flavanones as substrates, whereas the FNSII-2 isoform is specific for pinocembrin and responsible for the synthesis of chrysin [63][64][65][66][67][68][69]. To identify the relationship between FNSIIs in Z. bungeana and other plant species, a phylogenetic tree was constructed (Additional file 2: Figure S9). The result showed that ZbFNSII is closely related to CYP93B25 in Salvia miltiorrhiza, which suggests that the two proteins may have similar functions.
Based on the predicted sequences of ZbPAL1, ZbPAL2, ZbPAL3, and ZbC4H1, and the verified sequences of Zb4CLs, ZbCHS1, ZbCHI1, ZbFNSII, ZbFLS, ZbDFR and ZbANS, primers were designed to analyse their expression patterns by qPCR (Fig. 5b). ZbEF1γ was used as the internal reference gene; it was confirmed as suitable for qPCR after evaluation with GeNorm, NormFinder, and BestKeeper software (Genbank accession number MN599468) [42]. Among the three ZbPALs, qRT-PCR analysis showed that ZbPAL1 had the highest expression level in roots, ZbPAL2 had the highest expression level in leaves, and ZbPAL3 had the highest expression level in leaves and inflorescence. ZbC4H1 had the highest expression level in the stem. 4CL (EC 6.2.1.12) is the last enzyme in the common phenylpropanoid pathway. Among the 4 bona fide Zb4CLs, Zb4CL3 had the highest expression level in inflorescence, followed by in stem and leaf, suggesting that it plays essential roles in flavonoid biosynthesis in flowers. Zb4CL1 and Zb4CL2 had the highest expression levels in the stem, which suggesting they may be involved in the biosynthesis of lignin and phenolic compounds. The expression level of Zb4CL4 was the highest in inflorescence. Among the other genes that encode CHS (EC 2.3.1.74), CHI (EC 5.5.1.6), FNSII (EC 1.14.19.76), FLS (EC 1.14.20.6), DFR (EC 1.1.1.219), and ANS (EC 1.14.20.4), ZbCHS1, ZbFNSII, and ZbANS had the highest expression levels in inflorescence, whereas ZbCHI1, ZbFLS, and ZbDFR had the highest expression levels in leaves (Fig. 5).

Tissue-specific flavonoid compositions in Z. bungeana and the proposed mode of action of linarin for myocardial tissues
This study has revealed 39 chemical compounds from Z. bungeana extracts by LC-MS/MS, including 25 flavonoids and 22 possible new compounds. Analysis of the relative redundancies of the 12 major flavonoids (including two new compounds) indicated that 9 flavonoids have tissue-specific distributions. The 9 flavonoids were taken as the characteristic chemical compounds of the root, stem, leaves, or inflorescence. We detected 5 (poly)methoxylated flavones (5, 9, 10, 11, and 12, in Fig. 1). Methoxylated flavones have an array of bioactivities and may be involved in plant chemical defence mechanisms; they also represent promising natural lead molecules for the development of potential antiproliferative, antidiabetic, or anti-inflammatory drugs [70]. In rice, the naringenin O-methyltransferase (OsNOMT) can be induced by UV and jasmonate treatment [71]. Z. bungeana is distributed in the Tianshan and Altai mountains (altitude 1200-2500 m) in Xinjiang, China, where there is always strong UV radiation in summer. Therefore, we suggest that the (poly)methylation of flavonoids in Z. bungeana may be induced by UV radiation or some other adverse factors. Consistent with previous studies, linarin was found to be the major flavonoid component, and it was found to mainly accumulate in flowers [16]. Previous studies also showed that acacetin extracted from Z. bungeana has an important protection function for rat cardiomyocytes, but the content of acacetin in the extracts was very low (0.004-0.005%) [13]. In this study, we did not detect any acacetin in Z. bungeana extracts. However, it has been reported that linarin is unstable and can be hydrolysed to acacetin in vitro and in the rat intestine [72,73]. Considering the available evidence, we postulate that linarin can be hydrolysed into acacetin and glycosides in the stomach of rats, and the resulting acacetin has a protective effect on rat myocardial tissue.

A functional gene analysis platform has been established based on transcriptome profiling and data analysis
In this work, we acquired 397,182 unigenes by transcriptome profiling and data analysis. In these unigenes, 97,562 sequences had lengths between 1000 and 4000 bp. This number is far more than other plant species, such as Arabidopsis, rice, or Salvia miltiorrhiza. This indicates that the sequencing result has sufficient depth and coverage. Thus, it appears that a combination of Illumina-and SMRT-based RNA sequencing techniques can acquire more transcriptome data with a comparatively lower cost as compared to the use of a single sequencing method. Furthermore, this method is more conducive for identifying full-length cDNAs, alternative-splicing events, and gene expression patterns [74][75][76][77]. In this study, besides massive full-length genes, we also discovered alternativesplicing events, such as for some Zb4CL genes. Further research is currently underway to identify the underlying causes and modes of action of these alternative transcripts. Until now, the genome of Z. bungeana has been unknown; thus, the Z. bungeana transcriptome reported here will serve as a valuable resource for further Z. bungeana functional gene research.

Systemic identification of enzyme genes involved in flavonoid biosynthesis pathways
Analysis of the DEGs enriched in GO categories and KEGG pathways, and the BLAST search, identified a total of 18 unigenes with high similarities to enzyme genes involved in flavonoid biosynthesis in other plant species. Among them, the coding sequences of 14 genes were verified by cDNA sequencing, including 8 genes that encode 4CL or 4CL-like proteins. In Euscaphis konishii Hayata [77], Arabidopsis [78], Camellia sinensis [79], and Ilex paraguariensis [80], tissue-specific gene expression is closely related with tissue-specific synthesis, accumulation, and modification of flavonoids. Thus, we analysed the relationships between tissue-specific gene expression patterns and flavonoid accumulation in Z. bungeana. Phylogenetic analysis revealed that Zb4CL3 may be involved in flavonoid biosynthesis. Tissue-specific expression analysis revealed that Zb4CL3 has a higher expression level in inflorescence than other tissues, which suggests that it plays a specific role in flavonoid biosynthesis in flowers. However, the other three Zb4CLs may also play overlapping roles in flavonoid biosynthesis, like the four isoforms of 4CLs in Arabidopsis [62]. Based on the FPKM value analysis and qPCR analysis, ZbCHS1 had the highest expression level in inflorescence, suggesting it plays a significant role in the crucial branch point of the flavonoid biosynthesis pathway in the inflorescence. ZbFNSII is closely related to CYP93B25, and ZbFNSII had the highest expression level in inflorescence, which suggests that it plays an important role in flavone biosynthesis (Additional file 2: Figure S9 and Fig. 5). ZbANS also had a very high expression level in inflorescence, but we did not detect any related anthocyanidin in extracts from various tissues. This may be because anthocyanidins are water-soluble substances and cannot be extracted from 70% ethanol solvent with relatively low polarity [81]. qPCR analysis revealed that ZbFLS had the highest expression level in leaves, suggesting that they have specific functions in the biosynthesis of some flavonol glycosides (such as 1 and 3 in Fig. 1) in leaves. In this study, we discovered that most of the flavones in Z. bungeana were 7-O-rutinosides flavones, which suggests that flavones from different plant species have specific glycosylation modes [81][82][83][84][85]. Based on our comparison of gene expression levels, we suggest that gene expression analysis results based on qPCR are more reliable than those based on FPKM values because qPCR provided better differentiation of gene family members. However, the FPKM value analysis result can provide reference data for the overall range of gene expression levels. A combination of qPCR and FPKM value analysis may produce more reliable results with respect to gene expression levels.
Proposed biosynthesis pathways of the 12 major flavonoids in Z. bungeana Z. Bungeana contains special flavonoids, which may be correlated with its pharmaceutical efficacy. The biosynthesis pathways for some special flavonoids have remained largely unknown. Based on the reported results and our analysis, we have proposed the biosynthesis pathways for 12 major flavonoids including two new compounds in Z. Bungeana (Fig. 6). The map also depicts the involvement of several new genes. It is proposed that ZbPALs, Zb4CL3, ZbCHS1, and ZbCHI1 are involved in the biosynthesis of the flavonoid skeleton; ZbFNSII is involved in the accumulation of linarin; and ZbFLS is involved in the accumulation of some flavonol glycosides in the leaves of Z. bungeana. ZbANS is involved in the biosynthesis of anthocyanin. This map provides the first systematic summary of the main flavonoid components, their synthesis pathways, and the involved genes in Z. bungeana. It provides a research basis for producing some specific flavonoids of Z. bungeana via bio-engineering.
Unlike model plants, the biosynthesis pathways of some secondary metabolites in non-model plants is not single, for example, the proanthocyanins in Euscaphis konishii Hayat have two biosynthesis pathways [77]. In the 12 major flavonoids, linarin with the highest relative abundance may be one of the main active ingredients. Recently, the linarin biosynthesis pathway in C. indicum has been described. It was suggested linarin is produced by adding a rhamnose to acacetin-7-O-glucoside [23]. However, linarin is most probably biosynthesized by adding a methoxyl group to apigenin-7-O-rutinoside in Z. Bungeana because it was identified by LC-MS/MS but not acacetin-7-O-glucoside. Moreover, the glycosylated modification of apigenin may occur before the methoxylation step in Z. bungeana. These findings have provided new insight into the key steps in linarin biosynthesis.

Major contributions to resource protection and drug development
Our results showed the tissue-specific distribution of flavonoids in Z. bungeana, which supported the use of the aerial parts of the plant for medicine but not for including roots. Since Z. bungeana is a kind of perennial plants, retaining its roots will be benefit for its reproduction in the next year. The full-length transcriptome can also contribute to investigate the mechanism of Z. Bungeana's growth, development and defense. So this finding is in favor of protecting this medicinal resource. Besides, this study has revealed some new enzyme genes that involved in the biosynthesis pathways of the 12 major flavonoids in this species. These enzyme genes can be introduced into expression platforms to elucidate the functions of the enzymes, which may be used to biosynthesize some drugs or intermediates by bio-engineering. Furthermore, the new flavonoids discovered in Z. Bungeana will contribute to the natural drug development.

Conclusions
This study integrated metabolomic and transcriptomic data to analyze the flavonoid constituents and their biosynthesis pathways in Z. bungeana. The 12 major flavonoids were revealed by metabolomic data and 9 of these flavonoids were tissue-specific components, of which, linarin was the most abundant flavonoid in Z. bungeana. Compared with the whole herb, linarin was abundant in the aerial parts of the plant, especially in flowers, providing a theoretical basis for using the aerial parts of Z. bungeana for medicine as possible. Linarin might be primarily biosynthesized from apigenin-7-Orutinoside and hydrolysed into acacetin and glycosides in the stomach of rats, then the resulting acacetin had a protective effect on rat myocardial tissue. Based on the functional gene research platform established by fulllength transcriptomic profiling, we have identified 18 unigenes that are probably involved in the core flavonoid biosynthesis pathways in Z. bungeana and 7 genes are firstly described as key genes for flavonoid biosynthesis. Finally, the biosynthesis pathway map for the 12 major flavonoids in Z. Bungeana was provided. It has provided a basis for the resource protection and drug development from this medicinal plant.