Integrated quality evaluation strategy for multi-species resourced herb medicine of Qinjiao by metabolomics analysis and genetic comparation

Background Quality evaluation of multi-species resourced herb medicine (MSRHM) is a main problem for quality control of herb medicine. Current quality evaluation methodology lost consideration of species discrepancy. New quality evaluation strategy for MSRHM is in urgent need. Qinjiao, a representative MSRHM, originated from Gentiana macrophylla Pall., Gentiana straminea Maxim., Gentiana crassicaulis Duthie ex Burk. or Gentiana dahurica Fisch., has been used as an important herb medicine over 2000 years for expelling wind-dampness and relieving impediment pain. However, quality evaluation among species has never been revealed. The current work proposes an integrated quality evaluation strategy for MSRHM of Qinjiao, which may promote innovation of quality control of MSRHM. Methods In this work, 58 batches of Qinjiao covering 4 species were collected. Genetic comparative analysis based on ITS2 sequence was conducted. Metabolomics analysis based on TOF–MS and NMR spectrum were carried out. Compounds underlying species differences were identified and their discrepancies among species were investigated by ANOVA analysis and multivariate analysis. Results Four species of Qinjiao can be authenticated by ITS2 sequence comparation. Metabolomics analysis by TOF/MS and NMR revealed chemical discrepancies among species of Qinjiao. Maximum discrepancy was present between Gentiana crassicaulis Duthie ex Burk. and Gentiana dahurica Fisch. Chemical difference among species were tentative explored. For TOF–MS profiling, 28 constituents were tentative identified, 17 of which were further confirmed by standards. For 1H-NMR profiling, signals from 5 compounds were assigned. Contents discrepancies were investigated by ANOVA analysis. It seems that (seco)iridoids like loganic acid, gentiopicroside or swertiamarin were richer in specie of Gentiana crassicaulis Duthie ex Burk., while flavonoid (morroniside) and triterpenoids (roburic aicd, ursolic acid, oleanolic acid, β-sitosterone) were richer in specie of Gentiana dahurica Fisch. The current research demonstrates that metabolite profiling based on both UPLC/Q-TOF MS and 1H-NMR coupled with ITS2 sequence comparation can be a powerful tool for quality investigation of MSRHM of Qinjiao. Conclusions A comprehensive quality evaluation strategy for MSRHM was proposed by integrating UPLC-Q-TOF–MS, NMR based metabolic analysis and ITS2 sequence genetic comparation. The proposed quality evaluation strategy shall promote innovation of quality control of traditional Chinese medicine.


Background
Quality control of multi-species resourced herb medicine (MSRHM) is a main problem for quality control of herb medicine. Current quality control methodology bears disadvantages of lacking consideration of species discrepancy, ignoring the fact that discrepancies among species were inevitable and shall produce different chemical component and clinical effects. As a result, new quality investigation strategy for MSRHM is in urgent need.
Qinjiao, namely Gentianae Macrophyllae Radix, is an ancient Chinese herb medicine and has been described and recorded in several ancient Chinese Medicine monographs like Shen Nong's Herbal Classic (Han Dynasty, Shen Nong Ben Cao Jing), Compendium of Materia Medica (Ming Dynasty, Ben Cao Gang Mu) [1] and also in Chinese pharmacopoeia. Over 2000 years, Qinjiao has been utilized to treat a wide range of diseases, including hypertension, osteoarthritis, and especially rheumatism [2]. According to China Pharmacopoeia 2015 version [3], Qinjiao consists of the dried roots of Gentiana macrophylla Pall. (G. macrophylla), Gentiana straminea Maxim. (G. straminea), Gentiana crassicaulis Duthie ex Burk. (G. crasicaulis) or Gentiana dahurica Fisch. (G. daurica). As a represented MSRHM, quality and efficacy of Qinjiao among species were inevitable variable [4,5]. Former researches mainly focused on contents determination or chemical profiling of certain kind of Qinjiao [6,7], or even chemical and genetic analysis of G. crasicaulis and G. macrophylla [8]. No systematic species investigation for four kinds of Qinjiao has been revealed, which may shed new light into quality control and clinical utilization of Qinjiao.
DNA barcode of internal transcribed spacer 2 (ITS2) is prevalently adopted as a universal barcode for plant, especially herbal medicinal identification [9]. ITS2 barcode has been successfully employed for species identification of Qinjiao [8,10]. Chemical profiling combined with multivariate analysis provided systematic chemical comparison of metabolites, and can be powerful tool for species investigation of MSRHM [11]. Ultraperformance liquid chromatography quadrupole timeof-flight mass spectrometry (UPLC-Q-TOF-MS) and proton nuclear magnetic resonance spectrometer ( 1 H-NMR) are the most frequently employed platform for metabolic profiling.
Herein, this research collected 58 batches of Qinjiao, covering four species, originated from main producing areas of China (Gansu, Shanxi, Hebei, Shaanxi, Heilongjiang, Liaoning, Nei Mongol, Yunnan, Szuchuan, Qinghai, Tibet). Genetic comparative and metabolic profiling analysis of four species of Qinjiao were carried out. First, all herbs were authenticated by morphological identification as well as DNA barcoding comparison. Then, chemical profiles were acquired by UPLC-Q-TOF-MS and 1 H-NMR. Third, metabolomics based species investigation, including principal component analysis (PCA) and orthogonal partial least squares discrimination analysis (OPLS-DA), was conducted to reveal the quality discrepancy among species. Finally, an integrated quality evaluation strategy for MSRHM of Qinjiao by UPLC-Q-TOF-MS, NMR based metabolomics analysis and ITS2 sequence genetic comparation was established. The flow chart of the proposed methodology is shown in Fig. 1. The current work may facilitate quality control, utilization and species discrimination of different kinds of Qinjiao.

Sample collection and preparation
Fifty-eight batches of rhizomes of G. macrophylla, G. straminea, G. crasicaulis and G. daurica were collected from different herbal markets or harvested from various locations of China. All samples were authenticated by Professor Jiuzhi Yuan (Shenyang Pharmaceutical   Table 1. The roots were gently washed and dried at 50 °C for 48 h, and then were grounded into powder and stored in glass jars in the dark at room temperature until further analysis.

Genetic Analysis of collected Qinjiao samples
ITS2 was prevalently adopted as a universal barcode for plant, especially herbal medicinal identification [9]. ITS2 barcode has been successfully employed for species identification of Qinjiao [8,10]. In current research, ITS2 region was compared to verify the authentication of collected Qinjiao samples. To extract total genomic DNA from dried rhizomes (50 mg), the protocol provided by the Plant Genomic DNA Kit (Tiangen Biotech, Co., Ltd., Beijing, China) was used. Extracted DNA sample was stored at − 20 °C until use. The ITS2 sequence was amplified using a pair of primers (ITS2F: 5′-ATG CGA TAC TTG GTG TGA AT-3′; ITS3R: 5′-GAC GCT TCT CCA GAC TAC AAT-3′) described previously by Chen et al. [12]. The PCR amplification was conducted as described by Luo et al. [13]. Amplification products were examined by electrophoresis in 1% (wt/vol.) agarose gels and visualized under ultraviolet light to detect successfully amplified products and the possible contamination of negative controls. After purifying, the PCR products were directly subjected to sequencing.
Sequences were edited and assembled using DNA-MAN software (version 6.0) and refined manually. ITS2 resuquence were identified using DNA barcoding system for identifying herbal medicine (http://www.tcmba rcode .cn). Genetic distances were calculated using the Kimura-2-Parameter (K2P) model. All the newly obtained ITS2 sequences were uploaded to GenBank.
UPLC-Q-TOF-MS profiles analysis of the collected Qinjiao 0.5 g power (through No. 3 sieve) of each Qinjiao sample was placed into a separate 50 mL stopper conical flask followed by the addition of 20 mL methanol. The mixtures were vortexed for 1 min, sonicated (40 kHz, 500 W) for 30 min. Cooling down to ambient temperature, the lost weight was made up by methanol. After centrifugation at 10,000 rpm, 4 °C for 10 min, a 2 μL aliquot of the supernatants was injected into a UPLC-ESI-Q-TOF system (AB Sciex, Framinghan, MA, USA) for MS analysis.
The mass spectrometric data were collected on a SCIEX X500R QTOF mass spectrometer (AB Sciex, Framinghan, MA, USA) coupled with an electrospray ionization interface in negative ion modes (ESI-). SCIEX OS software 1.2 (AB, Milford, MA) was employed for data acquisition and procession. The following parameters settings were used: the ion spray voltage of 4000 V; turbo spray temperature (TEM) of 600 °C; declustering potential (DP) of − 80 V; collision energy (CE) of − 45 V; nebulizer gas (gas 1) of 55 psi; heater gas (gas 2) of 55 psi, CAD gas of 7 psi, and curtain gas of 35. Nitrogen was kept as the nebulizer and auxiliary gas. TOF MS and TOF MS/MS were scanned with the mass range of m/z 50-1000. Continuous recalibration was carried out every six samples. In addition, dynamic background subtraction (DBS) trigger information-dependent acquisition (IDA) was used to trigger acquisition of MS/MS information of low-level constituents. The accurate mass and composition for the precursor ions and fragment ions were analyzed using the Markerview ™ software (Version 4.1, Waters Co., Milford, MA, USA) integrated with the instrument.
QC samples were prepared by combining equal aliquots from all Qinjiao samples and were injected every six specimens during the whole analysis. QC data obtained was used to assess the stability of the LC/MS platform. For all QCs, 5 characteristic features (list in Additional file 1: Table S1) were picked out to verify the stability. The results proved that variations of retention times were less than 0.2 min, drift values of m/z were less than 10 PPM, and the RSD of peak areas were all below 10% (Additional file 1: Table S1). Raw data from Q/TOF-MS were analyzed using Markerview for peak deconvolution and peak alignment with the following parameters: initial retention time 0.5 min, final retention time 23 min, mass tolerance 10 PPM, ion intensity threshold (3000 counts) and retention time tolerance 0.1 min. The data were combined into a single matrix by aligning peaks with the same mass-retention time pair together from each data file in the data set. The ion intensities of each peak detected (2806 MS features for ESI-modes) were normalized to the sum of the peak intensities in each sample. After normalization, the data was processed according to the "80% rule", briefly only variables with values above zero presenting in at least 80% of each group were kept for the following analysis [14].

H-NMR profiles analysis of the collected Qinjiao
The methanol extracting supernatants (600 μL) were dried down using a centrifugal vacuum concentrator and were redissolved in 600 μL of MeOD. After mixing well, 450 μL of reconstitution wan transferred into the 5 mm NMR tubes (Norell, Landisville, NJ, USA) for NMR analysis.
NMR spectral data were obtained at 300 K on a Bruker 600-MHz AVANCE III NMR spectrometer (Bruker, Germany), equipped with a 5.0-mm BBO probe, operating at 600.13 MHz for 1 H. The zg30 Bruker pulse program was used for 1D 1 H NMR, with a TD of 64 k, relaxation delay of 1 s, spectral width of 20 ppm, and 256 scans. A line-broadening factor of 0.3 Hz was applied to FIDs before Fourier transformation. All NMR spectra were phased and baseline-corrected manually using TOPSPIN 3.5 (Bruker, Germany). The spectra were referenced internally to the chemical shift of H-3 signal of gentiopicroside at 7.46 ppm. Each 1 H-NMR spectrum over the ranged 0.5-10.0 ppm was reduced to 238 regions of equal width (0.04 ppm) and the signal intensity in each region was integrated using AMIX (version 3.9, Bruker, Germany). The region of 4.75-5.20 ppm was removed prior to any statistical analysis in order to eliminate any residual water signal. Then data was normalized in AMIX by dividing each integrated segment by the total area of the spectrum to reduce any significant.

Statistical analysis and compound annotation
Output data from TOF-MS or NMR analysis was separately imported into SIMCA (version 14.0, Umetrics, Umeå, Sweden) for multivariate statistical analysis (MS data unit variance scaled, NMR data pareto-scaled). To provide comparative interpretations and visualization of the metabolic differences among the four species of Qinjiao, PCA and OPLS-DA were applied to the TOF-MS or NMR data set. The quality of the models was described by R2X and Q2 values. R2X shows the proportion of variance in the data explained by the models and indicates goodness of fit. The value closer to 1 indicates the goodness of fit. Q2, on the other hand, shows the proportion of variance in the data predictable by the model and indicates predictability. The results were visualized in the form of score plots, where each point represents an individual sample (to QJ01-QJ42 were deposited in department of pharmacy, the first affiliated hospital of Zhengzhou University, Zhengzhou, China; S0-S18 were deposited the Institute of Chinese Materia Medica, Shanghai University of Traditional Chinese Medicine, Shanghai, China show the group clusters), and loading plots or S-plots, where each coordinate represents one mass-retention feature or 1 H-NMR spectral region (to identify the variables contributing to the classification). The variable importance of projection (VIP) is the vector to summarize the total importance of the variable in explaining the model. The corresponding variables with VIP > 1.0 were chosen as potential discriminative metabolites. To justify the OPLS-DS models, analysis of variance testing of Cross-Validated predictive residuals (CV-ANOVA) were conducted. CV-ANOVA is a diagnostic tool for assessing the reliability of PLS and OPLS models. The P value produced by CV-ANOVA indicates the probability level. The common practice is to interpret a p-value lower than 0.05 as pointing to a significant model. Statistical analysis was also performed using one-way analysis of variance (ANOVA) followed by Tukey's multiple comparison test (SPSS, Chicago, IL, USA). A probability of P < 0.05 was considered to be statistically significant between two groups. LC-MS Peaks were identified according to actual mass, MS/MS fragments and retention time (RT). First, the m/z value of the molecular ion of interest was searched against a self-build Qinjiao constituent Database, where data was collected from published researches [15,16]. Then, the putative identifications were verified by comparing the MS2 fragmentations. Part of the constituents were further identified by reference standards. 1 H-NMR signals were assigned by comparing the spectrum of Qinjiao with that of gentiopicroside, loganic acid, sweroside, swertiamarin or sucrose by Chenomx NMR suit (version7.6, Chenomx, Edmonton, Canada) with method described in our previous work [17].

ITS2 comparation
ITS2 sequences from collected Qinjiao samples were submitted to GenBank database (Accession numbers were listed in Table 1), assembled with CodonCode Aligner 3.7.1 (CodonCode Co., Dedham, MA, USA) and aligned using ClustalW. Kimura 2-Parameter (K2P) distances. GC content of base and NJ trees were calculated and constructed using the MEGA X software with the Bootstrap method (500 resampling) and K2P model [18]. The distances within or among species are separately list in Table 2. The average ITS2 region was 497 bp in length in 4 species of Qinjiao, and the G+C content ranged from 55.7% to 56.2%, with an average of 56.0% (Table 2). A neighbor-joining (NJ) tree of ITS2 barcode was formed on the basis of K2P model (Fig. 2).

TOF-MS data metabolomics
Representative base peak intensity (BPI) chromatograms of G. crassicaulis is shown in Fig. 3. By comparing actual mass, MS/MS fragments and retention time (RT) of target compounds, 28 constituents were identified, among which 17 were further identified by reference standards. The compound information is list in Table 3, with XIC chromatography shown in Additional file 2: Fig. S1. ANOVA followed by Tukey's multiple comparison test were conducted for these 28 compounds (partly shown in Fig. 4).
Multivariate analysis of the TOF-MS data was carried out. Initially, unsupervised PCA-X analysis were conducted among groups, showing preferably discriminative distribution (not shown, R2X = 0.661, Q2 = 0.398). Subsequently, to maximize the variation among groups and to determine the variables that contributed to this variation, supervised OPLS-DA model (Fig. 5a) was employed among four species of Qinjiao, with R2X = 0.38, R2Y = 0.666, Q2 = 0.549. The loading plot (Fig. 5b) revealed the correlations between class (species) and variables (MASS feature), where variables clustering close to each class were considered to make great contributions to the classification. Furthermore, it was noticed that G. crassicaulis and G. dahurica clustered furthest from each other in the score plot (Fig. 5a). To explore the difference, OPLS-DA analysis for these two groups was conducted, with R2Y 0.924 and Q2 0.864 (Fig. 5c). Corresponding S-plot (Fig. 5d) was analyzed, which was commonly used and effectively showed the difference between groups. Part of the variable selected from S-plot with VIP value > 1.0 and their abundances were shown in Additional file 3: Table S2. The established OPLS-DA models were further validated by CV-ANOVA test, with P valve less than 0.05 indicating significant models (Additional file 4: Table S3).
To explore the diversity between G. crassicaulis and G. dahurica, OPLS-DA analysis was achieved between them, with R2X 0.726, R2Y 0.884 and Q2 0.848 (Fig. 7c). Corresponding S plot (Fig. 7d) was generated. The established OPLS-DA models were further validated by CV-ANOVA test, with P valve less than 0.05 indicating significant models (Additional file 4: Table S3).

ITS2 data analysis
Minimum distance within species was observed between G. macrophylla_and G. straminea, with K2P distance value 0.0146. While maximum K2P distance within species was present between G. crassicaulis and G. dahurica, with value 0.0371. The minimum interspecific distance of ITS2 region was higher than the maximum intraspecific distance, indicating that the ITS2 barcode performed well in the discrimination of four species of Qinjiao. The distances among species revealed were in accordance with discrepancy detected by following metabolomics analysis. It was illustrated that species of G. macrophylla, G. straminea, G. dahurica and G. crassicaulis can be clearly distinguished by the NJ tree. ITS2 analysis confirmed the potential discrepancy among species and guaranteed reliability of subsequent metabolomics-based species evaluation.

TOF-MS data analysis
The score plots (Fig. 5a) shows that samples from species of G. macrophylla, G. straminea, G. dahurica and G. crassicaulis located different areas in score plot, inferring distinctive chemical profiles of four species of Qinjiao. In general, G. crassicaulis and G. straminea were closest in the score plot, while G. macrophylla and G. dahurica were progressively far away from them. Maximum spatial distance was present between G. dahurica and G. crassicaulis. The result was consistent with discrepancy revealed by ITS2 analysis. The loading plot (Fig. 5b) revealed MASS features making great contributions to the classification, which was further confirmed by ANOVA test. Flavonoids (isovitexin, morronside, quercetin) and triterpenoids (oleanolic acid, ursolic acid, roburic acid, β-sitosterone, daucosterol), as well as certain iridoids (macrophylloside A and Qinjiaoside A) were significant higher in G. dahurica. than other 3 species; while other(seco)iridoids (swertiamarin, secologanoside, loganic acid, gentimacroside, loganic acid 11-O-β-glucopyranosylester) and citric aicd were richer in other 3 species of Qinjiao than G. daurica. No obvious distinction was detected in contents of sweroside, sucrose, swertiapunimarin, and swertiajaposide A among four species of Qinjiao.

H-NMR data analysis
According to the score plot (Fig. 7a, c), NMR based metabolomics validated results revealed by MS based metabolomics. The S-plot (Fig. 7d) between these two groups showed that signal abundance from 3 to 7 ppm was richer in G. crassicaulis, while which from around 0.8-2 ppm was richer in G. dahurica. Unfortunately, limited to the complexity of the 1 H-NMR spectra, compounds related to the abundance difference were not directly identified. However, the difference  . 4 Part of ANOVA test result of identified constituents by TOF-MS. *P < 0.05; **P < 0.01; ***P < 0.001;****P < 0.0001