Evaluation of seven DNA barcodes for differentiating closely related medicinal Gentiana species and their adulterants

Background: Species identification of living organisms by standard DNA sequences is well accepted. The Consortium for the Barcode of Life (CBOL) recommends the chloroplast regions rbcL and matK as the DNA barcodes for land plants. This study aims to evaluate the feasibility and limitations of rbcL, matK, and five other commonly used regions as DNA barcodes for medicinal Gentiana species and their adulterants, Gentiana rhodantha and Podophyllum hexandrum. Methods: The species differentiation powers of rbcL, matK, the nuclear internal transcribed spacer (ITS) and 5S rRNA intergenic spacer, and the chloroplast trnH-psbA, trnL-F and rpl36-rps8 intergenic spacers were tested for distinguishing medicinal Gentiana species, including Gentiana scabra, Gentiana triflora, Gentiana manshurica and Gentiana rigescens, from common adulterants such as Gentiana rhodantha and Podophyllum hexandrum (a toxic herb producing podophyllotoxin). Results: All seven tested loci could be used to differentiate medicinal Gentiana species from their adulterants, and to distinguish Guanlongdan from Jianlongdan. In terms of general differentiation power, rbcL and matK had no significant advantages over the other five loci. Only the 5S rRNA and trnL-F intergenic spacers were able to discriminate among the closely related species G. triflora, G. scabra and G. manshurica. Conclusion: The DNA barcodes rbcL and matK are useful for differentiating closely related medicinal species of Gentiana, but have no significant advantages over the other five tested loci.


Background
The nuclear and chloroplast genomes are the major targets for plant species authentication and phylogenetic studies. Since the rate of evolution varies across each genome, different DNA regions may be selected to reveal different taxonomic levels. The criteria for a useful DNA marker for authentication are: (1) high interspecific divergence; (2) low intraspecific divergence; (3) short PCR product of around 1 kb; and (4) availability of universal primers for amplification [1,2]. The Consortium for the Barcode of Life (CBOL) set up a standardized sampling method and experimental protocol to analyze agreed-upon "DNA barcodes" [3]. This universal identification system is called DNA barcoding. Recently, the CBOL Plant Working Group recommended that rbcL and matK should be used as the land plant barcodes [4]. The former offers high universality and good discrimination power, while the latter has higher resolution than other loci. However, it is known that the differentiation powers of rbcL and matK may not be sufficient for closely related species [5]. Indeed, plenty of land plants are identified by other DNA regions as markers.
The internal transcribed spacer (ITS) of the nuclear ribosomal cistron consists of ITS1 and ITS2, and has been demonstrated to be useful for phylogenetic studies in many angiosperm families [6]. Recently, over 60,000 ITS sequences of plants and animals from GenBank were compared [7]. At the species level, the success rates of identification were 91.9%, 76.1%, 74.2%, 67.1%, 88.1% and 77.4% for animals, dicotyledons, monocotyledons, gymnosperms, ferns and mosses, respectively. ITS regions are found in plants, animals and fungi, and the ITS regions of fungi present in medicinal materials are occasionally co-amplified, making direct sequencing of the amplified DNA product unsuccessful. The nontranscribed spacer of 5S rRNA is highly variable, and some studies have illustrated that its resolving power is higher than that of the ITS sequences [8]. In the chloroplast genome, the trnH-psbA spacer is a rapidly evolving region suitable for identification at the species level [9]. Other chloroplast DNA loci, including trnL-F, have been demonstrated to be informative at the generic level [10]. In a recent study, trnL-F was also used to separate Cardiocrinum giganteum from its variant C. giganteum var. yunnanense and their closely related species [11].
Four medicinal Gentiana species, Gentiana manshurica Kitag., Gentiana scabra Bunge, Gentiana triflora Pall., and Gentiana rigescens Franch., are listed in the Chinese Pharmacopoeia as Gentianae Radix et Rhizoma, or "Longdan" in Chinese [12]. They are common medicinal materials used for treating liver diseases [13] and are hepatoprotective against acetaminophen-induced acute toxicity [14]. The first three species are mainly distributed in the northeastern part of China and are called "Guanlongdan" (GL), while G. rigescens is found in the southwestern part of China and is called "Jianlongdan" (JL). The genus Gentiana is divided into 12 sections in China [15]. GL and JL belong to the adjacent sections Pneumonanthe (Section III) and Monopodiae (Section IV), respectively. While different plant species may be used for the same medicinal purpose in Chinese medicine (e.g. Gentiana rhodantha Franch. is frequently used as a substitute in southwestern China), the neurotoxic Podophyllum hexandrum Royle in the family Berberidaceae, which has a similar morphology, is deemed an adulterant [16].
This study aims to evaluate the feasibility and limitations of rbcL and matK and five other commonly used DNA regions for authentication of medicinal Gentiana species and their adulterants, G. rhodantha and P. hexandrum. In particular, the sequence divergences and differentiation powers of the tested regions were determined and compared.

Methods
Authentic samples were collected from various regions of China and identified by Dr. Hui Cao based on morphological characters (Table 1) [17]. The voucher specimens were deposited in the Institute of Chinese Medicine, The Chinese University of Hong Kong.
The rhizome of each sample (0.05 g) was ground, and total DNA was extracted by a CTAB extraction method [18] with one minor modification: the DNA pellet was resuspended in 30 μL of water instead of 50 μL of Tris-EDTA buffer. Polymerase chain reaction (PCR) was performed in a 25-μL mixture. Details of the primer sequences and the respective amplified regions are presented in Table 2. The specific PCR products were isolated from the PCR mixture with a Gel-M™ Gel Extraction System (Viogene, Taiwan). Except for 5S rRNA, the purified PCR products of the DNA barcodes were directly sequenced. The 5S rRNA PCR product was ligated into the pGEM-T Easy vector (Promega, USA) at 25°C for 2 hours. Three to four clones containing the insert were sequenced for each individual sample. A Rapid Plasmid Miniprep System (Viogene, Taiwan) was used for plasmid extraction. The purified PCR products or plasmids were sequenced using a BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, USA). The DNA sequences were aligned by ClustalW in the BioEdit program [19,20], with manual adjustment of the alignment where necessary. The genetic distances among samples were determined under the Kimura 2-parameter nucleotide model in MEGA 5 [21]. All distances were calculated from pairwise global alignments, in which alignment gaps and missing data were eliminated by choosing the "pairwise deletion" option. If the minimum sequence divergence between two groups of species was larger than the maximum intraspecific sequence divergence of the two groups, the discrimination was considered successful. Phylogenetic trees of the seven loci were constructed in MEGA 5 with the neighbor-joining (NJ) method [21]. Bootstrap analyses with 1000 replicates were performed to provide confidence estimates for the tree topologies.

Table 2 Primer sequences and the respective amplified regions
Region        Primer      Sequence                      Reference
rbcL          rbcLaR      GTAAAATCAAGTCCACCRCG          [4]
matK          3F KIM f    CGTACAGTACTTTTGTGTTTACGAG     [4]
matK          1R KIM r    ACCCAGTCCATCTGGAAATCTTGGTTC   [4]
trnH-psbA     trnHf       CGCGCATGGTGGATTCACAATCC       [1]
trnH-psbA     psbA3′f     GTTATGCATGAACGTAATGCTC        [1]
trnL-F        Tab C       CGAAATCGGTAGACGCTACG          [22]
trnL-F        Tab F       ATTTGAACTGGTGACACGAG          [22]
rpl36-rps8    rpl36f      CACAAATTTTACGAACGAAG          [1]
rpl36-rps8    rps8r       TAATGACAGAYCGAGARGCTCGAC      [1]
ITS           ITS5        GGAAGTAAAAGTCGTAACAAGG        [23]
ITS           ITS4        TCCTCCGCTTATTGATATGC          [23]
5S rRNA       S1          GGATCCGTGCTTGGGCGAGAGTAGTA    [24]
5S rRNA       AS1         GGATCCTTAGTGCTGGTATGATCGCA    [24]

Gentiana sequences downloaded from GenBank were converted into their reverse complement before alignment against 3F KIM f.
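The distance calculation described above can be illustrated with a minimal sketch. It is not the MEGA 5 implementation; it only applies the standard Kimura 2-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions, and it mimics pairwise deletion by skipping any site where either sequence has a gap or an ambiguous base. The function name and the two toy sequences are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence carries a gap or a non-ACGT symbol are
    skipped for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or missing data: drop this site for this pair
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy aligned sequences (hypothetical, not data from this study):
# one transition (A/G) and one gapped site that is ignored.
toy_1 = "ACGTACGTAC"
toy_2 = "ACGTGCGT-C"
toy_distance = k2p_distance(toy_1, toy_2)
```

With the toy pair, nine sites are compared and one of them is a transition, so the distance is about 0.126.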

Sequence divergences of the seven DNA regions
The sizes of the seven loci (excluding the primer-binding sites) of the examined species are shown in Table 4. The sizes ranged from 239 to 940 bp, with most falling between 400 and 800 bp, the optimum range for routine PCR. The lengths of the protein-encoding genes rbcL and matK were identical across the samples, while the lengths of the five intergenic spacers varied. To show the discriminative powers of the seven DNA regions, we compared the sequence divergence between (1) medicinal Gentiana species (G. scabra, G. manshurica, G. triflora and G. rigescens) and their adulterants (G. rhodantha and P. hexandrum); and (2) GL (G. scabra, G. manshurica and G. triflora) and JL (G. rigescens) (Table 5). When comparing the divergences between medicinal Gentiana species and their adulterants, 5S rRNA had the highest divergence values, both interspecifically and intraspecifically, while rbcL had the lowest values (Table 5). The minimum divergence values of rbcL, matK, trnH-psbA, trnL-F, rpl36-rps8, ITS and 5S rRNA between medicinal Gentiana and P. hexandrum ... Since the maximum intraspecific divergences of the seven loci were lower than the interspecific divergences, all of them could be employed to discriminate between medicinal Gentiana species and their adulterants. The DNA sequences were significantly different between GL and JL. The minimum divergence values of rbcL, matK, trnH-psbA, trnL-F, rpl36-rps8, ITS and 5S rRNA between these two groups were 0.0109, 0.0521, 0.0780, 0.0332, 0.0392, 0.0462 and 0.4897, while the maximum intraspecific divergence values were 0.0018, 0.0042, 0.0101, 0.0026, 0.0000, 0.0043 and 0.0914, respectively. Therefore, GL and JL could be distinguished from each other using any of the seven DNA loci (Table 5). On the other hand, the genetic variability among the three GL species was extremely low for all loci. Only 5S rRNA could differentiate between G. manshurica and G. triflora, while trnL-F could distinguish G. scabra from G. triflora. Table 4 shows the selected polymorphic sites for differentiating among the three GL species. G. triflora, G. scabra and G. manshurica are genetically closely related and have interchangeable medicinal applications.
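As a worked check of the criterion stated in the Methods (discrimination is considered successful when the minimum divergence between groups exceeds the maximum intraspecific divergence), the following sketch applies it to the GL versus JL values reported above. The numbers are copied from the text; the variable and function names are illustrative assumptions.

```python
LOCI = ["rbcL", "matK", "trnH-psbA", "trnL-F", "rpl36-rps8", "ITS", "5S rRNA"]

# Values reported in the text for GL versus JL, in the same order as LOCI.
MIN_BETWEEN_GL_JL = [0.0109, 0.0521, 0.0780, 0.0332, 0.0392, 0.0462, 0.4897]
MAX_INTRASPECIFIC = [0.0018, 0.0042, 0.0101, 0.0026, 0.0000, 0.0043, 0.0914]


def discriminates(min_between, max_within):
    """Criterion from the Methods: the between-group minimum must exceed the within-species maximum."""
    return min_between > max_within


results = {
    locus: discriminates(between, within)
    for locus, between, within in zip(LOCI, MIN_BETWEEN_GL_JL, MAX_INTRASPECIFIC)
}

# Rank the loci by their minimum GL-JL divergence, largest first.
ranking = sorted(LOCI, key=lambda locus: MIN_BETWEEN_GL_JL[LOCI.index(locus)], reverse=True)

for locus in ranking:
    print(f"{locus:12s} discriminates GL from JL: {results[locus]}")
```

All seven loci satisfy the criterion, and ranking by minimum divergence places 5S rRNA first, trnH-psbA second, matK third and rbcL last.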
To confirm the effectiveness of rbcL and matK in the identification of Gentiana species, we included all available Gentiana sequences in NCBI in the analysis of these two barcodes. In total, 14 rbcL sequences (including 10 sequences generated in this study) of 9 Gentiana species and 68 matK sequences (including 10 sequences generated in this study) of 23 Gentiana species and subspecies were aligned. For rbcL, the maximum intraspecific divergence value was 0.00215, while the minimum interspecific divergence value was 0. We found that the rbcL sequences of Gentiana andrewsii (HQ590117.1) ...
As shown in Figures 1 to 7, the NJ trees of the seven barcodes revealed that medicinal Gentiana species were clearly differentiated from P. hexandrum. Among the Gentiana species, the three GL species clustered together as a clade and were separated from JL and G. rhodantha with high bootstrap support values (>70%), suggesting that GL, JL and G. rhodantha can be well resolved by the seven DNA barcodes.

Discussion
This study performed a comparative assessment of the discriminative powers of seven DNA regions for the authentication of genetically closely related medicinal Gentiana species and their adulterants. rbcL and matK are the two recommended DNA barcodes that can resolve 72% of land plants when used in combination [4]. In our study, however, rbcL provided the lowest intraspecific and interspecific divergences. There were only 6 bp that differed out of 553 bp between GL and JL. It has also been shown that rbcL is the least divergent locus among 11 DNA barcode candidates for differentiating species in Solanaceae [1].
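The rbcL figure quoted above can be related to the reported divergence with a short calculation. This is a rough sketch only: the split of the 6 differences into transitions and transversions is not given in the text, so the two extreme cases are computed as bounds, and the 553 bp length is assumed to be the number of compared sites.

```python
import math


def k2p_from_counts(transitions, transversions, sites):
    """Kimura 2-parameter distance from substitution counts over compared sites."""
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


SITES = 553  # rbcL length quoted in the text
DIFFS = 6    # differing positions between GL and JL quoted in the text

p_distance = DIFFS / SITES                         # uncorrected proportion of differences
all_transitions = k2p_from_counts(DIFFS, 0, SITES)    # bound if every change is a transition
all_transversions = k2p_from_counts(0, DIFFS, SITES)  # bound if every change is a transversion

print(f"p-distance: {p_distance:.5f}")
print(f"K2P range:  {all_transversions:.5f} to {all_transitions:.5f}")
```

Both bounds fall between 0.0109 and 0.0110, which is in line with the minimum GL versus JL divergence of 0.0109 reported for rbcL.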
The other CBOL-recommended barcode, matK, had higher sequence divergence but was difficult to amplify by PCR. There were mismatches between the primer and the published Gentiana sequences, indicating that the recommended matK primers might not be applicable to all land plants. A recent study of medicinal plants in Southern Morocco [25] showed that the success rate of PCR amplification of matK was less than 30%. Regarding the resolving power, matK had the third-highest value for differentiating between GL and JL (Table 5). Nevertheless, it ranked only fifth and sixth for distinguishing between medicinal Gentiana species and their adulterants P. hexandrum and G. rhodantha, respectively.
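A primer-to-template comparison of the kind mentioned above can be sketched as follows. The primer sequences are those listed in the Methods; the template sites are hypothetical strings invented for illustration, not published Gentiana sequences, and the function names are assumptions. Degenerate primer bases are interpreted with the standard IUPAC codes, and a reverse-complement helper is included because database sequences may be deposited on the opposite strand.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it can pair as.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer symbol."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    return sum(
        1
        for p, s in zip(primer.upper(), site.upper())
        if s not in IUPAC.get(p, "")
    )


MATK_3F_KIM_F = "CGTACAGTACTTTTGTGTTTACGAG"  # primer from the Methods
RBCL_AR = "GTAAAATCAAGTCCACCRCG"             # primer from the Methods (R = A or G)

# Hypothetical binding sites, invented for illustration only.
hypothetical_matk_site = "CGTACAGTATTTTTGTGTTTACGAA"
hypothetical_rbcl_site_minus_strand = "CGTGGTGGACTTGATTTTAC"

matk_mismatches = count_mismatches(MATK_3F_KIM_F, hypothetical_matk_site)
rbcl_mismatches = count_mismatches(
    RBCL_AR, reverse_complement(hypothetical_rbcl_site_minus_strand)
)
```

In this toy case the matK site differs from the primer at two positions, while the rbcL site matches once it is reverse-complemented, because the degenerate base R accepts A.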
trnL-F had the longest DNA sequence among the tested loci (Table 4). A Gentiana sample could not be amplified, which was probably due to fragmentation of the DNA or other reasons. trnL-F had a high resolving power, and was the only locus capable of differentiating G. scabra from G. triflora (Table 4), suggesting trnL-F as a good locus for differentiation of the closely related Gentiana species.
The size of rpl36-rps8 was small among the seven loci (Table 4). The PCR product of P. hexandrum was about 200 bp larger than those of Gentiana. Thus, the size difference could be used as a marker to distinguish Gentiana from P. hexandrum without DNA sequencing. Like rbcL, rpl36-rps8 had low interspecific and intraspecific divergences, although its ranking was slightly higher than that of rbcL. Its major drawback was the limited number of reference sequences in GenBank.
The size of the trnH-psbA region ranged from 399 to 646 bp, which was moderate among the seven DNA regions (Table 4). There was a significant size difference between Gentiana and Podophyllum. In terms of resolving power, trnH-psbA ranked second for differentiating GL from JL, and provided higher resolving power than matK and rbcL. This intergenic spacer has also shown a good amplification success rate and discrimination power among nine loci tested elsewhere [1]. Among 19 species in seven families of angiosperms, trnH-psbA shows nearly three-fold higher divergence than other tested chloroplast regions, while the ITS region exhibits two-fold higher divergence than trnH-psbA [1].
Some studies [26-28] show that nuclear ITS is an appropriate DNA marker for herbal authentication and plant phylogenetic studies. In our study, the ITS region was the third-longest region across Gentiana and P. hexandrum, and the sizes differed slightly from one another (Table 4). The divergence ranking was average among the five Gentiana species, but rose to the second highest for distinguishing medicinal Gentiana from P. hexandrum (Table 5), indicating that the ITS regions among the studied Gentiana species were quite conserved.
The size of the 5S rRNA intergenic spacer region ranged from 239 to 457 bp, making it the smallest but most variable in length (Table 4). Among the tested regions, only 5S rRNA could distinguish G. triflora from G. manshurica and G. scabra. Our study showed that the intraspecific divergence was high, which was probably due to the non-homogeneity of the different copies of the 5S rRNA gene spacer. It is therefore essential to clone the amplified PCR product prior to sequencing to overcome the sequence degeneracy that results from these mixed copies.
Jiang et al. ... manshurica and G. scabra are nearly identical, except that the former has a higher sweroside content [29]. The chemical profiles therefore support our observations in the DNA barcode analyses.

Conclusion
All the tested loci could differentiate medicinal Gentiana species from their adulterants, and distinguish GL from JL. The two official DNA barcodes, rbcL and matK, have no significant advantages over the remaining five loci examined.