Molecular identification of Uncaria (Gouteng) through DNA barcoding

Background
While DNA barcoding is an important technology for the authentication of the botanical origins of Chinese medicines, suitable markers for DNA barcoding of the genus Uncaria have not yet been reported. This study aims to determine suitable markers for DNA barcoding of the genus Uncaria (Gouteng).

Methods
Genomic DNA was extracted from the freshly dried leaves of Uncaria plants with a Bioteke Plant Genomic DNA Extraction Kit. Five candidate DNA barcode sites (ITS2, rbcL, psbA–trnH, ITS, and matK) were amplified by PCR with established primers. The purified PCR products were bidirectionally sequenced with the appropriate amplification primers on an ABI PRISM 3730 instrument. The candidate DNA barcodes of 257 accessions of Uncaria in GenBank were aligned by ClustalW. Sequence assembly and consensus sequence generation were performed with CodonCode Aligner 3.7.1. The identification efficiency of the candidate DNA barcodes was evaluated with the BLAST and nearest distance methods. The interspecific divergence and intraspecific variation were assessed by the Kimura 2-Parameter model. Genetic distances were computed with Molecular Evolutionary Genetics Analysis 6.0.

Results
The accessions of the five candidate DNA barcodes from 11 of 12 species of Uncaria in China and four species from other countries were included in the analysis, and 54 of the accessions were submitted to GenBank. In a comparison of the interspecific genetic distances of the five candidate barcodes, psbA–trnH exhibited the highest interspecific divergence based on interspecific distance, theta prime, and minimum interspecific distance, followed by ITS2. The distribution of the interspecific distance of ITS2 and psbA–trnH was higher than the corresponding intraspecific distance. Additionally, psbA–trnH showed 95.9 % identification efficiency by both the BLAST and nearest distance methods at both the species and genus level. ITS2 exhibited 92.2 % identification efficiency by the nearest distance method, but 87 % by the BLAST method.

Conclusion
While psbA–trnH and ITS2 (used alone) were applicable barcodes for species authentication of Uncaria, psbA–trnH was a more suitable barcode for authentication of Uncaria macrophylla.

Electronic supplementary material
The online version of this article (doi:10.1186/s13020-015-0072-7) contains supplementary material, which is available to authorized users.

DNA barcoding can accurately identify species on the basis of short standardized genes or DNA regions [12,13], without confounding factors such as environmental influence, growth phase, and morphological diversity within species [14-16]. The mitochondrial gene encoding cytochrome c oxidase subunit 1 (co1) is a potential DNA barcode in most animal species as well as some fungal species. However, the co1 gene and other mitochondrial genes from plants have limited use in identifying plant species across a wide range of taxa, due to their low genetic variations and variable mitochondrial genomes [17]. Several DNA regions, such as ITS2, psbA-trnH, matK, rbcL, ITS, ycf5, and rpoC1 [14,18-21], have been evaluated as potential DNA barcodes in medicinal plants. Among these candidate barcoding loci, the ITS2 locus not only had the highest identification efficiency among all tested regions, but also discriminated a wide range of plant taxa [14,22]. By contrast, ITS1 was a useful barcode for identifying Salvia species [23]. The psbA-trnH intergenic region was a suitable DNA marker for identification of flowering plants [17,18], pteridophytes [24], Lonicera japonica Thunb. from Caprifoliaceae [21], and aquatic plant species [25].
The authentication of the botanical origins of Gouteng is based on the morphological characteristics, microscopic structures, or chemical components of specimens [26]. The accuracy is often affected by environmental and subjective factors, especially for dry medicinal materials from different origins [26]. Chemical analysis methods, such as high-performance liquid chromatography (HPLC) and HPLC coupled with quadrupole time-of-flight mass spectrometry, have also been studied [27]. Multiple genetic molecular markers have been used to screen Uncaria, such as random amplified polymorphic DNA (RAPD) and rDNAs (including 5.8S rDNA, ITS1, and ITS2) [28].
This study aims to determine suitable markers for DNA barcoding of the genus Uncaria. To this end, five candidate loci (ITS2, rbcL, psbA-trnH, ITS, and matK) were tested for their potential as DNA barcodes for Uncaria.

Plant materials
Fifty-four sequences from our laboratory (all submitted to GenBank), among which 12 samples of six species of  [7,10]. All voucher specimens (voucher numbers are listed in Table 1) were deposited in the Key Laboratory of Biological Molecular Medicine Research of Guangxi Higher Education, Guangxi Medical University.
In total, 257 accessions related to the five candidate DNA barcoding sites (ITS2, rbcL, psbA-trnH, ITS, and matK) from 89 samples belonging to 15 species of Uncaria were analyzed in this study. All accession data were downloaded from GenBank, except for the above 54 sequences, which were amplified and sequenced in our laboratory. All datasets of Uncaria species used in the study contained more than two samples, except for Uncaria africana, Uncaria guianensis, and Uncaria lanosa. Accessions whose sequences contained undetermined bases, or that came from unidentified taxa (labeled "sp.", i.e., species unclear or unnamed), were not selected. The correctness of the accessions downloaded from GenBank was tested by BLAST searches against sequences of congeneric plants. Only sequences with both a similarity ratio and a query cover ratio higher than 90 % within the same species were selected. However, some accessions containing inversion sequences were retained in the dataset because they could influence the sequence divergence and supply important genetic characters [29]. The total data and sample information used in this study are shown in Table 1.
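The selection rules described above can be sketched as a small filter. This is only an illustration under stated assumptions: the `BlastHit` record, the function names, and the example values are hypothetical and do not come from the study, and the BLAST hits are assumed to have been parsed into percentages beforehand.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLAST hit against a congeneric reference (hypothetical record layout)."""
    species: str
    identity_pct: float      # similarity ratio, in percent
    query_cover_pct: float   # query cover ratio, in percent


def is_named_species(name):
    """Reject unnamed or unclear taxa such as 'Uncaria sp.'."""
    return not any(tok in {"sp.", "spp.", "aff."} for tok in name.lower().split())


def accept_accession(species, sequence, hits, threshold=90.0):
    """Apply the three screens: named species, no undetermined bases,
    and at least one same-species hit above both 90 % thresholds."""
    if not is_named_species(species):
        return False
    if set(sequence.upper()) - set("ACGT"):   # any base other than A, C, G, T
        return False
    return any(
        h.species == species
        and h.identity_pct > threshold
        and h.query_cover_pct > threshold
        for h in hits
    )
```

For instance, `accept_accession("Uncaria sp.", "ACGT", hits)` is rejected regardless of the hits, because the taxon is unnamed.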

DNA extraction, PCR amplification, and sequencing
In this study, genomic DNA was extracted from the freshly dried leaves of Uncaria plants using an improved protocol for a rapid Plant Genomic DNA Extraction Kit (centrifugal column type, DP3112; Bioteke Corporation, Beijing, China). The Uncaria leaves were ground in liquid nitrogen, and the cell nuclear separation solution (3 ml for 0.5 g sample) was immediately added to the samples to remove impurities from the cytoplasm before the cell nuclei were lysed [30]. PCR amplification of the five candidate DNA barcode sites was performed in a TProfessional Gradient 96 thermocycler (Biometra, Göttingen, Germany) with approximately 30 ng of genomic DNA as a template in a 25-µL reaction mixture. Each reaction contained 1 × PCR buffer (2.0 mM MgCl2, 0.2 mM each dNTP, 0.1 µM each primer; synthesized by

Sequence alignment and data analysis
Sequence assembly and consensus sequence generation were performed with CodonCode Aligner 3.7.1 (CodonCode Co., MA, USA), with the low-quality sequence and primer regions trimmed. The matK and rbcL regions were delimited by alignment with known sequences in databases using CodonCode Aligner. After removal of the psbA and trnH genes at the ends of psbA-trnH, the boundary of the psbA-trnH intergenic spacer was determined according to the annotations of similar sequences in GenBank. The five candidate DNA barcodes were aligned by ClustalW (EMBL-EBI, Heidelberg, Germany). Kimura 2-Parameter (K2P) genetic distances were computed with Molecular Evolutionary Genetics Analysis (MEGA) 6.0 (The Biodesign Institute, AZ, USA) [31]. All interspecific and intraspecific distances, including theta prime, minimum interspecific distance, theta, and coalescent depth for all accessions of each locus, were calculated under the K2P model and compared to evaluate the interspecific divergence and intraspecific variation. The distributions of genetic distances for the different candidate loci were compared with the Wilcoxon signed-rank test to assess the barcoding gap, using SPSS 16.0 (International Business Machines Corporation, Armonk, New York, USA); the test statistics W+ and W− were calculated for a two-sided test, as described previously [14,22]. The BLAST1 and nearest distance methods were used to evaluate the species identification efficiency [32,33].
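For readers unfamiliar with the K2P model, the pairwise distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The following is a minimal sketch of that formula, not the MEGA implementation used in the study; the function name is hypothetical, and skipping every site with a gap or ambiguous base (pairwise deletion) is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

With one transition in ten compared sites, P = 0.1 and Q = 0, so d = -1/2 ln(0.8), roughly 0.112, slightly above the uncorrected proportion of 0.1.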

PCR amplification and base composition of the five loci of Uncaria
The sequence length and GC content of the five candidate loci (ITS2, rbcL, psbA-trnH, ITS, and matK) were obtained from the CodonCode Aligner and ClustalW alignment results (Table 2). The GC content of psbA-trnH was the lowest, while that of ITS2 was the highest. The variability of the length range of the psbA-trnH intergenic spacer was greater than that of the other candidates. The psbA-trnH region of U. macrophylla was more divergent than that of the other Uncaria plants.
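As a small illustration of how the base-composition figures of this kind are derived, the sketch below computes GC content and a length summary for a set of sequences. The function names are hypothetical, and counting only unambiguous A, C, G, and T (ignoring gaps and ambiguity codes) is an assumption of this sketch rather than a documented setting of the tools used in the study.

```python
def gc_content(seq):
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def length_summary(seqs):
    """Minimum, maximum, and mean ungapped length of a list of sequences."""
    lengths = [len(s.replace("-", "")) for s in seqs]
    return min(lengths), max(lengths), sum(lengths) / len(lengths)
```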

Genetic interspecific divergence and intraspecific variation
Six parameters (Table 3) represented the genetic divergences of species in Uncaria. In a comparison of the intraspecific distances of the five candidate barcodes among Uncaria species, the intraspecific distance of psbA-trnH was higher than that of the other loci at the species level. Meanwhile, the interspecific genetic distance of the psbA-trnH intergenic spacer exhibited the highest divergence according to the interspecific distance, theta prime, and minimum interspecific distance. The interspecific distance of ITS2 was the second highest after psbA-trnH. The interspecific divergences of ITS2, psbA-trnH, and ITS were all considerably higher than the corresponding intraspecific divergences. Furthermore, the overall mean distance of psbA-trnH was the highest among the five loci (Fig. 1).
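The text names these parameters without defining them. The sketch below assumes definitions commonly used in the plant barcoding literature: theta as the mean of the per-species mean intraspecific distances, and coalescent depth as the mean of the per-species maximum intraspecific distances. Both definitions, and the function name, are assumptions here, not taken from the study. Within a single genus, theta prime reduces to a mean of interspecific distances, so only the mean and minimum interspecific distances are shown.

```python
from itertools import combinations
from statistics import mean


def divergence_summary(labels, dist):
    """Summarize intra- and interspecific divergence.

    labels: species name for each sequence.
    dist:   square matrix (list of lists) of pairwise distances, e.g. K2P.
    Species with a single sequence contribute only to interspecific pairs.
    """
    intra = {}   # species -> list of within-species distances
    inter = []   # all between-species distances
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            intra.setdefault(labels[i], []).append(dist[i][j])
        else:
            inter.append(dist[i][j])
    return {
        "theta": mean(mean(v) for v in intra.values()),
        "coalescent_depth": mean(max(v) for v in intra.values()),
        "mean_interspecific": mean(inter),
        "min_interspecific": min(inter),
    }
```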
The psbA-trnH intergenic spacer had the highest interspecific divergence among all the loci based on the Wilcoxon signed-rank test. The second highest interspecific divergence was shown by ITS2. The interspecific divergence of psbA-trnH was higher than that of ITS2, ITS, matK, and rbcL (all P < 0.001), and that of ITS2 was higher than that of ITS, matK, and rbcL (all P < 0.001, Table 4). Furthermore, the intraspecific divergences did not differ significantly between ITS and matK, rbcL and matK, ITS2 and matK, psbA-trnH and matK, or ITS and rbcL (P > 0.05, Table 5).
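The W+ and W− statistics behind these comparisons are the sums of the ranks of the positive and negative paired differences. The sketch below computes them with the standard library only; it is not the SPSS procedure used in the study, it does not compute a P value, and the function name and the pairing of distances between two loci are assumptions for illustration.

```python
def signed_rank_sums(x, y):
    """Return (W+, W-) for paired samples x and y.

    Zero differences are dropped; tied absolute differences share
    their average rank. Ties are detected by exact equality, so
    floating-point inputs that differ by rounding noise are not tied.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

A W+ much larger than W− indicates that the first locus tends to have the larger distance in each pair; the two sums always add up to n(n + 1)/2 for n non-zero differences.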

Analysis of barcoding gaps
For a barcode to identify botanical species, the divergence between species should be higher than the variation within species [34]. Although the histograms of the K2P genetic distances revealed a partial overlap, rather than a clean "barcoding gap", between the intraspecific and interspecific divergence of ITS2 and psbA-trnH (Fig. 2), the intraspecific variation of psbA-trnH and ITS2 was considerably lower than their interspecific divergence. The genetic divergence distribution of ITS was similar to that of ITS2. No clear "barcoding gap" was observed for the rbcL or matK loci, for which the genetic distance of more than 90 % of accessions was less than 0.020. However, the distribution of the interspecific divergence of ITS2 and psbA-trnH provided a better resolution than that of rbcL and matK.
Fig. 1 Distribution of overall mean distance for all sequence pairs among five loci. The number on the right y axis is the estimate of average evolutionary divergence over all sequence pairs for each locus, expressed as the number of base substitutions per site averaged over all sequence pairs. Analyses were conducted by the maximum composite likelihood method in MEGA6 [31]
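The barcoding-gap comparison in this section can be sketched numerically as follows. This is an illustrative check, not the histogram analysis used in the study; the function name and returned keys are hypothetical, and the 0.020 default simply mirrors the distance cutoff quoted for rbcL and matK.

```python
def barcoding_gap(intra, inter, threshold=0.020):
    """Compare intraspecific and interspecific distance distributions.

    A strict gap exists when every interspecific distance exceeds
    every intraspecific distance.
    """
    intra, inter = list(intra), list(inter)
    max_intra = max(intra)
    min_inter = min(inter)
    all_distances = intra + inter
    return {
        "has_gap": min_inter > max_intra,
        "gap_width": min_inter - max_intra,   # negative when distributions overlap
        # Share of interspecific distances that fall within the intraspecific range.
        "overlap_fraction": sum(d <= max_intra for d in inter) / len(inter),
        # Share of all distances below the cutoff.
        "fraction_below_threshold": sum(d < threshold for d in all_distances)
        / len(all_distances),
    }
```

A locus with a partial overlap, as described for ITS2 and psbA-trnH, would show `has_gap` false but a small `overlap_fraction`.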

Identification efficiency and characteristics of Clustal W alignment
The BLAST and nearest distance methods were employed to test the applicability of the five loci for species identification of Uncaria. psbA-trnH presented 95.9 % identification efficiency with both the BLAST and nearest distance methods at the species or genus level. ITS2 exhibited 92.2 % identification efficiency by the nearest distance method, but 87 % by the BLAST method, whereas rbcL showed only 76.2 % by the nearest distance method and 42.9 % by the BLAST method (  [31] showed that only four accessions (4/77 accessions) were in the incorrect taxonomic category (Fig. 6), which was less than the other loci tested. Thus, ITS2 could be another suitable DNA barcode for Uncaria.
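The nearest distance method assigns a query to the species of its closest reference sequence. The sketch below estimates identification efficiency by leaving each sequence out in turn. It is a simplified illustration: the function names and the leave-one-out scheme are assumptions, an uncorrected p-distance stands in for the K2P distance used in the study, and a query whose nearest references belong to more than one species is counted as unidentified.

```python
def p_distance(a, b):
    """Proportion of differing sites among unambiguous aligned positions."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_distance_id(query, references, distance=p_distance):
    """Return the species of the nearest reference, or None on a tie
    between different species. references: list of (species, sequence)."""
    scored = [(distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in scored)
    nearest = {species for d, species in scored if d == best}
    return nearest.pop() if len(nearest) == 1 else None


def identification_efficiency(records, distance=p_distance):
    """Percentage of sequences assigned to their own species when each
    is queried against all the others (leave-one-out)."""
    correct = 0
    for i, (species, seq) in enumerate(records):
        references = records[:i] + records[i + 1:]
        if nearest_distance_id(seq, references, distance) == species:
            correct += 1
    return 100.0 * correct / len(records)
```

Note that a species represented by a single sequence can never be identified correctly under this scheme, which lowers the measured efficiency.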

Significance of authentication of Uncaria by DNA barcoding
Gouteng is commonly exploited as the major ingredient herb of CM prescriptions for hypertension or migraine treatment [2,35]. The amount of stems with hooks of U.   [7]. Therefore, the correct genotypic identification of Uncaria plant material is essential in order to protect public health and for industrial production.
Although some methods have been developed to distinguish Uncaria plants based on morphotype, microcharacter, or physical and chemical reactions [8,9], these depend on taxonomy experts. To date, the genetic molecular markers reported for the genus Uncaria have involved RAPD, rDNA, and ITS, while DNA barcoding assays have not yet been reported. In the screen for suitable DNA barcodes, this study included 11 of the 12 species of Uncaria in China; only U. rhynchophylloides was missing.
In the present study, psbA-trnH presented 95.9 % identification efficiency for Uncaria accessions tested with both BLAST and nearest distance methods at the species

Quality and amplification efficiency of DNA from Uncaria
Uncaria DNA was difficult to extract efficiently because of the large amounts of polysaccharides, polyphenols, and alkaloids present in the samples. A cell nuclear separation solution was therefore used to remove these impurities from the genomic DNA [30]. The quality of the DNA extracted from the Uncaria plants satisfied the requirements for PCR amplification and sequencing. The efficiency of both PCR amplification and sequencing for psbA-trnH was the highest among the five candidate loci. Specifically, PCR amplification showed 96.7 % efficiency, while sequencing showed 100 % efficiency. Because the average GC content of ITS2 was 66.3 %, which was higher than that of the other loci, ITS2 was slightly difficult to amplify.

Selection of candidate DNA barcodes
In this study, the length of psbA-trnH of Uncaria ranged from 235 to 315 bp (mean 287 bp), which was longer than that of ITS2, but shorter than that of rbcL, ITS, and matK. Additionally, psbA-trnH of Uncaria exhibited the highest interspecific divergence among the five loci tested, based on the six parameters of the K2P model and the Wilcoxon signed-rank test of interspecific divergence. The interspecific divergence of psbA-trnH was higher than the corresponding intraspecific variation. Furthermore, psbA-trnH of U. macrophylla was clearly distinct from that of U. rhynchophylla and the other species because of two insertion fragments: one was a run of seven A residues inserted at 171-177 bp and the other was two cis-repeats of ATTAAA at 233-247 bp (Figs. 3, 4, 5). Although one TAAAAAA repeat was observed at 171-177 bp in psbA-trnH from Uncaria yunnanensis, no double cis-repeats of ATTAAA were observed at 233-247 bp. Meanwhile, one inversion sequence of length 73-74 bp with identity ratios of more than 98 % in psbA-trnH of Uncaria was found in this study (Additional file 2). The intragenic variation of the genus Uncaria was large because of this inversion in psbA-trnH. This situation was also observed in psbA-trnH of Aconitum L. [29]. The characteristics of the insertion sequences in psbA-trnH could effectively authenticate Uncaria species.
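Diagnostic features of this kind (a run of A residues, tandem ATTAAA repeats, and an inverted segment) can be screened for with simple string operations. The sketch below is illustrative only: the function names are hypothetical, it reports 1-based positions within whatever sequence is passed in rather than the alignment coordinates quoted above, and it treats an inversion as an exact reverse complement, whereas the inversions reported here matched at just over 98 % identity.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif_runs(seq, motif, min_copies):
    """1-based (start, end) positions of runs with at least
    min_copies tandem copies of motif."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif.upper()), min_copies))
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


def is_inverted(segment_a, segment_b):
    """True when segment_b is the exact reverse complement of segment_a
    and the two segments are not simply identical."""
    a, b = segment_a.upper(), segment_b.upper()
    return a != b and reverse_complement(a) == b
```

For example, `find_motif_runs(spacer, "ATTAAA", 2)` lists where two or more tandem ATTAAA copies occur, and `find_motif_runs(spacer, "A", 7)` lists runs of seven or more A residues.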
ITS2 was another suitable locus for distinguishing different species of Uncaria. The length range of ITS2 was 210-221 bp (mean 219.9 bp), which was the shortest among the five loci. Consequently, 95.8 % efficiency could be reached by PCR amplification. In a comparison of the interspecific genetic distances of the five candidate barcodes among Uncaria species, the mean interspecific distance of ITS2 was higher than its mean intraspecific divergence, and the values were second only to those of psbA-trnH (Table 3). Based on the phylogenetic analysis of ITS2 by the neighbor-joining method and the evolutionary distances computed by the Maximum Composite Likelihood model, more than 93 % of the Uncaria accessions in this study were grouped into monophyletic groups corresponding to recognized species. Among 77 accessions of ITS2, comprising 14 species of Uncaria, only four accessions were in an incorrect taxonomic category, according to the phylogenetic tree constructed for ITS2 (Fig. 6). Uncaria manifested complex morphological features and genetic backgrounds, and even some specimens with obvious differences in appearance possessed similar ITS sequences [28]. This could explain the existence of some accessions that appeared in different monophyletic groups from their original morphological taxa. Some species submitted to GenBank may have been wrongly categorized. Sequences with lengths of less than 100 bp, those with ambiguous bases containing more than one "N", or those belonging to unnamed species (such as those with spp. and aff. in the species name) were excluded [20] from this study to guarantee the reliability of the selected sequences.
Fig. 6 Phylogenetic tree of Uncaria ITS2. The evolutionary history was inferred using the neighbor-joining method, and the evolutionary distances were computed using the maximum composite likelihood model. Only the four accessions labeled with a triangular, square, or circular symbol fell in an incorrect taxonomic category
A better "barcoding gap" was observed between the interspecific divergence and intraspecific variation of ITS2 compared with the other loci. ITS, which contained three fragments (ITS1, 5.8S rDNA, ITS2), exhibited a similar identification efficiency to that of ITS2. Both rbcL and matK were unsuitable genetic loci for authentication of the botanical origins of Gouteng, because of the absence of a clear barcoding gap between the interspecific divergence and intraspecific variation by the K2P model. The overall mean distance of rbcL was only 0.002 and that for matK was 0.005, as computed by the Maximum Composite Likelihood model (Fig. 1). Moreover, we found that the combination of psbA-trnH with ITS2 would provide a better result for the authentication of Uncaria plants, and could even distinguish between incorrect and correct taxa or identify some cryptic species. Currently, a preliminary system for DNA barcoding of herbal materials has been established based on a two-locus combination of ITS2 and psbA-trnH barcodes [36]. Recently, ITS2 was successfully exploited in a survey involving commercial Rhodiola products, including decoction pieces [37].
psbA-trnH and ITS2 also exhibited high authentication power for different species of Uncaria. Both psbA-trnH and ITS2 revealed the distinct divergence of U. macrophylla from U. rhynchophylla and the other species at the species level.

Conclusion
While psbA-trnH and ITS2 (used alone) were applicable barcodes for species authentication of Uncaria, psbA-trnH was a more suitable barcode for authentication of U. macrophylla.