Fingerprint analysis of phenolic acid extract of Salvia miltiorrhiza by digital reference standard analyzer with one or two reference standards

Background Fingerprint analysis and simultaneous multi-component determination are crucial for the holistic quality control of traditional Chinese medicines (TCMs). However, reference standards (RSs) are often commercially unavailable or have other shortcomings, which severely impedes the application of these techniques. Methods A digital reference standard (DRS) strategy and the corresponding software, called the DRS analyzer, were developed. The software supports chromatographic algorithms, spectrum algorithms, and the combination of these algorithms. Its extended functions also enable the DRS analyzer to recommend chromatographic columns based on big data. Results Several fingerprint quality control methods covering 11 compounds in the polyphenolic acid extract of Salvia miltiorrhiza (S. miltiorrhiza) were developed with the DRS analyzer: the relative retention time (RRT) method, the linear calibration using two reference substances (LCTRS) technique, RRT combined with the photodiode array (PDA) method, and LCTRS combined with the PDA method. Additionally, a column database of the samples was established. The data demonstrated that the DRS analyzer could accurately identify the 11 compounds in the samples using only one or two physical RSs. Conclusions The DRS strategy is an automated, intelligent, objective, accurate, eco-friendly, universal, shareable, and promising method for the overall quality control of TCMs that requires fewer RSs.

RS. Furthermore, due to the low content of these compounds in TCMs, the preparation of the RS requires a large quantity of TCMs and organic solvents, which is not eco-friendly.
However, the substitute RS method used in the holistic quality control of medicines still has some problems. In particular, the qualitative analysis of chromatographic peaks is the most critical and challenging issue. For this purpose, the RRT method and the extractive reference substance (ERS) method have been adopted by several pharmacopoeias, such as the Chinese Pharmacopoeia and the European Pharmacopoeia. Yet, the drawbacks of the RRT method are large retention time (t R ) deviations and poor robustness across different columns. Also, because the ERS method provides a reference chromatogram obtained on only one chromatographic column, the actual chromatogram can differ from the reference chromatogram when a column of a different brand or type is used. Consequently, scholars have studied the selectivity of reversed-phase columns [26], classified the columns [27,28], and proposed column selection systems [29,30] to solve the problem of blind column selection. Nonetheless, the large prediction deviation of the RRT method has not yet been fundamentally solved.
Compared with the RRT method, the LCTRS method can reduce the deviation of t R prediction [13][14][15]. However, improving the prediction accuracy of t R remains a challenge, especially for different types of compounds or for experiments conducted on columns with large differences in retention performance, which may even reverse the elution order of peaks [18]. The PDA method may solve the problem of large deviation or reversed peak order to some extent. However, it is difficult to share data effectively or evaluate data objectively across laboratories, because chromatography workstations of different brands lack a uniform PDA data exchange format [16,17].
To solve these problems, we introduced the concept of the digital reference standard (DRS) in our previous study [31]. In the present study, a strategy for the holistic quality control of TCMs based on the DRS analyzer is proposed, using a phenolic acid extract of Salvia miltiorrhiza as an example. The DRS analyzer is algorithm software developed to support the chromatographic algorithms of the RRT and LCTRS methods, the similarity algorithm for PDA spectra, and combinations of these algorithms. It is also a multidimensional database, which stores all the original HPLC chromatogram and PDA spectrum data collected during the establishment of a method. These data are not only used by the software for calculation; users can also search and compare them, which makes column recommendation possible and improves the reproducibility and accuracy of the holistic quality control method. The phenolic acid extract of S. miltiorrhiza is an extract of Salviae Miltiorrhizae Radix (Danshen in Chinese), a popular TCM. Salviae Miltiorrhizae Radix is also used as a dietary supplement in other Asian countries, as well as in Europe and America. The design, algorithms, application, and characteristics of the DRS analyzer are discussed in this study. In addition, a series of fingerprint quality control methods covering 11 compounds in the polyphenolic acid extract of S. miltiorrhiza were developed based on the DRS method.

Chemicals and reagents
The phenolic acid extract of S. miltiorrhiza was obtained from the National Institutes for Food and Drug Control (NIFDC, Beijing, China). RSs of Sodium Danshensu, Salvianolic acid D, and Lithospermic acid were purchased from Shanghai Yuanye Bio-Technology (Shanghai, China). Reference standards of Protocatechuic aldehyde, Caffeic acid, Rosmarinic Acid, Salvianolic acid B, Salvianolic acid H/I, Salvianolic acid E, Salvianolic acid L, and Salvianolic acid Y were obtained from NIFDC (Beijing, China).
Ethanol, which was analytical grade, was purchased from Sinopharm Chemical Reagent (Shanghai, China). Acetonitrile, methanol, phosphoric acid, and formic acid, which were chromatographic grade, were purchased from Fisher Scientific (Pittsburgh, PA, USA). Deionized water was prepared by a Milli-Q system (Millipore, Bedford, USA).

Instruments and chromatographic conditions
Chromatographic analysis was performed on an Agilent 1260 high-performance liquid chromatography system equipped with a diode array detector (DAD) and the ChemStation workstation for online control and offline analysis (Agilent, Santa Clara, CA, USA). Twenty-two columns (Table 1) from seven manufacturers were randomly selected. It is recommended to use at least ten columns from three manufacturers for DRS method research.

Preparation of sample and reference standard solution
The solvent used to dissolve and store the sample was a 25% ethanol-water solution, with the pH adjusted to 2.0 with formic acid. The phenolic acids were relatively stable under this condition.
Appropriate amounts (above 16 mg) of phenolic acid extract of S. miltiorrhiza and 10 ml solution mentioned above were put into a conical flask, shaken and filtered through a 0.22 µm membrane before use.
An appropriate amount of 11 RSs, including sodium Danshensu, protocatechuic aldehyde, caffeic acid, salvianolic acid D, salvianolic acid E, salvianolic acid H/I, rosmarinic acid, lithospermic acid, salvianolic acid B, salvianolic acid L, and salvianolic acid Y were dissolved by the solution mentioned above to obtain the reference standard solution.

Software development

Data format
DRS Analyzer supports the NetCDF (ANDI) data format [32], which is used for the exchanging and reading of chromatography and spectrometry data. The spectrum data from the PDA detector adopts an extended ANDI format [18]. HPLC instrument vendors such as Agilent and Waters have provided support for PDA spectrum exchanging with the extended ANDI format in their chromatographic workstation through macro or software upgrade.

Program design
The DRS analyzer was developed in C++ using a Model View Controller (MVC) framework. It supports chromatographic algorithms, a PDA spectrum algorithm, and combinations of these algorithms. The chromatographic algorithms include the RRT method using one RS and the LCTRS method using two RSs. RRT is the ratio of the t R of the analyte to that of the reference compound, and it serves as the reference value for calculating the t R of an analyte. Like RRT, St R is also a reference value. However, St R is not a ratio; it is the arithmetic average of t R for the same compound on different HPLC systems under the same chromatographic conditions [14]. There is a linear relationship between t R and St R for all compounds [14], as shown in Fig. 1. For the LCTRS method, the t R of the two RSs and their St R values are substituted into the linear equation [as expressed in formula (1)] to calculate the t R of the analyte [14]. The similarity algorithm for the PDA spectrum is the cosine method [33].
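The two chromatographic predictions described above can be sketched in a few lines. This is a minimal illustration, not the DRS analyzer's actual code: the function names and the numeric values are hypothetical, and the LCTRS line is assumed to be the straight line through the two points (St R, t R) measured for the two RSs.

```python
def predict_rrt(rrt, t_ref):
    """RRT method: predicted t R = RRT x t R of the single reference compound."""
    return rrt * t_ref


def predict_lctrs(st_r, st_ref1, t_ref1, st_ref2, t_ref2):
    """LCTRS method (assumed two-point form): fit t R = a * St R + b through
    the two reference standards, then evaluate the line at the analyte's St R."""
    slope = (t_ref2 - t_ref1) / (st_ref2 - st_ref1)
    intercept = t_ref1 - slope * st_ref1
    return slope * st_r + intercept


# Hypothetical values in minutes, for illustration only.
t_rrt = predict_rrt(rrt=0.5, t_ref=20.0)
t_lctrs = predict_lctrs(st_r=10.0, st_ref1=5.0, t_ref1=6.0,
                        st_ref2=25.0, t_ref2=28.0)
print(t_rrt, t_lctrs)
```

Because the LCTRS line is anchored at two measured points, a uniform shift or stretch of retention on a given column is absorbed by the slope and intercept, which a single ratio cannot do.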
In addition, the software is a multi-dimensional database, which stores all the original HPLC chromatogram and PDA spectrum data collected during the establishment of the method, and column recommendation can be realized based on these data. The column recommendation method is based on correlation, which differs from the existing recommendation methods based on causation [14,27,28,29,30].

Optimization of HPLC conditions and method validation
The mobile phase was investigated, including the separation effects of methanol and acetonitrile, the differences between phosphoric acid and formic acid, and the influences of column temperature. The gradient elution procedures and flow rates were optimized. The selected chromatographic conditions had good resolution, symmetrical peak shape, and reasonable analysis time. Chromatograms of samples were collected on 22 columns under optimized chromatographic conditions. Representative chromatograms and spectra are shown in Figs. 2, 3. The peaks were identified by the RSs, UV-Vis spectrum and mass spectrum.
Methodological validation experiments were performed on the Agilent Zorbax SB C18 column. The precision (n = 6), stability (12 h, n = 6), and repeatability (n = 6) were tested. The results showed that RSD of the t R of the 11 peaks and the peak areas were both less than 3%, thus meeting the requirements of fingerprint analysis.
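The acceptance check used here (RSD below 3% for n = 6) is simple arithmetic. The sketch below assumes the usual definition of RSD as the sample standard deviation divided by the mean; the replicate values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent (sample standard deviation / mean)."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical retention times (min) from six replicate injections.
replicate_t_r = [12.01, 12.03, 11.98, 12.02, 12.00, 11.99]
print(f"RSD = {rsd_percent(replicate_t_r):.2f}%")
print("meets the 3% criterion:", rsd_percent(replicate_t_r) < 3.0)
```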

Initialization for the DRS method
Since columns 1 to 17 could effectively separate the 11 peaks of the samples, the data from these columns were used to initialize the model in the steps shown in Fig. 4. The first step was data import. The chromatographic data and the corresponding spectral data of the samples on columns 1 to 17 were imported into the software, and integration operations such as adding and deleting peaks were performed. The chromatographic data were in ANDI format, with the file name extension ".cdf". The spectral data were in extended ANDI format, with the file name extension ".nc". The PDA data were optional. The second step was peak assignment. The names of the 11 compounds were entered into the software, and the corresponding peaks on the 17 columns were matched one-to-one with the compounds (the red box in Fig. 5). The third step was setting the qualitative chromatographic method, taking LCTRS as an example. The t R window of each peak was set to 1 min. If the t R deviation of a peak was ≤ the t R window, the peak could be identified. In this study, peak 1 and peak 9 were selected as the two reference compounds, as shown in the green box of Fig. 5 (it is recommended to select peaks close to, or identical with, the first and the last peak, respectively). Because spectral data were available in the present study, the fourth step was to establish a spectral qualitative method. As shown in the blue box of Fig. 5, the synthesized spectrum was selected as the spectral matching method, and the similarity threshold was set to 0.95.
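The identification rule set up in steps three and four can be sketched as follows. This is an illustration under stated assumptions rather than the software's implementation: the function names and example spectra are hypothetical, and the sketch assumes that, when a PDA spectrum is supplied, a peak must pass both the t R window and the cosine similarity threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra sampled at the same wavelengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_peak(t_obs, t_pred, window=1.0,
                  spectrum=None, ref_spectrum=None, threshold=0.95):
    """Return True if the observed peak matches the predicted compound.

    Chromatographic test: |t_obs - t_pred| <= window (min).
    Optional spectral test (assumed to be required in addition):
    cosine similarity >= threshold.
    """
    within_window = abs(t_obs - t_pred) <= window
    if spectrum is None or ref_spectrum is None:
        return within_window
    return within_window and cosine_similarity(spectrum, ref_spectrum) >= threshold


# Hypothetical absorbance values at four wavelengths.
reference = [0.10, 0.80, 0.40, 0.05]
observed = [0.11, 0.78, 0.42, 0.05]
print(identify_peak(10.4, 10.0, spectrum=observed, ref_spectrum=reference))
```

Scaling a spectrum by a constant leaves the cosine unchanged, so the comparison depends on spectral shape rather than on peak concentration.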

Optimization and evaluation of DRS method

Selection of reference compound
Since the selection of the reference compound can significantly affect the accuracy of the t R calculated by the RRT and LCTRS methods, optimization was needed. According to our previous studies [14,34], the general principles for selecting reference compounds for the RRT and LCTRS methods were as follows: the t R coverage of the reference compounds should be 50-100%, and their non-linear deviation should be small enough. The t R coverage reflects the relative position of the reference compound between the first compound and the last compound. For the LCTRS method and the RRT method, the coverage was calculated by formulas (2) and (3), respectively.

$$\text{Coverage} = \frac{t_{R2} - t_{R1}}{t_{R\,\mathrm{last}} - t_{R\,\mathrm{first}}} \times 100\% \qquad (2)$$

where t R2 is the t R (or St R ) of the second reference compound; t R1 is the t R (or St R ) of the first reference compound; t Rlast is the t R (or St R ) of the last compound; and t Rfirst is the t R (or St R ) of the first compound [14].

$$\text{Coverage} = \frac{t_{R\,\mathrm{reference}} - t_{R\,\mathrm{first}}}{t_{R\,\mathrm{last}} - t_{R\,\mathrm{first}}} \times 100\% \qquad (3)$$

where t Rreference is the t R of the reference compound; t Rlast is the t R of the last compound; and t Rfirst is the t R of the first compound [34].

Since the overall quality control method involves many marker compounds, a large amount of calculation was still required, even when following the above principles, to obtain the optimal reference compounds for the sample under given chromatographic conditions.
In the present study, the 11 marker compounds gave a total of 55 possible reference compound pairs, of which about 20 had a t R coverage of more than 50%. The software's method optimization function provided the top 10 reference compound pairs with the highest accuracy, as shown in Table 2. The reference compound pair of peak 1 and peak 9 had a t R deviation (average deviation of 11 peaks on 17 columns) of 0.304 min and an identification rate of 99.5%, ranking 9th. The best pair was peak 3 and peak 9, with a t R deviation of 0.258 min and an identification rate of 99.5%. The optimal combination thus reduced the deviation by 0.046 min.
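The count of 55 pairs is the number of ways to choose 2 reference compounds from 11, and the coverage filter can be applied to each pair in turn. The sketch below is illustrative only: the function names are hypothetical, the St R values are invented and evenly spaced, and the coverage is assumed to be the t R span of the pair divided by the span between the first and last compounds.

```python
from itertools import combinations


def coverage_percent(t_a, t_b, t_first, t_last):
    """Assumed LCTRS coverage: span of the reference pair over the total span."""
    return abs(t_b - t_a) / (t_last - t_first) * 100.0


def reference_pairs(st_r, min_coverage=50.0):
    """Enumerate all reference pairs and keep those meeting the coverage rule.

    st_r maps a peak number to its St R (min).
    """
    t_first, t_last = min(st_r.values()), max(st_r.values())
    all_pairs = list(combinations(sorted(st_r), 2))
    kept = [(a, b) for a, b in all_pairs
            if coverage_percent(st_r[a], st_r[b], t_first, t_last) >= min_coverage]
    return all_pairs, kept


# Hypothetical St R values for peaks 1 to 11 (not data from this study).
example_st_r = {peak: 5.0 * peak for peak in range(1, 12)}
all_pairs, kept = reference_pairs(example_st_r)
print(len(all_pairs), "pairs in total;", len(kept), "meet the coverage rule")
```

With real data, each retained pair would then be scored by its average t R deviation and identification rate across the columns, which is the ranking reported in Table 2.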

Adjustment of t R window
The smaller the t R window, the more accurate the method, but the fewer the applicable columns. The optimal t R window could be determined from the statistical results in the software's method optimization function. According to Table 3, which shows the average t R deviation of each peak on the 17 columns, the average t R deviation of peaks No. 1 to 10 was less than 0.3 min, whereas that of peak No. 11 was 0.6 min. Therefore, a t R window of 0.8 min might be appropriate to cover the t R deviation of all peaks.
To verify this value, different t R windows were set; the t R deviation (average deviation of 11 peaks) and identification rates on different columns are summarized in Table 4 and Fig. 6. The results revealed that windows of 0.3 min and 0.5 min were so narrow that the identification rate was less than 93% and only a few columns were available, with a proportion of less than 53%. With windows of 1.5 min and 2.0 min, the identification rate exceeded 99% and the proportion of available columns exceeded 94%; however, such wide windows carried a risk of misjudgment. Windows of 0.8 min and 1.0 min were near the inflection point and offered a good balance between accuracy and applicability. Finally, 0.8 min was selected.
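The trade-off examined here can be reproduced by sweeping the window over a set of t R deviations and recording the identification rate at each setting. The sketch below uses hypothetical deviations and function names; it counts a peak as identified when its absolute t R deviation does not exceed the window, which matches the chromatographic criterion stated earlier.

```python
def identification_rate(deviations, window):
    """Percentage of peaks whose absolute t R deviation is within the window."""
    identified = sum(1 for d in deviations if abs(d) <= window)
    return identified / len(deviations) * 100.0


def sweep_windows(deviations, windows):
    """Identification rate for each candidate t R window (min)."""
    return {w: identification_rate(deviations, w) for w in windows}


# Hypothetical t R deviations (min) pooled over peaks and columns.
example_deviations = [0.1, 0.2, 0.25, 0.4, 0.6, 0.7, 0.9, 1.2]
for window, rate in sweep_windows(example_deviations, [0.3, 0.5, 0.8, 1.0, 1.5, 2.0]).items():
    print(f"window {window:.1f} min -> {rate:.1f}% identified")
```

A wider window always raises this rate, so the rate alone cannot reveal the misjudgment risk noted above; that requires checking whether a neighbouring peak also falls inside the window.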
Each peak can be assigned its own t R window. For example, a window of 0.8 min could be set for peak 11 and 0.5 min for the other peaks. Smaller t R windows were used for the other peaks in this study, which further improved the accuracy of the method and reduced the misjudgment rate.
When the PDA spectrum qualitative function was available, the t R window could be widened. In the current study, it was set to 1.5 min according to the results in Table 4. In our previous studies, the t R window was set to 0.5 min [13], 0.6 min, 1.2 min [14], 0.3 min [15] and 0.7 min [18], respectively. Therefore, when only the chromatographic qualitative function was used, a t R window of 0.5 to 1.0 min was recommended. When the PDA spectrum function was also available, the window could be widened to 0.5-1.5 min.

Comparison of different methods
The software provided four methods for peak identification: the RRT method, the LCTRS method, RRT combined with the PDA method, and LCTRS combined with the PDA method. The conditions of the four methods, optimized according to "3.3.1" and "3.3.2", are shown in Table 5. Taking Col15 (SunFire C18) as an example, Fig. 7a, b shows the results of the RRT method and of LCTRS combined with the PDA method, respectively. The peak identification results in the red box indicated that Salvianolic acid B was incorrectly identified as Salvianolic acid L by the RRT method. Meanwhile, the peaks of Salvianolic acid L and Salvianolic acid Y could not be identified due to the large t R deviation. In contrast, LCTRS combined with the PDA method accurately identified all peaks. Additionally, the green box shows the t R deviation of each peak and the PDA similarity, the blue box shows the linear fitting results of t R , and the yellow box shows the PDA spectrum results. This case suggested that LCTRS combined with the PDA method was superior to the RRT method.
The comparison of the t R results on columns 1 to 17 for the four optimized methods is summarized in Table 6. In terms of the number of positive columns (t R deviation ≤ t R window and/or PDA similarity ≥ similarity threshold), LCTRS combined with the PDA method was the best, with the smallest average t R deviation, the highest identification rate, and the largest number of available columns. When only the chromatographic algorithm was used, LCTRS ranked the highest.

Sample tests
Because the Salvianolic acid D and Salvianolic acid E peaks overlapped in the chromatograms on columns 18-22, these columns were used for sample testing rather than method establishment. Sample testing involved three steps. Firstly, the chromatographic and spectral data were imported, and the peaks were integrated. Secondly, the reference compounds (peak 3 and peak 9) in the sample chromatogram were assigned. Thirdly, the results were obtained after running the method. The sample test results were displayed in the same way as shown in Fig. 7, including the qualitative results of peaks, qualitative result tables, linear fitting results, and spectra. The peak qualitative results of the four methods on the Agilent TC-C18 (2) column are shown in Fig. 8. Figure 8a shows the results of the RRT method, which had the smallest t R deviation of 0.110 min. Nevertheless, the Salvianolic acid B peak was unidentified, and the Salvianolic acid L and Salvianolic acid Y peaks were incorrectly identified. Figure 8b shows the results of the LCTRS method, which had the second smallest t R deviation of 0.280 min. The Salvianolic acid L peak was correctly identified, but the Salvianolic acid Y peak was incorrectly identified. RRT combined with the PDA method (Fig. 8c) and LCTRS combined with the PDA method (Fig. 8d) gave the same identification results. As shown in the figures, the Salvianolic acid L and Salvianolic acid Y peaks were both correctly identified by the two methods. Still, LCTRS combined with the PDA method had the smaller t R deviation of 0.293 min. Table 7 summarizes the comparison of the four methods on the five columns, revealing that the RRT method was still the worst method, with the lowest identification rate of 72.7%. On the other hand, LCTRS combined with the PDA method remained the optimal method, with a smaller t R deviation of 0.240 min and the highest identification rate of 80.0%.

Column recommendation by database
In the study of the HPLC analysis method, a lot of chromatographic data on different columns are generally collected. However, only the information of column type, such as C18, is provided by the legal standard method. In contrast, data of the brand of the column or related chromatograms are not shown. Nevertheless, these data are indeed valuable, and differences between more useful data (such as with better separation effect, shorter separation time, smaller t R deviation, lower cost of the column) and common data are also meaningful. Therefore, based on the idea of big data, these available data were stored as a part of DRS and used for column recommendation.
Positive and negative columns were defined for column recommendation. Positive columns were columns on which all peaks could be effectively separated and identified. Negative columns were columns on which some peaks could not be separated or identified. In this study, the 11 compounds could not be effectively separated on column 21; therefore, this column was considered a negative column for all four methods (Fig. 8). Column 15 was a positive column for LCTRS combined with the PDA method (Fig. 7b); however, it was negative for the RRT method due to the large retention time deviation of certain compounds (Fig. 7a). For better reproducibility of the analysis method, future studies should choose a positive column rather than a negative one. For columns that are on neither the positive nor the negative list, the results, chromatographic data, and PDA spectra are also meaningful, as they can be used to upgrade and improve the DRS method. The classification of a column as positive or negative depends on the medicine, the chromatographic conditions, and even the peak identification method used for the same medicine. The list of positive and negative columns for the phenolic acid extract of S. miltiorrhiza for the four methods is shown in Table 8, while more detailed information is available in the software database.
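The positive/negative definition above amounts to an all-or-nothing test per column and per method. A minimal sketch, with hypothetical column names and outcomes that are not taken from Table 8:

```python
def classify_columns(peak_results):
    """Split columns into positive and negative lists for one identification method.

    peak_results maps a column name to a list of booleans, one per peak, where
    True means the peak was effectively separated and correctly identified.
    A column is positive only if every peak is True.
    """
    positive, negative = [], []
    for column, flags in peak_results.items():
        if flags and all(flags):
            positive.append(column)
        else:
            negative.append(column)
    return positive, negative


# Hypothetical outcomes for three columns and 11 peaks each.
example_results = {
    "Col A": [True] * 11,
    "Col B": [True] * 9 + [False] * 2,
    "Col C": [True] * 11,
}
positive, negative = classify_columns(example_results)
print("positive:", positive)
print("negative:", negative)
```

Running the same classification once per identification method would give a separate list for each method, which is why a single column can be positive for one method and negative for another.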

Discussion
In the current study, the offline version of the DRS analyzer was used. To make data updating and data sharing more convenient, an online version should be developed in the future. DRS is expected to develop toward big data, on the basis of which artificial intelligence could be introduced. In addition, specifications and guidelines for DRS should be studied in the future to ensure its authenticity, accuracy, and reliability.

Conclusions
To the best of our knowledge, the present study is the first to develop a DRS strategy. A series of fingerprint quality control methods for the phenolic acid extract of S. miltiorrhiza was developed based on the DRS analyzer, involving the RRT method, the LCTRS method, RRT combined with the PDA spectrum method, and LCTRS combined with the PDA spectrum method. In addition, a column database of the samples was established. The results revealed that LCTRS combined with the PDA spectrum was the optimal method. The results also demonstrated that the DRS analyzer could accurately identify the 11 compounds in the samples using only one or two physical RSs. The strategy significantly reduced the analysis cost and ensured the accuracy and reproducibility of the analysis method. The DRS strategy adopted in this study has the following advantages. (1) The software processes data automatically, replacing the complex manual calculation required by the RRT and LCTRS methods, thus saving time and avoiding calculation mistakes. (2) The results are objective and consistent, avoiding the subjectivity of manual identification in the RRT, ERS, and LCTRS methods. (3) The chromatographic and spectral data formats supported by the software are universal and compatible with mainstream chromatography workstations; therefore, the method can be easily popularized and applied. (4) It is compatible with a variety of substitute RS methods (such as the RRT, ERS, and LCTRS methods) and supports chromatographic algorithms, spectrum algorithms, and the combination of these algorithms, so that the advantages of the individual methods complement one another. (5) The DRS analyzer uses the idea of big data to recommend columns for different medicines, different chromatographic conditions, and different peak identification methods (such as the RRT and LCTRS methods) for the same medicine.
In summary, the DRS strategy can effectively reduce the cost of RSs, and achieve higher accuracy and reproducibility than the single substitute RS method. Moreover, it is automated, intelligent, objective, accurate, eco-friendly, universal, sharing, and promising, thus representing a feasible method for overall quality control (such as fingerprint analysis and simultaneous multi-components determination) of TCMs and herbal medicines on different chromatographic columns.

Abbreviations
TCMs: Traditional Chinese medicines; RS: Reference standard; ERS: Extractive reference substance; DRS: Digital reference standard; RRT: Relative retention time; LCTRS: Linear calibration using two reference substances; PDA: Photodiode array; t R : Retention time.