A simple method for HPLC retention time prediction: linear calibration using two reference substances

Background: The analysis of related substances in pharmaceutical chemicals and of multiple components in traditional Chinese medicines requires a large number of reference substances to identify chromatographic peaks accurately, but reference substances are costly. The relative retention (RR) method has therefore been widely adopted in pharmacopoeias and the literature to characterize the HPLC behavior of compounds whose reference substances are unavailable. However, RR is difficult to reproduce on different columns, and in some cases the error between the measured and the predicted retention time (tR) is large. A simple alternative method that predicts tR accurately is therefore useful.
Methods: Based on the thermodynamic theory of HPLC, a method named linear calibration using two reference substances (LCTRS) was proposed. It comprises three steps: two-point prediction, validation by multiple-point regression, and sequential matching. The tR of a compound on an HPLC column is calculated from its standard retention time and a linear relationship.
Results: The method was validated with two medicines on 30 columns.
Conclusion: The LCTRS method is simple, yet more accurate and more robust across different HPLC columns than the RR method. Quality standards based on the LCTRS method are therefore easy to reproduce in different laboratories, at a lower cost in reference substances.
Electronic supplementary material: The online version of this article (doi:10.1186/s13020-017-0137-x) contains supplementary material, which is available to authorized users.


Background
Multi-component analysis is an effective strategy for the quality control of traditional Chinese medicines (TCMs), which have complex chemical profiles. However, the application of the classic external standard method is severely restricted by the high cost of reference substances. Consequently, substitute reference substance methods, such as the extractive reference substance (ERS) method and the single standard to determine multi-components (SSDMC) method, have emerged for the overall quality control of TCMs and are widely used in the Chinese Pharmacopoeia (2015 edition), the United States Pharmacopoeia (USP39-NF34) and the literature [1-10]. In general, the ERS method provides only one reference chromatogram, whether in the pharmacopoeias, in the ERS instructions or in the literature, whereas hundreds of brands of C18 columns are on the market. The reference chromatogram may therefore differ from the chromatogram actually obtained. Because of differences in column type and various other factors, the error between the measured retention time (tR) and the tR predicted by the relative retention (RR) method is sometimes too large to ignore.
To improve the reproducibility of chromatographic separation and of RR, the classification of C18 columns has been proposed [11-15]. The columns were divided into three types: A, B and EP. Although columns of the same type were used to repeat the analytical methods, the differences in performance and separation were still large. Methods for selecting columns with equivalent selectivity were then proposed, such as the USP approach [16], the PQRI approach [17,18] and the Katholieke Universiteit Leuven column classification system [19-21]. In the PQRI approach [17,18], for example, five parameters are used to describe the performance of a column: hydrophobicity (H), steric interaction (S), hydrogen-bond acidity (A), hydrogen-bond basicity (B) and ion-exchange capacity (C). The similarity between a column and the reference column is calculated from these five parameters, and when the similarity value is less than three the two columns are regarded as equivalent. Using an equivalent column improves the reproducibility of separation and RR to some extent. However, besides the column, many other factors strongly influence the chromatogram, such as the dead volume of the chromatographic system, the structures of the analytes and the complexity of the chromatographic conditions. It is therefore necessary to develop a method that takes all of these factors into account to reduce the prediction error of tR.

Open Access
Chinese Medicine *Correspondence: masc@nifdc.org.cn
According to the thermodynamic theory of liquid chromatography, the tR values of compounds on two different HPLC systems (each comprising a chromatograph and a column) are linearly related [22]. For better understanding, the PDF of reference 22 (in Chinese, Additional file 1) and an English version of reference 22 (only the theory section was translated, Additional file 2) are provided. Combining this principle with previous studies [23-25], a novel method that uses two reference substances to predict HPLC tR is proposed: linear calibration using two reference substances (LCTRS). The StR (the arithmetic average of the tR of the same compound on different HPLC systems under the same chromatographic conditions) is used as the reference value, and linear regression is used as the basic algorithm for tR prediction. In this study, the method was validated with two medicines on 30 C18 columns. Compared with the RR method, the LCTRS method proved to be more accurate and more robust across different HPLC columns. It therefore has good prospects for the quantification of multiple components in TCMs and of related substances in pharmaceutical chemicals.

Methods
The Minimum Standards of Reporting Checklist contains details of the experimental design, statistics and resources used in this study (Additional file 3).

Instruments and reagents
A Waters e2695 HPLC (2998 PDA detector), an Agilent 1260 HPLC (DAD detector) and a Shimadzu LC-2010A HT HPLC (UV-Vis detector) were used. Matlab software was provided by MathWorks Inc., USA. Thirty C18 columns (Table 1) from 13 manufacturers, covering types A, B and EP, were used; most of them belong to type B according to previous studies [11-15]. Following the PQRI approach [17,18] and using the data from the USP website (http://www.usp.org/USPNF/columns.html), the similarity of each column was calculated with col1 as the reference column. The similarity values (0-13.18) showed large differences among the columns, which indicates that the selected columns cover a wide range and are representative.
Reference substances of psoralen, isopsoralen, Chonglou saponin I, Chonglou saponin II, Chonglou saponin VI, Chonglou saponin VII and ethinylestradiol, and herbal reference substances of Psoraleae Fructus (Psoraleae) and Paridis Rhizome (Paridis), were supplied by the National Institutes for Food and Drug Control, China. Methanol, acetonitrile and phosphoric acid were of HPLC grade and supplied by the Fischer Company, USA. Ammonium nitrate (analytical grade) was supplied by Beijing Chemical Works. Water was prepared with a Milli-Q system (Millipore Company, USA).

HPLC chromatogram of samples
Typical chromatograms of Psoraleae and Paridis are shown in Fig. 1. The peaks were identified mainly with the reference substances. Peaks for which no reference substance was available were identified by their UV-Vis and mass spectra.

Standard retention time (StR)
Under the same chromatographic conditions, the measured retention times (tRmea) of the four saponins in Paridis on different chromatographic systems are shown in Table 1. A chromatographic system comprises the HPLC instrument and the column; systems are hereinafter referred to as columns, because the differences in tR are caused mainly by the columns. The arithmetic average of the tR of the same compound on different columns is called StR, formula (1):

$$\mathrm{St}_R = \frac{1}{n}\sum_{j=1}^{n} t_{R,j} \qquad (1)$$

where $t_{R,j}$ is the tR of the compound on column j and n is the number of columns. Like RR, StR is the reference value from which the predicted retention time (tRpre) of an analyte in a sample is calculated. Theoretically, under the same chromatographic conditions, the RR calculated on different columns is a constant, whereas StR is not. The advantage of using StR rather than the tR on any single column is discussed in the section "Minimum number of columns for StR calculation". In this paper, the deviation (ΔtR) between tRmea and tRpre, formula (2), was used to compare the merits and defects of the RR method and the LCTRS method:

$$\Delta t_R = \left| t_{R\mathrm{mea}} - t_{R\mathrm{pre}} \right| \qquad (2)$$
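The two definitions above can be illustrated with a minimal Python sketch. The column names, compound names and retention times in `measured` are hypothetical and serve only to show the arithmetic; the last line applies the deviation to the measured and two-point predicted values reported later in the text for Chonglou saponin VI, and takes it as an absolute value, which is an assumption consistent with those reported figures.

```python
from statistics import mean

def standard_retention_times(t_r_by_column):
    """StR: average each compound's tR over all columns.

    t_r_by_column maps a column name to a dict {compound: tR in min}.
    """
    compounds = next(iter(t_r_by_column.values())).keys()
    return {c: mean(col[c] for col in t_r_by_column.values()) for c in compounds}

def delta_t_r(t_r_mea, t_r_pre):
    """Deviation between measured and predicted tR, as an absolute value (min)."""
    return abs(t_r_mea - t_r_pre)

# Hypothetical measurements on three columns (illustration only).
measured = {
    "colA": {"cmpd1": 20.0, "cmpd2": 30.0},
    "colB": {"cmpd1": 21.0, "cmpd2": 32.0},
    "colC": {"cmpd1": 19.0, "cmpd2": 31.0},
}
st_r = standard_retention_times(measured)
print(st_r)

# Measured vs. two-point predicted tR of Chonglou saponin VI on col4 (from the text).
print(round(delta_t_r(22.898, 23.481), 3))
```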

Linear principle of LCTRS
According to chromatographic thermodynamic theory, Wang et al. proved that, under the same chromatographic conditions, the tR values of the same compounds on two different HPLC systems (mainly the columns) are linearly related [22], as expressed in formula (3) and Fig. 2a, b. The relationship can be written as

$$t_R^{(B)} = a\, t_R^{(A)} + b \qquad (3)$$

where $t_R^{(A)}$ and $t_R^{(B)}$ are the tR values of a compound on systems A and B, and a and b are constants for the given pair of systems.
Since formulas (1) and (3) are both linear, the tR values of the compounds on a column are also linearly related to their StR values, as shown in formula (4) and Fig. 2c, d:

$$t_R = a'\, \mathrm{St}_R + b' \qquad (4)$$

It is noteworthy that the correlation coefficient of this linear regression is higher than that shown in Fig. 2a, b.
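The step from the pairwise linear relationship to a linear relationship with StR can be made explicit. The derivation below is a sketch under assumed notation that is not taken from the original: $t_{R,i}^{(j)}$ is the tR of compound $i$ on column $j$, column $k$ is any column of interest, and $a_{jk}$, $b_{jk}$ are the slope and intercept relating column $j$ to column $k$.

```latex
\begin{align*}
t_{R,i}^{(j)} &= a_{jk}\, t_{R,i}^{(k)} + b_{jk}
  && \text{(pairwise linearity between columns $j$ and $k$)} \\
\mathrm{St}_{R,i} &= \frac{1}{n}\sum_{j=1}^{n} t_{R,i}^{(j)}
  = \bar{a}_k\, t_{R,i}^{(k)} + \bar{b}_k,
  \qquad \bar{a}_k = \frac{1}{n}\sum_{j=1}^{n} a_{jk},\quad
         \bar{b}_k = \frac{1}{n}\sum_{j=1}^{n} b_{jk} \\
t_{R,i}^{(k)} &= \frac{1}{\bar{a}_k}\,\mathrm{St}_{R,i} - \frac{\bar{b}_k}{\bar{a}_k}
  && (\bar{a}_k \neq 0)
\end{align*}
```

Because the averaged slope and intercept do not depend on the compound $i$, all compounds on column $k$ fall on a single line when plotted against StR.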

Minimum number of columns for StR calculation
Theoretically, the tR on any single column can be used as the reference value for linear fitting, but the ΔtR obtained with a randomly chosen column was unstable. The number of columns needed for the StR calculation was therefore investigated by random sampling. StR was calculated from 1, 5, 10, 15, 20, 25 and 30 columns, with 30, 100, 100, 100, 100, 100 and 1 non-repeated random samplings, respectively. Each StR was fitted against the tRmea on all 30 columns by multiple-point linear regression, and the average ΔtR (average ± standard deviation) was calculated, as shown in Fig. 3. For both medicines, the prediction deviation decreased as the number of columns increased. However, the prediction accuracy did not improve significantly once the number of columns reached five, which is therefore considered a low-cost and reasonable minimum. It is recommended to use five to fifteen columns for the StR calculation.
Even among columns with the same type of packing material, there are differences in the stationary phase and the packing technique, as well as errors in the chromatographic analysis itself, and these differences cause deviations in tR. Physically, calculating StR is equivalent to evenly mixing the stationary phases of the selected columns and repacking them into one column. Because this averaging reduces both random and systematic errors, the prediction is accurate and robust.
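The sampling experiment described above can be sketched in Python as follows. This is a simulation under stated assumptions, not a reproduction of the study: the 30 "columns" are synthetic (a random slope and intercept applied to hypothetical base retention times `BASE`, plus Gaussian scatter), and the helper names are illustrative. Only the sampling plan (1, 5, ..., 30 columns with 30, 100, ..., 1 non-repeated draws) follows the text.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = avg(xs), avg(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_deviation(t_r, subset):
    """Average |tRmea - tRpre| over all columns, with StR averaged over `subset`."""
    n_compounds = len(t_r[0])
    st_r = [avg(t_r[j][i] for j in subset) for i in range(n_compounds)]
    deviations = []
    for column in t_r:
        slope, intercept = fit_line(st_r, column)
        deviations += [abs(y - (slope * x + intercept)) for x, y in zip(st_r, column)]
    return avg(deviations)

def distinct_subsets(n_columns, k, draws):
    """Draw `draws` different k-column subsets (no subset is repeated)."""
    seen = set()
    while len(seen) < draws:
        seen.add(frozenset(random.sample(range(n_columns), k)))
    return [sorted(s) for s in seen]

random.seed(0)
BASE = [8.0, 12.0, 17.0, 23.0, 30.0, 36.0]  # hypothetical retention times (min)

def synthetic_column():
    # Assumed model: one linear shift per column plus compound-level scatter.
    a, b = random.uniform(0.9, 1.1), random.uniform(-1.0, 1.0)
    return [a * t + b + random.gauss(0.0, 0.3) for t in BASE]

t_r = [synthetic_column() for _ in range(30)]
plan = [(1, 30), (5, 100), (10, 100), (15, 100), (20, 100), (25, 100), (30, 1)]
results = {}
for k, draws in plan:
    results[k] = avg(mean_deviation(t_r, s) for s in distinct_subsets(30, k, draws))
    print(f"{k:2d} columns: mean deviation {results[k]:.3f} min")
```

In such a simulation the deviation obtained with a single reference column is expected to exceed the deviation obtained with the 30-column average, because averaging suppresses the column-specific scatter in the reference values.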

Procedure of two points prediction
For the RR method, only one compound is chosen as the reference compound (a reference substance is required), and the RR values of all other compounds serve as reference values for calculating tRpre. For the LCTRS method, two compounds are chosen as reference compounds (reference substances are required), and the StR values of all other compounds serve as reference values for calculating tRpre. The reference compounds and the values of RR and StR are shown in Tables 2 and 3. Take Paridis as an example. First, the reference solution of the two reference compounds (Chonglou saponin VII and Chonglou saponin I) and the sample solution were run on a C18 column (col4: BDS Hypersil C18). The peaks of the two reference compounds in the sample solution were located with the reference solution, giving tRmea values of 21.014 and 35.170 min (Fig. 4a). Two points, Chonglou saponin VII (19.803, 21.014) and Chonglou saponin I (33.035, 35.170), were then plotted with StR as the abscissa and tRmea as the ordinate. The line through the two points is y = 1.0698x − 0.1719 (Fig. 4b). Substituting the StR of the analytes (Chonglou saponin VI and Chonglou saponin II) into this equation gave tRpre values of 23.481 min for Chonglou saponin VI and 32.263 min for Chonglou saponin II. Finally, in the chromatogram of the sample solution, the peaks of Chonglou saponin VI and Chonglou saponin II were found within the range tRpre ± tRW (tRW is the tR window, 0.6 min in this case), as shown in Fig. 4c. The tRmea values of the two analytes were 22.898 and 32.679 min, so the ΔtR values of the two-point prediction were 0.583 and 0.416 min, respectively.
Fig. 2 Linear fitting results of Psoraleae and Paridis (code numbers are the same as in Fig. 1)
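The two-point prediction for Paridis on col4 can be reproduced with a short Python sketch. The two reference points are taken from the text. The StR values of the two analytes are not stated in the text; the values used here (22.110 and 30.319 min) were back-calculated from the reported tRpre and are therefore an assumption, as are the function names and the simplified peak list.

```python
def two_point_line(p1, p2):
    """Line through two (StR, tRmea) points: returns (slope, intercept)."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def peaks_in_window(peaks, t_r_pre, window):
    """Measured peaks lying within tRpre +/- window."""
    return [p for p in peaks if abs(p - t_r_pre) <= window]

# (StR, tRmea) of the two reference compounds on col4, from the text.
ref_vii = (19.803, 21.014)   # Chonglou saponin VII
ref_i = (33.035, 35.170)     # Chonglou saponin I
slope, intercept = two_point_line(ref_vii, ref_i)
print(f"y = {slope:.4f}x + ({intercept:.4f})")

# StR of the analytes: back-calculated from the reported tRpre (assumption).
st_r_analytes = {"Chonglou saponin VI": 22.110, "Chonglou saponin II": 30.319}
t_r_pre = {name: slope * x + intercept for name, x in st_r_analytes.items()}

T_R_WINDOW = 0.6  # min, as in the example
sample_peaks = [21.014, 22.898, 32.679, 35.170]  # tRmea of the four saponins
for name, pre in t_r_pre.items():
    print(name, round(pre, 3), peaks_in_window(sample_peaks, pre, T_R_WINDOW))
```

With these inputs each analyte has exactly one candidate peak inside its window, so the assignment is unambiguous in this example.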

Procedure of multiple points regression
After the analyte peaks in the sample solution have been assigned by two-point prediction, their tRmea values should be validated by multiple-point regression. In this procedure, the tRmea and StR values of both the reference compounds and the analytes are used to fit a multiple-point linear regression: y = 1.1038x − 1.1075 (Fig. 4d). Substituting the StR of the analytes (Chonglou saponin VI and Chonglou saponin II) into the new equation gave new tRpre values of 23.297 min for Chonglou saponin VI and 32.358 min for Chonglou saponin II. If the ΔtR values of all analytes are less than the given tRL (tRL is the tR limit, 0.5 min in this case), the prediction is a success; otherwise it is a failure (Fig. 4e). In this case, the ΔtR values were 0.399 and 0.321 min, respectively. The multiple-point validation step is based on the principle of stepwise linear regression, which further improves the prediction accuracy.
tRW is set larger than tRL in order to increase the number of suitable columns while improving the accuracy of prediction. Generally, the recommended ranges of tRW and tRL are 0.8-2.0 and 0.5-1.5 min, respectively. If necessary, the values can be adjusted for different samples and chromatographic conditions. If the ΔtR values of some compounds are large, their tRW and tRL can be set individually.
Fig. 4 Flow chart of LCTRS (Paridis; code numbers are the same as in Fig. 1)
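The multiple-point validation step can be sketched in the same way. The sketch assumes ordinary least squares over all four (StR, tRmea) points; the StR values of Chonglou saponin VI and II are again back-calculated from the reported predictions rather than taken from the text, and the variable names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Order: Chonglou saponin VII, VI, II, I. StR of VI and II are assumptions.
st_r = [19.803, 22.110, 30.319, 33.035]
t_r_mea = [21.014, 22.898, 32.679, 35.170]  # measured on col4, from the text

slope, intercept = fit_line(st_r, t_r_mea)
print(f"y = {slope:.4f}x + ({intercept:.4f})")

T_R_LIMIT = 0.5  # min, as in the example
analytes = {"Chonglou saponin VI": 1, "Chonglou saponin II": 2}  # index into the lists
deviations = {
    name: abs(t_r_mea[i] - (slope * st_r[i] + intercept))
    for name, i in analytes.items()
}
success = all(d < T_R_LIMIT for d in deviations.values())
print({k: round(v, 3) for k, v in deviations.items()}, "success" if success else "failure")
```

Keeping the window for peak assignment (tRW) wider than the limit for validation (tRL) means that a peak can first be found with the coarser two-point line and then checked against the tighter criterion after refitting.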

Sequential matching rule
If the tR values of two peaks are too close, e.g. less than 2 min apart, matching peaks by the least-ΔtR rule can go wrong. Take Psoraleae as an example. As shown in Fig. 5a, peak #6 was assigned to peak A in the sample solution on col6 (Kromasil C18) with a small ΔtR of 0.515 min; however, peak #5 was then not found, and peak B in the sample solution was left unmatched. With tRW set to 1.2 min, the tRmea of peak A fell within the window of the tRpre of peak #5, and the tRmea values of both peaks A and B fell within the window of the tRpre of peak #6. Because they share one common candidate (peak A), peaks #5 and #6 should be treated as a peak series and matched sequentially: the earlier tRpre is matched to the peak with the earlier tRmea. Although the ΔtR of peak #6 increased to 1.036 min, the matching was correct, as shown in Fig. 5b. The rule can be extended to series of more than two peaks with close tR values.
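The difference between the least-ΔtR rule and sequential matching can be sketched in Python. This is a simplified illustration, not the authors' implementation: the predicted and measured times are hypothetical values chosen so that the deviations of peak #6 equal the 0.515 and 1.036 min quoted above, and the sketch assumes that a peak series contains as many candidate peaks as predictions.

```python
def least_delta_match(predicted, measured, window):
    """Greedy matching: the closest (prediction, peak) pairs are assigned first."""
    pairs = sorted(
        (abs(m - p), name, m)
        for name, p in predicted.items()
        for m in measured
        if abs(m - p) <= window
    )
    matched, used = {}, set()
    for _, name, m in pairs:
        if name not in matched and m not in used:
            matched[name] = m
            used.add(m)
    return matched

def sequential_match(predicted, measured, window):
    """Predictions sharing a candidate peak form a series matched in elution order."""
    names = sorted(predicted, key=predicted.get)
    cands = {n: {m for m in measured if abs(m - predicted[n]) <= window} for n in names}
    matched, i = {}, 0
    while i < len(names):
        series, pool = [names[i]], set(cands[names[i]])
        # Extend the series while the next prediction shares a candidate peak.
        while i + len(series) < len(names) and pool & cands[names[i + len(series)]]:
            nxt = names[i + len(series)]
            pool |= cands[nxt]
            series.append(nxt)
        if len(series) == 1:
            if pool:  # isolated peak: fall back to the closest candidate
                matched[series[0]] = min(pool, key=lambda m: abs(m - predicted[series[0]]))
        else:  # earlier tRpre is paired with earlier tRmea
            matched.update(zip(series, sorted(pool)))
        i += len(series)
    return matched

predicted = {"#5": 15.400, "#6": 16.000}   # hypothetical tRpre (min)
measured = [16.515, 17.036]                # hypothetical tRmea of peaks A and B
print(least_delta_match(predicted, measured, 1.2))
print(sequential_match(predicted, measured, 1.2))
```

With these inputs the greedy rule assigns peak A to #6 and leaves #5 unmatched, whereas sequential matching assigns A to #5 and B to #6.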

Comparison between LCTRS method and RR method
The unadjusted RR method, the adjusted RR method (dead time measured with ammonium nitrate as the probe compound), two-point prediction and multiple-point validation are compared in Tables 4 and 5. The unadjusted and adjusted RR methods gave similar results: their prediction deviations were larger, and the prediction succeeded on fewer columns (positive columns). With the LCTRS method, the prediction deviation decreased and the number of positive columns increased. The best results were obtained by multiple-point validation, which builds on the two-point prediction.

Exclusion of column and compound by linear fitting
A nonlinear shift of the tR of a compound on different columns can be caused either by differences in column packing materials and packing techniques or by differences in compound structure. To exclude columns and compounds with a relatively large nonlinear shift, tRmea was fitted linearly against StR. The following rules were used to identify outlier columns and compounds. (1) In the regression scatter plot, the compounds deviate obviously from the regression line (the correlation coefficient is usually less than 0.99). (2) ΔtR is usually larger than 1-2 min. Excluded columns and compounds are not used for the StR calculation.
For Psoraleae, no obvious nonlinear deviation was observed for any of the 11 compounds. Twenty-three columns met the requirements (average correlation coefficient 0.9989). The outlier columns were col2, 8, 12 (Fig. 6a), 16, 19, 20 and 30. For Paridis, no obvious nonlinear deviation was observed for the four saponins. All 30 columns met the requirements, with an average correlation coefficient of 0.9993.
Fig. 5 Advantage of sequential matching (Psoraleae; code numbers are the same as in Fig. 1)
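The two exclusion rules can be sketched as a simple check in Python. The thresholds (correlation coefficient below 0.99, deviation above 1 min) follow the rules of thumb given above, but treating either criterion alone as sufficient is an assumption of this sketch, and the retention times are hypothetical.

```python
from math import sqrt

def fit_stats(xs, ys):
    """Return (correlation coefficient, largest |residual|) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    max_dev = max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
    return r, max_dev

def is_outlier(st_r, t_r_mea, r_min=0.99, dev_max=1.0):
    """Flag a column whose fit against StR is poor (assumed: either rule suffices)."""
    r, max_dev = fit_stats(st_r, t_r_mea)
    return r < r_min or max_dev > dev_max

# Hypothetical StR and measured tR (min) on two columns.
st_r = [10.0, 15.0, 20.0, 25.0, 30.0]
regular_column = [10.9, 16.2, 21.6, 27.1, 32.4]   # close to a straight line
shifted_column = [10.9, 16.2, 26.0, 27.1, 32.4]   # third compound shifted nonlinearly

for name, column in [("regular", regular_column), ("shifted", shifted_column)]:
    r, max_dev = fit_stats(st_r, column)
    print(name, round(r, 4), round(max_dev, 2), is_outlier(st_r, column))
```

The same check can be run per compound across columns to flag outlier compounds instead of outlier columns.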
To simulate the tR of a compound with a large structural difference, a reference solution of ethinylestradiol mixed with the four Chonglou saponins was used to measure the tR of these five compounds on the 30 columns. A nonlinear shift of ethinylestradiol was observed on col1 (Fig. 6b), 2-6, 8, 11, 15-18, 26-28 and 30. The HPLC retention behavior of ethinylestradiol thus appears to differ significantly from that of the four Chonglou saponins under these chromatographic conditions. This further indicates that the classification and similarity evaluation of columns should be based on the characteristics of the analytes as well as those of the columns.
If the outlier compounds cannot be excluded, the following approaches can be used: (1) specify one or more suitable columns; (2) provide reference substances for those compounds; (3) use UV-Vis and/or mass spectra to assist peak identification.
Fig. 6 Outlier column (a) and outlier compounds (b) (code numbers are the same as in Fig. 1)

Selection of two reference compounds
Ideally, it should make no difference which two compounds are selected as reference compounds. In practice, however, because of differences in HPLC instruments, columns and compound structures, the complexity of the elution conditions and accidental analytical error, the choice of the reference compound pair does matter. To find a rule for selecting reference compounds, every possible pair of reference compounds was studied for the two medicines. The average ΔtR for each pair was calculated and is shown in Fig. 7. For both medicines, the ΔtR of the two-point prediction decreased as the coverage of tR increased. The coverage of tR, formula (5), reflects the relative positions of the two reference compounds:

$$\text{Coverage of } t_R = \frac{t_{R2} - t_{R1}}{t_{R\mathrm{last}} - t_{R\mathrm{first}}} \times 100\% \qquad (5)$$

where tR2 is the tR (or StR) of the second reference compound; tR1 is the tR (or StR) of the first reference compound; tRlast is the tR (or StR) of the last compound; and tRfirst is the tR (or StR) of the first compound. The first compound is at one end of the chromatogram (smallest tR) and the last compound is at the other end (largest tR). If the coverage is high, the two reference compounds lie near the two ends; otherwise they lie in the middle or near the same end. The coverage corresponding to the smallest ΔtR was 80-100%. The results for Psoraleae (Fig. 7a) showed that, when the coverages of tR were similar, it was advantageous to choose reference compounds with smaller linear deviation. The optimized reference compounds for Psoraleae were therefore peaks #2 and #8, rather than peaks #1 and #11, which had the maximum coverage of tR but deviated more from linearity. The selection rule reduces both the arbitrariness of choosing reference compounds and the amount of calculation (otherwise the ΔtR of all possible reference compound pairs would have to be calculated every time). The selection procedure is as follows. First, select the reference compound pairs with large tR coverage (80-100%). Second, exclude compounds with large linear deviation, based on the linear fitting results. Third, calculate the ΔtR of the remaining reference compound pairs and select the pair with the smallest ΔtR.
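The first two steps of the selection procedure can be sketched in Python. The coverage is computed as the distance between the two reference compounds divided by the distance between the first and last compounds, in percent. The StR values of Chonglou saponin VI and II are back-calculated assumptions, and the function names and the `excluded` parameter are illustrative.

```python
from itertools import combinations

def coverage(t_r1, t_r2, t_r_first, t_r_last):
    """Coverage of tR (%) spanned by two reference compounds."""
    return (t_r2 - t_r1) / (t_r_last - t_r_first) * 100.0

def candidate_pairs(st_r, min_coverage=80.0, excluded=()):
    """Pairs with coverage >= min_coverage, skipping compounds flagged as nonlinear."""
    names = sorted(st_r, key=st_r.get)           # elution order
    t_first, t_last = st_r[names[0]], st_r[names[-1]]
    pairs = []
    for first, second in combinations(names, 2):
        if first in excluded or second in excluded:
            continue
        cov = coverage(st_r[first], st_r[second], t_first, t_last)
        if cov >= min_coverage:
            pairs.append((first, second, round(cov, 1)))
    return pairs

# StR (min) of the four Chonglou saponins; VI and II are assumed values.
st_r = {"VII": 19.803, "VI": 22.110, "II": 30.319, "I": 33.035}
print(candidate_pairs(st_r))
```

The third step, computing the ΔtR of each remaining pair on the available columns and keeping the pair with the smallest value, requires the per-column measurements and is left out of this sketch.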
In summary, the LCTRS method is established as follows. First, select five to fifteen different brands of C18 columns and record the HPLC chromatograms of the reference substances and the sample on all columns. Second, calculate the initial StR from all columns and fit the tR on each column linearly against StR. Exclude outlier columns and compounds, and recalculate the final StR from the remaining columns. Finally, select two reference compounds with large tR coverage and low linear deviation.

Advantages of LCTRS method
According to Wang et al. [22], the tR values of compounds on different HPLC systems follow the linearity principle. The RR method can be regarded as a one-point external standard method, in which the regression line is forced through the origin. However, most of the linear equations have non-zero intercepts, which is why the deviation of the unadjusted RR method was large. Because it accounts for the dead time, the adjusted RR method should in theory be better than the unadjusted RR method. However, the probe compound used to measure the dead time interacts with the mobile phase and with the stationary phase of the column, which increases the error of the dead time measurement, so in practice the prediction accuracy of the adjusted method was not improved. In two-point prediction and multiple-point validation, by contrast, the dead time, gradient delay, volume exclusion effect of the stationary phase, retention behavior of homologous compounds and so forth are all taken into account, and the prediction accuracy was significantly improved. Stepwise linear regression was used in the multiple-point validation step, which further improved the prediction.

Compatibility of LCTRS method and RR method
The LCTRS method and the RR method are mathematically equivalent, and their formulas can be expressed in the same form. In LCTRS, the calibrated retention (CR) is defined as a ratio formed from the StR of the analyte and of the reference compounds, as shown in formula (6):

$$\mathrm{CR\ (or\ RR)} = \frac{t_{Ri} - t_{R1}}{t_{R2} - t_{R1}} \qquad (6)$$

where tRi is the StR of the analyte in CR, or the tR of the analyte in RR; tR1 is the StR of the first reference compound in CR, or the dead time in adjusted RR, or zero in unadjusted RR; and tR2 is the StR of the second reference compound in CR, or the tR of the reference compound in RR. Unlike RR, CR is based on the statistics of StR. Its prediction accuracy is therefore equal to that of LCTRS (specifically, to that of the two-point prediction).
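The shared form of CR and RR can be sketched in Python. The ratio follows the where-clause above; the `predict` function, which inverts the ratio on a new column, is an assumption of this sketch, as is the StR of Chonglou saponin VI (back-calculated, not stated in the text). The example checks that predicting with CR gives the same value as the two-point line through the reference compounds.

```python
def retention_ratio(t_ri, t_r2, t_r1=0.0):
    """Shared form of CR and RR.

    t_r1 = StR of the first reference compound (CR), the dead time
    (adjusted RR), or 0 (unadjusted RR).
    """
    return (t_ri - t_r1) / (t_r2 - t_r1)

def predict(ratio, t_r2_mea, t_r1_mea=0.0):
    """Invert the ratio on a new column, given the measured reference times."""
    return t_r1_mea + ratio * (t_r2_mea - t_r1_mea)

# Chonglou saponin VII and I: StR and tRmea on col4, from the text.
st_r_vii, st_r_i = 19.803, 33.035
mea_vii, mea_i = 21.014, 35.170
st_r_vi = 22.110  # StR of Chonglou saponin VI (back-calculated assumption)

# Prediction through CR.
cr_vi = retention_ratio(st_r_vi, st_r_i, st_r_vii)
pre_cr = predict(cr_vi, mea_i, mea_vii)

# Prediction through the two-point line of LCTRS.
slope = (mea_i - mea_vii) / (st_r_i - st_r_vii)
intercept = mea_vii - slope * st_r_vii
pre_line = slope * st_r_vi + intercept

print(round(pre_cr, 3), round(pre_line, 3))
```

Setting `t_r1` and `t_r1_mea` to zero reduces the same two functions to the unadjusted RR calculation, in which the line is forced through the origin.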

Conclusion
A new method for predicting the tR of HPLC chromatographic peaks was proposed. Sixteen compounds in two medicines were tested under isocratic or gradient elution conditions on three brands of HPLC instruments with 30 different C18 columns. The method was demonstrated to be simple, accurate and robust across more HPLC columns. Furthermore, the calculation of StR and the rule for selecting the two reference compounds were discussed.
Both multi-component analysis of TCMs and the determination of related substances in pharmaceutical chemicals require many reference substances for peak identification, but using all of these reference substances may not be affordable in routine analysis and research. LCTRS is a simple and low-cost alternative for peak identification. Compared with the RR method, it needs one more reference substance but is more accurate and suitable for more HPLC columns. The LCTRS method has good prospects for the overall quality evaluation of TCMs and for impurity analysis of pharmaceutical chemicals.

Abbreviations
TCMs: traditional Chinese medicines; RR: relative retention; LCTRS: linear calibration using two reference substances; tR: retention time; SSDMC: single standard to determine multi-components; StR: arithmetic average of tR; ΔtR: deviation of tR; tRpre: predicted retention time; tRmea: measured retention time; CR: calibrated retention.