Online near-infrared analysis coupled with MWPLS and SiPLS models for the multi-ingredient and multi-phase extraction of licorice (Gancao)

Background: This study aims to analyze the active pharmaceutical ingredients (APIs) of licorice (Radix Glycyrrhizae; gancao), including glycyrrhizic acid, liquiritin, isoliquiritin and total flavonoids, in multi-ingredient and multi-phase extraction by online near-infrared (NIR) technology with fiber optic probes and chemometric analysis.

Methods: High-performance liquid chromatography and ultraviolet spectrophotometry were used to determine the reference contents of the APIs in different extraction phases. The online NIR analysis included sample set selection by the Kennard–Stone algorithm, optimization of spectral pretreatment methods (e.g., orthogonal signal correction and wavelet denoising spectral correction), and model calibration by the partial least-squares algorithm, the moving-window partial least-squares algorithm and the synergy interval partial least-squares (SiPLS) algorithm. The relative errors and F values were used to assess the models in different extraction phases.

Results: The root-mean-square errors of calibration, cross-validation and prediction of the APIs in the SiPLS models were all less than 0.07. The F values of glycyrrhizic acid, liquiritin, isoliquiritin and total flavonoids were 10,765, 32,431, 649 and 6080, respectively, all larger than the critical value of 6.90 (P < 0.01).

Conclusion: The study demonstrated the feasibility of online NIR analysis in the multi-ingredient and multi-phase extraction of APIs from licorice.

Electronic supplementary material: The online version of this article (doi:10.1186/s13020-015-0069-2) contains supplementary material, which is available to authorized users.


Background
The process analytical technology (PAT) guidance for industry was published by the U.S. Food and Drug Administration to encourage the use of online analysis in drug development [1]. PAT is applicable to the real-time monitoring of raw materials and key intermediates and to the quality assurance of final products.
Near-infrared (NIR) analysis can be applied online as an effective process analytical tool [1]. In manufacturing, online NIR analysis is coupled with an optical fiber to monitor the critical process parameters that control product quality [2]. NIR analysis can also be used to identify active pharmaceutical ingredients (APIs) [2,3]. The technology has been applied in Chinese medicine (CM) to monitor the extraction of individual ingredients from herbs such as Ligusticum chuanxiong (Chuanxiong) [4], Salvia miltiorrhiza (Danshen) [5], Paeonia lactiflora (Shaoyao) [6] and Pueraria lobata Ohwi (Gegen) [7]. However, only a few reports have described online NIR analysis of multiple ingredients and of low-concentration APIs, e.g., in Astragali Radix (Huangqi) [8] and Radix Paeoniae Rubra (Chishao) [9].
CM process analysis still lacks a reliable online method that can detect multiple ingredients simultaneously in real time. Most APIs in CM are extracted with water or other solvents, and accurately following such an extraction process by NIR technology requires monitoring across multiple extraction phases. However, no previous work on online NIR analysis has demonstrated simultaneous detection across a multi-phase extraction in CM.
Licorice (Radix Glycyrrhizae; Gancao) is widely used in CM [10]. Its APIs are extracted from the dried roots and rhizomes of Glycyrrhiza glabra [11]. According to the Chinese Pharmacopoeia (2010 edition), the APIs of licorice include flavonoids, saponins, glycyrrhizic acid and liquiritin. No report has yet described the online monitoring of the multi-phase extraction of multiple licorice ingredients.
In this study, online NIR technology was applied to collect spectra during a pilot-scale extraction process. Predictions obtained with the partial least-squares (PLS), moving-window partial least-squares (MWPLS) and synergy interval partial least-squares (SiPLS) algorithms were compared with reference values from high-performance liquid chromatography (HPLC) and ultraviolet (UV) spectrophotometry. Common chemometric indicators, i.e., the root-mean-square error of calibration (RMSEC), root-mean-square error of cross-validation (RMSECV) and root-mean-square error of prediction (RMSEP), were used to assess the models, with the lowest values indicating the most reliable analysis [12]. Furthermore, the relative errors and F values in the different extraction phases were used to evaluate the reliability and detection ability of online NIR analysis [13].
This study aims to analyze the APIs of licorice, including glycyrrhizic acid, liquiritin, isoliquiritin and total flavonoids, in multi-ingredient and multi-phase extraction by online NIR technology with fiber optic probes and chemometric analysis.

Processing and sampling of different extraction phases
A 9-kg quantity of licorice was extracted three times with eight-fold deionized water in a multi-functional extractor (100 L) at 2.5-h intervals. The stirring paddle (HCHT System, Beijing, China) was set at 50 rpm. During the extraction, NIR spectra were scanned periodically (Table S1 in Additional file 1). The sampling interval was chosen according to how quickly the contents of the four ingredients changed. In the initial heating and boiling phase, the contents changed rapidly, so a short sampling interval was used. Because the contents changed less in the second and third extractions than in the first, the sampling interval was lengthened in those extractions to reduce the workload.
The system included an online NIR scanning instrument (Fig. 1). Licorice was added to the tank and extracted with deionized water. A filter connected to the bypass pipe was completely submerged in the tank, which prevented bubbles from entering the bypass. The extraction solution was circulated through the bypass by a pump driven by compressed air from an air compressor, which avoided contamination. Filters of 80 and 100 μm removed interference from solid content as the extraction solution passed through the bypass [14,15]. The pump was run for 30 s to refresh the solution in the bypass. The sample was scanned in a flow cell via an optical fiber so that it was in the same environment as the solution in the tank [14]. A recoil loop was included to reduce the risk of bypass clogging and to eliminate bubbles in the pipe.
The temperature was recorded in real time by thermometers (HCHT System, Beijing, China). Throughout the extraction process, spectra were recorded by the online NIR instrument with an optical fiber. As soon as the scanning was completed, the sampling tap was opened and 10 mL of extract solution was collected for HPLC and UV analysis.

Fig. 1 Platform of extraction

NIR equipment and measurement
Online NIR spectra were collected by fiber optic probes. NIR radiation was applied through a 2-mm optical path using an XDS process analyzer and VISION software (Foss NIR System, Silver Spring, MD, USA). The wavelength range of spectra was between 800 and 2200 nm. Spectra were obtained from an average of 32 scans with a wavelength increment of 0.5 nm.
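The acquisition settings can be made concrete with a short NumPy sketch. It assumes, hypothetically, that the individual scans are available as an array of shape (32, n_points) on the reported wavelength grid; the function names are illustrative and are not part of the VISION software. Counting both end points, 800-2200 nm at a 0.5-nm increment gives 2801 wavelength values, consistent with the roughly 2800 data points used for MWPLS.

```python
import numpy as np

# Acquisition settings reported for the XDS process analyzer
START_NM, STOP_NM, STEP_NM = 800.0, 2200.0, 0.5
N_SCANS = 32


def wavelength_axis(start=START_NM, stop=STOP_NM, step=STEP_NM):
    """Wavelength grid in nm, including both end points."""
    n_points = int(round((stop - start) / step)) + 1
    return start + step * np.arange(n_points)


def average_scans(scans):
    """Average co-added scans, shape (n_scans, n_points), into one spectrum."""
    scans = np.asarray(scans, dtype=float)
    if scans.ndim != 2:
        raise ValueError("expected an array of shape (n_scans, n_points)")
    return scans.mean(axis=0)


wl = wavelength_axis()
print(wl.size, wl[0], wl[-1])  # 2801 800.0 2200.0

# Simulated white noise only (hypothetical data): averaging N_SCANS scans
# shrinks the noise standard deviation by roughly sqrt(N_SCANS).
rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 1.0, size=(N_SCANS, wl.size))
print(noisy.std(), average_scans(noisy).std())
```

For uncorrelated random noise, averaging 32 scans lowers the noise standard deviation by about sqrt(32), roughly a factor of 5.7. Systematic effects such as baseline drift are not reduced by averaging, which is why spectral pretreatment is still needed.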

HPLC methods
All samples were diluted with 70 % (v/v) ethanol-water solution, and the contents of glycyrrhizic acid, liquiritin and isoliquiritin were determined by a validated reversed-phase HPLC assay. Chromatographic analysis was performed on a Waters 2695 HPLC system with a Waters 2996 DAD detector (Waters Technologies, USA). Glycyrrhizic acid, liquiritin and isoliquiritin were separated on an octadecyl silica column (250 mm × 4.6 mm, Dikma, China) with isocratic elution by a mobile phase consisting of acetonitrile and deionized water with 0.1 % phosphoric acid at a flow rate of 1.0 mL/min. The column temperature was 30 °C, and the detection wavelengths of glycyrrhizic acid, liquiritin and isoliquiritin were 250, 276 and 360 nm, respectively. A 10-μL aliquot of each extract solution was injected into the HPLC system for analysis.

UV methods
UV spectrophotometry was employed to determine the content of licorice total flavonoids, using an Agilent 8450 UV spectrophotometer with a quartz cuvette (Agilent Technologies, USA). The procedure was as follows. A 0.5-mL volume of 10 % KOH was used to prepare each diluted solution, and the reactions proceeded for 60 min in 5-mL volumetric flasks. The detection wavelength for licorice total flavonoids was 335 nm.

Software and data analysis
[16,17]. Additionally, the PLS, MWPLS and SiPLS models were evaluated according to chemometric indicators, all of which are based on the root-mean-square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{I}\sum_{i=1}^{I}\left(c_i - \hat{c}_i\right)^2}$$

where $c_i$ is the reference value of the Gancao extract determined by HPLC or UV analysis, $\hat{c}_i$ is the estimated value for the same sample, and $I$ is the number of samples in each set (calibration, cross-validation or prediction) [18,19].
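The abstract states that sample sets were selected with the Kennard–Stone algorithm, and the models are assessed with the RMSE defined above. Both steps can be sketched in NumPy as follows, assuming Euclidean distances between (pretreated) spectra; the function names are hypothetical and this is an illustration, not the authors' code.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Indices of n_select samples chosen by the Kennard-Stone algorithm.

    Starts with the two most distant spectra, then repeatedly adds the sample
    whose minimum Euclidean distance to the already selected set is largest.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must lie between 2 and the number of samples")
    sq = np.sum(X ** 2, axis=1)
    # Pairwise distances via the Gram matrix (avoids an n x n x p array)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -1.0  # never pick an already selected sample again
    while len(selected) < n_select:
        k = int(np.argmax(min_dist))
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[selected] = -1.0
    return np.array(selected)


def rmse(c_ref, c_hat):
    """RMSE = sqrt(sum_i (c_i - c_hat_i)^2 / I) over the I samples of one set."""
    c_ref = np.asarray(c_ref, dtype=float)
    c_hat = np.asarray(c_hat, dtype=float)
    if c_ref.shape != c_hat.shape:
        raise ValueError("reference and estimated values must have the same shape")
    return float(np.sqrt(np.mean((c_ref - c_hat) ** 2)))


# Toy example: 1-D "spectra" at 0, 1, 2, 3 and 10
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
print(kennard_stone(X_toy, 3))  # [0 4 3]

# Illustrative numbers only (not data from this study)
print(round(rmse([0.50, 0.80, 1.10], [0.52, 0.78, 1.13]), 4))  # 0.0238
```

In a typical workflow, the indices returned by `kennard_stone` form the calibration set and the remaining samples form the prediction set. The same `rmse` function then yields RMSEC, RMSECV or RMSEP depending on whether the estimates are fitted calibration values, cross-validated values or predictions for the independent set.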

Quantitative analysis of glycyrrhizic acid, liquiritin and isoliquiritin by HPLC
The reference values of the three compounds are given in Table S2 in Additional file 1. The calibration curves of glycyrrhizic acid, liquiritin and isoliquiritin showed good linearity (R² = 0.9990, 0.9995 and 0.9990, respectively), with linear ranges of 0.407-4.070 μg, 0.108-1.085 μg and 0.016-0.168 μg, respectively. The precision (intermediate precision and repeatability), stability and accuracy (recovery) of the method met the analytical requirements.

Quantitative analysis of total flavonoids by the UV method
The linear regression for licorice total flavonoids was y = 97.323x + 0.0413 (R² = 0.9992), with a linear range of 1.59-9.54 μg. The precision (intermediate precision and repeatability), stability and accuracy (recovery) of the UV method met the analytical requirements. The minimum, maximum and average concentrations of licorice total flavonoids were 0.044, 1.914 and 0.753 mg/mL, respectively.
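To show how such a calibration line is used, the sketch below inverts it to recover x from a measured response and computes R² for a straight-line fit. It assumes that y is the absorbance at 335 nm and that x is expressed in the units used when the curve was built; neither unit is stated explicitly above, so the mapping is illustrative. The same `r_squared` helper applies equally to the HPLC calibration curves.

```python
import numpy as np

# Reported UV calibration line for licorice total flavonoids: y = 97.323x + 0.0413
SLOPE, INTERCEPT = 97.323, 0.0413


def x_from_response(y):
    """Invert y = SLOPE * x + INTERCEPT to recover x from a measured response y.

    Assumption: y is the absorbance at 335 nm and x is in the (unstated) units
    of the original calibration.
    """
    return (np.asarray(y, dtype=float) - INTERCEPT) / SLOPE


def r_squared(x, y):
    """Coefficient of determination (R^2) of a least-squares straight-line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Round trip: a response generated from the line maps back to the same x
print(x_from_response(SLOPE * 0.005 + INTERCEPT))  # approximately 0.005
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))       # a poor fit, R^2 close to 0.64
```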

NIR spectral characteristics
The spectra fluctuated strongly in the 2000-2200 nm region because of a high noise level in the combination band region (Fig. 2). In addition, water absorbs intensely near 1950 nm [20,21]. Large signal variations in the 780-2100 nm region suggested that this region contained the main information on concentration. Furthermore, variables were selected by the MWPLS and SiPLS methods to obtain multivariate models.

Optimum result of NIR pretreatment methods and latent factors
The spectra were affected by spectral noise, baseline drift and overlapping peaks, so spectral pretreatment was applied before the models were established to improve their accuracy. Several pretreatment methods were compared for their ability to eliminate interfering information: the raw spectra, 11-point Savitzky-Golay smoothing with first derivative (SG + 1D), 11-point Savitzky-Golay smoothing with second derivative (SG + 2D), nine-point Savitzky-Golay (SG) smoothing and 11-point SG smoothing [22]. Standard normal variate (SNV) and multiplicative scatter correction (MSC) were applied to reduce the effect of small particles in the extraction solution [23]. Orthogonal signal correction (OSC) was applied to pretreat the complex system [24]. Normalization was also applied before the PLS models were established. Leave-one-out cross-validation was used both to select an appropriate pretreatment method and to determine the number of latent factors; the optimum number of latent factors corresponded to the lowest predicted residual sum of squares (PRESS) value [23]. Figure 3 shows the relationship between the number of latent factors and the PRESS value for the different pretreatment methods. OSC was found to be the best pretreatment (Table 1). Therefore, taking the evaluation parameters into account, the raw spectra were selected to establish the PLS model for each quality parameter. Based on the PLS results, the MWPLS and SiPLS algorithms were then compared to obtain a lower prediction error.
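The pretreatment and factor-selection steps can be sketched in NumPy as below. SNV and MSC are written in their common textbook forms; Savitzky-Golay filtering (available in SciPy as `scipy.signal.savgol_filter`) and OSC are not reproduced. The PLS model is a minimal single-response NIPALS implementation written only for this illustration, and the function names and synthetic spectra are hypothetical. For each candidate number of latent factors, leave-one-out PRESS is accumulated, and the minimum marks the chosen model size.

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative scatter correction against the mean (or a given) spectrum."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for i, row in enumerate(X):
        slope, offset = np.polyfit(ref, row, 1)  # fit row ~ offset + slope * ref
        out[i] = (row - offset) / slope
    return out


def pls1_fit(X, y, n_comp):
    """Single-response PLS by NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)              # weight vector
        t = E @ w                              # scores
        tt = t @ t
        p, q = E.T @ t / tt, (f @ t) / tt      # X loadings and y loading
        E, f = E - np.outer(t, p), f - q * t   # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, Q)


def pls1_predict(model, X):
    """Predict responses for the rows of X with a model from pls1_fit."""
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def loo_press(X, y, max_comp):
    """PRESS for 1..max_comp latent factors by leave-one-out cross-validation."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    press = np.zeros(max_comp)
    for i in range(n):
        train = np.arange(n) != i
        for a in range(1, max_comp + 1):
            model = pls1_fit(X[train], y[train], a)
            press[a - 1] += (y[i] - pls1_predict(model, X[i:i + 1])[0]) ** 2
    return press


# Synthetic spectra (illustration only): background + analyte band, distorted by
# random multiplicative and additive "scatter" effects plus a little noise.
rng = np.random.default_rng(0)
n, p = 30, 120
axis = np.linspace(0.0, 1.0, p)
background = 1.0 + 0.3 * np.sin(6.0 * axis)
band = np.exp(-(((axis - 0.4) / 0.05) ** 2))
conc = rng.uniform(0.1, 2.0, n)
X = rng.uniform(0.9, 1.1, (n, 1)) * (background + 0.1 * np.outer(conc, band))
X = X + rng.uniform(-0.05, 0.05, (n, 1)) + 0.002 * rng.standard_normal((n, p))

for name, Xp in [("raw", X), ("SNV", snv(X)), ("MSC", msc(X))]:
    press = loo_press(Xp, conc, max_comp=6)
    print(name, "optimal number of factors:", int(np.argmin(press)) + 1)
```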

Performance of the MWPLS model for the four compounds
The MWPLS method can be briefly described as selecting informative spectral regions and estimating the appropriate number of latent factors [13]. Different moving window sizes H were tested, and the RMSECV was calculated for each window size and each number of latent factors. An MWPLS model that performs better than the PLS model should have a lower RMSECV. For the four compounds in licorice, MWPLS models were established over the 800-2200 nm range, corresponding to 2800 data points, with H varied from 13 to 41. The moving windows were thus optimized in search of an RMSECV lower than that of the PLS model [29]. For glycyrrhizic acid, liquiritin and licorice total flavonoids, the RMSECV values were all higher than those of the full-spectrum PLS models, indicating that MWPLS models were inappropriate for these three ingredients (Fig. 4). For isoliquiritin, the lowest MWPLS RMSECV was obtained with H = 35; however, even this model did not perform better than the full-spectrum PLS model, which might be attributed to the low content of isoliquiritin.
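The window scan described above can be sketched as follows. This is a minimal illustration under stated assumptions: interleaved 5-fold cross-validation stands in for leave-one-out to keep it fast, a compact NIPALS PLS1 routine is included so the block runs on its own, and all names and the synthetic data are hypothetical. For each window start, the sketch records the lowest RMSECV over 1 to `max_comp` latent factors; scanning H from 13 to 41, as in the text, would be an outer loop over `window`.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Single-response PLS by NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p, q = E.T @ t / tt, (f @ t) / tt
        E, f = E - np.outer(t, p), f - q * t  # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, Q)


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def rmsecv(X, y, n_comp, n_folds=5):
    """RMSECV from k-fold cross-validation with interleaved folds."""
    n = len(y)
    sq_err = 0.0
    for k in range(n_folds):
        test = np.arange(k, n, n_folds)
        train = np.setdiff1d(np.arange(n), test)
        model = pls1_fit(X[train], y[train], n_comp)
        sq_err += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return float(np.sqrt(sq_err / n))


def mwpls(X, y, window, max_comp=5, n_folds=5):
    """Lowest RMSECV over 1..max_comp factors for every window start position."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n_windows = X.shape[1] - window + 1
    profile = np.empty(n_windows)
    for start in range(n_windows):
        Xw = X[:, start:start + window]  # the current window of variables
        profile[start] = min(rmsecv(Xw, y, a, n_folds)
                             for a in range(1, min(max_comp, window) + 1))
    return profile


# Synthetic data (illustration only): information sits near variable 70
rng = np.random.default_rng(1)
n, p, H = 40, 200, 15
conc = rng.uniform(0.1, 2.0, n)
band = np.exp(-(((np.arange(p) - 70) / 5.0) ** 2))
X = np.outer(conc, band) + 0.05 * rng.standard_normal((n, p))

profile = mwpls(X, conc, window=H)
start = int(np.argmin(profile))
full = min(rmsecv(X, conc, a) for a in range(1, 6))
print("best window:", start, "to", start + H - 1, "RMSECV:", round(profile[start], 3))
print("full-spectrum RMSECV:", round(full, 3))
```

A window is worth keeping only if its RMSECV falls below the full-spectrum value, which is the comparison made above for the four compounds.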

Performance of the SiPLS model for the four compounds
SiPLS was investigated as another variable selection method. The full spectrum was split into intervals, several intervals were combined into a joint model, and a PLS model was established for each joint model. The RMSECV value was taken as the measure of model accuracy, and the combination of subintervals giving an accurate joint model with a low RMSECV value was selected. For the extraction of APIs, the optimal parameters of the SiPLS model were taken from the literature [25]. Each optimal (Table 3 in Additional file 1). The SiPLS and PLS models performed similarly on the calibration sets of the four compounds in licorice, but on the prediction sets the SiPLS model performed better than the PLS model. SiPLS models were therefore established for the extraction of licorice.
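The interval search can be sketched as below. Because the optimal SiPLS parameters were taken from [25] and are not given here, the number of intervals (10) and the number of combined intervals (2) are hypothetical defaults. The sketch uses interleaved 5-fold cross-validation and includes a compact NIPALS PLS1 routine so it runs on its own; names and data are illustrative.

```python
import itertools

import numpy as np


def pls1_fit(X, y, n_comp):
    """Single-response PLS by NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p, q = E.T @ t / tt, (f @ t) / tt
        E, f = E - np.outer(t, p), f - q * t  # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, Q)


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def rmsecv(X, y, n_comp, n_folds=5):
    """RMSECV from k-fold cross-validation with interleaved folds."""
    n = len(y)
    sq_err = 0.0
    for k in range(n_folds):
        test = np.arange(k, n, n_folds)
        train = np.setdiff1d(np.arange(n), test)
        model = pls1_fit(X[train], y[train], n_comp)
        sq_err += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return float(np.sqrt(sq_err / n))


def sipls(X, y, n_intervals=10, n_combined=2, max_comp=5, n_folds=5):
    """Split variables into equal intervals, fit PLS on every combination of
    n_combined intervals, and return (RMSECV, intervals, factors) of the best."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    best_err, best_combo, best_comp = np.inf, None, None
    for combo in itertools.combinations(range(n_intervals), n_combined):
        cols = np.concatenate([intervals[k] for k in combo])
        for a in range(1, min(max_comp, cols.size) + 1):
            err = rmsecv(X[:, cols], y, a, n_folds)
            if err < best_err:
                best_err, best_combo, best_comp = err, combo, a
    return best_err, best_combo, best_comp


# Synthetic data (illustration only): the band lies in interval 2 (variables 40-59)
rng = np.random.default_rng(2)
n, p = 40, 200
conc = rng.uniform(0.1, 2.0, n)
band = np.exp(-(((np.arange(p) - 50) / 5.0) ** 2))
X = np.outer(conc, band) + 0.05 * rng.standard_normal((n, p))

err, combo, n_comp = sipls(X, conc)
print("intervals:", combo, "factors:", n_comp, "RMSECV:", round(err, 3))
```

Exhaustive enumeration grows quickly with the number of intervals and combined intervals, which is one reason SiPLS parameters are usually fixed in advance rather than searched freely.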

Performance of SiPLS models for the extraction of the four compounds
The SiPLS method was used to establish models of the extraction. The R² values for glycyrrhizic acid, liquiritin and licorice total flavonoids mostly exceeded 0.98, indicating good model accuracy. The RMSEC, RMSECV and RMSEP were less than 0.07 for all four ingredients. Figure 6 presents the calibration and prediction regressions for each SiPLS model; the predicted values agreed closely with the reference values. For isoliquiritin, however, R² was about 0.93, which can be attributed to the low content of isoliquiritin and the high detection limit of NIR technology.

SiPLS model assessment by relative errors and the F-values
The relative errors and F values were further used to determine the predictive ability of the SiPLS models and to verify the reliability of the online NIR models in the extraction process for licorice. Results for the four ingredients in the different extraction phases of licorice are shown in Table 2. Because the contents of the four compounds (glycyrrhizic acid, liquiritin, isoliquiritin and total flavonoids) differed, and 93 samples were selected by the KS algorithm for each compound, the number of samples per compound differed within the same phase. Although some samples could not be detected by the HPLC and UV analyses, all results except those for the third extraction and for isoliquiritin met the analytical requirements. The mean relative error of the third extraction phase was higher than those of the first and second extraction phases, and within the same extraction phase the relative error of isoliquiritin was higher than those of the other ingredients. These results could be attributed to the low concentrations (micro analysis) in the third extraction and of isoliquiritin.
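The per-phase comparison can be sketched as follows, assuming the relative error of each sample is |ĉ_i - c_i| / c_i × 100 % and that samples without a positive reference value (for example, below the HPLC or UV detection limit) are skipped. The numbers are hypothetical, chosen to show why a low-concentration phase inflates the relative error even when the absolute error is similar.

```python
import numpy as np


def mean_relative_error_by_phase(c_ref, c_pred, phase):
    """Mean relative error (%) of NIR predictions for each extraction phase label."""
    c_ref = np.asarray(c_ref, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    phase = np.asarray(phase)
    result = {}
    for label in np.unique(phase):
        # Skip samples whose reference value is missing (NaN) or non-positive
        mask = (phase == label) & (c_ref > 0)
        if mask.any():
            rel = np.abs(c_pred[mask] - c_ref[mask]) / c_ref[mask] * 100.0
            result[str(label)] = float(rel.mean())
    return result


# Illustrative values only: absolute errors of about 0.02 in every phase
c_ref = [1.00, 1.20, 0.40, 0.50, 0.10, 0.08]
c_pred = [1.02, 1.18, 0.42, 0.48, 0.12, 0.10]
phase = ["first", "first", "second", "second", "third", "third"]
print(mean_relative_error_by_phase(c_ref, c_pred, phase))
# roughly {'first': 1.83, 'second': 4.5, 'third': 22.5}
```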
In addition, the NIR and reference methods were compared using an F test [26]. The F values of glycyrrhizic acid, liquiritin, isoliquiritin and total flavonoids were 10,765, 32,431, 649 and 6080, respectively (P < 0.01). According to the F distribution table, the critical F value for a significance level α = 0.01 and n = 93 samples is 6.90. The F values of all four compounds were much higher than 6.90 (P < 0.01), indicating a significant relationship between the predicted and reference values. Furthermore, multivariate detection limit (MDL) values, which depend on the types of error and the concentration ranges, were used to evaluate the models [27]. The MDL was almost 14 ppm.
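The exact form of the F test is not spelled out here [26]. A common choice, assumed in this sketch, is the analysis-of-variance F test for the straight-line regression of NIR-predicted values on reference values, with 1 and n - 2 degrees of freedom. If SciPy is available, `scipy.stats.f.ppf(0.99, 1, n - 2)` returns the critical value, which for n = 93 is close to the tabulated 6.90. The data below are simulated, not results from this study.

```python
import numpy as np


def regression_f_value(c_ref, c_pred):
    """F statistic for the straight-line regression of predicted on reference values.

    F = (SS_regression / 1) / (SS_residual / (n - 2)); it is compared with the
    critical value of F(1, n - 2) at the chosen significance level.
    """
    x = np.asarray(c_ref, dtype=float)
    y = np.asarray(c_pred, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("at least three samples are needed")
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_reg = np.sum((fitted - y.mean()) ** 2)  # variation explained by the line
    ss_res = np.sum((y - fitted) ** 2)         # residual variation
    return float(ss_reg / (ss_res / (n - 2)))


F_CRIT = 6.90  # tabulated critical value quoted in the text (alpha = 0.01, n = 93)

# Small, noisy toy example
print(round(regression_f_value([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # 3.556

# Simulated predictions close to simulated reference values for 93 samples
rng = np.random.default_rng(3)
ref = rng.uniform(0.05, 2.0, 93)
pred = ref + 0.03 * rng.standard_normal(93)
print(regression_f_value(ref, pred) > F_CRIT)  # True
```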

Conclusion
The study demonstrated the feasibility of online NIR analysis in the multi-ingredient and multi-phase extraction of APIs from licorice.

Additional file
Additional file 1.