Predictive QSAR model confirms flavonoids in Chinese medicine can activate voltage-gated calcium (CaV) channel in osteogenesis

Background Flavonoids in Chinese Medicine have been proven in animal studies that could aid in osteogenesis and bone formation. However, there is no consented mechanism for how these phytochemicals action on the bone-forming osteoblasts, and henceforth the prediction model of chemical screening for this specific biochemical function has not been established. The purpose of this study was to develop a novel selection and effective approach of flavonoids on the prediction of bone-forming ability via osteoblastic voltage-gated calcium (CaV) activation and inhibition using molecular modelling technique. Method Quantitative structure–activity relationship (QSAR) in supervised maching-learning approach is applied in this study to predict the behavioral manifestations of flavonoids in the CaV channels, and developing statistical correlation between the biochemical features and the behavioral manifestations of 24 compounds (Training set: Kaempferol, Taxifolin, Daidzein, Morin, Scutellarein, Quercetin, Apigenin, Myricetin, Tamarixetin, Rutin, Genistein, 5,7,2′-Trihydroxyflavone, Baicalein, Luteolin, Galangin, Chrysin, Isorhamnetin, Naringin, 3-Methyl galangin, Resokaempferol; test set: 5-Hydroxyflavone, 3,6,4′-Trihydroxyflavone, 3,4′-Dihydroxyflavone and Naringenin). Based on statistical algorithm, QSAR provides a reasonable basis for establishing a predictive correlation model by a variety of molecular descriptors that are able to identify as well as analyse the biochemical features of flavonoids that engaged in activating or inhibiting the CaV channels for osteoblasts. Results The model has shown these flavonoids have high activating effects on CaV channel for osteogenesis. In addition, scutellarein was ranked the highest among the screened flavonoids, and other lower ranked compounds, such as daidzein, quercetin, genistein and naringin, have shown the same descending order as previous animal studies. Conclusion This predictive modelling study has confirmed and validated the biochemical activity of the flavonoids in the osteoblastic CaV activation.

viticis and Perilla frutescens, various flavones such as apigenin, isovitexin, luteolin and vitexin could be obtained [3,4]. Favones is another class of flavonoids that could be found easily in Radix scutellariae and Cuscuta chinensis. For example, naringenin can reduce cholesterol levels [5], hesperidin can reduce inflammation via its suppression pathways of lipopolysaccharide (LPS)-elicited and infection-induced Tumor necrosis factor alpha (TNF-α) production [6], and naringin can be used in bone graft material to induce osteogenesis [7]. Flavan-3-ols include the catechins and the catechin gallates. The major compounds are catechin, epicatechin, catechin gallate and epicatechin gallate which are the active components of green tea leaves (Camellia sinensis) and have antimutagenic, antitumour, anti-inflammatory and free-radical scavenging activities [8]. Isoflavones has its main sources in soy cheese, soy flour, soybean and tofu, etc. Daidzein and genistein are among several known isoflavones [9].
Some flavonoids, such as daidzein [10], quercetin [11], genistein [12] and naringin [7], have been proven in animal studies that could aid in osteogenesis and bone formation. However, there is no consented mechanism for how these chemicals action on the bone-forming osteoblasts. Some evidences have shown bone resorption [13], cell proliferation [14] and cell signal transport [15] were related to activation of osteoblastic calcium channels. In particular, L-type calcium channels (e.g. CaV1.2) could mediate the change of Ca 2+ inside the osteoblasts by some regulatory agents such as parathyroid hormone (PTH) [16] and vitamin D [17]. However, study also showed the inhibition of the channels might also promote osteoblast differentiation [18]. On the other hand, Saponara et al. [19] has recently shown in the whole-cell patchclamp experiments that 24 flavonoids are either activators or deactivators of CaV1.2 channel current measured in artery myocytes of rat tail. Thus, it seems to that the actual physiological mechanisms are unclear [20].
To establish the correlation for the flavonoids with known biochemical activity of being a blocker or activators of Ca 2+ channels, quantitative structure-activity relationship (QSAR) modeling might be useful to identify and screen the flavonoids, since QSAR could predict biochemical activities for new or untested flavonoids of the same class via selected molecular characteristics (descriptors) that has correlation with the biochemical activity of activating or blocking CaV channels. In fact, QSAR has been used in the fields of medicine, biochemistry, molecular biology and biomaterial science for more than three decades. QSAR is particular useful in screening and predicting the biochemical interaction between, for example, enzyme and complex phytocompounds [21]. Recently, predictive QSAR can be operated in either small, focused and good biochemical data (so-called supervised machine learning approach) that can easily map the biological responses with input feature parameters [22], or utilizing on big enough data for "black box" (so-called deep learning or descriptor-free approach) [23] that can assist in drug design even the chemicals are not existed [24].
Thus, to consider the specific effects of flavonoids on CaV channel current kinetics, a supervised machine learning QSAR model can be built based on chemical structures together with suitable biochemical data (molecular descriptors), such as pKa and patch-clamp experiment data, of the flavonoids to provide a better understanding of the structure-activity relationship between the compounds and CaV. These descriptors have been selectively chosen for the flavonoids since there have been increasing amount of molecular descriptors represented by quantum-chemical and various classical parameters that were designed and tested as potential variables for QSAR modeling. Quantum-chemical parameters represent a special class of molecular properties. They can be obtained from sophisticated ab initio calculations or by means of relatively inexpensive semi-empirical methods, but in the case of flavonoids, such calculations require more time and effort than those for one, two or three-dimensional classical parameters which can be computed from molecular structures of flavonoids within a few minutes. However, in contrast to most classical descriptors, quantum-chemical parameters are capable of expressing all the electronic and geometric properties of the flavonoids being analyzed as well as their interactions [25]. Therefore, the interpretation of quantum-chemical descriptors can provide much deeper insights into the nature of flavonoids' biochemical and physicochemical mechanisms than that of classical descriptors. The advantages of descriptors calculated by means of quantum-chemical approaches that account for specific and non-specific solvation effects are of prime importance. This paper focuses on QSAR studies by applying the quantum descriptors based on the current literatures' biochemical data in predicting the effects of flavonoids' biochemical activities on the osteoblast's CaV channel current in order to demonstrate its important biochemical and physicochemical properties on either activating or deactivating the CaV channel for osteogenesis. As such, we anticipate a novel selection and effective approach of flavonoids for CaV activation and inhibition could be developed.

Materials and methods
QSAR equation modeling attempted to calculate the mathematical correlations between the tested compounds' chemical attributes and its biochemical response. Such attempt aimed to establish statistical formalism that was indicated as biochemical response of flavonoid = f (flavonoids' biochemical attributes). The flavonoids' biochemical attributes and property were derived from the flavonoid's chemical structure and property. Hence, the QSAR equation was expressed as: To be stated in a simple way, the QSAR equation could be statistically expressed as: where β 0 was a constant; β 1, β 2, β 3, . . . β n were the inputs of descriptors; X 1, X 2, X 3, . . . X n were different flavonoids' structural features; and Y was biochemical response.

Steps of QSAR modeling
The four basic steps of QSAR study included (i) data preparation, (ii) data processing, (iii) data prediction and validation, and (iv) data interpretation. The first step was allowed to arrange the data in a convenient and usable form. Since biochemical responses of the flavonoids on CaV channel were considered as the dependent variable, the input data were the flavonoids' rate of activation and inhibition that were retrieved from Saponara et al. [19]. The predictor variables (i.e. molecular descriptors) could be obtained from chemical structure and property of the flavonoids. After the determination and computation of descriptors, a QSAR table was formed that was a two-dimensional (2D) array of numbers with the columns representing descriptors and response and compounds were depicted in successive rows. As QSAR was basically a statistical approach, the number of observations was higher than the number of descriptors used in the final models for achieving sufficient modeling reliability and robustness. By considering the presence of intercorrelated and redundant data, a pretreatment procedure was also used in the data-processing step.
In each step of the QSAR model development, several statistical operations were involved right from the generation of descriptors which were encoding of information to the pretreatment of data, classification of the data set, development of model, validation and reliability check of the model. Although the partial least squares (PLS) and multiple linear regression (MLR) were common statistical tools to develop QSAR models with genetic algorithm (GA) serving as variable selection methods, these techniques might be inappropriate if X ij is highly correlated or high dimensional, especially in comparison to sample size that might cause variable selection procedures to be unstable.
In these cases, it could find the way to reduce the amount of covariate information because the main focus was on future prediction.
Biochemical response = f (flavonoid's chemical structure and property).
Instead, ξ j was the principle components (PCs) of X i : Then, the principle components regression (PCR) model was developed for some p ′ < p.
PCR had the advantages that α ij were uncorrelated to strengthen the stability of estimates, and established stable variable selection through dimension reduction. By choosing p ′ , enough PCs could be used to do variable selection, capture higher percentage of variation, and maximize adjusted r 2 . The basic workflow of QSAR analysis along with the principal component regression (PCR) was depicted in Fig. 1. On successful runs of Principal Component Regression (PCR) by the QSAR Module of the VLifeMDS 4.3 software (VLife Technologies, Pune, India), the QSAR equations were generated to statistically analyze and determine the model.

Computation of molecular descriptors
First, the QSAR software was applied to align the 3D structural data of the flavonoids as shown in Fig. 2 in order to investigate the variation of molecular shape of each molecule. There had been 20 flavonoids to be included in the training set for deriving QSAR model whereas the chemical structures classified by subclass of flavonoids are shown in Table 1 and the acid dissociation constant (pKa) (i.e. − log10 Ka) values of the hydroxyl groups are shown in Fig. 2. Moreover, the data of the rate of activation and inhibition from Saponara et al. [19] are supplemented as the dataset in which the current evoked at 0 mV from a V h of − 50 mV activated and inhibited with τ of activation ranging between 2.2 and 3.1 ms, and τ of inactivation between 92.0 and 127.9 ms are shown in Table 2. Molecular descriptors, which characterized specific information about a flavonoid, were the numerical value affiliated with the biochemical response for correlation of chemical structures with various biochemical properties. In other words, the modeled response was represented as a function of quantitative values of structural features or properties that were termed as descriptors for the QSAR model. Ab initio derived electronic properties in combination with topological quantumchemical descriptors (i.e. "k2alpha", "Id", "IdwAverage", "Most+vePotential", "MomInertiaY" and "DeltaEpsi-lonC") were used to help to describe the electronic environment of the flavonoids and locate molecular regions responsible for given bioactivity of flavonoids on the CaV channel [26]. The type of descriptors used and the extent to which they could encode the structural features of the molecules that were correlated to the response were critical determinants of the quality of the QSAR model. The ways of chemical structures used to calculate descriptors for QSAR model were illustrated in Fig. 3. The data set of flavonoids constituted a group of small polyphenol compounds which can both block and enhance Ca 2+ current. Firstly, the half maximal activiting/inhibitory concentration (IC 50 ) was regarded as the activatory/inhibitory activity values. Then, [IC 50 (μM)] that was referred as the activity data was transformed into the logarithmic scale pIC 50 , i.e. [− log IC 50 (μM)], that had been applied as the response variables to obtain the linear relationship in the QSAR equation. Secondly, the biochemical database of the study was randomly classified into two subsets that include 4 compounds of test-set and 20 compounds training-set (Table 3). Thirdly, the molecular descriptors were computed by the docking QSAR software for different types of theoretical descriptors for each flavonoid. Finally, two models, namely model A for CaV activation and model B for CaV inhibition, were generated by PCR after it screened for different combinations of descriptors by genetic algorithm.

Validation on QSAR models
Although there are no confirmatory experiments performed to validate simulation results, these in silico data could be confirmed by other QSAR methods such as comparative molecular field analysis and support vector machine. Validation for QSAR model was done based on the flavonoids for detecting the precision of predictions. The leave-one-out cross validation technique was mainly involved in validating the sample (n = 24). For checking reliability of a QSAR model for prediction of the response property on the data, the original data set was classified into 6 subsets with each of size 4. The validation process was repeated by using 5 subsets as training set and the rest 4 as testing set. Training set was employed for model development while the ability of the model to predict response value of the flavonoids was done using the testing set. The developed models were subjected to statistical validation tests to establish its reliability. Steps of validation methods were indicated in Fig. 4. The following metrics for determination of QSAR quality, as well as internal and external validation were used:  Leave-one-out (LOO) cross-validation The r 2 m metric for internal validation ii. Metrics for external validation . Root mean square error in prediction (RMSEP) iii. Molecular descriptor "k2alpha" is descriptor indicating second order kappa alpha shape index ( 2 kα or k2alpha): "Id" and "IdwAverage" that are the type of information theory based descriptors on distance equality, whereas the total information content on the distance equality (Id): where f is the number of distances with equal g values in the triangular D submatrix. D is an A × A matrix that contains the graph distances between atoms. The graph distances are calculated as 1/(the number of bonds between atoms) 2 The mean information content on the distance equality (IdwAverage): .

Results
As listed in Table 4, for Model A, the QSAR model not only had internal predictive ability ( q 2 = 6.93%) and external predictive ability ( r 2 pred = 95.86%), but also could explain 31.82% of the total variance ( r 2 = 0.3182) in the training database. The F-test = 3.9664 showed that the Model A was statistically significant with p-value < 0.001 for which it indicated the model had less than 0.1% probability of making an error. For Model B, although the QSAR model had only external predictive ability ( r 2 pred = 52.27%) and could only explains 8.45% of the total variance ( r 2 = 0.0845) in the training dataset, its internal predictive ability ( q 2 = 11.48%) was relatively higher than that of Model A. Similarly, the F-test = 1.5682 also indicated the Model B was statistically significant with p < 0.001 which meant that the Model B's likelihood of committing an error was less than 5%. Therefore, both Model A and B were justified for their internal and external predictive ability. Tables 5 and 6 indicate the observed values and the predicted values of the activation (Model A) and inhibition (Model B) activity of flavonoids on CaV channel, respectively. Figure 5a shows the goodness of fit graph of observed activity and predicted activity of the flavonoids activating on CaV channel. Moreover, it indicated how good the actual training dataset could be fitted by the predicted PCR equation. From the Radar plots as shown in Fig. 5b, c, the fitted PCR equation of the training data set could be predicted well by the test data set. Hence, the predictive ability of Model A could be confirmed. On the other hand, Fig. 6a indicates that the fitness plot of observed activity and predicted activity of the flavonoids inhibiting on CaV channel. In addition, it showed how well the actual training dataset could be fits by the predicted PCR equation. From Radar plots in Fig. 6b, c, it was also revealed that the fitted PCR equation of the training data set could be predicted well by the test data set. Therefore, the predictive ability of Model B could also be confirmed. Table 7 shows the ranking the order of activation of flavonoids on the CaV channels such that the descending order of flavonoids activation on the CaV channels are: scutellarein > morin > daidzein > myricetin > apigenin > quercetin > (±)-taxifolin > 5,7,2′-trihydroxyflavone > genistein and so on.

Discussion
The results indicated in Table 7 has shown the molecular structures, contribution graphs and parameters of selected descriptors of flavonoids as predicted in the CaV activation model with QSAR equations (model A) The results were surprisingly consistent with the order of the flavonoids ranked by the percentage of increase in new bone formation, i.e. daidzein (602%) > quercetin (556%) > genistein (520%) > naringin (490%) as reported in series of in vivo animal studies by Wong et al. [7,[10][11][12]. This partial external validation, despite not a full set of comparison, gives a good guarantee for the present predictive supervised machine-learning QSAR model that can precisely ranks and predicts the flavonoids effects on osteogenesis based on the existing biochemical information with CaV [19].
Activation and inhibition activities of flavonoids were investigated based on their ability to maintain the balance between activation and inactivation in the CaV by binding to the beta subunit receptor of CaV. In this case of activation, it was indicated that activating activity is mainly the outcome of electronic interactions between atomic charges within flavonoids and possible receptorlike structures in the CaV. In the case of inhibition, it was shown that the binding affinities of selected flavonoids to the CaV receptor are highly dependent on physicochemical properties involved in the interactions. As such, statistically the Model A presented the justified characteristics of molecular structure of flavonoid compounds that were required for activating CaV channel in which electrostatic fields were estimated using "k2alpha" that was descriptor signifying second alpha modified shape index, and "Id" and "IdwAverage" that were the type of information theory based descriptors. On the contrary, the Model B showed different structural features that were required for flavonoids to inhibit CaV channel. Its electrostatic fields were estimated using "Most+vePotential" that was descriptor indicating the highest value of positive electrostatic potential on the van der Waals surface area of the flavonoids, "MomInertiaY" that was steric descriptor signifying moment of inertia at Y-axis, and "DeltaEpsi-lonC" that was descriptor for electronegativity signifying differences between the frontier molecular orbital energies. Hence, from the six descriptors selected for the  Fig. 4 Steps of validation methods for the QSAR model principal component regression model, one was related to the electronic (i.e. "Most+vePotential") or two were related to the physicochemical ("MomInertiaY" and "DeltaEpsilonC") properties of the whole molecules and three ("k2alpha", "Id", "IdwAverage") described electronic properties of individual atoms. All these selected descriptors correspond to the analogous behavior in terms of tendency that was being also observed by the QSAR analysis on the CaV channel to be activated and inhibited by the flavonoids. Furthermore, the QSAR model was chosen in accordance with the parameter estimates of r 2 , q 2 , r 2 pred , F-stat and p-value. Since the r 2 value = 0.3182 of Model A was higher than the Model B's r 2 value = 0.0845, and the q 2 value = 0.0693 of Model A was relatively lower than the Table 4 The QSAR model with the corresponding parameters of estimates r 2 , determination coefficient; q 2 , internal predictive ability; r 2 pred , external predictive ability; se, standard error; pIC 50 , concentration of flavonoids in logarithm scale required for 50% activation/inhibition of CaV channel activity   Model B's q 2 value = 0.1148, and r 2 pred = 0.9586 of Model A was higher than the Model B's r 2 pred = 0.5227, Model A had justified values for being selected to be the better QSAR model to support the argument that CaV channel was more likely to be activated rather than being inhibited by the flavonoids. From this point, the QSAR approach pursues its objective of understanding the biochemical effects of the flavonoids on the CaV channel and providing practical suggestions for screening optimal flavonoids have been demonstrated in the investigations of its activating and inhibitory activity on CaV channel. Moreover, analogous behavior in terms of selected descriptors tendency was also observed by the QSAR analysis on the CaV channel to be activated and inhibited by the flavonoids. Activation and inhibition activity of flavonoids was investigated based on their ability to maintain the balance between activation and inactivation in the CaV by binding to the beta subunit receptor of CaV. In this case of activation, it was indicated that activating activity is mainly the outcome of electronic interactions between atomic charges within flavonoids and possible receptor-like structures in the CaV. In the case of inhibition, it was shown that the binding affinities of selected flavonoids to the CaV receptor are highly dependent on physicochemical properties involved in the interactions. Apparently, the electronic properties of the flavonoids were found to be significant after exploring the entire pool of the classical and electronic variables for screening a QSAR model which has thousands of parameters available from experiment and in silico calculations that could potentially serve as independent variables (descriptors) in statistical analysis. However, it has also been known from fitting this QSAR study that utilization of an excessive number of descriptors leads to over-fitting of QSAR models and/or increases the risk of chance correlations. Despite the existence of rules for building successful and meaningful QSAR models, the increasing complexity of biochemical mechanisms the flavonoids on the CaV creates the need for considering a large variety of variables that makes the knowledge-based approach to the identification of the most significant descriptors for this particular case of investigating flavonoids' biochemical activities on the CaV channel extremely difficult. Therefore, this is the main reason to apply PCR to perform reduction of data by generating linear combinations of molecular descriptors [27]. The PCR method identifies correlated variables, groups them into linear combinations, and generates uncorrelated orthogonal variables that are uncorrelated and called principal components. The process of data transformation is given by X = TP T , where X represents the initial data matrix, T is a score matrix that defines the position of data points in a new coordinate system and P is a loadings matrix. The loadings indicate how much each original descriptor contributes to the corresponding PC. Scores and loadings allow the data points to be mapped into the new vector space defined by PCs [28].
The correlation between independent and dependent variables could statistically be determined to fit a PCR line to the data so as to obtain a best-fit equation. Then, the goodness of fit for a PCR equation was estimated by referring to its standard deviation and correlational coefficient in which the level of statistical significance of the PCR equation was represented by the F-statistics with its corresponding p-value. By applying PCR analysis, enough PCs could be used to do variable selection by choosing p-value in order to maximize adjusted r 2 . We notice that the limitation of this study could be only ascribable to 24 compounds, and the predictive power could be less using this small dataset. Indeed, various QSAR studies [29][30][31] have used 10-25 compounds to generate the predictive model that seems to be quite successful. On the other hand, regarding to the results, although it is generally recommended that r 2 should be > 0.7, and q 2 should be > 0.5, these are not stringent guidelines. The predictive power of QSAR should not solely rely on r 2 and q 2 [32]. In particular, we have also cross-compared with the in vivo animal studies (Table 7) for daidzein, quercetin, genistein and naringin [7,[10][11][12]. If necessary, further time-consuming in vivo studies could be done in order to completely validate the model.
In this study the combination of electronic and physicochemical descriptors helped to identify molecular shape, hydrophobicity and electronic properties as three major factors responsible for these types of activation and inhibition activity of flavonoids on the CaV. The use of QSAR in screening the bioactive flavonoids for tissue engineering applications is relatively new. The success of this QSAR modeling in the accurate determination of electronic properties of biochemically significant flavonoids may initiate QSAR studies in tissue engineering that focus specifically on the exploration of bioactive growth factors for cells. QSAR provides an invaluable tool for calculating quantum-chemical descriptors that demonstrate high potential in generating predictive QSAR models without the addition of a large number of descriptors for various groups of growth factors [33,34]. Since osteogenesis is a very dynamic process, other factors that are related to CaV channel such as runx2 activation, ALP secretion, osteocalcin level, angiogenesis and mineralization can also be incorporated if appropriate Table 7 The order of the flavonoids ranked by relative contribution of individual descriptors using pIC 50  in the future study. Nevertheless, cautious should be taken into account for QSAR studies because it is only an approximating method. When many physicochemical properties are involved, it is not always possible to vary one property without affecting another. Moreover, it does not provide an in-depth insight on the mechanism of biological action of flavonoids. Also, there may be some risk of inaccurate predictions of biological activity of this type of flavonoid compounds. Through the QSAR study, we have established a predictive molecular modeling method that allows one to estimate the properties of flavonoids as bioactive compounds at a much lower cost and environmentally-friendly than that of actual laboratory screening. Since both the model's predictive ability and the scientific insights into biochemical activity in the CaV depend on the descriptors selected in the modeling process, this study has indicated that the use of quantum-chemical descriptors under supervised machine-learning has an obvious advantage over other experimentally measured properties. Since they are reproducible in the framework of the chosen approximation, they allow meaningful interpretation of QSAR models in terms of the biochemical mechanism of flavonoids as activator of the CaV. Thus, it can offer a clear guidance for molecule optimization or design of flavonoids as a growth factor for osteogenesis.

Conclusions
This predictive QSAR study confirmed and validated the biochemical activity of the flavonoids in the CaV, such that flavonoids can activate CaV in osteogenesis. Scutellarein was predicted to rank the highest among the screened flavonoids.