Computerized tongue image segmentation via the double geo-vector flow

Background Visual inspection for tongue analysis is a diagnostic method in traditional Chinese medicine (TCM). Owing to the variations in tongue features, such as color, texture, coating, and shape, it is difficult to precisely extract the tongue region in images. This study aims to quantitatively evaluate tongue diagnosis via automatic tongue segmentation. Methods Experiments were conducted using a clinical image dataset provided by the Laboratory of Traditional Medical Syndromes, Shanghai University of TCM. First, a clinical tongue image was refined by a saliency window. Second, we initialized the tongue area as the upper binary part and lower level set matrix. Third, a double geo-vector flow (DGF) was proposed to detect the tongue edge and segment the tongue region in the image, such that the geodesic flow was evaluated in the lower part, and the geo-gradient vector flow was evaluated in the upper part. Results The performance of the DGF was evaluated using 100 images. The DGF exhibited better results compared with other representative studies, with its true-positive volume fraction reaching 98.5%, its false-positive volume fraction being 1.51%, and its false-negative volume fraction being 1.42%. The errors between the proposed automatic segmentation results and manual contours were 0.29 and 1.43% in terms of the standard boundary error metrics of Hausdorff distance and mean distance, respectively. Conclusions By analyzing the time complexity of the DGF and evaluating its performance via standard boundary and area error metrics, we have shown both efficiency and effectiveness of the DGF for automatic tongue image segmentation.


Background
Simple, non-invasive, and inexpensive visual inspection of the human tongue has been a unique diagnostic method of traditional Chinese medicine (TCM) [1], through observing any abnormal changes in the tongue properties and coating. Clinical studies have suggested relationships between visceral cancers, heart diseases, and abnormalities of the tongue and its coating [2][3][4]. However, the current practice in TCM is mainly subjective, and the quality of the visual inspection varies among medical professionals. Thus, it is beneficial to devise objective quantitative evaluation methods for the color, texture, and surface of the tongue and define their relationships with patients' health conditions [5,6].
Recently, the development of an automated digital tongue diagnostic system was attempted. Chiu [7] built a computerized tongue examination system, in which the colors of the tongue and the thickness of its coating were identified by a proposed chromatic algorithm. Cai [8] and Li and Cai [9] constructed a digital imaging system to capture tongue images and extract various features for tongue analysis. Zhang et al. [10] demonstrated a novel computer-aided tongue diagnosis system, in which a standard data acquisition device and a new color correction method were used to capture tongue images. Some other computerized systems, such as the tongue computing model, computerized system for tongue diagnosis, and automatic tongue diagnosis system, were also built in [11][12][13], of which Gao et al. [12] established a mapping relationship between quantitative tongue features and diseases via the support vector machine and obtained promising performances.
A general computerized tongue diagnostic system is shown in Figure 1. This system mainly executes four tasks: tongue image acquisition; tongue image segmentation; tongue feature extraction; and tongue image classification. Figure 1 shows the tongue body segmentation as a bridge that combines the acquisition of the tongue image with the feature extraction. Tongue features, such as color, texture, coating, and shape, create difficulties for accurate segmentation of tongue images. Recent studies have mainly focused on the active contour model (ACM), or "Snake", which evolves a curve by minimizing a specifically defined energy function [14]. The curve is initialized manually or automatically, and propagated to the real boundary on both the internal and external forces [15].
Studies using the ACM have included the general gradient vector flow [16], double snake [17], level set [18], and C 2 G 2 FSnake [19]. Owing to the special characteristics of the tongue color, different color channels have been embedded into these segmentation methods, such as region growing [20], fuzzy C-means [21], and shortest path [22].
In our previous work [19], C 2 G 2 FSnake was proposed as a novel approach for automatic tongue segmentation, using prior location of the tongue body for active contour initialization. It embedded the color space information into the active contour evolution, and the segmentation precision was thus enhanced. Nevertheless, there remained difficulties associated with background removal in clinical images ( Figure 2) and parameter selection.
Regarding the background removal, a saliency object detector [23] was adopted to refine the clinical tongue image. This was an unsupervised method that was easier to implement compared with the method of Fu et al. [15]. For the parameter selection, a novel computerized tongue image segmentation method named the double geo-vector flow (DGF) was proposed based on the unique shape of the tongue body and its location on the face. The saliency window was refined and the active contour was initialized for the upper and under parts accordingly. The geodesic flow first propagated the under part, the geo-gradient vector flow (Geo-GVF) then took over the work, and the upper curve was evolved to the real boundary. Based on its ability to split, the geodesic flow showed superior flexibility compared with the parameterized model used by Shi et al. [19], and the geometric term in the gradient vector flow (GVF) speeded up the curve propagation for the upper part and maintained its accuracy.
The tongue images were captured in a closed darkroom that shielded against light disturbance from the outside, and the position of the camera was flexible and adjustable, thus avoiding color distortion caused by light interference. A clinical tongue image database has already been collected by Shanghai University of TCM.
This study aims to devise an objective approach and quantitative model to evaluate automatic tongue segmentation in a computerized tongue diagnostic system.

Methods
In this study, an active contour-based tongue image segmentation method was proposed. First, the tongue image was refined by a saliency window, and the active contour was initialized by taking advantage of the prior knowledge of the tongue image. Second, the tongue area was initialized into two parts: the binary upper part and the level set matrix for the under part. Third, the DGF was proposed to detect the tongue edge and segment the tongue region in the image, such that the geodesic flow was evaluated in the under part, and the Geo-GVF was evaluated in the upper part.

Tongue region detection
Before the initialization of the active contour, the tongue region in the clinical image was detected. We used the saliency object detector proposed by Feng et al. [23]. This detector used the entire image as the context, which agreed with human intuition. The tongue body was the deep red region in the tongue image, which corresponded to the brightest region in the saliency map ( Figure 2, bottom row). The saliency window composition cost function was defined in Feng et al. [23]. In particular, the algorithm detected two windows with the highest saliencies, and the one with the smaller size was adopted for the tongue body segmentation.

Contour initialization
Based on the refined tongue image, the active contour was initialized depending on the extracted feature points on the image [19]. As illustrated in Figure 3, the under half contour of the first image was constructed based on three special points: two angular points and the tongue tip. The upper half contour was replaced by straight lines, and another point (tongue root) was added to guarantee that the contour was inside the tongue body. The two angular points were automatically detected by the Harris detector while the tongue tip and root were located by edge detectors. The geodesic curve would shrink to the real boundary when the selected tip point was below the real tongue tip. Once the direction was guaranteed, the curve would not stop until the real boundary was reached.
For some tongue images, part of the patient's nose might be included in the refined window ( Figure 2, third image), thereby influencing the propagation of the proposed Geo-GVF. To solve this problem, we obtained the area of the dark region between the tongue root and the upper lip in the binary image ( Figure 3), and assumed that the thickness of the upper lip (its height in the image) was at most twice that of the dark region. We then cut off the additional region above the upper lip. Although this refinement may not be accurate, we focused on the extraction of the tongue body rather than other parts.
Overall, owing to the robust ability of the DGF, only the rough locations of these four points were required to confirm that the direction of the initial curve was maintained during propagation. For the upper half contour, it swelled, while for the under part, it shrank. Any inaccuracy caused by the window detection or contour initialization could be overcome as long as the propagating direction was guaranteed.

Region initialization
We divided the whole region into two parts according to the row of angular points ( Figure 3, fifth image). In the upper part, the curve over-learned or became stuck at some local minimum points, and we adopted the binary image and solved the problem completely. For the under Figure 3 Tongue image initialization. From left to right: first, the four special tongue points corresponding to the first image in Figure 2; second, rescaling of the refined tongue image depending on the special tongue points; third, contour initialization; fourth, the level set function (matrix) is initialized for the under part of the tongue image with a negative constant (ρ > 0) inside the contour (red region) and a positive value outside the contour; fifth, image denoting the region initialization with the binary image for the upper part. The geodesic model evolves the curve through the propagation of the level set function, which was introduced by Osher and Sethian [24]. The initial contour is represented by a zero level set function: The curve propagation equation is: where F is the speed function, which depends on the image data and the level set function (Φ). In the geodesic model, the level set function is approximated to the signed distance function (matrix) with |∇Φ| = 1. Geodesic flow is introduced as a speed function and c is a positive constant to keep the geodesic flow positive. The geodesic model can be rewritten as: where I is an image, and g(I) is the edge detector. G σ is a two-dimensional Gaussian function with standard deviation σ, and is used to smooth the image noise. When the curve is close to the real boundary, g(I) is approaching zero.
In the traditional level set method, the signed distance function must be re-initialized every several iterations.
The Gateaux derivative [25] is adopted with the functional ε: ∂Φ ∂t ¼ − ∂ε ∂Φ to simplify this mode. It transfers the energy minimization to the minimization of the functional ε. ε is defined as its variational formulation: is a metric to characterize how close a function is to a signed distance function: By calculating the Gateaux derivative of the functional ε in (1), the final geodesic model can be obtained [26]: where Δ is the Laplacian operator.
Since the direction of the geodesic flow is fixed, we need the under half curve to shrink. The signed distance function needs to be initialized as a row × col matrix (where row and col denote the image row and column, respectively), with a negative constant (ρ > 0) inside the contour (red region) and a positive value outside the contour (Figure 3): Supposing Ω un is a subset in the image domain Ω (Figure 3), then ∂Ω un denotes all the points on the boundaries of Ω un . ρ > 0 is a constant for initializing the function ΦΩ un . In this way, the direction of Ω un points to the negative side, or the inside of the contour. Thereby, the curve will shrink during propagation: For numerical implementation, since the propagating result is used for evaluating the GVF, we only need the under part to shrink. Φ un k is the kth iterated level set matrix, with its element being ϕ i,j , and then its k + 1th iterated form is: where S(ϕ i,j k ) corresponds to the right hand side in (5) with τ being the stepsize controller. sline denotes the partition line between the two parts, and is the row number of the angular point. The intermediate propagating result is shown in Figure 4.

Geo-gradient vector flow (Geo-GVF)
The traditional parametric ACM is sensitive to the contour initialization, and the real boundary can only be reached when the initial contour is close to it [14]. In a different manner, the GVF obtains a continuous gradient space by solving a set of vector equations of thermal expansion [27]. Local gradient vectors are not only determined by themselves, but also by their neighboring gradients, and the GVF can overcome the recesses and obstacles during its evolution and converge to the real boundary from a far distance. It is defined as follows: where V I is a two-dimensional vector of the GVF, and (i,j) V I (p) = (u I (p), v I (p)), p = (i,j). I is an image, and ∇I points to the image edge. ω (|∇I|) ∇ 2 V I is the smoothing term. According to this objective function, when ∇I is close to zero, the area is nearly a constant, and E(V I ) is mainly dominated by the partial derivatives of the vector field. When the curve is close to the real edge, ∇I becomes large. The second item dominates the evolution and drives the curve to the real boundary until V I -∇I.
Despite the superior performance of the GVF, overlearning can easily occur during the curve evolution [15][16][17]. To solve this problem, we replace the gray image I with the binary image B (Figure 3) in (7): The gradient vectors are a constant value on either the side of the dark region or the tongue root, and the tractive forces toward the two sides are the same, which prevents the curve from over-learning and ensures its accurate convergence. The weighting function ω ( • ) is chosen as a constant ω (∇B) = w, indicating that the whole image is being smoothed. h ( • ) is chosen as h (|∇B|) = |∇B| 2 , which is related to the strength of the edge.
Based on the above, the active contour in the binary GVF model is calculated according to the following equation: It evolves in the upper part, where y < sline, y is the number of rows in the image, and sline is the separated line between the two parts.
Referring to (6), the geodesic flow only worked for the under part, and the curve in the upper part held on. In the proposed binary GVF, there was no force in the under part. The geodesic flow ensured the convergence of the under curve, whereas the gradient flow ensured the convergence of the upper curve. In real implementation, after the geodesic propagation, we obtained the final level set function, which provided the contour information inside. To unpack it, we calculated the onedimensional Dirac measure δ 0 of (6). The active contour is denoted by C(p), where p = p(i, j) denotes the image coordinates: Taking C (i, j) as the initial active contour of the GVF, we obtain the propagating equation [27]: In real implementation, considering that the initial contour in the upper part is too far to propagate to the real boundary, a geometric term G B ¼ g B Â n → is introduced to (11) to help the initial curve quickly converge to the neighborhood of the real boundary, like (2): while n → denotes the outward normal direction of the curve: When the curve is far away from the real boundary, V B is nearly zero, while g B is close to 1, and the curve is propagating as illustrated in Figure 4. When the curve is close to the tongue edge, V B will dominate the propagation and adjust the curve to the real boundary of the tongue body. This new scheme is called the Geo-GVF.
The DGF was implemented in MATLAB, and the source codes can be downloaded from Additional file 1. We initialized the tongue image by adopting the saliency object detection. The region of the tongue body was refined by a rectangular window (Figure 2, top row). The contour initialization used the special shape of the tongue body and its location on the face. The parameter settings were μ = 1, λ = 3, and ν = 0.5 for the geodesic flow. μ was the coefficient of the internal (penalizing) energy term ΔΦ un −div ∇Φ un ∇Φ un j j , while λ represented the coefficient of the weighted length term δΦ un div g I ð Þ ∇Φ un ∇Φ un j j . ν denoted the coefficient of the weighted area term gδ(Φ un ), and generally, a small ν was chosen.
For the Geo-GVF, its viscosity parameters were α = 1, β = 1. Some examples of the tongue segmentations are shown in Figure 5.
We compared the DGF with our previous work C 2 G 2 FSnake [19], Zhai et al. [17], and Fu et al. [15]. Zhai Figure 5 Examples of tongue image segmentation. Top row: refined tongue images and final propagating results of the proposed DGF (white curves); second row: corresponding segmentation results of the first row; third row: segmentation results of C 2 G 2 FSnake [19]; fourth row: segmentation results of double snakes [17]. et al. [17] adopted the H-channel as an initial contour for the outside Snake contour, and a double snake was carried out afterwards. The inside contour kept propagating as long as it did not meet the outside contour. The outside contour kept the inside curve close to the real boundary without over-learning as a barrier. Fu et al. [15] introduced AdaBoost to locate the tongue body in the human face, and then eliminated the Hchannel, which was regarded as the same as the lip and skin color. The initial contour was obtained close to the real boundary through a polar edge detector, and the Snake was evolved afterwards.

Dataset evaluation and error measurements
The clinical tongue image database was provided by Shanghai University of TCM, and categorized by tongue color as follows: light white tongue (tinge, light white); red tongue (red, deep red, deep purple); purple tongue (heliotrope, pompadour, heliotrope, purple, modena); and carmoisine tongue. There were four classes and ten branches. Ten images were chosen for each branch, giving 100 images in total. Some examples are shown in Figure 2. The entire dataset can be downloaded from Additional file 2.
This study has been approved by the Shanghai society of medical ethics. All the patients have signed the informed consent form.
Boundary error metrics and area error metrics were used to evaluate the performance of the tongue image segmentation. For the boundary error metrics, the Hausdorff distance (HD) and the mean distance (MD) to the closest point were adopted to measure the errors between the proposed automatic segmentation result and the manual contour [19]. We used the normalized terms of the two distances (norm.HD and norm.MD) as the boundary error metrics. For the area error metrics, referring to Shi et al. [19], the false-positive (FP) volume fraction, false-negative (FN) volume fraction, and truepositive (TP) volume fraction were adopted.

Results and discussion
In this paper, a computerized tongue image segmentation approach, the DGF, was proposed for tongue diagnosis. We adopted a saliency object detector to refine the clinical tongue image. By taking advantage of the prior knowledge of the tongue image, the DGF was evaluated in the under and upper parts of the image, respectively. The two curves in the DGF precisely extracted the tongue region.

Color map and geometric term
The tongue color map has been highly analyzed for various types of information in tongue diagnosis [28]. In this section, we analyzed the color map in the tongue segmentation. The H-channel, gray map, and binary map were used for different active contour propagations. Comparisons are shown in Figures 6 and 7. We also

H-channel map
In our study, we found that the H-channel map was not always beneficial for tongue image segmentation. Figure 6 shows the geodesic propagating results for both the gray map (left image) and H-channel map (right image). If the face color in its H-channel was close to the tongue color, the edge detector g H ¼ 1 1þ ∇GσÃH j j 2 (H denotes the H-channel map) could not distinguish the tongue body from the image.

Binary map
In this study, we initialized the upper part as a binary image (Figure 3). In this manner, ∇B (8) was a constant at both sides of the tongue root, the tractive forces from the two sides were the same, and no over-learning or local minima occurred. We compared the Geo-GVF propagating results in both the gray map and the binary map ( Figure 7). It can be seen that, if the binary map was not used, local minima would occur (right image).

Geometric term
Referring to (12), when the curve was far away from the real boundary, V B was nearly zero, and we adopted the geometric term G B to propagate the curve along its outward direction. Meanwhile, when the curve was close to the tongue edge, V B dominated the propagation and adjusted the curve to the real boundary of the tongue body. If we did not adopt the geometric term during the propagation of the GVF (Figure 8, right image), the initial contour in the upper part was too far, and no external force V B in (11) could be found around the initial contour.

Overall comparison
In this section, the DGF was compared within the dataset with different classes. The overall performances were presented afterwards in comparison with other representative studies [15,17,19].

Class variation
We have summarized the detailed segmentation results with respect to each tongue class in Table 1. The results showed that the performance of the DGF was inferior for the red tongue class compared with the other classes. This arose because the tongue color was close to the lip color in the red tongue class, making it hard for the DGF to converge to the real boundary, and leading to possible over-learning during the curve propagation. However, in the light white and carmoisine tongue classes, the tongue color was clearly distinct from the lip color, and a deep valley in the level set space was formed between them to prevent the curve propagating further, thereby providing superior performances in these two classes. Despite the slight differences in these results, the performances were good in the four classes, suggesting the scalability and effectiveness of the proposed DGF.

Comparative experiments
Comparative tests using the tongue image database in this study are shown in Figures 5 and 9. In Figure 5, we show visual comparisons of the DGF and two previous studies [17,19]. Specifically, the top row denotes the refined tongue images and the final propagating curves of the DGF (white curves), the second row corresponds to the segmentation results of the first row, the third row denotes the segmentation results of C2G2FSnake [19], and the fourth row denotes the segmentation results  using double snakes [17]. Compared with Zhai et al. [17] and Shi et al. [19], the tongue edge extracted by the DGF was smoother and more accurate. The DGF exhibited a better performance on almost every metric except for Norm.MD%, which measured the mean distance between the edges obtained automatically and the standard tongue body boundaries obtained manually (Figure 9), and was affected when the point-topoint distance was calculated. The performance for Norm.MD% was weaker because, during the propagation of the geodesic active contour, some local minimum points needed to be overcome. Consequently, the curve split and merged again, which left unnecessary tails in the final result ( Figure 5, marked by the red frame in the first image).
In C 2 G 2 FSnake [19], color space information was introduced to control both the geometrical model and the parameterized GVF model to realize a seamless switch from rough evolution to refined evolution. However, the parameters for different tongue color categories had to vary to adapt to the color contrast between the tongue body and the facial skin. Besides, for some special images, the active contours were initialized incorrectly for the evolution of C 2 G 2 FSnake. In this study, the geodesic model could propagate to the real boundary by splitting itself. 1 The Key Laboratory of Machine Perception (MOE), Peking University, Beijing, P.R., China. 2  Error Metric Chart 1 Figure 9 Tongue image segmentation results. The smaller Norm.HD%, Norm.MD%, FN%, and FP% are, the better they will be. The larger TP% is, the better it will be.