In this study, an active contour-based tongue image segmentation method was proposed. First, the tongue image was refined by a saliency window, and the active contour was initialized by taking advantage of the prior knowledge of the tongue image. Second, the tongue area was initialized into two parts: the binary upper part and the level set matrix for the under part. Third, the DGF was proposed to detect the tongue edge and segment the tongue region in the image, such that the geodesic flow was evaluated in the under part, and the Geo-GVF was evaluated in the upper part.

### Tongue region detection

Before the initialization of the active contour, the tongue region in the clinical image was detected. We used the saliency object detector proposed by Feng et al. [23]. This detector used the entire image as the context, which agreed with human intuition. The tongue body was the deep red region in the tongue image, which corresponded to the brightest region in the saliency map (Figure 2, bottom row). The saliency window composition cost function was defined in Feng et al. [23]. In particular, the algorithm detected two windows with the highest saliencies, and the one with the smaller size was adopted for the tongue body segmentation.

### Initialization

#### Contour initialization

Based on the refined tongue image, the active contour was initialized depending on the extracted feature points on the image [19]. As illustrated in Figure 3, the under half contour of the first image was constructed based on three special points: two angular points and the tongue tip. The upper half contour was replaced by straight lines, and another point (tongue root) was added to guarantee that the contour was inside the tongue body. The two angular points were automatically detected by the Harris detector while the tongue tip and root were located by edge detectors. The geodesic curve would shrink to the real boundary when the selected tip point was below the real tongue tip. Once the direction was guaranteed, the curve would not stop until the real boundary was reached.

For some tongue images, part of the patient’s nose might be included in the refined window (Figure 2, third image), thereby influencing the propagation of the proposed Geo-GVF. To solve this problem, we obtained the area of the dark region between the tongue root and the upper lip in the binary image (Figure 3), and assumed that the thickness of the upper lip (its height in the image) was at most twice that of the dark region. We then cut off the additional region above the upper lip. Although this refinement may not be accurate, we focused on the extraction of the tongue body rather than other parts.

Overall, owing to the robust ability of the DGF, only the rough locations of these four points were required to confirm that the direction of the initial curve was maintained during propagation. For the upper half contour, it swelled, while for the under part, it shrank. Any inaccuracy caused by the window detection or contour initialization could be overcome as long as the propagating direction was guaranteed.

#### Region initialization

We divided the whole region into two parts according to the row of angular points (Figure 3, fifth image). In the upper part, the curve over-learned or became stuck at some local minimum points, and we adopted the binary image and solved the problem completely. For the under part, we initialized the level set function with a negative constant inside the contour, but a positive value outside the contour, to keep the geodesic curve shrinking to the real boundary.

### Double geo-vector flow (DGF)

#### Geodesic flow

The geodesic model evolves the curve through the propagation of the level set function, which was introduced by Osher and Sethian [24]. The initial contour is represented by a zero level set function: *C*(*t*) = {(*i,j*) | (Φ)(*t,i, j*) = 0}. The curve propagation equation is:

\frac{{\partial}^{\mathit{\Phi}}}{\partial \mathit{t}}+\mathit{F}\left|\nabla \right.\mathit{\Phi}\left|=0,\right.

(1)

where *F* is the speed function, which depends on the image data and the level set function (Φ). In the geodesic model, the level set function is approximated to the signed distance function (matrix) with |∇*Φ*| = 1. Geodesic flow is introduced as a speed function *F*_{
t
} = (*c* + *κ*), where *κ* is the curvature \mathit{\kappa}=\mathit{div}\left(\frac{\nabla \mathit{\Phi}}{\left|\nabla \mathit{\Phi}\right|}\right) and *c* is a positive constant to keep the geodesic flow positive. The geodesic model can be rewritten as:

\begin{array}{l}\frac{\partial \mathit{\Phi}}{\partial \mathit{t}}=\left|\nabla \mathit{\Phi}\right|\mathit{div}\left(\mathit{g}\left(\mathit{I}\right)\frac{\nabla \mathit{\Phi}}{\left|\nabla \mathit{\Phi}\right|}\right)=\mathit{g}\left(\mathit{I}\right)\left|\nabla \mathit{\Phi}\right|\mathit{\kappa}+\nabla \mathit{g}\left(\mathit{I}\right)\nabla \mathit{\Phi},\\ \phantom{\rule{2em}{0ex}}\mathit{g}=\frac{1}{1+{\left|\nabla {\mathit{G}}_{\mathit{\sigma}}*\mathit{I}\right|}^{2}}\end{array}

(2)

where *I* is an image, and *g*(*I*) is the edge detector. *G*_{
σ
}is a two-dimensional Gaussian function with standard deviation *σ*, and is used to smooth the image noise. When the curve is close to the real boundary, *g*(*I*) is approaching zero.

In the traditional level set method, the signed distance function must be re-initialized every several iterations.

The Gateaux derivative [25] is adopted with the functional \mathit{\u03f5}:\frac{\partial \mathit{\Phi}}{\partial \mathit{t}}=-\frac{\partial \mathit{\u03f5}}{\partial \mathit{\Phi}} to simplify this mode. It transfers the energy minimization to the minimization of the functional *ϵ. ϵ* is defined as its variational formulation: ϵ(Φ) = *μP*(Φ) + *ϵ*_{
m
}(Φ). *P*(Φ) is a metric to characterize how close a function is to a signed distance function: \mathit{P}\left(\mathit{\Phi}\right)={\displaystyle \underset{\mathit{\Omega}}{\int}\frac{1}{2}}{\left(\left|\nabla \mathit{\Phi}\right|-1\right)}^{2}\mathit{dxdy}. By calculating the Gateaux derivative of the functional *ϵ* in (1), the final geodesic model can be obtained [26]:

\begin{array}{l}\frac{{\partial}^{\mathit{\Phi}}}{\partial \mathit{t}}\mathit{\mu}\left[\mathit{\Delta \Phi}-\mathit{div}\left(\frac{\nabla \mathit{\Phi}}{\left|\nabla \mathit{\Phi}\right|}\right)\right]+\mathit{\lambda \delta}\left(\mathit{\Phi}\right)\mathit{div}\left(\mathit{g}\left(\mathit{I}\right)\frac{\nabla \mathit{\Phi}}{\left|\nabla \mathit{\Phi}\right|}\right)\\ \phantom{\rule{1em}{0ex}}+\mathit{vg\delta}\left(\mathit{\Phi}\right),\end{array}

(3)

where Δ is the Laplacian operator.

Since the direction of the geodesic flow is fixed, we need the under half curve to shrink. The signed distance function needs to be initialized as a *row × col* matrix (where *row* and *col* denote the image row and column, respectively), with a negative constant (*ρ* > 0) inside the contour (red region) and a positive value outside the contour (Figure 3):

{\mathit{\Phi}}_{\mathit{un}}=\left\{\begin{array}{ccc}\hfill -\mathit{\rho},\left(\mathit{i},\mathit{j}\right)\hfill & \hfill \in {\text{\Omega}}_{\mathit{un}}\hfill & \hfill -\partial {\text{\Omega}}_{\mathit{un}}\hfill \\ \hfill 0,\hfill & \hfill \left(\mathit{i},\mathit{j}\right)\hfill & \hfill \in \partial {\text{\Omega}}_{\mathit{un}}\hfill \\ \hfill \mathit{\rho},\hfill & \hfill \text{\Omega}\hfill & \hfill -{\text{\Omega}}_{\mathit{un}}\hfill \end{array}\right.,

(4)

Supposing Ω_{
un
} is a subset in the image domain Ω (Figure 3), then ∂Ω_{
un
} denotes all the points on the boundaries of Ω_{
un
}. *ρ* > 0 is a constant for initializing the function ΦΩ_{
un
}. In this way, the direction of Ω_{
un
} points to the negative side, or the inside of the contour. Thereby, the curve will shrink during propagation:

\begin{array}{ll}\frac{\partial {\text{\Phi}}_{\mathit{un}}}{\partial \mathit{t}}=& \mathit{\mu}\left[\mathit{\Delta}{\text{\Phi}}_{\mathit{un}}-\mathit{div}\left(\frac{\nabla {\text{\Phi}}_{\mathit{un}}}{\left|\nabla {\text{\Phi}}_{\mathit{un}}\right|}\right)\right]\\ +\mathit{\lambda \delta}\left({\text{\Phi}}_{\mathit{un}}\right)\mathit{div}\left(\mathit{g}\left(\mathit{I}\right)\frac{\nabla {\text{\Phi}}_{\mathit{un}}}{\left|\nabla {\text{\Phi}}_{\mathit{un}}\right|}\right)\\ +\mathit{vg\delta}\left({\text{\Phi}}_{\mathit{un}}\right).\end{array}

(5)

For numerical implementation, since the propagating result is used for evaluating the GVF, we only need the under part to shrink. *Φ*_{
un
}^{k} is the *k* th iterated level set matrix, with its element being *ϕ*_{i,j}, and then its *k* + 1th iterated form is:

{\mathit{\varphi}}_{\mathit{i},\mathit{j}}^{\mathit{k}+1}=\left\{\begin{array}{c}\hfill {\mathit{\varphi}}_{\mathit{i},\mathit{j}}^{0}\hfill \\ \hfill {\mathit{\varphi}}_{\mathit{i},\mathit{j}}^{\mathit{k}}+\mathit{\tau S}\left({\mathit{\varphi}}_{\mathit{i},\mathit{j}}^{\mathit{k}}\right),\hfill \end{array}\right.\begin{array}{c}\hfill 0<\mathit{j}<\mathit{sline}\hfill \\ \hfill \mathit{sline}<\mathit{j}<\mathit{row}\hfill \end{array},

(6)

where *S*(*ϕ*_{i,j}^{k}) corresponds to the right hand side in (5) with *τ* being the stepsize controller. *sline* denotes the partition line between the two parts, and is the row number of the angular point. The intermediate propagating result is shown in Figure 4.

#### Geo-gradient vector flow (Geo-GVF)

The traditional parametric ACM is sensitive to the contour initialization, and the real boundary can only be reached when the initial contour is close to it [14]. In a different manner, the GVF obtains a continuous gradient space by solving a set of vector equations of thermal expansion [27]. Local gradient vectors are not only determined by themselves, but also by their neighboring gradients, and the GVF can overcome the recesses and obstacles during its evolution and converge to the real boundary from a far distance. It is defined as follows:

\mathit{E}\left({\mathit{V}}_{\mathit{I}}\right)=\mathit{\omega}\left(\left|\nabla \mathit{I}\right|\right){\nabla}^{2}{\mathit{V}}_{\mathit{I}}-\mathit{h}\left(\left|\nabla \mathit{B}\right|\right)\left({\mathit{V}}_{1}-\nabla \mathit{I}\right),

(7)

where *V*_{
I
} is a two-dimensional vector of the GVF, and (*i,j*) *V*_{
I
} (*p*) = (*u*_{
I
} (*p*), *v*_{
I
} (*p*)), *p* = (*i,j*). *I* is an image, and ∇*I* points to the image edge. ^{ω} (|∇*I*|) ∇^{2}*V*_{
I
} is the smoothing term. According to this objective function, when ∇*I* is close to zero, the area is nearly a constant, and *E*(*V*_{
I
}) is mainly dominated by the partial derivatives of the vector field. When the curve is close to the real edge, ∇*I* becomes large. The second item dominates the evolution and drives the curve to the real boundary until *V*_{
I
} – ∇*I*.

Despite the superior performance of the GVF, over-learning can easily occur during the curve evolution [15–17]. To solve this problem, we replace the gray image *I* with the binary image *B* (Figure 3) in (7):

\mathit{E}\left({\mathit{V}}_{\mathit{B}}\right)=\mathit{\omega}\left(\left|\nabla \mathit{B}\right|\right){\nabla}^{2}{\mathit{V}}_{\mathit{B}}-\mathit{h}\left(\left|\nabla \mathit{B}\right|\right)\left({\mathit{V}}_{\mathit{B}}-\nabla \mathit{B}\right).

(8)

The gradient vectors are a constant value on either the side of the dark region or the tongue root, and the tractive forces toward the two sides are the same, which prevents the curve from over-learning and ensures its accurate convergence. The weighting function *ω* ( • ) is chosen as a constant *ω* (∇*B*) = *w*, indicating that the whole image is being smoothed. *h* ( • ) is chosen as *h* (|∇*B*|) = |∇*B*|^{2}, which is related to the strength of the edge.

Based on the above, the active contour in the binary GVF model is calculated according to the following equation:

\frac{\partial {\mathit{V}}_{\mathit{B}}\left(\mathit{p}\left(\mathit{x},\mathit{y}\right),\mathit{t}\right)}{\partial \mathit{t}}=\mathit{w}{\nabla}^{2}{\mathit{V}}_{\mathit{B}}-{\left|\nabla \mathit{B}\right|}^{2}\left({\mathit{V}}_{\mathit{B}}-\nabla \mathit{B}\right).

(9)

It evolves in the upper part, where *y* < *sline*, *y* is the number of rows in the image, and *sline* is the separated line between the two parts.

Referring to (6), the geodesic flow only worked for the under part, and the curve in the upper part held on. In the proposed binary GVF, there was no force in the under part. The geodesic flow ensured the convergence of the under curve, whereas the gradient flow ensured the convergence of the upper curve. In real implementation, after the geodesic propagation, we obtained the final level set function, which provided the contour information inside. To unpack it, we calculated the one-dimensional Dirac measure *δ*_{0} of (6). The active contour is denoted by *C*(*p*), where *p* = *p*(*i*, *j*) denotes the image coordinates:

\mathit{C}\left(\mathit{p}\right)={\mathit{\delta}}_{0}\left({\text{\Phi}}_{\mathit{un}}\left(\mathit{p}\right)\right)=\frac{\mathit{d}}{\mathit{dp}}{\text{\Phi}}_{\mathit{un}}\left(\mathit{p}\right),\mathit{p}=\mathit{p}\left(\mathit{i},\mathit{j}\right).

(10)

Taking *C* (*i, j*) as the initial active contour of the GVF, we obtain the propagating equation [27]:

\mathit{C}\left(\mathit{p},\mathit{t}\right)=\mathit{\alpha}{\mathit{C}}^{\text{'}\text{'}}\left(\mathit{p},\mathit{t}\right)-\mathit{\beta}{\mathit{C}}^{\text{'}\text{'}\text{'}\text{'}}\left(\mathit{p},\mathit{t}\right)+{\mathit{V}}_{\mathit{B}}.

(11)

In real implementation, considering that the initial contour in the upper part is too far to propagate to the real boundary, a geometric term {\mathit{G}}_{\mathit{B}}={\mathit{g}}_{\mathit{B}}\times \overrightarrow{\mathit{n}} is introduced to (11) to help the initial curve quickly converge to the neighborhood of the real boundary, like (2): {\mathit{g}}_{\mathit{B}}=\frac{1}{1+{\left|\nabla {\mathit{G}}_{\mathit{\sigma}}*\mathit{B}\right|}^{2}}, while \overrightarrow{\mathit{n}} denotes the outward normal direction of the curve:

\mathit{C}\left(\mathit{p},\mathit{t}\right)=\mathit{\alpha}{\mathit{C}}^{\text{'}\text{'}}\left(\mathit{p},\mathit{t}\right)-\mathit{\beta}{\mathit{C}}^{\text{'}\text{'}\text{'}\text{'}}\left(\mathit{p},\mathit{t}\right)+\mathit{max}\left({\mathit{V}}_{\mathit{B}},{\mathit{G}}_{\mathit{B}}\right).

(12)

When the curve is far away from the real boundary, *V*_{
B
} is nearly zero, while *g*_{
B
} is close to 1, and the curve is propagating as illustrated in Figure 4. When the curve is close to the tongue edge, *V*_{
B
} will dominate the propagation and adjust the curve to the real boundary of the tongue body. This new scheme is called the Geo-GVF.

The DGF was implemented in MATLAB, and the source codes can be downloaded from Additional file 1. We initialized the tongue image by adopting the saliency object detection. The region of the tongue body was refined by a rectangular window (Figure 2, top row). The contour initialization used the special shape of the tongue body and its location on the face. The parameter settings were *μ* = 1, *λ* = 3, and *ν* = 0*.* 5 for the geodesic flow. *μ* was the coefficient of the internal (penalizing) energy term \text{\Delta}{\text{\Phi}}_{\mathit{un}}-\mathit{div}\left(\frac{\nabla {\mathit{\Phi}}_{\mathit{un}}}{\left|\nabla {\mathit{\Phi}}_{\mathit{un}}\right|}\right), while *λ* represented the coefficient of the weighted length term \mathit{\delta}{\text{\Phi}}_{\mathit{un}}\mathit{div}\left(\mathit{g}\left(\mathit{I}\right)\frac{\nabla {\mathit{\Phi}}_{\mathit{un}}}{\left|\nabla {\mathit{\Phi}}_{\mathit{un}}\right|}\right). *ν* denoted the coefficient of the weighted area term *gδ*(*Φ*_{
un
}), and generally, a small *ν* was chosen. For the Geo-GVF, its viscosity parameters were *α* = 1, *β* = 1. Some examples of the tongue segmentations are shown in Figure 5.

We compared the DGF with our previous work C^{2}G^{2}FSnake [19], Zhai et al. [17], and Fu et al. [15]. Zhai et al. [17] adopted the H-channel as an initial contour for the outside Snake contour, and a double snake was carried out afterwards. The inside contour kept propagating as long as it did not meet the outside contour. The outside contour kept the inside curve close to the real boundary without over-learning as a barrier. Fu et al. [15] introduced AdaBoost to locate the tongue body in the human face, and then eliminated the H-channel, which was regarded as the same as the lip and skin color. The initial contour was obtained close to the real boundary through a polar edge detector, and the Snake was evolved afterwards.

### Dataset evaluation and error measurements

The clinical tongue image database was provided by Shanghai University of TCM, and categorized by tongue color as follows: light white tongue (tinge, light white); red tongue (red, deep red, deep purple); purple tongue (heliotrope, pompadour, heliotrope, purple, modena); and carmoisine tongue. There were four classes and ten branches. Ten images were chosen for each branch, giving 100 images in total. Some examples are shown in Figure 2. The entire dataset can be downloaded from Additional file 2.

This study has been approved by the Shanghai society of medical ethics. All the patients have signed the informed consent form.

Boundary error metrics and area error metrics were used to evaluate the performance of the tongue image segmentation. For the boundary error metrics, the Hausdorff distance (HD) and the mean distance (MD) to the closest point were adopted to measure the errors between the proposed automatic segmentation result and the manual contour [19]. We used the normalized terms of the two distances (norm.HD and norm.MD) as the boundary error metrics. For the area error metrics, referring to Shi et al. [19], the false-positive (FP) volume fraction, false-negative (FN) volume fraction, and true-positive (TP) volume fraction were adopted.