Efficacy and safety of traditional Chinese medicines for non-alcoholic fatty liver disease: a systematic literature review of randomized controlled trials

Background Non-alcoholic fatty liver disease (NAFLD) is a common liver disease that may progress to severe liver damage in the absence of proper treatment. Given that the optimal pharmacotherapy for NAFLD remains uncertain and adherence to lifestyle interventions is challenging, herbal medicines such as traditional Chinese medicines (TCMs) are commonly used to manage the condition. Evidence about TCMs in the management of NAFLD continues to accumulate from randomized controlled trials (RCTs). This study aims to identify and evaluate the emerging evidence about the efficacy and safety of TCMs for NAFLD.
Methods A systematic literature search was conducted in 6 electronic databases (PubMed, the Cochrane Library, EMBASE, Web of Science, Scopus and China National Knowledge Infrastructure) from inception to September 2020 to identify RCTs that investigated TCMs in the management of NAFLD. RCTs comparing TCMs with no treatment, placebo, non-pharmacological and/or pharmacological interventions were included irrespective of language or blinding. The quality of reporting was evaluated using the Consolidated Standards of Reporting Trials Statement extension for Chinese herbal medicine Formulas (CONSORT-CHM). The risk of bias of each study was assessed using the Cochrane risk of bias tool.
Results A total of 53 RCTs involving 5997 participants with NAFLD were included in this review. Each included RCT tested a different TCM, giving a total of 53 TCMs. Based on the RCT results, TCMs might have various beneficial effects, such as improving the TCM syndrome score, liver function and blood lipid profile. A range of non-serious, reversible adverse effects associated with TCMs was also reported. However, no firm conclusion about the efficacy and safety of TCMs in NAFLD can be drawn: the quality of reporting was generally poor and the risk of bias was mostly unclear in all trials.
Conclusions There is some evidence from RCTs supporting the effectiveness and safety of TCMs for NAFLD. However, no conclusive recommendations can be made because of the questionable quality of the RCTs. Improved RCT protocols, larger sample sizes, multicenter settings and a more focused approach to selecting TCMs are recommended for developing high-quality evidence about the use of TCMs in managing NAFLD.


Background
Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver disease defined as hepatic steatosis in the absence of significant alcohol consumption, use of steatogenic medications, or other pre-existing liver conditions or infections that result in fat accumulation [1]. Most cases of NAFLD are attributable to metabolic risk factors such as obesity, diabetes mellitus and dyslipidemia [1]. The global prevalence of NAFLD was estimated at 9.7%, and an increasing trend has been observed in recent years [2]. NAFLD covers a wide spectrum of liver disease, ranging from simple steatosis to more advanced forms such as nonalcoholic steatohepatitis, progressive fibrosis and cirrhosis [3]. This progression may be slowed or even reversed with proper management. However, the optimal pharmacotherapy for NAFLD remains uncertain, as no pharmacological agent has been officially approved for treating NAFLD [4]. The main recommendations for patients with NAFLD are lifestyle modification and physical exercise [5,6]. Lifestyle measures such as dietary restriction and increased physical activity have been shown to improve hepatic pathology and reduce hepatic fat accumulation. However, adherence to lifestyle changes has proved challenging. Without proper management, more severe inflammation and degeneration of liver cells may occur, leading to irreversible liver injury and an increased risk of hepatocellular carcinoma [7].
In light of the limited treatment options for NAFLD, herbal medicines, and traditional Chinese medicines (TCMs) in particular, have become an increasingly common healthcare choice for patients seeking to manage or even treat the disease [8]. TCMs have a long history of use for liver disease in China. According to traditional Chinese medical theory, NAFLD may result when the Gan collaterals are obstructed by dysfunction of Gan-qi catharsis and Pi transportation, internal accumulation of dampness-heat and dirty phlegm, and blood stasis [9]. TCMs are therefore used mainly to channel Gan-qi, promote blood circulation, reduce phlegm and cleanse unclean elements. Preclinical and clinical studies have suggested that TCMs may be effective for NAFLD through beneficial effects on fatty acid metabolism, such as decreasing levels of triglycerides, total cholesterol, low-density lipoprotein, alanine aminotransferase and aspartate aminotransferase, and increasing high-density lipoprotein [4]. Such TCMs often contain complex constituents with multiple pharmacological activities at various targets. For instance, Qushi Huayu decoction, a well-known TCM formula consisting of at least 5 medicinal plants, has been reported to reverse elevated levels of free fatty acids and total triglycerides and to improve hepatic steatosis and inflammation through multiple signaling pathways [10,11]. At the same time, TCMs, like all pharmacological agents, may carry risks of adverse effects and toxicity, which should be determined and considered when choosing a treatment for NAFLD [12].
In recent years, an increasing number of RCTs have investigated the benefits and risks of TCMs in treating NAFLD. A systematic review conducted in 2013 assessed the benefits and risks of herbal medicines (including TCMs) for people with NAFLD and reported that some of the herbal preparations investigated seemed to have positive effects on selected clinical indicators without increasing the risk of adverse effects compared with the control groups [12]. Subsequent studies further suggested various mechanisms through which herbal medicines might prevent NAFLD [4,13]. The evidence from RCTs and other studies about TCMs for NAFLD is mounting but remains inconclusive, if not conflicting. Keeping track of this emerging evidence requires continuous efforts to critically appraise the evidence about the efficacy and adverse effects of TCMs. This study therefore aimed to conduct a systematic review of RCTs that investigated the use of TCMs in NAFLD in order to evaluate the benefits and harms of TCMs for patients with NAFLD. In addition, by assessing the quality of RCTs on TCMs for NAFLD, it explored how to improve RCT design and the reporting of RCT findings in relation to TCMs. The findings will be useful for updating current knowledge about TCMs in NAFLD, informing patients' choice of management measures, and identifying areas in need of further research.
The Consolidated Standards of Reporting Trials Statement extension for Chinese herbal medicine Formulas (CONSORT-CHM) [16] was used in this review. CONSORT-CHM was developed as one of the extensions of the Consolidated Standards of Reporting Trials (CONSORT) Statement to set the baseline for reporting trials using CHM formulas [17]. In addition to the basic criteria for reporting clinical trials listed in CONSORT, CONSORT-CHM includes additional considerations that take into account the unique characteristics of TCM theory, principles, formulas and Chinese medicinal substances.

Types of studies
Randomized controlled trials that investigated the use of TCMs in NAFLD and were published in English or Chinese were considered for inclusion, irrespective of blinding, publication status or date of publication. Quasi-randomized and observational studies were excluded. In this review, TCMs encompassed preparations that might include plant, animal and mineral materials, administered as capsules, tablets, teas, decoctions, granules or powders according to the unique principles and comprehensive theory of Traditional Chinese Medicine. In addition, TCM preparations containing TCMs listed in TCM-related standards such as the Chinese Pharmacopoeia [18] or the Grand Dictionary of Chinese Medicine [19] were eligible for consideration in this review.

Types of RCT participants
Participants of any age, gender or ethnic origin with a clear diagnosis of NAFLD were eligible, irrespective of the diagnostic method, diabetic status or the presence of non-alcoholic steatohepatitis (NASH). We excluded RCTs in which participants had viral hepatitis, decompensated liver function or other liver diseases, or had previously undergone liver transplantation.

Types of interventions
RCTs that compared TCMs alone or TCMs combined with behavioral interventions against placebo, no treatment, pharmacological therapy and/or other behavioral interventions were considered for inclusion. Behavioral interventions referred to lifestyle interventions such as dietary modification and/or an exercise regimen. Pharmacological therapies referred to medicinal herbs not considered TCMs, or conventional medicines such as prescription medicines, regardless of their mechanisms of action.

Types of outcome
To address the objectives of this study, both the efficacy and the safety of the TCMs investigated in the included RCTs were analyzed. The primary and secondary outcomes of managing NAFLD with TCMs were as follows:

Primary outcomes
The primary outcome measures included changes in the TCM syndrome score and the experience of adverse reactions. According to the Chinese Medicine Clinical Research of New Drugs Guiding Principles [20], the TCM syndrome score is a scoring method for evaluating patients' symptoms, such as dry mouth, bitter eyes, dry eyes, bleeding gums, insomnia and dreams, abdominal distension, loss of appetite, fatigue, hypochondriac pain, waist and knee pain, urine and bowel, etc. Each symptom is rated as "no", "light", "moderate" or "severe", scored as 0, 1, 2 and 3 points, respectively.
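The 0 to 3 rating scale above can be turned into a score with a few lines of code. The sketch below is a minimal illustration, assuming that the item points are simply summed into a total and that change is summarized as a percentage reduction from baseline; neither aggregation rule is specified in this review, and the function names and patient ratings are hypothetical.

```python
# Rating scale from the text: "no" = 0, "light" = 1, "moderate" = 2, "severe" = 3.
RATING_POINTS = {"no": 0, "light": 1, "moderate": 2, "severe": 3}


def syndrome_score(ratings: dict[str, str]) -> int:
    """Total score, assuming the item points are simply summed."""
    return sum(RATING_POINTS[level] for level in ratings.values())


def score_reduction_pct(before: int, after: int) -> float:
    """Percentage reduction from baseline (an assumed summary, not from this review)."""
    if before <= 0:
        raise ValueError("baseline score must be positive")
    return (before - after) / before * 100


# Hypothetical ratings for one patient, using symptoms named in the text.
baseline = {"fatigue": "moderate", "abdominal distension": "severe", "loss of appetite": "light"}
end_of_treatment = {"fatigue": "light", "abdominal distension": "light", "loss of appetite": "no"}

before = syndrome_score(baseline)          # 2 + 3 + 1 = 6
after = syndrome_score(end_of_treatment)   # 1 + 1 + 0 = 2
print(before, after, round(score_reduction_pct(before, after), 1))  # 6 2 66.7
```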
Adverse reactions experienced during or immediately after the intervention period were also considered. Depending on the availability of data, adverse events were classified as serious or non-serious. A serious adverse reaction was defined as any effect that could increase mortality, was life-threatening, required hospitalization, resulted in persistent or significant disability, caused a congenital anomaly or birth defect, or was any other important medical event that might have jeopardized the patient's health. Non-serious adverse events referred to any untoward medical occurrence, not necessarily causally related to the treatment, that resulted in a dose reduction or discontinuation of treatment at any time after commencement of treatment [21].
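The serious/non-serious definitions above amount to a simple decision rule. The following sketch encodes them for a single event, assuming a hypothetical record with one flag per criterion; the field names are illustrative, and events meeting neither definition are simply left unclassified here.

```python
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    # Hypothetical record fields mirroring the criteria listed in the text.
    description: str
    increased_mortality: bool = False
    life_threatening: bool = False
    hospitalization: bool = False
    persistent_disability: bool = False
    congenital_anomaly: bool = False
    other_important_medical_event: bool = False
    dose_reduced_or_stopped: bool = False


def classify(event: AdverseEvent) -> str:
    """Apply the serious/non-serious definitions to one event."""
    serious_criteria = (
        event.increased_mortality,
        event.life_threatening,
        event.hospitalization,
        event.persistent_disability,
        event.congenital_anomaly,
        event.other_important_medical_event,
    )
    if any(serious_criteria):
        return "serious"
    if event.dose_reduced_or_stopped:
        return "non-serious"
    # Events meeting neither definition are left unclassified in this sketch.
    return "unclassified"


print(classify(AdverseEvent("stroke", hospitalization=True)))            # serious
print(classify(AdverseEvent("diarrhea", dose_reduced_or_stopped=True)))  # non-serious
```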

Search methods of studies
Electronic searches
This systematic review followed the PRISMA-P guidelines [22] for searching the literature. Six electronic databases (PubMed, the Cochrane Library, EMBASE, Web of Science, Scopus and China National Knowledge Infrastructure (CNKI)) were searched from inception to September 2020 for RCTs evaluating TCMs in the management or treatment of NAFLD. The three primary search terms were "NAFLD", "TCM" and "RCT". As shown in Table 1, each primary term was operationalized through its related vocabulary. MeSH terms and keywords were used to develop a comprehensive and valid search strategy. Terms within each of "NAFLD", "TCM" and "RCT" were combined with OR, and the resulting sets for the three concepts were then combined with AND. The reference lists of the included studies and of Cochrane reviews on NAFLD were also searched.
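The OR-within, AND-between combination described above can be sketched as a small query builder. The concept terms below are hypothetical placeholders (the actual vocabularies are in Table 1), and the output is generic Boolean syntax; each database, such as PubMed with its MeSH field tags or CNKI, has its own tags and operators, which this sketch does not model.

```python
# Hypothetical example terms; the actual vocabularies are listed in Table 1.
CONCEPTS = {
    "NAFLD": ["non-alcoholic fatty liver disease", "NAFLD", "fatty liver"],
    "TCM": ["traditional Chinese medicine", "Chinese herbal medicine", "TCM"],
    "RCT": ["randomized controlled trial", "randomised controlled trial", "RCT"],
}


def or_block(terms: list[str]) -> str:
    """Join the synonyms for one concept with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(concepts: dict[str, list[str]]) -> str:
    """AND together one OR-block per concept."""
    return " AND ".join(or_block(terms) for terms in concepts.values())


print(build_query(CONCEPTS))
```

For a minimal case, `build_query({"A": ["x", "y"], "B": ["z"]})` returns `("x" OR "y") AND ("z")`.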

Exclusion criteria and screening
The title, abstract and full text of each study were screened against the inclusion criteria in 2 rounds. In Round 1, titles and abstracts were screened to exclude non-TCM-related RCTs. The following studies were also excluded in the first round of screening: (1) reviews, meta-analyses and protocols; (2) [18,19] but was used in combination with other dietary supplements or nutraceuticals. In Round 2, full texts were reviewed to exclude non-TCM-related RCTs and RCTs that tested TCMs in combination with pharmacological interventions in the test group.

Selection of studies
Titles and abstracts were screened independently by 2 of the authors (ZL, XC) according to the inclusion criteria outlined above. Full texts of potentially relevant articles were retrieved for detailed assessment. The Cochrane risk-of-bias and CONSORT-CHM evaluations were performed independently by two of the authors (ZL, XC) following the respective guidelines, and disagreements were resolved by discussion or consultation with two other authors (JS, COLU).

Data extraction and management
EndNote X9 was used to categorize and file all references, and Excel 2013 was used to record the extracted data. A standard extraction form was used to extract relevant data from the eligible trials, including basic study information, methods, intervention, participants, outcomes and overall findings. The main information extracted from each included study for further analysis is listed in the following: Basic information of study

Assessment of risk of bias in included studies
Two of the authors (ZL, XC) independently evaluated the risk of bias of each trial in accordance with the Cochrane Handbook for Systematic Reviews of Interventions [15] and the CONSORT-CHM [23]. All criteria were taken from the Cochrane guidelines, and the Cochrane risk-of-bias tool was used to assess the quality of all trials. Judgements fell into three categories [17], "low risk of bias", "unclear risk of bias" and "high risk of bias", based on the definitions recommended by these two assessment methods, as follows.
Selective outcome reporting • Low risk of bias: the study authors reported predefined, or clinically relevant and reasonably expected, outcomes; • Unclear risk of bias: the study authors did not report all predefined, or clinically relevant and reasonably expected, outcomes, or the data reported did not match the methods; and • High risk of bias: there was no mention of predefined, or clinically relevant and reasonably expected, outcomes.
Other bias • Low risk of bias: the trial appeared to have no other features that could introduce bias; • Unclear risk of bias: the trial may or may not have had other features that could introduce bias; and • High risk of bias: there were other factors in the trial that could put it at risk of bias (e.g. baseline differences, for-profit involvement, or inappropriate intervention design).

Quality assessment methods
Each included article was independently assessed by two of the authors (ZL, XW), and disagreements were settled through discussion or consultation with the other two authors (JS, COLU). The 25-item version of the CONSORT-CHM statement was used to assess the quality of the included trials. The checklist provides a set of guidelines that may be used to identify the strengths and weaknesses of clinical trials of TCM interventions. To measure compliance, a grading system was devised for each criterion: the reviewer gave a score of "0" if the item was absent, "1" if it was partially present (for instance, some aspects of the CONSORT item were missing or unclearly described), and "2" if the item was present and clear. Applying the CONSORT criteria to all relevant sections of each study produced an overall summary of the reporting quality of the included RCTs. The evaluation method and results were independently checked for validity and consistency by all 6 authors.
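The 0/1/2 grading described above can be summarized per trial. The sketch below assumes that the overall summary is the total score expressed as a percentage of the maximum (25 items × 2 points = 50); the review does not state how its summary was computed, and the example scores are hypothetical.

```python
MAX_PER_ITEM = 2  # 0 = absent, 1 = partially present, 2 = present and clear
N_ITEMS = 25      # items in the CONSORT-CHM checklist version used here


def compliance_pct(item_scores: list[int]) -> float:
    """Total score as a percentage of the maximum (an assumed summary metric)."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("each item must be scored 0, 1 or 2")
    return 100 * sum(item_scores) / (MAX_PER_ITEM * N_ITEMS)


def fully_compliant(item_scores: list[int]) -> bool:
    """True only if every item was present and clear."""
    return all(s == MAX_PER_ITEM for s in item_scores)


# Hypothetical scores for one trial: 10 items clear, 10 partial, 5 missing.
trial = [2] * 10 + [1] * 10 + [0] * 5
print(compliance_pct(trial), fully_compliant(trial))  # 60.0 False
```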

Search results
The screening process, conducted in accordance with the PRISMA guidelines, is summarized in the flow diagram in Fig. 1. A total of 954 citations were identified in the initial searches of the selected electronic databases and related sources. After the removal of duplicates, 817 potentially relevant articles were retained for further assessment. In Round 1 of the screening process (reading titles and abstracts), 761 records were excluded for the following reasons: not randomized trials; reviews, meta-analyses or protocols; pharmacodynamic or pharmacological studies; acupuncture studies; studies of other diseases or of combined diseases; herbal ingredients not listed in the TCM-related standards [18,19]; or use of TCMs combined with other pharmacological agents. Fifty-six articles were then retrieved for further assessment. After the full texts were reviewed, 3 more articles were excluded: 2 studies that investigated non-TCM interventions and 1 study that used both TCMs and chemical drugs in the test group. Eventually, 53 eligible articles published in Chinese (n = 48) and English (n = 5) were included in this review.
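The counts in the screening flow described above can be checked for internal consistency with simple arithmetic. The sketch below uses only the numbers reported in this paragraph; the number of duplicates removed (137) is implied by the counts rather than stated in the text.

```python
# Counts reported in the text for each stage of screening.
identified = 954
after_deduplication = 817
excluded_round_1 = 761          # titles and abstracts
excluded_full_text = 2 + 1      # non-TCM interventions + TCM plus chemical drugs
included_chinese, included_english = 48, 5

duplicates_removed = identified - after_deduplication   # implied by the counts
full_texts_assessed = after_deduplication - excluded_round_1
included = full_texts_assessed - excluded_full_text

assert full_texts_assessed == 56, full_texts_assessed
assert included == 53 == included_chinese + included_english
print(duplicates_removed, full_texts_assessed, included)  # 137 56 53
```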

Description of studies
Among the 53 publications included in this review, 48 were published in Chinese and only 5 in English [24][25][26][27][28]. One trial was conducted in Korea [28] and the remaining 52 in China. All of the included studies used a parallel two-arm design. More details are shown in Table 2.

Participants
A total of 5997 participants with NAFLD were recruited in the included RCTs. Two trials did not report the numbers of males and females [29,30]; the remaining 51 trials reported that 3622 males and 2110 females took part. In 19 of the included RCTs, 62 participants were reported to have dropped out or withdrawn before the intervention was initiated. In terms of allocation, 3165 participants were allocated to the test groups and 2772 to the control or comparison groups. Of the 53 RCTs, only 42 reported the mean age and standard deviation of the participants, 5 reported the mean age without a standard deviation, 3 reported only the age range, and 3 did not report age information. Thirty-four trials reported the history of NAFLD in detail, and 36 trials reported the participants' origin. With regard to patient type, 16 trials recruited both outpatients and inpatients, 19 trials recruited outpatients and one trial recruited inpatients [31]. Four studies did not report the trial setting [32][33][34][35].

Diagnosis
Chinese expert consensus statements and treatment guidelines were the most common references for diagnostic standards among the included trials. Fifty-one trials specified their diagnostic criteria: 39 used the relevant Chinese and Western medicine diagnostic standards, and 12 used Western medicine diagnostic criteria. Detailed information is shown in Table 1. The most common diagnostic criteria reported in the studies were: Guidelines for management of NAFLD (2007 [36], 2010 [37]); Consensus Opinions on the Diagnosis and Treatment of NAFLD with TCM and WM [38]; Chinese Medicine Clinical Research of New Drugs Guiding Principles [39]; and Expert consensus on TCM diagnosis and treatment of nonalcoholic fatty liver disease [40]. One trial [41] referred to the NAFLD diagnostic criteria of the United States [1].

Intervention
A total of 53 different TCM preparations were tested in the included trials, and combinations of multiple herbs were the main form of intervention. All the Chinese medicinal materials from each TCM preparation were summarized as shown in

Control and comparison
The control interventions included placebo, conventional medicine, lifestyle intervention, or lifestyle intervention plus conventional drug(s). Thirty-nine trials used a conventional medicine control, such as polyene phosphatidylcholine capsules (n = 17), silibinin capsules or tablets (n = 4) and ursodeoxycholic acid capsules or tablets (n = 3). Four trials used a Chinese patent medicine as the intervention in the comparison group. Six trials used a placebo control group, and 32 trials used a lifestyle intervention (diet or exercise) in the comparison group. Detailed information is shown in Table 3.
All the comparisons were: • TCMs versus placebo (2 trials

The intervention durations
The duration of the intervention among the included trials ranged from 8 weeks to 6 months: 12 weeks (3 months) in 34 trials, 24 weeks (6 months) in 12 trials, 8 weeks (2 months) in 6 trials, and 4 months in 1 trial.

Outcomes
As shown in Table 3, all trials measured the outcomes at the end of the intervention period, and no RCT reported any follow-up data after the interventions ended. The most commonly measured outcomes were biochemical measures of liver function (n = 49), blood lipids (n = 49), the TCM syndrome score (n = 47) and B-ultrasound and computed tomography (CT) scan findings (n = 46). Blood sugar levels (n = 12) and body weight (n = 15) were less frequently measured. Most of the RCTs (n = 41) compared the overall efficacy of the interventions in the test and control/comparison groups, whereas 12 RCTs compared the 2 groups on each specific outcome measure. All but 3 RCTs reported statistically significant positive effects of TCMs on the measured outcomes in the test groups compared with the control/comparison groups.
Only the 28 RCTs that set out to measure the safety of the TCMs reported data on adverse events experienced by the participants. Fourteen of these 28 RCTs did not identify any adverse events associated with the interventions. Of the remaining 14 RCTs, 13 reported only non-serious adverse events and 1 reported both serious and non-serious adverse events: two participants in the test group receiving Phyllanthus were hospitalized because of back pain and stroke, and one participant in the placebo group had acute appendicitis [26]. Non-serious adverse effects mainly included gastrointestinal discomfort (such as diarrhea, gastric discomfort or mild pain, nausea and decreased appetite) and other mild complaints such as cough, headache, blurred vision, dizziness, toothache, gum bleeding and flu-like symptoms. Thirteen trials reported that the adverse reactions resolved with symptomatic treatment and did not affect the trials. None of the trials reported any death from any cause.
Further information about the reporting of each outcome measure is provided in the following:
Radiological response (BU, CT) • Seven trials used both B-ultrasound and CT to evaluate treatment efficacy in NAFLD, 12 trials reported only CT results before and after treatment, and 27 trials reported only B-ultrasound results.
Liver function • Of the 47 trials that reported biochemical measures of liver function, 19 reported changes in AST, ALT and GGT, 17 reported changes in ALT and AST, 4 reported changes in ALT, AST, ALP and GGT, and 4 also reported total bilirubin (TB). One trial also reported magnetic resonance spectroscopy (MRS) assessment of hepatic fat content and free fatty acids [28].
Blood sugar • Of the 12 trials that reported biochemical measures of blood sugar, 7 reported fasting blood glucose (FBG), fasting insulin (FINS) and HOMA-IR results, and 1 trial [42] reported HbA1c results.

Blood lipids
• Of the 49 trials that reported blood lipid results, 43 reported TG and TC, of which 16 also reported HDL-C and LDL-C.

The overall efficacy
Fifty trials reported that the test group was more effective than the control group; 37 of these reported statistically significant differences at a two-sided P < 0.05 and 13 at P < 0.01. A total of 41 trials reported overall efficacy rates. The overall efficacy rate of the test groups ranged from 33.80 to 100%: 16 trials reported rates above 90% and only 1 trial [34] a rate below 50%. The overall efficacy rate of the control groups ranged from 23.39 to 92.00%: only 1 trial [45] reported a rate above 90% and 7 trials reported rates below 50%. One trial found no significant difference between the Jindanwang Mixture and Ursodeoxycholic Acid Capsules in treating NAFLD (overall efficacy rate 94% vs. 92%, p > 0.05) [45]. Another reported that Qiyin Tea and Polyene Phosphatidylcholine Capsules were similarly effective in managing NAFLD [46]. On the other hand, Phyllanthus was not shown to be superior to placebo in improving NAFLD (p = 0.873) in one trial [26]. Many of the TCMs were shown to have beneficial effects on the TCM syndrome score, liver function, blood lipid profile, blood sugar level and body weight. For instance, a 6-month treatment with Shenge Formula plus behavioral interventions was more effective than behavioral interventions alone in improving liver function and blood lipid profile, with an efficacy rate of 100% in the test group versus 48.39% in the comparison group [47]. A 3-month treatment with Shugan Jianpi Huatan Decoction plus behavioral interventions was more effective than atorvastatin plus behavioral interventions in improving liver function, blood sugar level and blood lipid profile [48]. A 4-month treatment with Heze lipid-lowering oral liquid decoction alone was more effective than polyene phosphatidylcholine in improving liver function, body weight and blood lipid profile [49].
All 3 TCMs also appeared to improve the TCM syndrome score, with no reported adverse effects.

CONSORT-CHM
The summary of the CONSORT-CHM quality assessment results of the 53 RCTs included in this review is shown in Table 4. None of the RCTs fully met all the CONSORT-CHM criteria. The most common reasons for non-compliance, in descending order, were: a lack of "Other information" (including funding sources, where the full trial protocol could be accessed, and the trial registration number and registry name); a lack of discussion of trial limitations; incomplete results due to missing information about all important harms and unintended effects in each group and a lack of ancillary analyses; and incomplete information about trial methods (including participant flow, blinding, allocation methods and implementation).
All 53 trials reported randomization, but only 6 fully reported the method of random sequence generation: 4 used random number tables and 2 used central randomization. Only 3 trials fully described trial implementation, including the randomization method, participant recruitment and the intervention process. None of the studies fully described how blinding was conducted. Seven trials mentioned that blinding was used but lacked detailed descriptions of who was blinded and how blinding was implemented. Forty-six trials did not mention blinding procedures at all. Only 4 trials fully reported the primary outcome, the secondary outcomes and the outcomes of the relevant indicators stated in the methods. The remaining 46 trials did not report all the outcomes that the study design had set out to measure. Among the 14 trials that reported adverse reactions, 2 did not specify the number of cases in the test or control group. A total of 25 trials did not include adverse effects as an outcome measure.
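To illustrate what a fully reported sequence-generation method might look like, the sketch below mimics a random number table with a seeded pseudo-random generator: one random digit per participant, with even digits assigned to the test group and odd digits to the control group. This simple-randomization rule, the seed and the function name are assumptions for illustration only; they are not the methods used by the included trials, and central randomization and allocation concealment are not modeled.

```python
import random


def allocation_sequence(n_participants: int, seed: int) -> list[str]:
    """Simple randomization: one random digit per participant (assumed rule)."""
    rng = random.Random(seed)  # seeded so the sequence can be reproduced and audited
    sequence = []
    for _ in range(n_participants):
        digit = rng.randint(0, 9)  # analogous to reading the next digit of a table
        sequence.append("test" if digit % 2 == 0 else "control")
    return sequence


seq = allocation_sequence(10, seed=2020)
print(seq)
```

Recording the seed and the rule is what allows the sequence to be regenerated and audited. Simple randomization of this kind can produce unequal group sizes, which block randomization is often used to avoid.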

Risk of bias
Most of the trials provided limited information about study design and methodology. Five multicentre randomized clinical trials were identified [27,30,49-52]. While most of the trials (n = 42) clearly specified both inclusion and exclusion criteria, 5 trials did not prespecify either [53][54][55][56][57], 3 trials did not prespecify inclusion criteria [32,50,58] and 3 trials did not prespecify exclusion criteria [59][60][61]. The authors' judgements about each risk-of-bias domain are presented as percentages across all included trials in Fig. 2, and the judgements for each included trial are shown in Fig. 3. None of the trials was assessed as having a low risk of bias in all domains; all included trials were assessed as being at unclear or high risk of bias in one or more domains.
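The per-domain percentages presented in Fig. 2, and the check that no trial was at low risk in every domain, can be sketched as follows. The judgements below are hypothetical and only illustrate the tallying; they are not the review's data, and the domain names are shortened for brevity.

```python
from collections import Counter

LEVELS = ("low", "unclear", "high")


def domain_percentages(judgements: dict[str, dict[str, str]]) -> dict[str, dict[str, float]]:
    """Percentage of trials at each risk level, per domain."""
    domains = sorted({d for per_trial in judgements.values() for d in per_trial})
    summary = {}
    for domain in domains:
        counts = Counter(t[domain] for t in judgements.values() if domain in t)
        total = sum(counts.values())
        summary[domain] = {level: 100 * counts[level] / total for level in LEVELS}
    return summary


def low_risk_in_all_domains(per_trial: dict[str, str]) -> bool:
    return all(level == "low" for level in per_trial.values())


# Hypothetical judgements for four trials (not the review's data).
judgements = {
    "T1": {"randomization": "low", "blinding": "unclear"},
    "T2": {"randomization": "unclear", "blinding": "unclear"},
    "T3": {"randomization": "low", "blinding": "high"},
    "T4": {"randomization": "unclear", "blinding": "unclear"},
}
summary = domain_percentages(judgements)
print(summary["blinding"])  # {'low': 0.0, 'unclear': 75.0, 'high': 25.0}
print(any(low_risk_in_all_domains(t) for t in judgements.values()))  # False
```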
Further details of the judgements for the 6 risk-of-bias domains are provided in the following:

Randomization
• We assessed 18 trials as being at low risk of bias because of adequate reporting and application of random sequence generation; the remainder were assessed as being at unclear risk of bias.
Allocation (sequence generation and allocation concealment) • Two trials reported detailed information on allocation concealment methods and were regarded as adequate [26,27]; the other 51 trials were assessed as being at unclear risk of bias because they did not describe their allocation concealment methods.
Blinding • Two trials reported using single blinding and double blinding, respectively, but did not report who was blinded [28,58]. Two trials reported that blinded evaluation was used [51,52]. One trial specified who was blinded to the treatment assignment [26]. The remaining trials did not mention blinding.
Incomplete outcome data • Three trials specified the numbers of and reasons for withdrawals and losses to follow-up and were assessed as being at low risk of bias [26,47,52]. One trial gave unclear information on withdrawals and losses to follow-up [58]. Eighteen trials reported participants lost to follow-up without giving reasons and were assessed as being at unclear risk of bias [28, 30-32, 43-45, 50, 55, 62-70]. The remaining trials reported that no participants were lost to follow-up during the trial.
Selective reporting • Only 2 trials had specified outcome measures that were not reported [60,61], and the remaining trials were assessed as having a high risk of reporting bias. All 53 trials were considered to have an unclear risk of bias due to a lack of adequate information.
Other potential sources of bias • None of the trials reported a sample size calculation, so none was assessed as free of other potential sources of bias.
Nevertheless, the evidence reported in the included RCTs is not sufficient to recommend any of the investigated TCMs for the treatment of NAFLD, owing to the questionable quality of the RCTs, the risk of bias and the lack of homogeneous data. None of the included RCTs fully complied with the CONSORT-CHM guideline. Risks of bias were either identified or could not be ruled out because of incomplete information. The methodological heterogeneity across trials made a meta-analysis impossible, so no recommendation about a common or predominant medicine for NAFLD could be made. Improved RCT protocols, larger sample sizes, multicenter settings and a more focused approach to selecting TCMs for testing might allow more reliable evidence to be developed about the role of TCMs in NAFLD treatment. Further analysis of the RCTs included in this review identified common limitations and provided insight into priority areas of improvement for RCTs testing TCMs, as discussed in the following.

Table 4 Evaluation of included trial studies using the CONSORT-CHM statement
Firstly, the reporting quality of the 53 RCTs was generally poor: none of the trials fully met the criteria of the CONSORT-CHM statement. The Cochrane risk-of-bias assessment was also concerning, given the multiple risks of bias associated with randomization sequence generation and blinding. Although all the trials stated that the study protocol was designed according to recommended standards, only 3 RCTs reported the randomization sequence generation in full detail, and none of the 53 trials implemented blinding in full compliance. Blinding is important for minimizing bias and maximizing the validity of study results. A previous review of 3159 RCTs reported that, compared with blinded trials, trials without blinding yielded 17% larger estimates of treatment effects, and that in trials with subjective outcomes the effect estimates could be exaggerated by 25% [71]. Despite this, researchers appear to have made little improvement in study design and implementation: problems with blinding procedures in RCTs testing TCMs or other herbal medicines have been repeatedly reported [72][73][74] and are reiterated once again in this study.

Secondly, most of the RCTs included in this review used the TCM syndrome score to estimate the efficacy of the investigated TCMs. The scoring system is based on the assessment of TCM symptoms and signs and is one of the most important indices for evaluating the effectiveness of a TCM in treating a disease [75]. It has repeatedly been used as the primary endpoint to examine the efficacy of TCMs for various conditions [76][77][78][79]. The scoring items mainly include dry mouth, bitter taste in the mouth, dry eyes, bleeding gums, insomnia and excessive dreaming, abdominal distension, loss of appetite, fatigue, hypochondriac pain, lower back and knee pain, and abnormal urination and bowel movements [39].
The scores were assigned by TCM practitioners according to the observed severity of each clinical symptom and sign. However, the included RCTs did not report in detail how the repeatability and reliability of the TCM syndrome score assessments were ensured. To address this, the assessment should be repeated independently by at least one additional TCM practitioner under the same conditions to verify the final scoring results [80]. Reports of TCM syndrome scores should also be specific to the syndromes assessed, to rule out concerns about incomplete outcome data or selective reporting bias.
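Agreement between practitioners can be quantified once the same patients have been scored twice. The sketch below is not a method used in the included trials or prescribed by [80]; it assumes ordinal item scores (for example 0 = absent to 3 = severe) and uses quadratic-weighted Cohen's kappa, one common agreement statistic for ordinal ratings. The function name and the example scores are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Quadratic-weighted Cohen's kappa for two raters scoring the same patients.

    rater_a, rater_b: ordinal scores (one per patient) from each practitioner.
    categories: the ordered list of all possible scores, e.g. [0, 1, 2, 3].
    Returns 1 for perfect agreement, 0 for chance-level agreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of patients")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two score categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint distribution of (score from rater A, score from rater B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal score distribution of each rater.
    marg_a = [sum(observed[i][j] for j in range(k)) for i in range(k)]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Quadratic disagreement weights: 0 on the diagonal, 1 for the largest possible gap.
    weight = [[((i - j) / (k - 1)) ** 2 for j in range(k)] for i in range(k)]

    observed_dis = sum(weight[i][j] * observed[i][j] for i in range(k) for j in range(k))
    expected_dis = sum(weight[i][j] * marg_a[i] * marg_b[j] for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters give every patient the same score")
    return 1.0 - observed_dis / expected_dis


# Hypothetical scores from two practitioners for six patients on one item (0-3 scale).
first = [0, 1, 2, 3, 2, 1]
second = [0, 1, 2, 2, 2, 1]
print(round(quadratic_weighted_kappa(first, second, [0, 1, 2, 3]), 3))  # → 0.889
```

Quadratic weights penalize a 0-versus-3 disagreement far more than a 2-versus-3 one, which suits graded symptom severities; an unweighted kappa or an intraclass correlation on total scores would be reasonable alternatives.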
Thirdly, the limited sample size reported in some of the included RCTs was another cause for concern. While most of the included RCTs recruited around 100 participants and some recruited over 200, 19 of the 53 trials (35.8%) enrolled fewer than 100 participants. An insufficient number of participants reduces statistical power, limiting the ability to estimate treatment effects precisely, and may lead to overestimation of the benefit of the intervention. A sample size calculation determines the minimum number of participants needed to detect a clinically relevant treatment effect [81]. The CONSORT-CHM guidelines place special emphasis on sample size, recommending that trials explain how the sample size was determined so that there is a high probability of detecting a statistically significant, clinically relevant difference if one exists [16]. However, many of the parameters needed for sample size calculations in RCTs testing TCMs may remain poorly understood, making it difficult to estimate the required sample size. One common workaround is the "sample size samba" or "delta inflation", whereby investigators start with the number of available participants and adjust the assumptions of the sample size calculation to justify that number [82]. Importantly, the sample size calculation should still be properly conducted and fully reported to ensure and demonstrate methodological quality. The age and sex of participants in the included trials were representative of people with NAFLD. However, nearly all the included RCTs recruited participants in China, and only 1 RCT was conducted outside China [28]. This may limit the generalizability of the evidence and the applicability of the interventions to other populations.
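To make the sample size point concrete, the sketch below applies the standard normal-approximation formula for comparing two means in a two-arm parallel trial, n per arm = 2 × (z(1 - α/2) + z(1 - β))² × σ² / δ², where σ is the outcome's standard deviation, δ the clinically relevant difference, α the two-sided significance level and 1 - β the power. None of the included trials reported such a calculation; the outcome, σ and δ in the example are hypothetical, and the function names are illustrative only. Inverting the formula to find the difference that a fixed number of participants can detect shows the reverse logic behind the sample size samba.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Participants per arm needed to detect a true mean difference `delta`
    between two parallel groups with common SD `sigma`
    (two-sided test, normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def detectable_difference(n, sigma, alpha=0.05, power=0.80):
    """Smallest mean difference detectable with `n` participants per arm.
    Solving for delta from a fixed n is the reverse logic of the 'sample size samba'."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sigma * math.sqrt(2 / n)


# Hypothetical outcome: a syndrome score with SD 10 points and a clinically relevant difference of 5 points.
print(n_per_group(delta=5, sigma=10))                     # → 63
print(round(detectable_difference(n=40, sigma=10), 2))   # → 6.26
```

Under these assumed inputs, about 63 participants per arm would be needed, whereas 40 per arm could reliably detect only a difference of roughly 6.3 points. A calculation based on the t distribution would give a slightly larger n.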
None of the included trials reported post-treatment follow-up or outcome data beyond 6 months. The long-term safety and effects of the tested TCMs therefore remain unknown and warrant further research. The ages of participants in most of the included RCTs (mean ages ranging from 36.9 to 57.3 years across trials) appeared consistent with the common age of onset of NAFLD. Only 1 RCT [83] reported the effects of TCMs in children with NAFLD. With the worldwide rise in childhood obesity, NAFLD has become an important liver disease in children [84,85]. Obese children with clinical or biochemical hepatic abnormalities are prone to NAFLD [86], and NASH can develop as early as 4 years of age [83]. The landscape of therapeutic development in pediatric NAFLD is expected to expand, so future RCTs should consider the need for treatment evidence in children with NAFLD. The lack of information about the quality standards and quality control of the investigated TCMs inevitably raises reasonable doubts about the safety of the products given to participants. Reports of future RCTs should provide supporting information on standardization, including composition, quality control, detailed dosage regimen and manufacturing process [12].
Many compounds and extracts from medicinal plants have been reported to have significant anti-NAFLD effects. For instance, berberine, an alkaloid isolated from the Chinese medicinal material Coptidis Rhizoma and widely used in China to treat diarrhea and other inflammatory diseases [87], is a component of Pinggan jian Decoction [88] and Yiqi Sanju Formula [44], both analyzed in this review. Recent studies have demonstrated a new therapeutic function of berberine in metabolic disorders, including obesity and diabetes [89,90]. Berberine can lower cholesterol through a mechanism distinct from that of statins [91]. Together, these studies suggest that berberine has potential therapeutic activity in NAFLD. Beyond the 53 TCM formulas in this systematic review, many other TCMs have been reported to have significant anti-NAFLD effects. One well-known formula, Yinchenhao Decoction (YCHD), first recorded in the Shang Han Lun (Treatise on Cold Damage), has been used to treat gallbladder and liver diseases for centuries. It can reduce hepatic fat accumulation, enhance adiponectin secretion, increase endothelial progenitor cell proliferation and increase PPAR-γ expression, effects that probably account for the therapeutic action of YCHD in NAFLD [92,93]. Another well-known formula, Qushi Huayu Decoction, can reverse elevated levels of free fatty acids and total triglycerides (TG) and improve hepatic steatosis and inflammation [10]. As treatment options for NAFLD remain limited, TCMs should be appreciated as an indispensable resource for developing liver-protective drugs, and continued efforts to generate high-quality evidence on their efficacy and safety are pivotal.
Apart from efficacy, evaluating the safety of TCMs in the management of NAFLD was another primary goal of this review. However, no clear conclusions can be drawn about the safety of TCMs because of inadequate reporting of adverse reactions in the included trials. When causality is assessed, adverse events caused by TCMs appear to be relatively infrequent [94]. According to the results of RCTs, TCMs rarely carry a higher risk of adverse effects than placebo [95]. However, an increased incidence of rare adverse events, or of events with long latency, cannot be reliably detected in RCTs [96]. Hence, a reliable and comprehensive assessment of TCM safety should integrate the totality of the available clinical data (RCTs, case reports, spontaneous reporting schemes and post-marketing surveillance studies).
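Simple probability shows why rare adverse events escape detection in trials of the size reviewed here. The sketch assumes that events occur independently with a fixed per-participant incidence; the 1-in-1000 incidence and the 100-participant arm are hypothetical values chosen for illustration, not data from the included RCTs, and the function names are illustrative.

```python
import math


def prob_at_least_one_event(incidence, n):
    """Chance that an arm of n participants shows at least one case of an adverse
    event with true per-participant `incidence` (events assumed independent)."""
    return 1 - (1 - incidence) ** n


def participants_needed(incidence, confidence=0.95):
    """Smallest n giving probability `confidence` of observing at least one event."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - incidence))


def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper confidence limit for the incidence when 0 events are seen
    in n participants; roughly 3/n at 95% (the 'rule of three')."""
    return 1 - (1 - confidence) ** (1 / n)


# Hypothetical rare adverse event occurring in 1 of every 1000 users.
print(round(prob_at_least_one_event(0.001, 100), 3))  # → 0.095
print(participants_needed(0.001))                     # → 2995
print(round(upper_bound_zero_events(100), 4))         # → 0.0295
```

Under these assumptions, an arm of 100 participants has only about a 10% chance of showing even one such event, roughly 3000 participants would be needed for a 95% chance, and a trial that observes no events in 100 participants still cannot exclude an incidence of about 3%. This is why spontaneous reporting and post-marketing surveillance are needed alongside RCTs.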
This study had several limitations. Although only RCTs were included, the quality of their design was generally low. Together with the wide variety of TCMs investigated and the lack of homogeneity in study design across the included RCTs, this made it difficult to draw a clear conclusion about the efficacy and safety of TCMs for NAFLD. Moreover, our evaluation of TCM efficacy and safety relied on the quality assessment, which in turn depended on how the original studies were reported; our judgments about efficacy and safety may therefore have been influenced by the quality of reporting. The CONSORT-CHM extensions adopted as the evaluation framework in this review were designed to improve the completeness and transparency of reporting of interventions in controlled trials of Chinese herbal medicines. However, few researchers have fully adopted their recommendations. By employing the CONSORT-CHM guidelines as an assessment tool and revealing shortfalls in current RCT design and reporting, we anticipate that the quality of TCM RCTs will improve through greater awareness, recognition and adoption of these guidelines.

Conclusion
Based on this review, TCMs appear promising for NAFLD management, and the reported risks associated with the investigated TCMs were minimal. However, no firm conclusion can yet be drawn because of concerns over the quality of the RCTs and the possible risks of bias. Improved RCT protocols, larger sample sizes, multicenter settings, a more focused approach to selecting TCMs and measures that allow investigation of long-term safety are recommended to generate high-quality evidence on TCMs for NAFLD that can inform clinical practice.