Analyzing the Chinese landscape in anti-diabetic drug research: leading knowledge production institutions and thematic communities

Background The discovery of anti-diabetic drugs is an active Chinese medicine research area. This study aims to map out anti-diabetic drug research in China using a network-based systemic approach based on co-authorship of academic publications. We focused on identifying leading knowledge production institutions, analyzing interactions among them, detecting communities with high internal associations, and exploring future research directions. Methods Target articles published in 2009–2013 under the topic “diabetes” and subject category “pharmacology & pharmacy,” with “China,” “Taiwan,” “Hong Kong,” or “Macao” (or “Macau”) in the authors’ address field were retrieved from the science citation index expanded database and their bibliographic information (e.g., article title, authors, keywords, and authors’ affiliation addresses) analyzed. A social network approach was used to construct an institutional collaboration network based on co-publications. Gephi software was used to visualize the network and relationships among institutes were analyzed using centrality measurements. Thematic analysis based on article keywords and Rsc value was applied to reveal the research hotspots and directions of network communities. Results The top 50 institutions were identified; these included Shanghai Jiao Tong University, National Taiwan University, Peking University, and China Pharmaceutical University. Institutes from Taiwan tended to cooperate with institutes outside Taiwan, but those from mainland China showed low interest in external collaboration. Fourteen thematic communities were detected with the Louvain algorithm and further labeled by their high-frequency and characteristic keywords, such as Chinese medicines, diabetic complications, oxidative stress, pharmacokinetics, and insulin resistance. The keyword Chinese medicines comprised a range of Chinese medicine-related topics, including berberine, flavonoids, Astragaluspolysaccharide, emodin, and ginsenoside. These keywords suggest potential fields for further anti-diabetic drug research. The correlation of −0.641 (P = 0.013) between degree centrality and the Rsc value of non-core keywords indicates that communities concentrating on rare research fields are usually isolated by others and have a lower chance of collaboration. Conclusion With a better understanding of the Chinese landscape in anti-diabetic drug research, researchers and scholars looking for experts and institutions in a specific research area can rapidly spot their target community, then select the most appropriate potential collaborator and suggest preferential research directions for future studies. Electronic supplementary material The online version of this article (doi:10.1186/s13020-016-0084-y) contains supplementary material, which is available to authorized users.


Background
Diabetes mellitus (DM) is a chronic disease that is defined as a long period of having a high blood glucose level [1], which results in excessive thirst, frequent urination, and hunger. DM complications include ketoacidosis and nonketotic hyperosmolar [2,3]. According to the latest statistics of the International Diabetes Federation, the global prevalence of diabetes among adults aged between 20 and 79 years reached 8.3 % in 2013 [4]. A total of 382 million people worldwide are affected by the disease; of these, 80 % live in either mid-or low-income countries [4]. The number of people living with the disease worldwide is predicted to rise to 592 million by 2035 [4]. The prevalence of DM in China escalated from 0.9 % in 1980 to 11.6 % in 2010 [5,6]. In 2013, diabetes caused 5.1 millions deaths, accounting for 8.39 % of global mortality [4]. The financial cost of diabetes in the U.S. exceeded $548 billion in 2013, accounting for 11 % of global expenditure on health care [4]. The rapidly growing prevalence of diabetes has placed an enormous burden on both social and economic development [4].
The discovery of anti-diabetic drugs has become an active research area in Chinese medicine [7]. Many institutes located in different regions of Greater China are currently engaged in the research and development (R&D) of anti-diabetic drugs. They cooperate on diabetes research projects and work concurrently on selfdeveloped products. Some of these products, such as Xiaoke wan (消渴丸), Yuquan pill (玉泉片), and Ginseng-milkvetch declining sugar granule (參芪降糖顆粒) are among the most popular and well-known anti-diabetic drugs in China. Several studies have investigated the R&D of anti-diabetic drugs in China using bibliometrics to examine the distribution of publication years and journals, document types, keywords, citation counts, and journal impact factors [8,9]. However, the network-based systematic method has not been applied to evaluate China's pharmaceutical research in relation to diabetes, creating a gap in the understanding of the R&D landscape.
This study aims to map the R&D landscape of anti-diabetic drugs in Greater China (which includes mainland China, Taiwan, Hong Kong, and Macao) using a networkbased systematic review. By analyzing co-authorship in the diabetic literature to include Chinese researchers published in international journals during the period 2009-2013, the active institutes in this field were identified, the relationship between research units was explored, communities with high internal associations were detected, and the interests of a research community were characterized.

Data
This systematic review was based on the web of science's science citation index expanded (SCIE) database, a multidisciplinary index of more than 8500 peer-reviewed international journals [10]. Retrieved articles were those published during the period 2009-2013 under the topic "diabetes" and subject category "pharmacology & pharmacy, " with "China, " "Taiwan, " "Hong Kong, " or "Macao" (or "Macau") in the authors' address field.
Bibliographic information such as article title, authors, keywords, and authors' affiliation addresses was extracted for analysis. Records of authors affiliated with non-Chinese institutes were eliminated. We standardized and combined authors and their affiliated organizations. The data retrieval process is illustrated in Fig. 1.

Network visualization and analysis
The network was visualized using the Gephi software package (Version 0.8.2; The Gephi Consortium, Paris, France) and the relationships were analyzed to extract the associations between organizations. Each node in the network represents a research unit, and an edge between two nodes indicates the strength of co-authoring between these two units. Nodes were positioned using the Fruchterman-Reingold algorithm [11] in Gephi. The algorithm belongs to a class of algorithms known as force-directed algorithms; these are used to calculate Institutional information was obtained from the addresses provided by authors in the target articles, and was later standardized. Records of authors belonging to a non-Chinese institute were removed layouts of simple undirected graphs. This method uses a physical analogy to determine the placement of network nodes by minimizing the energy of the system. Nodes behave like atomic particles; they exert repulsive and attractive forces on one another. Repulsive forces exist between all pairs of nodes (like charged particles repelling each other); edges between adjacent nodes cause attraction (like spring force). The position of all nodes continues to adjust until the system reaches its equilibrium state. Thus, the more connected organizations have higher attractive forces and are positioned at the center; in contrast, weaker nodes with less or no connections to others are found on the periphery of the map.
In this study, we applied four centrality metrics to identify the key institutes: degree centrality, weighted degree centrality, betweenness centrality, and closeness centrality.
Degree centrality measures centrality. This shows the number of ties associated with a node in an undirected graph and is reflected in node size. In our research, the degree centrality of an institute represents its number of collaborating organizations. Nodes with a high degree of centrality are usually found in the dense area of the network because they are working with many different research units.
Each node's weighted degree was used to measure an institution's actual performance in productivity and how active the organization was in the system. Weighted degree centrality corresponds to the sum of weighted edges connected to a vertex. In the present study, it signified an organization's total number of co-publications.
Betweenness and closeness centrality describe the importance and role of a node in the system. Betweenness centrality measures how often a node appears on the shortest path between two nodes. In a collaboration network, some nodes do not interact directly; instead, they depend on an intermediary for communication. An intermediary with high betweenness functions as a "gatekeeper" to control the flow of interactions in the network. However, a high-betweenness node need not necessarily be one with a high degree centrality [12]. The closeness centrality of a vertex is the total geodesic distance between a vertex and all other vertices; it can be defined as how close an organization is to all others. A lower closeness value indicates that it is a more central node; that is, a node that can access or disseminate new information quicker than others [13,14].
The above analysis focused mainly on absolute strength between nodes, but the relative strength between them was also considered. The Jaccard index is a statistic that shows how similar two sample sets are by calculating the ratio of intersection of the sets by the union of the sample sets [15]. Jaccard index values range from 0 to 1; a value closer to 1 indicates a higher mutual dependence between the two units. This value is defined as where N ij and J ij stand for number of co-publications and relative strength of co-publication between institutions i and j, respectively. Referring to Scherngell and Hu's use of this index [16], we used it to measure each institution's level of mutual dependence on its collaborating pair for co-publication. We separately collected the top 50 institutional pairs measured by absolute (edge weight) and relative (Jaccard index) values. We selected the overlapping pairs between two rankings to determine the top institutional pairs in the Chinese anti-diabetic drug research network; this indicated those strongly connected institutional pairs that rely heavily on each other for publication.

Community detection and thematic analysis
In the undirected network, some tightly interconnected nodes formed relatively stable community subnetworks. To quantify the notion of community, we used modularity, a measure that assigns a numeric value assessing how well a partition of the network nodes matches the informal notion of community. We used the Louvain algorithm, an efficient and widely used Gephi method to detect high-modularity communities, to determine the relevant communities in the network [17].
To characterize the themes of each identified community, the occurrence of each article keyword in the community was calculated and its relevance to that particular community was measured by its R sc value, which is defined as R sc = f sc /f s [18]. This value is the ratio of occurrence of keyword S in community C's articles to its occurrence in the article set as a whole. A high R sc value indicates that a keyword's occurrence in a specific community is relatively higher than in the whole system and is more significant to that community. Moreover, for each community, we categorized all non-core keywords as "Others" and calculated its R sc value. The term "noncore keywords" describes keywords that appeared in the article set less than five times in total. Considering the linear and normally distributed characteristics of the data at the interval or ratio level, we then performed a Pearson's correlation test (SPSS Statistics software, Version 20; IBM Corp., Armonk, NY, USA) between the R sc value of "Others" and the degree centrality of that community to identify the relationship between research topics and collaboration opportunities of communities in the network.

General description of data sample
Using the retrieval criteria mentioned in the Methods section, a total of 882 articles were retrieved. Excluding one correction article, we obtained 881 articles after the exclusion process. A majority of the sampled items were research articles (86.85 %); the rest comprised review articles (9.75 %), meeting abstracts (3.29 %), and proceedings papers (1.02 %). Most of these publications were funded by government or non-profit organizations.  Figure 2 illustrates the current collaborative relationship between Chinese institutes involved in the anti-diabetic drug research. There are 430 nodes on the map and 7673 undirected weighted edges, indicating that these 430 institutes have cooperated 7673 times. Node size depends on degree centrality and edge thickness between two nodes is determined by the number of co-publications between them [19]; more frequently collaborating institutional pairs are connected by a thicker edge.

Institutional collaboration network
In this figure, research units from mainland China and Taiwan are represented in cyan and yellow, respectively, and magenta represents those from both Hong Kong and Macao. Only the names of the more active nodes are shown here.
The structure of this network can be broadly divided into two parts. The left part is dominated by institutions from mainland China and those from Hong Kong and Macao, and the right part comprises mainly Taiwanese organizations. Although there were few connections between these two parts, each part contained abundant internal connections. The research units in Taiwan tended to collaborate with organizations from the same region, but less with those from mainland China, Hong Kong, or Macao. Table 1 shows the top 50 institutes in weighted degree, indicating the total number of an institute's co-publications. In addition, the influence of a particular institute on others' collaborative behavior was explored using betweenness and closeness centrality.

Leading knowledge producers
The majority of mainland Chinese organizations on the list are outstanding academic institutes; these include the universities of project 985 and 211 and some of the country's best-known hospitals and research centers. Most of these are located in Shanghai (18.42 %) and Beijing (15.79 %).
The weighted degree comprises two parts: internal and external degrees. The internal degree represents the number of articles authored by scholars within the same research unit. The external degree reflects the number of co-publications by authors from different institutes. As characterized by their high weighted degrees, Shanghai Jiao Tong University, National Yang Ming University, China Pharmaceutical University, Chang Gung Medical Foundation, Taipei Veterans General Hospital, and Peking Union Medical College are the key institutes in Greater China's anti-diabetic drug research. Organizations that tend to collaborate externally rather than internally had a higher level of external collaboration (LEC). The average LEC value of Taiwanese institutes listed here was 66.99 %, indicating that they tended to cooperate more externally than internally. Conversely, mainland China's LEC was much lower than that of Taiwan, at only 32.98 %, indicating a rather closed collaboration network. As an example, Shanghai Jiao Tong University ranked first in weighted degree but its LEC was only 17 %. In contrast, Peking University had the highest betweenness centrality in the whole system, acting as a gatekeeper of knowledge flow. Other organizations that showed strong betweenness performance include Fudan University, National Taiwan University, Shandong University, Sun Yat-Sen University, and the University of Hong Kong; all of these acted as information intermediaries located in different regions across Greater China.
Many high-betweenness organizations were also the ones with low closeness centrality, emphasizing their important position as knowledge brokers who could access and share new information more quickly than others (Table 1). Some organizations occupied the highest positions for co-publication numbers, betweenness, and closeness, demonstrating their extraordinary contributions to knowledge creation and dissemination. These institutions were Shanghai Jiao Tong University, China Pharmaceutical University, Peking Union Medical College Hospital, Fudan University, Peking University, National Taiwan University, University of Hong Kong, and Sun Yat-Sen University.

Top institutional pairs
Institutional pairs that concurrently ranked in the top 50 in terms of edge weight and Jaccard index were chosen to identify the most frequently collaborating pairs with highest mutual dependence. Table 2

Thematic communities
The basic elements of an institutional collaboration network are the nodes and subgroups of the network, with various subgroups combining to form a complex institution network. As shown in Fig. 3, closely collaborating organizations were clustered into various community subnetworks. The 14 major subnetworks in this system were labeled in descending order of number of institutes. For example, community 1 was the largest community, Fig. 2 Institutional collaboration network of anti-diabetic drug research in China. This network is visualized and analyzed by using software Gephi, it comprises 430 nodes and 7673 undirected weighted edges. A node represents a research institution and node size depends on degree centrality while edge thickness between two nodes is determined by the number of co-publications between them. Research institutions from mainland China and Taiwan are represented in cyan and yellow respectively, and magenta represents those from both Hong Kong and Macao. Only the names of the more active nodes are shown here with 71 organizations, whereas community 14 was the smallest, with only three organizations. Community size increases with number of articles, and edge thickness between any two nodes corresponds to the frequency of co-publications involving authors from those two communities. Table 3 contains additional information about the 14 communities.
The internal structure mining of communities revealed geographic features. For community 2, only one out of the 64 institutes was outside Taiwan, indicating that this community was dominated by Taiwanese organizations that seldom communicate with mainland China, Hong Kong, or Macao. Community 3 was composed mainly of outstanding institutes from Shanghai, such as Shanghai Jiao Tong University, Fudan University, Shanghai Institute of Materia Medical of Chinese Academy of Sciences, and The Second Military Medical University. In contrast, all six research units in community 11 were found in Northwest China, including five from Shaanxi Province and one from the Xinjiang Uygur Autonomous Region.
The research interests of communities were identified by calculating the R sc value of article keywords. For each community, keywords were arranged in descending order of frequency; those accounting for the top 50 % of cumulative frequency were shown and those in bold and in italic were the ones with top 10 R sc values and more related to Chinese medicines respectively. As an example, community 2 focuses not only on Chinese medicines, diabetic complications, and hyperglycemia, but also on cohort study and structural biology.
All keywords appearing in the system were standardized for statistical analysis. Those occurring more than five times were classified as core keywords and the rest were classified as non-core keywords. The top core keywords were Chinese medicines, diabetic complications, oxidative stress, hyperglycemia, insulin resistance, mice model, inflammatory, pharmacokinetics, and GLP-1. The most frequent Chinese medicines keywords included berberine, flavonoids, Astragalus polysaccharide, emodin, ginsenoside, quercetin, curcumin, Ganoderma lucidum, and resveratrol. Table 4 shows the most frequently used keywords in Chinese medicine anti-diabetes research; these are categorized into "Herbs" and "Compounds. " More than three research articles focused on "Herbs" and more than five articles focused on "Compounds. " The Pearson's test of correlations between degree centrality and the R sc value of non-core keywords across communities produced a negative correlation coefficient of −0.641 (P = 0.013), indicating that when a community focuses on a research topic that is of common interest to all communities, it can easily build up its collaboration network. However, communities that concentrate on rare research fields are usually isolated by others and have a lower chance of collaboration. For more details regarding the collaboration network and thematic communities, please refer to the Additional files 1, 2, 3, 4.

Discussion
In the present study, the sample data included 881 scientific articles published by 430 Chinese institutes over the 5-year period of 2009-2013. This study identified the most productive institutes and leading academic communities in anti-diabetic drug development.
The average LEC of mainland China's organizations was only half that of Taiwan, indicating that institutes in China were less dependent on external collaboration. This might be explained by a number of factors: (1) the organization had sufficient resources to meet the needs of research projects; (2) there might be no adequate communication platform for knowledge sharing; and (3) geographic distance resulted in a negative impact on information flow between institutes. In contrast, although Taiwan exhibited a high LEC, it was rather disconnected from organizations in mainland China, Hong Kong, and Macao.
The network visualized data in this article may help researchers and scholars to navigate in this active research area and to find potential collaborators specializing in a particular research area. For instance, a researcher affiliated with Jilin University (a member of community 8) who specializes in the research of ginsenoside, is planning to explore the therapeutic effect of a combination of ginsenoside and medicinal mushrooms on diabetic complications. In Fig. 3, community 2 was the only group to focus on medicinal mushrooms research; therefore, the researcher could probably identify potential collaborators from this community. Knowledge of the list of institutes that belong to community 2, and with the help of the co-authorship network, the researcher could select the most appropriate expert for research collaboration based on centrality measurements. In this case, the researcher might approach China Medical University in Taiwan, as it ranked at the top in the three aspects of centrality measurements among institutes studying medicinal mushrooms. This kind of research collaboration does not simply represent the building of a new partnership between two researchers or institutions, but also fosters knowledge exchange across various diabetes research communities in China and further stimulates innovation and productivity.
In addition, each community studied here is labeled with a list of keywords, covering a diverse range of topics that included herbology, chemical component, molecular biology, clinical medicine, pharmacokinetics, therapeutic drugs, and animal experimental models. Chinese medicines, diabetic complications, oxidative stress, hyperglycemia, and insulin resistance were common topics that could be found on the keyword list across different communities, indicating their significance in the field. Only minimal adverse effects were reported for Chinese medicines used in treating type 2 DM, indicating certain advantages in the prevention of diabetes and delay of its complications [7]. Studying the role of oxidative stress in the pathogenesis of diabetes provides a theoretical basis for the prevention and treatment of the disease [20], as oxidative stress could be a major cause of the development of type 2 DM [21]. Furthermore, Chinese medicines in combination with insulin exhibited better clinical effect in the treatment of gestational diabetes [22]. Similarly, berberine exhibited beneficial anti-inflammation effects, indicating that it could be a potential therapeutic drug in type 1 DM treatment [23]. Each of the 14 communities in the studied system has its own area of expertise, which is usually a non-mainstream topic. Community 2, a subnetwork primarily composed of Taiwanese organizations, focuses on cohort study and structural biology while Community 3 is a Shanghaibased group that specializes in ficolin-3 and islet cell culture. Community 5 focuses on wound healing, radix Rehmanniae, sweet tea, and synergistic effect; most of its constituent organizations are from Guangdong Province, Hong Kong, and Macao. Finally, the Beijing-dominated