# Thesis final v7

Comparative Analysis of Louvain and K-Means Methods for Community Detection

By

Firstname Lastname

University Name

Abstract

Community detection in graphs of complex real-world network systems is a significant area of data science research. A community, or cluster, is a group of vertices with many edges joining the vertices inside the group and comparatively few edges joining them to vertices outside it. The criterion for inclusion in a community is based on the data of the vertices and edges, so a system’s data determines the communities that form within it. These communities give data analysts a high-level view of the system’s constituents. Such a view is crucial for behavioral analysis, managerial decision-making, marketing strategy, recommendations, etc. Many community detection algorithms have been proposed in the research community. Of these, the Louvain algorithm and the K-Means algorithm are two popular choices. This paper performs a comparative analysis of the two algorithms by evaluating their performance on real-world data sets of differing complexity. The comparison highlights the strengths and limitations of each algorithm and suggests the scenarios to which each is best suited.

Keywords: community detection, clustering, Louvain algorithm, K-means algorithm, data sets.

Table of Contents

1. Introduction
1.1. Related Works
2. Background Concepts
3.1. Graph Theory
3.1.1. Graph
3.1.2. Community
3.1.3. Edge Matrix
3.2. Clustering Algorithms
3.2.1. Hierarchical Clustering
3.2.2. Graph Partitioning
3.2.3. Spectral Clustering
3.2.4. Connected Component
3.3. Spectral and Hierarchical Clustering Algorithms
3.3.1. Louvain Algorithm
3.3.2. K-Means Algorithm
3.3.3. K-Means++ Algorithm
3. Contribution
4.1. Problem Definition
4.2. System Overview
4. Experiments
5.1. Datasets
5.1.1. Zachary’s Karate Club Dataset
5.1.2. Dolphin’s Social Network Dataset
5.1.3. Email-Eu-Core Dataset
5.1.4. Datasets Summary
5.2. Algorithm Performance Evaluation
5.2.1. Performance Measures
5.2.2. Silhouette Coefficient Method
5.2.3. Elbow Criterion
5.3. Algorithm Evaluation
5.3.1. Louvain Evaluation
5.3.2. K-Means Evaluation
5.3.3. K-Means++ Evaluation
5.4. Test Environment
5. Results
6.1. Karate Club
6.2. Dolphins Dataset
6.3. Email-Eu-Core Dataset
6.4. Summary
6. Discussion
7. Conclusion
8. Future Work
Sources
Appendix

List of Figures

Figure 1 Components of a Graph
Figure 2 Graph Types
Figure 3 Communities within a Graph
Figure 4 Overlapping vs. Non-Overlapping Communities (Whang & Dhillon, n.d.)
Figure 5 Edge Matrix Sample
Figure 6 Dendrogram Example to Compare Agglomerative and Divisive Clustering
Figure 7 Connected Component (Kulkarni, 2017)
Figure 8 Louvain Algorithm Dendrogram (Lund, 2018)
Figure 9 Example of Louvain Algorithm (Lund, 2017)
Figure 10 Pseudocode of Louvain Algorithm (Kim et al., 2013)
Figure 11 Framework of Louvain Algorithm
Figure 12 K-Means Sensitivity to Initial Centroids Selection
Figure 13 Pseudocode for K-Means Algorithm (Lloyd, 1982)
Figure 14 Framework of the K-Means Algorithm
Figure 15 System Framework
Figure 16 Ground Truth Clustering Map for Karate Club Dataset
Figure 17 Ground Truth Clustering Map for Dolphin’s Social Network Dataset
Figure 18 Louvain Clustering Map for Karate Club Dataset
Figure 19 Louvain Clustering Map for Dolphin’s Dataset
Figure 20 Elbow Criterion for Karate Club Dataset (K-Means)
Figure 21 Silhouette Graph for Karate Club (K-Means)
Figure 22 Modularity Graph for Karate Club (K-Means)
Figure 23 Calinski and Harabasz Score for Karate Club (K-Means)
Figure 24 K-Means Clustering Map for Karate Club Dataset
Figure 25 K-Means Elbow Criterion for Dolphin’s Dataset
Figure 26 Silhouette Graph K-Means (Dolphins)
Figure 27 Modularity Graph K-Means (Dolphins)
Figure 28 Calinski and Harabasz Score for K-Means (Dolphins)
Figure 29 K-Means Performance Scores for Dolphins Dataset
Figure 30 K-Means Clustering Map for Dolphin’s Dataset (k=4)
Figure 31 K-Means Elbow Analysis (Email-Eu-Core)
Figure 32 Silhouette and Modularity for emailEuCore
Figure 33 Calinski and Harabasz Scores for emailEuCore Dataset
Figure 34 Elbow Criterion for Karate Club (K-Means vs. K-Means++)
Figure 35 Silhouette Graph for Karate Club (K-Means vs. K-Means++)
Figure 36 Modularity Graph for Karate Club (K-Means vs. K-Means++)
Figure 37 Calinski and Harabasz Score for Karate Club (K-Means vs. K-Means++)
Figure 38 K-Means++ vs. K-Means Iterations for Karate Club Dataset
Figure 39 K-Means Purity Scores (for k=2:10) for Karate Club Dataset
Figure 40 K-Means NMI Scores (for k=2:10) for Karate Club Dataset
Figure 41 K-Means AMI Scores (for k=2:10) for Karate Club Dataset
Figure 42 K-Means ARI Scores (for k=2:10) for Karate Club Dataset
Figure 43 K-Means Homogeneity Scores (for k=2:10) for Karate Club Dataset
Figure 44 K-Means FMI Scores (for k=2:10) for Karate Club Dataset
Figure 45 K-Means F1 Scores (for k=2:10) for Karate Club Dataset
Figure 46 K-Means++ Clustering Map for Karate Club Dataset
Figure 47 K-Means Elbow Criterion for Dolphin’s Dataset
Figure 48 Silhouette Graph K-Means vs. K-Means++ (Dolphins)
Figure 49 Modularity Graph K-Means vs. K-Means++ (Dolphins)
Figure 50 Calinski and Harabasz Score Graph K-Means vs. K-Means++ (Dolphins)
Figure 51 Purity Scores for Dolphins Dataset
Figure 52 NMI Scores for Dolphins Dataset
Figure 53 AMI Scores for Dolphins Dataset
Figure 54 ARI Scores for Dolphins Dataset
Figure 55 Homogeneity Scores for Dolphins Dataset
Figure 56 FMI Scores for Dolphins Dataset
Figure 57 F1 Scores for Dolphins Dataset
Figure 58 K-Means++ Performance Scores for Dolphins Dataset
Figure 59 K-Means vs. K-Means++ Iterations for Dolphins Dataset
Figure 60 K-Means++ Clustering Map for Dolphin’s Dataset (k=5)
Figure 61 K-Means Elbow Analysis (Email-Eu-Core)
Figure 62 K-Means++ Modularity for emailEuCore
Figure 63 K-Means vs. K-Means++ Silhouette Score (Email-Eu-Core)
Figure 64 Calinski and Harabasz Scores for emailEuCore Dataset
Figure 65 Community Evaluation Metrics for Karate Club
Figure 66 Performance Evaluation Metrics for Karate Club Dataset
Figure 67 Quality Evaluation Metrics for Dolphins Dataset
Figure 68 Performance Evaluation Metrics for Dolphins Dataset
Figure 69 Quality Evaluation Metrics for Email-EU-Core
Figure 70 Performance Metrics for Email-EU-Core

List of Tables

Table 1 Ground Truth Communities for Karate Club Dataset
Table 2 Ground Communities Scores for Karate Club Dataset
Table 3 Ground Truth Communities for Dolphins Dataset
Table 4 Ground Communities Scores for Dolphins Dataset
Table 5 Community Goodness Scores for email-Eu-core Ground Truth
Table 6 Datasets Features Summary
Table 7 Louvain Results for Karate Club Dataset
Table 8 Louvain Communities Scores for Community Evaluation Metrics
Table 9 K Louvain Performance Analysis (Karate)
Table 10 Louvain Results for Dolphin’s Dataset
Table 11 Louvain Communities Scores for Community Evaluation Metrics
Table 12 K Louvain Performance Analysis (Dolphins)
Table 13 Louvain Community Goodness Scores for email-Eu-core Dataset
Table 14 K-Means Performance Analysis for k=2 (emailEucore)
Table 15 Community Goodness Scores Analysis for Karate Club (K-Means)
Table 16 Performance Scores of K-Means
Table 17 K-Means Results for Karate Club Dataset
Table 18 Time and Iterations K-Means (K-Means)
Table 19 K Variation Performance Analysis for Dolphins (K-Means)
Table 20 Performance Scores of K-Means Score for Dolphin
Table 21 Time and Iterations of K-Means (Dolphins)
Table 22 K-Means Results for Dolphin’s Dataset
Table 23 Optimal k for K-Means Using Community Goodness Metrics
Table 24 K-Means Community Goodness Scores for k=2 (emailEucore)
Table 25 K-Means Performance Analysis for k=2 (emailEucore)
Table 26 K-Means Community Goodness Scores for k=42 (emailEucore)
Table 27 K-Means Performance Analysis for k=42 (emailEucore)
Table 28 Time and Iterations of K-Means and K-Means++ (Karate Club)
Table 29 K Variation Performance Analysis for Karate Club (K-Means++)
Table 30 Performance Scores of K-Means++
Table 31 K-Means++ Results for Karate Club Dataset
Table 32 K Variation Performance Analysis for Dolphins (K-Means++)
Table 33 Performance Scores of K-Means++ for Dolphin
Table 34 Time and Iterations Comparison of K-Means and K-Means++
Table 35 K-Means++ Results for Dolphin’s Dataset
Table 36 Optimal k for K-Means++ (Email-Eu-Core)
Table 37 K-Means++ Community Quality for k=9 (emailEucore)
Table 38 K-Means Performance Analysis for k=9 (emailEucore)
Table 39 K-Means++ Community Goodness Scores for k=42 (emailEucore)
Table 40 K-Means++ Performance Analysis for k=42 (emailEucore)
Table 41 Results for the Community Goodness Measures
Table 42 Results for the Performance Measures
Table 43 Performance Summary of Louvain vs. K-Means
Table 44 Python Libraries Used

Introduction

The important features of a system can be extracted as datasets, which can in turn be represented as graphs. For instance, a social networking site such as Facebook provides a shared platform that allows people to interconnect and share data. A graph representing such a social network consists of nodes and edges: a node represents a person, and an edge indicates an interaction between two people (e.g. friendship, like, share).

A community or cluster is formed by vertices that are densely connected to each other and sparsely connected to vertices from other communities. Communities or clusters help partition a large network into smaller collections of nodes from specific perspectives of interest. The same network can be used for detecting communities of different types, for instance, a community of users who like football or one of users who support a specific political leader.

Representing a complex real-world system in the form of vertices and edges can serve as an important tool for understanding and analyzing the overall system once its communities are detected. All nodes with similar properties and behaviors form a community. For instance, the general behavior of people on a social network, e.g., liking similar items/posts/pages, sharing views on certain topics, preferences, etc., makes them fall into virtual communities or clusters.

Detection of these communities or clusters can be significant for many marketing and recommendation applications (Kim and Ahn, 2008). On a social network, examples include notifying soccer fans of a new soccer event, suggesting items based on an individual’s browsing history, suggesting friends, and finding the influential people within a group. Beyond social networks, community detection finds application in many other fields, e.g. exploring research collaborations (Jiyanthi and Priya, 2018), finding protein interactions in biological networks (Wang et al., 2010), and studying real-time air and land traffic maps (Pattanaik et al., 2018).

Related Works

Many algorithms have been published for the problem of community identification, or clustering. Small variations of or additions to the originally proposed algorithms have also been suggested, either to improve performance on a specific dataset or to advance the algorithm in general.

Arthur and Vassilvitskii (2007) propose a modification, called K-Means++, to the initial K-Means step of choosing centroids randomly. The original K-Means algorithm accepts whichever nodes the random selection produces. When the selected centroids happen to lie close together, the seeds end up forming the same clusters. The proposed modification makes it likely that the initial centroids are well apart from each other. The modification not only improved the clustering quality of the algorithm but also increased its speed.
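The seeding idea can be illustrated with a minimal sketch. It assumes the commonly described K-Means++ rule in which each new centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen; the function name `kmeans_pp_seeds`, the `seed` parameter and the sample points are hypothetical and are not taken from the thesis code.

```python
import random


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centroids from `points` (tuples of floats).

    Assumes at least k distinct points, otherwise every remaining
    weight becomes zero and random.choices raises ValueError.
    """
    rng = random.Random(seed)
    # First centroid: chosen uniformly at random, as in plain K-Means.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen centroid.
        weights = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        # Far-away points are more likely to be picked; points that are
        # already centroids have weight 0 and cannot be picked again.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


sample_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(kmeans_pp_seeds(sample_points, k=3, seed=42))
```

Because the selection is still random, well-separated seeds are likely rather than guaranteed.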

Before the K-Means algorithm is applied, the network data is often mapped to a lower-dimensional space that retains the useful features of the original data. A common method for performing this mapping is through neural networks. Vilcek (2014) proposed an algorithm called Deep K-Means in which K-Means serves as a multi-layer auto-encoder that recursively decomposes the high-dimensional data into lower-dimensional data. The proposed algorithm outperformed the traditional spectral clustering algorithm.

Wang and Koopman (2017) use both the K-Means and Louvain algorithms to cluster articles from Astro, a large network dataset with no ground truth communities defined. The clustering is based on semantic similarity between the articles, with the semantic information drawn from the dataset’s metadata. Because different researchers had clustered the same dataset, an approximation of their results was taken as the ground truth for testing the two algorithms. Applying the Louvain algorithm was simple, as it needs no input parameters and finds clusters with a local maximum of modularity. To use K-Means, however, the value of k had to be determined empirically. The authors chose the k with the maximum Silhouette score, which was 30; this was close to 31, the number of clusters found by the Louvain algorithm. For the initial placement of centroids, the algorithm was run 10 times for values of k between 10 and 60, and the centroids that gave the lowest sum of squares were selected.

Sommer et al. (2017) propose a simple method of applying K-Means clustering in a black-box manner. The work uses 13 different graph distance kernels with K-Means to cluster datasets. The performance of all 13 black-box configurations is compared with that of the Louvain algorithm. For the medium-sized datasets, the K-Means algorithm was found to perform as well as the Louvain algorithm. For the larger datasets, the Louvain algorithm outperformed the K-Means method. Using the kernels removed the need to provide the number of clusters before applying K-Means: the kernel’s parameters and the number of clusters were predicted from the modularity values for a dataset. The results varied with the type of kernel selected for a run. The Free Energy (FE), Sigmoid Corrected Commute-Time (CCT) and Randomized Shortest-Path (RSP) Dissimilarity kernels gave the best results.

A comparative study has been conducted by Jianjun et al. (2014), in which they propose an active learning ‘Must-Link Cannot-Link’ approach for undirected, unweighted graphs and use it with a semi-supervised community detection algorithm. The combination improved the overall clustering results of the algorithm. A total of six algorithms were compared on their performance while partitioning four datasets with known ground truths. Three metrics were chosen to measure performance: modularity, accuracy and NMI.

A large body of published research evaluates community detection algorithms on small and large real-world network datasets. Some of these works suggest different performance metrics, while others discuss misconceptions about factors that are often ignored but may affect clustering performance. This body of work highlights issues in the process of performance evaluation itself.

Lee and Cunningham (2014) present the argument that algorithms that perform well on the smaller datasets with known ground truth communities may not perform equally well while extracting communities in the large social network datasets where the structure of networks is not known. For clustering these large datasets, community detection algorithms use some metadata. This metadata in itself may not perfectly depict the structure of the network.

Peel et al. (2017) share the same view as Lee and Cunningham (2014). Algorithms do not perform equally on communities of varying sizes and complexity. The authors present a ‘No Free Lunch’ theorem that proves that no community detection algorithm is flexible enough to handle all types of community detection tasks. There is no single universal pattern for a community structure; no metadata describes all aspects of a community, and no one algorithm fits all. There are general algorithms that perform well overall, and there are others tailored for specific tasks. According to the research, even the common practice of selecting metadata from large networks (e.g., gender, religion, age, ethnicity, etc.) as a ground truth measure may not be an effective tool for all types of communities. They study the relationship between community structure and metadata in several detection frameworks to assess how closely the two are linked in complex real-world network systems. Sometimes, when an algorithm performs poorly in extracting communities, the failure may be due to the metadata selected. The metadata may not fully represent all the groups present in a network, or it may not relate well to the overall community structure.

Jebabli et al. (2015) argue that when quality evaluators are used as the criterion for optimization, the community structure may get overlooked. This is because the quality measures are independent of the underlying network topologies: two dissimilar communities with different connections between nodes may give similar NMIs. The authors suggest that an algorithm should be considered efficient when its clusters agree with the topology of the communities, not only with the ground truth community labels. Even at the cost of lower evaluation metric scores, algorithms must focus on encoding the network community topology. To demonstrate the importance of maintaining topological structure during clustering, the work takes the ground truth community structure of the Amazon dataset and compares it with the estimated community structure produced by popular community detection algorithms. It then studies the topological properties of the graph at the macroscopic level (through average clustering coefficient, diameter, density, degree correlation and average shortest path), the microscopic level (via hop distance, node degree distribution and the associated average clustering coefficient) and finally the mesoscopic level (via community size distribution).

This work presents a comparative analysis methodology using the modularity, NMI and accuracy measures, similar to Jianjun et al. (2014) discussed above. Considering the observations presented by Lee and Cunningham (2014) and Peel et al. (2017), however, the two algorithms shall be evaluated not only on small datasets with ground truth available but also on large network communities, to get an overall idea of the variation in results. In light of the arguments presented by Jebabli et al. (2015), the results produced by the two algorithms shall, as part of the future work, be evaluated against topological metrics to see how well each algorithm performs against the ground truth network topologies. Furthermore, given the impact on clustering performance of using modularity-based kernels (Sommer et al., 2017) alongside the K-Means algorithm, such an approach shall also be explored in the future.

Background Concepts

This section provides a brief overview of the terminology and concepts associated with the Louvain and K-Means algorithms that are used in the thesis.

Graph Theory

Graph

A graph is formed by a set of vertices connected through edges. The general notation used to denote a graph is G = (V, E), where V is the list of vertices/nodes and E is the list of edges. Figure 1 shows a simple graph consisting of 20 nodes with 52 edges between them.

Figure 1 Components of a Graph

Graphs can be directed or undirected, weighted or unweighted.

(a) directed, weighted (b) undirected, unweighted

Figure 2 Graph Types

The analysis presented in this work deals only with undirected and unweighted graphs.

Community

A group of nodes within a graph that exhibit similar characteristics can be grouped to form a community. For a graph G, a community would be C = (Vc, Ec) such that C is a subgraph of G. In Figure 3, communities are formed from the graph in Figure 1.

Figure 3 Communities within a Graph

A community may also be defined as a cohesive group in which the interaction of the group members with each other is more intense than their communication with entities outside the group (Jebabli et al., 2015). Based on this intuition, to find good communities $C_g$, the objective is to maximize the number of edges within a community, i.e. the internal degree, denoted by $v_{ci}$, and to minimize the external degree, denoted by $v_{ce}$.

$$C_g = \text{maximize } v_{ci} \text{ and minimize } v_{ce}, \quad \text{where } v_{ci}, v_{ce} \in G$$

Communities can be overlapping or non-overlapping. Overlapping communities are those in which nodes are assigned to more than one cluster/community.
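The internal and external degree of a candidate community, as used in the objective above, can be counted directly from an edge list. The following is a minimal sketch for an undirected, unweighted graph; the function name `internal_external_degree` and the sample edge list are hypothetical and serve only as an illustration.

```python
def internal_external_degree(edges, community):
    """Count edges inside a community and edges leaving it.

    edges: iterable of (u, v) pairs of an undirected, unweighted graph.
    community: iterable of the nodes that belong to the community.
    """
    members = set(community)
    internal = 0  # both endpoints inside the community
    external = 0  # exactly one endpoint inside the community
    for u, v in edges:
        if u in members and v in members:
            internal += 1
        elif u in members or v in members:
            external += 1
    return internal, external


# A triangle {1, 2, 3} attached to a short tail 3-4-5.
sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
print(internal_external_degree(sample_edges, {1, 2, 3}))  # (3, 1)
```

A community with a high internal count and a low external count matches the intuition of a cohesive group.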

Figure 4 Overlapping vs. Non-Overlapping Communities (Whang & Dhillon, n.d.)

The analysis in this work focuses mainly on evaluating algorithm performance for real-world non-overlapping communities.

Edge Matrix

As K-Means operates on data arranged in a two-dimensional (2-D) form, a way to map the network nodes and edges into such a format was needed. A new edge matrix was created for this purpose. The matrix is a 2-D array that represents the network structure spatially, i.e. it represents the nodes and edges in the form of a 2-D matrix. K-Means clustering is applied to this 2-D array. It is essentially an adjacency matrix (Weisstein, 2018), but unlike the adjacency matrix it has ones on the diagonal.

For a given directed or undirected graph with v vertices, the edge matrix is a (0-1) matrix of size v x v, i.e. v rows and v columns. An edge of the graph is represented by a 1 at the intersection of the row and column of the two vertices it joins. As with the adjacency matrix, the edge matrix of an undirected graph is symmetric. A few examples of the edge matrix are given below.

Figure 5 Edge Matrix Sample

Figure 5 shows three different graphs, each consisting of 4 vertices. For all three graphs, the edge matrix has 4 rows and 4 columns, recording the connection of each vertex to the other vertices in the graph.

Consider the node ‘1’ of the first graph. An edge exists between 1 and 4. The presence of this edge is represented by a 1 at the intersection of row 1 and column 4.

Now, move to the next vertex, ‘2’. Node ‘2’ is connected only to node ‘4’. This edge is marked by 1 at the intersection of row 2 and column 4.

Now consider node ‘3’. It connects only to node ‘4’, so a 1 is marked at row 3 and column 4.

Now, consider node ‘4’. It is connected to nodes ‘1’, ‘2’ and ‘3’.

The diagonal is marked by 1.

The resulting matrix is the edge matrix of the first graph.

Similarly, the edge matrices of the remaining two graphs can also be determined.
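The construction walked through above can be sketched in a few lines of Python. This is an illustrative sketch only, not the thesis implementation: the function name `edge_matrix` is hypothetical, and nodes are assumed to be numbered from 1 to n as in the example.

```python
def edge_matrix(n, edges):
    """Build the n x n edge matrix of an undirected, unweighted graph.

    n: number of nodes, assumed to be labelled 1..n.
    edges: iterable of (u, v) pairs.
    """
    matrix = [[0] * n for _ in range(n)]
    # Unlike a plain adjacency matrix, the diagonal is filled with ones.
    for i in range(n):
        matrix[i][i] = 1
    # Each undirected edge sets two symmetric entries.
    for u, v in edges:
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1
    return matrix


# First example graph: nodes 1, 2 and 3 are each connected only to node 4.
first_graph = edge_matrix(4, [(1, 4), (2, 4), (3, 4)])
for row in first_graph:
    print(row)
# [1, 0, 0, 1]
# [0, 1, 0, 1]
# [0, 0, 1, 1]
# [1, 1, 1, 1]
```

Each row of the matrix can then be treated as the feature vector of one node when K-Means is applied.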

The K-Means algorithm processes all of the data in every iteration. For very large networks, e.g. a network of over 10,000 nodes, a matrix of size n x n, where n is the number of network nodes, becomes extremely large and sparse. Performing computations over such a matrix is extremely expensive in terms of processing capability, time and memory. Using an n x n matrix for K-Means clustering is therefore impractical for very large networks. As a general rule, when K-Means is used to cluster the data of very large networks, a reduced-dimensionality representation of the data is used, e.g. Laplacian eigenvectors, the Fourier transform, principal component analysis, etc. Clustering over these reduced representations is faster and requires fewer resources while giving comparable results. In this thesis, the networks considered for clustering were relatively small, the biggest having 1005 nodes, so no dimensionality reduction step was performed. The edge matrix represented the actual edges of an n-node network as an n x n matrix.

Clustering Algorithms

Clustering is the process of labeling a set of objects in such a manner that similar objects are placed in the same group, forming a cluster. According to the categories defined by Fortunato (2010), there are four types of clustering: Hierarchical, Graph Partitioning, Spectral and Connected Component.

Hierarchical Clustering

Large networks such as social networks contain a hierarchy of networks, i.e. many networks within a network. Hierarchical clustering aims to detect these multi-level clusters residing in a network. All such algorithms start by finding a similarity or difference between the vertices. The detection may be top-down (divisive) or bottom-up (agglomerative). Figure 6 shows a dendrogram example of hierarchical clustering by Xu and Wunsch (2009).

Figure 6 Dendrogram Example to Compare Agglomerative and Divisive Clustering

Divisive Clustering

In divisive clustering, a large network is split into smaller networks satisfying a criterion. The basic principle of the algorithm is shown in the work of Kaufman and Rousseeuw (1990). Divisive clustering methods are computationally expensive. In a network of N data points, they consider $2^{N-1} - 1$ possible two-subset divisions. Therefore, this type of hierarchical clustering is less popular than agglomerative clustering (Xu and Wunsch, 2009).
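The count of two-subset divisions can be checked by brute-force enumeration for small N. The sketch below is only an illustration of why the number grows exponentially; the function name `two_subset_divisions` is hypothetical.

```python
from itertools import combinations


def two_subset_divisions(items):
    """Enumerate every split of `items` into two non-empty subsets."""
    items = list(items)
    first, rest = items[0], items[1:]
    divisions = []
    # Keep the first item in the left subset so that each unordered
    # split is generated exactly once. The right subset must stay
    # non-empty, so at most len(rest) - 1 other items join the left.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(items) - left
            divisions.append((left, right))
    return divisions


for n in range(2, 8):
    count = len(two_subset_divisions(range(n)))
    print(n, count, 2 ** (n - 1) - 1)
```

Each extra data point roughly doubles the number of candidate splits, which is why exhaustive divisive methods become impractical quickly.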

Agglomerative Clustering

In agglomerative clustering, small clusters are merged to form bigger clusters based on some measure. A number of agglomerative algorithms have been proposed in the literature (Zhang et al., 2012). The Louvain algorithm is an agglomerative hierarchical clustering algorithm. An advantage of hierarchical clustering is that prior knowledge of the number or size of clusters is not required. A disadvantage is that it can produce incorrect classifications, e.g. nodes with degree one may be classified as separate clusters (Fortunato, 2010).

Graph Partitioning

In this clustering approach, the graph is partitioned by cutting edges of low importance: when two clusters are joined by an edge between two nodes, that edge can be cut. Among the algorithms in the literature, the graph-partitioning algorithm by Kawaji et al. (2001) clusters protein sequences.

Spectral Clustering

Network points are transformed into corresponding points in a new coordinate space. Clustering methods are then applied to the transformed vectors. The algorithms by Meila and Shi (2001), Ng et al. (2002), Shi and Malik (2000) and Kannan et al. (2000) are a few spectral algorithms, but the K-Means algorithm (Lloyd, 1982) is the most widely used.

Connected Component

A graph may contain portions, or subgraphs, in which every vertex can be reached from every other vertex of the subgraph. Such a subgraph is a connected component.

Figure 7 Connected Component (Kulkarni, 2017)

The watershed segmentation algorithm (Vincent and Soille, 1991) and the Hoshen-Kopelman algorithm (Shapiro and Stockman, 2002) work on connected components.
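As a small illustration of the concept, the connected components of an undirected graph can be found with a breadth-first search. This is a generic sketch, not one of the algorithms cited above; the function name `connected_components` and the sample graph are hypothetical.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sets."""
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in neighbors[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


sample_nodes = [1, 2, 3, 4, 5, 6]
sample_links = [(1, 2), (2, 3), (4, 5)]
print(connected_components(sample_nodes, sample_links))
# [{1, 2, 3}, {4, 5}, {6}]
```

Every vertex in a returned set can reach every other vertex of that set, and no edge joins two different sets.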

Spectral and Hierarchical Clustering Algorithms

The underlying learning process of a clustering algorithm can be either supervised or unsupervised (Tzanakou, 2017). Both K-Means and Louvain are unsupervised learning algorithms. K-Means, however, can also be trained to provide supervised clustering: for example, after training on a network graph and its corresponding ground truth communities, the algorithm can predict labels when presented with a similar network map.

Essentially, clustering of networks is an unsupervised learning problem. The ground truth of a network is usually unknown, and groups are formed according to related features of the data. It was therefore important to consider algorithms that learn without supervision and are driven by features extracted from the data itself. Both K-Means and Louvain learn in this way and can cluster data effectively. Besides being among the most widely used clustering algorithms in the literature, the two differ in design and traits. K-Means considers all of the data in every iteration, while Louvain reduces the data size with every iteration. K-Means has the advantage of predicting missing features during the clustering process. K-Means requires manual selection of k, while Louvain is a fully automated method.

Another motivation for selecting K-Means and Louvain for this work was that in a related work by Wang and Koopman (2017), these two algorithms were considered to cluster semantic data representation of a very large Astro dataset (with over 18,000 nodes). The ground truth in that comparison was an experimental approximation. This work can be considered as a continuation of the comparison of the two algorithms using smaller datasets of different sizes and patterns with known ground truths.

Louvain Algorithm

The Louvain algorithm is an unsupervised, agglomerative, hierarchical community detection technique that is heuristic, greedy and based on modularity optimization. It finds a clustering that gives a local maximum of modularity. The algorithm consists of two phases that are repeated as long as the modularity measure increases (Lund, 2017). A single run of the two phases forms an iteration. Each iteration ends with a clustering (partition) of the graph and a new level in the dendrogram, as shown in Figure 8. The root of the dendrogram represents the final clustering and has the highest modularity.

Figure 8: Louvain Algorithm Dendrogram (Lund, 2018)

Phase one: This is the improvement phase. Initially, every node is assigned to a separate community. The nodes are then traversed in random order. For each traversed node i, the algorithm calculates the modularity change that results when node i is moved from its currently assigned community to each of the neighboring communities. If a potential move gives a positive modularity change, node i is assigned to the neighboring community that yields the largest gain. If no potential move increases the modularity, the node remains in its current community.

Let c be the neighboring community of node i that it is merging into. The change in modularity for a node is computed through the following equation (Lund, 2017);

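$$\Delta Q = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_i}{2m}\right)^2\right]$$

Where,

$\Sigma_{in}$ is the sum of the weights of the edges inside the community c

$\Sigma_{tot}$ is the sum of the weights of the edges that are incident to the nodes contained in the community c

$k_i$ is the sum of the weights of the edges that are incident to the node i

$k_{i,in}$ is the sum of the weights of the edges from the node i to the nodes in c

m is the sum of the weights of all the edges in the network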

The process repeats until a full sweep over all the nodes produces no more moves, i.e. no potential move of any node to one of its neighboring communities increases the modularity. When the modularity stops improving, the algorithm has found a local maximum of modularity.
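The modularity gain used in phase one can be evaluated directly from the five quantities defined above. The following is a minimal sketch in Python, not the thesis implementation. The function and argument names are illustrative, and it assumes that the sum of weights inside c counts each internal edge in both directions, which is consistent with the 2k_i,in term of the equation.

```python
def modularity_gain(sigma_in, sigma_tot, k_i, k_i_in, m):
    """Modularity change when an isolated node i joins community c.

    Assumption: sigma_in counts every internal edge of c twice
    (once per direction), matching the 2 * k_i_in term.
    """
    # Modularity contribution of c after node i has joined it.
    after = (sigma_in + 2.0 * k_i_in) / (2.0 * m) - ((sigma_tot + k_i) / (2.0 * m)) ** 2
    # Contributions of c and of the singleton community {i} before the move.
    before = sigma_in / (2.0 * m) - (sigma_tot / (2.0 * m)) ** 2 - (k_i / (2.0 * m)) ** 2
    return after - before


# Toy case: two nodes joined by a single edge of weight 1.
# Node i (degree 1) considers joining the singleton community {j}.
gain = modularity_gain(sigma_in=0.0, sigma_tot=1.0, k_i=1.0, k_i_in=1.0, m=1.0)
print(gain)
```

In this toy case the modularity rises from -0.5 (two singleton communities) to 0 (one merged community), so the move is accepted.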

Phase two: This is the coarsening phase where each community found in phase one is considered as a new node for further processing. The edges that exist in the previously detected communities are replaced with self-loops that are connected to the new nodes. The weight of the self-loops is specified by the sum of weights of the edges that were replaced. A single edge between the new nodes replaces all the previous edges between the corresponding communities. The weight of this new edge is equal to the sum of weights of all the edges that were replaced.
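The coarsening step can be sketched as a single pass over the edge list. This is an illustrative sketch only: the edge-list representation, the `aggregate` function and the `community` mapping are assumptions made for the example and are not taken from the thesis code.

```python
from collections import defaultdict


def aggregate(edges, community):
    """Collapse each community into one node (Louvain phase two).

    edges: iterable of (u, v, weight) for an undirected graph.
    community: dict mapping each node to its community id.
    Returns a dict {(c1, c2): weight}; keys with c1 == c2 are self-loops.
    """
    coarse = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        # Order the pair so that (a, b) and (b, a) share one entry.
        key = (min(cu, cv), max(cu, cv))
        coarse[key] += w
    return dict(coarse)


# A path of five nodes split into the communities {0, 1, 2} and {3, 4}.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
coarse_graph = aggregate(edges, community)
print(coarse_graph)
```

The two intra-community edges of the first community become a self-loop of weight 2, and the single edge crossing the communities becomes one edge of weight 1 between the two new nodes.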

The two phases repeat until the local maximum modularity is reached, i.e. until an iteration no longer increases the modularity.

A simple example showing the output of the two phases is given below. The graph consists of 10 nodes. At the end of phase one, three communities have been found. By the end of phase two, these are transformed into 3 nodes of a new graph. The intra-community edges are replaced by self-loops, while the inter-community edges are replaced by single edges between the corresponding new nodes.

(a) original graph (b) after phase one (c) after phase two

Figure 9: Example of Louvain Algorithm (Lund, 2017)

The pseudocode of the Louvain algorithm is given below.

Figure 10: Pseudocode of Louvain Algorithm (Kim et al., 2013)

The typical framework of the algorithm, interpreted as a flowchart, is shown in the figure below.

Figure 11: Framework of Louvain Algorithm

The algorithm has a time complexity of O(n log n), where n is the number of nodes in the network. This near-linear complexity makes the algorithm fast. Much of the computation is carried out in the initial passes of the algorithm. After the first few passes, the number of communities decreases drastically, which lowers the computation carried out in the later passes.

Fortunato and Barthelemy (2007) show that optimization algorithms that aim to maximize modularity are often affected by a ‘resolution limit’, wherein independent adjacent communities get merged together because doing so raises the overall modularity. For example, adjacent cliques that are connected by only a single edge are generally separate communities, yet they may be merged. A resolution parameter offers a workaround for this issue. By setting the resolution parameter, either small or large communities can be targeted in a network (Reichardt and Bornholdt, 2006). However, the use of a resolution parameter limits the overall number of communities that can be detected (Krings and Blondel, 2011). This thesis does not use any resolution parameters.

Secondly, in cases where there is an exponentially large number of modularity solutions, the algorithm cannot determine whether the selected solution attains the global maximum, nor whether it is more significant than the remaining options (Good et al. 2010).

K-Means Algorithm

K-Means is a spectral clustering, greedy, unsupervised learning algorithm that partitions a network of n nodes o1, o2, o3, …, on into exactly k clusters C1, C2, C3, …, Ck with the objective of minimizing the sum of squared distance error, E, between the nodes and the centroids of their clusters, cen1, cen2, cen3, …, cenk. Starting from an initial clustering, each node is repeatedly reassigned to the cluster with the closest centroid, and the centroids are recomputed, until there are no more reassignments. At this point, E has reached a (local) minimum.

The objective function of the algorithm is to minimize the sum of squared error, E (Landman et al., 2018).

$$E = \sum_{i=1}^{k} \sum_{o \in C_i} d(o, cen_i)^2$$

Where d is the Euclidean distance between the node under consideration and the selected centroid.
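As an illustration, the error E can be computed for a given assignment in a few lines of Python. This is a sketch under the assumption that nodes are already represented as coordinate tuples; the function name `sse` and the toy data are hypothetical.

```python
import math


def sse(points, labels, centroids):
    """Sum of squared Euclidean distances between points and their centroids."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))


# Three points, two clusters: the first two points share centroid (1, 0).
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]
centroids = [(1.0, 0.0), (10.0, 0.0)]
error = sse(points, labels, centroids)
print(error)
```

Each of the first two points lies at distance 1 from its centroid and the third coincides with its centroid, so E is 1 + 1 + 0.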

The initial stage of centroid selection affects the overall partitioning results of the k-Means algorithm. A good selection of centroids will lead to better clustering results and vice versa.

K-Means is generally suited to applications where the target clusters are distinct and well separated. Where the clusters are distinct but the data overlaps, is noisy or contains outliers, the K-Means algorithm may not perform well (Landman et al. 2018). Finding the optimal K-Means clustering is NP-hard in general, i.e. no polynomial-time algorithm is known for it. However, if the number of clusters, k, and the number of dimensions, d, are fixed, the problem can be solved exactly in $O(n^{dk+1}\log n)$ time, where n is the number of nodes that need to be clustered (Landman et al. 2018). The performance may therefore suffer in larger networks as compared to smaller ones.

K-Means++ Algorithm

Many variations of the K-Means algorithm have been proposed to improve the algorithm design (Elkan, 2003; Fahim et al., 2006; Blondel et al., 2010), improve the initial centroid selection (Arthur and Vassilvitskii, 2007) or conduct soft fuzzy clustering for overlapping networks, where a node can be part of several clusters (James et al., 1984).

This thesis uses the Blondel et al. (2010) version of the algorithm. It also uses the K-Means++ version by Arthur and Vassilvitskii (2007) which uses the Blondel et al. (2010) algorithm but picks the initial centroids that are far apart from one another rather than randomly. The idea is to avoid having the initial centroids within the same cluster, which may result in a suboptimal solution. For instance, in Figure below, two centroids happen to be very close to one another resulting in a suboptimal clustering.
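The seeding idea can be sketched as follows. This is a minimal illustration of the distance-weighted sampling described by Arthur and Vassilvitskii (2007), in which each new centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The function name, the `seed` argument and the toy points are assumptions made for the example, not the thesis implementation.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids that tend to be far apart (K-Means++ seeding)."""
    rng = random.Random(seed)
    # The first centroid is chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance of every point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Far-away points are more likely to be drawn; chosen points have weight 0.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
initial = kmeans_pp_init(points, k=3)
print(initial)
```

Because an already selected point has zero weight, the same point is never picked twice when all points are distinct.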

(a) Suboptimal Clustering (b) Optimal Clustering

Figure 12: K-Means Sensitivity to Initial Centroids Selection

The pseudocode for the K-Means algorithm is presented below.

Figure 13: Pseudocode for K-Means Algorithm (Lloyd, 1982)

As can be seen from the pseudocode, the K-Means algorithm has two essential steps. Initially, the algorithm randomly selects k of the objects. At the start, these k randomly selected objects are assumed to represent the mean value or center of a cluster (centroid). In the first step of the algorithm, every remaining object is assigned to the cluster whose center is closest to it. In the second step, the new mean (centroid) of each cluster is computed. These two steps are repeated until the cluster centroids no longer change.
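The two steps can be sketched in plain Python as shown below. This is an illustrative sketch rather than the implementation evaluated in this thesis: the function name `lloyd_kmeans`, the `max_iter` safeguard and the toy data are assumptions. The loop stops when the assignments stop changing, which implies that the centroids no longer change either.

```python
import math


def lloyd_kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until assignments are stable."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Step 1: assign every point to the cluster with the closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no reassignment, so the centroids would not move either
        labels = new_labels
        # Step 2: recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centroids = lloyd_kmeans(points, centroids=[(0.0, 0.0), (10.0, 0.0)])
print(labels, centroids)
```

Here the initial centroids are passed in explicitly, so the same routine can be started either from random objects or from a K-Means++ style seeding.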

Figure below shows an interpretation of the K-Means algorithm in the form of a flowchart.

Figure 14: Framework of the K-Means Algorithm

The worst-case complexity is given by $O(n^{k+2/p}\log n)$, where n is the number of sample nodes and p is the number of features. The average complexity is linear, i.e. O(k n T), where T is the number of iterations and n is the number of sample nodes.

In comparison to the Louvain algorithm, K-Means can produce results equally fast on average, provided the algorithm does not converge to a poor local minimum. To avoid this problem, the algorithm is generally run several times. As the algorithm is fast, repeated runs do not lower its effectiveness. For smaller networks, the K-Means algorithm may take less time than Louvain.

Contribution

As mentioned previously, Louvain and K-Means are two of the most widely used algorithms for clustering of data. Yet there is little to no formal literature that compares the performance of the two algorithms on common benchmark datasets. The contribution of this thesis is a comparison of these two algorithms on three commonly used benchmark datasets. The comparison helps determine their performance strengths and limitations.

The datasets selected are essentially non-overlapping but some have distinct clusters while some do not. So the results of the research can draw out the most suitable clustering environment for both the algorithms.

The datasets are of different sizes. This will highlight the behavior of the algorithms as the size of data increases from small to large. Secondly, as the two algorithms implement different clustering techniques, i.e. hierarchical and spectral, the results of the comparison can be generalized to represent their underlying clustering techniques.

This work contributes to the existing body of literature on algorithm comparisons for community detection. Such comparisons can serve as a reference for deciding the situations in which a particular algorithm is the most suitable one to use. Because the datasets used are considered clustering benchmarks in the literature, the performance of the two algorithms on them can be contrasted with existing evaluation results of alternative clustering algorithms on the same datasets. The work thus lays the foundation for future survey-based research on algorithm evaluation. Such surveys do not yet exist with reference to the same datasets.

The work also uses a new Edge matrix to transform network data into spatial form for applying K-Means. The results from this work can be used to contrast the K-Means performance increase or decrease using other forms of network data representation techniques. Thus this thesis also lays the groundwork for future research in the use of K-Means for social network clustering.

Problem Definition

The task of community detection can be a complex process. For instance, in social networks, it involves evaluating people and their interactions and predicting missing information. In view of this complexity, a large number of datasets and algorithms have been proposed in the literature. These proposals are either general contributions to community detection research or specific to selected application domains.

Louvain and K-Means are two popular algorithms in research circles for finding communities or clusters. These particular algorithms were selected because the Louvain algorithm is usually the first choice for community detection tasks in graph networks (Blondel et al. 2008), while K-Means is a highly popular choice for applications that require spatial clustering, e.g. of images. In network analysis, there are no spatial links between nodes, so this work aims to test the strength of K-Means on network graphs. The K-Means algorithm has been successfully adopted for clustering network systems (Vilcek, 2014). A comparison of the Louvain and K-Means algorithms on the same datasets will help compare the performance and efficiency of the two.

The Louvain algorithm is an agglomerative hierarchical clustering algorithm, while K-Means is a spectral clustering algorithm. Both algorithms help partition a large network into smaller communities. The aim of this paper is to use three real-world datasets to compare and analyze the performance of the Louvain and K-Means algorithms. A set of popular performance metrics shall be used for the analysis. The datasets are real-world datasets with known ground truth communities and are publicly available for research purposes. The work shall determine whether:

K-Means and Louvain alone can capture the essence of a network or not.

The underlying measures, i.e. modularity for Louvain and SSE for K-Means, are enough to cluster a network or not.

The closeness to the ground truth is enough a measure to determine the efficiency of an algorithm or not.

Improvement of the preliminary phase of the algorithm (i.e. centroid selection) improves the results or not.

System Overview

The system developed to carry out the comparison analysis consists of a community identification module and a comparison module. The identification module contains the two algorithms under analysis, i.e. the Louvain algorithm and the K-Means algorithm. The network dataset serves as the input to the identification module. Its output is a map indicating the cluster number of each node. The comparison module compares the output community map to the ground truth communities, thus helping to assess the overall performance of the algorithm against a specific dataset. The overall framework of the system is presented in the figure below.

Figure 15: System Framework

Experiments

This section presents the testing results of the two algorithms against three publicly available datasets of different sizes: Zachary’s karate club dataset, the Dolphin social network dataset and a large network communications dataset, email-Eu-core. All three datasets have non-overlapping communities. The section also evaluates the K-Means++ version of the algorithm to see how the improvement in the centroid selection process affects the overall results of the K-Means algorithm.

Datasets

Datasets are representations of real-world network systems in the form of graphs, i.e. nodes and edges. They are based on the real data of a system, which may be gathered over a long period as part of an experiment, through observation or while dealing with a problem. Later, these datasets are made available to the public to serve as test cases for testing system performance (SNAP, 2018).

Apart from the real-world systems, some synthetic models have also been proposed that can generate datasets of varying complexities. These synthetic models are designed to check the scalability of community detection algorithms (Fränti and Sieranoja, 2018).

The scope of this thesis is, however, limited to an analysis of real-world datasets. This section gives a brief overview of the three datasets. The overview includes the historical background, the ground truth communities and the community goodness scores of the ground truth communities.

Ground Truth Communities

For some network datasets of real-world systems, the clusters into which the network eventually partitions are known. These known partitions or clusters are the ground truth communities. The performance of any clustering or community identification algorithm can be evaluated against the ground truth communities (Han et al., 2011). Generally, the ground truth clusters and the particular nodes that belong to each cluster are known for small real-world networks. For bigger networks, generally only the number of communities is known as the ground truth. For instance, for some very large datasets only the top 500 communities are given rather than the complete ground truth (SNAP, 2018).

Zachary’s Karate Club Dataset

This dataset was selected for the analysis not only because it represents a small network but also because it is popularly used as a benchmark for community detection algorithms in the research community (Hric et al., 2014). The dataset has been made available in GML format by Newman (2013). Its ground truth is also available (Girvan and Newman, 2002; Cheng et al., 2014).

The dataset was put together by Zachary (1977) over a period of three years. It represents the data from a university’s karate club where the participants are represented by nodes. The social relations between any two club members that go beyond the club premises are represented by an edge between the two corresponding nodes. The network consists of 34 nodes (members) and 78 edges (communication links between members). Due to a dispute between the instructor and the administrator, the club split into two groups. The members who had more communication with the administrator eventually chose to join that group, while the rest joined the instructor.

As the ground truth communities are known for this data set, it guarantees a simpler performance evaluation and analysis of the community detection outcome of the two algorithms.

Ground Truth Labels

From the domain knowledge of the situation, the actual communities are known to be 2 for the dataset (Girvan and Newman, 2002). The nodes included within each community are listed in Table 1.

Table 1: Ground Truth Communities for Karate Club Dataset

| Community Label | Node IDs in Community |
|---|---|
| 0 | 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21 |
| 1 | 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 |
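For the comparison metrics used later, a table of this kind is typically flattened into one label per node. The following is a small sketch of that conversion using the node IDs listed in Table 1; the variable names are illustrative and are not taken from the thesis code.

```python
# Ground truth communities of the karate club network, copied from Table 1.
ground_truth = {
    0: [0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21],
    1: [8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33],
}

# Build a label vector in which position n holds the community of node n.
labels = [None] * 34
for community, nodes in ground_truth.items():
    for node in nodes:
        labels[node] = community

print(labels)
```

The resulting vector has one entry per node, which is the form expected when a detected clustering is compared against the ground truth.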

The expected clustering graph for the ground truth communities of the dataset is shown in the Figure below. All the network graphs were generated using the ‘plot’ feature of the iGraph library. The Kamada-Kawai force-directed algorithm was selected to present the layout of the graph nodes (see Appendix for detail). The nodes were set to display their labels. And all the nodes within a community were assigned a separate color than the rest. The same graph convention has been used for all the graphs in the remaining thesis.

Figure 16: Ground Truth Clustering Map for Karate Club Dataset

Ground Truth Community Evaluation

A good community model must have sound scores for the community stability measures such as modularity, silhouette and Calinski and Harabasz score. The scores for these three metrics are shown in the Table 2. The scores were calculated based on the ground truth communities and the lower 2D data representation of the network graph i.e. edge matrix. The two vectors were passed to the corresponding Python function for each of the metric, which returned the community quality score (See Appendix for the related Python functions).

Table 2: Ground Communities Scores for Karate Club Dataset

| Modularity | Silhouette | Calinski and Harabasz |
|---|---|---|
| 0.37 | 0.16 | 7.83 |

As can be seen from the above table, the ground truth has all positive values for the community goodness metrics, which means there is no overlapping between data.

Dolphin’s Social Network Dataset

The dataset has been made available in GML format by Newman (2013). Dolphin’s social network dataset, gathered by Lusseau et al. (2003), was chosen because it is about twice the size of Zachary’s karate club dataset. Also, the ground truth communities are known (Cheng et al., 2014).

The dataset comprises of 62 nodes, where each node represents a bottlenose dolphin. The dolphins lived in Doubtful Sound, New Zealand. Over a period from 1994 to 2001, the dolphins were observed to frequently communicate with one another. This communication is represented by an edge. The total edges are 159.

Ground Truth Labels

Total communities for the dataset are 4. Table 3 shows the nodes in each cluster (Cheng et al., 2014).

Table 3: Ground Truth Communities for Dolphins Dataset

| Community Label | Node IDs in Community |
|---|---|
| 0 | 1, 5, 6, 7, 9, 13, 17, 19, 22, 25, 26, 27, 31, 32, 41, 48, 54, 56, 57, 60 |
| 1 | 0, 2, 10, 28, 30, 42, 47 |
| 2 | 4, 11, 15, 18, 21, 23, 24, 29, 35, 45, 51, 55 |
| 3 | 3, 8, 12, 14, 16, 20, 33, 34, 36, 37, 38, 39, 40, 43, 44, 46, 49, 50, 52, 53, 58, 59, 61 |

The clustering graph for the ground truth communities of the dataset is shown in the Figure below.

Figure 17: Ground Truth Clustering Map for Dolphin’s Social Network Dataset

Ground Truth Community Evaluation

The scores for the community stability evaluators i.e. modularity, silhouette and Calinski and Harabasz score, for the ground truth communities are given in the Table 4. Considering the three measures, as all are positive, the quality of the ground truth communities is found to be sound.

Table 4: Ground Communities Scores for Dolphins Dataset

| Modularity | Silhouette | Calinski and Harabasz |
|---|---|---|
| 0.519 | 0.117 | 6.392 |

Email-Eu-Core Dataset

This dataset was selected because it represents a large real-world network and has been used in the literature to analyze algorithm performance (Venkatesaramani and Vorobeychik, 2018). The dataset thus serves as a benchmark for the performance of the Louvain and K-Means algorithms on large social data networks, where the communities are generally overlapping. The dataset has been made available by SNAP (2018).

The email-Eu-core network represents the email data of a large European research institution. The nodes represent the people of the institution and the edges represent any email that they might have sent to another person from the institution. Any external communication with the rest of the world is not part of the dataset.

The data is represented by an unweighted, directed graph with 1005 nodes and 25571 edges.

Ground Truth Labels

The ground truth information for the dataset is provided by SNAP (2018). The ground truth consists of 42 departments (communities) and each person (graph node) is associated with exactly one department (community). The core data for the 42 communities has also been provided by SNAP (2018).

Ground Truth Community Evaluation

The scores of the community goodness metrics for the ground truth communities are shown in Table 5. As can be seen, the dataset has a negative value for the Silhouette score. This means that there is some overlapping between the data.

Table 5: Community Goodness Scores for email-Eu-core Ground Truth

| Modularity | Silhouette Score | Calinski & Harabasz |
|---|---|---|
| 0.42 | -0.197 | 7.129 |

Datasets Summary

The table below shows a comparative summary of the three datasets.

Table 6: Datasets Features Summary

| | Karate Club | Dolphins | Email-Eu-Core |
|---|---|---|---|
| Nodes | 34 | 62 | 1005 |
| Edges | 78 | 159 | 25571 |
| Ground Communities | 2 | 4 | 42 |
| Ground Modularity | 0.37 | 0.519 | 0.42 |
| Ground Silhouette | 0.16 | 0.117 | -0.197 |
| Ground Calinski & Harabasz | 7.83 | 6.392 | 7.129 |

Algorithm Performance Evaluation

Performance Measures

For situations where the ground truth of a community structure is known, performance is evaluated by a simple comparison of the discovered communities with the known ones (Han et al., 2011). The various performance metrics suggested in clustering research can be classified into three main categories:

Pair counting based measures – based on counting the pairs of points on which two clusterings agree or disagree, e.g. Rand Index, Jaccard Index, Adjusted Rand Index (ARI).

Set-matching-based measures – based on the cardinality of sets, aimed at finding the largest intersections between the pairs of vertices belonging to different clusters, e.g. purity.

Information theoretic-based measures – based on the mutual information shared between two clusters to check their agreement e.g. Normalized mutual information (NMI), Adjusted Mutual Information (AMI).
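As an illustration of the pair counting category, the plain (unadjusted) Rand Index can be sketched as below. This is not the ARI, which additionally corrects for chance agreement, and the function name `rand_index` is an assumption made for the example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two clusterings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair agrees if it is together in both or apart in both.
        agree += same_a == same_b
        total += 1
    return agree / total


# Same grouping with swapped label names: every pair agrees.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Two unrelated groupings of the same four nodes.
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because only pairs are compared, the score does not depend on which label names the algorithm happens to assign to the clusters.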

Choosing an evaluator that suits the type of solutions being analyzed is important. For instance, the evaluation metrics accuracy and Jaccard similarity are not ideal evaluators for binary or multilevel clustering. This is because, when the identified clusters contain the same nodes as the ground truth but carry different labels, these metrics assign a score of zero. As the two metrics behave the same way for binary and multilevel clustering, neither was chosen. Alternatives such as homogeneity, completeness and V-measure were considered instead.

ARI is suitable for situations where the clustering results are likely to contain large clusters of equal size, while AMI is suitable for situations where the clustering results contain small, unbalanced clusters (Romano, 2016).

In situations where no information is known about the community structure, community quality metrics are used (Han et al., 2011). These metrics are based on various characteristics of a stable community structure. They help ensure that communities are formed by sets of densely connected nodes that are poorly connected to the rest of the network. Examples are modularity, the silhouette score and the Calinski and Harabasz score.

To understand the quality of the clusters formed by the two algorithms, both kinds of performance evaluators had to be studied, i.e. the quality of the clusters formed as well as the accuracy against the ground truth. Based on these requirements, ten performance variables from the three categories were selected to evaluate the two algorithms. A brief overview of the ten performance measures is given in this section. The idea behind choosing a range of evaluation metrics was to analyze the algorithms from all relevant perspectives.

Community Goodness Metrics

Network communities exist in different forms. They can be disjoint, overlapping, hierarchical, etc. Depending on the type of application, there are various heuristics to measure the quality or goodness of the detected communities within a network (Chakraborty et al., 2017). These goodness metrics determine the level of quality of the communities formed. For instance, in the case of non-overlapping community networks, the boundaries between the different clusters are distinct, and a node fits its assigned cluster better than any other cluster.

Modularity

Modularity measures the density of connections, i.e. the interactions of nodes within the community vs. the interactions outside the community. The value ranges between -1 and 1. A value of 1 indicates a very stable community while a negative value indicates that the nodes are in the wrong cluster. The metric has been used extensively in the literature to measure the quality of clusters detected by different algorithms (Rabbany et al., 2010; Falih, 2018; Collette, 2015).

Modularity of a network graph G is defined as;

$$\text{Modularity} = \frac{1}{2e} \sum_{i,j \in G} \left[ AM_{ij} - \frac{Deg(i) \cdot Deg(j)}{2e} \right] \delta(i,j)$$

Where,

AM is the adjacency matrix

e is the number of edges in the network

δ(i, j) is 0 if the node i and node j are not in the same community, and it is 1 if the two nodes i and j are in the same community

Deg(i) is the degree of node i i.e. the number of edges connected to node i

G is the network graph
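The definition above can be evaluated directly from an adjacency matrix and a community label per node. The following is a minimal sketch for an unweighted, undirected graph. The function name and the toy graph are illustrative and are not the Python functions referred to in the Appendix.

```python
def modularity(adj, labels):
    """Modularity of a partition, following the definition above.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    labels: community label of each node.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_e = sum(degree)  # twice the number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # delta(i, j) = 1
                q += adj[i][j] - degree[i] * degree[j] / two_e
    return q / two_e


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

q = modularity(adj, labels=[0, 0, 0, 1, 1, 1])
print(q)
```

For this toy graph, with each triangle taken as one community, the score works out to 5/14, i.e. roughly 0.357.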

An important aspect of the modularity metric is that due to its resolution limit, the metric cannot accurately evaluate small sized communities (Fortunato and Barthelemy, 2007).

Silhouette Score

Silhouette Score compares the intra-cluster and inter-cluster distances. Its value lies between -1 and 1: -1 indicates wrong identification, 0 suggests the presence of overlapping communities and 1 suggests a sound clustering. The quality of clusters has been assessed through this metric in the literature (Falih, 2018; Fagnan, 2012).

The score is computed by finding the distance of node from the centroid of the community that it belongs to, and then comparing it with the distance of the node from the centroid of the nearest neighboring community that it does not belong to. If the assignment is correct then, the former (assigned community) distance should be shorter than the latter (neighboring) distance. These distances of the nodes are accumulated and normalized by the total number of nodes of the network. The larger the value, the better the clustering is, as it would indicate that the nodes are closer to their centroids as compared to the neighboring communities’ centroids.

If $\bar{C}$ is the centroid of the community that node i belongs to and $C_{nearest}$ is the nearest community that node i does not belong to, then, in formal terms, the silhouette score, S, can be defined as;

$$S = \frac{1}{|N|} \sum_{i \in N} silhouette(i)$$

Where,

$$silhouette(i) = \frac{b_i - a_i}{\max(a_i, b_i)}$$

Where,

$$b_i = \frac{1}{|C_{nearest}|} \sum_{j \in C_{nearest}} distance(i, j)$$

And

$$a_i = distance(i, \bar{C})$$

The nearest neighboring community is decided by evaluating the distance of the node from all the neighboring centroids and choosing the nearest one.
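The definition above can be sketched in Python as follows. The sketch follows the formulation given in this section, where a_i is the distance to the node's own centroid. Note that this differs from the more common silhouette definition, in which a_i is the mean distance to the other members of the same cluster. The function names and toy data are assumptions, and the sketch requires at least two clusters.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(x) / len(points) for x in zip(*points))


def silhouette_score(points, labels):
    """Mean silhouette over all nodes, per the definition in this section."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: centroid(ps) for lab, ps in clusters.items()}

    total = 0.0
    for p, lab in zip(points, labels):
        # a_i: distance from the node to the centroid of its own community.
        a = math.dist(p, centroids[lab])
        # Nearest other community, judged by centroid distance.
        nearest = min(
            (c for c in centroids if c != lab),
            key=lambda c: math.dist(p, centroids[c]),
        )
        # b_i: mean distance from the node to the members of that community.
        b = sum(math.dist(p, q) for q in clusters[nearest]) / len(clusters[nearest])
        total += (b - a) / max(a, b)
    return total / len(points)


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
score = silhouette_score(points, labels=[0, 0, 1, 1])
print(score)
```

For two well separated pairs of points the score is close to 1, as expected for a sound clustering.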

Calinski and Harabasz Score

The Calinski and Harabasz Score, also called the Variance Ratio Criterion, is the ratio of the inter-cluster dispersion to the intra-cluster dispersion. The metric has been used in the literature to evaluate the quality of clusters in the absence of ground truth (Calinski and Harabasz, 1974; Falih, 2018; Fagnan, 2012).

Formally, the variance ratio criterion, VRC, is defined as;

$$VRC = \frac{CDist_{inter}}{CDist_{intra}} \times \frac{|N| - |k|}{|k| - 1}$$

Where,

N is the network and |N| is its number of nodes

k is the set of communities and |k| is their number

C is a selected community from k

$\bar{C}$ is the centroid of a community C

$\bar{N}$ is the centroid of the entire network N

$CDist_{intra}$ is the intra-cluster dispersion or scatter matrix;

$$CDist_{intra} = \sum_{C \in k} \sum_{i \in C} dist(i, \bar{C})^2$$

$CDist_{inter}$ is the inter-cluster dispersion or scatter matrix;

$$CDist_{inter} = \sum_{C \in k} |C| \cdot dist(\bar{C}, \bar{N})^2$$

$\frac{|N| - |k|}{|k| - 1}$ is the normalization term that prevents the VRC score from rising monotonically with an increasing number of clusters. This makes VRC a maximization criterion. A large value of VRC indicates that the communities are widely spread apart relative to their internal scatter, i.e. the communities are well defined: their intra-distances are small (compact) compared to their inter-distances.
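The criterion can be sketched in a few lines of Python. This is an illustrative sketch that assumes the nodes are given as coordinate tuples and that there are at least two communities; the function names are hypothetical and are not the functions listed in the Appendix.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(x) / len(points) for x in zip(*points))


def variance_ratio_criterion(points, labels):
    """Calinski and Harabasz score: inter over intra dispersion, normalized."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    overall = centroid(points)  # centroid of the entire network

    # Squared distances of the nodes to their own community centroid.
    intra = sum(
        math.dist(p, centroid(ps)) ** 2 for ps in clusters.values() for p in ps
    )
    # Size-weighted squared distances of community centroids to the overall centroid.
    inter = sum(
        len(ps) * math.dist(centroid(ps), overall) ** 2 for ps in clusters.values()
    )
    return (inter / intra) * (n - k) / (k - 1)


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
vrc = variance_ratio_criterion(points, labels=[0, 0, 1, 1])
print(vrc)
```

In this toy example the intra-cluster dispersion is 4 and the inter-cluster dispersion is 100, which, with the normalization term (4 - 2)/(2 - 1), gives a score of 50.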

Comparison Metrics

The measures in this section help compare the clustering performance against the ground truth community labels. These measures have been used in literature to evaluate the performance of clustering algorithms.

NMI

NMI is based on the Shannon entropy of information theory. It compares the mutual information shared between two clusterings (Zhang et al., 2018). A value of 1 indicates a high correlation, i.e. identical clustering results, while a value of 0 indicates a low correlation, i.e. independent clustering results. The metric has been used to evaluate clustering results (Collette, 2015; Fagnan, 2012).

If H is the entropy of a clustering and the two clusterings to compare are A and B, the NMI score is determined by taking the ratio of their mutual information, I, to the sum of their individual entropies H(A) and H(B).

$$NMI(A,B) = \frac{2 \cdot I(A,B)}{H(A) + H(B)}$$

The entropy of a clustering is based on all the constituent clusters. If C is a cluster in the clustering A, |C| is the number of nodes in the cluster C and N is the total number of nodes, then the entropy of A can be found as;

$$H(A) = -\sum_{C \in A} P(C) \log_2 P(C)$$

where $P(C) = \frac{|C|}{N}$.

For another clustering B, where D is a cluster of B and the joint probability P(C, D) is defined as $P(C,D) = \frac{|C \cap D|}{N}$, the conditional entropy H(A|B) is estimated as;

$$H(A|B) = \sum_{C \in A} \sum_{D \in B} P(C,D) \log_2 \frac{P(D)}{P(C,D)}$$

The mutual information I(A, B) is estimated as;

$$I(A,B) = H(A) - H(A|B)$$

Purity

Purity gives the proportion of the correctly labeled members. Each identified cluster is matched with the one cluster from the reference clusters with which it has the maximum overlap. Then the count of the similar nodes gives an accuracy of the match. A purity score of 1 gives a complete match. The metric has been used for measuring the clustering results of algorithms (Rabbany et al., 2010; Hu, 2015).

Formally, the purity of a clustering (partition), $C$, with regards to the ground truth partitioning, $C^{*}$, is given by;

$$\mathrm{purity}(C, C^{*}) = \frac{1}{|N|} \sum_{k=1}^{K} \max_{l \in \{1,\dots,K^{*}\}} |C_k \cap C^{*}_l| \in [0,1]$$

Where,

N is the network and |N| is its number of nodes

$C$ is a partition of N comprising of non-overlapping communities, i.e., $C = \{C_1, C_2, C_3, \dots, C_K\}$

$C^{*}$ is the ground truth partitioning i.e. $C^{*} = \{C^{*}_1, C^{*}_2, C^{*}_3, \dots, C^{*}_{K^{*}}\}$

Intuitively, purity measures the fraction of nodes that have been labelled correctly. The metric cannot be used on its own to determine the quality of clusters, because in the degenerate situation where every node is allocated to its own individual community, the purity score would be 1.
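The purity computation, including the degenerate case of one community per node, can be sketched as follows. The function name `purity` and the toy label lists are assumptions made for illustration; they are not taken from the experiments.

```python
from collections import Counter


def purity(detected, truth):
    """Fraction of nodes that fall in the majority ground-truth class of their cluster."""
    if not detected or len(detected) != len(truth):
        raise ValueError("label lists must be non-empty and of equal length")
    overlap = {}
    for d, t in zip(detected, truth):
        overlap.setdefault(d, Counter())[t] += 1
    # for each detected cluster keep only its largest overlap with a truth class
    return sum(max(c.values()) for c in overlap.values()) / len(truth)


truth = [0, 0, 0, 1, 1, 1]
two_clusters = [0, 0, 1, 1, 1, 1]   # one node placed in the wrong cluster
singletons = [0, 1, 2, 3, 4, 5]     # every node in its own community

score_two = purity(two_clusters, truth)
score_single = purity(singletons, truth)
```

The singleton partition reaches a purity of 1 even though it carries no community structure, which is why purity is read together with the other measures.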

AMI

AMI measures the similarity between two clusterings and is independent of the absolute label values. As it is adjusted for chance, it corrects the tendency of the mutual information to be high for clusterings with a large number of clusters. For independent clusterings, the score is 0. For identical clusterings, the value is 1. The metric is used in literature for evaluating clustering results (Feng, 2014).

Formally, the AMI is defined as,

$$\mathrm{AMI}(A,B) = \frac{I(A,B) - E\{I(A,B)\}}{\sqrt{H(A)\,H(B)} - E\{I(A,B)\}}$$

Where

A is the clustering labels

B is the ground truth labels

H(A) is the entropy of clustering A

H(B) is the entropy of clustering B

I(A, B) is the mutual information

E{I(A, B)} is the expected value of mutual information between all possible cluster pairs

The higher the value of AMI is, the better the clustering results are.
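The entropy and mutual information terms shared by the NMI and AMI definitions can be sketched in a few lines. The sketch below computes the NMI only; the expected mutual information E{I(A, B)} required by the AMI is not computed here. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a labelling."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A, B) computed from the joint and marginal cluster sizes."""
    n = len(a)
    size_a, size_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_ab / n) * log2(n_ab * n / (size_a[x] * size_b[y]))
        for (x, y), n_ab in joint.items()
    )


def nmi(a, b):
    """NMI = 2 * I(A, B) / (H(A) + H(B))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both labellings place all nodes in a single cluster
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Same grouping with swapped label names, and an independent grouping.
relabelled = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

The first pair differs only in label names and scores 1, which illustrates the independence from absolute label values; the second pair shares no information and scores 0.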

ARI

ARI measures the agreement between two partitions by counting the pairs of nodes that are grouped consistently in both, corrected for chance. The metric penalizes both the false negatives and the false positives. The metric varies from -1 to 1. An ARI score of 1 indicates that the partitions/clusters are identical, a score of 0 indicates an agreement no better than random, and negative values indicate less agreement than expected by chance. The metric has been used to evaluate clustering algorithms (Rabbany et al., 2010; Collette, 2015; Tang, 2017; Wagner and Wagner, 2007; Fagnan, 2012).

If X = {X1, X2, X3, …, Xm} and Y = {Y1, Y2, Y3, …, Yn} are two partitions (clusterings), then their overlap (intersection) can be observed through their contingency table [nij]. Each entry of the table represents the number of common nodes between the two clusters Xi and Yj, i.e. nij = |Xi ∩ Yj|. If {a1, a2, a3, …, am} represents the sums of the corresponding rows of the contingency matrix and {b1, b2, b3, …, bn} the sums of the corresponding columns, then formally, the adjusted Rand index, ARI, can be defined as;

$$\mathrm{ARI} = \frac{\sum_{i,j}\binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}}$$

Where

n is the total number of nodes of the network

$\sum_{i,j}\binom{n_{ij}}{2}$ is the index

$\left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] / \binom{n}{2}$ is the expected index

$\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right]$ is the maximum index
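The index, expected index and maximum index can be computed directly from the contingency counts. The following is a minimal sketch; the function name `adjusted_rand_index` and the toy labellings are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(x, y):
    """ARI = (index - expected) / (maximum - expected)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two labellings of the same length (at least 2 nodes)")
    index = sum(comb(c, 2) for c in Counter(zip(x, y)).values())   # cells n_ij
    sum_a = sum(comb(c, 2) for c in Counter(x).values())           # row sums a_i
    sum_b = sum(comb(c, 2) for c in Counter(y).values())           # column sums b_j
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (index - expected) / (maximum - expected)


identical = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

For the crossed labelling the index is 0 while the expected index is positive, so the score is negative: the two partitions agree less than chance would suggest.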

FMI

Precision measures how accurate the detected clusters are. Recall measures how much of the ground truth structure the algorithm was able to detect (Feng, 2015).

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

While

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

Where

TP is the true positive

FP is the false positive

FN is the false negative

Consider the comparison of a clustering, C, with the ground truth, T. The above measures can be defined as follows;

True positive is the number of node pairs that reside in the same cluster for both the detected clustering, C and ground clustering, T.

False positive is the number of node pairs that reside in the same cluster for the clustering detected, C but in different class of ground labels, T.

False negative is the number of node pairs that reside in the same cluster in the ground truth clustering, T, but in different clusters in the detected clustering, C.

FMI is a similarity measure based on the geometric mean of Precision and Recall (Fowlkes and Mallows, 1983). The metric has been used in literature to evaluate clustering results (Wagner and Wagner, 2007).

Formally, the FMI is defined as;

$$\mathrm{FMI} = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}}$$

The value of FMI ranges from 0 to 1, where 1 indicates that a good similarity exists between the clustering and the ground truth.
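The pair counts behind Precision, Recall and FMI can be sketched as below. The sketch uses the usual pair-counting convention, in which a false negative is a pair placed together in the ground truth but separated in the detected clustering. The function names and toy labels are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pair_counts(detected, truth):
    """Return (TP, FP, FN) counted over all unordered node pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(detected)), 2):
        same_detected = detected[i] == detected[j]
        same_truth = truth[i] == truth[j]
        if same_detected and same_truth:
            tp += 1
        elif same_detected:
            fp += 1          # together in C, apart in T
        elif same_truth:
            fn += 1          # together in T, apart in C
    return tp, fp, fn


def fowlkes_mallows(detected, truth):
    """FMI = TP / sqrt((TP + FP) * (TP + FN))."""
    tp, fp, fn = pair_counts(detected, truth)
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


truth = [0, 0, 0, 1, 1, 1]
detected = [0, 0, 1, 1, 1, 1]
counts = pair_counts(detected, truth)
fmi = fowlkes_mallows(detected, truth)
```

The loop visits every pair, so the cost grows quadratically with the number of nodes; for large networks the same counts can be derived from the contingency table instead.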

F1 Score

F1 Score is the weighted average (harmonic mean) of Precision and Recall that takes into account both the false positives and false negatives. It has been used in literature to measure the performance of classification algorithms (Li et al., 2008) as well as clustering algorithms (Feng, 2015; Wagner and Wagner, 2007).

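The general formula for the F-Score is;

$$F_{\beta}\,\mathrm{Score} = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}$$

Where β is a positive value (usually 0.5, 1 and 2) that adds weight to the precision and recall measures.

For the F1 score, the value of β is 1 and so the formula becomes,

$$F_1\,\mathrm{Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

The best value of the F1 score is reached at 1, while the worst is at 0.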
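A small sketch of the F-Score formula is given below. The function name `f_beta` is hypothetical, and the precision and recall values are derived from illustrative pair counts (TP = 4, FP = 3, FN = 2) rather than from the experiments.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if beta <= 0 or precision < 0 or recall < 0:
        raise ValueError("beta must be positive and the inputs non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # precision and recall are both zero
    return (1 + beta ** 2) * precision * recall / denominator


tp, fp, fn = 4, 3, 2                      # illustrative pair counts
precision, recall = tp / (tp + fp), tp / (tp + fn)
f1 = f_beta(precision, recall)            # equals 2*TP / (2*TP + FP + FN)
f2 = f_beta(precision, recall, beta=2.0)  # weights recall more heavily
```

With β = 2 the score moves toward the recall value, and with β = 0.5 it would move toward the precision value.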

Homogeneity

This measure is independent of the absolute label values. Given a ground truth, a cluster would be homogenous if all its points are contained within the same label. Its value ranges between 0 and 1, where 1 indicates complete homogeneity. The metric has been used to evaluate clustering results (Rosenberg and Hirschberg, 2007; Falih, 2018).

Assuming the following information;

N be the total number of nodes in the network

C presents the class labels such that C = {ci | i = 1, 2, …, n}

K presents the set of detected clusters such that K = {ki | i = 1, 2, …, m}.

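A is the contingency table representing the two clusterings C and K such that A = {a_ck}, where a_ck is the number of nodes that are members of the class c and detected in the cluster k.

Then, formally the homogeneity, h, is defined as;

$$h = \begin{cases} 1 & \text{if } H(C,K) = 0 \\ 1 - \dfrac{H(C|K)}{H(C)} & \text{otherwise} \end{cases}$$

$$H(C|K) = -\sum_{k=1}^{|K|} \sum_{c=1}^{|C|} \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{c=1}^{|C|} a_{ck}}$$

$$H(C) = -\sum_{c=1}^{|C|} \frac{\sum_{k=1}^{|K|} a_{ck}}{N} \log \frac{\sum_{k=1}^{|K|} a_{ck}}{N}$$

For the case of perfect homogeneity, the normalized term H(C|K)/H(C) is equal to 0.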
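The homogeneity definition can be sketched from the contingency counts as follows. One assumption is made for the degenerate case: the sketch returns 1 whenever H(C) = 0 (a single class), which avoids a division by zero. The function name and toy labels are illustrative.

```python
from collections import Counter
from math import log


def homogeneity(classes, clusters):
    """h = 1 - H(C|K) / H(C), computed from the contingency counts a_ck."""
    n = len(classes)
    class_sizes = Counter(classes)              # sums of a_ck over k
    cluster_sizes = Counter(clusters)           # sums of a_ck over c
    joint = Counter(zip(classes, clusters))     # the non-zero a_ck
    h_c = -sum((s / n) * log(s / n) for s in class_sizes.values())
    if h_c == 0:
        return 1.0  # assumption: a single class is treated as trivially homogeneous
    h_c_given_k = -sum(
        (a / n) * log(a / cluster_sizes[k]) for (_c, k), a in joint.items()
    )
    return 1.0 - h_c_given_k / h_c


classes = [0, 0, 1, 1]
exact = homogeneity(classes, [0, 0, 1, 1])       # clusters match the classes
merged = homogeneity(classes, [0, 0, 0, 0])      # one cluster mixes both classes
singletons = homogeneity(classes, [0, 1, 2, 3])  # over-split but still pure
```

The singleton case scores 1 because each cluster contains a single class, so a high homogeneity alone does not rule out over-splitting.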

Silhouette Coefficient Method

An input parameter of the K-Means algorithm is the total number of clusters to form, k. For any value of k, the K-Means algorithm generates the communities of the network. The Silhouette Coefficient relates the identified labels to the underlying positions in the original data, so a relation can be identified between the input k and the corresponding value of the Silhouette Coefficient. The method is to select the value of k for which the silhouette coefficient score is the maximum. The relation of k and the silhouette score can best be seen in a line graph of the two values. By varying the value of k from 2 up to the total number of nodes and studying the corresponding silhouette scores, the optimal value of k can be extracted for the network (Destercke, 2018; Kowalczyk, 2009).
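The selection of k by the silhouette coefficient can be sketched as follows. The sketch assumes Euclidean coordinates for the nodes and that a labelling has already been produced for each candidate k (here two hand-written labellings stand in for K-Means output). The names `silhouette_score` and `best_k` are hypothetical. Note that the mean silhouette is only defined for 2 ≤ k ≤ N − 1.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs 2 <= k <= N - 1")
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes s(i) = 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items() if lab != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, labelings):
    """labelings maps each candidate k to the labels obtained for that k."""
    scores = {k: silhouette_score(points, labs) for k, labs in labelings.items()}
    return max(scores, key=scores.get), scores


points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
labelings = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
chosen_k, scores = best_k(points, labelings)
```

On this toy data the three-cluster labelling separates the three groups and obtains the higher mean silhouette, so it is the one selected.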

Elbow Criterion

Using the Silhouette Criterion, the highest value may be returned for a very small number of clusters. To avoid the detection of very few clusters, the Elbow Criterion is also considered for finding the value of k (Destercke, 2018). The Sum of Squared Error (SSE) is the sum of the squared distances of each node included in a cluster from the cluster's centroid. The SSE is estimated for each value of k. Using the heuristic Elbow Criterion, the value of k is selected at which there is an abrupt change in the SSE value. The relation of k and SSE can best be seen in a line graph of the two values. The general trend is that, as k increases, the value of SSE decreases. SSE becomes 0 when k becomes equal to the total number of nodes in the graph. This is because each node itself becomes a cluster, so there is no longer any distance between the cluster's node and the centroid. The goal of the Elbow Criterion is to select a value for k that is small and also has a low value of SSE. The elbow signifies the point after which an increase in the value of k causes only a uniform decrease in the SSE.
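The SSE and a simple reading of the elbow heuristic can be sketched as below. The rule used here, selecting the k reached by the largest single drop in SSE, is one possible formalization of "abrupt change" and is an assumption of this sketch. The function names and the SSE values in the example are hypothetical.

```python
from math import dist


def sse(points, labels):
    """Sum of squared distances of every point to the centroid of its cluster."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        center = tuple(sum(axis) / len(members) for axis in zip(*members))
        total += sum(dist(p, center) ** 2 for p in members)
    return total


def largest_drop_k(sse_by_k):
    """sse_by_k[0] is the SSE for k = 1, sse_by_k[1] for k = 2, and so on.

    Returns the k that is reached by the largest single decrease in SSE.
    """
    drops = [sse_by_k[i - 1] - sse_by_k[i] for i in range(1, len(sse_by_k))]
    return drops.index(max(drops)) + 2   # drops[0] is the step from k=1 to k=2


one_cluster = sse([(0, 0), (2, 0)], [0, 0])    # both points 1 away from (1, 0)
own_clusters = sse([(0, 0), (1, 1)], [0, 1])   # each node is its own centroid
elbow = largest_drop_k([100.0, 90.0, 40.0, 35.0, 33.0])  # made-up SSE values
```

Because the largest drop is only a heuristic, the surrounding text also inspects the plot and the community goodness metrics before fixing k.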

Algorithm Evaluation

This section evaluates the two algorithms based on the performance measures presented in the previous section.

Louvain Evaluation

This section gives a performance overview of the Louvain algorithm against all three datasets.

Karate Club Dataset

Using the Louvain algorithm, the number of communities found was 4. The details of the nodes within each cluster are shown in Table 7.

Table 7 Louvain Results for Karate Club Dataset

| Community Label | Node IDs in Community |
|---|---|
| 0 | 4, 5, 6, 10, 16 |
| 1 | 0, 1, 2, 3, 7, 9, 11, 12, 13, 17, 19, 21 |
| 2 | 23, 24, 25, 27, 28, 31 |
| 3 | 8, 14, 15, 18, 20, 22, 26, 29, 30, 32, 33 |

The cluster map of the dataset is shown in the Figure below.

Figure 18 Louvain Clustering Map for Karate Club Dataset

The scores for the Louvain algorithm’s performance evaluation metrics are shown in Table 8.

Table 8 Louvain Communities Scores for Community Evaluation Metrics

| Modularity | Silhouette | Calinski and Harabasz |
|---|---|---|
| 0.418 | 0.146 | 5.54 |

Time taken by the Louvain algorithm was 0.0048s. The performance for the seven metrics is given in Table 9.

Table 9 Louvain Performance Analysis (Karate)

| Purity | NMI | AMI | ARI | F1 Score | Homogeneity | FMI |
|---|---|---|---|---|---|---|
| 0.98 | 0.61 | 0.44 | 0.49 | 0.8 | 0.83 | 0.63 |

Dolphins Dataset

Unlike the expected 4 communities, the algorithm found 5 clusters. The nodes within each cluster are listed in the Table 10.

Table 10 Louvain Results for Dolphin’s Dataset

| Community Label | Node IDs in Community |
|---|---|
| 0 | 0, 2, 10, 42, 47, 53, 61 |
| 1 | 1, 7, 19, 25, 26, 27, 28, 30 |
| 2 | 12, 14, 16, 20, 33, 34, 36, 37, 38, 39, 40, 43, 44, 46, 49, 50, 52, 58 |
| 3 | 5, 6, 9, 13, 17, 22, 31, 32, 41, 48, 54, 56, 57, 60 |
| 4 | 3, 4, 8, 11, 15, 18, 21, 23, 24, 29, 35, 45, 51, 55, 59 |

The clustering map for the dataset is shown in the Figure below.

Figure 19 Louvain Clustering Map for Dolphin’s Dataset

The scores for the Louvain algorithm’s community goodness evaluation metrics are shown in Table 11.

Table 11 Louvain Communities Scores for Community Evaluation Metrics

| Modularity | Silhouette | Calinski and Harabasz |
|---|---|---|
| 0.518 | 0.108 | 5.75 |

Time taken by the Louvain algorithm was 0.0051s. The scores for the seven performance metrics are shown in Table 12.

Table 12 Louvain Performance Analysis (Dolphins)

| | Purity | NMI | AMI | ARI | Homogeneity | F1 Score | FMI |
|---|---|---|---|---|---|---|---|
| Score | 0.887 | 0.73 | 0.64 | 0.64 | 0.79 | 0.032 | 0.74 |

Email-Eu-Core Dataset

Similar to the previous two datasets, the Louvain algorithm was directly applied to the network graph. The algorithm took 0.0245s to form the communities and found 27 communities, with a modularity of 0.42. Table 13 below shows the goodness scores for the communities formed by the algorithm.

Table 13 Louvain Community Goodness Scores for email-Eu-core Dataset

| Time Taken (s) | Communities Formed | Modularity | Silhouette Score | Calinski & Harabasz |
|---|---|---|---|---|
| 0.0245 | 27 | 0.42 | -0.295 | 3.96 |

The scores for the seven performance variables are shown in Table 14.

Table 14 Louvain Performance Analysis (emailEucore)

| | Purity | NMI | AMI | ARI | Homogeneity | F1 Score | FMI |
|---|---|---|---|---|---|---|---|
| Score | 0.393 | 0.55 | 0.337 | 0.25 | 0.417 | 0.04 | 0.38 |

K-Means Evaluation

This section gives a performance overview of the K-Means algorithm against all three datasets.

Karate Club Dataset

To find the value of k, the elbow criterion was applied. For each value of k, the corresponding SSE score was recorded. The plot of the SSE for values of k from 1 to 34 was then created to find the elbow, as shown in the Figure below.

Figure 20 Elbow Criterion for Karate Club Dataset (K-Means)

As no distinct elbow was present, the values of the traditional K-Means SSEs (144.18, 115.82, 99.18, 85.26, 75.83, 68.47, 69.12, 53.43, 54.38, 47.41, 48.17, 42.49, 40.90, 38.02, 31.96, 29.08, 28.20, 24.33, 21.50, 21.83, 16.67, 17.17, 16.50, 12.33, 11.83, 9.50, 7.50, 7.50, 5.50, 4.50, 3.00, 2.00, 1.00) corresponding to the values of k = 1 to k = 34 were studied. The biggest drop in values was the one ending at k = 2, i.e., 144.18 to 115.82; the subsequent drop, 115.82 to 99.18, was smaller. So, as per the elbow criterion, the value of k = 2 was selected.

To validate our decision, the scores for the community goodness metrics were also considered i.e. the silhouette score, modularity and Calinski and Harabasz. The plots for the three metric scores against the variation of value of k, are shown in the Figures below.

Figure 21 Silhouette Graph for Karate Club (K-Means)

Figure 22 Modularity Graph for Karate Club (K-Means)

Figure 23 Calinski and Harabasz Score for Karate Club (K-Means)

To highlight the k selection process, the table below shows the values of the three performance metrics for the range of k from 2 to 10. The range was selected as it contained the maximum values of the three metrics.

Table 15 Community Goodness Scores Analysis for Karate Club (K-Means)

| K | Silhouette | Modularity | Calinski and Harabasz |
|---|---|---|---|
| 2 | 0.16 | 0.37 | 7.83 |
| 3 | 0.15 | 0.31 | 7.03 |
| 4 | 0.18 | 0.10 | 6.91 |
| 5 | 0.16 | 0.16 | 6.53 |
| 6 | 0.16 | 0.08 | 6.19 |
| 7 | 0.13 | 0.19 | 4.89 |
| 8 | 0.17 | 0.13 | 6.31 |
| 9 | 0.14 | 0.06 | 5.16 |
| 10 | 0.10 | 0.10 | 5.44 |

The values of the modularity and Calinski & Harabasz Index are the highest at k = 2. Silhouette score showed the maximum performance at k = 4. But considering the majority votes for k = 2, it was selected as the number of clusters for further analysis.

Once k was selected, the scores for the seven performance metrics for k from 2 to 8 were studied, as shown in Table 16. As can be seen, for k = 2, the algorithm gave the best results. This shows that the selected k indeed formed the best quality clusters as well as coincided with the ground truth.

Table 16 Performance Scores of K-Means

| K | Purity | NMI | AMI | ARI | Homogeneity | FMI | F1 Score |
|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 0.88 | 0.78 | 0.88 | 0.99 | 0.93 | 0.2 |
| 4 | 1 | 0.71 | 0.49 | 0.52 | 1 | 0.71 | 0.7 |
| 5 | 0.97 | 0.61 | 0.41 | 0.53 | 0.83 | 0.72 | 0.02 |
| 6 | 1 | 0.65 | 0.39 | 0.41 | 1 | 0.64 | 0 |
| 7 | 1 | 0.62 | 0.34 | 0.33 | 1 | 0.57 | 0.5 |
| 8 | 1 | 0.59 | 0.3 | 0.26 | 1 | 0.50 | 0.14 |

The details of the nodes within the two clusters are shown in the Table below.

Table 17 K-Means Results for Karate Club Dataset

| Community Label | Node IDs in Community |
|---|---|
| 0 | 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21 |
| 1 | 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 |

The cluster map for the dataset is shown in the Figure below.

Figure 24 K-Means Clustering Map for Karate Club Dataset

The Table below shows the number of iterations and the time taken by the algorithm to converge. The average time of K-Means was 0.0054s.

Table 18 Time and Iterations of K-Means (Karate Club)

| K | Iterations | Time (s) |
|---|---|---|
| 2 | 4 | 0.0072 |
| 3 | 7 | 0.0049 |
| 4 | 3 | 0.0047 |
| 5 | 4 | 0.0048 |
| 6 | 6 | 0.0047 |
| 7 | 5 | 0.0052 |
| 8 | 3 | 0.0066 |
| 9 | 2 | 0.0062 |
| 10 | 5 | 0.0059 |

The table shows that, for every listed value of k, the algorithm took less than 0.01s to perform the clustering.

Dolphins Dataset

Similar to the Karate Club Dataset, the value of k was chosen pragmatically. For the values between 1 and the total number of nodes, the plot for Elbow Criterion is shown in the figure below.

Figure 25 K-Means Elbow Criterion for Dolphin’s Dataset

Considering the values of SSE for K-Means (330.02, 300.85, 279.16, 259.53, 255.13, 239.67, 232.77, 226.52, 225.48, 213.27, 209.53, …, 11.50, 8.00, 5.67, 3.00, 1.00), the last big drop, after which the decrease becomes smoother, ends at k = 4 with an SSE of 259.53. After this, the decrease in SSE was uniform.

Besides the Elbow Criterion, all the performance scores for three metrics (silhouette score, modularity and the Calinski and Harabasz scores) also helped in selecting the value of k. The plots of K-Means for the variation of k against the three metric scores are shown in the Figures below.

Figure 26 Silhouette Graph K-Means (Dolphins)

Figure 27 Modularity Graph K-Means (Dolphins)

Figure 28 Calinski and Harabasz Score for K-Means (Dolphins)

To explain the selection process, the values for the three metrics for the range of k from 2 to 10 are given in Table 19. This range was selected as the maximum values of all the metrics were within this range.

Table 19 K Variation Performance Analysis for Dolphins (K-Means)

| K | Silhouette | Modularity | Calinski and Harabasz |
|---|---|---|---|
| 2 | 0.1 | 0.3 | 5.8 |
| 3 | 0.09 | 0.23 | 5.37 |
| 4 | 0.08 | 0.34 | 5.25 |
| 5 | 0.07 | 0.21 | 4.18 |
| 6 | 0.06 | 0.24 | 4.22 |
| 7 | 0.06 | 0.26 | 3.8 |
| 8 | 0.05 | 0.2 | 3.52 |
| 9 | 0.02 | 0.18 | 3.07 |
| 10 | 0.04 | 0.09 | 3.16 |

Based on the listed scores in the table, the silhouette score and the Calinski and Harabasz score were the highest at k = 2, while the modularity was the highest at k = 4. As the Elbow Criterion (SSE) and the modularity both pointed to the same value, k = 4 was selected.

The performance data against the ground truth for the variation of k against the seven metrics is shown in the Figure below.

Figure 29 K-Means Performance Scores for Dolphins Dataset

A subset of these scores for the seven performance metrics against the ground truth is given in Table 20. To see the variation in the values of the metrics for the change in k, the values for a range of k from 2 to 8 are shown in the table. The selected value of k for K-Means has been highlighted.

Table 20 Performance Scores of K-Means for Dolphins

| K | Purity | NMI | AMI | ARI | Homogeneity | FMI | F1 Score |
|---|---|---|---|---|---|---|---|
| 2 | 0.56 | 0.32 | 0.32 | 0.12 | 0.27 | 0.44 | 0.38 |
| 3 | 0.61 | 0.4 | 0.3 | 0.1 | 0.37 | 0.40 | 0.17 |
| **4** | **0.56** | **0.33** | **0.25** | **0.07** | **0.32** | **0.37** | **0.29** |
| 5 | 0.6 | 0.37 | 0.28 | 0.11 | 0.39 | 0.37 | 0.12 |
| 6 | 0.67 | 0.47 | 0.35 | 0.19 | 0.52 | 0.4 | 0.2 |
| 7 | 0.64 | 0.38 | 0.24 | 0.07 | 0.43 | 0.3 | 0.16 |
| 8 | 0.7 | 0.41 | 0.27 | 0.15 | 0.46 | 0.39 | 0.06 |

A comparison of the number of iterations by the algorithm is shown in Table below. The average time of K-Means was 0.0056s. The time for the selected value of k was 0.0047s.

Table 21 Time and Iterations of K-Means (Dolphins)

| K | Iterations | Time (s) |
|---|---|---|
| 2 | 12 | 0.0048 |
| 3 | 9 | 0.0066 |
| 4 | 4 | 0.0047 |
| 5 | 4 | 0.0046 |
| 6 | 6 | 0.0063 |
| 7 | 3 | 0.0066 |
| 8 | 4 | 0.0047 |
| 9 | 3 | 0.005 |
| 10 | 5 | 0.0047 |

The detailed clustering result by K-Means algorithm for the dataset at k = 4 is given in Table 22.

Table 22 K-Means Results for Dolphin’s Dataset

| Community Label | Node IDs in Community |
|---|---|
| 0 | 0, 1, 2, 3, 4, 7, 8, 10, 11, 12, 15, 19, 20, 22, 23, 25, 26, 27, 28, 30, 31, 32, 35, 36, 39, 42, 44, 46, 47, 48, 49, 52, 53, 55, 58, 59, 60, 61 |
| 1 | 14, 16, 33, 34, 37, 38, 40, 43, 50 |
| 2 | 5, 6, 9, 13, 17, 41, 54, 56, 57 |
| 3 | 18, 21, 24, 29, 45, 51 |

The cluster map for the dataset for K-Means is shown in Figure below.

Figure 30 K-Means Clustering Map for Dolphin’s Dataset (k=4)

Email-Eu-Core Dataset

To find the optimal value of k, the Elbow Criterion was considered. For the range of k from 1 to 100, the plot for SSE is shown in Figure below.

Figure 31 K-Means Elbow Analysis (Email-Eu-Core)

Considering the values of SSE for K-Means (30681.44, 28057.21, 27226.72, 26367.67, 25542.75, 24915.37, 24526.53, 23639.54, 23559.98, 23124.40, 22780.39, …, 16451.01, 16501.98, 16289.76, 16527.40, 16138.34, 16248.95, 16071.48, 16180.69, 16181.18, 16005.31), the last big drop, after which the decrease becomes smoother, is at k = 7.

To validate this value of k for the network, the silhouette analysis was performed for a range of values of k from 2 to 100. Figure below shows the plot of the silhouette coefficients with respect to the number of clusters selected to apply K-Means. The value of k was chosen based on the combined scores of SSE and the other community goodness metrics, i.e. silhouette, modularity and Calinski & Harabasz score.

Figure 32 Silhouette and Modularity for emailEuCore

The silhouette coefficient was the highest, 0.26, at k=2, while the modularity was the highest, 0.26, at k=15.

For the Calinski and Harabasz scores, consider the plot in the Figure below. The score was in general over three times higher than for the previously studied Karate Club and Dolphin’s datasets. At k=2, the value was the highest i.e. 93.8.

Figure 33 Calinski and Harabasz Scores for emailEuCore Dataset

High scores for the three community goodness metrics indicate that the nodes within the formed communities are in a stable position i.e. they have a stronger relation to the cluster they are part of than to the neighboring clusters. Low values indicate that the nodes could be part of another neighboring community.

As can be seen from the performance scores graphs, the communities formed within the range of 2 to 100 are highly unstable. All the scores lower as the value of k is increased.

Table below shows the selected k based on the maximum values of the community goodness scores.

Table 23 Optimal k for K-Means Using Community Goodness Metrics

| | SSE | Modularity | Silhouette | Calinski & Harabasz |
|---|---|---|---|---|
| k | 7 | 15 | 2 | 2 |

Thus based on the majority values of Silhouette Coefficient and Calinski & Harabasz score, the value of k was selected as k = 2. Table below shows the community goodness scores for k=2. A low value of modularity indicates that there are no distinct boundaries between the communities formed for the network.

Table 24 K-Means Community Goodness Scores for k=2 (emailEucore)

| | Silhouette | Modularity | Calinski & Harabasz |
|---|---|---|---|
| Score | 0.26 | 0.1 | 93.8 |

The seven performance scores for the selected value of k = 2 are shown in the Table below. The low values of the scores indicate that the estimated clusters barely coincide with the ground truth.

Table 25 K-Means Performance Analysis for k=2 (emailEucore)

| | Purity | NMI | AMI | ARI | Homogeneity | F1 Score | FMI |
|---|---|---|---|---|---|---|---|
| Score | 0.1 | 0.05 | 0.01 | 0.008 | 0.008 | 0.05 | 0.2 |

From the ground truth we know that the number of clusters for the network is 42. Analyzing the performance results of K-Means around this ground truth value will show if there could have been an improvement in the performance of the algorithm if a different k was chosen. If such a value of k exists, then it would mean that the Silhouette coefficient analysis and the Elbow Criterion failed to detect an optimal k.

As can be seen from the results, the value of k fluctuates highly across all the above methods of determining k. Taking the value of k from the ground truth community information, the community goodness scores at k = 42 are shown in Table 26. As can be seen, the silhouette and Calinski & Harabasz scores decrease but the modularity increases. An increase in modularity means that the boundary between the clusters is getting more defined.

Table 26 K-Means Community Goodness Scores for k=42 (emailEucore)

| | Silhouette | Modularity | Calinski & Harabasz |
|---|---|---|---|
| Score | 0.09 | 0.19 | 14.5 |

Table 27 shows the scores for the seven performance metrics at k = 42.

Table 27 K-Means Performance Analysis for k=42 (emailEucore)

| | Purity | NMI | AMI | ARI | Homogeneity | F1 Score | FMI |
|---|---|---|---|---|---|---|---|
| Score | 0.49 | 0.5 | 0.35 | 0.07 | 0.45 | 0.004 | 0.17 |

All performance metrics other than F1 Score and FMI showed an improvement in the performance when the cluster number from the ground truth were used. This shows that the clusters formed by K-Means at k=42 had more similarity to the ground truth than at k=2. This means that the k-analysis methods failed to detect the optimal value of k. Additionally, the performance scores are very low in general. This means, that the K-Means does not cluster properly enough to meet the requirements of the ground truth communities.

K-Means++ Evaluation

This section gives a comparison of K-Means++ with the K-Means algorithm. Both the versions implement Blondel et al. (2010). The difference between the two is the manner in which the initial centroids are selected. K-Means selects them uniformly at random, while K-Means++ also selects them randomly but favors points that are far from the centroids already chosen, so that the initial centroids are not close together.
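The difference in initialization can be sketched with the standard K-Means++ seeding rule, in which each further centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. This is a minimal sketch of that general rule, not the implementation used in the experiments; the function name `kmeans_pp_init` and the toy points are assumptions.

```python
import random
from math import dist


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids; far-away points are more likely to be chosen."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]          # first centroid: uniform at random
    while len(centroids) < k:
        # squared distance of every point to its nearest chosen centroid
        d2 = [min(dist(p, c) ** 2 for c in centroids) for p in points]
        if sum(d2) == 0:
            # every point coincides with a chosen centroid; fall back to uniform
            centroids.append(rng.choice(points))
            continue
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
seeds = kmeans_pp_init(points, k=2, seed=42)
all_four = kmeans_pp_init(points, k=4, seed=42)
```

A point that is already a centroid has weight zero and cannot be drawn again, so with distinct input points the initial centroids are always distinct; plain K-Means would instead draw all k centroids uniformly.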

Karate Club Dataset

To find the value of k, the elbow criterion was performed. The plot for the SSE for values of k starting 1 to 34 was considered, as shown in Figure below.

Figure 34 Elbow Criterion for Karate Club (K-Means Vs. K-Means++)

Studying the values of the K-Means++ SSEs (144.18, 115.82, 99.18, 84.02, 74.99, 66.55, 60.79, 54.16, 49.59, 44.10, 40.53, 37.35, 32.63, 29.80, 26.80, 24.33, 21.33, 19.50, 17.33, 16.00, 14.00, 12.50, 11.00, 10.00, 9.00, 8.00, 7.00, 6.00, 5.00, 4.00, 3.00, 2.00, 1.00) corresponding to the values of k = 1 to k = 34, the biggest drop in values was the one ending at k = 2, i.e., 144.18 to 115.82; the subsequent drop, 115.82 to 99.18, was smaller. After this, the decrease in scores was gradual. So, as per the elbow criterion, the value of k = 2 was selected. The value of k chosen was the same as that for the K-Means algorithm.

To validate our decision, the community goodness scores for the three metrics were also considered i.e. Silhouette, Modularity and Calinski and Harabasz. The plots for the three metric scores for both the K-Means and K-Means++ are shown in the Figures below. It can be seen that K-Means++ gave higher silhouette scores as compared to K-Means, but the behavior was opposite in Calinski and Harabasz score. For modularity, the K-Means++ gave a smoother curve.

Figure 35 Silhouette Graph for Karate Club (K-Means vs. K-Means++)

Figure 36 Modularity Graph for Karate Club (K-Means vs. K-Means++)

Figure 37 Calinski and Harabasz Score for Karate Club (K-Means vs. K-Means++)

Between the traditional K-Means and K-Means++, the scores fluctuated but both behaved somewhat similarly at k = 2. At k=2, the K-Means++ algorithm took more iterations than the traditional K-Means to converge. The Table below shows the number of iterations and the time taken by the two algorithms to converge. The average time of K-Means++ was 0.0056s in contrast to K-Means, which was 0.0054s.

Table 28 Time and Iterations of K-Means and K-Means++ (Karate Club)

| K | Iterations (K-Means++) | Iterations (K-Means) | Time (s) (K-Means++) | Time (s) (K-Means) |
|---|---|---|---|---|
| 2 | 7 | 4 | 0.008 | 0.0072 |
| 3 | 3 | 7 | 0.006 | 0.0049 |
| 4 | 3 | 3 | 0.0045 | 0.0047 |
| 5 | 4 | 4 | 0.0046 | 0.0048 |
| 6 | 4 | 6 | 0.006 | 0.0047 |
| 7 | 3 | 5 | 0.0046 | 0.0052 |
| 8 | 2 | 3 | 0.0046 | 0.0066 |
| 9 | 3 | 2 | 0.0046 | 0.0062 |
| 10 | 3 | 5 | 0.005 | 0.0059 |

The iteration graph for the above data is shown below.

Figure 38 K-Means++ Vs. K-Means Iterations for Karate Club Dataset

Table 29 shows the values of the three performance metrics for values of k from 2 to 11. The values of the modularity and the Calinski & Harabasz Index are the highest at k = 2. The silhouette score showed the maximum performance at k = 11, but it did not coincide with the other metrics. So, considering the majority votes for k = 2, it was selected as the number of clusters for further analysis.

Table 29 K Variation Performance Analysis for Karate Club (K-Means++)

| K | Silhouette | Modularity | Calinski and Harabasz |
|---|---|---|---|
| 2 | 0.16 | 0.37 | 7.83 |
| 3 | 0.15 | 0.31 | 7.03 |
| 4 | 0.17 | 0.08 | 7.16 |
| 5 | 0.18 | 0.11 | 6.69 |
| 6 | 0.16 | 0.15 | 6.53 |
| 7 | 0.17 | 0.07 | 6.17 |
| 8 | 0.17 | 0.12 | 6.17 |
| 9 | 0.16 | 0.15 | 5.96 |
| 10 | 0.19 | 0.11 | 6.05 |
| 11 | 0.20 | 0.13 | 5.88 |

The scores for the seven performance metrics for k from 2 to 8 are shown in the Table below. As can be seen, for k = 2, the algorithm gave the best results for K-Means++.

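Table 30 Performance Scores of K-Means++

| K | Purity | NMI | AMI | ARI | Homogeneity | FMI | F1 Score |
|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 0.88 | 0.78 | 0.88 | 0.99 | 0.93 | 0.058 |
| 4 | 1 | 0.71 | 0.49 | 0.52 | 1 | 0.71 | 0 |
| 5 | 0.97 | 0.61 | 0.41 | 0.53 | 0.83 | 0.72 | 0.65 |
| 6 | 1 | 0.65 | 0.39 | 0.41 | 1 | 0.64 | 0.27 |
| 7 | 1 | 0.62 | 0.34 | 0.33 | 1 | 0.57 | 0.27 |
| 8 | 1 | 0.59 | 0.3 | 0.26 | 1 | 0.50 | 0.25 |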

The Figures below show the evaluation graphs for K-Means++ in contrast to K-Means. The comparison graphs show that except for the F1 Score, the evaluation scores for the traditional k-Means were less smooth i.e. they fluctuated more as compared to the K-Means++. In F1 Score, both the algorithms showed fluctuating scores. Despite the variations in the scores, the performance of both the algorithms was similar at k=2.

Figure 39 K-Means Purity Scores (for k=2:10) for Karate Club Dataset

Figure 40 K-Means NMI Scores (for k=2:10) for Karate Club Dataset

Figure 41 K-Means AMI Scores (for k=2:10) for Karate Club Dataset

Figure 42 K-Means ARI Scores (for k=2:10) for Karate Club Dataset

Figure 43 K-Means Homogeneity Scores (for k=2:10) for Karate Club Dataset

Figure 44 K-Means FMI Scores (for k=2:10) for Karate Club Dataset

Figure 45 K-Means F1 Scores (for k=2:10) for Karate Club Dataset

The details of the nodes within the two clusters are shown in the Table below.

Table 31 K-Means++ Results for Karate Club Dataset

| Community Label | Node IDs in Community |
|---|---|
| 0 | 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21 |
| 1 | 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 |

The cluster map for the dataset is shown in the Figure below.

Figure 46 K-Means++ Clustering Map for Karate Club Dataset

Dolphins Dataset

Similar to the Karate Club Dataset, the value of k was chosen pragmatically. For the values of k between 1 and the total number of nodes, the corresponding SSEs were noted. And then the plot for the Elbow Criterion was formed as shown in the figure below.

Figure 47 K-Means Elbow Criterion for Dolphin’s Dataset

The elbow was not clear, so considering the values of SSE for K-Means++ (330.02, 310.88, 294.90, 259.30, 257.81, 246.62, 244.60, 228.30, …, 10.67, 8.25, 6.17, 4.50, 3.00, 2.00, 1.00), the biggest drop, after which the decrease became smoother, was found at k = 4, i.e. at an SSE of 259.30.

The performance scores for three metrics (silhouette score, modularity and the Calinski and Harabasz score) were also considered to help select the value of k. For K-Means++, the modularity and the Calinski and Harabasz score were both the highest at k = 4, while the silhouette score was the highest at k = 2, so K-Means++ did not point to a single k across all three metrics. Considering the majority, together with the elbow at k = 4, k = 4 was selected.

The values for the three metrics, for each iteration are given in the Table below.

Table 32 K Variation Performance Analysis for Dolphins (K-Means++)

| K | Silhouette | Modularity | Calinski and Harabasz |
|---|---|---|---|
| 2 | 0.17 | 0.01 | 3.6 |
| 3 | 0.09 | 0.08 | 3.5 |
| 4 | 0.09 | 0.38 | 5.27 |
| 5 | 0.06 | 0.23 | 3.9 |
| 6 | 0.07 | 0.26 | 3.7 |
| 7 | 0.05 | 0.11 | 3.2 |
| 8 | 0.06 | 0.15 | 3.43 |
| 9 | 0.05 | 0.09 | 3.15 |
| 10 | 0.03 | 0.04 | 2.4 |

The plots of K-Means++ for the above metric scores are shown in the Figures below. The curve of the K-Means is also included in the graphs to see the overall values of the two algorithms.

Figure SEQ Figure * ARABIC 48 Silhouette Graph K-Means vs. K-Means++ (Dolphins)

Figure SEQ Figure * ARABIC 49 Modularity Graph K-Means vs. K-Means++ (Dolphins)

Figure SEQ Figure * ARABIC 50 Calinski and Harabasz Score Graph K-Means vs. K-Means++ (Dolphins)The plots for the values for the seven performance metrics are given in the Figures below. The corresponding values for the K-Means have also been added in the graphs to see a comparison of the scores of the two versions of the algorithm. It can be seen that in all the curves, the scores of K-Means are a bit higher than the K-Means++. But the K-Means++ form communities that relate more to the ground truth as compared to K-Means.

Figure 51 Purity Scores for Dolphins Dataset

Figure 52 NMI Scores for Dolphins Dataset

Figure 53 AMI Scores for Dolphins Dataset

Figure 54 ARI Scores for Dolphins Dataset

Figure 55 Homogeneity Scores for Dolphins Dataset

Figure 56 FMI Scores for Dolphins Dataset

Figure 57 F1 Scores for Dolphins Dataset

The performance of K-Means++ for the seven metrics is shown in the Figure below.

Figure 58 K-Means++ Performance Scores for Dolphins Dataset

To show how the seven performance metrics vary with k, a subset of the values for k from 2 to 8 is given in the table below. The selected value of k for K-Means++ has been highlighted. Although the curves of K-Means show higher values overall, at k = 4 the scores of K-Means++ are higher (see the comparison Table in Section E or the K-Means performance scores in Section C).

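Table 33 Performance Scores of K-Means++ for Dolphins

| K | Purity | NMI | AMI | ARI | Homogeneity | FMI | F1 Score |
|---|---|---|---|---|---|---|---|
| 2 | 0.45 | 0.18 | 0.07 | -0.01 | 0.11 | 0.45 | 0.32 |
| 3 | 0.61 | 0.4 | 0.34 | 0.14 | 0.38 | 0.41 | 0.2 |
| **4** | **0.59** | **0.38** | **0.32** | **0.14** | **0.37** | **0.4** | **0.09** |
| 5 | 0.62 | 0.37 | 0.29 | 0.11 | 0.37 | 0.4 | 0.12 |
| 6 | 0.56 | 0.26 | 0.15 | 0.006 | 0.28 | 0.3 | 0.16 |
| 7 | 0.61 | 0.34 | 0.21 | 0.04 | 0.37 | 0.32 | 0.14 |
| 8 | 0.66 | 0.41 | 0.27 | 0.14 | 0.48 | 0.37 | 0.22 |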

A comparison of the number of iterations and the execution times of the K-Means and K-Means++ algorithms is shown in the Table below. The average time of K-Means++ was 0.0056, while that of K-Means was 0.0053. For k = 4, K-Means and K-Means++ performed the clustering in the same time and the same number of iterations.

Table 34 Time and Iterations Comparison of K-Means and K-Means++

| K | Iterations (K-Means++) | Iterations (K-Means) | Time (K-Means++) | Time (K-Means) |
|---|---|---|---|---|
| 2 | 3 | 12 | 0.0071 | 0.0048 |
| 3 | 3 | 9 | 0.0047 | 0.0066 |
| 4 | 4 | 4 | 0.0047 | 0.0047 |
| 5 | 6 | 4 | 0.0046 | 0.0046 |
| 6 | 5 | 6 | 0.0046 | 0.0063 |
| 7 | 4 | 3 | 0.008 | 0.0066 |
| 8 | 3 | 4 | 0.0047 | 0.0047 |
| 9 | 2 | 3 | 0.0048 | 0.005 |
| 10 | 3 | 5 | 0.0064 | 0.0047 |

A plot to give a comparison for the iterations is shown in the Figure below.

Figure 59 K-Means vs. K-Means++ Iterations for Dolphins Dataset

The detailed clustering result of the K-Means++ algorithm for the dataset at k=4 is given in the Table below.

Table 35 K-Means++ Results for Dolphin’s Dataset

| Community Label | Node IDs in Community |
|---|---|
| 0 | 0, 1, 2, 3, 4, 7, 8, 10, 11, 12, 19, 20, 22, 25, 26, 27, 28, 30, 31, 32, 35, 36, 39, 42, 44, 46, 47, 48, 49, 53, 55, 58, 60, 61 |
| 1 | 5, 6, 9, 13, 17, 41, 54, 56, 57 |
| 2 | 14, 16, 33, 34, 37, 38, 40, 43, 52 |
| 3 | 15, 18, 21, 23, 24, 29, 45, 50, 51, 59 |

The cluster map for the dataset for K-Means++ is shown in the Figure below.

Figure 60 K-Means++ Clustering Map for Dolphin’s Dataset (k=5)

Email-Eu-Core Dataset

To find the optimal value of k, the Elbow Criterion was considered. For the range of k from 1 to 100, the plot for SSE is shown in the Figure below.

Figure 61 K-Means Elbow Analysis (Email-Eu-Core)

Considering the values of SSE for K-Means++ (30681.44, 28057.21, 27079.87, 26247.93, 25540.68, 24835.12, 24234.69, 23866.06, 23826.29, 23346.10, 22764.11, 22641.56, …, 15048.27, 15068.05, 14914.29, 14892.09, 14961.61, 14796.61, 14816.99, 14731.14, 14715.31, 14672.57, 14509.05), the biggest drop, after which the decrease becomes smoother, occurs at k = 8.

To validate the value of k for the network, the silhouette analysis was performed for a range of values of k from 2 to 100. Figure 49 shows the plot of the silhouette coefficients with respect to the number of clusters selected to apply K-Means++. The value of k was chosen based on the combined scores of silhouette and the other metrics, i.e. modularity and Calinski & Harabasz.

Figure 62 K-Means++ Modularity for emailEuCore

Both K-Means and K-Means++ share the highest modularity value of 0.26, but the score is achieved at k=15 for K-Means and at k=9 for K-Means++.

Figure 63 K-Means vs. K-Means++ Silhouette Score (Email-Eu-Core)

For the silhouette score, both K-Means and K-Means++ shared the highest value of 0.34 at k=2.

For the Calinski and Harabasz scores, consider the plot in the Figure below. For both K-Means and K-Means++, the score was highest, i.e. 93.8, at k=2.

Figure 64 Calinski and Harabasz Scores for emailEuCore Dataset

The Table below shows a comparison of the optimal value of k considering the SSE and community goodness scores.

Table 36 Optimal k for K-Means++ (Email-Eu-Core)

|   | SSE | Modularity | Silhouette | Calinski & Harabasz |
|---|---|---|---|---|
| k | 8 | 9 | 2 | 2 |

The general trend seen in the community goodness metrics is that the scores decrease with the increase of k. For all the three metrics the highest score gives the best community structure. Thus based on the closeness in values of k between SSE and modularity, the value of k was selected as k = 9. Table below shows the community goodness scores for k=9.

Table 37 K-Means++ Community Quality for k=9 (emailEucore)

|   | Silhouette | Modularity | Calinski & Harabasz |
|---|---|---|---|
| Score | 0.15 | 0.26 | 35.8 |

The seven performance scores for the selected value of k = 9 are shown in the Table below.

Table 38 K-Means Performance Analysis for k=9 (emailEucore)

|   | Purity | NMI | AMI | ARI | Homogeneity | F1 Score | FMI |
|---|---|---|---|---|---|---|---|
| Score | 0.29 | 0.33 | 0.18 | 0.04 | 0.22 | 0.015 | 0.2 |

Studying the community goodness metrics and matching scores of the algorithm at k = 42 (from the ground truth) will show whether the clusters formed are closer to the ground truth or not. Table below shows the community goodness scores and matching scores with the ground truth at k=42.

Table 39 K-Means++ Community Goodness Scores for k=42 (emailEucore)

|   | Silhouette | Modularity | Calinski & Harabasz |
|---|---|---|---|
| Score | 0.15 | 0.22 | 15.4 |

Table below shows the scores for the seven performance metrics at k = 42.

Table 40 K-Means++ Performance Analysis for k=42 (emailEucore)

|   | Purity | NMI | AMI | ARI | Homogeneity | F1 Score | FMI |
|---|---|---|---|---|---|---|---|
| Score | 0.45 | 0.49 | 0.33 | 0.07 | 0.42 | 0.07 | 0.19 |

All performance metrics other than FMI show an improvement when the number of clusters from the ground truth was used. This shows that the clusters formed by K-Means at k=42 had more similarity to the ground truth than those formed at k=9, and that the k-analysis did not effectively detect the number of clusters. Secondly, the performance scores are very low in general. This means that K-Means does not cluster well enough to match the ground truth communities.

Test Environment

A testing environment was set up in Python to analyze the performance of the two algorithms. Python was chosen because it has built-in parsers that can read a variety of dataset formats; only the path to the file needs to be provided, and the graph is created automatically. Additionally, both the Louvain and K-Means methods, as well as almost all the evaluation metrics required to carry out the performance analysis of the two algorithms, already exist in Python (see Appendix for details of each method). Nine out of the ten performance metrics have an implementation in the Python libraries. Only the Purity metric was implemented from scratch.
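The purity implementation itself is not reproduced in this section. As a rough sketch of one common definition, purity can be computed by assigning each detected cluster to its most frequent ground-truth class and counting the fraction of nodes covered by these majority classes. The function name `purity` and the toy labels below are assumptions made for illustration only.

```python
from collections import Counter


def purity(true_labels, predicted_labels):
    """Fraction of nodes that belong to the majority true class of their cluster."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    # For every predicted cluster, count how often each true class occurs in it.
    clusters = {}
    for truth, cluster in zip(true_labels, predicted_labels):
        clusters.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(true_labels)


# Hypothetical toy labels: six nodes, two ground-truth communities.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
print(purity(truth, found))
```

In this toy case five of the six nodes sit in the majority class of their cluster, so the purity is 5/6. Note that purity does not penalize over-segmentation: placing every node in its own cluster always yields a purity of 1.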

iGraph is a free, high-performance graph library developed in C. Python interfaces with this library, which facilitates complex network research and analysis (Csardi and Nepusz, 2005). The library eases the process of creating network nodes, edges, graphs, labels, etc. iGraph also provides a method for computing the modularity of the clusters formed.

An implementation of the Louvain algorithm by Blondel et al. (2008) is available in the iGraph package. The default settings of the Louvain algorithm were used. As all the dataset graphs were unweighted, all edges were assigned equal weights. The algorithm was set to give the final memberships and modularity after executing all the iterations.
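To make concrete what a modularity estimate measures, the following sketch computes the standard Newman-Girvan modularity for an undirected, unweighted graph without self-loops. It is written from scratch for illustration and is not iGraph's code; the function name `modularity` and the toy graph are assumptions.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs without self-loops.
    membership: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    # Q = sum over communities of (internal edge share - expected share).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity(edges, membership)
print(round(q, 4))
```

With the two triangles as communities, each community holds 3 of the 7 edges and half of the total degree, giving Q = 2 × (3/7 − 1/4) = 5/14. Putting all nodes into a single community gives Q = 0.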

Scikit-learn is a machine learning, data mining and data analysis library for Python. It provides implementations of various classification, clustering, dimensionality reduction, pre-processing, regression and model selection algorithms (Pedregosa et al., 2011). To realize the K-Means algorithm, the KMeans tool from the Scikit-learn package was used. Under its default settings the tool uses the K-Means++ method, which selects the initial centroids randomly but with a probability that favors points far from the centroids already chosen, so that the initial centroids are unlikely to be close to one another (Arthur and Vassilvitskii, 2007). Apart from switching between the traditional K-Means and the K-Means++ version of the algorithm, the default settings were used for the experiments. This means that, for each call, the algorithm is run ten times with different initial centroids, and the run with the minimum inertia value (SSE) is selected as the output. Consequently, the execution times listed for K-Means and K-Means++ against each dataset are about ten times the single execution time.
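A minimal sketch of such a setup is shown below. It relies on the scikit-learn `KMeans` class with its `init`, `n_init` and `random_state` parameters and its `labels_`, `inertia_` and `n_iter_` attributes. The toy input matrix, the helper name `run_kmeans` and the fixed seed are assumptions for illustration and are not taken from the experiments. `n_init=10` is set explicitly because the default value has changed across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy input: adjacency rows of two triangles joined by one edge.
X = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)


def run_kmeans(X, k, init):
    """Fit KMeans with 10 restarts and return the best run's labels, SSE, iterations."""
    model = KMeans(n_clusters=k, init=init, n_init=10, random_state=0)
    model.fit(X)
    return model.labels_, model.inertia_, model.n_iter_


# "random" corresponds to traditional K-Means seeding, "k-means++" to K-Means++.
for init in ("random", "k-means++"):
    labels, sse, n_iter = run_kmeans(X, k=2, init=init)
    print(init, labels.tolist(), round(sse, 3), n_iter)
```

The reported `inertia_` is the SSE of the best of the ten restarts, and `n_iter_` is the number of iterations of that best run only.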

For each network graph, an edge matrix is computed and the k-means filter is then applied on this matrix. The methods for performance metrics also use this edge matrix to compute their respective scores.
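As a sketch of this input representation, the snippet below builds an n × n matrix from an undirected edge list. It assumes that the edge matrix corresponds to the usual 0/1 adjacency matrix, with row i serving as the feature vector of node i; the function name `edge_matrix` and the toy graph are hypothetical.

```python
def edge_matrix(edges, n):
    """Build a symmetric n x n 0/1 matrix from an undirected edge list."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


# Hypothetical toy graph: a path over four nodes labelled 0..3.
edges = [(0, 1), (1, 2), (2, 3)]
for row in edge_matrix(edges, 4):
    print(row)
```

Each row can then be treated as one sample by a clustering routine, which is why the memory and per-iteration cost of this representation grow with n².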

All the testing was carried out on an x64-based PC (HP EliteBook 8470p) with the Microsoft Windows 10 Pro operating system. The system used an Intel(R) Core(TM) i5-3360M CPU @ 2.80GHz with 2 cores and 4 logical processors, and had 8 GB of RAM and 13.3 GB of virtual memory.

Results

The community goodness scores and the performance scores of the Louvain, traditional K-Means and K-Means++ algorithms for each dataset, in relation to the ground truth, are summarized in Tables 41 and 42.

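Table 41 Results for the Community Goodness Measures

| Dataset | Method | Modularity | Silhouette | Calinski and Harabasz | Clusters |
|---|---|---|---|---|---|
| Karate | Ground | 0.37 | 0.16 | 7.83 | 2 |
| Karate | K-Means | 0.37 | 0.18 | 7.83 | 2 |
| Karate | K-Means++ | 0.37 | 0.195 | 7.83 | 2 |
| Karate | Louvain | 0.418 | 0.146 | 5.54 | 4 |
| Dolphin | Ground | 0.519 | 0.117 | 6.392 | 4 |
| Dolphin | K-Means | 0.34 | 0.08 | 5.25 | 4 |
| Dolphin | K-Means++ | 0.38 | 0.09 | 5.27 | 4 |
| Dolphin | Louvain | 0.518 | 0.108 | 5.75 | 5 |
| Email-Eu-Core | Ground | 0.42 | -0.197 | 7.129 | 42 |
| Email-Eu-Core | K-Means | 0.35 | 0.09 | 93.7 | 2 |
| Email-Eu-Core | K-Means++ | 0.15 | 0.26 | 35.8 | 9 |
| Email-Eu-Core | Louvain | 0.42 | -0.295 | 3.96 | 27 |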

From the community goodness scores, it can be seen that for the Karate Club dataset, both the K-Means and K-Means++ not only gave the correct community structures, their scores also matched with those of the ground data. Louvain on the other hand gave two extra communities. The modularity of the clusters was even higher than the ground community structure, but the other two measures suffered and the algorithm failed to get the clusters right. Between the K-Means and K-Means++, K-Means++ gave better score for silhouette coefficient, a community goodness measure.

The result for the Dolphin’s dataset shows that again both the K-Means and K-Means++ algorithms got the number of communities correct. Louvain detected one extra community; nevertheless, its scores did not deviate much from the ground truth. K-Means++ again gave better performance results than K-Means for the Dolphin’s dataset. There was a slight lowering in the modularity measure, but the score was still in the sound range.

The result for the Email-Eu-Core dataset shows that although Louvain could not detect all the communities, its detection results were closer to the ground truth community design than those of K-Means. Although Louvain shared the modularity value with the ground truth, its Silhouette and Calinski and Harabasz scores were different. Meanwhile, neither K-Means nor K-Means++ converged in a convincing manner: K-Means detected only 2 communities while K-Means++ detected 9. As the silhouette score for Louvain is negative, some of the nodes would rather have been in different clusters than the ones assigned.

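Table 42 Results for the Performance Measures

| Dataset | Algorithm | Purity | NMI | AMI | ARI | F1 Scores | Homogeneity | FMI |
|---|---|---|---|---|---|---|---|---|
| Karate | K-Means | 1 | 0.71 | 0.49 | 0.52 | 1 | 1 | 0.71 |
| Karate | K-Means++ | 0.97 | 0.61 | 0.41 | 0.53 | 1 | 1 | 0.71 |
| Karate | Louvain | 0.98 | 0.61 | 0.44 | 0.49 | 0.8 | 0.83 | 0.63 |
| Dolphin | K-Means | 0.56 | 0.33 | 0.25 | 0.07 | 0.29 | 0.32 | 0.37 |
| Dolphin | K-Means++ | 0.59 | 0.38 | 0.32 | 0.14 | 0.09 | 0.37 | 0.4 |
| Dolphin | Louvain | 0.887 | 0.73 | 0.64 | 0.64 | 0.03 | 0.79 | 0.74 |
| Email-Eu-Core | K-Means | 0.109 | 0.054 | 0.012 | 0.008 | 0.019 | 0.054 | 0.2 |
| Email-Eu-Core | K-Means++ | 0.29 | 0.33 | 0.18 | 0.04 | 0.22 | 0.015 | 0.2 |
| Email-Eu-Core | Louvain | 0.393 | 0.55 | 0.337 | 0.25 | 0.417 | 0.04 | 0.38 |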

From the performance measures acquired for the three datasets, it can be seen that from between K-Means and K-Means++, K-Means gave better matching communities to the ground truth. Its scores were considerably better than those of K-Means++.

For the small Karate Club dataset, both K-Means and K-Means++ outperformed Louvain. For all the seven clustering matching measures, the scores of K-Means and K-Means++ were higher than those of Louvain. So, not only did K-Means and K-Means++ form better communities, these communities also coincided well with the ground truth.

For the Dolphin’s dataset, however, Louvain gave better scores than K-Means, even though the number of communities identified by K-Means was the same as in the ground truth while Louvain identified an extra community. All the community matching measures for Louvain gave better results than both K-Means and K-Means++. This shows that even though K-Means and K-Means++ formed the better communities, Louvain formed communities that coincided more with the ground truth. Except for the F1 Score, K-Means++ gave better results than K-Means.

For the Email-Eu-Core dataset, based on the scores of all seven clustering matching metrics, the Louvain algorithm outperformed the K-Means algorithm. The difference between the scores was considerably high. And again, except for the F1 Score, K-Means++ gave better performance scores than K-Means.

Karate Club

The Figures below give an overview of all the performance metric scores for the two algorithms compared to the ground truth. The graphs show that, apart from the modularity metric, K-Means and K-Means++ form better quality clusters than Louvain. K-Means and K-Means++ also performed better than Louvain for all seven performance metrics. The K-Means algorithm essentially gives the same output as the ground truth.

Figure 65 Community Evaluation Metrics for Karate Club

Figure 66 Performance Evaluation Metrics for Karate Club Dataset

Dolphins Dataset

A high level view of the results of the algorithms for the Dolphin’s datasets is presented in the Figures below. It can be seen that K-Means++ not only formed better quality clusters than K-Means, it also performed better for all the performance measures. Louvain found an extra community but other than that, it not only formed better quality communities but also scored closer to the ground truth as compared to K-Means++.

Figure 67 Quality Evaluation Metrics for Dolphins Dataset

Figure 68 Performance Evaluation Metrics for Dolphins Dataset

Email-Eu-Core Dataset

A high-level view of the community goodness score comparisons is shown in the Figures below. As can be seen from the graphs, K-Means++ performed better than K-Means. Louvain performed better than K-Means++, but it did not perform well when compared to the ground truth.

Figure 69 Quality Evaluation Metrics for Email-EU-Core

Figure 70 Performance Metrics for Email-EU-Core

Summary

Based on the performance of the algorithms against the two evaluation criteria, i.e. ground truth communities and community goodness scores, the table below shows the ranking of the algorithms for the three datasets.

Table 43 Performance Summary of Louvain vs. K-Means

| Evaluation Criterion | Karate Club | Dolphins | Email-Eu-Core |
|---|---|---|---|
| Community Goodness | K-Means++, K-Means | K-Means++, K-Means | Louvain |
| Matching with Ground Truth | K-Means, K-Means++ | Louvain | Louvain |

Discussion

This work aimed at performing the task of community detection while exploring the workings, strengths and limitations of two popularly used clustering algorithms, Louvain and K-Means. Many test cases exist for exploring the strengths and weaknesses of an algorithm. The approach chosen for this work was to start the evaluation against a small real-world system, and then observe the algorithm behavior as the testing networks became larger and more complex.

The Louvain algorithm was applied directly on to the network’s graph only once. On the other hand, in the K-Means algorithm, the performance was found to be affected by the initial step of centroid selection. So, both the K-Means and K-Means++ algorithms were set to run ten times and the best of these runs was selected. In the best run, the selected centroids gave the lowest Within-Cluster Sum of Squares (WCSS) error.

The K-Means++ method revises the initial randomly selected centroids. The results of adding a variation in the selection process is visible by the performance difference between the K-Means and K-Means++. The results of K-Means++ show an improvement in the community goodness scores as well as the community matching scores.

For both Zachary’s karate club dataset and Dolphin social network dataset, to find the number of clusters, the algorithm was run with a variation of k from 2 to the size of network. Then, based on the results of silhouette score, modularity and the Calinski and Harabasz score, along with the Elbow Criterion, the value of k that gave the best results was selected for the comparison analysis with Louvain algorithm. While applying K-Means to the dataset, the change in the performance of the two versions of the K-Means algorithms, K-Means and K-Means++, was also determined. This helped understand the behavior of a most widely used variation of the original K-Means where the initial centroids are ensured to be not close to one another.

In the smallest Karate Club dataset, K-Means algorithm performed better than Louvain while K-Means++ performed better than K-Means. For the relatively bigger Dolphin’s dataset, K-Means++ gave better results than K-Means. The two algorithms formed better communities than Louvain. However, compared to the ground truth, Louvain gave better performance than both. And finally in the large complex network dataset, Email-Eu-Core, Louvain outperformed K-Means both in community formation and coincidence with the ground truth.

Based on the performance results, it can be seen that choosing between the Louvain and K-Means algorithms is a decision that can be reached in a systematic manner. For some real-world datasets, using the K-Means algorithm for clustering may not be the correct choice. For instance, if running K-Means multiple times produces a notable variation in the clustering results, i.e. the number of communities formed and the evaluation metrics differ hugely between runs, then K-Means is not the correct choice for clustering the graph. For some large systems, on the other hand, Louvain tends to ignore or mislabel the communities. For instance, in the email-Eu-core dataset, Louvain detected only a little more than half of the ground truth communities.

As observed through the study of literature and experiments, for the small datasets, both Louvain and K-Means give comparable performances in terms of speed. For the large datasets, however, Louvain algorithm is faster than K-Means. K-Means algorithm always converges to a value of k, whether it be 1 or the total number of nodes in the network. A feature of K-Means that is not present in Louvain’s algorithm is its ability to predict the missing information. This feature can be utilized by training the K-Means on the ground data and then perform supervised clustering. This setting can work in network environments where some data is available for training the algorithm. The new nodes can then be clustered into one of the available clusters. This setting can work in networks where the nodes enter and leave a network e.g. in traffic monitoring networks, etc.

One of the limitations of the K-Means algorithm is its input parameter: the number of clusters that a network should have must be known prior to applying the algorithm. For real-world networks, if the number of clusters is not known, the value of k is reached after several trials, e.g. by applying the elbow criterion. In large networks containing millions of nodes, this could take a lot of time. Another issue with the K-Means algorithm is that, as the centroids are selected at random, two separate runs on the same dataset may produce two different clustering results. One run may perform better than the other, depending on the initial position of the centroids.

Based on the reviewed literature and the experimental results, an informed suggestion to mitigate both the issues of K-Means algorithm is to use topological measures such as degree of nodes, to choose the initial centroids. For instance, choosing centroids from within the relatively densely populated interaction areas of the network would result in defining strong positions for the centroid. Using the suggested technique will ensure that the clustering algorithm would extract clusters that conform with the overall network topology.
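The suggestion above can be sketched as follows. This is only an illustration of the idea and was not part of the experiments: the k highest-degree nodes are picked as seeds, and their rows of the n × n matrix serve as initial centroids. The function name `degree_seeded_centroids` and the toy graph are hypothetical.

```python
def degree_seeded_centroids(edges, n, k):
    """Pick the k highest-degree nodes; return (seed nodes, their matrix rows)."""
    rows = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        rows[u][v] = rows[v][u] = 1.0
    degree = [sum(row) for row in rows]
    # Sort by descending degree; ties are broken by the smaller node id.
    seeds = sorted(range(n), key=lambda node: (-degree[node], node))[:k]
    return seeds, [rows[node] for node in seeds]


# Hypothetical toy graph: two hubs (nodes 0 and 4) with three neighbours each,
# connected through the edge (3, 5).
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7), (3, 5)]
seeds, centroids = degree_seeded_centroids(edges, n=8, k=2)
print(seeds)
```

The resulting rows could, for example, be passed as an array to the `init` parameter of scikit-learn's `KMeans` with a single initialization. A caveat of this naive variant is that the highest-degree nodes may lie in the same dense region, so a practical scheme would likely also need to keep the seeds apart.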

The comparison of the two flavors of the K-Means algorithm (i.e. the traditional K-Means and K-Means++) revealed that the speed and performance of the two algorithms are comparable. From the comparison tests against the datasets, it was observed that in almost all the measures, K-Means++ performed better. The remaining unpredictability in behavior is due to the random initialization of the centroids.

Based on the performance evaluation of the Louvain and K-Means algorithms, some observations common to the evaluation process can be drawn. Depending on the availability of the ground truth, the performance of the algorithms is generally measured through three kinds of comparisons:

Firstly, by comparing the number of communities found to the expected number of communities. This method is direct and can easily indicate whether there would be some mislabeled nodes assigned to dissimilar communities or not.

Secondly, by comparing the predicted labels for nodes to the expected labels for nodes. This method compares how well the predicted nodes map with the expected labels. And it also draws out the exact nodes that were mislabeled.

Lastly, the scores for metrics that help assess whether the communities formed are good i.e. the degree of communication of nodes within a community is more than their communication external from the community.

In all these types of comparisons, there are a variety of performance metrics to choose from. But depending on the type of clustering intended, some metrics are unsuitable to employ. For instance, if the communities formed are independent of the labels, then there is a range of metrics available to find such similarity scores. But some label dependent metrics require clustering results be in a certain way to be accepted. For instance, a cluster with the same nodes would give a zero score if the labels of the nodes were not the same as the expected ones.

Secondly, the metrics chosen to evaluate some aspect of a network should conform with the underlying network. For instance, the metrics selected for determining the value of k are not effective for the network scenario presented in the emailEUcore dataset. The score for the Calinski & Harabasz Index decreased swiftly as the number of clusters increased for K-Means. This behavior of the metric could indicate one of two things: either the clustering results of the K-Means algorithm were not good, or the Calinski & Harabasz Score cannot go higher for the dataset. Calinski & Harabasz Score is generally used for assessing communities where the clusters are normally distributed i.e. the clusters are spherical with lighter centers. The underlying network may be interconnected in a manner that does not sit well with the performance evaluator’s characteristics.

Another observation made was that the ground truth, which is generally considered a standard for evaluating algorithms, should itself fall within the definition of a good community. A good algorithm may be categorized as inefficient if it does not agree with the ground truth. Blaming the algorithm alone for misclassifications could be incorrect in cases where the actual culprit behind low performance scores is a misleading ground truth. For instance, in Zachary’s Karate Club dataset, one member has been classified into two separate communities ever since the dataset was published, so the ground truth of that specific node is disputed. As a result, the original Karate Club network data presented by Zachary differs from the common version that is used in community detection (Peel et al. 2017). Similarly, the silhouette coefficient for the emailEUcore dataset showed a negative value for the ground truth labels provided with the dataset (see Section C. a). A negative silhouette coefficient value indicates misclassification of nodes in communities: for instance, a node in community 1 is closer to community 2 than to community 1. This could happen in two cases: either too many clusters were formed for the network, or too few.
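The sign of the silhouette coefficient can be illustrated with a small from-scratch sketch. It uses the standard definition s = (b − a) / max(a, b), where a is the mean distance of a point to the other members of its own cluster and b is the mean distance to the nearest other cluster. The one-dimensional points, the labels and the function name `silhouette` are hypothetical and serve only as an illustration; they are not derived from the datasets of this work.

```python
def silhouette(i, points, labels):
    """Silhouette coefficient of point i for 1-D points and absolute distance."""
    own = [abs(points[i] - p) for j, p in enumerate(points)
           if labels[j] == labels[i] and j != i]
    if not own:
        return 0.0  # common convention for a cluster with a single member
    a = sum(own) / len(own)
    # Mean distance to each other cluster; keep the nearest one.
    b = min(
        sum(abs(points[i] - p) for j, p in enumerate(points) if labels[j] == c)
        / labels.count(c)
        for c in set(labels) if c != labels[i]
    )
    return (b - a) / max(a, b)


# Hypothetical example: the point 9.0 is labelled with cluster 0 although it
# lies next to the members of cluster 1.
points = [0.0, 1.0, 9.0, 10.0, 11.0]
labels = [0, 0, 0, 1, 1]
print(round(silhouette(2, points, labels), 3))  # mislabelled point
print(round(silhouette(0, points, labels), 3))  # well-placed point
```

The mislabelled point has a = 8.5 and b = 1.5, which gives a negative coefficient, while the well-placed point at 0.0 has a = 5 and b = 10.5 and therefore a positive one.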

Based on the above observations, it should be known beforehand that the ground truth provided for a problem does not go against the general structure of a network, i.e. that the communities formed from the ground truth are stable and sound communities. Secondly, identifying communities with a high value of modularity does not necessarily mean that the communities formed represent the actual network. Similarly, abiding by a ground truth does not necessarily mean that the algorithm is completely reliable. In network clustering, one solution does not solve all the problems.

Furthermore, there is a list of findings that are by-products of this comparison. Based on the comparative analysis of two popular algorithms on different sized datasets, our work proves that;

K-Means and Louvain alone do not capture the essence of a network i.e. the structural properties of the underlying network. For instance, for K-Means, the nodes that are best candidates to be selected as the centroids do not get selected. For instance, the nodes that have a high degree are the best candidates. Similarly, in Louvain, the smaller communities get merged into bigger ones if their merger increases the modularity value. The increase in modularity may ignore the fact that only a single link may exist between the smaller community and the bigger one.

Some topological measures should also be taken into account while using these clustering algorithms. The measures of modularity (Louvain) and SSE (K-Means) are not enough.

The efficiency of an algorithm cannot be determined by mere closeness to the ground truth. Sometimes the ground truth does not make sound clusters.

Conclusion

This work is a contribution to the existing literature on community detection algorithm comparisons. Two popular clustering algorithms, Louvain and K-Means, were compared against three real-world datasets of different complexities to assess their performance.

Similar to the comparison results on a very large Astro dataset by Wang and Koopman (2017), both the algorithms converged, providing the number of clusters for all three selected datasets. In the Astro dataset, the two algorithms were almost equivalent in terms of the number of communities found, but K-Means gave higher similarity with the approximated ground truth and thus performed better. Based on the results of the two algorithms against the smaller datasets selected in this work, the performance of the two methods varied. While K-Means performed better on the Karate Club dataset, Louvain performed better on Dolphins. For the larger email network, both detected a low number of communities. Louvain detected about half of the total expected number of communities, while K-Means and K-Means++ found only 2 and 9 respectively instead of 42.

The inconsistent performance of the two algorithms can be attributed to the difference in the properties of the three datasets. The datasets varied in size and in their internode relationship structure. In the mentioned Astro dataset, the size may be big, but there might be fewer data patterns (intersection regions between the clusters) inside. The case could be the opposite for the email-Eu-Core dataset. The performance of K-Means deteriorated with the increase in network size. An observed reason behind the failure of K-Means was the incorrect value obtained during the Elbow Criterion analysis step. The metrics considered for representing a good community misguided the results. However, checking the community goodness scores for the ground truth itself revealed an unstable community structure.

In this work, while performing K-Means, an edge matrix of size n×n was used, where n is the number of nodes. The representation was effective as it correctly clustered the smaller datasets where there were no overlapping patterns within the data. For the larger dataset, email-Eu-Core, the data pattern (e.g. intersection region between clusters) within the network was such that K-Means itself was not a good fit for the specific network. K-Means has been found to be most effective in cases where there are no overlapping patterns within the data.

The data representation through edge-matrix was equally effective in representing the nodes of all three datasets. For the smaller Karate Club and Dolphins dataset, K-Means delivered faster results, despite the nxn computations in each iteration. But for the larger email-Eu-Core dataset, Louvain outperformed K-Means in all the runs. This shows how use of edge matrix for big datasets is not a practical approach.

Based on the comparative analysis of two popular algorithms on different sized datasets, our work proves that;

K-Means and Louvain alone do not capture the essence of a network.

Some topological measures should also be taken into account while using these clustering algorithms. The measures of modularity (Louvain) and SSE (K-Means) are not enough.

The efficiency of an algorithm cannot be determined by mere closeness to the ground truth. Sometimes the ground truth itself does not make sound clusters.

K-Means++ is an improved version of the K-Means algorithm regarding both speed and performance.

The performance of the K-Means algorithm is highly affected by the initial positions of the centroids. In this sense, Louvain algorithm is more robust.

The solutions for making the initial centroid selection and the number of clusters k should be automated and based on the topological features of the network. The existing Elbow and Silhouette criterion may miss finding out the optimal k.

Efficient algorithms are ones that stay close to the network structure as well as the ground truth.

One solution does not fit all the problems. An algorithm needs to be tweaked as per the requirements of the network. For instance, the nodes that have a higher degree could be selected as the initial centroids for K-Means. The network could be swept first to remove single links between connected components before applying Louvain.

Future Work

There are many dimensions in which the work of this research can be pursued further. As part of future work, the difference in performance of K-Means can be studied for the same datasets but with different data representations, and the results can be compared with those obtained using the edge matrix. A comparison of K-Means with other similar spectral clustering algorithms can be conducted. The performance evaluation of the two algorithms, K-Means and Louvain, can be extended by adding more of the datasets popularly used for algorithm performance benchmarking. Furthermore, devising some mechanism of clustering that considers the topological features alongside the quality evaluators can be explored in future. By creating a spatial transformation that also stores some topological features of the dataset, the horizons of designing a fully autonomous system of K-Means clustering can be explored. Such algorithms would pick the value of k based on the topological features, and the initial positions of the centroids would be random but would represent important positions of the network from the topological perspective. As this research is focused on studying the behavior of the two algorithms for non-overlapping networks, the design of the Louvain and K-Means algorithms can also be revised as part of future work to allow detection of both overlapping and non-overlapping communities.

Sources

Arthur, D. & Vassilvitskii, S. (2007). k-means++: the advantages of careful seeding. SODA ’07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Pg. 1027-1035. Retrieved from http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf

Blondel, V. D., Guillaume, J.L., Lambiotte, R. & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics P10008.

Calinski, T. & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics – Theory and Methods. Volume 3. Pg. 1-27.

Chakraborty, T., Dalmia, A., Mukherjee, A. & Ganguly, N. (2017). Metrics for Community Analysis: A Survey. Journal ACM Computing Surveys (CSUR). Volume 50 Issue 4, Article No. 54.

Collette, A. (2015). Comparison of some community detection methods for social network analysis. Thesis, Louvain School of Management. Retrieved from https://dial.uclouvain.be/downloader/downloader.php?pid=thesis%3A3104&datastream=PDF_10

Csardi, Gabor & Nepusz, Tamas. (2005). The Igraph Software Package for Complex Network Research. InterJournal. Complex Systems. 1695.

Destercke, S. (2018). Belief Functions: Theory and Applications. Springer. Pg. 263.

Elkan, C. (2003). Using the triangle inequality to accelerate k-means. In International Conference on Machine Learning. Pg. 147–153.

Fagnan, J. (2012). Community Mining: From Discovery to Evaluation and Visualization. Master’s Thesis. Retrieved from https://era.library.ualberta.ca/items/3aaf2145-e399-4424-af3b-558e2c5a1a86/view/beb9656b-1b3c-494a-984b-e67be682e3cc/Fagnan_Justin_Winter-202012.pdf

Fahim, A. M., et al. (2006). An efficient enhanced k-means clustering algorithm. Journal of Zhejiang University-Science A 7.10. Pg. 1626-1633.

Falih, I. (2018). Attributed Network Clustering: Application to recommender systems. Doctoral Thesis. Retrieved from https://www-lipn.univ-paris13.fr/~bennani/THESES/These_Falih.pdf

Feng, J. (2014). Information-theoretic Graph Mining. Dissertation. Retrieved from https://pdfs.semanticscholar.org/a654/04fb022a217da11c6f575c379cf8de7c0101.pdf

Fortunato, S. & Barthelemy, M. (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences, Volume 104, Issue 1, Pg. 36–41.

Fowlkes, E. B. & Mallows, C. L. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association. Volume 78, Issue 383, Pg. 553-569. Retrieved from http://wildfire.stat.ucla.edu/pdflibrary/fowlkes.pdf

Fränti, P. & Sieranoja, S. (2018). K-means properties on six clustering benchmark datasets. Applied Intelligence, Volume 48, Issue 12, Pg. 4743-4759. Retrieved from http://cs.joensuu.fi/sipu/datasets/

Good, B.H., de Montjoye, Y.-A. & Clauset, A. (2010). Performance of modularity maximization in practical contexts. Physical review. E, Statistical, nonlinear, and soft matter physics. Volume 81, Pg. 046106.

Girvan, M. & Newman M.E. (2002). Community structure in social and biological networks. Proc Natl Acad Sci U S A. Volume 99, Issue 12. Pg. 7821-6.

Han, J., Pei, J. & Kamber, M. (2011). Data Mining: Concepts and Techniques. Elsevier. Pg. 487-488.

Hric, D., Darst, R.K. & Fortunato, S. (2014). Community detection in networks: Structural communities versus ground truth. Phys. Rev. E Stat Nonlin Soft Matter Phys. Volume 90, Issue 6. Retrieved from https://arxiv.org/pdf/1406.0146.pdf

Hu, H. (2015). Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis. Doctoral Thesis. Retrieved from ftp://ftp.math.ucla.edu/pub/camreport/cam15-33.pdf

James, B., Robert, E. & William, F. (1984). The Fuzzy C-Means Clustering Algorithm, Computers & Geosciences, Volume 10, Issue 2-3, Pg. 191-203.

Jebabli, M., Cherifi, H., Cherifi, C. & Hamouda, A. (2015). Overlapping Community Detection Versus Ground-Truth in AMAZON Co-Purchasing Network. Proceedings of the 11th International IEEE SITIS Conference, Complex Networks and their Applications Workshop. Pg. 328 – 336.

Jianjun, C., Mingwei, L., Longjie, L., Hanhai, Z. & Xiaoyun, C. (2014). Active Semi-Supervised Community Detection Based on Must-Link and Cannot-Link Constraints. PloS one. 9. e110088. 10.1371/journal.pone.0110088.

Jiyanthi, S.K. & Priya, C.K. (2018). Clustering Approach for Classification of Research Articles based on Keyword Search. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 7, Issue 1, ISSN:2278–1323

Kannan, R., Vempala, S. & Vetta, A. (2000). On clusterings – good, bad and spectral. In FOCS, Pg. 367–377.

Kaufman, L. & Rousseeuw, P. J. (1990). Finding Groups in Data – An Introduction to Cluster Analysis. A Wiley-Science Publication John Wiley & Sons.

Kawaji, H., Yamaguchi, Y., Matsuda, H., & Hashimoto, A. (2001). A graph-based clustering method for a large set of sequences using a graph partitioning algorithm. Genome informatics. International Conference on Genome Informatics, Volume 12, Pg. 93-102.

Kim, Kyoung-jae & Ahn, Hyunchul. (2008). A Recommender System using GA K-means Clustering in an Online Shopping Market. Expert Systems with Applications. Volume 34, Issue 2, Pg. 1200-1209.

Kim, Y-H, Seo, S., Ha, Y-H., Lim, S. & Yoon, Y. “Two Applications of Clustering Techniques to Twitter: Community Detection and Issue Extraction”. Discrete Dynamics in Nature and Society, vol. 2013, Article ID 903765, 8 pages, 2013. Retrieved from https://www.hindawi.com/journals/ddns/2013/903765

Kowalczyk, R. (2009). Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems: First International Conference, ICCCI 2009, Proceedings. Pg. 198-199.

Krings, G. & Blondel, V.D. (2011). An upper bound on community size in scalable community detection. Computing Research Repository – CORR.

Kulkarni, K. (2017). Community Detection in Social Networks. Master’s Project, San José State University. Retrieved from http://scholarworks.sjsu.edu/etd_projects/528

Landman, N., Pang, H., Williams, C. & Ross, E. (2018). k-Means Clustering. Retrieved from https://brilliant.org/wiki/k-means-clustering

Lee, C. & Cunningham, P. (2013). Benchmarking community detection methods on social media data. Retrieved from https://arxiv.org/abs/1302.0739

Lee, C. & Cunningham, P. (2014). Community detection: effective evaluation on large social networks, Journal of Complex Networks, Volume 2, Issue 1, 1 March 2014, Pg. 19–37, https://doi.org/10.1093/comnet/cnt012

Li, X., Wang, Y.Y., & Acero, A. (2008). Learning query intent from regularized click graphs. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval – SIGIR ’08. doi:10.1145/1390334.1390393

Lloyd, S.P. (1982) Least squares quantization in pcm. Information Theory, IEEE Trans. on, Volume 28, Issue 2, Pg.129–137.

Lund, H.M. (2017). Community Detection in Complex Networks. Masters Thesis. Retrieved from http://bora.uib.no/bitstream/handle/1956/16057/Thesis_Herman_Lund.pdf?sequence=1

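Lusseau, D., Schneider, K., Boisseau, O.J., Haase, P., Slooten, E. & Dawson, S.M. (2003). The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology 54, Pg. 396-405.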

Meila, M. & Shi, J. (2001). A random walks view of spectral segmentation. AI and Statistics (AISTATS).

Newman, M. (2013). Network Data. Retrieved from http://www-personal.umich.edu/~mejn/netdata/

Ng, A.Y., Jordan, M.I. & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Pg. 849–856, MIT Press.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O. …Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. Pages 2825−2830.

Peel, L., Larremore, B.D. & Clauset, A. (2016). The ground truth about metadata and community detection in networks. Science Advances. 3. 10.1126/sciadv.1602548.

Rabbany, R., Chen, J. & Zaiane, O.R. (2010). Top leaders community detection approach in information networks. SNA-KDD Workshop on Social Network Mining and Analysis.

Reichardt, J., & Bornholdt, S. (2006). Statistical mechanics of community detection. Physical review. E, Statistical, nonlinear, and soft matter physics, Volume 74 1 Pt 2, Pg. 016110.

Romano, S., Vinh, N.X., Bailey, J. & Verspoor, K. (2016). Adjusting for Chance Clustering Comparison Measures. Journal of Machine Learning Research 17. Pg. 1-32. Retrieved from http://jmlr.csail.mit.edu/papers/volume17/15-627/15-627

Rosenberg, A. & Hirschberg, J. (2007). V-Measure: A conditional entropy-based external cluster evaluation measure. Retrieved from http://aclweb.org/anthology/D/D07/D07-1043.pdf

Shapiro, L.; Stockman, G. (2002). Computer Vision. Prentice Hall. pp. 69–73.

Shi, J. & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Issue 8, Pg. 888–905.

SNAP. (2018). Stanford Large Network Dataset Collection. Retrieved from https://snap.stanford.edu/data/index.html

SNAP (2018). email-Eu-core network. Retrieved from https://snap.stanford.edu/data/email-Eu-core.html

Sommer, F., Fouss, F., Saerens, M. (2017). Modularity-driven kernel k-means for community detection. 26th International Conference on Artificial Neural Networks, Lecture Notes in Computer Science, Vol. 10614, Pg. 423-433.

Tang, D. (2017). The Adjusted Rand Index. Retrieved from https://davetang.org/muse/2017/09/21/adjusted-rand-index/

Tzanakou, E.M. (2017). Supervised and Unsupervised Pattern Recognition: Feature Extraction and Computational Intelligence. CRC Press. Pg. 203.

Venkatesaramani, R. & Vorobeychik, Y. (2018). Community Detection by Information Flow Simulation. CoRR, abs/1805.04920.

Vilcek, A. (2014). Deep Learning with K-Means Applied to Community Detection in Networks. Retrieved from http://snap.stanford.edu/class/cs224w-2014/projects2014/cs224w-31-final.pdf

Pattanaik, V., Singh, M., Gupta, P., Singh, S. K. (2016). Smart real-time traffic congestion estimation and clustering technique for urban vehicular roads. 3420-3423. 10.1109/TENCON.2016.7848689.

Vincent, L. & Soille, P. (1991). “Watersheds in digital spaces: an efficient algorithm based on immersion simulations”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 13 (6): 583. doi:10.1109/34.87344.

Wagner, S. & Wagner, D. (2007). Comparing Clusterings – An Overview. Technical Report 2006-04. Retrieved from https://i11www.iti.kit.edu/extra/publications/ww-cco-06.pdf

Wang, J., Li, M., Deng, Y. & Pan, Y. (2010). Recent advances in clustering methods for protein interaction networks. BMC Genomics. doi:10.1186/1471-2164-11-S3-S10

Wang, S. & Koopman, R. (2017). Clustering articles based on semantic similarity. Journal Scientometrics, Volume 111 Issue 2, Pg. 1017-1031. Retrieved from https://doi.org/10.1007/s11192-017-2298-x

Weisstein, E.W. (2018). Adjacency Matrix. From MathWorld–A Wolfram Web Resource. Retrieved from http://mathworld.wolfram.com/AdjacencyMatrix.html

Whang, J. & Dhillon, I. (n.d.). Overlapping Community Detection in Massive Social Networks. Retrieved from http://bigdata.ices.utexas.edu/project/graph-clustering/

Xu, R. & Wunsch, D. (2009). Clustering. John Wiley & Sons. Pg. 32

Zachary, W.W. (1977). “An information flow model for conflict and fission in small groups”, Journal of Anthropological Research, 33, Pg. 452-473.

Zhang, J., Zhu, K., Pei, Y., Fletcher, G. & Pechenizkiy, M. (2018). Clustering Affiliation Inference from Graph Samples. [email protected]’18. ACM ISBN 123-4567-24-567/08/06. Retrieved from http://www.mlgworkshop.org/2018/papers/MLG2018_paper_37.pdf

Zhang, W., Wang, X., Zhao, D. & Tang, X. (2012). Graph degree linkage: Agglomerative clustering on a directed graph. 12th European Conference on Computer Vision. https://arxiv.org/abs/1208.5092

Appendix

Python Functions

Table 36 shows the list of Python functions that were used for implementing the two clustering algorithms and the performance metrics.

Table 44. Python Libraries Used

| Objective | Python Implementation Used |
| --- | --- |
| Louvain Algorithm | igraph.Graph.community_multilevel() |
| K-Means Algorithm | sklearn.cluster.KMeans(init='random') |
| K-Means++ Algorithm | sklearn.cluster.KMeans(init='k-means++') |
| Modularity Score | igraph.Graph.modularity() |
| Silhouette Score | sklearn.metrics.silhouette_score() |
| Calinski and Harabasz Score | sklearn.metrics.calinski_harabasz_score() |
| NMI Score | sklearn.metrics.normalized_mutual_info_score() |
| AMI Score | sklearn.metrics.adjusted_mutual_info_score() |
| ARI Score | sklearn.metrics.adjusted_rand_score() |
| F1 Score | sklearn.metrics.f1_score() |
| Homogeneity | sklearn.metrics.homogeneity_score() |
| FMI Score | sklearn.metrics.fowlkes_mallows_score() |
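To make concrete what the modularity score in the list above measures, the following is a minimal pure-Python sketch for an unweighted, undirected graph. It is not the igraph implementation; the function name `modularity`, the edge-list input format, and the toy graph are assumptions made for illustration. It uses the standard per-community form of modularity, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c is the number of edges inside community c, and d_c is the sum of the degrees of the vertices in c.

```python
from typing import Dict, List, Tuple


def modularity(edges: List[Tuple[int, int]], membership: List[int]) -> float:
    """Modularity of a partition of an unweighted, undirected graph.

    edges      -- list of (u, v) vertex-index pairs, one per undirected edge
    membership -- membership[v] is the community label of vertex v
    """
    m = len(edges)
    intra: Dict[int, int] = {}       # L_c: edges with both ends in community c
    degree_sum: Dict[int, int] = {}  # d_c: total degree of the vertices in c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy graph (assumption): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

q_split = modularity(edges, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(edges, [0, 0, 0, 0, 0, 0])  # everything in one community
print(round(q_split, 4), q_single)  # 0.3571 0.0
```

Splitting the toy graph into its two triangles gives Q = 5/14, while placing every vertex in one community gives Q = 0, which is why a higher modularity is read as a better-separated partition. For unweighted, undirected graphs the library call listed in the table is expected to agree with this formula.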

Network Graph

The layout used to display the network graph is computed by applying the Kamada-Kawai force-directed algorithm to the graph, and the graph is then plotted with this layout. Consider the code snippet below.

import igraph

graph_network = igraph.Graph.Read_GML("karate.gml")

layout = graph_network.layout("kk")

igraph.plot(graph_network, "Ground Clustering.png", layout=layout)

Here, “kk” is the igraph layout name for the Kamada-Kawai force-directed algorithm.

Undirected Graph

The email-Eu-Core dataset was converted to an undirected graph using the as_undirected() method of the Python igraph library. See the following example code:

graph_network = graph_network.as_undirected()
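The effect of this conversion on an edge list can be sketched in plain Python. This is an illustration rather than the igraph implementation: the function name `collapse_to_undirected` and the sample edges are hypothetical, and the sketch assumes the "collapse" behaviour that the igraph documentation describes as the default, in which any number of directed edges between the same pair of vertices becomes a single undirected edge.

```python
from typing import List, Set, Tuple


def collapse_to_undirected(directed_edges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Collapse directed edges into unique undirected edges.

    Each undirected edge is stored as (smaller_index, larger_index), so the
    pairs (u, v) and (v, u) map to the same entry. Self-loops are kept as is
    in this sketch.
    """
    undirected: Set[Tuple[int, int]] = set()
    for u, v in directed_edges:
        undirected.add((min(u, v), max(u, v)))
    return sorted(undirected)


# Hypothetical e-mail exchanges: 0 and 1 wrote to each other, as did 1 and 2,
# while 0 wrote to 2 without receiving a reply.
directed = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2)]
undirected = collapse_to_undirected(directed)
print(undirected)  # [(0, 1), (0, 2), (1, 2)]
```

After the conversion, a one-way exchange and a two-way exchange are indistinguishable, which is the information given up when a directed e-mail network is treated as undirected.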
