Analysing the effect of different Distance Measures in K-means Clustering Algorithm
DOI: https://doi.org/10.69974/glskalp.01.03.60
Keywords: Data Clustering, Unsupervised Learning, Data Mining, K-means Clustering, Distance Measures
Abstract
Distance metrics are the primary means of measuring the distance between two objects. They serve as the principal basis for deciding how similar or dissimilar the data points to be clustered are. Different clustering algorithms apply different distance measures to group objects into clusters, and the choice of a particular distance metric can strongly affect the performance of a clustering algorithm, and hence its outcome. In this paper, we analyze the impact of various distance measures on the performance of the K-means algorithm. We first describe the distance measures commonly used with K-means. We then apply the K-means clustering algorithm with each of these distance measures to several synthetic and real standard clustering data sets. To measure the impact of each distance measure on the performance of K-means, we use various performance evaluation metrics.
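The abstract describes running K-means once per distance measure and scoring each result. The following is a minimal sketch of how such a comparison could be set up, assuming pure Python with no external libraries. It is not the authors' implementation. The four distance functions (Euclidean, Manhattan, Chebyshev, Canberra) are common choices in this literature, but the abstract does not list which measures were studied, so this set is an assumption. Likewise, `purity` is only one possible external evaluation metric. The helper names `kmeans` and `purity` and the toy data set are hypothetical.

```python
# Illustrative sketch only: the distance set, helper names and toy data are assumptions.
import math
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Point, b: Point) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def canberra(a: Point, b: Point) -> float:
    # Coordinates where both values are 0 contribute 0 (avoids 0/0).
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def kmeans(points: List[Point], k: int, dist: Distance,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid under `dist`.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Update step: arithmetic mean of each cluster, as in standard K-means.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def purity(labels: List[int], truth: List[int]) -> float:
    # Fraction of points that belong to the majority true class of their cluster.
    total = 0
    for j in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == j]
        total += Counter(classes).most_common(1)[0][1]
    return total / len(truth)


if __name__ == "__main__":
    # Two small, well-separated synthetic blobs with known class labels.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    for name, d in [("Euclidean", euclidean), ("Manhattan", manhattan),
                    ("Chebyshev", chebyshev), ("Canberra", canberra)]:
        labels, _ = kmeans(data, 2, d)
        print(f"{name:10s} purity = {purity(labels, truth):.2f}")
```

In this sketch only the assignment step uses the chosen distance. The update step keeps the arithmetic mean of standard K-means. The mean minimises squared Euclidean distance, so with Manhattan distance a per-coordinate median (K-medians) would be the exact minimiser. The abstract does not state which update rule the paper uses.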