K-Means Clustering Algorithm and K-Medoids Clustering Algorithm

K-Means Clustering Algorithm 

K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science. In this topic, we will learn what the K-Means clustering algorithm is and how it works.

What is K-Means Algorithm?

K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K is the number of clusters to be created, chosen in advance: if K=2, there will be two clusters, if K=3, there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group, made up of points with similar properties.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of squared distances between the data points and the centroids of the clusters they are assigned to.

The algorithm takes the unlabeled dataset as input, divides it into K clusters, and repeats the process until the cluster assignments stop changing. The value of K must be chosen before the algorithm runs.

The k-means clustering algorithm mainly performs two tasks:

-Determines the best positions for the K center points, or centroids, by an iterative process.

-Assigns each data point to its closest centroid. The data points nearest to a particular centroid form a cluster.

Hence each cluster contains data points with some commonalities and is separated from the other clusters.
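The assignment task described above can be sketched in a few lines of Python. This is only an illustration: the function name assign_to_nearest and the toy points are made up for this example, and Euclidean distance is assumed.

```python
import math

def assign_to_nearest(points, centers):
    """Return, for each point, the index of its closest center (Euclidean distance)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

# Toy data (made up for illustration): two obvious groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.5), (8.5, 9.0)]
labels = assign_to_nearest(points, centers)
print(labels)  # [0, 0, 1, 1]
```

Points that share a label form one cluster.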

The below diagram explains the working of the K-means Clustering Algorithm:

K-Medoids Clustering

K-Medoids and K-Means are two types of clustering mechanisms in partition clustering. Clustering is the process of breaking down a group of data points or objects into classes of similar objects, such that all the objects in one cluster have similar traits. In partition clustering, a group of n objects is broken down into k clusters based on their similarities.

Two statisticians, Leonard Kaufman and Peter J. Rousseeuw, came up with this method. This tutorial explains what K-Medoids does and how it differs from K-Means.

K-Medoids is an unsupervised method that clusters unlabelled data. It is a variant of the K-Means algorithm designed mainly to reduce sensitivity to outliers. Like other partitioning algorithms, it is simple and easy to implement, although it costs more computation than K-Means.

K-Medoids:

Medoid: A Medoid is a point in the cluster from which the sum of distances to other data points is minimal.

(or)

A Medoid is the point in a cluster whose total dissimilarity to all the other points in that cluster is minimal.

Instead of centroids as reference points in K-Means algorithms, the K-Medoids algorithm takes a Medoid as a reference point.
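To make the difference concrete, here is a small sketch that computes both reference points for the same cluster. The helper names centroid and medoid and the sample data are assumptions for illustration, and Euclidean distance stands in for the dissimilarity measure.

```python
import math

def centroid(points):
    """Coordinate-wise mean; usually not one of the input points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points):
    """The input point with the smallest total distance to all the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Four points close together plus one outlier (made-up data).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (20.0, 20.0)]
print(centroid(cluster))  # (5.2, 5.2)
print(medoid(cluster))    # (2.0, 2.0)
```

The outlier drags the centroid away from the four nearby points, while the medoid remains one of them.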

There are three main algorithms for K-Medoids Clustering:

1. PAM (Partitioning Around Medoids)

2. CLARA (Clustering Large Applications)

3. CLARANS (Clustering Large Applications based on Randomized Search)

Explanation:

Clustering is one of the most important techniques in data mining and machine learning. It is an unsupervised learning method used to group similar data points into clusters based on their features. Among the various clustering algorithms, K-Means and K-Medoids are two of the most widely used partitioning techniques that help in identifying natural groupings within data.

K-Means Clustering Algorithm

The K-Means algorithm aims to divide a dataset into K distinct clusters, where each cluster is represented by the mean (centroid) of its data points. The process begins by randomly selecting K initial centroids. Each data point is then assigned to the nearest centroid based on the Euclidean distance. After assignment, the centroids are recalculated as the mean of all points in each cluster. This process repeats iteratively until the centroids no longer change significantly or a stopping condition is met.
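The steps in this paragraph can be written as a short pure-Python sketch. It is a minimal illustration rather than a production implementation: the function name k_means, the seed and max_iter parameters, and the rule of keeping the old centroid when a cluster becomes empty are all assumptions made for this example.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Empty cluster: keep the old centroid (one simple convention).
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Toy data (made up): two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

The sketch stops when the centroids are exactly unchanged. A practical implementation would more likely stop when they move by less than a small tolerance.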

K-Means is highly efficient and works well for large datasets with continuous numerical values. It minimizes the within-cluster sum of squares (WCSS), ensuring that data points in the same cluster are as similar as possible. However, K-Means has some limitations: it is sensitive to outliers, requires the number of clusters (K) to be predefined, and may converge to local minima depending on initial centroid placement.
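As a rough illustration of the WCSS objective, the sketch below scores two different groupings of the same four points. The function name wcss and the data are made up for this example.

```python
def wcss(points, labels, centroids):
    """Within-cluster sum of squares: squared distance of each point to its own centroid."""
    total = 0.0
    for p, label in zip(points, labels):
        c = centroids[label]
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return total

# Four points on a line (made-up data).
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

# Grouping neighbours together gives tight clusters.
good = wcss(points, [0, 0, 1, 1], [(1.0, 0.0), (11.0, 0.0)])
# Mixing near and far points gives loose clusters.
bad = wcss(points, [0, 1, 0, 1], [(5.0, 0.0), (7.0, 0.0)])
print(good, bad)  # 4.0 100.0
```

A lower WCSS means the points sit closer to their own centroids, which is the quantity K-Means tries to reduce at each iteration.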

K-Medoids Clustering Algorithm

The K-Medoids algorithm is a more robust alternative to K-Means. Instead of using the mean, K-Medoids represents each cluster by one of its actual data points called a medoid — the most centrally located object within the cluster. The most common implementation of this method is PAM (Partitioning Around Medoids).

The algorithm assigns each data point to the nearest medoid based on a chosen distance metric (often Euclidean or Manhattan). It then iteratively replaces medoids with non-medoid points if doing so reduces the total cost (sum of dissimilarities between points and their assigned medoids). Because it uses actual data points as centers, K-Medoids is less sensitive to noise and outliers than K-Means, though it is computationally more expensive.
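The swap idea can be sketched as follows. This is a simplified, first-improvement variant written in the spirit of PAM, not the full published procedure: it starts from random medoids instead of PAM's BUILD phase and accepts any swap that lowers the cost. The names k_medoids and total_cost, the seed parameter, and the data are assumptions for illustration.

```python
import math
import random

def total_cost(points, medoids, dist):
    """Sum of each point's dissimilarity to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def k_medoids(points, k, dist=math.dist, seed=0):
    """Swap-based K-Medoids: try medoid/non-medoid swaps until none helps."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)  # random initial medoids (an assumption)
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Replace medoid i with the candidate and re-evaluate the cost.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:  # keep the swap only if it lowers the cost
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist(p, medoids[j])) for p in points]
    return medoids, labels, best

# Two groups of three points plus one outlier at (25, 25) (made-up data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0), (25.0, 25.0)]
medoids, labels, cost = k_medoids(points, k=2)
print(medoids, labels, cost)
```

Because dist is a parameter, any dissimilarity function can be passed in, for example a Manhattan distance. Every swap trial rescans all points, which is why this approach costs more than K-Means.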

Conclusion

Both algorithms are powerful tools for data segmentation. K-Means is faster and better suited to large datasets, while K-Medoids is more robust on noisy data and, because it needs only a dissimilarity measure between points, can also be applied to non-numeric data. Understanding both helps data scientists choose the right approach for different types of data mining applications.

