Agglomerative vs divisive

Agglomerative clustering starts with individual data points as separate clusters and merges them iteratively, while divisive clustering starts with all data points in a single cluster and splits them recursively.

Agglomerative clustering is the more commonly used of the two approaches.

Approach

  1. Initialisation: Begin by treating each data point as a separate cluster.
  2. Compute Pairwise Distances: Calculate the distance between each pair of clusters (initially, each cluster is a single data point). Common distance metrics include Euclidean distance and Manhattan distance.
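  3. Merge Closest Clusters: Find the two clusters with the smallest distance between them and merge them into a single cluster.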
  4. Update Distance Matrix: Update the distance matrix to reflect the distances between the newly formed cluster and the remaining clusters, using the chosen linkage criterion.
  5. Repeat Until Termination: Repeat steps 3 and 4 until only a single cluster containing all data points remains.
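  6. Construct Dendrogram: Record the sequence of merges as a tree (the dendrogram), in which the height of each join reflects the distance at which the two clusters were merged.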
  7. Select Number of Clusters: Determine the number of clusters by cutting the dendrogram at an appropriate level. The choice of the number of clusters depends on the specific problem and the desired granularity of the clustering.
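
The steps above can be sketched in plain Python. This is a minimal illustration and not a reference implementation: the function names (agglomerate, cut, single_linkage) are made up for this example, single linkage is assumed as the linkage criterion, and for simplicity the cluster distances are recomputed on every pass instead of being kept in an updated distance matrix as step 4 describes.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def single_linkage(a, b, points, metric):
    # Distance between two clusters = smallest distance between their members.
    return min(metric(points[i], points[j]) for i in a for j in b)

def agglomerate(points, metric=euclidean, linkage=single_linkage):
    # Step 1: every point starts as its own cluster (a tuple of point indices).
    clusters = [(i,) for i in range(len(points))]
    merges = []  # the merge history plays the role of the dendrogram (step 6)
    while len(clusters) > 1:  # step 5: stop when one cluster remains
        # Steps 2-3: find the closest pair of clusters (naive recomputation).
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(pair[0], pair[1], points, metric),
        )
        height = linkage(a, b, points, metric)
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((a, b, height))
    return merges

def cut(points, merges, k):
    # Step 7: replay the merges until only k clusters are left.
    clusters = {(i,) for i in range(len(points))}
    for a, b, _ in merges:
        if len(clusters) <= k:
            break
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = agglomerate(points)
print(cut(points, merges, 3))
```

Passing metric=manhattan to agglomerate swaps the distance metric without changing anything else. The naive search makes each pass expensive, so this sketch is only suitable for small inputs.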

Linkage criteria

A linkage criterion defines how the distance between two clusters is measured during the clustering process.

  1. Single Linkage (Minimum Linkage):