Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

  1. Density-Based Approach: It groups together points that are closely packed, identifying regions of high density separated by regions of low density

    It does not assume that clusters have a specific shape or size, so it can identify clusters of arbitrary shape and size

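  2. No Predefined Number of Clusters: Unlike k-means, it does not take the number of clusters as input; the number of clusters follows from the data and the two parameters below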

  3. Robust to Noise and Outliers: It can identify and disregard points that do not belong to any cluster, classifying them as noise

  4. Cluster Representation: DBSCAN represents clusters as dense regions separated by areas of low density. It classifies each point as a core point, a border point, or a noise point based on the density of its neighborhood. Core and border points are each assigned to exactly one cluster; noise points are assigned to none.

  5. Parameter Selection: DBSCAN requires two main parameters:

    1. epsilon (ϵ): the radius of the neighborhood searched around each point
    2. MinPts: the minimum number of points (counting the point itself) that must lie within the ϵ-neighborhood of a point for that neighborhood to count as a dense region
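
The role of the two parameters can be illustrated with a minimal sketch. It assumes Euclidean distance and a brute-force scan over all points; the names region_query and is_dense are hypothetical helpers chosen for this illustration, not part of any library.

```python
from math import dist  # Euclidean distance, Python 3.8+

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def is_dense(points, i, eps, min_pts):
    """True if the eps-neighborhood of points[i] holds at least min_pts points."""
    return len(region_query(points, i, eps)) >= min_pts

# Toy data (made up for illustration): three nearby points and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
neighbors_of_first = region_query(points, 0, eps=1.0)
first_is_dense = is_dense(points, 0, eps=1.0, min_pts=3)
last_is_dense = is_dense(points, 3, eps=1.0, min_pts=3)
```

A larger ϵ or a smaller MinPts makes more neighborhoods count as dense, which tends to merge clusters and leave fewer points as noise.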

Approach

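  1. Core Points: points with at least MinPts points (counting themselves) within distance ϵ
  2. Border Points: points that are not core points but lie within distance ϵ of at least one core point
  3. Noise Points: points that are neither core points nor border points
  4. Density-Reachability: a point q is density-reachable from a core point p if there is a chain of points from p to q in which each point lies within distance ϵ of the previous one and every point except possibly q is a core point
  5. Density-Connectivity: two points are density-connected if both are density-reachable from some common core point; a cluster is a maximal set of density-connected points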
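
The three point types can be sketched as follows, using the usual convention that a core point has at least MinPts points (itself included) within ϵ, a border point is a non-core point within ϵ of some core point, and everything else is noise. This is an illustrative sketch assuming Euclidean distance; classify_points is a hypothetical name.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    # Brute-force eps-neighborhoods; each point is its own neighbor.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}
    labels = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Toy data (made up for illustration).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
```

With these inputs the points at (0, 0) and (1, 0) come out as core points, (0, 1) and (2, 0) as border points, and (10, 10) as noise.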

Steps

  1. The algorithm picks an arbitrary unvisited data point and determines whether it is a core point by counting the points within the specified radius (ϵ). If it is not a core point, it is provisionally labeled as noise and the next unvisited point is picked
  2. If a core point is found, DBSCAN recursively expands the cluster by adding density-reachable points to it
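  3. Border points that are within the neighborhood of a core point are added to the cluster, but their own neighborhoods are not expanded; only core points extend the cluster further. A point provisionally labeled as noise becomes a border point if it is later found in the neighborhood of a core point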
  4. Data points that are not density-reachable from any core point are labeled as noise points and do not belong to any cluster
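
The steps above can be sketched as a short brute-force implementation. It is an illustration under stated assumptions, not a reference implementation: it assumes Euclidean distance, visits points in index order rather than at random, uses an explicit queue instead of recursion, and labels noise as -1. The names dbscan and NOISE are chosen for this sketch.

```python
from collections import deque
from math import dist

NOISE = -1  # label for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    def region_query(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional; may become a border point later
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, do not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points expand the cluster
    return labels

# Toy data (made up for illustration): two squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Each neighborhood query here scans all points, so the sketch takes quadratic time; practical implementations typically use a spatial index for the neighborhood search.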