Hardware Acceleration of DBSCAN Clustering

Project description:

Clustering is the task of grouping data points into clusters, where the grouping is commonly based on the distance between points. Clustering has many applications, including data mining, statistical data analysis, and pattern recognition. Two common clustering algorithms are K-Means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

With the growing need to cluster large datasets as quickly as possible, running these algorithms on general-purpose processors is proving inadequate, and specialized hardware is expected to significantly improve run time.

The DBSCAN algorithm performs clustering based on the spatial density of the data points. It can produce results superior to K-Means on a wide range of data sets, for example those containing noise or clusters that are not convex. For DBSCAN to form a cluster, at least a user-defined number of data points must lie within a user-defined radius of some point. Such a point seeds the cluster, which then grows through neighboring points that satisfy the same density criterion. Points that end up in no cluster are treated as noise. Additional details are available in the following paper:

https://dl.acm.org/doi/10.1145/2724722

The algorithm was first presented by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu in:

https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf
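
As a point of reference for the hardware design, the density criterion described above can be sketched in software. The following is a minimal, unoptimized Python model, not the project's implementation. The names eps (the radius), min_pts (the minimum number of points), region_query, and dbscan are illustrative assumptions, and the neighbor search is a brute-force scan over all points.

```python
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not dense enough to seed a cluster; may become a border point later.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a dense point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # Point j is itself dense, so the cluster keeps growing through it.
                queue.extend(j_neighbors)
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

A model of this kind could serve as a golden reference when checking simulation results of the accelerator. The brute-force region_query costs one distance computation per pair of points, which is the part of the algorithm most likely to dominate run time.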

The goal of this project is to implement dedicated hardware that runs the DBSCAN algorithm in order to drastically improve run time. The algorithm will be adapted to allow hardware implementation, and an efficient pipelined architecture will be designed. The design will be simulated and synthesized, and its layout will be implemented.
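
To illustrate what adapting the algorithm for hardware might involve, here is one possible adaptation, shown as a Python sketch. It is an assumption for illustration and not a decision stated in this project description: the neighbor test compares the squared distance against a precomputed squared radius, so only integer subtractions, multiplications, additions, and one comparison are needed, and no square root. The helper names to_fixed and within_eps_fixed_point, and the choice of 8 fractional bits, are hypothetical.

```python
def to_fixed(x, frac_bits=8):
    """Convert a real value to a fixed-point integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def within_eps_fixed_point(p, q, eps_sq):
    """Integer-only neighbor test: squared distance against a precomputed eps squared.

    p and q are tuples of fixed-point coordinates; eps_sq is the square of the
    fixed-point radius, so it carries twice the number of fractional bits.
    """
    acc = 0
    for a, b in zip(p, q):
        d = a - b      # one subtractor per dimension
        acc += d * d   # one multiplier and an accumulator
    return acc <= eps_sq  # a single comparator, no square root


if __name__ == "__main__":
    eps = to_fixed(1.5)
    eps_sq = eps * eps  # computed once, outside the per-pair datapath
    origin = (to_fixed(0.0), to_fixed(0.0))
    print(within_eps_fixed_point(origin, (to_fixed(1.0), to_fixed(1.0)), eps_sq))
    print(within_eps_fixed_point(origin, (to_fixed(2.0), to_fixed(0.0)), eps_sq))
```

Each step of the loop body maps naturally onto a pipeline stage, so a test of this form could accept one pair of points per clock cycle once the pipeline is full. The appropriate word width and number of fractional bits would depend on the data sets targeted.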

Design goals and challenges

Learn the basics of RTL coding in Verilog, a hardware description language commonly used in industry.
Learn the DBSCAN algorithm.
Design and implement an accelerator IP for the DBSCAN algorithm.
Perform the complete VLSI design flow for the accelerator, from architectural design to backend implementation, using industry-standard Cadence and Synopsys tools.