HW Implementation of Minibatch K-means – A Clustering Algorithm for Unsupervised Learning

Clustering for unsupervised learning is a common task in machine learning systems. Several algorithms can be used for this task, for example K-means. The main problem with the K-means algorithm is the very large number of computations it requires. Minibatch K-means proposes an effective technique that drastically reduces the number of computations with an insignificant impact on the quality of the results. The goal of this project is to design and implement a hardware Minibatch K-means accelerator, from architectural design to backend implementation.

Clustering is a data analysis technique used to get an intuition about the structure of the data. It involves identifying subgroups in the data such that data points in the same subgroup (cluster) are similar to one another, while data points in different clusters are dissimilar.

Clustering is considered an unsupervised learning method as the output of the algorithm is not compared with any expected outcome. The aim is to understand the structure of the data by grouping it into subgroups according to a specified criterion.
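As a small illustration of grouping by a specified criterion, the sketch below assigns each point to its nearest centroid. It assumes squared Euclidean distance as the criterion; the function names and the sample data are hypothetical and chosen only for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_nearest(points, centroids):
    """Return, for each point, the index of the closest centroid."""
    labels = []
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical 2-D data: two points near (1, 1) and two near (8, 8).
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.9, 8.3)]
centroids = [(1.0, 1.0), (8.0, 8.0)]
labels = assign_to_nearest(points, centroids)
print(labels)
```

No expected labels are supplied anywhere: the grouping follows only from the distances between the points and the centroids.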

Clustering is a compute-intensive task. The goal of the project is to implement dedicated hardware that performs these computations, offloading the CPU/MCU and allowing a more power-efficient approach.

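Several algorithms can be used for this task, for example K-means. The main problem with the K-means algorithm is the very large number of computations it requires: every iteration computes the distance from every data point to every centroid. Minibatch K-means drastically reduces the number of computations by updating the centroids from a small random subset (mini-batch) of the data in each iteration, with an insignificant impact on the quality of the results.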
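The following is a minimal software sketch of the mini-batch update as it is commonly described: each iteration draws a random mini-batch, assigns its points to the nearest centroid, and moves each centroid toward its points with a per-centroid learning rate of 1/count. It is not the accelerator's specification; the function names, parameters, and sample data are assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def minibatch_kmeans(points, initial_centroids, batch_size, iterations, seed=0):
    """Mini-batch K-means sketch; initial centroids are supplied by the caller."""
    rng = random.Random(seed)
    centroids = [list(c) for c in initial_centroids]
    counts = [0] * len(centroids)  # points seen so far by each centroid
    for _ in range(iterations):
        # Only batch_size points are examined per iteration, not the full data set.
        batch = rng.sample(points, batch_size)
        # Assign the whole batch against the centroids as they are at this moment.
        assignments = [nearest(p, centroids) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centroid learning rate
            centroids[j] = [(1.0 - eta) * c + eta * x
                            for c, x in zip(centroids[j], p)]
    return centroids


# Hypothetical data: four points near (1, 1) and four near (8, 8).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
          (8.0, 8.0), (7.9, 8.3), (8.2, 7.8), (8.1, 8.1)]
centroids = minibatch_kmeans(points, [points[0], points[4]],
                             batch_size=4, iterations=20, seed=0)
print(centroids)
```

Each iteration costs batch_size × k distance computations, where k is the number of centroids, instead of the N × k that one full K-means pass over N points requires. This is where the saving comes from.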

The algorithm will be adapted for hardware implementation, and an efficient pipelined architecture will be designed. The design will then be simulated and synthesized, and its layout will be implemented.
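The project description does not say which adaptations are intended. As one purely illustrative possibility, the sketch below models a hardware-friendly variant in software: coordinates are stored as fixed-point integers, and the 1/count learning rate is replaced by a power-of-two step, so that the centroid update needs only a subtraction, a shift, and an addition. The 8 fractional bits, the shift value, and all names are assumptions, not part of the project.

```python
FRAC_BITS = 8  # assumed fixed-point format: 8 fractional bits


def to_fixed(value):
    """Convert a real number to a fixed-point integer."""
    return int(round(value * (1 << FRAC_BITS)))


def squared_distance_fixed(a, b):
    """Integer-only squared distance (multiply-accumulate per dimension)."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))


def update_centroid_fixed(centroid, point, shift):
    """Move the centroid toward the point by a factor of 2**-shift.

    This replaces the usual 1/count learning rate with a fixed power of
    two, so no divider is needed; Python's >> on negative integers is an
    arithmetic shift, as in typical signed hardware datapaths.
    """
    return [c + ((x - c) >> shift) for c, x in zip(centroid, point)]


centroid = [to_fixed(1.0), to_fixed(1.0)]
point = [to_fixed(2.0), to_fixed(0.0)]
updated = update_centroid_fixed(centroid, point, shift=2)
print(updated, squared_distance_fixed(centroid, point))
```

A fixed power-of-two step trades some accuracy for a much simpler datapath. Whether that trade-off is acceptable would have to be evaluated as part of the architectural design.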

Design goals and challenges

  • Learn the basics of Verilog, an RTL coding language commonly used in the industry.
  • Learn the Minibatch K-means algorithm.
  • Design and implement an accelerator IP for the Minibatch K-means algorithm.
  • Perform the complete VLSI design flow, from architectural design to backend implementation of the accelerator, using sophisticated Cadence and Synopsys tools.

Prerequisite: Digital Systems and Computer Structure – 044252