Accelerator for Machine Learning System

An advanced, scalable hardware accelerator for K-Means, targeting unsupervised-learning and clustering applications.

Clustering is an unsupervised machine learning technique that divides data into groups such that the elements within each group are similar. Each group is called a cluster.

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. However, finding the optimal solution to the k-means clustering problem for observations in d-dimensional Euclidean space is NP-hard in general, even for just 2 clusters. The problem is to find the partition S = {S_1, ..., S_k} of the observations that minimizes the within-cluster sum of squared distances:

$$ \underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2 $$

where there are k clusters, S_i is cluster i, x_j is a d-dimensional point in cluster S_i, and μ_i is the centroid (the average of the points) of cluster S_i.
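Because the exact problem is NP-hard, software implementations usually rely on an iterative heuristic. The sketch below uses the standard Lloyd's iteration (assign each point to its nearest centroid, then recompute each centroid as the mean of its points) and evaluates the objective defined above. It is only an illustrative software model: the function names (`sq_dist`, `assign`, `update`, `objective`, `lloyd`) are hypothetical, and nothing here is claimed to match the algorithm or data path of the accelerator described in [3].

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance ||a - b||^2 between two d-dimensional points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[List[float]]:
    """Update step: each centroid becomes the mean of its assigned points."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(old))
            continue
        dims = len(old)
        new_centroids.append(
            [sum(p[t] for p in members) / len(members) for t in range(dims)]
        )
    return new_centroids


def objective(points: Sequence[Point], labels: Sequence[int],
              centroids: Sequence[Point]) -> float:
    """Within-cluster sum of squares: sum over i, x_j in S_i of ||x_j - mu_i||^2."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def lloyd(points: Sequence[Point], init_centroids: Sequence[Point],
          max_iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Alternate update and assignment until the labels stop changing."""
    centroids = [list(c) for c in init_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iters):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    labels, cents = lloyd(pts, [[0.0, 0.0], [10.0, 10.0]])
    print(labels, cents, objective(pts, labels, cents))
```

Lloyd's iteration only converges to a local minimum of the objective, so the result depends on the initial centroids. The distance computation in `assign` dominates the run time, which is the kind of regular, data-parallel work a hardware pipeline is well suited to.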

There is a strong need for a hardware acceleration IP that offloads the CPU/MCU and enables more power-efficient computation.

This project proposes building a dedicated accelerator that efficiently performs RAM-to-RAM calculations in hardware in a pipelined fashion, thereby dramatically reducing CPU load for machine-learning software applications. The accelerator is defined in paper [3] and provides an APB interface [2] for external CPU access (configuration, results, etc.).

References

[1] Mini-batch gradient descent – https://en.wikipedia.org/wiki/Stochastic_gradient_descent
[2] AMBA APB – http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0024c/index.html
[3] https://ieeexplore.ieee.org/abstract/document/5963944

Previous knowledge required

044262 – Logic Design

Design goals and challenges

Learning the basics of Verilog, a hardware description language used for RTL coding (commonly used in the industry).

Learning the basics of communication protocols, specifically AMBA APB.
Learning machine-learning standards that are commonly used in the industry.

Practicing RTL design from an architecture specification, and bringing up an advanced accelerator as an IP.