Implementation of a DNA Sequencing Accelerator

DNA sequencing involves the determination of the order of DNA bases. It allows scientists to sequence genes and genomes. Once genes are identified and analysed from sequence information, scientists can look for mutations that cause disease, thereby providing valuable medical information.

In nanopore sequencing, a strand of DNA is passed through a nanopore, which causes drops in the electric current flowing through the pore. The size of each change in current depends on the type of base passing through the pore. This current signal is then sampled.

The MinION device (https://nanoporetech.com/products/minion) produces a file of raw digitized samples, typically at a rate of ~20 samples per base (i.e. ~100 samples typically yield ~5 bases).

Basecalling, the initial step of the genome assembly pipeline, translates the raw signal into bases (A, C, G, T) to generate DNA reads. Deletions are the dominant error type in some DNA sequencing techniques. Basecalling is arguably the most important step of the genome assembly pipeline, since it plays a critical role in decreasing the error rate. It is also a very challenging and extremely time-consuming task, taking hours on high-end computers.

Since the signal is very noisy and there are only 4 output levels (a 2-bit signal), simple sampling would produce an unacceptable error rate, so more sophisticated techniques are required. This is a very new and developing field in which solutions keep changing. The latest basecalling solutions that have shown good results are based on neural networks: either an RNN or a combined CNN and RNN. The basecaller accelerator is therefore, in effect, a combined CNN and RNN accelerator.
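
To illustrate why simple sampling fails on a noisy signal, here is a minimal toy sketch. It assumes a deliberately simplified model that is not part of the project: each base maps to one hypothetical current level (the LEVELS table), one sample is taken per base, and the noise is Gaussian. Real nanopore signals are considerably more complex, so this only shows the trend, not realistic numbers.

```python
import random

# Hypothetical, evenly spaced current levels for each base (assumption for illustration).
LEVELS = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}

def simulate_signal(bases, noise_sigma, rng):
    """Return one noisy sample per base (toy model, Gaussian noise)."""
    return [LEVELS[b] + rng.gauss(0.0, noise_sigma) for b in bases]

def threshold_call(signal):
    """Naive basecaller: pick the base whose level is nearest to each sample."""
    names = list(LEVELS)
    return "".join(min(names, key=lambda b: abs(LEVELS[b] - s)) for s in signal)

def error_rate(truth, called):
    """Fraction of positions where the called base differs from the true base."""
    return sum(t != c for t, c in zip(truth, called)) / len(truth)

rng = random.Random(0)  # fixed seed so the experiment is repeatable
truth = "".join(rng.choice("ACGT") for _ in range(10000))

clean_rate = error_rate(truth, threshold_call(simulate_signal(truth, 0.0, rng)))
noisy_rate = error_rate(truth, threshold_call(simulate_signal(truth, 1.0, rng)))
print(f"error rate without noise: {clean_rate:.3f}")
print(f"error rate with noise comparable to the level spacing: {noisy_rate:.3f}")
```

In this toy model, nearest-level calling is perfect without noise but degrades sharply once the noise is comparable to the spacing between levels, which is the motivation for learned basecallers.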

The resulting error rate nevertheless remains extremely high (up to 30%). This is resolved through redundancy (every base in the genome is covered 30-50 times) and majority voting.
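
The majority-voting idea can be sketched as follows. This is a minimal illustration, assuming the redundant reads have already been aligned to each other and trimmed to the same length; a real pipeline must first perform that alignment and handle insertions and deletions. The function name majority_vote and the example reads are hypothetical.

```python
from collections import Counter

def majority_vote(aligned_reads):
    """Column-wise majority vote over equal-length, pre-aligned reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # Pick the most frequent base in this column; ties go to the first one seen.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Five noisy reads of the same region, each with at most one substitution error.
reads = ["ACGTAC", "ACGTTC", "AGGTAC", "ACGTAC", "ACCTAC"]
consensus = majority_vote(reads)
print(consensus)
```

Each individual error is outvoted by the other reads covering the same position, which is why high coverage can compensate for a high per-read error rate.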

In this project, we will design a stand-alone accelerator for third-generation DNA sequence basecalling, targeting personalized medicine applications.

This is a research project venturing into a new field of study, and it may lead to further research and scientific publications.

What will we do and learn in the project?
1. Learn digital VLSI design tools and flow
2. Design a novel accelerator for DNA sequence basecalling

Requirements
• Desire to innovate and try new things
• Ability to work independently

Prerequisites
• Logic Design