DNA Sequencing Accelerator for Long Reads

Project description:

One of the most popular operations in personalized medicine is protein or DNA sequence database search based on pair-wise alignment, where a query sequence is compared against a database of sequences to find the most similar one. This similarity can provide insight into the function of the query protein or the role of a gene. Conventional computer architectures have proven inefficient for such personalized medicine tasks: for example, aligning even several hundred DNA or protein sequences can consume several CPU hours on a high-performance computer. Hence, personalized medicine relies on hardware accelerators to keep up with the growing amount of data generated by biological applications.
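
To make the pair-wise database search concrete, here is a minimal Python sketch that scores each database entry with Smith-Waterman local alignment and keeps the best-scoring one. Smith-Waterman is one standard choice for this step, swapped in here for illustration only. The scoring values (match +2, mismatch -1, linear gap -2) and the function names `smith_waterman_score` and `search_database` are hypothetical. The nested loop visits every cell of the dynamic-programming matrix, so the work grows with the product of the two sequence lengths, which is why this search is costly on a CPU.

```python
from typing import Dict, Tuple


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not a recommended scheme.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: a score never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: Dict[str, str]) -> Tuple[str, int]:
    """Return (name, score) of the database entry most similar to query."""
    return max(
        ((name, smith_waterman_score(query, seq)) for name, seq in database.items()),
        key=lambda item: item[1],
    )
```

For instance, searching for "ACGTACGT" in `{"seqA": "TTTTACGTACGTTTT", "seqB": "GGGGGGGG", "seqC": "ACGAACGT"}` returns `("seqA", 16)`, because seqA contains the query exactly (8 matches at +2 each).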

State-of-the-art genome assembly methods designed for accurate short reads (2nd-generation DNA sequencing) are not suitable for 3rd-generation nanopore DNA reads because of the high error rates of current nanopore sequencing devices. Instead, Overlap-Layout-Consensus (OLC) algorithms are used for nanopore reads, as they perform better with longer, error-prone reads. OLC-based assembly algorithms focus on finding read-to-read overlaps: sequence matches between two reads, which occur when local regions on each read originate from the same locus within a larger sequence.
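
One common way to find candidate read-to-read overlaps in error-prone reads is seeding: look for short exact substrings (k-mers) that both reads share, and check that many of them lie on the same diagonal, i.e. at a consistent offset between the two reads. The Python sketch below is only an illustration of that idea, assuming exact k-mers and hypothetical defaults `k = 5` and `min_hits = 3`; the function names are also hypothetical. Practical long-read overlappers typically use longer k-mers, sampled seeds such as minimizers, and chaining steps that tolerate the insertion and deletion errors common in nanopore reads, none of which is shown here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_index(read: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the read to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(read) - k + 1):
        index[read[pos:pos + k]].append(pos)
    return index


def shared_kmer_hits(read_a: str, read_b: str, k: int = 5) -> List[Tuple[int, int]]:
    """Return (pos_in_a, pos_in_b) for every k-mer occurring in both reads."""
    index_b = kmer_index(read_b, k)
    hits = []
    for pos_a in range(len(read_a) - k + 1):
        for pos_b in index_b.get(read_a[pos_a:pos_a + k], []):
            hits.append((pos_a, pos_b))
    return hits


def is_candidate_overlap(read_a: str, read_b: str, k: int = 5, min_hits: int = 3) -> bool:
    """Flag a candidate overlap when enough shared k-mers share one diagonal.

    The diagonal is pos_a - pos_b; substitutions keep hits on it, but
    insertions/deletions shift it, which this simplified sketch ignores.
    """
    diagonals: Dict[int, int] = defaultdict(int)
    for pos_a, pos_b in shared_kmer_hits(read_a, read_b, k):
        diagonals[pos_a - pos_b] += 1
    return max(diagonals.values(), default=0) >= min_hits
```

For example, with `read_a = "TTGACCATGCAGTCCGAT"` and `read_b = "CAGTCCGATAAGCTTGCA"`, the last nine bases of `read_a` equal the first nine of `read_b`, so all five shared 5-mers fall on diagonal 9 and the pair is flagged as a candidate overlap.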

In this project, we will design a stand-alone accelerator for long-read (3rd-generation) DNA sequence alignment for personalized medicine applications.

This is a research project venturing into a new field of study, and it may lead to further research and scientific publications.

What will we do and learn in the project?
1. Learn digital VLSI design tools and flow
2. Design a novel accelerator for DNA sequence alignment

Requirements
• Desire to innovate and try new things
• Ability to work independently

Prerequisites: Logic Design (044262)