
Project description:
How can we tell when a new mutation of the COVID virus appears? We sequence DNA samples from many patients. These samples contain the host (patient’s) DNA as well as the DNA of multiple viruses and bacteria that live in our bodies and make their way into the sample. Then, we need to compare huge amounts of sequenced data with existing COVID strains and decide whether there is a new variant (which will be very similar to existing COVID strains but still genetically different). The task is especially difficult because sequenced data contains sequencing errors in addition to genetic variations.
Content Addressable Memory (CAM) is widely used in computers (cache, TLB) for parallel search (compare). However, the exact pattern matching that conventional CAM provides has proved inefficient in applications such as detecting viral mutations, which are part of the larger emerging field of DNA processing and genome analysis. The reason is that genetic variations and sequencing errors make exact comparison impossible.
In this project, we will design a novel CAM that enables approximate search (i.e. search that tolerates differences up to a certain programmable threshold). This ability enables efficient DNA pattern matching in the presence of sequencing errors and genetic variations, thus significantly speeding up viral mutation detection and genome analysis in general.
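The difference between exact and approximate search can be illustrated with a short software model. This is only a sketch under stated assumptions: the project text does not specify a distance metric, so Hamming distance (number of mismatching positions) is assumed here, and the names `hamming_distance` and `cam_search` are hypothetical. A real CAM would compare all rows in parallel in hardware; the loop below models only the result of the search, not its timing.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions where two equal-length patterns differ (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cam_search(stored_rows, query, threshold=0):
    """Return indices of stored rows within `threshold` mismatches of `query`.

    threshold=0 models a conventional (exact-match) CAM;
    threshold>0 models an approximate search with a programmable tolerance.
    """
    return [
        i for i, row in enumerate(stored_rows)
        if hamming_distance(row, query) <= threshold
    ]


# Hypothetical stored reference patterns and a query read with one error.
rows = ["ACGTACGT", "ACGTTCGT", "TTTTAAAA"]
query = "ACGTACGA"

exact = cam_search(rows, query, threshold=0)    # no row matches exactly
approx1 = cam_search(rows, query, threshold=1)  # tolerates one mismatch
approx2 = cam_search(rows, query, threshold=2)  # tolerates two mismatches
```

With a threshold of zero the query matches nothing, which is why exact-match CAM struggles with reads that contain sequencing errors; raising the threshold lets the nearby reference patterns match.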
This is a research project that ventures into a new field of study and may lead to further research and scientific publications.
What will we do and learn in the project?
- Learn a thing or two about DNA sequencing and genome analysis
- Learn VLSI circuit design tools
- Develop a novel memory architecture for a very relevant and important purpose
Requirements
- Desire to innovate and try new things
- Ability to work independently and venture into uncharted territory
Prerequisites
- Logic Design 044262
- Digital Systems and Computer Structure 044252
- Nothing else (this project is an opportunity to gain exposure to the DNA sequencing and analysis field)
Supervisor: Dr. Leonid Yavits