• Many techniques have been developed to simplify the testing of devices after production. One technique is called "boundary scan", sometimes referred to as "JTAG" (Joint Test Action Group). Each device that complies with the standard, which was adopted by the IEEE, includes 5 dedicated pins used only for testing (the Test Access Port, or TAP). In this project the students will take an existing logic project, preferably their own VLSI...
  • Project description: A classifier is a machine learning model used to distinguish between different objects based on their features. The Naive Bayes classifier is very effective in many real-world situations, such as document classification and spam filtering. A Naive Bayes classifier is based on applying Bayes’ theorem. It relies on the “naive” assumption of conditional independence between every pair of features. Despite this simplifying assumption, naive Bayes classifiers work very well....
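As an illustration of the idea above, here is a minimal sketch of a multinomial Naive Bayes spam filter. The training data, the function names `train` and `predict`, and the use of add-one smoothing are assumptions made for this example and are not taken from the project itself.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count documents per class and words per class.

    docs is a list of (label, list_of_words) pairs (hypothetical toy data).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, words in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(model, words):
    """Return the label with the highest log-posterior.

    Bayes' theorem plus the conditional-independence assumption turns the
    posterior into log P(class) + sum of log P(word | class).
    """
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Add-one (Laplace) smoothing so unseen words do not zero the product.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


TRAINING = [
    ("spam", ["win", "money", "now"]),
    ("spam", ["free", "money", "offer"]),
    ("ham", ["meeting", "schedule", "now"]),
    ("ham", ["project", "report", "schedule"]),
]
MODEL = train(TRAINING)
```

Calling `predict(MODEL, ["free", "money"])` classifies the message by comparing the two class scores; working in log space avoids underflow when many features are multiplied.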
  • Flash memory is a widely used memory technology, found in disk-on-keys, SSDs, set-top boxes (routers, TVs etc.), cellular SIM cards, and more. Flash memory requires a dedicated memory controller, as Flash is block-addressable and has its own error-correction properties, wear-leveling management and more. Solid-state drive architectures can arrange Flash chips and the controller in several topologies: channels, bus-based, full crossbar and more. In this project, the students will implement a design of controller...
  • Project description: DNA digital data storage is defined as the process of encoding and decoding binary data to and from synthesized DNA strands. The global community produces digital data at increasing rates, creating enormous data centers for storage. Recent research proposes replacing traditional data storage devices with biological DNA-based devices, which can store data-center-scale information within a few grams of material. During DNA synthesis...
  • Clustering is the task of dividing data points into groups such that points in the same group are more similar to each other than to points in other groups. K-means is an effective clustering algorithm that assigns each data point to the cluster whose mean is nearest. For some datasets, K-means does not provide...
  • Project description: Deep neural networks can be accelerated dramatically by using memristive devices as synaptic connections. However, deep neural networks are traditionally trained with the error backpropagation algorithm, which faces some issues when the networks are implemented in hardware based on memristive devices: i) complex peripheral circuits with expensive ADCs and DACs and memory banks for intermediate layer states; ii) lack of efficient online training methods. We recently developed an efficient...
  • Background: Computing-in-memory (CiM) is a potential solution for breaking the memory wall and energy wall imposed by the conventional computer architecture, which separates computing units from memory units. RRAM-based stateful logic is a kind of CiM that can implement any function within an RRAM crossbar array. Efficient synthesis and mapping methods exist for 2D RRAM crossbar arrays. 3D RRAM crossbar arrays are denser and can support stateful...
  • Project description: How can we tell when a new mutation of the COVID virus appears? We sequence DNA samples from many patients. These samples contain the host (patient’s) DNA as well as the DNA of multiple viruses and bacteria that live in our bodies and make their way into the sample. Then, we need to compare huge amounts of sequenced data with existing COVID strains and decide if there is a new...
  • Project description: High-throughput sequencing has substantially changed the way biological research is performed since the early 2000s. These sequencing technologies obtain millions of short fragments (sequences) of DNA from a living organism to generate the organism’s DNA blueprint (genome). Thanks to these new DNA sequencing platforms, we can now investigate human genome diversity between populations, find genomic variants that are likely to cause diseases and even investigate the genomes of...
  • Project description: The goal of the project is to design and implement a video processing accelerator to allow real-time processing of a video stream. The accelerator will be composed of a series of independent video processing units, each of which receives a video stream as input and generates a processed video stream at its output, which is fed into the next unit. Alpha blending is the process of combining...
  • Project Abstract: There is an endless number of platforms that require implementation of video transformations, such as curved TV/computer/smartphone screens, goggles, pilot helmets, etc. All these platforms require transformation of a flat image into a curved image that fits the display, so the user can see the image well without data loss. The main challenges of the core implementation are low latency (“video in => video out”), high video resolution and frame rate....
  • Project description: Clustering is the task of unifying data points into groups or clusters, where the grouping of the points is commonly based on distance. Clustering has many applications including data mining, statistical data analysis, pattern recognition, and more. Two common clustering algorithms are K-Means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). With increasing needs to perform clustering on large datasets as fast as possible, running these on...
  • Project description: One of the most popular operations in personalized medicine is protein or DNA sequence database search based on pair-wise alignment, where a query sequence is compared with a database of sequences to find the highest-similarity sequence. This similarity can provide insights into the functionality of the query protein or the role of a gene. Conventional computer architecture has proven inefficient for personalized medicine tasks. For example,...
  • The goal of this project is to design and implement an RTL IP (SystemVerilog) that will enable multiple instances of RISC-V cores to be connected in a ring configuration. The IP will consist of two main interfaces: the “Core” on one side and the “Ring” on the other. The Ring Interface will manage the data transactions on the ring, pushing and pulling RD/WR/RD_RSP transactions to/from the ring....
  • Clustering for unsupervised learning is a common task in machine learning systems. Several algorithms can be used for this task, for example K-means. The main problem with the K-means algorithm is its huge amount of computation. Mini-batch K-means proposes an effective technique to drastically reduce the number of computations with an insignificant impact on the quality of the results. The goal of this project is to design and implement a hardware...
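The saving described above comes from updating the centers with a small random sample per iteration instead of the full dataset. Below is a minimal software sketch of one common formulation (per-center learning rate equal to one over the number of points assigned so far). The function names, the default batch size and the toy data generator are assumptions for illustration only, not the project's hardware design.

```python
import random


def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def minibatch_kmeans(points, init_centers, batch_size=16, iterations=200, seed=0):
    """Mini-batch K-means: each iteration touches only batch_size points."""
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for _ in range(iterations):
        batch = rng.sample(points, batch_size)
        # Assign the whole batch using the centers as they were at batch start.
        assigned = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers


def demo_points(seed=1):
    """Two well-separated toy clusters around (0, 0) and (10, 10)."""
    rng = random.Random(seed)
    low = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
    high = [(rng.uniform(9, 11), rng.uniform(9, 11)) for _ in range(100)]
    return low + high
```

With these defaults each iteration computes 16 point-to-center distance sets, whereas a full K-means pass over the 200 toy points would compute 200.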
  • Intel’s Advanced Matrix Extensions (AMX) is a new x86 extension designed for operating on matrices, with the goal of accelerating machine learning computations. AMX is a new 64-bit programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and an accelerator that is able to operate on tiles. In the first stage of this project, a preprocessor...
  • The DNA sequencing process involves passing a strand of DNA through a nanopore, which causes drops in the electric current passing between the walls of the pore. The amount of change in the current depends on the type of base passing through the pore. This signal is then sampled. In this project, we will design a stand-alone accelerator for 3rd-generation DNA sequence basecalling for personalized medicine applications.
  • Project description: Flash memory is a widely used memory technology, found in disk-on-keys, SSDs, set-top boxes (routers, TVs etc.), cellular SIM cards, and more. Flash memory requires a dedicated memory controller, as Flash is block-addressable and has its own error-correction properties, wear-leveling management and more. Solid-state drive architectures can arrange Flash chips and the controller in several topologies: channels, bus-based, full crossbar and more. There are several new trends in SSDs that should...
  • A standard solution to memory security is to encrypt all data written to untrusted storage. A big problem with client-side encryption (and other systems that protect only the data itself) is that it does not protect all aspects of how the client interacts with the server's storage. The access pattern, that is, where storage is accessed, can also reveal secret information. Suppose a patient stores their genome on a remote server and wishes to check...
  • Sparse linear algebra is a frequent bottleneck in machine learning and data mining workloads. The efficient acceleration of sparse matrix calculations becomes even more critical when applied to big data problems. The goal is to implement an accelerator for multiplying a sparse matrix by a sparse vector. Current solutions fetch all non-zero elements of the sparse matrix from memory. The aim of this project is to implement a technique in...
  • This project targets an advanced, scalable hardware accelerator for deep Convolutional Auto-Encoders (CAE) in deep-learning applications. Integrating a CAE hardware accelerator has advantages in resource utilization, operation speed, and power consumption, indicating great potential for application in digital signal processing. This project suggests building a dedicated acceleration IP, which efficiently performs RAM-to-RAM calculations in a pipelined fashion and thereby dramatically offloads machine-learning software applications.
  • In this project, you are required to design a systolic array that efficiently implements the logic required to support per-channel activation tensor quantization for a convolutional neural network. You are required to implement the design using SystemVerilog, then simulate and synthesize it, after which the layout will be designed. Area, power, and energy will be analyzed and compared to a conventional systolic array. Skills you will acquire: SystemVerilog, Synopsys Design Compiler,...
  • There is an endless number of platforms that require implementation of video transformations, such as curved TV/computer/smartphone screens, goggles, pilot helmets, etc. All these platforms require transformation of a flat image into a curved image that fits the display, so the user can see the image well without data loss. The main challenges of the core implementation are low latency (“video in => video out”), high video resolution and frame rate. The goal...
  • Hardware Trojan horses have been a real concern for the last 12 years or so, especially for national security. A few examples of what such a Trojan can do when triggered are: 1. Turn off security protections or insert a known key into the encryption engine; 2. Insert errors to cause a malfunction in critical infrastructure; 3. Leak information to an unprotected zone (for example from a privileged CPU...
  • RISC-V (pronounced "risk-five") is an open-source hardware instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. The project began in 2010 at the University of California, Berkeley, but many contributors are volunteers not affiliated with the university. The goal of this project is to evaluate the enhanced performance of the double-issue capability.
  • The goal is to design and implement the HDL of a high-performance hardware serial divider for high frequencies. Initially, at least two different division algorithms will be investigated and analyzed. The design will be parametrized so that it can be configured according to specified requirements. The divider will support a variety of input / output number representation formats.
  • Project description: Template matching is a method for searching and finding the location of a template image in a larger image. It relies on calculating, at each position of the image under examination, a correlation or distortion function that measures the degree of similarity or dissimilarity to a template sub-image. Among the correlation/distortion functions proposed in the literature, Normalized Cross-Correlation (NCC) and Zero-mean Normalized Cross-Correlation (ZNCC) are widely used due to...
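The sliding-window search described above can be sketched in a few lines of software. The example below scores every position with ZNCC and returns the best one. The function names and the toy image are assumptions for illustration; a hardware implementation would organize the arithmetic very differently.

```python
import math


def zncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A flat patch has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def match(image, template):
    """Slide the template over the image and return (best_position, best_score)."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0  # ZNCC is always within [-1, 1]
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = zncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Toy data: the template is the patch at row 1, column 2, brightened by 10.
IMAGE = [
    [1, 1, 1, 1, 1],
    [1, 1, 9, 3, 1],
    [1, 1, 2, 8, 1],
    [1, 1, 1, 1, 1],
]
TEMPLATE = [[19, 13], [12, 18]]
```

Because the means are subtracted before correlating, the brightness offset between `TEMPLATE` and the matching patch does not reduce the score, which is the main reason to prefer ZNCC over plain cross-correlation.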
  • RISC-V (pronounced "risk-five") is an open-source hardware instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. The project began in 2010 at the University of California, Berkeley, but many contributors are volunteers not affiliated with the university. RISC-V is a new architecture that is available under open, free and non-restrictive licences. It has widespread industry support from chip and device makers, and is designed to...
  • Problem Description: Network routers by nature handle thousands of mega packets per second. Each packet might come from one port and be destined for another port. The actual routing decision is made only once the packet is received and inspected. This scheme, by definition, causes head-of-line blocking, in which one packet destined for a blocked destination completely blocks the input queue or the common processing pipeline. These kinds...
  • RISC-V (pronounced "risk-five") is an open-source hardware instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. The project began in 2010 at the University of California, Berkeley, but many contributors are volunteers not affiliated with the university. The goal of this project is to study the RISC-V instruction set and then to design and implement a basic RISC-V microprocessor that supports all the instructions. Additional features will...
  • The global community produces digital data at increasing rates, creating enormous data centers for storage. Recent research proposes replacing traditional data storage devices with biological DNA-based devices, which can store data-center-scale information within a few grams of material. In this project, the student will study this emerging technological approach and will implement digital controller circuits for managing a DNA storage device. The main goals are understanding of...
  • The goal of this project is to perform the complete backend design of the OFDM transmitter chip and its integrated memories. This includes: synthesis, gate-level simulation, physical (layout) design and verification, timing verification, power and power grid analysis. The chip may then be submitted for fabrication. The implementation will be done in Tower CMOS 0.18 µm technology.
  • RISC-V is a classic RISC architecture rebuilt for modern times. At its heart is an array of 32 registers containing the processor's running state, the data being immediately operated on, and housekeeping information. RISC-V comes in 32-bit and 64-bit variants, with register size changing to match. A large amount of code has been written at IBM in assembly for the PowerPC processor, for which no C source code exists....
  • Reverse engineering of Integrated Circuits (ICs) is a complex process that involves multiple disciplines and skills. The input to the process is usually a physical device, and the output is a human-readable specification. In the first phase, the IC undergoes teardown to obtain a gate-level netlist description. In the second phase, a specification is extracted. The second phase is non-trivial and involves various learning algorithms and heuristics. The purpose...
  • A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a linear relationship or a non-linear relationship. The goal of this project is to build a novel DNN accelerator with simultaneous multi-threading.
  • One of the most popular operations in personalized medicine is protein or DNA sequence database search based on pair-wise alignment, where a query sequence is compared with a database of sequences to find the highest-similarity sequence. This similarity can provide insights into the functionality of the query protein or the role of a gene. Conventional computer architecture has proven inefficient for personalized medicine tasks. For example, aligning even...
  • One of the most popular operations in personalized medicine is protein or DNA sequence database search based on pair-wise alignment, where a query sequence is compared with a database of sequences to find the highest-similarity sequence. OLC-based assembly algorithms focus on finding the read-to-read overlaps, defined to be a common sequence between two reads. A read-to-read overlap is a sequence match between two reads, and occurs when local regions on...
  • Modern computer architectures increasingly rely on speculation to boost instruction-level parallelism. One common method is branch prediction. There are several ways to predict whether a branch is taken or not taken, which significantly reduces the branch penalty. In this project we will develop a branch predictor that is based on a neural network. The Fast Path-Based Neural Branch Prediction can reach a 5% to 7% misprediction rate depending on...
  • A CISC decoder is typically set up as a state machine. The machine reads the opcode field to determine what type of instruction it is, and where the other data values are. The instruction word is read in piece by piece, and decisions are made at each stage as to how the remainder of the instruction word will be read. One method to alleviate this is to use a decoded...
  • The goal of this project is the development of an autonomous cyber protection chip for computer systems and communication channels linked to the cloud. Background: Current technology drives the accelerated development of computer components with increasing processing capabilities, bandwidth and high level of connectivity between components that maintain a constant link to the cloud. Such systems present a significant challenge in protecting the proper operation of the components. The purpose...
  • A systolic array is a homogeneous array of identical processors, each performing the same function and each connected to several neighbours. Such a structure is very suitable for fast and efficient implementation of machine learning algorithms. The goal of this project is to design and implement an architecture for the computation of the convolution stage of a neural network for deep learning.
  • Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has high potential for implementing CNNs with ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power (energy) and hardware footprint can be achieved compared to the conventional binary arithmetic implementations. In this project we will...
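To make the arithmetic claim above concrete, here is a small software simulation of stochastic-computing multiplication and scaled addition. It assumes the unipolar encoding, where a value in [0, 1] is the probability of a one in the stream and an AND gate yields the product; for the bipolar [-1, 1] encoding, multiplication is usually performed with an XNOR gate instead. All function names, the stream length and the seed are assumptions chosen for illustration.

```python
import random


def encode(p, length, rng):
    """Unipolar encoding: each bit is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(bits):
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Bitwise AND of two independent streams encodes the product a * b."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def sc_scaled_add(a_bits, b_bits, select_bits):
    """A 2:1 multiplexer driven by a p = 0.5 select stream encodes (a + b) / 2."""
    return [b if s else a for a, b, s in zip(a_bits, b_bits, select_bits)]


def demo(length=100_000, seed=42):
    """Return the estimated product and scaled sum of 0.5 and 0.4."""
    rng = random.Random(seed)
    a = encode(0.5, length, rng)
    b = encode(0.4, length, rng)
    sel = encode(0.5, length, rng)
    return decode(sc_multiply(a, b)), decode(sc_scaled_add(a, b, sel))
```

The results are estimates whose error shrinks as the streams get longer, which is the basic accuracy versus latency trade-off of this number representation.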
  • A group at Intel is working on x86 test content optimization and creation using ML techniques. A working solution already exists for test content optimization in production mode. The next stage of the project is to create new content automatically by learning from legacy content (since x86 is backward compatible, a huge legacy is available to learn from). Test optimization refers to the compilation of a test suite that achieves the...
  • This project targets an advanced, scalable hardware accelerator for mini-batch gradient descent in deep-learning applications. Deep neural networks are widely used in a large number of applications for analyzing and extracting useful information from the large amounts of data generated every day. Inference and training are the two modes of operation of a neural network. Training is the most computationally challenging task as it involves solving a large-scale optimization...
  • A novel low-resolution night vision camera is being developed at the Technion. It is based on a thermally isolated floating MOS transistor used to sense temperature changes caused by external infrared radiation. When a constant voltage is applied to the transistor, its current signal follows the temperature variations. This current signal is read out and amplified before further processing. This is done by an integrated readout circuit (ROIC).
  • The Technion's innovative TMOS sensors utilize widely available and affordable CMOS-SOI technology together with MEMS micromachining to achieve a breakthrough in passive IR imaging. The CMOS-SOI technology allows the integration of the 2D sensor focal-plane array with the analog readout, which is the subject of this project. In this project, you will design, implement and simulate the top-level architecture for an IR camera system that includes a 10x10 matrix of...
  • Although advances with silicon-based electronics continue to be made, alternative technologies are being explored. Digital circuits based on transistors fabricated from carbon nanotubes (CNTs) have the potential to outperform silicon by improving the energy–delay product, a metric of energy efficiency, by more than an order of magnitude. Hence, CNTs are an exciting complement to existing semiconductor technologies. In order to evaluate the potential of CNFETs to replace silicon CMOS technology,...
  • A vast majority of the modern digital VLSI devices utilize a technique called 'full scan' for production testing. This technique concatenates all the device registers (flip-flops or latches) in a few shift registers called 'scan chains'. In this configuration, a production tester may use the scan chains to drive logic values to the inputs of combinatorial circuits, sample the results from their outputs, output the results via the same scan...
  • The RSA algorithm stands out among asymmetric encryption systems as a conceptually simple and practical encryption and authentication method which provides a near-perfect level of security. Public-key cryptographic systems such as RSA often involve modular exponentiation (Z = Y^e mod n). This widely used and computationally complex operation is performed using successive modular multiplications (C = AB mod n). The performance of such cryptosystems is primarily determined by...
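The relationship between the two operations above can be sketched with the textbook square-and-multiply method, which computes Z = Y^e mod n using only modular multiplications of the form C = AB mod n. The function names are chosen for this example, and the scan from the least significant exponent bit is one possible ordering, not necessarily the one used in the project.

```python
def mod_mul(a, b, n):
    """One modular multiplication, C = A * B mod n."""
    return (a * b) % n


def mod_exp(y, e, n):
    """Compute Y^e mod n by scanning the exponent bits from the least significant."""
    result = 1 % n
    base = y % n
    while e > 0:
        if e & 1:
            # This exponent bit is set: fold the current power into the result.
            result = mod_mul(result, base, n)
        # Square the base for the next bit position.
        base = mod_mul(base, base, n)
        e >>= 1
    return result
```

An exponent of k bits needs k squarings in this loop plus one multiplication per set bit, so the speed of the modular multiplier dominates the overall run time.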
  • Modern flash-based memories use aggressively scaled 19 nm floating-gate transistors. When performing read/write/erase commands in a flash memory, the chip is occupied and cannot be used to perform other commands in parallel. It is sometimes possible to stop the instruction execution in the middle (to perform another instruction) but the penalty of returning is a significant slowdown of command execution. The SSD architecture consists of multiple channels. Each has multiple...
  • Modern flash-based memories use aggressively scaled 19 nm floating-gate transistors. As a result, data is often stored with errors due to inter-cell interference, coupling, random-telegraph noise and more. The signal-to-noise ratio becomes even worse as density increases. In order to provide reliable data storage, the system controller employs error-correcting algorithms. In this project, the students will implement a design of an advanced error-correction encoder and decoder. The goal is to study and...
  • The NVM Express (NVMe) specification was introduced in 2011 and today it is the new standard storage interface for Solid-State Drives (SSD). The NVM Express specification defines a controller interface for PCIe SSDs used for Enterprise and Client applications. It is based on a queue mechanism with an advanced register interface, command set and feature set including error logging, status, system monitoring (SMART, health), and firmware management. The southbridge is one...
  • Write-Once Memory (WOM) codes transform information such that consecutive writes to the memory cause only uni-directional bit transitions. This property is useful for SSD memory since it reduces the number of program-erase cycles, and thus increases memory endurance and may also affect performance. In this project, the students will analyze the efficiency trade-offs of new WOM codes and compare their power/area/throughput. The goal is to implement a...
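The uni-directional property above can be illustrated with the classic two-write WOM code that stores 2 bits in 3 cells twice without erasing. This particular code is chosen here only as a well-known example; the project's own codes are not specified in the text, and the function names are assumptions.

```python
# First-generation codewords: 2 data bits -> 3 cells, at most one cell programmed.
FIRST = {0b00: (0, 0, 0), 0b01: (1, 0, 0), 0b10: (0, 1, 0), 0b11: (0, 0, 1)}
# Second-generation codewords are the bitwise complements of the first.
SECOND = {data: tuple(1 - b for b in word) for data, word in FIRST.items()}


def decode(cells):
    """Read the 2 data bits back; the cell weight tells which table was used."""
    cells = tuple(cells)
    table = FIRST if sum(cells) <= 1 else SECOND
    for data, word in table.items():
        if word == cells:
            return data
    raise ValueError("not a valid codeword")


def write(cells, data):
    """Return new cell values storing data, only ever changing cells from 0 to 1."""
    cells = tuple(cells)
    if decode(cells) == data:
        return cells  # value already stored, nothing to program
    if sum(cells) == 0:
        new = FIRST[data]
    elif sum(cells) == 1:
        new = SECOND[data]
    else:
        raise ValueError("cells exhausted, erase required")
    # Sanity check: no cell may go from 1 back to 0 without an erase.
    assert all(n >= c for n, c in zip(new, cells))
    return new
```

For example, writing `0b01` to erased cells gives `(1, 0, 0)`, and a following write of `0b10` gives `(1, 0, 1)`: the second write only programs additional cells, so no erase is needed between the two.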
  • Problem Definition: Network routers by nature handle thousands of mega packets per second. Each packet might come from one port and be destined for another port. The actual routing decision is made only once the packet is received and inspected. This scheme, by definition, causes head-of-line blocking, in which one packet destined for a blocked destination completely blocks the input queue or the common processing pipeline. These kinds...
  • Orthogonal Frequency Division Multiplexing (OFDM) is a Frequency Division Multiplexing (FDM) technique used as a digital multi-carrier modulation method. Instead of using one high-speed channel, the data is split into a large number of lower-speed channels. Orthogonal subcarriers are used to carry data on several parallel data streams, which allows more efficient use of the spectrum compared to regular FDM. Orthogonality of the carriers prevents interference between...
  • Implementation of a Smallest Univalue Segment Assimilating Nucleus (SUSAN) Block
    Edge and feature extraction is one of the most important first steps in computer vision. Its main objective is to extract as many useful features as possible from a scene while keeping the output noise level to a minimum. Edge, corner and vertex detection processes serve to simplify the analysis of images by drastically reducing the amount of data to be processed. The SUSAN principle is the basis for algorithms to perform...
  • Memristors are resistive devices whose resistance varies with the voltage applied to the device. The most natural memristor application is memory. However, memristors can also be used for other applications, for example logic circuits. One such approach is MRL (Memristor Ratioed Logic), a hybrid CMOS-memristive logic family. In MRL, OR and AND logic gates are designed using memristors. The limitation of MRL is that every memristor-based logic...
  • An advanced Global Navigation Satellite System (GNSS) accelerator provides the end user with improved position, velocity and time solutions. High-performance conventional GPS/GNSS receivers rely on ASIC technology to implement massive correlators, as the performance of SDR solutions is still limited. With a reasonable distribution of tasks between the host hardware and reconfigurable peripherals, higher performance is achieved. The figure illustrates a schematic structure of a GNSS receiver, where the proposed...
  • Systems that trade securities automatically have fundamentally changed capital-market activity in recent years. Most trading on American exchanges is now conducted without any human involvement. Trading machines can be designed to trade stocks, options, futures and foreign-exchange products based on a predefined set of rules that determine when to buy, when to sell and how much money to invest in each traded product. Automated trading systems are becoming increasingly sophisticated, processing data at ever-growing volumes and rates while shortening...
  • As VLSI manufacturing technologies progress, HW architects are constantly looking for ways to improve the overall performance of the CPU. In the past, many small-scale architecture improvements, as well as pipelines and other methods, were used to improve performance. Other methods included increasing the clock frequency and the width of the data bus, from 16 bits to 32, 64 and higher. As the manufacturing processes become more and more dense, and...
  • The goal of this project is to design an algorithm to detect and correct such errors. The scheme relies on a coding technique that incorporates the side information of fast detrapping during the encoding stage. The implementation includes MATLAB modeling, spec and architecture definition, logic design using the Verilog HDL, verification and synthesis.
  • The goal of this project is to develop algorithms for performance enhancement/cost reduction and implement them in HDL for the related memory controller. The implementation includes MATLAB modeling, spec and architecture definition, logic design using the Verilog HDL, verification and synthesis. The emphasis of this project will be on low design latency.
  • Description: In the field of cryptanalysis the tools utilized to recover the secret information are very different from the ones utilized to build the cipher. For the most part cryptanalysis is based on probabilistic Bayesian techniques. In this method some information leaked from the system is exploited in order to derive a slight probability advantage of one code over another. Accordingly, after a sufficient number of ciphertext messages are analyzed,...
  • Description: The project is an OpenSPARC T1-based SoC which includes: a full or reduced OpenSPARC T1 CPU core; an OpenSPARC FPU; a bridge to connect the CPU and FPU to the Wishbone bus; a NOR flash controller; a UART; an OpenCores Ethernet controller; and bridges from Wishbone to Altera and Xilinx DRAM controllers. The goal of this project is to perform the complete backend design of an OpenSPARC T1 microprocessor...