As VLSI manufacturing technology progresses, hardware architects constantly look for ways to improve overall CPU performance. In the past, gains came from many small-scale architectural improvements, pipelining, and similar techniques, as well as from raising the clock frequency and widening the data bus from 16 bits to 32, 64, and beyond.
As manufacturing processes become denser and clock frequencies approach a practical limit [due to wavelength and design-tool limitations], the road to better performance is changing. The other methods have also reached the limit of what they can deliver.
One of the biggest challenges in running a CPU at high frequencies is dissipating the heat that is generated and supplying the current that is required.
One recent method of driving performance up, now very common, is combining multiple cores on a single chip. Placing multiple cores on one chip gives designers several advantages:
1. Since device size is reduced, designers can place more devices on the same die area while keeping the die size in a cost-effective range.
2. Most operating systems today run multiple processes in parallel, so distributing the processes across multiple cores lets each one run independently, without being interrupted. This reduces the overhead of task switching.
3. Software developers can build their frameworks to run independent parts on different cores and shorten execution times.
4. The cores can share data through the caches much more efficiently, since the data does not need to leave the chip and can be accessed at full clock speed.
In this project the students will design a quad-core RISC CPU with an internal cache that is shared by all the cores. All the cores are identical, and each can access the data in the cache. A mechanism fetches data from external memory over a high-bandwidth bus, so that most of the time the cores are not stalled by a cache miss. A mechanism is also required to share data in the cache and keep it coherent. Cache writes can use a write-through policy to keep main memory in sync with the cache. The implementation will be done in SystemVerilog.
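The write-through behavior of the shared cache can be illustrated with a small behavioral model. The sketch below is written in Python for readability and is not the SystemVerilog implementation. It assumes a direct-mapped, word-addressed cache with no write-allocate; the class name, the parameters, and the per-core access counters are hypothetical and chosen only for illustration.

```python
class SharedWriteThroughCache:
    """Behavioral model of one direct-mapped cache shared by all cores.

    Assumptions (not from the project text): word addressing, one word
    per line, no write-allocate, four cores.
    """

    def __init__(self, memory, num_lines=4, num_cores=4):
        self.memory = memory                  # backing store: list of words
        self.num_lines = num_lines
        self.valid = [False] * num_lines
        self.tags = [0] * num_lines
        self.data = [0] * num_lines
        self.hits = 0
        self.misses = 0
        self.accesses = [0] * num_cores       # per-core access counters

    def _split(self, addr):
        # Low bits select the line, the remaining bits form the tag.
        return addr % self.num_lines, addr // self.num_lines

    def read(self, core_id, addr):
        self.accesses[core_id] += 1
        index, tag = self._split(addr)
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: fill the line from memory. The old line can be
            # dropped without write-back, since memory is always current.
            self.misses += 1
            self.valid[index] = True
            self.tags[index] = tag
            self.data[index] = self.memory[addr]
        return self.data[index]

    def write(self, core_id, addr, value):
        self.accesses[core_id] += 1
        index, tag = self._split(addr)
        self.memory[addr] = value             # write-through: always update memory
        if self.valid[index] and self.tags[index] == tag:
            self.data[index] = value          # keep the cached copy in step


memory = [0] * 16
cache = SharedWriteThroughCache(memory)
cache.write(0, 5, 42)          # core 0 writes; line not cached, memory updated
first = cache.read(1, 5)       # core 1 reads: miss, line filled from memory
cache.write(2, 5, 7)           # core 2 writes: hit, cache and memory updated
second = cache.read(3, 5)      # core 3 reads: hit, sees core 2's value
```

Because all cores read and write the same single copy of each line, a value written by one core is visible to the others on their next access in this model. A real design would still need arbitration between simultaneous requests, which the sketch leaves out.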
Prerequisites: Logic Design, CPU Architecture