CS152: Computer Architecture

Final Project

Prof. Bob Brodersen

Final Report and presentation slides 11:59pm 5/15/2002
Final Presentation 5/16/2002, at the BWRC 10:00-11:40 AM
Functionality Check-off TBA

Team Evaluations on final project

As before, you will be evaluating your team members' performance. Remember that points are earned not only by doing work, but also by how well you work as a team. So if one person does all the work and makes it difficult for others to help out, that certainly does not mean he/she should get all the points!

Split up a total of 100 points between the members of your group (including yourself!). Submit your evaluations to your TA by 11:59 PM on 5/15.


Description:

This is the final lab for the course project, for which you will make various enhancements to your processor to enable execution of a DSP algorithm at high performance with minimum energy consumption.

This lab will be graded based on the functionality and energy consumption of your design. A functional processor is one that can execute the given DSP benchmark program correctly and within the given time specs. We can't stress this enough -- if your processor does not work, your grade will certainly reflect this. Please note that all the features of Lab 5 must also be functional in this lab. You don't need to implement a cache, but you must still use the ZBT RAM of Lab 6 as your memory.

Please take a good look at the due dates for this assignment. You will send your final report and a copy of your slides to the staff mailing list (cs152@cory.eecs.berkeley.edu) by 11:59 PM on 5/15 (the night before the presentations). Be sure to meet this submittal time and get some sleep so you can do a good oral presentation! Submit all the relevant files electronically as before, and make sure there is an HTML report file named "report.htm". In class, you will give a 15-minute oral presentation of your complete design as a group: each person should talk for 5 minutes, and we will have 5 minutes of questions.

Grading:

The grade will have four parts: 50% for your basic design, which includes functionality and the tradeoffs you explored; 25% for your final report and how well it explains what you did and why; 15% for your final oral presentation; and 10% for how low a power level you achieved in your design. If your design doesn't meet the timing specification, you won't receive any points for energy minimization.

All project related files are in the directory v:/cs152/project 

Step 1: Enable DSP support in your datapath

As discussed often in lecture, digital signal processing applications have very different characteristics than traditional user-based general purpose programs. One of the most important differences is that most DSP applications have a fixed real-time requirement, such that the process absolutely must run in a certain amount of time to be useful, and it does no good at all to run any faster. It is this characteristic that allows us to scale voltage and reduce power/energy consumption without "sacrificing performance" (since it does no good to go faster than the requirement).

In this project, you will be running a very simple DSP benchmark using one of the most common kernels: the FIR filter operation. An FIR (finite impulse response) filter is used extensively in all types of signal processing, such as pass/stop filters, equalizers, and echo cancellers, to name a few. For more information on FIR filters, look at this FIR filter document.

There is only one DSP instruction you need to implement to support our benchmark:

mac $rd, $rs, $rt => $rd = $rd + $rs[15:0] * $rt[15:0]

MAC (which stands for multiply and accumulate) is an R-type instruction, with opcode = "000000" and  function code = "000101".
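
As a reference for checking simulation results, the MAC semantics and encoding above can be sketched in Python. This is only a sketch under stated assumptions: the handout does not say whether the 16-bit operands are signed, so the model below treats them as two's-complement values; it also assumes the standard MIPS R-type field layout with the shamt field set to 0. The function names are made up for illustration.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer.

    Assumption: operands are signed. The handout does not specify this.
    """
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mac(rd, rs, rt):
    """Return the new $rd: rd + rs[15:0] * rt[15:0], wrapped to 32 bits."""
    product = to_signed16(rs) * to_signed16(rt)
    return (rd + product) & 0xFFFFFFFF


def encode_mac(rd, rs, rt):
    """Assemble a MAC instruction word from register numbers (0 to 31).

    Uses the standard MIPS R-type layout: opcode | rs | rt | rd | shamt | funct.
    Assumption: shamt is 0 for MAC.
    """
    opcode, shamt, funct = 0b000000, 0, 0b000101
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# Example: accumulate 3 * 4 into a register holding 10.
new_rd = mac(10, 3, 4)
# Example: instruction word for "mac $3, $1, $2".
word = encode_mac(rd=3, rs=1, rt=2)
```

If your multiplier is unsigned, replace to_signed16 with a plain 16-bit mask and compare against that model instead.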

You will need a 16-bit multiplier to implement the instruction. You can choose from the following options

Implement the MAC unit and integrate it into your processor. You don't need to worry too much about your choice at this point. You'll have a chance to improve performance and energy later (remember, get it working first, and then optimize!).

Step 3: DSP Benchmark of your initial processor

You are now ready to run the DSP benchmark for an initial assessment of your processor's performance and energy consumption. As mentioned above and in the FIR background document, the DSP benchmark program we use is a 21-tap FIR high-pass filter.
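
To make the structure of the benchmark concrete, here is a minimal Python sketch of a direct-form 21-tap FIR filter, where each inner-loop step is one multiply-accumulate. The coefficients are placeholders chosen for illustration and are not the ones in the benchmark, and the sketch assumes that samples before the start of the input are zero.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Assumption: samples before the start of the input count as zero.
    """
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


NUM_TAPS = 21
# Placeholder coefficients, NOT the benchmark's high-pass coefficients.
coeffs = [(-1) ** k * (k + 1) for k in range(NUM_TAPS)]

# Feeding a unit impulse through the filter reproduces the coefficients,
# which is a quick sanity check for any FIR implementation.
impulse = [1] + [0] * (NUM_TAPS - 1)
response = fir_filter(impulse, coeffs)
```

Each output sample costs up to 21 multiply-accumulates, which is why a single-instruction MAC matters for this benchmark.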

The assembly file of the FIR filter is named "dsp_bench_spim.s". Since SPIM does not have a MAC instruction, the MAC instruction has been translated into three instructions in that file. However, the assembly code you are going to run on your processor is named "dsp_bench.s", which has the MAC instruction. The instruction memory file has already been translated as "dsp_bench.mem". You do not need to run mipsasm on "dsp_bench.s" again.

Run the DSP benchmark program in ModelSim using post-place-and-route VHDL simulation. You must calculate and record the total execution time of the benchmark, the total energy consumed by the program, and the average power consumption.
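
The three quantities are related by energy = average power x execution time, so any two determine the third. The Python sketch below assumes you have a cycle count and clock period from simulation and an average power figure from your power estimate; the function name and the example numbers are hypothetical and are not results from any real design.

```python
def benchmark_metrics(cycles, clock_period_s, average_power_w):
    """Return (execution_time_s, total_energy_j) for one benchmark run.

    Assumption: the clock period is constant for the whole run.
    """
    execution_time_s = cycles * clock_period_s
    total_energy_j = average_power_w * execution_time_s
    return execution_time_s, total_energy_j


# Hypothetical numbers for illustration only: 30,000 cycles at 50 ns per
# cycle with 0.2 W average power.
time_s, energy_j = benchmark_metrics(30_000, 50e-9, 0.2)
```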

Step 4: Make your design meet the performance spec and consume minimal power.

DSP Benchmark Program Maximum Execution Time: 1.5 ms

You will probably notice that your initial processor not only fails to meet the performance requirement but also consumes a lot of energy. In this final and important step, you are going to make advanced enhancements to your processor so that it can meet the performance spec and consume as little energy as possible. Here are some tips for lower energy consumption:

Supply Voltage (V)    Clock Rate Multiplier
1.8                   1
1.7                   0.9
1.6                   0.8
1.5                   0.7
1.4                   0.6
1.3                   0.5
1.2                   0.4
1.1                   0.3
1.0                   0.2
0.9                   0.07
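
One way to use this table is sketched below in Python. It reads the multiplier as the factor applied to your maximum clock rate at 1.8 V, so execution time at a lower voltage is the 1.8 V time divided by the multiplier. The energy estimate uses the usual first-order CMOS model in which dynamic energy scales with the square of the supply voltage. Both the reading of the table and the energy model are assumptions of this sketch, not the official procedure for this lab, and the function names are made up for illustration.

```python
# Supply voltage (V) -> clock rate multiplier, copied from the table above.
CLOCK_MULTIPLIER = {
    1.8: 1.0, 1.7: 0.9, 1.6: 0.8, 1.5: 0.7, 1.4: 0.6,
    1.3: 0.5, 1.2: 0.4, 1.1: 0.3, 1.0: 0.2, 0.9: 0.07,
}
DEADLINE_S = 1.5e-3   # maximum benchmark execution time from the spec
NOMINAL_V = 1.8


def lowest_voltage(time_at_nominal_s, deadline_s=DEADLINE_S):
    """Return (voltage, scaled_time_s) for the lowest supply voltage that
    still meets the deadline, or None if even 1.8 V is too slow."""
    for voltage in sorted(CLOCK_MULTIPLIER):
        scaled_time_s = time_at_nominal_s / CLOCK_MULTIPLIER[voltage]
        if scaled_time_s <= deadline_s:
            return voltage, scaled_time_s
    return None


def relative_energy(voltage):
    """Dynamic energy relative to 1.8 V, assuming energy scales as V squared."""
    return (voltage / NOMINAL_V) ** 2


# Hypothetical design that finishes the benchmark in 0.5 ms at 1.8 V.
choice = lowest_voltage(0.5e-3)
if choice is not None:
    voltage, scaled_time_s = choice
    saving = 1.0 - relative_energy(voltage)
```

Under these assumptions, every speedup at 1.8 V can be traded for a lower supply voltage, which is why a faster design can end up consuming less energy.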

There are many ways to improve the performance of a processor, each of which affects a certain class of operations. Here we have a reasonable list of improvements you can make to your processor. Keep in mind that some of these designs will result in a huge increase in power consumption, but will allow you to scale down the voltage significantly and still meet the timing spec.

You are welcome and encouraged to come up with your own idea for improving performance and lowering energy. However, make sure you talk to your TA ahead of time. A complex but non-functional processor will really hurt your grade.

Report and Presentation

As indicated above, the project report and the slides for your presentation  are due the night before the presentations. 

The final report and presentation should be a lot more formal than the previous labs. You should imagine that you are presenting a design review to the technical director of your company (you don't need to dress like it, though). Functionality check-off will be on the same day, and your processor must work completely. The time of the check-off will be posted.

In addition to all your performance results and analyses, we would like to see you describe why you made the choices you did, as well as detailed explanations of how you made your improvements and what your challenges were. The presentation should be 10-15 slides summarizing what improvements you made, what results you achieved, and what you learned or what surprised you. This will be broken up into three 5-minute talks, one by each member of the team. Since there is only one grade, be sure that all of your team does a good job!