CS152: Computer Architecture

Prof. Bob Brodersen

Lab Assignment #6
Final Report: Due Tuesday 04/23/2002
In-Class Oral presentation: TBA


Lab assignments are due at 11:59pm on the due date. No late labs will be accepted. Submit all the relevant files electronically. Make sure there is an HTML report file named "report.htm". After the assignment is graded, your TA will create a file named "score.htm" in the same directory, reporting the score and a short comment.


Description:

In this lab, you will be designing a memory system for your pipelined processor. The previous memory module was far from practical, as you will probably never have separate, dedicated memory banks for instructions and data. Using a realistic main memory system will cause two problems in your pipelined processor: first, your cycle time will increase dramatically as a result of the main memory read and write latency, and second, you must handle conflicts when data and instruction accesses occur in the same clock cycle. As you have most likely learned in lecture and in the book, the solution to these problems is the addition of cache memory. THIS LAB CAN BE EXTREMELY DIFFICULT, SO AN EARLY START WOULD BE A VERY GOOD IDEA. Also make sure you read through the whole lab before starting any design or implementation.

Problem 0: Team Evaluation

As before, you will be evaluating your team members' performance on the last lab assignment (Lab 5). Remember that points are not earned only by doing work, but by how well you work in a team. So if one person does all the work, that certainly does not mean he/she should get all the points!

You may give a total of 20 points per person in the group (besides yourself), but do not give any more than 30 points to one person. Submit your evaluations to your TA by email as soon as possible (no later than the Friday after this lab was assigned).

Problem 1: Adding Cache to Your Processor

Note: Read these instructions carefully. They contain all the information you will need to write your report for this lab, including some requirements not directly related to the cache and memory additions!

The first step in creating a new memory system is changing the model for our main off-chip memory bank (hereafter called the ZBT SRAM). The new ZBT (Zero Bus Turnaround) SRAM is very different from the on-chip (on-FPGA) memory you have been using. Please refer to the datasheet included in the package below for further information about how the ZBT SRAM works and for all timing specs. To include the ZBT SRAM module in your own design, follow the instructions below:

  1. Download the ZBT SRAM package (ZIP) to your local directory. The package contains the datasheet in PDF format. The main VHDL file is named MT55L256L32P.VHD. This is the behavioral model of the ZBT SRAM; do NOT try to synthesize it. Also included in the package are an example testbench for using the ZBT SRAM, test.vhd, and the DO file for use with ModelSim.
  2. Make sure you read the datasheet carefully. Run test.do in ModelSim to understand how the ZBT SRAM works.
  3. The ZBT SRAM VHDL file should be connected only through the VHDL testbench created by the System Generator. In the Simulink design, create the I/O gateways that connect to the ZBT SRAM. Then manually edit the VHDL testbench created by the System Generator to connect the ports to the ZBT SRAM VHDL module. Remember that each of the Simulink Xilinx gateway ports has a valid bit in the VHDL file. For output ports from the processor, the valid bits can be ignored; for input ports to the processor, the valid bits should be connected to logic '1'.
  4. Since the contents of the ZBT SRAM will be uninitialized, a boot ROM is necessary to boot the processor properly. The boot ROM is implemented as an on-FPGA ROM, 1K deep and 32 bits wide, with its logical address mapped to processor address 0x0. The off-chip ZBT SRAM's logical address is mapped to 0xFC000000, and STDOUT is mapped to 0xFFFF0000. The boot ROM should contain only the code needed to copy a portion of the boot ROM to the start of the off-chip ZBT SRAM and then jump to the beginning of the off-chip ZBT SRAM. In this way, the copied portion of the ROM will be executed from the ZBT SRAM.
  5. Make sure the whole memory system contains only the boot ROM, the off-chip ZBT SRAM, and STDOUT.
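
The address map in step 4 can be sketched as a small behavioral decoder, which may be handy as a reference model when checking your VHDL address decoding. This is only a sketch: the base addresses come from this handout, but the function name `decode`, the region sizes, and the choice of byte addressing are assumptions. In particular, the 256K x 32-bit SRAM size is inferred from the part name and should be checked against the datasheet.

```python
# Base addresses are from the handout; sizes below are ASSUMPTIONS.
BOOT_ROM_BASE = 0x00000000
BOOT_ROM_SIZE = 1024 * 4        # 1K words x 4 bytes, assuming byte addressing
ZBT_BASE = 0xFC000000
ZBT_SIZE = 256 * 1024 * 4       # assumed 256K x 32 bits; check the datasheet
STDOUT_ADDR = 0xFFFF0000


def decode(addr):
    """Return (device, byte_offset) for a 32-bit processor address.

    Hypothetical helper for illustration; not part of the lab package.
    """
    addr &= 0xFFFFFFFF
    if BOOT_ROM_BASE <= addr < BOOT_ROM_BASE + BOOT_ROM_SIZE:
        return ("boot_rom", addr - BOOT_ROM_BASE)
    if ZBT_BASE <= addr < ZBT_BASE + ZBT_SIZE:
        return ("zbt_sram", addr - ZBT_BASE)
    if addr == STDOUT_ADDR:
        return ("stdout", 0)
    # Step 5 says nothing else exists in the memory system.
    return ("unmapped", addr)
```

For example, `decode(0xFC000004)` selects the ZBT SRAM at byte offset 4, which under these assumptions is the second word copied out of the boot ROM.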

The next step is to design your cache. The only constraint on your design is that the total number of 32-bit words (excluding tag and status bits) in the two caches (data and instruction combined) has to be exactly 48. This means you can use them in any architecture and allocate them between instruction and data any way you like.
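
To see what a given allocation implies for your tag and index logic, it can help to work out the address fields before drawing any schematics. The sketch below assumes 32-bit byte addresses, a power-of-two line size, and a power-of-two number of sets; the function name `cache_fields` and its parameters are made up for illustration.

```python
def cache_fields(total_words, words_per_line, ways, addr_bits=32):
    """Return (offset_bits, index_bits, tag_bits) for one cache.

    Assumes byte addressing with 4-byte words (an assumption, not a
    requirement stated in the handout).
    """
    if total_words <= 0 or total_words % words_per_line != 0:
        raise ValueError("capacity must be a positive multiple of the line size")
    lines = total_words // words_per_line
    if lines % ways != 0:
        raise ValueError("number of lines must be a multiple of the ways")
    sets = lines // ways
    # Power-of-two checks keep the index and offset as plain bit slices.
    if sets & (sets - 1) or words_per_line & (words_per_line - 1):
        raise ValueError("sets and words per line must be powers of two")
    offset_bits = (words_per_line * 4).bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits


# One possible 48-word split: 32 words of instruction cache, 16 of data cache.
icache = cache_fields(32, 4, 1)   # direct-mapped, 4-word lines
dcache = cache_fields(16, 4, 2)   # 2-way set associative, 4-word lines
```

With these example parameters the instruction cache uses 4 offset bits, 3 index bits, and 25 tag bits, while the data cache uses 4 offset bits, 1 index bit, and 27 tag bits. The 32/16 split is only one example, not a recommendation.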

Once your cache is designed, you must test it thoroughly before moving on to the next part. You should also begin to evaluate how the addition of cache affects your critical path. Evidence of testing the cache by itself and a very detailed analysis of your critical path will be required in your report.

Problem 2: Adding a Single ZBT SRAM and Arbitration to Your Processor

After you have your cache ready to go, there is one more problem that needs to be fixed. Both the data and instruction caches will eventually need to access the ZBT SRAM at the same time. You must design an arbitration method for handling simultaneous memory requests. Depending on your cache timing and how efficient you try to be, this can be the most difficult part of the lab. You can choose one of the following three arbitration schemes:

  1. One controller: controls both caches, arbitration between the caches and the ZBT SRAM, and ZBT SRAM access. This controller has a delay of 15ns.
  2. Two controllers: each cache has its own controller with a delay of 6ns, and both arbitration and ZBT SRAM access are combined into one controller with a delay of 9ns.
  3. Three controllers: each cache has its own controller with a delay of 6ns, arbitration has its own controller with a delay of 3ns, and ZBT SRAM access has its own controller with a delay of 6ns.

It's entirely up to you which scheme you choose. There is no one scheme that is better than the others -- the right choice is a function of the structure of your processor and your design style.
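
The delays of the three schemes can be tabulated with a few lines of code, alongside a minimal model of what an arbiter has to decide. This is a sketch only: the delay figures are from the list above, but the names `SCHEMES`, `delay_summary`, and `arbitrate` are hypothetical, and giving the data cache priority over the instruction cache is just one possible policy, not a requirement of this lab.

```python
# Controller delays in ns, taken from the three schemes listed above.
SCHEMES = {
    "one controller": [15],
    "two controllers": [6, 9],        # cache controller, arbitration + SRAM
    "three controllers": [6, 3, 6],   # cache, arbitration, SRAM access
}


def delay_summary(delays_ns):
    """Return the summed delay and the longest single controller delay."""
    return {"total_ns": sum(delays_ns), "longest_ns": max(delays_ns)}


def arbitrate(icache_req, dcache_req):
    """Grant the ZBT SRAM to one requester per cycle.

    Fixed priority with the data cache first is an ASSUMPTION made for
    this sketch; round-robin or instruction-first are equally possible.
    """
    if dcache_req:
        return "dcache"
    if icache_req:
        return "icache"
    return None


for name, delays in SCHEMES.items():
    print(name, delay_summary(delays))
```

Every scheme sums to 15ns, but the longest single controller delay differs (15ns, 9ns, and 6ns). Which of the two numbers ends up on your critical path depends on where you place registers between the controllers, which is part of your design.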

Problem 3: In-class Oral Presentation

There will be a formal in-class oral presentation of Lab 6 one week after it is due. Each group has 10 minutes. The final results of the memory subsystem competition will also be unveiled at that time. All members of the group should present an equal amount of material. More details will be presented in lecture.

Wrap-up:

Your lab report must contain a description of (a) how your cache operates, (b) how you handle ZBT SRAM request arbitration, and (c) how the addition of cache and main memory affected your critical path. Somewhere in the report, you must calculate and describe the average memory access time (AMAT) of your processor for the quick sort program. In addition, you should include legible schematics, all relevant VHDL code, and all diagnostic programs (in assembly language). Also record all performance metrics (area, speed, power) and compare them with your Lab 5 results.
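
As a reminder, the usual textbook definition of AMAT is hit time plus miss rate times miss penalty. The sketch below applies that definition to split instruction and data caches by weighting each cache's AMAT by its share of accesses. The function names are hypothetical and the numbers in the example are placeholders, not measurements; substitute the counts you collect from your own quick sort simulation.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for one cache, in the units of hit_time."""
    return hit_time + miss_rate * miss_penalty


def combined_amat(i_accesses, i_amat, d_accesses, d_amat):
    """Access-weighted average over the instruction and data caches."""
    total = i_accesses + d_accesses
    if total == 0:
        raise ValueError("no memory accesses recorded")
    return (i_accesses * i_amat + d_accesses * d_amat) / total


# Placeholder values in cycles, for illustration only.
i_time = amat(hit_time=1, miss_rate=0.05, miss_penalty=8)
d_time = amat(hit_time=1, miss_rate=0.20, miss_penalty=8)
overall = combined_amat(i_accesses=1000, i_amat=i_time,
                        d_accesses=300, d_amat=d_time)
```

If your arbitration scheme makes one cache wait while the other is being refilled, that waiting time belongs in the miss penalty you use here.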

Extra Credit: Memory Subsystem Performance Competition

In order to get full credit, you must complete Problems 1-3 such that you have a working processor with caches and a unified ZBT SRAM main memory (and of course you must present your results according to the guidelines of the report). However, there are up to 5 extra credit points up for grabs if you want to try to design the absolute fastest memory system possible for the quick sort benchmark. The assembly code for this program is available to you in the V:\cs152\mystery directory.

The five fastest projects in the class will receive 1-5 points of extra credit based on their rank in the competition. The only restriction on what you can do to accomplish your goal is that your caches may not have a combined capacity of more than 48 words (not including tag and status bits). Otherwise, there are no holds barred. The competition will be judged based on a performance ratio that purely measures the memory subsystem. In other words, if you have some ugly baggage from Lab 5 (like a severe branch or hazard penalty), we will exclude that from the results.

Your group should meet early to decide on the organization of the cache, the memory interface, etc. Also keep in mind that ONLY WORKING PROJECTS CAN RECEIVE ANY EXTRA CREDIT. That said, here are some of the design decisions you should make:

  1. Instruction/data cache size: how should you split the 48 words between the two caches?
  2. What's the cache line size?
  3. For each cache, what's the cache organization? Direct-mapped, n-way set associative, or fully associative?
  4. If an associative cache is used, what's the replacement policy? Random, LRU, cyclical, etc.?
  5. For the data cache, what's the write policy? Write through or write back? Write allocate?
  6. Should there be a write buffer?
  7. Should there be prefetching?

As you might notice from the last two items on the list, there are a lot of sophisticated improvements that can be made to the memory system. Feel free to talk to any of the TAs about what tricks you can play with memory. You also may want to review the lectures for some advanced memory architectures that may have slipped by before. The results of the competition will be announced on the day of in-class presentations.
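
One inexpensive way to explore the first four decisions before committing to hardware is a small trace-driven model. The sketch below is a minimal set-associative cache with LRU replacement that only counts hits and misses. It is not the required design: the class name `Cache` is made up, byte addressing with 4-byte words is assumed, and write policy, write buffers, and prefetching are not modeled.

```python
from collections import OrderedDict


class Cache:
    """Hit/miss counter for a set-associative cache with LRU replacement."""

    def __init__(self, total_words, words_per_line, ways):
        lines = total_words // words_per_line
        self.line_bytes = words_per_line * 4   # assumes 4-byte words
        self.num_sets = lines // ways
        self.ways = ways
        # One ordered dict per set; the oldest entry is least recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return True on a hit, False on a miss."""
        block = addr // self.line_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)             # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)          # evict the LRU line
        lines[tag] = True
        self.misses += 1
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


# Two addresses that conflict in a direct-mapped 16-word cache.
trace = [0, 64, 0, 64]
direct = Cache(16, 4, 1)
two_way = Cache(16, 4, 2)
for a in trace:
    direct.access(a)
    two_way.access(a)
```

On this tiny trace the direct-mapped cache misses on every access, while the 2-way cache misses only on the first touch of each line. Feeding the model the address trace from your own quick sort simulation should give a first estimate of how different splits of the 48 words behave.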