CS 208: Computer Organization and Architecture

Takehome exam: due 3:00PM Wednesday, June 4

This is a an exam. You may consult your notes, any book, and the Internet. You may not speak with any person other than Jeff Ondich, electronically or otherwise, about the content of this exam. If you obtain relevant information from any source other than yourself, cite your sources clearly.

Justify your answers. (Note that "justify your answers" implies "show your work.") Have fun.

(12 points) Consider the following caches, each of which can hold up to one hundred twenty-eight 32-bit words of data. Assume that addresses are 32 bits long.
- A 32-entry direct-mapped cache with 4-word blocks.
- An 8-set 2-way set associative cache with 8-word blocks, using a "least recently used" strategy for choosing which block to evict.
For each of these caches, answer the following questions.
1. Draw a diagram of the cache. How many bits of memory does it use? (Please do not include or concern yourself with showing the memory required to implement the least-recently-used eviction strategy.)
2. Suppose the byte located at each of the following addresses is requested in the indicated order:
  
  47, 645, 660, 48, 130, 659, 995, 49, 655, 128, 1030, 560, 1170, 49, 130
  
  Once this sequence of requests is finished:
  - Show a diagram of the contents of the cache. This diagram should show the valid, tag, and data values for each cache entry. For the data, just show the addresses of the bytes that are stored. For example, if a cache entry contains the data from byte addresses 64 through 79, just write "[64-79]" in the data field (since you have no way of knowing what data is actually stored in those addresses).
  - Report the hit ratio for this sequence of requests.
  Notes: (1) These addresses are byte addresses--so byte 3 is contained in the word starting at address 0. (2) These hit ratios are very low, both because the caches start out empty, and because I have made no effort to make my numbers adhere to the principles of spacial or temporal locality.
(5 points) Suppose that you are on a chip design team, and the people designing the cache system have done simulations of two cache layouts, yielding the following data:
- Option A: an L1 cache that takes 1 clock cycle for a hit, an L2 cache that takes an additional 2 clock cycles (for a total of 3) when there's a miss on L1 and a hit on L2, and RAM, which takes an additional 12 clock cycles (for a total of 15) when both the L1 and L2 caches miss. The L1's hit rate is 73%, and the L2's hit rate is 92%.
- Option B: an L1 cache that takes 1 clock cycle for a hit, an L2 cache that takes an additional 2 clock cycles (for a total of 3) when there's a miss on L1 and a hit on L2, an L3 cache that takes an additional 4 clock cycles (for a total of 7) and RAM, which takes an additional 8 clock cycles (for a total of 15) when both the L1 and L2 caches miss. This L1's hit rate is 65%, the L2's is 87%, and the L3's is 96%.
You may assume that the choice between Options A and B has no effect on the clock rate or the performance of any of the other portions of the chip. (That's an unrealistic simplifying assumption, of course, but go ahead and assume it anyway.)

Which option will you choose, and why? (Note that all of the points for this problem will be for the "why" and not the "which".)
(12 points) Suppose you are running a MIPS program that has the following sequence of instructions currently in the pipelined datapath of Figure 4.56:

lw $t1, 44($sp) [currently in the MEM/WB pipeline register] sub $t2, $a0, $a1 [in EX/MEM] add $t3, $t1, $t2 [in ID/EX] addi $t3, $t3, -1234 [in IF/ID]

Your job, as in problem #1 on the previous exam, is to show the values of as many of the lines in the datapath as you can, given the information you have.

Assume:
- The lw instruction is located at address 0x00A00000.
- Each register K has value $K = 4*K, except for sp, which has value $sp = 0x20003000. Also, assume that these values have been unchanged during the time it has taken the lw instruction to get to the MEM/WB register.
- In data memory, the word at address N (assuming N is a multiple of 4) has value N - 5. Again, assume these values have remained unchanged during the time it has taken the lw instruction to get to the MEM/WB register.
- Multiplexer choices are always numbered from 0 at the top. For example, each of the 3-input multiplexers in the EX portion of the datapath has choice 0 at the top, 1 in the middle, and 2 at the bottom.
(15 points) This part of the exam concerns the first few pages of John von Neumann's First Draft of a Report on the EDVAC. Read it and answer the following questions. For the first three questions, which concern the historical context of this report, you may need to look elsewhere for information.
1. In an earlier edition of your textbook, Patterson and Hennessy note that von Neumann's report report was one of several things that attorneys used to break the Mauchly/Eckert patent on the computer. This point is also made on the brief Wikipedia page about the EDVAC report. What besides this report was relevant in the patent case?
2. Who besides von Neumann deserves credit for this report?
3. During the 1940's, several women were heavily involved in the development and operation of the ENIAC. Name at least two of them, and briefly describe their roles in the project.
4. Throughout the Draft Report, von Neumann refers to the main subdivisions of a computing system as CA, CC, I, O, R, and M. To what modern concepts do these six subdivisions refer?
5. What does von Neumann means by the term elastic in section 2.3? He also advocates separating some things in the interests of elasticity. What does he want to separate, and how does this separation serve the goals of elasticity?
6. There is a sentence in section 2.9 that begins with "As to (h) (sorting and statistics), the situation is somewhat ambiguous...." This sentence hints at the subject of one of the chapters in Patterson and Hennessy. Which chapter?
7. Clarify the first paragraph of section 4.1. You might find it handy to include an example or two using modern gate language.
8. Summarize von Neumann's arguments in favor of binary rather than decimal computation. Are these arguments still valid?
9. Why does von Neumann like vacuum tubes? (Vacuum tubes, by the way, are devices that can be used to perform switching operations like those done by transistors. Thus, you could build gates out of vacuum tubes. Vacuum tubes tended to burn out like light bulbs do, so it was common for drug stores and hardware stores to sell them. When I was a kid, our neighborhood drug store had a kiosk where you could bring in a burnt-out tube and stick it into the plugs on the kiosk to figure out which replacement tube would work. This was necessary if you wanted your TV to keep working.)
10. Consider the last paragraph of section 5.6. In what ways is von Neumann's advice still valid, and in what ways is it not?
(16 points) Suppose you have a small machine/assembly-language architecture consisting of:
- An instruction memory (IM) containing your program. It consists of an array of 20-bit instructions, which addressed consecutively. (That is, the first instruction is at address 0, the next is at address 1, etc. No multiples of 4 required here.)
- A 16-bit program counter (PC).
- Data memory (DM) consisting of a sequence of 16-bit integers. Again, these are addressed consecutively--bytes don't have their own addresses, so for example, the third 16-bit word in memory is located at address 2, not address 4.
- A 16-bit stack pointer (SP). At the start of execution, SP = 0. A push operation will put a value at DM[SP], and then increment SP by 1. A pop operation will decrement SP by 1. That is, the stack in DM grows from address 0 towards higher addresses.
- A an input device (I) that allows the user to specify a single 16-bit integer.
- An output device (O) that shows the value of DM[SP], that is, the integer in data memory pointed to by the stack pointer.
Each instruction has the format:

op code [4 bits] immediate data N [16-bit two's complement]]

The op-codes are as follows.

Op-code 0000: LOAD. Pushes the value of I onto the stack. Op-code 0001: PUSH N. Pushes the value N onto the stack. Op-code 0010: POP. Decrements SP by 1. Op-code 0011: ADD. Pops the top two items from the stack and then pushes their sum onto the stack. Op-code 0100: SUB. Pops the top two items from the stack (A, then B) and then pushes A - B onto the stack. Op-code 0101: BR N. Adds N to the PC. Op-code 0110: BEQ N. If the top two items on the stack are equal, PC = PC + N Op-code 0111: BLT N. If the top item on the stack is strictly less than the second item, PC = PC + N Op-code 1000: HLT. Halts execution. Op-code 1001: GET N. Pushes a copy of DM[SP - N] onto the stack (without removing the original). Op-code 1010: PUT N. Copies the stack's top item to DM[SP - N].

I recommend that you use a module like Fig 4.17's Registers for your DM. That is, you'll want something that has two separate Read Address lines and a Write Address line that's independent of the Read Addresses. You need not show the internal structure of such a module.

Here are your jobs.
1. Write a program in this language that will (yes, again) compute the Nth Fibonacci number, where N is specified in the I input device. (Index the Fibonacci numbers starting at 0, so F0 = 1, F1 = 1, F2 = 2, F3 = 3, F4 = 5, etc.)
2. Design a datapath and control to implement this machine architecture. Hand in a drawing of the datapath, a chart showing how op-codes would map to control values, and any explanations you deem appropriate.
(3 points) You don't have to recommend any books or tell me jokes or anything. Just have a great break and beyond.