CS 208: Computer Organization & Architecture

Takehome Exam #3

Submit as a PDF file via Moodle by 5:00PM Monday, March 16.

This exam doesn't quite fit the "short answers all on one page" model, so structure your answers however you wish, but please make an effort to keep your answers and explanations brief. There's nothing here that requires a lengthy explanation.

You may consult your notes, any book, and the Internet. You may not speak with any person other than Jeff Ondich, electronically or otherwise, about the content of this exam. If you obtain relevant information from any source other than yourself, cite your sources clearly.

  1. (12 points) Consider the following caches. Assume that addresses are 32 bits long. Also, as is our custom this term, assume that a "word" consists of 4 bytes.

    • An 8-row direct-mapped cache with 8-word blocks.

    • An 8-row 2-way set associative cache with 4-word blocks, using a "least recently used" strategy for choosing which block to evict.

    For each of these caches, answer the following questions.

    1. Draw a diagram of the cache. How many bits of memory does it use? (Please do not include or concern yourself with showing the memory required to implement the least-recently-used eviction strategy.)

    2. Suppose the byte located at each of the following addresses is requested in the indicated order:

      47, 645, 656, 48, 130, 655, 995, 49, 655, 128, 1030, 560, 1170, 49, 130

      Once this sequence of requests is finished:

      • Show a diagram of the contents of the cache. This diagram should show the valid, tag, and data values for each cache entry. For the data, just show the addresses of the bytes that are stored. For example, if a cache entry contains the data from byte addresses 64 through 95, just write "[64-95]" in the data field (since you have no way of knowing what data is actually stored in those addresses).
      • Report the hit ratio for this sequence of requests.

      Notes: (1) These addresses are byte addresses--so for example, byte 19 is contained in the word starting at address 16. (2) These hit ratios are very low, both because the caches start out empty, and because I have made no effort to make my numbers adhere to the principles of spacial or temporal locality.

  2. (6 points) Suppose that you are on a chip design team, and the people designing the cache system have done simulations of two cache layouts, yielding the following data:

    • Option A includes:
      • An L1 cache that takes 1 clock cycle for a hit. The L1's hit rate is 76%.
      • An L2 cache that takes an additional 2 clock cycles (for a total of 3) when there's a miss on L1 and a hit on L2. The L2's hit rate is 86% under the assumption that the L1 cache missed.
      • RAM, which takes an additional 12 clock cycles (for a total of 15) when both the L1 and L2 caches miss.
    • Option B includes:
      • An L1 cache that takes 1 clock cycle for a hit. The L1's hit rate is 68%.
      • An L2 cache that takes an additional 2 clock cycles (for a total of 3) when there's a miss on L1 and a hit on L2. The L2 cache's hit rate is 85% under the assumption that the L1 cache missed.
      • An L3 cache that takes an additional 4 clock cycles (for a total of 7) when there are misses on L1 and L2 and a hit on L3. The L3's hit rate is 97% under the assumption that both L1 and L2 missed.
      • RAM, which takes an additional 8 clock cycles (for a total of 15) when the L1, L2, and L3 caches all miss.

    You may assume that the choice between Options A and B has no effect on the clock rate or the performance of any of the other portions of the chip. (That's an unrealistic simplifying assumption, of course, but go ahead and assume it anyway.)

    Which option will you choose, and why? (Note that all of the points for this problem will be for the "why" and not the "which".)

  3. (12 points) This part of the exam concerns the first few pages of John von Neumann's First Draft of a Report on the EDVAC. Read it and answer the following questions. For the first three questions, which concern the historical context of this report, you may need to look elsewhere for information.

    1. Who besides von Neumann deserves credit for this report?
    2. What role did this report play in invalidating the ENIAC patent?
    3. During the 1940's, several women were heavily involved in the development and operation of the ENIAC. Name at least two of them, and briefly describe their roles in the project.
    4. Throughout the Draft Report, von Neumann refers to the main subdivisions of a computing system as CA, CC, I, O, R, and M. To what modern concepts do these six subdivisions refer?
    5. In section 1.3, von Neumann refers to "more numerical material...than the (final) results." In a modern computer, where does this "more numerical material" get stored?
    6. What does von Neumann means by the term elastic in section 2.3? He also advocates separating some things in the interests of elasticity. What does he want to separate, and how does this separation serve the goals of elasticity?
    7. Section 2.9 is about questions addressed in one of the chapters in Patterson and Hennessy. Which chapter?
    8. List briefly von Neumann's arguments in favor of binary rather than decimal computation. Are these arguments still valid?
    9. Why does von Neumann like vacuum tubes?
    10. Consider the last paragraph of section 5.6. In what ways is von Neumann's advice still valid, and in what ways is it not?
  4. (10 points) Let's take one more stroll down memory lane to visit our non-pipelined datapath Fig 4.17. As we have noted all along, there's one big unrealistic aspect of this diagram: the separation of Instruction Memory from Data Memory. Of course, you could build a system that works with this separation, but for the reasons outlined by von Neumann in section 2.5 of his Draft, we don't do that. Modern general-purpose computer systems put instructions and data into the same memory module.

    For this problem, your job is to show how to modify Fig 4.17 and its operation to accommodate a combined Instruction Memory / Data Memory. You should assume that each instruction takes two clock cycles. During the first clock cycle, the instruction is loaded from the shared memory into an Instruction Register, and during the second cycle, the instruction is executed as usual.

    Hand in a modified or redrawn version of Fig 4.17 showing the reorganization required to implement this two-cycle / one-memory datapath. You may add any new logic elements you need, but try to keep your modifications of 4.17 as simple and minimal as possible. You should also make clear how the control needs to be arranged to make the two-cycle process work.

  5. (2 points) No need for any more recommendations of books or music or jokes or movies or anything. Thanks for joining me this term. Have a great break.