Assignment 6 - The Descent to Assembly

Due: Thursday, October 17, at 10:00pm

Starter code: a6.tar
Upload solutions via Gradescope as: puzzle1.asm through puzzle7.asm and one puzzleX.c file (see below)

Goals

This assignment is designed to help you with the following:

  • identifying patterns representing common structures from higher-level languages
  • tracing possible execution flows through assembly code

Collaboration policy

For this assignment, you may work alone or with a partner, but you must type up all of the answers yourself. (It is therefore unexpected for two submissions to be completely identical.)

You may also discuss the assignment at a high level with other students.

You should list any student with whom you discussed the assignment, and the manner of discussion (high level, partner, etc.) in comments at the top of your .asm or .c source file(s).

If you work alone, you should say so instead.

Assessment

To demonstrate proficiency, your submission needs to:

  • identify the structures used in each puzzle program
  • identify control flow after jump/call instructions
  • identify parameter counts and sizes

To demonstrate mastery, your submission needs to:

  • demonstrate proficiency
  • identify possible control-flow paths through the puzzle
  • reverse-engineer the C code to generate the assembly for one of the puzzles
  • include your name and collaboration statement at the top of your .asm and C source files

Background

What does a compiler do?

That question has a long, complicated answer. In brief, though, our compiler (gcc) takes C source files as input and produces an executable program as output. The executable program contains, among other things, machine language instructions whose behavior implements the computations articulated in the original C code.

Machine language is just bits, like anything else in the computer, but this means it’s hard to read. So, if we want to understand the correspondence between C code and its equivalent machine language, we’re better off asking gcc to output assembly language code instead. Assembly isn’t particularly easy to read either, but it’s a lot easier than machine language.

As a general rule, each assembly language instruction corresponds to exactly one machine language instruction, and vice versa. There are some exceptions (e.g., sometimes one assembly language instruction is an alias for a sequence of two or three machine language instructions), but as a rough guide, you can think of assembly and machine code as being in one-to-one correspondence. As a result, by understanding the assembly generated by gcc, we will be very close to understanding the machine code as well.

Assignment overview

For this assignment, you are going to practice understanding the correspondence between simple C code and its equivalent assembly language by solving a sequence of puzzles.

To test your knowledge, you are encouraged (although mostly not required to) try to “reverse engineer” the corresponding C code Although we could use gcc on mantis, we will start by using an extremely handy tool called the Compiler Explorer. You’ll put some C code into the input panel, and the output panel will show you the assembly generated by the selected compiler. As you adjust your C code, you’ll be able to watch the changes in the assembly, and then compare your assembly code to the puzzle’s code.

Your assignment

This assignment comprises seven puzzles, given in assembly. For each puzzle, you will read some given assembly and answer some questions about the corresponding C program.

Optionally, you will also need to reverse engineer one of the puzzle programs.

Proficiency

To demonstrate proficiency, you must answer some questions as comments in the assembly code for each puzzle. To answer these questions, you’ll need to complete the following:

  1. Fill in the comment at the top of the assembly file, indicating which structure(s) are present in the puzzle:
    • conditional branching (if, if/else)
    • switch statement
    • loop (for, while, do)
    • nested loop
    • function call
    • recursive function call
  2. For each # Next: TODO marked in the inline comments, provide the label(s) of the instruction(s) that may be executed immediately after the instruction on the TODO line.
  3. Fill in the comment at the top of the assembly file, listing the registers that are used for each function call’s parameters and the size (in bits) of each register.

Mastery

To demonstrate mastery, you will additionally need to explore the control flow of each puzzle. In a comment at the top of the assembly file, list three possible orders that labels may be executed in during a single invocation of the function function1.

The final piece of this assignment is to reverse engineer one program. For a puzzle of your choice in the given puzzles (puzzle1–puzzle7), create a C program that produces the assembly output provided. (Hint: You will likely find the Compiler Explorer helpful for this part.)

Note: Don’t forget to check the list at the top of this page to make sure you have everything you need (e.g., collaboration statement).

Getting started

You should refer to Lab 3 if you have any issues or questions about getting Compiler Explorer set up.

We will solve puzzle0.asm together in class on Monday October 14th. Ask any questions you have about this process!

In working through these puzzles, I encourage you to try out tons of simple C programs, for example ones with if statements, with for/while loops, and any other constructs you can think of, to see what they look like in assembly. Don’t be afraid to experiment!

Handing it in

To submit this assignment, you should upload your seven commented puzzleX.asm files to Gradescope.

If you are aiming to demonstrate mastery, make sure to include a puzzleX.c file with the appropriate name to match its puzzle (for example, submit puzzle2.c if you reverse-engineered puzzle2.asm).

Advice

  • Don’t try to understand every line of assembly! There are a lot of instructions, and this assignment is really just about the structure. Look at the jumps (jmp, jl, je, and any other that start with j) function calls (call), returns (ret), and labels (.L2:, start_here:).

  • Ask questions, and go in order!