Assembly to C
Package: asm-and-c-package.tar
Upload via Moodle as: asm-and-c.tar
NOTE: this assignment has two due dates. For Friday, Oct 17, please submit work for any three of the puzzles plus one reverse-engineered puzzle. For Wednesday, Oct 22, submit solutions for the whole thing (including whatever you submitted Friday).
You may work alone or with one other person for this assignment. I encourage you to collaborate on this one; it's much more fun to talk to each other and to have friends to share your befuddlement.
Goals
- Get familiar with x86_64 assembly language basics
- Identify assembly language patterns representing common structures from higher-level languages
- Trace possible execution flows through assembly code
Rubric
Background
What does a compiler do?
That question has a long, complicated answer. But in brief, our compiler (gcc)
takes C sources as input, and produces an executable program as output.
The executable program contains, among other things, machine language
instructions whose behavior implements the computations articulated
in the original C code.
Machine language is just bits, like anything else in the computer, which makes
it hard to read. So if we want to understand the
correspondence between C code and the machine language generated from it,
we're better off asking gcc to output assembly language code instead. Assembly isn't
particularly easy to read either, but it's a lot easier than machine language.
As a general rule, each assembly language instruction corresponds to exactly
one machine language instruction, and vice versa.
There are some exceptions (e.g., sometimes one assembly language instruction
is an alias for a sequence of two or three machine language instructions),
but as a rough guide, you can think of assembly and machine language instructions
as being in one-to-one correspondence. As a result, by understanding the
assembly language generated by gcc, we will be very close to understanding the
machine language as well.
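To make this concrete, here is a minimal sketch of a C function you could ask gcc to translate. The file name add_one.c and the function add_one are hypothetical and are not part of the assignment package. gcc's -S option stops after compilation and writes assembly language to a .s file instead of producing an executable.

```c
/* add_one.c: a hypothetical file, not part of the assignment package.
 *
 * Running `gcc -S add_one.c` stops after the compilation step and writes
 * assembly language to add_one.s instead of building an executable. */
int add_one(int x)
{
    /* A single C expression; expect only a handful of instructions. */
    return x + 1;
}
```

The exact instructions you get depend on the gcc version and optimization level, so treat whatever appears in add_one.s as one possible translation rather than the only one.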
Assignment overview
For this assignment, you are going to practice understanding the correspondence between simple C code and its equivalent assembly language by studying a sequence of puzzles.
For each puzzle, you will read some assembly language and try to identify the assembly structures that correspond to C-language structures like loops, if/else statements, function calls, etc. For a couple of the puzzles, you will also try to write C code that compiles to the puzzle's assembly code. Overall, for this assignment, you will be doing a simple form of reverse engineering.
To help us study assembly code, we will use an extremely handy tool called the Compiler Explorer. You'll put some C code into the input panel, and the output panel will show you the assembly language generated by the selected compiler. As you adjust your C code, you'll be able to watch the changes in the assembly language, and then compare your assembly code to the puzzle's code.
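As a warm-up, you might paste something like the following into the input panel. It is a hypothetical example (the name count_negatives is made up and it is not the source of any puzzle) that contains one loop and one conditional, two of the structures you'll be hunting for.

```c
/* Hypothetical warm-up for Compiler Explorer; not the source of any puzzle.
 * Counts how many of the first n entries of values are negative. */
int count_negatives(const int *values, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {   /* loop: look for a jump back to an earlier label */
        if (values[i] < 0) {        /* conditional: look for a compare near this point */
            count++;
        }
    }
    return count;
}
```

Depending on the optimization level, the compiler may translate the if into a compare followed by a conditional jump, or into branch-free arithmetic. Trying a few settings and watching the output change is a good way to build intuition.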
What should you do?
In the asm-and-c-package.tar package,
you will find several files named puzzle0.asm, puzzle1.asm, etc. For each
puzzle, your job will go like this:
- Study puzzleN.asm to understand what it does. Although it's good to try to understand it line-by-line, you will also want to understand the puzzle holistically. Each puzzle's code performs a computationally familiar task, and can be described in a single short sentence.
- Fill in the comment at the top of the puzzleN.asm file, indicating which structures are present in the puzzle:
  - conditional branching (if, if/else)
  - switch statement
  - loop (for, while, do-while)
  - nested loop
  - function call
  - recursive function call
- For each # Next: TODO marked in the inline comments, provide the label(s) of the instruction(s) that could possibly be executed immediately after the instruction on the TODO line.
- Fill in the Possible order comments at the top of the assembly file, giving three possible orders in which the labels may be executed during a single invocation of the function function1.
- Fill in the Registers and sizes comment at the top of the assembly file, listing the registers used for the parameters passed to function1, and explicitly indicating the sizes of the registers. For example, if the only parameter to function1 is found in %rdi and it is always referenced as %edi, you’d write something like "%edi (4 bytes)". If there are two parameters, with the first being an 8-byte parameter and the second being a 2-byte parameter, you’d write "%rdi (8 bytes)" and "%si (2 bytes)".
- For any two puzzles of your choice (not including puzzle0, which we will work through in class), write a C file puzzleN.c containing a function that, when entered into Compiler Explorer, produces the assembly language code found in puzzleN.asm. At the top of puzzleN.c, include a comment containing a one-sentence description of the purpose of function1.
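For a sense of the expected shape of a puzzleN.c file, here is a hypothetical sketch. It is not a solution to any puzzle. The function body and the parameter names p and delta are invented for illustration, and the register notes assume gcc targeting x86_64 Linux, where the first two integer or pointer parameters arrive in %rdi and %rsi.

```c
/* puzzleN.c (hypothetical illustration; not a solution to any puzzle)
 *
 * function1 adds a 2-byte delta to the 8-byte value that p points to
 * and returns the updated value.
 */
long function1(long *p, short delta)
{
    /* Assuming the x86_64 Linux calling convention:
     *   p     is a pointer (8 bytes), passed in %rdi
     *   delta is a short (2 bytes), passed in %si, the low 16 bits of %rsi
     * so the Registers and sizes comment for this function would read
     * "%rdi (8 bytes)" and "%si (2 bytes)". */
    *p += delta;
    return *p;
}
```

The comment at the top carries the one-sentence description, and the parameter types determine which register names and sizes show up in the generated assembly.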
NOTE: when you are writing your two puzzleN.c files, you may find that
some of the labels in the puzzleN.asm will not
show up in the assembly generated by Compiler Explorer. For example, puzzle4.asm
has labels CA, CB, CC, and CD, all
of which were added by the CS faculty after compiling was finished. Those extra labels
are there to help you trace through the code. Similarly, if you write code in Compiler Explorer
and it gives you the same code as one of the puzzles but its L* labels are numbered
differently than the puzzle's, that's fine. The numbers don’t
have to match exactly, but the labels should be in the right places.
Compiler Explorer settings
Set it up like this:
Hand it in
Combine your edited puzzleN.asm files and your two puzzleN.c files
in a tar file named asm-and-c.tar and submit it via Moodle.
One last note...
This is weird stuff at first. We'll spend all week in class giving you the tools you need to complete this assignment. Be patient and persistent. The pieces will fall into place before long.