CS2630: Computer Organization Project 1
MiniMa: (mini) MIPS assembler written in C
Goals for this assignment
• Translate MAL instructions to TAL, and TAL to binary
• Resolve the addresses of branch and jump labels
• Build an assembler
Before you start
This project is fairly involved, so start early. To be prepared, you should:
• review the readings from Ch 6
• complete the Stored Programs inquiry activity
• take the Knowledge Check on pseudoinstructions
Read the document as soon as you can and ask questions in Debug Your Brain/discussion board/office hours.
The testing infrastructure is designed so that it is possible to get MiniMa working one phase at a time (there are 3 phases), so you can pace yourself.
So that we know who is working together: you must declare your partner at https://uiowa.instructure.com/courses/101698/pages/sign-up-for-project-1-partner
Rubric
See the assignment on ICON for the rubric. Notice that non-compiling programs and programs that don’t pass any tests earn 0 points. Therefore, we cannot stress enough that you should approach this project incrementally. Consider the following stories:
• Team 0xBA51C: got a few tests working each day, running the code frequently to check.
They ended up passing all tests except a couple tricky cases and got an A (> 90 %).
• Team 0xCOFFEE: wrote all functionality in one all-night coding fever. They are pretty sure they covered every single case for all 3 phases. However, they never compiled or ran their code until the due date. When they did, they found it didn’t even compile.
They didn’t have time to fix it and had to turn it in as is. Grade: F (0%).
Setup
You can git clone (or download zip) the project from https://research-git.uiowa.edu/cs2630-assignments/minima-c
Log into this website using your hawkid and password.
A note on collaboration: We encourage you and your partner to create a https://research-git.uiowa.edu/ repository to collaborate on your code more effectively. If you do so, you must mark your repository Private and add your partner by going to Members and inviting your partner by hawkid.
Repositories marked Internal or Public will be considered an intent to cheat by sharing code with other students.
Introduction
In this project, you will be writing some components of a basic assembler for MIPS, called MiniMa (mini MIPS assembler). You will be writing MiniMa in Java.
The input to MiniMa is an array of Instruction objects. The Instruction class has several fields indicating different aspects of the instruction. MiniMa does not include a parser to turn a text file into Instruction objects, so you will write programs using the functions in InstructionFactory, which creates Instructions.
struct Instruction { enum ID instruction_id; // id indicating the instruction uint8_t rd; // register number destination
uint8_t rs; // register number source
uint8_t rt; // register number secondary source
int32_t immediate; // immediate, may use up to 32 bits
uint32_t jump_address; // jump address
uint8_t shift_amount; // shift amount
char label[MAX_LABEL_LENGTH]; // label of the line this Instruction appears on char branch_label[MAX_LABEL_LENGTH]; // label used by branch or jump instructions
};
MiniMa has three basic phases for translating a MIPS program into binary. The next three sections describe these phases. The section after that, “What you need to do”, will describe your job in Project 1.
1. Convert MAL to TAL
In this phase, MiniMa converts any pseudo instructions into TAL instructions. Specifically, MiniMa creates a new output array of Instruction objects and stores the TAL instructions into it in order. For any true instruction in the input, MiniMa just copies it from the input to the output. For any pseudo instruction, MiniMa writes 1-3 real instructions into the output.
Examples:
Ex a)
label2: addu $t0,$zero,$zero
This instruction will be provided to you as the Instruction object:
Instruction_id rd rs rt imm jump_address shift_amount label branch_label addu 8 0 0 0 0 0 "label2" 0
Because this instruction is already a TAL instruction, you will just copy it into the output array.
Ex b)
blt $s0, $t0, label3
This pseudo instruction, will be provided to you as the instruction object:
Instruction_id rd rs rt imm jump_address shift_amount label branch_label
blt 8 0 0 0 0 0 "" "label3"
Note that label="" because the line the instruction was on is unlabeled.
This instruction is a pseudo instruction, so we must translate it to TAL instructions. In this case:
slt $at,$s0,$t0
University of Iowa, 2020
bne $at,$zero,label3
Which you will represent with the following two Instruction objects.
Instruction_id rd rs rt imm jump_address shift_amount label branch_label
slt 1 16 8 0 0 0 "" ""
Instruction_id rd rs rt imm jump_address shift_amount label branch_label
bne 0 1 0 0 0 0 "" "label3"
We used $at (the assembler register) to store the result of the comparison. Since MIPS programmers are not allowed to use $at themselves, we know we can safely use it for passing data between generated TAL instructions.
IMPORTANT: notice that branch instructions have Immediate=0 in phase 1. Instead, they specify the target using branch_label. In phase 2, the branch_label will get translated into the correct immediate.
You must also make sure that you detect I-type instructions that use an immediate using more than the bottom 16 bits of the immediate field and translate them to the appropriate sequence of instructions.
This is just the end of the Part 1 background information. There are no tasks to do yet, so keep reading!
2. Convert labels into addresses
This phase converts logical labels into actual addresses. This process requires two passes over the instruction array.
- Pass one: find the mapping of labels to the PC where that label occurs
- Pass two: for each instruction with a non-zero branch_label (jumps and branches) calculate the appropriate address using the mapping.
Example before phase2: branch target for branch instructions indicated using branch_label field
Address Label Instruction Important instruction field values
0x00400000 label1: addu $t0,$t0,$t1 label="label1"
0x00400004 ori $t0,$t0,0xFF
0x00400008 label2: beq $t0,$t2,label1 label="label2", branch_label="label1"
0x0040000C addiu $t1,$t1,-1
0x00400010 label3: addiu $t2,$t2,-1 label="label3"
after phase2: branch target for branch instructions indicated using immediate field
Address Label Instruction Important instruction field values
0x00400000 label1: addu $t0,$t0,$t1
University of Iowa, 2020
0x00400004 ori $t0,$t0,0xFF
0x00400008 label2: beq $t0,$t2,-3 immediate = -3
0x0040000C addiu $t1,$t1,-1
0x00400010 label3: addiu $t2,$t2,-1
This is just the end of the Part 2 background information. There are no tasks to do yet, so keep reading!
3. Translate instructions to binary
This phase converts each Instruction to a 32-bit integer using the MIPS instruction encoding, as specified by the MIPS reference card. We will be able to test the output of this final phase by using MARs to translate the same input instructions and compare them byte-for-byte.
To limit the work you have to do, MiniMa only needs to support the following instructions:
Instruction
addiu (whether it is MAL depends on imm)
sll
addu
or
beq
bne
slt
lui
move (always MAL)
ori (whether it is MAL depends on imm)
blt (always MAL)
bge (always MAL)
This is just the end of the Part 3 background information. Your tasks begin in the next section.
What you need to do
1. You will complete the implementation of phase 1 by modifying the file Phase1.c. Do not modify Phase1.h.
/* Translates the MAL instruction to 1-3 TAL instructions
* and returns the TAL instructions in a list
*
* mals: input program as a list of Instruction objects
* tals: after the function returns, will contain TAL instructions (should be same
size or longer than input list)
*/
void mal_to_tal(struct ArrayList *mals, struct ArrayList *tals);
If a MAL Instruction is already in TAL format, then you should just copy that Instruction object into your output list.
• It is strongly discouraged to modify the fields of an Instruction object individually.
Instead you should do the below.
• To create an Instruction with specific fields, see the functions in InstructionFactory.h. Find examples of their usage in AssemblerTest.c.
If a MAL Instruction is a pseudo-instruction, such as blt, then you should create the TAL Instructions that it translates to in order in result ArrayList.
You must check I-type instructions for the case where the immediate does not fit into 16 bits and translate it to lui, ori, followed by the appropriate r-type instruction. Remember: the 16-bit immediate check does not need to be done on branch instructions because they do not have immediates in phase 1 (see phase 1 description above).
Use the following translations for pseudo instructions. These translations are the same as MARS uses.
I. Instruction passed to mal_to_tal:
addiu r1,r2,Immediate
# when immediate is too large (note that addi/addiu RTL has sign
extension)
=>
Instructions returned from mal_to_tal:
lui $at,Upper 16-bit immediate
ori $at,$at,Lower 16-bit immediate
addu r1,r2,$at
Note that lui will never be given an immediate too large because it is not well-defined for more than 16 bits (MARS also disallows lui with >16-bit immediate, try it yourself).
II. Instruction passed to mal_to_tal:
blt r1,r2,gohere
=>
Instructions returned from mal_to_tal:
slt $at,r1,r2
bne $at,$zero,gohere
III. Instruction passed to mal_to_tal:
bge rs,rt,hello
=>
Instructions returned from mal_to_tal:
slt $at,r1,r2
beq $at,$zero,hello
University of Iowa, 2020
IV. Instruction passed to mal_to_tal:
ori r1,r2,Immediate
# when immediate is too large (note that ori RTL has zero extension)
=>
Instructions returned from mal_to_tal:
lui $at,Upper 16-bit immediate
ori $at,$at,Lower 16-bit immediate
or r1,r2,$at
V. Instruction passed to mal_to_tal:
move r1,r2
=>
Instructions returned from mal_to_tal:
addu r1, $0, r2
NOTES:
If in doubt about a case of translation there are things you can do before asking for help.
1) examine the test cases, and
2) assemble the test program in MARS to see what it produces
You only need to support the instructions that are already uncommented in Instruction.c. You should not change Instruction.c or InstructionFactory.c.
2. You will complete the implementation of phase 2 by implementing the 2-pass address resolution in a function called resolve_addresses in Phase2.c. Do not modify Phase2.h.
/* Returns a list of copies of the Instructions with the
* immediate field (i-type) or jump_address (j-type) of the instruction filled in
* with the address calculated from the branch_label.
*
* The instruction should not be changed if it is not a branch or jump instruction.
*
* unresolved: input program, whose branch/jump instructions don't have resolved immediate/jump_address
* first_pc: address where the first instruction of the program will eventually be placed in memory
* resolved: after the function returns, resolved contains the resulting instructions
*/
void resolve_addresses(struct ArrayList *unresolved, uint32_t first_pc, struct ArrayList
*resolved);
Using our example from the phase 2 description:
Address Label Instruction Important instruction field values
University of Iowa, 2020
0x00400000 label1: addu $t0,$t0,$t1 label="label1"
0x00400004 ori $t0,$t0,0xFF
0x00400008 label2: beq $t0,$t2,label1 label="label2", branch_label=1
0x0040000C addiu $t1,$t1,-1
0x00400010 label3: addiu $t2,$t2,-1 label="label3"
The first_pc argument is the address where the first instruction in unresolved would be written to memory after phase 3. Using the above example, resolve_addresses would be called with first_pc=0x00400000.
Refer to the earlier description of phase 2 for how to calculate the immediate field.
3. You will complete the implementation of phase 3 by implementing the function translate_instruction in Phase3.c. Do not modify Phase3.h.
/* Translate each Instruction object into
* a 32-bit number.
*
* tals: list of Instructions to translate
*
* returns a list of instructions in their 32-bit binary representation
*
*/
public static List translate_instructions(List tals)
This function produces an encoding of each R-type, I-type, or J-type instruction. Refer to the
MIPS reference sheet for format of the 32-bit format.
Make sure to set unused fields to 0. This default value is not required by the MIPS architecture, but it is required by the provided MiniMa test code. You'll also notice that MARS chooses to use 0 for unused fields.
General implementation help
• .h files are header files, which declare symbols (functions, structs, enums) that are defined elsewhere in a .c file. The idea of putting declarations in an .h file and definitions in a .c file is roughly analogous to putting declarations in an interface and definitions in a class in Java.
• If you need a function or struct declared in a .h file, then put the following at the top of your .c file.
#include “filename.h”
For example, if your code needed to refer to struct Instruction, then put
#include “Instruction.h”
• We’ve structured the skeleton code so that you only need to modify…
o Phase1.c
o Phase2.c
o Phase3.c
o AssemblerTest.c (optional but recommended)
Running and testing your code
The three phases are run on several test inputs by …
make
./AssemblerTest
The provided test cases separately test the three Phases. Therefore, you'll have good evidence for a correct Phase when all tests named by that Phase # are passing.
You can add your own tests to AssemblerTest.java. Use an existing test as a template. If you don't want to manually calculate what phase3_expected should be for your test, you can letMARS do it for you: assemble your input program in MARS and look at the Code column in the Text Segment.
You should add additional test cases to AssemblerTest.java. Find corner cases the tests do not cover:
• other input instructions
• I-type instructions that do or do not exceed 16 bits
• different label positions
• different combinations of instructions
You are responsible for testing your assembler beyond the given tests. We will use additional tests during grading.
Testing tips
When you run the tests, you’ll get a message like.
Assertion failed: (size(expectedP1) == size(tals)), function testHelperPhase1, file
AssemblerTest.c, line 18.
Since many tests call testHelperPhase1, this information alone is not enough to know which test failed. You’ll need a stack trace. To get one, use gdb. Here’s an example interaction.
gdb ./AssemblerTest
(gdb) run
(gdb) bt
#0 0x00007ffff7deee35 in raise () from /lib64/libc.so.6
#1 0x00007ffff7dd9895 in abort () from /lib64/libc.so.6
#2 0x00007ffff7dd9769 in __assert_fail_base.cold () from /lib64/libc.so.6
#3 0x00007ffff7de7566 in __assert_fail () from /lib64/libc.so.6
#4 0x000000000040249d in testHelperPhase1 (input=0x40b2a0, expectedP1=0x40b650) at
AssemblerTest.c:18
#5 0x0000000000402c17 in test1Phase1 () at AssemblerTest.c:80
#6 0x00000000004067b0 in main () at AssemblerTest.c:1174
Aha! It was test1Phase1 that failed. From here you could examine the variable values by adding print statements or by continuing your gdb interaction. Tip: the frame command in gdb takes an argument N and goes to the function call labeled #N in the stack trace.
(gdb) frame 4
#4 0x000000000040249d in testHelperPhase1 (input=0x40b2a0, expectedP1=0x40b650) at
AssemblerTest.c:18
18 assert(size(expectedP1) == size(tals));
(gdb) display size(expectedP1)
1: size(expectedP1) = 7
(gdb) display size(tals)
2: size(tals) = 0
What to submit
For credit your MiniMa implementation must at least compile and be runnable. You should not depend on modifications to files other than those you are submitting:
Required:
• Phase1.c (method implemented)
• Phase2.c (method implemented)
• Phase3.c (method implemented)
Both partners are responsible for double-double-checking that 3 files submitted to ICON are the version of the files that you intended.
Good job!
Once you have completed this project, if you added support for the rest of the MIPS instructions and the dot (.) directives you could replace the assembler of MARS (i.e., the code behind this button ).
The only piece we’ve excluded is the parser that turns text files of MIPS code into our internal representation of instructions: a list of Instruction objects.