Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
COMS W4115 Fall 2024: Written Assignment 3
Programming Languages and Translators
Deadline: December 2nd, 2024 by 11:59 PM
Overview
This homework assignment will test your knowledge of semantic analysis, runtime environments, and control flow analysis, all of which you have learned in class. Please submit your assignment via Gradescope by the deadline. You must adhere to the academic integrity policies of the course.
Problems
1 #include <s t d i o . h>
2
3 int x = 3 , r = 5 ;
4
5 int f o o ( f l o a t r ) {
6 return x + 2 ∗ r ;
7 }
8
9 int bar ( f l o a t l , f l o a t m) {
10 return m − l − x ;
11 }
12
13 int pltIsAwesome ( int something ) {
14 int a = 2 , b = 1 , c = 3 , x ;
15 fo r ( x = 0 ; x < 2 ; x++) {
16 b −= f o o ( x ) ;
17 f l o a t something = 2 . 5 ;
18 i f ( x < 1 ) {
19 f l o a t x = 2 , r = x + 1 0;
20 something −−;
21 }
22 e l s e {
23 int x = 3 ;
24 b += 1 ;
25 }
26 something −= r ;
27 int r = 2 ;
28 a += bar ( f o o ( something ) , x − r ) ;
29 }
30 b += 2 ;
31 return a ∗ b − x − something ;
32 }
33
34 int main ( ) {
35 p r i n t f ( "%d\n" , f o o ( pltIsAwesome ( 5 ) ) ) ;
36 return 0 ;
37 }
1. (24 Points) One way to identify redefined variables is to use symbol tables to store information about variable definitions and the scopes that house them. Used by the compiler, symbol tables are map-like
Table 1: Global Symbol Table for Problem 1
data structures that contain important information about variables, functions, and classes, allowing for powerful operations such as scope resolution and type checking of code. You will now have an opportunity to work with symbol tables in this problem.
Suppose you are given the C code snippet on the previous page.
-
(2 Points) Identify the different scopes of function pltIsAwesome in the code snippet. Label scopes in the following form: “Scope X : [Beginning Line Number, Ending Line Number ].” X is a numerical value (starting from 1) that labels your scope, Beginning Line Number is the line number of the first line in the scope, and Ending Line Number is the line number of the last line in the scope. For example, if the second scope you identified is defined from line 2 to line 4, inclusive, then you should include “Scope 2: [2, 4]” in your response. Assume that all scopes are static. Consider function pltIsAwesome only; don’t worry about functions main, foo, or bar.
-
(5 Points) For each scope that you identified in part (a) for function pltIsAwesome, construct its symbol table. Each table must be titled, “Scope X ” (where X is the scope label you created in part (a)), and have the following four columns: Name, Kind, Type, and Line Number ; make sure to identify all of these attributes for each identifier you add to the symbol table. Note that the identifier Kind will be either “parameter” or “variable” for this part. Further, you do not need to worry about global identifiers (global variables and function names) when constructing your symbol table, but you do need to consider function parameters. We have already created the global symbol table for you in Table 1. Finally, assume that all scopes are static. Hint: only include variables defined in a scope as part of the symbol table for that scope.
-
(3 Points) Which definition of variable r does the one referenced at line 26 correspond to? Explain how you can use the symbol tables you constructed in part (b), as well as the global symbol table we have provided, to help you identify this definition.
-
(3 Points) Suppose we are still considering only function pltIsAwesome and are currently at line 19 in the code snippet (we have not scanned the rest of the function yet). Which identifiers (parameters and variables, including global variables if applicable) are active at the point of the program immediately after line 19 is scanned? Active identifiers are those that are currently being referenced at a particular point of the program. List the active identifiers as tuples of the form “(Name, Kind, Type, Line Number ).” You do not need to explain the identifiers you selected.
-
(3 Points) Suppose we are instead at line 26. Which identifiers are active at the point of the program immediately after line 26 is scanned? List the active identifiers just as in part (d).
-
(4 Points) Now, consider all functions in the code snippet. Assuming that the entire code snippet is still statically scoped, compute the following values:
• The value returned by ‘foo‘ the first time it is called on line 16.
• the values of ‘a‘, ‘b‘, ‘x‘ and ‘something‘ at the end of the first iteration of the ‘for‘ loop.
• the values of ‘a‘, ‘b‘, ‘x‘ and ‘something‘ at the end of the second iteration of the ‘for‘ loop.
• the value printed at line 35 after executing main
(4 Points) Once more, consider all functions in the code snippet. Now compute the above values assuming that the entire code snippet is dynamically scoped.
32. (12 Points) Suppose you are given the following C code snippet:
1 Cl a s s PLT {
2
3
int f o o ( int w) {
4
i f (w < 2 )
5
return h e l l o (w, w + 1 ) + 3 ;
6
return bar (w − 2 ) ;
7 }
8
9
int bar ( int x ) {
10
return f o o ( x ) ∗ f o o ( x − 1 ) ;
11 }
12
13
int h e l l o ( int y , int z ) {
14
return y ∗ z ;
15 }
16
17
int main ( ) {
18 bar ( v al ) ;
19
return 0 ;
20 }
21 }
(a) (3 Points) Draw the activation tree generated by executing main when val is 3.
(b) (9 Points) Using a runtime stack, show the activation records generated by executing main when val is 1. You only need to include the old frame pointer, return address, returned value, and function argument(s); don’t worry about temporary variables. You can expand the stack in either direction, as long as you are clear about the direction of expansion in your answer.
3. (20 Points) Suppose you are performing control flow analysis on the following code snippet:
1 e n t r y
2 w := 2
3 x := 3 − w
4 L1 : y := x + y
5
i f w < y goto L2
6 x := x − w
7 z := w ∗ x
8
i f z > 3 goto L3
9
e l s e goto L1
10 L2 : w := x ∗ y
11 y := z − x
12
i f y < 5 goto L3
13
e l s e goto L2
14 L3 : w := x ∗ 2
15 x := w + z
16
return
(a) (6 Points) Suppose that we treat each instruction in the code as a separate node (except goto, which should be an edge), e.g., w := 2 at line 2 and x := 3 - w at line 3 are separate nodes.
Given these nodes, draw the control flow graph (CFG1
) for this code snippet. Note once more that each node in the CFG should represent exactly one statement/instruction in the code snippet.
Make sure to also label your nodes with numbers, as you will need to reference them for the next few parts. You can assume that the return block is the exit block. Hint: Not every line has to be a node. Think about how you should handle "else goto xx" appropriately
(b) (3 Points) List the basic blocks generated from part (a), using as few basic blocks as possible.
The basic blocks should be represented as a sequence of line numbers surrounded by curly braces.
For example, if node 20-24 form a basic block, then you should include {20, 21, 22, 23, 24} in your response. Similarly, if only node 36 forms a basic block, then you should include {36} in your response.
(c) (4 Points) In a CFG, a node x dominates node y if all paths from the entry node to node y include node x. Node x is also referred to as the dominator of node y. List the dominator sets for each node y in your CFG from part (a), where a dominator set is the set of nodes that dominate node y. A dominator set should be represented as a set of node labels (which you added in part
(a)) surrounded by curly braces. For example, if nodes r, t, and x dominate node y, then you should include “y: {r, t, x}” in your response.
(d) (5 Points) In a CFG, a node x strictly dominates node y if all paths from the entry node to node y include node x, and x = y. We define the dominance frontier of a node x as the set of all nodes y in the CFG such that node x dominates an immediate predecessor of node y, but node x does not strictly dominate node y. In other words, the dominance of node x is said to be terminated at its dominance frontier. Given your CFG from part (a), is there any node that has a non-empty dominance frontier? If there is, provide one example of such a node in your CFG by referencing its label, and identify its dominance frontier. The dominance frontier should be represented as a set of node labels surrounded by curly braces. For example, if nodes r, t, and x comprise the dominance frontier of node y, then you should include “y: {r, t, x}” in your response. If there is no such node, explain why all nodes have an empty dominance frontier.
(e) (2 Points) In a CFG, a back edge is an edge such that the target node dominates the source node. List all back edges in your CFG from part (a). Back edges should be represented as “y → x,” where y represents the label of the source node, and x represents the label of the target node.
4. (9 Points) Rachel has a stack machine with an accumulator. She wants to know if it is possible to compute the expression E = (12 − 3) ∗ (8 + 2). Demonstrate to Rachel that it is possible to use the accumulator to compute E by drawing a table that shows the steps of the evaluation, including the contents of the stack and the accumulator. Your table should contain the following columns: Step, 1
Not to be confused with the acronym for a context-free grammar, which is also CFG.
Table 2: Stack Machine Example for Problem 1
Step
Accumulator
Stack
acc ← 2
2
<init>
push
2
2, <init>
acc ← 3
3
2, <init>
acc ← acc ∗ top
6
2, <init>
pop
6
<init>
Accumulator, and Stack. For Step, you can only use acc ← int (acc represents the accumulator, and
int represents an integer), push (pushes a node from acc to the top of the stack), pop (pops off a node
from the top of the stack), top (represents the top of the stack), and the mathematical operators (+,
−, and ∗) in the expression.
For example, suppose you were computing the expression F = 2 ∗ 3. You would then create a table
similar to Table 2.
65. (25 Points) Here is a C program:
#include <s t d i o . h>
int main ( ) {
int a ;
s c a n f ( "%d" , &a ) ;
a = a + a ;
int b = 5 ;
int c = a ∗ b ;
b = a ;
int d = b ∗ 5 ;
c = a + d ;
p r i n t f ( "%d" , c ) ;
return 0 ;
}
(a) (4 Points) Convert the body of the function to SSA (Single Static Assignment) form. Every
variable must have a subscript, starting with 0. Example - scanf("%d", a0);
When a variable is assigned to for a second time, create a new variable with a new subscript.
Refer to the class slides for more information.
If you are typing this out in Latex, for simplicity you can just use suffixes instead of subscripts,
like a0.
You can omit the variable declarations (like int a;). You can also omit the include and function
header. Your code can start with the scanf statement.
(b) (4 Points) Now look at the following LLVM code corresponding to the above C code, generated
using no optimization (-O0).
d e f i n e i 3 2 @main ( ) #0 {
%1 = a l l o c a i 3 2 , a l i g n 4
%2 = a l l o c a i 3 2 , a l i g n 4
%3 = a l l o c a i 3 2 , a l i g n 4
%4 = a l l o c a i 3 2 , a l i g n 4
%5 = a l l o c a i 3 2 , a l i g n 4
s t o r e i 3 2 0 , i 3 2 ∗ %1, a l i g n 4
%6 = c a l l i 3 2 ( i 8 ∗ , . . . ) @scan f ( i 8 ∗ g e t el em e n t p t r inbounds ( [ 3 x i 8 ] , [ 3 x i 8 ] ∗
@. s t r , i 6 4 0 , i 6 4 0 ) , i 3 2 ∗ %2)
%7 = l o a d i 3 2 , i 3 2 ∗ %2, a l i g n 4
%8 = l o a d i 3 2 , i 3 2 ∗ %2, a l i g n 4
%9 = add nsw i 3 2 %7, %8
s t o r e i 3 2 %9, i 3 2 ∗ %2, a l i g n 4
s t o r e i 3 2 5 , i 3 2 ∗ %3, a l i g n 4
%10 = l o a d i 3 2 , i 3 2 ∗ %2, a l i g n 4
%11 = l o a d i 3 2 , i 3 2 ∗ %3, a l i g n 4
%12 = mul nsw i 3 2 %10, %11
s t o r e i 3 2 %12, i 3 2 ∗ %4, a l i g n 4
%13 = l o a d i 3 2 , i 3 2 ∗ %2, a l i g n 4
s t o r e i 3 2 %13, i 3 2 ∗ %3, a l i g n 4
%14 = l o a d i 3 2 , i 3 2 ∗ %3, a l i g n 4
%15 = mul nsw i 3 2 %14, 5
s t o r e i 3 2 %15, i 3 2 ∗ %5, a l i g n 4
%16 = l o a d i 3 2 , i 3 2 ∗ %2, a l i g n 4
%17 = l o a d i 3 2 , i 3 2 ∗ %5, a l i g n 4
%18 = add nsw i 3 2 %16, %17
s t o r e i 3 2 %18, i 3 2 ∗ %4, a l i g n 4
%19 = l o a d i 3 2 , i 3 2 ∗ %4, a l i g n 4
%20 = c a l l i 3 2 ( i 8 ∗ , . . . ) @ p ri n t f ( i 8 ∗ g e t el em e n t p t r inbounds ( [ 3 x i 8 ] , [ 3 x i 8
] ∗ @. s t r , i 6 4 0 , i 6 4 0 ) , i 3 2 %19)
r e t i 3 2 0
}
Match the registers in the LLVM code to your SSA variables. Write it as a table, with SSA
variables in one column and the corresponding LLVM register number in the other column. Note
7that not all LLVM registers will have corresponding SSA names, but all your SSA names
should be matched to corresponding LLVM registers. Here is the first row of the table, to start
you off.
SSA variable
LLVM Register
a0
%2
...
...
(c) (2 Points) Take another look at the above LLVM code, and in particular follow the register %2.
It seems to be “written to” multiple times. On the second line, we have
%2 = alloca i32, align 4
Then later, we have a scanf where the value is stored to %2. Then again, we have an instruction
store i32 %9, i32* %2, align 4
which seems to store the value from %9 to %2.
Does this violate SSA? What is going on here? Explain in one or two short sentences. Hint -
look up the LLVM alloca, store and load instructions, and understand the difference between
registers and memory.
(d) (5 Points) Let’s go back to the SSA code that you created in part (a). Apply only algebraic
optimizations, copy propagation and constant folding to this code. Apply them over and over
again, as many times as you can, in any order, until you cannot apply them any more. For the
kinds of algebraic optimizations you can use, refer to the class slides on local optimization.
(e) (4 Points) Apply only common subexpression elimination and dead code elimination to the code
you get from part (d) (NOT part (a)!). Apply them over and over again, as many times as you
can, in any order, until you cannot apply them any more.
(f) (4 Points) Now you have some super-optimized SSA code. Clang also does all of these opti
mizations (and more!), on the LLVM IR. Let’s take a look at what Clang generates with the -O3
flag.
1 d e f i n e i 3 2 @main ( ) local_unnamed_addr #0 {
2 %1 = a l l o c a i 3 2 , a l i g n 4
3 %2 = b i t c a s t i 3 2 ∗ %1 t o i 8 ∗
4 c a l l void @llvm . l i f e t i m e . s t a r t . p 0i 8 ( i 6 4 4 , i 8 ∗ n o n n ull %2) #3
5 %3 = c a l l i 3 2 ( i 8 ∗ , . . . ) @scan f ( i 8 ∗ g e t el em e n t p t r inbounds ( [ 3 x i 8 ] , [ 3 x i 8 ] ∗
@. s t r , i 6 4 0 , i 6 4 0 ) , i 3 2 ∗ n o n n ull %1)
6 %4 = l o a d i 3 2 , i 3 2 ∗ %1, a l i g n 4 , ! tbaa ! 4
7 %5 = s h l nsw i 3 2 %4, 1
8 s t o r e i 3 2 %5, i 3 2 ∗ %1, a l i g n 4 , ! tbaa ! 4
9 %6 = mul nsw i 3 2 %4, 12
10 %7 = c a l l i 3 2 ( i 8 ∗ , . . . ) @ p ri n t f ( i 8 ∗ n o n n ull d e r e f e r e n c e a b l e ( 1 ) g e t el em e n t p t r
inbounds ( [ 3 x i 8 ] , [ 3 x i 8 ] ∗ @. s t r , i 6 4 0 , i 6 4 0 ) , i 3 2 %6)
11 c a l l void @llvm . l i f e t i m e . end . p 0i 8 ( i 6 4 4 , i 8 ∗ n o n n ull %2) #3
12 r e t i 3 2 0
13 }
Look at lines 5-10 of this code, and rewrite each line in a simplified way that helps you understand
what it is doing. For example, line 5 and 6 might be rewritten as:
s c a n f ( "%d" , & %1) ;
%4 := %1
Do this for lines 7-10. You have complete discretion over the “syntax” you use to represent the
simplified lines, as long as we can understand it. E.g. - you can represent line 6 as %4 ← %1
instead of %4 := %1.
(g) (2 Points) Finally, compare your optimized SSA from part (e) with this simplified LLVM code
from part (f). How is it different? Do you notice anything that puzzles you about the simplified
LLVM code?
8