HW6: Dataflow Analysis and Optimizations
Getting Started
To get started, accept the assignment on Github Classroom [https://classroom.github.com/] and clone your team’s repository.
Many of the files in this project are taken from the earlier projects. The new files (only) and their uses are listed below; of these, only a handful are files you should need to modify while completing this assignment.
bin/cfg.ml                    "view" of LL control-flow graphs as dataflow graphs
bin/analysis.ml               helper functions for propagating dataflow facts
bin/solver.ml                 the general-purpose iterative dataflow analysis solver
bin/alias.ml                  alias analysis
bin/dce.ml                    dead code elimination optimization
bin/constprop.ml              constant propagation analysis & optimization
bin/liveness.ml               provided liveness analysis code
bin/analysistests.ml          test cases (for liveness, constprop, alias)
bin/opt.ml                    optimizer that runs dce and constprop
bin/backend.ml                you will implement register allocation heuristics here
bin/registers.ml              collects statistics about register usage
bin/printanalysis.ml          a standalone program to print the results of an analysis
PerformanceExperiments.xlsx   experimental results
bin/datastructures.ml         set and map modules (enhanced with printing)
The shared, public git submodule to which you should push your public test case is structured as shown below (spXX is "spring 20XX", e.g. sp24 is "spring 2024"):

spXX_hw6_tests/                     Git submodule to store student-generated public tests
spXX_hw6_tests/spXX_hw6_tests.ml    add the info for your test case here
spXX_hw6_tests/regalloctest.oat     an example test case (though it is not substantive enough!)
spXX_hw6_tests/*.oat                student test programs (please modify only your file)
You’ll need to have menhir [http://gallium.inria.fr/~fpottier/menhir/] and clang [https://clang.llvm.org/] installed on your system for this assignment. If you have not already done so, follow the provided instructions [../../toolchain.html#toolchain] to install them.
As usual, running oatc --test will run the test suite. oatc also now supports several new flags having to do with optimizations:

-O1 : runs two iterations of (constprop followed by dce)
--liveness {trivial|dataflow} : select which liveness analysis to use for register allocation
--regalloc {none|greedy|better} : select which register allocator to use
--print-regs : print a histogram of the registers used
The Oat compiler we have developed so far produces very inefficient code, since it performs no optimizations at any stage of the compilation pipeline. In this project, you will implement several simple dataflow analyses and some optimizations at the level of our LLVMlite intermediate representation in order to improve code size and speed.
Provided Code
The provided code makes extensive use of modules, module signatures, and functors. These aid in code reuse and abstraction. If you need a refresher on OCaml functors, we recommend reading through the Functors Chapter [https://dev.realworldocaml.org/functors.html] of Real World OCaml.
In bin/datastructures.ml, we provide you with a number of useful modules, module signatures, and functors for the assignment, including:
A module signature for a type that is both comparable and convertible to a string for printing. This is used in conjunction with some of our other custom modules described below. Wrapper modules satisfying this signature are defined later in the file for the lbl and uid types.
A module signature that extends OCaml's built-in sets to include string conversion and printing capabilities.
A functor that creates an extended set from a type satisfying the signature above. This is applied to the lbl and uid wrapper modules to create a label set module and a UID set module.
A module signature that extends OCaml's built-in maps to include string conversion and printing capabilities. Three additional helper functions are also included: one for updating the value associated with a particular key, one for performing a map look-up with a default value to be supplied when the key is not present, and one for updating the value associated with a key if it is present, or adding an entry with a default value if not.
A functor that creates an extended map from a type satisfying the signature above. This is applied to the lbl and uid wrapper modules to create a label map module and a UID map module. These map modules have fixed key types, but are polymorphic in the types of their values.
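For orientation, here is a minimal sketch of what such a printable-set functor can look like. The names below (ORD_PRINT, MakeSet, Uid, UidS) are illustrative stand-ins, not necessarily the ones used in bin/datastructures.ml:

    (* Illustrative sketch only: the real signatures, functors, and helper
       functions are in bin/datastructures.ml and may be named differently. *)
    module type ORD_PRINT = sig
      type t
      val compare : t -> t -> int
      val to_string : t -> string
    end

    (* A set functor that adds printing on top of OCaml's Set.Make. *)
    module MakeSet (O : ORD_PRINT) = struct
      include Set.Make (O)
      let to_string (s : t) : string =
        "{" ^ String.concat ", " (List.map O.to_string (elements s)) ^ "}"
    end

    (* Example wrapper for string-named uids, and its set instantiation. *)
    module Uid = struct
      type t = string
      let compare = String.compare
      let to_string x = "%" ^ x
    end

    module UidS = MakeSet (Uid)

    let () =
      let s = UidS.of_list ["x"; "y"] in
      print_endline (UidS.to_string s)   (* prints {%x, %y} *)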
Task I: Dataflow Analysis
Your first task is to implement a version of the worklist algorithm for solving dataflow equations presented in lecture. Since we plan to implement several analyses, we'd like to reuse as much code as possible between each one. In lecture, we saw that each analysis differs only in the choice of the lattice, the flow function, the direction of the analysis, and how to compute the meet of facts flowing into a node. We can take advantage of this by writing a generic solver as an OCaml functor and instantiating it with these parameters.
The Algorithm
Assuming only that we have a directed graph where each node's in and out edges are labeled with a dataflow fact and each node has a flow function, we can compute a fixpoint of the flow on the graph as follows:

    let w = new set with all nodes
    repeat until w is empty
      let n = w.pop()
      old_out = out[n]
      let in = combine(preds[n])
      out[n] := flow[n](in)
      if (!equal old_out out[n]),
        for all m in succs[n], w.add(m)

Here equal, combine, and flow are abstract operations that will be instantiated with lattice equality, the meet operation, and the flow function (e.g., as might be defined by the gen and kill sets of the analysis), respectively. Similarly, preds and succs are the graph predecessors and successors in the flow graph, and so are not in one-to-one correspondence with the edges found in the LL IR control-flow graph of the program. These general operations can be instantiated appropriately to create a forwards or backwards analysis.
Don't try to use OCaml's polymorphic equality operator (==) to compare old_out and out[n] – that's reference equality, not structural equality. Use the supplied Fact.compare instead.
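As a rough sketch of the worklist loop as generic OCaml (the module and function names here are illustrative assumptions; the actual Solver.Make interface in solver.ml is organized differently, so consult its signature):

    (* Illustrative only: a self-contained worklist solver over a small explicit
       graph.  The real Solver.Make functor in solver.ml has a different
       interface; read the comments there. *)
    module type FACT_LIKE = sig
      type t
      val combine : t list -> t          (* meet of the facts flowing into a node *)
      val compare : t -> t -> int        (* structural comparison for the fixpoint test *)
    end

    module MakeSolver (F : FACT_LIKE) = struct
      (* Nodes are identified by ints purely for this sketch. *)
      type graph = {
        preds : int -> int list;
        succs : int -> int list;
        flow  : int -> F.t -> F.t;       (* per-node flow function *)
      }

      (* Returns a table mapping each node to its final "out" fact. *)
      let solve (g : graph) (nodes : int list) (init : int -> F.t) : (int, F.t) Hashtbl.t =
        let out = Hashtbl.create 16 in
        List.iter (fun n -> Hashtbl.replace out n (init n)) nodes;
        let w = Queue.create () in
        List.iter (fun n -> Queue.add n w) nodes;
        while not (Queue.is_empty w) do
          let n = Queue.pop w in
          let old_out = Hashtbl.find out n in
          let in_fact = F.combine (List.map (Hashtbl.find out) (g.preds n)) in
          let new_out = g.flow n in_fact in
          Hashtbl.replace out n new_out;
          if F.compare old_out new_out <> 0 then
            List.iter (fun m -> Queue.add m w) (g.succs n)
        done;
        out
    end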
Getting Started and Testing
Be sure to review the comments in the dataflow-analysis-graph and FACT module signatures in solver.ml, which define the parameters of the solver.
Make sure you understand what each declaration in the signature does – your solver will need to use each one (other than the printing functions)! It will also be helpful for you to understand the way that cfg.ml connects to the solver. Read the commentary there for more information.
Now implement the solver
Your first task is to fill in the solve function in the Solver.Make functor in solver.ml. The input to the function is a flow graph labeled with the initial facts. It should compute the fixpoint and return a graph with the corresponding labeling. You will find the set datatype from datastructures.ml useful for manipulating sets of nodes.
To test your solver, we have provided a full implementation of a liveness analysis in liveness.ml. Once you've completed the solver, the liveness tests in the test suite should all be passing. These tests compare the output of your solver on a number of programs with pre-computed solutions in analysistests.ml. Each entry in this file describes the set of uids that are live-in at a label in a program from ./llprograms. To debug, you can compare these with the output of the Graph.to_string function on the flow graphs you will be manipulating.
printanalysis
The stand-alone printanalysis program can print out the results of a dataflow analysis for a given .ll program. You can build it by doing make printanalysis. It takes a flag to indicate which analysis to run (run it with the help flag for a list).
For example, once the solver is implemented correctly, you can use it to display the results of liveness analysis for llprograms/add.ll like this:

    cis3410:/hw6> ./printanalysis -live llprograms/add.ll
    define i64 @main(i64 %argc, i8** %arcv) {
    {_entry={}
      %1 = add i64 5, 9
      ret i64 %1
    }

Task II: Alias Analysis and Dead Code Elimination
The goal of this task is to implement a simple dead code elimination optimization that can also remove store instructions when we can prove that they have no effect on the result of the program. Though we already have a liveness analysis, it doesn't give us enough information to eliminate store instructions: even if we know the UID of the destination pointer is dead after a store and is not used in a load in the rest of the program, we cannot remove a store instruction because of aliasing. The problem is that there may be different UIDs that name the same stack slot or heap location. There are a number of ways this can happen after a pointer is returned by alloca:
The pointer is used as an argument to a bitcast or getelementptr instruction
The pointer is stored into memory and then later loaded
The pointer is passed as an argument to a function, which can manipulate it in arbitrary ways
Some pointers are never aliased. For example, the code generated by the Oat frontend for local variables never creates aliases because the Oat language itself doesn't have an "address of" operator. We can find such uses of alloca by applying a simple alias analysis.
Alias Analysis
We have provided some code to get you started in alias.ml. You will have to fill in the flow function and lattice operations. The type of lattice elements is a map from UIDs to symbolic pointer values. Your analysis should compute, at every program point, the set of UIDs of pointer type that are in scope and, additionally, whether each such pointer is the unique name for a stack slot according to the rules above. See the comments in alias.ml for details. In particular, you will need to fill in:
1. Alias.insn_flow : the flow function over instructions
2. Alias.Fact.combine : the combine function for alias facts
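To give a feel for the shape of such a flow function, here is a simplified, self-contained sketch. The symbolic-pointer type, instruction type, and function signatures here are stand-ins; the actual definitions live in alias.ml, so treat this only as an illustration of the idea:

    (* Illustrative only: a possible shape for an alias lattice and a few flow
       cases.  The real symbolic-pointer and fact types are defined in alias.ml. *)
    module UidMap = Map.Make (String)

    type sym_ptr =
      | Unique           (* the unique name of a stack slot *)
      | MayAlias         (* a pointer that may have other names *)

    type fact = sym_ptr UidMap.t

    (* Simplified instructions: only the cases that matter for this sketch. *)
    type insn =
      | Alloca of string                 (* %uid = alloca ...                     *)
      | Bitcast of string * string       (* %uid = bitcast ... %src ...           *)
      | CallWithPtrArgs of string list   (* call passing these pointer-typed uids *)

    (* alloca introduces a unique pointer; bitcast (or getelementptr) and passing
       a pointer to a call make the pointers involved MayAlias. *)
    let insn_flow (i : insn) (d : fact) : fact =
      match i with
      | Alloca u -> UidMap.add u Unique d
      | Bitcast (u, src) -> d |> UidMap.add src MayAlias |> UidMap.add u MayAlias
      | CallWithPtrArgs args ->
        List.fold_left (fun d u -> UidMap.add u MayAlias d) d args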
Dead Code Elimination
Now we can use our liveness and alias analyses to implement a dead code elimination pass. We will simply compute the results of the analyses at each program point, then iterate over the blocks of the CFG, removing any instructions that do not contribute to the output of the program:
For all instructions except store and call, the instruction can be removed if the UID it defines is not live-out at the point of definition
A store instruction can be removed if we know the UID of the destination pointer is not aliased and not live-out at the program point of the store
A call instruction can never be removed
Complete the dead-code elimination optimization in dce.ml, where you will only need to fill out the function that implements these rules.
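As a minimal illustration of these rules (using stand-in instruction and fact types rather than the representations used in dce.ml), the per-instruction decision looks roughly like:

    (* Illustrative only: the removal decision, using stand-in types rather than
       the instruction and fact representations used in dce.ml. *)
    module UidSet = Set.Make (String)

    type insn =
      | Store of string      (* a store through destination pointer %uid *)
      | Call                 (* a call instruction                        *)
      | Other of string      (* any other instruction defining %uid       *)

    (* [live_out] is the set of uids live after this instruction; [unique] reports
       whether the alias analysis proved a pointer uid to be unaliased. *)
    let can_remove (i : insn) (live_out : UidSet.t) (unique : string -> bool) : bool =
      match i with
      | Call -> false                                             (* never removable *)
      | Store dst -> unique dst && not (UidSet.mem dst live_out)
      | Other def -> not (UidSet.mem def live_out)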
Task III: Constant Propagation
Programmers don’t often write dead code directly. However, dead code is often produced as a result of other optimizations that execute parts of the original program at compile time, for instance constant propagation. In this section you’ll implement a simple constant propagation analysis and constant folding optimization.
Start by reading through constprop.ml. Constant propagation is similar to the alias analysis from the previous section. Dataflow facts will be maps from UIDs to the type SymConst.t, which corresponds to the lattice from the lecture slides. Your analysis will compute the set of UIDs in scope at each program point, and the integer value of any UID that is computed as a result of a series of binop and icmp instructions on constant operands. More specifically:
The flow out of any binop or icmp whose operands have been determined to be constants is the incoming flow with the defined UID set to the constant value obtained by (statically) interpreting the operation.
The flow out of any binop or icmp with an UndefConst operand sets the defined UID to UndefConst.
Similarly, the flow out of any store or call of type Void sets the defined UID to UndefConst.
All other instructions set the defined UID to NonConst.
Constant propagation of this form acts a lot like an interpreter – it is a "symbolic" interpreter that can't always produce an answer. (At this point we could also include some arithmetic identities, for instance optimizing multiplication by 0, but we'll keep the specification simple.)
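As a rough illustration of this "symbolic interpreter" view, here is a sketch of evaluating a binop over a three-point constant lattice. The type and constructor names are stand-ins; the actual SymConst.t definition and the full operand handling live in constprop.ml:

    (* Illustrative only: a three-point constant lattice and binop evaluation. *)
    type sym_const =
      | NonConst            (* may take on several values at runtime    *)
      | Const of int64      (* statically known to be this constant     *)
      | UndefConst          (* undefined / not meaningful at this point *)

    type bop = Add | Sub | Mul

    let eval_binop (b : bop) (x : sym_const) (y : sym_const) : sym_const =
      match x, y with
      | Const i, Const j ->
        Const (match b with
               | Add -> Int64.add i j
               | Sub -> Int64.sub i j
               | Mul -> Int64.mul i j)
      | UndefConst, _ | _, UndefConst -> UndefConst   (* any undefined operand *)
      | _, _ -> NonConst                              (* otherwise unknown     *)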
Next, you will have to implement the constant folding optimization itself, which just traverses the blocks of the CFG and replaces operands whose values we have computed with the appropriate constants. The structure of the code is very similar to that in the previous section. You will have to fill in:
Constprop.insn_flow with the flow rules defined above
Constprop.Fact.combine with the combine operation for the analysis
the code (inside the Constprop.cp_block function) needed to perform the constant propagation transformation
Once you have implemented constant folding and dead-code elimination, the compiler's -O1 flag will optimize your LL code by doing two iterations of (constant prop followed by dce). See opt.ml. The optimizations are not used for testing except that they are always performed in the register-allocation quality tests – these optimizations improve register allocation (see below).
This coupling means that if you have a faulty optimization pass, it might cause the quality of your register allocator to degrade, which in turn might make getting a high score harder.
Task IV: Register Allocation
The backend implementation that we have given you provides two basic register allocation strategies:
none: spills all uids to the stack
greedy: allocates registers with a simple greedy, linear-scan style algorithm
For this task, you will implement a better register allocation strategy that makes use of the liveness information that you compute in Task I. Most of the instructions for this part of the assignment are found in backend.ml, where we have modified the code generation strategy to be able to make use of liveness information. The task is to implement a single function, better_layout, that beats our example "greedy" register allocation strategy. We recommend familiarizing yourself with the way that the simple strategies work before attempting to write your own allocator.
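One common approach (not required, and not necessarily how backend.ml structures things) is to use the liveness facts to build an interference graph over uids and color it greedily, spilling uids that do not fit in the register budget. The sketch below uses stand-in types purely to illustrate the idea:

    (* Illustrative only: interference from liveness, plus a greedy coloring.
       The uid/instruction types here are stand-ins, not those in backend.ml. *)
    module UidSet = Set.Make (String)
    type uid = string

    (* For each instruction we assume we know the uid it defines (if any) and
       the set of uids live-out across it. *)
    type insn_info = { def : uid option; live_out : UidSet.t }

    (* Two uids interfere if one is defined while the other is live-out. *)
    let interference (prog : insn_info list) : (uid * uid) list =
      List.concat_map
        (fun { def; live_out } ->
          match def with
          | None -> []
          | Some d ->
            UidSet.fold (fun u acc -> if u <> d then (d, u) :: acc else acc) live_out [])
        prog

    (* Greedy coloring: give each uid the lowest "register" number not used by an
       already-colored neighbor; numbers beyond the register budget spill. *)
    let color (uids : uid list) (edges : (uid * uid) list) (num_regs : int)
        : (uid * [ `Reg of int | `Spill ]) list =
      let neighbors u =
        List.filter_map
          (fun (a, b) -> if a = u then Some b else if b = u then Some a else None)
          edges
      in
      List.fold_left
        (fun assigned u ->
          let used = List.filter_map (fun v -> List.assoc_opt v assigned) (neighbors u) in
          let rec first_free k = if List.mem (`Reg k) used then first_free (k + 1) else k in
          let c = first_free 0 in
          (u, (if c < num_regs then `Reg c else `Spill)) :: assigned)
        [] uids

Whatever heuristic you end up with, the key point is that the liveness facts from Task I tell you which uids cannot share a register; sorting uids by live-range length or prioritizing uids used inside loops are other reasonable starting points.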
The compiler now also supports several additional command-line switches that can be used to select among different analysis and code generation options for testing purposes:
--liveness {trivial|dataflow} : use the specified liveness analysis
--regalloc {none|greedy|better} : use the specified register allocator
--print-regs : prints the register usage statistics for x86 code
The flags above do not imply the -O1 flag (despite the fact that we always turn on optimization for testing purposes when running with --test). You should enable it explicitly.
For testing purposes, you can run the compiler with the verbose flag and/or use the --print-regs flag to get more information about how your algorithm is performing. It is also useful to sprinkle your own verbose output into the backend.
The goal for this part of the homework is to create a strategy such that code generated with the flags --liveness dataflow --regalloc better is "better" than code generated using the simple settings, which are --liveness dataflow --regalloc greedy. See the discussion in backend.ml about how we compare register allocation strategies. The "quality" test cases report the results of these comparisons.
Of course, your register allocation strategy should produce correct code, so we still perform all of the correctness tests that we have used in previous versions of the compiler. Your allocation strategy should not break any of these tests – and you cannot earn points for the "quality" tests unless all of the correctness tests also pass.
Task V: Experimentation / Validation
Of course we want to understand how much of an impact your register allocation strategy has on actual execution time. For the final task, you will create a new Oat program that highlights the difference. There are two parts to this task.
Create a test case
Push an Oat program to the sp24_hw6_tests git submodule. This program should exhibit significantly different performance when compiled using the "greedy" register allocation strategy vs. using your "better" register allocation strategy with dataflow information. See the files sp24_hw6_tests/regalloctest.oat and sp24_hw6_tests/regalloctest2.oat for uninspired examples of such a program.
Yours should be more creative. (For a challenge, try to create an Oat program for which your register allocation strategy yields a significant speedup, but for which clang is not able to do much better.)
Evaluate the Performance
Use the unix time command to test the performance of your register allocation algorithm. This should take the form of a simple table of timing information for several test cases, including the one you create and those mentioned below. You should test the performance of the compiler in eight configurations for each of the source programs listed below:

1. using the --liveness trivial --regalloc none flags (baseline)
2. using the --liveness dataflow --regalloc greedy flags (greedy)
3. using the --liveness dataflow --regalloc better flags (better)
4. using the --clang flags (clang)

And... all of the above plus the -O1 flag. These experiments will test our optimizations using the Oat optimizer.

To help you with this task, the Makefile in this project includes two new targets, oat_experiments and ll_experiments, that generate executables for each case. Each such target takes a FILE=<filename> parameter and can optionally include OPT=-O1 to enable the Oat level 1 optimizations:

    make oat_experiments FILE=<filename.oat> [OPT=-O1]
    make ll_experiments FILE=<filename.ll> [OPT=-O1]

Test your compiler on these three programs:
your own test case
hw4programs/regalloctest.oat
llprograms/matmul.ll
For best results, use a "lightly loaded" machine (close all other applications) and average the timing over at least five trial runs.
The example below shows one interaction used to test the hw4programs/regalloctest.oat file in several configurations from the command line:

    cis3410:/workspaces/h26/> make oat_experiments FILE=hw4programs/regalloctest.oat
    dune build bin/main.exe
    echo "Generating executables for hw4programs/regalloctest.oat with optimization "
    Generating executables for hw4programs/regalloctest.oat with optimization
    ./oatc -o a_baseline.out --liveness trivial --regalloc none hw4programs/regalloctest.oat bin/runtime.c
    ./oatc -o a_greedy.out --liveness dataflow --regalloc greedy hw4programs/regalloctest.oat bin/runtime.c
    ./oatc -o a_better.out --liveness dataflow --regalloc better hw4programs/regalloctest.oat bin/runtime.c
    ./oatc -o a_clang.out --clang hw4programs/regalloctest.oat bin/runtime.c
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> time ./a_baseline.out
    ./a_baseline.out  0.46s user 0.00s system 99% cpu 0.458 total
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> time ./a_greedy.out
    ./a_greedy.out  0.19s user 0.00s system 99% cpu 0.187 total
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> time ./a_better.out
    ./a_better.out  0.18s user 0.00s system 99% cpu 0.176 total
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> time ./a_clang.out
    ./a_clang.out  0.00s user 0.00s system 65% cpu 0.002 total

The example below shows one interaction used to test the llprograms/matmul.ll file in several configurations that use the -O1 flag from the command line:
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> make ll_experiments FILE=llprograms/matmul.ll OPT=-O1
    dune build bin/main.exe
    echo "Generating executables for llprograms/matmul.ll with optimization -O1"
    Generating executables for llprograms/matmul.ll with optimization -O1
    ./oatc -o a_baseline-O1.out --liveness trivial --regalloc none -O1 llprograms/matmul.ll
    ./oatc -o a_greedy-O1.out --liveness dataflow --regalloc greedy -O1 llprograms/matmul.ll
    ./oatc -o a_better-O1.out --liveness dataflow --regalloc better -O1 llprograms/matmul.ll
    ./oatc -o a_clang-O1.out --clang llprograms/matmul.ll -O1
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> time ./a_baseline-O1.out
    ./a_baseline-O1.out  1.32s user 0.00s system 99% cpu 1.324 total
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> time ./a_greedy-O1.out
    ./a_greedy-O1.out  0.97s user 0.00s system 99% cpu 0.972 total
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> time ./a_better-O1.out
    ./a_better-O1.out  0.42s user 0.00s system 99% cpu 0.423 total
    cis3410:/workspaces/upenn-cis3410/hw/hw6/soln> time ./a_clang-O1.out
    ./a_clang-O1.out  0.06s user 0.00s system 98% cpu 0.065 total
Don’t get too discouraged when clang beats your compiler’s performance by many orders of magnitude. It uses register promotion and many other optimizations to get high-quality code!
After collecting this data, use it to fill out PerformanceExperiments.xlsx. This spreadsheet will compute the speedup for each of these programs under the various optimization configurations. It asks you to report the processor and OS version that are hosting the Docker instance that you used to test your code, as well as the total time (in seconds) for the various configurations.
Post Your Results
For fun, please post your performance results to Ed [https://edstem.org/us/courses/52477/discussion/] on the designated thread so that everyone can see how your optimizations perform.
Projects that do not compile will receive no credit!
Your team’s grade for this project will be based on:
90 Points: the various automated tests that we provide. Note that the register-allocator quality points cannot be earned with an allocator that generates incorrect code.
5 Points for committing an interesting test case to the shared sp24_hw6_tests repo. (Graded manually.)
5 Points for the timing analysis in PerformanceExperiments.xlsx. (Graded manually.)