Lab 2: Dataflow
Fall Semester 2020
Due: 28 September, 8:00 a.m. Eastern Time Corresponding Lecture: Lesson 5 (Dataflow Analysis)
This lab will familiarize you with writing static program analyses using the LLVM compiler infrastructure. LLVM is a collection of compiler and analysis toolchain utilities widely used in the software analysis community. You will use LLVM to implement two intra-procedural dataflow analyses, one forward (reaching definitions analysis) and one backward (liveness analysis). In LLVM, these are referred to as passes over the code.
Note on Past Issues
This lab is more challenging than your earlier labs. In past semesters, it has caused a high number of students to be submitted to the Office of Student Integrity for Academic Integrity violations. In particular, it’s possible to find solutions to similar analyses on the internet. Looking at these solutions in any form is likely to influence your thinking and cause your solutions to be similar to them, which is an Academic Integrity violation in this class. If you are unclear of our guidelines for what is collaboration and what is cheating, we suggest reviewing that section of the syllabus. If you have any questions about what is allowed and what is not allowed, please reach out to Instructors via Piazza for clarification.
Students who submit solutions found to be similar to online resources or other students should expect a 0 grade on the lab, a disciplinary record of an Academic Integrity issue through the Office of Student Integrity, and will not be eligible to receive a final grade of A in the course. Students who have had past Academic Integrity issues may find that OSI assigns them higher penalties.
General Resources
● Getting Started with LLVM: http://llvm.org
http://releases.llvm.org/8.0.0/docs/GettingStarted.html#an-
example-using-the-llvm-tool-chain
● LLVM Documentation: http://releases.llvm.org/8.0.0/docs/index.html
http://releases.llvm.org/8.0.0/docs/LangRef.html
http://llvm.org/doxygen/index.html
● C++ Pointers / References: http://www.cplusplus.com/doc/tutorial/pointers/
https://www.geeksforgeeks.org/pointers-vs-references-cpp/
● SSA Intermediate Representation Form (used in LLVM bitcode): https://en.wikipedia.org/wiki/Static_single_assignment_form
Programmer’s Manual Resources
● Programmer’s Manual: http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html
● Using isa<> to check whether a pointer points to an instance of a particular type: http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-c ast-templates
● Enumerating basic blocks and instructions in a function: http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html#basic-i
nspection-and-traversal-routines http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html# iterating-over-predecessors-successors-of-blocks
● Important classes: http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html#the-
value-class http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html#the- user-class http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html#the- instruction-class http://llvm.org/docs/doxygen/html/classllvm_1_1ValueMap.html http://llvm.org/docs/doxygen/html/classllvm_1_1SetVector.html
1. Download and extract the lab code found on Canvas in the dataflow.zip archive.
2. Navigate to the Dataflow folder and execute the dataflow_prep.sh file with the following command (you will likely be prompted for the VM password, which is
‘student’):
3. Navigate to and run the following commands:
You should now see libDataflowPass.so under Dataflow/build/Dataflow.
sudo sh dataflow_prep.sh
Dataflow/build
make clean
4. Go to the Dataflow/example directory and generate LLVM bitcode from the programs we will analyze with the following commands:
5. Run a Dataflow pass using the command below to ensure everything works as expected for the test program ArrayDemo.c. This pass demonstrates the API by printing the definitions, uses, predecessors, and successors of each instruction.
opt -load ../build/Dataflow/libDataflowPass.so -Printer < ArrayDemo.bc > /dev/null In addition to testing your setup, the printer pass (in Printer.cpp) is a very useful
reference for implementing your own passes.
Lab Instructions
Complete the method in the ReachDefAnalysis.cpp and LivenessAnalysis.cpp (located in ) skeleton files to implement the two analyses. Do not write your analysis code outside of these files, as these files are the only ones you will submit. You may use the C++ Standard Template Library (STL) when implementing your analyses and the functionality provided through LLVM, but you may not use other third party libraries.
Your code will need to iterate over the program points in the input program and store the
clang -emit-llvm ArrayDemo.c -c -o ArrayDemo.bc
clang -emit-llvm Greatest.c -c -o Greatest.bc
doAnalysis
Dataflow/Dataflow/
computed dataflow facts in analyses inherit from the base class
located in the directory
and DataflowAnalysis::outMap. Both , which you can find in the header file
. Besides including useful also defines useful utility
, and isDef.
classes such as and functions such as
DataflowAnalysis::inMap
DataflowAnalysis
DataflowAnalysis.h
Dataflow/Dataflow/
SetVector
ValueMap
DataflowAnalysis.h
getPredecessors
getSuccessors
LLVM passes are performed on an intermediate representation (LLVM bitcode) generated from the program source code. LLVM bitcode is generated in Static Single Assignment (SSA) form, a common simplification used by compiler infrastructure. In SSA form, each variable is assigned exactly once, and every variable is defined before it is used. Variables are represented directly by the instruction defining it. In fact, the Instruction class is a subtype of the Value class. You can check whether an instruction defines a variable by checking getType()->isVoidTy(). An entry into the inMap or outMap is the instruction as opposed to a variable. The lectures speak to variables being stored (x or y) but with the SSA representation being used, the instruction itself is a proxy for the variable. That is why the inMap and outMap all show the instructions in the printer output.
Programming Help
Since each variable is uniquely defined, variables will never be redefined in SSA form. This will affect the generation of KILL sets in your implementation of reaching definitions analysis, specifically, they will always be empty.
After completing the two doAnalysis methods, rebuild the analyses using the commands from setup step 3, and then rerun the analyses using following commands to print the results of the analyses on the ArrayDemo program:
If your implementation is correct, your output will match the example output in ArrayDemo_ReachDef and ArrayDemo_Liveness found in dataflow/example/. The order of elements in the IN and OUT sets does not matter, but the number of elements and the values should match exactly. Please note that if your implementation produces extra console output beyond the sets, we will not consider your output as matching.
We have also included another program, Greatest.c, and it’s expected outputs for testing your implementation. You can use commands similar to those above to analyze this program. We also encourage students to develop their own test cases / corresponding output and share them on Piazza, although we will not be validating them for correctness.
Additionally, we have provided images of the control flow graphs (CFGs) for both programs, named array_demo_graph.gif and greatest_graph.gif. Each instruction is represented by a node, and edges represent an instruction’s successor instruction(s). These graphs are useful for reasoning about the correctness of your IN and OUT sets for the sample program analyses. We recommend that you take a moment to review the graph and ensure you understand the structure of each program. Pay close attention to the instructions that have multiple successor and/or predecessor instructions.
Helpful LLVM API Information
ValueMap has a similar interface to std::map. For example, you can access or insert an element using the [] operator:
opt -load ../build/Dataflow/libDataflowPass.so -ReachDef < ArrayDemo.bc > /dev/null
opt -load ../build/Dataflow/libDataflowPass.so -Liveness < ArrayDemo.bc > /dev/null
ValueMap
Instruction* I = /*…*/;
vm[I] = 5; // inserts to the map
Code Help
LLVM will generate all of the necessary objects for your analyses according to its object model (see ). You will not need to create new elements to insert into
or DataflowAnalysis::outMap, your code should add the pointer to the object to the .
SetVector has a similar interface to std::vector, except that it does not permit inserting duplicate elements. The == operator returns false for two SetVector objects containing the same elements in different orders.
Also, functions in the library from the STL work with SetVector. For example:
assuming is defined elsewhere as:
Items to Submit
We expect your submission to conform to the standards specified below. To ensure that your submission is correct, you should run the provided file validator. You should not expect submissions that have an improper folder structure, incorrect file names, or in other ways do not conform to the specification to score full credit. The file validator will check folder structure and names match what is expected for this lab, but won’t (and isn’t intended to) catch everything.
The command to run the tool is: python3 zip_validator.py lab2 lab2.zip
Submit the following files in a single compressed file (.zip format) named lab2.zip. For full credit, there must not be any subfolders or extra files contained within your zip file.
1. (50 points)
2. (50 points)
DataflowAnalysis.cpp
DataflowAnalysis::inMap
SetVector
for( int i = 0; i < 5; i++ )
sv.insert(i);
sv.insert(0); // has no effect
if (std::all_of(sv.begin(), sv.end(), isPositive))
// all_of is from
printf(“All numbers in sv are positive!”);
isPositive
ValueMap
bool isPositive(int i) {return i > 0;}
LivenessAnalysis.cpp
ReachDefAnalysis.cpp
浙大学霸代写 加微信 cstutorcs
Grading Criteria
Generally, credit for this lab is awarded as follows:
● For each program, both analyses will be equally weighted
● The inset and outset of each analysis will be compared against the correct values, with
each equally weighted
● For each set, your score will be [# expected tuples]−[# missing tuples]−[# extra tuples] % of the total
[# expected tuples]
possible points for that set. For example, an output with 10 expected tuples where you
found all 10 expected tuples but also two additional tuples would score 10−0−2 % = 80% 10
of the possible credit
Your dataflow analyses will be graded against multiple programs, including the benchmark programs provided as a part of this lab assignment, which will constitute at least half of your grade. In general, the programs used in grading but not provided as part of the lab are of the same order of complexity as the provided programs. Please note that your analysis must not produce any output beyond the in and out set values when run.
While efficiency is important, it is entirely secondary to correctness. You will not gain points for an efficient yet incorrect algorithm. There is a time limit several times longer than the expected execution time for an efficient algorithm in the grader. Any analysis that has not completed at the end of this time limit will not be considered correct as will any analysis that causes a crash.