CS160 Assignment 2 Due: Monday, May 8th 11:59pm
Prerequisites
1. Forthisproject,we’llneedmoredependenciesfromopam.Do: opam install ocamlgraph ppx_hash odoc
before starting this project.
2. IfyouareonMacOS,makesuretoinstallGraphvizifyouhaven’talready.Youcandosowithbrew:
brew install graphviz
On CSIL, Graphviz is already installed.
Important Notes
1. Forthisassignment,youareallowedtouseanyofthestandardlibraryfunctionsfromList.You’llfind List.map , List.concat_map , List.fold_right especially helpful.
2. Someofthefunctionsinthisassignmentuselabeledarguments.Makesuretoreaduponhowlabeled arguments work in OCaml.
3. Wehaveprovidedalotofusefuldatastructuresandhelperfunctionsforthisassignment,andhave documented their expected usage. You can view the documentation by running dune build @doc , which will create HTML pages containing the documentation. The root of the documentation is contained in: _build/default/_doc/_html/index.html . You can use your favoraite web browser to open and navigate this page. If you are using CSIL, You can try opening that file in VS Code remote to open it in your browser. If that doesn’t work, copy the entire _build/default/_doc/_html folder to your computer and try opening that in your browser. Alternatively, you can view the documentation by reading the comments we wrote, which can be usually found in the correspoding OCaml signatures. We strongly advise you not to read the actual implementation of those signatures, since we used some advanced OCaml features and not-so-well documented third-party libraries. In case you have any questions about the stuff we provided, please don’t hesitate to ask them on Slack!
Prelude: What we talk about when we talk about programming languages
A programming language is made up of 3 parts:
1. TheSyntaxdecideswhetherastring(i.e.,asequenceofsymbols)belongstothelanguageornot.For example, if you are in first grade doing arithmetic, or typing into a calculator, a valid syntax would look like:
An example of an invalid syntax would be: + 11 2
Your syntax checker for the “normal” arithmetic would reject this syntax 1 . Essentially, syntax is how your programming language looks.
The part of the compiler that makes this kind of validity decisions is called a parser. In practice, in addition to outputting a yes/no answer, the parser also tranforms a valid string (aka concrete syntax) into a tree (aka abstract syntax) that is represented as some kind of tree-like data structures in memory. The abstract syntax is often called the abstract syntax tree (AST) due to its tree-like structure.
In this assignment, you will only encounter abstract syntax, not concrete syntax. In the next assignment, you will implement a parser that transforms a program written in concrete sytntax to an abstract syntax tree.
2. TheSemanticsdefinesthemeaningofyourprogram.Forexample,theobviousmeaningofthe arithmetic expression 11 + 2 is: “evaluate the addition operation to obtain the integer value 13 “. Although the meaning of 11 + 3 might seem painfully clear, we can perfectly give an alternative meaning of 11 + 2 , by interpreting it as adding hours in a 12 hour clock. In this interpretation, 11 + 2 will evaluate to the integer value 1 . In other words,the semantics of a program is orthogonal to its syntax, because we can give different semantics to the same piece of program.
3. TheCompiler:Essentially,acompilertranslatesonelanguagetoanotherlanguage.Thesecond language is usually lower-level (i.e., closer to what the bare-metal CPU can understand) than the first language.
These three components of a programming language is encapsulated into three signatures in language.ml. In OCaml, signatures goes hand in hand with modules. A signature is like library APIs: it promises the existence of a set of types, values, and functions which can be freely used by the client of the library. A module, on the other hand, fulfills the promise of a signature by providing the required implementations of said types, values, and functions.
1. Let’slookatthesignatureforthesyntaxofaprogramminglanguage:
module type Syntax = sig
type program
val show_program : program -> string
(** Turn a program into a human-readable string. *)
val pp_program : Format.formatter -> program -> unit
(** Pretty-print a program. *)
This is essentially saying that, to implement a program’s syntax, you must provide:
A type called program that represents the abstract syntax tree of your program. When
implementing the program type for a programming language, the AST is usually represented using OCaml’s algebraic data types, but other kinds of representations are possible, as we will see later in this assignment.
A show_program function that takes a program and turns it into a string.
A pp_program which is a pretty printer for the program type.
The `Syntax` signature stipulates what a syntax of any language must provide. We can
extend the `Syntax` signature to declare customized signatures for our own programming
languages. For example, [calculator/sig.ml](./calculator/sig.ml) defines the `Syntax`
signature for the calculator programming language:
(* Calculator’s syntax *)
module type Syntax = sig
type program =
| Const of int
| Add of program * program
| Mul of program * program
include Language.Syntax with type program := program
1. For the Sematics component, we need:
The program type – the abstract syntax tree (AST) representation of our programming language. The interpret: input -> program -> output function that takes the input to the program and the program and produces an output . This interpret function corresponds to the semantics of our program. Note that we generalize our previous notion of semantics by having the interpret takes a parameter of type input . This is because many programming languages allow the programs to manipulate some kind of state: for example, a Turing machine manipulates an infinite tape, a pushdown automaton manipulates a stack, and a C program manipulates your RAM, console, disk, network, etc.
2. For the Compiler we need:
The program1 type – AST of the first program
The program2 type – AST of the second program
The compile: program1 -> program2 function that translates the first program into the second program.
Take a look at the rest of the language.ml and calculator/sig.ml to get a better understanding of the signatures before starting the assignment.
Part 1: Calculator
In this part of the assignment, you’ll first implement the reference semantics for the calculator programming language. Although this semantics is relatively easy to understand, it is hard for machines to execute. Thus, you will next implement a compiler that translates calculator programs into stack machine programs. You only need to fill in the blanks in calculator/impl.ml .
程序代写 CS代考 加QQ: 749389476
Problem1(5pts).Implementthereferencesemantics2 ofthecalculatorprogramminglanguageby completing the interpret function in the Calc_ReferenceSemantics module. This function takes the calculator syntax defined in calculator/sig.ml and evaluates it into an integer.
Problem 2 (5 pts). The stack machine language is defined in calculator/sig.ml . A program in this language is a list of instructions of type instr :
type program = instr list Each instr can be
Push
Op Add2 , which pops two elements from the top of the stack and pushes their sum onto the stack
Op Mul2 , which pops two elements from the top of the stack and pushes their product onto the stack.
Implement the interpret function for the stack machine language in the SM_Semantics module.
Problem 3 (5 pts). Implement a compiler from the calculator language to the stack machine language by completing the compile function in the C2SM module. You must ensure that your compiler is correct in the sense that the behavior of a compiled program matches the behavior of the original program. Specifically, you need to ensure that, in following diagram
the two paths that go from Calculator to Int always produce the same answer. To test this, you can come up with a calculator program P, and compare the result of evaluating P using the reference semantics
( C_ReferenceSemantics ) with the result of evaluating the stack machine program compiled from P (the
CalculatorStackSemantics module implements this “stack machine” semantics of calculator programs). We included a few example test cases in test/test_calculator.ml .
Part 2. Regular Expressions
The regular expressions (regex) can be thought of as a programming language, parameterized by an alphabet. The meaning of a regex r is: given a sequence of symbols from the alphabet, whether or not r matches the sequence. Syntactically, a regex program is built from the following constructs, which is also defined in regex/sig.ml :
Void matches nothing
Lit s matches a literal symbol s from the alphabet Cat (r1, r2) matches the concatenation of r1 and r2 Or (r1, r2) matches either r1 or r2
Star r matches zero or more occurrences of r .
In this part of the assignment, you will implement the reference semantics for regex. However, you will quickly discover that although this semantics is relatively easy to understand, it suffers from abysmal performance due to the need for backtracking. To solve this problem, you will then implement an alternative semantics of regex by compiling it into NFA and then into DFA.
In short, you will implement the following diagram:
For the rest of the assignment, we assume the alphabet always contains the epsilon symbol. You can find the documentation for the alphabet signature in
Problem 1 (5 pts). In regex/syntax.ml , we gave a partial implementation of the Notation module, which defines helper functions to build more complex regex. The documentation for the Notation signature can be found in
Notation module:
val plus : program -> program
(** Matches one or more occurrences of r *)
val seq : program list -> program
(** Matches a sequence of regex *)
val alts : program list -> program
(** Matches some regex out of a list of alternatives *)
Hint: For seq and alts , you may find List.fold_right helpful.
Problem 2 (10 pts). Complete the reference semantics of regex in regex/sem_reference.ml . On a high level, the semantics is that a regex program is interpreted on an input sequence of symbols in the alphabet, aka a word, and returns an output of type bool that indicates whether the regex matches the word.
Reference semantics are the canonical, and usually the most straightforward, to implement the semantics of a programming language. However, in the case of our regular expression reference semantics, this is not the most efficient way. As an exercise, try matching match 000 against (0+0+)+1 using the reference semantics. You’ll find that you’ll get a segfault and run out of memory! This motivates the translation from the regex reference semantics to the more efficient NFA / DFA implementations. In the next few problems, you will implement an efficient semantics of regex by implementing two compilers: the first compiler
translates regex into an equivalent NFA, and the second compiler translates an NFA into a DFA.
You will only need to modify regex/sem_efficient.ml . Note that we have implemented libraries for representing NFA/DFA as transition graphs and utility functions that manipulate them. We will explain how to find the documentation and use those libraries in the next section.
Problem 3 (10 pts). To help you get some familiarity with using the transition graph library, implement the interpret’ function in the NFA module. This function runs the NFA starting from the input state on the
input word. The interpret function will call interpret’ with the initial state as the input state. As an example, we implemented the interpret’ and interpret functions for DFAs in the DFA module, which you may find helpful.
Hint: You may find List.exists and List.map helpful.
Problem 4 (10 pts). Complete the compile function in module Regex2NFA that translates a regex into an
equivalent NFA. You may find helpful to maintain the following invariants for the constructed NFAs:
1. There’sexactly1initialstate
2. There’sexactly1acceptingstate
Problem 5 (10 pts). Implement the compile function in module NFA2DFA that translates an NFA to a DFA. Hint: You may find List.fold_* and fix in utils.ml helpful.
Explanations for how to use the library functions
For the problems in part 2, you’ll need to use the functions defined in transition.ml. The main data structure that transition.ml defines is the Graph Functor:
(* excerpt from transition.ml *)
module Graph = struct
module type S = sig
type label
(** State label type. *)
type symbol
(** Symbol type (transition label). *)
(* … things you won’t have to be worried about rn *)
(* … signatures and documentation for the graph: *)
val dfa_next : t -> state -> symbol -> state
(** Assuming the input graph describes a DFA,
return the next state from the input state via the input symbol.
Raise an exception if no transition is possible, or if there are multiple next
states. *)
val nfa_next : t -> state -> symbol -> state list
(** Assuming the input graph describes an NFA,
return the list of next states from the input state via the input symbol. *)
(* … more methods signatures here *)
module Make (A : Alphabet.S) (Lab : Label.S) :
S with type label = Lab.t and type symbol = A.t = struct
type label = Lab.t
(* … implementations of the methods defined above *)
The comments starting with … are not in transition.ml but are inserted for illustrative purposes. This module might look scary, but I’ll try to explain the important parts. As you can see the Graph type is parameterized by:
1. The label type – or the type for the label of the transition graph
2. The symbol type – or the type for the symbols for running the transition graph
The Graph.Make is what allows you to create a Graph with different types. We can create a transition graph like:
module BinGraph = Graph.Make (Binary) (Label.Int);;
This uses Graph.Make to create a new module called BinGraph . BinGraph is a transition Graph that is parameterized by the Binary alphabet (see: alphabet.ml ) and the Int label (which is defined in transition.ml ). To create new states and add transitions, you can look at the function definitions for graph in transition.ml, here’s an example:
let fresh_state ~init ~accept = BinGraph.State.create (Counter.next ()) ~init ~accept;;
let fresh_init () = fresh_state ~init:true ~accept:false;;
let one_state =
s0 = (fresh_init ())
BinGraph.add_state empty_graph s0
You can find the complete example and other examples in examples.ml. In the DFA code, we have:
module DFA = struct
module Make (A : Alphabet.S) (Lab : Label.S) = struct
module Graph = Graph.Make (A) (Lab)
type state = Graph.state
type symbol = A.t
type output’ = symbol list list
type input’ = state * symbol list
type program = Graph.t
CS Help, Email: tutorcs@163.com
This is because DFA is a Functor as well! Thus, DFA’s transition graph (the module Graph = Graph.Make (A) (Lab) part) is parameterized by the arguments you give to DFA.Make . To try out the DFA on a concrete alphabet and label, you need to do:
To get a better understanding of how this works, look at the examples (or ./test/examples.ml ). To load the examples into utop, you can do:
Then in the utop prompt:
The my_output constant stores the output of running the example input on an example_program. This is false because our interpret DFA outputs booleans. The h () function saves a picture of the example
graph to the hw2 folder, with the name graph
Example test cases for part 1 and part 2 can be found in test/test_calculator.ml and test/test_regex.ml, respectively. You can run all tests with dune runtest .
Submission
program’s semantics (as you’ll see), but in order to keep the semantics of a programming language consistent, the language authors often provides a “Reference Semantics” implementation of their programming language. For example, CPython is the reference implementation of Python, and Micropython is an implementation of Python’s sematics that is optimized for microcontrollers.
1. Even though this expression is valid in Polish Notation. ↩
2. This semantics is called the “Reference Semantics” as there are often many ways to implement a ↩
module MyBinDFA = DFA.Make (Binary) (Label.Int)
(* try out the DFA functions here *)
utop # open Examples;;
utop # my_output;;
– : string = “false”
utop # h ();;
Programming Help, Add QQ: 749389476