Project 2

The goal of this project is to build a parser for the Core language discussed in class. At the end of this handout is the grammar you should follow for this project. This project should be completed using Java.
Your submission should compile and run on stdlinux. It is your responsibility to ensure your code works there.
You need to write a parser for the language described. Every valid input program for this language should be parsed, and every invalid input program for this language should be rejected with a meaningful error message.
Your project should have a main procedure in a file called Main.java, which does the following in order:
1. Instantiate the scanner with the file name given as a command line argument. 2. Generate a parse tree for the input Core program using recursive descent.
3. Perform semantic checks on the parse tree.
4. Use recursive descent to print the Core program from the parse tree.
Make sure your parser can compile and run using these commands on stdlinux so that it will interact with my tester.sh and grading scripts:
javac *.java
java Main Correct/1.code
Input to your project
The input will be identical to the input for project 1. The scanner should processes the sequence of ASCII characters in the file and should produce a sequence of tokens as input to the parser. The parser performs syntax analysis of this token stream and builds a parse tree.
Parse Tree Representation
You should generate a parse tree for the input program, using the top-down recursive descent approach described in class.
You may represent the parse tree however you like. In the “Suggestions” section I give a recommendation on how to do this, not a requirement.
Code Help, Add WeChat: cstutorcs
Output from your project
All output should go to stdout. This includes error messages – do not print to stderr.
The parser functions should only produce output in the case of an error.
The printer functions should produce “pretty” code with the appropriate indentation,
i.e. statement sequences for if/while statements should be indented, and nested statements should be further indented. I do not have strict format requirements here, the printer functions are mainly to help you verify you generated a correct parse tree and help you with your debugging.
NOTE: THE PRINT FUNCTIONS SHOULD ONLY BE CALLED AFTER THE ENTIRE PROGRAM HAS BEEN PARSED! The purpose of the print functions is to verify you have a complete parse tree that has accurately stored all the information we will need to interpret the program in project 3. The print functions must have no interaction with the scanner!
Invalid Input and Semantic Checks
Your parser should recognize and reject invalid input. For any error, you have to catch it and print a message. The message should have the form “ERROR: some description” (ERROR in uppercase). The description should be a few words and should accurately describe the source of the problem. You do not need to have sophisticated error messages that denote exactly where the problem is in the input file. After printing the error message to stdout, just exit back to the OS. But make sure you catch the error conditions! When given invalid input, you parser should not “crash” with a segmentation fault, uncaught exceptions, ect. Up to 20% of your score will depend on the handling of incorrect input and on printing useful error messages.
There are several categories of errors you should catch:
Scanner Errors
First, the scanner should make sure that the input stream of characters represents a valid sequence of tokens. For example, characters such as ‘ ’ and ’%’ are not allowed in the input stream. Your scanner from project 1 should already be catching these. If the scanner finds an error, the parser should stop executing.
Parser Errors
Second, the parser should make sure that the stream of tokens is correct with respect to the context-free grammar. If the parser detects a token in the wrong place it should print a meaningful error message and stop parsing.
Programming Help
Semantic Errors
Third, after building the parse tree there are additional checks that need to be done to ensure that the program “makes sense” (semantic checking). For this project, the following semantic checking needs to occur:
Verify that every ID being used has been declared. Declaration of global and local variables are possible in this language, following similar scoping rules as languages like C or Java. For example, there are two semantic errors in this program:
procedure example is integer x;
begin x:=1;
if x=y then integer z;
Verify that IDs were declared with the appropriate type (integer or array) for how they are being used. There are four places where you need to check that the variables being used are of the correct type:
1. For assign: “id := new integer[];”, the id here needs to have been declared as array, not integer
2. For assign: “id[] = ”, the id here needs to have been declared as array, not integer
3. For assign: “id = array id”, both ids here needs to have been declared as array, not integer
4. For factor: “id[]”, the id here needs to have been declared as array, not integer
Check for “doubly-declared” variable names. All variables declared within the
same scope should have unique names. For example,
should result in an error because there are multiple x’s declared.
procedure example is integer x;
integer x; begin

Testing Your Project
I will provide some test cases. The test cases I will provide are rather weak. You should do additional testing testing with your own cases.
I will provide a “tester.sh” script that works similar to how the script from project 1 worked, and it will use the diff command to compare your output to what the output should be.
Suggestions
There are many ways to approach this project. Here are some suggestions:
Spend some time making sure you understand the grammar on the last page.
Make sure you understand the recursive descent parser described in lecture.
Plan to spend a significant amount of time on the parser. Once it is working correctly, the semantic checking and printing should be simple.
Have separate functions for the parsing and semantic checks. Trying to do these simultaneously will increase the complexity of your code and increase the likelihood of errors.
Represent the parse tree by creating a class for each nonterminal. Instances of these classes will then represent the nodes of your parse tree. Each class should contain at least a field for each child that node in the tree could have.
The parse tree does not need to store everything, many tokens can be discarded after checking them. For example, the root node does not need to store the PROGRAM, IS, BEGIN, and END tokens, we just to need to verify those were present in the input.
Pick a small subset of the language (e.g. only a few of the grammar productions) and implement a fully functioning parser for that subset. Do extensive testing. Add more grammar productions. Repeat.
Post questions on piazza, and read the questions other students post. You may find details you missed on your own. You are encouraged to share test cases with the class on piazza.

Please note this is a language like C or Java where whitespaces have no meaning, and whitespace can be inserted between keywords, identifiers, constants, and specials to accommodate programmer style. This grammar does not include formal rules about whitespace because that would add immense clutter.
Recall epsilon is the empty string. ::= procedure ID is begin end | procedure ID is begin end
::= | ::= | ::= | ::= integer id ;
::= array id ;
::= | | | | |
::= id := ; | id [ ] := ; | id := new integer [ ]; | id := array id ; ::= out ( ) ;
::= in ( id ) ;
::= if then end
| if then else end
::= while do end
::= | not | or | and ::= = | <
::= | + | –
::= | * | /
::= id | id [ ] | const | ( )
Code Help