CS106 CS442 SE442 Homework 1: scanner

CS106 CS442 SE442 Homework 1: scanner
RR_language.pdf
Sample_RR_programs

CS106 & CS442 & SE442 Compiler principles and constructions
Macau University of Science and Technology
1. Introduction
We will construct a lexical analyzer, also called a scanner for the specified source language. A scanner can be designed as an automaton using the DFA (deterministic finite automaton) techniques. DFA can accomplish complex tasks, just like automata can perform incredible tasks, as shown by the picture above, which is found on the internet [1].
2. The source language
The source language is documented in the file , which is uploaded to Moodle. In addition, some sample programs of this language (the .rr files) are in the folder
. These files are available on Moodle. You should read these files and understand the RR language.
3. Preparation
Studying some related materials are helpful, including
The file on a simple scanner algorithm, which describes a DFA-based scanner. The file on the *C-Minus` language, which shows a DFA diagram for a parser. The document and sample code of the RR language.
The file scanner.fib.rr , which shows the output of scanner for a .rr file .

4. Expected behavior of the scanner
The above picture found online [2] shows the computation of a scanner,
Test your scanner with the provided RR programs as the source. You can use the programs in the
folder Sample_RR_programs . Or you can use some RR program that you create. The output of a scanner for RR is as follows:
When a source program is correct,
some token list should be generated, which will be used by the parser in the future. The generated token list should be printed.
For each token, the token type or the token string should be printed. If the string cannot be decided by its type, like ID or Number, or String, then both the string and type should also be printed.
When some error occurs, the corresponding error message should be printed. At least one error message is printed for the first lexical error found by the compiler.
Try to create some errors that should be detected by the scanner. For example
An optional feature of the scanner is to print each line of the source code before printing its tokens, as shown by the textbook code.
The comment can be recorded as a token, but it is optional.
How to distinguish the negation token and minus token? It is tricky because they have the same string “-“. There are two ways to do it
1. let the scanner make the decision. We can observe that for token with – , the previous token can decide if it is negation or minus. For example, it is a negation if the previous token is an operator, or ), or ], or }, or ;, and so on. By exhausting all the cases, the scanner can tell it is a negation or minus.
2. Let the parser do it. It could be easier. When a ‘-‘ token is found, the scanner can let its type be HBAR (horizontal bar). Later the parser can decide if it is a minus or negation, possibly changing the token type to MINUS or NEG
x = 3|5 ; // error: | does not belong to any token
y = 3\-5 ; // error: the bot of a num cannot be negative
z = ‘abc’; // error: abc should not appear between two single quotes.
Code Help
For example, suppose a RR source file contains the following code:
void task(void){
int temperature = -75;
while (temperature < (26 + 1\2) ) { /* 26.5 degree is warm */ prt(“int”, temperature); prt(“str” ” is too cold.\n”); temperature = temperature + 1; prt(“str”, “The room is warm enough.\n”); When the scanner runs, the screen printing could be similar to the sample run listed below. In the conversation, a line starting with a smile face 🙂 is a sentence output by the program in the conversation. Each token is printed in a line, where the starting line number is optional. myscaner sample.rr 🙂 Each line of sample.rr with its tokens are shown as follows: 1…… void task(void){ 2….. int temperature = -75; ID temperature 3….. while (temperature < (26 + 1\2) ) { /* 26.5 degree is warm */ ID temperature 4….. prt(“int”, temperature); STRC “int” ID temperature 5….. prt(“str”, ” is too cold.\n”); STRC “str” STRC ” is too cold.\n” 6….. temperature = temperature + 1; ID temperature ID temperature 7….. } RCUR 程序代写 CS代考 加微信: cstutorcs 8….. prt(“str”, “The room is warm enough.”); STRC “str” STRC “The room is warm enough.” 9….. } RCUR 10….. task(); Note that you can choose to let your scanner to obtain the source code filename by asking the source file name, instead of as a command-line argument. The file scanner.fib.rr shows an example of more detailed output of a scanner. Other helpful file may also be uploaded with the assignment. 5. Implementing the scanner Use the algorithm of a scanner discussed in class, which is also documented in the file simple_scanner_algorithm.pdf . Do not use some automatically generated scanner code. You can use C or C++ to implement the scanner, but C is recommended because the helping code shown by the teacher in the classroom will be C code. Design your program with reasonable modules. A possible choice (not required) is having the following files : include the declarations of tools for general purpose. : the definitions of general-purpose tools. : declaration of token types and related function prototypes. : token-related function definitions. : declarations related to the list data structure. : function definitions for the list data structure. : declaration of some DFA. automata.c : definitions for the DFA. The DFA can be defined as a function according to the simple scanner algorithm. scanner.c : the main function, the driver of the program. automata.h 5.1 Bonus features (5 points) A num constant can be simplified by the scanner. For example, a num constant 4\6 in a program can be recorded as the most simplified form 2\3 by the scanner. For example, if line 5 is a statement , then the scanner may print for this line : You have at most five bonus point if you can do this. (15 points) A suggested way to implement the scanner is following the design of some DFA. There is a document explaining the DFA-based scanner algorithm. If you can implement your scanner in such a way, showing a jflap graph of the DFA for the scanner, and the code can clearly show the structure following a DFA, you have at most 15 bonus points. 6. How to submit Upload your files at the webpage address of this homework on Moodle. The uploaded files can include at least the following: The source files (.c, .cpp, or .h files), A document file (.txt, .md, .docx) describing your work: The tasks that are successfully accomplished. Can your scanner correctly print the tokens of the .e files? Can your scanner find the lexical errors? For example, can your scanner report errors after adding some strange characters like # and ? into a .e file? The remaining problems. What is not implemented? You can record your scanner’s output via screen capture or text the bonus features that you have implemented … Your experience, how did you solve the difficulties that you met? About homework submission in a group: You are encouraged to do the homework alone. At most, three students can form a group to submit the homework together. One group only needs to submit the homework once by one student. The other members do not need to (better not) submit the homework again or just submit one .txt file saying who the group members are and who submitted the homework. In each file you submit, record the registered names in Chinese (if you are not an international student) and English letters (PinYin), each student’s classes, as comments. For example: 5….. x = 4\6; Deadline: Friday, December 2, 2022, 11:00 pm Plagiarism is not allowed References 1. “MAILLARDET’S AUTOMATON” https://www.fi.edu/history-resources/automaton 2. “Lexical Analysis in Compiler” https://binaryterms.com/lexical-analysis.html 3. “Among the Automata” https://www.theparisreview.org/blog/2012/05/22/among-the-automata/ 4. “Compiler Construction Principles and Practice”, Kenneth C. Louden, PWS Publishing Company, 1997. ISBN 0-534-93972-4 http://www.cs.sjsu.edu/~louden/cmptext/. /*
Programming Help, Add QQ: 749389476