The Language RR
MUST compiler course
1. Motivation
We design a language that can conveniently express computations tasks, like computing expressions. Especially it can specify and carry out computations based on rules. Therefore, the name of the language is RR, which stands for Rapid Rules, Rolling Rules, or Rapid Reasoning, for fun. An exciting logo is found on the internet [2].
The motivation for designing the language is to bring some best ideas of LRR [1], a term-rewriting language, and C. So, the features of RR include two parts:
A simplified and modified version of C.
A modified and enhanced version of LRR, especially data structures like an array, is allowed.
2. Comments
/* multiple-line
comment */
//one-line comment
Same as in C.
3. Expressions
3.1 Expression data types
char : The character type, same as in C, is an unsigned integer of 1 byte.
char x; /* x is a char variable */
x = ‘a’; /* ‘a’ is a constant */
int : The integer type, same as int in C, can be positive or negative, with 4 bytes.
int x, y; /* two int variables */
x = -5000; /* -5000 is an int constant */
num : The number type represents a fraction. It replaces the float and double types in C.
A fraction, described in math notation like bt , can be represented as a num in the language E, in which the t the numerator is called the top, while b the denominator is called the bot . For a num , its bot is always a positive int , while its top can be any int .
A constant of a num has the form top\bot , where a bot cannot be 0 or negative. For example,
20\27 , -3\5 .
A num constant can have a suffix like e5 and e-1 , which means the 5th power of 10 and 1/10, respectively.
address type: each type has its corresponding address type is a different type, like address of char, address of int, and so on.
num n = 3\5 ; /* variables can be initialized in its definition statement */
num y = n * -99\100 ; /* numbers can be applied with math operations */
num z = 31415926e-1 /* it means z is 3.1415926 * 10^(-1)*/
int x ; /* declare a variable of type int */
num n = 3/5 ;
num * np ; /* np is a pointer to a num variable */
np = &n; /* &n computes the address of the num variable n */
int * xp = &x;
char * yp;
For simplicity of the compiler project, we do not need to consider the address of another pointer.
Arrays are defined in the same way as in C. For example, with the form
typeName arrayName [positive_int_const]
For the simplicity of the compiler project, a nested array does not need to be considered in the RR language.
An array name, as an expression, its value can represent the address of its first element when an address is needed, same as in C.
An array can be initialized in an array declaration statement like :
3.3 String
Like in C, a string is recorded in an array of characters whose content ends with a null character ‘\0’. Same as in C, a string constant is surrounded by double quotes in the form of “some string here” . A string constant can be used in several cases, including:
1. Initializing a character array with a string. For example:
2. As an argument of some function call
Like in C, there is no variable with string type; only character arrays are used to save string.
3.4 operators
The operators are selected from those of C, therefore, with similar meanings and features. The selection of the operators has based on the following considerations:
void print_message(char * arr);
char greeting[80] = “How do you do.”;
print_message(greeting); /* array name can be used as an address of char */
int arr[8] = {1, 2, 3, 4, 5, 6, 7}; /* The missing part element is 0 */
num arr2[] = {1\2, 2\3, -3\4, -4\5, 50\99 } ; /* an array of num with 5 elements */
char name[20] = “Alice”;
/* name contains the six characters of ‘A’, ‘l’, ‘i’, ‘c’, ‘e’, ‘\0’, same as: */
char name2[20] = {‘A’ ‘l’ ‘i’, ‘c’, ‘e’, ‘\0’} ; /* the extra elements are all ‘\0’ */
/* similarly */
char name3[] = {‘A’ ‘l’ ‘i’, ‘c’, ‘e’, ‘\0’} ; /* an array of 6 elements */
ps(“how are you”); /* the string “how are you” is printed on the screen, without the double q
ps(name) ; /* name is the array defined above. Alice is printed on the screen. */
The operators are sufficient for complex math computations.
Some operators are not considered for the simplicity of the compiler project.
For example, the function call operator () of the C language [4] is not mentioned in E because it is easier for writing the compiler. For the C++ compiler, this operator should be addressed. Another example is the unary plus operator ‘+’. This operator makes no difference in computation. For instance, +5 is the same as 5 , y * +(5+6) is the same as y * (5+6) . Therefore, this operator is ignored in E, while the unary-operator (negation) is considered in RR.
Precedence Operator Description arity associativity
1 [ ] array subscripting 2 left-to-right
2 – unary minus (negation) 1 right-to-left
! logical NOT 1
* Indirection, (dereference) 1
& Address of 1
multiplication, division, and remainder (modulor)
left-to-right
4 + – addition , substraction 2 left-to-right
for relational operators of <, ≤, >, and ≥
left-to-right
6 == != for relational equal = and non-equal = 2 left-to-right
7 && logical AND 2 left-to-right
8 || logical OR 2 left-to-right
9 = assignment 2 right-to-left
10 , comma 2 left-to-right
3.5 Expressions
Expressions are defined the say as in C. An expression is one of the following kind:
An atomic expression:
A constant: like 5 3\5
Github
A name (variable name, array name): like x , .
The naming policy of a variable is the same as in C: a sequence of letters, numbers, and underscore _ , while the starting character is a letter of the underscore.
An operator expression constructed by connecting some smaller expression(s) with an operator.
a function call.
an expression surrounded by ( ) , like: ( 3+4/5 ) .
3.6 Function calls
Functions in RR, also called regular functions, which are different from term functions, are the same as in C.
The function definition has the following form:
some_array
returnType functionName (parameters) {
statements
The return type of a function can be void or any type.
The body of a function definition is a sequence of statements that does not contain another function
definition.
A function call is the same as in C, like max(x, 3\5) . It is an expression with the type indicated by the definition of the function.
3.7 Input and output of expression data
For expressions, there are only two built-in functions prt and get . The prototypes of these two functions are as follows.
// pattern of call: prt(string, address)
void prt(char *, any *); // prototype
// pattern of call: get(address)
void get(any *);
Examples of calling the two functions are the following:
char name[80];
prt(“str”, “please input an int \n”);
get(“int”, &x)
prt(“str”, “The value of x is \n”);
prt(“int”, x);
prt(“str”, “what is your name?”);
get(“str”, name) ; // array name can represent the address of its first element.
prt(“str”, “hello ” + name); // + concates two strings.
4.1 Term syntax
Syntactically, a term is defined as follows:
A constant of an expression-data type, e.g., 5 , -3\5 , ‘a’ .
A variable declared with the type tm , e.g.,
A variable declared with an expression type. For example:
An unreduced term constant starts with a back quote ` followed by a regular term. For example:
A term-function call, in the form of funcName(term_1, term_2…, term_n) , where funcName has two different kinds
1. funcName is either can be a built-in term-function name, like the operators +, -, * , \ . For each operator of an expression (Section 2.6), there is a built-in term-function (the prefix version of the operator).
2. or a program-declared term-function name, like fib . For example:
tmx; //xisaterm
// x is a term , also an expression
tm x; // a term variable.
tm y = `fib(2, 3) ; // y is assigned with a unreduced term (or a raw term) starts with a bac
tm z = `:(+(5, 6), x) ; // another unreduced term is assigned to z.
+(5, 6) // + is a built-in term-function
fib(7) // fib is a declared term-function
Code Help, Add WeChat: cstutorcs
The n terms inside ( ) are the arguments. The number n must match the number of
parameters in the declaration of the term function.
When a term t is surrounded by (), ,.i.e. (t), it means the reduced form of the term t. For example,
(`fib(5, 6)) means the fully computed from of the term fib(5, 6) .
The value of a term-function call is a term t , which is the result of applying a rule (Section 4.3) where fits into the LHS of the rule, and the t is the RHS. If no rule is applicable, then the result of calling f(someterms) is itself, which is also a term.
4.2 Term declarations
Terms declaration statements have three kinds:
parameter-variable declaration, in the form of: var: x, y, z; . Here, each var is some variable that can appear in an LHS of a rule. They are like parameters of a term function.
constant declaration, in the form of constant: a, b, c; . Each constant is like a symbolic constant, with the type tm .
term-function declaration, in the form of func: f1(2), f2(int, tm), int f3(tm, tm); . More specifically, there are several forms. :
When only the number parameters is given, it means each of its parameter is of the generic term type tm . For example, f1(2) , it means a term-function f1 , and each call of f1 needs to arguments of the type tm .
When specific types of parameters are given, it means its arguments should have the specific type. For example, f2(int, tm) , it means the first argument of calling f2 should be an int , and the second argument should has the type tm . The declaration f3(tm, tm) is the same as
When the return type of a term-function is not specified, like , its return type is a generic term type tm . When some specific return type is shown, like , it means the term function f3 should be reduced to a term with the type f5 .
The declaration statements in a file should appear in a declaration block in the form of decl{ } . For example:
f(someterms)
f(someterms)
int f5(2))
constant: nil, nul;
var: x, y, z;
func: fib(int, int), f(2), int f3(tm, char);
Each rule statement has the form of , where LHS is a term function call, and RHS is a term. Each term-variable appearing on the RHS should also appear on the LHS.
All rule statements in a file should appear in a rule block of the form rule{ } . For example:
LHS => RHS ;
sum(n) => s(<(n, 1), n) ; // sum(n) computes 1+2+…n
s(true, n) => 0;
s(false, n) => +(n, sum(-(n, 1)));
4.4 Relationships between terms and expressions
Different kinds of terms and expressions are named as follows:
There are expressions that are also terms, which are called term-expressions, including those atomic expressions, i.e. the constant and variables of expression types.
The other expressions, including the operator expressions and function calls, are not terms. These are called the only-expressions
There are terms that are not expressions, including the variables declared with the type tm , and the calls of term-functions. These terms are called the only-terms.
Because the computation of terms and expressions are carried out in different ways the relationship between terms and expressions are allowed by some restrictive rules, to avoid meaningless computation. These rules are the following:
In an only-expression, an only-term cannot appear as a sub-expression. In an only-term, an only-expression cannot appear as a sub-term.
// fib , sometm, and : are three term functions
decl{ fib(2), sometm(1), :(2); }
int f(int) ; // f is declared as regular function
fib(5,6) + 7; // not allowed, an operator expression cannot include a term-function call
+(fib(5,6), 7) ; // allowed.
+(f(5), 7) ; // not allowed, a term cannot contain a regular function call
f(5) + 7 // allowed.
x = fib(5, 6) // not allowed
=(x, fib(5, 6)) // allowed.
4.4.1 Communication between the computation of terms and expressions
A variable can be used to link the computation of terms and expressions. For example:
int y = someRegularFunc(99); //
=(x, someTermFunc(y, 5)); // Change x by some term function call, where y is used as an argument
prt(“str”, “x = ” + x + “\n”);
4.5) Input and output of terms
For terms, there are two built-in term functions, prttm and gettm . Their declarations are like
tm gettm(0);
tm prttm(tm)
Examples of calling these functions:
prt(“str”, “please input a term \n”);
=(t, gettm());
prt(“str”, ” the term is \n”);
more examples are shown Examples are shown in the file io.rr . 5) Statements
Reguar statements, i.e., the statements that can appear in a regular function, are constructed similarly to the statements in C, including the following kinds:
Declaration statement.
Expression statement, e.g., an expression followed by a ; . Especially a call of a regular function is also an expression statement.
num x; /* variable declaration */`
num arr[8]; /*array declaration */`
void fun(num x); /*function declaration */` tmt; //tisaterm.
x = y + 3; /* expression statement */
somefunction(void); /* a function call is an expression*/
some_function(3, 3\5);
3+5; // a meaningless expression statement
Null statement, which is a simple ; . Selection statement:
The if statement is the same as in C. It optionally has an else part. Loop statement:
The while statement is the same as in C.
compound statement is a block with { and } at the two ends.
A term-function call statement, which has the form: someTermFunc(tm1, tm2…) ;
Note that a function definition is not considered a statement.
6. keywords
char, int, num, while, if, else, return, decl, rule, var,
keyword usage
char a type
int a type
num a type
str a type. str x is the same as char * x .
bool a type, the boolean type
false a boolean constant
true a boolean constant
tm the type of general terms.
any the type of general expressions.
while while loop
if selection statment
else the else part of if
return in a regular function, like in C.
decl the declaration block of terms
rule the rule block of terms.
keyword usage
var declaring the parameter-var names
constant declaring the term constants
ep a special term representing the empty term
export the export statment, exporting a module of names
import the import statement, importing a module of names.
7. Import and Export of names
Inside f.rr , inorder to allow a set of names to be used by other files, put these names inside an export- block, of the form:
export moduleName{
name1, name2, name3;
Inside f.rr , inorder to use set of names that are declared in some other file and exported as a module, use an import statment to import these names, of the form:
import moduleName;
8. Program files of a RR program
A RR file is a sequence of items that are executed consequently. These items could include:
An import statement
An export block
A term-declaration block
A rule block
One or many global name declaration, which can be
a regular function declaration (a.k.a. function prototype) a global variable declaration
a global array declaration.
one or many regular function definitions
one or many statements of calling a regular function or a term function.
Code Help
RR_Program_examples
Unlike C, there is need to have a main function as the entrance of a program. The execution of a program starts at the first item.
9. Program examples
Each RR program file has an extension name .rr . There are several .rr files in the folder .
10. Runtime behavior
The calls of regular functions are carried out like in C, i.e., the runtime environment will put a frame on the stack memory for the function call, and when the computation of the call ends, the frame is removed from the stack.
The calls of term functions are carried out according to the specified rules, i.e., when a term matches the LHS of a rule, then the result is the corresponding RHS of the rule; this is called a step of term reduction. There are possibly different ways to implement term reduction efficiently. A simple way is to do it like expression computation.
The semantic analysis, including type checking, of regular functions is carried out like in C. The semantic analysis for the code of terms should be done in some reasonable way.
The usage of names has the following restrictions:
A name should be declared before its usage.
The exception is the export block. The names exported can be declared later in a file after the
export block.
A usage of a name should match its declaration.
For example, a call of a function should match the prototype of the function.
10. Ambiguity of RR
There are unspecified aspects of the language RR. A compiler of RR may customize the computation result for unspecified code meaning.
Especially for term reduction, when a term can be applied with different rules, which produce different eventual results, the code is ambiguous. In this situation, a compiler may report an error or specify some precedence among rules to decide which rule should be called.
The semantics of rule reduction and efficient implementation have open research topics.
References
1. “Webpage of Professor Rakesh Verma and the LRR project”
http://www2.cs.uh.edu/~rmverma/research.html
2. “Rolls-Royce Holdings” https://en.wikipedia.org/wiki/Rolls-Royce_Holdings
3. “Is it possible to do modulo of a fraction” https://math.stackexchange.com/questions/864568/is-it-
possible-to-do-modulo-of-a-fraction
4. “C Operator Precedence” https://en.cppreference.com/w/c/language/operator_precedence
5. “Conversion and Function Call Operator in C++” https://mymusing.co/conversion-and-function-call-
operator-in-c/