Dirty work done and ready to fix the bugs.

Commit bae8f63b authored Sep 22, 2022 by 李晓奇
8 changed files
--- a/.gitignore
+++ b/.gitignore
 build
+Documentations/1-parser/*.pdf
+compile_commands.json
+.cache
--- a/Documentations/1-parser/Basics.md
+++ b/Documentations/1-parser/Basics.md
@@ -127,8 +127,6 @@
 #   >>>>>>>>>>>>>>>>>>token stream>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 ```

-
-
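 We use a simple word-count program, `wc.l`, as an example to introduce the features and usage of `Flex` in detail (please read the comments in the program carefully):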

 ```c
@@ -255,13 +253,13 @@ int main(void)
 ```

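 A few other points are worth noting:
+
 1. By convention, Bison writes tokens as uppercase words and symbols as lowercase letters.
 2. Bison can only generate the parser source code (a `.c` file), whose entry point is `yyparse`, so you must provide a `main` function yourself to make the program runnable.
 3. Bison cannot check whether your action code is correct. It can only detect some errors in the grammar; all other code is pasted verbatim into the `.c` file.
 4. Bison requires you to provide a `yylex` function that fetches the next token.
 5. Bison requires you to provide a `yyerror` function that implements suitable error reporting.

-
 In addition, the `.y` file above actually works, although it accepts only two strings. Save the code above as `reimu.y` and run the following commands to build the program: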

 ```shell

--- a/Documentations/1-parser/Flex-matching.md
+++ b/Documentations/1-parser/Flex-matching.md
@@ -13,16 +13,10 @@ Note: if there is any discrepancy, please refer to `The flex Manual`.

 **************************

-
-
 When the generated scanner is run, it analyzes its input looking for strings which match any of its patterns.  If it finds more than one match, it takes the one matching the most text (for trailing context rules, this includes the length of the trailing part, even though it will then be returned to the input).  If it finds two or more matches of the same length, the rule listed first in the `flex` input file is chosen.

-
-
 Once the match is determined, the text corresponding to the match (called the "token") is made available in the global character pointer `yytext`, and its length in the global integer `yyleng`.  The "action" corresponding to the matched pattern is then executed, and then the remaining input is scanned for another match.

-
-
 If no match is found, then the "default rule" is executed: the next character in the input is considered matched and copied to the standard output.  Thus, the simplest valid `flex` input is:

 ```c
@@ -31,12 +25,8 @@ If no match is found, then the "default rule" is executed: the next character in

 which generates a scanner that simply copies its input (one character at a time) to its output.

-
-
 Note that `yytext` can be defined in two different ways: either as a character _pointer_ or as a character _array_.  You can control which definition `flex` uses by including one of the special directives `%pointer` or `%array` in the first (definitions) section of your flex input.  The default is `%pointer`, unless you use the `-l` lex compatibility option, in which case `yytext` will be an array.  The advantage of using `%pointer` is substantially faster scanning and no buffer overflow when matching very large tokens (unless you run out of dynamic memory).  The disadvantage is that you are restricted in how your actions can modify `yytext`, and calls to the `unput()` function destroy the present contents of `yytext`, which can be a considerable porting headache when moving between different `lex` versions.

-
-
 The advantage of `%array` is that you can then modify `yytext` to your heart's content, and calls to `unput()` do not destroy `yytext`. Furthermore, existing `lex` programs sometimes access `yytext` externally using declarations of the form:

 ```c
@@ -45,10 +35,6 @@ The advantage of `%array` is that you can then modify `yytext` to your heart‘s

 This definition is erroneous when used with `%pointer`, but correct for `%array`.

-
-
 The `%array` declaration defines `yytext` to be an array of `YYLMAX` characters, which defaults to a fairly large value.  You can change the size by simply #define'ing `YYLMAX` to a different value in the first section of your `flex` input.  As mentioned above, with `%pointer` yytext grows dynamically to accommodate large tokens.  While this means your `%pointer` scanner can accommodate very large tokens (such as matching entire blocks of comments), bear in mind that each time the scanner must resize `yytext` it also must rescan the entire token from the beginning, so matching such tokens can prove slow.  `yytext` presently does _not_ dynamically grow if a call to `unput()` results in too much text being pushed back; instead, a run-time error results.

-
-
 Also note that you cannot use `%array` with C++ scanner classes
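
The matching rule described in `Flex-matching.md` (take the longest match; on a tie, take the rule listed first) can be sketched in plain C. This is only an illustration, not how `flex` is implemented: the `matcher` type and the functions `match_kw_if`, `match_ident`, and `pick_rule` are hypothetical names introduced here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A hypothetical "rule": returns how many leading characters of s it
   matches, or 0 when it does not match at all. */
typedef size_t (*matcher)(const char *s);

/* Matches the keyword "if" as a prefix of s. */
static size_t match_kw_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Matches a run of one or more ASCII letters. */
static size_t match_ident(const char *s) {
    size_t n = 0;
    while (isalpha((unsigned char)s[n])) n++;
    return n;
}

/* Returns the index of the winning rule, or -1 when no rule matches.
   The match length is stored in *len. */
static int pick_rule(const matcher *rules, int count, const char *s, size_t *len) {
    assert(rules != NULL && s != NULL && len != NULL);
    int best = -1;
    *len = 0;
    for (int i = 0; i < count; i++) {
        size_t n = rules[i](s);
        /* Only a strictly longer match replaces the current winner,
           so on a tie the rule listed first is kept. */
        if (n > *len) {
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With the rules ordered as `{ match_kw_if, match_ident }`, the input `if` selects rule 0 because both rules match two characters and the earlier one wins, while `iffy` selects rule 1 because the identifier rule matches four characters. This is why keyword rules are usually listed before the identifier rule in a `.l` file.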
--- a/Documentations/1-parser/Flex-regular-expressions.md
+++ b/Documentations/1-parser/Flex-regular-expressions.md
@@ -67,7 +67,6 @@ The patterns in the input  are written using an extended set of regular expressi
  
   exactly 4 `r`

-
 * `"[xyz]\"foo"`
  
   the literal string: `[xyz]"foo`
@@ -76,7 +75,6 @@ The patterns in the input  are written using an extended set of regular expressi
  
   if X is `a`, `b`, `f`, `n`, `r`, `t`, or `v`, then the ANSI-C interpretation of `\x`. Otherwise, a literal `X` (used to escape operators such as `*`)

-
 * `\0`
  
   a NUL character (ASCII code 0)
@@ -138,12 +136,10 @@ The patterns in the input  are written using an extended set of regular expressi
  
   omit everything within `()`. The first `)` character encountered ends the pattern. It is not possible for the comment to contain a `)` character. The comment may span lines.

-
 * `rs`
  
   the regular expression `r` followed by the regular expression `s`; called "concatenation"

-
 * `r|s`
  
   either an `r` or an `s`
@@ -152,12 +148,10 @@ The patterns in the input  are written using an extended set of regular expressi
  
   an `r` but only if it is followed by an `s`. The text matched by `s` is included when determining whether this rule is the longest match, but is then returned to the input before the action is executed. So the action only sees the text matched by `r`. This type of pattern is called "trailing context". (There are some combinations of `r/s` that flex cannot match correctly.)

-
 * `^r`
  
   an `r`, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned).

-
 * `r$`
  
   an `r`, but only at the end of a line (i.e., just before a newline). Equivalent to `r/\n`.
@@ -184,8 +178,6 @@ The patterns in the input  are written using an extended set of regular expressi
  
   an end-of-file when in start condition `s1` or `s2`

-   
-
 Note that inside of a character class, all regular expression operators lose their special meaning except escape (`\`) and the character class operators, `-`, `]]`, and, at the beginning of the class, `^`.

 The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom. Those grouped together have equal precedence (see special note on the precedence of the repeat operator, `{}`, under the documentation for the `--posix` POSIX compliance option). For example,
@@ -219,16 +211,12 @@ For example, the following character classes are all equivalent:
 * `[[:alpha:][0-9]]`
 * `[a-zA-Z0-9]`

-
-
 A word of caution. Character classes are expanded immediately when seen in the `flex` input. This means the character classes are sensitive to the locale in which `flex` is executed, and the resulting scanner will not be sensitive to the runtime locale. This may or may not be desirable.

 * If your scanner is case-insensitive (the `-i` flag), then
  
   `[:upper:]` and `[:lower:]` are equivalent to `[:alpha:]`.

-   
-
 * Character classes with ranges, such as `[a-Z]`, should be used with caution in a case-insensitive scanner if the range spans upper or lowercase characters. Flex does not know if you want to fold all upper and lowercase characters together, or if you want the literal numeric range specified (with no case folding). When in doubt, flex will assume that you meant the literal numeric range, and will issue a warning. The exception to this rule is a character range such as `[a-z]` or `[S-W]` where it is obvious that you want case-folding to occur. Here are some examples with the `-i` flag enabled:
  
  | Range   | Result    | Literal Range      | Alternate Range     |
@@ -239,8 +227,6 @@ A word of caution. Character classes are expanded immediately when seen in the `
  | `[_-{]` | ambiguous | `[_'a-z{]`         | `[_'a-zA-Z{]`       |
  | `[@-C]` | ambiguous | `[@ABC]`           | `[@A-Z\[\\\]_'abc]` |

-   
-
 * A negated character class such as the example `[^A-Z]` above _will_ match a newline unless `\n` (or an equivalent escape sequence) is one of the characters explicitly present in the negated character class (e.g., `[^A-Z\n]`). This is unlike how many other regular expression tools treat negated character classes, but unfortunately the inconsistency is historically entrenched. Matching newlines means that a pattern like `[^"]*` can match the entire input unless there's another quote in the input.
  
   Flex allows negation of character class expressions by prepending `^` to the POSIX character class name.
@@ -255,21 +241,12 @@ A word of caution. Character classes are expanded immediately when seen in the `
  
   Flex will issue a warning if the expressions `[:^upper:]` and `[:^lower:]` appear in a case-insensitive scanner, since their meaning is unclear. The current behavior is to skip them entirely, but this may change without notice in future revisions of flex.

-
-
 * The `{-}` operator computes the difference of two character classes. For example, `[a-c]{-}[b-z]` represents all the characters in the class `[a-c]` that are not in the class `[b-z]` (which in this case, is just the single character `a`). The `{-}` operator is left associative, so `[abc]{-}[b]{-}[c]` is the same as `[a]`. Be careful not to accidentally create an empty set, which will never match.

-
-
-
 * The `{+}` operator computes the union of two character classes. For example, `[a-z]{+}[0-9]` is the same as `[a-z0-9]`. This operator is useful when preceded by the result of a difference operation, as in, `[[:alpha:]]{-}[[:lower:]]{+}[q]`, which is equivalent to `[A-Zq]` in the "C" locale.

-  
-
 * A rule can have at most one instance of trailing context (the `/` operator or the `$` operator). The start condition, `^`, and `<<EOF>>` patterns can only occur at the beginning of a pattern, and, as well as with `/` and `$`, cannot be grouped inside parentheses. A `^` which does not occur at the beginning of a rule or a `$` which does not occur at the end of a rule loses its special properties and is treated as a normal character.

-  
-
 * The following are invalid:
  
  `foo/bar$`
@@ -278,8 +255,6 @@ A word of caution. Character classes are expanded immediately when seen in the `
  
  Note that the first of these can be written `foo/bar\n`.

-  
-
 * The following will result in `$` or `^` being treated as a normal character:
  
  `foo|(bar$)`
@@ -294,16 +269,3 @@ A word of caution. Character classes are expanded immediately when seen in the `
  ```
  
  A similar trick will work for matching a `foo` or a `bar`-at-the-beginning-of-a-line.
-
-
-
-
-
-
-
-
-
-
-
-
-
--- a/Documentations/1-parser/Parser-FurtherReadings.md
+++ b/Documentations/1-parser/Parser-FurtherReadings.md
@@ -59,6 +59,7 @@ identifier("abc123") ==> (Some("abc123"), "")  这里返回的是字符串 "abc1
 2. `or(p,q)`: first try p and return its result if it succeeds; otherwise try q; if q also fails, the whole parser fails.

 We can then define
+
 ```
 factor = or( 
  seq(number,identifier).map { Expr.Mul(Expr.Const(#1), Expr.Val(#2)) },
@@ -159,11 +160,11 @@ var var = 1;  // 旧版报错
 Consider the following example

 | a₁  | a₂  | a₃  |
-| ----- | ----- | ----- |
+| --- | --- | --- |
 | a   | ab  | bba |

 | b₁  | b₂  | b₃  |
-| ----- | ----- | ----- |
+| --- | --- | --- |
 | baa | aa  | bb  |

 For this input the problem has a solution, because a₃a₂a₃a₁ = b₃b₂b₃b₁.
@@ -172,11 +173,9 @@ var var = 1;  // 旧版报错

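 **However**, this problem cannot be solved mechanically in general: no program can decide it. In fact, undecidable problems are everywhere; [Rice's theorem](https://en.wikipedia.org/wiki/Rice%27s_theorem) tells us that every non-trivial semantic property of programs is undecidable.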

-
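 ## Online parsing

 Having come this far, you can already write a parser, but that is still not enough for engineering practice. For example, to provide accurate real-time error reporting, auto-completion, and code indentation, an IDE needs a syntax tree immediately while the user is editing. A simple offline parser like the one in lab2 cannot meet this need at all. While code is being edited, it is syntactically or even lexically invalid most of the time, so the parser must handle all kinds of error cases without messing up the code. Moreover, when providing auto-indentation, errors later in the file should not affect the indentation of earlier code. Another problem is that offline parsing has to rebuild the syntax tree from scratch, which is expensive. Inspired by this need for "online parsing", a number of practically valuable works have emerged, for example:

 1. [tree-sitter](https://github.com/tree-sitter/tree-sitter): an incremental parser framework that always maintains a complete syntax tree in memory.
 2. [Auto-indentation with incomplete information](https://arxiv.org/ftp/arxiv/papers/2006/2006.03103.pdf): a framework for code indentation based on an operator precedence parser, with support for local forward parsing. Although it does not maintain a complete syntax tree, it is fast enough because each parse covers very little text.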
-
--- a/Documentations/1-parser/README.md
+++ b/Documentations/1-parser/README.md
@@ -47,6 +47,7 @@ Token         Text      Line    Column (Start,End)
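 For the specific tokens to be recognized, see [Basics](./Basics.md).

 Hints:
+
 1. Before writing this part, you first need to modify the `.y` file. See [Basics](./Basics.md) for how to do this.
 2. Before moving on to the next part of the lab, you can debug with the `lexer` program we provide. See Section 3.2 of this document.
 3. Token numbers are generated automatically; after running `make`, you can find them in `build/syntax_analyzer.h`. After each change to the tokens, run `make` again before checking against them.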
@@ -165,6 +166,7 @@ int main(void) {
  ```
  
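  If the build succeeds, you will see two executables, `lexer` and `parser`, in that directory.
+  
    * `lexer` performs lexical analysis and produces the token stream; we do not grade the lexical analysis output.
    * `parser` performs syntax analysis and produces the syntax tree.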

@@ -223,12 +225,14 @@ int main(void) {
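  The submission requirements for this lab have two parts: the lab files and report, and well-formed git commits.
  
    * Lab part
+      
        * Complete `src/parser/lexical_analyzer.l` and `src/parser/syntax_analyzer.y`.
        * Write the lab report in `Reports/1-parser/README.md`.
            * The report should cover
                * lab requirements, difficulties, design, verification of results, and feedback
  
    * git commit conventions
+      
        * Do not break the directory structure (put the images needed by the report in the directory)
        * Do not upload temporary files (anything that can be generated automatically, such as the `build` directory, output generated during testing, `.DS_Store`, etc.)
        * Write meaningful git log messages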
@@ -238,6 +242,7 @@ int main(void) {
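    * Code submission: for this lab, submit the gitlab link of your repository to the assignment [Lab1 code submission](http://cscourse.ustc.edu.cn/assignment/index.jsp?courseID=17&assignID=54) published on the Xiji (希冀) course platform (note: due to platform limitations, please submit the repository link in http format. For example, for a student with ID PB011001, the Lab1 repository address is `http://202.38.79.174/PB011001/2022fall-compiler_cminus.git`). We will take the evaluation score of the last submission as the final code score.
  
    * Evaluation
+      
        * In addition to the provided easy, normal, and hard datasets, the platform also tests with additional hidden test cases.  
  
    * Report submission: export `Reports/1-parser/README.md` to a PDF file and submit it separately to [Lab1 report submission](http://cscourse.ustc.edu.cn/assignment/index.jsp?courseID=17&assignID=54).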
@@ -268,6 +273,7 @@ int main(void) {

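 * Grading criteria:
  The final score for Lab 1 is composed as follows:
+  
    * Platform test score: (70 points)
    * Lab report score: (30 points)  
      Note: executing malicious code is prohibited; violators receive 0 points.  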

--- a/src/parser/lexical_analyzer.l
+++ b/src/parser/lexical_analyzer.l
@@ -12,19 +12,54 @@ int pos_start;
 int pos_end;

 void pass_node(char *text){
      yylval.node = new_syntax_tree_node(text);
 }

 /***************** declarations and option settings  end *****************/

 %}

+letter [a-zA-Z]
+digit [0-9]
+ID {letter}+
+INTEGER {digit}+
+FLOAT ({digit}+\.|{digit}*\.{digit}+)
+
+/*
+
+%token <node> _IF _ELSE _WHILE _RETURN _INT _FLOAT _VOID 
+%token <node> _ASSIGN _RELOP _ADD_OP _MUL_OP
+%token <node> _L_SQUARE _R_SQUARE _L_PARE _R_PARE _L_BRACKET _R_BRACKET
+%token <node> _SEMI _COMMA _ID _INTEGER _FLOATPOINT
+
+*/

 %%
- /* to do for students */
- /* two cases for you, pass_node will send flex's token to bison */
-\+ 	{pos_start = pos_end; pos_end += 1; pass_node(yytext); return ADD;}
-. { pos_start = pos_end; pos_end++; return ERROR; }
+if {pos_start = pos_end; pos_end += 2; pass_node("if"); return _IF;}
+else {pos_start = pos_end; pos_end += 4; pass_node("else"); return _ELSE;}
+while {pos_start = pos_end; pos_end += 5; pass_node("while"); return _WHILE;}
+return {pos_start = pos_end; pos_end += 6; pass_node("return"); return _RETURN;}
+int {pos_start = pos_end; pos_end += 3; pass_node("int"); return _INT;}
+float {pos_start = pos_end; pos_end += 5; pass_node("float"); return _FLOAT;}
+void {pos_start = pos_end; pos_end += 4; pass_node("void"); return _VOID;}
+
+{ID} {pos_start = pos_end; pos_end += yyleng; pass_node(yytext); return _ID;}
+{INTEGER} {pos_start = pos_end; pos_end += yyleng; pass_node(yytext); return _INTEGER;}
+{FLOAT} {pos_start = pos_end; pos_end += yyleng; pass_node(yytext); return _FLOATPOINT;}
+
+\= 	{pos_start = pos_end; pos_end += 1; pass_node("="); return _ASSIGN;}
+"<="|">="|"<"|">"|"=="|"!=" {pos_start = pos_end; pos_end += yyleng; pass_node(yytext); return _RELOP;}
+"+"|"-" {pos_start = pos_end; pos_end += 1; pass_node(yytext); return _ADD_OP;}
+"*"|"/" {pos_start = pos_end; pos_end += 1; pass_node(yytext); return _MUL_OP;}
+
+\[|\] {pos_start = pos_end; pos_end += 1; pass_node(yytext); return yytext[0] == '[' ? _L_SQUARE : _R_SQUARE;}
+\(|\) {pos_start = pos_end; pos_end += 1; pass_node(yytext); return yytext[0] == '(' ? _L_PARE : _R_PARE;}
+\{|\} {pos_start = pos_end; pos_end += 1; pass_node(yytext); return yytext[0] == '{' ? _L_BRACKET : _R_BRACKET;}
+
+","|";" {pos_start = pos_end; pos_end += 1; pass_node(yytext); return yytext[0] == ',' ? _COMMA : _SEMI;}

- /**** Please complete all flex patterns and actions here  end ******/
+[ \t] { pos_start = pos_end; pos_end += 1; }
+\r\n|\n|\r { lines++; pos_start = pos_end = 1; }
+ /* . { pos_start = pos_end; pos_end++; return ERROR; } */
+ /**** Please complete all flex patterns and actions here  end ******/
 %%
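
The column bookkeeping that the lexer actions above repeat (`pos_start = pos_end; pos_end += yyleng;`, with a reset on newline) can be sketched as a small helper in plain C. This is an illustrative sketch only: the `position` struct and the functions `position_init` and `position_advance` are hypothetical and not part of the lab code, and the 1-based starting values are assumed from `lines = pos_start = pos_end = 1` in `parse()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracker mirroring the globals lines / pos_start / pos_end. */
typedef struct {
    int line;
    int pos_start;
    int pos_end;
} position;

/* Lines and columns are 1-based (assumption taken from parse()). */
static void position_init(position *p) {
    assert(p != NULL);
    p->line = p->pos_start = p->pos_end = 1;
}

/* Advances the tracker over one matched token of length len.
   After the call, [pos_start, pos_end) is the column span of the token;
   a newline inside the token starts a new line at column 1. */
static void position_advance(position *p, const char *text, size_t len) {
    assert(p != NULL && text != NULL);
    p->pos_start = p->pos_end;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n') {
            p->line++;
            p->pos_start = p->pos_end = 1;
        } else {
            p->pos_end++;
        }
    }
}
```

In a flex action this would correspond to a single call such as `position_advance(&pos, yytext, yyleng)` (with `pos` a hypothetical global), which avoids hard-coding token lengths in each rule.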
--- a/src/parser/syntax_analyzer.y
+++ b/src/parser/syntax_analyzer.y
@@ -14,7 +14,7 @@ extern FILE * yyin;

 // external variables from lexical_analyzer module
 extern int lines;
-extern char * yytext;
+extern char *yytext;
 extern int pos_end;
 extern int pos_start;

@@ -31,24 +31,176 @@ syntax_tree_node *node(const char *node_name, int children_num, ...);
 /* TODO: Complete this definition.
   Hint: See pass_node(), node(), and syntax_tree.h.
         Use forward declaring. */
-%union {}
+%union {
+    syntax_tree_node *node;
+}

 /* TODO: Your tokens here. */
+/*
+alias:
+- SPEC: SPECIFIER
+- DEC: DECLARATION
+- COM: COMPOUND
+- STMT: STATEMENT
+- EXPR: EXPRESSION
+- ITER: ITERATION
+- SELC: SELECTION
+- RET: RETURN
+- Names starting with '_' are terminals (tokens returned by flex)
+*/
 %token <node> ERROR
-%token <node> ADD
+%type <node> TYPE_SPEC RELOP ADDOP MULOP
+%type <node> DEC_LIST DEC VAR_DEC FUN_DEC LOCAL_DEC
+%type <node> COM_STMT STMT_LIST STMT EXPR_STMT ITER_STMT SELC_STMT RET_STMT
+%type <node> EXPR SIMPLE_EXPR VAR ADD_EXPR TERM FACTOR INTEGER FLOAT CALL
+%type <node> PARAM PARAMS PARAM_LIST ARGS ARG_LIST
+/* These are the tokens returned by flex.
+NOTE: Although merging _LE _LT _GT _GE _EQ _NEQ into _RELOP makes the program simpler,
+    it may not satisfy later requirements.
+*/
+%token <node> _IF _ELSE _WHILE _RETURN _INT _FLOAT _VOID 
+%token <node> _ASSIGN _RELOP _ADD_OP _MUL_OP
+%token <node> _L_SQUARE _R_SQUARE _L_PARE _R_PARE _L_BRACKET _R_BRACKET
+%token <node> _SEMI _COMMA _ID _INTEGER _FLOATPOINT
+
 %type <node> program

 %start program

-%%
 /* TODO: Your rules here. */
+%%
+
+program: DEC_LIST {$$ = node("program", 1, $1); gt->root = $$;}
+       ;
+
+DEC_LIST: DEC_LIST DEC {$$ = node("declaration-list", 2, $1, $2); }
+        | DEC {$$ = node("declaration-list", 1, $1);}
+        ;
+
+DEC: VAR_DEC {$$ = node("declaration", 1, $1); }
+   | FUN_DEC {$$ = node("declaration", 1, $1); }
+   ;
+
+VAR_DEC: TYPE_SPEC _ID _SEMI {$$ = node("var-declaration", 3, $1, $2, $3); }
+       | TYPE_SPEC _ID _L_SQUARE _INTEGER _R_SQUARE _SEMI {$$ = node("var-declaration", 6, $1, $2, $3, $4, $5, $6); }
+       ;
+
+TYPE_SPEC: _INT {$$ = node("type-specifier", 1, $1); }
+         | _FLOAT {$$ = node("type-specifier", 1, $1); }
+         | _VOID {$$ = node("type-specifier", 1, $1); }
+         ;
+
+FUN_DEC: TYPE_SPEC _ID _L_PARE PARAMS _R_PARE COM_STMT {$$ = node("fun-declaration", 6, $1, $2, $3, $4, $5, $6); }
+       ;
+
+PARAMS: PARAM_LIST {$$ = node("params", 1, $1); }
+      | _VOID {$$ = node("params", 1, $1); }
+      ;
+
+PARAM_LIST: PARAM_LIST _COMMA PARAM {$$ = node("param-list", 3, $1, $2, $3); }
+          | PARAM {$$ = node("param-list", 1, $1); }
+          ;
+
+
+PARAM: TYPE_SPEC _ID {$$ = node("param", 2, $1, $2); }
+     | TYPE_SPEC _ID _L_SQUARE _R_SQUARE {$$ = node("param", 4, $1, $2, $3, $4);}
+     ;
+
+COM_STMT: _L_BRACKET LOCAL_DEC STMT_LIST _R_BRACKET {$$ = node("compound-stmt", 4, $1, $2, $3, $4);}
+        ;
+
+LOCAL_DEC: LOCAL_DEC VAR_DEC {$$ = node("local-declarations", 2, $1, $2);}
+         | {$$ = node("local-declarations", 0);}
+         ;
+
+STMT_LIST: STMT_LIST STMT {$$ = node("statement-list", 2, $1, $2);}
+         | {$$ = node("statement-list", 0);}
+         ;
+
+STMT: EXPR_STMT {$$ = node("statement", 1, $1);}
+    | COM_STMT {$$ = node("statement", 1, $1);}
+    | SELC_STMT {$$ = node("statement", 1, $1);}
+    | ITER_STMT {$$ = node("statement", 1, $1);}
+    | RET_STMT {$$ = node("statement", 1, $1);}
+    ;
+
+EXPR_STMT: EXPR _SEMI {$$ = node("expression-stmt", 2, $1, $2);}
+         | _SEMI {$$ = node("expression-stmt", 1, $1);}
+         ;
+
+SELC_STMT: _IF _L_PARE EXPR _R_PARE STMT {$$ = node("selection-stmt", 5, $1, $2, $3, $4, $5);}
+         | _IF _L_PARE EXPR _R_PARE STMT _ELSE STMT {$$ = node("selection-stmt", 7, $1, $2, $3, $4, $5, $6, $7);}
+         ;
+
+ITER_STMT: _WHILE _L_PARE EXPR _R_PARE STMT {$$ = node("iteration-stmt", 5, $1, $2, $3, $4, $5);}
+         ;
+
+RET_STMT:  _RETURN _SEMI {$$ = node("return-stmt", 2, $1, $2);}
+        | _RETURN EXPR _SEMI {$$ = node("return-stmt", 3, $1, $2, $3);}
+        ;
+
+EXPR:  VAR _ASSIGN EXPR {$$ = node("expression", 3, $1, $2, $3);}
+    | SIMPLE_EXPR {$$ = node("expression", 1, $1);}
+    ;

-/* Example:
-program: declaration-list {$$ = node( "program", 1, $1); gt->root = $$;}
+VAR: _ID {$$ = node("var", 1, $1);}
+   | _ID _L_SQUARE EXPR _R_SQUARE {$$ = node("var", 4, $1, $2, $3, $4);}
+   ;
+
+SIMPLE_EXPR: ADD_EXPR RELOP ADD_EXPR {$$ = node("simple-expression", 3, $1, $2, $3);}
+           | ADD_EXPR {$$ = node("simple-expression", 1, $1);}
+           ;
+
+RELOP: _RELOP {$$ = node("relop", 1, $1);}
+     ;
+/*
+RELOP:  _LE {$$ = node("relop", 1, $1);}
+     | _LT {$$ = node("relop", 1, $1);}
+     | _GT {$$ = node("relop", 1, $1);}
+     | _GE {$$ = node("relop", 1, $1);}
+     | _EQ {$$ = node("relop", 1, $1);}
+     | _NEQ {$$ = node("relop", 1, $1);}
     ;
 */

-program : ;
+ADD_EXPR: ADD_EXPR ADDOP TERM {$$ = node("additive-expression", 3, $1, $2, $3);}
+        | TERM {$$ = node("additive-expression", 1, $1);}
+        ;
+
+ADDOP: _ADD_OP {$$ = node("addop", 1, $1);}
+     ;
+
+TERM: TERM MULOP FACTOR {$$ = node("term", 3, $1, $2, $3);}
+    | FACTOR {$$ = node("term", 1, $1);}
+    ;
+
+MULOP:  _MUL_OP {$$ = node("mulop", 1, $1);}
+     ;
+
+FACTOR: _L_PARE EXPR _R_PARE {$$ = node("factor", 3, $1, $2, $3);}
+      | VAR {$$ = node("factor", 1, $1);}
+      | CALL {$$ = node("factor", 1, $1);}
+      | INTEGER {$$ = node("factor", 1, $1);}
+      | FLOAT {$$ = node("factor", 1, $1);}
+      ;
+
+INTEGER: _INTEGER {$$ = node("integer", 1, $1);}
+       ;
+
+FLOAT: _FLOATPOINT {$$ = node("float", 1, $1);}
+     ;
+
+CALL: _ID _L_PARE ARGS _R_PARE {$$ = node("call", 4, $1, $2, $3, $4);}
+    ;
+
+ARGS: ARG_LIST {$$ = node("args", 1, $1);}
+    | {$$ = node("args", 0);}
+    ;
+
+ARG_LIST: ARG_LIST _COMMA EXPR {$$ = node("arg-list", 3, $1, $2, $3);}
+        | EXPR {$$ = node("arg-list", 1, $1);}
+        ;
+

 %%

@@ -75,7 +227,7 @@ syntax_tree *parse(const char *input_path)
        yyin = stdin;
    }

-    lines = pos_start = pos_end = 1;
+lines = pos_start = pos_end = 1;
    gt = new_syntax_tree();
    yyrestart(yyin);
    yyparse();
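
Every grammar action above calls `node(name, n, child1, ..., childn)`. A variadic constructor of that shape could look roughly like the following C sketch. It is not the lab's implementation: the `tree_node` struct, the size limits, and the functions `make_leaf` and `make_node` are hypothetical stand-ins for whatever `syntax_tree.h` actually provides.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 10   /* hypothetical limit */
#define MAX_NAME 32       /* hypothetical limit */

/* Hypothetical tree node; the real lab type is syntax_tree_node. */
typedef struct tree_node {
    struct tree_node *children[MAX_CHILDREN];
    int children_num;
    char name[MAX_NAME];
} tree_node;

/* Allocates a zero-initialised node that has a name and no children. */
static tree_node *make_leaf(const char *name) {
    tree_node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(n->name, name, MAX_NAME - 1);
    return n;
}

/* Builds an inner node from children_num child pointers passed as
   variadic arguments, in left-to-right order. */
static tree_node *make_node(const char *name, int children_num, ...) {
    assert(children_num >= 0 && children_num <= MAX_CHILDREN);
    tree_node *n = make_leaf(name);
    va_list ap;
    va_start(ap, children_num);
    for (int i = 0; i < children_num; i++) {
        n->children[n->children_num++] = va_arg(ap, tree_node *);
    }
    va_end(ap);
    return n;
}
```

Because the child count is passed explicitly, a mismatch between the count and the number of `$k` arguments in an action (for example `node("var", 4, $1, $2, $3)`) is undefined behaviour rather than a compile-time error, so each action is worth checking against its production.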