In the previous unit, we discussed the handling of variables. These variables appeared in the context of expressions, and I waved my hands around the translation of those expressions into VM code. In this unit, I'm going to discuss this translation in detail: how the compiler handles expressions.

To get started, I'd like to go back to basics and present a subset of the Jack grammar that defines expressions. Here I simply lifted a snapshot from the book that accompanies this course and put it on the slide. What we see, once again, is the grammatical definition of what it takes to be an expression. We discussed this in the previous module, so I'm not going to repeat it. But I want you to observe that every one of the expressions that we see here qualifies as a legitimate expression according to this grammar. For example, take the number 5. I look at the grammar, and first of all I observe that 5 is an integer constant. According to the second rule, an integer constant is a term, and according to the first rule, a term is an expression. Therefore, 5 is an expression. What about x? Well, x is a variable name; according to the same second rule, a variable name is a term, and a term, once again, is an expression. What about x + 5? Here x is a term, + is an op, and 5 is yet another term, so the whole thing matches the pattern of a term followed by an op and another term. Using the same rationale, every one of the expressions that you see here will be accepted by the Jack grammar.

Now, as you can imagine, I can invent an infinite number of such expressions, and some of them will be highly complex. The question that we face as compiler writers is: how can we systematically translate any one of these versatile expressions into VM code, beginning with the trivial 5 or x and ending with a very elaborate expression? How do we map them onto VM commands?

The first thing that we have to bring into the picture is the parse trees that we mentioned and used in the previous module. What we see here is a simple expression, a * (b + c), and the parse tree that can be derived from it, a tree that describes the same semantics that the expression describes. Now, this notation is called infix, and as I pointed out here, it is human-oriented. What do I mean by this? Well, look at b + c. We want to add up two quantities, b and c. Why do we say b + c? Why don't we say something like "add up b, c"? Or why don't we say, "here are two quantities, b and c; add them up"? Instead, we say b + c. We use this language because that's what was ingrained into our brains in elementary school: we were taught that whenever we have to add up two numbers, we say the magic words b + c. And because the operator is located in the middle of the expression, between the two operands, this notation is called infix. But you have to realize that there's nothing special or sacred about this notation, and other notations would be just as good. In prefix notation, we put the operator before the operands. That's why it's also sometimes called functional notation, because that's what we do when we define functions, right?
We put the function name, and then we give a list of operands, or parameters, or arguments for this function. So here, for example, if we evaluate this prefix expression, we start from the left and find a multiply operator. We know that multiplication is a binary operator; it expects to see two operands. So we keep on reading, looking for the first operand, and we find it: it's called a. Now we look for the second operand, and we hit this plus. So we know that the second operand is the result of applying plus to two more arguments, and we go on to evaluate the plus, and so on and so forth. So you see that prefix notation is just as natural as infix notation, and if you don't completely see it, you can add some parentheses, and then things will become clearer.

Finally, we also have something called postfix notation. In postfix notation, we first list the arguments on which we want to operate, and then we name the operator that has to be applied to the previously stated operand values. For obvious reasons, this is called postfix. And I would like to point out, very importantly, that postfix notation is intimately related to our stack machine, because our stack language is also postfix. If we want to add up two numbers, what do we do? We push a, we push b, and we say add. Classical postfix.

So if you take everything that you saw in this slide into consideration, except for the prefix notation, which is irrelevant for our discussion, that is, if you think about the infix, the postfix, and the parse tree, this gives you a very strong clue on how to translate from infix to postfix. Now, why is this important? It's important because our source language is infix: Java, C#, C, Pascal, Python, all these languages describe expressions using infix notation. And yet, at the same time, our target language, the VM language, is postfix. So the compiler has to translate from infix to postfix. How should we do it? Well, I guess that you are beginning to guess the answer, and here is one way of doing it. We start with some expression, written as usual in infix notation, and from it we generate a parse tree. And by the way, how to do it is something that we've already done: that's what the parser is all about, that's what we did in the previous module. We wrote a parser that takes source code like this and produces XML code from it, and XML is just yet another way to describe a parse tree. So you put together this parse tree, and then you go through every node in the tree in a certain order and generate from it what you see here on the right-hand side: stack machine code that can be executed, and this execution will result in evaluating the value of the source expression, which is exactly what we want. If you're not convinced, I suggest that you stop the tape right here, go through the stack machine code, and convince yourself that this code will end up computing the value of the expression that we started with, the source code on the left. A concrete example follows.
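To make this concrete, here are the three notations side by side for the a * (b + c) example, together with one plausible VM translation. I'm using symbolic variable names for readability; an actual compiler would emit segment/index pairs such as push local 0, and since the VM has no primitive multiplication command, * becomes a call to Math.multiply:

    infix:    a * (b + c)
    prefix:   * a + b c
    postfix:  a b c + *

    push a                  // first operand of *
    push b                  // first operand of +
    push c                  // second operand of +
    add                     // computes b + c
    call Math.multiply 2    // computes a * (b + c)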
Now, the question, of course, is how do you move from the parse tree to the VM code? Well, there's an algorithm that does it, and this algorithm is called depth-first tree traversal. You begin at the root of the tree, which here happens to be the plus symbol, and you go all the way down; when you hit a terminal leaf, you process it. So you go all the way down and you find x; you output push x. Then you backtrack to the root, once again go all the way down, get to the 2, and push 2. You backtrack to the closest node, which is g, go all the way down, push y, and so on and so forth. Now, I don't want to spend too much time on this algorithm, for two reasons. First of all, we are not going to use it; we'll use something different. And second of all, notice that the parse tree of a real program can be gigantic; what we have here is a very simple example. Therefore, it would be nice if there were an alternative way to process the source code without creating the entire parse tree as a side effect of the code generation operation. Indeed there is such an alternative, and that's what I want to discuss next.

In the algorithm that you will see here, we're going to generate the VM code on the fly, without having to create the whole parse tree in the process. In particular, we will use the following code generation algorithm, which we call codeWrite. codeWrite receives an expression exp and goes on to generate code from it.

If exp is a number like 5, the last token in this input example, then all we do is output the VM statement push 5, or, if you want to be accurate, push constant 5.

If exp is a variable like x, the first token in this input, then we output the VM command push x. And, of course, we don't literally push x; we push something like local 0 or static 2, whatever the mapping of x onto the virtual segments happens to be, as dictated by the symbol table, which is always in the background of these code generation algorithms.

If exp is the expression exp1 op exp2, for example a + b, or in this example x + g(...), then first we generate code for exp1 (here, x), then we generate code for exp2 (here, the call to g), and then we output op. We do it because that's the way the VM code works: we want to transform the infix operation into postfix, and that's exactly what we've just done.

If exp is a unary operator applied to some expression, an op followed by an expression, for example -z in this input sample, then first we generate code to evaluate z, and then we output, in this particular example, neg, which is the VM way to describe minus when minus happens to be a unary operator.

And finally, if exp is an invocation of some function f(exp1, ..., expn), for example g(2, y, -z), then first we generate code to evaluate 2, then code to evaluate y, then code to evaluate -z, and only then do we output the call to g. This applies the function to all the values that were previously evaluated and placed on the stack.

So, as you see, this very elegant and highly recursive algorithm is well equipped to take any infix expression and convert it into VM code. The result, in this particular case, will be the stream of VM commands that you see here. And if you are not sure that this is the case, you can play with the algorithm, stop the tape, and convince yourself that this is indeed the output that the code generation algorithm will produce in this particular example. Now, the nice thing about this algorithm is that not only does it capture the semantics of the input expression, the code it emits also computes the value of this expression: at run time, the value of the expression is going to be placed on the stack, and that's exactly what we want to achieve.
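If it helps, here is a minimal, runnable Python sketch of this codeWrite idea. The tuple-based expression encoding and the symbol_table dictionary are my own stand-ins, not part of the course API; your compiler will have its own parse representation and a real symbol table:

    # Hypothetical symbol table mapping variable names to segment/index pairs.
    symbol_table = {"x": "local 0", "y": "argument 0", "z": "local 1"}

    def code_write(exp):
        if isinstance(exp, int):                  # a number, e.g. 5
            print(f"push constant {exp}")
        elif isinstance(exp, str):                # a variable, e.g. x
            print(f"push {symbol_table[exp]}")
        elif exp[0] == "binary":                  # ("binary", op, exp1, exp2)
            _, op, e1, e2 = exp
            code_write(e1)                        # code for exp1...
            code_write(e2)                        # ...then code for exp2...
            print(op)                             # ...then the operator, e.g. "add"
        elif exp[0] == "unary":                   # ("unary", op, exp), e.g. -z
            _, op, e = exp
            code_write(e)
            print(op)                             # "neg" for unary minus
        elif exp[0] == "call":                    # ("call", name, [args...])
            _, name, args = exp
            for a in args:                        # evaluate each argument in turn
                code_write(a)
            print(f"call {name} {len(args)}")     # then call the function

    # The running example: x + g(2, y, -z)
    code_write(("binary", "add", "x",
                ("call", "g", [2, "y", ("unary", "neg", "z")])))

Running this prints push local 0, push constant 2, push argument 0, push local 1, neg, call g 3, add, mirroring the traversal described above.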
So things look fine, and with that I'd like to make some very important points about the switch from parsing, which is what we did in the previous module, to code generation, which is what we do now. So, here's another example of some input source code, an expression in the context of a let statement. If we applied the compiler that we wrote in the previous project to this expression, we would get the XML code that we see here. And what we see is a rather flat description of the input: we have the identifier a, followed by plus, followed by the identifier b, followed by minus, followed by the identifier c, and so on and so forth. We go from left to right and, as we parse the expression, we add some XML tags, and that's it. So we generate static XML code; that's what we did in the previous project. But in the current project, we do something quite different. We generate none of the XML that you see here; this XML is completely irrelevant now. Instead of generating it, we generate actual executable VM code. Now, a great deal of the program that you already developed is applicable here: the overall API and the overall structure of the compiler, or the compilation engine, carry over to this project as well. But the part that generates the XML code has to be replaced with code that generates, in this particular example, the VM code that you see at the bottom, or something similar to it. Because, you know, different compilers generate different target code, and some compilers are more efficient than others. So I cannot tell you exactly what your compiler will produce; I can just give you some general guidelines on what has to be done. So that's the first observation about the difference between what we did in the last project and what we do now.

I'd like to turn your attention to yet another example. Instead of a + b - c, we now look at a + b * c. What will the code generator, the previous algorithm, do with this expression? Well, it will happily produce the VM code that we see here: it will push a, it will push b, it will add them up, then it will push c and multiply c by the sum a + b. And then, obviously, it will pop the result into x. So that's very nice indeed; however, as you hopefully see, this VM code will not generate the standard algebraic result of a + b * c. Why? Because in mathematics we have what is known as operator priority: given this priority, first we have to compute b * c, and then we have to add a to the result. The compiler here ignores operator priority and produces this code instead. That's one possibility; I call it version one. Another compiler, written perhaps by a different student or a different developer, may well produce this version, which in this example happens to be algebraically correct. So what's the difference between version one and version two? In version one, we go from left to right, and we look at the expression as (a + b) * c; we associate operators to the left, so to speak. In version two, we view the input as a + (b * c): we evaluate a, then we evaluate b * c, and then we apply the addition operator. The two versions are sketched below.
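For concreteness, here is what the two versions of let x = a + b * c might look like in VM code. Again, I'm using symbolic names for readability; an actual compiler would emit segment/index pairs, and in the standard Jack-to-VM mapping, * becomes call Math.multiply 2:

    // version one: reads the input as (a + b) * c
    push a
    push b
    add
    push c
    call Math.multiply 2
    pop x

    // version two: reads the input as a + (b * c)
    push a
    push b
    push c
    call Math.multiply 2
    add
    pop x

Only version two matches standard algebraic operator priority.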
So the question is, which version is correct? And the answer, surprisingly enough, is that both versions are perfectly okay, compilation-wise. Why is that? Well, this peculiar observation is due to the fact that the Jack language does not have operator priority. Nowhere in the Jack language specification will you find a rule that says that multiplication comes before addition; it's not there. And it's not there because we, Noam and I, when we designed the language, decided to design a language that has no operator priority, because a language like this lends itself more easily to writing a compiler. So it's the job of the compiler writer to decide how he or she wants to parse such an expression: from left to right, from right to left, we don't care. Operator priority is not part of the language specification; it's part of the implementation. It's up to you to decide how to carry out this expression evaluation.

That said, we obviously have to give Jack programmers the ability to write programs that are algebraically correct. And therefore we feature parentheses in the language, and the Jack programmer can use parentheses to signify, or to inform the compiler, that he or she wants to evaluate b * c before evaluating the plus. In this case it's the job of the compiler to generate only this particular version: if there are parentheses, they have to be respected. So we have to view this expression as exp1 + exp2, where exp1 is a and exp2 is b * c. We generate code that evaluates a, then we generate code that evaluates b * c, and then we emit the add operation. So, once again: in the first example, the one at the top, the compiler can generate either of the two versions, and the choice is indeterminate. But once we have parentheses, the compiler writer must take notice of them and cause the codeWrite algorithm to work in the proper order, as dictated by the parentheses.

All right, so this is everything I wanted to say about compiling expressions, and we are now in a position to move on and talk about flow of control.