Welcome to Project 11, in which we're going to complete the construction of our compiler. Now, to remind you, we decided to split this substantial software development task into two separate and standalone projects. In project 10, we built a syntax analyzer. And in this project, in project 11, we are going to extend and morph this syntax analyzer into a full scale compiler. Now we are going to do this morphing in two standalone stages also. And let me describe them. Well, first of all, we've built a syntax analyzer. And this analyzer is capable, as you remember, to parse the source code input and generate from it XML code that demonstrates that the syntax analyzer knows what it's doing, that it knows how to parse and understand the syntax of the source code. Now, this understanding is quite impressive, and yet it has one limitation. It doesn't handle identifiers in a very clever way. Because you see the source code contains like, you know, any high level program, it contains numerous identifiers, x, y, names of subroutines, names of arguments, and so on. And all these identifiers are treated by the syntax analyzer in the same way. It simply calls them identifier, and that's it. So, what you want to do next is to develop the ability to handle and understand the semantics of the symbols in the source code. So we're going to develop the ability of creating and using symbol tables. And as an interesting question, as usual in this course, how can we test this module in isolation from the rest of the system? Well, we decided that we will test the symbol table capability by extending the basic analyzer that you've already built and by generating some richer XML code that will include the same code that you generated before plus a more refined identification of the identifiers that the analyzer found in the source code, as we'll explain in just a few minutes. Once we have this capability in place, we can go on and develop the code generation capability of the compiler. Once we have the code generator, we can throw away the XML. We need it no longer. And we can tweak a little bit the parser in order to develop by now executable VM code, instead of the static XML code that we developed before. So, these are the stages that we are going to follow in Project 11, and let us begin with the symbol table. Before we get started, let us remember how the output of the syntax analyzer looks like. In particular, note how we treat the variable count. Presently, the syntax analyzer simply tags it as an identifier, right? And yet as we decided, we want to be much more specific about the role of this symbol in the source code. And here's how we are going to do it. We are going to extend the syntax analyzer to output the following information. First of all, the identifier's category, whether or not it's a var, an argument, a static field, class, or subroutine. So every one of these categories should appear in the XML output. By the way, when we say var, it's a synonym for local. Var and local variables are the same thing in our jargon. And if the identifier's category is one of var argument statical field, we want the XML code to also include the number of this identifier within its kind. So, whether it's argument zero, one, two, and so on. We also want the syntax analyzer to output this information. And finally, we also want to output whether the identifier is being defined, for example in the context of a VAR jack command. Or whether it's being used, for example in the context of evaluating some expression. So that's another item that we want your syntax analyzer to output to your XML output. So how are we going to actually implement this symbol table capability? Well, in one of the previous units we gave the SymbolTableAPI so we request that quite naturally you now implement it in a language like Java or Python. Now, once you have this API implemented, you have to test it. And to test it, we suggest that you extend the syntax analyzer that you already developed in project ten with the identifier handling capabilities that we described at the top of this slide. Now we don't care exactly how you do it. We don't care what tags you're going to use. You should come up with your own format and simply output it and convince yourself that your syntax analyzer now has the ability to provide a refined analysis of the identifiers that it parses. We also won't expect that you submit this part of the project. This is for yourself, for your own benefit only. And it's very important that you do it because it will enable you to test the symbol table in isolation, or to unit test the symbol table in a standalone fashion. How should you do it? Well, we suggest that you test this extended syntax analyzer by simply running it on the test programs that we gave you in project ten. And once again, convince itself that it knows how to handle identifiers cleverly. All right, so let's assume that we are done with this. And therefore, by now, we can move on to talk about code generation. Now before we do this, I want to kind of take stock at what we have done so far. By now, we have a syntax analyzer which is capable of parsing and understanding not only the syntax of the source jack code but also the semantics of all the identifiers. And so now that the source code is fully understood syntactically and symbol created, it lends itself to the last and final stage in the compilation, which is called generation. And that's what we're going to do next. Now, how are we going to do it? Well, here's the deal. We're going to provide you with a set of six test programs whose names are listed here on the right hand side. And the development plan is going to be staged and we're going to use the notion of unit testing. So you will develop your compiler in stages, and in each stage, you will do whatever is necessary. In order to translate the next test program. So for every one of these test programs, you have to do the following. First of all, use your evolving compiler so to speak, to compile the directory that contains the test program. Typically, well I shouldn't say typically, some of these directories contain single VM file, other directories contains several VM files. So you have to compile the entire directory then you have to look at the generated code, the VM code that you're compiler has generated. If things look fishy, you have to go back to stage one and fix your compiler. If things look okay, then you load the directory into the VM emulator. Now you can do it because if your compiler is well functioning, it will create a set of VM files, one for every given eject file. So when you load the directory into the VM emulator, the emulator will load the VM files and you'll be able to actually execute them. So you can execute the generated code, you can inspect the results, we will tell you what to look for. And once again, if things are not working correctly you have to fix your compiler and go back stage one. So that's the loop that you have to go through with every one of these test programs. Now I'd like to point out something interesting. Normally, when you compile a program, a program you have written. And you get some errors you assume the culprit is the code that you wrote. So you fix the source code in order to compile it better in the next iteration. But in this project, exactly the reverse is true. Why? Because the test programs that we provide are perfect programs. They contain no syntax or run time errors whatsoever. And therefore, if you get any error, you must assume that what is not working properly here is not the source code but rather your compiler. It's your compiler that you wrote that is misfunctioning. This is something that normal application developers don't even dream about. They don't even think about the possibility that the compiler has a bug. But since you took the [INAUDIBLE] course, you are now in a different league in terms of their competence and expertise. And that this comes with certain responsibilities, if something is screwed up. It's your compiler that has to be fixed and that's what Project 11 is all about, you're writing a compiler. All right, so what we'll do next between now and the end of the unit is we'll go through every one of the six test programs and say a few words about it, so that you will know what to expect. Well here's the seven program. It has a single class called main, a single rather trivial subroutine or void method called main. And it simply computes the value of 1 plus 2 times 3. And so this test code is designed to test how your compiler handles a simple program. An arithmetic expression involving some constants only, a do statement, and a return statement. So once again, you have develope or extend your compiler to handle these capabilities only. And then it can test it on the seven program and move happily to the next test program. Now if things work properly, your compiler will generate something like this VM file here. Now it should not necessarily generate the exact VM file that you see here. Because different compilers generate different target programs, and that's perfectly okay. But whatever program your compiler generated, you can then load it into the VM emulator. And when you will run this program you have to see the number seven at the top left of the screen. If you see this number, you will know that your program is well behaving, your compiler I meant is well behaving. All right, moving along, here is a little quiz, but rather a tricky one. Inspect the VM code on the right and focus on the two red instructions. Select one of the correct observations. A, the two highlighted instructions are designed to clean up the memory before the function returns. B, they are related to the function call-and-return contract. And C, they are example of some unnecessary code that is sometimes generated by the compiler. What is the correct answer? Why don't you stop the tape and think about this question for a little while, and then we'll talk about the answer. So let me explain why. Well look at the top red statement which is pop temp 0. And notice that it comes just after the command call Output.printInt. And there's a very tight relationship between these two VM commands. Why? Because, you know what? Let's also go back to the jack source code and realize that call Output.printInt came when the compiler generated code for do Output.printInt. So here, we have a method call in the context of a do statement. And the compiller writer knows that by contract, every VM effort irrespective of whether it returns the value or not, I'm sorry, irrespective of whether it's void or not. Every VM effort must return a value, it must return something. And therefore in this particular case, we are calling the method in the context of a do statement. So this indicates to the compiler that we are not interested in the return value of this method. And therefore, it is still our responsibility to do something with this return value. Now because this return value serves no purpose whatsoever, we simply dump it. So we pop it off the stack, because it was placed there by printInt, which operates according to the contracts. And we pop it onto temp 0, we dont care. We're not going to use it, we completely ignore it. So once again, whenever we generate code for a subroutine call in the context of a do. We must also generate code that yanks the return value off the stack and throws it away somewhere. So that's the explanation to pop temp 0. What about the next command? Where the next command is push constant 0. Well, the next command is necessary in the context of the overall VM function that we have here, which is main.main. If you go back to the Jack code once again, you see that main.main is a void method. Or a void function, it doesn't matter. Well it doesn't matter once again if it's a void or not. It must return a value according to the rules of the game. So at the end of the code that I generate, the compiler writer or the compiler generates for main.main, I must return something. And therefore, we decided that in that case you will simply return or you will generate a call that returns constant zero. Assuming that the caller of this method will yank it as we just saw. But that's not my business. My business is just to deliver what is required by my contract. So I push something through the stack and return. So you see, as compiler writers, you have an obligation to play by the rules. Because there are so many other people around you which assume that you play by the rules. And if you don't do that you're going to create some massive problems. So this, I think, is quite a perceptive quiz that hopefully helps to illustrate some important aspects of method call-and-return. All right, the next test program that I will discuss, the third test program deals with converting decimal to binary. So here's an example. If we give this program the number 171 we expect it to return a set of bits that represent this number in binary. So we have to decide on some way to represent the input and the output. And here's how we do it. By some arbitrary agreement that we made up, we are going to put the input, in this case 171, in RAM[8000]. And we expect the program, if it runs properly, to output the set of 16 bits that represent this number in binary. Into the RAM from address 8001 all the way to 8016. So when we run this program, if we look at the RAM, take a look at 8000. We see that, indeed, it contains the value 171. And if we look at the next 16 values, we can scroll down the RAM. We will see that they contain the bits that represent this number in binary if the program was well translated. So here's the program itself. It has a main function. The main function calls some private functions or some private static methods. And I don't want to discuss it. You can stop the tape and look at the code. Or, actually, you can look at the code when you work on the project. You don't have to do it now. And for now, you can just observe that this program is designed to test the ability of your compiler to handle expressions. But simple expressions that contain no arrays and method calls. And also we are going to test the ability of your compiler to handle if, while, do, let, and return. So once again the tip is that before you test this program, you have to add to your compiler the abilities to carry out these things. To handle expressions, simple expressions and to handle every one of these five statements. Which altogether, these are the statement types that we have in the Jack language. So, I'd like to give you some tips on how to test the compiled code. Because this is an interactive program that deals, well, it's not interactive but it deals with the RAM. And therefore there are some little things that we have to worry about. Well first of all, how do you get to address 8000? Well, a convenient way to do it is to use the binocular control. Just click it and the simulator will enable you to enter some address and then you can zero in on this address in the RAM. The next step is that when you rewind the code, the compiled code of this program it is going to erase the RAM. So you will not be able to see the results of the most recent execution of the program. So don't rewind the code unless you really have to. Another thing that I want to point out is that if you want to enter something into the RAM. In particular if you want to enter a value into 8000 in order to test the program. You should note that you cannot access the RAM when the animation mode is in no animation. This is just some peculiar feature of the VM emulator. So, before you enter a value into the RAM you have to get out of the no animation mode. Enter the value, and then go back to no animation. Because otherwise, it would be very difficult to follow the execution of the program. And finally, if you want to see the program's results at the end of the program execution, you have to click the Stop button. If you won't click the Stop button you will not get to see what we see here in the RAM. Once you click it, boom, you will get to see the state of the RAM. And you can appreciate whether or not your program, the compiled program is well functioning. If it's well functioning it means that, well it shouldn't, it doesn't mean. But it adds some more evidence to the observation that your compiler is well behaving. All right, the next program is Square, or we also called it square dance. And we talked about this program at length at some point in the course. So I'm not going to belabor on it now. I just want to point out that this program is designed. To test how your compiler handles constructors, methods, and expressions that include method calls. Once you develop this functionality, you're invited to test it on the Square application. The Square directory includes the Square.jack class file, the SquareGame.jack class file, and a main class that tests everything. And once again these files were discussed previously in the course. If you want you can go back to look at them. The next test program that we're going to look at is called average. And here is the code of this program. It's a single class file with a single function in it, and as you see This program uses arrays and indeed. Well not indeed, but when you run this program, you will get some values which seem to be correct. And this program is designed to test the ability of your compiler. To handle arrays and strings. So once again, I said it several times, I will say it again. Once you think that your compiler knows how to handle arrays and strings, you are welcome to test it on the average Test program. And the next program run before the last test program is called Pong. That's yet another program that we saw several times during the course. And this program is a complete object-oriented interactive game. And it tests the ability of your compiler to handle a complete object-oriented application, including the handling of objects and static variables. Now, when you look at this screen here, you can identify the ball. You can identify what we call bat, but other other people call pebble and something more abstract, which is the Pong game itself. And indeed the directory of Pong includes the following files. A ball abstraction. A ball class that delivers the ball abstraction. A bat abstraction. The Pong game, itself. And a name file that runs the show, so to speak. Your compiler is supposed to compile the entire Pong directory. And in particular, it will compile each one of these files separately. It will generate four VM files. When you load the directory into the VM emulator, if everything works properly you will be able to play Pong. Most likely you will have to delay the execution in order to be able to control the game. So if you remember there's an execution speed slider in the VM Emulator which you can play with in order to play the game. Finally, the last program that we supply is called Complex Arrays. And this program is designed to test how your compiler handles array manipulations that include all sorts of fancy index expressions. So, for example take a look at this expression here. As you can see, the indices of the arrays are quite hairy. And indeed that's exactly what we want you to test. We want you to test that the compiler knows how to handle these cryptic statements. And to save you time when you run this program, if everything works properly you will see the following output. You will see the required result, we took the trouble to compute it for you. And then the actual result, what your program has actually generated. If these results will be in agreement you know that your VM code is well functioning, which means that the compiler has done its job appropriately. So you can look at the code, it includes some private methods. Not terribly interesting as far as the compiler goes. But once again if you want to understand the test code, you're welcome to do it on your own. So to recap we have to complete the development of the full scale compiler in order to test that the compiler is progressing according to our test program. We supplied these 6 test programs, they are all available in the Project 11 folder. And you have to develop your compiler in a way that tests every one of these programs in the given order. Very important to develop or recommended to develop the compiler so that it will carry out these tests in the given order. All right, so this has been the description of Project 11. And in the next and final unit in this module, we will bring up some general observations about code generation and compiling high level languages.