In the previous unit, we talked about certain conventions that developers of Jack compilers are advised to follow. We call them the standard mapping over the VM platform. In this unit we'll finally discuss how to actually build the compiler.
So here's the overall architecture and roadmap of our compiler. To remind you, in the previous project, project 10, we wrote a syntax analyzer that generates XML code. Now we have to take this software and morph it into a program that generates VM code. This program will be our Jack compiler, and we'll do it one step at a time, as usual.
We propose to base the compiler's software architecture on five separate modules. If you write it in Java, these would be five Java classes. The JackCompiler is the main program that drives everything. Then we have a JackTokenizer, a SymbolTable, a VMWriter, and a CompilationEngine. In this unit we're going to focus mainly on the SymbolTable and the VMWriter. Why? Because everything else was already developed in project 10, although we still have to tweak it. We have to modify some of the code that you wrote in those modules, but once again, the gist of what we'll do is concentrated in the SymbolTable and the VMWriter.
All right, so let us begin with the topmost class, the JackCompiler. How do you use it? Let's start with the user-level functionality. If you write the JackCompiler in Java, you will probably type something like java, then the name of your program, JackCompiler, and then a mandatory input. This input can be one of two things. It can be a file name ending in .jack, in which case the compiler will generate from it another file with the same name but with a .vm extension. This is the output of the compiler. Or the input can be the name of a directory. If it doesn't have an extension, it must be the name of a directory.
In the directory case, the compiler generates one VM file for every Jack file found in the directory. What I just described is summarized in the output segment here.
Let me say a few words about how it actually works. For each source Jack file, the compiler creates a JackTokenizer object and an output .vm file. To say it more accurately, the JackTokenizer represents the compiler's input, and the .vm file is the compiler's output. Once this setup is done, the compiler can go on to use the SymbolTable, the CompilationEngine, and the VMWriter in order to generate the VM code and emit it into the output file, the VM file.
All right. The JackTokenizer is the next module in the list, and here I'm happy to say: mission accomplished. We already developed the JackTokenizer in project 10. We can use it as is; we don't have to touch it for the full-scale compiler. So this one has been taken care of.
What about the SymbolTable? Well, the SymbolTable is new, and because it's new, let me start with some background. Here's an arbitrary example, the Point class that we used several times in this course, and we see that there are several kinds of symbols, or variables, lurking here. First of all, we have field variables and static variables: x and y are fields, and there's only one static variable, pointCount. These variables have a scope. The scope is the region of the program in which they are recognized, and they are recognized throughout the class. Any subroutine in this class can see and manipulate static and field variables. The compiler keeps track of these variables using a class-level symbol table, an example of which you see here. As you see, the two field variables and the single static variable are accounted for in the class-level symbol table.
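Going back for a moment to the JackCompiler's file-or-directory input rule, here is a minimal Python sketch of it, not a definitive implementation. The function name plan_outputs is hypothetical. It only decides which .vm file each .jack file maps to, and it assumes that any argument not ending in .jack is a directory.

```python
from pathlib import Path


def plan_outputs(source):
    """Return (jack_path, vm_path) pairs for a .jack file or a directory.

    Hypothetical helper: the name and the return shape are assumptions
    made for illustration only.
    """
    path = Path(source)
    if path.suffix == ".jack":
        # Single-file input: one .jack file yields one .vm file.
        jack_files = [path]
    else:
        # No .jack extension: treat the argument as a directory and
        # collect every .jack file directly inside it.
        jack_files = sorted(path.glob("*.jack"))
    # Each output has the same name as its input, with a .vm extension.
    return [(jack, jack.with_suffix(".vm")) for jack in jack_files]
```

A real driver would then loop over these pairs, creating a JackTokenizer for each input and an output file for each .vm path.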
Now, the other kinds of variables that we may have are arguments and local variables. That's what we see in this example here: other is an argument, and dx and dy are local variables. The scope of arguments and local variables is the subroutine in which they are declared. These variables are not recognized, they are unknown, outside this subroutine. The compiler keeps track of these kinds of variables using another symbol table, a subroutine-level table, as you see here in this example.
Now, one thing that we can conclude from this exposition is that the compiler never needs more than two symbol tables. Why? Because the class-level symbol table can be initialized each time we start compiling a new class, obviously. And, what is maybe less obvious, the subroutine-level symbol table can also be reset each time we start compiling a new subroutine within the current class. Why? Because the previous subroutine-level table is no longer interesting. It's passé; it contains irrelevant information. We already compiled the previous subroutine and generated its code, so we can forget about it. When we move on to compile the next subroutine, we can reset the symbol table that we used before. So the SymbolTable class, if you implement it as a class, let's say in Java, is going to have two instances only: a class-level symbol table and a subroutine-level symbol table.
Now, within the symbol tables, the compiler, as you see here, gives each variable a running index within its scope and kind. The index starts at zero, it is incremented by one each time a new symbol of that kind is added to the table, and it is reset to zero when we start a new scope, a new symbol table. All these operations have to be handled by the routines of the SymbolTable module, or SymbolTable class. And that's what we're going to discuss next: the API of this class.
So, first of all, this class has a constructor that creates a new symbol table. It has a startSubroutine routine that simply starts a new subroutine scope, following the tips that I gave you before. Then perhaps the most interesting method is the one that adds a symbol to the table. It's interesting, but it's straightforward. It receives several items of information: the name of the symbol that you want to add, its type, and its kind. Based on this information, it adds a new row, if you will, to the symbol table. Then we have a varCount routine that returns the number of symbols of a given kind that were already added to the symbol table, whether the kind is static, field, argument, and so on. This routine is quite helpful in several places when you generate code.
Then we have three queries, if you will, named kindOf, typeOf, and indexOf, which return some useful information about a certain symbol. We can ask, what is the kind of x? The method will look up the table and say it's STATIC, FIELD, or something else. What is the type of x? Once again, the method will go to the symbol table and say the type is such and such. And the same goes for the index. These routines are also very helpful when you generate code, for obvious reasons. If you're not convinced, you will be when you actually develop the compiler.
Now, I took the same API and put it in a smaller space on the slide, so there's nothing new here. I also put up the two symbol tables that we saw before, just for reference. And I'd like to say a few words about implementation. The SymbolTable abstraction that we discussed all along can be implemented using a classical data structure called a hash table. If you're not sure what a hash table is, you're welcome to read about it on the Internet and look up hash tables in the library that supports your language, whether it's Java or Python.
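To make this API concrete, here is a minimal Python sketch of one way to realize it with two dictionaries (Python's hash tables). This is an illustration under stated assumptions, not a definitive implementation: the method name define, the kind strings "static", "field", "arg", and "var", the choice to hold both scopes in a single object, and the choice to return None for unknown names are all assumptions.

```python
class SymbolTable:
    """Sketch of a two-scope symbol table (class level and subroutine level)."""

    CLASS_KINDS = ("static", "field")  # assumed kind names

    def __init__(self):
        # One hash table per scope, plus a running index for each kind.
        self._class_scope = {}
        self._subroutine_scope = {}
        self._counts = {"static": 0, "field": 0, "arg": 0, "var": 0}

    def start_subroutine(self):
        """Start a new subroutine scope: forget the previous subroutine."""
        self._subroutine_scope.clear()
        self._counts["arg"] = 0
        self._counts["var"] = 0

    def define(self, name, type_, kind):
        """Add a symbol and give it the next running index of its kind."""
        if kind in self.CLASS_KINDS:
            scope = self._class_scope
        else:
            scope = self._subroutine_scope
        scope[name] = (type_, kind, self._counts[kind])
        self._counts[kind] += 1

    def var_count(self, kind):
        """Number of symbols of this kind defined in the current scope."""
        return self._counts[kind]

    def _lookup(self, name):
        # The subroutine scope is searched first, then the class scope.
        if name in self._subroutine_scope:
            return self._subroutine_scope[name]
        return self._class_scope.get(name)

    def type_of(self, name):
        entry = self._lookup(name)
        return entry[0] if entry is not None else None

    def kind_of(self, name):
        entry = self._lookup(name)
        return entry[1] if entry is not None else None

    def index_of(self, name):
        entry = self._lookup(name)
        return entry[2] if entry is not None else None
```

For the Point example, defining x and y as fields and pointCount as a static gives x index 0, y index 1, and pointCount index 0, because each kind has its own running index. Calling start_subroutine empties only the subroutine-level dictionary, so the fields and statics stay visible.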
Every modern language has a hash table abstraction somewhere. So we can use one hash table to represent the class scope and another hash table to represent the subroutine scope. When we start compiling a new subroutine, the latter hash table can be emptied and reset, and that's basically what we have to do.
Now here's another tip, which is relevant for the compiler in general. When you compile error-free Jack code, each symbol which is found neither in the subroutine-level symbol table nor in the class-level symbol table must be either a subroutine name or a class name. That's a useful tip that will serve you well when you write the compiler.
Moving along, let's talk about the VMWriter. The VMWriter's job is to generate and emit VM code into the output VM file, and here's the API. The VMWriter has a constructor that creates a new output file whenever we start compiling a new class. We have a writePush routine that writes a VM push command. In order to write a VM push command, you have to know which segment and which index are involved. Once you have these two pieces of information, this routine is trivial. Likewise, the writePop routine is also trivial: it writes a VM pop command. The writeArithmetic routine writes the given arithmetic command, so if the argument is SUB, it will write sub. Very straightforward. All the other routines follow the same rationale: they get some arguments and generate from them a straightforward VM command. So this module is relatively simple to develop, and yet it's important from a software engineering standpoint, because it encapsulates all the activities related to actually generating the compiler's output.
Finally, we have the CompilationEngine. The CompilationEngine gets its input from a JackTokenizer and writes its output to the output file using the VMWriter. So, output is generated by calling VMWriter methods.
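The three VMWriter routines named above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation. It assumes the writer receives an already opened text stream instead of creating the output file itself, which makes it easy to try out in memory, and the snake_case method names are my own adaptation.

```python
import io


class VMWriter:
    """Sketch of a writer that emits one VM command per line."""

    def __init__(self, out):
        # Assumption: 'out' is any text stream with a write() method.
        self._out = out

    def write_push(self, segment, index):
        self._out.write(f"push {segment} {index}\n")

    def write_pop(self, segment, index):
        self._out.write(f"pop {segment} {index}\n")

    def write_arithmetic(self, command):
        # VM commands are written in lowercase, so SUB becomes sub.
        self._out.write(command.lower() + "\n")


# Usage: emit VM code that subtracts 1 from local 0 and stores it back.
buffer = io.StringIO()
writer = VMWriter(buffer)
writer.write_push("local", 0)
writer.write_push("constant", 1)
writer.write_arithmetic("SUB")
writer.write_pop("local", 0)
print(buffer.getvalue(), end="")
# push local 0
# push constant 1
# sub
# pop local 0
```

Keeping all output formatting in one small class means the rest of the compiler never builds VM command strings by hand.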
The CompilationEngine, if you recall, was discussed several times in the previous module and in this one. It is organized as a series of compilexxx routines, xxx being a syntactic element of the Jack language. We have something like 15 such elements, so we'll have something like 15 such compile routines. These routines keep calling each other all the time, and the contract between them is that each compilexxx routine should read the xxx construct from the input, advance the input exactly beyond xxx, and emit the output VM code that implements the semantics of xxx. Here xxx can be something like while, let, and so on. Therefore we should call a compilexxx routine only if xxx is indeed the current syntactic element that we are handling. Now, if this element is part of an expression, and therefore something that should have a value, then the emitted VM code should compute this value and put it at the top of the VM stack.
So that's what the CompilationEngine is supposed to do, and it does it using this API, which takes up this slide and the next one. This API is completely identical to the API that we had for the CompilationEngine when we wrote the syntax analyzer in project 10. Therefore I don't want to belabor it; I talked about it in the previous project. And yet in this project we have to morph the implementation of this API into one that generates not XML code but VM code. So there is some work to be done here, and we're going to spend the next unit, the project 11 overview, giving you step-by-step guidelines on how to actually carry out and test this morphing, from code that generates XML to code that generates executable VM code. So that's what we'll do in the next unit.
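The contract between the compilexxx routines can be illustrated with a deliberately tiny Python sketch. It is not the project's CompilationEngine: the class name TinyEngine, the plain list of string tokens, and the restriction to integer constants combined with + and - are all assumptions made to keep the example short.

```python
class TinyEngine:
    """Toy engine showing the read, advance, and emit contract."""

    OPS = {"+": "add", "-": "sub"}

    def __init__(self, tokens):
        self.tokens = tokens  # assumed: a list of token strings
        self.pos = 0          # index of the current token
        self.output = []      # emitted VM commands

    def compile_term(self):
        # Read one integer constant, advance exactly past it, and emit
        # code that leaves its value on top of the stack.
        token = self.tokens[self.pos]
        self.pos += 1
        self.output.append(f"push constant {int(token)}")

    def compile_expression(self):
        # expression: term (op term)*, evaluated left to right.
        self.compile_term()
        while self.pos < len(self.tokens) and self.tokens[self.pos] in self.OPS:
            op = self.tokens[self.pos]
            self.pos += 1
            self.compile_term()
            # Both operands are on the stack; the operator comes last.
            self.output.append(self.OPS[op])


# Usage: compile "7 - 2 + 1" followed by a ';' that is not part of it.
engine = TinyEngine(["7", "-", "2", "+", "1", ";"])
engine.compile_expression()
print(engine.output)
# ['push constant 7', 'push constant 2', 'sub', 'push constant 1', 'add']
print(engine.tokens[engine.pos])
# ;
```

After the call, the current token is the ';' that follows the expression, so whichever routine called compile_expression can continue from exactly the right place, and the emitted code leaves the expression's value at the top of the stack.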