We are now ready to take a look at the analysis of variance for our plasma etching experiment. The slide you're looking at has the manual calculations for the total sum of squares and the treatment sum of squares displayed on the left-hand side. Then over in the right-hand column, you see the error sum of squares computed as the difference between the total sum of squares and the treatment sum of squares. Everything is then displayed in the analysis of variance table. Now, as I've said earlier, you would usually do these calculations in a software package, but I wanted to illustrate doing them by hand for this one example so you could see that it's really not that difficult. Okay, so let's look at our analysis of variance table. We have three degrees of freedom for RF power because we had four levels of power. We have 19 total degrees of freedom because we had 20 observations. And the degrees of freedom add just as the sums of squares add, so 19 minus 3 gives 16 degrees of freedom for error. Of course, you could also compute that from the formula a times (n minus 1): a is 4 and n is 5, so a times (n minus 1) is 16. The mean squares are found by dividing the sums of squares by their degrees of freedom, and the F-statistic is the mean square for power over the mean square for error. That F-statistic is 66.80; that's the computed value of the test statistic. Now remember, values of F0 greater than 1 are an indication that the means might be different. This is much greater than 1, and in fact the computer program I used to generate this arithmetic reports a p-value. The p-value here, it says, is less than 0.01. So there's pretty strong evidence that these treatment means are not the same.
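The hand calculations just described are easy to reproduce. Here is a short sketch in Python of the same arithmetic, using the etch-rate data from this example (four RF power levels, five wafers each, as tabulated in the textbook):

```python
# One-way ANOVA by hand for the plasma etching example:
# a = 4 levels of RF power, n = 5 wafers per level.
# Etch rates (angstroms/min) as given in the textbook example.
data = {
    160: [575, 542, 530, 539, 570],
    180: [565, 593, 590, 579, 610],
    200: [600, 651, 610, 637, 629],
    220: [725, 700, 715, 685, 710],
}

a = len(data)                        # number of treatments
n = len(next(iter(data.values())))   # replicates per treatment
N = a * n                            # total observations

all_obs = [y for ys in data.values() for y in ys]
grand_mean = sum(all_obs) / N

# Total sum of squares
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

# Treatment (between-group) sum of squares
ss_treat = n * sum((sum(ys) / n - grand_mean) ** 2 for ys in data.values())

# Error sum of squares by subtraction, just as on the slide
ss_error = ss_total - ss_treat

df_treat, df_error, df_total = a - 1, a * (n - 1), N - 1
ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error
f0 = ms_treat / ms_error

print(f"SS_treat = {ss_treat:.2f}, SS_E = {ss_error:.2f}, F0 = {f0:.2f}")
# F0 comes out to about 66.80, matching the ANOVA table in the lecture
```

Notice that the degrees of freedom add just as the sums of squares do: 3 for treatments plus 16 for error gives the 19 total.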
Here's a picture that can help you visualize this. The probability distribution shown here is an F distribution with 3 and 16 degrees of freedom; that's exactly what that distribution looks like. On this distribution, I have plotted the 5% and 1% critical values, and you can see that our computed value of F0 is far, far out in the tail beyond them. In fact, it's out at 66.8, while the critical values are back in the range of roughly 3 to 5. The area to the right of 66.8 is the p-value, and our computer program didn't even bother to calculate exactly how far out in the tail it is; it simply said it's much, much smaller than 0.01. So in this case we have a result that is highly statistically significant. As I mentioned, we mostly do this computing with software; we rely on computers to generate the ANOVA for us. The textbook exhibits sample calculations from the three software packages I mentioned earlier: Design-Expert, JMP, and Minitab. I would suggest that you look at those, because they all produce the same basic information, but how the display is organized and laid out differs slightly from one package to the next. The textbook also discusses some of the summary statistics provided by these packages, which we will talk about as we go along. Okay, so now we're ready to address the question of model adequacy checking. Checking assumptions is important. Well, what are the assumptions behind the ANOVA? We've assumed that the observations are normally distributed. We've assumed that we have constant variance; remember, the error term was NID(0, sigma squared). And we've assumed that we have independence, and one of the ways we ensure independence is by making sure we used a random sampling process to collect the data. That's why this is a completely randomized design. We may also want to think about whether we have fit the right model to the data.
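The critical values and p-value the software reports come straight from the F(3, 16) reference distribution. A minimal sketch of that lookup, assuming scipy is available:

```python
from scipy.stats import f

# F(3, 16) reference distribution for the plasma etching ANOVA
df1, df2 = 3, 16
f0 = 66.80  # computed value of the test statistic

crit_05 = f.ppf(0.95, df1, df2)   # 5% critical value
crit_01 = f.ppf(0.99, df1, df2)   # 1% critical value
p_value = f.sf(f0, df1, df2)      # area to the right of F0 = the p-value

print(f"F_0.05 = {crit_05:.2f}, F_0.01 = {crit_01:.2f}, p = {p_value:.2e}")
```

Both critical values land in the single digits, while F0 = 66.8 sits far beyond them in the tail, which is why the p-value is so much smaller than 0.01.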
Is there a lot of unexplained variability that's not accounted for by the factor we studied in our experiment? That's important, and we need to think about it as well. Later on, we're going to talk about what to do if some of these assumptions are violated; that's a very important part of the course that we'll get to. Most model checking in the analysis of variance begins by looking at residuals. What are residuals? Well, residuals are the part of your data that is left over, the part not explained by your ANOVA model. In my book, the residuals are denoted by eij; that's the jth residual in the ith treatment. The way we find it is by subtracting the predicted value of yij, that's y-hat ij, from the actual observed value. It turns out that to predict the jth observation in any treatment, we simply use the treatment average; that turns out to be the least squares estimate of any observation in the ith treatment. So yij minus the treatment average y-bar i dot is the jth residual in the ith treatment. You don't generally have to do this yourself. Computer software will typically generate these residuals and save them for you, and some packages even do some of the plotting automatically. And the reason I say plotting is because graphical methods are typically the way we start looking at model adequacy. For example, if you want to check the normality assumption, the easiest way to do that is simply to construct a normal probability plot of the residuals. On the slide, you see a normal probability plot of the residuals with a straight line drawn through the plot, and this straight line does a pretty good job of describing the data. There's a straggler or two at the ends, but we're not going to worry too much about that. And in a normal probability plot, remember, if you can represent the observations with a straight line, you're pretty comfortable.
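The residual calculation, plus a numerical version of that straightness check, can be sketched as follows. scipy's `probplot` returns the normal probability plot coordinates along with the correlation coefficient of the line fitted through them, so we can gauge straightness without drawing anything; the data are the etch rates from this example:

```python
from scipy.stats import probplot

# Etch-rate data: RF power (W) -> five replicate etch rates (from the example)
data = {
    160: [575, 542, 530, 539, 570],
    180: [565, 593, 590, 579, 610],
    200: [600, 651, 610, 637, 629],
    220: [725, 700, 715, 685, 710],
}

# Residual e_ij = y_ij - ybar_i. : each observation minus its treatment average,
# since the treatment average is the least squares prediction of y_ij
residuals = []
for ys in data.values():
    ybar = sum(ys) / len(ys)
    residuals.extend(y - ybar for y in ys)

# probplot returns the plot coordinates plus a fitted line;
# r close to 1 means the points lie close to a straight line
(osm, osr), (slope, intercept, r) = probplot(residuals)
print(f"normal probability plot correlation r = {r:.3f}")
```

A by-product worth knowing: the residuals within each treatment always sum to zero, because each group's average has been subtracted out.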
You should be pretty comfortable with the normality assumption. There are, of course, other useful residual plots. The plot on the left is a plot of residuals versus run order or time. What you hope to see here is just random scatter; if you see something other than random scatter, it can be an indication of a problem. One of the more common non-random patterns on a plot of residuals versus run order is a funnel-shaped appearance, where the scatter in the residuals gets bigger as we go through time. This is an indication that the longer your process or system runs, the greater the variability in the data. Now, what could this be due to? Well, it could be due to a lot of things. It might be operator fatigue if there are people involved. It might be environmentally related. It might have something to do with a warm-up effect on the equipment. Or it might have something to do with some other operating characteristic of your process, perhaps a variable you didn't control in this experiment that is varying through time. At any rate, a funnel-shaped plot of residuals versus run order is an indication that we have some problems with the validity of our assumptions, and we need to try to find out what the cause is. The plot of residuals versus fitted values is on the right; this is simply a plot of the residuals versus the treatment averages. Again, this plot exhibits random scatter, and that's what you want. A funnel-shaped appearance on this plot would be an indication that as the response increases in magnitude, the variability increases along with it. This is not unusual in many situations. In fact, the normal distribution is the only one of the commonly used continuous distributions where the mean and variance are not connected; in many of the other useful continuous distributions, as the mean gets bigger, the variance gets bigger along with it.
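The plots are the primary tool here, but a rough numerical companion to the residuals-versus-fitted plot is to compare the residual spread within each treatment against that treatment's average: a spread that grows systematically with the average is the numerical signature of the funnel shape. A sketch using the etch-rate data from this example:

```python
import statistics

# Etch-rate data: RF power (W) -> five replicate etch rates (from the example)
data = {
    160: [575, 542, 530, 539, 570],
    180: [565, 593, 590, 579, 610],
    200: [600, 651, 610, 637, 629],
    220: [725, 700, 715, 685, 710],
}

# For each treatment: fitted value (the treatment average) and residual spread
spreads = {}
for power, ys in sorted(data.items()):
    ybar = statistics.mean(ys)
    spreads[power] = statistics.stdev(y - ybar for y in ys)
    print(f"{power} W: fitted = {ybar:6.1f}, residual std dev = {spreads[power]:5.2f}")
```

For this experiment the spreads are all comparable even though the fitted values climb from the 500s to the 700s, so there is no funnel shape and the constant variance assumption looks reasonable.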
So these two plots are very useful to help you understand the normality and constant variance assumptions. And these plots are generated routinely by software.