So we've now seen how the analysis of variance works and how it tests the hypothesis of equality of means, but we're not really finished yet, because while the analysis of variance tests the hypothesis of equal treatment means, when that hypothesis is rejected, we don't know which means are different. So assuming that your residual analysis is satisfactory and you don't have any additional work to do, your next objective is usually to determine which specific means are the ones that differ. The problem of determining which means are different following an ANOVA is usually called the multiple comparison problem. There are lots of ways to do this. If you look in the textbook, you'll see a discussion of a number of different methods for doing multiple comparisons. There are two methods that are routinely available in software that I like to use. One of them is the Fisher Least Significant Difference, or Fisher LSD, method. Many people joke about the name LSD, but the Fisher LSD method basically uses pairwise t-tests. Then there's another method, developed by a very famous statistician named John Tukey, called Tukey's method, which works very well, and there are also some graphical methods that can be used. Here's some output from a computer package. This is Design-Expert, and it is illustrating the Fisher LSD method. Notice that at the top of the display, it shows you the estimated mean for each treatment along with the standard error of that mean. By the way, the standard error of a treatment mean is simply the mean square for error divided by n, the number of observations in each treatment, and you take the square root of that quantity. Now, the lower part of the display is showing you the Fisher LSD method. It's comparing every single pair of means: one against two, one against three, and one against four.
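To make the LSD arithmetic concrete, here is a minimal sketch of the calculation: estimate each treatment mean, pool the within-treatment variation into the mean square for error, form the standard error of a mean as the square root of MSE over n, and compute a pairwise t-statistic for every pair. The data values are illustrative, in the spirit of the etch-rate example, and are not necessarily the course's exact numbers.

```python
# Sketch of the Fisher LSD calculation for a = 4 treatments with
# n = 5 observations each. Data values are illustrative.
from itertools import combinations
import math

data = {
    1: [575, 542, 530, 539, 570],
    2: [565, 593, 590, 579, 610],
    3: [600, 651, 610, 637, 629],
    4: [725, 700, 715, 685, 710],
}
a = len(data)                      # number of treatments
n = len(data[1])                   # observations per treatment
means = {t: sum(ys) / n for t, ys in data.items()}

# Error (within-treatment) sum of squares and mean square
sse = sum((y - means[t]) ** 2 for t, ys in data.items() for y in ys)
mse = sse / (a * (n - 1))

# Standard error of a treatment mean: sqrt(MSE / n)
se_mean = math.sqrt(mse / n)

# Pairwise t-statistics: difference in means over sqrt(2 * MSE / n)
for i, j in combinations(sorted(data), 2):
    t0 = (means[i] - means[j]) / math.sqrt(2 * mse / n)
    print(f"treatment {i} vs {j}: t = {t0:.2f}")
```

Each t-statistic would then be compared against the t-distribution with a(n − 1) error degrees of freedom to get the P-values shown in the software output.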
Then it does two against three, two against four, and three against four, so there are six pairwise comparisons. Each of those pairwise comparisons is done with a two-sample t-test, assuming that the variances are equal. What you see here in this column is the t-statistic for each one of those pairwise comparisons, and in the adjacent column are the P-values. You notice that every single pair of means is statistically significantly different, with very small P-values. So what does that mean? It says that every single mean is unique; every mean is different from every other mean. Here's another way to do this. This is a graphical method, and I like this graphical method. The way it works is we plot our treatment averages on a horizontal scale. This is the scale of etch rate, and these dots represent the treatment averages. Now, above that scale, draw another scale and sketch a normal distribution on that graph with a standard deviation roughly equal to the standard error of a treatment mean, which we've already seen is 8.13. So this is a normal distribution with a standard deviation of about 8.13. Theoretically, this should be a t-distribution, but the t and the normal look so similar that if you just sketch the normal distribution shape, that's going to work just fine. The initial location of this distribution is not very important, because what you're going to do is slide the distribution back and forth. Is there a place where you can locate this distribution so that it looks like all of these averages could have been drawn from that same distribution? I don't think so. If it's located here, it's certainly very unlikely that those came from that distribution. Possibly this one did, but this one certainly didn't. Now, if you slide the curve over more in this direction, these two could likely have come from that distribution, but not these.
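The "slide the distribution" idea can also be turned into a small numeric check. This is only a sketch of the intuition, under the rough assumption that "could plausibly have come from this distribution" means lying within about two standard deviations of its center; the four treatment averages are illustrative values, not necessarily the course's exact numbers.

```python
# Numeric sketch of the graphical method: slide a normal "template"
# whose sd equals the standard error of a treatment mean along the
# axis, and ask whether all four averages could fit under it.
means = [551.2, 587.4, 625.4, 707.0]   # illustrative treatment averages
se = 8.13                              # standard error of a mean (from the lecture)

def all_fit(center, means, se, width=2.0):
    """Could every average plausibly come from N(center, se**2)?"""
    return all(abs(m - center) <= width * se for m in means)

# Try every candidate center on a fine grid spanning the data range.
candidates = [500 + 0.1 * k for k in range(2600)]   # 500.0 .. 759.9
fits = any(all_fit(c, means, se) for c in candidates)
print("some placement covers all four means:", fits)
```

With averages spread over roughly 150 units and a standard error near 8, no placement of the curve covers all four means, which is exactly the visual conclusion described above.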
So there's really no place that you can put this distribution on that upper axis where it looks like all four of these means came from the same distribution. So this is a graphical view of how the analysis of variance works. Here's another display. Since we have a continuous factor, RF power, we could fit a regression line to the data. The graph on the left is a straight-line regression fit. That doesn't look too bad, although the line tends to overpredict in the middle of the power range and underpredict at the upper end. So to try to compensate for that, I added a quadratic term to the model, giving the quadratic fit in the curve on the right, which looks a little better. A graph or curve like that could be useful to our experimenter because, remember, she wants to find a power setting that would be appropriate for a given target etch rate. For example, if she has a target etch rate of about 600, maybe somewhere around here, she might go over to this curve and come down to something like about 192 or 191 RF power as an appropriate power setting to generate the etch rate she wants. So graphical methods can also be used to help explain the results of an experiment. Finally, one last little comment about the analysis of variance. Why does the analysis of variance work? What's the underlying theory here? A complete discussion is beyond the scope of this course, but I can give you a general idea very quickly. We're sampling from normal distributions, and it turns out that if we are sampling from normal distributions, the treatment sum of squares divided by sigma squared follows a chi-square distribution with a − 1 degrees of freedom. The error sum of squares divided by sigma squared also follows a chi-square distribution, but with a(n − 1) degrees of freedom. These two sums of squares are independent.
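The "read the power setting off the curve" step can be sketched in code: fit a quadratic by least squares and invert it numerically for the target response. The data values here are illustrative, not the course's exact numbers, so the answer lands near, but not necessarily exactly at, the value read from the graph in the lecture.

```python
# Sketch: quadratic fit of etch rate vs. RF power, then numerically
# invert the fitted curve to find the power for a target rate.
# Data values are illustrative.
import numpy as np

power = np.repeat([160, 180, 200, 220], 5).astype(float)
rate = np.array([575, 542, 530, 539, 570,
                 565, 593, 590, 579, 610,
                 600, 651, 610, 637, 629,
                 725, 700, 715, 685, 710], dtype=float)

# Quadratic fit: rate ~ c2*power**2 + c1*power + c0
c2, c1, c0 = np.polyfit(power, rate, deg=2)

def predicted_rate(p):
    return c2 * p**2 + c1 * p + c0

# Invert numerically: scan a fine grid of power settings for the one
# whose predicted rate is closest to the target of about 600.
grid = np.linspace(160, 220, 6001)
target = 600.0
best = grid[np.argmin(np.abs(predicted_rate(grid) - target))]
print(f"power for target rate {target:.0f}: about {best:.1f}")
```

A grid scan is used instead of solving the quadratic analytically just to keep the sketch short; either works when the fitted curve is monotone over the design range.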
There's a very famous theorem in statistics called Cochran's theorem that allows us to establish that. So now let's take each sum of squares divided by its degrees of freedom: the ratio of two independent chi-square random variables, each divided by its number of degrees of freedom, turns out to have an F-distribution with numerator and denominator degrees of freedom that match those of the two independent chi-squares. So that's the justification for the F-ratio following this F-distribution. Finally, you can show that the expected value of the mean square for treatments is equal to sigma squared plus a quantity involving the sum of the squares of the treatment effects, which, if the treatment effects are not all zero, is a positive number. So if the treatment means are all the same, the taus are all zero, and the expected value of the mean square for treatments is sigma squared. Conversely, if the treatment means are different, that second term is positive and the expected value of the mean square for treatments is larger than the expected value of the mean square for error, which is always sigma squared. So this is why an upper-tail F-test is appropriate, and why large values of the test statistic F0 lead to rejection of the null hypothesis.
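To connect this theory to computation, here is a short sketch that builds the F-ratio by hand as (SS_Treatments/(a − 1)) / (SS_E/(a(n − 1))) and checks it against a standard one-way ANOVA routine. The data values are illustrative, not necessarily the course's exact numbers.

```python
# Build the ANOVA F-ratio by hand and compare with scipy's one-way
# ANOVA. Data values are illustrative.
import numpy as np
from scipy import stats

groups = [np.array([575., 542., 530., 539., 570.]),
          np.array([565., 593., 590., 579., 610.]),
          np.array([600., 651., 610., 637., 629.]),
          np.array([725., 700., 715., 685., 710.])]
a = len(groups)
n = len(groups[0])
grand = np.mean(np.concatenate(groups))

# Treatment (between) and error (within) sums of squares
ss_trt = n * sum((g.mean() - grand) ** 2 for g in groups)
ss_e = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_trt = ss_trt / (a - 1)       # SS_Trt / sigma^2 ~ chi-square, a-1 df
ms_e = ss_e / (a * (n - 1))     # SS_E / sigma^2 ~ chi-square, a(n-1) df
f0 = ms_trt / ms_e              # ratio of mean squares: F(a-1, a(n-1))

f_scipy, p = stats.f_oneway(*groups)
print(f"F0 = {f0:.2f}, scipy F = {f_scipy:.2f}, p = {p:.3g}")
```

The hand-built F0 and scipy's statistic agree, and a very small P-value here corresponds to the large upper-tail F-value discussed above.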