In this video, we'll describe and discuss the utility of prediction intervals. So far we've learned that there's a conceptual difference between prediction and explanation, but there's also a statistical difference: there is more uncertainty associated with a prediction than with an explanation.

To see this, recall from a previous lesson that the (1 − α)·100% confidence interval for the mean response at a set of predictor values can be written in the following way. For that confidence interval, we took the fitted value of the response, ŷ_i, plus or minus a critical value based on the t distribution (we argued for why this is the right critical value in a previous lesson) times the standard error of ŷ_i. Remember that ŷ_i equals x_i* times β̂: the row vector of predictor values, with a one in the first entry for the intercept, times the vector of fitted model parameters. Again, this is a confidence interval for the mean of the response at this set of predictor values.

This interval takes into account the uncertainty associated with ŷ_i, the estimated mean value of the response. It accounts for the fact that if you were to have different data (different y values at the same fixed x values), as we've discussed before, you might get a different line. Very likely you would get a different line: it might have a steeper slope, it might have a shallower slope, and of course the intercept would change. That amount of uncertainty is captured in the confidence interval for the mean of the response. But that interval does not take into account the additional uncertainty from the fact that a single measurement of the response does not always occur at the mean; rather, it occurs randomly around the mean. If we look at this plot, for this particular value of the predictors the response occurred up here, but really it could have been anywhere according to some distribution.
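In symbols, the interval just described can be written as (with x_i* denoting the row vector of predictor values, leading 1 included for the intercept):

$$
\hat{y}_i \;\pm\; t_{\alpha/2,\; n-(p+1)}\, \widehat{\mathrm{se}}(\hat{y}_i),
\qquad
\hat{y}_i = x_i^{*}\hat{\beta},
\qquad
\widehat{\mathrm{se}}(\hat{y}_i) = \sqrt{\hat{\sigma}^2\, x_i^{*}\,(X^\top X)^{-1}\, x_i^{*\top}}.
$$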
Now, it's most probable that the point would have occurred somewhere close to the true line, but points can occur relatively far away from the true line that supposedly generated the data. That uncertainty off of the true line also needs to be taken into account when we're predicting a single value, as opposed to just predicting where the mean value occurs. This brings us to the idea that a confidence interval for the mean response is really different from a prediction interval for a future observation, or for an observation not used to train the model. Sometimes we try to forecast a future observation, and sometimes we try to predict a value that has already occurred but just wasn't used in training the model; both of those cases call for a prediction interval.

When we derived the confidence interval for the mean value of the response, we only took into account the variance of the first term, x_i* β̂. For a prediction interval we need to account for the variability coming in through the least squares estimator, but also the variability in any given point: individual points have variability associated with them in addition to the variability associated with estimating the mean. Because the terms in the sum are independent (the error term is independent of the first term, as it should be), the variance of the sum is the variance of x_i* β̂ plus the variance of ε_i. We calculated that first variance in a previous lesson: it is σ² x_i* (Xᵀ X)⁻¹ x_i*ᵀ. The variance of ε_i we assume to be σ².
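Spelling out the independence argument just made, the variance of a single new observation at x_i* decomposes as:

$$
\mathrm{Var}\!\left(x_i^{*}\hat{\beta} + \varepsilon_i\right)
= \mathrm{Var}\!\left(x_i^{*}\hat{\beta}\right) + \mathrm{Var}(\varepsilon_i)
= \sigma^2\, x_i^{*}\,(X^\top X)^{-1}\, x_i^{*\top} + \sigma^2.
$$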
Of course, you could write this as σ² times (x_i* (Xᵀ X)⁻¹ x_i*ᵀ + 1). Now, this is the variance of a future value or an observed value not used in training, so we need its square root for our prediction interval. The story is the same as with the confidence interval in the sense that we don't know σ², so we have to estimate it using the standard estimator we came up with in a previous lesson, σ̂², which is the residual sum of squares over the residual degrees of freedom.

Putting that all together, our prediction interval for a future observation (or an observation not used to train the model) at the given set of predictor values is ŷ_i plus or minus the same critical value of t as before, at level α/2 with n − (p + 1) degrees of freedom, times the square root of the estimator of the variance we just calculated: the square root of σ̂² (x_i* (Xᵀ X)⁻¹ x_i*ᵀ + 1).

Now we should note that the main difference between this prediction interval and the confidence interval is really that extra σ̂² times one, the added variability from the error term. That additional term implies the prediction interval is always wider than the confidence interval, and that makes sense because prediction comes with extra uncertainty: the uncertainty in the error term in addition to the uncertainty in the model term. Now, R has a nice built-in function that will allow you to make predictions, and it's actually the same function that you use for confidence intervals, which we learned about previously. We'll just change one of the arguments.
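Collecting the pieces, the (1 − α)·100% prediction interval is:

$$
\hat{y}_i \;\pm\; t_{\alpha/2,\; n-(p+1)}
\sqrt{\hat{\sigma}^2\left(x_i^{*}\,(X^\top X)^{-1}\, x_i^{*\top} + 1\right)},
\qquad
\hat{\sigma}^2 = \frac{\mathrm{RSS}}{\,n-(p+1)\,},
$$

which differs from the confidence interval for the mean response only in the extra "+ 1" inside the square root.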
We use the predict function and plug in whatever we called our linear model object. We also pass our set of predictor values in the form of a data frame; we'll learn how to do this in the next lesson on making predictions in R. Then the interval argument should be set not to "confidence" but to "prediction", which gives us the wider prediction interval. In the next lesson, we'll take a look at how to implement this on some real data in R.
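As a minimal sketch of the workflow just described (the simulated data and variable names here are made up for illustration), we can compare the built-in confidence and prediction intervals and check the hand-derived formula against predict():

```r
# Simulated data for illustration; any numeric data works the same way
set.seed(42)
n <- 50
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

# New predictor values must be supplied as a data frame whose column
# names match those used in the model formula
new_x <- data.frame(x = 5)

# Confidence interval for the mean response at x = 5
ci <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)

# Prediction interval for a single future observation at x = 5:
# same function, just a different `interval` argument
pred_int <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

# Checking the formula by hand: sqrt of sigma-hat^2 * (x* (X'X)^-1 x*' + 1)
X <- model.matrix(fit)                               # design matrix, leading 1s
x_star <- c(1, 5)                                    # predictor vector, leading 1
sigma2_hat <- sum(resid(fit)^2) / df.residual(fit)   # RSS / (n - (p + 1))
se_pred <- sqrt(sigma2_hat * (1 + t(x_star) %*% solve(crossprod(X)) %*% x_star))
t_crit <- qt(0.975, df = df.residual(fit))
y_hat <- sum(x_star * coef(fit))
manual <- c(y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
```

Here `manual` matches `pred_int[, c("lwr", "upr")]` up to floating point, and the prediction interval is wider than the confidence interval, as the extra σ̂² term implies.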