Let's take a look at how we can make predictions, including prediction intervals, and also confidence intervals, using some real data in R. We should be used to this data set by now: it consists of measurements related to the impact of advertising media on the sales of a given product. I'll give you a second to read those variables if you aren't familiar with them yet. Once you have, let's read in our data. You can read in the data using this code here, and there was no major cleaning to do; there were no missing values. So the first thing we'll do is split the data into a training set and a test set. We've done this once or twice before, and I'm setting a seed just to keep the split the same each time, so if you were to run the code many times, you would get the same rows in the training set and in the test set. Then we fit our linear model on the training set. I'm calling it lm_marketing, and the summary looks like one we've analyzed before, at least when fit on the entire data set, so the estimates should be pretty similar. We noticed in our lesson on t tests that the parameter associated with the newspaper predictor is not statistically significant at the alpha = 0.05 level. That's just a reminder from a previous lesson; we're not really using that result here, but it's good to keep in mind. Next, I've extracted some values from the test set and used them to make predictions. There's nothing stopping you from using the entire test set for making predictions, but right now I just wanted to look at a couple of them, so I chose a sample of size five randomly drawn from the test set. I'm calling it star because in the notes we've called these x star and y star.
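The split-and-fit workflow described above can be sketched as follows. The lecture's marketing data isn't bundled with base R, so this sketch uses the built-in mtcars data as a stand-in; the variable names and the model formula here are illustrative assumptions, not the lecture's.

```r
# Reproducible 80/20 train/test split and a linear model fit.
# mtcars stands in for the marketing data; variables are stand-ins.
set.seed(42)                                    # fixes the split across runs
n_train <- floor(0.8 * nrow(mtcars))            # 80% of rows for training
idx     <- sample(seq_len(nrow(mtcars)), n_train)
train   <- mtcars[idx, ]
test    <- mtcars[-idx, ]

fit <- lm(mpg ~ wt + hp + qsec, data = train)   # analogous to lm_marketing
summary(fit)

star <- test[sample(seq_len(nrow(test)), 5), ]  # five random test rows
```

Because the seed is set before sampling, rerunning the script reproduces the same training and test rows every time.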
And then this line here is where I let R do all the heavy lifting in calculating a prediction interval. I've rounded things to two decimal places just to make it look a little neater, but the main function here is the predict function. We mentioned in a previous video that predict takes in the linear model object, which we called lm_marketing; it takes in the new data, the new set of predictors you'd like to make predictions at, which we call star here, the five randomly chosen rows from the test set; and then you should specify whether you want a prediction interval or a confidence interval, if you want an interval at all. Here I'm asking for a prediction interval. If we left this out, we would just get the fitted values, the first meaningful column down here. With the interval set to prediction, we also get two additional columns: the lower bound and the upper bound of the prediction interval. Also notice that I haven't specified a confidence level in this function. The default is 95% confidence, so alpha = 0.05, but you could change that with the level argument to whatever confidence level you'd like. As I try to do, we can also compute by hand these quantities that R computes so easily for us. Here I'm setting our alpha: if our confidence level is 95%, our alpha is 0.05. I'm extracting the coefficients from our model just to make some of the computations below look a little nicer. I'm also computing the residual degrees of freedom for our model, again just to keep the code a little cleaner, and I'm computing up front the residual sum of squares, which we've computed before. And then here is our estimate of the error variance.
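The predict call and the by-hand ingredients just described look roughly like this. As before, mtcars stands in for the marketing data, and `star` here is a small hypothetical set of new predictor rows, so the specific numbers differ from the lecture's.

```r
# predict() with interval = "prediction", then the quantities extracted
# for the by-hand calculation. mtcars stands in for the marketing data.
fit  <- lm(mpg ~ wt + hp, data = mtcars)
star <- data.frame(wt = c(2.5, 3.2), hp = c(110, 180))  # hypothetical rows

round(predict(fit, newdata = star, interval = "prediction"), 2)
# level = 0.95 is the default; pass e.g. level = 0.99 for a wider interval

alpha      <- 0.05
b          <- coef(fit)                # estimated coefficients
df_res     <- fit$df.residual          # residual degrees of freedom, n - p
rss        <- sum(resid(fit)^2)        # residual sum of squares
sigma2_hat <- rss / df_res             # estimate of the error variance
```

Omitting the interval argument returns only the point predictions (the fit column); setting it adds the lwr and upr columns.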
So that's always the residual sum of squares over the residual degrees of freedom. Then I'm extracting the model matrix for the model: the matrix with a column of ones and then a column for the measurements of each of the predictors in the training set. This will be used in the standard error calculation that goes into the prediction interval. This next line, again, I'm doing mainly so things look a little nicer when we compute the prediction interval. Remember, star was the name of the five rows I extracted from the test set above, right here. First, I'm taking just the predictors: instead of keeping the last column, which is the response, I just want the predictor values. That's actually a data frame, so I store it as a matrix so that I can do matrix manipulations with it, like multiplication. Then here, instead of using the predict function to give me the predicted values, I'm computing them directly. The predicted values are the values of the predictors, put into a matrix with a column of ones out front, times our parameter estimates. Actually, I could change this to b, because we already stored that above. These are your y hats; I'm calling them y star hat because we're predicting at the star values, that subset of the test set. We'll use these in the prediction interval too. Now, here's our prediction interval, and you should check it against our notes, where we wrote down the formula for the prediction interval. So, y star hat, and I'm just doing this for the first value here, this first row, instead of for each of them. Of course, the computation would be very similar for the other four, or for any value in the test set. We take the first fitted, or predicted, value minus this whole quantity here.
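Computing the predicted values by hand, as described above, amounts to binding a column of ones onto the predictor values and multiplying by the coefficient vector. A minimal sketch, again using mtcars in place of the marketing data:

```r
# Predicted values without predict(): a column of ones bound to the
# predictors, times the coefficient vector b.
fit <- lm(mpg ~ wt + hp, data = mtcars)
b   <- coef(fit)

star       <- mtcars[1:5, c("wt", "hp")]   # predictors only, response dropped
x_star_mat <- cbind(1, as.matrix(star))    # prepend the column of ones
y_star_hat <- x_star_mat %*% b             # matrix product gives the y hats

all.equal(as.numeric(y_star_hat),
          as.numeric(predict(fit, newdata = mtcars[1:5, ])))
```

The as.matrix step matters: star is a data frame, and `%*%` needs a numeric matrix.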
So the first piece is just our critical value, at the right confidence level and with the right number of degrees of freedom: this is our t critical value. Then we multiply the t value by the standard error. The standard error is a bit long, but we wrote it down in class: it's the square root of our estimate of the error variance times this quantity here, where this is a one followed by the values of the predictors in that first row, so a row vector of length four; then times (X transpose X) inverse, that's this quantity there; and then times the transpose of what we called the x star vector, the same thing over here but transposed. And remember, the prediction interval has a plus one inside. The sigma hat multiplies this entire thing, and that gives us the correct standard error for a prediction interval. Now, that's just the lower bound of the prediction interval. The upper bound is exactly the same, except we start at the predicted value and add that quantity, namely the critical value times the standard error. Notice that when we print this out, rounded to two decimal places, we get exactly the same thing R gets up here. All right, doing the computation is quite important, but it's also important to be able to interpret the prediction interval. We can interpret it as follows, and we'll have to unpack what we mean by "confident" here: we're 95% confident that if a new company selling product P entered the market with a YouTube marketing budget of $287,160, a Facebook marketing budget of $18,600, and a newspaper marketing budget of $32,760, they would sell between 16,440 and 23,960 units of product P. So notice that's for an individual new company.
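Putting the pieces together, the by-hand prediction interval can be checked against predict(). This follows the formula just described; mtcars stands in for the marketing data, so x_star here has length three (intercept plus two predictors) rather than four, and the point wt = 3, hp = 150 is a made-up example.

```r
# By-hand 95% prediction interval, matched against predict().
fit <- lm(mpg ~ wt + hp, data = mtcars)
X   <- model.matrix(fit)                 # column of ones plus the predictors
b   <- coef(fit)
df_res     <- fit$df.residual
sigma2_hat <- sum(resid(fit)^2) / df_res

x_star <- c(1, 3, 150)                   # intercept, wt = 3, hp = 150
y_hat  <- sum(x_star * b)                # point prediction

alpha   <- 0.05
t_crit  <- qt(1 - alpha / 2, df_res)     # critical value
se_pred <- sqrt(sigma2_hat *
                (1 + t(x_star) %*% solve(t(X) %*% X) %*% x_star))  # note the + 1
lower <- y_hat - t_crit * se_pred
upper <- y_hat + t_crit * se_pred

pi_r <- predict(fit, newdata = data.frame(wt = 3, hp = 150),
                interval = "prediction")
c(lower, pi_r[, "lwr"], upper, pi_r[, "upr"])   # the pairs should match
```

Dropping the `+ 1` inside the square root turns this into the standard error for a confidence interval for the mean response instead.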
And if you're wondering where I got these values, say the $287,160, remember that these data are recorded in thousands of dollars, so that value is just this first value here times 1,000. Now, what does "confidence" mean? It's a bit complicated. A proper interpretation requires that we think through the resampling procedure. Resampling says that we fix the predictors in the training data and resample the response many times at those same predictor values. Then, for each of those resamples, we fit the model to that training data set and compute the prediction interval at the same predictor values from above, for each of the models we fit. Of course, we never actually carry out this procedure, except when simulating to show what it looks like; in practice, we only have one data set. But the interpretation of confidence means we imagine we have many data sets that we're resampling from, and that's how we get these probability distributions. If we did this, then among the prediction intervals we compute, 95% of them would cover the true value of the response for this new company. We can think about the prediction interval up here as a range of plausible values for the sales of the new company, as long as we think of "plausible" as being tied to this resampling procedure. It's worth mentioning that in this case we're predicting a value of the response that was already recorded, and we put it in the test set. So we know the true value of sales to be 18.84, or, multiplying by 1,000 to get the number of units, 18,840. That means we know our prediction interval covers the true value: the true value is this number here, and our prediction interval covers it.
And the idea is that if we redid this resampling procedure and computed this prediction interval many times, the endpoints of the interval would change, and 95% of the time those endpoints would cover this value. That also accounts for this value itself changing, because we take into account the random fluctuation in the value we're predicting: we think of resampling that value too. So there's variability in both the model that we fit and the new value that we end up trying to predict. Okay, let's contrast the prediction interval with a confidence interval. As I mentioned above, there's only one thing we have to change. We still use the predict function, but we set the interval argument to "confidence". Here we get the same fitted values; those should not change. What does change is the interval, the lower and upper bounds. We could repeat the by-hand calculations too: basically, all we need to do is delete the plus one, and we would have the confidence interval. Now, what's the interpretation? Well, this is a claim about the value of the response on average, not about some new particular value. We can interpret the confidence interval as follows: we are 95% confident that new companies selling product P with these YouTube, Facebook, and newspaper marketing budgets would sell between 19,740 and 20,670 units of product P on average. That's not an interval telling us what this particular company would sell, but what such companies would sell on average, so there's less variability. Importantly, notice how much smaller this interval is: the lower bound is 19,740 and the upper bound is 20,670, whereas the prediction interval went down to about 16,000 and up to about 24,000. So there's quite a big difference between the prediction and confidence intervals.
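The contrast above can be seen directly by changing only the interval argument. A small sketch, again with mtcars standing in for the marketing data and a made-up new point:

```r
# Confidence vs prediction interval from the same predict() call;
# only the interval argument changes.
fit   <- lm(mpg ~ wt + hp, data = mtcars)
new_x <- data.frame(wt = 3, hp = 150)

ci <- predict(fit, newdata = new_x, interval = "confidence")
pi <- predict(fit, newdata = new_x, interval = "prediction")

ci  # same fitted value as pi, but a narrower interval (mean response)
pi  # wider: it must cover a single new observation, not just the mean
```

The fitted values agree exactly; only the widths differ, and the prediction interval is always the wider of the two at the same level.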
Now, finally, we've mentioned the mean squared prediction error as a way of comparing different models, so let's take a quick look at that too. First, let's fit a reduced model. Above, we fit sales as the response with YouTube, Facebook, and newspaper as predictors, so let's drop newspaper for the reduced model here. Once we've done that, let's compare the two models using the mean squared prediction error. First, for the full model, we calculate the predicted values using all of the test data. Here I'm only calculating the point predictions, not any interval, because I'm just putting these into the formula for the mean squared prediction error: the mean of the squared differences between the true values in the test set and the predicted values we calculated above. The number we get is about 6.2. Now, if we do the same thing with the reduced model, so here I'm computing the same quantity but using lm_marketing_2, our reduced model, also on the test set, we get 6.1. So the reduced model has a lower mean squared prediction error, which tells us it seems to do a bit better than the full model at making predictions, even though we left out the newspaper predictor. We might as well go with the reduced model: it does just a bit better in terms of predictions, and we keep a simpler model, the one without newspaper. Recall that this is consistent with our t test result, which said that newspaper was statistically insignificant and could therefore reasonably be removed from our model. Comparing the mean squared prediction errors is consistent with that.
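The full-versus-reduced comparison above can be sketched like this. mtcars stands in for the marketing data, so the model formulas and the resulting MSPE values are illustrative assumptions, not the lecture's 6.2 and 6.1.

```r
# Comparing a full and a reduced model by mean squared prediction error
# on a held-out test set. mtcars stands in for the marketing data.
set.seed(42)
idx   <- sample(seq_len(nrow(mtcars)), size = 24)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

fit_full    <- lm(mpg ~ wt + hp + qsec, data = train)
fit_reduced <- lm(mpg ~ wt + hp,        data = train)  # one predictor dropped

mspe <- function(fit, newdata) {
  # mean of squared differences between observed and predicted responses
  mean((newdata$mpg - predict(fit, newdata = newdata))^2)
}
mspe(fit_full, test)
mspe(fit_reduced, test)  # the smaller MSPE points to the better predictor
```

Only point predictions enter the formula, so predict() is called here without any interval argument.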