[MUSIC] In this video I'm going to illustrate It have to run linear regression and then we have multiple x values. So this is an example where we sync that the sales of sales representative can be predicted, by how much promotional budget they have. Column they have with the company in terms of number of months. The market potential where they are working in. The market change since the last period. And how they are rated by their customers, their average rating by their customers. So, these values are in thousands of dollars. So, the sales is in thousands of dollars. Promotional budget is in thousands of dollars, and market potential is in thousands of dollars. And time is in number of months. And customer rating can be any value between zero and five. So to do this, first we want to go to data analysis. But before I do the data analysis I want to tell you something. In Excel it's required that you would have all of your X's in a continuous manner. So you cannot have, for example, Promotional Budget in your first column, Sales, which is your Y value, in the second column, and start having more X it doesn't allow you. You have to be able to pick all of the xes in a continuous manner. So make sure before you run the analysis, you rearrange your worksheet such that the x's are all in consecutive columns. So once you have that, we go to data analysis. It really is a very simple difference between the two that we have done, between simple linear regression and what we will do in multiple regression. We select regression and once again, it will ask you what is the y variable. So you're going to click on the first cell, pick the entire column. And then when it comes to x variables, you have to simply select all of them. So here it is columns B through F and then you do the same thing control shift down and select the rest of it so it basically says that it has selected B1 to F51 so this includes everything that we have identified as a independent variable, the promotional budget times market potential. Market change in cost already. Again I recommend that you would have headings for these, because otherwise it would just show up as variable one, variable two, variable three, and that doesn't really make much of a sense. So I'm going to click on Labels and I'm going to put it in new worksheet. And here's our output, and again as you can see, the numbers are kind of squished in, so I'm going to double click on every column to expand it. So, I just put my cursor here, double click, double click and I see now everything is visible. And as I had mentioned before, upper and lower repeats itself twice, so you can go ahead and ignore it or you can be like me and always remove it. Okay, so now what do we see? First of all, how is this as a model, as a model, because it's a multiple regression, and in multiple regression, naturally the R square will go up as you increase the number of independent variables. So, in multiple regression, we will focus on adjusted R square. So, our adjusted R square says that about 70% of variations that you see are explained by variables that you have identified. One of the things that you would see is that we no longer can just talk about correlation here because it's now multiple of things that are being included, so correlation is about the regression model itself. Rather than the correlation of each individual x variable. So the first thing we want to do is that we want to look at the p values of every coefficient that we have here. So focusing on the p values, any p value that shows a value greater than 0.5 because that's our level of significance means that it does not have any strong relationship with the Y value that we are trying to predict. And here is the market change. So we do not reject the hypothesis that these two things, market change and sales do not have a relationship. We do not reject it. So what do you have to do first, before we would use the model, is to remove such variables. These variables are basically noise they're not adding anything and it could be because, market potential itself is capturing what you're looking for. A market changes every done than variable in which case you would that there is a collinearity between them. Which means market change some what can be explained by the market potential it's so, it's a redundant variable or it's a variable the standard useful in predicting art sale. So, the first thing we need to do is clean up our model. And we're going to remove it. Now, if I had more than one, I would start removing the one with the highest value then move on to the next just want to move onto the next one. You will remove these one by one until you have only variables that have p values that are significant. And in this case, that means p values that are less than 0.05. So what I'm going to do is that I'm going to go back to my variables, and this time I'm going to get rid of this column altogether. So either you can remove it or just create another sheet that doesn't include it. So in this case, i'll just going to remove it. And it's gone. So now I'm going to run my model with just these four variables. So I have reduced my number of independent variables from five to four. Go back to data, data analysis, pick regression and you would see because I'm in the same sheet it has the same values but remember now I have to change this. I no longer have columns B through F but I have column to E so I'm just going to change it here and not go through the process over again and it's going to put it in a new worksheet again. So this is our new output, so what do we see? So adjusted r square is 7 10.05. The last one was 702, so actually our model improves just a little bit. Sometimes you will see that it may not improve at all, but it will stay the same. It is so better to get the noise out of your model. Now let's look at our p-values. The p-values are all significant. They're all less than 0.05. So now we only have things in our model that is less than 0.05 level of significance. There's therefore we have a good model. And this model now can use this for predictions.