So let's have a look at the most common statistical tests. Really, if you can do these, you can do a lot of statistics and you can write good reports. If you look at published work, especially in my world, many of the articles are written using only these statistical tests. They really are the most common, so let's run through them.
The first one is just the normal Student's t-test. That is where we compare the mean of the same numerical variable between two groups. So imagine again that we have the age for patients in group A and the age for patients in group B. That is the same numerical variable, and I'm going to compare the two means to each other. Now remember, this data was generated randomly, so your values are going to look different. For this we're going to use the HypothesisTests package. You see HypothesisTests followed by a dot here. You don't have to put HypothesisTests. in front of the function name; I'm just using it so you can see where this EqualVarianceTTest function comes from. Now remember, for a Student's t-test we are assuming equal variance. The assumptions are basically the ones we use for all parametric tests: we don't want any outliers, the values should come from a normal distribution, and the variances should be equal. We've taken for granted that all these assumptions are met, and we did that when we created the data.
That is why, to this day, I love to create simulated data. When I create simulated data, I'm in control of the data creation. Especially when I learn a new language, or the language changes, or new packages come out, and I just want to experiment and explore, I'm not going to use actual real-life data sets. I'm going to simulate data, because with simulated data I control the actual values, and I know what the results should look like.
So when I look at a new language, at changes in the language, or at new packages, as we are doing here, I should know what the answer should be. EqualVarianceTTest is my function, and I've got to pass it the two list objects, the two lists of data point values. Remember, we created two sub data frames, two new data frames based on the original one: one that just has the patients in group A in it, and one just with the patients in group B. That little decision comes in useful now, because I can just say dataA with Age in symbol notation, :Age, in square brackets, and then dataB with :Age in square brackets. So we quickly compare the two ages to each other.
And look at this beautiful result, I absolutely love this. It says it's a two-sample t-test with equal variance, and it gives me some population details. Because I didn't put any extra arguments in, it assumes under the null hypothesis that the difference between these two means should be 0. And then it gives me all of this information. It says my t-statistic is 2.22, with 98 degrees of freedom, and I see my two-sided (two-tailed) p-value of 0.02. So, in my instance, for a chosen alpha value of 0.05, there was a statistically significant difference between the means of these two groups, and it's easy to reject the null hypothesis. When I write my report I could say that this was the mean and this was the standard deviation for the age in group A (we got that from the descriptive statistics), likewise for group B, and that the test gave me a t-statistic of 2.2 and a p-value of 0.03. To me, that means a statistically significant difference.
Now instead of getting this whole report, I might be interested in just the p-value. So I can actually say pvalue, and as its argument we pass exactly what we did before. This time, let's do it for the white cell count between the two groups. All I get back now is the p-value, which is 0.86. That is obviously more than 0.05.
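The t-test steps described above can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: the vectors ageA and ageB, the seed, the group sizes, and the means are all made up for illustration, so the numbers will not match the ones read out above. EqualVarianceTTest and pvalue are the HypothesisTests functions named in the talk.

```julia
using HypothesisTests, Random

Random.seed!(1)                      # arbitrary seed, only for repeatability

# Hypothetical simulated ages for two groups (sizes and means are assumptions).
# With data frames, these vectors would be the Age columns of dataA and dataB.
ageA = 50 .+ 10 .* randn(45)
ageB = 55 .+ 10 .* randn(55)

# Student's t-test; the null hypothesis is a difference in means of 0.
test = EqualVarianceTTest(ageA, ageB)
println(test)                        # full report: t-statistic, df, p-value, CI

# Only the two-sided p-value, without the rest of the report.
p = pvalue(test)
println("p-value: ", p)
```

Printing the test object gives the full report, while pvalue returns a plain number that can be compared directly against a chosen alpha.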
So note that there's no significant difference in the white cell count between patients in group A and group B. But you see here how I went about thinking about this analysis. Right at the beginning, I knew that I was going to compare patients in these two groups to each other, and therefore I created these two separate data sets. That is something I can suggest you do when you do your statistical analysis. It just makes it so easy when you get to the actual analysis.
Now, if the assumptions were not met and the variances of these two groups were not equal, so if I calculated the variance of CRP in group A and the variance of CRP in group B and they were not close to each other, then remember, I can't use a normal Student's t-test. I have to use an unequal variance t-test. It's just a different test, and there's an UnequalVarianceTTest function inside the HypothesisTests package. I can just run that in case I need to, and I'm going to get back a similar sort of report. So that's the t-test, very easy to do. Under the assumptions for the use of parametric tests, really all the t-tests that you want are right there; have a look at the HypothesisTests package.
Let's move on and create some linear models, linear regression. Remember, that is where I have a bunch of numerical variables and I'm trying to predict one of them; that is my outcome variable. I have a number of independent variables, my predictor variables, and I use them to try and predict a value for this outcome variable. So let's see if we can predict CRP given the other values. We're going to use the fit function. And what do we want fitted? A linear model, so LinearModel is going to be my first argument. The vast majority of the time, we're just interested in linear models. Then I'm going to use @formula, which is a macro. It's very nice, because I can just write out the formula.
CRP, the first thing that you put, is the outcome variable, the dependent variable that we're interested in. Then comes a tilde symbol, and then whatever you want to use as predictors. If you just put a 1 there, remember, that is my simplest model: it's going to use the mean of CRP as the predictor for each of the individual CRP levels. The mean of CRP is my only predictor. Then, after a comma, the last argument is the actual data frame that we are using. If we run that function, we see all the information that we want. We see our estimate for the intercept and we see the p-value for that. Of course, that is going to be a very low p-value; that is our base model.
Now we can start working from this base model and add predictors. So fit, LinearModel again, but this time we're going to use age as a predictor. The 1 will automatically be there, so let's just run that. Now we can see our model and our coefficients. Age has been brought in, and we see that age is not really a good predictor: it doesn't add statistically significant value to the model. You can see the formula that was written is 1 + Age, so the 1 was added for us. And now I can start adding more predictor variables. We only had three numerical variables, so I can only use age and white cell count, but you can see how you can start building this up. You can put interaction terms in by multiplication, and so on, if you know your linear regression. And there we see our new model and all the statistics surrounding it. So it's very easy to do linear regression.
Finally, for analyzing categorical variables, let's have a look at the chi-square test for independence. I'm just going to see whether one categorical variable is dependent on another, or whether they are completely independent of each other.
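To recap the regression steps described above in code, here is a minimal sketch assuming the GLM and DataFrames packages. The data frame, its values, the seed, and the column name WCC (for white cell count) are made up for illustration; only the names Age and CRP and the fit, LinearModel, and @formula pattern come from the talk.

```julia
using DataFrames, GLM, Random, Statistics

Random.seed!(1)                      # arbitrary seed, only for repeatability
n = 100

# Hypothetical simulated data; the column name WCC is an assumption.
data = DataFrame(Age = rand(18:80, n),
                 WCC = 5 .+ 10 .* rand(n),
                 CRP = 10 .* rand(n))

# Base model: intercept only, so every CRP value is predicted by the mean of CRP.
base = fit(LinearModel, @formula(CRP ~ 1), data)

# Add predictors; the intercept (the 1) is included automatically.
with_age  = fit(LinearModel, @formula(CRP ~ Age), data)
with_both = fit(LinearModel, @formula(CRP ~ Age + WCC), data)

# Multiplication adds an interaction: Age * WCC means Age + WCC + Age & WCC.
interaction = fit(LinearModel, @formula(CRP ~ Age * WCC), data)

println(coeftable(with_both))        # estimates, standard errors, p-values
```

In the intercept-only model the single coefficient equals the sample mean of CRP, which is a quick way to check that the base model is doing what the talk describes.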
So I have two categorical variables and I'm going to do an analysis of the proportions. The first thing I need for a chi-square test is my contingency table, my observed table. To build that, I just need to know how many values there are in each category. So I'm going to ask: is the group, and remember there was A and B, independent of the result? Let's remind ourselves how to write this little by function, and let's do that for group A and group B. Again, having made these two sub data frames, dataA and dataB, allows me to do this easily enough. So I see that for group A I have these values, and for group B I have this set of values.
Now I have to create a contingency table. I've typed in some values here already, but they are wrong. Remember, I had run this once before, but this is a new notebook. I ran it again, so my values are going to be different. So let's type all six values in here to create my contingency table for this instance. The reason I wanted to do this live is that you must notice that, depending on how many values there are in each category, the categories come out in a different order. They are listed in descending order of count, which means the order of my actual categories gets scrambled. So in our instance, let's stick to the order of the top one, with static first: static, worse, improved, so that's 15, 13, 11. I'd better use the same order below. Static here was 20, then worse was 24, and improved was 17. So I've got to stick to that same order. I'm wrapping this in a reshape because I want to reshape it as 2 rows and 3 columns. So let's look at our contingency table, a very nice contingency table, and we see the 2 rows and the 3 columns. The columns represent the result, and I put them in the order static, worse, improved, and the 2 rows would be A and B. So now I have that contingency table of observed data.
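One detail is worth checking when building the table with reshape: Julia fills matrices column by column. If the six counts are typed group by group, as described above, a direct reshape to 2 rows and 3 columns will not put A and B on the rows. The sketch below uses the counts read out above (assuming they were transcribed correctly) and shows two ways to get the intended layout; the variable names are made up for illustration.

```julia
# Counts typed group by group: A (static, worse, improved), then B.
counts = [15, 13, 11, 20, 24, 17]

# reshape fills the first column, then the second, and so on,
# so the groups and results get mixed up here.
scrambled = reshape(counts, 2, 3)
println(scrambled)                   # [15 11 24; 13 20 17]

# Option 1: reshape to 3 rows (one column per group), then swap the dimensions.
observed = permutedims(reshape(counts, 3, 2))
println(observed)                    # [15 13 11; 20 24 17]

# Option 2: write the matrix literally, one row per group.
observed_literal = [15 13 11;
                    20 24 17]
```

Either option gives rows A and B and columns static, worse, improved. Printing the table and comparing the row sums with the group sizes is a cheap sanity check before running the test.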
Now, I could also have reshaped it as 3x2; the chi-square test works exactly the same. I can just pass this contingency table to the chi-square test function, and lo and behold, I see my Pearson's chi-square test here. We can see that we fail to reject the null hypothesis, with a one-sided p-value of 0.138. I can see all my residuals there, I can see my statistic, I can see the 2 degrees of freedom, and all the information of a normal chi-square test. Everything is there for you, beautifully done. It's just a pleasure to use Julia and the HypothesisTests package for this.
Now, in the very last section, I'm going to show you very briefly how to export the simulated data as a spreadsheet file. I might want to use the simulated data again; I always do. I might compare the same data that I've just created with another package, another version of the language, or another language completely. So I like to export the simulated data. That will be the last short video in this module on using Julia for your statistical analysis.
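The chi-square step described above can be sketched as follows, assuming the HypothesisTests package and the six counts read out earlier (assuming they were transcribed correctly). The hand calculation is only a cross-check. The variable names are made up for illustration. With A and B on the rows, the p-value comes out near 0.80; the 0.138 quoted above is reproduced when the group-by-group vector is passed straight to a 2 by 3 reshape, which fills column by column, so it is worth printing the table before testing.

```julia
using HypothesisTests

# Rows: group A, group B. Columns: static, worse, improved.
observed = [15 13 11;
            20 24 17]

test = ChisqTest(observed)           # Pearson's chi-square test of independence
println(test)                        # statistic, degrees of freedom, residuals
p = pvalue(test)

# Cross-check by hand: expected counts from the row and column totals.
row_totals = sum(observed, dims=2)                   # 2x1 matrix
col_totals = sum(observed, dims=1)                   # 1x3 matrix
expected   = row_totals * col_totals ./ sum(observed)
chi2       = sum((observed .- expected) .^ 2 ./ expected)
degrees    = (size(observed, 1) - 1) * (size(observed, 2) - 1)
p_by_hand  = exp(-chi2 / 2)          # closed form, valid only for 2 degrees of freedom

# Same counts reshaped directly (column-major fill), for comparison.
scrambled   = reshape([15, 13, 11, 20, 24, 17], 2, 3)
p_scrambled = pvalue(ChisqTest(scrambled))

println("p (rows A and B): ", p, "   p (direct reshape): ", p_scrambled)
```

Neither table gives a p-value below 0.05, so the conclusion of failing to reject the null hypothesis is the same in this instance, but the two tables are not testing the same thing.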