I'm sure that some of you are still wondering why you need to learn statistics, or why so many organizations are looking for people who understand how to use statistical methods. Just recently, Career Cast, a web-based employment service, listed the best jobs of 2016, and once again we see statistics at the top of the list. This speaks to the enormous need for scientists who can slice and dice the data companies have so that they can improve their decision making. For you, as someone who's interested in leadership roles, this is also important. If you don't ask the right questions, then the analysis done by the most talented statisticians will be of very little use. You need to be able to understand statistical analysis, ask the right questions, and shape the future of the inquiries. In this module, we are now ready to begin the process of making inferences, so let's get started.
Remember that the major purpose of statistics is to provide information so that informed decisions can be made. In everything we do, we face uncertainty. Statistics allows us to anticipate the possible outcomes of these uncertainties and, in turn, improve our decisions. We also know that surveying the entire population is too costly and takes too much time, so we rely on a sample. In the last module, we learned how to take samples and what it means to have a representative sample, one that is not biased. Now we will learn how to use the sample information to make inferences about the population. This is what is known as inferential statistics.
Sample information provides us with a point estimate, which we use as an estimate of the population parameter. For example, I may want to know the average age of students who take a course on Coursera. Using a sample of Coursera participants, I can find the average age for the sample and then use that as the estimate for the population of Coursera participants.
Let's say I take such a sample and find the average age of the participants to be 28. I don't need to know much about statistics to know that my estimate is more than likely a bit off. What statistics can tell me is how confident I can be about my estimate and its degree of error. Have you ever heard of margin of error? This is a value we can calculate, and it is one of the concepts that you will be learning in this module.
Let's start with the larger concept of the confidence interval. Using only one value to estimate the population mean, or any other population parameter, leaves much room for error. It is much better to provide a range; this range is called an interval estimate. This is still an estimate, and we can't be certain that it actually contains the true population parameter. The probability that the interval actually contains the true population parameter is called the level of confidence. Finally, an interval estimate associated with a certain level of confidence is called a confidence interval.
I think we have all suffered from headaches, and when we take a pill we want to get relief as fast as possible. So consider this: two drugs are being tested for headache relief. We want to know the time it takes to experience relief, and the testing is done on a group of size 100. One group takes Drug A and the other one takes Drug B. For Drug A, the average time before they felt relief was 38 minutes. For Drug B, the average time was 43 minutes. Based on the study, the average time for Drug A is five minutes less, but in the big picture, can you conclude that Drug A really acts faster? Could it be the makeup of the people in group A that made them feel pain relief faster? What if the study showed the following instead? Now Drug A resulted in relief 20 minutes faster. Are you more likely to think that Drug A is more effective than Drug B?
Of course, there can still be other explanations for why the Drug A group reported faster relief, but given this larger difference, the other possible explanations may be less likely to be the reason for the difference we are seeing.
Now would be a good time to explain the concept of margin of error. When estimating the population mean, begin with the best point estimate, the sample mean, and then add and subtract the margin of error. I mentioned that you might have seen the phrase margin of error in news programs, business reports, and so on. Here's an example of margin of error. This image is from Gallup Daily Economic Confidence Indexes from April 15, 2016. Look closely and you will see the sample size used and the margin of error.
By the way, what exactly is the margin of error? Well, mathematically, the margin of error is made up of several things. One is the size of your sample. Remember, you should know this now at the gut level: the larger the sample size, the more reliable your estimate will be. That means a larger sample size will reduce the margin of error. Then there is the natural variability that you have in your sample, which is represented by the sample's standard deviation. Finally, there is the confidence level, which describes the uncertainty of the sampling method. In other words, how confident are you that you have studied a sample which contains the true population parameter? The notation for this is one minus alpha, where alpha is known as the significance level. So if you want a 95% confidence interval, the most commonly used confidence level, then you're essentially saying that there is a 95% chance that the sample you took will allow you to make a correct inference about the true population parameter and a 5% chance of missing the mark.
Let's explore the concept of the confidence interval a little further. This is a distribution of sample means.
The central limit theorem shows that if you take many samples and then plot each sample mean, you will end up with approximately a normal distribution. Thus, we expect about 68% of all sample means to lie within one standard error of the population mean (the standard error being the standard deviation of this distribution of sample means), 95.5% of the sample means to be within two standard errors of the population mean, and 99.7% of the sample means to lie within three standard errors of the population mean. So if you are considering a 95% confidence level, then you are implying that the sample that you took will have a mean which is roughly plus or minus two standard errors from the mean of this distribution, 1.96 to be exact. This is the z score for the desired confidence level. So the confidence interval will be expressed in the number of standard errors that you are away from the mean. And that leaves you with a 5% chance that you selected a sample which fell outside of your boundary; that 5% chance is split equally between the two tails.
So then the equation for the confidence interval for estimating the mean of a population is the sample mean, plus or minus the margin of error. The margin of error is calculated by multiplying the z score of the desired confidence level by the standard error, where the standard error is calculated by taking the standard deviation of the population and dividing it by the square root of the sample size. Please note that in this equation for the standard error, we use the sample standard deviation as an estimate of sigma, which is the population standard deviation. You may wonder why, so let me explain. It's well understood that we do sample studies because we don't know something of interest about our population. Thus, we don't know what the population mean or standard deviation is.
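The equation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `confidence_interval` is made up for this example, the sample mean of 28 is the Coursera age from earlier in the lecture, and the sample standard deviation of 6 and sample size of 100 are hypothetical values chosen to make the arithmetic easy to follow.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample interval for a population mean: mean +/- z * (s / sqrt(n)).

    Returns (lower, upper, margin_of_error).
    """
    alpha = 1 - confidence                     # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)    # alpha is split equally between the two tails
    standard_error = sample_sd / math.sqrt(n)  # sample sd used as the estimate of sigma
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin, margin

# Mean age 28 comes from the lecture's example; sd = 6 and n = 100 are hypothetical.
lower, upper, margin = confidence_interval(28, 6, 100)
print(f"margin of error: {margin:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

With these made-up inputs the standard error is 6 / 10 = 0.6, so the margin of error is about 1.96 × 0.6 ≈ 1.18 years and the interval runs from roughly 26.8 to 29.2.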
The statistical equations, however, use different notation and distributions depending on whether or not we know the population's standard deviation. To be completely precise, if we know the population standard deviation, that is sigma, then we use the actual sigma and the normal distribution, and find the z score corresponding to our desired confidence level, in order to calculate the confidence interval. However, if we don't know the population's standard deviation, then we use the sample standard deviation s as an estimate of sigma, together with another distribution, known as the t distribution, in order to calculate the confidence interval.
Let me show you the relationship between the t distribution and the normal distribution. The t distribution is similar to the standard normal distribution. Both are symmetrical and bell-shaped. However, the t distribution is more spread out than the standard normal distribution. That is, it has more area in its tails and less in its center. The amount of spread of the t distribution is determined by the degrees of freedom, which is n minus one, that is, the sample size minus one.
Now, let me analyze this graph. The most peaked and narrow curve in this graph is the standard normal curve. The most spread out curve shows the t distribution when the sample size used was three, a very small sample. The other curves are t distributions plotted as the sample size increases to four and then ten. One thing you notice is that as we increase the sample size, the t distribution becomes more peaked and narrower, and approaches a true normal distribution. Thus, if you have a fairly large sample size, then the two curves become more or less identical. Actually, past a sample size of 30, these two distributions become very similar.
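To see this convergence in numbers rather than on a graph, here is a rough sketch that computes the 95% t score for several sample sizes and compares it with the z score. It is not how statistical software does it: to stay within the Python standard library, it integrates the t density numerically with Simpson's rule and finds the critical value by bisection. The helper names `t_pdf`, `t_cdf`, and `t_critical` are invented for this example, and the sample sizes beyond the 3, 4, 10, and 30 mentioned in the lecture are arbitrary choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3   # 0.5 is the area to the left of zero, by symmetry

def t_critical(confidence, n):
    """Two-tailed t score for a sample of size n, found by bisection."""
    df = n - 1                            # degrees of freedom
    target = 1 - (1 - confidence) / 2     # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for n in (3, 4, 10, 30, 100, 1000):
    print(f"n = {n:4d}: t = {t_critical(0.95, n):.3f}   (z = {z:.3f})")
```

The printed t scores start well above 1.96 for the tiny samples and settle toward it as n grows; at n = 30 the t score is about 2.045.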
Since in this class we only plan to work with large sample sizes, I will always use a normal distribution, and thus a z score, for calculating the margin of error when illustrating a problem in my lectures. When solving problems using software like Excel, it is just as easy to use the t score and the t distribution. So in a sense, the t score and the z score become almost synonymous and interchangeable when we have large data sets. As I said, to be perfectly precise, we should be using a t distribution. But for large sample sizes where the population standard deviation is not known, using just a standard normal distribution is an extremely good approximation.
Just to show you that numerically, consider the following example. There is a population which you don't know anything about, and this is why we are taking samples from it. We take random samples of varying sizes from this population. I started with 30, because at 30 the t distribution and the standard normal distribution become very close, and they get closer as the sample size increases. For each of the samples taken, we calculate the sample's standard deviation, which will be used as a point estimate of the population's standard deviation. So now let's see what the t score and z score will be if we want a 95% confidence interval. This table shows the t score for the different sample sizes I have here. As we increase the sample size, the t-score gets closer to 1.96, the actual z-score.
So why do I say that when I'm demonstrating concepts to you I will only use the z-score? Because the z-score stays the same regardless of the sample size. So for the most commonly used confidence level of 95%, you know that value is 1.96, approximately two standard errors from the mean, and you can focus on calculating the standard error. The t-score, on the other hand, is based on the t distribution and depends on the sample size, which makes it impossible to memorize: it is not a single value.
And that would prohibit you from doing any quick calculations or mental math. One of my objectives in this class is to give you the ability to read business reports, look at data, and decide for yourself whether the conclusion makes sense or not. That means doing some basic reasoning and mental math, and relying on a z-score for a quick approximation is a perfectly good substitute when you don't have access to a computer.
There are three confidence levels that are used most often. 95% is the most often used level; whenever the confidence level is not mentioned, you can assume that it is 95%. This value is the default in most statistical software as well. A good approximation of the t-score using a normal distribution is 1.96. The other two are the 90% confidence level, in which case the z-score is 1.645, and the 99% confidence level, for which the z-score is 2.576. You can pretty much memorize these for quick reference as you look at business reports and analyses presented to you.
In this lesson, you learned about confidence level, margin of error, and how these come together to create a confidence interval. In the next lesson, we will put all this together, develop the confidence interval, and start looking at its meaning.
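If you would rather check the three z-scores than take them on faith, a short sketch like the following reproduces them from the standard normal distribution. It uses Python's standard-library `statistics.NormalDist`; the helper name `z_score` is made up for this example.

```python
from statistics import NormalDist

def z_score(confidence):
    """Two-tailed z score for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence                      # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)  # leave alpha / 2 in each tail

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: z = {z_score(level):.3f}")
```

This prints 1.645, 1.960, and 2.576 for the 90%, 95%, and 99% levels, matching the values quoted above.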