Hey, and welcome to Module 3. In this video, we're going to learn about random variables.

Let's recap what we've done so far. We've learned about the foundations of probability: experiments, events, and sample spaces. Then we did the axioms of probability and their consequences, and we studied conditional probability, Bayes' theorem, combinations, permutations, and independent events. That's a lot of foundations, but think about statistics: we want to do things like find an average value or find a variance, and you can't really find the average value of events. We need another layer on top of all this foundational work, and that layer is random variables.

A random variable is a function that maps outcomes in the sample space to the real numbers. If I call my random variable X, its domain is the sample space and its output is some real number. Now, a random variable is not random and it's not a variable; it's actually a function. The term "random variable" is historical in nature, and we'll keep using it because that's the convention, but really a random variable is a function that takes a sample space outcome and gives back a number.

Random variables can be discrete or continuous. A random variable is discrete if its set of possible values is discrete. In Module 3, we're going to focus primarily on discrete random variables. As an example, suppose we flip a fair coin 100 times. The sample space consists of 100-tuples in which every entry is either a head or a tail, so its cardinality is 2 to the 100: far more outcomes in S than we can conveniently organize. But suppose we define a random variable Y to count the number of heads in the 100 flips. Then Y can take on the values 0 through 100, and it organizes the sample space by assigning one of those numbers to each outcome. You can define lots of different random variables on the same sample space: maybe X is the number of tails, maybe W is the number of heads minus the number of tails. There are lots of possibilities, depending on what you need for your particular situation.

On the other hand, a random variable is continuous if its set of possible values is an entire interval of numbers. For example, we could let T be the time between two customers entering a store during some eight-hour day, or we could let U be a random number chosen from the interval minus 5 to 5. Or think of throwing a dart at a dartboard: how far your dart lands from the center could be a random variable.

Here's our convention: we will usually denote a random variable by a capital letter near the end of the alphabet (again, that's just convention) and a specific instance, or realization, of the random variable by a lowercase letter. For example, in the expression X = x, capital X is the random variable and little x is the realization. We'll see that notation throughout the examples in the rest of this module.

Step back for just a second and think about the big picture. In statistics, we will model populations using random variables, and then we'll get features, or parameters, of that population, for example, the mean or variance. The random variables then tell us something about the population we're studying. That's the big picture: a random variable condenses all the information from our sample space, from our population, into quantities we can calculate, like means or variances.
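Before we get to the examples, here's a minimal Python sketch (not part of the original lecture; the variable names are our own) illustrating the coin-flip setup above: it draws one outcome from the sample space of 100 flips and evaluates the three random variables Y, X, and W on that same outcome.

```python
import random

# One outcome from the sample space S: a 100-tuple of 'H'/'T'.
flips = tuple(random.choice("HT") for _ in range(100))

# Three different random variables defined on the same sample space.
Y = sum(1 for f in flips if f == "H")  # number of heads
X = sum(1 for f in flips if f == "T")  # number of tails
W = Y - X                              # heads minus tails

print(Y, X, W)  # e.g., 53 47 6 (varies from run to run)
```

Each of Y, X, and W is literally a function of the outcome, which is exactly the point of the definition above.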
Let's do some examples, and we'll start with my favorite: roll a six-sided die twice. We've studied this sample space a number of times. Suppose we let X equal the sum of the two rolls; then I want to calculate the probability of X being equal to 2, 3, 4, and so on up to 12.

Start with the probability that X equals 2. Our random variable is the capital letter X, and the 2 is the actual realization. What does it mean to get a 2? From our sample space, that's the event (1, 1). There's only one way to get that, and it has probability 1/36. Notice that's the same as getting a (6, 6), which is the probability that X equals 12. We can continue: the probability that X equals 3 is the probability of getting a (1, 2) or a (2, 1), which is 2/36, and by the same symmetry, with a (6, 5) or a (5, 6), that's also the probability of X equaling 11. The probability that X equals 4 is 3/36, and that's also the probability that X equals 10. The probability that X equals 5 is 4/36 (you can list the individual outcomes that make X equal 5), and that's also the probability that X equals 9. The probability that X equals 6 is 5/36, the same as the probability of X equaling 8. I'll write this last one out: the probability that X equals 7 is the probability of the event {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, which is 6/36.

Now, what do we expect? If we add up all these probabilities, for X equaling 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, what we get is the sum from k = 2 up to 12 of the probability that X equals k, and that equals 1. We should expect that, because every single outcome in the sample space lands in exactly one of the realizations of the random variable X, and we've accounted for all those realizations by going through 2, 3, 4, all the way up to 12.

What have we calculated? We've actually calculated something called the probability mass function of a discrete random variable. The probability mass function is little p of x, an abbreviation for the probability that X equals x:

p(x) = P(X = x) = P({s in S : X(s) = x}),

that is, the probability of all the outcomes s in the sample space S for which the random variable X(s) equals the realization little x. Sometimes, when we start to have multiple random variables in the same problem, we'll put a little subscript on the probability mass function, writing p_X(x).

What do we expect from this? The first thing we expect is that each of these probabilities is between 0 and 1; it could be 0 if no outcomes in our sample space map to that value. Notice this is the same as Axiom 1. The second thing we expect is that if we sum p(x) over all the realizations little x, we cover the whole sample space, so the sum equals 1, which is what we saw in the previous example. This comes from Axiom 2. The third thing: if I take two different values a and b, with a not equal to b, then the event {X = a} is disjoint from the event {X = b}, so the probability that X equals a or b is the probability that X equals a plus the probability that X equals b. This is Axiom 3. The three axioms that we already studied come to bear on random variables and their probability mass functions.
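A quick way to check all of this is to enumerate the sample space by machine. Here's a short Python sketch (ours, not from the lecture) that tallies the probability mass function of the dice sum exactly and confirms the probabilities add to 1:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two rolls of a fair die
# and tally the probability mass function of X = sum of the two rolls.
pmf = {}
for roll in product(range(1, 7), repeat=2):
    x = sum(roll)
    pmf[x] = pmf.get(x, 0) + Fraction(1, 36)

for x in sorted(pmf):
    print(x, pmf[x])      # 2 -> 1/36, 3 -> 1/18 (= 2/36), ..., 7 -> 1/6 (= 6/36)

print(sum(pmf.values()))  # 1, just as Axiom 2 requires
```

Using Fraction instead of floats keeps the arithmetic exact, so the final sum prints as exactly 1.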
Let's do another example. Suppose we have a lab, and it has six computers. Let capital X denote the number of these computers that are in use during the lunch hour. Here are some numbers: the probability that none of the computers is in use is 0.05, the probability that one is in use is 0.1, that two are in use is 0.15, and so on:

x    | 0    | 1    | 2    | 3    | 4    | 5    | 6
p(x) | 0.05 | 0.10 | 0.15 | 0.25 | 0.20 | 0.15 | 0.10

Let's notice a couple of things. Each entry is the probability that X equals x, and because there are six computers, the only possible values of our random variable X are the integers 0 through 6.

Let's do some calculations. Suppose we want to calculate the probability that at most two computers are in use. We want the probability that X is less than or equal to 2, which is the probability that X equals 0, plus the probability that X equals 1, plus the probability that X equals 2: 0.05 + 0.1 + 0.15 = 0.3. The probability that at most two computers are in use is 0.3.

Suppose I want to know the probability that at least half of the computers are in use. In terms of our random variable, that's the probability that X is greater than or equal to 3, which is the probability that X equals 3, plus X equals 4, plus X equals 5, plus X equals 6. If we add up all those probabilities from our table, we get 0.7. Now, let's notice something here. The complement of the event that X is greater than or equal to 3 (at least half in use) is the event that X is less than or equal to 2 (at most two in use), and you can see from the probabilities, 0.7 = 1 - 0.3, that this works out exactly as we calculated.

What about the probability that three or four computers are free? What does it mean for three computers to be free? It means three are being used, which is the event that X equals 3. If I have four computers free, that means two are being used, the event that X equals 2. We want the union of those two events, and since the events X = 3 and X = 2 are mutually exclusive, I can just add the probabilities: 0.25 + 0.15 = 0.4.
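Those three calculations are easy to replicate in code. Below is a small Python sketch (our own, with the table above hard-coded) that reproduces them:

```python
# PMF of X = number of lab computers in use, from the table above.
pmf = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.25, 4: 0.20, 5: 0.15, 6: 0.10}

at_most_two = sum(p for x, p in pmf.items() if x <= 2)    # P(X <= 2)
at_least_half = sum(p for x, p in pmf.items() if x >= 3)  # P(X >= 3)
three_or_four_free = pmf[3] + pmf[2]                      # P(X = 3) + P(X = 2)

# Round to dodge floating-point noise in the sums.
print(round(at_most_two, 2), round(at_least_half, 2), round(three_or_four_free, 2))
# 0.3 0.7 0.4
```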
There's one more thing we have to talk about, because it came up in that previous example: something called the cumulative distribution function. Usually it's denoted by a capital F, and F(y) is the probability that X is less than or equal to y. It's cumulative: we're accumulating all the probabilities up to and including y. Another way to write that is

F(y) = P(X <= y) = the sum of p(x) over all x <= y,

adding up all the probabilities p(x) for realizations x at most y. For discrete random variables, the cumulative distribution function is a step function, so let's see what that means for the lab example.

F(y) is 0 if y is less than 0, because there is no probability down there; for instance, the probability that X is less than or equal to -5 is 0. If y is at least 0 but less than 1, F(y) is 0.05. If y is at least 1 and less than 2, we add in 0.05 plus 0.1, so we get 0.15; you can see that there we're adding the probability that X equals 0 plus the probability that X equals 1. We can continue: if y is at least 2 and less than 3, we add on the 0.15, so we have 0.05 plus 0.1 plus 0.15 and end up with 0.3. If y is at least 3 and less than 4, we add on the 0.25 and get 0.55. Continuing, for y at least 4 and less than 5, we add on the 0.2 and get 0.75. Almost done: for y at least 5 and less than 6, we add on the 0.15 and get 0.9. And if y is any number greater than or equal to 6, F(y) is 1.

We could graph this, and it would be a step function; all discrete random variables have step functions as their cumulative distribution functions. And we can calculate probabilities from the cumulative distribution function just the same as we can with the probability mass function. We'll do more examples of this and start studying named random variables in the next video. Thank you.
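To close the loop, here's one more Python sketch (again ours, reusing the pmf dictionary from the previous sketch) that builds the step function F from the probability mass function and evaluates it at a few points:

```python
import bisect

# PMF of X = number of lab computers in use, as in the previous sketch.
pmf = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.25, 4: 0.20, 5: 0.15, 6: 0.10}

xs = sorted(pmf)
cum = []          # cum[i] = P(X <= xs[i]); F jumps by pmf[x] at each x
total = 0.0
for x in xs:
    total += pmf[x]
    cum.append(total)

def F(y):
    """Cumulative distribution function F(y) = P(X <= y)."""
    i = bisect.bisect_right(xs, y)  # how many possible values are <= y
    return cum[i - 1] if i > 0 else 0.0

print([round(F(y), 2) for y in (-5, 0.5, 2, 3.7, 100)])
# [0.0, 0.05, 0.3, 0.55, 1.0] -- matching the steps worked out above
```

Plotting F over, say, the interval [-1, 7] would show exactly the staircase described in the lecture, with a jump of size p(x) at each possible value x.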