In this module, we're going to talk about sampling: basically, how do you represent a population? You're going to learn about the major differences between probability and non-probability sampling, which are the two major stems of sampling, and then we're also going to talk about specific practices for probability samples and how non-probability and probability samples differ from one another. So, the major distinction in sampling is between probability samples and non-probability samples. These are the two large trunks that set up your sampling decisions. In probability sampling, every member of your population has some chance to be selected for the survey. In non-probability sampling, by definition, not every member of your population has a chance to be selected for the survey. In other words, you're going to miss some people because of how you set up your survey. Sometimes non-probability samples are called convenience samples, sometimes purposive samples. We're going to go into much more detail about non-probability samples. Traditional researchers, as represented by my frowning, bearded guy here, don't really think about non-probability samples very often, because those samples rarely represent a population. Most traditional pollsters and survey researchers are really trying to represent and predict what a large population of people think and are going to do. So, they very often eschew non-probability samples, because the error rates in non-probability samples can be too high, as we saw when we looked at coverage and sampling error in a previous module. However, I'm actually going to spend a lot more time on non-probability samples than on probability samples, because I think for UX research we're more often going to engage with non-probability sampling techniques. First, though, let's talk a little bit more about probability and how probability sampling works.
So, like I mentioned, in a probability sample, every person in a population has a chance of being chosen to be in the sample. Samples therefore approximate the true values of a population. That's why probability samples are so important, especially in large-scale research. You're basically trying to represent the true beliefs of a population of people. The closer you can get to representing that population, the more accurate you're going to be about what that population actually believes. Remember when we talked about error in surveys, and how one source of error might be that you miss a whole group of people who don't answer your survey, or who are different from other people? That error is reduced by having probability samples, because we're really randomly choosing people from a population. Another way of looking at this is my chocolate chip cookie diagram here, where the population, this theoretical construct called the population, is represented by the large gray circle. That population has a set of beliefs shaped by their demographics and by their personal characteristics. Older people are going to believe things differently than younger people, and females might have a different set of beliefs about something than males do. All of these are general, large-scale differences based on demographics or experiences. So while the population is shown here as a big gray blob, it's actually not. It's a rich, diverse mosaic of people with different beliefs and habits, some of which are regularized based on those personal characteristics. Now, the little blue dots represent samples. Any given researcher could go and take one of these samples from the population, and the hope is that each one of these blue dots could fairly accurately represent the large gray circle. That's what probability sampling does.
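As a rough illustration of the blue dots in the diagram, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the population of 10,000 people, the 40 percent share of older people, and the agreement rates of 70 and 30 percent for older and younger people. It draws several simple random samples and compares each sample's estimate with the population value.

```python
import random
import statistics

def make_population(size=10_000, seed=0):
    """Hypothetical population: one 0/1 opinion per person (1 = agrees)."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        older = rng.random() < 0.4        # assumed: 40% of people are "older"
        p_agree = 0.7 if older else 0.3   # assumed: opinion differs by age group
        population.append(1 if rng.random() < p_agree else 0)
    return population

def sample_estimates(population, sample_size=500, n_samples=5, seed=1):
    """Draw several simple random samples (the "blue dots") and
    return the share agreeing in each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

population = make_population()
true_share = statistics.mean(population)
estimates = sample_estimates(population)

print(f"population share agreeing: {true_share:.3f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i}: {estimate:.3f}")
```

Each sample's estimate differs a little from the others, but under these assumptions they should all land close to the population value, which is the sense in which any one blue dot can stand in for the gray circle.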
Probability sampling takes this construct known as your population and tries, very rigorously, to find a random portion of that population to ask questions of, so that you can represent the beliefs of the whole. But because that population is not homogeneous, because it has lots of different beliefs and different characteristics, it's really important to think about how we sample them and how we reduce the error that we get if we miss a group of people. To have a probability sample, one important thing that you need is a list of every member of the population. If you don't actually know who's in the population, you can't randomly sample from them, because not everybody has a chance to be in your survey. Remember that the definition of probability sampling is that every member of your population has some random chance to participate. If you can't actually get a list of them, they don't have that chance. Sampling frames are lists of the people in your population that help you to actually draw a sample from that population. So, here are some example sampling frames. The classic examples, what I call here global examples, come from pollsters and survey research. One is street addresses. If you're really interested in representing the population of the United States, for instance, you're often going to turn to street addresses, and of course even that misses a huge portion of people. That's your coverage error again. Homeless people and people who live in unstable housing may not have a street address, so you're consequently going to miss some of them. Another classic method for trying to get a list of your population is phone numbers. At one point in the history of the United States, about 96 percent of people had a landline phone in their household. That number has shrunk drastically over the last five years because of the introduction of mobile phones, as people have really been dropping landline phone service.
So, survey researchers had to respond to that. Having a sampling list of phone numbers used to be a great way to have a list of everybody in a large population like that; now, not so much. We're going to look at more local examples, or things that you're more likely to be involved in, I should say. For instance, an organization's e-mail directory can act as a good sampling frame. If the population I'm interested in is all of the first-year students who are entering the University of Michigan this fall, well, I know that the University of Michigan has a list of all the e-mail addresses of that group. So in that case, my population is first-year students here at the university, my sampling frame is the list that the university provides of their e-mail addresses, and then I can draw a probability sample from that list. Another frame might be the phone numbers of all the people who have called customer service this past month. Now, there you're really limiting your population in a way. The only group you're really going to generalize to is people who have called customer service in the past month, but you have a list of all those people, and you can sample from that list and be able to represent that particular population. Finally, since I do a lot of work with online communities, if we think about people on a site, you can think about the usernames of all site users. Somewhere in your database there's going to be a table that has a list of all usernames for all site users. In this case your population is all users of your site, especially if the site requires usernames, and the sampling frame is the database of their names, and then you can randomly draw from that if you are interested in doing a probability sample survey. So, probability samples are best when you are trying to describe a population. All users think this about our product. All our users struggle with this aspect of the site.
This is how younger males reacted to this change in the product. In each case, the "this" in italics is a provocation or a change in the environment, and you're trying to represent how it affects a certain type of population. How you get access to that population is really important. Non-probability surveys should probably not be used to describe a population. I'm going to talk about ways in which these non-probability surveys are wonderful for UX research, but if you're trying to say all users think this, you're going to miss some people if you're overly dependent on non-probability surveys. A good example of this comes from Pew Research, which did a series of studies. Pew Research Center is a large non-profit research organization. They describe themselves as a fact tank, and they do a lot of survey research, specifically probability survey research. But because probability surveys are so expensive and getting harder and harder to conduct, they did a big evaluation of online non-probability surveys to see what happens when we do non-probability surveys. How much do they actually deviate from these gold standard probability surveys? What they found is that these non-probability surveys did indeed end up being biased in different ways. For instance, people who participated in non-probability surveys were more civically and politically engaged. That makes sense. If you're civically minded and you like to volunteer, you might also volunteer to participate in research at a higher rate than people who aren't. That changes the probability of who participates in your survey. People who participated in these non-probability surveys were also more likely to live alone, to not have children, to be unemployed, and to be low income. These non-probability surveys were also much more likely to miss Black and Hispanic respondents.
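To make the contrast concrete, here is a minimal Python sketch that draws a simple random sample from a sampling frame and compares it with an opt-in, volunteer-style sample. The frame of usernames, the 25 percent share of engaged users, the satisfaction rates, and the assumption that engaged users are five times as likely to volunteer are all invented for illustration; none of these numbers come from the Pew studies.

```python
import random

def build_frame(n_users=20_000, seed=42):
    """Hypothetical sampling frame: one record per site user."""
    rng = random.Random(seed)
    frame = []
    for i in range(n_users):
        engaged = rng.random() < 0.25                          # assumed share
        satisfied = rng.random() < (0.8 if engaged else 0.5)   # assumed gap
        frame.append(
            {"username": f"user{i}", "engaged": engaged, "satisfied": satisfied}
        )
    return frame

def share_satisfied(records):
    """Fraction of the given records whose user is satisfied."""
    return sum(r["satisfied"] for r in records) / len(records)

def probability_sample(frame, n, rng):
    """Simple random sample: every user in the frame has the same chance."""
    return rng.sample(frame, n)

def volunteer_sample(frame, n, rng):
    """Opt-in sample: engaged users are assumed 5x as likely to respond.
    random.choices draws with replacement, which is fine for a sketch."""
    weights = [5 if r["engaged"] else 1 for r in frame]
    return rng.choices(frame, weights=weights, k=n)

rng = random.Random(7)
frame = build_frame()
truth = share_satisfied(frame)
srs_est = share_satisfied(probability_sample(frame, 1000, rng))
optin_est = share_satisfied(volunteer_sample(frame, 1000, rng))

print(f"all users:          {truth:.3f}")
print(f"probability sample: {srs_est:.3f}")
print(f"opt-in sample:      {optin_est:.3f}")
```

Under these assumptions the random draw should track the full user base closely, while the opt-in sample should overstate satisfaction, because the people most likely to volunteer are also the people most likely to be satisfied. Collecting more opt-in responses would not remove that gap.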
So, depending on the type of insight you're trying to generate and the type of population you're trying to generalize to, non-probability surveys can be problematic. But again, that's partially because of how you're trying to represent a population. So, going back to our chocolate chip cookie diagram. In this case the chocolate chip cookie is looking a little weird, and that's because I've drawn a line here to show: what if, with a non-probability survey, you miss a huge chunk of the population? Well, in that case your samples have no chance of representing that portion of the population, and you're never going to be able to capture their experiences, or beliefs, or demographics, or characteristics, or anything like that. In summary, not every set of questions needs a probability sample. In UX research, non-probability samples are actually probably going to be more of your bread and butter, a common technique that you use. However, it's important to distinguish which types of UX questions and which types of insights need a probability sample, and when it's okay to use a non-probability sample. Understanding how these different sampling methods work, and understanding the theoretical differences between them, really is the cornerstone for making these smart decisions. In the next module, we're going to be talking more about non-probability samples, and how those work and can be effective for you.