Let's look at a more basic example of how a histogram might be constructed, and then use that as a springboard for talking about additional descriptive statistics that can be generated for quantitative variables. In this example, we have the exam grades of 15 students. We first need to break the range of values into intervals, also called bins, groups, or classes. In this case, since our dataset consists of exam scores, it makes sense to choose intervals that correspond to the typical ranges of letter grades: 10 points wide, 40 to 50, 50 to 60, etc. By counting how many of the 15 observations fall in each of the intervals, we get this table. To construct the histogram from this table, we plot the intervals on the X axis and show the number of observations in each interval, or the percentage of observations in each interval, on the Y axis. That number or percentage is represented by the height of the bar located above the interval. >> Once the distribution has been displayed graphically as a histogram, we can describe the overall pattern of the distribution and mention any striking deviations from that pattern. More specifically, we should consider the following features. We will get a sense of the overall pattern of the data from the histogram's center, spread, and shape, while outliers will highlight deviations from that pattern. >> When describing the shape of a distribution, we should consider the symmetry or skewness of the distribution and its peakedness or modality, that is, the number of peaks or modes that the distribution has. Here, all three distributions would be referred to as symmetric, but they differ in their modality. The first distribution is unimodal. It has one mode, roughly at 10, around which the observations are concentrated. The second distribution is bimodal. It has two modes, roughly at 10 and 20, around which the observations are concentrated. The third distribution is flat, or uniform.
This distribution has no modes, that is, no value around which the observations are concentrated. Instead, the observations are roughly uniformly distributed among the different values. A distribution is called skewed-right if the right tail, the larger values, is much longer than the left tail, the smaller values. Note that in a skewed-right distribution, as you can see here on the right, the bulk of the observations are small to medium, with a few observations that are much larger than the rest. An example of a real-life variable that has a skewed-right distribution is salary. Most people earn in the low to medium range of salaries, with a few exceptions, such as CEOs, professional athletes, etc., whose salaries are spread along a large range, that is, the long tail of higher values. A distribution is called skewed-left if the left tail, the smaller values, is much longer than the right tail, the larger values. Note that in a skewed-left distribution, the bulk of the observations are medium to large, with a few observations that are much smaller than the rest. An example of a real-life variable that has a skewed-left distribution is age at death from natural causes. Most deaths from natural causes happen at older ages, with fewer cases happening at younger ages. Skewed distributions can also be bimodal. Here's an example: a medium-sized neighborhood 24-hour convenience store collected data from 537 customers on the amount of money they spent in a single visit to the store. The histogram displays the data. You can see that the amount of money spent is concentrated around $20, and then concentrated again around $50. From the Mars craters dataset, we also display the latitude of the Mars craters' rims. The values are concentrated around 66 to 69 decimal degrees north, and again around 36 decimal degrees north. So the mode or modes of a variable are the values that occur most often, and knowing them can help you make better decisions.
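As a small aside, the idea of "the values that occur most often" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the purchase amounts below are made up (rounded to the nearest $10) and are not the store's actual data, and `modes` is a hypothetical helper name. It applies the strict definition, returning every value tied for the highest count, whereas the peaks read off a histogram are looser regions of concentration.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, n in counts.items() if n == top)

# Hypothetical purchase amounts in dollars, rounded to the nearest $10
# (illustrative only, not the convenience store data from the lecture).
amounts = [20, 20, 50, 10, 20, 50, 30, 50, 40, 20, 50, 60]

print(modes(amounts))  # two values tie for most frequent, so this sample is bimodal
```

With this made-up sample, both $20 and $50 occur four times, so both are reported as modes.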
The mode, for example, has applications in book publishing. Not surprisingly, it's important for the publisher to print more of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others. Likewise, the mode has applications in manufacturing. For example, it's important to manufacture more of the most popular shoes and shoe sizes. >> Now, as we've seen, the mode is not always at the center. The center of the distribution is its midpoint, the value that divides the distribution so that approximately half the observations take smaller values and approximately half take larger values. >> As you can see from the histogram, the center of the grades distribution is roughly 70. Seven students scored below 70 and eight students scored above 70. From the histogram we can get only a rough estimate of the center of the distribution, but estimates like this can often be made by examining the histogram. So what about spread? The spread of the distribution, also called variability, can be described by the approximate range covered by the data. From looking at the histogram, we can approximate the smallest observation, or minimum, and the largest observation, or maximum, and thus approximate the range. In our exam score example, you can see that the approximate minimum is 45, the middle of the lowest interval of scores. The approximate maximum is 95, the middle of the highest interval of scores. So our approximate range is about 50 points, 95 minus 45. The overall pattern of the distribution of a quantitative variable is described by its shape, center, and spread. By inspecting the histogram, we can describe the shape of the distribution. But as we saw, we can only get a rough estimate of the center and spread.
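To make the binning and the range estimate concrete, here is a minimal Python sketch. It rests on assumptions: the 15 scores are made up, because the lecture does not list the actual grades, and they were chosen only so that seven fall below 70 and eight above. `bin_counts` and `approximate_range` are hypothetical helper names. The sketch also assumes integer scores between 40 and 100.

```python
# Hypothetical exam scores for 15 students (not the lecture's actual data).
scores = [47, 52, 58, 61, 64, 66, 68, 71, 73, 75, 78, 82, 85, 88, 93]

def bin_counts(values, low, high, width):
    """Count how many values fall in each interval [start, start + width).

    Assumes integer values with low <= value <= high.
    """
    counts = {(start, start + width): 0 for start in range(low, high, width)}
    last_index = (high - low) // width - 1
    for v in values:
        # Clamp so a value equal to `high` lands in the last interval.
        index = min((v - low) // width, last_index)
        start = low + index * width
        counts[(start, start + width)] += 1
    return counts

def approximate_range(counts):
    """Estimate min, max, and range from the midpoints of the outermost non-empty bins."""
    occupied = [interval for interval, n in counts.items() if n > 0]
    lowest, highest = min(occupied), max(occupied)
    approx_min = sum(lowest) / 2
    approx_max = sum(highest) / 2
    return approx_min, approx_max, approx_max - approx_min

counts = bin_counts(scores, low=40, high=100, width=10)

# Text histogram: one '#' per observation in each interval.
for (start, end), n in counts.items():
    print(f"{start}-{end}: {'#' * n} ({n})")

print(approximate_range(counts))  # (approximate minimum, approximate maximum, approximate range)
```

For these made-up scores, the lowest and highest occupied intervals are 40 to 50 and 90 to 100, so the midpoints give the same estimate as in the lecture: 95 minus 45, or about 50 points.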