Descriptive statistics are powerful. They can be used in retrospective studies to overview historic data and explain trends. Here are going to see how to form simple yet powerful queries with MIMIC-III database. In particular, we're going to see that extracting patients can result in different estimations depending on which table identifier we look. Therefore, we really need to understand the schema of the database and how it encodes the data. Descriptive statistics are based on common queries to understand the distribution of data in MIMIC-III database. They're relatively simple, yet very powerful ways to interrogate the database and provide intuitive summary visualizations. Normally, descriptive statistics look into patient characteristics, intensive care unit utilization, and patient outcomes such as mortality. A number of factors should be considered while we extract this data, and those factors can actually influence further analysis in the results. For example, when we are looking into estimating the number of patients, we will realize that there is more than one way leading to similar but not identical results. As we're going to see, we can look into the number of distinct patients across care units. We can also look into unique hospital admissions. Some patients have been admitted more than once. Therefore, we would expect that the number of unique patient admissions is less than the number of unique hospital admissions, since a patient can be hospitalized more than once. We can also consider unique admissions to ICUs and this number will be, again, different than the number of distinct patient across intensive care units because some patients have been admitted to more than one intensive care unit. Here I show you one of the simplest queries we can use to interrogate an electronic health record database with relation to the age distribution across different departments. In particular, for MIMIC-III, it is useful to know the age distribution across intensive units. Here we see a diagram of the query. We combine the Patients and the Icustays table. Age can be computed by subtracting the time of admission to ICU from a patient's date of birth. To get the age distribution across ICUs, we can use the different care units obtained from the Icustays table. This information is encoded with the first_careunit identifier. We also need to note here that we are looking the age distribution of patients which are older than 16 years old. In other words, we're not interested in children. Here we plot the results of this query and we see the age distribution in years with relation to different intensive care units, such as the medical intensive care unit, the surgical ICU, the cardiac surgery coronary unit, the coronary care unit, and the trauma surgical intensive care unit. During your practical session, you will have the opportunity to practice these queries yourself in order to familiarize with the MIMIC-III database and its main structure. Here, we see a similar query which aims to extract the gender per critical care unit. Since gender is a field of the patient table, to extract the distribution of gender across ICU units, we just need to combine the Patient table with Icustays table and this is based on the subject_id identifier. For each patient, we collect the ICU the patient was admitted to. Subsequently, we also collect the gender, the date of birth, and hospital admission time. The latter two are used to compute age and filter patients with an age older than 16 years old. As we see, even with simple queries like extracting age and gender, we still need to understand the data and consider what questions address the descriptive statistics for employee. Here, for example, we have adjusted our query to exclude patients younger than 16 years old. Therefore, our data focus on the adult population. Intuitive visualization of those statistics , in this case, can provide us with insight of which conditions affect more females and which conditions affect more males. Here we see, as example, the visualization of the query we put together to extract gender across critical care units. Here we don't show the neonatal intensive care unit because we selected patients older than 16 years old. There is no point. We can observe though that the medical intensive care unit has the larger number, both in terms of female and male patients. We can also observe that the number of men is higher in the intensive care units with relation to cardiac problems as well with relation to trauma surgery. Same data can be visualized with completely different ways. We show you here an example where we see the data returned from the same query, visualized with a graph, with relation to percentage of patients rather than absolute number of patients. Here we see another example of descriptive statistics which aim to extract the distinct patient hospitalizations. The Patient table is combined with the Icustays table to get each patient's unique identifier and the corresponding ICU that each of the patients were admitted to, similar to what we've seen before with relation to extracting age and gender across ICU units. In other words, here we use the INNER JOIN of the two tables in order to select all the rows from both tables with a matching subject identity. We use each patient's date of birth and time admitted to the hospital in order to exclude patients which is younger than 16 years old. We should highlight that the Icustays identifier groups all ICU admissions within 24 hours of each other, so it is possible for a patient to be transferred from one type of ICU to another and have the same Icustay ID. We see here a visualization of the results of our query. This allow us to estimate the number of distinct patient across all care unit. It also provide us the ability to estimate the percentage of patients in each unit according to the first ICU admission. The query would omit details of the other care units if the patient has been transferred internally. We see here that the medical intensive care unit has the biggest number of admission. We expect this since the other intensive care units are more specialized. Some patients might have been admitted to hospital more than once. Apart from the number of unique patients, we would also like to know the number of unique hospital admissions and the corresponding distribution across intensive care units. To get a number for hospital admissions, we combine the Patients table with the Icustays table based on each patient unique subject identifier, like we've done earlier. We collect each patient's identifier, and date of birth, and all the unique hospital stay identifiers. This is a unique number that provide us with the information of a patient hospital stay. Once we have this information, we can then filter with relation to the age and provide an estimate of the unique hospital admissions. We see here the results of our query with the relations to the total hospital admissions. They're similar to what we obtained before, but not identical. We should highlight that the unique admissions to ICUs is different than the number of distinct patient across intensive care units because some patients have been admitted to more than one intensive care unit. In order to identify the distinct ICU admissions, we combine, again, the patients in the ICU table based on each subject's unique identifier. So we collect each patient's identifier, date of birth, and hospital admission time. Here, the key difference is that we use the unique ICU stay identifier instead of the hospitalization identifier. This remind us also the point that the MIMIC-III database has been constructed by linking the hospital database with the ICU information; therefore, the two identifiers are unrelated. Following similar process, we see the results of our query over the total ICU stays. In order to highlight the difference between those three queries, we see here the results with relation to unique patient admissions to unique hospital admissions and to unique ICU stays. We expect that the number of distinct patients across all care unit would be slightly less than the number of hospitalizations. These will also less than the number of ICU admission across different units. The largest differences are in the cardiac surgery recovery unit and the medical intensive care unit. Perhaps, they indicate that patients have been transferred from one unit to another. Here we see how descriptive analytics can give us useful insight about the number of patients and the way resources are utilized. They can also reveal trends and anomalies. Hospital admission were dramatically affected by the pandemic, for example. We shouldn't undermine how powerful descriptive statistics are. They can identify changes in the demographic profile of hospitals and ICU units over time. For example, based on descriptive statistics, studies have shown a steadily increase in chronic illness, including diabetes mellitus and hypertensions. There is an increase in the proportion of elderly in women. They have also showed that while some emerging technologies have improved the patient's outcome, at the same time, they increase the length of stay and the associated risk of complications, as well as the resource consumption. The last years, we also see a decline in hospital mortality from acute coronary syndromes. We can observe that the age of admitted patient is higher in later years. Descriptive analytics, they can also look into leading cause of admissions, as well as the leading cause of death. Putting all this together, descriptive analytics can provide us a lot of information about historic data. They can be used to explain trends, but they cannot be used to predict future and prevent disease and high rates of mortality. Therefore, they are limited into retrospective studies.