In this lesson, we're going to cover how to build reliable data pipelines with Delta Lake. In particular, we're going to create the medallion architecture, with bronze, silver, and gold level tables, in Delta Lake. We'll also analyze the Delta transaction log to see how our data changes over time and how the log captures all of those changes, and we'll use the UPDATE command to correct errors in existing data.

To motivate this lesson, let's take a look at the diagram below. You can see that there can be many different sources of incoming data. The first step is to write out the raw data to a bronze table. We then use the bronze table as input to our silver table. This is where we filter out columns or records that we don't need, clean the data, and potentially augment it with additional information. We then use the silver table as input to the gold level table, which is typically used for business-level aggregates and is the cleanest version of your data. This in turn powers tools such as streaming analytics or AI reporting.

To get started building out this architecture, let's run our classroom setup script. We'll then create a temporary view over our parquet file. You'll notice that when we create this view, we specify USING parquet, because the underlying data is stored as a parquet file. Next, let's write our raw data into a bronze Delta table. The only change we have to make here is to specify USING DELTA rather than USING parquet, so the contents of the table should be identical to what the parquet file contains. (Rough sketches of these commands follow below.)

Now that this is finished, let's take a look at the data in the Data tab on the left. You'll notice that we have this fire calls bronze table; let's click on it. Here we can see the schema and some sample data, just as with any other table you create within Databricks. But because this is a Delta table, we also get an additional Details tab as well as a History tab. In Details we can see, for example, when the table was created, when it was last modified, the number of files, and so on. If we click on the History tab, we can take a look at the history of our Delta table. We've only had one commit, so we're still on version 0, but we can see information such as the timestamp, who committed it, and what type of operation it was; scrolling to the right shows the other parameters that get tracked.

Now let's go back to the notebook and take a look at the underlying files that were generated by writing this Delta table. You can see that we still have all of the underlying parquet files, but Delta also generates a _delta_log directory. Let's take a look at the first JSON record within this log by calling %fs head on the Delta log file. Here we can see the commit: the timestamp, which user made it, and so on.

So now that we've written our bronze level Delta table, let's refine it a bit and write it out as Delta silver. Here, we're going to filter out any unnecessary columns and any null records. We'll create a new table called fire calls silver, USING DELTA, which keeps just a subset of the columns and filters out records where the city is null, as well as records where the neighborhoods analysis boundaries column is null.
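In rough code, the view and bronze steps above might look like the following sketch. The view and table names and the parquet path are placeholders, not necessarily what the classroom setup script actually uses:

```sql
-- Temporary view over the raw data; USING parquet because the
-- underlying data is stored as a parquet file (path is a placeholder)
CREATE OR REPLACE TEMPORARY VIEW fireCallsParquet
USING parquet
OPTIONS (path "/path/to/fire-calls.parquet");

-- Bronze table: the raw data written out as-is; the only change
-- from the view definition is USING DELTA instead of USING parquet
CREATE TABLE fireCallsBronze
USING DELTA
AS SELECT * FROM fireCallsParquet;
```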
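To peek at the transaction log yourself, the %fs notebook magic can list the table's directory and print the first commit file. This sketch assumes a managed table stored under the default warehouse location, which may differ in your workspace; each %fs command goes in its own notebook cell:

```
%fs ls dbfs:/user/hive/warehouse/firecallsbronze/_delta_log
```

The first commit is the zero-padded version 0 JSON file:

```
%fs head dbfs:/user/hive/warehouse/firecallsbronze/_delta_log/00000000000000000000.json
```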
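And a sketch of the silver step. The specific columns kept here are illustrative assumptions; the neighborhoods column is written with the three-o spelling the source data appears to use, which the gold step below will clean up:

```sql
-- Silver table: a subset of the columns, with records dropped when
-- the city or the neighborhood boundary is null
CREATE TABLE fireCallsSilver
USING DELTA
AS
SELECT Call_Number, Call_Type, City, Neighborhooods_Analysis_Boundaries
FROM fireCallsBronze
WHERE City IS NOT NULL
  AND Neighborhooods_Analysis_Boundaries IS NOT NULL;
```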
So here we have our cleansed dataset, but you'll notice that there's certainly more data cleansing that could happen. For example, in the city column, some cities are listed as San Francisco while others are listed as SF. Let's make this consistent using the UPDATE command: we'll set the city to San Francisco wherever the city equals either SF or SAN FRANCISCO in all caps.

Now we can see how this is reflected in the transaction log. We could always go back to the UI to take a look, or we can call DESCRIBE HISTORY on our fire calls silver table. You'll see that we now have two versions, version 0 and version 1, with version 1 being the latest. We can also see the operation parameters, including the predicate about the city being equal to SF or SAN FRANCISCO. And we'll have another record in the _delta_log directory for our fire calls silver dataset; if we take a look at the corresponding JSON file, we can see all of the files Delta needs to remove and all of the new files it needs to add in order to make this update.

Lastly, let's aggregate our data and write it out to our gold level table. We're going to create fire calls gold, USING DELTA, selecting our neighborhoods column and renaming it to Neighborhoods, this time spelled with just two o's (the column name in the source data contains an extra o). We'll get the call type for each neighborhood along with the corresponding count, because we want to report back to the business the most common call types in each neighborhood. Now that we have our gold level table, we can easily answer that question: we can look at the neighborhoods, the call types, and the corresponding counts, and we can easily sort the results if the business wants to see the most common or least common call types, or group them by neighborhood.
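Under the same naming assumptions, the update and history steps might look like this:

```sql
-- Make the city values consistent; Delta tables support UPDATE in place
UPDATE fireCallsSilver
SET City = 'San Francisco'
WHERE City = 'SF' OR City = 'SAN FRANCISCO';

-- The same history we saw in the UI: version 0 is the CREATE,
-- version 1 is the UPDATE with its predicate
DESCRIBE HISTORY fireCallsSilver;
```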
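And finally, a sketch of the gold step and the business question it answers, again with assumed column names:

```sql
-- Gold table: business-level aggregate of call counts by neighborhood
-- and call type; the rename also fixes the extra-o typo
CREATE TABLE fireCallsGold
USING DELTA
AS
SELECT Neighborhooods_Analysis_Boundaries AS Neighborhoods,
       Call_Type,
       COUNT(*) AS Count
FROM fireCallsSilver
GROUP BY Neighborhooods_Analysis_Boundaries, Call_Type;

-- The most common call types per neighborhood
SELECT * FROM fireCallsGold
ORDER BY Count DESC;
```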