We've talked about how everything represented in a computer has a numeric form and discussed what that means for some particular kinds of data. Back in course 1, we introduced features and the idea of feature engineering. Now, we're going to go into a bit more depth on this point and present some basic approaches to feature engineering for different kinds of data. By the end, you'll understand some standard techniques and have explored scenarios where feature engineering varies based on the type of data. Having a deep understanding of data is an essential prerequisite for doing EDA, exploratory data analysis, as well as feature engineering. Very often, only certain types of feature engineering techniques are valid for certain types of data. You may need to encode your data differently in order to visualize it better and to use it as features in machine learning models. Algorithms often require your data to be in a certain form for them to be able to learn effectively. That's where the need for feature engineering arises. As a starting point, what are some of the simplest features we can use? Well, as with everything else in machine learning, a lot depends on the data. Some techniques work well with certain algorithms and data types, while others might be useful in all cases. First, let's talk about numerical data. We've already discussed how machines can only understand numbers, so any non-numeric data needs to be converted into numbers for a machine learning algorithm to learn from it. Most algorithms in machine learning have underlying assumptions about the data under which they work best. Understanding these assumptions and transforming the data accordingly is the key to building an optimal machine learning model. Quite often, raw numerical data can be fed directly to a machine learning algorithm without any transformation or engineering. Typically, the raw data will just have values or counts. Let's have a look at this data set. If you look closely at the values in this table, you can see that the cells contain only numeric values. In this form, the data can be used directly to train a machine learning algorithm. However, in many cases, you may need to reorganize and amalgamate multiple instances of your data before using it, as shown in this particular data set. Let's say you're building a very naive machine learning model that recommends songs to listeners. In the raw data you got from the streaming service, there's a record of every time a song was played, keeping track of who listened to it. Rather than using this long list of every time a song was played, you're probably better off grouping together all the entries for a single user, giving you a historical count of the number of times they've played a particular song. As shown in this data set, perhaps it would be a good idea to recommend songs similar to Come As You Are to users 001 and 004, as they have a very high count for that song. Alternatively, you may not be interested in how many times something happened, just whether it happened at all. In this case, converting attribute values to binary form could be helpful. A scenario where this could be useful is movie recommendations, where it's useful to know whether a user has seen a particular movie, as opposed to having a count of how many times the movie has been seen.
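To make this concrete, here's a minimal sketch in Python of the aggregation and binarization steps just described, using pandas. The play-log columns, user IDs, and song titles are made up for illustration and are not the lecture's actual data set.

```python
import pandas as pd

# Hypothetical raw play log from a streaming service: one row per listen event.
plays = pd.DataFrame({
    "user_id": ["001", "001", "001", "004", "004", "002"],
    "song": ["Come As You Are", "Come As You Are", "Lithium",
             "Come As You Are", "Heart-Shaped Box", "Lithium"],
})

# Aggregate the long event list into a user-by-song table of historical play counts.
play_counts = pd.crosstab(plays["user_id"], plays["song"])
print(play_counts)

# Binarize: 1 if the user has ever played the song, 0 otherwise,
# useful when you only care whether something happened, not how often.
played_at_all = (play_counts > 0).astype(int)
print(played_at_all)
```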
Binning is used to transform continuous numerical values into discrete values or categories. The easiest kind of binning simply buckets values according to some fixed criteria, maybe making buckets of the same size, or maybe using some standard domain-specific thresholds. This approach is useful when you know the fine details aren't really important and you want the learning algorithm to treat a range of values similarly. Since binning leads to discrete categorical features, we probably want an additional step of feature engineering on those categorical features, maybe something like one-hot encoding as we've seen in our prior videos, before using those features to learn a model. Or, as we've discussed, you can treat the bins as ordinal features and assign them integer values accordingly. We can also apply some scaling techniques to deal with skewed data. An example would be the log transform which, when applied, compresses the values that are larger in magnitude and spreads out the ones that are smaller. Let's look at the distribution for the feature income. As we can see, most of the values are bunched up on the left side of the distribution, with a long tail to the right. However, on applying the log transform, the distribution looks much more normal. Other frequently used scaling techniques include z-score normalization, min-max scaling, etc. So far, we've covered some simple feature engineering techniques for numerical data. But what if the data is non-numeric? In fact, most real-world applications that require human interaction are text-based systems. We've already talked about some transformations of text earlier when we were consolidating our data sources. It's such a big topic that we're going to address it later this week. What other kinds of data can you think of? How about temporal data? Temporal or time series data is essentially a sequence of data points where some attributes are measured over time. Some examples of temporal data are the daily closing value of a particular stock or index, an ECG recording of the electrical signal measured from the heart, or the daily average temperatures over a period of time. So what are some basic approaches we can take when preparing time series data for our models? Let's start with an example. Here's a plot of the recorded daily average temperature. If we wanted to predict the average temperature for tomorrow, we could try using the mean temperature from all the days in the previous week. This is a very naive approach called a moving average model, which simply predicts the next observation based on the mean of a subset of previous observations. Although it seems simple, a moving average can sometimes be very useful for prediction based on the trend of a particular measurement. If we predict each day's average temperature by simply looking at the moving average for the preceding week, as shown in this plot, you can see the moving average actually models the real data quite closely. In other situations, we may not necessarily care what order observations happen in, but rather whether they occur during certain periods of time. We mentioned when describing the numerical representation of time that you should think beyond format to what you really want to represent. For example, if we have the time and date of observations, we could create simple features like day of the week, or weekends versus weekdays. As shown here, dates can be converted to a vector table where each cell represents a day of the week, with the values 0 or 1 based on which day of the week that date corresponds to. Similarly, we can have a vector representation based on whether the day was a weekday or a weekend.
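Here's a minimal sketch, assuming pandas and NumPy, that pulls together several of the transformations above: binning and one-hot encoding an income column, applying a log transform, deriving day-of-week and weekend features from dates, and computing a naive 7-day moving average. The column names and the synthetic values are illustrative assumptions, not the lecture's data sets.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 60 daily records with a right-skewed income figure,
# a date, and an average temperature. All values are synthetic.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=60, freq="D"),
    "income": rng.lognormal(mean=10, sigma=1, size=60),
    "avg_temp": 20 + rng.normal(0, 3, size=60),
})

# Binning: bucket income into four fixed-width bins ...
df["income_bin"] = pd.cut(df["income"], bins=4,
                          labels=["low", "mid", "high", "very_high"])
# ... then one-hot encode the bins (or map them to ordinal integers instead).
income_onehot = pd.get_dummies(df["income_bin"], prefix="income")

# Log transform: compresses large values and spreads out small ones,
# so the skewed income distribution looks more normal.
df["log_income"] = np.log1p(df["income"])

# Simple date features: day of week (0 = Monday) and a weekend indicator.
df["day_of_week"] = df["date"].dt.dayofweek
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)

# Naive moving average: the mean temperature over the previous 7 days,
# usable as a prediction for the next day's average temperature.
df["temp_7day_ma"] = df["avg_temp"].rolling(window=7).mean().shift(1)

print(df.head(10))
```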
The last type of data we'll look at in this video is image data. Again, we discussed image data earlier and presented some different ways to transform images into useful forms. Now, we're going to look at some of the simplest features that can be extracted from image data. Remember that the smallest individual element of an image is called a pixel, which stores a number representing the intensity or brightness of a given color at a given point in the image. In grayscale images, smaller numbers closer to 0 represent black and larger numbers closer to 255 represent white. So essentially, to a machine, an image is simply a matrix of pixel values, each value between 0 and 255 representing the intensity of that color for the pixel. Now, since we know that images are made up of pixel values, we can simply use these pixel values together as a feature vector, or build a histogram of pixel intensities for each image, to train a machine learning algorithm. Let's look at an example in more detail. On the left, there's an image of a handwritten digit 2, which is 28 by 28 pixels. We can directly use the pixel values as features to train a machine learning algorithm to recognize the digit, and that's it. You've just created a very naive machine learning model that can perhaps help you with some image recognition tasks. There are other, more complex feature detection and extraction techniques that can help in detecting image features like edges, blobs, corners, or other points of interest. One of the most popular is the scale-invariant feature transform, or SIFT, which is used to detect and describe local features in images. However, going into details about these techniques would be outside the scope of this lecture, but I definitely encourage you to look them up if you're interested in working with images. Especially because recent developments in deep learning have benefited from large labeled image repositories, and there's been a lot of work done to better represent images by just taking in raw pixel values as inputs without any feature engineering. Most of the features are learned through the many layers in the deep network, and we encourage you to explore these deep architectures further. There you have a smattering of techniques to begin your feature engineering journey. Stay tuned for more adventures.
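As a small coda, here's a minimal sketch of those two simple image features, the flattened pixel vector and the histogram of pixel intensities, assuming NumPy and a synthetic 28-by-28 grayscale image rather than a real handwritten digit.

```python
import numpy as np

# Hypothetical 28x28 grayscale image (standing in for a handwritten digit),
# with pixel intensities from 0 (black) to 255 (white).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Simplest feature vector: flatten the pixel matrix into 784 values.
pixel_features = image.flatten()
print(pixel_features.shape)  # (784,)

# Alternative: a histogram of pixel intensities (16 bins over 0-255),
# which ignores spatial layout but summarizes the brightness distribution.
hist_features, _ = np.histogram(image, bins=16, range=(0, 256))
print(hist_features.shape)  # (16,)
```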