Welcome to the Machine Learning course. This course provides a high-level overview of a technology that is changing our lives by transforming everything from the tech industry to agriculture, insurance, banking, and marketing. We use machine learning dozens of times a day without even knowing it. Each time we use a web search engine, our spam filter detects undesired emails, or the photos containing our friends are automatically grouped, we are exploiting machine learning algorithms. Bill Gates said: “A breakthrough in machine learning would be worth ten Microsofts”. Tony Tether, director of DARPA (the research and development agency of the US Department of Defense), said: “Machine learning is the next Internet”. These are just some of the many claims about machine learning that show the interest of both the most powerful nations and the most important companies in this technology. Driven by the astonishing results obtained by machine learning algorithms in the last decade, billions of euros are invested in this field every year for public and private research. But what is machine learning? In 1997, Tom Mitchell, professor at Carnegie Mellon University, provided this definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks T, as measured by P, improves with experience E.” One of the key elements of this definition is the experience. Unlike classical artificial intelligence approaches, which use a top-down, deductive approach based on models, machine-learning algorithms exploit a bottom-up, inductive approach that, starting from experience, that is, from data, detects patterns and uses them to draw general conclusions. Since knowledge is induced from data, machine-learning algorithms may fail, as they cannot learn something they have not experienced.
Using the words of Arthur Samuel in 1959: machine learning is the “field of study that gives computers the ability to learn without being explicitly programmed”. Traditional programming is a manual process in which a person, the programmer, develops a program that runs on a computer and uses input data to produce the output. In machine learning, this paradigm is overturned to obtain an automated process: starting from the input data and the corresponding desired output, machine-learning algorithms automatically develop the program. In this way, computers program themselves. Machine learning is useful whenever we do not know how to solve a certain task, or it is very complex to explain it to the machine, while it is easier to show the machine examples of how the problem is solved and let it learn a solution on its own. For instance, suppose that we want to develop a program to recognize animals from their pictures. Using traditional programming, we would need to write many lines of code specifying how to process the image and what characterizes a certain animal species compared to others. Using machine learning, instead, we simply have to provide a set of examples in which each image is labeled with the animal it portrays. Machine learning techniques can be roughly divided into three main categories: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Supervised learning aims at inferring a model mapping an input space X to an output space Y given a dataset of examples. We refer to these techniques as supervised because, for each example in the dataset, we have the association between the observation variables available as input and the output label/value produced by a supervisor. Going back to the previous example, the observations are the images of animals and the labels are the names of the animals.
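This shift from hand-written rules to learning from examples can be sketched with a tiny nearest-neighbour classifier. This is a hypothetical toy, not a method from the course: the feature values and labels below are invented, and the point is only that the program receives (input, label) pairs instead of explicit rules.

```python
# A minimal sketch of the supervised-learning paradigm, standard library only:
# instead of hand-coding what distinguishes the classes, we provide
# (input, label) examples and let the algorithm infer the mapping.

import math

# Training examples: (feature vector, label) pairs produced by a "supervisor".
# Features are hypothetical (weight in kg, height in cm).
examples = [
    ((4.0, 25.0), "cat"),
    ((5.0, 28.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
]

def predict(x):
    """Return the label of the training example closest to x (1-nearest neighbour)."""
    return min(examples, key=lambda ex: math.dist(ex[0], x))[1]

print(predict((4.5, 26.0)))   # near the cat examples -> "cat"
print(predict((28.0, 58.0)))  # near the dog examples -> "dog"
```

No rule about cats or dogs is ever written: the prediction logic is generic, and the behaviour comes entirely from the labeled examples, which is exactly the paradigm overturn described above.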
Once we have trained a Supervised Learning technique over the dataset of examples, we want the learned model to be able to make predictions about new, unseen observations. In other words, we want to learn models that can extract and generalize the knowledge contained in the training dataset, similar to what our brain does: in fact, we are able to recognize a dog even if we have never seen that particular dog before. Supervised learning techniques can be applied to both classification and regression problems. In classification problems, the output variable to be predicted is categorical or discrete, while in regression problems the output variable is numerical or continuous. For example, consider a weather forecasting problem where, given a set of meteorological input variables, we have to predict the weather tomorrow. If the expected output is a label like sunny/cloudy/rainy, we have a classification problem, while, if we have to predict the temperature during the day, it is a regression problem. Let’s move to the second category of machine learning techniques. Unsupervised learning aims at finding a better representation for the data. In this case, we have a dataset of observations without any output variable, and we want to discover hidden structures in the unlabeled data. Developing better data representations may be an essential step in solving other machine learning problems, performing data analysis, and data visualization. Unsupervised models can be further grouped into clustering, association, and dimensionality reduction problems. In clustering problems, we want to unveil the inherent groupings in the data. A cluster is, therefore, a collection of objects that are “similar” to each other and “dissimilar” to the objects belonging to other clusters. For example, in retail stores, clustering can be used to group customers according to their purchasing habits in order to devise a separate business strategy for each group.
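The customer-grouping idea can be sketched with a tiny k-means loop, one classic clustering algorithm. The data are invented for illustration: each point could stand for a customer described by, say, visits per month and average spend. Note there are no labels anywhere, only points.

```python
# Minimal k-means sketch (standard library only).
# Unsupervised: the data are plain points with no labels; the groups emerge.

import math

points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),   # one natural group
          (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]   # another natural group

def kmeans(points, centroids, steps=10):
    for _ in range(steps):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else cen
            for cl, cen in zip(clusters, centroids)
        ]
    return centroids, clusters

centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # one centroid settles near each group of points
```

Objects in the same cluster end up close to each other and far from the other cluster, matching the definition of a cluster given above.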
In association rule learning, we want to identify new and interesting patterns in our dataset, usually represented in the form of rules or frequent itemsets. These techniques are commonly used for market basket analysis (to learn which items are bought together), customer clustering in retail (to learn which stores people tend to visit together), price bundling, assortment decisions, cross-selling, and more. Finally, the third unsupervised learning problem is dimensionality reduction. Dimensionality reduction techniques aim at reducing the number of variables present in the dataset. In most cases, these variables can be highly correlated, making the representation redundant and negatively affecting the training of machine learning models. Reducing the number of variables without losing too much information is of paramount importance for increasing the efficiency and the reliability of the learning process. Dimensionality reduction can be achieved through feature selection, which selects a subset of the original variables, or through feature extraction, which projects the original variables into a new, usually smaller, set of variables. The third category of machine learning techniques is Reinforcement Learning. Reinforcement learning aims at solving sequential decision-making problems. In this case, the agent, that is, the decision-maker, learns by interacting with the environment. At each step, the agent perceives information from the environment, selects a decision based on this information, and receives a reward, a numerical value that the agent aims at maximizing over time. A reinforcement-learning agent learns by trial and error, just as animals and humans do when they learn to walk, talk, drive, cook, etc. Reinforcement learning finds application in many real-world problems, such as digital advertising, resource management, medicine, autonomous driving, automatic trading, and many others.
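The trial-and-error loop can be sketched with a toy multi-armed bandit agent, perhaps the simplest reinforcement-learning setting. Everything here is an assumption for illustration: the three reward probabilities are invented and hidden from the agent, and the epsilon-greedy rule is one simple exploration strategy, not a specific algorithm from the course.

```python
# Minimal reinforcement-learning sketch: an epsilon-greedy agent on a
# 3-armed bandit with hypothetical, unknown reward probabilities.

import random

random.seed(0)
true_probs = [0.2, 0.5, 0.8]   # hidden from the agent
estimates = [0.0, 0.0, 0.0]    # the agent's estimated value of each action
counts = [0, 0, 0]
epsilon = 0.1                  # fraction of steps spent exploring

for t in range(5000):
    # Trial and error: mostly exploit the best estimate, sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda i: estimates[i])
    # The environment returns a numerical reward for the chosen action.
    reward = 1.0 if random.random() < true_probs[action] else 0.0
    counts[action] += 1
    # Incremental average: pull the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(range(3), key=lambda i: estimates[i]))  # best-estimated action
```

Through repeated interaction alone, the agent's estimates converge toward the hidden probabilities, so it learns to favour the most rewarding action, which is the essence of maximizing reward over time.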
In the next lectures, we will cover these three categories of machine learning techniques in more detail, starting with Supervised Learning.