In this video, we are going to look at the simplest neural network possible: just one neural node. That is the perceptron, and a single neuron constitutes what is called a Threshold Logic Unit, or TLU.

So what does a Threshold Logic Unit look like? Basically, it is one node that has inputs x1, x2, ..., xn, weighted by w1, w2, ..., wn. So you have n inputs coming into this one neural unit, which has a threshold t. You multiply each input xi by its weight wi and add these signals up, and if that sum is above the threshold t, then the output label y is one. That is the threshold unit, right?

An example would be a Threshold Logic Unit for onset of diabetes. So y is the label for onset of diabetes, and suppose you have only three input features: Age, BMI, and Blood Glucose. Depending on the weights assigned to each of these (w1 for Age, w2 for BMI, w3 for Blood Glucose), you check whether that weighted combination exceeds a particular threshold, and if it does, you say the onset-of-diabetes label is one.

This perceptron is a linear classifier, because you take the dot product of x and w and check whether it is above the threshold. So the equation would be something like this: y = 1 if the sum over i from 1 to n of xi * wi is greater than the threshold t, and y = 0 if it is less than t. The goal of learning this perceptron is to learn the weights w1 to wn so that this classification rule fits the training data. The perceptron algorithm is an online learning algorithm. It is a mistake-driven algorithm, which means you only learn when the classifier makes a mistake. If the classifier is perfectly fine on the input you are seeing, you don't change anything. That is the philosophy behind perceptron training.

So the parameters here are w1 to wn, that is, n parameters depending on the number of inputs. Then the question is: what about the threshold t? Isn't that also a parameter? Typically t is set to zero, so the unit becomes like a sign operation: you multiply x with w, and if the sign of the final sum is positive (greater than zero) we call y one, and if it is negative (less than zero) we say y is zero, right?

But instead of fixing the threshold at zero, you can also treat the threshold as one of the weights. What is typically done is that the n-dimensional input x is extended by one, so it becomes an (n+1)-dimensional input where that extra element is always one. That is the bias term: you have x1 to xn, and then that extra dimension is one for every input. This lets you pair the (n+1)-dimensional input with an (n+1)-dimensional weight vector, where w1 to wn are the individual feature weights and the extra weight, usually called w0, is minus the threshold. So you can think of it as the same thing: "y = 1 if the sum of xi * wi from i = 1 to n is greater than t" becomes "y = 1 if the sum of xi * wi from i = 0 to n is greater than or equal to zero", where x0 = 1 has the special meaning of the bias input.
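As a quick aside (this code is not part of the video), a minimal Python sketch of a Threshold Logic Unit using the bias trick just described could look like the following. The function name tlu_predict and all numeric values are made up for illustration.

```python
# Minimal sketch of a Threshold Logic Unit with the bias trick:
# the input is extended with x0 = 1 and the extra weight w0 = -t
# stands in for the threshold.

def tlu_predict(w, x):
    """w = [w0, w1, ..., wn]; x = [x1, ..., xn]. Returns 1 if the weighted sum >= 0, else 0."""
    x_aug = [1.0] + list(x)                       # prepend the constant bias input x0 = 1
    s = sum(wi * xi for wi, xi in zip(w, x_aug))  # dot product of w with the augmented input
    return 1 if s >= 0 else 0

# Hypothetical weights for the three-feature diabetes example (Age, BMI, Blood Glucose):
w = [-2.0, 0.01, 0.05, 0.02]                      # w0 = -t, then w1, w2, w3
print(tlu_predict(w, [55, 31.0, 140]))            # -> 1 (weighted sum is above the threshold)
```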
So you have basically created a uniform treatment of both the weight vector and the threshold. In that way the perceptron ends up with an (n+1)-dimensional input, where the extra dimension is the bias term one, and hence n+1 parameters, where that extra parameter encodes the threshold t. Excellent. So this is a uniform way of learning the weights and the threshold together.

Let's try this and see how it actually works. For this example we have only a two-dimensional input, x1 and x2, okay? And then, as you know, the third component is the bias term one. Recall that the perceptron is a mistake-driven algorithm, so you change the model only when it makes a mistake. Let the inputs be these two examples: the first has x1 = 0.8 and x2 = 0.4, and the second has x1 = 0.3 and x2 = 0.2. The first example's label is one, and the second is a negative example, so its label is zero. Okay, that's what we're going to work with.

The first step is to identify what the weights are, right? You have two weights, w1 and w2, and then you have w0, which is minus the threshold. Let's set them to four and minus two: weight one is four, weight two is minus two, and the threshold is 0.1, so minus t is minus 0.1.

Good. With this weight vector, let's see what the perceptron does for the first input. When x1 is 0.8 and x2 is 0.4, the score is 0.8 times 4 (that's weight one), plus 0.4 times -2, plus 1 times -0.1, right? Adding them up gives 2.3. Since 2.3 is greater than zero, the predicted label is one. The actual label was one and the predicted label is one, so it is not a mistake; we don't do anything to the perceptron, it is doing fine. Now we take the second example, where x1 is 0.3 and x2 is 0.2. The score is 0.3 times 4, plus 0.2 times -2, plus 1 times -0.1, right? Adding them up, the total is 0.7. Since 0.7 is greater than zero, the predicted label is one. But that is a mistake, because the actual label was zero, so this is the first time the perceptron training algorithm is triggered: a mistake has been made.

So what is the training rule for the perceptron? The weights are updated using a specific rule: the change in a weight is the difference in the labels, y minus y' (actual minus predicted), multiplied by the learning rate, multiplied by the input, so delta wi = r * (y - y') * xi. In the first example the difference was zero, because the predicted label was one and the actual was one, so one minus one is zero and you don't make any change. But here the actual is zero and the predicted is one, so y minus y' is minus one. You are reducing the weights: you gave this example a high score, and you should actually bring that score down. That's the idea. In this case, let's keep it simple and set the learning rate r to one.
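To make the two score computations above concrete, here is a small sketch (not from the video) that reproduces them with the weights and inputs from the example; the names score, s1, and s2 are just illustrative.

```python
# Reproduce the two score computations from the worked example.
w0, w1, w2 = -0.1, 4.0, -2.0           # w0 = -t with threshold t = 0.1

def score(x1, x2):
    """Weighted sum including the bias term (x0 = 1)."""
    return w1 * x1 + w2 * x2 + w0 * 1.0

s1 = score(0.8, 0.4)                    # first example, actual label 1
s2 = score(0.3, 0.2)                    # second example, actual label 0
print(f"first example:  score {s1:.1f} -> predicted {1 if s1 >= 0 else 0}, actual 1")  # 2.3, correct
print(f"second example: score {s2:.1f} -> predicted {1 if s2 >= 0 else 0}, actual 0")  # 0.7, a mistake
```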
If that is the case, then w1 is changed like this: w1, whose original value is four, gets minus one times the learning rate one times the input x1, which is 0.3, the input we made the mistake on. So weight one is reduced by 0.3; it was four, so now the new weight is 3.7. Let's take the second weight, w2. It is changed the same way: w2, whose original value is minus two, gets minus one times one times the input x2, which is 0.2. So w2 is brought down by 0.2; it was minus two, so it now becomes minus 2.2. And w0 is also a parameter, right? So w0 is changed the same way: w0, which was minus 0.1, gets minus one times one times one, because the bias input is always one. So w0 is pulled down by one and becomes minus 1.1.

The reason this change helps is that the weights and the effective threshold have been reduced. What was earlier a score of 0.7 now becomes a negative value, and since that negative value is below the threshold of zero, the perceptron now gives this example the label zero, right? It need not be the case that a single update corrects the error; in this particular case the input (0.3, 0.2) happens to be correctly labeled now, but that need not be true all the time. However, as you can see, the model has changed slightly so that it will make fewer mistakes on examples like this.

This approach is continued until the training error no longer improves. So you have made the model better on this example; now you continue with other examples and keep doing so, and if the training error doesn't change anymore, you can say that the model has converged. The biggest advantage of the perceptron is that there is a theoretical proof that if a solution exists, that is, if there is a linear model that fits the given inputs, then the perceptron algorithm will converge in finite time. It is very rare to find such algorithmic guarantees, but the perceptron is one model where there is a guarantee that, given sufficient data, if a solution exists, this algorithm will reach it in finite time. So it will converge in finite time. Okay, so this is an interesting model. It goes back to the late 1950s, but it has gained renewed significance because of deep learning approaches.

So what are the key takeaways from this video on the perceptron? It is a linear classifier based on a threshold logic unit. The perceptron convergence theorem is critical: if a solution exists, meaning the data is linearly separable, then this training algorithm will find a solution in finite time. There are theoretical bounds involving the number of examples, but we will not talk about that in more detail. It is also important that this is an online algorithm, which means you can keep learning and it scales very well to large datasets: if you have more data, you can just continue building the model. And then there are ways in which this perceptron algorithm can be extended so that you find the most appropriate perceptron, for example by combining perceptrons in a voted or averaged fashion, and so on.
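To tie these pieces together, here is a rough sketch (not from the video) of the whole mistake-driven training loop, run on the two examples from this video. The name train_perceptron, the zero initialization, and the max_epochs cap are assumptions for illustration; on linearly separable data like this, the loop stops once a full pass makes no mistakes, which is the convergence behaviour described above.

```python
# Rough sketch of mistake-driven perceptron training with the bias trick.
def train_perceptron(examples, n_features, r=1.0, max_epochs=1000):
    """examples: list of (features, label) pairs with labels in {0, 1}."""
    w = [0.0] * (n_features + 1)            # w[0] is the bias weight (= -threshold)
    for _ in range(max_epochs):
        mistakes = 0
        for features, y in examples:
            x_aug = [1.0] + list(features)  # prepend the constant bias input x0 = 1
            s = sum(wi * xi for wi, xi in zip(w, x_aug))
            y_pred = 1 if s >= 0 else 0
            if y_pred != y:                 # mistake-driven: update only on errors
                w = [wi + r * (y - y_pred) * xi for wi, xi in zip(w, x_aug)]
                mistakes += 1
        if mistakes == 0:                   # a full pass with no mistakes: converged
            break
    return w

data = [([0.8, 0.4], 1), ([0.3, 0.2], 0)]
print(train_perceptron(data, n_features=2))  # converges on this toy data; prints learned [w0, w1, w2]
```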
Okay, so in the next video we're going to talk more about deep learning models in detail. But I want to leave you with one more disadvantage of the perceptron: it handles only numeric data, so all data needs to be converted into numeric form. We'll talk more about that in the next videos.