Welcome back, folks. Now we're going to talk about learning. In terms of our course, we've done a bunch on background and fundamentals, we looked at different models of network formation, and now we've moved toward trying to understand how network structure impacts different kinds of behaviors. We talked a bit about diffusion, and now I want to focus on learning. We're going to look at two different models: we'll look briefly at Bayesian learning, and then we'll look at what's known as the DeGroot model. There's a whole variety of models out there these days and different ways of modeling learning, depending on who observes whom and what the network structure looks like, and there are hybrid models as well. What we're going to do is just look at these two to get some flavor of these things. The DeGroot model turns out to be a very useful one, and the Bayesian one has interesting insights and interesting questions associated with it. So we're going to start with Bayesian learning, and we'll talk about repeated actions where people get to observe what each other are doing. So I'm deciding over time what I'm doing, and I can see what my neighbors are doing, and there's going to be interaction between us in terms of what I learn from what my neighbors experience. We'll look at that a bit, and then we'll move to the DeGroot model. The DeGroot model is one with repeated communication, where people keep talking to each other, but with a very naive way of updating: I essentially just keep taking weighted averages of the information I get from my friends, and I form opinions by continuing to average things, even though that means I might end up over-weighting or under-weighting some sources.
So it'll be a more naive model, where I'm not fully rational in a Bayesian sense. In Bayesian learning, people are probabilistically sophisticated: you take information into account, you update to a posterior using Bayes' rule, and then you maximize some payoff based on that. The DeGroot model is going to be much more naive, and actually easier to work with in many ways. There's some experimental work these days, which you can find in some of the references, comparing Bayesian models, DeGroot models, and other models. The finding is that humans are somewhat rational in what they're doing, but they have limits to their rationality and don't look like they're necessarily full Bayesians, and some of these naive, alternative models can be better at actually capturing human behavior. Okay, so let's start with the Bayesian model as a very useful benchmark and an important point to consider. The idea here is, first of all, we can ask: will society converge? Will it be that eventually everybody converges to doing the same thing, or to having the same beliefs? Will people learn and aggregate information properly? So imagine that a new technology comes out, and we're not sure whether it's a good technology or a bad one. Some people start playing with it and using it, and other people can see whether those others are enjoying it. If it's a good technology, a better one than the old one, will it eventually take over, or will it not? Might people fail to learn, and under what conditions could that happen? So we can ask questions about whether or not people are going to accurately aggregate information. I'll start with a model by Bala and Goyal from 1998. It's a very simple setting, a very natural one to analyze. There's a number of people in some network.
We'll take this network to be a single component, so all the people are path-wise connected to each other. So there are n players, and in this simple version we'll treat it as if you have two actions, A or B. You can expand this to some finite number of actions; it'll be pretty obvious exactly how to do that once we get to the results. The intuitions here are very easy to state with two actions, so I'll do a simple version, and it will be obvious how to extend it. So think of choosing either action A or B over time. Maybe A is the old technology and B is the new, unknown technology. Or A is acting in one manner, say keeping my money in bonds, and B is investing in some kind of stock over time. So we've got these different actions we can take over time, A or B. A pays one, for sure, whereas B is uncertain: it pays two with probability p and zero with probability one minus p. Okay? And let's suppose, to make things simple, that people don't mind risk, so they just care about expected value. They know they can get one from choosing action A, and action B is either going to pay off two, with probability p, or zero, with probability one minus p. So B is better if p is bigger than a half, since then its expected payoff, 2p, is bigger than one; but if p is less than a half, then they should choose A. All right, very simple setting: B's better if p's bigger than a half, A's better if p's less than a half. But we don't know what p is. It's a new technology, and we're uncertain. Maybe we have some prior information, maybe we have some guess at what p is, but we don't know for sure. So these individuals have to experiment a little bit with B to find out whether p is good or bad. Okay?
So they're going to be choosing actions over time, and the learning model is going to be as follows. Each period, a person makes a choice between A or B, and each period you get a payoff. If I choose A, I get a payoff of one, for sure. If I choose B, I get a payoff of two with probability p, and zero with probability one minus p. So I try this new technology. Maybe I'm a farmer: I try a new thing, and I either get the higher payoff or the lower payoff, with probability p or one minus p. Each period I'm going to do that. And what does the network do? The network determines what I also get to see: I get to see my neighbors' choices, and their outcomes. So suppose I'm a given individual with several friends. I chose A and got a payoff of one. One neighbor chose B and got a payoff of zero. Another chose A and got a payoff of one. Another chose B and got a payoff of two. So what I learn is: I chose A, one neighbor chose A, I see that one neighbor chose B and got a payoff of zero, and another chose B and got a payoff of two. Every day I'm going to get all this information, and I'm going to store it over time, and over time I'll begin to learn. If I begin to see lots of people choosing B and lots of people getting twos, I'm going to think it's probably a good thing, p is probably pretty high. If I see lots of people choosing B and lots of people getting zeros, then I'll downgrade my belief about p, and I'd be more likely to pick A. And what people are going to do in this setting is maximize their overall stream of expected payoffs. So I get a dollar today if I choose A, I get some random amount if I choose B, and every day I'm making this choice.
I have an expectation of what this looks like, conditional on what I've seen up to a point in time. So pi sub i at time t will be the payoff I get at time t from following a certain strategy of choosing As and Bs, there'll be some delta between zero and one, and I'm going to maximize the expected sum of discounted payoffs. And let's suppose that p is unknown initially and takes on some finite set of values. So maybe it could be 0.1, 0.2, 0.3, and so on: some finite set of values that I'm trying to guess among, to figure out whether B is a good thing. Okay, that's the structure. So now let's talk about some of the difficulties. What are the real challenges in Bayesian learning? First, suppose I know what the network looks like. I'm person one, here's person two; person two is connected to some other individuals, say three and four, and they're connected to five, six, and so forth. And we've been doing this for a number of periods. Suppose I've been choosing A for a while, I'm kind of pessimistic, and I see that person two chose B. Now the fact that they chose B tells me two things. One, it tells me that they must think B is a good option. But it also tells me something about what they might be seeing from other individuals. Over time they've been seeing what three and four are doing. I don't get to see what three and four did, but I know that two saw three and four, and I know that three is influenced by five, and four is influenced by six, and so forth. So there's some network out there of a bunch of individuals. And suppose, just as an example, that I've been choosing A for a while, and I see that person two has been choosing B for a while.
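The payoff structure and the discounted objective just described can be put into a few lines of code. This is a minimal sketch in my own notation (the function names and the numbers are illustrative, not from the lecture):

```python
# Action A pays 1 for sure; action B pays 2 with probability p, 0 otherwise.
def expected_payoff(action, p):
    """Expected one-period payoff given the (unknown) success probability p."""
    return 1.0 if action == "A" else 2.0 * p

def discounted_value(per_period, delta, horizon=10000):
    """Approximate the discounted sum  sum_t delta**t * per_period  for 0 < delta < 1."""
    return sum(per_period * delta ** t for t in range(horizon))

# B is better exactly when p > 1/2, since then 2p > 1:
print(expected_payoff("A", 0.6))   # sure payoff of A
print(expected_payoff("B", 0.6))   # expected payoff of B when p = 0.6
print(discounted_value(1.0, 0.9))  # discounted value of playing A forever
```

The last line shows why experimentation matters later in the lecture: payoffs compound over a long discounted horizon, so learning the better action early is worth a lot.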
Suppose I see person two choose B and get a payoff of two, then choose B and get two again, then B and two again. So I'm thinking, wow, this is really great, and I switch to B, and I keep seeing them get twos. And then suddenly I see them switch to A. What would that tell me? Well, now I have to think: why would they have switched to A? It's probably not because of their own experience, which has been pretty good. It must be that they see some bad experiences somewhere else, right? So now I have to think about all the possible experiences they could have had. It could be that they saw three getting bad payoffs, or four getting bad payoffs. Or maybe they saw three switch from B to A, or saw both three and four switch. So in order to think about this problem, which is a very complicated problem, I have to think about all the scenarios that could be considered in terms of all the histories of As and Bs, what everybody is seeing, how that impacts each person's decision, and what they should do in response. So the updating question here is actually fairly complicated: I can make all kinds of indirect inferences just based on what somebody's strategy is. Okay, that's one challenge. What's a second challenge here? A second challenge is that there could also be strategic interaction. Suppose I start with a belief that p is less than a half. If I were alone in the world, even if I believed that p was less than a half, it would still be worthwhile for me to try B a few times, just to experiment and see whether in fact I'm wrong and p is actually higher than a half. So even if I start out with a pessimistic prior, it could be that I want to experiment, that I want to try B a few times just to see what happens.
Then once I've learned, that could be very valuable information, because if B pays off two a bunch of times in a row, I'm going to want to take B, and that's going to give me payoffs for the rest of my life. So trying something out can be very worthwhile. Being fully rational, even if I start believing p is less than a half, as long as I'm making choices over time there's an option value to trying this thing, and that option value is positive. So I might want to experiment for a while and see what happens. Okay, so now experimentation comes into play as well. Now suppose there are two of us, person one and person two. I'd like the other person to experiment. If I think p's less than a half, why don't I let them try it, and I'll sit by and just choose A? If they experiment and play B for a while and it pays off well, then I can switch to B, but I don't have to pay the cost of the experimentation. I want a free ride. Now that becomes a game, which is actually going to have a fairly complicated equilibrium, especially when you put that game in a network with all kinds of players. Once the players are connected to other players, we have to look at the simultaneous decision of who's going to choose B in this period, who chose it in the last period, what our beliefs are, what I think everybody else's beliefs are, and so forth. So the overall game becomes very complicated, both because of the strategic aspects and because of the Bayesian inference. And now you can begin to see why, when we put humans in the laboratory and ask them to play these games or make these kinds of choices, they might not behave in a fully Bayesian manner. It's just complicated to do. It's hard to even write the model down and solve it.
Okay, so let me say a little bit about how this is solved in the Bala and Goyal approach. What they did is assume that players are not going to be strategic about this: each person just chooses things that maximize their own payoff, without worrying about the gaming aspect. And secondly, I'm not going to infer things from the fact that other people are making different kinds of choices. I'm just going to keep track of what I've seen in terms of histories of As and Bs: whatever I've observed through myself and my neighbors. I'll keep track of the relevant payoffs, and most importantly, how many times I've seen B pay off two and how many times I've seen it pay off zero. Then I can update on what I think p is, just based on those observations. I'll ignore everything else: I won't do the complicated updating, and I won't game things. I'm just going to look at the twos and zeros and decide whether or not I want to switch from A to B or B to A. Okay, so let's look at that. What's a proposition you can then prove fairly directly? The first thing you can show is the following. Suppose p is not exactly a half, where I'd be indifferent between the actions. Then with probability one there's a time, a random time, such that all agents in a given component play just one action, the same action, from that time onward. So what's happening is that as long as p is not exactly a half, we're basically going to eventually all end up choosing the same action. We'll lock in on some action at some time and play it forever after. So at some point we'll all converge and play the same action forever. Okay, that's the nature of the proposition.
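The naive updating rule just described — count the twos and zeros observed for B, apply Bayes' rule over a finite set of candidate values of p, and switch actions by comparing expected payoffs — can be sketched as follows. This is my own illustrative implementation, assuming a uniform prior over an example grid of p values:

```python
def posterior(p_values, prior, twos, zeros):
    """Bayes' rule with a Bernoulli likelihood: update beliefs over candidate
    p values after observing `twos` payoffs of 2 and `zeros` payoffs of 0."""
    unnorm = [pr * p ** twos * (1 - p) ** zeros for pr, p in zip(prior, p_values)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

p_grid = [0.1, 0.3, 0.5, 0.7, 0.9]          # assumed finite set of values for p
uniform = [1 / len(p_grid)] * len(p_grid)   # assumed uniform prior

# After seeing B pay off two 8 times and zero 2 times:
post = posterior(p_grid, uniform, twos=8, zeros=2)
expected_b = sum(pr * 2 * p for pr, p in zip(post, p_grid))
# Play B whenever its posterior expected payoff exceeds A's sure payoff of 1.
print(expected_b > 1.0)
```

The agent here really does ignore everything strategic: only the raw counts of twos and zeros enter the update, exactly as in the simplification described above.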
Let's talk through the intuition and the basic proof behind this: why is it true? I'm just going to sketch the proof; it's fairly easy to fill in the details. Suppose this weren't true. Then somebody has to be switching back and forth infinitely many times, otherwise we'd eventually converge. So somebody's got to be going back and forth between A and B infinitely often. Let's suppose we have just one component; the argument works regardless of which component you're looking at. So suppose somebody is playing B infinitely often; if we don't converge, somebody's got to be. Now we can use the law of large numbers. The law of large numbers tells us that if somebody plays B infinitely many times, then they're going to get an arbitrarily accurate estimate of what p is: with probability going to one over time, their belief will converge to p. And what does that mean? Well, in order for them to keep playing B, if their belief about p is becoming arbitrarily accurate, then it must be that their belief is converging to something bigger than a half, otherwise they would stop, right? Over time, they're good Bayesians, they know how accurate their belief is. Their belief would converge to either above a half or below a half, since p is not allowed to be exactly a half under our assumption. If it's above a half, they'll keep playing B. If it's below a half, then eventually they would stop playing it, because they'd be arbitrarily accurately convinced that it's not good. If it's not good, they should learn that and stop playing it; if it is good, they'll keep playing it. So if they do play B infinitely often, it's got to be the case that they're converging to the good belief.
Otherwise they would've stopped. So this means that their belief has to be converging to the truth, and with probability one the true p has to be bigger than a half. Then everybody who sees this person is going to see this sequence played. They're going to see B played infinitely often, and their beliefs are also going to converge to p bigger than a half, so they should all start playing B, right? If this person is learning that B is good, then their neighbors' beliefs all have to converge too. Then the neighbors' neighbors see B infinitely often and converge as well, and so forth. So the neighbors of that agent must play B, and then all agents must play B: it just has to spread out through the component. So if anybody plays B infinitely often, it's got to be that B is a good thing, and everybody learns. If not, then everybody eventually stops and plays A. So either somebody plays B infinitely often, in which case we converge to B, or they don't, in which case everybody converges to A. That gives us a proof that we're going to get convergence, to either all playing B or all playing A. Well, does that mean we always converge to the right action? That if p is really bigger than a half we converge to B, and if p is really smaller than a half we converge to A? Suppose p really is bigger than a half. Then B is the right thing to do; we should really be playing B. We'll play the right thing if we actually converge to it, but it's possible that we don't. How could that happen? That could happen if we all start pessimistically enough and just happen to get some bad draws on B initially.
So it's possible that everybody gets some bad draws on B, stops playing B, and then we never learn after that. So even when p is bigger than a half, it's possible for us to converge to A. Conversely, if A is the right action, then we've got to converge to the right action. Because that means p is less than a half, and there's no way to converge to B in that case: if we played B long enough, we'd learn that it's not good, and then we'd all switch to A. So in a situation where A is the right thing to do, everyone necessarily learns that, and we converge to A. If B is the right thing to do, we'll eventually converge, but we could all stop playing B too soon, and we might end up converging to A instead. So we will all converge to doing the same thing in this model, but whether it's the right thing or not depends on whether B is the right thing or A is. If A is the right thing, we'll definitely converge to the right answer. If B is the right thing, we might or might not, depending on what our prior distribution is and whether we get good luck in the initial draws. Now, you could enrich this model so that different individuals have different priors: you can actually start specifying each person's prior beliefs. Then the probability of converging to the correct action, of converging to B, for instance, when it's the right thing to do, can be made arbitrarily high. Basically, even if we add many actions, as long as for each action there's somebody who initially has a very high prior that that action is the best one, we'll get enough experimentation with these different actions that we'll learn about them. Society can learn arbitrarily accurately, as long as there's somebody who's really willing to try out every technology.
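To see the convergence result in action, here is a toy end-to-end simulation of the naive dynamic on a small line network. Everything in it — the network shape, the prior grid, the parameters, the assumption that everyone experiments with B at first — is my own illustrative setup, not from Bala and Goyal. Each agent pools the twos and zeros seen by itself and its neighbors and plays B whenever B's posterior expected payoff beats A's sure payoff of 1:

```python
import random

def simulate(n=5, true_p=0.7, periods=200, seed=1):
    """Naive (non-strategic) learning on a line network 0 - 1 - ... - n-1."""
    random.seed(seed)
    p_grid = [0.1, 0.3, 0.5, 0.7, 0.9]            # candidate values of p
    counts = [[0, 0] for _ in range(n)]           # per-agent [twos, zeros] seen
    neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
    actions = ["B"] * n                           # assume everyone tries B first
    for _ in range(periods):
        # Realize payoffs: A pays 1 for sure, B pays 2 w.p. true_p, else 0.
        payoffs = []
        for a in actions:
            payoffs.append((2 if random.random() < true_p else 0) if a == "B" else 1)
        # Each agent records its own and its neighbors' B-outcomes.
        for i in range(n):
            for j in [i] + neighbors[i]:
                if actions[j] == "B":
                    counts[i][0 if payoffs[j] == 2 else 1] += 1
        # Naive Bayes update over p_grid; play B iff posterior E[2p] > 1.
        for i in range(n):
            twos, zeros = counts[i]
            weights = [p ** twos * (1 - p) ** zeros for p in p_grid]
            total = sum(weights)
            exp_b = sum(w / total * 2 * p for w, p in zip(weights, p_grid))
            actions[i] = "B" if exp_b > 1.0 else "A"
    return actions

print(simulate())
```

Note that if every agent happens to switch to A early on, no new B observations ever arrive, the counts freeze, and society locks in on A forever even though true_p > 1/2 here — exactly the wrong-convergence possibility just described. In the typical run, though, everyone locks in on B.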
The case where we might fail to learn is a situation where nobody's convinced enough to begin with to give a technology a long enough try; then we might end up not learning about it. Okay, conclusions. Where did we end up in this model? We all end up choosing the same actions, so we reach a consensus. That doesn't necessarily mean we all end up with the same beliefs: we're going to have different observations, so our probabilities might differ on whether B was good or not. We might end up with different beliefs, but, for instance, all be pessimistic enough to stop. You can do speed-of-convergence kinds of results here. Depending on whether B is good or bad, you could go through and do computations, though there you have to explicitly solve for the optimal actions as a function of what the histories look like. And there are a number of theorems in studies of two-armed bandits and other parts of probability theory where you can get rates of convergence on these things. The law of large numbers, especially in this kind of Bernoulli world, has well-established speeds of convergence, so you can calculate those kinds of things, and learning will happen relatively quickly in terms of the number of observations giving good information here. Limitations: there are a number of limitations to this kind of model. One is that everybody was getting the same payoffs from A or B. When you think about new technologies in the real world, a new technology might be right for some people and not for others. When you start putting in that heterogeneity, it's much harder for me to make inferences: maybe my neighbor is getting a good payoff from this, but will I get the same payoff? That complicates things.
And that heterogeneity means the learning might take a very different form than it did in this model. Here we've got repeated actions over time, so everybody keeps taking all these actions and trying all these different things. There are a lot of settings in which we're not trying things repeatedly; we're just learning about them slowly over time. With things like global warming, we get one go at it. It's not as if we can experiment infinitely often with different things, and if we get it wrong, we've got it wrong. So repeated actions over time, with feedback, give us lots of incoming information; in other settings, information might arrive in slower clumps or different bits. Also, this is a very stationary environment. It could be that the environment changes, which makes things even more complicated. And finally, and probably most importantly, in this model we weren't really able to take the network into account. The network really didn't play any role. All the arguments were just that somebody would eventually learn, then their neighbors have to learn, and the neighbors of the neighbors have to learn, and so forth: a simple induction argument. We weren't able to say anything about what happens in one network versus another. Now, you could do simulations and see whether speeds are faster in one versus another, but we can go to other models to get a better feeling for exactly how the network matters. That's what we'll do next when we move to the DeGroot model. That will bring in the network very explicitly and allow us to do a lot of calculations quite easily. So that'll be our next subject: the DeGroot model, where network structure plays a much more prominent role in the learning process.