Okay, so let's look at marrying the strategic network formation models we have been studying with some of the earlier types of models we had for estimating random networks. In particular, we'll look at the subgraph generation model (SUGM) and try to figure out how we might fit in some of the utility-based calculations we've been looking at. So we've got utility from forming subgraphs: links, triangles, etcetera. What we're going to do is noise that up by putting some randomness into the utility. Let's have a look at how we might do that in the context of a specific example. We are going to try to ascertain whether there is some sort of social pressure operating on caste relationships, and, not surprisingly, we might find that there is. In particular, when we look at cross-caste relationships, relationships that go across caste boundaries, are they more likely to occur in private, when the two people have no friends in common? Or do they occur with the same frequency when people have a friend in common as when they don't? That's one of the questions we might ask. So let's look back at some of the data we had from the work with Abhijit Banerjee, Arun Chandrasekhar and Esther Duflo. This is village 26 again, the kerosene-rice sharing network. We've colored the nodes by a simple dichotomous caste designation: scheduled castes and scheduled tribes are blue, and general and other backward castes are red. We see that there are fewer relationships going across the boundaries of this designation than within them: the probability of a link across was 0.006, while the probability within was 0.009. We can look at other examples too. Here's village 48, and a different sub-network: who visits which household?
That is, which other household does each one visit socially? It's a denser network, but we see similar patterns of segregation. So what we want to ask is this: take somebody from the red group and somebody from the blue group. Do we see a cross-caste relationship in which they have a friend in common relatively less frequently than we see one in which they have no friend in common? In other words, do people prefer to form these relationships in private rather than in situations where there is going to be some sort of witness to the interaction? One difficulty in beginning to estimate this is that triangles take three people to agree to form, whereas links take only two, so naturally it's going to be harder to get triangles to form. That creates a bias: if any particular cross-caste link is less likely, then cross-caste triangles will have an even lower likelihood just because we're working with threes rather than twos. So we want to account for preferences explicitly. Otherwise we will naturally find that the ratio of less-desired triads to more-desired triads is lower than the ratio of less-desired dyads to more-desired dyads, and there will be a bias unless we account for this carefully. So how are we going to do this? Let's build preferences into the model, then look at a subgraph generation model, and figure out how the probability that a link forms depends on the likelihood that the pair meets and that both wish to form it. Generally, we can think of each person i as having characteristics, in this case their caste designation, and there's going to be a utility that they get from forming a link.
That utility is based on their own characteristics and the other person's characteristics. Then there is something else, either unobserved or some aspect of personality, that also affects that utility, so we'll put in an error term. We subtract this term off; it could be negative or positive (so it might even be a boost), but there is some random element here. Person i benefits from the link if and only if this error is less than the utility. So if the error is smaller than the utility, the net utility is positive and i wants to form the link; otherwise i doesn't. So we have a very simple preference-based model. Now we're going to try to fit it into a subgraph generation model. How are we going to do that? Well, under pairwise stability a link forms if and only if both people prefer it, assuming the chance that either gets exactly zero net utility is zero. So a link forms if and only if i prefers to form it and j prefers to form it: the error i gets from the link with j is less than i's utility from it, and the error j gets from the link with i is less than j's utility from it. If we have some distribution for the error terms, then the probability that a given link forms is proportional to the probability that i's error is less than i's utility times the probability that j's error is less than j's utility. It has to be that both of them prefer it, and the chance that both prefer it is the product of the two probabilities. That relies on an assumption: the errors are i.i.d.
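The link-formation rule just described can be written compactly. The notation is ours, not the lecture's: X_i for i's characteristics, u_i(X_i, X_j) for i's utility from a link to j, ε_ij for i's error on that link, and F for the common CDF of the i.i.d. errors. We also assume the proportionality constant is the probability that the pair meets, as in the subgraph generation model.

```latex
\begin{aligned}
i \text{ benefits from } ij \;&\iff\; \varepsilon_{ij} < u_i(X_i, X_j),\\
\text{given they meet, } ij \text{ forms} \;&\iff\; \varepsilon_{ij} < u_i(X_i, X_j) \ \text{and}\ \varepsilon_{ji} < u_j(X_j, X_i),\\
\Pr(ij \text{ forms}) \;&= \Pr(i, j \text{ meet})\, F\big(u_i(X_i, X_j)\big)\, F\big(u_j(X_j, X_i)\big).
\end{aligned}
```

Independence of ε_ij and ε_ji is exactly what lets the two acceptance probabilities multiply.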
So the noise in whether j likes the relationship with i is independent of the noise that i gets from that same relationship. Now you can do the same thing with triangles. A triangle's utility depends on the three people's characteristics, and we multiply three probabilities together, one each for i, j and k. It's exactly the same idea, and with the same technique we could generate any kind of subgraph: put in utilities for the different subgraph types depending on the characteristics of the individuals involved, then compute the probability that each person's error is less than their utility, and that gives us a distribution over subgraphs. Now let's look at how we would use this kind of model to estimate something. What's the null hypothesis? If there is no social pressure, then a given person's relative preference for being involved in a cross-caste triad compared to a within-caste triad is the same as their relative preference for a cross-caste link compared to a within-caste link. So we allow people to care about caste; what we're saying is that the probability they prefer something cross-caste relative to within-caste is the same whether it's a triangle or a link. That's our null hypothesis. Our model then says that the frequency of cross-caste triads compared to within-caste triads looks like a ratio of the probabilities of preferring them, if we assume that everybody has a similar utility function that varies only with whether a relationship goes across or within caste, plus idiosyncratic noise on the particular relationships.
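To make the "same technique for any subgraph" point concrete, here is a minimal sketch. It assumes standard logistic errors (the lecture doesn't specify a distribution) and uses a hypothetical `meet_prob` parameter for the chance the subgraph is proposed in the SUGM; the function names are ours, for illustration only.

```python
import math
from typing import Callable, Sequence


def logistic_cdf(x: float) -> float:
    """F(x) = Pr(eps < x) for a standard logistic error (an assumed distribution)."""
    # Two algebraically equal forms, chosen to avoid overflow in math.exp.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def subgraph_formation_prob(
    meet_prob: float,
    utilities: Sequence[float],
    cdf: Callable[[float], float] = logistic_cdf,
) -> float:
    """Probability that a subgraph forms: it is proposed (its members 'meet')
    with probability meet_prob, and then every member must prefer it.
    With i.i.d. errors the members' acceptance probabilities multiply."""
    if not 0.0 <= meet_prob <= 1.0:
        raise ValueError("meet_prob must lie in [0, 1]")
    prob = meet_prob
    for u in utilities:
        prob *= cdf(u)  # Pr(this member's error < this member's utility)
    return prob


# A link needs two acceptances; a triangle among i, j, k needs three.
link_prob = subgraph_formation_prob(0.1, [0.0, 0.0])
triangle_prob = subgraph_formation_prob(0.1, [0.0, 0.0, 0.0])
print(link_prob, triangle_prob)
```

With identical per-member utilities, the triangle simply picks up one more factor below one, which is the "threes rather than twos" bias the correction has to undo.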
So for triads we get the cube of the cross-caste versus within-caste preference ratio, and for links we get its square. This is how we correct for the fact that triads are harder to form, and it tells us the probability of preferring to form a cross-caste triad. So if we go back, we want to back out what this preference looks like. What's the probability that a person prefers this? The triad frequency ratio is a cube, so we just take the cube root of the relative frequencies: the relative probability of preferring a cross-caste triad is the frequency ratio to the one-third power, and for links it's the frequency ratio to the one-half power. Under the null hypothesis these two should be the same, so the corrected frequencies in the data should match if the null hypothesis is correct. So what we can do is compare the ratio of cross-caste to within-caste triad frequencies to the one-third power with the corresponding link ratio to the one-half power. These should be equal under the hypothesis that social pressure doesn't matter. If they differ, we can figure out whether social pressure encourages or discourages cross-caste triads: if the triangle number is lower, we're seeing discouragement from social pressure, and if it's higher, we'd see encouragement. So let's plot these out. Along the bottom is the link ratio, raised to the three-halves power to correct for three versus two, and up the side is the triangle ratio. Under the null hypothesis the points should line up along the 45-degree line, with roughly half above and half below. These are the 75 different villages, and indeed we see that more of them wind up below the line.
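The correction above can be sketched as follows. The `VillageRates` fields are hypothetical per-village formation rates (for instance, the fraction of possible cross-caste triangles that are present), not data from the lecture, and the helper names are ours.

```python
from dataclasses import dataclass


@dataclass
class VillageRates:
    """Hypothetical per-village formation rates (illustrative, not lecture data)."""
    cross_triangle: float
    within_triangle: float
    cross_link: float
    within_link: float


def implied_preference_ratios(v: VillageRates) -> tuple[float, float]:
    """Back out the per-person cross/within preference ratio from triangles
    (cube root: three people must agree) and from links (square root: two)."""
    triangle_ratio = v.cross_triangle / v.within_triangle
    link_ratio = v.cross_link / v.within_link
    return triangle_ratio ** (1 / 3), link_ratio ** (1 / 2)


def plot_point(v: VillageRates) -> tuple[float, float]:
    """(x, y) for the 45-degree plot: x = link ratio ** 1.5, y = triangle ratio."""
    link_ratio = v.cross_link / v.within_link
    triangle_ratio = v.cross_triangle / v.within_triangle
    return link_ratio ** 1.5, triangle_ratio


def below_45_degree_line(v: VillageRates) -> bool:
    """True if cross-caste triangles are rarer than links alone predict under
    the null, i.e. the pattern read as social pressure against witnessed ties.
    For positive ratios this matches y < x in plot_point."""
    triangle_pref, link_pref = implied_preference_ratios(v)
    return triangle_pref < link_pref


# Null-consistent village: per-person ratio 0.6 -> 0.6**3 for triangles, 0.6**2 for links.
null_village = VillageRates(0.6 ** 3, 1.0, 0.6 ** 2, 1.0)
print(implied_preference_ratios(null_village))  # both close to 0.6
pressured = VillageRates(0.1, 1.0, 0.36, 1.0)
print(below_45_degree_line(pressured))
```

Comparing the cube root with the square root is equivalent to comparing the raw triangle ratio with the link ratio to the three-halves, since both transformations are increasing on positive numbers.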
One conservative test in this setting works like this: if the null hypothesis were true, it ought to be a coin flip whether a village ends up on one side of the line or the other. In fact, the preponderance of villages end up below the line, and this is statistically significant at the 99.99% level or more. One interesting thing you can also do here is subdivide the villages by how balanced they are in terms of caste designations. Some of these villages are 50% red and 50% blue, using the measures we had of scheduled castes and scheduled tribes versus general and other backward castes, so they split down the middle and the two groups are evenly matched. Others are, say, 90% to 10% or 95% to 5%, so there's a big majority of one caste group and a small minority of another. So let's look at the size of the minority relative to the majority. If the minority is above the median size, the village is colored light blue. We can see that most of these end up pretty far below the line; only a couple end up anywhere above the 45-degree line. The red villages are the ones with more imbalance, where the smaller caste group is more of a minority, and those actually lie a bit closer to the 45-degree line. So according to these statistics, the more skewed the village, the less of this pressure you find, whereas in a very well-balanced village the castes seem to separate more, and with triangles in particular you see even more pressure to separate. This is actually something you see in different data sets: the more balanced things are, the more tension there can be in forming cross-group ties.
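The lecture describes the test only as a coin flip per village; a one-sided exact binomial sign test is one natural reading of that, and is our assumption here, as is dropping villages that sit exactly on the line. The balance split uses a hypothetical per-village minority share (minority size relative to majority), flagged when above the median.

```python
import math
from statistics import median


def sign_test_p_value(n_below: int, n_total: int) -> float:
    """One-sided exact sign test: under the null each village is a fair coin
    flip for landing below vs. above the 45-degree line, so return
    P(at least n_below of n_total villages land below).
    Villages exactly on the line are assumed to have been dropped."""
    if not 0 <= n_below <= n_total:
        raise ValueError("need 0 <= n_below <= n_total")
    tail = sum(math.comb(n_total, k) for k in range(n_below, n_total + 1))
    return tail / 2 ** n_total


def above_median_minority(minority_shares: list[float]) -> list[bool]:
    """Flag villages whose minority share exceeds the median share
    (the more balanced, 'light blue' group in the lecture's split)."""
    cutoff = median(minority_shares)
    return [share > cutoff for share in minority_shares]


# Hypothetical inputs, not the lecture's data.
print(sign_test_p_value(2, 3))
print(above_median_minority([0.05, 0.3, 0.5]))
```

The sign test ignores how far each village sits from the line, which is what makes it conservative: it uses only which side each village falls on.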
In particular, here we see that for cross-caste relationships, links are relatively more common than triangles in this kind of setting. So this is just one illustration of how we might begin to marry these kinds of models. What it does show us is that we can combine preferences with other kinds of statistical models to begin to estimate some of these models and see what's going on in the data. It gives us a bit of a lens, although it's hard to interpret too closely. But at least we can figure out whether there are certain patterns in the data, and here there are patterns among the triangles and the links. So we reject the null hypothesis: based on the model, people show a significantly stronger preference, in terms of what we estimate, for within-caste relationships in triangles than in links. Whether they truly have those preferences depends on whether the model is correct, and the model here is a bit simple, more for the purposes of illustration. But we can build richer models that take more into account and see whether this finding holds up. Okay, so when we model strategic network formation, we can begin to use different types of models that allow for subgraphs and allow for both strategic network formation and estimation. That opens up the possibility of systematic estimation, and it allows us to include subgraphs. There is now a whole family of dynamic models being developed too, so this is a fairly interesting area of exploration these days.