Okay, so I want to take a look at structural models of network formation, and at fitting them: models that combine aspects of both strategic formation and chance meetings. The idea is that in a lot of settings there's going to be some choice involved but also some chance involved, and we might want to estimate things like their relative roles. The random models can be too extreme, and the strategic models can be too extreme. We've seen the beginnings of ways to combine these things in the exponential random graph models, but in particular instances we can also fit models that are more precise to the setting involved and more directed at asking a very specific question.
For instance, let's ask: when we see homophily, how much of it was due to the choices of the individuals, and how much was due to the fact that you're simply more likely to be meeting individuals of your own type, rather than choosing to interact with them? If we want to ask a question like that, can we build a simple model to address it?
What I want to emphasize here is really the technique for doing this rather than a specific model. This is going to be a very specific and stylized model. What I want to illustrate is that you can do similar things: you build what you think is the right model for a particular application, use it to generate networks, look at the networks that come out, and try to match them up with the data. That allows you to fit the parameters of the model that best match the data, and then to do statistical tests to see what is really going on: how much choice is involved, how much chance is there, how much noise is in the data, and so forth. So that's the idea here.
So I want to emphasize an approach rather than have you take the specifics of this particular model too seriously. It's more of an illustration or an example than something to be taken seriously as the model.
In terms of the application to homophily, let's suppose we've got two groups, group A and group B, and they form fewer cross-group, say cross-race, relationships than would be expected given their population mix. If we go back to our Add Health data and look at one of those high schools, we see segregation by race. We could ask: is this due to structure? Maybe they just don't meet each other very often, because in the school there are structural patterns in the way the courses are organized, or in the extracurriculars that people take, that don't allow for many meetings between different races. Or is it due to the preferences of group A, or the preferences of group B, or both? Can we begin to sort these things out?
I'm going to take a look at the techniques from a couple of papers with Sergio Currarini and Paolo Pin, from 2009 and 2010. We'll specify how much utility a given individual gets as a function of the friendships they have. Then we'll allow a meeting process that has randomness in terms of who you're going to meet. And we'll allow both the utilities and the meeting process to depend on your type: in this case your race, but it could be your gender, your age, your profession, whatever it might be. Then we'll see what comes out of that and try to match the parameters up to the data.
Okay, so let me say a little bit about the idea here. When we're thinking about trying to estimate strategic formation models, what we generally end up seeing is the result of some choices that were made. There's something known as revealed preference theory in economics, which refers to the fact that we might see, say, a consumer buying certain products. Based on the fact that they bought one product at a given price and not another product at a given price, we try to infer what their preferences over different product attributes are. What do they really want, if they ended up buying something and not buying something else?
Here we'll be inferring preferences in the same way: this person formed these friendships and not another set of friendships, and that gives us some insight into what their preferences might be. Why did they form these friendships and not those? It tells us something about what they preferred to form in terms of friendships. Now again, that could be due to what they had available. Just as in consumer theory you might have a budget, which says these were the things I could have afforded, and I bought this and not that, here we're going to have to infer the rate at which you had opportunities to form different types of friendships. So the chance part is fitting what opportunities were coming along, and then we look at what choices were made as a function of those. That gives us information about what the preferences actually were and what relative opportunities people had. So that's the idea.
One thing to emphasize here is that this gives us a different kind of look at things than direct surveys.
You might, for instance, ask people: what's your attitude on race, or would you like to form friendships across races? The difficulty with asking people directly is that they often answer in ways that aren't necessarily congruent with the choices they make. This approach takes seriously what you actually did, not what you would say on a survey. Sometimes there can be differences between the two, so this is a different way of measuring attitudes towards things like race, or gender, or age, or whatever it might be in the particular context.
Okay, so a simple model. We're going to have some set of types 1 through K. A type might be ethnicity, it might be the age of the individual, it might be a combination of their age, their religion, their gender, and so on. And we'll have a very simple model of the preferences that people have. This is going to be a simple independent link formation model, so it's simple in that dimension. It's not trying to recreate richer parts of the network, but it will allow us to separate out some of the preference aspects from other aspects.
What people care about is how many same-type friendships they have and how many different-type friendships they have. So, a really simple model: you just care about how many friendships you have with people who look like you and how many friendships you have with people of a different type, and you get some benefit from just that. It's the simplest possible formulation you can imagine.
In particular, what you get in terms of utility is some number of same-type and different-type friendships, with the different-type ones weighted by a parameter gamma_i, where gamma_i captures how much you weight a different-type friendship compared to a same-type friendship. So it's a preference bias. If gamma_i is 1, then all I care about is the total number of friendships.
I don't care what their mix is. If it's bigger than 1, then I actually care for diversity: I care more about having friendships with other types than with my own type. If it's less than 1, then I get a higher benefit from same-type friendships than from different-type friendships. So gamma_i is going to be the critical parameter in terms of representing preference bias.
Then we also have this other parameter, alpha. What is alpha going to keep track of? Alpha is generally going to be less than one, and it captures diminishing returns to friendships. My first friendship might be very valuable to me, my second one adds some additional value, and so forth. By the time I get to my 10th or 12th, these friendships are becoming less valuable. So if alpha is less than 1, utility as a function of the total number of friendships is concave: we've got curvature in the utility function.
Okay, so let t_i be the total number of friendships that a person of type i forms. People are socializing, they have opportunities to form friendships, and they meet people of different types. In this model, let q_i be the fraction of the people you meet, and are able to form friendships with, who are of your own type. Then 1 minus q_i is the fraction who are of other types. If t_i is the total number of friends you form, then s_i, your number of same-type friends, is q_i times t_i, and d_i, your number of different-type friends, is 1 minus q_i times t_i.
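The preference side described so far can be sketched in a few lines of Python. This is only an illustration: it assumes the benefit takes the form (s_i + gamma_i * d_i) ** alpha, which matches the verbal description but is written here as an assumption, and the function names are made up for the example.

```python
def split_friendships(total, q_own):
    """Split a total number of friendships into same-type and different-type
    counts, given the fraction q_own of meetings that are with one's own type."""
    same = q_own * total
    different = (1.0 - q_own) * total
    return same, different


def friendship_benefit(same, different, gamma, alpha):
    """Assumed benefit: (same + gamma * different) ** alpha.

    gamma weights a different-type friendship relative to a same-type one;
    alpha < 1 gives diminishing returns to additional friendships.
    """
    return (same + gamma * different) ** alpha


# With gamma = 1 only the total matters; with gamma < 1 the mix matters too.
for gamma in (1.0, 0.6):
    mostly_same = friendship_benefit(6, 2, gamma, alpha=0.5)
    mostly_different = friendship_benefit(2, 6, gamma, alpha=0.5)
    print(gamma, mostly_same, mostly_different)
```

With gamma equal to 1 the two mixes give the same benefit, and with gamma below 1 the mostly same-type mix gives more, which is the sense in which gamma is a preference bias.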
So this is a very simple model. I just have opportunities coming along, and it's costly for me to form friendships, so I'll cut off the total number at some point; but the mix I get just depends on the relative meeting rates. I meet people at some rates and I take whatever friendships come, but it's expensive for me to form friendships, so after some time I stop socializing or trying to find new friends. Okay?
So t_i is going to be a maximizer of this overall utility function, where you've got same-type friends, different-type friends, and the cost. The rate at which they come is q_i for same types and 1 minus q_i for different types, and that comes out of the random part of the process. If we knew what gamma is, what alpha is, what q is, what the cost c is, and so forth, we could solve this and say how many total friendships a given individual would like to have, and how that depends on those parameters.
To maximize this function, take the derivative with respect to t_i and set it equal to zero. Since alpha is less than one the function is concave, so this condition is necessary and sufficient for the solution, and it gives you an expression for t_i in terms of the other parameters.
Then we'll also add some noise to the decision. It might be that a given individual, for whatever reason, has more or fewer opportunities, or values friendships more or less. So a person a of type i is going to have an extra error term, epsilon_a, and the total number of friendships that any given individual forms is just going to be some noisy variation around this solution. Okay. So it's a very simple model in terms of the formation.
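To make the maximization concrete, here is a minimal sketch. It assumes the net utility is (q_i t_i + gamma_i (1 - q_i) t_i) ** alpha - c * t_i, with c a constant cost per friendship; that functional form and all the names and numbers below are assumptions for illustration. Under that form the first-order condition gives t_i = (alpha * m ** alpha / c) ** (1 / (1 - alpha)), where m = q_i + gamma_i (1 - q_i), and the code checks the closed form against a brute-force search.

```python
def net_utility(t, q, gamma, alpha, cost):
    """Assumed net utility of forming t friendships when a fraction q of
    meetings are with one's own type and each friendship costs `cost`."""
    mix = q + gamma * (1.0 - q)          # value of an average friendship
    return (t * mix) ** alpha - cost * t


def desired_friendships(q, gamma, alpha, cost):
    """Closed-form maximizer from the first-order condition
    alpha * mix**alpha * t**(alpha - 1) = cost (valid for 0 < alpha < 1)."""
    mix = q + gamma * (1.0 - q)
    return (alpha * mix ** alpha / cost) ** (1.0 / (1.0 - alpha))


# Hypothetical parameter values, chosen only for the demonstration.
q, gamma, alpha, cost = 0.6, 0.7, 0.55, 0.2
t_star = desired_friendships(q, gamma, alpha, cost)

# Brute-force check on a fine grid of candidate totals between 0.001 and 20.
grid = [k / 1000 for k in range(1, 20001)]
t_grid = max(grid, key=lambda t: net_utility(t, q, gamma, alpha, cost))
print(t_star, t_grid)
```

Because the assumed utility is concave in t when alpha is below 1, the grid maximizer lands within one grid step of the closed-form answer. In the lecture's model, the observed number of friendships is this quantity plus an individual error term.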
But now we can see the pattern: we write down a simple model of how much utility you get from some aspect of the network, we maximize that, and we get a solution, in this case for the degree. We can think of t_i as the degree of agent i: the degree that they would like to have in terms of this model. What they end up with is some noisy variation on what they would like to have, given the parameters of the model.
Okay, so how do we actually identify the parameters from the data? In the data we'll observe the t_i's, the t_a's for each individual. We'll use the Add Health data: when we look at the actual networks of friendships in these high schools, we can see how many friendships each individual formed. So we observe this directly in the data, and it's going to vary as a function of q_i, alpha, gamma_i and so forth.
One thing to notice when we look at this expression for t is that it's increasing in q_i if gamma_i is less than one. Inside it you've got q_i plus gamma_i times 1 minus q_i, so you've got a term q_i times 1 minus gamma_i. If gamma_i is less than 1, that coefficient is positive, so t is an increasing function of q: if a larger fraction of the people I'm meeting are of my own type, I should form more friendships. That's what's going to allow us to begin to fit gamma_i. The idea is that t_i should be a function of q_i, and how quickly it varies with q_i depends on gamma_i. Okay?
In particular, here is a picture of the Add Health data. There are 84 high schools, and each dot represents a certain racial group within a particular school. For instance, this dot here is a group of white students.
It was a particular school, and in that school the white students formed between 60 and 65% of the population. This dot right here is a group of black students in a high school where their group's size as a fraction of the school was between 20 and 30 percent, a little closer to 30 percent. The vertical axis then tells us how many friendships they formed on average. They formed on average about eight friendships; this group formed on average about three and a little bit of change, and so forth.
What we see here is that if you fit the slope, it's 2.3. So there's an increase as a function of your group size: the more prevalent your group is in the population, the higher the q_i's, and indeed we see more friendships as a function of the size of the group. The easier it is to meet your own type, the more friends a group forms, and so we will be able to identify the gamma parameter from this data. In particular, the slope here is 2.3 and the t-statistic on it is 7.3, so you're quite a number of standard deviations away from zero. We're seeing a highly significant slope.
So we will be able to identify the fact that groups with higher proportions are forming more friendships, which under this model indicates that they're getting higher utility, and we can then estimate the gammas based on that. In particular, I'll just show you the best fit lines.
If you look at the best fit lines for different races, you'll end up seeing different slopes, and that will allow us to back out the gamma_i's for the different races, because each one of them has a different relationship between its group size and how many friendships its members are forming.
Okay. So the last part of the puzzle, in terms of figuring out the randomness in this kind of model, is where the q_i's come from. We've got how many friendships each person would want to form as a function of the parameters of the utility function and the rate at which they meet different individuals. Now we want to ask: what is the rate at which they meet different individuals?
The important thing here is that the rate at which they meet different individuals is going to depend on the decisions of the other agents. If everybody was trying to form the same number of friendships, and we were just mixing in the population, then if my group formed 30% of the population and some other group formed 70%, I would meet my own group at a rate of 30% and agents of the other type at a rate of 70%. But imagine the other types form many more friendships. They're spending more time circulating and mixing, so they're going to be easier to meet, and my own type is going to be relatively harder to meet. So it's not just a function of the relative sizes; it's also a function of how many friendships different groups are trying to form. We need to solve this overall as an equilibrium, given that the t's are determined by the relative meeting rates, the q's, and the q's are in turn determined by the actual decisions of the agents.
So in particular, let's think of the meeting process as a giant party.
We can think of this like a cocktail party. Let's think of a given individual, say a green agent, bouncing around in a party where there are green agents and red agents. What's going to happen? Imagine that the incoming proportion of reds is 80% and of greens is 20%. If the reds spend more time trying to form friendships, and are generally forming more friendships, it's going to be easier to form friendships with reds than with greens. So even though it's 0.8 and 0.2 coming in, the mixture in the room could be, say, 90% and 10%, or even more skewed than that, if the reds are spending, say, twice as much time in this party as the greens are. The rate at which people come in and go out is not necessarily the same as the relative stock of people in the room, if the greens are exiting much more rapidly than the reds.
As we go through this process, a given green node bounces into somebody and forms one friendship, then two, three, four. So it's got three red friends and one green friend, and it decides: okay, that's enough, I'm satiated. I've formed four friendships, and that's enough for me. And then it decides to exit. A red, if gamma_i is less than 1, is meeting reds at a higher rate and might want to stay longer. That's basically the idea of the model. Okay?
So q_i is the rate at which type i meets type i, and 1 minus q_i is the rate at which you meet different types. The way this is going to be modeled is that q_i, the rate at which you meet your own type in this process, depends on the stock: how many of those individuals are actually in the room. But we'll also allow this to be biased, so that even when I'm in the room, I might be biased towards meeting my own type.
Maybe I'm in this large room, but I actually look for greens and try to find greens, in which case I'm going to meet greens at a faster rate than their share of the room. If beta_i is exactly equal to 1, then the rate at which I meet my own type is just the stock of my type in this party. If beta_i is greater than 1, then you're going to meet your own type at a faster rate than you would by just milling around.
So this particular formulation says the rate at which you meet people depends first of all on how many people are in this party, and can then be skewed by this extra parameter, which represents some viscosity in the meeting process: own types tend to meet own types. This is the bias in meetings, the parameter beta_i, where beta_i greater than one means that you're meeting your own type at a rate above what you should, relative to how they're mixing in the population. Okay?
So q_i is equal to the stock raised to the power 1 over beta_i. If I was 50% of the population and beta was 1, then I would meet my own type at a rate of 1 out of 2. If we set beta to 2, the chance I would meet my own type would be about 71%. And if beta was as high as 7, the chance that I would meet my own type would be about 91%. This is all with a stock of one half; you would stick in whatever my relative size is. What this does is bias the rate at which I make own-type friendships compared to other-type friendships, relative to the mixing. The stock is based on the sum of the t_i's from my type compared to the sum of the t_j's overall, across all the j's. Right?
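The arithmetic in this passage is easy to check. The sketch below assumes, as described, that a group's stock is its population share weighted by how many friendships its members form, and that the own-type meeting rate is the stock raised to the power 1 / beta. The function names are illustrative, not from the papers.

```python
def stock_shares(group_sizes, friendships_per_person):
    """Share of each group among the people circulating at the party:
    population share times friendships formed, normalized to sum to one."""
    weighted = [w * t for w, t in zip(group_sizes, friendships_per_person)]
    total = sum(weighted)
    return [x / total for x in weighted]


def own_type_meeting_rate(stock_share, beta):
    """Biased meeting rate q = stock ** (1 / beta); beta = 1 means no bias."""
    return stock_share ** (1.0 / beta)


# Reds are 80% of entrants and form twice as many friendships as greens,
# so they make up more than 80% of the room.
print(stock_shares([0.8, 0.2], [2.0, 1.0]))

# A group with half the stock: beta = 1, 2, 7 gives about 0.5, 0.707, 0.906.
for beta in (1, 2, 7):
    print(beta, round(own_type_meeting_rate(0.5, beta), 3))
```

The last loop reproduces the 50%, 71% and 91% figures quoted for beta equal to 1, 2 and 7.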
So the stock keeps track of the relative size of my group among the people circulating in the population compared to the others, and then we raise that to some power.
Okay, so what does this all work out to be? We've got t_i maximizing this function. The stocks are given by the relative number of meetings that different groups want, weighted by their relative sizes. And the meeting rates are determined by the stocks raised to these bias parameters. The fact that the stocks have to add up to one tells us that we have a balance equation for what these q_i's have to look like: when you sum across the i's, the q_i's raised to the beta_i's have to equal one. So what we end up with is balance on the meetings: if one group is meeting the other groups at a certain rate, the rates have to match up. That gives us an equation that will help us solve for the beta_i parameters.
So it's a simple model where we maximize utilities, we have a meeting process, we estimate the meeting process, and we put all these pieces together. We'll be able to estimate both the beta_i's from the balance condition and the gamma_i's from the utility maximization, and then see what it looks like in the data.
So we've got these two conditions. Maximizing the utility will help us identify the preference parameters, and the balance equation will help us identify the meeting parameters. The q_i's we'll actually observe in the data: what's the relative proportion of own-type friendships to other-type friendships for each group? So we can identify these parameters by fitting this model to data. The only parameter we've got left is the cost of forming friendships, c. We don't know exactly what that is. So when we look at the equations that we have for the t's and the betas, I'm putting in some errors.
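The way the two conditions interact can be sketched as a fixed-point iteration: the desired numbers of friendships depend on the meeting rates, and the meeting rates depend on the stocks implied by those friendships. The code below is an illustration under assumed functional forms (desired friendships from the first-order condition of (t * m) ** alpha - c * t, stocks proportional to size times friendships, q = stock ** (1 / beta)); the parameter values are hypothetical and plain iteration is just one simple way to find the equilibrium.

```python
def desired_friendships(q, gamma, alpha, cost):
    """Assumed utility-maximizing number of friendships given meeting rate q."""
    mix = q + gamma * (1.0 - q)
    return (alpha * mix ** alpha / cost) ** (1.0 / (1.0 - alpha))


def solve_equilibrium(sizes, gammas, betas, alpha, cost, rounds=200):
    """Iterate t -> stocks -> q -> t until the values settle down."""
    q = list(sizes)  # start from unbiased mixing by population share
    for _ in range(rounds):
        t = [desired_friendships(qi, g, alpha, cost) for qi, g in zip(q, gammas)]
        weighted = [w * ti for w, ti in zip(sizes, t)]
        total = sum(weighted)
        # Own-type meeting rate: stock share raised to the power 1 / beta.
        q = [(x / total) ** (1.0 / b) for x, b in zip(weighted, betas)]
    return t, q


# Two hypothetical groups: a 70% majority and a 30% minority.
t, q = solve_equilibrium(sizes=[0.7, 0.3], gammas=[0.75, 0.55],
                         betas=[1.0, 2.0], alpha=0.55, cost=0.2)
print("friendships:", t)
print("own-type meeting rates:", q)
# Balance condition: the q_i raised to beta_i sum to one.
print("balance:", sum(qi ** b for qi, b in zip(q, [1.0, 2.0])))
```

The balance line recovers the stocks from the meeting rates, which is why the q_i's raised to the beta_i's have to sum to one.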
Then what do we end up with? We end up with these two equations that we have to fit, and we've still got this c in there. What we can do, when we solve this out for two different groups, is say that the rate at which i's should be forming friendships relative to j's, including the errors, should be a ratio, and that ratio divides the c out. So we can factor out the c just by looking at relative numbers of friendships, because c scales everybody's friendships up or down. If we look at the relative numbers of friendships formed by one group compared to another, that factors out the c, and then we don't have to estimate c directly. We can just estimate the alphas and gammas.
So basically what that tells us is that, after cross-multiplying, we end up with an expression in t_i and t_j equaling some error, an expression that no longer has a c in it, because we're comparing relative t's to each other rather than absolute t's. That's one way of factoring out one of the parameters. It's a technical detail in terms of estimation, but it makes our life a little easier. Now we just have three sets of parameters to estimate: alpha, the gamma_i's, and the beta_i's. Those are the parameters that are left, having factored out the c parameter.
Okay. The fitting technique is very simple. We'll build a grid of alphas, beta_i's and gamma_i's. So we've got a grid over all these things, and for each network, each school, and each specification of biases we can see the total number of friendships that would be predicted for each group and the realized number. So we can calculate how big the error is in total friendships, and also the error in the relative group meeting rates, the q_i's.
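A quick numerical check shows why the cost drops out of ratios. This sketch assumes the closed form t = (alpha * m ** alpha / c) ** (1 / (1 - alpha)) with m = q + gamma * (1 - q) and a common alpha and c across groups; these are assumptions for illustration, as are the numbers.

```python
import math


def desired_friendships(q, gamma, alpha, cost):
    """Assumed utility-maximizing number of friendships given meeting rate q."""
    mix = q + gamma * (1.0 - q)
    return (alpha * mix ** alpha / cost) ** (1.0 / (1.0 - alpha))


def friendship_ratio(q_i, gamma_i, q_j, gamma_j, alpha, cost):
    """Ratio of group i's desired friendships to group j's."""
    return (desired_friendships(q_i, gamma_i, alpha, cost)
            / desired_friendships(q_j, gamma_j, alpha, cost))


# Same two hypothetical groups under a cheap and an expensive cost of friendship.
ratio_cheap = friendship_ratio(0.7, 0.75, 0.4, 0.55, alpha=0.55, cost=0.1)
ratio_costly = friendship_ratio(0.7, 0.75, 0.4, 0.55, alpha=0.55, cost=0.5)
print(ratio_cheap, ratio_costly)  # equal up to floating-point rounding
print(math.isclose(ratio_cheap, ratio_costly, rel_tol=1e-9))
```

The cost changes every group's level of friendships by the same factor, so only the mix terms and alpha survive in the ratio.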
So each point on the grid predicts t_i's and q_i's, and we can compare them to the actual ones in the data and sum the squared errors across all the networks. If we have four races, we'll have four sets of t_i's and four sets of q_i's, so for each school we'll have a set of eight errors. We sum all those up and then choose the biases to minimize the weighted sum of squared errors. Okay?
So what do you get when you fit this? There are actually five categories of students, because there are Asians, blacks, Hispanics, whites, and also some who are miscoded or didn't indicate race. So we have some others, and then we have the fit. Alpha comes out to be about 0.55, so roughly like a square root in terms of diminishing returns.
When we look at the gammas, what do we get? For Asians, different-type friendships are worth about 0.9 of a same-type friendship. For blacks it's 0.55, for Hispanics 0.65, for whites 0.75. So we get different fits of that parameter. All of them are less than 1, but they vary in the value placed on a different-type friendship compared to a same-type friendship.
The second thing we have are the beta parameters. They indicate that for Asians and blacks we're seeing a high rate of bias towards meeting own types, so a good portion of the bias that we actually observe for them is due to the fact that they're meeting their own type at a much higher rate. For Hispanics and whites these parameters are much lower. In fact the whites see a mixing parameter that is roughly 1, Hispanics about 2.5, and Asians and blacks roughly a factor of 7.
Now, one thing we can then ask is whether these are statistically significant numbers.
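The grid-search idea can be sketched on a toy version of the problem. This is not the specification fitted in the lecture: it uses one gamma shared by all groups, fits only relative friendship counts (so the cost never appears), ignores the meeting-rate errors and the betas, and uses synthetic noise-free "observations" generated from the same assumed formula so that the known parameters should be recovered exactly.

```python
from itertools import product


def relative_friendships(qs, gamma, alpha):
    """Predicted friendships for each group, divided by the cross-group mean
    so that the (unknown) cost of friendship cancels out."""
    raw = [(q + gamma * (1.0 - q)) ** (alpha / (1.0 - alpha)) for q in qs]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def sum_squared_errors(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def grid_search(qs, observed, alphas, gammas):
    """Return (sse, alpha, gamma) for the grid point with the smallest error."""
    best = None
    for alpha, gamma in product(alphas, gammas):
        sse = sum_squared_errors(observed, relative_friendships(qs, gamma, alpha))
        if best is None or sse < best[0]:
            best = (sse, alpha, gamma)
    return best


# Synthetic data: five hypothetical groups with different own-type meeting rates.
qs = [0.2, 0.35, 0.5, 0.65, 0.8]
observed = relative_friendships(qs, gamma=0.7, alpha=0.55)

alphas = [round(0.05 * k, 2) for k in range(1, 19)]  # 0.05, 0.10, ..., 0.90
gammas = [round(0.05 * k, 2) for k in range(1, 21)]  # 0.05, 0.10, ..., 1.00
sse, alpha_hat, gamma_hat = grid_search(qs, observed, alphas, gammas)
print(alpha_hat, gamma_hat, sse)
```

With real data the observations are noisy, the errors in the q_i's enter the objective as well, and the grid covers the beta_i's and a separate gamma_i for each group, but the loop has the same shape.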
Do we have any idea whether maybe all these numbers are just noisily different from one, and in fact the model isn't all that different from an unbiased one? Are these truly different in some statistical sense?
What we can do is test a hypothesis. For instance, look at the residual sum of squared errors that we get, looking say just at the preference biases. Take all of the gammas, look at the t_i's that are generated, and see what errors you get relative to the data. Then suppose that we restrict all of the gamma parameters to be equal to 1, and do the best fit of the model under that restriction. What you'd end up with is that alpha would drop to 0.2, but the error would go up to 17000, compared to 4000 when you allow these parameters to vary.
Then you can do an F test. The F value here is 42, and the F threshold even at a 99% confidence level is 3.3. The sum of squared errors you're getting is so much larger, a factor of four larger, that a huge amount of the error is being explained by allowing the gammas to differ. So if you allow the gammas to differ across race, you explain a huge amount of the error; the error blows up by a factor of four when you force all of these gamma parameters to be equal to 1. The ones in red here indicate that you're rejecting the particular hypothesis. So they're certainly not all equal to one, statistically, under this particular model. Are they all equal to each other? Well, the error goes up to 61.75 if you force them all to be equal, and the best guess would be that they're all 0.8.
Okay. And then you can ask: is it true that Asians and blacks have the same preference bias parameter? If you fit a model where you force those two to be the same and re-estimate, you'd end up with an estimate of alpha of 0.7, the gammas for the two races that are forced to be the same, Asians and blacks, would be 0.8, and so forth. What would the error be there? It would go from 4700 to 5300. That has an F value of 9.93, still highly significant. So it looks like Asians and blacks have different parameters; the reduction in the error is not just due to randomness.
Using these kinds of models, you can go through and do F tests and other kinds of statistical tests by comparing the errors you observe under the full model with the errors you would observe if you worked with some null or alternative hypothesis, rather than the one that allows all of the parameters to be fit. The restricted model gives you a new set of estimates. You compare the errors that you get under the two and then ask whether that reduction in error came up at random or not. It's a standard statistical test.
In this case the F tests tell you which is which. You can't reject the hypothesis that Asians and whites have the same preference bias. You can reject the hypothesis that Asians and blacks have the same one, and so forth. Blacks and Hispanics are not distinguishable here, but blacks and whites are distinguishable. So when you look at which of these F tests are statistically significant, there are certain differences you can say are statistically significant and other ones that are not.
You can do the same thing for the meeting bias, with the same kind of tests. And indeed the meeting biases are also highly significant, so it really appears that there's bias both in preferences and in meetings.
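The comparison of restricted and unrestricted fits uses the standard F statistic for nested least-squares models. The sketch below shows the formula only; the lecture does not state the number of restrictions or the residual degrees of freedom behind its reported F values, so the numbers in the example call are hypothetical and are not meant to reproduce them.

```python
def f_statistic(rss_restricted, rss_unrestricted, num_restrictions, dof_unrestricted):
    """Standard F statistic for nested models:
    ((RSS_r - RSS_u) / number of restrictions) / (RSS_u / residual dof)."""
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / dof_unrestricted
    return numerator / denominator


# Hypothetical example: restricting 2 parameters doubles the residual error
# in a fit that has 50 residual degrees of freedom.
print(f_statistic(rss_restricted=200.0, rss_unrestricted=100.0,
                  num_restrictions=2, dof_unrestricted=50))
```

A large value relative to the critical value of the F distribution, for the chosen confidence level and these degrees of freedom, is what leads to rejecting the restriction.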
What I want to emphasize here is not this particular model but the approach. If you're careful about writing down a structural model, you can begin to derive implications of that model. The model generates certain observed patterns, and you match those patterns up with the data. In this case what it was generating was the degrees of all the agents and the relative fractions of friends of the different types they should have. Then we can look at the degrees and the fractions of different friends that they have in the data and try to best match those up. That gives us estimates for the preference parameters and so forth, and then we can test whether they're significant and learn something about the relative choices that were made.
Here it appears that both choice and chance were present. If you believe the model, it looks like people have preferences biased towards their own type, and that accounts for the fact that you form more friendships when you're put in a school that has more of your own type. So we end up with estimates there.
The kind of thing this allows one to do is then analysis of, say, counterfactuals. What would happen in a school if we changed the way in which people meet? Suppose we try to eliminate the meeting bias and move the beta parameter towards one, so that everybody meets each other. How much of an impact is that going to have on friendship formation? Using this model you could begin to estimate something like that. It allows you to compare that with a policy that tries to influence the preference parameters, which would have a different impact on what would happen. Using a model like this, you can begin to sort those things out. So this is just an idea of one particular model that marries strategic formation with some randomness.
Very specific model, but it's a technique that can be used much more generally.