Not only are we generating larger amounts of data, but we're generating data of much greater complexity. So this is the Protein Data Bank I mentioned, and this is its growth in terms of the number of structures in it. Not every one of these, but the majority of these little blobs actually represent Nobel Prizes. So there, myoglobin, from John Kendrew, won the first Nobel Prize in structural biology, and most of the others also brought forth Nobel Prizes. These are actually drawn to scale, so you can see what's happened to the complexity of the data over time: up to the whole ribosome, for example, the protein synthesis factory of the cell, which we now understand in ways that we just never did before. And then there's the complexity of this data, and what that means for interpretation and use, which I'm not really going to get into, except to say that we are in an era that's quite different from what it was before. I just had a discussion with the chancellor last week about establishing a data science department here, or some entity that's dedicated to data science. And people say, why do we need that? Don't people just do that in their labs? Well, I think we need it. You could have said the same about computer science 30 years ago. Computer science became a very vibrant and active discipline, and I think what you're going to see is data science become the same kind of thing, because it's actually critical to the future of many industries, including the drug discovery industry. Again, my biased opinions. A number of years ago, I was asked by the, uh, [LAUGH] then chancellor to explain what bioinformatics is in one slide. So I produced this, and it never got used, because it's way too complicated; there's not a picture on it or anything like that. But it really reflects things that are going on, even today.
So I already mentioned this notion of a huge influx of data. And what we're trying to do, of course, is turn that data into information, knowledge and discovery, including in drug discovery, and these are the processes that go on to reach that. At the same time, back in about 2010, we actually passed the point where there were more websites than there were people on the planet. So we've got this incredible increase in the amount of information available. And at the same time, we're working at many levels of complexity to understand what's going on, from DNA sequence all the way up to higher life forms. All of these projects in here, brain mapping, cardiac modeling, are things going on on this campus that really relate to that kind of integration of data. So what you'll see coming out of all of this as I move forward is, first of all, that I believe it's changing the way we do science and the way we think about science. We've been siloed for way too long in each of these different levels of biological complexity. What we're starting to do now [LAUGH] is move across those levels of complexity, and it's a very exciting time; it is the best of times, at least for that. I just love this; this is my dream of doing science. Craig Venter, whom you probably know from the Venter Institute here, was one of the major proponents of metagenomics. One day he essentially just decided: I'm going to take my yacht and sail around the world, and periodically, at each of these dots, I'm going to scoop up a bucket of seawater and sequence everything that's prokaryotic, everything that's small, in that seawater. And so that's what he did. He goes on this great cruise on his own yacht, and then of course he comes back.
And when he comes back, he's got a Science paper already. It just doesn't get any better than that. [LAUGH] In any case, this is where he went, and he did collect all this data. So we suddenly started to accumulate very large amounts of protein sequence data that we didn't have before, because we were sequencing all of these organisms. There were on the order of 17 million sequences on the first pass that we had never seen before. Alright? So there were 17 million completely new proteins that no one had ever seen before. And 99% of the DNA associated with that was from unknown organisms; we had no idea what they were. So we had barely begun to catalog even a tiny fraction of the organisms that are in the ocean. So the idea that there are culturable organisms there, that there are compounds to be found, new receptors, and all the kinds of impact that could have on drug discovery: it's all there. As we started looking into this, there were a number of discoveries, which I won't get into, but what it did was start the whole metagenomics revolution. It wasn't just buckets of seawater. Everybody wanted to have their own yacht, right? So they started scooping up other things; they started doing soil. Now we're analyzing all the organisms in a cubic foot of soil in different parts of the world. And perhaps most interesting, of course, was looking at the human gut: the microbiome work is sequencing all of the organisms that exist within the human gut. Already out of that, we have some pretty good biomarkers for certain types of autoimmune diseases, like Crohn's disease. So we're getting to the point of early detection of Crohn's by virtue of large-scale sequencing of the human gut microbiome. So I think these are exciting examples of what's to come.
This just illustrates the known and the unknown. This is one particular family of proteins, and I don't want to get into a lot of detail. It's a phylogenetic tree, so it expresses the evolutionary distance between these proteins, and you can see that the red ones were completely new. But the only point is that [INAUDIBLE] in this metagenomic data we discovered thousands and thousands of new PTPases, protein tyrosine phosphatases, and that's just one protein family. So, lots of very valuable new information. But with all of that come problems. It's always the case that when things move very quickly, it takes a lot of time to catch up: ethically, legislatively and technologically. So, as a result of all of this, when you go to a database, pull out a sequence and look at its annotation, there are estimates that 30% of that annotation is wrong, because that annotation has been made by propagation. In other words, someone says, okay, this particular protein sequence does this. Then someone else compares another sequence to it and says, okay, it's the same, so it must do the same thing. But if the first one is wrong, the second one is wrong, and you get this kind of propagation. So we've got a lot of misinformation out there right now. That's the other point: in any of these studies, it's very important to take heed of all of that. So that's the omics revolution part of it. Let's look at some of the other revolutions going on, which I think are going to impact things: the open science and IT revolutions. What do they bring to the table? I'm going to ask a question, but you're not allowed to answer, only put up your hands. Who's familiar with open access and what it means? Okay, a few people. How many people have published a paper in an open access journal?
[SOUND] Well, you should. The basic idea is that by changing the publishing paradigm, we're making information accessible to absolutely everybody. Instead of paying to read it, you're essentially paying to write and publish it. The journal models and the society models that do that all still work; there are now proven business models to support this. But it changes what you can do. For example, a typical license on an open access journal article would be what's called CC BY. It's a Creative Commons license, and these licenses are clearly declared and accurately described. It means that anyone can take your knowledge and do what they like with it, as long as they attribute you as the original source of that information. So the ability to propagate knowledge, in my opinion, increases quite significantly. It's taken a while to get going, but last year about 20% of all journal publications in the biomedical sciences were open access, and that number is increasing steadily. I think in five years open access will essentially be the model. And that creates a lot more access to information than we ever had before. There's some interesting software following the same paradigm, too, but I won't get into that. How many people are familiar with GitHub? One... no, that person's just stretching their fingers. Okay, so I won't get into that. But other areas which I think are very intriguing are the socialization of science and the flattening of the hierarchy, such that anyone can make discoveries and be listened to. I love this. When I was growing up, I would be sitting at the dinner table and my father would slap me around the head and say, you should be seen and not heard. Alright? Now, of course, I couldn't possibly do that with my kids.
Well, they're too big now anyway, but even when they were smaller. A long time ago we moved on from being seen and not heard, and now we're in the generation of being heard and not seen. Alright, I've got to tell you this, because to me it's really interesting. I got a paper submitted to the journal I run, and it was a fabulous paper on pandemic modeling. I was so impressed by it that I arranged to meet the author, since it was a single author and she was actually here in San Diego. We met and talked about the paper. I was so impressed with her that I invited her to give a lecture here at UCSD, and she went to Princeton to give a lecture on the same work. Really nice. So what? What's special? She was a senior at La Jolla High School. She was 15 years old. That paper was reviewed by Science. So anyone can do great stuff now. They have access to information, they have access to knowledge, under the right circumstances. Obviously she's the exception rather than the rule, but it tells you the kinds of things that are coming. And so, in my opinion, all of this is perturbing the system.
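The annotation-propagation problem raised in the talk (a label copied from one database entry to every sequence later judged "the same") can be illustrated with a toy model. This is a hypothetical sketch, not any real database's schema or the speaker's method: `Entry`, `transfer_annotation`, `provenance`, `prob_still_correct`, the labels and the per-step error rate are all illustrative assumptions, and the computed probability is not the 30% estimate quoted in the talk. It shows two things: fixing the original entry does not repair copies already made, and under an assumed independent per-copy error rate, the chance a label is still right shrinks geometrically with the number of transfers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical database entry: a sequence ID plus its functional label."""
    seq_id: str
    annotation: str
    copied_from: Optional["Entry"] = None  # None means annotated directly (e.g. by experiment)


def transfer_annotation(source: Entry, new_seq_id: str) -> Entry:
    """Annotate a new sequence by copying the label of a similar one (homology transfer)."""
    return Entry(new_seq_id, source.annotation, copied_from=source)


def provenance(entry: Entry) -> List[str]:
    """Walk back through copied_from links to the original annotation."""
    chain = []
    current: Optional[Entry] = entry
    while current is not None:
        chain.append(current.seq_id)
        current = current.copied_from
    return chain


def prob_still_correct(per_step_error: float, transfers: int) -> float:
    """Assumed model: each copy independently goes wrong with probability
    per_step_error, and a wrong label is never corrected downstream."""
    return (1.0 - per_step_error) ** transfers


if __name__ == "__main__":
    seed = Entry("SEQ1", "kinase")            # suppose this first call is wrong
    e2 = transfer_annotation(seed, "SEQ2")
    e3 = transfer_annotation(e2, "SEQ3")
    e4 = transfer_annotation(e3, "SEQ4")

    seed.annotation = "phosphatase"           # the original entry is later fixed...
    print(e4.annotation)                      # ...but the copy still says "kinase"
    print(provenance(e4))                     # ['SEQ4', 'SEQ3', 'SEQ2', 'SEQ1']
    print(round(prob_still_correct(0.1, 3), 3))  # 0.729
```

Keeping the `copied_from` link is the design choice that matters here: without recorded provenance, there is no way to find which downstream labels inherited an error once the source is corrected.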