Welcome to our third lecture in the course. Today, we're going to cover Issues of Reproducibility in Systems Biology. We will discuss general considerations that we must be aware of when interpreting data from systems biology approaches.

In order to obtain reliable data, one must start with a carefully planned experiment. That implies that there is a clearly defined biological problem, and that we're addressing this problem with an appropriate experimental design and analysis: one that has adequate statistical power, a suitable sample size, the appropriate controls, and validated reagents.

Given the high volume of data obtained with the high-throughput approaches commonly used in systems biology, it is not uncommon for these approaches to give rise to high false negative and false positive rates. A false negative is a biological event that is not detected with the experimental method used, and it is usually due to a sensitivity issue. A false positive is an event that is detected with the experimental method used but is not biologically relevant. Within the false positives, there are two categories: the technical false positive, which is the case when the false positive is not consistently detected, and the biological false positive, when it is consistently detected but is not biologically valid.

To illustrate how these issues apply to a systems biology method, we'll go through ChIP-seq. As you remember from Marc Birtwistle's course on experimental methods in systems biology, ChIP-seq, or ChIP-chip, is an approach used to measure DNA occupancy by transcription factors or other DNA-binding proteins. It basically consists of obtaining cells or tissues that have been treated. Then the tissue or the cells are crosslinked, they are lysed, and the chromatin is sheared, usually by sonication.
Then the target protein, the protein of interest, is immunoprecipitated, and the DNA is freed from the protein. The DNA is amplified, made into a library, and sequenced, and then statistical comparisons are used to identify which regions have been significantly bound by the protein of interest.

This procedure has several limitations that we have to be aware of. For starters, the binding event cannot be assumed to be direct. For example, the transcription factor that we are interested in may be part of a complex, so there may be intermediate proteins between the protein of interest and the DNA. The procedure also requires large quantities of starting material, which may not be practical for certain tissues. Rare and low-affinity interactions tend to be hard to detect. And one of the biggest limitations is that this whole approach depends on the quality of the antibody used to immunoprecipitate the protein of interest. So, it has to be an antibody that has high specificity and high sensitivity.

So, going back to how these limitations may result in high false negative or false positive rates. For ChIP-seq, the sensitivity issue is a technological limit. It can be because the DNA binding event that we're looking for may occur in very few cells within the tissue, or very sporadically. There may also be an overrepresentation of strong-affinity binding events over weak ones due to the sensitivity of the antibody, so the antibody may not detect low-abundance proteins of interest or transcription factors. Part of it may also be the computational analysis that we do afterwards: if there are stringent statistical thresholds, the weaker ChIP interactions may not be included.

Now for the possible false positives. For the technical ones, it usually means that the artifact is highly variable.
For the biological false positive, the artifact can be consistently reproduced but lacks a biological basis. These false positives can be dealt with to some extent if you have the proper quality controls. For the ChIP assay, a good quality control would be to repeat the assay in a background that lacks the transcription factor, and so determine whether those peaks, those ChIP interactions, are actually accurate. That would be possible if you, for example, treated the cells with siRNA against the transcription factor of interest. Another good quality control is to really validate the antibody's specificity and sensitivity, to make sure that the antibody targets only the protein of interest and doesn't have off-targets.

The issue of antibody specificity and reliability is a big one, and there have been a number of studies that have shown there is a great need to validate the specificity and sensitivity of these antibodies. Recently, a group has done a very thorough study where they assessed the specificity of over 6,000 commercially available antibodies. As you can see from this graph, on the x-axis they have the antibody providers, and on the y-axis the success rate of this validation. What they found is that they could validate less than 50% of the antibodies. From this graph, you can see that certain companies are selling antibodies that are incredibly poor, and that the variability is significant depending on the type of antibody.

So, this brings us to the idea that one of the first steps in any ChIP assay is to validate the antibodies. There are recommended guidelines from ENCODE, the Encyclopedia of DNA Elements consortium, which has developed guidelines for ChIP assays. What they recommend is multiple levels of validation. For instance, they have validated this antibody against SIN3B, and this is an example of the validation in two different cell types.
As you can see, this antibody is able to detect a band of the appropriate size, whereas this other antibody, in the same two cell types, is not able to detect the band. Its most prominent bands are at a different molecular weight. They also validate further, here for a different antibody, using immunoprecipitation assays to determine that you are indeed precipitating the specific band. They also use immunofluorescence assays, where they determine whether the localization of the protein is consistent with what is known. They also compare that localization in cells that have been treated with siRNA to abolish expression, and determine how specific the staining is. As you can see, for example, this antibody seems to be pretty specific. And finally, they use mass spectrometry methods to really establish the identity of that band. So, they would run the cell lysates on a gel, cut out the band, and do mass spec to determine what proteins are in that band.

Another source of concern when it comes to reproducibility is the use of small-molecule inhibitors, as they may have off-targets that may obscure the real effect of regulating the activity of a protein. This was very nicely illustrated in a series of studies that looked at the effects of well-known protein kinase inhibitors on other kinase activities. As an example, here we have a list of protein kinases and their remaining activity after being treated with two well-known Protein Kinase A (PKA) inhibitors. As you can see, Protein Kinase A, when treated with these two inhibitors at this concentration, still has some activity. What is concerning is that, when you look at the activity of other kinases, you realize that these two inhibitors, which are advertised as PKA-specific inhibitors, actually have even stronger activity against very different kinases. For example, these PKA inhibitors appear to regulate the activity of MAP kinase kinase 1.
The same is true for the activity of PDK1 and the activity of ROCK2. So, if you are doing an experiment where you're using these inhibitors to demonstrate the involvement of PKA in a biological event, the fact that you may be affecting unintended kinases may put your findings in question.

So, how do we solve this? Well, one of the ways to solve it is to use multiple structurally unrelated inhibitors and check whether you still have the same effect. The idea is that structurally unrelated inhibitors probably will not have the same type of off-targets. Also, and more importantly, use complementary approaches to reduce activity, such as overexpression of mutant forms that may affect the activity of the intended protein, or decreasing the levels of the intended protein using siRNA approaches.

I want to end by highlighting what should be the best practices to ensure reproducibility of research. One is to provide detailed methods, with enough detail that any person who attempts to reproduce these findings is able to. The next is to include the appropriate biological and technical replicates to show the variability of the effect. And finally, make sure that the key findings are validated using a number of different methods.

I also want to point out a very interesting NIH panel that was videocast, called Reproducibility of Data Collection and Analysis Modern Technologies in Cell Biology: Potentials and Pitfalls. It's a very nice panel that expands on other approaches that we didn't cover today. So, that's it for this lecture, and the next lecture will cover our second research paper. Thank you.
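
As a supplementary note, the two false positive categories from this lecture can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a method from the lecture or from ENCODE: the function name classify_peaks, the peak identifiers, and the rule that a peak must appear in every replicate are all hypothetical. It labels a peak as a possible technical false positive if it is not detected consistently across replicates, and as a possible biological false positive if it is detected consistently but still shows up in the siRNA knockdown background that lacks the transcription factor.

```python
from typing import Dict, List, Set


def classify_peaks(replicates: List[Set[str]], knockdown: Set[str]) -> Dict[str, str]:
    """Label each peak seen in any replicate (hypothetical rule set).

    replicates: one set of peak identifiers per ChIP replicate.
    knockdown:  peaks still detected when the transcription factor
                has been knocked down (for example with siRNA).
    """
    all_peaks: Set[str] = set().union(*replicates)
    labels: Dict[str, str] = {}
    for peak in sorted(all_peaks):
        hits = sum(peak in rep for rep in replicates)
        if hits < len(replicates):
            # Not consistently detected across replicates.
            labels[peak] = "possible technical false positive"
        elif peak in knockdown:
            # Consistent, but present without the transcription factor.
            labels[peak] = "possible biological false positive"
        else:
            # Consistent and lost in the knockdown background.
            labels[peak] = "supported binding event"
    return labels


if __name__ == "__main__":
    # Made-up peak identifiers, for illustration only.
    reps = [{"peak_A", "peak_B", "peak_C"}, {"peak_A", "peak_B"}]
    kd = {"peak_B"}
    for name, label in classify_peaks(reps, kd).items():
        print(name, "->", label)
```

In practice, peaks from different replicates would be matched by genomic overlap rather than by exact identifier, and a less strict consistency rule could be chosen; the sketch only makes the distinction between the two categories concrete.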