In the previous module, we discussed how you can set your error rates and design studies to achieve a certain level of power. Because it's almost never possible to completely eliminate errors, it's very likely that if you perform lines of research you will observe mixed results. If we look at the scientific literature, one of the bigger problems we see is that journals and researchers very often use p-values, and especially statistically significant p-values, as a threshold to either submit something for publication or to accept it for publication. At the same time, we know that mixed results should happen, and that these mixed and non-significant results are much less likely to be written up. We know this from research that has looked at studies submitted to Time-sharing Experiments for the Social Sciences, a program that gives researchers the opportunity to collect a large, nationally representative data set based on the study proposal that they submit. Because we know which studies were approved and for which the data were collected, Franco and colleagues thought it would be interesting to look at whether the results were unpublished and not written up, written up but unpublished (perhaps still under review), or published, as a function of whether the researchers observed null results, mixed results, or strong supportive results for their hypothesis. As we can see in this table, researchers were very likely to have written up and published results that were strongly in favor of their hypothesis, but null results were very often unpublished and not even written up.

At the same time, non-significant studies should be expected. Sometimes when you study a hypothesis you're right and the alternative hypothesis is true, and sometimes you're wrong and the null hypothesis is true. Now, let's say that you're like me and you're actually always right: in every study that you do, the alternative hypothesis is correct, as you predicted. Even in these cases, you should expect non-significant results in lines of research. Let's imagine that you do four studies in a row, and you design each of them so that you have exactly 80 percent power. What is then the probability that all four studies will be statistically significant? Well, we can easily calculate this: you have an 80 percent probability of a significant result in the first study, multiplied by an 80 percent probability in the second study, and in the third, and in the fourth. So altogether, the probability that all four studies will be significant, given 80 percent power, is actually 41 percent. This means it's more likely that you'll observe mixed results in a set of four studies, even when your hypothesis is true and you have 80 percent power.

Now, for some weird reason, scientific journals don't represent this reality. If you open a scientific journal and take a look, is this what you see? Do you see these mixed lines of research being discussed and presented? No. It's very peculiar that, if our job is to write down what is most likely to be true and to discover things about reality, the scientific journals that we publish don't actually represent this reality. We're pretending as if everything we do works perfectly, and we're not giving an unbiased overview of the mixed results that we know are present, because they should be there. Statistically speaking, this must happen.
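As an aside that is not part of the lecture itself, the calculation above is easy to check. The short Python sketch below assumes each study is an independent test with 80 percent power when the alternative hypothesis is true, multiplies the probabilities, and also prints the full binomial distribution of the number of significant results out of four studies.

```python
from scipy.stats import binom

power = 0.80      # assumed probability of a significant result per study (H1 true)
n_studies = 4

# Probability that all four studies are significant: 0.8 ** 4, roughly 0.41
p_all_significant = power ** n_studies
print(f"P(all {n_studies} significant) = {p_all_significant:.3f}")

# Full distribution of the number of significant results out of four studies
for k in range(n_studies + 1):
    print(f"P({k} of {n_studies} significant) = {binom.pmf(k, n_studies, power):.3f}")
```

In other words, even when the alternative hypothesis is always true, roughly 59 percent of four-study lines of research will contain at least one non-significant result.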
I always find this extremely peculiar, and I will be really happy if somewhere in my lifetime we manage to solve this issue and actually publish scientific journals that look a little bit more like what's really happening. Now, imagine that you perform three studies in a row. They are very similar, almost identical, maybe with some minor variations, but you have no reason to assume that these small variations matter, theoretically speaking. You should find a result in all three of these if your hypothesis is correct. You observe that two of them are statistically significant; the third one isn't. Would you consider this convincing evidence for an effect? Maybe you think two out of three doesn't really sound impressive, does it? Now, if you happen to be a Meat Loaf fan you might know one of his songs, "Two Out of Three Ain't Bad", and I have to say that in this case Meat Loaf was spot on. Two out of three is actually an impressive result if you do three studies in a row. If we calculate the probability of this happening, assuming that you have 80 percent power and use an alpha level of five percent, then this result will be observed in 38.4 percent of such sets of three studies. So if you do lines of research with three studies, 80 percent power, and a five percent Type I error rate, you will observe two out of three significant results quite often. These findings should therefore be present somewhere in the literature.

Now, maybe you don't think they're very convincing, so you don't submit them to an editor. But if you did, you could argue that the data you collected are actually very strong evidence for the alternative hypothesis. If we do the calculations based on binomial likelihood ratios, we see that it's almost 54 times more likely to observe two out of three significant results (under the assumption that we have 80 percent power and a five percent Type I error rate) when the alternative hypothesis is true, when something is actually going on, than when the null hypothesis is true. So, provided they are coherent and all test a similar hypothesis, you should be able to submit such lines of research to a journal and say, "Dear journal, don't you want the scientific literature to be accurate? Well, this is what real research looks like."

So after you've realized this, take a moment to think about this question: would you trust scientific journals that publish mixed results more or less than journals that only publish statistically significant results? Well, I would say that a scientific journal that actually represents reality gives me slightly higher trust that what it publishes resembles reality. But if I see scientific journals where everything that is published, and every line of multiple studies, yields significant results, I know for a fact that this can't be true. So I would say that publication bias is one of the biggest challenges that we're facing. We should be able to publish these lines of mixed results if we do research. Actually, if we look at the scientific Code of Conduct, it also tells us that we should be doing this. It says that we need to do justice to all research results obtained, and that we should not remove or change results without explicit and proper justification. So publishing selectively, only the results that work, is problematic. This is a topic we'll return to in the next module, when we talk about scientific integrity.
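Again as an aside of my own rather than part of the lecture, both the 38.4 percent and the factor of roughly 54 follow directly from the binomial distribution: compare the probability of exactly two significant results out of three studies when each study has 80 percent power (the alternative hypothesis is true) with the same probability when each study has only a five percent chance of a significant result (the null hypothesis is true).

```python
from scipy.stats import binom

k, n = 2, 3              # exactly two significant results out of three studies
power, alpha = 0.80, 0.05

p_under_h1 = binom.pmf(k, n, power)   # 3 * 0.8**2 * 0.2   = 0.384
p_under_h0 = binom.pmf(k, n, alpha)   # 3 * 0.05**2 * 0.95 = 0.0071

print(f"P(2 of 3 | H1) = {p_under_h1:.4f}")
print(f"P(2 of 3 | H0) = {p_under_h0:.4f}")
print(f"Likelihood ratio = {p_under_h1 / p_under_h0:.1f}")   # roughly 53.9
```

The ratio of these two probabilities is the binomial likelihood ratio mentioned above: roughly 54 in favor of the alternative hypothesis.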
Now, unless you only study true effects with 100 percent power, which is quite unlikely to be true in your case, you will observe some p-values larger than your alpha level somewhere in your career. Actually, this should happen quite regularly under reasonable assumptions about the statistical power of your tests. So how can you share null results of well-designed studies, so that other researchers can actually find them?

First of all, it makes sense to try to perform replication and extension studies. Stay true to the original hypothesis and test it in a consistent manner. This makes it likely that any non-significant result is just a fluke: it's non-significant because it's a Type II error, which we know should happen, especially in lines of research. If you perform coherent lines of research that are quite similar conceptually, you should be able to include all of the studies in a small internal meta-analysis. Meta-analysis is the topic of the next lecture.

Try to design informative studies and publish conclusive results both when the alternative hypothesis is true and when the null hypothesis is true. Make sure that if you design a study and ask a question, the result is also interesting when the null hypothesis is true. We talked about how you can falsify predictions and test things through equivalence tests, for example, and this is one way to design and publish informative null results (a small sketch follows below).

You can also use Registered Reports. In the Registered Report format, you submit your study idea, your methods, and your analysis plan to the journal before you have collected the data. The reviewers and the editors don't know whether the hypotheses will be supported or not, because the data are not yet in, so they evaluate the manuscript and give feedback on the original study idea based on whether it's scientifically interesting and the methods are well designed. If you then follow up on this proposal and collect the data as you intended, the results will be published regardless of whether they are statistically significant or not.

You can also consider discussing all related studies in a research line in the discussion section. If a study you originally thought was a good idea to perform ends up not yielding the results that you expected, and you honestly think there might be a good reason for this, you should discuss it. This is actually a perfect thing to discuss in, for example, the limitations section of your discussion, because if you really think that there's a reason for a non-significant result, this means that there is a limiting factor in your hypothesis: it works under some conditions but not under others, and you can mention this in the discussion.

You could also link to the raw data and an online summary of the study that you performed, so that people who want to perform a meta-analysis in the future can still have access to the raw data. People have argued for a long time that it's quite problematic that we dress up our research to look better than it actually was. This is a quote by Greenwald, from 1975, where he writes, "First, it is a truly gross ethical violation for a researcher to suppress reporting of difficult to explain or embarrassing data in order to present a neat and attractive package to a journal editor." Now of course, if you start to write up research more like reality, it will be a little bit messier: you will find mixed lines of results.
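Returning to the equivalence-test suggestion above: the sketch below is my own minimal illustration, not code from the course. It implements a two one-sided tests (TOST) procedure for two independent groups under the simplifying assumption of equal variances; the function name tost_ind, the simulated data, and the equivalence bounds of plus or minus 0.5 are all hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats

def tost_ind(x, y, low, high):
    """Two one-sided tests (TOST) for two independent groups.

    Declares equivalence when the mean difference (x - y) is significantly
    larger than `low` AND significantly smaller than `high`.
    Assumes equal variances; `low` and `high` are equivalence bounds you choose.
    """
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # Pooled standard deviation and standard error of the mean difference
    sp = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2))
    se = sp * np.sqrt(1 / nx + 1 / ny)
    df = nx + ny - 2
    p_lower = stats.t.sf((diff - low) / se, df)    # one-sided test: difference > low
    p_upper = stats.t.cdf((diff - high) / se, df)  # one-sided test: difference < high
    return max(p_lower, p_upper)                   # TOST p-value is the larger of the two

# Hypothetical data: two groups that genuinely do not differ
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=100)
group_b = rng.normal(loc=0.0, scale=1.0, size=100)

# Equivalence bounds of +/- 0.5 raw units are an arbitrary choice for illustration
print(f"TOST p-value: {tost_ind(group_a, group_b, low=-0.5, high=0.5):.4f}")
```

A small TOST p-value lets you reject effects as large as or larger than the equivalence bounds, which is exactly the kind of informative null result that is worth publishing.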
Mixed results have to happen, but we're not really used to seeing them in the literature, so we're in a phase where we have to figure out how we deal with mixed results and how we evaluate them. It's therefore very important that if you're a reviewer of a paper that reports null results or mixed results, you do not bias your evaluation of the methodology or the scientific contribution based on whether the results are statistically significant, or on whether all studies are significant. Remember, it's not even a realistic expectation that all papers you review contain only significant results. In an interesting study, Mahoney (1977) sent almost identical versions of a paper out for peer review and looked at what reviewers said about them as a function of whether the papers presented positive results, negative results, or no results at all, so that reviewers saw only the introduction and the methods but not the outcome. Interestingly enough, he found that when there are null results, reviewers are much less likely to be positive about the methodology used or about the scientific contribution than when the outcome is positive. Now, this shouldn't happen. For well-designed studies, the evaluation of the scientific contribution and the methodology should be relatively independent of the outcome of the results. If you like the methodology when it yields a positive result, you should also like the methodology when it yields a negative result. So if you review papers that present mixed results, try not to be biased as a reviewer.

Other fields have developed much stricter rules to make sure that the research that has been done is actually shared and available to other researchers. The Food and Drug Administration requires researchers to register certain studies, such as clinical trials, in a trial registry, so that it's clear what people are planning to do, and, when the trial is completed, to report the results within a year. Failing to follow this rule, which is called the Final Rule, can actually lead to huge fines. Now, this is one way to solve the problem. I think that if we as a field don't manage to solve these problems ourselves, it's highly likely that some authority or organization will impose much stricter rules and implement them in our fields as well.

So when you design lines of research, make sure that you are prepared for mixed results. They have a very high likelihood of occurring, especially if you perform multiple studies. Prepare for them, and make sure that you publish all results of well-designed studies.