Welcome to our last lecture in the Integrated Analysis in Systems Biology course. Today, we will cover the last research article, titled Common Genetic Variants Modulate Pathogen-Sensing Responses in Human Dendritic Cells. We chose this paper because it is a good example of using high-throughput expression data to gain insight into genomic variability. The outline for today will cover some of the experimental methods that are used to understand genetic variability and to associate genetic variability with expression. We will go over the goal of the study, which is to identify some of the genetic variants that give rise to variation in immune response. And we'll cover some of the findings: the cis and trans expression QTLs, the mechanisms by which these cis and trans QTLs work, and finally, a summary and conclusions. We'll get started by defining what SNPs are. SNP stands for Single Nucleotide Polymorphism. If one were to sequence DNA from a population of individuals, one would notice that a portion of the DNA sequence is conserved, but that certain nucleotides differ between individuals. Single nucleotide polymorphisms are just genetic variants, or alleles, whose sequences differ by only a single nucleotide. These SNPs, or alleles, can be found in coding and non-coding sequences. So they can be found within the sequence of a gene, or they can be found within regulatory sequences, such as the promoter of a gene. And they can be observed at different frequencies in the general population. The ones that are more informative are those that can be found in at least 5% of the population. Since every individual receives DNA from their mother and from their father, you can have a homozygous SNP, meaning that both homologous chromosomes carry the same allele. So in this case, the allele is A on both, or the allele is G on both.
Or you can have a heterozygous SNP, where the homologous chromosomes carry different alleles. So one variant could be the G, and the other variant could be the A. Historically, the contribution of genetics to disease has been studied using linkage analysis. Linkage studies are especially useful for diseases that are due to a single gene and are highly penetrant, meaning that if an individual carries the mutated gene, there's a high likelihood that they will exhibit the disease. A good example is BRCA1. BRCA1 is a gene that, when mutated, is associated with a high incidence of breast and ovarian cancer. Linkage analysis usually requires a detailed pedigree of families that have both affected and non-affected members across many generations. It is the most powerful approach if you are studying, like I said, highly penetrant phenotypes that are monogenic. Linkage studies consist of genotyping individuals for known SNPs, which are used as markers, and then using statistical tests to determine whether the SNPs are close to, or linked to, or segregate together with a gene that is important for disease. This approach has successfully identified disease-associated genes and mutations, but not all diseases are due to a single gene. When a phenotype is not always a reflection of the genotype, we can assume that it's a complex trait. A complex trait tends not to follow Mendelian genetics, meaning that complex traits do not have the typical patterns of dominance or recessiveness that Mendelian traits have. They tend to be polygenic, meaning affected by many genes. And there tends to be an environmental component to them. So for these complex traits or diseases that may be due to multiple mutations or gene variants, linkage analysis is not an adequate approach. A better approach is genome-wide association studies, or GWAS.
GWAS is the study of genetic variation across the entire human genome, designed to identify genetic associations with the presence or absence of a disease or condition. It requires a large population comprising both affected and non-affected individuals. This population is genotyped, and the SNPs are analyzed to find those that associate with the disease, that is, those whose alleles occur at a higher frequency in affected individuals. The results are then plotted in a Manhattan plot, as shown here, where the y-axis is the statistical association, usually shown as the negative log of the p-value, and the x-axis is the genomic position of the SNP, so the chromosomal location. After a threshold is applied, in this case this is the threshold, the SNPs that are significantly associated point to many chromosomal locations, or loci, that appear to be involved. There are also other approaches that can be used to associate DNA variations, or SNPs, with a phenotype. One of them is quantitative trait locus, or QTL, mapping. This is particularly useful when there's a quantitative trait involved. A quantitative trait is a trait that varies continuously across a population. Good examples would be height, or disease risk, or growth rate, something where the phenotype varies across the population. QTL stands for Quantitative Trait Locus. It refers to the actual region, or the SNP, that is statistically associated with the phenotypic trait. And given the variability of the phenotype, there may be an environmental component to it. So there may be a population, in this case, just for illustration purposes, there are multiple leaves, and each leaf has a different phenotype. There is a huge variation in the phenotype across all the individuals of the population. And then one genotypes this population and, using statistical methods, identifies one allele that appears to be informative.
You can start to see that the alleles correlate with the variability found in the trait of interest. So we have a quantitative trait that we're interested in. Once we genotype, we identify these alleles, and then we associate these SNPs with the variation in the phenotype. But there is a caveat to these approaches. Both QTL mapping and GWAS are great in the sense that they allow us to identify loci, genomic locations, that are associated with these complex diseases or traits. But the issue is that these loci may contain a great number of genes. So it is difficult to prioritize which of these genes are associated with the disease. More recently, there's been a variation of the QTL approach called eQTL mapping, where instead of just looking at a phenotype and associating the phenotype with a DNA variation or SNP, you're actually looking at mRNA expression as an intermediate phenotype. So in this type of approach, you're combining genotyping with high-throughput methods to monitor the expression that results from this genetic variation. There are two kinds of expression QTL here. One is the eQTL, which associates a region of DNA with changes in mRNA expression. The other is the reQTL, a response expression quantitative trait locus, which is a type of eQTL where you're associating the region with changes in mRNA expression after a stimulation. eQTLs also come in two types by position. There are those that act in cis and those that act in trans. What that means is that the position of the eQTL with respect to the gene determines whether it is cis or trans. A cis-eQTL features a SNP that is located very close to the gene under regulation. It's usually within a window of 1 million base pairs of the gene, measured from either the start or the end of the gene. And it most likely will directly affect the expression of the target gene.
To illustrate this, we have three different genes, Gene A, B, and C, and Gene B has a SNP very close to its start position. Given that B is expressed less when the SNP is present, we can assume that B is regulated by a cis-eQTL. A SNP in the promoter region of Gene B can affect transcription factor binding. That's one possible mechanism, for example. A deficit in transcription factor binding in this promoter region can cause a reduction in expression. And the SNP has no effect on Gene C and Gene A, which may be far from it. For a trans-eQTL, the SNP can be far from the gene under regulation. The cutoff tends to be, again, the one million base pair window: the SNP is either more than one million base pairs away from the gene or on a different chromosome. And the effect of the SNP can be indirect, meaning that it can occur through different genes. So for example, we may have Gene A containing a SNP that affects its activity. Gene A may be a transcription factor. It may have a SNP within its coding region that affects its function without affecting its expression. And since Gene A is a transcription factor, this may in turn affect the expression of Gene C. So the SNP would be a trans-eQTL of Gene C. Another possibility is having a SNP in the promoter region of a transcription factor, where the SNP would regulate the expression of Gene A directly, and the expression of Gene C indirectly. So in this case, Gene A is regulated by a cis-eQTL, while Gene C is regulated by a trans-eQTL. So first, there's the association of expression with genomic location. Very similar to the Manhattan plot that I showed you previously, the y-axis is the statistical association measure, while the x-axis is the chromosomal location of the SNPs. But in this case, the SNPs are correlated with the expression levels, in some cases and not in others.
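As a rough sketch of the cis versus trans distinction just described, and not something taken from the paper, the positional rule can be written down in a few lines of Python. The `Gene` class, the `classify_eqtl` function, and all coordinates are hypothetical names and made-up values for illustration. The only thing taken from the lecture is the one million base pair window and the rule that a different chromosome counts as trans.

```python
from dataclasses import dataclass

# The 1 million base pair window mentioned in the lecture.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Gene:
    """A gene with hypothetical coordinates, in base pairs."""
    name: str
    chrom: str
    start: int
    end: int


def classify_eqtl(snp_chrom: str, snp_pos: int, gene: Gene,
                  window: int = CIS_WINDOW_BP) -> str:
    """Label a SNP-gene pair as 'cis' or 'trans' by position alone."""
    if snp_chrom != gene.chrom:
        return "trans"  # a different chromosome is always trans
    if gene.start <= snp_pos <= gene.end:
        distance = 0  # the SNP lies inside the gene
    else:
        # Distance to the nearer of the gene's start and end.
        distance = min(abs(snp_pos - gene.start), abs(snp_pos - gene.end))
    return "cis" if distance <= window else "trans"


# Made-up genes, for illustration only.
gene_b = Gene("GeneB", "chr1", 5_000_000, 5_050_000)
gene_c = Gene("GeneC", "chr7", 1_200_000, 1_260_000)

print(classify_eqtl("chr1", 4_999_500, gene_b))  # SNP just upstream of GeneB
print(classify_eqtl("chr1", 4_999_500, gene_c))  # same SNP, gene on another chromosome
```

Note that this only labels a pair by position. Whether the SNP actually is an eQTL for the gene still has to come from a statistical association with expression.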
So as you can see here, there is no association, while here, there's a strong association. As you can see, there are differences in the expression. And depending on whether you have a homozygous or a heterozygous genotype at the SNP, you can also have gene expression differences. For example, in this case, being homozygous is associated with lower expression of Gene A, while being heterozygous is associated with higher expression of Gene A. So that was the primer for this paper. In this research article, what they're interested in is determining how this genetic variation contributes to the immune system's response to pathogens. And specifically, to what extent the heterogeneity in the response to pathogens is driven by genetics, and how this heterogeneity can cause disease in some individuals but not in others.
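To make the primer's idea of associating a SNP genotype with expression concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used in the paper. The genotypes and expression values are made up, the function names are hypothetical, and the additive model (expression changing linearly with the number of copies of one allele) is an assumption that a real trait need not follow. A real analysis would also compute a p-value and correct for testing many SNP-gene pairs.

```python
from collections import defaultdict
from statistics import mean


def genotype_dosage(genotype: str, alt_allele: str) -> int:
    """Count copies (0, 1, or 2) of alt_allele in a genotype such as 'AG'."""
    return sum(1 for allele in genotype if allele == alt_allele)


def mean_expression_by_genotype(genotypes, expression):
    """Average expression within each genotype group."""
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, expression):
        # Sort the two alleles so that 'GA' and 'AG' fall in the same group.
        groups["".join(sorted(genotype))].append(value)
    return {genotype: mean(values) for genotype, values in groups.items()}


def additive_slope(dosages, expression):
    """Least-squares slope of expression on dosage (assumed additive model)."""
    mean_x, mean_y = mean(dosages), mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals share one genotype; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Made-up data: one SNP with alleles A and G, and one gene's expression
# measured in six individuals.
genotypes = ["AA", "AA", "AG", "GA", "GG", "GG"]
expression = [10.0, 12.0, 7.0, 9.0, 4.0, 6.0]

dosages = [genotype_dosage(g, "G") for g in genotypes]
group_means = mean_expression_by_genotype(genotypes, expression)
slope = additive_slope(dosages, expression)

print(group_means)  # mean expression per genotype group
print(slope)        # change in expression per extra copy of G
```

The group means are the more direct reading of the slide, since they make no assumption about how heterozygotes compare with homozygotes. The slope summarizes the same data under the additive assumption.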