In this lecture, I'm going to introduce you to the bioinformatics analysis of metagenomic data. We'll go through the metagenomic workflow and the quality assessment of reads, and you will hear a brief introduction to the mapping of paired-end reads. Metagenomics involves collecting samples from an environment, such as clinical samples from humans, pigs, poultry, cattle, and other animals, as well as fresh water and sewage samples. The collected samples undergo purification and DNA extraction and are then sent for sequencing.
The reads that come out of the sequencing machine go through a quality control (QC) step. FastQC is a commonly used quality control tool. Here, you can look at the quality of your reads, check whether the sequencing machine left any adapters, and look at the GC content and the k-mer composition of your reads. In this example, you can see that towards the end of the reads the quality scores of the bases drop, and naturally we want to remove those low-quality bases. That's why, in the quality assessment step, we trim low-quality bases from the ends of the reads. Here, you can choose various cutoffs. The most commonly used cutoffs are 20 and 30. With a cutoff of 20, you remove only the bases with a quality score below 20, and if you want to be a bit stricter, you use a cutoff of 30. After trimming low-quality bases from the ends, you might find that some of your reads have become too short because the quality of the read was poor. That's why you might want to discard reads that are shorter than a certain number of base pairs. This number also depends on the project and on how strict you want to be. A commonly used minimum is 50 base pairs; some people use 30, and others use a higher number. Additionally, as mentioned, you can check whether the sequencing machine left any adapters or primers, or whether there are other over-represented sequences. You can see which sequences are over-represented, and you can remove them in the quality assessment step.
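To make the trimming and length filtering concrete, here is a minimal Python sketch. It assumes Phred+33 encoded quality strings, as in standard FASTQ files, where a score of 20 corresponds to an estimated error probability of 1% and a score of 30 to 0.1%. The function names `phred_scores`, `trim_ends`, and `quality_filter` are made up for this illustration, and the end-stripping logic is a deliberately simple stand-in: the actual tools use more refined trimming algorithms.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_ends(seq, qual, cutoff=20):
    """Strip bases with a quality score below `cutoff` from both ends of a read.

    Simplified illustration: bases are removed from each end until the first
    base at or above the cutoff is reached. Bases inside the read are untouched.
    """
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < cutoff:
        start += 1
    while end > start and scores[end - 1] < cutoff:
        end -= 1
    return seq[start:end], qual[start:end]


def quality_filter(reads, cutoff=20, min_length=50):
    """Trim every read, then discard reads shorter than `min_length` bases.

    `reads` is an iterable of (name, sequence, quality_string) tuples.
    """
    kept = []
    for name, seq, qual in reads:
        trimmed_seq, trimmed_qual = trim_ends(seq, qual, cutoff)
        if len(trimmed_seq) >= min_length:
            kept.append((name, trimmed_seq, trimmed_qual))
    return kept


if __name__ == "__main__":
    # Toy reads: 'I' encodes score 40, '#' encodes score 2.
    reads = [
        ("read_1", "ACGTAC", "#IIII#"),
        ("read_2", "ACGT", "##II"),
    ]
    # A tiny min_length is used only because the toy reads are so short.
    for name, seq, qual in quality_filter(reads, cutoff=20, min_length=4):
        print(name, seq, qual)
```

With the toy input, `read_1` loses one base at each end and is kept, while `read_2` is trimmed down to two bases and is discarded. In a real project you would set `min_length` to something like the 50 base pairs mentioned above.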
Here are some of the most commonly used read trimming and adapter removal tools: Cutadapt, Trimmomatic, PRINSEQ, and BBDuk. After you perform trimming and adapter removal, you can see that the reads are left with only the bases that have a high quality score, and that there are no adapters left, so you can proceed with your analysis. After the quality assessment step, the reads can be used for taxonomic identification, to see who is out there; for quantitative analysis, to see how many of them are there; and for functional analysis, to identify the functional profile of the community.
In this course, you will often hear the words mapping and alignment. That's why, in the next slides, I'll introduce you to the mapping of paired-end reads. For example, say we have a database with two reference sequences, two chromosomes. We'll go through the possible scenarios and see which pairs will be kept as mapped and which pairs will be discarded as unmapped. Let's start with the first pair. It finds a perfect match in the first chromosome; it is properly paired, and the two reads lie within a reasonable distance of each other. Here we have another pair, and it aligns to another part of the reference. But here we can notice a single-nucleotide difference from the reference, which suggests a possible SNP at that position. Here comes another scenario, another pair. It maps in a similar way to a site on the first reference chromosome. It could also have aligned to the second chromosome, but there the reads are not properly paired: they are separated by a very long distance. That's why that alignment would be discarded, and only the hit within the first reference would be kept. In another example, the two reads of a pair map to different chromosomes. This mapping will also be discarded, because we keep only pairs in which both reads map within the same chromosome.
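The pairing rules described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PairPlacement` class, the function names, and the 500 bp distance limit are all hypothetical and not taken from any particular mapper. Real aligners also consider read orientation and alignment scores, which are left out here.

```python
from dataclasses import dataclass

# Hypothetical limit for a "reasonable distance" between the two reads of a pair.
MAX_PAIR_DISTANCE = 500


@dataclass
class PairPlacement:
    """One candidate placement of a read pair on the reference."""
    chrom_1: str  # chromosome hit by the first read
    pos_1: int    # start position of the first read
    chrom_2: str  # chromosome hit by the second read
    pos_2: int    # start position of the second read


def is_properly_paired(placement, max_distance=MAX_PAIR_DISTANCE):
    """A pair is proper if both reads hit the same chromosome within max_distance."""
    same_chromosome = placement.chrom_1 == placement.chrom_2
    close_enough = abs(placement.pos_2 - placement.pos_1) <= max_distance
    return same_chromosome and close_enough


def keep_proper_placements(placements, max_distance=MAX_PAIR_DISTANCE):
    """Keep only the properly paired placements; an empty list means the pair is discarded."""
    return [p for p in placements if is_properly_paired(p, max_distance)]


if __name__ == "__main__":
    # Third scenario: a good hit on chromosome 1 and a too-distant hit on chromosome 2.
    scenario_three = [
        PairPlacement("chr1", 100, "chr1", 350),
        PairPlacement("chr2", 100, "chr2", 5000),
    ]
    # Fourth scenario: the two reads land on different chromosomes.
    scenario_four = [PairPlacement("chr1", 100, "chr2", 300)]

    print(keep_proper_placements(scenario_three))
    print(keep_proper_placements(scenario_four))
```

For the third scenario only the placement on `chr1` survives, and for the fourth scenario nothing survives, so that pair would be treated as unmapped.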
Here, a pair maps close to read pair number two. Once again, it suggests that there might be a SNP there, because it shows a nucleotide difference from the reference and it agrees with the previously mapped read. And here is the last example, with the reads mapping to the second chromosome. In this scenario, you might have a small deletion. So this is an example of paired-end mapping. That was a brief introduction to the bioinformatics tools for metagenomic data analysis. In the following lecture, we will look at the different methods for metagenomic data analysis and see some commonly used tools.