The fourth semester is mostly dedicated to the students' work on their Master's theses. The theses are supervised by leading experts from Russian and foreign scientific centers working in the field of bioinformatics.
Master's thesis projects of different years:


Spring 2021
Spring 2020
Master's theses, 2021
Data-driven approach to identify differentiation trajectories of myeloid cells in atherosclerotic plaques
Student: Maria Firulyova
Supervisors: Konstantin Zaitsev; Jesse Williams (University of Minnesota)
Atherosclerotic cardiovascular disease is an inflammatory disease of the arteries. As atherosclerosis progresses, a structure with a complex cellular composition, the atherosclerotic plaque, is formed. Differentiation relationships within the intimal myeloid cell population associated with the plaque remain unclear; in particular, the differentiation process that leads to the formation of foam macrophages in the plaque is still unknown. New computational approaches to trajectory analysis designed for single-cell RNA sequencing (scRNA-seq) data make it possible to reconstruct trajectories for cells of interest. An important feature of trajectory inference is the possibility to identify genes and regulons that are statistically significantly associated with the identified lineages. The project is focused on secondary analysis of public scRNA-seq studies of atherosclerosis. The work covered multiple topics, including processing, integration, and annotation of scRNA-seq datasets, and trajectory inference for the myeloid cells identified in all of the prepared atherosclerosis datasets.

Presentation_Firulyova M. (slides)
Analysis of somatic mutability in cutaneous melanoma in response to UV-irradiation
Student: Dmitrii Usoltsev
Supervisor: Mykyta Artomov
Skin cancers, such as cutaneous melanoma, harbor the highest mutation burden among all malignancies. While the vast majority of these changes are consistent with UV-induced mutations, the biological effects of UV carcinogenesis have yet to be fully elucidated. We performed an in silico analysis of the TCGA cutaneous melanoma cohort to find genes that selectively accumulate mutations in high-UV-burden tumors. Our findings were then replicated in in vivo tumors derived from human melanocytes with controlled UV exposure to confirm the UV-induced nature of the identified mutations. TCGA melanoma tumors were separated into three groups by their UV-signature burden, and per-gene mutational burden was adjusted for relevant clinical features, overall tumor mutational burden, and gene length. In vivo tumors were generated by UV irradiation of human melanocytes and subsequent injection into mice. Somatic variant calling was performed on the lab-generated tumors, using the original melanoma cell line and tumors derived from non-irradiated cells for comparison.

Presentation will be available after publication of results
Comparison of T-cell signaling programs associated with response to checkpoint immunotherapy in different cancer types
Student: Marina Terekhova
Supervisor: Vadim Zhernovkov (University College Dublin)
It is well known that the immune system is critical in cancer development and progression. The immune surveillance theory holds that the immune system continuously monitors the cells and tissues of the body and is responsible for recognizing and killing cancer cells. However, immune surveillance also selects for cancer cells that are poorly immunogenic or have mechanisms allowing them to escape immune detection. As a result, malignant cells arise that can evade immune destruction, proliferate, and manifest clinically as cancer. T cells are an important component of cell-mediated immunity against cancer and are controlled by a number of costimulatory and inhibitory signals that serve as checkpoints. Checkpoint regulators ensure that T-cell responses maintain self-tolerance while effectively protecting the organism from pathogens and malignancies. Immune checkpoint inhibitors represent a new class of immunotherapy and have demonstrated a rapid increase in overall survival in patients with different types of advanced cancer. [...] The aim of this work is to reveal transcription factors that can serve as predictors of response to treatment with immune checkpoint inhibitors in different types of cancer.

Presentation_Terekhova M. (slides)
Identification and functional annotation of hypothetical proteins from orthonectids' parasitic plasmodium (Bilateria: Orthonectida)
Student: Elizaveta Skalon
Supervisors: George Slyusarev, Natalya Bondarenko (St. Petersburg State University)
Orthonectida Giard, 1877 is a small phylum of poorly known marine invertebrates. [...] The orthonectids' dramatic loss of complexity and unique life cycle have no parallel within the large group Annelida, and the origin of orthonectid parasitism is still unknown. The main adaptation to parasitism, the plasmodium, remains underexplored, and many questions related to the biology of the orthonectids' parasitic stage have not yet been resolved. Discovering genes expressed specifically in the plasmodium is an essential step toward revealing the mechanisms behind the development and functioning of the orthonectids' parasitic stage and will help to explore their adaptations to a parasitic lifestyle. Here, we present the identification and annotation of orthonectid plasmodium-specific hypothetical proteins, with the aim of understanding the processes behind plasmodium functioning and the orthonectids' adaptations to parasitism.

Presentation_Skalon E. (slides)
A genome‐wide association study for flowering time in guar (Cyamopsis tetragonoloba (L.) Taub.)
Student: Aleksandar Beatovich
Supervisor: Alexander Tkachenko
The guar plant (Cyamopsis tetragonoloba (L.) Taub.) is a short-day annual herbaceous flowering plant native to India and Pakistan. It is industrially important as the main source of guar gum, which is widely used in the oil and gas industry. Cultivation of this plant in countries at northern latitudes is limited by longer day lengths during the vegetative season compared with its native habitat. The ability to efficiently identify early-flowering guar varieties would greatly accelerate breeding efforts. Despite its economic significance, genomic resources for guar are limited. This study presents a new, highly contiguous guar genome assembly and 10,736 variant sites derived from RADseq data for a cohort of 192 guar plants of different varieties. A pilot genome-wide association study identified a number of SNP markers near genes previously established to regulate flowering time; these SNPs could be used in marker-assisted breeding of this economically important crop.

Presentation_Beatovich A. (slides)
Investigation of common DNA variants contribution to polygenic disease risks in Russian population
Student: Valeria Rezapova
Supervisor: Mykyta Artomov
Human traits and diseases result from genetic and environmental factors acting individually or in combination. Over the last decade, genome-wide association studies (GWAS) have discovered a substantial number of associated variants for many complex traits. The success of GWAS in finding and replicating thousands of associations for thousands of phenotypes has demonstrated the usefulness of the approach and ushered in a new era of human genetics. However, even within European-centered GWAS data, some local subpopulations are significantly under-represented. For example, Russians, one of the largest ethnic groups in Europe, have remained significantly under-represented in GWAS for years. The aim of the present work was to test whether UK Biobank GWAS results could be successfully applied to estimate polygenic risk scores in samples of Russian descent.
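In its simplest form, a polygenic risk score is a weighted sum of a person's effect-allele dosages, with per-allele weights taken from GWAS summary statistics (in this project, presumably UK Biobank results). The sketch below shows only that arithmetic and is not the pipeline used in the thesis: the variant IDs and weights are made up, and real PRS workflows also need allele alignment, clumping and p-value thresholding, and ancestry-aware calibration.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, int],
    effect_sizes: Mapping[str, float],
) -> float:
    """Return sum(beta_i * dosage_i) over variants present in both mappings.

    dosages: number of effect alleles (0, 1 or 2) carried by one person, per variant ID.
    effect_sizes: GWAS per-allele effect sizes (e.g. log odds ratios), per variant ID.
    Variants missing from `dosages` are skipped (this sketch does no imputation).
    """
    score = 0.0
    for variant_id, beta in effect_sizes.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # variant not genotyped in this person
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant_id} must be 0, 1 or 2, got {dosage}")
        score += beta * dosage
    return score


# Hypothetical weights and genotypes, for illustration only.
weights = {"rs0001": 0.20, "rs0002": -0.10, "rs0003": 0.05}
person = {"rs0001": 2, "rs0002": 1}  # rs0003 not genotyped
print(round(polygenic_risk_score(person, weights), 4))  # 0.3
```

Scores computed this way are only comparable within one cohort and one weight set, which is why transferring UK Biobank weights to another population has to be tested rather than assumed.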

Presentation_Rezapova V. (slides)
Integrating lipidomics data with reaction networks
Student: Mariia Emelianova
Supervisor: Alexey Sergushichev
Lipids are an important class of biomolecules involved in many vital cellular processes. Owing to their hydrophobic nature, lipids are the major constituents of biological membranes and thus form the physical basis of all living organisms, separating living entities from their surroundings. Lipids also store surplus energy for later consumption. Finally, lipids are involved in extra- and intracellular signaling, where they transduce signals and amplify regulatory cascades. Because lipids play a crucial role in many biological processes, any imbalance in their homeostasis can lead to serious conditions, such as chronic inflammation, cardiovascular diseases, diabetes, and neurodegenerative diseases. The role of lipids in biomedical research should therefore not be underestimated. [...] At present, lipidomics data cannot be easily integrated into existing pipelines, and the particular roles of lipids in metabolism, their exact functions, and their impact on various biological processes remain unclear. The aim of this study was to extend metabolic network analysis to lipidomics data. This required building comprehensive metabolic and lipid-specific graphs, updating the existing network analysis pipeline to support lipid-specific analysis, and testing the pipeline on real datasets.

Presentation_Emelianova M. (slides)
Application of metabolome-transcriptome integration approach for detection of loci controlling flowering time of guar (Cyamopsis tetragonoloba (L.) Taub.)
Student: Elizaveta Grigorieva
Supervisor: Alexander Tkachenko
Guar (Cyamopsis tetragonoloba (L.) Taub.) is an annual legume crop native to India and Pakistan. Its seeds are a source of the galactomannan polysaccharide guar gum, used in the food industry as a stabilizer (E412) and as a gelling agent in oil and gas fracturing fluids. There have been several attempts to introduce this crop to countries at more northern latitudes. However, guar is a short-day plant, so its introduction to Russia is complicated by long day lengths during the growing season. Breeding of new photoperiod-insensitive guar varieties is slowed by a lack of functional molecular markers, whose development in turn requires information on the guar genome. This work presents an attempt to use a transcriptome-metabolome integration approach to understand the genetic basis of flowering time variation among guar plants that differ in photoperiod sensitivity. The study was performed on nine early-flowering and six delayed-flowering guar plants, with the goal of finding connections between biomarkers and differentially expressed transcripts. Metabolome-transcriptome integration was performed with two different approaches: WGCNA and Shiny GAM.
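WGCNA, one of the two integration approaches named above, starts from a correlation-based adjacency matrix. As a rough illustration of that first step only (not of the thesis analysis), the sketch computes Pearson correlations between feature profiles measured across the same plants, for example a transcript and a metabolite, and raises their absolute values to a soft-thresholding power β, as in an unsigned WGCNA network. The feature names, values, and β = 6 are arbitrary assumptions; module detection (topological overlap, hierarchical clustering) is not shown.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 in this sketch
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(
    profiles: Dict[str, Sequence[float]], beta: int = 6
) -> Dict[str, Dict[str, float]]:
    """Unsigned WGCNA-style adjacency: a_ij = |cor(x_i, x_j)| ** beta, with a_ii = 1."""
    names = list(profiles)
    adjacency: Dict[str, Dict[str, float]] = {name: {} for name in names}
    for i in names:
        for j in names:
            if i == j:
                adjacency[i][j] = 1.0
            else:
                adjacency[i][j] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adjacency


# Hypothetical profiles across four plants (one value per plant).
profiles = {
    "transcript_A": [1.0, 2.0, 3.0, 4.0],
    "metabolite_B": [2.0, 4.0, 6.0, 8.0],
    "transcript_C": [4.0, 1.0, 3.0, 2.0],
}
adj = soft_threshold_adjacency(profiles)
print(round(adj["transcript_A"]["metabolite_B"], 6), round(adj["transcript_A"]["transcript_C"], 6))
```

Raising correlations to a power keeps strong co-expression links (|r| = 1 stays 1) while shrinking weak ones (|r| = 0.4 becomes about 0.004), which is what lets WGCNA approximate a scale-free network.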

Presentation_Grigorieva E. (slides)
Human exome variant database construction
Student: Mary Futey
Supervisors: Alexander Tkachenko; Yury Barbitoff (Bioinformatics Institute)
Next-generation sequencing has greatly increased the amount of data available for both research and clinical use. However, using these data requires accurate tools and standardized analytic pipelines, as well as resources such as variant databases that capture variation across all populations. Several large databases are worldwide in scope; however, they often under-represent certain populations, which has led to initiatives focused on these underserved groups. One such group comprises the various ethnic populations within Russia. We developed a pipeline to perform variant calling on WES data from 1,739 individuals from the Russian Federation. We then conducted an over-representation analysis to assess the frequency of disease-causing alleles in the Russian population compared with a reference population.
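The abstract does not state which statistical test the over-representation analysis used. One common way to ask whether an allele is more frequent in one cohort than in a reference is a one-sided Fisher's exact test on a 2×2 table of allele counts; the stdlib sketch below implements that test. The choice of test and any counts passed to it are assumptions for illustration, not the thesis method.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment in the first row.

    Table layout (allele counts):
                     alt allele   other alleles
        cohort           a              b
        reference        c              d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1, total = a + b, a + c, a + b + c + d
    denominator = comb(total, col1)
    max_x = min(row1, col1)
    # Sum the probabilities of tables at least as extreme as the observed one.
    numerator = sum(
        comb(row1, x) * comb(total - row1, col1 - x) for x in range(a, max_x + 1)
    )
    return numerator / denominator


# Classic check: for [[3, 1], [1, 3]] the one-sided p-value is 17/70.
print(round(fisher_greater(3, 1, 1, 3), 4))
```

Testing many alleles this way would also require a multiple-testing correction, which the sketch leaves out.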

Presentation_Futey M. (slides)
Analysis of microRNA expression profiles in mechanical tissues of cultivated and wild varieties of flax (Linum usitatissimum)
Student: Angelica Dun
Supervisor: Alexander Tkachenko
miRNAs have been suggested to be key players in flax stem development. Numerous studies have shown that these small RNAs indeed have a great impact on the regulation of both intrusive elongation and cell wall thickening, the most important stages of flax development. In this work, we used samples of phloem fibers at late developmental stages from three poorly studied flax varieties (fiber, linseed, and wild). Novel miRNAs and their targets were computationally predicted for all samples. Differentially expressed miRNAs specific to the fiber cultivar were identified, and their mRNA targets among differentially expressed genes were predicted.

Presentation_Dun A. (slides)
From the distribution of synapses to neural function in a circuit that mediates attention shifts
Student: Natalia Baymacheva
Supervisor: Karl Farrow (KU Leuven, IMEC)
Neurons receive input through thousands of synapses distributed throughout their dendrites. Synaptic inputs are transformed into electrical signals, which undergo specific integration while travelling along the dendrite to the soma. Scientists have been studying the computational properties of dendrites for decades, but the exact mechanisms remain to be unravelled. One of the most debated aspects of this question is input location: does it significantly influence the output at the scale of an in vivo network? To address this question, we studied pathways in the superior colliculus (SC) responsible for evoking innate defensive behavior in response to visual stimuli. Wide-field (WF) neurons play the leading role in these pathways, receiving their inputs directly from retinal ganglion cells (RGCs) and inhibitory interneurons (Gad2). We used in vivo recordings from RGC, Gad2, and WF neurons in the mouse brain to build linear-nonlinear models of a WF neuron. Fitting simple linear models revealed a correlation between RGC subtypes and the WF neuron layers to which they contribute. Applying the same model to inhibitory Gad2 recordings revealed strong inhibition in the deeper WF neuron layers, proximal to the soma. Adding an activation function to the model brought no notable improvement, leading us to conclude that dendrites process local signals linearly. To fully tackle the problem of input location, the models still need to be tested on a different set of stimuli and compared with other known models.
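A linear-nonlinear (LN) model passes a weighted sum of inputs through a static output nonlinearity. The minimal sketch below is not the thesis model. It takes several presynaptic activity traces (for example, binned RGC or Gad2 recordings), weights them, and applies a rectifier; switching the rectifier off gives the purely linear model it can be compared with. The weights, bias, and choice of a rectifier are illustrative assumptions, and model fitting is not shown.

```python
from typing import List, Sequence


def ln_response(
    inputs: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float = 0.0,
    nonlinear: bool = True,
) -> List[float]:
    """Predict an output trace from presynaptic input traces.

    inputs: one activity trace per input group, all of equal length (time bins).
    weights: one weight per input group (negative values act as inhibition).
    nonlinear: if True, apply a rectifier max(drive, 0); if False, return the linear drive.
    """
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input trace")
    n_bins = len(inputs[0]) if inputs else 0
    if any(len(trace) != n_bins for trace in inputs):
        raise ValueError("all input traces must have the same length")
    output = []
    for t in range(n_bins):
        # Linear stage: weighted sum of all inputs in this time bin.
        drive = bias + sum(w * trace[t] for w, trace in zip(weights, inputs))
        # Nonlinear stage (optional): rectification.
        output.append(max(drive, 0.0) if nonlinear else drive)
    return output


# Hypothetical example: one excitatory (RGC-like) and one inhibitory (Gad2-like) input.
excitatory = [1.0, 0.0, 2.0]
inhibitory = [0.0, 1.0, 1.0]
print(ln_response([excitatory, inhibitory], [0.5, -1.0]))                   # [0.5, 0.0, 0.0]
print(ln_response([excitatory, inhibitory], [0.5, -1.0], nonlinear=False))  # [0.5, -1.0, 0.0]
```

If, after fitting, the nonlinear version predicts recorded WF activity no better than the linear one, that is the kind of evidence the abstract interprets as linear dendritic processing.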

Presentation_Baymacheva N. (slides)
Development of 5'-end RNA sequencing data analysis method
Student: Liuaza Etezova
Supervisor: Alexander Tkachenko
RNA sequencing (RNA-seq) is a powerful tool for studying gene regulation and function at the transcriptional level and has been successfully applied to various scientific questions in a wide range of organisms. Sequencing of RNA 5'-ends is an important approach for studying gene regulation, with a particular focus on transcription initiation. Many software packages for analyzing 5'-end sequencing data are available to researchers. Most of them, however, fail to address issues specific to the transcription initiation and regulation processes characteristic of different domains of life, which makes it necessary to develop a specialized approach that takes these differences into account. The aim of this study was to develop a method for analyzing bacterial 5'-end RNA sequencing data, with a view to applying it to Cappable-seq, a specialized 5'-end sequencing method used to analyze bacterial transcription start sites. In this work, we analyzed a dataset of matched samples sequenced with RNA-seq and Cappable-seq and implemented several functionalities on top of the existing ecosystem for 5'-end data analysis. The implemented utilities allow assaying gene expression at the operon level as well as subtracting the non-enriched libraries used in Cappable-seq.

Presentation_Etezova L. (slides)
Master's theses, 2020
Transcriptome analysis of myoblasts C2C12 with mutations in LMNA gene
Student: Oksana Ivanova
Supervisors: Renata Dmitrieva (Almazov Centre); Alexey Sergushichev
The nuclear lamina is a polymer located on the inner surface of the nuclear membrane. The lamina supports the structure of the nucleus and participates in chromatin organization, regulation of gene expression, and cell division. The major components of the nuclear lamina, lamins A and C, are encoded by a single gene, LMNA. Mutations in LMNA cause diseases collectively known as laminopathies. These disorders include cardiomyopathy, neuromuscular diseases, myo- and lipodystrophy, and metabolic syndrome. Laminopathies caused by the missense mutations p.G232E and p.R482L in LMNA affect skeletal muscle tissue. To date, treatment of laminopathies is symptomatic, and there are no effective medications against these diseases. Despite extensive fundamental research on LMNA mutations, the exact molecular mechanisms of disease development and muscle specificity remain unknown. In this work, we use a C2C12 myoblast cell model and transcriptome analysis to investigate gene expression and molecular pathways in muscle tissue altered by the G232E and R482L mutations in the lamin A/C gene.

Presentation_O. Ivanova (slides)
Chromothripsis in a view of spatial organization of the genome
Student: Natalia Petukhova
Supervisors: Nikita Alexeev; Sergey Aganezov (Johns Hopkins University)
Chromothripsis is a mutational phenomenon representing a unique type of highly complex structural variation. Initially described in cancer genomes and also found in other disorders, chromothripsis involves massive genomic alterations arising in a single cellular event, characterized by the simultaneous shattering of chromosomes followed by random reassembly of the DNA fragments and ligation of the broken ends, ultimately resulting in newly formed, mosaic derivative chromosomes. The discovery of such catastrophic events has profoundly changed the understanding of the genesis and etiology of complex genomic rearrangements and has provided new insights into the cellular and molecular mechanisms of genomic instability and the role of genome maintenance pathways. Several nonexclusive mechanistic models have been proposed to explain the cause and high complexity of chromothripsis, but the molecular mechanism of this cellular catastrophe remains poorly understood, especially with respect to predicting it. The present work analyzes chromothripsis in light of spatial genome organization and addresses the following questions: do chromothripsis breakpoints in cancer show a spatial predisposition in the genome organization of normal tissue; how does the spatial location of chromothripsis breakpoints compare with that of structural variations (SVs) of non-chromothriptic origin; and does the whole chromothripsis cluster show greater spatial proximity within its region than other genomic loci without chromothriptic events?

Presentation_N. Petukhova (slides)
Estimating gene priorities in complex traits based on GWAS summary statistics
Student: Nikita Kolosov
Supervisor: Mykyta Artomov (Massachusetts General Hospital)
The vast majority of human phenotypes, including diseases, are complex traits. The involvement of multiple genes and biological pathways in such phenotypes, among other factors, results in a relatively small contribution from each associated genetic marker. Genotyping arrays provide an affordable tool for investigating the genetic basis of disease. Nevertheless, a major complication in understanding disease biology from GWAS alone is the frequent inability to directly identify the complete set of causal genes. [...] We developed Gene Prioritizer (GPrior), a novel gene prioritization method based on positive-unlabeled (PU) learning, which prioritizes disease-relevant genes given a matrix of gene-level features and sets of reliably causal genes. It is an ensemble of five PU bagging classifiers that finds the optimal combination of the predictions of the individual PU algorithms. We tested our approach on both simulated and experimental data and estimated gene priorities for several traits (schizophrenia, educational attainment, IBD, and coronary artery disease). GPrior delivers significantly better prediction quality than individual PU-learning algorithms, conventional ML approaches, and other gene prioritization tools used in the field. GPrior is not yet another fine-mapping approach; rather, it is a gene-level prioritization tool that uses hidden patterns of functional relatedness among disease-relevant genes. At the same time, GPrior is complementary to any fine-mapping approach and to GWAS post-processing. Altogether, GPrior fills an important and currently underdeveloped niche of methods for GWAS data post-processing, significantly improving the ability to pinpoint disease genes compared with existing solutions.
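GPrior is described above as an ensemble of PU bagging classifiers. The sketch below shows only the generic PU bagging idea and is not GPrior's implementation; a deliberately simple nearest-centroid scorer stands in for a real base classifier. In each round, a random subset of unlabeled items (genes) is treated as provisional negatives, the items left out of that subset (out-of-bag) are scored by how much closer they lie to the known positives than to the provisional negatives, and scores are averaged over rounds. The feature vectors, round count, and centroid scorer are all assumptions.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def _centroid(rows: Sequence[Vector]) -> List[float]:
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(
    positives: Sequence[Vector],
    unlabeled: Sequence[Vector],
    n_rounds: int = 100,
    bag_size: Optional[int] = None,
    seed: int = 0,
) -> List[float]:
    """Average out-of-bag score (0..1, higher = more positive-like) per unlabeled item."""
    if not positives or len(unlabeled) < 2:
        raise ValueError("need at least one positive and two unlabeled examples")
    rng = random.Random(seed)
    k = min(bag_size or len(positives), len(unlabeled) - 1)  # leave >= 1 item out-of-bag
    pos_centroid = _centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        bag = set(rng.sample(range(len(unlabeled)), k))  # provisional negatives
        neg_centroid = _centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in bag:
                continue  # score only items not used as negatives in this round
            d_pos = _sq_dist(x, pos_centroid)
            d_neg = _sq_dist(x, neg_centroid)
            totals[i] += d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.5
            counts[i] += 1
    return [t / c if c else math.nan for t, c in zip(totals, counts)]


# Hypothetical 2-D gene features: two known positives, six unlabeled genes.
positives = [[1.0, 1.0], [0.9, 1.1]]
unlabeled = [[1.0, 0.9], [-1.0, -1.0], [-0.9, -1.1], [-1.1, -0.9], [0.95, 1.05], [-1.0, -0.8]]
scores = pu_bagging_scores(positives, unlabeled, n_rounds=200, seed=1)
print([round(s, 2) for s in scores])
```

The out-of-bag rule matters: an unlabeled gene is never scored by a round that used it as a negative, so hidden positives among the unlabeled genes are not penalized for having been sampled.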

Presentation_N. Kolosov (slides)
Integration of RNA-sequencing data into phenotype search system GeneQuery
Student: Boris Shpak
Supervisors: Alexander Predeus (University of Liverpool); Maxim Artyomov (Washington University in St. Louis)
GeneQuery is a novel geneset-based phenotype search engine that can be applied across all publicly available microarray experiments, independent of their curation status. It uses Weighted Gene Co-expression Network Analysis (WGCNA), an unsupervised clustering approach, to identify groups of genes that are co-regulated across the samples of each study. Despite being the first search engine spanning virtually all published microarray studies for human, mouse, and rat, GeneQuery had an obvious limitation: it could not search RNA-seq data, which has become the method of choice for gene expression profiling over the last 10 years. This work therefore presents an update of GeneQuery that allows searching most published RNA-seq data. We also discuss experimental validation of some targets discovered using GeneQuery. In our earlier studies, GeneQuery revealed an unexpected connection between the transcriptional signatures of TREM2-deficient microglia and a portion of the aging-associated expression signature consisting of genes responsive to α/γ-tocopherol treatment of the mouse brain. In this work, we find additional evidence of a specific transcriptional signature of TREM2-dependent microglial inflammation that is upregulated in the aging murine brain and can be reversed by α/γ-tocopherol treatment. These results allowed us to rethink the previous design of validation experiments. The expression signature analysis presented in this thesis prompted experiments to assess the efficacy of administering α/γ-tocopherol to TREM2(–/–) microglia cell culture (a model of Alzheimer's disease exacerbated by TREM2 deficiency) for mitigating pyroptosis induced by damage-associated molecules.

Presentation_B. Shpak (slides)
Chromosome-scale genome assembly from long noisy reads using Hi-C data
Student: Anton Zamyatin
Supervisors: Pavel Avdeyev (George Washington University); Nikita Alexeev
New studies of genome rearrangements require chromosome-level assemblies. The contiguity of genome scaffolds allows a better understanding of how chromatin is organized inside the cell nucleus. The ability to sequence long repeat regions provides insights into the organization of heterochromatin and of large centromeric and telomeric regions. However, long-read sequencing alone will probably not achieve this level of genome contiguity, as the sequencer may be unable to read particular regions at all. In that case, good scaffolding is needed. With a reference genome this is straightforward, but without one it is more complicated, and an additional source of information is required. In the past, the best choice was mate-pair reads. Today, Hi-C provides a rich source of information about proximity within the genome. It is well suited to scaffolding but has issues with low-signal regions and ambiguous haplotype regions. After assembly and scaffolding, genome assemblies must be validated to detect misassemblies and misjoins. This thesis covers all of these stages of chromosome-scale genome assembly as carried out in two genome assembly projects: the Mosquitoes and Barnacles projects.

Presentation_A. Zamyatin (slides)
Construction of the GATK4-based pipeline for Russian Exome Project
Student: Mrinal Vashisth
Supervisor: Yury Barbitoff (Bioinformatics Institute)
The lack of a compendium of Russian genetic variants represents a major gap in the genetic map of the world. Such a compendium could greatly enrich our understanding of variation in global populations. The Genome Russia Project is unlikely to be completed soon, so for the time being, efforts are directed toward releasing a draft variant database based on a few hundred Russian exomes. A draft of the database has already been built with data analysis based on the Genome Analysis Toolkit (GATK3), but uniform reanalysis of the samples with newer tools (namely GATK4) is necessary. In this project, a variant analysis pipeline based on the GATK4 Best Practices was developed. The pipeline can be deployed on an HPC cluster in a containerized environment. The pipeline was used to reanalyze 1,276 exome samples. The resulting variant dataset was used to compute allele frequencies, which were compared with other data sources such as the Genome Aggregation Database (gnomAD). Furthermore, the prevalence of monogenic diseases in the Russian population was statistically analyzed based on known pathogenic variants. Finally, we set up a variant browser to make the data publicly available. This is the first step toward a gnomAD-like database of exome germline variants for the Russian population.
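Allele frequency at a site is the allele count divided by the number of called alleles (AC/AN), the convention behind the AC, AN, and AF fields in VCF files and in gnomAD. A minimal stdlib sketch over VCF-style genotype strings follows. It is an illustration, not the GATK4-based pipeline from the thesis, and it ignores genotype-quality filtering and sex-chromosome ploidy.

```python
from typing import Iterable, Tuple


def allele_frequency(genotypes: Iterable[str], alt_index: int = 1) -> Tuple[int, int, float]:
    """Compute (AC, AN, AF) for one alternate allele from VCF-style GT strings.

    AC: copies of the alternate allele `alt_index` among called alleles.
    AN: total number of called alleles (missing '.' alleles are excluded).
    AF: AC / AN, or NaN if no alleles were called.
    Accepts unphased ('/') and phased ('|') separators and any ploidy.
    """
    ac = an = 0
    target = str(alt_index)
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call does not count toward AN
            an += 1
            if allele == target:
                ac += 1
    af = ac / an if an else float("nan")
    return ac, an, af


# Hypothetical calls for one biallelic site in five samples.
calls = ["0/0", "0/1", "1|1", "./.", "0/1"]
print(allele_frequency(calls))  # (4, 8, 0.5)
```

Counting called alleles rather than samples keeps AF comparable between cohorts with different missingness, which matters when comparing against gnomAD.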

Presentation_M. Vashisth (slides)
Using RNA-sequencing data for diagnosing rare Mendelian diseases
Student: Maria Romanova
Supervisor: Alexey Sergushichev
Mutations causing Mendelian diseases are located within a single genetic locus and have low frequency but a large effect size. RNA sequencing analysis is one method for finding such mutations. It enables comparison of expression between an individual sample and control samples and can thus reveal expression outliers and allelic expression imbalance. Transcript-level information in RNA-seq data can also help discover novel splicing events. Among other functional tests, RNA sequencing is commonly used to validate coding changes that affect RNA expression and splicing, and it also allows variant calling. RNA sequencing can therefore serve both as a complementary method to confirm a diagnosis and as an independent method with a number of advantages. The main goal of this work was to create an automated, reproducible pipeline of the tools best suited for analyzing RNA-seq data, in order to obtain a prioritized list of candidate, or even causative, genes to aid in the diagnosis of rare Mendelian diseases.

Presentation_M. Romanova (slides)
Investigation of mutations associated with autism in a cohort of children according to exome sequencing
Student: Ekaterina Gibitova
Supervisor: Pavel Dobrynin
Autism spectrum disorder (ASD) comprises a group of neurodevelopmental disorders characterized by social deficits and stereotyped behavior. Strikingly, in most cases the etiology of ASD is unclear, although ASD is generally believed to have a strong genetic component. There is currently no consensus on which genes have sufficient evidence of an association with ASD. Estimates of the number of ASD-related genes vary widely between research and clinical sequencing teams, ranging from a few to a few hundred. The purpose of this project is to discover unique mutations associated with ASD in a cohort of 194 subjects.

Presentation_E. Gibitova (slides)
Evolution of CRISPR-Cas systems and their distribution across geographic locations
Student: Sedreh Nassirnia
Supervisors: Mikhail Rayko (SPbU), Alexander Tkachenko
CRISPR-Cas systems are adaptive immune systems present in the majority of archaea (about 90%) and in almost half of bacteria. CRISPR-Cas systems capture fragments of invading DNA (spacers) from sources such as viruses (bacteriophages, in the case of bacteria) or plasmids, and build a sequence-based array that guides the cleavage of viral mobile elements and of foreign DNA taken up by transformation, natural acquisition, or transduction; they can also target the cell's own chromosome or resident plasmids. Characterizing CRISPR-Cas systems and studying their evolution not only provides a better understanding of defense mechanisms in prokaryotes but also yields knowledge essential for genome editing.
CRISPR-Cas systems evolve rapidly, and additional horizontal gene transfer (HGT) events have produced different combinations of Cas proteins that give rise to multiple types of CRISPR-Cas systems. This makes it quite challenging to study all of this diversity from an evolutionary point of view. The aim of this project is to describe the diversity and distribution of different CRISPR-Cas system variants, based on their effector complexes (Cas proteins), across the phylogenetic tree.
We were able to identify different functional clusters of the Cas-related proteins. We showed that multiple clusters are present in major phyla, implying a high degree of HGT, and at the same time we found phyla associated with single clusters that may have evolved in isolation from bacteriophages.

Presentation_S. Nassirnia (slides)
Reconstruction and analysis of viral phylogenetic networks
Student: Daria Nemirich
Supervisor: Nikita Alexeev
Viral epidemics represent a significant threat to public health. In the last decade, at least seven viral outbreaks (COVID-19, Ebola, MERS-CoV, H1N1, H7N9, and others) have occurred, resulting in numerous human deaths. Preventing the spread of disease requires monitoring its current state. In recent years, next-generation sequencing has made it much easier to obtain comprehensive data for pathogen samples. As a result, it is now possible to obtain detailed and accurate information on the outbreak source, transmission chains, and viral population composition. However, despite the abundance of software created for these purposes, problems remain unresolved, such as the lack of an adequate system for detecting recombination events and the use of overly simplified simulations of viral populations. This work aimed to address these challenges by creating a simulation pipeline that covers all aspects of viral evolution within a single host, such as mutation, recombination, changes in haplotype fitness values, and population size. In addition, a probabilistic model for recombination events was developed.

Presentation_D. Nemirich (slides)
Differential selection in the rhizosphere microbial communities of wheat and rye
Student: Ksenia Maximova
Supervisor: Ilia Korvigo (Ksivalue; All-Russia Research Institute for Agricultural Microbiology)
An understanding of how microbial communities interact with plants under various environmental conditions might yield insights into macroecological processes. Since next-generation sequencing became available, many statistical methods have been adapted for ecological research to help identify microbial signatures (groups of taxa) associated with particular ecological patterns. Interactions between plants and microorganisms are readily apparent around plant roots, and evidence of long-range, plant-specific responses in the bulk soil is growing. However, this field is covered by an insufficient number of studies, mainly because of the diversity and complexity of plant-specific responses in soil communities. Multiple studies have underscored the need to evaluate host-microbiome interactions for effective crop rotation and the prevention of soil deterioration. In this regard, proper modelling of plant-microbe interactions is a crucial step toward the rational exploitation of the microbiota in agricultural management.

Presentation_K. Maximova (slides)