Course projects
During the second semester, each student works on a Systems Biology project, which may be further developed into a Master's thesis. The projects are supervised by leading experts from Russian and foreign scientific centers working in the field of bioinformatics.
Systems Biology projects of different years:
Spring 2020
Spring 2019
Projects, 2020
Pilot analysis of the inherited predisposition to lung cancer recurrence
Student: Valeria Rezapova
Supervisor: Mykyta Artomov (Massachusetts General Hospital)

Lung cancer is one of the most important medical and socioeconomic problems in most developed countries because of its leading position in cancer incidence and mortality. Recurrence occurs in 30% of lung cancer patients after radical therapy, and about 50% of patients with recurrence die within 2 years. Recurrence could be predicted using molecular and immunohistochemical methods and clinical observations of epithelial changes. Here, we analyzed pilot exome sequencing data from 3 groups of lung cancer patients: 10 samples with distant recurrence, 10 samples with locoregional recurrence and 10 controls with no recurrence. We describe optimal quality filtering and association study strategies and report two gene candidates with prior evidence of involvement in cancer progression: PRSS21 and KRT6A.
Melanoma somatic variant and copy number alterations data analysis
Student: Dmitrii Usoltsev
Supervisor: Mykyta Artomov (Massachusetts General Hospital)

Melanoma is the deadliest skin cancer. In 2020, it is estimated that there will be 100,350 new cases of melanoma and 6,850 deaths in the United States. Unlike other cancers, melanoma has a clear pattern of onset mediated by exposure to a carcinogen: UV light. Here we present an analysis of lab-generated melanoma tumors obtained by UV irradiation of a healthy melanocyte cell line and further grown in immunocompromised mice. We show that artificial melanoma obtained in this way is different from the majority of clinically obtained melanomas in patients. Specifically, the ultraviolet mutational signature discriminates between the melanoma origins, although natural and lab-grown tumors are very similar according to the overall gene signature. Using somatic variant and copy number variation calling approaches, three gene candidates were found: LCE3C, IFIH1 and MEIS2. Deletion of LCE3C is crucial for skin barrier function. IFIH1 is involved in tumor cell apoptosis. MEIS2 is important for cancer pathogenesis. All these genes significantly influence the survival of patients with melanoma.
Analysis of scRNA-seq data of mitral valve disease in K/BxN mouse model
Student: Maria Firulyova
Supervisor: Konstantin Zaitsev
The heart valves are structures composed of different cell populations, and both the mitral and tricuspid valves have specific properties at the morphological, physiological and other levels, including cell-type composition. Due to these differences, the valves do not contribute equally to the progression of some diseases, in particular mitral valve disease. However, the specific factors that predispose to the manifestation of mitral valve disease are still unclear. The project is focused on secondary analysis of single-cell RNA sequencing data from both valves under control and inflamed conditions. The analysis covered multiple topics, including annotation of the cell populations present in the data, changes in cell-type composition during disease progression in the mitral valve, and the role of endothelial-mesenchymal transition in the development of mitral valve disease.
Microarray analysis of gene expression induced by vitamin D3 and its analogs in a human THP-1 monocytic cell line
Student: Marina Terekhova
Supervisors: Ferdinand Molnar (Nazarbayev University), Alexey Sergushichev
It is well established that, besides regulating bone and calcium homeostasis, vitamin D is involved in a multitude of fundamental cellular processes, including the cell cycle, apoptosis and differentiation, and that it affects carcinogenesis, immune function, autoimmune diseases and cardiovascular disorders. Since the supraphysiological doses of 1,25(OH)2D3 that are necessary to obtain these non-classical effects result in hypercalcemia, a wide variety of analogs have been developed to minimize the calcemic side effects while preserving or augmenting the beneficial effects of 1,25(OH)2D3. The development of DNA microarray technologies has created the opportunity to investigate the effects of 1,25-dihydroxyvitamin D3 on the gene expression profile in various cell types. In this study, we aimed to examine changes in gene expression associated with stimulation by vitamin D3 and its analogs in THP-1 cells. THP-1 human monocytic cells were treated for 4 hours with 100 nM 1,25-dihydroxyvitamin D3, 100 nM Gemini or 100 nM TX527, and gene expression was examined using the Illumina HumanHT-12 V4.0 expression BeadChip. Limma-based gene prioritization identified a set of 224 genes with adjusted p-values < 0.05 for 1,25-dihydroxyvitamin D3 versus ethanol (113 up- and 111 downregulated genes), a set of 742 genes for Gemini versus ethanol (268 up- and 474 downregulated genes) and a set of 1166 genes for TX527 versus ethanol (429 up- and 737 downregulated genes). Differences in gene expression between stimulations with vitamin D preparations were generally implicit, and by further analysis of overlapping genes, using published data for THP-1 cells, 15 up- and 3 downregulated genes were selected as vitamin D key target genes. Most of these vitamin D-induced genes are related to immune system regulation. Overall, gene set enrichment analysis yielded very few significant results; however, overlap with vitamin D pathways and ChIP-Seq datasets was clearly evident.
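The abstract above selects genes by adjusted p-value but does not say which multiple-testing correction was applied. As a rough illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for microarray analyses. The function name and the example p-values are hypothetical and are not taken from the project.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here purely for illustration; the project
    abstract does not state its adjustment method.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

Genes whose adjusted value falls below the chosen threshold (0.05 in the abstract) would then be reported as differentially expressed.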
RNA-seq analysis of early and delayed guar (Cyamopsis tetragonoloba) varieties
Student: Elizaveta Grigorieva
Supervisor: Alexander Tkachenko
Guar (Cyamopsis tetragonoloba (L.) Taub.) is an annual legume crop widely cultivated in India and Pakistan. Guar beans are a source of guar gum, which is used in many industries, including food, cosmetics, and oil. However, the lack of information about the guar genome makes it difficult to breed this crop across the world and especially in Russia. The main challenge in guar breeding is day length, because guar is a short-day plant and most guar varieties cannot start to flower under Russian weather conditions. In this study, RNA-Seq technology and metabolome investigation were used with the goal of identifying genes responsible for photoperiodic sensitivity through an integrative omics approach for 6 early and 9 delayed guar varieties.
SNP calling in transcriptomic and genomic data from guar, Cyamopsis tetragonoloba
Student: Aleksandar Beatovich
Supervisor: Alexander Tkachenko

The guar plant, Cyamopsis tetragonoloba, is a leguminous annual herbaceous plant whose seeds contain galactomannan, a polysaccharide with multiple industrial applications. For this reason, identifying SNPs would provide useful information to plant breeders and allow enhanced exploitation of this plant. In this article, SNPs were called from guar RNA-seq data and a novel reference genome was assembled. The 20800 SNPs reported by the KisSplice pipeline are consistent with previously reported numbers of SNPs from guar RNA-seq data. The high number (102778) of SNPs reported by the GATK-rna pipelines is at odds with the previously reported number of SNPs in RNA-seq data, indicating possible false positives. The novel guar genome assembly, although an improvement on the previous draft assembly, needs to have its level of fragmentation further reduced in order to be used as a reference in the GATK cohort genotyping pipeline, due to a runtime issue in one of the steps of the pipeline.
Re-annotation of the Intoshia linei genome (Orthonectida)
Student: Elizaveta Skalon
Supervisors: George Slyusarev, Natalya Bondarenko (St. Petersburg State University)
Intoshia linei is an invertebrate organism belonging to the phylum Orthonectida. Orthonectids are parasites of various marine animals. For a long time, orthonectids were considered to be a basal group due to their extremely simplified nervous and muscle systems. However, recent molecular studies place them among highly specified annelids, the segmented worms. Because of the many exciting aspects of orthonectid biology, they are poised to address many questions in integrative organismal biology. Discovering orthonectid genomes is an essential step towards revealing the global mechanism behind secondary reduction in parasitic organisms. Our main goal was to improve the accuracy of the existing genome annotation of Intoshia linei in light of new RNA-sequencing data. We sequenced and assembled an additional Intoshia linei adult transcriptome and re-annotated the genome with the MAKER pipeline. The annotation assessment results demonstrate that the updated annotation contains more structural elements and is more consistent with our RNA-seq data.
The role of the microbiome in Parkinson's Disease
Student: Mary Futey
Supervisor: Dmitry Rodionov (Sanford-Burnham-Prebys Medical Discovery Institute)
Several studies analyzing the role of the microbiome in Parkinson's Disease (PD) indicate that gut microbiome dysbiosis co-occurs with PD. However, there is still a lack of consensus in certain areas, such as the prevalence of potentially protective or detrimental bacterial families and the change in alpha diversity in PD. This report compares the microbial community composition among and between the gut microbiomes of 89 patients with confirmed PD and 66 healthy controls, utilizing 16S rRNA marker gene sequencing. PD samples had higher alpha diversity and decreased levels of butyrate-producing families. Future studies should account for potential confounders, such as diet, medications and demographics, and include functional prediction of the microbiomes.
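The abstract above reports a difference in alpha diversity without naming the metric. As an illustration only, the sketch below assumes the Shannon index and observed richness, two commonly used alpha diversity measures. The function names and the per-family read counts are hypothetical and are not taken from the project.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def observed_richness(counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


# Hypothetical read counts per bacterial family for two made-up samples.
even_sample = [25, 25, 25, 25]   # reads spread evenly over four families
skewed_sample = [97, 1, 1, 1]    # one family dominates

even_h = shannon_index(even_sample)
skewed_h = shannon_index(skewed_sample)
```

Both samples contain four families, so their richness is the same, but the evenly distributed sample has the higher Shannon index because the index also reflects evenness.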
Lipid network analysis
Student: Mariia Emelianova
Supervisor: Alexey Sergushichev
This work is dedicated to metabolic network analysis, which has the great advantage of making it possible to identify new metabolic pathways. For now, however, it is still not possible to fully integrate lipidomic data with metabolic networks. This project is dedicated to creating a lipid network for lipid data and testing the concept of lipid network analysis.
Identification of novel miRNAs in Flax Stem
Student: Anzhelika Dun
Supervisor: Alexander Tkachenko

Linum usitatissimum, also known as flax, is one of the most widespread cultivated plants. Linen, the most popular fabric after cotton, is made from flax stem fibres, and its production represents a big part of the textile industry. The unique properties of the flax stem are related to intrusive elongation, the mechanisms of which are poorly studied. Currently, 124 miRNAs of L. usitatissimum are known, forming 23 families, and almost half of them participate in intrusive elongation. Therefore, miRNAs are key players in flax development and can be used to improve fibre properties. We identified 150 novel miRNAs in 10 RNA-seq samples using the highly accurate tool miRDeep2. Subsequent filtering of duplicates and low-quality candidates left the 17 best-predicted miRNAs. After validation, our results could expand the set of known L. usitatissimum miRNAs and help clarify the mechanisms of intrusive elongation.
Spatial reconstruction of Arabidopsis gene expression using NovoSpaRc
Student: Natalia Baymacheva
Supervisor: Alexander Tkachenko

Spatial information about gene expression is important for our understanding of how organisms function. The popular scRNA-seq method allows analysis of gene expression at the single-cell level, but most technologies lose spatial information during sample preparation. A recent tool for gene cartography from single-cell expression data is able to regain the spatial information from expression data and some prior knowledge. This tool has been tested on several animal models with good results, but the same methodology may not apply to plant single-cell data because of how the root stem cell niche functions.
Comparison of genome assemblies for Nanger dama
Student: Azat Mingaleev
Supervisor: Pavel Dobrynin
In this work, we analyzed three different genome assemblies of the Nanger dama gazelle produced using sequencing techniques such as Hi-C, 10X, and Bionano. The analysis identified problems with the Bionano assembly, so further work was carried out with the other two assemblies. In the end, it was found that for Hi-C and 10X the number of unaligned fragments was slightly less than 6%. Due to technical problems with halTools and Smash++, the search for break points and rearrangements is planned to continue in future work.
A Snakemake pipeline for identification of novel parts of the genome in bacteria
Student: Mikhail Lebedev
Supervisor: Sergey Nurk (National Human Genome Research Institute, NIH)

During this project, we developed an easy-to-use pipeline for obtaining novel parts of a sequenced bacterial genome, a task for which results were previously not this easy to get. Moreover, it comprehensively visualizes problems with possible uncorrected insertion and deletion (indel) errors in a given bacterial genome sequenced with Oxford Nanopore. The novel parts of bacterial genomes returned by the pipeline can be used to analyze and find new genes.
Projects, 2019
Chromothripsis: bioinformatic analysis of complex genomic rearrangements
Student: Natalia Petukhova
Supervisor: Sergey Aganezov (Johns Hopkins University)

Initially described in cancer genomes, and subsequently observed in constitutional disorders, chromothripsis constitutes a new class of massive genomic alterations characterized by the simultaneous shattering of chromosome segments followed by random reassembly of the DNA fragments during a single cellular event, to form complex derivative chromosomes.
The purpose of this study is to analyze the genome of a patient with acute leukemia who presumably has chromothripsis underlying complex genomic rearrangements. The objective of this work is a bioinformatic analysis of the whole-genome DNA sequence using different bioinformatic tools to search for genomic aberrations, and a comparison/confirmation of the obtained results against cytogenetic study, which still remains a "gold standard" of diagnostic expertise in medical practice.
Producing a chromosome-scale mosquito genome assembly from Oxford Nanopore reads
Student: Anton Zamyatin
Supervisors: Pavel Avdeyev (George Washington University), Nikita Alexeev

The main task of this project is to produce chromosome-level genome assemblies for An. coluzzii and An. arabiensis using different sequencing technologies. Long reads from Oxford Nanopore sequencing technology are used for the draft assembly, then short reads from Illumina are used for polishing, and data from Hi-C sequencing are used for mapping scaffolds. The second task is to develop a working pipeline for eukaryotic genome assembly using these types of input data and the state-of-the-art computational tools available for these purposes. By the end of the project, there should be two chromosome-level genome assemblies of the two mosquito species An. coluzzii and An. arabiensis that can be used in further population genomics studies and other chromosome-level analyses.
Estimating gene priorities in complex traits based on GWAS summary statistics
Student: Nikita Kolosov
Supervisor: Mykyta Artomov (Massachusetts General Hospital)

A Genome-Wide Association Study, or GWAS, is a powerful tool for investigating the genetic origin of complex traits. Such traits are called complex since their development is affected by multiple genes and external factors. In this review, we will predominantly focus on complex traits such as polygenic diseases. Such diseases are caused by DNA alterations in several genes at once. As a result, it is hard to identify their biological mechanisms using family-based studies. GWAS allows one to get closer to solving this problem by looking at the prevalence of disease alleles in cohorts of cases and controls. In this article, we present a new approach that yields probabilities for genes to be causal based on GWAS summary statistics using machine learning (ML) classification algorithms.
Signaling responses to personalized exercise therapy in skeletal muscle in heart failure patients
Student: Oksana Ivanova
Supervisors: Renata Dmitrieva (Almazov Centre), Alexey Sergushichev
Heart failure (HF) is one of the most widespread disorders in the world. Signs and symptoms of heart failure commonly include shortness of breath, excessive tiredness, and leg swelling, as well as a limited ability to exercise. Moreover, HF often induces skeletal muscle myopathies like cachexia and myodystrophy. However, there is still no common treatment for muscle disorders in HF patients that would improve their quality of life and increase their ability to exercise. This study aims to describe the effect of physical exercise therapy in heart failure patients in order to define possible targets for pharmacological treatment of muscle wasting.
16S rRNA analysis of the gut microbiota in obesity and healthy state
Student: Ksenia Maksimova
Supervisor: Yulia Kondratenko (St. Petersburg State University)

In the last decade, the study of the human gut microbiome has been gaining popularity. There is much research on the impact of microorganisms on the host, the role of the gut microbiota in diseases, and the prospect of treating different diseases by repairing the gut microbiome. Recent studies have already revealed the main species living in our body, and the main interest in human microbiota research so far is to link changes inside such a community with a particular human phenotype. One of the highest-profile demonstrations of the microbiota's influence on human health is the microbiome in obesity. In the present study, the gut microbiota community of obese phenotypes will be characterized and compared with that of healthy ones.
Multi chain effect analysis from paired antibody repertoire data
Student: Sedreh Nassirnia
Supervisor: Maria Chernigovskaya
B lymphocytes (B-cells) are one of the most important parts of the adaptive immune system. They are involved in the immune response by secreting antibodies and producing B-cell receptors (BCRs) through multiple mechanisms for binding to a specific antigen. There are some checkpoints to ensure that only antibodies with two identical heavy chains (IGH) and two identical light chains (IGL) participate in the immune response against antigens. But sometimes antibodies are made up of dual light chains, which can show auto-reactivity in the immune system and lead to the generation of autoimmune disease in human and mouse samples. In this work, four human and three mouse 10x Genomics VDJ paired datasets were analyzed to investigate cells with a dual light chain. The IgBlast tool was used to filter out suspicious contigs from the datasets and for further analyses.
Exome analysis of samples using GATK4 pipeline and database development
Student: Mrinal Vashisth
Supervisor: Yury Barbitov (Bioinformatics Institute)

Expanding GeneQuery transcriptional database into RNA-Seq space
Student: Boris Shpak
Supervisors: Alexander Predeus (University of Liverpool; Bioinformatics Institute), Maxim Artyomov (Washington University in St. Louis)
InCHIANTI dataset analysis
Student: Maria Romanova
Supervisor: Maxim Artyomov (Washington University in St. Louis)
A comparative analysis of viral outbreak networks reconstruction methods
Student: Daria Nemirich
Supervisor: Nikita Alexeev

FSHD diagnosis through Nanopore sequencing
Student: Ekaterina Gibitova
Supervisor: Pavel Avdeyev (George Washington University)