Overview

The production of high-quality biological data has increased tremendously in recent years. The complete genomic DNA sequences of several model organisms, as well as the human genome, are now available. We also see the emergence of new sources of data, such as mass spectrometry, 2D gel electrophoresis, yeast two-hybrid systems, ChIP-chip, ChIP-Seq, and gene expression data from DNA microarray experiments. Today's challenges are therefore to model the higher levels of complexity of cellular processes and to integrate heterogeneous sources of data. Bioinformatics is a new field of research that applies theories from mathematics and computer science to organize, model, and help understand fundamental biological and biomedical problems.

Research Activities

New high-throughput projects and techniques have shown that a significant fraction of the genome of higher organisms is represented in primary transcripts. Furthermore, comparative sequence analysis has revealed that many regions are evolutionarily conserved, which raises the question: what is all that non-coding RNA used for? The proposed research program aims to develop bioinformatics tools to assist in the identification and annotation of functional RNA elements.

In recent years, our group has developed tools for the simultaneous alignment and structure prediction of RNA sequences, as well as for the discovery of structure motifs using suffix array algorithms. With our life science collaborators, we studied the IRES motifs in the UTRs of mammalian genes, explored the landscape of RNA secondary structures in subviral RNA pathogens, and developed computational approaches to investigate trans-splicing activity in the mitochondrial genome of diplonemids.

The new research program described here extends these approaches and comprises two themes:

Identification of RNA-RNA interaction motifs: discoveries in the mitochondrial genome of Euglenozoa;
Breaking the genome's regulatory code: a logical and relational learning approach.

RNA elements play critical roles in the cell, and dysregulation of the associated processes is often linked to disease. Progress in understanding them will therefore have a direct impact on human health, on agriculture, and on our understanding of cellular biology in general.

Facilities

We share a powerful computing infrastructure with the other research groups at SITE. This includes an IBM p595 multi-user, multi-processor (64 cores) server with large central memory (128 GB) and large disk capacity (2 TB). We also have a Sun V20z server (two AMD Opteron processors, 8 GB), a small Linux server, a Mac mini server, and several iMacs (21" and 27").

Current Projects

Some of the projects we are currently working on are:

ModuleInducer: a user-friendly environment to analyze ChIP-Seq data using inductive logic programming
mPmS: an efficient pattern matcher for multiple-component, multiple-sequence RNA patterns

Funding

The activities of this research group are funded in part by:

NSERC Discovery Grant
CFI (New Opportunities, together with Ali Miri and Lucia Moura)