Bioinformatics of carbonic anhydrases

This research comprises bioinformatic studies of carbonic anhydrases (CAs), ubiquitous enzymes found in all kingdoms of life, with an emphasis on CAs in animals. The ultimate goal is to better understand how and why CA isoforms, and CAs in different species, differ from each other. This would help in understanding drug specificity and in designing novel, improved forms of CAs. An intermediate goal is to discover the most likely evolutionary histories of CAs.
The project is run by Martti Tolvanen in collaboration with the group of Seppo Parkkila (University of Tampere).

Conserved domain and evolution of secreted phospholipases A(2)

Secreted phospholipases A(2) (sPLA(2)s) are lipolytic enzymes present in organisms ranging from prokaryotes to eukaryotes, but their origin and emergence are poorly understood. We studied the conserved domains of sPLA(2)s and proposed a model for their evolution.

Protein hydropathy predictions

We investigated the prediction accuracy of 56 hydropathy scales by correlating predicted values with the accessible surface area in known protein structures. Results for different amino acids vary greatly within each scale. We also investigated prediction accuracies of amino acids separately in secondary structural elements and in protein fold families.
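The comparison described above can be illustrated with a minimal sketch: look up a hydropathy value for each residue and compute the Pearson correlation with per-residue accessible surface area (ASA). The Kyte-Doolittle scale is used only as a familiar example (whether it was among the 56 scales evaluated is an assumption), the ASA values are invented toy numbers, and the function names `pearson` and `scale_vs_asa` are hypothetical. The study's actual protocol may differ.

```python
from math import sqrt

# Kyte-Doolittle hydropathy scale, used here purely as an example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def scale_vs_asa(sequence, asa, scale=KYTE_DOOLITTLE):
    """Correlate per-residue scale values with per-residue ASA."""
    hydropathy = [scale[residue] for residue in sequence]
    return pearson(hydropathy, asa)


# Toy data (invented for illustration): hydrophobic residues are buried,
# charged residues are exposed.
toy_sequence = "ILVKDE"
toy_asa = [5.0, 10.0, 8.0, 150.0, 120.0, 130.0]
r = scale_vs_asa(toy_sequence, toy_asa)
print(f"correlation between hydropathy and ASA: {r:.2f}")
```

With real data, the ASA values would come from solved structures, and the correlation could be computed separately per amino acid, per secondary structural element, or per fold family by filtering the residue list before calling `pearson`.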

Gene expression analysis based on the theory of phase synchronization

Using phase synchronization, it is possible to detect biological associations between gene pairs with cell cycle-specific expression profiles. Phase-synchronization clustering is able to detect biologically associated gene pairs that have linearly correlated (simultaneous and inverted) as well as time-delayed expression profiles.
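As a rough illustration of why a phase-based measure captures simultaneous, inverted, and time-delayed profiles alike, the sketch below computes a generic phase-locking value: the instantaneous phase of each profile is taken from its analytic signal, and the measure is high whenever the phase difference stays constant over time, whatever that constant is. This is a standard textbook index and not necessarily the measure or clustering procedure used in the project. The function names and the toy profiles are hypothetical, and the naive DFT is written out only to keep the example dependency-free (in practice `scipy.signal.hilbert` provides the analytic signal).

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real series via a naive DFT (fine for short series)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    spectrum = [
        sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1  # keep DC and Nyquist components unchanged
        elif k < (n + 1) // 2:
            gain = 2  # double the positive frequencies
        else:
            gain = 0  # discard the negative frequencies
        spectrum[k] *= gain
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def phase_locking_value(x, y):
    """Magnitude of the mean unit vector of the phase difference, in [0, 1]."""
    phases_x = [cmath.phase(z) for z in analytic_signal(x)]
    phases_y = [cmath.phase(z) for z in analytic_signal(y)]
    total = sum(cmath.exp(1j * (px - py)) for px, py in zip(phases_x, phases_y))
    return abs(total) / len(phases_x)


# Toy expression profiles (invented): 48 time points, 24-point cycle.
n_points = 48
base = [math.sin(2 * math.pi * t / 24) for t in range(n_points)]
inverted = [-v for v in base]
delayed = [math.sin(2 * math.pi * (t - 5) / 24) for t in range(n_points)]
other_period = [math.sin(2 * math.pi * t / 16) for t in range(n_points)]

plv_inverted = phase_locking_value(base, inverted)
plv_delayed = phase_locking_value(base, delayed)
plv_other = phase_locking_value(base, other_period)
print(plv_inverted, plv_delayed, plv_other)
```

In this toy setup the inverted and delayed profiles score close to 1 against the base profile, while the profile with a different period scores close to 0. A plain linear correlation would weaken for the delayed profile and, at a quarter-cycle delay, fall to zero.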


Tools for the submission of mutations to databases and maintenance of locus-specific mutation databases

Advanced, integrated computer systems are needed to store and organize the growing volume of mutation information.

Turku BioNLP Group

The main focus of the BioNLP group is the development of text mining algorithms and their use in knowledge discovery in biology and medicine. The flagship resources developed by the group are the TEES text mining system, which has won several text mining competitions, and the EVEX resource, which comprises detailed information about genes, proteins, and their mutual relationships mined from the entire publicly available biomedical literature. The BioNLP group has a number of collaborative projects with universities worldwide.

Research utilizing the Auria biobank data

The Auria biobank is a project to centralize the storage of patient samples and related information from Turku area hospitals and to provide the means for scientists to use this large dataset for research purposes. The IT department is involved in the Auria project, and we will use our expertise in machine learning and large-scale data mining to apply the biobank data to different research questions.