Model optimisation and comparison in systems biology
Systems biology relies on the ability to choose rationally between alternative models through a combination of experimental and theoretical arguments (Toni and Stumpf, 2010). This project will develop tools to optimise and compare stochastic systems biology models using the nested sampling algorithm (Skilling, 2006), which computes the Bayesian evidence.
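Nested sampling replaces the multi-dimensional evidence integral Z = ∫ L(θ) π(θ) dθ with a one-dimensional sum over nested shells of shrinking prior volume. The minimal Python sketch below illustrates the idea under several stated assumptions: a toy one-parameter problem (uniform prior on [-5, 5], standard normal likelihood), prior volume shrinking deterministically as X_i = exp(-i/N) for N live points, and replacement points drawn by plain rejection sampling from the prior, which is only practical for toy problems. The function and parameter names are hypothetical and are not part of Dizzy-Beat or of the planned R and Matlab releases.

```python
import math
import random


def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(log_likelihood, sample_prior, n_live=200, n_iter=1200, seed=0):
    """Hypothetical minimal nested sampler returning an estimate of log Z.

    Assumes prior volume shrinks as X_i = exp(-i / n_live) and replaces points
    by rejection sampling from the prior (toy problems only).
    """
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]
    log_z = -math.inf
    log_x_prev = 0.0  # log of enclosed prior volume, X_0 = 1
    log_shrink = math.log1p(-math.exp(-1.0 / n_live))  # log(1 - exp(-1/N))
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda k: live_logl[k])
        logl_min = live_logl[worst]
        # Shell weight w_i = X_{i-1} - X_i = X_{i-1} * (1 - exp(-1/N))
        log_z = log_add_exp(log_z, log_x_prev + log_shrink + logl_min)
        log_x_prev = -i / n_live
        # Replace the worst point with a prior draw satisfying L > L_min
        while True:
            x = sample_prior(rng)
            logl = log_likelihood(x)
            if logl > logl_min:
                break
        live[worst], live_logl[worst] = x, logl
    # Remaining live points share the final prior volume X_final equally
    m = max(live_logl)
    log_mean_l = m + math.log(sum(math.exp(l - m) for l in live_logl) / n_live)
    return log_add_exp(log_z, log_x_prev + log_mean_l)


def toy_log_likelihood(theta):
    # Standard normal density in theta (illustrative likelihood)
    return -0.5 * theta * theta - 0.5 * math.log(2 * math.pi)


def toy_prior(rng):
    # Uniform prior on [-5, 5] (illustrative prior)
    return rng.uniform(-5.0, 5.0)


log_z = nested_sampling(toy_log_likelihood, toy_prior)
print(f"estimated log evidence = {log_z:.3f}")
```

For this toy problem the exact evidence is very close to 0.1 (the uniform prior density times a likelihood whose integral over [-5, 5] is almost 1), so the estimate should land near log(0.1) ≈ -2.30, with a sampling error of order sqrt(H/N), where H is the information gained from prior to posterior. Realistic stochastic models would need a much more efficient constrained sampler than rejection from the prior.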
This project is a collaboration between:
- Stuart Aitken, School of Informatics and CSBE, University of Edinburgh
- Ozgur Akman, Centre for Systems, Dynamics and Control, University of Exeter
- Andrew Millar, School of Biological Sciences and CSBE, University of Edinburgh
Model selection answers the question: given two or more models and one or more data sets, which model structure (topology) explains the data best? Calculating the Bayesian evidence for each model is a quantitative approach to answering this question. The evidence is obtained by integrating the likelihood over the prior distribution of parameter values, rather than from a point estimate of goodness of fit at one specific combination of parameter values. Such a point estimate is typically only guaranteed to be locally optimal, which limits the comparisons that can be made between models.
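A small worked example may make the contrast between a point estimate and the evidence concrete. The Python sketch below compares two hypothetical models for five made-up observations: model A fixes the mean at 0, while model B has a free mean μ with an assumed uniform prior on [-5, 5]; both assume unit noise variance. Because model B contains model A as a special case, its maximised likelihood is always at least as high, but its evidence, computed here by trapezoidal integration of the likelihood over the prior, is penalised for spreading prior mass over parameter values the data do not support. The data, priors and function names are illustrative only.

```python
import math

DATA = [0.3, -0.5, 0.1, 0.4, -0.2]  # illustrative observations, not real data


def log_likelihood(mu, data=DATA, sigma=1.0):
    """Gaussian log-likelihood of the data for mean mu and fixed sigma."""
    n = len(data)
    rss = sum((x - mu) ** 2 for x in data)
    return -0.5 * rss / sigma**2 - n * math.log(sigma * math.sqrt(2 * math.pi))


def log_evidence_uniform_prior(lo, hi, n_grid=2001):
    """log of the integral of L(mu) * p(mu), uniform prior on [lo, hi], via the trapezoid rule."""
    step = (hi - lo) / (n_grid - 1)
    logls = [log_likelihood(lo + k * step) for k in range(n_grid)]
    m = max(logls)  # rescale before exponentiating to avoid underflow
    scaled = [math.exp(l - m) for l in logls]
    integral = step * (sum(scaled) - 0.5 * (scaled[0] + scaled[-1]))
    return m + math.log(integral) - math.log(hi - lo)


# Model A has no free parameters, so its evidence is just its likelihood at mu = 0
log_z_a = log_likelihood(0.0)

# Model B: best-fit (maximum-likelihood) point versus evidence
mu_hat = sum(DATA) / len(DATA)
max_logl_b = log_likelihood(mu_hat)
log_z_b = log_evidence_uniform_prior(-5.0, 5.0)

print(f"max log L:    A = {log_z_a:.3f}, B = {max_logl_b:.3f}")
print(f"log evidence: A = {log_z_a:.3f}, B = {log_z_b:.3f}")
print(f"log Bayes factor (A vs B) = {log_z_a - log_z_b:.3f}")
```

Model B fits marginally better at its best point, yet the log Bayes factor comes out at about 2.19 in favour of the simpler model A, which is the kind of comparison a single locally optimal fit cannot provide.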
The model comparison functions will be delivered to users by incorporating them into a new version of the popular stochastic simulation tool Dizzy (Ramsey et al., 2005), named "Dizzy-Beat" (Bayesian evidence analysis tools).
R and Matlab implementations of the key algorithms will also be released.
All code will be made available as open source via this SourceForge site:
https://sourceforge.net/p/bayesevidence/
A number of use cases will provide data and models with which to validate the computational methods. The most important of these comes from the Millar Lab, where one-, two- and three-loop models of circadian rhythms in Arabidopsis thaliana are being fitted to data obtained under varying experimental conditions. The majority of clock models developed thus far, including those for Arabidopsis, are sets of coupled deterministic differential equations. However, stochastic versions of these models are becoming increasingly popular tools in computational circadian studies. Such models are considered to reflect more accurately the cellular environment, in which the relatively small numbers of mRNAs and proteins comprising the clock give rise to significant stochastic fluctuations in expression levels; considering stochastic effects is thus necessary to quantify fully the molecular mechanisms controlling circadian timing.
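To see why small copy numbers matter, consider the simplest stochastic gene-expression model: mRNA produced at a constant rate k and degraded at rate γ per molecule. At stationarity the copy number is Poisson-distributed with mean k/γ, so the relative fluctuation (standard deviation divided by mean) is 1/sqrt(k/γ): about 32% at a mean of 10 molecules, but only about 1% at a mean of 10,000. The Python sketch below simulates this model with Gillespie's direct method, a standard exact stochastic simulation algorithm; the rate values and function names are assumptions for illustration and are not taken from the clock models described above.

```python
import math
import random


def gillespie_birth_death(k, gamma, x0, t_end, seed=0):
    """Simulate 0 -> mRNA (rate k) and mRNA -> 0 (rate gamma * x) by Gillespie's direct method.

    Returns the jump times and the copy number held from each jump time onwards.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [0.0], [x0]
    while True:
        a_birth = k
        a_death = gamma * x
        a_total = a_birth + a_death  # always > 0 because k > 0
        t += rng.expovariate(a_total)  # exponential waiting time to the next reaction
        if t >= t_end:
            break
        # Pick the reaction that fires with probability proportional to its propensity
        x += 1 if rng.random() * a_total < a_birth else -1
        times.append(t)
        counts.append(x)
    return times, counts


def time_weighted_moments(times, counts, t_end):
    """Time-averaged mean and variance of a piecewise-constant trajectory on [0, t_end]."""
    edges = times[1:] + [t_end]
    durations = [b - a for a, b in zip(times, edges)]
    mean = sum(c * d for c, d in zip(counts, durations)) / t_end
    var = sum((c - mean) ** 2 * d for c, d in zip(counts, durations)) / t_end
    return mean, var


T_END = 2000.0
times, counts = gillespie_birth_death(k=10.0, gamma=1.0, x0=10, t_end=T_END)
mean, var = time_weighted_moments(times, counts, T_END)
print(f"mean = {mean:.2f}, variance/mean = {var / mean:.2f}, CV = {math.sqrt(var) / mean:.2f}")
```

With k/γ = 10 the time-averaged mean should be close to 10 and the variance-to-mean ratio close to 1, as expected for a Poisson distribution, giving a coefficient of variation of roughly 0.3. A deterministic rate equation for the same model simply settles at 10 and shows none of this variability.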
Moreover, by being more representative of laboratory conditions, stochastic models can generate predictions that lie outside the scope of purely deterministic ones. For example, a recent stochastic model of the reduced Arabidopsis clock found in the unicellular alga Ostreococcus tauri gave insights into the optimal molecular species to use as an experimental phase marker in this system, as well as the minimum cell population size required to determine whether free-running oscillations occur at the single-cell level (Akman et al., 2010). Taken together, these considerations indicate that stochastic models are key to constructing viable likelihood functions when using Bayesian approaches to infer parameter values.
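One way to see why a stochastic model yields a likelihood directly is that it predicts a whole distribution of observations rather than a single value. In the simplest stochastic gene-expression model (constant production rate k, first-order degradation rate γ), the stationary mRNA copy number in a single cell is Poisson-distributed with mean k/γ, so single-cell counts have a likelihood that follows from the model itself, whereas a deterministic model predicts one steady-state value and needs a separately chosen noise model. The Python sketch below evaluates that Poisson log-likelihood for some made-up single-cell counts; the data and function name are hypothetical, and models with feedback, such as circadian clocks, rarely admit a closed-form likelihood like this one.

```python
import math

COUNTS = [8, 12, 9, 11, 10]  # hypothetical single-cell mRNA counts


def poisson_log_likelihood(rate_ratio, counts=COUNTS):
    """log L for i.i.d. counts under a stationary Poisson(k / gamma) distribution.

    Only the ratio k / gamma is identifiable from stationary snapshots like these.
    """
    lam = rate_ratio
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1) for n in counts)


# Scan candidate values of k / gamma; the Poisson likelihood peaks at the sample mean
for ratio in (6.0, 8.0, 10.0, 12.0, 14.0):
    print(f"k/gamma = {ratio:5.1f}  log L = {poisson_log_likelihood(ratio):.3f}")
```

The sample mean of these counts is 10, and the scan shows the log-likelihood falling away on either side of that value, which is the quantity a Bayesian analysis would combine with a prior on k/γ.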