Team:TU Darmstadt/Project/Bio/Modeling/sec1
Contents
Computational Design of Riboswitches
Abstract
Even usage of already proposed riboswitches in individual projects often require to change nucleobases in RNA or DNA sequences without altering the structure formed by the original sequence. However, minor base interchanges in the sequence may result in completely different structures that would render the intended riboswitch unusable. We used state-of-the-art computational structure prediction algorithms to implement routines that automate the process of base pair interchanges, but preserve the original structure to ensure functionality of the riboswitch. Furthermore, we were able to develop a genetic algorithm that is able to completely design a new riboswitch. Those algorithms are provided in form of a webservice and may be used freely by the community.
Introduction
Biomolecular structure prediction is an essential part in the research of many biological and chemical areas. In the case of riboswitch design, we are especially intereseted in the RNA structure prediction problem. Due to the high conservation of base-pairings in RNA or DNA sequences, most computational methods for this problem direct their efforts to predicting the secondary structure of such sequences.
Goal
Our goal within this subproject is to use secondary structure prediction algorithms of RNA sequences to autonomously search for optimal base pair interchanges that preserve a specific secondary structure or obey conditions on the structure. Regarding riboswitches in particular, we would like to develop programs that can computationally design cis-repressing RNA sequences (crRNAs) that lock specific regions of the sequence and transacting RNA sequences (taRNAs) that unlock them again. Resulting riboswitches may be used by our safety group in the iGEM 2015 project and support the synthetic biology community by facilitating the handcrafted design of such riboswitches.
Results
The Algorithm
We managed to develop a genetic algorithm based on a particle filter that is able to design crRNA and taRNA sequence pairs acting as riboswitch to lock and unlock specific regions. Our algorithm follows three general steps depicted in Figure 1.
- A seed switch $s_{seed}=(crRNA, taRNA)$ containing initial sequences for the crRNA and taRNA
- A set of fixed positions $\mathcal{F}$ within the crRNA and taRNA that are not allowed to mutate
- A set of desired switch positions $\mathcal{L}$ within the crRNA sequence that should be locked by the crRNA and unlocked again by the taRNA
- A number $k$ of desired switch candidates to return and a population size $m\geq k$
- Initialize a particle population $S=\{s_{seed},s_{seed},\dots,s_{seed}\}$ containing $m$ copies of the seed switch
- Do until convergence:
- Mutation step: For each particle $s\in S$, mutate the switch by a random nucleo base:
- Uniformly draw a random position $x\not\in\mathcal{F}$ in $s$
- Uniformly draw a random nucleo base $b\in\{A, C, G, U\}$
- Replace the nucleo base at sequence position $x$ with $b$
- Evaluation step: Generate a probability distribution $p(s)$ over $S$ follows:
- Fold all crRNA sequences in $S$ and predict their secondary structures $\mathcal{S}_{cr}^1,\mathcal{S}_{cr}^2,\dots,\mathcal{S}_{cr}^m$
- Cofold all crRNA and taRNA sequences in $S$ and predict their secondary structures $\mathcal{S}_{ta}^1,\mathcal{S}_{ta}^2,\dots,\mathcal{S}_{ta}^m$
- Assess the quality of each riboswitch with the formula $$ \begin{aligned}f(s_i)=&c_1 \frac{\sum_{l\in\mathcal{L}} \phi(l, \mathcal{S}_{cr}^i)}{|\mathcal{L}|} + c_2 \frac{\sum_{l\in\mathcal{L}} (1-\phi(l, \mathcal{S}_{ta}^i))}{|\mathcal{L}|} \\ &- c_3 MFE(\mathcal{S}_{cr}^i) - c_4 MFE(\mathcal{S}_{ta}^i),\end{aligned} $$ where $c_1, c_2, c_3, c_4\in [0, 1]$ are weighting coefficients, $MFE$ returns the minimum free energy of a secondary structure and $\phi$ is defined by $$\phi(l, \mathcal{S})=\begin{cases}1 & \text{, if position } l \text{ is paired in } \mathcal{S} \\ 0 & \text{, otherwise}\end{cases}.$$ Note that $f$ asseses a weighted quality of the riboswitch by computing the crRNAs locking ability (fraction of paired bases in the region of interest), the taRNAs unlocking ability (fraction of unpaired bases in the region of interest) and the stability of the formed structures in terms of free energy.
- Update the candidate list with possible better riboswitches within the current generation $$ \mathcal{C}:=\underset{\{c_1,c_2,\dots,c_k\}\subset S\cup\mathcal{C}}{\mathrm{argmax}}\sum_{j=1}^k f(c_j) $$
- Generate a probability distribution $p(s_1),p(s_2),\dots,p(s_m)$ over the population by normalizing the quality values of each particle $$ p(s_i)=\frac{f(s_i)}{\sum_{k=1}^m f(s_k)} $$
- Mutation step: For each particle $s\in S$, mutate the switch by a random nucleo base:
- Selection step: Draw $m$ new paticles from $S$ according to the probability distribution computed in the evaluation step:
- Initialize a multiset $S_{new}:=\emptyset$
- Do $m$ times: $S_{new}= S_{new}\cup \{s\}$, where $s\in S$ is randomly drawn according to $p(s)$
- Set $S:=S_{new}$
Application: Name to be filled in
Our safety group was in need of a kill-switch for genetically engineered bacteria and provided us after some research with a sequence skeleton for a crRNA and taRNA riboswitch, where x
marks a mutable position:
BioBrick prefix and suffix |
YUNR sequences |
RBS |
crRNA skeleton:
GAAUUCGCGGCCGCUUCUAGAGxxxxxxxxxxxxxxxxxxxxxxxxxUUGGxxxxxxxAGAGGAGAxxxxxxxxx
xxUACUAGUAGCGGCCGCUGCAG
taRNA skeleton:
GAAUUCGCGGCCGCUUCUAGAGxxCCAAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxUACUAGUAGCGGCCGCUGCAG
The goal of this switch is to lock the RBS region (marked blue) with the crRNA sequence and unlock it again by cofolding with the taRNA sequence. We run our algorithm to replace all mutable positions (marked by x
) with nucleo bases, such that the crRNA and taRNA deactivate or activate the ribosomal binding site. The following figure shows the evolution of the particle population that has emerged from our running algorithm with a random initial seed switch:
Our method computationally designed the following three candidates for the ribosoaml binding side (blue RBS) as region of interest for switching: