Computational Design of Riboswitches

Abstract

Even usage of already proposed riboswitches in individual projects often require to change nucleobases in RNA or DNA sequences without altering the structure formed by the original sequence. However, minor base interchanges in the sequence may result in completely different structures that would render the intended riboswitch unusable. We used state-of-the-art computational structure prediction algorithms to implement routines that automate the process of base pair interchanges, but preserve the original structure to ensure functionality of the riboswitch. Furthermore, we were able to develop a genetic algorithm that is able to completely design a new riboswitch. Those algorithms are provided in form of a webservice and may be used freely by the community.

Introduction

Biomolecular structure prediction is an essential part in the research of many biological and chemical areas. In the case of riboswitch design, we are especially intereseted in the RNA structure prediction problem. Due to the high conservation of base-pairings in RNA or DNA sequences, most computational methods for this problem direct their efforts to predicting the secondary structure of such sequences.

Goal

Our goal within this subproject is to use secondary structure prediction algorithms of RNA sequences to autonomously search for optimal base pair interchanges that preserve a specific secondary structure or obey conditions on the structure. Regarding riboswitches in particular, we would like to develop programs that can computationally design cis-repressing RNA sequences (crRNAs) that lock specific regions of the sequence and transacting RNA sequences (taRNAs) that unlock them again. Resulting riboswitches may be used by our safety group in the iGEM 2015 project and support the synthetic biology community by facilitating the handcrafted design of such riboswitches.

Results

The Algorithm

**Figure 1** Our genetic algorithms iterates three elemental phases for the evolution of a population of riboswitches. Mutation causes new riboswitches falling into the population, while probabilistic sampling will eliminate the majority of bad switches.

We managed to develop a genetic algorithm based on a particle filter that is able to design crRNA and taRNA sequence pairs acting as riboswitch to lock and unlock specific regions. Our algorithm follows three general steps depicted in Figure 1.

Our full algorithm for automated design of riboswitches is described as follows:

Input:

A seed switch $s_{seed}=(crRNA, taRNA)$ containing initial sequences for the crRNA and taRNA
A set of fixed positions $\mathcal{F}$ within the crRNA and taRNA that are not allowed to mutate
A set of desired switch positions $\mathcal{L}$ within the crRNA sequence that should be locked by the crRNA and unlocked again by the taRNA
A number $k$ of desired switch candidates to return and a population size $m\geq k$

Initialize a particle population $S=\{s_{seed},s_{seed},\dots,s_{seed}\}$ containing $m$ copies of the seed switch
Do until convergence:
1. Mutation step: For each particle $s\in S$, mutate the switch by a random nucleo base:
  1. Uniformly draw a random position $x\not\in\mathcal{F}$ in $s$
  2. Uniformly draw a random nucleo base $b\in\{A, C, G, U\}$
  3. Replace the nucleo base at sequence position $x$ with $b$
2. Evaluation step: Generate a probability distribution $p(s)$ over $S$ follows:
  1. Fold all crRNA sequences in $S$ and predict their secondary structures $\mathcal{S}_{cr}^1,\mathcal{S}_{cr}^2,\dots,\mathcal{S}_{cr}^m$
  2. Cofold all crRNA and taRNA sequences in $S$ and predict their secondary structures $\mathcal{S}_{ta}^1,\mathcal{S}_{ta}^2,\dots,\mathcal{S}_{ta}^m$
  3. Assess the quality of each riboswitch with the formula $$ \begin{aligned}f(s_i)=&c_1 \frac{\sum_{l\in\mathcal{L}} \phi(l, \mathcal{S}_{cr}^i)}{|\mathcal{L}|} + c_2 \frac{\sum_{l\in\mathcal{L}} (1-\phi(l, \mathcal{S}_{ta}^i))}{|\mathcal{L}|} \\ &- c_3 MFE(\mathcal{S}_{cr}^i) - c_4 MFE(\mathcal{S}_{ta}^i),\end{aligned} $$ where $c_1, c_2, c_3, c_4\in [0, 1]$ are weighting coefficients, $MFE$ returns the minimum free energy of a secondary structure and $\phi$ is defined by $$\phi(l, \mathcal{S})=\begin{cases}1 & \text{, if position } l \text{ is paired in } \mathcal{S} \\ 0 & \text{, otherwise}\end{cases}.$$ Note that $f$ asseses a weighted quality of the riboswitch by computing the crRNAs locking ability (fraction of paired bases in the region of interest), the taRNAs unlocking ability (fraction of unpaired bases in the region of interest) and the stability of the formed structures in terms of free energy.
  4. Update the candidate list with possible better riboswitches within the current generation $$ \mathcal{C}:=\underset{\{c_1,c_2,\dots,c_k\}\subset S\cup\mathcal{C}}{\mathrm{argmax}}\sum_{j=1}^k f(c_j) $$
  5. Generate a probability distribution $p(s_1),p(s_2),\dots,p(s_m)$ over the population by normalizing the quality values of each particle $$ p(s_i)=\frac{f(s_i)}{\sum_{k=1}^m f(s_k)} $$
3. Selection step: Draw $m$ new paticles from $S$ according to the probability distribution computed in the evaluation step:
  1. Initialize a multiset $S_{new}:=\emptyset$
  2. Do $m$ times: $S_{new}= S_{new}\cup \{s\}$, where $s\in S$ is randomly drawn according to $p(s)$
  3. Set $S:=S_{new}$
Return the candidate set $\mathcal{C}$ containing crRNA and taRNA pairs serving as riboswitches

A free web-interface to our implementation of this algorithm running on a server is provided at rsdnerf.com and may be used by the community to aid the design of their own riboswitches. Note that evaluation step 1. and 2. make use of our artificial neural network for secondary structure predictions! </div>

Application: Name to be filled in

Our safety group was in need of a kill-switch for genetically engineered bacteria and provided us after some research with a sequence skeleton for a crRNA and taRNA riboswitch, where x marks a mutable position:

BioBrick prefix and suffix YUNR sequences RBS

crRNA skeleton: GAAUUCGCGGCCGCUUCUAGAGxxxxxxxxxxxxxxxxxxxxxxxxxUUGGxxxxxxxAGAGGAGAxxxxxxxxx xxUACUAGUAGCGGCCGCUGCAG

taRNA skeleton: GAAUUCGCGGCCGCUUCUAGAGxxCCAAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxUACUAGUAGCGGCCGCUGCAG

The goal of this switch is to lock the RBS region (marked blue) with the crRNA sequence and unlock it again by cofolding with the taRNA sequence. We run our algorithm to replace all mutable positions (marked by x) with nucleo bases, such that the crRNA and taRNA deactivate or activate the ribosomal binding site. The following figure shows the evolution of the particle population that has emerged from our running algorithm with a random initial seed switch:

**Figure 2** The initial seed sequence resulted in an initial particle population of high unlocking ability (more than 80% of the desired region was unlocked by the taRNA), but a very bad locking ability (less than 13% of the desired region was locked by the crRNA). However, the evolution process moved the population to riboswitches with high locking abilities (above 50%) while retaining the good unlocking abilities (above 70%).

Our method computationally designed the following three candidates for the ribosoaml binding side (blue RBS) as region of interest for switching:

Team:TU Darmstadt/Project/Bio/Modeling/sec1

Contents