MAWS
Abstract
Aptazymes, i.e. fusions of a catalytic ribozyme or DNAzyme with one or several aptamers, make it possible to construct nucleic acid-based circuits that respond to external stimuli, as their catalytic activity can be switched on in either the presence or the absence of the cognate ligand. The design of aptazymes has been hampered by the need for communication modules, which translate the binding of a ligand to the aptamer into a change of function in the catalytic nucleic acid. The standard procedures for obtaining such modules are selection and rational design, both of which are tedious and not certain to succeed. To this day, only a few attempts have been made to automate the design process, all of which target a specifically designed chassis ribozyme. Here we describe the software "JAWS" (Joining Aptamers Without Selection), which uses a criterion for the computational selection of communication modules based upon the partition function as well as energetic and entropic data. This criterion is then used in a random search-like algorithm to extract optimal communication modules without the need for selection.
Introduction
Designing communication modules for controlling functional nucleic acids usually requires either the use of known modules (insert cit here) or a selection process, in which nucleic acids containing randomized modules are folded in the presence of the controlling ligand and assayed for their activity (insert cit here). This is a tedious and lengthy process with uncertain results, after which the nucleic acids have to be folded in the presence of their ligand to display activity (insert Penchovsky here). It renders impractical all assays that require more complicated switching behaviour, as increasingly complex arrays of communication modules would require increasingly complex selection schemes. Such selection schemes can prove prohibitive for smaller labs as well as iGEM teams, whose time and equipment constraints do not allow for a selection process of multiple weeks. To alleviate the need for selection, a variety of methods have been developed for the case of the hammerhead ribozyme. These range from long, predefined communication modules (insert cit here) to computational methods involving an energetic criterion, as well as random search algorithms to construct dynamically switching modules (insert penchovsky, breaker ...). Although those approaches address the specific problem of the hammerhead ribozyme very well, they lack generality, as the hammerhead ribozymes used there are heavily modified to allow for an easy design process (insert cit here). Here we developed a completely general approach to the computational design of communication modules, relying on a variable set of constraints. We validated several computationally designed aptazymes with different combinations of aptamer and functional nucleic acid in vitro and found that all of them showed the desired switching behavior.
Algorithm
To generalize the design process of aptazymes, we consider an aptamer inserted into a stem of the ribozyme of interest, flanked by a region of randomized nucleotides that functions as the communication module. We constrain the active state of such an aptazyme so that the aptamer is not bound by the rest of the structure and all randomized nucleotides form a symmetric stem consistent with the consensus secondary structure of the native ribozyme. We constrain the inactive state so that a part of the aptamer with a length of $S$ nucleotides forms a stem with the randomized nucleotides, leaving a portion of the randomized nucleotides of the same length $S$ outside the stem. Hereafter, we shall call $S$ the shift of the communication module. By defining our communication module by a consensus structure, a length and a shift, we specify a constraint that checks for the existence of the active state consensus structure as well as the inactive structure, with a tolerance of $M$ wrong base pairs. We do so by inspecting the matrix of all base pair probabilities $P_{ij}$, obtained from the partition function $Z$ calculated for our sequence. We then discard all sequences that do not satisfy this constraint. Of the remaining sequences, only those whose ensemble free energy difference between the active and the inactive state is less than the aptamer's interaction energy $\Delta{G_{Apt}}$ are retained. Using this approach, we arrive at functional communication modules for nucleic acids completely unrelated to the hammerhead ribozyme. In detail, the algorithm proceeds as follows:
Secondary structure prediction is performed with a partition function-based approach using ViennaRNA (citation), which yields the nucleic acid's partition function $Z=\sum_{I}e^{-\beta{G_{I}}}$, with $I$ an index enumerating all potential base pairs, loops and stacks, $G_{I}$ the corresponding Gibbs free energy and $\beta=1/(k_{B}T)$ the inverse thermal energy. This in turn allows for the construction of the probability matrix $P_{I}=\frac{e^{-\beta{G_{I}}}}{Z}$, which contains the base pair probabilities $P_{ij}=\frac{e^{-\beta{G_{ij}}}}{Z}$ for $i,j\in{Bases}$. The matrix $P_{ij}$ is the basis of our calculations. To construct a communication module, we use the following steps:
- Given the position of a region $\mathcal{R}\subset{Sequence}$ comprising $2\cdot{N}+A$ bases, which surrounds the aptamer $\mathcal{A}$ of length $A$, construct an active conformation. Henceforth, we shall refer to $N$ as the stem length and to the active conformation as conformation $\mathcal{I}$. In this conformation, the aptamer does not participate in base-pairing interactions with the rest of the functional nucleic acid. Furthermore, all $2\cdot{N}$ nucleotides of the region $\mathcal{R}\setminus\mathcal{A}$ participate in base-pairing interactions, forming a stem of length $N$.
- Given $\mathcal{R}$, $N$ and a shift $S$, construct an inactive conformation, consisting of the first $N$ bases in $\mathcal{R}$ pairing with the bases $N+A-S$ to $2\cdot{N}+A-S$. This conformation has a stem displaced by $S$ nucleotides from the active conformation, disturbing ribozyme activity. The inactive conformation shall be referred to as conformation $\mathcal{II}$.
- Generate a sequence $\mathcal{S}$, with all nucleotides in $\mathcal{R}\setminus\mathcal{A}$ randomised.
- Compute the partition function $Z$ of $\mathcal{S}$ and the base pair probability matrix $P_{ij}$ at $\theta = 37\,^{\circ}\mathrm{C}$. Check whether the base pairs of conformation $\mathcal{I}$ are present. This is the case iff $P_{ij}\gt{P_{threshold}} \;\forall{(i,j)\in{\mathcal{I}}}$. If so, accept the sequence and move on to the next step. Otherwise, return to step 3 and generate a new sequence.
- Using the base pair probability matrix $P_{ij}$, check whether the base pairs of conformation $\mathcal{II}$ are present as well. This is the case iff $P_{ij}\gt{P_{threshold}} \;\forall{(i,j)\in{\mathcal{II}}}$. If so, accept the sequence; otherwise, discard it and return to step 3. Requiring simultaneous conformity to $\mathcal{I}$ and $\mathcal{II}$ ensures that the structure is bistable, with local minima of free energy coinciding with the active and inactive structures, respectively.
- Compute the ensemble free energies $\Delta{G_{\mathcal{I}}}, \Delta{G_{\mathcal{II}}}$ for the conformations $\mathcal{I}$ and $\mathcal{II}$ respectively. Check if $\Delta{G_{\mathcal{I}}}\lt{E_{threshold}} \wedge \Delta{G_{\mathcal{II}}}\lt{E_{threshold}}$. This ensures that the secondary structures are stable and do not unravel easily at 37°C.
- Compute the difference in ensemble free energies $\Delta{G} = \Delta{G_{\mathcal{I}}} - \Delta{G_{\mathcal{II}}}$ and require its absolute value to be smaller than a threshold $\Delta{E_{threshold}}$. This ensures that the structure is bistable and can switch conformation upon ligand binding.
- If all of the above conditions are met, accept the sequence as a candidate sequence for the aptazyme. Otherwise, return to step 3.
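The steps above can be sketched as a simple generate-and-filter loop. The following Python fragment is only an illustration under stated assumptions, not the JAWS implementation: all function and parameter names are hypothetical, positions are 0-based, and the folding back end is passed in as two callables (`fold_bpp`, returning the base pair probabilities $P_{ij}$ of a sequence, and `conf_energy`, returning the ensemble free energy of a sequence constrained to a given conformation). In practice these could wrap a partition function folding package such as ViennaRNA. The inactive stem is built by moving the 3' strand $S$ bases towards the aptamer, which is one possible reading of step 2.

```python
import random
from typing import Callable, Dict, Optional, Set, Tuple

Pair = Tuple[int, int]      # 0-based positions (i, j) with i < j
Bpp = Dict[Pair, float]     # base pair probabilities P_ij


def active_pairs(start: int, n: int, a: int) -> Set[Pair]:
    """Conformation I: the two N-base flanks of the aptamer form one stem."""
    end = start + 2 * n + a - 1            # last base of region R
    return {(start + k, end - k) for k in range(n)}


def inactive_pairs(start: int, n: int, a: int, s: int) -> Set[Pair]:
    """Conformation II: 3' strand displaced by S bases (assumed reading)."""
    end = start + 2 * n + a - 1 - s
    return {(start + k, end - k) for k in range(n)}


def randomise_flanks(template: str, start: int, n: int, a: int,
                     rng: random.Random) -> str:
    """Step 3: randomise R without A, i.e. N bases on each side of the aptamer."""
    seq = list(template)
    flanks = list(range(start, start + n))
    flanks += list(range(start + n + a, start + 2 * n + a))
    for i in flanks:
        seq[i] = rng.choice("ACGU")
    return "".join(seq)


def supported(bpp: Bpp, pairs: Set[Pair], p_threshold: float) -> bool:
    """Steps 4 and 5: every pair of the conformation exceeds the threshold."""
    return all(bpp.get(pair, 0.0) > p_threshold for pair in pairs)


def search(template: str, start: int, n: int, a: int, s: int,
           fold_bpp: Callable[[str], Bpp],
           conf_energy: Callable[[str, Set[Pair]], float],
           p_threshold: float, e_threshold: float, de_threshold: float,
           max_tries: int = 1000, seed: int = 0) -> Optional[str]:
    """Return the first sequence passing all filters, or None."""
    rng = random.Random(seed)
    conf_i = active_pairs(start, n, a)
    conf_ii = inactive_pairs(start, n, a, s)
    for _ in range(max_tries):
        seq = randomise_flanks(template, start, n, a, rng)
        bpp = fold_bpp(seq)
        # Steps 4 and 5: both conformations must be visible in P_ij.
        if not (supported(bpp, conf_i, p_threshold)
                and supported(bpp, conf_ii, p_threshold)):
            continue
        # Step 6: both conformations must be sufficiently stable.
        g_i = conf_energy(seq, conf_i)
        g_ii = conf_energy(seq, conf_ii)
        if g_i >= e_threshold or g_ii >= e_threshold:
            continue
        # Step 7: the two states must be close in free energy.
        if abs(g_i - g_ii) >= de_threshold:
            continue
        return seq                         # Step 8: candidate aptazyme
    return None
```

One consequence of checking both conformations against the same matrix is worth noting: for $S \gt 0$ every base of the 5' strand has a different partner in $\mathcal{I}$ and in $\mathcal{II}$, and the pairing probabilities of a single base sum to at most 1, so both checks can only pass if $P_{threshold}$ is chosen below 0.5.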