Template:Heidelberg/software/maws

MAWS

Abstract

Aptazymes, i.e. fusions of a catalytic ribozyme or DNAzyme with one or several aptamers, make construction of nucleic acids-based circuits responding to external stimuli possible, as their catalytic activity can be activated in either the presence or absence of the cognate ligand. The design of aptazymes has been hampered by the need for communication modules, which translate the binding of a ligand to the aptamer into a change of function in the catalytic nucleic acid. Standard procedure herefore is selection on the one hand and rational design on the other, resulting in tedious work which is not certain to succeed. To this day, only few attempts have been made to automatize the design process, which all target a specifically designed chassis ribozyme. Here we describe a software "JAWS" (Joining Aptamers Without Selection) using a criterion for computational selection of communication modules based upon the partition function, as well as energetic and entropic data. This is then used in a random search-like algorithm to extract optimal communication modules without the need for selection.

Introduction

Designing communication modules for controlling functional nucleic acids usually requires either the use of known modules (insert cit here), or a selection process, whereby nucleic acids containing randomized modules are folded in the presence of the controlling ligand and assayed for their activity (insert cit here). This constitutes a tedious and lengthy process, with uncertain results, whereafter nucleic acids have to be folded in the presence of their ligand to display activity (insert Penchovsky here). This renders impractical all kinds of assays requiring more complicated switching behaviour, as increasingly complex arrays of communication modules would require increasingly complex selection schemes. Those selection schemes can prove restrictive to smaller labs, as well as iGEM teams, whose time and equipment-constraints do not allow for a selection process of multiple weeks. To alleviate the need for selection, a variety of methods have been developed for the case of the hammerhead ribozyme. These range from long, predefined communication modules (insert cit here) to computational methods involving an energetic criterion, as well as random search algorithms to construct dynamically switching modules (insert penchovsky, breaker ...). Although those approaches address the specific problem of the hammerhead ribozyme very well, they lack generality, as the hammerhead ribozymes used there are heaviliy modified to allow for an easy design process (insert cit here). Here we developed a completely general approach to the computational design of communication modules, relying on a variable set of constraints. We validated several computationally designed aptazymes with different combinations of aptamer and functional nucleic acid in vitro and found that all of them showed the desired switching behavior.

Algorithm

To generalize the design process of aptazymes we consider an aptamer inserted into a stem of the ribozyme of interest, adding a region of randomized nucleotides functioning as communication module. We constrain the active state of such an aptazyme to be such, that the aptamer is not bound by the rest of the structure, and all randomized nucleotides form a symmetric stem consistent with the consensus secondary structure of the native ribozyme. We constrain an inactive state to be such, that a part of the aptamer with a length of $N$ nucleotides forms a stem with the randomized nucleotides, and a portion of randomized nucleotides with length $N$. Hereafter, we shall call $N$ the shift of the communication module. By defining our communication module by a consensus structure, a length and a shift, we specify a constraint, checking for the existence of the active state consensus structure as well as the inactive structure, with a tolerance of $M$ wrong base pairs. We do so by inspection of the matrix of all base pair probabilities $P_{ij}$, yielded from the partition function $Z$ calculated for our sequence. We then discard all sequences not satisfying this constraint. Of the remaining sequences, only those that satisfy the constraint of the ensemble free energy difference between active and inactive state being less then the aptamer's interaction energy $\Delta{G_{Apt}}$ are retained. Using this approach, we arrive at functional communication modules for nucleic acids completely unrelated to the hammerhead ribozyme. In detail, the algorithm proceeds as follows:

Secondary structure prediction is performed using a partition function based approach using ViennaRNA (citation), which then allows for the extraction of the nucleic acid's partition function $Z=\Sigma_{I}e^{-\beta{G_{I}}}$, with $I$ an index enumerating all potential base pairs, loops and stacks, and $G_{I}$ the corresponding Gibbs free energy. This in turn allows for the construction of the probability matrix $P_{I}=\frac{e^{-\beta{G_{I}}}}{Z}$ consisting of the base pair probabilities $P_{ij}=\frac{e^{-\beta{G_{ij}}}}{Z}: i,j\in{{Bases}}$. The matrix $P_{ij}$ will be the basis of our calculations. To construct a communication module, we use the following steps:

Given the position of a region $\mathcal{R}\subset{Sequence}$ comprising $2\cdot{N}+A$ bases, which should surround the aptamer $\mathcal{A}$ with length $A$, construct an active conformation. Henceforth, we shall refer to $N$ as the stem length and to the active conformation as conformation $\mathcal{I}$. This conformation is such, that the aptamer does not participate in base-pairing interactions with the rest of the functional nucleic acid. Furthermore, all $2\cdot{N}$ nucleotides of the region $\mathcal{R}\diagdown{\mathcal{A}}$ should participate in base pairing interactions, forming a stem of length $N$.
Given $R$, $N$ and a shift $S$, construct an inactive conformation, consisting of the first $N$ bases in $R$ pairing with the bases $N+A-S$ to $2\cdot{N}+A-S$. This conformation has a stem displaced by $S$ nucleotides from the active conformation, disturbing ribozyme activity. The inactive conformation shall be refered to as conformation $\mathcal{II}$.
Generate a sequence $\mathcal{S}$, with all nucleotides in $\mathcal{R}\diagdown\mathcal{A}$ randomised.
Compute the partition function $Z$ of $\mathcal{S}$ and the base pair probability matrix $P_{ij}$ at $\theta{ = 37°C}$. Check, if base pairs conforming to conformation $\mathcal{I}$ exist. This is true iff $P_{ij}\gt{P_{threshold}} \forall{(i,j)\in{\mathcal{I}}}$. If it is true, accept the sequence and move on to the next step. Else, return to step 3.
Using the base pair probability matrix $P_{ij}$, check if base pairs conforming to $\mathcal{II}$ exits. This is true iff $P_{ij}\gt{P_{threshold}} \forall{(i,j)\in{\mathcal{II}}}$. Requiring simultaneous conformity to $\mathcal{I}$ and $\mathcal{II}$ ensures that the structure is bistable, with local minima of free energy coinciding with the active and inactive structures, respectively. If this is true, accept this sequence, otherwise discard this sequence and return to step 3.
Compute the ensemble free energies $\Delta{G_{\mathcal{I}}}, \Delta{G_{\mathcal{II}}}$ for the conformations $\mathcal{I}$ and $\mathcal{II}$ respectively. Check if $\Delta{G_{\mathcal{I}}}\lt{E_{threshold}} \wedge \Delta{G_{\mathcal{II}}}\lt{E_{threshold}}$. This ensures that the secondary structures are stable and do not unravel easily at 37°C.
Compute the difference in ensemble free energies $\Delta{G} = \Delta{G_{\mathcal{I}}} - \Delta{G_{\mathcal{II}}}$ and constrain it to be smaller in its absolute value than a threshold $\Delta{E_{threshold}}$. This ensures that the bistability of the structure is given, and the structure switches conformation upon ligand binding.
If all of the above apply, accept the sequence as a candidate sequence for the aptazyme. Else return to step 3.

In terms of the optimisation of $N$ and $S$ it is sufficient to run one simulation with relaxed constraints and evaluate the two dimensional histogram of points $(S,N)$ to find the region in $S,N$ space with the highest amount of candidates generated. This should correspond to the optimal stem length and shift for the aptamer ribozyme combination to be considered. ==Results== Our implementation of the algorithm, termed JAWS (Joining Aptamers Without SELEX), was done in Python using the ViennaRNA library and its Python bindings. We used JAWS to create functional DNA and RNA constructs exhibiting switching behavior. Thus, we constructed variants of the SpinachII fluorescent aptamer, which were shown to be switchable in real time by the presence or absence of ATP to a fluorescent and non-fluorescent state, respectively, using an aptamer from Szostak0000. Both variants outperformed a previouly published combination (BBa_K1614012)Xxxx0000, as they exhibited a wider dynamic range and stronger contrast between their active and inactive state. Furthermore, experimental evidence corresponded well to computational predictions, as candidate 2 (BBa_K1614015), predicted to have a more stable secondary structure, also exhibited slightly less background, better switching and a clearer spectrum than candidate 1 (BBa_K1614014). In addition, we created ligand switchable versions of the F8 DNAzyme SomeChineseGuy0000, which were designed to cleave in the presence of ampicillin, an oligonucleotide, ketamine, or p53. Their activity was confirmed in optimal buffer as well as in a more realistic setting for on-site detection of small molecule contamination in beverages (specifically in energy drinks spiked with carbenicillin). Finally, JAWS successfully computed switchable HRP mimicking DNAzymes for rapid characterisation of aptamers generated by MAWS, our aptamer designing tool. The switchable DNAzymes indeed showed increased activity in the presence of their respective ligands. There we observed, that different stem strength, as predicted by the software, using both free energy $\Delta{G}$ and an entropy like distanceKullback0000 $S[P_{ij},\frac{1}{L}]=-\Sigma_{ij}P_{ij}\cdot{log(LP_{ij})}$, where lower energy and entropy indicates greater stem strength possessed less background activity as well as a higher dynamic range, whereas weak stems possessed higher background and a lower dynamic range. ==Discussion== Following the results from the switchable HRP assays, we propose the introduction of another constraint regulating the stem strength of our candidates. Namely, at each step enforce that $S[P_{ij}]=\Sigma_{ij}P_{ij}\cdot{log({P_{ij}}\cdot{L})}$ be inside an interval of entropies$[S_{0},S_{1}]$. This ensures at least in silico, that the DNAzymes generated possess a certain stem strength, and thus a specified dynamic range and kinetic behaviour. Furthermore, the process of generating ligand gated, bistable structures is easily expandable to switching structures which are not simple stems. This is already implemented in our software, as the constraint regarding the existence of basepairs conforming to a stem can actually process any given secondary structure. Even so, we are currently limited to simulating DNAs and RNAs without any consideration of pseudoknots. This would be crucial for even more efficient aptazyme design, as many ribozymes feature extensive tertiary interactions to promote their activity. Again, this could be bypassed by adding additional terms to the partition function $Z$, corresponding to inter-loop interactions, which would allow for additional constraints stabilizing or destabilizing tertiary interactions in the presence or absence of ligands. That said, we expanded upon existing computational concepts, generalizing them to all functional nucleic acids and developed an extensible, modular, and universal constraint based system, allowing for the design of tailor-made ligand gated nucleic acids, fit to any specific problem.