Team:Heidelberg/software/jaws

Abstract

Aptazymes, i.e. fusions of a catalytic ribozyme or DNAzyme with one or several aptamers, make construction of nucleic acids-based circuits responding to external stimuli possible, as their catalytic activity can be activated in either the presence or absence of the cognate ligand. The design of aptazymes has been hampered by the need for communication modules, which translate the binding of a ligand to the aptamer into a change of function in the catalytic nucleic acid. Standard procedure herefore is selection on the one hand and rational design on the other, resulting in tedious work which is not certain to succeed. To this day, only few attempts have been made to automatize the design process, which all target a specifically designed chassis ribozyme. Here we describe a software "JAWS" (Joining Aptamers Without Selection) using a criterion for computational selection of communication modules based upon the partition function, as well as energetic and entropic data. This is then used in a random search-like algorithm to extract optimal communication modules without the need for selection.

Introduction

Designing communication modules for controlling functional nucleic acids usually requires either the use of known modules Kertsburg2002, or a selection process, whereby nucleic acids containing randomized modules are folded in the presence of the controlling ligand and assayed for their activityKertsburg2002. This constitutes a tedious and lengthy process, with uncertain results, whereafter nucleic acids have to be folded in the presence of their ligand to display activity Penchovsky2013. This renders impractical all kinds of assays requiring more complicated switching behaviour, as increasingly complex arrays of communication modules would require increasingly complex selection schemes. Those selection schemes can prove restrictive to smaller labs, as well as iGEM teams, whose time and equipment-constraints do not allow for a selection process of multiple weeks. To alleviate the need for selection, a variety of methods have been developed for the case of the hammerhead ribozyme. These range from long, predefined communication modules Kertsburg2002 to computational methods involving an energetic criterion, as well as random search algorithms to construct dynamically switching modules Penchovsky2013Penchovsky2005. Although those approaches address the specific problem of the hammerhead ribozyme very well, they lack generality, as the hammerhead ribozymes used there are heaviliy modified to allow for an easy design process Penchovsky2013. Here we developed a completely general approach to the computational design of communication modules, relying on a variable set of constraints. We validated several computationally designed aptazymes with different combinations of aptamer and functional nucleic acid in vitro and found that all of them showed the desired switching behavior.

Communication Module Scheme
Fig.1: The concept behind JAWS' communication modules:

The stem containing the aptamer (turquoise) contains base-pairs involving the aptamer in the inactive conformation. Activation occurs by strand displacement following ligand binding. Thereafter, another stem (black) forms, involving neither the ribozyme (purple) nor th aptamer.

Algorithm

The JAWS Algorithm
Figure 2: JAWS uses a random search algorithm coupled to constraints to identify bistable nucleotide sequences:

A partial template containing the ribozyme sequence 5' of the aptamer, the aptamer itself and the ribozyme sequence 3' of the aptamer is loaded, nucleotides between ribozyme and aptamer are randomised. Constraints enforcing active structure formation and inactive structure formation are enforced. Following that, sequences conforming to those constraints are accepted.

Active and Inactive Constraints
Figure 3: The base pairing constraints ensure that chosen structures have the potential to fold into both an active and an inactive structre.

The base-pairing constraints of the JAWS algorithm are twofold: Firstly to ensure the existence of the active state, secondly to enforce the dominance of the inactive state, to remove background activity.

To generalize the design process of aptazymes we consider an aptamer inserted into a stem of the ribozyme of interest, adding a region of randomized nucleotides functioning as communication module. We constrain the active state of such an aptazyme to be such, that the aptamer is not bound by the rest of the structure, and all randomized nucleotides form a symmetric stem consistent with the consensus secondary structure of the native ribozyme. We constrain an inactive state to be such, that a part of the aptamer with a length of $N$ nucleotides forms a stem with the randomized nucleotides, and a portion of randomized nucleotides with length $N$. Hereafter, we shall call $N$ the shift of the communication module. By defining our communication module by a consensus structure, a length and a shift, we specify a constraint, checking for the existence of the active state consensus structure as well as the inactive structure, with a tolerance of $M$ wrong base pairs. We do so by inspection of the matrix of all base pair probabilities $P_{ij}$, yielded from the partition function $Z$ calculated for our sequence. We then discard all sequences not satisfying this constraint. Of the remaining sequences, only those that satisfy the constraint of the ensemble free energy difference between active and inactive state being less then the aptamer's interaction energy $\Delta{G_{Apt}}$ are retained. Using this approach, we arrive at functional communication modules for nucleic acids completely unrelated to the hammerhead ribozyme. In detail, the algorithm proceeds as follows:

Secondary structure prediction is performed using a partition function based approach using ViennaRNA Lorenz, which then allows for the extraction of the nucleic acid's partition function $Z=\Sigma_{I}e^{-\beta{G_{I}}}$, with $I$ an index enumerating all potential base pairs, loops and stacks, and $G_{I}$ the corresponding Gibbs free energy. This in turn allows for the construction of the probability matrix $P_{I}=\frac{e^{-\beta{G_{I}}}}{Z}$ consisting of the base pair probabilities $P_{ij}=\frac{e^{-\beta{G_{ij}}}}{Z}: i,j\in{{Bases}}$. The matrix $P_{ij}$ will be the basis of our calculations. To construct a communication module, we use the following steps:

  1. Given the position of a region $\mathcal{R}\subset{Sequence}$ comprising $2\cdot{N}+A$ bases, which should surround the aptamer $\mathcal{A}$ with length $A$, construct an active conformation. Henceforth, we shall refer to $N$ as the stem length and to the active conformation as conformation $\mathcal{I}$. This conformation is such, that the aptamer does not participate in base-pairing interactions with the rest of the functional nucleic acid. Furthermore, all $2\cdot{N}$ nucleotides of the region $\mathcal{R}\diagdown{\mathcal{A}}$ should participate in base pairing interactions, forming a stem of length $N$.
  2. Given $R$, $N$ and a shift $S$, construct an inactive conformation, consisting of the first $N$ bases in $R$ pairing with the bases $N+A-S$ to $2\cdot{N}+A-S$. This conformation has a stem displaced by $S$ nucleotides from the active conformation, disturbing ribozyme activity. The inactive conformation shall be refered to as conformation $\mathcal{II}$.
  3. Generate a sequence $\mathcal{S}$, with all nucleotides in $\mathcal{R}\diagdown\mathcal{A}$ randomised.
  4. Compute the partition function $Z$ of $\mathcal{S}$ and the base pair probability matrix $P_{ij}$ at $\theta{ = 37°C}$. Check, if base pairs conforming to conformation $\mathcal{I}$ exist. This is true iff $P_{ij}\gt{P_{threshold}} \forall{(i,j)\in{\mathcal{I}}}$. If it is true, accept the sequence and move on to the next step. Else, return to step 3.
  5. Using the base pair probability matrix $P_{ij}$, check if base pairs conforming to $\mathcal{II}$ exits. This is true iff $P_{ij}\gt{P_{threshold}} \forall{(i,j)\in{\mathcal{II}}}$. Requiring simultaneous conformity to $\mathcal{I}$ and $\mathcal{II}$ ensures that the structure is bistable, with local minima of free energy coinciding with the active and inactive structures, respectively. If this is true, accept this sequence, otherwise discard this sequence and return to step 3.
  6. Compute the ensemble free energies $\Delta{G_{\mathcal{I}}}, \Delta{G_{\mathcal{II}}}$ for the conformations $\mathcal{I}$ and $\mathcal{II}$ respectively. Check if $\Delta{G_{\mathcal{I}}}\lt{E_{threshold}} \wedge \Delta{G_{\mathcal{II}}}\lt{E_{threshold}}$. This ensures that the secondary structures are stable and do not unravel easily at 37°C.
  7. Compute the difference in ensemble free energies $\Delta{G} = \Delta{G_{\mathcal{I}}} - \Delta{G_{\mathcal{II}}}$ and constrain it to be smaller in its absolute value than a threshold $\Delta{E_{threshold}}$. This ensures that the bistability of the structure is given, and the structure switches conformation upon ligand binding.
  8. If all of the above apply, accept the sequence as a candidate sequence for the aptazyme. Else return to step 3.

In terms of the optimisation of $N$ and $S$ it is sufficient to run one simulation with relaxed constraints and evaluate the two dimensional histogram of points $(S,N)$ to find the region in $S,N$ space with the highest amount of candidates generated. This should correspond to the optimal stem length and shift for the aptamer ribozyme combination to be considered.

Results

Our implementation of the algorithm, termed JAWS (Joining Aptamers Without SELEX), was done in Python using the ViennaRNA library and its Python bindings. JAWS, as well as its documentation, is freely available as Open Source on our GitHub repository. We used JAWS to create functional DNA and RNA constructs exhibiting switching behavior. Thus, we constructed variants of the SpinachII fluorescent aptamer, which were shown to be switchable in real time by the presence or absence of ATP to a fluorescent and non-fluorescent state, respectively, using an aptamer from Sazani2004. Both variants outperformed a combination of previously published components (BBa_K1614012)Sazani2004, as they exhibited a wider dynamic range and stronger contrast between their active and inactive state. Furthermore, experimental evidence corresponded well to computational predictions, as candidate 2 (BBa_K1614015), predicted to have a more stable secondary structure, also exhibited slightly less background, better switching and a clearer spectrum than candidate 1 (BBa_K1614014). In addition, we created ligand switchable versions of the F8 DNAzyme Wang2014, which were designed to cleave in the presence of ampicillin, an oligonucleotide, ketamine, or p53. Their activity was confirmed in optimal buffer as well as in a more realistic setting for on-site detection of small molecule contamination in beverages (specifically in energy drinks spiked with carbenicillin). Finally, JAWS successfully computed switchable HRP mimicking DNAzymes for rapid characterisation of aptamers generated by MAWS, our aptamer designing tool. The switchable DNAzymes indeed showed increased activity in the presence of their respective ligands. There we observed, that different stem strength, as predicted by the software, using both free energy $\Delta{G}$ and an entropy like distanceKullback1951 $S[P_{ij},\frac{1}{L}]=-\Sigma_{ij}P_{ij}\cdot{log(LP_{ij})}$, where lower energy and entropy indicates greater stem strength possessed less background activity as well as a higher dynamic range, whereas weak stems possessed higher background and a lower dynamic range.

Discussion

JAWS workflow
Figure 4: Predict-test-optimize cycle applied to JAWS.

Experimental data were crucial to the final design of JAWS, as they enabled us to make optimization of predicted aptazymes with respect to dynamic range and kinetics possible.

Following the results from the switchable HRP assays, we implemented an additional constraint regulating the stem strength of our candidates. Namely, at each step enforce that $S[P_{ij}]=\Sigma_{ij}P_{ij}\cdot{log({P_{ij}}\cdot{L})}$ be inside an interval of entropies$[S_{0},S_{1}]$. This ensures at least in silico, that the DNAzymes generated possess a certain stem strength, and thus a specified dynamic range and kinetic behaviour. Furthermore, the process of generating ligand gated, bistable structures is easily expandable to switching structures which are not simple stems. This is already implemented in our software, as the constraint regarding the existence of basepairs conforming to a stem can actually process any given secondary structure. Even so, we are currently limited to simulating DNAs and RNAs without any consideration of pseudoknots. This would be crucial for even more efficient aptazyme design, as many ribozymes feature extensive tertiary interactions to promote their activity. Again, this could be bypassed by adding additional terms to the partition function $Z$, corresponding to inter-loop interactions, which would allow for additional constraints stabilizing or destabilizing tertiary interactions in the presence or absence of ligands. That said, we expanded upon existing computational concepts, generalizing them to all functional nucleic acids and developed an extensible, modular, and universal constraint based system, allowing for the design of tailor-made ligand gated nucleic acids, fit to any specific problem.

References