Difference between revisions of "Template:Heidelberg/software/maws"
MAJendrusch (Talk | contribs) |
|||
(16 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
<html> | <html> | ||
− | < | + | <head> |
− | + | <style> | |
− | </ | + | .basictext { |
− | < | + | margin-left: 15px; |
+ | } | ||
+ | </style> | ||
+ | </head> | ||
+ | <body> | ||
<div class="container"> | <div class="container"> | ||
<div class="content"> | <div class="content"> | ||
Line 23: | Line 27: | ||
</p> | </p> | ||
</div> | </div> | ||
+ | <div class="col-lg-6"> | ||
+ | |||
+ | <div class="videocase"> | ||
+ | |||
+ | <embed | ||
+ | |||
+ | src="https://www.youtube.com/watch?v=K_WSwxZ87_I;hl=en_US&re | ||
+ | |||
+ | l=0&autohide=1&autoplay=0" wmode="transparent" | ||
+ | |||
+ | type="application/x-shockwave-flash" width="100%" height="100%" | ||
+ | |||
+ | allowfullscreen="true" title="Video"> | ||
+ | |||
+ | </div> | ||
+ | |||
+ | </div> | ||
</div> | </div> | ||
<!-- panel-body --> | <!-- panel-body --> | ||
Line 78: | Line 99: | ||
<strong> | <strong> | ||
− | A diazepam-targeting aptamer binding its ligand. | + | Figure 1: A diazepam-targeting aptamer binding its ligand. |
</strong> | </strong> | ||
Line 143: | Line 164: | ||
<div class="imagewrapperimage"> | <div class="imagewrapperimage"> | ||
− | <img class="img-responsive" src="https://static.igem.org/mediawiki/2015/d/d3/Algorithm_flowchart.svg" style="min- | + | <img class="img-responsive" src="https://static.igem.org/mediawiki/2015/d/d3/Algorithm_flowchart.svg" style="min-width:500px;"> |
</div> | </div> | ||
Line 151: | Line 172: | ||
<strong> | <strong> | ||
− | MAWS is based upon a simple iteration of <b>sampling</b> and <b>ranking</b> sequences: | + | Figure 2: MAWS is based upon a simple iteration of <b>sampling</b> and <b>ranking</b> sequences: |
</strong> | </strong> | ||
Line 157: | Line 178: | ||
<p class="basictext"> | <p class="basictext"> | ||
− | The target is loaded into the software. Following this, positions of all possible nucleotides are sampled, and sequences ranked by entropy. This is iterated, | + | The target is loaded into the software. Following this, positions of all possible nucleotides are sampled randomly from a uniform distribution of positions and orientations, with energies being saved at each sample. Following this, probability distribution $P(\mathcal{X})$ for each nucleotide are calculated, using the partition function $Z$. Entropies are then calculated and sequences ranked by entropy. This is iterated, at each step sampling the probability distribution of the aptamer, apprioximated by $P_N(\mathcal{X_N})\approx{\delta(\mathcal{X_{N-1}}-\hat{\mathcal{X_{N-1}}})}\cdot{\frac{e^{-\beta{E(\mathcal{X_N})}}}{Z}}$. |
− | + | ||
</p> | </p> | ||
Line 171: | Line 191: | ||
<div class="imagewrapperheader"> | <div class="imagewrapperheader"> | ||
− | + | The Steps Visualized | |
</div> | </div> | ||
Line 177: | Line 197: | ||
<div class="imagewrapperimage"> | <div class="imagewrapperimage"> | ||
− | <img class="img-responsive" src="https://static.igem.org/mediawiki/2015/6/6b/Algorithm_panel.png" style="min-height: | + | <img class="img-responsive" src="https://static.igem.org/mediawiki/2015/6/6b/Algorithm_panel.png" style="min-height:600px;"> |
</div> | </div> | ||
Line 185: | Line 205: | ||
<strong> | <strong> | ||
− | + | Figure 3: MAWS completes <b>three</b> basic types of tasks, preparation, initial sampling and iterative sampling, visualized here: | |
</strong> | </strong> | ||
Line 191: | Line 211: | ||
<p class="basictext"> | <p class="basictext"> | ||
− | + | (A) The small molecule target diazepam (green) is loaded, and a bounding box (blue) is generated. (B) Positions inside the bounding box, and possible orientations and conformations of each nucleotide (blue) dG, dA, dT or dC are sampled from a uniform distribution. The nucleotide with the lowest entropy is chosen as the starting point for the next step. (C) Given the position of lowest energy of the previous nucleotide (blue surface), possible conformations of the next nucleotide are sampled from a uniform distribution of all torsion angles. The position of minimal energy is chosen by iteration, discarding unfavourable conformations (red), characterized by intersection with the target, or with the aptamer itself. Favourable conformations (blue line) are positively selected. At each step, the sequences with the lowest per-step entropy are selected for the next step. This is iterated. (C) The finished product of the algorithm for the case of diazepam, folded with the ligand for 1 ns. | |
− | + | ||
</p> | </p> | ||
Line 202: | Line 221: | ||
<ol class="basictext"> | <ol class="basictext"> | ||
<li> | <li> | ||
− | A bounding box is generated around the target. The conformational and positional space inside the bounding box is sampled from a uniform distribution for the four possible nucleotides $C = {dAMP, dGMP, dCMP, dTMP}$. At each sample, positions and energies of the system are saved. | + | A bounding box is generated around the target (Fig. 3A). The conformational and positional space inside the bounding box is sampled from a uniform distribution for the four possible nucleotides $C = {dAMP, dGMP, dCMP, dTMP}$ (Fig. 3B). At each sample, positions and energies of the system are saved. |
</li> | </li> | ||
<li> | <li> | ||
Line 215: | Line 234: | ||
</p> | </p> | ||
<p> | <p> | ||
− | For each nucleotide $n \in C$, the entropy-like fitness function is computed as $S[P]=-V\cdot\Sigma_{I} | + | For each nucleotide $n \in C$, the entropy-like fitness function is computed as $S[P]=-V\cdot\Sigma_{I}p_{I}\cdot\Sigma_{J}(-P_{J}\cdot{\beta\Delta{G_{J}}}+ln(\frac{Q}{Z}))$ |
</p> | </p> | ||
</li> | </li> | ||
Line 225: | Line 244: | ||
</li> | </li> | ||
<li> | <li> | ||
− | The previous conformation of the aptamer is kept fixed, and $n$ is appended to the sequence. Torsion angles inside that nucleotide are sampled from a uniform distribution. | + | The previous conformation of the aptamer is kept fixed, and $n$ is appended to the sequence. Torsion angles inside that nucleotide are sampled from a uniform distribution (Fig. 3C). |
</li> | </li> | ||
<li> | <li> | ||
Line 235: | Line 254: | ||
</p> | </p> | ||
<p> | <p> | ||
− | For each nucleotide $n \in C$, the entropy-like fitness function is computed as $S[P]=-V\cdot{Sigma_{I}(-P_{I}\cdot{\beta\Delta{G_{I}}}+ln(\frac{Q}{Z}))}$ | + | For each nucleotide $n \in C$, the entropy-like fitness function is computed as $S[P]=-V\cdot{\Sigma_{I}(-P_{I}\cdot{\beta\Delta{G_{I}}}+ln(\frac{Q}{Z}))}$ |
</p> | </p> | ||
</li> | </li> | ||
Line 305: | Line 324: | ||
</div> | </div> | ||
<!-- container --> | <!-- container --> | ||
− | + | </body> | |
</html> | </html> |
Latest revision as of 19:26, 19 November 2015
Abstract
Aptamers, originally conceived as a better alternative to antibodies, are now a popular object of research. This research however, is limited by the use of a selection process (SELEX) to generate aptamers. This process needs careful planning and is time consuming with a usual timeframe of multiple weeks. Furthermore, success is heavily dependent on the quality of the nucleic acid library employed in the process. Herein, we construct an entropy-like metric on the space of nowhere zero N-dimensional probability distribution. By using this in an iterative process, we select nucleotides with a probability distribution of maximum distance to the uniform distribution at each step. This yields optimal ligand binding aptamers, without the need of selection. In the field of nucleic acid research, functional nucleic acids have been, and continue to be one of the topics most worked on. Even outside of that field of research, aptamers, ribozymes and DNAzymes are being used to construct very sensitive and specific sensors
Introduction
In the field of nucleic acid research, functional nucleic acids have been, and continue to be one of the topics most worked on. Even outside of that field of research, aptamers, ribozymes and DNAzymes are being used to construct very sensitive and specific sensors
Due to this need for a major speedup in the SELEX process, multiple computational approaches were designed. Algorithms have been devised to tackle the problem of nucleic acid library design
Algorithm
We first replace the relative entropy $S[P|Q]=-\int{P\cdot{log{\frac{P}{Q}}}d\mathcal{X}}$ by an entropy-like distance using the maximum entropy expression for probability, which is the canonical ensemble $P(\mathcal{X})=\frac{e^{-\beta\cdot{\Delta{G}}}}{Z}$, with $\Delta{G}$ the gibbs free energy of a single conformation of the aptamer ligand complex and $Z=\int{e^{-\beta{\Delta{G}(\mathcal{X})}}d\mathcal{X}}$. That distance we construct in the following:
Given $S[P|Q]=-\int{P\cdot{log{\frac{P}{Q}}}d\mathcal{X}}$, we have $\bar{S}[P|Q]=-\int{P\cdot{ln{\frac{P}{Q}}}d\mathcal{X}}$. From this follows trivially that $S[P|Q]>S[P'|Q]\Rightarrow\bar{S[P|Q]}>\bar{S[P'|Q]}$. Furthermore, we can assume for further considerations that $Q$ is the uniform distribution, and define $S[P]:=S[P|Q]$, where $Q$ is uniform. We then have: $\bar{S[P]}:=-\int{-P\cdot{\beta\Delta{G(\mathcal{X})}}+ln(\frac{Q}{Z})}$ Here, as everywhere hereafter, $\mathcal{X}$ designates a set of variables comprising all the aptamers coordinates. Now, every nucleotide should be scored by $S[P]$ and the distance in quality between two aptamers is given by the metric $g[P,P']=|\bar{S[P']}-\bar{S[P]}|$. Taking this into account, entropy-like fitness functions $\bar{S[P]}$ provide a partial order on the space of all nucleotide sequences $\mathcal{N}$, and $g[P,P']$ provides a metric on $\mathcal{N}$, or more specifically a metric on space of equivalence classes of all nowhere zero probability distributions, with equivalence being defined as: $A=_{S}B \iff S[A]=S[B]: A,B\in{\mathcal{N}}$. We use $g[P,P']$ to choose nucleotide sequences with a certain distance to the minimum $min(S[P])$ of entropies, which is rather intuitively the most specific nucleotide to the target. To use these principles to calculate aptamers to a certain target, we use the following algorithm:
- A bounding box is generated around the target (Fig. 3A). The conformational and positional space inside the bounding box is sampled from a uniform distribution for the four possible nucleotides $C = {dAMP, dGMP, dCMP, dTMP}$ (Fig. 3B). At each sample, positions and energies of the system are saved.
- Optionally, the bounding box is split along one principal direction. The variance $Var(\Delta{G}_{i})$ is calculated for each half $i$ of the bounding box. The space inside the bounding box is sampled again, with probability $P_{i}=\frac{Var(\Delta{G}_{i})}{\Sigma_{j}Var(\Delta{G}_{j})}$ for a sample to come from half $i$ of the bounding box. This can be iterated until sufficient variance reduction is reached. At the last step, save positions and energies at each sample.
-
For each nucleotide $n \in C$, where $n$ is the nucleotide at the 3' end of the aptamer, the partition function $Z$ is calculated as $Z=V\cdot\Sigma_{I}p_{I}\cdot{\Sigma_{J}\frac{e^{-\beta{\Delta{G_{J}}}}}{N_{J}}} : J\in{Samples},I\in{Strata}$, where $V$ is the volume of the sample space
For each nucleotide $n \in C$, the probability distribution is computed as $P_{J}=\frac{e^{-\beta{\Delta{G_{J}}}}}{Z}$
For each nucleotide $n \in C$, the entropy-like fitness function is computed as $S[P]=-V\cdot\Sigma_{I}p_{I}\cdot\Sigma_{J}(-P_{J}\cdot{\beta\Delta{G_{J}}}+ln(\frac{Q}{Z}))$
- The nucleotide $n$ with the lowest entropy, and all other satisfying $g[P,P_{best}]\lt{g_{threshold}}$ is accepted for the next step.
- The next steps are iterated $N$ times, where $N$ is the number of nucleotides the aptamer is to be extended by. For each sequence chosen in the previous step, for each nucleotide $n \in C$:
- The previous conformation of the aptamer is kept fixed, and $n$ is appended to the sequence. Torsion angles inside that nucleotide are sampled from a uniform distribution (Fig. 3C).
-
For each nucleotide $n \in C$, where $n$ is the nucleotide appended in the previous step, the partition function $Z$ is calculated as $Z=V\cdot{\Sigma_{I}\frac{e^{-\beta{\Delta{G_{I}}}}}{N_{I}}} : I\in{Samples}$, where $V$ is the volume of the sample space.
For each nucleotide $n \in C$, the probability distribution is computed as $P_{I}=\frac{e^{-\beta{\Delta{G_{I}}}}}{Z}$
For each nucleotide $n \in C$, the entropy-like fitness function is computed as $S[P]=-V\cdot{\Sigma_{I}(-P_{I}\cdot{\beta\Delta{G_{I}}}+ln(\frac{Q}{Z}))}$
- The nucleotide with the lowest entropy, and all other satisfying $g[P,P_{best}]\lt{g_{threshold}}$ are accepted for the next step.
Results
Our implementation of the algorithm, termed MAWS (Making Aptamers Without SELEX), was done in Python using the openMM library and its Python bindings for molecular dynamics simulations as well as the mpmath Python library for arbitrary-precision floating point arithmetics. MAWS, as well as its documentation, is freely available as Open Source on our GitHub repository. Using MAWS, we have generated aptamers for several protein targets, including actin, xylanase, p53, and saCas9, as well as small molecule targets, including kanamycin and ketamine. Each of those aptamers was computed within a day at most (saCas9) and two hours at the least, on a laptop computer with only four cores. The aptamers were then experimentally validated by linking them to an HRP-mimicking DNAzyme, yielding a so-called aptabody. Aptamers against protein targets were tested by a modified Western blot procedure, where the aptabody was used in place of a conventional antibody. Aptamers binding small molecules were assayed in solution. Aptamers designed by MAWS were found to be functional, binding their targets in chemoluminescence based assays (link to stefan and stefan). In silico validation of aptamers showed a tendency for them tightly bind their target in molecular dynamics simulations. Furthermore, MAWS predicted a very tight fit of the aptamer to its ligand.
Discussion
As the principle of minimal relative entropy should be universal, one could envision aptamers comprised of biomolecules other than nucleic acids to be generated by the software. In terms of PNA, XNA or other nucleic acid derivatives it should be possible to incorporate them into the software with minor changes to the parameters. Furthermore, the second and the following steps could be changed to a markov chain monte carlo algorithm, which would forego sampling in regions where the free energy $\Delta{G}$ is very large. One could then switch to sampling interaction energy, as conformations prevented from occurring in nature by the laws of physics could not arise if the markov chain transition probability were to be chosen to be energy gradient dependent, therefore avoiding artifacts. As the algorithm can be executed in a very short time on consumer hardware, it could be beneficial to implement master-worker-style parallelization for this software to be able to harvest more aptamers in a shorter time.