Team:Freiburg/Results/Modeling
Modeling
Life as Complex Biochemical Network
Nowadays, more and more information on biochemical networks is collected, and a lot is already known. Understanding biochemical networks is in general important for understanding signaling pathways within organisms, especially regarding medical health issues. Furthermore, it is needed for building new networks from scratch using synthetic biology.
However, the more components a network involves, the harder it gets to estimate how a complex network reacts to changes from both the inside and the outside. This is especially a problem, if entities change in both time and space. Experimentally studying all different components of a network for different possible conditions often is not only time-consumptive but also very expensive and prove often to be impossible at the desired level.
Mathematical modeling of biological networks therefore is a powerful tool to support experimental research, predicting the behavior of networks concerning both time and space. These predictions can then be validated experimentally.
Two crucial biological processes within the DiaCHIP are cell-free expression and the diffusion of proteins to the specific chemical surface. Cell-free expression limits the DiaCHIP concerning amount and production rate. Protein diffusion limits the system regarding space thereby determining the maximum protein spot density one can still resolve. Therefore, we decided to model both cell-free expression and protein diffusion to predict the behavior of protein concentration in time and space.
With a model like this, it is possible to find the minimum time needed to get a sufficiently high concentration of protein for antibody detection. This reduces the time needed for diagnosis by the DiaCHIP. Furthermore, not only protein, but also other molecules' concentrations can be simulated. Therefore, bottlenecks within the biochemical system as well as limiting substrates can be predicted. Specifically increasing the concentration of such limiting substrates as well as substances involved in limiting processes may not only increase the overall protein yield but also speeds up synthesis.
Moreover, a cell-free expression model has huge potential for application on all systems dealing with protein expression in general. Although designed for cell-free systems, it represents the central dogma of biology and therefore mathematically describes a lot of processes involved in many biochemical networks.
Overview
Our system consists of the three biochemical and biophysical processes cell-free transcription, cell-free translation and diffusion of the protein to the opposite surface, as shown in figure 1. With our model we aim to simulate cell-free expression to predict the amount of protein synthesized in the course of time. We set up a system of ordinary differential equations (ODE System) based on the law of mass action, describing the kinetics of the single processes occurring during transcription and translation. Starting from a detailed description including 95 different entities and 100 different parameters, we defined additional assumptions to simplify our system to a total of 65 entities and 44 parameters. Kinetic constants and all other parameter values needed were researched in literature. If no values were found, appropriate values were suggested. As the final model can not be solved analytically, it was solved numerically using Mathematica.
Model Description
1. Transcription
Transcription describes the processes of mRNA synthesis from a DNA template, the first part of the central dogma of biology, as shown in figure 2. Unspecific binding is also displayed here. The detailed system inherits 33 different entities described by their corresponding ODEs and 31 parameters. Using a set of assumptions, the system was simplified to 14 entities and 10 parameters.
Detailed System
Simplified System
2. Translation
Translation describes the processes of protein synthesis from an mRNA template, the second part of the central dogma of biology, as shown in figure 3. Unspecific binding is neither displayed in the figure nor included into the mathematical description due to the sheer amount of entities and parameters inherited. The detailed system consists of 62 different entities described by their corresponding ODEs and 69 parameters. Using a set of assumptions, the system was simplified to 51 entities and 34 parameters.
Detailed System
Simplified System
3. Diffusion
3.1 Introduction and Motivation
In the final step of cellfree expression proteins being produced are diffusing inside the microfluidic chamber. We modeled an ideal case to provide a tool:
On the PDMS slide spots of immobilized DNA produces proteins with a steadily decreasing production
rate. The product is distributed homogeneously on the spot and starts diffusing freely in the cell-free
mix. Furthermore besides convection through gravitation any interaction is assumed to be negligibly small. The coated iRIf glass is expected to be an ideal sink. Any proteins reaching the slide are bound and therefore do not contribute to diffusion anymore.
What knowledge did we want to gain by the diffusion?
- Time optimization: When is the most efficient time to stop the expression?
- Product optimization: How much of the totally produced proteins does bind to the surface?
- Spot distance optimization: How is the bound protein distributed on the glass slide?
3.2 Model System
Effective modeling depends on wisely chosen assumptions and a minimally complex system. The main assumptions we considered can be seen in chapter 3.3 and will be referenced by A1,A2, ...
Assume the following setup: A microfluidic system is given consisting of a spotted PDMS slide (multiple spots) and the chemically treated iRIf slide. Yet instead of observing this system of multiple spots we reduce our problem to a cylindrical area around one spot. This area has a radius of L and still the same width W as the whole system. Ideally the spot is a perfect circle which allows us to reduce the problem to a twodimensional one: No matter from which angle the cylinder is observed the identical behaviour is seen. Therefore a crosssection through it provides enough information.
While the expression is done the system lies as in the picture shown above, parallel to the ground. Therefore we included in our model system possible movements due to gravitation. The analytical equation describing this process is the diffusion convection equation:
It is an inhomogeneous parabolic partial differential equation (PDE). The diffusion constant D depends on the media and materials involved in the system, the inhomogenity describes sources of the system. The velocity vector v is assumed to be a possible sedimentation velocity (A4) and therefore a constant. The initial and boundary conditions are the following: The PDMS slide as well as the iRIf slide are closed surfaces while the left and right edges of the system are not. On the PDMS slide proteins are produced by a logistic function (A5). The solution to this system is only numerically solvable. Therefore it has to be prepared mathematically by discretizing the whole setup. Finally we used the Crank-Nicholson approach on a finite-differences-method to solve this system.
3.3 Numerical approach
Let us start by describing the crosssection in terms of mathematics: It is a subset M of the set of twodimensional real vectors. Of this subset we define two more subsets S and B describing the DNA spot and the glass slide respectively.
Two time δ is the spot width, its thickness is ε. The thickness of the binding area is ξ yet we choose them to be the same (A6). Next we have to simplify our PDE: It consists of three terms on the right side which can be (in our case) shortened to the following components:
The next step is the change from a continuous to a discrete system: First we describe our concentration function as one depending on three integers (places x and z, time t):
Also, instead of handling every component individually we define a concentration vector containing all concentration values of M at a static time step:
We do the same for our source by defining a production vector at the time t:
Why is this done? Because we can split our grid into a connected line of rows. The first row of the grid then would be given by the first components of the vector, followed by the second row and so on.
Next we discretize the derivations of the PDE by exchanging the differentials by finite differences. In this case differentials can be seen as the rate of change in one (x-,z- or) t-step (secants so to say). For a first and second derivation this would be written as:
Using the integer multiplicated step values for our PDE components this results in:
Logically this method is called "finite differences method". For conveniences sake two parameters λ and μ are defined and used in the upcoming simplifications:
There are different approaches with the finite differences method to approximately solve the PDE. We decided to use an ansatz by Crank and Nicholson which has the following form: A functions change in one time step is determined by the average of momentary and following time step of the right hand side of the PDE. Or, in mathematical terms:
Combining the PDE parts and splitting the concentration values into following and momentary time steps the problem can be solved by solving a linear system of coupled equations in place components of the concentration:
This is an iterative method: The values of the upcoming time step are calculated by the ones from the former time step. The equations cannot be solved independently from one another. In a grid of NW rows and NL columns this a system of NW times NL equations that have to be solved every step.
What can be seen in the scheme above is the underlying geometrical approach: Diffusion and convection are done only with direct grid neighbours (two vertical and two horizontal ones). Logically this holds true only for non-border grid points. For the systems borders our boundary conditions have to be considered which manifest in the following conditions:
Inserting those in the main Crank-Nicholson-scheme leads to the border descriptions:
Combining two border conditions leads to corner conditions which have to be considered too. They follow the same calculation yielding:
Being a linear system of equations this problem can be replaced by a short form using matrix-vector multiplication (hence the definition of the vectors from before):
Where the two matrices A and B consist of the factors accompanying the concentration values. They are rather large and are sketched below:
This concludes the theoretical part of the diffusion. The next step was to implement this system using python 3.4.2 including numpy and matplotlib. Source codes can be obtained here:
Python source code
The first file produces two output files. One of them contains the protein distribution on the glass slide at the finishing time. The other one consists of the total bound protein at the surface of the slide. The second file returns pictures of the ongoing diffusion giving an insight of the diffusion process in the crosssection system.
3.4 Assumptions
A1: The spots are equally distributed and distance to the borders of the microfluidic chamber is at least one spot-to-spot distance.
A2: The spots are perfect circles and the binding site at the iRIf glass is homogeneous -> Around one spot cylindrical symmetry is given.
A3: The area of produced proteins is as thick as the one of bound ones -> χ = ε
A4: Movement by virtue of gravitation is considered to already be in a stationary status (constant movement).
A5: Protein production is described by a logistic function with a minimal value close to zero and maximum value of 100 (percent).
A6: ε = ξ = Δ x = Δ z
Results
1. Transcription and Translation
We wanted to test our model's ability to predict the concentration of the cell-free expression biochemical network entities and to adapt it to different conditions. Given these properties, our model has the potential to reveal limiting resources and bottleneck reactions and to test optimized parameter settings for the resulting effects on the biochemical network behavior.
We run the model using default starting concentrations and parameter values as given in the following, tracking the concentration time course of free RNA polymerase, free 30s and 50s ribosomal subunits, free 70s ribosome, free amino acids, free tRNA, free initiation factors, free elongation factors, free nucleotides, free mRNA and free protein, to examine the network's predicted behavior. Afterwards, we tracked the protein and mRNA concentrations in three more simulations, while changing one starting concentration each, to examine the network's predicted reaction and compare it to the reaction expected.
Network Behaviour Under Default Parameter Values
Protein expression occurs between 220 s and 230 s and reaches a maximal protein concentration of about 2.8 µM. Interestingly, free mRNA concentration remains close to zero but reaches a short peak between 230 s and 240 s. The peak correlates with the saturation of protein concentration, denoting the end of protein expression. This can be explained by an increased release of mRNA molecules from translation complexes at the end of expression, which are still bound by evolving initiation complexes that can not be further processed and therefore decrease strongly again.
The concentration of free RNA polymerase drops down close to zero almost immediately, making it a possible limiting entity and therefore a candidate for optimization.
The time course of free 30s and 70s ribosomal species behaves similar to the RAN polymerase. Already at the beginning droppping down to zero, whereas the 50s ribosomal subunit concentration remains quite high. This points to a faster kinetic of the beginning of translation initiation compared to the end of translation initiation, binding a lot of 30s subunits but leaving the 50s subunits unbound. Again, a peak can be observed for all three ribosomal species at the end of protein expression, caused by an increased dissociation of translation complexes.
The concentration of initiation factors 1 and 3 already drops in the beginning of the reaction, whereas initiation factor 2 concentration drops only during protein expression. This can be explained by the limited mRNA concentration in the beginning of the reaction, which has to bind to the evolving initiation complex before IF-2-GTP can be bound. As IF-2-GTP concentration is not restored at the end of protein expression, this hints to GTP limiting the overall reaction rather than amino acids.
As for initiation factor 2, concentrations of all elongation factors remain constant until the beginning of protein expression, due to the limited amount of mRNA. Interestingly, only EF-G-GTP experiences a severe drop during protein expression. This could be due to EF-Tu-GTP regeneration kinetics being faster than EF-G-GTP regeneration. As EF-G-GTP is not restored as it is the case for IF-2-GTP, this is another hint on GTP being a limiting entity.
The time courses of amino acid and tRNA concentration show a slightly decreasing trend, caused by processing of amino acids and tRNAs to aminoacyl-tRNAs. In contrast to the tRNA concentration, the amino acid concentration shows a drop in the very beginning probably due to the binding of amino acids by the aminoacyl-tRNA synthetases. In contrast to the amino acid concentration, tRNA concentration experiences an increase during protein expression between 220 s and 230 s, that can be explained by the release of empty tRNAs during translation.
Looking at the time courses of nucleotide concentrations, it can be observed that the ATP concentration remains almost constant, whereas the GTP concentration drops extremely during protein expression. This drop seems to be responsible for the drop in total nucleotide concentration. Therefore, GTP is a possible limiting entity.
Network Reaction To Changed Parameter Values
As observed for time courses under default parameter values, ribosomal species and GTP may be limiting the cell-free expression process. Therefore, mRNA and protein expression under both lower and higher concentrations of those entities was simulated.
Protein expression is expected to be faster with higher concentration of ribosomes. This is also predicted by the model. Also, mRNA concentration is lower for higher ribosomal species concentration, confirming a faster process. In the course of protein concentrations, the differences are rather small.
Protein expression is expected to gain a higher yield using higher GTP concentrations. This behaviour is clearly predicted also by the model, confirming GTP to be a limiting entity for the cell-free expression described by the model. Both mRNA and protein concentrations are much higher compared to lower GTP concentrations.
Summary and Model Optimizations
Our model describing cell-free expression, including 65 entites and 44 parameters, showed the ability of both predicting network behavior and network reaction to changed conditions. Although the model's predicted output does not fit to real cell-free expression yet, there is a huge capability of making the model more specific. Using the set of assumptions defined, the model can be extended step by step, bringing it closer to the cell-free expression biochemical network.
2. Diffusion results
First, we made an animation of the diffusion using exemplary values. You can see a video of the diffusion process down below:
We also checked the protein binding for different spot widths: 20%, 40%, 60% and 80% of the total crosssection width. While the binding distribution returned gaussian profiles the total binding - like the production - followed a logistic function.
The total binding concentration was lower the broader the spot was. This can be explained by diffusion to the left and right borders of the system. In other words the broader the spots (while holding constant distances between the spots) the more "contamination" we have to expect at neighboring spots. Therefore, theory supports the expectation.
The binding distribution showed a similar result: The broader the DNA spot is distributed the broader a gaussian profile of the bound protein concentration (lower μ and higher σ). Here we neglected the fact that there are no infinite binding positions. So at a specific point a plateau would limit the distribution. Also, a broader profile is the result of the higher width. On one hand it is good to have a big enough detection spot. On the other hand being too broad could lead to overlaps with neighboring spots. Due to a lack of time gaining further insight into how much protein concentration is needed was not possible.