Team:Tianjin/Modeling
Overview
This year, Tianjin-iGEM makes a good combination of experiment design and modeling and solves all kinds of problems we meet.
What’s the best condition of the two-phase extraction experiment? Do not need compare experiments results in different conditions any more. Using only few data, we can calculate the exactly best condition through computer programs and give great help to our experiments.
How hydrophobin self-assembles on the surface? You really do not have to use electron microscope. See how we propose a physical model using mathematical method and how the model fits well with the experiment data.
Why the application of hydrophobin accelerates the cutinase catalyzed PET hydrolysis? The classic model won’t work. Novel kinetic models are made and it is amazing that we make it visible using computer software.
Mathematical model and computer stimulation give us powerful method to predict, improve and direct our experiments.
Adsorption
We decided to create a model of Janus adsorption for two main reasons:
Describing the self-assembly process of Janus on the solid substrate or air/water interface
To support and complement enzyme kinetic modeling as the basic part of heterogeneous kinetics
Model Formation
As literature review, we know that Class I Janus are able to self-assemble at an air/water or water/oil interface, forming a robust, electron-microscopically identifiable monolayer characterized by a rodlet structure. Yu et al. studied the formation of rodlets of the inJanus from Grifola frondosa. They used a Langmuir trough and found that rodlets formed at the air-water interface during compression of a surface film of inJanus. de Vocht et al. repeated the experiment to have an image of rodlet structure after drying an aqueous solution with scanning force microscope. As for Class II hydrophobins, HFBI, HFBII and HFBIII from T. reesei have been shown to form films with a self-assembled hexagonally ordered structure (Paananen et al. 2003; Kisko et al. 2005; Kisko et al. 2007). Multilayer Langmuir-Blodgett films, HFBI and HFBII proteins formed hexagonal crystallites (Kisko et al. 2005.) Self-assembled hexagonally ordered films of HFBIII on an air/water interface and on a silicon substrate were also studied (Kaisa Kisko et al. 2007)
As evidence given above, the pattern of two classes Janus are totally different.
Class I Janus
A key difference between class I and class II Janus is that only class I Janus form amyloid-like rodlets. Ingrid Macindoe et al. compared three types of EAS: wild-type EAS, EASΔ15(a truncated version of EAS that lacks 15 residues from the disordered Cys3–Cys4 loop) and F72G EASΔ15(F72 mutate to G) by nuclear magnetic resonance(NMR) spectroscopy. They also carried out molecular docking simulations using the data-driven docking program HADDOCK. Different sets of intermolecular hydrogen bond restraints between neighboring monomers over residues S71-I75 in both parallel and anti-parallel arrangements were tested. The width of the rodlet generated in this way is approximately 65 Å, which agrees very well with the measured width of approximately 61 Å for EASΔ15 rodlets(Ann H Kwan,2008)(Figure 1)
Figure 1. Ribbon diagram of EASΔ15 hexamer in gray showing packing of side chains in the β-spine core and location of charged residues. Side chains of F72, L73, and I74 are shown as purple sticks, N67 as green sticks and T66 and T68 as orange sticks. Red and blue spheres denote positions of Cγ and Cε atoms of Asp an Lys side chains, respectively. Horizontal line illustrates that the majority of charges are located on the “bottom” half of the structure (corresponding to the hydrophilic face).
So, as the data obtained from MD simulation above, the diameter of monomer is 27 Å.
Figure 2. Model of the rodlet unit by overview. Two circles simplify hydrophobins as regular spheres. The connection between dimer is the β-barrel unit.
To figure out the stoichiometric relationship between Janus and surface site, we calculate area of Janus and a unit respectively. We assumed the size of a Janus is far larger than a single surface bound site.
If regarding dimer as a unit (Figure 2), the area of two Janus(SH)= 1145 Å2, and the area of rectangle unit(SA) = 1755 Å2.
So, the elementary reaction equation is simplified as,
Parameters |
Meaning |
H |
Hydrophobin(Janus) |
As |
Active site on the solid surface |
k1 |
Associated constant |
k-1 |
Dissociated constant |
θ |
fractional occupancy of the adsorption sites |
τ/τ∞ |
Surface concentration of adsorption/ Maximal surface concentration |
Hypothesis:
(1) Each site can hold at most one molecule of Janus(mono-layer coverage only)
(2) The surface containing the adsorbing sites is perfectly flat plane with no corrugations (assume the surface is homogeneous)
(3) The adsorption process is reversible
(4) There is an H-bond interaction in the β-barrel forcing to form rodlets.
When adsorption rate equals to desorption rate,
,
simplify it and solve θ, , where
The adsorption isotherm equation is,
Class II Janus
The high resolution of the AFM images from both the air-facing side and the water-facing side enabled comparison of the repeating structures in the AFM images with the dimensions of an HFBI monomer.
Figure 3. Model of the structure of HFBI film. The film viewed from the hydrophobic side (A) and from the hydrophilic side (B). (A) and (B) are processed AFM images showing the average repeating structures in a single crystalline area. In the middle panel and expanding to both sides is a model of the monomer arrangement in the film(Géza R. Szilvay,2007)The model in figure suggests that three monomers forming a trimer are arranged to form a hexagonal pattern.
Figure 4. Simplifying model of the structure unit of class II hydrophobin film. Full line in the circle is valid hydrophobin in one unit and the hexagon is a cell of crystal monolayer film.
To figure out the stoichiometric relationship between hydrophobin and surface site ,we regarded hexagonal cell as a unit. (Figure 4)The area of six effective hydrophobins (SH)= , and the area of hexagonal cell (SA) = R.(R is the radius of ideal hydrophobin sphere)
The elementary reaction is as follow,
Parameters |
Meaning |
H |
Hydrophobin(Janus) |
As |
Active site on the solid surface |
k1 |
Associated constant |
k-1 |
Dissociated constant |
θ |
fractional occupancy of the adsorption sites |
τ/τ∞ |
Surface concentration of adsorption/ Maximal surface concentration |
Hypothesis:
(1) Each site can hold at most one molecule of Janus (mono-layer coverage only)
(2) The surface containing the adsorbing sites is perfectly flat plane with no corrugations (assume the surface is homogeneous)
(3) The adsorption process is reversible
(4) There is an interaction between adjacent Janus forming hexagonal crystalline structure.
When adsorption rate equals to desorption rate,
simplify it and solve θ ,
The adsorption isotherm equation is,
Parameter Finding
We turned to the literature to find parameters for our model, given in the Table below. In the past few years, M. Linder el at. were doing well in the field of adsorption of Janus. We looked for parameter values that had been measured in the work M. Linder el at. did, but the substrates they used were alkylated gold surface, salinized surface, polystyrene surface, etc. while what we use is PET(polyethylene terephthalate), which has different surface property. Therefore, we estimated parameters according to the data in their assays on fusion protein. Such aggregating parameters is somewhat uncertain endeavor; thereby predicting a range of the parameters and7 preparing it with the practical experiments. An explanation for how we arrived at each parameter is given in the table.
ADSORPTION PARAMETERS
Parameter |
Value |
Description |
Source/Rationale |
|
1 |
τ∞,HFBI-EGIc |
0.28μmol•m-2(1.33μg• cm-2) |
Maximal surface concentration of sJanus-EGIc fusion protein |
Determined base on fitting data into the first-order Langmuir isotherm, as reported in [7].It is the binding isotherm of EGIc- sJanus fusion protein to silanized glass. |
Kd,HFBI-EGIc |
21.8μg•ml-1 |
Desorption equilibrium constant of sJanus -EGIc fusion protein |
||
2 |
T∞,HFBI |
0.21μg• cm-2 |
Maximal surface concentration of sJanus |
Estimated according to the mass relationship between sJanus and fusion protein sJanus -EGIc, which is based on the assumption that the adsorbing and desorbing ability of sJanus keep same as fusion protein. |
Kd,HFBI |
21.8μg•ml-1 |
Desorption equilibrium constant of sJanus |
||
3 |
T∞,HGFI |
0.224μg• cm-2 |
Maximal surface concentration of sJanus |
Estimated according to an assay[8],with series of pH resulting in different adsorbance; thereby there is a range of desorption constant |
Kd,HGFI |
49~280μg•ml-1 |
Desorption equilibrium constant of sJanus |
As an attachment, we read almost all the assays about sJanus and inJanus adsorption in the paper published and we found diverse conditions in every assay, such as different substrates: alkylated gold, polystyrene, silanized glass, 1-hexanethiol etc. Hence, it is not easy to estimate the certain parameters when the Janus self-assemble on the PET. NO.1 parameters in the table is in the condition where the fusion protein adsorbs on the silanized quartz surface attained from the paper and we estimate the parameters when the protein is pure Janus by ignoring the conformation change during the protein fusion. Besides, theoretically it only applies to silanized surface but we initially think it applies to all hydrophobic surfaces. Further, we used another assay[8] data to estimate the parameters when inJanus adsorb on PET although in this case they used polystyrene as the substrate.
Initial Model Result
Using these estimated parameters, we simulated our model of the adsorption process. Below are two Janus, sJanus and inJanus, have different adsorbance with different concentration of free Janus. On the left we simulated a red line and a blue line, a range of the adsorption isotherm, which corresponds to pH=4 and pH=10, respectively. On the right is sJanus adsorption isotherm. It is convinced that no significant effect of temperature is noted; hence we do not make any note about temperature on the line graph.
In the initial model, it is indicated that two classes of Janus have similar ability of self-assembly. They both have strong hydrophobic area stuck to hydrophobic substrate.
Result in Lab
We set up this model to predict the quantity of absorption wild Janus and fused Janus. We have done the PET hydrolysis experiment with Janus or fused Janus existing, however, we did not have enough time to accomplish further discussion and parameter fitting which we planned to study with QCM(Quartz crystal microbalance) and SPR(Surface plasmon resonance). Despite that, we are working to make it more clear and specific in the future.
Reference
[1]Wang, X., Graveland Bikker, J., Kruif, C. & Robillard, G. Oligomerization of hydrophobin SC3 in solution: From soluble state to self-assembly. Protein Sci. 13, 810–821 (2004).
[2]Sunde, M., Kwan, A., Templeton, M., Beever, R. & Mackay, J. Structural analysis of hydrophobins. Micron 39, 773–84 (2007).
[3]Géza R. Szilvay. Self-assembly of hydrophobin proteins from the fungus Trichoderma reesei.(2007)
[4]Kisko, K. et al. Self-assembled films of hydrophobin protein HFBIII from Trichoderma reesei. J Appl Crystallogr(2007). doi:10.1107/S0021889807001331
[5]Kwan et al. Structural basis for rodlet assembly in fungal hydrophobins. Proc. Natl. Acad. Sci. U.S.A. 103, 3621–3626 (2006).
[6]Kwan, A. H. et al. The Cys3-Cys4 loop of the hydrophobin EAS is not required for rodlet formation and surface activity. J. Mol. Biol. 382, 708–20 (2008).
[7]Linder, M., Szilvay, G., Nakari Setälä, T.Söderlund, H. & Penttilä, M. Surface adhesion of fusion proteins containing the hydrophobins HFBI and HFBII from Trichoderma reesei. Protein Science 11, 2257–2266 (2002).
[8]Wang, Z. et al. Hydrophilic modification of polystyrene with hydrophobin for time-resolved immunofluorometric assay. Biosens Bioelectron 26, 1074–9 (2010).
[9]Takatsuji, Y. et al. Solid-support immobilization of a ‘swing’ fusion protein for enhanced glucose oxidase catalytic activity. Colloids Surf B Biointerfaces 112, 186–91 (2013).
[10]Takahashi,T.et al.Ionic interaction of positive amino acid residues of fungal hydrophobin RolA with acidic amino acid residues of cutinase CutL1. Molecular Microbiology 96,14-27(2015)
Modelling——optimization of Aqueous Two-Phase extraction
Introduction
The purpose of this part of modelling is trying to found out optimize process conditions for the effective partitioning of our fusion protein and a way to proof whether or not the novel two-step extraction method is applicable to any other kinds of fusion proteins constructed by other target protein of interest and hydrophobin. Moreover, we’d like to put a mighty-possible way to find the feasible experimental condition for any other target protein.
Part 1. Optimize process conditions modelling
Given the fact that our independent variables might to be concerned (Triton molecular weights and concentrations, temperature, phase volume ratios R, pH, fusion protein’s surface hydrophobicity, charge, structure, and molecular weight) do have a lot interaction between each other, we can see it that a high level of non-linearity exists in our system, complicating the study of the behavior in the aqueous two-phase extraction. What’s more, the common method of experimental design and optimization where operating conditions are optimized by changing one variable at a time, gives no guarantee for the optimal parameter determination. In order to simplify our problem, we decided to use the response surface methodology (RSM) in our model, which is based on the statistical theory. In this case, our model is less theoretical, nonetheless effective in our situation.
Following we present you our optimization model.
1. Response surface methodology (RSM)
RSM consists of a group of mathematical and statistical techniques for seeking the optimum conditions in a system which had multi-variable and complex interactions existing between the parameters. It can be used to define the relationships between the response variables (in our project are: recovery percentage and the purity of protein) and the independent variables (in our project are: the molecular weight、the concentration and the Grand average of hydropathicity of fusion protein (GRAVY) and phase volume ratios R) and also generates a mathematical model that describes the process.
The formula below is the basic equation used in RSM, in which xixj represents the interaction between two parameters, β is the equation coefficients. As for why we choose xi, xi2, xixj to regress? Here is the reason: take Taylor's formula as an instance, when the second-order can meet the accuracy requirements, than we shall set the polynomial as a second-order one, cause third-order is not necessary. The same reason, the following second-order model can meet our requirements so we do not set it as a third-order one.
2. Experiment data
To find out the statistical results of the model, we designed several experiments to gain our statistics by using the central composite rotatable design, considering four independent variables: the molecular weight、the concentration and the Grand average of hydropathicity of fusion protein and phase volume ratios R. The range of variables was chosen according to pre-experiment, during which we studied respective individual parameter. After the experiments were carried out, a regression analysis was carried out to extract equation coefficient in the model by using the software SPSS (Statistical Product and Service Solutions).
3. Supplementary Analysis on the Model
Principal Component Analysis (PCA)
In order to simplify our model, PCA is used to reduce the variables. Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. After the transformation is done, we minimize the variables and carry on the work.
4. Data Analysis and Model Validation
Due to the fact that by the time we submit the wiki, our experiments are still undergoing, so we can only process the few data we already got. Thanks to the useful software SPSS, which is a tiny but powerful tool speaking of statistical problems. We solved the basic statistical analysis by SPSS, here’s the results.
Another thing to mention is that we define a response variable as purity ratio, which equals the purity of target protein after extraction divide by purity of target protein before extraction.
protein |
GRAVY(x1) |
Phase ratio(x2) |
Purity before |
Purity after |
Purity ratio(y) |
GFP |
-0.338 |
0.02 |
59.8 |
88.7 |
1.483277592 |
GFP |
-0.338 |
0.05 |
94.5 |
100 |
1.058201058 |
GFP |
-0.338 |
0.05 |
86.7 |
90.1 |
1.039215686 |
GFP |
-0.338 |
0.05 |
26.4 |
74.6 |
2.825757576 |
BFP |
-0.303 |
0.20 |
27.5 |
69 |
2.509090909 |
RFP |
-0.563 |
0.02 |
19.5 |
78.6 |
4.030769231 |
RFP |
-0.563 |
0.05 |
19.5 |
60.9 |
3.123076923 |
The experiment data
And , and has the confident of 73.0%
And here are some figures:
These three figures illustrate the purify ratio from different point of view.
5. Sensitivity Analysis of the Model
To examine the influence on the response variables, we write the sensitivity analysis’s program, however, we do not get enough data so we did not do this part right now.
6. Conclusions
We can see that when we do the regression, the hydropathicity (x1) has been neglect, we may say that the target protein may be of any kind because the hydropathicity is not taken into account (Notice that here I only use the word “may”).
7. Strengths and Weaknesses
Strengths
Get rid of the complexity of the cases and draw a convincing conclusion despite of the complicate situation.
The result is applied to all at a wide confidence interval and easy to get once you have limited statistics.
Above all, the model is easy to establish and be used during research.
Weaknesses
The results can only be reached after you have some basic experiments (statistics), and cannot be made if you have nothing in hand. In other word, the model can only be used to forecast by practically rather than theoretically.
The burden of the computer increases as the data increases rapidly, in this case the model may not be suitable for occasions where there are too many variables (by saying too many, we mean more than 10 or so).
8. Where to improve
Take our project into consideration, it is still a problem that we do not have enough time to conduct so many experiments, there’s two way to solve the problem: the first is that although it is the best to conduct 31 experiments designed above, but consider there is only 15 coefficients for us to determined which gives the equations freedom, it is ok if we reduce the experiments down to 20 or so. However if you still think that is too much, then we can only solve the problem through the second method, that is we should find another way to modelling, maybe a more theoretical way that we do not have to depend the data too much.
Part 2. Feasible experimental condition modelling
Raise questions
In the process of modeling, we use the regression method to find the best experimental conditions. In the gradient experiment, the experimental conditions of the three kinds of proteins were obtained, which were 60, 70 and 80, take the best effect as 100 for example. Then we used the experimental data to do RSM analysis. The experimental conditions were independent variables, the extraction effect was variable. After the analysis the regression equation was obtained, and by using the value of derivative is zero we can get the best experimental conditions when the maximum effect value is 100 points.
And then, using the experimental data we have, what can we do if we want to use from three kinds of protein, with its separate properties (such as molecular weight and isoelectric point, hydrophilic and hydrophobic) as decision variables, to predict the acceptable conditions of any other proteins (regard the effect greater than 70 points as acceptable), whose properties were known, and how to generalize?
Analysis problem
The difficulty of the problem lies in, because the mechanism of extraction system is unclear, so the establishment of the mechanism model is difficult to achieve; and return to the regression which can get accurate results without knowing the mechanism well, it has always been a large sum of data to do regression analysis to predict, never have I heard small number of data can do regression to predict well. First of all, with little data may get no results (thinking that when the number of coefficients is a lot, then the data need more too to determine the coefficients’ value), not to mention the accuracy, which makes regression become meaningless. If you really have to make predictions using less data, you can consider interpolation method.
Interpolation and regression
Both interpolation and regression are widely used for prediction and forecasting, however, there are some differences between them. Normally there is a regression function defined in terms of a finite number of unknown parameters that are estimated from the data on the basis of minimize the sum of residuals, so the regression function do not have to fit the data exactly. The effect of regression often depends to some extent on making assumptions about the process, what’s more, these assumptions are required to have a sufficient quantity of data to observe, otherwise regression can easily give misleading results. Interpolation is more used for a small number of numerical processing, through the known data points, to construct new data points or a resulting function smoothly fit within the range of a discrete set of known data points to perform the function of prediction and forecasting. In the process of interpolation, we do not need to know what the function is like, complicated or simple, whatever, we only need to do some calculation. Errors exist as well as the regression analysis, however, considering the situation where we do not have much data to use, and the resulting function is fitted by the data, we may say that interpolation is a better way compared to regression. And the same thing between interpolation and regression is that, because the dimension of our prediction function depends on the number of data points, so with small sum of data points, the dimension of our prediction function is not high, the accuracy may not be too high.
The problem of few data has temporarily been solved by choosing the interpolation instead of the regression analysis, and the selection of the interpolation also lead to new problems. First of all, we deal with the first-order interpolation more often, but considering the characteristics of the protein data, there are three variables, and therefore we have to use multidimensional interpolation; second, there are more than one interpolation method, which is the best choice? The answer to this question is: we don’t know either. There is no conclusion or rule for interpolation method we use, we can only say that, try as much method as possible, then you may find the best answer.
Problem solving
As we discussed before, the selection of interpolation method is varied, so we start with the most simple one, multivariate polynomial interpolation in the Lagrange type interpolation (the most simple one should be multivariate linear interpolation, but taking the accuracy into account we start from polynomial interpolation rather than linear one). Choose the selected protein’s molecular weight, isoelectric point and hydrophilic as three independent variables, and experimental conditions as dependent variables for multiple times interpolation. Method is as follows:
Three-dimensional space Lagrange interpolation formula
Suppose a rectangular ,
Given interval [a, b] of a division ;
Given interval [c, d] of a division ;
Given interval [s, t] of a division ;
Then constitute a division point set of T.
Set the function ,
Where equals the value at each point of the division point set, and is a polynomial of n-ordered x, m-ordered y, and l-ordered z.
Let be , then it is easy to know that:,
And is the Lagrange interpolation polynomial for the division point set.
And the Lagrange interpolation basis function should meet the requirements below:
,
,
,
Let be, based on the Lagrange interpolation basis function of simple function we can know:
So, the interpolation polynomial of on the division point set ∆ is:
In our project, we just have to put the experimental data into the formula, and the last work is to calculate.
Appendix
Here is the program we used in modelling.
No.1
clc
clear
load pz.txt
%pz.txt is the text where you store your data,put the independent variables
%on the left rows,and the responce variables on the right rows.
n=4;m=2;
%n is the number of the independent variables
%m is the number of the responce variables
x0=pz(:,1:n);
y0=pz(:,n+1:end);
x=x0(:,3);
y=x0(:,4);
z=y0(:,1);
[X,Y]=meshgrid(min(x):0.01:max(x),min(y):0.001:max(y));
Z=griddata(x,y,z,X,Y,'v4');
figure(1)
%the picture of
[c,h]=contour3(X,Y,Z);hold on
colorbar
hold off
meshc(X,Y,Z);
figure(2)
pcolor(X,Y,Z);shading interp
colorbar
figure(3)
[c,h]=contour(X,Y,Z);
set(h,'ShowText','on');
colorbar
No.2
function [ output_args ] = RSM( A0,A1,A2,A3,A4,A11,A22,A33,A44,...
A12,A13,A14,A23,A24,A34 )
x1=32;
x2=16;
[x3,x4]=meshgrid(4:0.1:8,0.05:0.01:0.45);
f=A0+A1*x1+A2*x2+A3*x3+A4*x4+A11*x1*x1+A22*x2*x2+A33*x3.*x3+A44*x4.*x4+...
A12*x1*x2+A13*x1*x3+A14*x1*x4+A23*x2*x3+A24*x2*x4+A34*x3.*x4;
Z=f;
figure(4)
surf(x3,x4,Z)
figure(5)
[c,h]=contour(x3,x4,Z);
set(h,'ShowText','on');
colorbar
end
No.3
function [ ] = maxRSM( A0,A1,A2,A3,A4,A11,A22,A33,A44,...
A12,A13,A14,A23,A24,A34 )
syms x1 x2 x3 x4 y;
f=A0+A1*x1+A2*x2+A3*x3+A4*x4+A11*x1.*x1+A22*x2.*x2+A33*x3.*x3+A44*x4.*x4+...
A12*x1.*x2+A13*x1.*x3+A14*x1.*x4+A23*x2.*x3+A24*x2.*x4+A34*x3.*x4;
fx=y-(A0+A1*x1+A2*x2+A3*x3+A4*x4+A11*x1.*x1+A22*x2.*x2+A33*x3.*x3+A44*x4.*x4+... A12*x1.*x2+A13*x1.*x3+A14*x1.*x4+A23*x2.*x3+A24*x2.*x4+A34*x3.*x4);
eq1=diff(f,'x1');
eq2=diff(f,'x2');
eq3=diff(f,'x3');
eq4=diff(f,'x4');
[X1 X2 X3 X4]=solve(eq1,eq2,eq3,eq4);
X1=vpa(X1,5)
X2=vpa(X2,5)
X3=vpa(X3,5)
X4=vpa(X4,4)
dx1=diff(solve(fx,x1),'y');
dx2=diff(solve(fx,x2),'y');
dx3=diff(solve(fx,x3),'y');
dx4=diff(solve(fx,x4),'y');
maxf=subs(f,{x1,x2,x3,x4},{X1,X2,X3,X4})
r1=vpa(maxf./X1./subs(dx1,{x1,x2,x3,x4,y},{X1,X2,X3,X4,maxf}),5)
r2=vpa(maxf./X2./subs(dx2,{x1,x2,x3,x4,y},{X1,X2,X3,X4,maxf}),5)
r3=vpa(maxf./X3./subs(dx3,{x1,x2,x3,x4,y},{X1,X2,X3,X4,maxf}),5)
r4=vpa(maxf./X4./subs(dx4,{x1,x2,x3,x4,y},{X1,X2,X3,X4,maxf}),5)