Team:UFSCar-Brasil/part2.html
Protein Solubilization toolkit
Where are the singular effects?
Simulation and analysis
In this modeling step, we study how efficiently different chaperone arrangements assist protein folding, using our protein of interest, limonene synthase, as an example. Because we could not assemble the complete system of this toolkit, no experimental data are available to support our findings. Nevertheless, the modeling in this section is worth presenting and could be useful to other teams facing similar problems. We therefore built the model on a simulated dataset; real measurements can simply replace the simulated values in the future.
The experiment to measure the efficiency of each chaperone and of each chaperone arrangement was conceived as follows:
All chaperones would be tested alone (IbpA+B, ClpB and DnaK), in pairs, and all together. The chaperones would be encoded in plasmid pSB1C3 under the control of a constitutive promoter, together with our gene of interest under the same promoter. The protein of interest would be produced for a few hours, after which the cells would be harvested and lysed by sonication. The cell debris would then be pelleted by centrifugation, and the soluble and insoluble fractions recovered. The band of interest in each fraction would be quantified by optical densitometry on an SDS-PAGE gel. The chaperone efficiency coefficient would be given by the ratio of the soluble band to the insoluble band.
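The experimental design described above (singles, pairs, all three, plus a control) can be enumerated with a short sketch. The labels in `CHAPERONES` and the helper name `arrangements` are shorthand chosen for this illustration, not part of the original protocol.

```python
from itertools import combinations

# Chaperone systems named in the text; the labels are shorthand for this sketch.
CHAPERONES = ["Ibp", "ClpB", "DnaK"]


def arrangements(chaperones):
    """Return the empty control followed by every single, pair and full set."""
    result = [()]  # control group without chaperones
    for size in range(1, len(chaperones) + 1):
        result.extend(combinations(chaperones, size))
    return result


if __name__ == "__main__":
    for arrangement in arrangements(CHAPERONES):
        print("+".join(arrangement) or "control")
```

With three chaperone systems this gives eight groups, so measuring each in triplicate leads to 24 measurements.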
Simulated data following this procedure are given in Table 1. A control group without any chaperones was included for comparison, and all measurements were simulated in triplicate. The table shows the amounts of well-folded (soluble) and misfolded (insoluble) protein.
Table 1: Simulated optical densities of the insoluble and soluble bands for each chaperone arrangement.
Chaperone arrangements | Optical density of Insoluble band | Optical density of Soluble band |
---|---|---|
control group without chaperones | 800 | 0 |
control group without chaperones | 865 | 0 |
control group without chaperones | 855 | 0 |
Ibp | 600 | 124 |
Ibp | 658 | 165 |
Ibp | 659 | 188 |
ClpB | 649 | 133 |
ClpB | 655 | 132 |
ClpB | 689 | 160 |
dnaK | 699 | 144 |
dnaK | 699 | 150 |
dnaK | 689 | 161 |
Ibp_ClpB | 400 | 250 |
Ibp_ClpB | 458 | 266 |
Ibp_ClpB | 423 | 267 |
Ibp_dnaK | 465 | 200 |
Ibp_dnaK | 465 | 220 |
Ibp_dnaK | 462 | 280 |
dnaK_clpB | 765 | 180 |
dnaK_clpB | 733 | 199 |
dnaK_clpB | 756 | 187 |
Ibp_Clpb_dnak | 300 | 300 |
Ibp_Clpb_dnak | 356 | 310 |
Ibp_Clpb_dnak | 346 | 330 |
The chaperone efficiency coefficient expresses the ratio of correctly to incorrectly folded protein. In the case of 100% solubility this coefficient tends to infinity, because the insoluble fraction tends to zero, which makes it impractical to work with. To avoid this, we decided to work with protein yield instead. The protein yield coefficient is expressed as a percentage (0-100) and calculated as follows:
$$Y = 100 \times \frac{Soluble}{Soluble + Insoluble} \tag{1}$$
Table 2: Chaperone efficiency (soluble/insoluble ratio) and protein yield coefficients for each chaperone arrangement.
Chaperone arrangement | Soluble/Insoluble | Yield (%) |
---|---|---|
control group without chaperones | 0 | 0 |
control group without chaperones | 0 | 0 |
control group without chaperones | 0 | 0 |
Ibp | 0.20666667 | 17.12707 |
Ibp | 0.25075988 | 20.0486 |
Ibp | 0.28528073 | 22.19599 |
ClpB | 0.20493066 | 17.00767 |
ClpB | 0.20152672 | 16.77255 |
ClpB | 0.23222061 | 18.8457 |
dnaK | 0.20600858 | 17.08185 |
dnaK | 0.21459227 | 17.66784 |
dnaK | 0.23367199 | 18.94118 |
Ibp_ClpB | 0.625 | 38.46154 |
Ibp_ClpB | 0.58078603 | 36.74033 |
Ibp_ClpB | 0.63120567 | 38.69565 |
Ibp_dnaK | 0.43010753 | 30.07519 |
Ibp_dnaK | 0.47311828 | 32.11679 |
Ibp_dnaK | 0.60606061 | 37.73585 |
dnaK_clpB | 0.23529412 | 19.04762 |
dnaK_clpB | 0.27148704 | 21.35193 |
dnaK_clpB | 0.2473545 | 19.83033 |
Ibp_Clpb_dnak | 1 | 50 |
Ibp_Clpb_dnak | 0.87078652 | 46.54655 |
Ibp_Clpb_dnak | 0.95375723 | 48.81657 |
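As a check, both coefficients in Table 2 can be recomputed from the optical densities in Table 1. The sketch below is one way to do it; the dictionary layout and the function names `efficiency` and `protein_yield` are assumptions made for this illustration.

```python
from statistics import mean

# Simulated optical densities from Table 1: (insoluble, soluble) per replicate.
TABLE_1 = {
    "control": [(800, 0), (865, 0), (855, 0)],
    "Ibp": [(600, 124), (658, 165), (659, 188)],
    "ClpB": [(649, 133), (655, 132), (689, 160)],
    "dnaK": [(699, 144), (699, 150), (689, 161)],
    "Ibp_ClpB": [(400, 250), (458, 266), (423, 267)],
    "Ibp_dnaK": [(465, 200), (465, 220), (462, 280)],
    "dnaK_clpB": [(765, 180), (733, 199), (756, 187)],
    "Ibp_Clpb_dnak": [(300, 300), (356, 310), (346, 330)],
}


def efficiency(insoluble, soluble):
    """Soluble/insoluble ratio; unbounded when the insoluble band vanishes."""
    return float("inf") if insoluble == 0 else soluble / insoluble


def protein_yield(insoluble, soluble):
    """Equation 1: percentage of the total band intensity that is soluble."""
    return 100 * soluble / (soluble + insoluble)


if __name__ == "__main__":
    for name, replicates in TABLE_1.items():
        yields = [protein_yield(ins, sol) for ins, sol in replicates]
        print(f"{name}: mean yield {mean(yields):.2f} %")
```

Unlike the soluble/insoluble ratio, the yield stays bounded at 100 when the insoluble band is zero, which is the reason given above for preferring it.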
To see how this large set of combinations relate to each other in terms of yield, we used a hierarchical clustering analysis available on a free web server, DendroUPGMA. The results were used to build the dendrogram shown in Figure 1. This analysis groups together data that are mathematically close and separates those that are farther apart. As we can see, the values obtained with all chaperones acting together, here named "all", are the most isolated from the other combinations: because this is the most efficient treatment, it is the most 'distant' from the other groups. Similarly, the pairs "Ibp+ClpB" and "Ibp+DnaK" are grouped on the same node, so we conclude that their values are too close to tell them apart.
Figure 1: Dendrogram of distances for the simulated dataset, showing a complex clustering behaviour.
The cophenetic correlation coefficient, which indicates how faithfully the dendrogram represents the original distances, was estimated as 0.84134; values higher than 0.7 are considered a good fit. The combination of "all" chaperones is farthest from the others, as expected, since its yield reached the highest value. The arrangements "Ibp+DnaK" and "Ibp+ClpB" share a branch, indicating that their yield values are closer to each other than to the rest. In addition, they lie near the branch of the combination of all chaperones, indicating that their yield is not as high as that of all chaperones together, but close. This suggests that "Ibp+DnaK" and "Ibp+ClpB" may be good substitutes for the three-chaperone arrangement when the complete arrangement is not possible.
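The clustering step can be illustrated with a minimal pure-Python sketch of UPGMA (average linkage) and the cophenetic correlation. It makes assumptions that the text does not confirm: the input is the mean yield of each arrangement, and the distance is the absolute difference between means. The exact input and settings given to DendroUPGMA are not stated, so the coefficient printed here need not match the 0.84134 reported above.

```python
from itertools import combinations
from statistics import mean

# Yield (%) triplicates from Table 2; averaging them is an assumption.
YIELDS = {
    "control": [0.0, 0.0, 0.0],
    "Ibp": [17.12707, 20.0486, 22.19599],
    "ClpB": [17.00767, 16.77255, 18.8457],
    "dnaK": [17.08185, 17.66784, 18.94118],
    "Ibp_ClpB": [38.46154, 36.74033, 38.69565],
    "Ibp_dnaK": [30.07519, 32.11679, 37.73585],
    "dnaK_clpB": [19.04762, 21.35193, 19.83033],
    "all": [50.0, 46.54655, 48.81657],
}


def upgma(values):
    """Average-linkage clustering of scalar values.

    Returns the merge history, the original pairwise distances and the
    cophenetic distances (height at which each pair is first joined).
    """
    dist = {frozenset(p): abs(values[p[0]] - values[p[1]])
            for p in combinations(values, 2)}

    def linkage(a, b):
        # UPGMA: mean of all distances between members of the two clusters.
        return mean(dist[frozenset((i, j))] for i in a for j in b)

    clusters = [(name,) for name in values]
    merges, cophenetic = [], {}
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        for i in a:
            for j in b:
                cophenetic[frozenset((i, j))] = height
        merges.append((a, b, height))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges, dist, cophenetic


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den


def cophenetic_correlation(dist, cophenetic):
    """Correlation between original and dendrogram-implied distances."""
    pairs = list(dist)
    return pearson([dist[p] for p in pairs], [cophenetic[p] for p in pairs])


MEANS = {name: mean(vals) for name, vals in YIELDS.items()}
MERGES, DIST, COPH = upgma(MEANS)
CCC = cophenetic_correlation(DIST, COPH)

if __name__ == "__main__":
    for a, b, height in MERGES:
        print(f"{'+'.join(a)} | {'+'.join(b)} joined at {height:.3f}")
    print(f"cophenetic correlation: {CCC:.3f}")
```

Under these assumptions the pairs "Ibp_ClpB" and "Ibp_dnaK" are joined directly, which agrees with the grouping described for Figure 1.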
As can be seen, the protein yield obtained with chaperone arrangements is not trivial: cross-interactions make the joint behaviour hard to predict from the single-chaperone values alone. To model the interference between chaperone systems, we set up equations relating the yields of all possible combinations, each multiplied by a mathematical factor \(x_i \). The goal is to interpret the values of these constants so that the influence of each isolated chaperone component on the set of combinations can be compared.
Each index \(x_i\) represents one arrangement of the studied chaperones, as follows: \(x_1 \rightarrow Ibp\); \(x_2\rightarrow ClpB\); \(x_3\rightarrow DnaK\); \(x_4\rightarrow Ibp+ClpB\); \(x_5\rightarrow Ibp+DnaK\); \(x_6 \rightarrow ClpB+DnaK\); \(x_7 \rightarrow Ibp+DnaK+ClpB\).
The results obtained for this system were:
$$x_1=(471/124)x_7-(251/248)x_6 \tag{2}$$
$$x_2= -(314/71)x_7 +(502/213)x_6 \tag{3}$$
$$x_3=(471/109)x_7 - (251/218)x_6 \tag{4}$$
$$x_4=(251/612)x_6 \tag{5}$$
$$x_5=(1884/503)x_7-(502/503)x_6 \tag{6}$$
$$x_6=x_6 \tag{7}$$
$$x_7=x_7 \tag{8}$$
In this way, all parameters can be described in terms of just two values, \(x_6\) and \(x_7\). That is, every chaperone arrangement is related to two combinations: all three chaperones together (\(x_7\)) and "DnaK+ClpB" (\(x_6\)). Fixing one of them makes it possible to study the behaviour of the others graphically (Figures 2 and 3).
Figure 2: Values of \(x_i\) as a function of \(x_6\), with \(x_7\) fixed at 1.
Figure 3: Values of \(x_i\) as a function of \(x_7\), with \(x_6\) fixed at 1.
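The dependence of each \(x_i\) on \(x_6\) and \(x_7\) can also be tabulated directly from Equations 2 to 8. The sketch below stores each relation as a pair of exact fractions; the names `COEFFS` and `solve` are chosen for this illustration only.

```python
from fractions import Fraction as F

# (a_i, b_i) such that x_i = a_i * x7 + b_i * x6, copied from Equations 2-8.
COEFFS = {
    1: (F(471, 124), F(-251, 248)),
    2: (F(-314, 71), F(502, 213)),
    3: (F(471, 109), F(-251, 218)),
    4: (F(0), F(251, 612)),
    5: (F(1884, 503), F(-502, 503)),
    6: (F(0), F(1)),
    7: (F(1), F(0)),
}


def solve(x6, x7):
    """Evaluate every x_i for the chosen free parameters x6 and x7."""
    return {i: a * x7 + b * x6 for i, (a, b) in COEFFS.items()}


if __name__ == "__main__":
    # Slopes with respect to x6 (x7 fixed) and with respect to x7 (x6 fixed).
    for i, (a, b) in COEFFS.items():
        print(f"x{i}: d/dx6 = {float(b):+.3f}, d/dx7 = {float(a):+.3f}")
    print({i: float(v) for i, v in solve(1, 1).items()})
```

Reading the slopes off `COEFFS` gives the same picture as the plots: with \(x_7\) fixed, \(x_2\) rises with \(x_6\) while \(x_1\) falls, and the signs swap when \(x_7\) is varied instead.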
We can see that there is no single solution to our problem, and probably there would be none even with real values. Nonetheless, the plots reveal some dependency relationships between the coefficients. For example, with \(x_7 \) fixed, \(x_2 \) increases with \(x_6 \) with a slope of greater magnitude than that of \(x_1 \), which decreases as \(x_6 \) increases. This indicates that ClpB has a greater influence than Ibp on the chaperone combinations in these arrangements. The trend is reversed when \(x_7 \) is increased instead: \(x_1 \) increases and \(x_2 \) decreases. As it was not possible to obtain experimental data to validate our discussion and predict outcomes, we leave this as a theoretical model that can be deployed and tested in future projects by our team or by any other iGEM team that wishes to use it.
References
S. Garcia-Vallve, J. Palau and A. Romeu (1999) Horizontal gene transfer in glycosyl hydrolases inferred from codon usage in Escherichia coli and Bacillus subtilis. Molecular Biology and Evolution 16(9):1125-1134.