Difference between revisions of "Team:BostonU/Modeling"

 
(57 intermediate revisions by 3 users not shown)
Line 4: Line 4:
 
<title>Modeling</title>
 
<title>Modeling</title>
 
<style type="text/css">
 
<style type="text/css">
 
+
/*
 
     .contentdiv
 
     .contentdiv
 
     {
 
     {
Line 34: Line 34:
 
}
 
}
 
/*footer*/
 
/*footer*/
#footer {
+
 
    margin: 0px auto 0 auto;
+
    width: 98.2%;
+
    height: 60px;
+
    padding: 10px;
+
    color: #FFFFFF;
+
    text-align: center;
+
    background: #333333;
+
    font-family: Verdana, Arial, Helvetica, sans-serif;
+
    font-size: 11px;
+
    clear: both;
+
}
+
 
.heading {
 
.heading {
 
     padding: 2px;
 
     padding: 2px;
Line 55: Line 44:
 
     font-weight: bold;
 
     font-weight: bold;
 
     font-size: 20px;
 
     font-size: 20px;
}
+
}*/
  
  
Line 64: Line 53:
  
 
<body>
 
<body>
 
+
<div id="wrapper">
  
 
<div class="contentdiv">
 
<div class="contentdiv">
<br><br><br>
+
 
<table cellspacing="50px" align="center">
+
<table align="center" style="margin: 0px auto;">
 +
<td>
 +
<a href="https://2015.igem.org/Team:BostonU/Modeling" class='button'>Modeling</a>
 +
</td>
 
<td>
 
<td>
<a href="https://2015.igem.org/Team:BostonU/Modeling" class='button'>Matlab Code</a>
+
<a href="https://2015.igem.org/Team:BostonU/Modeling/Dimerization_Domain" class='button'>Dimerization Domains</a>
 
</td>
 
</td>
 
<td>
 
<td>
<a href="https://2015.igem.org/Team:BostonU/Modeling/Split_Sites" class='button'>Split Site Identification</a>
+
<a href="https://2015.igem.org/Team:BostonU/Attributions/Experimental_Workflow" class='button'>Experimental Workflow</a>
 
</td>
 
</td>
 
</table>
 
</table>
 
<div class="mainpage">
 
<div class="mainpage">
<h2 style="padding: 5px; background-color: #990000; font-family: Calibri; color: #FFFFFF; font-size: 30px; text-align:left;">Modeling</h2>
+
<h3>Modeling</h3>
    <h3 style="padding: 5px; font-family: Calibri; font-size: 25px; text-align: left;"></h3>
+
<p>One of the main aspects of our project was to develop and refine a model that would help us predict the best places to split a protein, in order to implement conditional dimerization as efficiently as possible. A protein’s primary sequence is the fundamental structure of the polypeptide chain: a string of amino acids covalently linked by peptide bonds. Theoretically, given a protein that is n amino acids long, there are n-1 places to split it, since the primary sequence can be split between any two adjacent amino acids (i.e. any peptide bond can be cleaved to yield two halves of the protein).
 
+
<p>One of the main parts of our project was developing a model to help us predict the best places to split a protein. A model was previously developed in Matlab by Billy Law, and we built off of this model in our project. The overall goal of our model was to find places to split the protein that would create inert halves but still have robust activity when put back together through induction.
+
 
</p>
 
</p>
 
<p>
 
<p>
Proteins are comprised of long strings of amino acids. Theoretically, given a protein that is n amino acids long, there are n-1 places to split the protein, since you can split it between each amino acid. This would be unfeasible and too time consuming, so we focused on two major criteria in order to narrow down split sites.
+
Testing all of these configurations individually would be an incredibly infeasible problem, particularly given our time, cost, and effort constraints. Typically, many researchers that are interested in splitting proteins rely on annotated primary sequences corresponding to functional domains that are generally understood through crystal structure (generally the quaternary structure). However, this can be a laborious trial-and-error process, and oftentimes the crystal structures of proteins are not known. Furthermore, it does not account for other fundamental structural aspects related to protein folding, including secondary and tertiary structures.  
 
</p>
 
</p>
 +
<center><img style="height:45%; width:45%; padding-top:20px; padding-bottom:20px;" src="https://static.igem.org/mediawiki/2015/thumb/0/05/Protein_structure.png/800px-Protein_structure.png" /></center>
 
<p>
 
<p>
Our first criteria was to choose the hydrophilic regions of the protein to split. We know that proteins generally have a hydrophobic core and a hydrophilic surface, and we hypothesized that splitting a protein through its core could potentially interfere with its folding activity and function. Therefore, we focused on avoiding hydrophobic regions in the protein and targeting the hydrophilic regions.
+
A model was previously developed in MATLAB by Billy Law, a graduate student in Wilson Wong’s lab; we built on Billy’s preliminary model in our project. The overall goal of the model was to narrow down the window of split site choices by focusing on feasible regions: regions that would be least likely to interfere with secondary, tertiary, and quaternary structure elements. We hypothesized that such a model could lead to an important predictive tool for scientists, if it could ultimately find the few optimal places to split a protein that would create inert split halves but yield robust activity when dimerization occurs.
 
</p>
 
</p>
 
<p>
 
<p>
We used the Janin hydrophobicity scale, which assigns each amino acid a number based on how hydrophobic it is (the higher the number is, the more hydrophobic the amino acid is). We took a running average of the hydrophobicity of 11 consecutive amino acids in our model to create a hydrophobicity curve of the entire protein.
+
We focused on three major criteria to narrow down split site choices.</p>
 +
<p>
 +
Our first criterion was to avoid the secondary structures in the protein: the alpha helices and beta sheets. (This was part of Billy’s original model.) We hypothesized that splitting through these structures could disrupt folding and function.
 +
</p>
 +
<p>We used an online tool (<a href="http://www.compbio.dundee.ac.uk/jpred/" style="color:#000000;">JPRED</a>) to predict where alpha helices and beta sheets would occur in the protein. It also predicted the likely locations of a few catalytic residues. The tool takes the primary structure (linear amino acid sequence) of the protein as input and outputs a secondary structure prediction based on that sequence. The model aligned the predicted secondary structure regions against the primary sequence, and we did not pick split sites that fell in these regions.
 
</p>
 
</p>
 +
<br>
 +
<center><img style="height:50%; width:50%;" src="https://static.igem.org/mediawiki/2015/thumb/3/37/SaCas9_secondary_JPRED.png/800px-SaCas9_secondary_JPRED.png" /></center>
 +
</br>
 
<p>
 
<p>
Our second criteria was to avoid the secondary structures in the protein: the alpha helices and beta sheets. We hypothesized that splitting through these sheets could also potentially disrupt folding activity and function.  
+
Our second criterion was to <b>avoid largely hydrophobic regions</b>, likely corresponding to the protein’s interior regions. (This was part of Billy’s original model.) Proteins often tend to adopt globular structures that bury hydrophobic residues within a core, such that they do not unfavorably interact with hydrophilic solvents. The exterior surface residues, in contrast, are generally hydrophilic. We hypothesized that splitting a protein through its hydrophobic core could potentially greatly interfere with its folding ability and overall function. Therefore, we focused on <b>avoiding hydrophobic regions</b> in the protein and <b>targeting hydrophilic regions</b>.
We used an online tool (http://www.compbio.dundee.ac.uk/jpred/) to predict where there would be alpha helices and beta sheets in the protein. This tool required the primary structure (amino acid sequence) of the protein as the input, and output the secondary structure prediction.
+
 
</p>
 
</p>
 
<p>
 
<p>
Additionally, we wanted to avoid splitting at special catalytic residues in our protein. We wanted to avoid these amino acids since we attributed them to certain functions of the protein through previous literature, and we wanted to prevent any interference with activity.
+
We used the Janin hydrophobicity scale<sup>1</sup>, which assigns each amino acid in the primary sequence an index based on how hydrophobic it is (the higher the number is, the “more hydrophobic” the amino acid is). We took a running average of the hydrophobicity of 11 consecutive amino acids in the model to create a hydrophobicity profile of the entire protein. The model aligned the hydrophobicity profile against the primary sequence, and we did not pick split sites that fell in hydrophobic windows.
 
</p>
 
</p>
 +
<center><img style="height:25%; width:50%;" src="https://static.igem.org/mediawiki/2015/thumb/b/b8/Sacas9_secondary_MATLAB_code.png/800px-Sacas9_secondary_MATLAB_code.png" /></center>
 
<p>
 
<p>
Using these criteria, we were able to build and manipulate a model in Matlab that would help us predict the best places to split our proteins. Below are the graphs produced by our model, along with the split sites that we chose shown in black.
+
Our third criterion was to avoid splitting within a known catalytic domain of each of the proteins. (This was our main contribution to the model itself.) We hypothesized that splitting within this functional domain could really interfere with the protein’s overall activity. We wanted to avoid any such huge potential disruptions to protein activity so that our system would still have robust activity when the protein halves dimerize.
We realized that these criteria do not account for a protein’s 3-D structure. As a result, our model ignores the loops and turns in between alpha helices and beta sheets. Loops and turns are structures in the protein that can contribute the most to protein function, such as binding sites, and can be identified in a protein’s 3-D structure.  
+
 
</p>
 
</p>
 
<p>
 
<p>
However, not all proteins have known 3-D structures. We therefore conclude that our model can be used when 3-D structures are unknown, but in order to best identify the most viable split sites, it is more beneficial to examine the 3-D structure of a protein.  
+
We looked into the literature and found any relevant annotations of our proteins of interest, especially noting where catalytic domains may have been located. Additionally, Billy’s model had included a few hypothesized catalytic residues for TP901-1 and PhiC31 integrases. The model aligned the “annotated catalytic domain” against the primary sequence, and we did not pick split sites that fell in this window.
 
</p>
 
</p>
 +
<center><img style="height:50%; width:50%;" src="https://static.igem.org/mediawiki/2015/thumb/2/28/Sacas9_tertiary_literature.png/800px-Sacas9_tertiary_literature.png" /></center>
 +
<p>
 +
Using these three criteria, we used our MATLAB tool to predict optimal regions in which to split our proteins. We chose 4-10 initial split sites for each protein, which is a much more realistic number to test compared to testing all possible split locations!
 +
</p>
 +
<p>
 +
Below is an image that incorporates all our criteria for one of the proteins that we tested. Our chosen split sites to test are indicated in black. We did not find catalytic domain/residue annotations for all of the proteins, and thus these images may not have these elements.
 +
</p>
 +
<br>
 +
<center><img style="height:45%; width:45%;" src="https://static.igem.org/mediawiki/2015/thumb/3/33/Phic31_all_three.png/800px-Phic31_all_three.png" /></center>
 +
</br>
 +
<p>
 +
Our results for our protein splits can be found in the individual results sections of our application pages: <a href="https://2015.igem.org/Team:BostonU/App_1/Results" style="color:#FF9966;">Integrase/RDF results</a> and <a href="https://2015.igem.org/Team:BostonU/App_2/Results" style="color:#FF9966;">SaCas9 results</a>. (This was our main contribution to model validation.)
 +
</p>
 +
<p>
 +
Our experimental results ultimately informed us that our model had promise, and that there are still ways to refine the predictive capacity. Read our considerations below to learn more.
 +
</p>
 +
<br>
 +
<p><b>Considerations</b></p>
 +
<p>
 +
We realized that these criteria do not account for all elements of a protein’s secondary, tertiary, and quaternary structure. One example that we noticed is that our model ignores regions that may correspond to loops and turns in between alpha helices and beta sheets. Loops and turns are structures in the protein that might actually contribute significantly to overall protein function (many of these are binding sites). However, these are most easily identified through protein 3D structural analysis, rather than through the primary sequence.
 +
</p>
 +
<p>
 +
Over time, we think that more computational tools will be developed that can provide more insight into various levels of protein structure and how these contribute to function, and these could potentially be integrated into the model. Also, we think that with more experimental validation, including our analyses, we can further refine some of the criteria.
 +
</p>
 +
<p style="padding-bottom:60px;">
 +
Ultimately, we hope to someday see a refined tool that can take a protein’s primary sequence as input and <i>predict</i> the optimal split site(s). Such a tool would be very useful for researchers interested in splitting arbitrary proteins of interest, as well as for better understanding all elements of protein structure.
 +
</p>
 +
<br>
 
</div>
 
</div>
</div>
 
<div id="footer">
 
<p>Copyright Boston University IGEM &copy; 2015  <a href="#"></a></p>
 
    <div>
 
       
 
<div>
 
<a href="[full link to your Twitter]">
 
<img title="Twitter" alt="Twitter" src="https://socialmediawidgets.files.wordpress.com/2014/03/twitter.png" width="35" height="35" />
 
</a>
 
<a href="[full link to your Facebook page]">
 
<img title="Facebook" alt="Facebook" src="https://socialmediawidgets.files.wordpress.com/2014/03/facebook.png" width="35" height="35" />
 
</a>
 
<a href="mailto:[email address]">
 
<img title="Email" alt="Email" src="https://socialmediawidgets.files.wordpress.com/2014/03/email.png" width="35" height="35" />
 
</a>
 
 
 
</div>
 
</div>
  
    </div>
+
<div id="push"></div>
 
</div>
 
</div>
  
 +
<h4 style="font-size:16px; text-align:center;">Citations</h4>
 +
<ol style="font-size:12px;">
 +
<li>Janin, J., “Surface and inside volumes in globular proteins”, Nature, 1979.</li>
 +
<br><br><br><br><br>
 
</body>
 
</body>
 
</html>
 
</html>

Latest revision as of 23:09, 18 September 2015


Modeling

One of the main aspects of our project was to develop and refine a model that would help us predict the best places to split a protein, in order to implement conditional dimerization as efficiently as possible. A protein’s primary sequence is the fundamental structure of the polypeptide chain: a string of amino acids covalently linked by peptide bonds. Theoretically, given a protein that is n amino acids long, there are n-1 places to split it, since the primary sequence can be split between any two adjacent amino acids (i.e. any peptide bond can be cleaved to yield two halves of the protein).

Testing all of these split sites individually would be infeasible given our time, cost, and effort constraints. Researchers interested in splitting proteins typically rely on primary sequences annotated with functional domains, which are generally understood through crystal structures (generally the quaternary structure). However, this can be a laborious trial-and-error process, and the crystal structures of many proteins are not known. Furthermore, this approach does not account for other fundamental structural aspects related to protein folding, including secondary and tertiary structures.

A model was previously developed in MATLAB by Billy Law, a graduate student in Wilson Wong’s lab; we built on Billy’s preliminary model in our project. The overall goal of the model was to narrow down the window of split site choices by focusing on feasible regions: regions that would be least likely to interfere with secondary, tertiary, and quaternary structure elements. We hypothesized that such a model could lead to an important predictive tool for scientists, if it could ultimately find the few optimal places to split a protein that would create inert split halves but yield robust activity when dimerization occurs.

We focused on three major criteria to narrow down split site choices.

Our first criterion was to avoid the secondary structures in the protein: the alpha helices and beta sheets. (This was part of Billy’s original model.) We hypothesized that splitting through these structures could disrupt folding and function.

We used an online tool (JPRED) to predict where alpha helices and beta sheets would occur in the protein. It also predicted the likely locations of a few catalytic residues. The tool takes the primary structure (linear amino acid sequence) of the protein as input and outputs a secondary structure prediction based on that sequence. The model aligned the predicted secondary structure regions against the primary sequence, and we did not pick split sites that fell in these regions.
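
The alignment step described above can be sketched in a few lines. This is an illustrative Python sketch, not the MATLAB code used in the project. It assumes the secondary structure prediction is available as a per-residue string in which 'H' marks a helix, 'E' marks a beta strand, and '-' marks everything else; the function name `structured_regions` is hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def structured_regions(ss_prediction: str, structured: str = "HE") -> List[Tuple[int, int, str]]:
    """Return (start, end, state) for each run of helix/strand residues.

    Positions are 0-based and inclusive. The 'H'/'E'/'-' alphabet is an
    assumption about how the prediction string is encoded.
    """
    regions = []
    pos = 0
    # groupby yields consecutive runs of identical characters.
    for state, run in groupby(ss_prediction):
        length = len(list(run))
        if state in structured:
            regions.append((pos, pos + length - 1, state))
        pos += length
    return regions
```

Each returned region is a stretch of the primary sequence in which split sites would be ruled out under the first criterion.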



Our second criterion was to avoid largely hydrophobic regions, which likely correspond to the protein’s interior. (This was part of Billy’s original model.) Proteins often adopt globular structures that bury hydrophobic residues within a core, such that they do not unfavorably interact with hydrophilic solvents. The exterior surface residues, in contrast, are generally hydrophilic. We hypothesized that splitting a protein through its hydrophobic core could greatly interfere with its folding ability and overall function. Therefore, we focused on avoiding hydrophobic regions in the protein and targeting hydrophilic regions.

We used the Janin hydrophobicity scale [1], which assigns each amino acid in the primary sequence an index based on how hydrophobic it is (the higher the number, the more hydrophobic the amino acid). We took a running average of the hydrophobicity of 11 consecutive amino acids in the model to create a hydrophobicity profile of the entire protein. The model aligned the hydrophobicity profile against the primary sequence, and we did not pick split sites that fell in hydrophobic windows.
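
As a rough illustration of the running average, the Python sketch below computes a centered 11-residue mean over per-residue hydrophobicity values. It is not the project's MATLAB code. The centered window, the choice to leave the first and last five positions undefined, and the function name `hydrophobicity_profile` are all assumptions; the scale is passed in as a dictionary so that the published Janin values can be supplied by the user rather than reproduced here.

```python
from typing import Dict, List, Optional


def hydrophobicity_profile(
    sequence: str,
    scale: Dict[str, float],
    window: int = 11,
) -> List[Optional[float]]:
    """Centered running average of per-residue hydrophobicity.

    Positions without a full window (the first and last window // 2
    residues) are returned as None; this edge handling is an assumption.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [scale[residue] for residue in sequence]
    half = window // 2
    profile: List[Optional[float]] = [None] * len(values)
    for i in range(half, len(values) - half):
        profile[i] = sum(values[i - half:i + half + 1]) / window
    return profile
```

A window would then be treated as hydrophobic when its average exceeds some cutoff; the source does not state the cutoff used, so any threshold applied to this profile is a free parameter.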

Our third criterion was to avoid splitting within a known catalytic domain of each of the proteins. (This was our main contribution to the model itself.) We hypothesized that splitting within this functional domain could severely interfere with the protein’s overall activity. We wanted to avoid such large disruptions so that our system would still have robust activity when the protein halves dimerize.

We searched the literature for relevant annotations of our proteins of interest, noting in particular where catalytic domains may be located. Additionally, Billy’s model had included a few hypothesized catalytic residues for TP901-1 and PhiC31 integrases. The model aligned the “annotated catalytic domain” against the primary sequence, and we did not pick split sites that fell in this window.

Using these three criteria, we used our MATLAB tool to predict optimal regions in which to split our proteins. We chose 4-10 initial split sites for each protein, which is a much more realistic number to test compared to testing all possible split locations!
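
One way to picture how the three criteria narrow the n-1 possible split sites is the Python sketch below. It is a simplified illustration, not the MATLAB tool itself. It assumes each criterion has already been turned into a per-residue boolean mask (True meaning "avoid this residue"), and that a split site is kept only if neither flanking residue is flagged by any mask; the function name `candidate_split_sites` is hypothetical.

```python
from typing import List, Sequence


def candidate_split_sites(masks: Sequence[Sequence[bool]]) -> List[int]:
    """Return the split sites left after applying every exclusion mask.

    Site i (1-based) means cutting the peptide bond between residues i and
    i + 1. A site survives only if neither flanking residue is flagged,
    which is an assumed rule for this sketch.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    n = len(masks[0])
    if any(len(mask) != n for mask in masks):
        raise ValueError("all masks must have the same length")
    # A residue is blocked if any criterion flags it.
    blocked = [any(mask[j] for mask in masks) for j in range(n)]
    return [i for i in range(1, n) if not blocked[i - 1] and not blocked[i]]
```

With no residues flagged, the function returns all n-1 sites. The surviving sites are only candidates: the 4-10 sites per protein described above were still chosen by hand from the permitted regions.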

Below is an image that incorporates all our criteria for one of the proteins that we tested. Our chosen split sites to test are indicated in black. We did not find catalytic domain/residue annotations for all of the proteins, so the corresponding images for some proteins may lack these elements.



Our results for our protein splits can be found in the individual results sections of our application pages: Integrase/RDF results and SaCas9 results. (This was our main contribution to model validation.)

Our experimental results ultimately informed us that our model had promise, and that there are still ways to refine the predictive capacity. Read our considerations below to learn more.


Considerations

We realized that these criteria do not account for all elements of a protein’s secondary, tertiary, and quaternary structure. One example that we noticed is that our model ignores regions that may correspond to loops and turns in between alpha helices and beta sheets. Loops and turns are structures in the protein that might actually contribute significantly to overall protein function (many of these are binding sites). However, these are most easily identified through protein 3D structural analysis, rather than through the primary sequence.

Over time, we think that more computational tools will be developed that can provide more insight into various levels of protein structure and how these contribute to function, and these could potentially be integrated into the model. Also, we think that with more experimental validation, including our analyses, we can further refine some of the criteria.

Ultimately, we hope to someday see a refined tool that can take a protein’s primary sequence as input and predict the optimal split site(s). Such a tool would be very useful for researchers interested in splitting arbitrary proteins of interest, as well as for better understanding all elements of protein structure.


Citations

  1. Janin, J., “Surface and inside volumes in globular proteins”, Nature, 1979.