Team:BostonU/Modeling

Modeling

One of the main parts of our project was developing a model to help us predict the best places to split a protein. A graduate student in our lab, Billy Law, had previously developed a model in Matlab, and we built on it for our project. The overall goal of our model was to find split sites that produce two inert halves which regain robust activity once they dimerize.

Proteins are composed of long chains of amino acids. In theory, a protein that is n amino acids long has n-1 possible split sites, since it can be split between any two adjacent amino acids. Testing every site would be infeasible and too time-consuming, so we used three major criteria to narrow down the split site choices.

Our first criterion was to split the protein in its exterior regions. We hypothesized that splitting a protein through its core could interfere with its folding, activity, and function. We also know that proteins generally have hydrophilic surfaces and hydrophobic cores.

Therefore, we focused on avoiding hydrophobic regions in the protein and targeting hydrophilic regions.

We used the Janin hydrophobicity scale, which assigns each amino acid a number based on how hydrophobic it is (the higher the number, the more hydrophobic the amino acid). Our model takes a running average of the hydrophobicity over a window of 11 consecutive amino acids to create a hydrophobicity curve for the entire protein.
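The running average described above can be sketched in a few lines. The example below is written in Python for illustration and is not the Matlab model itself. The function name `hydrophobicity_profile` is hypothetical, and the scale values are the commonly tabulated Janin (1979) values, which should be checked against your own source before use. The text does not say which residue in each 11-residue window the average is assigned to, so this sketch simply returns one value per window, in order.

```python
# Janin (1979) hydrophobicity values per one-letter amino acid code.
# Assumption: transcribed from the commonly tabulated scale; verify before use.
JANIN = {
    "A": 0.3, "R": -1.4, "N": -0.5, "D": -0.6, "C": 0.9,
    "Q": -0.7, "E": -0.7, "G": 0.3, "H": -0.1, "I": 0.7,
    "L": 0.5, "K": -1.8, "M": 0.4, "F": 0.5, "P": -0.3,
    "S": -0.1, "T": -0.2, "W": 0.3, "Y": -0.4, "V": 0.6,
}


def hydrophobicity_profile(sequence, window=11, scale=JANIN):
    """Return the mean hydrophobicity of every run of `window` residues.

    The result has len(sequence) - window + 1 entries; entry k is the
    average over residues k .. k + window - 1 (0-based). A sequence
    shorter than the window yields an empty list.
    """
    values = [scale[aa] for aa in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Toy sequence: a hydrophilic stretch followed by a hydrophobic one.
    toy = "K" * 11 + "I" * 11
    for start, mean in enumerate(hydrophobicity_profile(toy)):
        print(start, round(mean, 3))
```

On a curve like this, low values mark hydrophilic stretches that are more likely to lie on the protein surface, while high values point toward the hydrophobic core.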

Our second criterion was to avoid the secondary structures in the protein: the alpha helices and beta sheets. We hypothesized that splitting through these structures could also disrupt folding, activity, and function.

We used the online tool JPred to predict where alpha helices and beta sheets would occur in the protein. This tool takes the primary structure (amino acid sequence) of the protein as input and outputs a secondary structure prediction.

Our third criterion was to avoid splitting the catalytic domain of the protein. We hypothesized that splitting this functional domain could interfere with the protein's activity. We wanted to avoid any potential disruption so that our system would still have robust activity when the protein halves dimerize.
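One way to combine the three criteria is to filter the n-1 possible split sites one rule at a time. The Python sketch below is only an illustration, not our Matlab model. The function name `candidate_split_sites`, the hydrophobicity cutoff `max_hydrophobicity`, and the rule that both residues flanking a split must pass every check are all assumptions made for this example. It expects one smoothed hydrophobicity value per residue, a per-residue secondary structure string using `H` for helix, `E` for strand, and `-` for everything else, and a list of catalytic domain ranges.

```python
def candidate_split_sites(hydrophobicity, secondary_structure,
                          catalytic_domains, max_hydrophobicity=0.0):
    """Return split sites that pass all three filters.

    A site i (1-based) means a split between residue i and residue i + 1.
    hydrophobicity      -- one smoothed value per residue
    secondary_structure -- string with 'H' (helix), 'E' (strand), '-' (other)
    catalytic_domains   -- list of (start, end) residue ranges, 1-based, inclusive
    max_hydrophobicity  -- hypothetical cutoff; sites above it are rejected
    """
    if len(hydrophobicity) != len(secondary_structure):
        raise ValueError("inputs must have one entry per residue")

    n = len(secondary_structure)
    sites = []
    for site in range(1, n):           # n - 1 possible split sites
        flanking = (site - 1, site)    # 0-based indices of both neighbours

        # Criterion 2: stay out of predicted helices and strands.
        if any(secondary_structure[j] in "HE" for j in flanking):
            continue
        # Criterion 3: stay out of the catalytic domain(s).
        if any(start <= j + 1 <= end
               for j in flanking for start, end in catalytic_domains):
            continue
        # Criterion 1: keep only hydrophilic (low-scoring) positions.
        if max(hydrophobicity[j] for j in flanking) > max_hydrophobicity:
            continue
        sites.append(site)
    return sites


if __name__ == "__main__":
    # Toy inputs, invented purely to exercise the filter.
    ss = "---HHH----"
    hydro = [-1.0] * 10
    hydro[8] = 0.5
    print(candidate_split_sites(hydro, ss, catalytic_domains=[(1, 2)]))
```

A filter like this only shortlists candidates. The final choice among the surviving sites still calls for judgment.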

Using these criteria, we built and adapted a model in Matlab to help us predict the best places to split our proteins. Below are the hydrophobicity curves that our model produced for each of the proteins we split, with the split sites we chose shown in black.

We realized that these criteria do not account for a protein's 3-D structure. As a result, our model ignores the loops and turns between alpha helices and beta sheets. Loops and turns can be among the most important contributors to protein function, for example by forming binding sites, and they can be identified in a protein's 3-D structure.

However, not all proteins have known 3-D structures. We therefore conclude that our model can be used when the 3-D structure is unknown, but when a structure is available, examining it is likely to be a better way to identify the most viable split sites.