Team:BostonU/Modeling
Modeling
One of the main aspects of our project was to develop and refine a model that would help us predict the best places to split a protein in order to implement conditional dimerization as efficiently as possible. A protein's primary sequence is the most fundamental level of its structure: a chain of amino acids covalently linked by peptide bonds. In principle, a protein that is n amino acids long can be split at n-1 places, since the primary sequence can be split between any two adjacent amino acids (i.e., any peptide bond can be cleaved to yield two halves of the protein).
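To make the count concrete, here is a minimal Python sketch that enumerates every way to cut a chain at a single peptide bond. The peptide is a hypothetical example, not one of our proteins. Even a 300-residue protein has 299 candidate sites, which is why the search space needs narrowing.

```python
def split_sites(sequence: str) -> list[tuple[str, str]]:
    """Return every (N-terminal half, C-terminal half) pair obtained by
    cutting the chain at exactly one peptide bond."""
    # A chain of n residues has n - 1 peptide bonds, so k runs from 1 to n - 1.
    return [(sequence[:k], sequence[k:]) for k in range(1, len(sequence))]

halves = split_sites("MKTAYIAK")  # hypothetical 8-residue peptide
print(len(halves))  # 7
print(halves[0])    # ('M', 'KTAYIAK')
```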
Testing all of these configurations individually would be infeasible, particularly given our time, cost, and effort constraints. Researchers who want to split proteins typically rely on annotated primary sequences of functional domains, which are generally characterized through crystal structures of the folded protein. However, this can be a laborious trial-and-error process, and often a protein's crystal structure is not known. Furthermore, domain annotations alone do not account for other fundamental structural aspects of protein folding, including secondary and tertiary structure.
A model was previously developed in MATLAB by Billy Law, a graduate student in Wilson Wong's lab, and we built on Billy's preliminary model in our project. The overall goal of the model was to narrow down the split-site choices by focusing on feasible regions: those least likely to interfere with secondary, tertiary, and quaternary structure elements. We hypothesized that such a model could become an important predictive tool for scientists if it could identify the few optimal places to split a protein, yielding split halves that are inert on their own but show robust activity when dimerization occurs.
We focused on three major criteria to narrow down split site choices.
Our first criterion was to avoid the protein's secondary structures: alpha helices and beta sheets. (This was part of Billy's original model.) We hypothesized that splitting through these elements could disrupt folding and, in turn, function.
We used an online tool called JPred to predict where alpha helices and beta sheets occur in the protein; it also predicted the likely locations of a few catalytic residues. JPred takes the primary structure (linear amino acid sequence) of the protein as input and returns a secondary structure prediction based on that sequence. The model aligned the predicted secondary structure regions against the primary sequence, and we did not pick split sites that fell within these regions.
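The secondary-structure filter can be sketched as follows. This is an illustration in Python, not our MATLAB code. It assumes the prediction has already been reduced to a JPred-style string with one label per residue ('H' for helix, 'E' for strand, '-' for coil); parsing JPred's output files is not shown. The rule that both residues flanking a cut must be coil is our assumption about what "falling within" a region means.

```python
# Per-residue labels in a JPred-style prediction string (assumed format):
# 'H' = alpha helix, 'E' = beta strand, '-' = coil.
STRUCTURED = {"H", "E"}

def allowed_by_secondary_structure(ss: str) -> list[int]:
    """Return cut positions k (between residues k and k+1, 1-based) whose two
    flanking residues both lie outside predicted helices and strands."""
    return [k for k in range(1, len(ss))
            if ss[k - 1] not in STRUCTURED and ss[k] not in STRUCTURED]

# Hypothetical 14-residue prediction.
print(allowed_by_secondary_structure("--HHHH---EEE--"))  # [1, 7, 8, 13]
```

Only cuts between two coil residues survive, so every position inside or at the edge of a helix or strand is discarded.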
Our second criterion was to avoid largely hydrophobic regions, which likely correspond to the protein's interior. (This was part of Billy's original model.) Proteins tend to adopt globular structures that bury hydrophobic residues in a core, where they avoid unfavorable interactions with the surrounding aqueous solvent; surface residues, in contrast, are generally hydrophilic. We hypothesized that splitting a protein through its hydrophobic core could greatly interfere with its folding and overall function. Therefore, we avoided hydrophobic regions of the protein and targeted hydrophilic ones.
We used the Janin hydrophobicity scale [1], which assigns each amino acid an index based on how hydrophobic it is (the higher the number, the more hydrophobic the amino acid). The model took a running average of hydrophobicity over 11 consecutive amino acids to create a hydrophobicity profile of the entire protein and aligned this profile against the primary sequence. We did not pick split sites that fell in hydrophobic windows.
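A minimal Python sketch of the 11-residue running average follows. It is not the model's MATLAB code, and the function name is hypothetical. The Janin values are reproduced as tabulated in the ExPASy ProtScale tool; this is an assumption, and they should be checked against Janin's original publication before use.

```python
# Janin (1979) hydrophobicity values as tabulated by ExPASy ProtScale.
# Assumption: verify against the original publication before relying on them.
JANIN = {
    "A": 0.3, "R": -1.4, "N": -0.5, "D": -0.6, "C": 0.9,
    "Q": -0.7, "E": -0.7, "G": 0.3, "H": -0.1, "I": 0.7,
    "L": 0.5, "K": -1.8, "M": 0.4, "F": 0.5, "P": -0.3,
    "S": -0.1, "T": -0.2, "W": 0.3, "Y": -0.4, "V": 0.6,
}

def hydrophobicity_profile(sequence: str, window: int = 11) -> list[float]:
    """Running mean of Janin values over `window` consecutive residues.

    Element i averages residues i .. i + window - 1 (0-based), so it is
    centred on residue i + window // 2. Partial windows at the termini
    are dropped, giving len(sequence) - window + 1 values."""
    values = [JANIN[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical 12-residue sequence: eleven alanines followed by a lysine.
profile = hydrophobicity_profile("A" * 11 + "K")
print([round(v, 3) for v in profile])  # [0.3, 0.109]
```

Because each value averages 11 residues, it is naturally plotted against the central residue (i + 5), and the profile is 10 values shorter than the sequence.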
Using these criteria, we built and refined a model in MATLAB to help us predict the best places to split our proteins. Below are the hydrophobicity curves produced by our model for each protein we split, with the split sites we chose shown in black.
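Combining the two criteria amounts to keeping only the cuts that pass both filters. The Python sketch below is an illustration under stated assumptions, not the model itself. It assumes the smoothed hydrophobicity has already been assigned to individual residues (for example, to the centre of each window). It uses a hypothetical threshold of 0.0 to decide what counts as hydrophobic; the model's actual cutoff is not given here. The structure string and profile values are toy inputs.

```python
HELIX_OR_STRAND = {"H", "E"}

def candidate_split_sites(ss: str, profile: list[float],
                          threshold: float = 0.0) -> list[int]:
    """Return cut positions k (between residues k and k+1, 1-based) where both
    flanking residues are predicted coil and have smoothed hydrophobicity at
    or below `threshold` (a hypothetical cutoff)."""
    if len(ss) != len(profile):
        raise ValueError("need one structure label and one profile value per residue")

    def residue_ok(i: int) -> bool:
        # Residue i (0-based) is outside helices/strands and not hydrophobic.
        return ss[i] not in HELIX_OR_STRAND and profile[i] <= threshold

    return [k for k in range(1, len(ss)) if residue_ok(k - 1) and residue_ok(k)]

# Toy inputs: 8 residues, one label and one smoothed value per residue.
ss = "--HHH---"
profile = [-0.4, -0.2, 0.1, 0.3, 0.2, -0.1, -0.2, -0.3]
print(candidate_split_sites(ss, profile))  # [1, 6, 7]
```

Keeping the two filters as separate per-residue checks makes it easy to add further criteria later without changing how cut positions are enumerated.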
We realized that these criteria do not account for a protein's 3-D structure. As a result, our model does not consider the roles of the loops and turns between alpha helices and beta sheets. Loops and turns can contribute substantially to protein function, for example by forming binding sites, and they can be identified from a protein's 3-D structure.
However, not all proteins have known 3-D structures. We therefore conclude that our model is useful when a protein's 3-D structure is unknown, but that when a structure is available, examining it can be a more effective way to identify the most viable split sites.