Team:BostonU/Project

Our Project

Developing Conditionally Dimerizable Split Protein Systems for Genetic Logic and Genome Editing Applications

Abstract

The field of synthetic biology seeks to engineer desirable cellular functions by developing molecular technologies that enable precise genetic manipulation. A promising approach is to reliably control proteins that naturally carry out genetic modifications. Current strategies to regulate the activity of such proteins rely primarily on modulating protein expression levels through transcriptional control; however, these methods suffer from slow responses and leaky expression. In contrast, strategies that regulate activity post-translationally, such as conditional dimerization of split protein halves, have been shown to bypass these limitations. Here, we compare how efficiently previously characterized dimerization domains regulate the activities of three important genetic manipulation proteins: integrases and recombination directionality factors (RDFs) for genetic logic applications, and saCas9 for in vivo genome editing applications. We also establish guidelines for rationally identifying promising protein split sites. Our characterization of these systems in mammalian cells ultimately paves the way for important biomedical applications.

Integrases and RDFs

Recombinases are a class of proteins that recognize specific DNA sites in a genome and can rearrange the DNA between these sites, as well as the sites themselves. Recombinases are widely used in synthetic biology to turn gene expression on or off in vitro and in vivo. These reactions are generally unidirectional and irreversible. A special class of recombinases called integrases can catalyze these reactions but can also reverse them in the presence of a second protein called a recombination directionality factor (RDF). By tightly controlling when an integrase and its corresponding RDF are active, a system could be engineered in which a cell is induced to switch back and forth between different phenotypes. In addition, different integrase/RDF systems recognize orthogonal DNA sequences. A system that combines multiple orthogonal recognition sites with tight temporal control over integrases and their corresponding RDFs could enable higher-order genetic logic with multiple phenotypic outputs. This past summer, the team focused on developing the tools needed to temporally control the activity of these proteins for genetic logic applications.
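The forward and reverse reactions described above can be pictured as a simple state machine. The sketch below is an idealized model under our own assumptions, not the team's design: each orthogonal integrase acts on its own "register" of recognition sites, integrase alone converts attB/attP sites into attL/attR, and integrase plus its RDF converts them back. Recombination is treated as complete and instantaneous, and the register names `int_A` and `int_B` are hypothetical placeholders.

```python
from enum import Enum


class Register(Enum):
    """State of one integrase-specific pair of recognition sites."""
    BASE = "attB/attP"     # before integrase-mediated recombination
    FLIPPED = "attL/attR"  # after integrase-mediated recombination


def update_register(state, integrase_on, rdf_on):
    """Idealized single-step update of one register.

    Integrase alone converts attB/attP into attL/attR; integrase together with
    its RDF converts attL/attR back into attB/attP. Any other combination
    (including the RDF without its integrase) leaves the register unchanged.
    """
    if integrase_on and not rdf_on and state is Register.BASE:
        return Register.FLIPPED
    if integrase_on and rdf_on and state is Register.FLIPPED:
        return Register.BASE
    return state


def update_cell(states, inputs):
    """Apply one induction step to several orthogonal registers at once.

    states: dict of register name -> Register
    inputs: dict of register name -> (integrase_on, rdf_on); registers not
            listed receive no induction in this step.
    """
    return {
        name: update_register(state, *inputs.get(name, (False, False)))
        for name, state in states.items()
    }


# Two hypothetical orthogonal integrase/RDF pairs, "int_A" and "int_B".
cell = {"int_A": Register.BASE, "int_B": Register.BASE}
cell = update_cell(cell, {"int_A": (True, False)})  # integrase A only
cell = update_cell(cell, {"int_B": (True, False)})  # integrase B only
cell = update_cell(cell, {"int_A": (True, True)})   # integrase A plus its RDF
print({name: state.value for name, state in cell.items()})
```

Because each register changes independently, n orthogonal integrase/RDF pairs give 2**n distinguishable states, which is what makes higher-order genetic logic with multiple outputs possible.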

saCas9

The CRISPR/Cas9 method of genome editing is based on a natural system that bacteria use to defend against infection. CRISPRs (clustered regularly interspaced short palindromic repeats) consist of an array of identical repeat sequences separated by spacer sequences; they confer a genetic memory of past infections and provide immunity against foreign genetic elements. The CRISPR/Cas9 system uses the Cas9 protein, an RNA-guided endonuclease. Cas9 is directed to its target by a single guide RNA (sgRNA), a short RNA strand that base-pairs with a complementary DNA sequence anywhere in the genome. If the DNA contains the complementary sequence followed immediately downstream by a PAM (protospacer adjacent motif), Cas9 binds the target sequence and produces a double-strand break (DSB). The DSB is then repaired through one of two pathways: homology-directed repair (HDR) or non-homologous end joining (NHEJ). With a donor template, HDR can produce precise gene modifications, whereas NHEJ tends to introduce indels. By producing indels, Cas9 can efficiently knock out genes. Gaining temporal control of Cas9 can increase the efficiency of gene targeting and of producing the desired effect at a gene of interest. One important application of gene knockout is distinguishing driver mutations from passenger mutations in cancer; it also simplifies the study of oncogenes and tumor suppressor genes. The system is by no means perfect, as demonstrated by a Chinese team that recently attempted to edit human embryos: they succeeded in editing only a little more than a quarter of the embryos, and their work ignited a major ethical debate. This is why temporal control over Cas9 matters: eliminating basal activity and creating an inducible response are key to improving the accuracy and efficiency of Cas9 editing. Splitting Cas9 can also restrict genome editing to cells at the intersection of two populations, namely the cells that express both halves. The most widely used Cas9 is Streptococcus pyogenes Cas9 (spCas9).
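The targeting rule above (a sequence matching the guide, immediately followed by a PAM) can be sketched as a simple sequence scan. This is a minimal illustration, not a guide-design tool: it requires an exact PAM match, ignores off-target scoring, and assumes the commonly reported PAM consensus sequences NGG for spCas9 and NNGRRT for saCas9. The function and constant names are hypothetical.

```python
import re

# Commonly reported PAM consensus sequences, written as regular expressions.
SPCAS9_PAM = "[ACGT]GG"            # NGG
SACAS9_PAM = "[ACGT]{2}G[AG]{2}T"  # NNGRRT (R = A or G)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(seq, pam_regex=SPCAS9_PAM, spacer_len=20):
    """Return (start, protospacer, pam) for every protospacer followed by a PAM.

    Only the given strand is scanned (0-based start positions); pass
    reverse_complement(seq) to scan the other strand.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(f"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Illustrative sequences only (not real genomic targets).
print(find_targets("ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"))
print(find_targets("T" * 21 + "CAGAAT", pam_regex=SACAS9_PAM, spacer_len=21))
```

In a random DNA sequence an NNGRRT PAM occurs at about 1 in 64 positions per strand versus about 1 in 16 for NGG, so saCas9 has roughly a quarter as many candidate sites as spCas9.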
spCas9 has already been split by the Zhang lab, using methods similar to ours. That study showed that split-Cas9 fragments can induce indels without high levels of mutation at off-target sites. We therefore decided to apply the same approach to saCas9, the Cas9 from Staphylococcus aureus: we plan to identify good split sites for saCas9, split the protein, and then induce its activity. We chose saCas9 because it is over 1 kb smaller than spCas9, which leaves room for additional regulatory elements in the vector and makes it easier to package into viral vectors. We plan to test our split saCas9 using a traffic light reporter (TLR). As described above, the DSB produced by Cas9 is repaired by either NHEJ or HDR, and the TLR fluoresces a different color for each outcome. Before editing, neither fluorescent protein is expressed. If the DSB is repaired by NHEJ, the resulting frameshift scrambles the GFP coding sequence and places mCherry in frame, so mCherry is expressed. If the DSB is repaired by HDR, GFP is restored from the GFP donor template and GFP is expressed. Because our main concern is whether saCas9 is active and makes the DSB, expression of either color demonstrates that the induced saCas9 works. The TLR therefore lets us not only verify success but also characterize the activity of our split saCas9. This experiment can lead to the same applications as split spCas9, while testing whether the significantly smaller saCas9 is equally effective. Most importantly, splitting Cas9 makes it possible to move beyond editing genomes in test tubes toward inducible in vivo genome editing.
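The TLR readout can be summarized as the percentage of GFP-positive (HDR) and mCherry-positive (NHEJ) cells, and comparing induced with uninduced samples shows how much basal activity the split protein has. The sketch below assumes cell counts have already been gated (for example from flow cytometry) and that the two populations do not overlap; the class and function names and the example counts are hypothetical, not the team's data. Because only some NHEJ indels shift mCherry into frame, the NHEJ percentage understates the true NHEJ rate, so these numbers are best compared between samples rather than read as absolute editing rates.

```python
import math
from dataclasses import dataclass


@dataclass
class TLRCounts:
    """Gated cell counts for one sample (hypothetical input format)."""
    total: int
    gfp_positive: int      # HDR: GFP restored from the donor template
    mcherry_positive: int  # NHEJ: a frameshift puts mCherry in frame


def tlr_summary(counts):
    """Percent of cells reporting HDR, NHEJ, and either outcome."""
    if counts.total <= 0:
        raise ValueError("total cell count must be positive")
    hdr = 100.0 * counts.gfp_positive / counts.total
    nhej = 100.0 * counts.mcherry_positive / counts.total
    return {"hdr_pct": hdr, "nhej_pct": nhej, "any_pct": hdr + nhej}


def fold_induction(induced, uninduced):
    """Reporter-positive fraction with the dimerizer divided by that without it.

    A high ratio indicates low basal (leaky) activity of the split protein.
    """
    signal = tlr_summary(induced)["any_pct"]
    background = tlr_summary(uninduced)["any_pct"]
    if background == 0:
        return math.inf if signal > 0 else math.nan
    return signal / background


# Hypothetical counts for illustration only.
plus_inducer = TLRCounts(total=10000, gfp_positive=150, mcherry_positive=450)
no_inducer = TLRCounts(total=10000, gfp_positive=10, mcherry_positive=40)
print(tlr_summary(plus_inducer))
print(fold_induction(plus_inducer, no_inducer))
```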

Dimerizable Domains

We decided to try three different dimerization domain pairs with our protein-splitting procedure: FKBP/FRB induced with rapalog, PYL/ABI induced with abscisic acid, and CRY2/CIBN induced with blue light. All of these systems have been described in the literature, and each has its own benefits. The CRY2/CIBN system is extremely fast: the domains can come together in as little as 300 microseconds. Additionally, blue light can be delivered with high spatial and temporal resolution, and therefore more precisely than chemical inducers. However, these are the largest domains of the three, which may hinder efficient binding when they are fused to small proteins such as RDFs. FKBP/FRB is the best-documented dimerization system and has previously been shown to be effective. Instead of standard rapamycin, we use a slightly modified analog, rapalog, to decrease non-specific binding. This pair binds the most tightly of the three and also has the smallest domains. PYL/ABI is a dimerization system found naturally in plants and is completely orthogonal to mammalian systems. Once the domains have bound in the presence of the inducer, abscisic acid, they can be separated by washing out the inducer and waiting 24 hours. All three systems are mutually orthogonal, an important property for future work on genetic logic circuits built from integrase/RDF systems.

Split Sites and Modeling

Our team used three main criteria for choosing split sites: avoid the inner core of the protein, secondary structures, and catalytic residues. We hypothesized that splitting within these regions might disrupt the protein's function even when the two halves were dimerized. To locate these regions in the integrases, RDFs, and saCas9, we used several resources. First, we used a model developed by a graduate student in our lab to predict the hydrophobicity of a chain of amino acids. Because the inner core of a protein is typically very hydrophobic, this model let us choose only split sites on the hydrophilic surface of the protein. Next, we used JPred, an online tool that predicts secondary structure (the locations of alpha helices and beta sheets) from a protein's primary structure, and chose sites that did not fall within any predicted secondary structure. Finally, to avoid catalytic residues, we searched the literature and excluded sites that corresponded to important catalytic regions of each protein. In total, we chose 8 split sites for each integrase and for saCas9, and 4 split sites for the RDFs. We wanted to test each split site with each of the three dimerization domain pairs and in both orientations. For instance, if TP901-1 was split at amino acid 253, we would fuse FKBP to AA1-253 and FRB to AA254-486, and also FRB to AA1-253 and FKBP to AA254-486.
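The three split-site criteria and the construct layout can be sketched as a filter followed by an enumeration. This is a hypothetical illustration, not the team's pipeline: the Kyte-Doolittle hydropathy scale with a sliding window stands in for the lab's own hydrophobicity model, the secondary-structure string is assumed to be JPred-style ('H' helix, 'E' strand, '-' coil), catalytic positions must be supplied from the literature, and the window size, hydropathy cutoff, and catalytic margin are arbitrary defaults.

```python
from itertools import product

# Kyte-Doolittle hydropathy scale (positive = hydrophobic). Used here as a
# stand-in for the lab's own hydrophobicity model, which is not shown.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

DIMERIZER_PAIRS = [("FKBP", "FRB"), ("PYL", "ABI"), ("CRY2", "CIBN")]


def hydropathy_profile(seq, window=9):
    """Mean hydropathy in a window centred on each residue (truncated at the ends)."""
    half = window // 2
    profile = []
    for i in range(len(seq)):
        win = seq[max(0, i - half): i + half + 1]
        profile.append(sum(KYTE_DOOLITTLE[aa] for aa in win) / len(win))
    return profile


def candidate_split_sites(seq, ss, catalytic, window=9, max_hydropathy=0.0, margin=3):
    """Return 1-based k such that cutting between residues k and k+1 passes all filters.

    seq:       protein sequence (one-letter codes)
    ss:        per-residue secondary structure, 'H' helix, 'E' strand, '-' coil
    catalytic: set of 1-based catalytic residue positions
    """
    seq = seq.upper()
    if len(ss) != len(seq):
        raise ValueError("secondary-structure string must match sequence length")
    profile = hydropathy_profile(seq, window)
    sites = []
    for k in range(1, len(seq)):
        if profile[k - 1] > max_hydropathy:   # 1. hydrophobic, likely buried core
            continue
        if ss[k - 1] != "-" or ss[k] != "-":  # 2. inside a helix or strand
            continue
        # 3. a catalytic residue lies within `margin` residues of the cut
        if any(k - margin <= c <= k + 1 + margin for c in catalytic):
            continue
        sites.append(k)
    return sites


def split_constructs(protein, length, sites, pairs=DIMERIZER_PAIRS):
    """List (N-half fusion, C-half fusion) pairs for every site, domain pair and orientation."""
    constructs = []
    for k, (dom1, dom2) in product(sites, pairs):
        for n_dom, c_dom in ((dom1, dom2), (dom2, dom1)):
            constructs.append(
                ((n_dom, f"{protein} aa1-{k}"), (c_dom, f"{protein} aa{k + 1}-{length}"))
            )
    return constructs


# TP901-1 split at residue 253 of 486, as in the example in the text.
for n_half, c_half in split_constructs("TP901-1", 486, [253]):
    print(n_half, c_half)
```

Each split site yields 3 domain pairs × 2 orientations = 6 construct pairs, so 8 sites give 48 construct pairs per integrase (and per saCas9), and 4 sites give 24 per RDF.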

The Experiment

Results