Team:Waterloo/Modeling/PAM Flexibility
Modelling Engineered Cas9 PAM Flexibility
CRISPR-Cas9 is one of the most significant advances in genetic engineering in the last several years. There are multiple examples in nature of a CRISPR-Cas9 system being used for defence. All of these use their own variant of the Cas9 protein to perform nuclease activity in a region dependent on the presence and location of a protospacer adjacent motif (PAM) sequence. From these, it has been observed that naturally occurring PAM sequences can be quite different from each other. For example, S. pyogenes uses a PAM sequence of NGG while S. thermophilus has a PAM sequence of NAAGAW in its CRISPR1-Cas system and NGGNG in its CRISPR3-Cas system. In current genetic engineering technologies, it is mostly the S. pyogenes CRISPR-Cas9 (spCas9) protein which is used.
However, one limitation of spCas9 is the NGG PAM required for binding to the target DNA site. While the 20bp protospacer can be easily specified with different sgRNAs, the PAM is essentially hardcoded into the Cas9 protein structure of the PAM-interacting (PI) domain. This limits the number of possible targets in a given genome, and thus the possible uses of spCas9 for genetic engineering. However, attempts to selectively modify Cas9 to bind new PAMs, specifically by changing two amino acids thought to bind directly to the PAM DNA sequence, failed.
Kleinstiver et al. recently demonstrated modified spCas9 with altered PAM specificity. Cells were transformed with a plasmid bearing a toxic gene and an NGA adjacent protospacer, as well as a spCas9 gene with a randomly mutagenized PAM Identification Domain. Only cells capable of binding the NGA PAM sequence were able to cut the plasmid, and thus survive. By sequencing surviving clones, the authors were able to identify mutations in the PI domain that altered PAM sequence specificity.
Two mutants in particular showed both high specificity and activity for the NGA PAM sequence. The first mutant, referred to as VQR (D1135V,R1335Q,T1337R), worked equivalently on all four NGAN PAM sequences. The second mutant, EQR (D1135E, R1335Q, T1337R) was most specific to NGAG PAMs. A second round of testing also found a VRER variant ((D1135V,G1218R,R1335E,T1337R) that was specific to NGCG PAMs.
These result indicate that spCas9 can display altered PAM specificity with relatively few mutants. Additionally, the data produced by Kleinstiver et al. also provided significant insight into the types of mutations that altered PAM specificity. This allows for a more targeted approach to engineering new spCas9 PAMs.
Computational Engineering of PAM Flexibility
Modelling Binding with PyMOL and PyRosetta
As of June 2015, at least three different groups have produce experimental data on Cas9 structure. By using a catalytically dead version of spCas9 (dCas9), Anders et al. managed to determine the crystal structure of a Cas9 protein bound to both the sgRNA and the double stranded target DNA. The PDB file containing this structure can be found in the Protein Data Bank with ID 4UN3.
Canonical molecular dynamics data, mention source of PDB
Our approach for further structural mutations in the protospacer adjacent motif (PAM) region for Cas9 recognition was applied by generating a combination of 64 different PAM sequences using both 3DNA and Chimera (our choice of software will be elaborated below); these combinations were implemented by altering the original 4UN3 PDB file and were subsequently tested and scored using PyRosetta and visualized through PyMOL. Our results from the initial scoring of the set of 64 PAM variants can be analyzed below.
insert figure of graph of data visualization for 64 PAMs stat distribution
We then developed a script that allowed for more stringent mutations in the region of interest of the Cas9 protein that interacts with the PAM sequence in DNA. Depending on the amount of amino acid substitutions specified, our script induced in silico mutations resulting in a change of amino acid sequence in the Cas9 protein; we tested the interaction using a set of two PDB files per variant- a shortened Cas9 whose structure only included the area of interaction and the full Cas9 protein.
insert image of clipped NGG vs. full sctrucured NGG
Accuracy of the interaction between Cas9 and the PAM region was ensured by further specifying physical fields such as the angle or torsion, degrees of freedom with respect to movement of the ligands, and repacking which allowed for the optimal conformation of the newly mutated protein so that conditions were ideal for binding.
Available PAM Affinity Data
Mention Klienstiver and
Engineering Pipeline
The overall approach to identifying possible mutants for new PAM variants is:
- Start with the pdb with crystal structure data for spCas9 bound to the target DNA
- Modify the amino acid sequence of the PDB. Mutants will be generate based on the location and amino acid substitution data from Kleinstiver et al., as well as known biochemical properties of the amino acid being mutated.
- Once a mutant has been generated, a full set of pdbs with the 64 or 256 different PAM variants will be created for each mutant.
- The DNA sequence is docked to the PI domain using PyRosetta, with scores being tracked over multiple runs.
- Scores are averaged and mutant specificity to different PAM sequences is determined, based on the clustering of the scores by PAM.
- Mutants highly specific for a particular PAM will be tested in the lab.
Model Validation
Choice of DNA Mutation Software
Chimera VS 3DNA, give stats details
Patterns of Variation between Simulations
Simulations are Monte Carlo, we analyzed the differences between them
Wild-Type Cas9 PAM Affinities are Reproduced
Report Kendall's Tau, Pearson, Clustering Results
Mutant Cas9 PAM Affinities show Mixed Results
Mutating VQR/EQR, how EQR and VQR mutants change Cas9 visualizations, measures of distance from DNA bases to binding sites in the Cas9
Framework and Future Work
Proposed Tool
Link to software page re:pyrosetta
Future Work
Talk about adding other info like Gibb's Free Energy
Show a few graphs from Kleinstiver and link to dataset encouraging others to use