Team:Waterloo/Modeling/Cas9 Dynamics

Modelling Genomic Effects of CRISPR/Cas9

CRISPR/Cas9 has been extensively studied for its applications in eukaryotic genome editing and gene expression control. Last year, the Waterloo iGEM team created an ODE model of dCas9 binding and control of gene expression. This year, however, the modelling team chose to investigate the effects of CRISPR/Cas9 on a genomic rather than molecular level. Specifically, we wanted to model the accumulation of mutations in a target genome and the eventual deactivation of target genes after cutting by CRISPR/Cas9 and repair by Non-Homologous End Joining (NHEJ).

Model Formation

When bound to a single guide RNA (sgRNA), the S. pyogenes Cas9 nuclease diffuses through the cell in three dimensions, searching for the sequence 'NGG' in the target genome. When it finds an 'NGG', known as a PAM site, Cas9 binds and undergoes a conformational change that allows it to unwind the DNA helix and compare the sequence of its sgRNA with the DNA. If the sgRNA matches well, Cas9 cleaves the DNA, producing a double-stranded break (DSB) 3-4 bp upstream of the PAM site.
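The PAM search described above can be illustrated with a short Python sketch. It is not taken from our model code: the function name `find_pam_sites`, the 20-nt protospacer length and the choice to place the break exactly 3 bp upstream of the PAM are assumptions made for illustration, and only one strand is scanned.

```python
import re

def find_pam_sites(seq, protospacer_len=20, cut_offset=3):
    """Return (pam_start, cut_index) pairs for every NGG on the given strand.

    cut_index marks the break: the DSB falls between
    seq[cut_index - 1] and seq[cut_index].
    protospacer_len and cut_offset are illustrative assumptions.
    """
    seq = seq.upper()
    sites = []
    # The zero-width lookahead reports overlapping PAMs (e.g. "AGGG") too.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        # Skip PAMs too close to the 5' end to have a full protospacer.
        if pam_start >= protospacer_len:
            sites.append((pam_start, pam_start - cut_offset))
    return sites

# Toy sequence: 20 nt without any "GG", followed by two overlapping PAMs.
example = "ACGT" * 5 + "AGGG"
print(find_pam_sites(example))
```

A fuller version would also scan the reverse complement and score the match between the sgRNA and the protospacer before deciding whether a site can be cut.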

In the absence of a template, DSBs are repaired by Non-Homologous End Joining (NHEJ), which is an error-prone process that sometimes creates indels at the site of repair. This effect has recently been exploited to target double-stranded DNA viruses such as HBV. Though there have been extensive efforts to characterize the factors that contribute to effective targeting and deactivation by CRISPR/Cas9 and NHEJ, they have not, to the best of our knowledge, been synthesized into a single model.

The aim of the model is thus to capture the cutting events initiated by Cas9 and predict the outcomes of these events. We model each genome as containing multiple domains of interest, such as promoters or ORFs, and track whether these domains have been deactivated by CRISPR/Cas9 activity. There may be more than one sgRNA target in each domain and many domains can be targeted at once.

Genomes contain domains, such as promoters or ORFs, which we endeavour to deactivate by directing Cas9 to a target or targets within them.

If Cas9 successfully cuts at a target site, the double-stranded break may be resolved in three ways. The most common resolution is for NHEJ to successfully repair the DSB without creating any indels. However, NHEJ repair is error-prone and will often create indels at the target site. Finally, since multiple sgRNA targets are considered, it is possible that large deletions will occur between two targets that are simultaneously cut.

Possible events after double-stranded breaks caused by CRISPR-Cas9: repair, indels or a large deletion between sites.

At each timestep, the model considers the state (cut or uncut) and sequence of all targets and computes the probability of the following events at each target: CRISPR/Cas9 cutting, NHEJ repair or large deletion. The remainder of the model formation section discusses how we determined the probability of each event.
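A minimal sketch of one such timestep is given below, assuming fixed event probabilities. The `Target` class, the `step` function and every probability value are hypothetical placeholders rather than the parameters used in our model, and the sketch tracks only a mutated/unmutated flag instead of the full target sequence.

```python
import random
from dataclasses import dataclass

@dataclass
class Target:
    cut: bool = False      # is there an open DSB at this target?
    mutated: bool = False  # has an indel or deletion disrupted it?

def step(targets, rng, p_cut=0.1, p_repair=0.5, p_indel=0.2, p_large_del=0.05):
    """Advance all targets by one timestep (all probabilities are assumed)."""
    open_idx = [i for i, t in enumerate(targets) if t.cut]
    # Large deletion: two simultaneously open breaks are joined, removing
    # every target from the first open break to the last one.
    if len(open_idx) >= 2 and rng.random() < p_large_del:
        for t in targets[open_idx[0]:open_idx[-1] + 1]:
            t.cut, t.mutated = False, True
        return
    for t in targets:
        if t.cut:
            # NHEJ repair, which sometimes leaves an indel behind.
            if rng.random() < p_repair:
                t.cut = False
                if rng.random() < p_indel:
                    t.mutated = True
        elif not t.mutated and rng.random() < p_cut:
            # Simplification: mutated targets are never recognised again.
            t.cut = True

rng = random.Random(1)
genome = [Target() for _ in range(3)]
for _ in range(100):
    step(genome, rng)
print(genome)
```

Treating a mutated target as permanently protected is a simplification made here for brevity; a target whose sequence still matches the sgRNA reasonably well could be cut again.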

Probability of Double-Stranded Cuts made by CRISPR/Cas9

Taking into account target effects

Cas9 diffuses in three dimensions until it finds PAM sites.

Error-Prone Repair by Non-Homologous End Joining

Indel Probabilities

Large Deletions

Other Model Parameters and Assumptions

Genomes do not interact: we expect there to be multiple viral genomes in our plant defense example, and it is possible that simultaneous cuts on different genomes could result in two genomes being joined together. Rather than model such interactions, we decided that multiple independent stochastic simulations could be averaged to get an overall picture.

Talk about where all the probabilities come from

Software Implementation

Pseudocode goes here, with links to files in GitHub I think

Results

Model Validation

Include notes on how the model matches reality/our expectations of reality in this section.

Simulate w/ targets that mismatch to different extents.

Effect of sgRNA Strength

Matt visualizations for different sgRNAs.

Graph of 3 different sgRNA designs of different strengths, show % functional

Importance of Large Deletions

Include notes on how the model matches reality/our expectations of reality in this section.

Effect of Cas9 Concentration

Include notes on how the model matches reality/our expectations of reality in this section.

Predicting CRISPR Plant Defense

This model was applied to the CRISPR Plant Defense aspect of our project, investigating whether the P6 protein of Cauliflower Mosaic Virus (CaMV) could be deactivated by frameshift mutations. The P6 protein was chosen as a focus of the investigation because it suppresses natural plant RNAi defenses and trans-activates translation of other CaMV proteins. Details on P6 and the CaMV genome can be found on the CaMV Biology page.

The model was run with HOW MANY targets in the P6 gene of the simulated CaMV genome described above. We tracked the percent of simulated genomes with functional P6 across 1000 runs of the model, giving a general prediction of how long it will take before the P6 of a particular CaMV genome is rendered non-functional by our Plant Defense system.
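The bookkeeping behind this kind of curve can be sketched as follows. This is an illustration only: the function `percent_functional` is hypothetical, and a single assumed per-timestep inactivation probability (`p_inactivate`) stands in for the full cutting and repair model, so the numbers it produces are not predictions for P6.

```python
import random

def percent_functional(n_runs=1000, n_steps=50, p_inactivate=0.05, seed=0):
    """Percent of simulated genomes still functional at each timestep.

    p_inactivate is an assumed stand-in for the full model's dynamics.
    The returned list has n_steps + 1 entries, starting at time zero.
    """
    rng = random.Random(seed)
    functional_counts = [0] * (n_steps + 1)
    for _ in range(n_runs):
        functional = True
        for t in range(n_steps + 1):
            if not functional:
                break  # once inactivated, a genome stays non-functional
            functional_counts[t] += 1
            # Decide whether the gene is knocked out before the next step.
            if rng.random() < p_inactivate:
                functional = False
    return [100.0 * count / n_runs for count in functional_counts]

curve = percent_functional()
print(curve[0], curve[-1])
```

Because each run is independent, averaging over runs in this way is consistent with the assumption that genomes do not interact.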

PLOT % functional for P6/time over many simulations.

Discussion

References
