Introduction & Motivation
The in vivo proof of concept implementation of CRISPR-Cas9 as an anti-viral in plants requires integration of the coding sequence for Cas9 along with its corresponding sgRNA(s) into a plant genome. We've chosen to design sgRNAs that will guide Cas9 to three highly-conserved coding sequences of the Cauliflower Mosaic Virus (CaMV) genome where it will induce double-stranded breaks. As the virus attempts to reproduce its own DNA and express virulence proteins in the plant cell nucleus, it is hoped the consitutively expressed Cas9 endonuclease will have a severe impact on the number of viable viral genomes and provide a meaningful level of immunity.
When Cas9 is used to target plant chromosomal DNA for genome engineering, the double stranded breaks it makes are most often repaired by error-prone non-homologous end joining (NHEJ) . As Cas9 cuts the viral genome, plant NHEJ repair mechanisms will also inadvertently act to repair the viral chromosome. If only one evolutionarily nonconserved site on the viral genome is targeted, even after a Cas9-guided double stranded break it is possible that the viral genome will be quickly repaired without any major detriment to its life cycle. To increase the probability that the virus will be rendered inert, multiple sgRNAs targeting highly conserved sequences throughout the viral genome will need to be expressed simultaneously. In addition, targeting the CaMV genes most important for reproduction and export may help decrease the number of stable virus mini-chromosomes in the nucleus, reduce synthesis of new viral DNA, and stop export into the cytoplasm. A recent study by Seeger & Sohn targeted Hepatitis B virus in a human cell line. They found that most gRNA targets were able to reduce the number of HBV-positive cells five-fold or more, but mechanism of immunity not entirely clear. For most gRNA designs, the Cas9 breaks induces short insertions and deletions, likely caused by error-prone Non-Homologous End Joining of the cleaved ends, and the number of indels correlated with the reduction in virulence. One design, however, left 55% of cells wild-type while achieving similar reductions. The dynamics of repair and deletion after double-stranded cleavage by Cas9 is a major focus of the mathematical modelling this year. However, we didn't have time to finish those models before ordering our gRNA constructs, so we decided to design targets for several possible situations.
I. Targeting P1, P3, P5, P6
We chose to target these genes because they have well-established functions related to the dynamics of CaMV inside the plant cell. P2 is not targeted because it is related to interactions with the aphid vector and P4 has a number of isomorphisms that make it somewhat impractical for targeting. You can see more about the functions of each of these genes on the Cauliflower Mosaic Virus wiki page.
II. Multiple targets on P6
P6 trans-activates the 35s mRNA and is responsible for many of the viral defences against the plant cell, making it the obvious choice if only a single gene is targeted. By placing a number of targets on P6, we hope we might disrupt its activity more immediately and induce large deletions when multiple cuts occur at one.
III. Target non-coding locations around promoter
This design lets us examine whether large deletions may be created by flanking a region with Cas9 sites and whether double-stranded breaks alone are descrutive. If CRISPR immunity is mainly conferred by frameshift mutations or small deletions caused by NHEJ, we wouldn't expect this design to offer much reduction in virulence. However, if CRISPR immunity is mainly conferred by degradation of cleaved DNA or large deletions between neighbouring cut sites, we would still expect a significant reduction in virulence. If we see a significant reduction in virulence with this design, we'll have to examine the DNA structure to see if there are large deletions.
IV. Reduce mRNA transciption using dCas9
Perhaps NHEJ is insufficiently error-prone to eliminate the viruses at the pace we need or that non-destructive mutations introduced by NHEJ will change the DNA sequence enough to prevent prevent Cas9 from recognizing the site for a second attempt. In these cases, suppression of the 35S or 19S promoter using dCas9 might do more to prevent virulence that cleavage using Cas9.
gRNA Selection Criteria
To select the gRNA targets, we ran the CaMV genomic sequence through the Benchling CRISPR design tool, identifying 682 possible targets in the CaMV with NGG pam sites. To narrow down the list of targets, we considered four factors: genome position, conservation, off-targeting and efficiency.
The goal with many of the designs is to target a specific gene or genes and induce frameshift mutations. Previous research suggests that any frameshifts within the first 50% of a gene should be deleterious enough to prevent it from producing a functional protein. For design I we used a 40% cutoff after looking at the functional regions of the CaMV transcripts. The targets for Designs II-IV were subset by location (correct protein for II and IV, non-coding for III) but didn't have to all be near the beginning of the gene.
We want to target areas of the genome that are functionally important and assumed this would correspond well to areas that are conserved (i.e. unchanging) between CaMV and closely-related viruses. Targeting conserved areas should also make our gRNA targets more robust, since those areas are unlikely to contain many mutations or isomorphisms among different strains of CaMV.
Since we want to provide extra protection for Arabidopsis plants against invading viruses, a major design priority was to avoid cutting apart the Arabidopsis genome by accident. All the gRNA candidates were given off-target scores according to the algorithm employed by the MIT CRISPR design tool (though as implemented by Benchling). Screenings by Hsu et al. showed that each position in the gRNA contributes differently to binding efficiency and this is accounted for in the off-target score. Many regions of the Arabidopsis genome are non-coding, so we also ran a BLAST result of all the off-target matches identified by Benchling. For example, we considered an off-target match in a tDNA insertion site or a putative protein to be less important than one related to a known mRNA or protein.
Benchling, the tool used to find candidate gRNA targets, also includes an efficiency score for how well the gRNA is expected to knock out the target. This score comes from Doench et al. . Doench described the algorithm as follows on the Addgene blog: We examined sequence features that enhance on-target activity of sgRNAs by creating all possible sgRNAs for a panel of genes and assessing, by flow cytometry, which sequences led to complete protein knockout. By examining the nucleotide features of the most-active sgRNAs from a set of 1,841 sgRNAs, we derived scoring rules.
In several designs we were interested in using multiple gRNAs to target the same area (for example, for design II we chose four targets within P6). However, we didn't want to have any overlaps in our target sequence, since that would make our system less robust in the face of random mutations in a particular CaMV genome and would potentially cause interference between (d)Cas9 proteins vying to bind the same region. For this reason, only the single best gRNA in a region with many PAM sites could be chosen.
The specific steps we followed to generate our gRNA sequences were as follows:
- Downloaded all genome sequences in the Caulimovirus genus from NCBI (includes CaMV). These sequences are in the file caulimovirus_sequence.fasta.
- That FASTA was uploaded to Guidance v2 with default parameters to generate a multiple sequence alignment.
- Masked multiple sequence alignment based on base pairs that had at least 0.93% identity across all Caulimovirus. Note that we also tested alignments in the entire Caulimoviridae family, but the sequence homology was so low that there were very few continuous conserved regions.
- Copied the masked sequence for CaMV to Benchling, which removed all gaps (represented as dashes) automatically. The masked sequence can been seen at this read-only link.
We then created a CRISPR design on both the masked genome sequence from Guidance and unmasked CaMV genome, which was imported into Benchling using its NCBI Accession Number (NC_001497). The CRISPR design was run using the following parameters:
- Start: 0, End: 8024 (entire genome)
- PAM: NGG
- Guide length: 20 nt
- Design type: single guide
- Genome for off-targets: TAIR10 (A. thaliania)
- Genome region to exclude for off-targets: none
- Benchling identified 682 targets in the unmasked genome and *275 in the masked genome. These targets were exported to Excel using Benchling's export tool. Some gRNAs identified in the masked genome contained 'N' nucleotides (i.e. were flanked by a PAM site but were partially masked) so there were only 235 true gRNA candidates in the masked genome. Note that Benchling has updated their algorithms since our design was carried out so that such partly-masked candidates are not assigned scores.
The gRNA found in the masked genome are in one sheet and the gRNA in the unmasked genome in another. Each off-target score in the Benchling interface is a link, which brings up the matches found:
Benchling screenshot showing matches that lead to an off-target score.
For every gene accession number provided, we used NCBI Nucleotide to see which off-targets were most important since, as we mentioned before, off-targets in e.g. tDNA insertion sites are less concerning than off-targets in genes. We didn't want to go through this searching for all 682 gRNA, so we subset them based on the criteria of Designs I-IV before looking up the off-target scores:
- Design I: To induce early frameshift mutations, the masked candidate gRNA targets were subset to start coordinates within the first 40% of the genes, i.e. nt 364-757 (P1), nt 1830-1985 (P2), nt 3633-4440 (P5), and nt 5756-6380 (P6). No targets in this range we found for P3 in the conservation-masked genome, so gRNA targets from the unmasked sequence were selected. Five top candidates were identified for each gene based on their position and (mostly) their off-target score. One gRNA for each gene was then selected based on BLAST results.
- Design II: Four sets of 5 gRNA candidates were chosen for further inspection: the 5 P6 gRNA from Design I, gRNA near the beginning of P6 found in the unmasked genome, gRNA in the middle of P6 and gRNA near the end. One from each set was selected based on BLAST results.
- Design III: The noncoding region stretches from nt 7339-7366. Four gRNA near the beginning and five gRNA near the end of the region were found in the unmasked genome (none were found in that region in the masked genome, which is perhaps unsurprising as a non-coding region is less likely to be conserved). Two gRNA were chosen near the beginning and two near the end based on BLAST results.
- Design IV: For this design, we wanted to stop expression of the two mRNA transcripts. However, we were using the CaMV 35S promoter elsewhere in our design, so we couldn't target it directly. Instead, we looked at the first 100 nucleotides of each transcript (i.e. as close to the promoter as possible) and chose a few candidates to BLAST, eventually choosing one gRNA target for each transcript.
To have every tissue of a plant express the anti-viral CRISPR-Cas9 system one must first integrate its sequence into the genome of the plant's germ line so that every cell of the next generation becomes a CRISPR-Cas9 carrier. Agrobacterium tumefaciens is widely considered nature's genetic engineer; its life cycle involves the insertion of a portion of its own DNA into the genome of its plant host cells, including germ line cells . In nature, that inserted sequence codes for genes that encourage plant cell tumorigenesis, the expression of biosynthesis enzymes that produce bacterial nutrients, and genes that result in the creation of specialized amino acids called opines . These are collectively known as the T-DNA and are found on the tumor inducing plasmid (Ti plasmid) within Agrobacterium flanked by 25 base pair repeat border sequences . The Ti plasmid also contains a set of virulence genes which code for proteins responsible for the excision of single stranded DNA between border sequences and the transport and integration of that DNA into the plant cell genome . It has been frequently demonstrated that the entire set of genes between the border sequences can be replaced with up to approximately 25,000 base pairs and still be acted on for plant genome integration . Further, it's been found that the Ti plasmid's virulence genes can act in trans to integrate a sequence from another plasmid (often a shuttle vector) so long as it is flanked by border sequences . Strains of Agrobacterium that have had their Ti plasmids modified to contain no border sequences (and thus no sequence to be integrated) are said to be disarmed.
pCAMBIA is an open-source plasmid repository specializing in plant genome integration. Their vectors are meant for use in conjunction with Agrobacterium that have disarmed Ti plasmids (virulence genes will act in trans to integrate genes between the border sequences of the pCAMBIA plasmid). pCAMBIA 1200 (GenBank: AF234292.1) is a shuttle vector with chloramphenicol resistance and origins of replication for both Agrobacterium (pVS1-REP) and the plasmid-construction friendly E. coli (pBR322) (figure 3). The vector contains a basta resistance gene and a multiple cloning site between its border sequences; successful transgenic seeds get screened by surviving basta herbicide, and the multiple cloning site sits within a lac Z alpha coding sequence for blue-white insertion screening on X-gal media. Once the Cas9 and sgRNA sequences are cloned into a pCAMBIA 1200 vector, it gets electroporated into a strain of Agrobacterium containing only disarmed Ti plasmids. Successful transformants can be screened in chloramphenicol supplemented media then grown up to and allowed to infect 4-week-old plants.
20 base pair gRNA portions of three sgRNA sequences were substituted for three different sequences complimentary to the P6 coding sequence of the CaMV genome. Each of these sgRNAs were designed to have flanking 40 base pair Gibson Assembly ends that are complimentary to one another such that each sgRNA will anneal to the other in a Gibson reaction. A plant-codon optimized Cas9 was found through AddGene (Plasmid #52254) and primers were designed to amplify it out and add flanking Gibson Assembly ends - one end was designed to be homologous with the left side of the pCAMBIA multiple cloning sequence and the other end was designed to be homologous to the left-most Gibson Assembly end of the first sgRNA. The Gibson Assembly sequence on the right side of the third sgRNA was designed to be homologous with the right side of the pCAMBIA multiple cloning sequence.
Although Agrobacterium is capable of integrating DNA into a wide range of eukaryotic species, the competence for transformation into the Arabidopsis thaliana genome has been demonstrated and used routinely for research since the late 70s . Transformation of the plant's female gametes is accomplished simply by dipping the plant into a solution of Agrobacterium cells containing an appropriate plasmid (pCAMBIA) with the sequences to be integrated . Seeds from the treated plants are sown and allowed to germinate in selective medium that screens for successful transformants - mature transgenic plants can be attained within a mere 3 months . As the prototypical plant model organism, Arabidopsis has a fast life cycle, is prolific, easy to grow, and has a relatively small well annotated genome sequence . In addition, its interaction with the Cauliflower Mosaic Virus (CaMV) -- famous for the widespread use of its 35s promoter for transgenic expression in plants -- has been well characterized in the literature .
There are three different ways we sought to express Cas9 from Arabidopsis plant cells: floral dip, agroinfiltration, and protoplast transformation. For both floral dip and agroinfiltration, the Cas9 and its corresponding anti-CaMV sgRNAs had to be cloned into pCAMBIA since each of those methods relied on the action of Agrobacterium.
Since we were originally attempting an ambitious assembly to put this together (six fragments, two of which were >5kb), none of these original attempts worked. Thankfully, the homologous ends that each fragment was designed to have lent themselves to other forms of cloning which were not necessarily Gibson-based. PCR overlap extension, a technique suggested by the University of Ottawa iGEM team, provided us with a possible route to success. By joining together some of the smaller sgRNA fragments into clusters, we sought to increase the probably of Gibson Assembly working by reducing the number of fragments. Two fragments of the less ambitious five-fragment Gibson Assembly plan were successfully joined by PCR overlap extension before another Gibson reaction was attempted. The now four-fragment reaction produced colonies that produced promising PCR diagnostic results (pictured gel electrophoresis image), but inconclusive sequencing results. Since plants take time to grow, after the integration of our construct into the Arabidopsis genome by Agrobacterium-mediated floral dip we'll need to wait approximately 1-2 months for mature CRISPR-Arabidopsis plants to be in a testable state against the CaMV virus -- not in time to have them tested before the Jamboree. We were able to test our design in other ways, however.
Agroinfiltration allowed us to induce transient expression of the potential pCAMBIA construct in Arabidopsis leafs. We were able to bring our agroinfiltrated plants to a plant testing facility in London Ontario and expose them to the CaMV virus. Several days after exposure, we found the agroinfiltrated plant leafs were just as badly infected as he non-modified plant leafs. It was suggested to us that the double-insult of the harsh process of agroinfiltration followed shortly after by viral exposure (virus is rubbed into the leaf vigorously), it's likely the leaves were simply overwhelmed by the entire process.
Protoplast Cas9 Expression
Protoplasts are plant cells that have had their plant cell walls stripped off of them. They will express any coding sequences downstream of plant-specific promoters that they are transfected by for 16 hours. By transfecting plasmids that have Cas9 coding sequences downstream of plant specific promoters, we aimed to test whether the Cas9 would successfully express by performing a dot blot with protoplast samples and exposing to an anti-Cas9 antibody.
We used the original AddGene vector we purchased that encoded the plant codon optimized Cas9 under a CaMV constitutive promoter as a positive control (labelled as "pC"), a GFP-expressing vector as the negative control (labelled as "-C "), and our tentative unconfirmed pCAMBIA construct that may contain the AddGene Cas9.
The first image is the dot blot exposure itself, and the second image is a visible light photograph of the same blot.
The third image is a simple viability stain we performed to confirm protoplast health and validate our protocol.
Although we also attempted to perform western blots on the protoplast samples, we didn't get beyond the ponceau stain stage (pictured below).