Team:Peking/Modeling/test

Modeling

Specificity

Overview

To increase the accuracy and specificity of detection, we developed an assay based on our Paired dCas9 Reporter (PC Reporter) System that extracts more sequence information from the target genome. The core of the array design, and its first step, is to screen the entire genome for paired sequences (CRISPR target sites) with high specificity to serve as markers.
We developed a method named SSPD for this purpose. It consists of four steps:

  • Search for guide sequences of gRNA candidates
  • Specificity test for each candidate
  • Pair left and right target sites with optimal spacer length
  • Design PCR fragments
We introduce each step in detail below, using the analysis of the Mycobacterium tuberculosis (MTB) genome as an example. Once the target sites are chosen, our Oligo Generator converts them into oligonucleotide sequences for subsequent sgRNA construction in combination with our gRNA generator (Part).

SSPD Methods

Search for guide sequences of gRNA candidates

Recall the structure of the Paired dCas9 Reporter (PC Reporter) System: each target site requires a protospacer adjacent motif (PAM) of the form 5’-NGG-3’ at the 3’ end of the guide sequence (usually 20 bp) on the non-complementary strand (Figure 1). Our experimental results showed that the PAM-out orientation (5’-CCNN20-…-N20NGG-3’) is highly efficient for the PC Reporter system, so our model focuses on this orientation. (The program can, however, be readily adjusted to design guide sequences for other orientations. See more in Supplementary 1.)

Modeling_Fig1

Figure 1. Schematic illustration of guide design in PAM-out orientation. Note that the 20-nt guide sequence is identical to the non-complementary strand of the target.

We used the built-in regular expression module of Python 3.4.3 to search separately for left guide sequences (‘(?<=cc).(?=.{20})’) and right guide sequences (‘(?<=.{20}).(?=gg)’) of gRNAs, which are paired later so that the PC Reporter system can function.
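The search can be illustrated with a minimal sketch that applies the two patterns quoted above to a lowercase genome string. The function names and the returned tuple layout are assumptions made for illustration, not the team's actual program. Each pattern matches only the single N of CCN or NGG, so the 20-nt window is sliced out separately; the windows are reported as read on the given strand, and any reverse-complementing needed for the left guide is left out of this sketch.

```python
import re

# Patterns as quoted in the text: the match is the single "N" of CCN / NGG.
LEFT = re.compile(r"(?<=cc).(?=.{20})")
RIGHT = re.compile(r"(?<=.{20}).(?=gg)")


def find_left_sites(genome):
    """Return (window_start, window) for every CCN followed by 20 nt.

    Hypothetical helper: window_start is the 0-based index of the first
    base after CCN on the given strand.
    """
    seq = genome.lower()
    sites = []
    for match in LEFT.finditer(seq):
        n_pos = match.start()  # index of the N in CCN
        sites.append((n_pos + 1, seq[n_pos + 1:n_pos + 21]))
    return sites


def find_right_sites(genome):
    """Return (window_start, window) for every 20 nt followed by NGG."""
    seq = genome.lower()
    sites = []
    for match in RIGHT.finditer(seq):
        n_pos = match.start()  # index of the N in NGG
        sites.append((n_pos - 20, seq[n_pos - 20:n_pos]))
    return sites


# Toy sequence (made up): CCN + 20 nt + 4 nt filler + 20 nt + NGG
toy = "ccg" + "acgt" * 5 + "tttt" + "tgca" * 5 + "agg"
left_sites = find_left_sites(toy)
right_sites = find_right_sites(toy)
```

Because the patterns consume only one character and use lookarounds for the rest, overlapping candidate sites are all reported, which a plain 23-character match would miss.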

Specificity test for each candidate

The specificity of a gRNA guide sequence is defined here as the probability that the gRNA binds to its intended target site rather than to similar non-target sites. It is measured by considering both the number of potential off-target sites and their similarity to the unique target site. Because sputum samples are commonly used in MTB detection, we compared our guide sequence candidates against the Human Oral Meta-Genome (HOMG) and retained only the specific sequences orthogonal to the oral meta-genome, to avoid false-positive signals in MTB detection. We implemented this with a BLAST-based two-step filter: a) filter out guides that have an off-target whose 12 bp PAM-proximal sequence is identical to that of the corresponding target; b) score the remaining gRNAs for specificity. The scoring rule assigns a higher score to higher specificity (see the details below), so guides with a high off-target probability, indicated by a low score, can easily be filtered out.
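The two-step logic can be sketched as follows, assuming the BLAST search has already produced, for each guide, a list of equal-length off-target hit sequences written 5’ to 3’ with the PAM at the 3’ end. The scoring function here is a hypothetical placeholder that only respects the stated principle (more off-targets, and more similar ones, give a lower score); it is not the team's actual score rule, and all names are illustrative.

```python
SEED_LEN = 12  # PAM-proximal length used in filter step (a)


def has_seed_identical_off_target(guide, off_targets):
    """Step (a): True if any hit shares the 12 PAM-proximal bases."""
    seed = guide[-SEED_LEN:]
    return any(hit[-SEED_LEN:] == seed for hit in off_targets)


def toy_specificity_score(guide, off_targets):
    """Step (b), hypothetical rule: 1 / (1 + summed fractional identity)."""
    penalty = 0.0
    for hit in off_targets:
        matches = sum(a == b for a, b in zip(guide, hit))
        penalty += matches / len(guide)
    return 1.0 / (1.0 + penalty)


def two_step_filter(candidates, min_score):
    """candidates maps guide -> list of off-target hits; returns guide -> score."""
    kept = {}
    for guide, hits in candidates.items():
        if has_seed_identical_off_target(guide, hits):
            continue  # rejected in step (a)
        score = toy_specificity_score(guide, hits)
        if score >= min_score:  # min_score is an assumed user threshold
            kept[guide] = score
    return kept


# Made-up candidates for illustration only
candidates = {
    "a" * 20: [],                             # no off-targets
    "c" * 20: ["g" * 8 + "c" * 12],           # seed-identical off-target
    "acgt" * 5: ["acgt" * 4 + "aaaa"],        # similar, but seed differs
}
kept = two_step_filter(candidates, min_score=0.5)
```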

Pair left and right target sites with optimal spacer length

All left and right target sites retained after the two-step filtration qualify for pairing. In this step, each left site is paired with a right site at an appropriate spacer length. According to our experimental data, the best spacer length for the split-luciferase dCas9 fusion system is 19-23 bp (Figure 4, Link to CRISPR). Single sites that cannot pair with any other site within the given spacer-length range are eliminated.
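A minimal pairing sketch is given here, under the assumption that each site is represented by the 0-based start of its 20-nt window on the same strand and that the spacer is the number of bases between the end of the left window and the start of the right window. The function name and data layout are hypothetical.

```python
from bisect import bisect_left, bisect_right


def pair_sites(left_starts, right_starts,
               min_spacer=19, max_spacer=23, guide_len=20):
    """Return (left_start, right_start, spacer) for every valid pair."""
    rights = sorted(right_starts)
    pairs = []
    for left in sorted(left_starts):
        left_end = left + guide_len  # first base after the left window
        lo = bisect_left(rights, left_end + min_spacer)
        hi = bisect_right(rights, left_end + max_spacer)
        for right in rights[lo:hi]:
            pairs.append((left, right, right - left_end))
    return pairs


# Made-up coordinates: spacers of 18, 21, 23 and 24 bp relative to left=3
pairs = pair_sites([3], [41, 44, 46, 47])
# Sites that never appear in any pair are the ones to eliminate.
paired_rights = {right for _, right, _ in pairs}
```

Sorting the right sites once and using binary search keeps the pairing roughly linear in the number of sites plus the number of pairs, which matters when screening a whole genome.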

Modeling_Fig4

Figure 4. The effect of spacer length variation on the performance of PC reporter system. Orientation of sgRNA binding-site pairs used in the test is "PAM-out" and the spacer sequence length varies from 5bp to 107bp.

Design PCR fragments

We provide two methods for determining PCR fragments. The first fixes the number of pairs k per fragment, searches for sets of k adjacent but non-overlapping pairs, and sorts the results by fragment length. The second fixes the maximal PCR fragment length and sorts the results by the number of pairs per fragment. The sorted results are presented in the user interface so that users can select fragments as needed. Overlapping fragments may not be selected together for array design.
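Both methods can be sketched with a simple greedy scan, assuming each pair is a half-open interval (start, end) on the genome. This is one possible reading of "adjacent but non-overlapping", not necessarily the search the team implemented, and the function names are hypothetical.

```python
def fragments_fixed_k(pairs, k):
    """Method 1: candidate fragments holding exactly k non-overlapping pairs,
    sorted by fragment length (shortest first)."""
    pairs = sorted(pairs)
    results = []
    for i in range(len(pairs)):
        chain = [pairs[i]]
        for p in pairs[i + 1:]:
            if len(chain) == k:
                break
            if p[0] >= chain[-1][1]:  # starts after the previous pair ends
                chain.append(p)
        if len(chain) == k:
            results.append((chain[0][0], chain[-1][1], chain))
    results.sort(key=lambda frag: frag[1] - frag[0])
    return results


def fragments_max_length(pairs, max_len):
    """Method 2: candidate fragments no longer than max_len,
    sorted by number of pairs (most first)."""
    pairs = sorted(pairs)
    results = []
    for i in range(len(pairs)):
        chain = [pairs[i]]
        for p in pairs[i + 1:]:
            if p[0] >= chain[-1][1] and p[1] - chain[0][0] <= max_len:
                chain.append(p)
        if chain[-1][1] - chain[0][0] <= max_len:
            results.append((chain[0][0], chain[-1][1], chain))
    results.sort(key=lambda frag: -len(frag[2]))
    return results


# Made-up pair coordinates for illustration
toy_pairs = [(0, 60), (30, 90), (70, 130), (140, 200)]
by_k = fragments_fixed_k(toy_pairs, k=2)
by_len = fragments_max_length(toy_pairs, max_len=140)
```

The candidate fragments returned may overlap one another; choosing a non-overlapping subset is left to the user, as described above.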

Modeling_Fig5

Figure 5. Schematic illustration of the PCR fragment determination method, taking 2 pairs per fragment as an example. The top xxx shows all left and right targets on a given segment of the pathogen genome, and the chart beside it lists all pairs. Only adjacent but non-overlapping pairs can be placed on one fragment. Because the four candidate PCR fragments listed below overlap, users can choose only one of them.

Modeling_Fig6

Figure 6. The 72 sequence markers chosen in the MTB genome, divided into 9 fragments for PCR as shown above.

Oligo Generator

Using the SSPD method described above, we can easily find reliable target sites in a genome. However, manually converting multiple target sites into oligonucleotide sequences for subsequent sgRNA construction can be laborious. We therefore developed a supplementary program to facilitate oligo sequence generation, used in combination with our sgRNA generator (Link to Part xxx, Figure 6). Specifically, we used Golden Gate Cloning to make it more convenient to substitute guide sequences for different target sites. The detailed operation is explained in the flow chart below. See more details in Supplementary 2.
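The general idea of turning a guide sequence into an annealable oligo pair can be sketched as below. The overhang sequences depend on the restriction enzyme and vector and are not given in this section, so they are required parameters here; the values in the usage line are placeholders, and the function names are hypothetical rather than the Oligo Generator's real interface.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_oligos(guide, top_overhang, bottom_overhang):
    """Return (top, bottom) oligos, both written 5' to 3'.

    top    = top_overhang + guide
    bottom = bottom_overhang + reverse complement of guide
    The overhangs are assumptions supplied by the caller.
    """
    top = top_overhang + guide
    bottom = bottom_overhang + reverse_complement(guide)
    return top, bottom


# Placeholder guide and overhangs, for illustration only
top, bottom = make_oligos("ACGTACGTACGTACGTACGA", "CACC", "AAAC")
```

After annealing, the two oligos form a duplex whose single-stranded ends match the overhangs left by the Type IIS digest of the vector, so swapping the guide only requires ordering a new oligo pair.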

Modeling_Fig7

Figure 7. Flow chart of the guide sequence substitution protocol.