Introduction
Background
Tuberculosis (TB) is the second leading cause of death from infectious disease [1]. Because earlier diagnosis and treatment lead to a better prognosis, diagnostic services for TB patients are of great importance. Current methods include the tuberculin skin test, blood tests and sputum culture. However, the tuberculin skin test has low sensitivity, blood tests carry risks, and sputum culture takes as long as 1-2 months, which delays timely treatment. Methods based on nucleic acid detection (NAD) are highly sensitive, safe and rapid, yet they are not widely used in clinical practice. Why?
After consulting front-line TB control practitioners, we learned that NAD has a high false-positive rate. Non-specific amplification occurs frequently, and current NAD methods cannot read out sequence-specific information, so their results are unreliable. Moreover, NAD requires expensive, bulky instruments that are not available in the rural areas where TB is prevalent (for details, see Human Practice). Peking iGEM 2015 therefore set out to build a new reporter system that makes NAD specific, reliable and less dependent on instruments.
CRISPR
To improve specificity, we need a system that reads out sequence information directly. The first candidate that came to mind was the clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein 9 (Cas9) system, which originates from the prokaryotic adaptive immune system [2]. Cas9 is a sequence-specific DNA endonuclease guided by a single-guide RNA (sgRNA). Together they form a complex that can easily be programmed to recognize virtually any target sequence next to a protospacer adjacent motif (PAM) [3]. Previous studies engineered a catalytically dead Cas9 (dCas9) that forms a programmable DNA-binding complex without cutting [4]. But how can this invisible sequence information be converted into an easily measurable signal?
Split luciferase
We selected split enzymes [5] as candidates for converting the presence of a target DNA sequence into a measurable signal. Each fragment of a split enzyme is inactive on its own. When the fragments are reassembled, the enzymatic activity of the original protein is reconstituted, providing an easily measurable readout through the enzymatic reaction. Among the diverse split enzymes available, we chose split luciferase, which produces a bioluminescence signal (Figure 1).
Figure 1. Illustration of the working mechanism of split luciferase. A functional luciferase is split into two inactive fragments (Nluc, orange and Cluc, yellow). Each fragment is fused to an interacting domain (gray) that tends to dimerize. The interacting domains cause non-covalent complementation of Nluc and Cluc to reconstitute the enzymatic activity, thus to emit measurable bioluminescence.
Design
We combined dCas9 and split luciferase into a paired dCas9 (PC) Reporter system. The system not only reads out the sequence information of a target DNA but also quantitatively reports its abundance through bioluminescence. As shown in the schematic diagram (Figure 2), an Nluc-dCas9:sgRNA complex and a Cluc-dCas9:sgRNA complex can bind simultaneously to adjacent sites on a target DNA. When they do, the two luciferase fragments are brought into proximity, and their complementation produces a measurable bioluminescence signal.
Figure 2. Schematic of the paired dCas9 (PC) Reporter system. dCas9 was fused to each fragment of split luciferase (Nluc or Cluc) to form two kinds of split luciferase-dCas9:sgRNA complexes. In the presence of target DNA, the two complexes bind to two adjacent recognition sites, bringing Nluc and Cluc into proximity to produce a bioluminescence signal.
Results
Validating the functional reconstitution of split luciferase
We began by examining whether our split luciferase could be functionally reconstituted by protein fragment-assisted complementation (PFAC) [7,8]. As the complementation-assisting partners, we chose the rapamycin-binding domain (FRB) of human mTOR and FK506-binding protein 12 (FKBP). Rapamycin induces these two partners to dimerize with high affinity, forming an FRB-rapamycin-FKBP complex (Figure 3a). We then used a widely used pair of split firefly (Photinus pyralis) luciferase fragments (Nluc 416/Cluc 398) to construct Nluc-FRB and FKBP-Cluc fusion proteins, whose complementation is therefore inducible by rapamycin [9]. The PFAC assay confirmed that the activity of our split luciferase fragments was reconstituted in a rapamycin-dependent manner (Figure 3b).
Figure 3. Rapamycin-induced Nluc-FRB/FKBP-Cluc complementation. (a) Working mechanism of rapamycin-induced dimerization. The interacting protein partners (FRB and FKBP) come together and dimerize soon after rapamycin (40 nM) is added [10], reconstituting the enzymatic activity of luciferase. (b) Experimental data. Error bars denote s.d.; n=3.
The location of His-tag
Although the His-tag is widely used for protein purification, it can affect protein structure. We therefore first tested split luciferase-dCas9 fusion proteins His-tagged at the C terminus or at the N terminus to see whether the location of the tag matters (for the working mechanism of the PC reporter, see Figure 2). When the His-tag was attached to Nluc or Cluc, the activity of the fusion protein decreased dramatically, probably because the tag interferes with luciferase reconstitution (Figure 4). When the His-tag was attached to dCas9, luciferase activity was efficiently reconstituted. We therefore used split luciferase-dCas9 fusion proteins with His-tagged dCas9 in the following experiments.
Figure 4. The impact of His-tag location on the reconstitution of luciferase activity. When attached to dCas9, His-tag has minimal impact on the complementation of Nluc and Cluc.
Construction and optimization of PC Reporter system
Because dCas9 is a large and structurally complex protein, we needed to determine the best configuration for the PC Reporter system. We first fused the split luciferase fragments to either the N or the C terminus of dCas9 (Figure 5a) to assess the effect of fusion position. Initial binding of dCas9 depends on the protospacer adjacent motif (PAM, a short motif immediately 3' of the target sequence) [11]. We therefore also tested four sgRNA orientation settings (Figure 5b):
- PAM-out: both PAM sequences lie distal to the spacer between the sgRNA pair, with the 5' end of each sgRNA target site adjacent to the spacer.
- PAM-in: both PAM sequences lie adjacent to the spacer.
- PAM-direct 1 and PAM-direct 2: these combine the two orientations, with one PAM adjacent to the spacer and the other distal from it.
In total, 16 pairs of split luciferase-dCas9:sgRNA complexes were constructed and tested (Figure 5c), and most of them worked well. For subsequent experiments we selected the Nluc-dCas9/Cluc-dCas9 pair, the most robust protein construct with minimal intergroup variation. We also selected PAM-out, the sgRNA orientation giving the strongest luminescence.
Figure 5. Construction and optimization of the PC Reporter system. (a) Four split luciferase-dCas9 fusion strategies. (b) Four sgRNA orientation settings. In PAM-out, both PAM sequences are distal from the spacer sequence, with the 5' end of each sgRNA adjacent to the spacer. In PAM-in, both PAM sequences are adjacent to the spacer sequence, with the 3' end of each sgRNA in proximity to the spacer. In PAM-direct 1 and PAM-direct 2, one PAM sequence is adjacent to the spacer and the other is distal from it. (c) Results for the PC Reporter system in 16 configurations (4 constructs × 4 sgRNA orientation settings).
Reconstitution of enzymatic activity is constrained by the distance between Nluc and Cluc [11]. We therefore varied the spacer length from 5 bp to 107 bp (Figure 6) to determine how it affects the performance of the PC Reporter system. The system performed best with a spacer of about 21 bp, at which the two fusion proteins bind on the same side of the DNA double helix. As the spacer lengthened, luminescence decreased. At 28 bp, where the two fusion proteins bind on opposite sides of the helix, reconstitution was unstable and luminescence was lower. Luminescence rose again at 33 bp, where the two fusion proteins are once more on the same side. At longer spacers, the fragments were too far apart for split luciferase to reconstitute.
Figure 6. Effect of spacer length on the performance of the PC Reporter system. The spacer is defined as the sequence between the sgRNA pair, and its length varies from 5 bp to 107 bp. Note that the effect of spacer length shows a period of 10-11 bp, consistent with the structure of the DNA double helix.
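The periodicity in Figure 6 can be rationalized with a simple geometric model. The minimal sketch below makes two assumptions that do not come from the experiments: canonical B-DNA with 10.5 bp per helical turn, and the best-performing 21 bp spacer as the reference phase. The function names are hypothetical. The code reports how far around the helix axis the second binding site is rotated, relative to the reference geometry, at a given spacer length.

```python
"""Toy model of helical phasing between two dCas9 binding sites (illustrative only)."""

BP_PER_TURN = 10.5   # assumed helical repeat of B-DNA
REFERENCE_BP = 21    # assumed reference spacer (best signal in Figure 6)


def helical_offset_deg(spacer_bp: float,
                       reference_bp: float = REFERENCE_BP,
                       bp_per_turn: float = BP_PER_TURN) -> float:
    """Rotation (0-360 degrees) of the second site relative to the reference geometry."""
    turns = (spacer_bp - reference_bp) / bp_per_turn
    return (turns * 360.0) % 360.0


def angular_distance_deg(spacer_bp: float, **kwargs) -> float:
    """Smallest angle (0-180 degrees) from the reference face; 0 means the same face."""
    offset = helical_offset_deg(spacer_bp, **kwargs)
    return min(offset, 360.0 - offset)


if __name__ == "__main__":
    for spacer in (5, 21, 26, 28, 31, 33, 42, 107):
        print(f"{spacer:>4} bp -> {angular_distance_deg(spacer):6.1f} deg from reference face")
```

The model is deliberately crude. It ignores the size of dCas9 and the flexibility of the linkers, so it only suggests why the signal oscillates with a period of roughly one helical turn. It does not predict the exact optimum.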
Taken together, these results fixed the final design of our PC Reporter system: both split luciferase fragments fused to the N terminus of dCas9, PAM-out sgRNA orientation, and a spacer length of 21 bp. Next, we tested whether the PC reporter can read out sequence-specific information from crude PCR product rather than well-purified DNA samples.
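The final design constraints can be expressed as a simple sequence search. The minimal sketch below assumes SpCas9 with an NGG PAM and 20-nt protospacers; both are assumptions, since the guide length is not stated here. It finds positions on the top strand where a PAM-out pair with a 21 bp spacer fits. The function name is hypothetical and is not part of SSPD or any team software.

```python
"""Scan a DNA sequence for PAM-out sgRNA pair sites (illustrative sketch)."""

from typing import List, Tuple

PROTOSPACER_LEN = 20  # assumed SpCas9 guide length


def find_pam_out_pairs(seq: str, spacer_len: int = 21,
                       protospacer_len: int = PROTOSPACER_LEN) -> List[Tuple[int, int]]:
    """Return (ccn_start, ngg_start) 0-based top-strand coordinates of PAM-out pairs.

    Layout on the top strand (5'->3'):
        CCN | protospacer 1 (reverse strand) | spacer | protospacer 2 | NGG
    CCN is the reverse complement of the NGG PAM of the reverse-strand site, so both
    PAMs sit on the outside and both protospacer 5' ends face the spacer.
    """
    seq = seq.upper()
    span = 3 + protospacer_len + spacer_len + protospacer_len + 3
    hits = []
    for i in range(len(seq) - span + 1):
        ngg_start = i + 3 + protospacer_len + spacer_len + protospacer_len
        # CC at i, i+1 (the N is free); GG at the last two positions of NGG.
        if seq[i:i + 2] == "CC" and seq[ngg_start + 1:ngg_start + 3] == "GG":
            hits.append((i, ngg_start))
    return hits
```

The sketch ignores off-target matches, GC content and mismatches. A real design tool would also need to check those.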
Applying PC reporter to crude NAD product
We first tested the specificity of the PC Reporter system by asking whether it could distinguish the target plasmid (BBa_K909009) from different kinds of non-specific controls (Figure 7a,b,c). Luminescence with the positive target was significantly higher than with the non-specific controls (Figure 7d).
Figure 7. Specificity test of PC Reporter system. (a) Positive target with two sites recognized simultaneously by a pair of split luciferase-dCas9:sgRNA complexes. (b) Control with only one site recognized by "half pair". (c) Negative control with no site to be recognized. (d) Experimental data. Error bars denote s.d.; n=3.
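The page does not say which test underlies "significantly higher". As a hedged example of one reasonable choice, the sketch below computes the fold change of the mean signal and Welch's unequal-variance t statistic for n=3 replicates per group. The function names are hypothetical. A p-value can then be read from the t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_ind(target, control, equal_var=False)`.

```python
"""Compare luminescence of a positive target against a control (illustrative sketch)."""

import math
import statistics
from typing import Sequence, Tuple


def fold_change(target: Sequence[float], control: Sequence[float]) -> float:
    """Ratio of mean target signal to mean control signal."""
    return statistics.mean(target) / statistics.mean(control)


def welch_t(target: Sequence[float], control: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(target), len(control)
    v1 = statistics.variance(target) / n1   # squared standard error, target
    v2 = statistics.variance(control) / n2  # squared standard error, control
    t = (statistics.mean(target) - statistics.mean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```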
We next tested the sensitivity of the PC Reporter system. As shown in Figure 8, the PC reporter still worked at a target concentration of ~0.1 nM, for both plasmid (Figure 8a) and crude PCR product (Figure 8b).
Figure 8. Exploring the sensitivity of PC reporter using plasmid and PCR product as the target. (a) Bioluminescence intensity at different concentrations of plasmid as the target. (b) Bioluminescence intensity at different concentrations of crude PCR product as the target. Both showed that PC reporter was able to detect target DNA at a low concentration (0.1 nM).
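Comparing plasmid and PCR-product targets at the same molar concentration requires converting mass concentration into nM. The sketch below uses the common approximation of 650 g/mol per base pair of double-stranded DNA; this value and the function names are assumptions for illustration. It also applies C1V1 = C2V2 to work out a dilution.

```python
"""Convert dsDNA mass concentration to molarity and plan a dilution (sketch)."""

AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximate, assumed)


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Molar concentration (nM) of a dsDNA molecule of the given length."""
    # ng/uL = 1e-3 g/L; divide by molar mass (g/mol) and convert mol/L to nM (x 1e9).
    return conc_ng_per_ul * 1e6 / (length_bp * AVG_BP_MASS)


def stock_volume_ul(stock_nM: float, final_nM: float, final_volume_ul: float) -> float:
    """Volume of stock needed for the final concentration (C1*V1 = C2*V2)."""
    if final_nM > stock_nM:
        raise ValueError("final concentration exceeds stock concentration")
    return final_nM * final_volume_ul / stock_nM
```

For example, a 1 kb product at 65 ng/µL is about 100 nM, so reaching 0.1 nM requires a 1000-fold dilution.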
However, we did not know whether 0.1 nM was the lower limit of detection. We reasoned that using multiple PC reporters to recognize more sites on one target would enhance the bioluminescence signal. This would give a relatively high signal at lower target concentrations. The results confirmed this (Figure 9).
Figure 9. Bioluminescence intensity with multiple PC reporters. One, three and five PC reporters were each tested using 0.01 nM PCR product.
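The page does not define how a lower limit of detection would be determined. One common convention, assumed here rather than taken from these experiments, sets a signal threshold at the mean of blank (no-target) replicates plus three standard deviations. It then reports the lowest concentration whose mean signal exceeds that threshold. The function names are hypothetical.

```python
"""Estimate a detection threshold from blank replicates (illustrative sketch)."""

import statistics
from typing import Dict, Optional, Sequence


def detection_threshold(blank: Sequence[float], k: float = 3.0) -> float:
    """Signal threshold = mean(blank) + k * SD(blank); k = 3 is a common convention."""
    return statistics.mean(blank) + k * statistics.stdev(blank)


def lowest_detected(signals: Dict[float, Sequence[float]],
                    blank: Sequence[float], k: float = 3.0) -> Optional[float]:
    """Lowest concentration whose mean signal exceeds the threshold, or None.

    `signals` maps target concentration (e.g. nM) to replicate luminescence values.
    """
    threshold = detection_threshold(blank, k)
    detected = [c for c, reps in signals.items() if statistics.mean(reps) > threshold]
    return min(detected) if detected else None
```

This is a signal-based threshold only. A more rigorous limit of detection would also account for replicate variability at each concentration.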
Having validated the specificity and sensitivity of the PC Reporter system, we were eager to apply it to real MTB genomic DNA. Prof. LIU Cuihua (Chinese Academy of Sciences) kindly provided MTB genomic DNA that is safe to handle.
First, we used the most popular commercial TB NAD kit (TB-LAMP) to amplify the MTB genome. We found a large yield of non-specific product after amplification (Figure 10a,b), which the conventional readout could mistake for a positive result. In contrast, our PC Reporter system readily distinguished the specific product from the non-specific one (Figure 10c).
Figure 10. (a) Gel electrophoresis of LAMP amplification products from MTB genome or control genome (E. coli). (b) Amount of amplified nucleic acid measured with the method provided in the TB-LAMP kit. (c) Bioluminescence intensity generated by the PC reporter, which distinguishes the specific amplification product from the non-specific one.
We then tested whether our PC reporter could be applied to clinical cases in which patients are at an early stage of TB, so that their pathological samples may contain only a few copies of MTB. The key question was the minimum number of MTB genome copies in a crude sample that the PC reporter can detect when combined with PCR. We serially diluted the MTB genome before PCR amplification and then measured the PCR products with the PC reporter. It reached a detection limit of 1 genome copy per tube, which is sensitive enough for real TB diagnosis (Figure 11).
Figure 11. Bioluminescence intensity of the PC reporter as a function of the MTB genome input (copies/mL) used for NAD amplification.
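Planning a dilution series down to single genome copies requires converting DNA mass into genome copies, and at that scale sampling statistics also come into play. The sketch below makes two assumptions: an MTB (H37Rv) genome length of about 4.41 Mb, and an average of 650 g/mol per base pair. It also assumes template molecules are Poisson-distributed among tubes. The function names are hypothetical.

```python
"""Genome copy number from DNA mass and Poisson sampling at low copy number (sketch)."""

import math

AVOGADRO = 6.02214076e23      # 1/mol
AVG_BP_MASS = 650.0           # g/mol per bp of dsDNA (approximate, assumed)
MTB_GENOME_BP = 4.41e6        # assumed approximate MTB (H37Rv) genome length


def genome_copies(mass_ng: float, genome_bp: float = MTB_GENOME_BP) -> float:
    """Number of genome copies in a given mass of genomic DNA."""
    grams_per_copy = genome_bp * AVG_BP_MASS / AVOGADRO
    return mass_ng * 1e-9 / grams_per_copy


def prob_at_least_one(mean_copies_per_tube: float) -> float:
    """P(tube receives >= 1 copy) when copies are Poisson distributed."""
    return 1.0 - math.exp(-mean_copies_per_tube)
```

Under this model, 1 ng of MTB genomic DNA corresponds to roughly 2 × 10^5 genome copies. At a mean input of one copy per tube, about 37% of tubes (e^-1) receive no template at all. A limit of "1 genome copy per tube" is therefore best read as the mean input of the dilution series.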
From single marker to multi-marker array
We have shown that our Paired dCas9 (PC) Reporter system works well at the bench and can detect MTB genomic DNA. In particular, when combined with PCR, the PC reporter is sensitive enough to detect a single genome copy per tube.
In clinical practice, however, the situation is more complicated: strain mutations, sample differences and other environmental factors can cause a single test to misdiagnose. We therefore designed a diagnostic array that lets the PC reporter test multiple sites on the target genome at once. The array extracts more sequence information and so improves the reliability of diagnosis.
The first and most important step of array design is to enumerate all dCas9 binding sites (PAM sites) across the MTB genome and to identify MTB-specific PAM sites as markers using SSPD. Computational screening yielded 2,791 MTB-specific markers in total. We designed 72 sgRNA pairs targeting 72 of these markers, located on 9 fragments of the MTB genome (Figure 12). We then used Oligo Generator to design oligos for high-throughput Golden Gate cloning and built sgRNA Generators for cell-free transcription. In the end we obtained 70 sgRNA pairs (2 pairs failed during cloning).
Figure 12. 72 target sites (MTB-specific markers) on 9 fragments of the MTB genome, identified using SSPD.
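The marker search above was done with SSPD, whose algorithm is not described here. To illustrate the idea only, the sketch below assumes SpCas9's NGG PAM and 20-nt protospacers. It calls a site "specific" if its exact protospacer sequence never occurs, on either strand, in a single background genome. This is a simplified exact-match stand-in, not SSPD. Real specificity screening would also tolerate mismatches and use many background genomes. The function names are hypothetical.

```python
"""Enumerate NGG-adjacent protospacers and keep those absent from a background genome.

A simplified exact-match stand-in for marker screening; it is NOT the SSPD algorithm.
"""

from typing import Iterator, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def protospacers(genome: str, length: int = 20) -> Iterator[Tuple[str, int, str]]:
    """Yield (protospacer, position, strand) for every site followed by an NGG PAM.

    Positions are 0-based on the listed strand (the '-' strand is read 5'->3').
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - length - 2):
            # PAM occupies [i+length, i+length+3); its last two bases must be GG.
            if seq[i + length + 1:i + length + 3] == "GG":
                yield seq[i:i + length], i, strand


def specific_sites(target: str, background: str, length: int = 20) -> Set[str]:
    """Protospacers found in the target but nowhere in the background (either strand)."""
    bg = background.upper()
    background_kmers: Set[str] = set()
    for seq in (bg, revcomp(bg)):
        background_kmers.update(seq[i:i + length] for i in range(len(seq) - length + 1))
    return {p for p, _, _ in protospacers(target, length) if p not in background_kmers}
```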
We then performed array-based high-throughput measurements with the PC Reporter system (see Methods). Two separate assays were carried out, one with MTB and one with a control bacterium (E. coli). The results are presented as heat maps (Figure 13), in which MTB is easily distinguished from the control strain. Of the 70 successfully constructed sgRNA pairs, 59 gave a significant signal for MTB, whereas 11 did not. This is consistent with our concern that some factors can disrupt an individual single-site test of the PC reporter. Reassuringly, array-based detection mitigates this problem, because the diagnosis does not rely on any single marker (Figure 13).
Figure 13. Results of high-throughput assay for MTB and control strain. F denotes fragments obtained from MTB genome (a) or control strain (b); P denotes markers from each fragment.
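The Methods place a mismatched-sgRNA negative control (P9) alongside each fragment's markers, which suggests a simple way to turn array readings into calls. The sketch below compares each marker with its fragment's P9 control and then pools the marker calls into an overall call. Both cutoffs and all function names are hypothetical and are not taken from the experiments described here.

```python
"""Turn an array of luminescence readings into marker calls and an overall call (sketch)."""

from typing import Dict, Tuple

# Hypothetical thresholds; they are not taken from the experiments described here.
MARKER_RATIO_CUTOFF = 2.0    # signal / negative-control signal for a positive marker
ARRAY_FRACTION_CUTOFF = 0.5  # fraction of positive markers for a positive sample


def call_markers(fragments: Dict[str, Dict[str, float]],
                 control_key: str = "P9") -> Dict[Tuple[str, str], bool]:
    """Call each marker positive if it exceeds its fragment's control by the cutoff."""
    calls = {}
    for frag, wells in fragments.items():
        background = wells[control_key]
        for marker, signal in wells.items():
            if marker != control_key:
                calls[(frag, marker)] = signal / background >= MARKER_RATIO_CUTOFF
    return calls


def call_sample(calls: Dict[Tuple[str, str], bool]) -> Tuple[bool, float]:
    """Overall call and the fraction of positive markers."""
    fraction = sum(calls.values()) / len(calls)
    return fraction >= ARRAY_FRACTION_CUTOFF, fraction
```

Because the overall call pools many markers, a few non-responding markers (such as the 11 noted above) lower the positive fraction without necessarily changing the call.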
To evaluate the results quantitatively, we used the Wilcoxon rank-sum test for a block design. In this test, the Wilcoxon rank-sum statistics $W_j$ ($1 \le j \le m$) of the individual blocks are summed:
$$W_{BD} = \sum_{j=1}^{m} W_j$$
Substituting our experimental data, the statistic $W_{BD}$ was calculated as
Therefore, the signals from MTB and the control strain (E. coli) differ significantly. That is, our PC Reporter system can distinguish the MTB genome sequence from those of non-specific strains.
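The following is a minimal sketch of the block statistic W_BD described above. It rests on several assumptions:
- Each block is one genome fragment.
- The two groups in a block are the MTB readings and the control-strain readings for that fragment.
- A normal approximation without tie correction is adequate.

This unweighted sum of within-block rank sums is one common form of a stratified Wilcoxon test (van Elteren's test weights the blocks instead). The page does not state which variant or p-value method was used. The function names are hypothetical.

```python
"""Sum of within-block Wilcoxon rank-sum statistics with a normal approximation (sketch)."""

import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def block_wilcoxon(blocks: Sequence[Tuple[Sequence[float], Sequence[float]]]
                   ) -> Tuple[float, float, float, float]:
    """Return (W_BD, expected value, variance, z) for blocks of (treatment, control)."""
    w_bd = expected = variance = 0.0
    for treatment, control in blocks:
        n_t, n_c = len(treatment), len(control)
        ranks = average_ranks(list(treatment) + list(control))
        w_bd += sum(ranks[:n_t])                      # W_j: rank sum of the treatment group
        expected += n_t * (n_t + n_c + 1) / 2         # E[W_j] under H0
        variance += n_t * n_c * (n_t + n_c + 1) / 12  # Var[W_j] under H0 (no tie correction)
    z = (w_bd - expected) / math.sqrt(variance)
    return w_bd, expected, variance, z


def one_sided_p(z: float) -> float:
    """Upper-tail p-value of a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```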
Methods
Clone construction of split luciferase-dCas9 fusion protein
We cloned the split luciferase-dCas9 fusion genes into pET-21a by Golden Gate assembly (Figure 14a,b).
Figure 14. Plasmids encoding the split luciferase-dCas9 fusion proteins in pET-21a. (a) Nluc-dCas9 plasmid, with T7 promoter, Nluc-dCas9 coding sequence and T7 terminator. (b) Cluc-dCas9 plasmid, with T7 promoter, Cluc-dCas9 coding sequence and T7 terminator.
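Golden Gate assembly relies on Type IIS restriction enzymes that cut outside their recognition site, so inserts must not contain internal copies of that site. The enzyme used is not named here. The sketch below assumes BsaI (recognition site GGTCTC) purely for illustration and scans both strands of an insert. The function name is hypothetical.

```python
"""Check a Golden Gate insert for internal Type IIS recognition sites (sketch)."""

from typing import Dict, List, Tuple

# BsaI is assumed here for illustration; the enzyme actually used is not stated.
SITES: Dict[str, str] = {"BsaI": "GGTCTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def internal_sites(seq: str, sites: Dict[str, str] = SITES) -> List[Tuple[str, int, str]]:
    """Return (enzyme, top-strand position, strand) for each site on either strand."""
    seq = seq.upper()
    hits = []
    for name, motif in sites.items():
        rc = motif.translate(COMPLEMENT)[::-1]  # GGTCTC -> GAGACC on the top strand
        for strand, pattern in (("+", motif), ("-", rc)):
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(pattern, start + 1)
    return hits
```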
Selection of test target sequence
Before targeting the Mycobacterium tuberculosis (MTB) genome with the PC Reporter system, we first performed experiments with a test target gene. This gene had to meet the substrate requirements of dCas9 and be harmless to its bacterial host after chemical transformation. After screening the iGEM Registry of Standard Biological Parts, we chose Part BBa_K909009 as the test target (Figure 15). This part is the cDNA of the UV-B sensing protein UVR8 from Arabidopsis thaliana, contributed by iGEM12_ETH_Zurich.
Figure 15. Schematic of Part: BBa_K909009.
sgRNA generation
To obtain sgRNAs, we used SSPD to screen guide sequences across the entire genome and Oligo Generator to generate oligos for Golden Gate cloning. We then inserted each guide sequence into the sgRNA Generator backbone (Figure 16). The sgRNAs were produced by PCR amplification and cell-free transcription with the HiScribe T7 Quick High Yield RNA Synthesis Kit (New England Biolabs), and then purified. We especially thank ZHAO Xuejin (Research Assistant, Institute of Microbiology, Chinese Academy of Sciences) for providing the sgRNA Generator!
Figure 16. Schematic of the sgRNA Generator backbone, containing the T7 promoter, lacZα', crRNA (tracrRNA) and T7 terminator.
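Inserting guide sequences into a backbone by Golden Gate cloning typically uses annealed oligo pairs whose 5' overhangs match the backbone. The actual overhangs of the sgRNA Generator and the output format of Oligo Generator are not given here. The sketch below therefore takes the overhangs as caller-supplied parameters rather than inventing them. The optional extra 5' G reflects a common convention for T7 transcription and is an assumption, as is the function name.

```python
"""Build annealed-oligo pairs for cloning a guide sequence (illustrative sketch)."""

from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def guide_oligos(guide: str, top_overhang: str, bottom_overhang: str,
                 add_5prime_g: bool = True) -> Tuple[str, str]:
    """Return (top, bottom) oligos, 5'->3', that anneal to leave the given overhangs.

    The overhangs must match the vector backbone; they are supplied by the caller.
    If add_5prime_g is True and the guide does not start with G, a G is prepended,
    a common convention because T7 RNA polymerase initiates preferentially at G.
    """
    guide = guide.upper()
    if not guide or set(guide) - set("ACGT"):
        raise ValueError("guide must be a non-empty A/C/G/T sequence")
    if add_5prime_g and not guide.startswith("G"):
        guide = "G" + guide
    top = top_overhang.upper() + guide
    bottom = bottom_overhang.upper() + revcomp(guide)
    return top, bottom
```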
Testing PC Reporter system
We developed a new protocol for testing the PC Reporter system (Figure 17).
Figure 17. Flow chart for testing the PC Reporter system. If both sgRNAs of a pair were mixed with both fusion proteins at once, each fusion protein could load either sgRNA, which would reduce complementation of Nluc and Cluc. We therefore incubated the two split luciferase-dCas9:sgRNA complexes separately at 25 °C for 10 min. Target DNA was then added to the mixture of the two complexes and incubated at 37 °C for 30 min. During this step the complexes bind simultaneously to two adjacent half-sites on the target DNA, bringing Nluc and Cluc into close proximity and restoring luciferase activity. Finally, the mixture was transferred to a 96-well plate and Luciferase Assay Reagent (Promega) was added. Luminescence was then measured promptly in a microplate reader (Thermo Scientific).
High-throughput array detection assay
Our results show that single-site target detection has relatively high specificity and sensitivity. However, clinical practice involves strain mutations, sample differences and other environmental factors. To account for these, we designed an array (Figure 18) that detects multiple target sites (markers) of MTB, making the PC reporter quantitatively reliable.
Figure 18. Layout of the 96-well plate in the array detection assay. Each line holds one of the 9 fragments (F1-F9) identified by SSPD from the MTB genome. The 8 MTB markers (M1-M8) of each fragment are placed in separate sample pools, so 8 × 9 = 72 markers are detected in a single assay. For each fragment, P9 is a negative control containing a mismatched sgRNA pair. Two separate assays were carried out, using MTB and a control strain (E. coli), respectively.
References
1. WHO. Global Tuberculosis Report 2014. World Health Organization, 2014.
2. Rodolphe Barrangou, Christophe Fremaux, Hélène Deveau, et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science, 2007, 315: 1709-1712.
3. John P. Guilinger, David B. Thompson, David R. Liu. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nature Biotechnology, 2014, 32: 577-582.
4. Lei S. Qi, Matthew H. Larson, Luke A. Gilbert, et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell, 2013, 152: 1173-1183.
5. Sujan S. Shekhawat, Indraneel Ghosh. Split-protein systems: beyond binary protein-protein interactions. Current Opinion in Chemical Biology, 2011, 15: 789-797.
6. Taha Azad, Amin Tashakor, Saman Hosseinkhani. Split-luciferase complementary assay: applications, recent developments, and future perspectives. Anal Bioanal Chem, 2014, 406: 5541-5560.
7. Rossi F, Charlton CA, Blau HM. Monitoring protein-protein interactions in intact eukaryotic cells by beta-galactosidase complementation. Proc Natl Acad Sci USA, 1997, 94: 8405-8410.
8. Remy I, Galarneau A, Michnick SW. Detection and visualization of protein interactions with protein fragment complementation assays. Methods Mol Biol, 2002, 185: 447-459.
9. Kathryn E. Luker, Matthew C. P. Smith, et al. Kinetics of regulated protein-protein interactions revealed with firefly luciferase complementation imaging in cells and living animals. PNAS, 2004, 101: 12288-12293.
10. Ramasamy Paulmurugan, Sanjiv S. Gambhir. Combinatorial library screening for developing an improved split-firefly luciferase fragment-assisted complementation system for studying protein-protein interactions. Anal Chem, 2007, 79: 2346-2353.
11. Jinek M, Chylinski K, Fonfara I, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 2012, 337(6096): 816-821.
12. Ramasamy Paulmurugan, Sanjiv S. Gambhir. An intramolecular folding sensor for imaging estrogen receptor-ligand interactions. PNAS, 2006, 103: 15883-15888.
13. Bruce Alberts, et al. Molecular Biology of the Cell, 5th edition. Garland Science, 2008.