Team:Tsinghua/Design

Brief Introduction

As iGEM team Tsinghua 2015, we established a biological information storage platform that takes visible light as input and stores information as DNA sequences edited by a modified recombinase. We developed hardware and supporting software to carry out this work with genetically modified bacteria. Stored information is read out by DNA sequencing and then decoded by our software. With this system, one could in the future store information from any computer file, or from elsewhere, in bacteria using light, and read it back by sequencing with a single click.



Light-switchable two-component system

Synthetic photobiology has become a relatively mature field in which scientists take light-sensing systems from many kinds of organisms and integrate them into bacteria. In addition, components and modules of light-responsive proteins from different species have been engineered together to improve efficiency. Therefore, if one form of signal must be chosen as the input, light is the favored choice [1].



·Advantages
Using light as an input signal has clear advantages. First, it offers extremely high spatial and temporal precision, unlike small chemical molecules, which diffuse and are diluted when bacteria proliferate or the culture medium is changed. Second, light is easy to access and cheap, which is why light systems are widely used; for example, a light-emitting diode (LED) usually costs less than 10 cents. Third, optical stimulation is noninvasive and mild, unlike thermal, mechanical and chemical stimulation, which might harm the bacteria. Its minimal off-pathway effects also matter when light-responsive elements are added to the bacteria. Fourth, it is potentially orthogonal and programmable: different light systems generally do not interfere with each other and can therefore be stimulated and silenced in parallel [2].



·Principle
The light-switchable two-component system (TCS) is one example of how a light signal can be wired into a metabolic pathway in bacteria [3]. As its name indicates, the system is switchable: it has two interchangeable states under different light conditions. It also has two components, a light sensor and a response regulator (RR). The sensor detects incoming light and responds by changing its conformation; the regulator reacts to the sensor and, being a transcription factor, turns gene expression on or off. More specifically, the light sensor is made up of two modules: the light-sensing module itself and an effector, the histidine kinase (HK) domain, which possesses both kinase and phosphatase activity.

The sensor and the effector interact closely to give a precise light-induced response. The principle is as follows. When light hits the bacteria, the effector module in the sensor, i.e., the HK domain, changes its conformation, and its catalytic activity switches from kinase to phosphatase. Consequently, its target response regulator is dephosphorylated and thereby inactivated. The response regulator can then no longer recognize its downstream target sequence or activate expression of the reporter gene. The three mainstream light systems we used all follow this basic scheme.
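
The scheme above can be written down as a small on/off model. This is only a minimal sketch of the logic as described in this section, not a quantitative model; the class and method names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoComponentSystem:
    """Toy on/off model of a light-switchable two-component system.

    Hypothetical illustration of the scheme described in the text:
    light -> HK acts as phosphatase -> RR dephosphorylated -> reporter off.
    """

    name: str

    def sensor_activity(self, light_on: bool) -> str:
        # Light switches the HK domain from kinase to phosphatase.
        return "phosphatase" if light_on else "kinase"

    def regulator_phosphorylated(self, light_on: bool) -> bool:
        # The RR stays phosphorylated only while the HK acts as a kinase.
        return self.sensor_activity(light_on) == "kinase"

    def reporter_expressed(self, light_on: bool) -> bool:
        # Only the phosphorylated RR recognizes its target promoter.
        return self.regulator_phosphorylated(light_on)


tcs = TwoComponentSystem("generic TCS")
for light_on in (False, True):
    print(light_on, tcs.sensor_activity(light_on), tcs.reporter_expressed(light_on))
```

In this simplified picture the reporter is expressed in the dark and switched off under light; real systems respond gradually rather than as a strict boolean.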


·Design
Three types of TCS are currently the most commonly investigated: the red-, blue- and green-light systems, each named after the wavelength to which it responds.

The red-light system used in our project consists of two components, a membrane-bound light sensor Cph8 and a response regulator OmpR [3]. The light sensor is made up of a red-light-sensitive cyanobacterial phytochrome sensor module, Phy, derived from the protein Cph1 of Synechocystis sp. PCC 6803, and a histidine kinase domain from the protein EnvZ of E. coli. The response regulator is derived from OmpR, whose recognition site is the OmpC promoter. Red light induces a reversible conformational switch in Cph8, leading to loss of kinase activity. OmpR, a substrate of the Cph8 kinase, is then dephosphorylated, which prevents it from binding the OmpC promoter and driving expression of the downstream genes. Because E. coli has an endogenous level of expression for this pathway, a gene knock-out is introduced to avoid potentially confusing results [4].

The blue-light system follows similar principles and also contains two components. Its sensor is likewise a hybrid protein built from modules of different species. The blue-light-sensitive LOV domain of the soluble light sensor YF1 is derived from the protein YtvA of B. subtilis, whereas the histidine kinase domain, derived from the protein FixL, and the response regulator FixJ come from B. japonicum. In this system, a Jα helix links the light-sensing module to the effector; light induces a conformational change in this linker, switching YF1 (the fusion protein) from a kinase to a phosphatase. The response regulator FixJ is thus dephosphorylated and loses the ability to drive gene expression from the FixK2 promoter [3].

The green-light system works in much the same way as the blue-light system: it comprises two essential components, a light sensor and a response regulator. Here, however, the light-sensing module is designated Cyb and, together with its histidine kinase, constitutes the light sensor component CcaS. Its response regulator, CcaR, recognizes the PcpcG2 promoter and in turn regulates the downstream genes. These components belong to a cyanobacteriochrome system [3].
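
The three systems can be summarized as a small lookup table, which is convenient when software has to choose a light channel. The sketch below only restates the component names given in the text; the `LightSystem` type and the `describe` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class LightSystem(NamedTuple):
    """Components of one light-switchable TCS, as named in the text."""

    sensor: str
    sensor_modules: str
    response_regulator: str
    promoter: str


LIGHT_SYSTEMS = {
    "red": LightSystem("Cph8", "Phy module (Cph1) + HK domain (EnvZ)", "OmpR", "OmpC"),
    "blue": LightSystem("YF1", "LOV domain (YtvA) + HK domain (FixL)", "FixJ", "FixK2"),
    "green": LightSystem("CcaS", "Cyb module + its histidine kinase", "CcaR", "PcpcG2"),
}


def describe(color: str) -> str:
    """Return a one-line summary of the system that responds to `color`."""
    system = LIGHT_SYSTEMS[color]
    return (
        f"{color}: sensor {system.sensor} ({system.sensor_modules}), "
        f"regulator {system.response_regulator}, promoter {system.promoter}"
    )


for color in LIGHT_SYSTEMS:
    print(describe(color))
```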


dCas9-recombinase system

Two gene-editing tools are commonly used: site-specific recombinases and the CRISPR-Cas9 system.
A site-specific recombinase is an endonuclease that can insert, delete or invert a DNA fragment at its recognition sites. Two families of recombinases have been identified: tyrosine recombinases and serine recombinases [5]. Although one particular outcome of recombination (insertion, deletion or inversion) is preferred in different organisms, the other editing modes can also be selected by deliberate manipulation. A recombinase system is therefore an ideal candidate for executing information storage.

Cas9, an endonuclease from Streptococcus pyogenes, can target and cleave specific DNA sequences adjacent to a protospacer adjacent motif (PAM) when provided with a guide RNA. As gene-editing technology has advanced, the CRISPR/Cas9 system has been exploited for a myriad of functions, such as gene knock-out, gene knock-down and single-molecule imaging. It was therefore natural for us to turn to Cas9 to see whether it could overcome the specificity limitation of recombinases [6].

Recombinases have previously been used for information storage in biological systems because of their specificity [7]. However, each recombinase binds only its own unique recognition sites, so this very specificity also limits the approach: increasing the storage capacity requires a new recombinase each time. Finding a new recombinase that suits the need is computationally heavy. We therefore decided to seek help from another gene-editing tool [6].

The CRISPR-Cas9 system is a newly developed gene-editing tool that breaks the limit of specific recognition sites. Guided by sgRNAs, the Cas9 endonuclease can conveniently be directed to modify any site in the genome. It can thus be regarded as a complementary DNA cutter that is not restricted to unique sequences but can recognize any sequence in the genome specified by its sgRNA. This means that if the current storage capacity is not enough, we do not need to search for a new recombinase; changing the sgRNA pairs solves the problem. However, accurate deletion or inversion, a vital requirement for an information-storing platform, is hard to accomplish because of the double-strand breaks introduced by the Cas9 endonuclease. An outstanding feature of tyrosine recombinases is that they do not introduce double-strand breaks that might cause unwanted consequences; instead, a cross-strand intermediate called a Holliday junction is formed during the process [5]. In other words, we still rely on the specificity and accuracy of recombinases, but we also need the assistance of Cas9.

Therefore, we decided to combine the two. We deleted the DNA-binding domains of Bxb1 and Flp and fused their catalytic and dimerization domains to dCas9, which has lost its endonuclease activity. A pair of sgRNAs targeting the sense and antisense strands of the DNA sequence guides the fusion protein to the location, and the recombinase part of the fusion protein carries out deletion or inversion of the DNA segment. With this new tool we no longer need to consider and introduce recombinase recognition sites. Moreover, because no double-strand breaks are caused, gene editing can be safer and more accurate [6].
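
To make the two editing outcomes concrete, the sketch below applies a deletion and an inversion to a DNA string. It is a simplified illustration under stated assumptions: the `start` and `end` coordinates stand in for the positions targeted by the sgRNA pair, inversion is modeled as replacing the segment with its reverse complement, and all function names are hypothetical.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(dna: str, start: int, end: int) -> str:
    """Invert dna[start:end] in place (read from the opposite strand)."""
    return dna[:start] + reverse_complement(dna[start:end]) + dna[end:]


def delete_segment(dna: str, start: int, end: int) -> str:
    """Remove dna[start:end] and join the flanking sequences."""
    return dna[:start] + dna[end:]


# Hypothetical 12-bp example; the segment at positions 4..8 is "ATGC".
dna = "GGGG" + "ATGC" + "CCCC"
inverted = invert_segment(dna, 4, 8)
deleted = delete_segment(dna, 4, 8)
print(inverted)
print(deleted)
```

In this model an inversion is reversible (applying it twice restores the original sequence) while a deletion is not, which is why the two outcomes behave differently as storage operations.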

To achieve the highest efficiency, we need to optimize two aspects: linker design and the distance between the sgRNAs of a pair. We built an inducible ccdB screening system for this purpose. ccdB is a lethal protein. Adding IPTG induces expression of ccdB and kills the bacteria, but if the ccdB gene has been successfully disrupted by the dCas9-recombinase, the bacteria survive. Along the ccdB sequence we designed 25 sgRNAs targeting one strand and 24 targeting the other, giving 25 × 24 = 600 sgRNA pairs in total, with distances between paired sgRNAs ranging from 0 bp to 700 bp. The effects of different linker designs are also tested with this inducible ccdB system.
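
The pair count and the selection logic of the screen can be checked with a few lines of code. This is a minimal sketch: the guide labels are placeholders, which strand carries 25 guides and which 24 is an arbitrary assumption here, and `survives` is a hypothetical helper that encodes the screen as a boolean rule.

```python
from itertools import product

# Placeholder labels; the assignment of 25 vs 24 guides to strands is assumed.
sense_guides = [f"S{i + 1}" for i in range(25)]
antisense_guides = [f"A{j + 1}" for j in range(24)]

# Every sense guide can be paired with every antisense guide.
pairs = list(product(sense_guides, antisense_guides))
print(len(pairs))  # 600


def survives(iptg_added: bool, ccdb_disrupted: bool) -> bool:
    """A cell dies only if ccdB is induced and the gene is still intact."""
    return (not iptg_added) or ccdb_disrupted


for iptg_added in (False, True):
    for ccdb_disrupted in (False, True):
        print(iptg_added, ccdb_disrupted, survives(iptg_added, ccdb_disrupted))
```

Under induction, only cells in which the sgRNA pair and linker led to a disrupted ccdB gene survive, so survival reports on editing efficiency.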

We established a model of the inducible ccdB system that describes the relationship between IPTG addition and the OD value, after considering the parameters of IPTG, ccdB, the relative concentration of bacteria and the measured OD value. The inducible ccdB system worked well, so it is suitable for screening the optimal distance between sgRNA pairs and an appropriate linker. The inducible ccdB system also has other potential applications.
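
The equations of the model are not given in this section, so the following is only a toy illustration of how IPTG dose could be related to OD, not the team's model. It assumes a Hill function for ccdB induction and logistic growth with a ccdB-dependent death term; every function name and parameter value is a placeholder.

```python
def ccdb_expression(iptg_mM: float, k_half: float = 0.1, hill_n: float = 2.0) -> float:
    """Fraction of maximal ccdB expression (Hill function, placeholder parameters)."""
    if iptg_mM <= 0:
        return 0.0
    x = iptg_mM ** hill_n
    return x / (k_half ** hill_n + x)


def final_od(
    iptg_mM: float,
    od0: float = 0.05,        # starting OD (placeholder)
    od_max: float = 2.0,      # carrying capacity (placeholder)
    growth_rate: float = 1.0, # per hour (placeholder)
    kill_rate: float = 1.5,   # per hour at full ccdB expression (placeholder)
    hours: float = 8.0,
    dt: float = 0.01,
) -> float:
    """Integrate logistic growth minus ccdB-dependent killing with Euler steps."""
    od = od0
    expr = ccdb_expression(iptg_mM)
    for _ in range(int(hours / dt)):
        d_od = growth_rate * od * (1 - od / od_max) - kill_rate * expr * od
        od = max(od + d_od * dt, 0.0)
    return od


for dose in (0.0, 0.1, 1.0):
    print(dose, round(final_od(dose), 3))
```

With these placeholder values the final OD falls as the IPTG dose rises, which is the qualitative behavior the screen relies on; fitting real parameters would require the measured data.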

Although the screening work was not finished, one linker choice and one distance between the two sgRNAs worked well. These were tested with another system based on BFP expression.

After successfully constructing all the required systems and confirming their efficacy, we can bridge the light-switchable TCS and the dCas9-recombinase system. In this way, precise gene editing and information storage can be achieved by using the light system to regulate the dCas9-recombinase hybrid.



DNA digital data recording mechanism

References:
[1] Tabor J J, Levskaya A, Voigt C A. Multichromatic control of gene expression in Escherichia coli[J]. Journal of Molecular Biology, 2011, 405(2): 315-324.
[2] Schmidl S R, Sheth R U, Wu A, et al. Refactoring and optimization of light-switchable Escherichia coli two-component systems[J]. ACS Synthetic Biology, 2014, 3(11): 820-831.
[3] Camsund D, Lindblad P, Jaramillo A. Genetically engineered light sensors for control of bacterial gene expression[J]. Biotechnology Journal, 2011, 6(7): 826-836.
[4] Parts S B. Engineering Escherichia coli to see light[J]. Nature, 2005, 438: 24.
[5] Grindley N D F, Whiteson K L, Rice P A. Mechanisms of site-specific recombination[J]. Annu. Rev. Biochem., 2006, 75: 567-605.
[6] Liu D R, Guilinger J P, Thompson D B. Cas9-recombinase fusion proteins and uses thereof: U.S. Patent Application 14/320,467[P]. 2014-6-30.
[7] Yang L, Nielsen A A K, Fernandez-Rodriguez J, et al. Permanent genetic memory with >1-byte capacity[J]. Nature Methods, 2014, 11(12): 1261-1266.
