Project Description

What do we want to do?

This year, we, team Tsinghua 2015, attempt to store information in E. coli by combining light-sensing systems with gene-editing tools. We take advantage of the high precision and programmability of the light systems and of the specificity and convenience of a Cas9-recombinase hybrid. To build such an information storage platform, we devised hardware, assisted by software, that can eventually convert any kind of file into biologically meaningful information.
For the light system, we selected light-switchable two-component systems as the signal input and intend to rely on three commonly used ones: red, blue, and green [1]. We applied an engineering strategy to these two-component systems, combining modules and components from different species in order to achieve the highest efficiency [2].
For the Cas9-recombinase system, we selected a recombinase as the gene-editing tool because of its specificity for consensus sequences. Yet it is this very advantage that limits its application: because a recombinase only acts on its own consensus sequences, it is not convenient for scaling up the storage capacity. We therefore complemented it with the CRISPR/Cas9 system, which is guided by an sgRNA pair and is not limited to specific sequences. Minor changes, however, have been made to render it more applicable [3].
Given these ideas, how can we put all the parts together to store information within E. coli? A straightforward strategy is to use the light-switchable two-component systems to directly control the gene-editing hybrid. This is the basic philosophy behind our information storage platform. For example, we can assign the blue-light system to control the writing of a “0” and the red-light system to control the writing of a “1”. The green-light system represents neither of the two binary values; instead, it acts as a license that allows the recombinase to work [4].
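The color assignment in this example can be sketched in a few lines of Python. This is only an illustration, not the actual coding protocol: the function name `bits_to_light_steps` is hypothetical, and so is the assumption that green light is switched on together with the data color during every write step.

```python
# Illustrative sketch only: the blue = "0" / red = "1" / green = license
# assignment comes from the text; everything else here is an assumption.
DATA_COLOR = {"0": "blue", "1": "red"}
LICENSE_COLOR = "green"


def bits_to_light_steps(bits):
    """Turn a string of '0'/'1' characters into one set of colors per write step.

    Assumption: green is lit together with the data color so that the
    recombinase is licensed to act during that step.
    """
    steps = []
    for bit in bits:
        if bit not in DATA_COLOR:
            raise ValueError(f"not a binary digit: {bit!r}")
        steps.append(frozenset({DATA_COLOR[bit], LICENSE_COLOR}))
    return steps
```

Under these assumptions, the bit string "01" becomes two steps: blue plus green, then red plus green.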

What have we done?

To use light as an input signal, we first had to measure its basic parameters for later reference. That is why we first constructed several plasmids for measurement. Two types of experiments were done to meet this need: a qualitative one and a quantitative one. We obtained quite convincing results supporting that the light-switchable two-component systems work in E. coli. We additionally built a model of the relationship between the light input and the protein expression output based on previous results.
For the Cas9-recombinase system, we designed an IPTG-inducible ccdB screening system to test how strong its gene-editing ability is. Eventually, 600 possible sgRNA combinations can be tested using this screening strategy. With it, we can also determine the optimal distance between the sgRNA pair and the optimal linker length. That said, we still needed to first determine the basic parameters of this inducible system. Again, qualitative and quantitative experiments were carried out, and the promising results indicate that the inducible system works in E. coli. We also built models of the relationship between the concentration of added IPTG and the optical density of the bacterial culture.
With the two systems measured, it is time to combine them. To meet this need, we devised hardware that can emit light signals onto the bacteria instantaneously and in a massively parallel manner. With the assistance of the software, we can either convert a file into a binary data string, which is transformed into a light-emitting pattern by a coding protocol (a pre-programmed grammar) and in turn encoded into the bacteria by a modified recombinase, or we can enter light parameters directly and encrypt the information into the bacteria.
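The first step of this pipeline, turning a file into a binary data string, can be sketched as follows, together with its inverse for reading the data back. The function names `bytes_to_bits` and `bits_to_bytes` are hypothetical, and the most-significant-bit-first order is an assumption. The coding protocol that turns the bit string into a light pattern is not specified here, so it is not reproduced.

```python
def bytes_to_bits(data: bytes) -> str:
    """Return the bits of `data` as a string of '0' and '1', most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits; expects a multiple of 8 binary digits."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "Hi".encode("ascii")
bits = bytes_to_bits(message)
print(bits)                 # 0100100001101001
print(bits_to_bytes(bits))  # b'Hi'
```

For a real file, the same functions would be applied to the bytes read with `open(path, "rb").read()`.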
The E-light 1.0 hardware system has three major components: the light-exposure and bacterial culture system, the controlling circuit, and the computer interaction port. The light-exposure and bacterial culture system is based on a 24-well plate coupled with tri-color LEDs. The controlling circuit uses three AT89S52-24PU DIP-40 SCMs (single-chip microcomputers) to carry out programmed control of the 24 tri-color LEDs, while the computer interaction port monitors the whole system through given protocol sequences. The end result is programmable operation and real-time monitoring of light exposure (both timing and wavelength) for every single well.
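One way to address the 24 LEDs from software is to map each well label to a controller and an LED position on that controller. The sketch below assumes the 24 LEDs are split evenly across the three microcontrollers (eight each) and that wells are numbered row by row. The actual wiring is not described here, so both the split and the function name `locate_well` are hypothetical.

```python
ROWS = "ABCD"          # a standard 24-well plate has 4 rows ...
COLUMNS = 6            # ... and 6 columns
CONTROLLERS = 3        # three microcontrollers, per the text
LEDS_PER_CONTROLLER = len(ROWS) * COLUMNS // CONTROLLERS  # 8, assuming an even split


def locate_well(label):
    """Map a well label such as 'B3' to (controller index, LED index on that controller).

    Assumption: wells are numbered row by row and handed out to the
    controllers in consecutive blocks of eight.
    """
    row, column = label[0].upper(), int(label[1:])
    if row not in ROWS or not 1 <= column <= COLUMNS:
        raise ValueError(f"no such well on a 24-well plate: {label!r}")
    well_index = ROWS.index(row) * COLUMNS + (column - 1)
    return divmod(well_index, LEDS_PER_CONTROLLER)


print(locate_well("A1"))  # (0, 0)
print(locate_well("B3"))  # (1, 0)
print(locate_well("D6"))  # (2, 7)
```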
The E-code 1.0 software system aims to give users a convenient way to command the E-light hardware system. The software provides two operating modes. The E.coli-code mode converts any given information into light-coded files and then turns these files into actual light-exposure commands for the E-light hardware system. With the help of the coding plasmids from our CRISPR-Recombinase system, we can eventually store any information in the E. coli DNA and, of course, extract the information later through sequencing. The self-code mode provides more flexible input options, enabling users to program the light-exposure commands manually for every single bacterial culture unit. Thus, combined with our light switch, the user gains better control over the bacteria’s metabolic pathways.
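A manually written program of the kind the self-code mode accepts could be represented as a list of per-well exposure steps. The sketch below is hypothetical throughout: the E-code file format and command syntax are not described here, and the names `ExposureStep` and `colors_on_at` are invented for illustration.

```python
from dataclasses import dataclass

COLORS = ("red", "green", "blue")


@dataclass(frozen=True)
class ExposureStep:
    well: str          # well label, e.g. "A1"
    color: str         # one of COLORS
    start_s: float     # seconds after the start of the run
    duration_s: float  # how long the LED stays on, in seconds

    def __post_init__(self):
        if self.color not in COLORS:
            raise ValueError(f"unknown color: {self.color!r}")
        if self.start_s < 0 or self.duration_s <= 0:
            raise ValueError("start must be >= 0 and duration must be > 0")


def colors_on_at(steps, well, t):
    """Return the set of colors lit on `well` at time `t` (in seconds)."""
    return {
        s.color
        for s in steps
        if s.well == well and s.start_s <= t < s.start_s + s.duration_s
    }


# Example program: well A1 gets green throughout, red first, then blue.
program = [
    ExposureStep("A1", "green", 0, 600),
    ExposureStep("A1", "red", 0, 300),
    ExposureStep("A1", "blue", 300, 300),
    ExposureStep("B2", "blue", 0, 600),
]
```

With this program, `colors_on_at(program, "A1", 100)` gives red and green, and `colors_on_at(program, "A1", 300)` gives green and blue.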

What can we do in the future?

With this information storage platform in hand, what can we do in the future? With more refined design and further investigation into this system, we can take our platform to a whole new level.
First, we could search for more light-switchable two-component systems, so that our light input is no longer limited to three colors. We can not only increase the storage capacity by increasing the number of light systems used, but also design better combinations of proteins from different organisms, which requires more delicate protein engineering. What’s more, adding NOT and OR gates downstream of the light systems might make the system more complex and therefore capable of storing more information.
Second, we can screen for a better dCas9-recombinase hybrid, with an optimal distance between the sgRNA pair and an optimal linker length. Using a smaller Cas9 endonuclease is also a must, given that transforming the entire plasmid into the bacteria imposes a metabolic burden on the cell. With the help of structural biology, we might in the future understand how this system actually works. For example, higher-resolution structures could unveil the details of its catalytic reaction.
At present, our system can theoretically encode and encrypt any kind of file, converting its binary data stream into biologically meaningful information, i.e., sequence changes. Later, we plan to reverse the entire process; this decoding process would rely on the hardware as well. We would also be able to develop better encoding algorithms to integrate more information into the bacterial genome, and in a more versatile way. It is not hard to imagine a world where you can simply store any kind of digital information you want in bacteria and break the storage limitations of conventional electronic devices. This platform is potentially game-changing.

[1] Schmidl S R, Sheth R U, Wu A, et al. Refactoring and optimization of light-switchable Escherichia coli two-component systems[J]. ACS synthetic biology, 2014, 3(11): 820-831.
[2] Tabor J J, Levskaya A, Voigt C A. Multichromatic control of gene expression in Escherichia coli[J]. Journal of molecular biology, 2011, 405(2): 315-324.
[3] Camsund D, Lindblad P, Jaramillo A. Genetically engineered light sensors for control of bacterial gene expression[J]. Biotechnology journal, 2011, 6(7): 826-836.
[4] Liu D R, Guilinger J P, Thompson D B. Cas9-recombinase fusion proteins and uses thereof: U.S. Patent Application 14/320,467[P]. 2014-6-30.