Team:Paris Bettencourt/Project/Continuity

Purpose

In order to gain the trust of the public, we believe the technology should belong to everyone. In a manner similar to the open-source software industry, people should be able to improve our project or create their own versions of it. This idea of openness is very common among the community of synthetic biologists, but a lot of pitfalls have to be overcome to make it a sustainable reality.

A parallel could be drawn with electronics in the 1960s, when computer programming was extremely low-level and belonged to the realm of academia. Since then, it has reached a far wider population thanks to the creation of frameworks that abstract away the most technical parts. How could the same principles be applied to synthetic biology, in the context of metabolic engineering and vitamin production?

Even though many lab strains have been designed for easier modification in the past, they usually have a very general purpose, and biotechnology remains a matter for specialists, where every modification has to be made from scratch. We imagined a repurposed organism made especially for the quick construction of these self-replicating tiny factories, one that could easily be used by startups, community labs or just enthusiasts. In the following section we discuss the constraints associated with it, and what such an organism could look like.

For the product to be usable, several specifications have to be taken into account:
  • It must be easily extendable:
    Our micro-organism should be a chassis allowing for quick addition of standard cassettes
  • It must be modular:
    The different metabolic pathways should be independent so they can be put together without going through tedious troubleshooting.
  • It must survive in the real world:
    To make our micro-organism more resistant to contamination, we need to design it so our modifications come with a minimal fitness cost.
  • It must be all-in-one:
    For people with limited equipment, having only one strain that does everything is a huge advantage, because only one bioreactor and one production line are needed. This makes it accessible to community labs or NGOs that would want to start producing their own version of our product.

Our design

From the lab to the world

For a biological product to leave the benches and actually reach the population, it's essential to foresee its life in the hands of the people who will cultivate it and make sure it stays alive for the future. Our design must therefore provide strategies to create a durable, usable product. On paper, the plan is simple: the manufacturers grow the micro-organism, distribute it and save a little fraction with which to start a new culture. This could in principle continue ad infinitum, but in reality the universal rules of biology soon kick back in.

Let's consider the following scenario: a wild-type organism sneaks into the incubator and starts to replicate along with the engineered organism. Our microbe cannot compete: this contaminant has been selected over hundreds of years precisely for its ability to sneak into environments and replicate, while our microbe has the burden of producing large amounts of enzymes to make the precious vitamins. Additionally, unnatural proteins and metabolites can have toxic effects when their production rate is high. After a couple of growth cycles, the worst seems unavoidable: the microorganism that will be distributed will not be the right one. Not only does it not produce nutrients, but it might not ferment the rice well or might even be pathogenic.

These contamination events bring a lot of hassle for the manufacturer, so our design must provide solutions for making them as rare as possible.

Our approach is based on two strategies:

  • Reducing the fitness burden: To make our micro-organism more resistant to contamination, we need to design it so our modifications come with a minimal fitness cost.
  • Identifying the contamination: If a contamination occurs, it is essential that it does not go unnoticed. Our design must allow the manufacturer to detect contamination, and check that what he is growing is exactly what he wants to grow.

It seems impossible to make a strain that fulfills its nutrient-producing functions while growing as fast as the wild type, so we found a workaround: the cells that people use are not the cells that people grow.
We embedded a differentiation system into our organism, so the vitamin-producing pathways are only expressed after a recombination event. First, the cells that are grown are almost identical to the wild-type cells. The battle against contaminants is now a fair fight. The only difference is that they express a gene allowing for their identification, which we will refer to below as the ID gene.
Before distribution, the differentiation is induced and the cells start to produce vitamins in high quantity. They lose the ID gene in the process.

Our model for decreasing the fitness burden of our organism during the production phase. The cells do not produce any vitamin during the manufacturing process. The vitamin production is triggered before the distribution.

We protected our product against foreign organisms, but one threat remains: our organism's own mutants. If a mutation occurs in the active site of an enzyme, or in the promoter of an operon, the functionality of the organism might be impaired. How can we prevent our organism from mutating?

Fortunately, our friends at the Vanderbilt University iGEM team worked precisely on that problem this summer. We worked hand in hand with them to see what a real-life application of their invention would mean practically. They invented an algorithm to scan the sequences looking for regions that are likely to mutate, and proposed alternative versions of our sequences.
As they worked on this project while we were working on ours, we obviously could not use their optimized sequences for our constructs. However, we relied on gene synthesis for a lot of parts, so it would not have been a problem to use the optimized sequences instead. Their algorithm is therefore a valuable tool for any synthetic biologist willing to create durable products, and we applaud their work.

The ID gene

The quality control of our system is possible thanks to an ID gene, which consists of the fluorescent protein mCherry expressed at low levels. We chose this protein because its fluorescence colour is easily distinguishable from the media that will be used for growth, which gives a better signal.

The aim of this gene is to provide a quick and reliable way to determine whether the strain that will be distributed is the one intended. After growing the micro-organism in a bioreactor, a sample is taken in order to start a new culture from it. We suggest that, while doing so, the sample has to pass a quality check where its fluorescence is measured.

If the sample displays fluorescence, the culture is sent for packaging and a new culture can be launched. If the fluorescence is not sufficient, the sample is discarded and a new blister stock of the original strain is used to start the culture.
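The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the background subtraction and the threshold are assumptions, since the text does not specify what fluorescence level counts as sufficient.

```python
def quality_check(sample_fluorescence, blank_fluorescence, min_signal):
    """Decide what to do with a culture sample after the fluorescence reading.

    min_signal is a hypothetical threshold (arbitrary units) that each lab
    would have to calibrate on its own instrument and growth medium.
    """
    # Subtract the reading of sterile medium to remove background signal.
    signal = sample_fluorescence - blank_fluorescence
    if signal >= min_signal:
        return "package culture and start a new culture from the sample"
    return "discard sample and restart from the original stock"
```

For example, `quality_check(150, 50, 80)` accepts the sample, while `quality_check(60, 50, 80)` rejects it.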

This fluorescence measurement is a good example of real-life use for the DIλ spectrophotometer, a low-budget device that is developed by our neighbours at the Openlab. Heavy development is currently ongoing to make it capable of fluorescence measurements.


How quality control will be performed. The transfer of the inoculum should be done in sterile conditions in a container that also plays the role of fluorimeter cuvette. This ensures that even low-budget labs will always distribute the right strain to people.

An extendable system

Our differentiation system is inspired by the Brainbow system, initially developed for tracking the axons of neurons in the mammalian brain. We modified it to make it extendable (A).

This system is randomized at the single-cell level, so each cell expresses one, and only one, vitamin pathway. In most research work, metabolic engineering has been done for only one target compound at a time, and little is known about what happens when production pathways are used simultaneously in the same cell (B).
Having one cell expressing only one pathway should theoretically preclude unexpected interactions between different pathways, thus making an extendable framework where every synthesis function is decoupled (C).
The different vitamin-producing pathways can be prototyped separately on a classical lab strain, and it is then easy to put them all together in the same chassis for a multi-functional organism.

The chassis

Let us see how it works under the hood.
Before addition of any metabolic pathways, this is what our empty chassis would look like. The following cassette is integrated in the chromosome. All protein coding regions are preceded by an RBS (Ribosome Binding Site) and followed by a transcription terminator.






Constitutive promoter: Thanks to this promoter, an RNA transcript of the cassette will be produced until the first terminator is reached.

The Lox Array: The original LoxP site comes from the phage P1. When an enzyme called the CRE recombinase is expressed, all the DNA between two Lox sites is deleted. Each Lox site is made of one 8 bp overlap region in the middle, surrounded by two 13 bp flanking regions that are reverse complements of each other. The overlap region can be modified, but two Lox sites will recombine together only if this sequence is exactly identical for both (Richier 2015). The flanking regions cannot be mutated and determine the specificity for one recombination enzyme.

Here are the four orthogonal Lox sites we used:

  • LoxP: ATAACTTCGTATAATGTATGCTATACGAAGTTAT
  • Lox2272: ATAACTTCGTATAAAGTATCCTATACGAAGTTAT
  • LoxN: ATAACTTCGTATAAGGTATACTATACGAAGTTAT
  • Lox5171: ATAACTTCGTATAATGTGTACTATACGAAGTTAT
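As a quick sanity check on the four sequences listed above, the following Python sketch splits each site into its flanking regions and central overlap region, and compares the overlap regions pairwise. The 13 bp arm length is the standard LoxP layout (13 + 8 + 13 bp); the helper names are illustrative, and treating "identical overlap region" as the only recombination criterion is a simplification taken from the text.

```python
LOX_SITES = {
    "LoxP":    "ATAACTTCGTATAATGTATGCTATACGAAGTTAT",
    "Lox2272": "ATAACTTCGTATAAAGTATCCTATACGAAGTTAT",
    "LoxN":    "ATAACTTCGTATAAGGTATACTATACGAAGTTAT",
    "Lox5171": "ATAACTTCGTATAATGTGTACTATACGAAGTTAT",
}

ARM_LENGTH = 13  # flanking region length; the remaining 8 bp form the overlap region


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_site(seq):
    """Split a Lox site into (left arm, overlap region, right arm)."""
    return seq[:ARM_LENGTH], seq[ARM_LENGTH:-ARM_LENGTH], seq[-ARM_LENGTH:]


def can_recombine(site_a, site_b):
    """Simplified rule from the text: overlap regions must be identical."""
    return split_site(site_a)[1] == split_site(site_b)[1]


if __name__ == "__main__":
    for name, seq in LOX_SITES.items():
        left, overlap, right = split_site(seq)
        print(name, left, overlap, right)
```

With these sequences, every site shares the same flanking regions, and no two different sites share an overlap region, which is what makes them orthogonal in this simplified view.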

The ID gene: This gene is entirely optional but can be used as a barcode to identify the strain. This allows for quality control of what is inoculated when a new production culture is started.

The landing pad: This part allows for easy integration of new gene cassettes into the system.

CRE-recombinase: The CRE recombinase should be integrated in the chromosome as well, so we do not have to use an antibiotic for maintaining the plasmid. It has to be under the control of an inducible promoter. The expression of CRE will trigger the differentiation.

The Landing Pad

Starting from this chassis, up to four metabolic pathways can be added by using the attB sequence as a landing pad. Like the Lox sites, this sequence comes from a bacteriophage: the PhiC31 phage uses it to integrate itself in the genome of the host. To insert a new sequence in this landing pad, all you need to do is build a plasmid with the matching "attP" site and express the PhiC31 integrase.

  • attB: GTGCGGGTGCCAGGGCGTGCCCTTGGGCTCCCCGGGCGCGTACTCCA
  • attP: AGTGCCCCAACTGGGGTAACCTTTGAGTTCTCTCAGTTGGGGGCGT
When inserting something in the landing pad, a new landing pad should be added for subsequent integration. This landing pad should be orthogonal to the first one to avoid multiple successive integrations. The same integrase can be used, the central TT just has to be replaced by CC to make the two sites orthogonal.

In summary, a new gene to be added in the system should have the following standard structure (A):

  • An attP sequence different from the one that was used just before,
  • A Lox sequence (Lox sequences should be added in the same order they come in the Lox Array),
  • The operon to be expressed,
  • An attB sequence, orthogonal to the attP used for integration,
  • A selection system (not depicted here for clarity).
When the phage PhiC31 integrase is expressed, this plasmid will be integrated in the locus (B). The CRISPR-Cas9 system from S. pyogenes should work well for selecting the cells that integrated the plasmid (Jiang, Bikard 2013), as the attB contains the protospacer adjacent motif "NGG" next to the two central bases (Mojica 2009). It is therefore possible to kill the cells that still have an intact attB site, just by using CRISPR spacers targeting the following sequences:
  • GCGGGTGCCAGGGCGTGCCCTTGGGCTCCC for killing cells who have not integrated anything in the first attB version,
  • GCGGGTGCCAGGGCGTGCCCCCGGGCTCCC, for the second attB version.
It has the advantage of leaving no scar, thus reducing the number of recombination sites present in the locus.
After integration, the new cassette becomes a new part of the system (C).
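The relationship between the two attB versions and the two CRISPR spacers can be checked with a short Python sketch. It derives the second attB version by replacing the central TT with CC, then tests whether each spacer matches a site and is followed by an "NGG" motif. The function names are illustrative, and requiring an exact match is a simplifying assumption; it does not model how Cas9 tolerates mismatches.

```python
ATTB_V1 = "GTGCGGGTGCCAGGGCGTGCCCTTGGGCTCCCCGGGCGCGTACTCCA"
SPACER_V1 = "GCGGGTGCCAGGGCGTGCCCTTGGGCTCCC"
SPACER_V2 = "GCGGGTGCCAGGGCGTGCCCCCGGGCTCCC"


def swap_core(attb, old="TT", new="CC"):
    """Replace the central dinucleotide; refuse if its position is ambiguous."""
    if attb.count(old) != 1:
        raise ValueError("expected exactly one occurrence of the core")
    return attb.replace(old, new)


def pam_after(site, spacer):
    """Return the three bases following the spacer match, or None."""
    start = site.find(spacer)
    if start == -1:
        return None
    end = start + len(spacer)
    pam = site[end:end + 3]
    return pam if len(pam) == 3 else None


def is_targeted(site, spacer):
    """True if the spacer matches exactly and is followed by NGG."""
    pam = pam_after(site, spacer)
    return pam is not None and pam[1:] == "GG"


ATTB_V2 = swap_core(ATTB_V1)

if __name__ == "__main__":
    for site_name, site in (("attB v1", ATTB_V1), ("attB v2", ATTB_V2)):
        for spacer_name, spacer in (("spacer 1", SPACER_V1), ("spacer 2", SPACER_V2)):
            print(site_name, spacer_name, is_targeted(site, spacer))
```

Under this exact-match assumption, each spacer recognises only its own attB version, so an intact landing pad of one version can be selected against without touching the other.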


The proposed process for addition of new operons to the chassis. A standard cassette is integrated in the PhiC31 locus. It then becomes a new possible outcome for the random differentiation, thanks to the addition of a new LoxP site.

Division of labour

Now that the different genes have been added to the chassis, it is time to see it in action.

The CRE recombinase will cut the LoxP sites in the middle, remove the region in between, and join the two remaining halves of the LoxP sites together (Nagy 2000). This only occurs if the overlap sequences are exactly identical (Missirlis 2006). This means that, in the picture on the right, only LoxP sites of the same colour would recombine. Given the configuration of this system, any LoxP recombination event would result in the loss of several other LoxP sites, in such a way that further recombinations are not possible. The pair of LoxP sites that undergoes recombination is therefore chosen randomly by each cell.

Depending on which region is excised, one random coding region settles next to the promoter and starts to be expressed. For a chassis containing four different operons, the mother cell differentiates into four different types of daughter cells, each of them expressing one operon.

Even if the chassis is not completely filled, it still works: the number of different daughter cells is always equal to the number of inserted cassettes, and the probability of each is adjusted accordingly.
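The excision logic can be made concrete with a small Python sketch. It assumes the layout described in the text (promoter, Lox array, ID gene, then one Lox site plus operon per inserted cassette, added in array order) and represents the locus as a simple list of labels. The function names and operon labels are illustrative, and the sketch enumerates the possible outcomes without assigning probabilities to them.

```python
LOX_ORDER = ["LoxP", "Lox2272", "LoxN", "Lox5171"]


def build_locus(operons):
    """Promoter, Lox array, ID gene, then one (Lox site, operon) pair per cassette."""
    if len(operons) > len(LOX_ORDER):
        raise ValueError("the chassis holds at most four cassettes")
    locus = ["promoter"] + LOX_ORDER + ["ID gene"]
    for lox, operon in zip(LOX_ORDER, operons):
        locus += [lox, operon]
    return locus


def excise(locus, lox):
    """Delete everything between the two copies of `lox`, leaving one copy."""
    first = locus.index(lox)
    second = locus.index(lox, first + 1)
    return locus[:first] + locus[second:]


def expressed_gene(locus):
    """First coding region after the promoter; its terminator stops transcription."""
    return next(element for element in locus[1:] if element not in LOX_ORDER)


def outcomes(operons):
    """Map each Lox site present twice to the locus left after its recombination."""
    locus = build_locus(operons)
    return {lox: excise(locus, lox) for lox in LOX_ORDER if locus.count(lox) == 2}


if __name__ == "__main__":
    for lox, daughter in outcomes(["vitA", "vitB", "vitC", "vitD"]).items():
        print(lox, "->", expressed_gene(daughter), daughter)
```

In every outcome, each remaining Lox site is present only once, which is why no second recombination can follow. With a partially filled chassis, for example two cassettes, only two outcomes are listed.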

Overview of the differentiation process. For a complete organism with four metabolic operons, there are four possible outcomes, each of them leading to the expression of only one operon. In other words, each cell randomly chooses one operon to express.

How to induce the differentiation?

There are different ways the CRE recombinase can be induced.

Chemical induction

A chemical inducer would be one of the most predictable and efficient ways to differentiate the cells. However, it requires access to this chemical and opening the reactor, which can be impractical for community labs with few resources to maintain sterility. It is nevertheless the solution of choice for funded factories. Carbohydrates such as glucose, arabinose or lactose seem to be the best options since they are not toxic.

Heat

On the other hand, heat does not require opening the bioreactor, so it's ideal when sterile conditions are not easy to obtain. Numerous heat-sensitive promoters exist, such as the Heat-Shock Promoter that is present in the registry (BBa_K338001).

The problem of leakiness

Most of the promoters from living organisms are leaky, i.e. they still lead to a small amount of transcription even in the absence of inducer. In our case, it means that the CRE recombinase will be expressed from time to time in some cells and it could result in the differentiation of some of them. If their number is low, it should not have any consequence as the differentiated cells are very unlikely to take over, but leakiness may be an obstacle if it affects a large proportion of the cells.

The perfect expression level

We could try to differentiate all the cells as quickly as possible, or let the cells differentiate slowly in a progressive manner. Provided the differentiated cells grow significantly slower than the mother cell, the strategy that generates the largest amount of vitamins is not obvious.
To answer this question, we created a mathematical and computational model of the situation. Given the growth rate of the mother cells and the daughter cells, it is possible to calculate the optimal differentiation rate, and choose the strength of the promoter accordingly.
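To illustrate why an intermediate differentiation rate can be optimal, here is a minimal Python sketch. It is not the team's model: it assumes simple exponential growth, a constant per-cell differentiation rate k, no daughter cells at the start, and uses the number of daughter cells at harvest time as a proxy for vitamin yield. All names and parameter values are hypothetical.

```python
import math


def daughters_at(t_end, k, g_mother, g_daughter, m0=1.0):
    """Daughter cells at time t_end, starting from m0 mothers and no daughters.

    Assumed dynamics: dM/dt = (g_mother - k) * M and dD/dt = g_daughter * D + k * M.
    """
    a = g_mother - k  # net growth rate of the mother population
    if math.isclose(a, g_daughter):
        # Limit of the general formula when both exponents coincide.
        return k * m0 * t_end * math.exp(g_daughter * t_end)
    return k * m0 * (math.exp(a * t_end) - math.exp(g_daughter * t_end)) / (a - g_daughter)


def best_rate(t_end, g_mother, g_daughter, steps=1000):
    """Grid search for the rate k in (0, g_mother] that maximises daughters_at."""
    candidates = [g_mother * i / steps for i in range(1, steps + 1)]
    return max(candidates, key=lambda k: daughters_at(t_end, k, g_mother, g_daughter))


if __name__ == "__main__":
    # Hypothetical rates (per hour): daughters grow half as fast as mothers.
    k_opt = best_rate(t_end=10, g_mother=1.0, g_daughter=0.5)
    print("best differentiation rate:", k_opt)
```

Under these assumptions, differentiating too slowly leaves few producing cells at harvest, while differentiating too fast depletes the fast-growing mother population early, so the best rate lies in between.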



Results

Construction of the system

We successfully assembled a prototype version of this system in the model bacterium Escherichia coli.
The initial promoter is a strong constitutive promoter from the biobricks registry, BBa_J23119.
The genes involved in vitamin production are replaced with fluorescent proteins, allowing for easy monitoring of their production. Our construct contains mCherry as a reporter gene, and two other fluorescent proteins to mimic pathway operons. It also has a phage PhiC31 integration site for subsequent addition of new genes.

In theory, the cells with this cassette integrated in the chromosome are expected to emit a red fluorescence. Upon induction of the CRE recombinase, they should lose the red fluorescence and start to express either mCerulean (a cyan fluorescent protein) or mVenus (a yellow fluorescent protein). Each cell should express only one of those two proteins.

Map of the DNA sequence we constructed. The promoter BBa_J23119 is constitutive. mCherry, mCerulean and mVenus are fluorescent proteins of different colours.

Chromosomal integration

This cassette was constructed by gene synthesis and Gibson assembly and assembled in a self-integrating plasmid vector (Saint-Pierre, 2013). This vector uses the integrase of the phage HK022 to integrate itself in E. coli's chromosome. This plasmid was electroporated in the bacteria and the HK022 integrase was induced.

To check that the cassette has correctly been integrated in the right locus, we performed an analytical PCR on the whole genome of the transformants, with a set of four primers which allows for amplification of the junction between the vector and the chromosome.

The result of this PCR, using the genome of six transformant clones as a template, is presented here.

It tells us that the vector was successfully integrated at the right locus. It also shows that there has been only one integration and no tandem integrations, which would have resulted in an additional band.


Gel electrophoresis after PCR for checking the integration. We amplified the junctions between the artificial cassette and E. coli's chromosome. For every screened clone, the two bands have the expected sizes, which proves that the cassette is integrated in the correct locus.




Gel electrophoresis after PCR for checking the presence of the three fluorescent proteins. For the clone shown here, it means that all three protein ORFs are present on the chromosome, even though only the first one is actually expressed.

Integrity of the cassette

We then performed three other PCRs with pairs of primers binding to the ORFs of the three fluorescent proteins. As a positive control, we performed the same PCRs on the pure fragments that were used for the assembly.
This way we ensured that the cassette was present in its entirety in the chromosome.

Sequencing of the Lox Array

To investigate whether unexpected recombination occurred within the LoxP sites due to homologous recombination, we performed Sanger sequencing on the first part of the integrated cassette, where the Lox Array is. This way we could make sure that it was still intact and contained no PCR-induced mutations.

Impact of the Lox array on the transcription

The Lox array was the most difficult region to construct. As it is a very repetitive sequence with numerous dyad repeats, it is tedious to synthesize, amplify and assemble. That's why we created biobrick BBa_K1678005. It contains the promoter followed by the four orthogonal Lox sites. We sequenced this biobrick to confirm that it contains no mutation.

We characterized this new biobrick's function by assembling it in pSB1C3 with the part BBa_K516030, which contains an RBS, the mRFP coding sequence and a double terminator. For comparison, the biobrick BBa_J23119 was assembled with the same mRFP cassette on the same vector.

As in prokaryotes the 30S subunit of the ribosome binds directly to the RBS, the LoxP array does not theoretically interfere with translation. It can however interfere with the transcription.
During the transcription, the RNA polymerase has to go through the LoxP array, which is made of repetitive sequences that are likely to form a hairpin. We show that this has an impact on the transcription efficiency (Mann-Whitney-Wilcoxon test, p-value < 10^-6). However, it still allows for strong protein expression as the average expression level was equal to 91% of the expression level of the BBa_J23119 promoter. The fraction of RNA polymerases that go through the Lox array should be more than enough for our design.


Characterization of the promoter followed by the four LoxP sites.
Using standard biobrick assembly, three plasmids were constructed and transformed into E. coli:

  • The promoter directly connected to the mRFP sequence (RBS + ORF + Terminator),
  • The promoter connected to the Lox array, connected to the mRFP sequence,
  • The promoter alone, without any fluorescent proteins as a negative control.
The cells were diluted to an OD600 of 0.01, grown to exponential phase and the fluorescence was measured on a TECAN plate reader when the OD reached 0.3. The excitation wavelength was 585 nm and the detection wavelength was 615 nm.

Expression of the proteins

Expression of the first protein of the cassette in the chromosome.
A "mother cell" with our differentiation system integrated in the chromosome was grown to exponential phase and its fluorescence was measured when OD600 reached 0.3. As a negative control, a cell without any fluorescent protein underwent the same treatment. Excitation wavelength: 585 nm. Detection wavelength: 615 nm.

Because this differentiation system requires that only one copy of the sequence is present in the cell, we measured the expression level of mCherry on cells with the chromosomally integrated cassette.

The cells exhibit clear fluorescence (Mann-Whitney test, p-value < 10^-6), even though it was not visible to the naked eye. The mCherry fluorescent protein is therefore a suitable reporter for quality control of the strain.
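For readers unfamiliar with the Mann-Whitney test used here, the following Python sketch shows how such a comparison between a sample and a negative control can be computed. It is a simplified version using the normal approximation, without tie or continuity correction, and is not necessarily the procedure or software the team used. The readings in the example are made-up placeholders, not measurements from this project.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic of x, approximate p-value). Tied values receive
    average ranks; the variance is not corrected for ties.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for idx in range(i, j + 1):
            ranks[idx] = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


if __name__ == "__main__":
    # Placeholder readings in arbitrary units, for illustration only.
    reporter = [100 + i for i in range(20)]
    control = [i for i in range(20)]
    print(mann_whitney_u(reporter, control))
```

For small samples or data with many ties, an established implementation such as `scipy.stats.mannwhitneyu` is a better choice than this approximation.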

Induction of the differentiation

We then aimed to trigger the differentiation of our newly constructed strain. We acquired a strain carrying the plasmid pFHC2938, which allows for expression of the CRE recombinase upon arabinose induction (Nielsen 2006).
Unfortunately, all our attempts at transforming our strain with it have been unsuccessful, even when resorting to very efficient techniques such as electroporation. They always led to either nothing, or a lawn of bacteria that did not seem to carry the antibiotic resistance used for selecting the plasmid. To troubleshoot this transformation, we made three hypotheses:

  • Is there something in the cell that interferes with transformation?
  • Is the strain not suitable for CRE expression, e.g. there are LoxP sites somewhere that result in deletions in the genome?
  • Is the plasmid the wrong one?

To figure this out, we transformed our strain with the CRE recombinase plasmid along with a standard pSB1C3-mRFP plasmid. The transformation of the CRE-recombinase gave a lawn, while the control plasmid gave clear colonies. We then performed the same 4-primer PCR that was used to check for integration on the bacteria present on the plates. The bacteria transformed with pSB1C3-mRFP still contained the integrated cassette, while the bacteria on the pFHC2938 plate did not display any band, meaning that they were contaminants. This means that the transformation process is not the problem.

We then transformed a non-modified strain without our integrated cassette with the pFHC2938 plasmid. At the same time, the integrative plasmid was transformed into another E. coli strain (STBL). None of the transformations yielded any colonies, meaning that the problem came from the plasmid and not from the strain.


Troubleshooting of the transformation. A. Strain carrying the integrated cassette before transformation. B. Cells picked from the lawn after pFHC2938 transformation. C. Colony picked after the transformation with a control plasmid (pSB1C3-mRFP).

Unmodified Top10 lab strain after transformation with pFHC2938. The great nothingness.

Outlook

More than just operons

The system we invented, as presented here, works primarily when the whole vitamin pathway fits in one operon. It was notably the case for our pathway for vitamin A, where all the required enzymes are tied together in one big polycistron.
It is still possible to implement this differentiation system for a pathway that needs several promoters to function. For this, we can put the different operons under promoters that are activated by another factor, and this factor is put in the differentiation system.
Two related technologies, CRISPR interference and CRISPR activation (Bikard 2013), appear to be an ideal way to do this. They rely on a mutant of Cas9 defective for nuclease activity (dCas9), which can be targeted at almost any place in the genome. This can lead to either repression of transcription, or activation when a transcription activation factor is fused to dCas9. They work in both prokaryotes and eukaryotes (Perez-Pinera 2013), and are extremely versatile and programmable. As seen in our phytase project, inactivation of a gene is sometimes useful for nutrient production.

By expressing dCas9 and replacing the fluorescent proteins in our construct with CRISPR arrays, it could be possible to widely change the expression profile of the micro-organism while keeping the advantages of differentiation.

Literature

  • Mojica et al., 2009, "Short motif sequences determine the targets of the prokaryotic CRISPR defence system". Microbiology 155 (Pt 3): 733–740.
  • Jiang, Bikard et al., 2013. "RNA-guided editing of bacterial genomes using CRISPR-Cas systems", Nat Biotechnol. 2013 Mar;31(3):233-9.
  • Nagy et al., 2000. "Cre recombinase: the universal reagent for genome tailoring". Genesis 26 (2): 99–109.
  • Missirlis et al., 2006. "A high-throughput screen identifying sequence and promiscuity characteristics of the loxP spacer region in Cre-mediated recombination". BMC Genomics 7: 73.
  • Saint-Pierre et al., 2013. "One-step cloning and chromosomal integration of DNA". ACS synthetic biology 20;2(9):537-41.
  • Nielsen et al., 2006. "Dynamics of chromosome segregation in Escherichia coli". BioCentrum, Ph.D thesis.
  • Bikard et al., 2013, "Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system", Nucleic Acids Res. (15):7429-37
  • Perez-Pinera et al., 2013, "RNA-guided gene activation by CRISPR-Cas9–based transcription factors", Nature Methods 10, 973–976.

Attribution

This project was designed and accomplished by Antoine Vigouroux in consultation with Jason Bland and Ihab Boulas. Most of the strains (DH5alpha, Top10, NEB turbo, Pir116) were kindly provided by Inserm U1001. Plasmids pFHC2938 and pMEV250 were provided by Jason Bland and Aleksandra Nivina at Didier Mazel's lab at Institut Pasteur. Plasmids pL1F2 and pR6K-shortened were provided by Antoine Decrulle and Ihab Boulas at Inserm U1001. The pIT5-KH vector was provided by Lun Cui at David Bikard's lab at Institut Pasteur. Special thanks to all the people who gave me a hand during this project, and to all the Paris Bettencourt team for making this adventure so much fun.