Team:Exeter/RNA Riboswitches
RNA and Riboswitches
"In the beginning, RNA was a simple molecule, but over time it has gained many functions. From self-replication, to storing and utilising information, to regulating cellular pathways, it is an example to all molecules..."
The Basic Molecules of Life
The Basics of DNA:
DNA (deoxyribonucleic acid) is a biological molecule found in all forms of life, with the exception of some types of virus. DNA is a polymer, which means that it is made up of many subunits. In DNA, these subunits are known as nucleotides, of which there are four types, each named after the base it carries: adenine (A), thymine (T), guanine (G), and cytosine (C). Each nucleotide in DNA has three main parts: a phosphate group, a deoxyribose sugar, and a base (A, T, C, or G). Nucleotides are joined together via phosphodiester bonds between the phosphate group of one nucleotide and the deoxyribose sugar of the next, forming the phosphate backbone of DNA. The DNA molecule also has direction (i.e. it has a beginning and an end). The beginning is known as the 5' (five prime) end, and the end is known as the 3' (three prime) end. New nucleotides join on to the 3' end of the existing DNA molecule (figure 1).
As well as joining to adjacent nucleotides via phosphodiester bonds, each type of nucleotide is able to bond to one other specific type of nucleotide, perpendicular (at right angles) to the phosphate backbone, via hydrogen bonds (H-bonds) in a process known as base pairing. In DNA, adenine (A) base pairs with thymine (T), and guanine (G) with cytosine (C). Nucleotides which base pair are called complementary; therefore A and T are complementary, as are G and C.
In nature, DNA is rarely found as a single strand. Instead it is found as a complex of two DNA strands, one wrapped around the other to give the familiar double-stranded helix. The two strands are antiparallel and complementary to each other (i.e. their directions are opposite, and where one strand has, for example, an A, the other will have a T). For a given gene, the strand written in the 5' to 3' orientation is called the sense strand, and the complementary strand, which runs 3' to 5' alongside it, is known as the antisense strand (figure 2).
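The base pairing rules and the antiparallel arrangement of the two strands can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are our own and were chosen arbitrarily.

```python
# Base pairing rules for DNA: A pairs with T, and G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for every base, keeping the same left-to-right order."""
    return "".join(DNA_PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5' to 3'.

    The two strands are antiparallel, so the complement has to be
    reversed before it can be read in the 5' to 3' direction.
    """
    return complement(strand)[::-1]


sense = "ATGGCATTC"  # arbitrary example, written 5' to 3'
print(complement(sense))          # partner strand, written 3' to 5'
print(reverse_complement(sense))  # the same partner strand, written 5' to 3'
```

Taking the reverse complement twice returns the original strand, which is a quick way to check that the two strands really are complementary to each other.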
DNA's primary role in cells is to store genetic information. The information stored on DNA molecules determines the characteristics and functions of a cell, and therefore of the entire organism. In multicellular organisms (e.g. animals), each cell contains identical genetic information; however, which information is used depends on the type of cell. For example, cells which make up the eyes will use information corresponding to sight and eye colour, while muscle cells will use information which corresponds to contraction and relaxation of the cells during use. The genetic information on DNA is stored in discrete units called genes. Each gene contains information which corresponds to at least one characteristic/function of the cell (and therefore the organism), and is encoded in the language of nucleotides. The sequence of nucleotides within a gene (e.g. ATTCTGCTA) is used to produce a specific molecule (normally a protein). This process is described in more detail in the section 'Transcription and Translation' (figure 3).
The Basics of RNA:
RNA (ribonucleic acid) is another polymer molecule which shares some similarity with DNA. The subunits which make up RNA (called ribonucleotides) are similar to those which make up DNA, but they have a few crucial differences. The first is that while both DNA and RNA nucleotides contain a phosphate group and a base, ribonucleotides have a ribose sugar instead of a deoxyribose sugar. In addition, RNA contains no thymine (T) bases; instead there is another type of base called uracil (U). A and U are complementary in RNA. Apart from these differences, the basic structure of RNA is very similar to that of DNA: both have direction (5' to 3'), and both have their subunits joined by phosphodiester bonds to form a phosphate backbone (figure 4). Another difference between DNA and RNA is that RNA is often found as a single strand, as opposed to the double-stranded helix of DNA. Although this may make it seem like RNA will be found as a linear molecule, it is important to realise that this is not the case; RNA can actually have more complex structures than DNA. As the RNA strand is not bound to another RNA strand, all of its ribonucleotides are free to base pair, which they do. The ribonucleotides can pair with complementary bases on the same RNA strand, or with those on other RNA strands to form an RNA-RNA complex (although it is rare that this complex will have the double helix structure of DNA). This allows RNA to have a great many structures (figure 5). As many different structures of RNA can be formed within a cell, it is perhaps not surprising that there are many types of RNA, each with a different function within the cell. Three types of RNA are used in the process of utilising the genetic information stored on DNA, and are described in the next section, 'Transcription and Translation'. Other functions of RNA are described in the section 'The Functions of RNA'.
The Basics of Proteins:
Proteins are another type of biological molecule. Like DNA and RNA they are polymers, but the subunits which make up proteins are known as amino acids. There are 20 standard types of natural amino acid (21 if selenocysteine is counted), and all of them share a similar structure: a central carbon atom bonded to a hydrogen atom (H), a carboxylic acid group (COOH), an amino group (NH2), and a functional group (R). The functional group is different for each type of amino acid. Unlike the subunits of DNA and RNA, amino acids are not joined by phosphodiester bonds, but by amide bonds (also called peptide bonds) between the carboxylic acid group of one amino acid and the amino group of another. The interactions of the functional groups, both with other functional groups of the same or different proteins and with other molecules in the environment, give the protein its overall function (figure 6). These functions can range from catalytic (speeding up the rate of a reaction), to structural (shape/strength of a cell), to virulence (causing disease). As has been alluded to before, these proteins are encoded for by DNA and their production involves RNA. In the next section we will see exactly how this mechanism works.
Transcription and Translation
The Central Dogma:
In molecular biology, the central dogma explains the flow of genetic information from DNA to RNA to proteins. Essentially, this means that information is stored in the form of DNA, converted into RNA, and then used to synthesise proteins (figure 1). In the previous section we briefly described the structures and roles of DNA, RNA, and proteins. In this section we will show the two steps of the central dogma (DNA to RNA, and RNA to protein), together known as protein synthesis, and look in more depth at the role of each molecule in this process.
Transcription:
Transcription is the term given to the process of converting the information found on DNA to RNA. It may seem unnecessary to use an RNA intermediate instead of simply using DNA. There are a few reasons for this, the main ones being:
- Protection of DNA: damage to DNA can cause unfavourable mutations, so it is safer to use a 'copy' rather than the original.
- Regulatory reasons: the presence or absence of RNA can correspond to the presence/absence of the protein which it encodes for, meaning that it can be used to control cellular pathways.
- Inability of DNA to reach protein machinery: in eukaryotic cells (animals, plants, fungi, etc.), the DNA is separated from the rest of the cell by a nuclear envelope. DNA is unable to pass through this envelope, but RNA is able to.
The process of converting information from DNA to an RNA form requires the use of a protein known as RNA polymerase. RNA polymerase initially binds to a specific type of sequence on a DNA molecule known as a promoter. Promoters are usually relatively short (~30 to 50 nucleotides) and are found preceding a gene. Different promoters can have different 'strengths', with a strong promoter causing more mRNA to be produced than a weak one. Once bound, the RNA polymerase causes a short section of the DNA to unwind out of its helical structure, and breaks the H-bonds used in base pairing to separate the two DNA strands, creating what is known as a transcription bubble. The RNA polymerase then reads along the antisense strand of the DNA molecule, moving the transcription bubble along as it goes. The antisense strand is used as a template to synthesise a new RNA molecule, which is called messenger RNA (mRNA). The mRNA molecule which is produced can be thought of as the 'RNA version' of the sense strand of the DNA molecule. This is because the mRNA is antiparallel and complementary to the antisense strand from which it is synthesised, just as the sense strand is. It is the 'RNA version' because uracil (U) is added wherever a thymine (T) would appear in DNA. When the RNA polymerase reaches the end of the gene, it encounters a terminator, which causes it to fall off the DNA molecule and release the newly synthesised mRNA molecule (figure 2).
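As a rough sketch of the template step described above, the following Python snippet builds an mRNA by pairing a ribonucleotide against each base of the antisense strand. The function name and the example sequence are our own and purely illustrative; promoters, terminators, and the transcription bubble are not modelled.

```python
# Pairing rules when an RNA strand is built against a DNA template:
# the same as DNA base pairing, except that U (not T) is placed opposite A.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(antisense_3_to_5: str) -> str:
    """Return the mRNA (written 5' to 3') made from an antisense strand.

    The template is given in the 3' to 5' direction, which is the direction
    in which RNA polymerase reads it, so no reversal is needed.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in antisense_3_to_5)


sense = "ATGGCATTC"      # arbitrary example, written 5' to 3'
antisense = "TACCGTAAG"  # its partner strand, written 3' to 5'
mrna = transcribe(antisense)
print(mrna)

# The mRNA is the 'RNA version' of the sense strand: same sequence, U in place of T.
print(mrna == sense.replace("T", "U"))
```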
Translation:
The next step after the mRNA has been synthesised is to use it to synthesise a protein. In order to do this, two more types of RNA are required: ribosomal RNA (rRNA) and transfer RNA (tRNA). Before talking about these, a little more information about the 'language of nucleotides' is required. As has already been stated, amino acids are encoded for by the sequence of nucleotides. To be more specific, each amino acid is encoded for by a triplet of nucleotides (e.g. ACG, ACA, GGC...) called a codon. In fact, most amino acids have more than one codon; for example, the amino acid phenylalanine is coded for by both UUU and UUC. This phenomenon is known as 'the redundancy of the genetic code'. The genetic code is also referred to as 'unambiguous' because, although an amino acid can be encoded for by multiple codons, each codon corresponds to only one amino acid.
While this explains how nucleotides code for amino acids, how does this system work in a cell? To answer this we need to first talk about transfer RNA (tRNA). tRNA is a type of RNA which folds up into a structure containing two important sections: the attachment site and the mRNA binding site. The mRNA binding site contains three bases called the anticodon. The anticodon is antiparallel and complementary to a specific codon on the mRNA. The attachment site is found at the 3' end of the tRNA and has the sequence CCA. Enzymes in the cell attach the correct amino acid to the corresponding tRNA. The amino acid which is attached is the one encoded for by the codon which the anticodon of the tRNA matches. For example, if a tRNA molecule has the anticodon UCU, then the codon which matches it is AGA, which encodes for the amino acid arginine. Therefore, arginine will be attached to that tRNA. Essentially, the tRNA can be thought of as a linker between the mRNA and the amino acids (figure 3).
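The ideas of redundancy, unambiguity, and codon-anticodon matching can be illustrated with a short Python sketch. The table below is only a small excerpt of the standard genetic code, the function names are our own, and the matching is simplified to strict base pairing (real tRNAs can tolerate some mismatches at the third position, which is ignored here).

```python
# A small excerpt of the standard genetic code (codon -> amino acid).
# Each codon maps to exactly one amino acid: the code is unambiguous.
CODON_TABLE = {
    "UUU": "Phe", "UUC": "Phe",
    "ACG": "Thr", "ACA": "Thr",
    "GGC": "Gly",
    "AUG": "Met",
}

RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_for(amino_acid: str) -> list[str]:
    """Return every codon in the excerpt that encodes the given amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def codon_for_anticodon(anticodon: str) -> str:
    """Return the codon (5' to 3') matched by an anticodon (5' to 3').

    The two are antiparallel, so the anticodon is reversed before pairing.
    """
    return "".join(RNA_PAIRS[base] for base in reversed(anticodon))


print(codons_for("Phe"))  # redundancy: more than one codon for one amino acid

codon = codon_for_anticodon("GAA")
print(codon, CODON_TABLE[codon])  # the amino acid this tRNA would carry
```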
To recap, we now know how mRNA is produced from DNA, and how the code of the mRNA corresponds to the amino acid code of a protein. All that is left to understand now is how these amino acids are joined together. This part of the process involves a complex called a ribosome, which is shown both in figure 4 and in our Ribonostics logo. Ribosomes are composed of both proteins and a type of RNA called ribosomal RNA (rRNA), and have a large subunit and a small subunit. Ribosomes also have three sites: the A site, the P site, and the E site. We will discuss the functions of these sites further on. As the ribosome is composed largely of RNA, it is able to bind to RNA molecules with complementary sequences. The mRNA contains a sequence called the RBS (ribosome binding site), located roughly 6-7 nucleotides before the section which codes for the protein. The RBS is relatively short and allows the ribosome to bind to the mRNA. Once the ribosome has bound to the mRNA, it moves along the RNA molecule until it comes across the start codon, AUG, which is also the codon for methionine (Met) (figure 5). Once the ribosome reaches the start codon, translation can begin.
As mentioned above, ribosomes contain three sites, the A, P, and E sites, each of which has a different function. Each site is roughly the size of three nucleotides (i.e. a codon). When the A site reaches the start codon (AUG), a tRNA carrying Met, with the correct anticodon (CAU, 5' to 3'), can bind to the codon on the mRNA. Once the tRNA has bound, the ribosome moves forward three nucleotides, so that the Met tRNA is now in the P site and the A site is unoccupied and covering the next codon. Again, a tRNA with the correct anticodon binds in the A site. This time, however, the amino acid on the second tRNA forms a bond with the Met held by the tRNA in the P site, releasing the Met from that tRNA. The ribosome then moves along again so that the Met tRNA (now without its amino acid) is in the E site, the tRNA with the two bound amino acids is in the P site, and the A site is unoccupied.
Again, a correct tRNA binds in the A site, and the Met tRNA leaves the E site and the complex. The amino acid on the tRNA in the A site bonds to the two amino acids on the tRNA in the P site and releases them from that tRNA, before the ribosome moves along again, shifting the tRNAs to the next site and leaving the A site empty. This continues until a stop codon (UAA, UAG, or UGA) is in the A site. This time, instead of a tRNA entering the A site, a protein called a release factor binds and releases the amino acid chain from the tRNA in the P site, and hence from the entire complex. This amino acid chain is then able to fully fold into the correct structure to become a protein and carry out its function (figure 6).
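The overall logic of translation (find the start codon, read one codon at a time, stop at a stop codon) can be sketched in Python as follows. This is a deliberately minimal illustration: it uses the standard stop codons UAA, UAG, and UGA, the codon table is only a small excerpt of the genetic code, and the function name and example mRNA are our own. The A, P, and E sites, the RBS, and the tRNAs themselves are not modelled.

```python
# Small excerpt of the standard genetic code; a full table has 64 codons.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe", "UUC": "Phe",
    "GGC": "Gly",
    "ACG": "Thr", "ACA": "Thr",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Return the amino acid chain encoded by an mRNA written 5' to 3'.

    Reading starts at the first AUG and proceeds three nucleotides at a
    time until a stop codon (or the end of the mRNA) is reached.
    """
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated

    chain = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # the chain is released here
        # Codons missing from the excerpt are marked rather than guessed.
        chain.append(CODON_TABLE.get(codon, "???"))
    return chain


# Hypothetical mRNA: a short leader, then AUG UUC GGC ACA UAA, then extra bases.
mrna = "GGAGGACCAUGUUCGGCACAUAAGGC"
print("-".join(translate(mrna)))
```

Note that the bases after the stop codon (GGC in this example) are never read, even though GGC would otherwise code for an amino acid.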
General Regulatory RNAs
RNA is a molecule capable of many functions, and is fundamental to a number of cellular processes. In the previous sections we have already seen three types of RNA (mRNA, tRNA, and rRNA), each of which has a different function in the process of protein synthesis. There are many more types of RNA than these three, and they are involved in a variety of cellular processes, normally in a regulatory role. In this section, we will look at some of the main types of regulatory RNA and their importance to life.
Regulation through protein synthesis
Proteins are fundamental to cellular processes/pathways: they catalyse reactions, activate or inhibit other catalytic proteins, and act as the products/inputs of a pathway, and their presence/absence can dramatically change, activate, or stop a pathway. This makes proteins ideal targets for regulating these pathways. Conceptually, therefore, perhaps the simplest way in which RNA can regulate pathways is by controlling the amounts of these proteins in the cell. An obvious way to do this is to change the levels of the protein's mRNA, which can be achieved in several ways. One way is by inhibiting/enhancing the transcription of DNA to produce the mRNA. In this regulatory method, RNA is not actually controlling the regulation of the pathway, but is instead a target for regulation. Proteins called transcription factors are able to either inhibit or enhance the binding of RNA polymerase to a promoter, and therefore control the amount of mRNA produced, and hence the amount of protein. If the protein being controlled is involved in the rate limiting step of a pathway (i.e. the slowest step of the pathway), control of this protein can change the rate of the entire pathway. Even if the protein is not involved in the rate limiting step, if it is down regulated (produced less) enough then its step may become rate limiting and therefore slow down, or even turn off, the entire pathway (figure 1). The amount of mRNA can also be controlled after it has been produced, through the control of its degradation. Stabilisation of the mRNA (and therefore a decrease in its degradation) means that it can be read more times by a ribosome before it breaks down, increasing the amount of that protein. Conversely, an increase in degradation will result in fewer proteins being produced per mRNA molecule. There are many ways in which degradation can be controlled.
One way is by proteins: some proteins have RNA degradation activity and will therefore break down mRNAs, while others are able to modify mRNAs to protect them from degradation, increasing their stability. RNA can also play an important role in this regulatory method. Some degradation proteins are able to bind guide RNAs with sequences complementary to specific mRNAs, allowing those specific mRNAs to be degraded. In addition to this, some RNAs are able to carry out the degradation themselves. These RNAs are termed catalytic RNAs, also known as ribozymes. Ribozymes can base pair to specific mRNA sequences and then, due to their structure, cause cleavage of the mRNA. This cleavage decreases the stability of the mRNA, and any cleaved mRNA which is read by a ribosome will produce an incomplete protein, which is usually inactive. Some specific examples of how this can be achieved are shown in figure 2.
Riboswitches
Another important way in which RNA can regulate cellular processes/protein expression is through the use of riboswitches. Riboswitches are found in some form in most bacteria and other prokaryotic cells. In the next section riboswitches are explained in detail, and some examples are given.
Riboswitches
Riboswitches are essentially mRNA molecules which have a regulatory section that controls whether or not the protein coding section is read. There are quite a few different types of riboswitch, each with a different mechanism. In general, the presence/absence of a ligand/trigger (ranging from small metal ions, to amino acids, to proteins, to other RNA molecules), or a change in conditions such as temperature/pH, causes a change in the conformation of the riboswitch, which then either allows or stops the protein coding region from being read by a ribosome and the protein from being produced. Broadly, riboswitches can be put into two groups: those which act at the transcriptional level, and those which act at the translational level. In this section we will describe the general mechanisms of each type.
Transcriptional Control:
One main type of riboswitch controls the production of its own mRNA. During transcription, the riboswitch section of the mRNA is produced first, and it can fold while the rest of the mRNA (including the coding section) is still being synthesised. This regulatory section is able to take on one of two structures depending on the presence/absence of a ligand or a change in conditions. If the riboswitch is turned off by the conditions/ligand, it will cause a terminator to form in the mRNA, and hence the RNA polymerase will stop transcription before the entire mRNA is produced, resulting in an essentially useless mRNA molecule. However, if the riboswitch is turned on, an antiterminator is formed instead, and the RNA polymerase is able to transcribe the entire mRNA and allow the protein to be expressed (figure 1).
Translational control:
Those riboswitches which act at the translational level usually work by sequestering the RBS (ribosome binding site) away from the ribosome, and hence stopping translation of the mRNA and protein production. There are different mechanisms for sequestering and revealing the RBS. One of these is cleavage of the riboswitch. In the absence/presence of a certain ligand/condition, the riboswitch can take on a conformation in which a cleavage site is revealed. If the riboswitch is cleaved, the RBS can be released and accessed by a ribosome, which can then read the protein coding region of the mRNA. The cleavage of this riboswitch can be carried out by a protein/ribozyme, or in some cases by the riboswitch itself (figure 2). Another way in which an RBS can be sequestered is by being placed within a loop structure. When the RBS is in a loop structure, the ribosome is unable to bind to it, and hence the protein cannot be produced. If a ligand/condition changes the conformation of the riboswitch such that the loop structure is removed, then the ribosome becomes able to bind to the RBS and read the rest of the mRNA (figure 3).
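As a very rough illustration of RBS sequestration, the Python sketch below checks whether an mRNA contains a second stretch of sequence that could base pair with its RBS and so hold it within a folded structure. This is a toy model under strong assumptions: the function names, the RBS sequence, and the example mRNAs are hypothetical, and only perfect base pairing is considered. Predicting whether an RNA actually folds this way requires proper structure prediction, which is not attempted here.

```python
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would base pair with `rna`, written 5' to 3'."""
    return "".join(RNA_PAIRS[base] for base in reversed(rna))


def rbs_may_be_sequestered(mrna: str, rbs: str) -> bool:
    """Return True if the mRNA holds the RBS plus a separate stretch able to pair with it."""
    rbs_at = mrna.find(rbs)
    if rbs_at == -1:
        return False  # no RBS present at all

    partner = reverse_complement(rbs)
    hit = mrna.find(partner)
    while hit != -1:
        # The pairing stretch must not overlap the RBS itself.
        if hit + len(partner) <= rbs_at or hit >= rbs_at + len(rbs):
            return True
        hit = mrna.find(partner, hit + 1)
    return False


rbs = "AGGAGG"  # hypothetical RBS sequence used for illustration
closed = "CCUCCUGAAAAGGAGGACCAUGUUC"  # contains a stretch able to pair with the RBS
opened = "GAAAAGGAGGACCAUGUUC"        # same mRNA without that stretch

print(rbs_may_be_sequestered(closed, rbs))
print(rbs_may_be_sequestered(opened, rbs))
```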
Other types of riboswitches:
There are many types of riboswitch found in nature, each with a different mechanism. Our project is based around a specific riboswitch called a toehold switch. In the next section, we discuss the use of riboswitches in synthetic biology, and describe the structure and mechanism of the toehold switch.
Natural and Synthetic Toehold Switches
As has been discussed in the previous sections, RNA is an important molecule which is involved in many functions, including cellular regulation. The previous section described riboswitches, their different types, and their mechanisms of action in some detail. For our project we have designed and improved upon a specific type of riboswitch: a toehold switch. Toehold switches are riboswitches which regulate at the translational level via RBS sequestration, and they are named for the toehold structure which is an integral part of the riboswitch (figure 1). The basic mechanism of action is that an RNA molecule with a sequence complementary to the switch region of the toehold switch binds to the switch and causes the structure to open up, removing the toehold structure and revealing the RBS to allow the ribosome to bind. This mechanism is discussed in further detail below.