# Team:Yale/modeling


## Developing a Framework for the Genetic Manipulation of Non-Model and Environmentally Significant Microbes

## Modeling

## PCR Mutation Predictor

PCR errors are a significant concern in any genetic engineering project, and they are particularly relevant when testing MAGE in a new organism. Reliably tracking the mutations introduced by MAGE requires accounting for the errors generated during PCR, which depend on factors such as sequence length, polymerase type, and number of cycles. We developed an interface that calculates the expected share of mutated sequences for a user-defined set of these parameters.

### Forecasting Negative Results

Our modelling team developed a program to estimate PCR error rates. Our first model was based primarily on the binomial distribution. Its results were similar to those of the error rate calculator hosted by Thermo Fisher. Although the two did not match exactly, the results were consistently close enough that the two models likely share the same underlying basis. The problem with applying a simple binomial distribution to PCR is that copying events are not independent: an error that occurs in an early replication is propagated through all subsequent replications, and the calculation must account for this.
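As a rough illustration of the binomial reasoning, the sketch below treats every base copied in every cycle as an independent trial with error probability `mu`, so a fragment of length `G` is error-free after `n` cycles with probability `(1 - mu) ** (G * n)`. The function name and the example inputs are assumptions made for illustration; this is not the team's code or the Thermo Fisher implementation.

```python
def pessimistic_mutated_fraction(mu: float, G: int, n: int) -> float:
    """Fraction of fragments with at least one error, assuming every
    fragment has been copied n times and all base copies are independent."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be a probability between 0 and 1")
    if G < 0 or n < 0:
        raise ValueError("G and n must be non-negative")
    # Probability that all G * n base-copy events are error-free.
    p_error_free = (1.0 - mu) ** (G * n)
    return 1.0 - p_error_free


if __name__ == "__main__":
    # Hypothetical inputs: 1e-5 errors per base per cycle, 1 kb fragment, 30 cycles.
    print(f"{pessimistic_mutated_fraction(1e-5, 1000, 30):.3f}")
```

Because this calculation assumes every fragment has passed through all `n` copying steps, it gives an upper-end estimate of the mutated share.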

The original PCR fragments stay in the mixture after they serve as templates in the first round of replication, so the generation of each fragment matters when replication occurs. Taking 0.99 as an illustrative probability that a single copy is error-free, a product that is three copying steps removed from an original template has a 1 - (0.99)^3 chance of carrying at least one mutation, but the first-generation fragments are still present, and their direct copies have only a 1 - 0.99 chance of being mutated. A model that ignores generations is therefore more pessimistic than is warranted. For this reason, we called our original model the Pessimistic Model and the newer model, which we found in the literature, the New Model. Other factors also need to be taken into account, such as the chance that a strand of DNA is not replicated at all during a given cycle. The New Model accounts for these as well.
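The effect of generations can be made concrete with a small count. The sketch below assumes that every strand is copied in every cycle, and that a strand `k` copying steps away from the original template is error-free with probability `fidelity ** k`, using the illustrative value 0.99 from the text. The function names are hypothetical.

```python
def generation_counts(n: int) -> list[int]:
    """Number of strands in each generation after n cycles, starting from
    one generation-0 template and copying every strand in every cycle."""
    counts = [1]  # before cycling: a single original template
    for _ in range(n):
        nxt = counts + [0]
        for k, c in enumerate(counts):
            nxt[k + 1] += c  # each generation-k strand yields one generation-(k+1) copy
        counts = nxt
    return counts


def mutated_fraction_by_generation(n: int, fidelity: float = 0.99) -> float:
    """Share of strands with at least one error, weighting each
    generation by how many strands it contains."""
    counts = generation_counts(n)
    mutated = sum(c * (1.0 - fidelity ** k) for k, c in enumerate(counts))
    return mutated / sum(counts)


if __name__ == "__main__":
    print(generation_counts(3))               # strands in generations 0..3
    print(mutated_fraction_by_generation(3))  # generation-aware estimate
    print(1.0 - 0.99 ** 3)                    # estimate that ignores generations
```

After three cycles only one strand in eight is three copying steps from the template, so the generation-aware share of mutated strands is about half of the estimate that ignores generations.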

### Optimism and accounting for sequence history

H. Sharifian (2010) presents a PCR model based on the Galton-Watson process, in which an ancestor particle produces a random number of progeny, each of which then does the same. This branching process can be used to model replication errors in PCR.

This model is based on a previous model by Sun 1995 [1]. We begin with S identical single-stranded DNA fragments. In every cycle, each sequence generates a copy with probability lambda (we assume lambda = 1). Errors may occur each time a copy is made. We denote the error rate per base as mu and the length of the sequence as G, so the error rate per sequence per copy is mu*G. We also take into account that later generations of PCR templates have a higher probability of containing errors.
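The branching process described above can be sketched as a small Monte Carlo simulation. This is a simplified illustration, not the published model: it tracks only whether a strand is error-free, and it assumes that a copy of an error-free strand stays error-free with probability exp(-mu*G), which corresponds to a Poisson number of new errors per copy. The function name and the default seed are hypothetical.

```python
import math
import random


def simulate_mutated_fraction(S: int, n: int, mu: float, G: int,
                              lam: float = 1.0, seed: int = 0) -> float:
    """Simulate n PCR cycles starting from S error-free templates and
    return the fraction of strands carrying at least one error."""
    rng = random.Random(seed)
    p_clean_copy = math.exp(-mu * G)  # assumption: Poisson errors, mean mu*G per copy
    clean, mutated = S, 0
    for _ in range(n):
        new_clean = new_mutated = 0
        for _ in range(clean):
            if rng.random() < lam:              # strand is copied this cycle
                if rng.random() < p_clean_copy:
                    new_clean += 1
                else:
                    new_mutated += 1
        for _ in range(mutated):
            if rng.random() < lam:              # errors are inherited by the copy
                new_mutated += 1
        # Templates stay in the mixture alongside their copies.
        clean += new_clean
        mutated += new_mutated
    return mutated / (clean + mutated)


if __name__ == "__main__":
    # Hypothetical inputs: 500 templates, 8 cycles, mu = 1e-5, 1 kb fragment.
    print(simulate_mutated_fraction(S=500, n=8, mu=1e-5, G=1000))
```

The population doubles each cycle when lambda = 1, so the cycle count is kept small here to keep the run time short.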

The following formula gives the probability of a randomly selected sequence containing m mutations. The parameter k represents the sequence generation, and n represents the number of PCR cycles.
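One way to write this probability, consistent with the definitions above, is sketched here under two assumptions: that the generation k of a randomly selected sequence is, in expectation, binomially distributed with parameters n and lambda/(1 + lambda), and that a generation-k sequence carries a Poisson-distributed number of mutations with mean k*mu*G. The symbol M, for the mutation count of the selected sequence, is notation introduced for this sketch.

$$
P(M = m) \;=\; \sum_{k=0}^{n} \binom{n}{k}
\left(\frac{\lambda}{1+\lambda}\right)^{k}
\left(\frac{1}{1+\lambda}\right)^{n-k}
\frac{(k\mu G)^{m}}{m!}\, e^{-k\mu G}
$$

For m = 0 the sum collapses by the binomial theorem to the share of error-free sequences:

$$
P(M = 0) \;=\; \left(\frac{1 + \lambda e^{-\mu G}}{1+\lambda}\right)^{n}
$$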

The expectation of the binomial distribution is the product of its two parameters, n and lambda*exp(-mu*G) / (lambda*exp(-mu*G) + 1).
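The expression above is straightforward to evaluate numerically. The following sketch computes it directly; the function name and the example inputs are hypothetical.

```python
import math


def binomial_expectation(n: int, mu: float, G: int, lam: float = 1.0) -> float:
    """Return n * p, where p = lam*exp(-mu*G) / (lam*exp(-mu*G) + 1)."""
    weight = lam * math.exp(-mu * G)
    p = weight / (weight + 1.0)
    return n * p


if __name__ == "__main__":
    # Hypothetical inputs: 30 cycles, mu = 1e-5, 1 kb fragment, lambda = 1.
    print(binomial_expectation(30, 1e-5, 1000))
```

With mu = 0 and lambda = 1 the second parameter is exactly 1/2, so the expectation is n/2; any nonzero error rate pulls it slightly below that value.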

We implemented the above model in a web app that takes a DNA sequence, a PCR error rate, and the number of cycles, and estimates the percentage of fragments in the amplified DNA that contain mutations.
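A minimal sketch of such a calculation is shown below. It is not the web app's code: the function name, the input validation, and the closed-form expression for the error-free share, ((1 + lambda*exp(-mu*G)) / (1 + lambda))^n, are assumptions that follow from a binomial generation number combined with Poisson-distributed errors.

```python
import math

VALID_BASES = set("ACGT")


def percent_mutated(sequence: str, error_rate: float, cycles: int,
                    lam: float = 1.0) -> float:
    """Estimate the percentage of amplified fragments with at least one
    PCR error, given a DNA sequence, a per-base error rate, and a cycle count."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not seq or set(seq) - VALID_BASES:
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    if error_rate < 0 or cycles < 0 or not 0.0 <= lam <= 1.0:
        raise ValueError("error_rate and cycles must be >= 0, and lam in [0, 1]")
    G = len(seq)
    # Assumed share of error-free fragments after `cycles` rounds of copying.
    error_free = ((1.0 + lam * math.exp(-error_rate * G)) / (1.0 + lam)) ** cycles
    return 100.0 * (1.0 - error_free)


if __name__ == "__main__":
    # Hypothetical inputs: a 1 kb placeholder sequence, mu = 1e-5, 30 cycles.
    print(f"{percent_mutated('ACGT' * 250, 1e-5, 30):.1f}%")
```

For these example inputs the estimate is close to 14%, compared with roughly 26% from a calculation that assumes every fragment was copied in all 30 cycles.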