Difference between revisions of "Team:Peking/Modeling/Analysis"


Revision as of 13:23, 18 September 2015

Modeling

Reliability!!!!

Background

To increase the accuracy and specificity of detection, we developed an assay based on our Paired dCas9 Reporter (PC Reporter) System that collects more sequence information from the target genome in order to obtain a more reliable result. We designed m pairs of gRNA-specific target sites as m markers in the MTB genome. To check whether this idea actually works, we tested the system on the target gene and on a mismatched gene. In the experimental group, the gRNAs were used to detect the target gene, while in the control group, the gRNAs were used to detect the mismatched gene. To reduce random error, both the experimental and the control groups were repeated n times. The results are optical power signals generated by our Paired dCas9 Reporter System. By comparing the intensities of the optical power signals for the target gene and the mismatched gene, the difference can be seen directly.

Assumptions

  • There is no recognition site for gRNA in the mismatched gene.
  • The measured values $X_i$, $Y_i$ of the control and experimental groups are independent of each other.
  • The measured values from the n repeated tests make up the sample sets $X_i$ and $Y_i$, respectively, and both sample sets are small.

Model

Wilcoxon Rank Sum Test of Block Design

In view of the unknown distributions and unequal variances of the signals from our Paired dCas9 Reporter System, we chose a non-parametric statistical method, the Wilcoxon Rank Sum Test of Block Design, which works on the ranks of the data, instead of ANOVA.

In the Block Design, we regard the detections made with the same gRNA under the two treatments, i.e. target and mismatch DNA, as a block. To test for a difference between the two treatments, we test the null hypothesis that the two treatments do not differ. The Wilcoxon rank sum statistic $W_j$ of each block is calculated first by

$$W_j = \sum_{i=1}^{n} R_i, \qquad \frac{n(n+1)}{2} \le W_j \le \frac{n(3n+1)}{2}$$
where $R_i$ is the rank of the i-th observation of $X_j$ in the pooled sample of $X_j$ and $Y_j$. Note that the Wilcoxon rank sum statistic $W_j$ is distribution-free: its null distribution is known as soon as the sample size is known.
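The per-block rank sum described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_sum` and the signal values are hypothetical (they are not the team's measurements), and ties are assumed not to occur.

```python
def rank_sum(x, y):
    """Sum of the ranks of the x-values in the pooled sample x + y.

    Ranks start at 1 for the smallest pooled value. Ties are not
    handled in this sketch, so they raise an error.
    """
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties are not handled in this sketch")
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    return sum(rank[value] for value in x)


# Hypothetical signals for one block (n = 3 repeats per treatment).
x_block = [5.1, 4.7, 6.0]  # assumed target-DNA readings
y_block = [1.2, 0.9, 1.5]  # assumed mismatch-DNA readings

print(rank_sum(x_block, y_block))  # 15: every x-value exceeds every y-value
```

With n = 3 the result always lies between 6 and 15, and 15 is reached only when all three x-values rank above all three y-values.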

For example, if $n = 3$, Peking-Analysis-x_sample.gif, Peking-Analysis-y_sample.gif,
so Peking-Analysis-xy_sample.gif, which implies that Peking-Analysis-R_sample.gif
Under the null hypothesis, after enumerating all possible orderings of the two sample sets, the distribution of the statistic $W_j$ is as shown below:

Wj      6     7     8     9     10    11    12    13    14    15
f(Wj)   0.05  0.05  0.10  0.15  0.15  0.15  0.15  0.10  0.05  0.05
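The table above can be reproduced by brute-force enumeration. The sketch below assumes no ties, so that under the null hypothesis every choice of n ranks out of the 2n pooled ranks is equally likely; the function name `rank_sum_null_distribution` is ours, not the authors'.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def rank_sum_null_distribution(n):
    """Exact null distribution of the rank sum of n values among 2n pooled ranks."""
    ranks = range(1, 2 * n + 1)
    counts = Counter(sum(subset) for subset in combinations(ranks, n))
    total = sum(counts.values())  # number of equally likely rank assignments
    return {w: Fraction(k, total) for w, k in sorted(counts.items())}


dist = rank_sum_null_distribution(3)
for w, p in dist.items():
    print(w, float(p))

# Mean and variance of W_j under the null hypothesis, from the exact distribution.
mean = sum(w * p for w, p in dist.items())
var = sum((w - mean) ** 2 * p for w, p in dist.items())
print(float(mean), float(var))
```

For n = 3 there are 20 equally likely assignments, so the smallest non-zero probability is 1/20 = 0.05, which is the probability of the single outcome with rank sum 15.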

Because the sample size is small, the smallest attainable significance level is 0.05, so only $W_j = 15$ leads to rejection of the null hypothesis. In other words, the two sets of data are significantly different only when the minimum value of $X_j$ is greater than the maximum value of $Y_j$. The Wilcoxon Rank Sum Test on a single block therefore struggles to detect a difference when the experimental and control groups differ only slightly. By using the Block Design, however, we can integrate the data from the m blocks, similar to the idea of ANOVA. We use the sum of the $W_j$ as the statistic:

$$W = \sum_{j=1}^{m} W_j$$
The Wilcoxon rank sums $W_j$ from the m blocks are independent and identically distributed (i.i.d.). By the central limit theorem (CLT), as m approaches infinity, the standardized random variable $W_{BD}$ converges in distribution to the standard normal distribution N(0,1):
$$W_{BD} = \frac{W - m\,E(W_j)}{\sqrt{m\,\mathrm{Var}(W_j)}} \xrightarrow{d} N(0,1)$$
We therefore use the statistic $W_{BD}$ and calculate the p-value $p = 1 - \Phi(W_{BD})$, where $\Phi(x)$ is the distribution function of the standard normal distribution. If the p-value is less than 0.01, i.e. $W_{BD} > 2.33$, we accept the alternative hypothesis that the two treatments, i.e. target and mismatch DNA, differ with high statistical significance.
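The block-design test described above can be sketched as follows. Several things here are assumptions rather than the authors' stated method: the statistic is taken to be the sum of the block rank sums, standardized by the no-ties null mean n(2n+1)/2 and variance n²(2n+1)/12 of a single block; the p-value is one-sided; and the function name `block_design_test` and the block rank sums are hypothetical.

```python
import math


def block_design_test(block_rank_sums, n):
    """Standardized sum of block rank sums and its one-sided normal p-value.

    block_rank_sums: one Wilcoxon rank sum W_j per block (m blocks)
    n: number of repeats per treatment within each block
    """
    m = len(block_rank_sums)
    mean_wj = n * (2 * n + 1) / 2      # E(W_j) under the null, no ties
    var_wj = n * n * (2 * n + 1) / 12  # Var(W_j) under the null, no ties
    w = sum(block_rank_sums)
    w_bd = (w - m * mean_wj) / math.sqrt(m * var_wj)
    # 1 - Phi(w_bd), written with the complementary error function
    p_value = 0.5 * math.erfc(w_bd / math.sqrt(2))
    return w_bd, p_value


# Hypothetical rank sums from m = 4 blocks with n = 3 repeats each.
w_bd, p_value = block_design_test([15, 14, 15, 13], n=3)
print(w_bd, p_value)
print("reject the null hypothesis at the 0.01 level:", p_value < 0.01)
```

Because the normal approximation relies on the CLT, it is only as good as the number of blocks m allows. For very small m, the exact distribution of the sum could be enumerated instead.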

Result

Peking-CRISPR-Figure13.png

Fig. 1 Results of high-throughput assay for MTB and control strain. F denotes fragments obtained from MTB genome (a) or control strain (b); P denotes markers from each fragment.

In our experiment, $n = 3$, so $E(W_j) = \frac{n(2n+1)}{2} = 10.5$ and $\mathrm{Var}(W_j) = \frac{n^2(2n+1)}{12} = 5.25$.

Peking-Analysis-W_BD_statistics_p_value.gif
so the signals from target and mismatch DNA differ highly significantly.

Reference