Difference between revisions of "Team:UESTC Software/Modeling.html"

(Blanked the page)
 
(32 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<!doctype html>
+
 
<html class="no-js">
+
<head>
+
<meta charset="UTF-8">
+
<title>Modeling</title>
+
        <link rel="stylesheet" href="https://2015.igem.org/Template:UESTC_Software/CSS?action=raw&ctype=text/css">
+
        <link rel="stylesheet" href="https://2015.igem.org/Template:UESTC_Software/wen_css?action=raw&ctype=text/css">
+
</head>
+
<body>
+
<header class="dou_header">
+
<div class="header_logo">
+
<a href="javascript:void(0)"><img src="assets/uestc_software_head_logo.png"></a>
+
</div>
+
<nav class="dou_divide_nav">
+
<div class="dou_divide_nav_wrap">
+
<ul>
+
<li >
+
<a href="#">HOME</a>
+
</li>
+
<li class="choose">
+
<a class="" href="index2.html">PROJECT</a>
+
<ul class="dou_divide_nav_second clearfix">
+
<li><a href="">Description</a></li>
+
<li><a href="">Protocols</a></li>
+
<li><a href="">Results</a></li>
+
<li><a href="">Design</a></li>
+
<li><a href="">Modeling</a></li>
+
</ul>
+
</li>
+
<li>
+
<a href="#">TEAM</a>
+
<ul class="dou_divide_nav_second">
+
<li><a href="">Members</a></li>
+
<li><a href="">Attributions</a></li>
+
</ul>
+
<hr/>
+
</li>
+
<li>
+
<a href="#">NOTEBOOK</a>
+
</li>
+
<li>
+
<a href="#">HUMAN<br/>PRACTICE</a>
+
<ul class="dou_divide_nav_second">
+
<li><a href="">Game</a></li>
+
<li><a href="">Meet-up</a></li>
+
<li><a href="">Collaborations</a></li>
+
</ul>
+
</li>
+
<li>
+
<a href="#">REQUIREMENTS</a>
+
<ul class="dou_divide_nav_second">
+
<li><a href="">Medals</a></li>
+
<li><a href="">Safety</a></li>
+
<li><a href="">Documentation</a></li>
+
+
</ul>
+
</li>
+
</ul>
+
</div>
+
</nav>
+
</header>
+
<div class="modeling_div">
+
<h1>Modeling &amp; Validation</h1>
+
<p>(Tools: PHP &amp; MATLAB)</p>
+
<h2>Modeling</h2>
+
<section>
+
<h3>Overview</h3>
+
<p>Modeling is one of the most important parts of our project. The data we used come mainly from the CEG (Cluster of Essential Genes) database. In <b>CEG</b>, genes are categorized according to their corresponding <b>KEGG</b> (Kyoto Encyclopedia of Genes and Genomes) Orthology (KO) and <b>COG</b> (Clusters of Orthologous Groups) categories and function descriptions.</p>
+
<p>The ubiquity retaining strategy was first proposed by Koonin; we use a half retaining strategy to overcome its shortcomings. In practice we use 42% as the ratio rather than exactly one half. Our modeling aims to find this value and justify it.
+
The modeling relies on statistics. We look for the relation between the size of the minimal gene set, the number of organisms, and the HNR (Half Number Ratio). HNR is the ratio value that replaces the “half” in our strategy. We then construct a 3D model of this relation.</p>
+
<p>In addition, to obtain the ideal value of HNR, we use multivariable linear regression to fit a polynomial to the relation between the OE rate and the HNR. From this polynomial we find the appropriate HNR for our strategy, which has the highest OE rate and several other advantages over other HNR values.</p>
+
<p>We divide our modeling into 3 steps.</p>
+
</section>
+
<section>
+
<h3>step 1</h3>
+
<p>First, we randomly choose 4 organisms from the database (29 organisms in total) as experiment no. 1, then choose 5 organisms as experiment no. 2, and so on until we have 28 organisms in experiment no. 25. (Each experiment is repeated with 10 random selections, and we take the <b>average of every value</b>.) We observed the trend in the size of the minimal gene set across experiments no. 1 to 25 and noticed a decreasing trend when using the <b>ubiquity retaining strategy</b> (Koonin) but a stabilizing trend when using the <b>half retaining strategy</b>. The line chart is as follows (Fig 1.1):</p>
+
<img src="assets/figure1.3.jpg">
+
<p>As the number of organisms increases, the result of the half retaining strategy is clearly steadier and more significant than that of the ubiquity retaining strategy.</p>
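<p>The sampling procedure above can be sketched in Python as follows. This is a minimal illustration, not our PHP/MATLAB code: the function names, the toy data, and the rule "keep a gene if it occurs in at least <code>ratio</code> of the sampled organisms" are assumptions made for the example. A ratio of 1.0 stands in for the ubiquity retaining strategy, and a ratio near 0.5 for the half retaining strategy.</p>

```python
import random


def retained_genes(genomes, ratio):
    """Return the genes present in at least `ratio` of the given genomes.

    `genomes` is a list of sets of gene identifiers (hypothetical input format).
    """
    counts = {}
    for genes in genomes:
        for gene in set(genes):
            counts[gene] = counts.get(gene, 0) + 1
    threshold = ratio * len(genomes)
    return {gene for gene, count in counts.items() if count >= threshold}


def average_set_size(all_genomes, k, ratio, repeats=10, seed=0):
    """Average size of the retained set over `repeats` random samples of k genomes."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(repeats):
        sample = rng.sample(all_genomes, k)
        sizes.append(len(retained_genes(sample, ratio)))
    return sum(sizes) / repeats


# Toy data: four "organisms" with a handful of genes each.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "d"}]
ubiquity = retained_genes(toy, 1.0)  # genes present in every organism
half = retained_genes(toy, 0.5)      # genes present in at least half of them
```

<p>With real data, calling <code>average_set_size</code> for k = 4 to 28 with both ratios would give the two curves compared in Fig 1.1.</p>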
+
<p>Second, to find the relation between the <b>HNR</b> (Half Number Ratio) and the size of the <b>minimal gene set</b>, we divide 100% into 100 groups (1% to 100%). We now have <b>100 groups</b> and 25 experiments. For every group we compute the size of the minimal gene set in experiments no. 1 to 25, which gives line charts like Fig 1.2 and Fig 1.3 (because there is too much data to show, Fig 1.2 lists only the results for 1/10, 2/10, ..., 10/10):</p>
+
<img src="assets/figure1.3.jpg">
+
<img src="assets/figure1.3.jpg">
+
<p>The conclusion is that, for HNRs in the interval <b>[30/100, 51/100]</b>, the size of the minimal gene set is closest to the ideal amount.</p>
+
<p>Finally, to get a more detailed picture of the trend, we made a diagram to illustrate the <b>relation</b> of the minimal gene set to the number of organisms and the HNR.</p>
+
<p>We use the <b>least squares method</b> to get a <b>fitting polynomial</b> for every line in Fig 1.2 (100 polynomials in total). We choose <b>third-order</b> polynomials. Each polynomial has 4 coefficients; to express them as functions of HNR, we use the same method to fit the <b>relation between each coefficient and the HNR</b>. The fitting polynomials of the 4 coefficients are as follows:</p>
+
<p>The polynomial for the relation of the minimal gene set with number of organisms and HNR is as follows:</p>
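<p>The fitted coefficients are not reproduced in this revision. Structurally, the two-stage fit described above gives a model of the following form. The symbols are notation introduced here for illustration: N is the size of the minimal gene set, n the number of organisms, h the HNR, and each a<sub>k</sub>(h) a polynomial in h obtained by least squares (its degree is not stated in the text).</p>

```latex
N(n, h) = a_3(h)\,n^3 + a_2(h)\,n^2 + a_1(h)\,n + a_0(h)
```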
+
<p>We now have a polynomial that relates the minimal gene set to the number of organisms and the HNR. The diagram is as follows:</p>
+
<img src="assets/HNR.jpg">
+
<p>In the diagram above, we found a region in which the gene number stays stable as the number of organisms changes. This means that using these half number ratios yields a stable minimal gene set. To find the most stable HNR for the “half” retaining strategy, we move to step 2.</p>
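<p>The two-stage least squares fit used in this step can be sketched with NumPy as below. This is an illustrative sketch under stated assumptions, not our MATLAB code: the function names, the array layout (<code>sizes[i, j]</code> is the set size for the i-th HNR and the j-th organism count), and the degree used for the coefficient polynomials (<code>coef_degree</code>) are all assumptions.</p>

```python
import numpy as np


def fit_surface(n_values, hnr_values, sizes, coef_degree=3):
    """Fit set size as a cubic in n whose coefficients are polynomials in HNR."""
    # Stage 1: one third-order polynomial in n for every HNR group.
    stage1 = np.array([np.polyfit(n_values, row, 3) for row in sizes])
    # Stage 2: one polynomial in HNR for each of the 4 cubic coefficients.
    return [np.polyfit(hnr_values, stage1[:, k], coef_degree) for k in range(4)]


def predict(model, n, h):
    """Evaluate the fitted surface at organism count n and HNR h."""
    cubic_coeffs = [np.polyval(p, h) for p in model]
    return np.polyval(cubic_coeffs, n)
```

<p>Evaluating <code>predict</code> on a grid of n and h values gives a surface that could be plotted like the 3D diagram above.</p>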
+
</section>
+
<section>
+
<h3>step 2</h3>
+
<p>In Step 1, we divided 100% into 100 groups (1% to 100%), and we have 25 experiments. Before running the experiments, we define a new variable to measure whether the strategy is stable and effective. We call it the OE rate (Overlapping Effectively Rate). It is calculated by this formula:</p>
+
<p><i>cUGS: the size of the current experiment’s universal gene set result</i></p>
+
<p><i>UGS29: the size of the 29-organism experiment’s universal gene set result</i></p>
+
<p>This formula reduces the influence of the number of genes. A higher OE rate means the strategy gives a more effective result.</p>
+
<p>We now have 100 groups and 25 experiments. For every group we calculate the average OE rate over experiments no. 1 to 25, which gives the line chart in Fig 2.1:</p>
+
<img src="assets/figure2.1.jpg">
+
<img src="assets/figure2.2.jpg">
+
<p>Several groups clearly show an ideal result, while some others show unexpected results.</p>
+
<p>Next, we calculate the <b>variance</b> of the OE rate over experiments no. 1 to 25 in every group. The line chart is shown in Fig 2.2.</p>
+
<p>Comparing the variances in Fig 2.2, we can see that the groups between 26/100 and 61/100 have smaller variances than the others, which means the strategy is steadier in this interval. (Because all the correlation coefficients here are less than 0.9, we cannot fit a suitable polynomial to these data. However, all the variances in that interval are less than 0.0005 and similar to each other.)</p>
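<p>A minimal sketch of this variance check is given below. It assumes a hypothetical array <code>oe</code> in which row i holds the OE rates of the 25 experiments for HNR = (i + 1)%, and it uses the population variance; the text does not say whether the population or the sample variance was used.</p>

```python
import numpy as np


def stable_groups(oe, threshold=0.0005):
    """Return per-group variances and the HNR percentages below the threshold.

    oe[i, j] is the OE rate of experiment j for HNR group i (i = 0 means 1%).
    """
    variances = np.var(oe, axis=1)  # population variance across experiments
    stable_percent = np.flatnonzero(variances < threshold) + 1
    return variances, stable_percent
```

<p>On data like ours, <code>stable_percent</code> would be expected to cover roughly the 26% to 61% range reported above.</p>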
+
<p>Fortunately, we got a good result. We use the <b>least squares method to get the fitting polynomial</b>. The new curve looks like this:</p>
+
<img src="assets/figure2.3.jpg">
+
<p>The polynomial is as follows:</p>
+
<p>After comparing polynomials of different orders, we choose the fourth-order polynomial. Its <b>correlation coefficient</b> is 0.9703. According to the list, the polynomial is accepted at the 97% level.</p>
+
<p>Next, we take the HNR at which this polynomial reaches its <b>maximum</b> as our ideal HNR (<b>42%</b>). This is how we define the proportion value of the retaining strategy.</p>
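<p>The fit-and-maximize step can be sketched as follows. This is an illustration under assumptions rather than our original code: the function name and inputs are hypothetical, the "correlation coefficient" is taken to be the Pearson correlation between the data and the fitted values, and the maximum is located on a fine grid instead of analytically.</p>

```python
import numpy as np


def ideal_hnr(hnr, oe_mean, degree=4):
    """Fit a polynomial to the mean OE rate and return the HNR at its maximum."""
    coeffs = np.polyfit(hnr, oe_mean, degree)       # least squares fit
    fitted = np.polyval(coeffs, hnr)
    r = np.corrcoef(oe_mean, fitted)[0, 1]          # correlation of data and fit
    grid = np.linspace(hnr.min(), hnr.max(), 1001)  # search grid over the HNR range
    best = grid[np.argmax(np.polyval(coeffs, grid))]
    return coeffs, r, best
```

<p>Restricting <code>grid</code> to the stable interval found from the variances would be a natural variant if the fitted curve had a spurious maximum near the ends of the range.</p>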
+
</section>
+
<section>
+
<h2>References</h2>
+
<p>1. Yuan-Nong Ye, Zhi-Gang Hua, Jian Huang, Nini Rao and Feng-Biao Guo*: <b>CEG: a database of essential gene clusters</b>. BMC Genomics 2013</p>
+
<p>2. Mushegian AR, Koonin EV: <b>A minimal gene set for cellular life derived by comparison of complete bacterial genomes</b>. Proc Natl Acad Sci U S A 93, 10268-10273 (1996).</p>
+
<p>3. Roman L. Tatusov, Michael Y. Galperin, Darren A. Natale and Eugene V. Koonin*: <b>The COG database: a tool for genome-scale analysis of protein functions and evolution</b></p>
+
</section>
+
<section>
+
<h2>Validation</h2>
+
<h3>step 1</h3>
+
<p>We remove one organism at a time from the 29 organisms (no. 1 to no. 29), which gives 29 experiments. In each experiment we use the remaining 28 organisms to screen the minimal gene set again and compare the result with the complete (29-organism) one to get the overlapRatio.</p>
+
<p>The line chart is as follows:</p>
+
<img src="assets/figure1.3.jpg">
+
<p>The <b>variance of the overlapRatio</b> is 0.000479.</p>
+
<p>From the line chart and the variance, we conclude that the overlapRatios of all experiments are <b>more than 90%</b> and <b>vary only slightly</b>, which means our strategy is <b>accurate</b> and <b>stable</b>.</p>
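<p>The leave-one-out validation can be sketched as below. The definition of overlapRatio is not spelled out in the text, so this example assumes it is the fraction of the full 29-organism gene set that is recovered from the remaining organisms. The function names, the input format (a list of gene sets), and the retention rule are likewise assumptions made for illustration.</p>

```python
from statistics import pvariance


def retained_genes(genomes, ratio):
    """Return the genes present in at least `ratio` of the given genomes."""
    counts = {}
    for genes in genomes:
        for gene in set(genes):
            counts[gene] = counts.get(gene, 0) + 1
    threshold = ratio * len(genomes)
    return {gene for gene, count in counts.items() if count >= threshold}


def leave_one_out_overlap(genomes, ratio=0.42):
    """Overlap ratio between each leave-one-out gene set and the full gene set."""
    full = retained_genes(genomes, ratio)
    if not full:
        raise ValueError("the full gene set is empty; overlap is undefined")
    ratios = []
    for i in range(len(genomes)):
        rest = genomes[:i] + genomes[i + 1:]  # drop organism i
        subset = retained_genes(rest, ratio)
        ratios.append(len(subset & full) / len(full))
    return ratios


# Toy data: five "organisms"; real input would be the 29 CEG organisms.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "d"}, {"a", "b", "c"}]
ratios = leave_one_out_overlap(toy, ratio=0.5)
spread = pvariance(ratios)  # population variance of the overlap ratios
```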
+
+
</section>
+
<section>
+
<h3>step 2</h3>
+
<p>We now move to step 2. In step 1 we showed that our strategy is stable and accurate. Next, we compare our result with two other results (Gil; Mushegian &amp; Koonin), which obtained minimal gene sets by other methods, to show that our method is reliable and significant.</p>
+
<img src="assets/figure2.1.2.jpg">
+
<p>As the flow charts show, our minimal gene set contains fewer genes than those of the other two groups and overlaps strongly with them: it shares 190 genes with Gil’s result and 156 genes with Mushegian &amp; Koonin’s result. Only 5 of our genes differ from those of the other two groups. After this step, it is clear that our result is reliable and stable.</p>
+
</section>
+
</div>
+
</body>
+
</html>
+

Latest revision as of 15:42, 17 September 2015