Difference between revisions of "Team:WHU-China/Modeling"

Revision as of 16:25, 14 September 2015

Modeling

1. How the various factors change over time

2. Identifying the factors that influence the output

3. How the output responds to changes in the input

We designed this model to check whether our criticality detection works properly. We set up the model as follows.
The system is a smart genetic circuit. When the input, such as light stimulation, changes, the output stays at a similar, stable level. In other words, the system converts signals of different intensity and duration into a signal pulse.
To check whether the circuit behaves as intended, we first identify all the factors that affect the circuit and the connections among them. By fitting the optimal conditions to the experimental data, we can then give feedback on our project and offer constructive suggestions.

1. First, we list all the parameters and factors in List 1 and fit the curve of each of the four factors as it varies with time in Plots 1 to 4.

parameters	values
	1
	1
	10
	5
	1
	0.1
	1
	1
	2
	100
	0.1

Plot 1: RStotal changing with operating time

Plot 2: taRNA changing with operating time


Plot 3: CI changing with operating time

Plot 4: M_GFP changing with operating time (over a short period)

2. Principal factor analysis on the basis of principal component analysis

By modeling on the basis of principal factor analysis, we can identify the components that most strongly influence the output.

Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. Each variable reflects some information about the system under study, and the indicators are correlated with one another, so the information carried by the statistics overlaps to some extent. Principal component analysis therefore applies the idea of dimension reduction: it reduces the dimension of the high-dimensional variable space while retaining as much of the information in the original data as possible, so that the original variable system is integrated and simplified to the largest extent. In addition, it determines the weight of each indicator objectively, which avoids the arbitrariness of subjective judgment.

Starting from the original data, principal component analysis discards some information through a linear transformation and finds composite indicators, each a combination of several original indicators. These are the principal components. They reflect the characteristics of the original indicators and are uncorrelated with one another.

Set the original variables as $x_1, x_2, \dots, x_p$ and the principal components obtained through principal component analysis as $z_1, z_2, \dots, z_m$ ($m \le p$), which are linear combinations of $x_1, x_2, \dots, x_p$. The coordinate system composed of $z_1, z_2, \dots, z_m$ is obtained by translating the original coordinate system and rotating it orthogonally. We call the space spanned by $z_1, z_2, \dots, z_m$ the principal hyperplane.

On this principal hyperplane, the direction of largest variation in the data is the first principal component $z_1$. For $z_2, \dots, z_m$ we have $\mathrm{Var}(z_1) \ge \mathrm{Var}(z_2) \ge \dots \ge \mathrm{Var}(z_m)$ in turn. As a result, $z_1$ reflects the most information in the original data, and the m-dimensional principal hyperplane is the m-dimensional subspace that retains the original data information to the largest extent.

The steps of principal component analysis are as follows.

(1) Normalize the original data to eliminate the effect of different magnitudes and units. The formula used is as follows.

$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_i}{s_i}$$

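Here $x_{ij}$ is the original value of the jth sample of the ith indicator, and $\bar{x}_i$ and $s_i$ are respectively the mean and standard deviation of the samples of the ith indicator.

(2) From the normalized data table $X^{*} = (x_{ij}^{*})$, calculate the correlation coefficient matrix R with the formula

$$R = (r_{ij})_{p \times p}$$

where

$$r_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} x_{ik}^{*} x_{jk}^{*}$$

with n the number of samples and p the number of indicators.

(3) Calculate the eigenvalues and eigenvectors of R. From the characteristic equation

$$|R - \lambda I| = 0$$

we obtain the eigenvalues $\lambda_k$ and place them in descending order,

$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$$

together with the corresponding eigenvectors $u_1, u_2, \dots, u_p$, which are orthonormal. We call $u_k$ the principal axes. The I above is the identity matrix.

(4) Calculate the contribution rate and the cumulative contribution rate

$$\alpha_k = \frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}, \qquad Q_m = \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{i=1}^{p} \lambda_i}$$

(5) Calculate the principal components

$$z_k = u_k^{T} x^{*}, \quad k = 1, 2, \dots, m$$

where $x^{*}$ is the vector of normalized indicators and the principal components are uncorrelated with each other.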


(6) Comprehensive analysis.
To retain as much of the original information as possible, we must consider how much precision is needed when the original variable system is replaced by the m-dimensional hyperplane. We judge this through the cumulative contribution rate and fix the dimension m of the hyperplane as the smallest value for which the cumulative contribution rate reaches the required level. We can then analyse the extracted principal components further.

(7) Finally, from the principal components $z_k$ and their corresponding weights $w_k$, we calculate the comprehensive indicator W, which reflects the characteristics of the system:

$$W = \sum_{k=1}^{m} w_k z_k$$

As the information carried by the indicator parameters is essentially reflected in the principal components, the comprehensive indicator W already includes the basic characteristics of the system described by these parameters from different aspects.
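The steps of principal component analysis listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the team's actual analysis: the function names and the small `samples` data set are hypothetical, the eigen-decomposition uses the classical Jacobi rotation method (chosen here because it needs no external library), and the standard deviation is assumed to be the sample standard deviation (divisor n - 1).

```python
import math

def standardize(data):
    """Step (1): z-score every indicator (one row per indicator, one column per sample)."""
    result = []
    for row in data:
        n = len(row)
        mean = sum(row) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
        result.append([(x - mean) / std for x in row])
    return result

def correlation_matrix(z):
    """Step (2): r_ij = sum_k z_ik * z_jk / (n - 1) on standardized data."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def jacobi_eigen(matrix, max_sweeps=50, tol=1e-20):
    """Step (3): eigenvalues (descending) and eigenvectors of a symmetric matrix."""
    p = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes the off-diagonal entry a[i][j].
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j of a and v
                    a[k][i], a[k][j] = c * a[k][i] + s * a[k][j], -s * a[k][i] + c * a[k][j]
                    v[k][i], v[k][j] = c * v[k][i] + s * v[k][j], -s * v[k][i] + c * v[k][j]
                for k in range(p):  # rotate rows i and j of a
                    a[i][k], a[j][k] = c * a[i][k] + s * a[j][k], -s * a[i][k] + c * a[j][k]
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[r][k] for r in range(p)] for k in order]  # one eigenvector per entry
    return values, vectors

def contribution_rates(values):
    """Step (4): contribution rate of each component and the running cumulative rate."""
    total = sum(values)
    rates = [lam / total for lam in values]
    cumulative, acc = [], 0.0
    for rate in rates:
        acc += rate
        cumulative.append(acc)
    return rates, cumulative

def principal_scores(z, vectors, m):
    """Step (5): project every standardized sample onto the first m eigenvectors."""
    n = len(z[0])
    return [[sum(u[i] * z[i][j] for i in range(len(z))) for j in range(n)]
            for u in vectors[:m]]

# Hypothetical data: 3 indicators measured on 5 samples.
samples = [[1.0, 2.0, 3.0, 4.0, 5.0],
           [2.1, 3.9, 6.2, 7.8, 10.1],
           [5.0, 3.0, 4.0, 1.0, 2.0]]
z = standardize(samples)
R = correlation_matrix(z)
eigenvalues, eigenvectors = jacobi_eigen(R)
rates, cumulative = contribution_rates(eigenvalues)
scores = principal_scores(z, eigenvectors, m=2)
```

The number of retained components m would then be read off `cumulative`, as in step (6). For larger data sets a library routine for symmetric eigenproblems would normally replace the hand-written Jacobi loop.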

Statistical test system for principal component analysis

Principal component analysis is not appropriate for all sample data. In practical problems the method is used to simplify the data structure before further analysis, so it has certain preconditions: it is a good choice only when the variables of the original data are strongly linearly related. If the linear correlation among the original variables is too weak, principal component analysis cannot simplify the data structure, and it is not reasonable to use it. Therefore, we should test the suitability of the raw data before applying principal component analysis.

1. Bartlett test of sphericity

The Bartlett test of sphericity is one of the commonly used statistical tests. It tests the entire correlation matrix, and its null hypothesis is that the correlation matrix is an identity matrix. If we cannot reject the null hypothesis, the original variables can be regarded as uncorrelated with each other, which means that principal component analysis is not suitable [4]. We calculate the P value from the test statistic. When the P value is less than 0.05, we reject the null hypothesis, which indicates that principal component analysis can be applied to the original data. Conversely, if the P value is larger than 0.05, principal component analysis is not applicable.
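As an illustration of the test described above, the sketch below computes the usual Bartlett statistic, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, for a correlation matrix R of p variables estimated from n samples. The helper names, the 3x3 matrix `example_corr`, and the sample size of 100 are hypothetical and serve only to exercise the formula. The P value itself would come from a chi-square table or a statistics library.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_samples):
    """Return (chi-square statistic, degrees of freedom) for H0: corr is the identity."""
    p = len(corr)
    statistic = -(n_samples - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    dof = p * (p - 1) // 2
    return statistic, dof

# Hypothetical correlation matrix of three variables, estimated from 100 samples.
example_corr = [[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.4],
                [0.3, 0.4, 1.0]]
statistic, dof = bartlett_sphericity(example_corr, n_samples=100)
# The 5% critical value of chi-square with 3 degrees of freedom is about 7.815,
# so a statistic above that value rejects the null hypothesis.
reject_null = statistic > 7.815
```

For four variables the same formula gives 4 x 3 / 2 = 6 degrees of freedom.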

2. KMO (Kaiser-Meyer-Olkin) test

The KMO (Kaiser-Meyer-Olkin) test compares the simple correlation coefficients between the variables with their partial correlation coefficients. It is used mainly in the factor analysis of multivariate statistics. The KMO statistic takes a value between 0 and 1. When the sum of squares of the simple correlation coefficients between the variables is much larger than that of the partial correlation coefficients, the KMO value is close to 1, which means that the correlation between the variables is strong and the original variables are suitable for factor analysis. Conversely, when the sum of squares of the simple correlation coefficients between the variables is close to zero, the KMO value is close to zero, which means that the correlation between the variables is weak and the original variables are not suitable for factor analysis. The formula for the KMO test is as follows.

$$\mathrm{KMO} = \frac{\sum_{i \ne j} r_{ij}^{2}}{\sum_{i \ne j} r_{ij}^{2} + \sum_{i \ne j} p_{ij}^{2}}$$

Here $r_{ij}$ represents the simple correlation coefficient and $p_{ij}$ represents the partial correlation coefficient between variables i and j. The KMO value always lies between 0 and 1, and principal component analysis can be used when the value is large enough.

3. Simulation and analysis of the model

We conduct principal component analysis on approximately 6000 groups of data obtained from the simulation of the system. The correlation coefficient matrix is as follows.

List 2: the correlation coefficient matrix of each factor

	RStotal	taRNA	CI	M_GFP
RStotal	1.000	0.183	0.999	0.115
taRNA	0.183	1.000	0.165	0.631
CI	0.999	0.165	1.000	0.084
M_GFP	0.115	0.631	0.084	1.000
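As a sketch of how a KMO value can be obtained from a correlation matrix such as List 2, the code below derives the partial correlation coefficients from the inverse of the correlation matrix (p_ij = -q_ij / sqrt(q_ii q_jj), where q is the inverse) and then applies the KMO ratio of squared simple correlations to squared simple plus squared partial correlations. The function names are hypothetical. Because List 2 is rounded to three decimals and RStotal and CI are almost perfectly correlated, the result of this sketch need not reproduce the value reported by the statistics package.

```python
import math

def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kmo(corr):
    """Overall KMO measure from a correlation matrix."""
    p = len(corr)
    q = invert(corr)
    simple_sq = 0.0   # sum of squared simple correlations, i != j
    partial_sq = 0.0  # sum of squared partial correlations, i != j
    for i in range(p):
        for j in range(p):
            if i != j:
                simple_sq += corr[i][j] ** 2
                partial = -q[i][j] / math.sqrt(q[i][i] * q[j][j])
                partial_sq += partial ** 2
    return simple_sq / (simple_sq + partial_sq)

# Correlation matrix from List 2 (order: RStotal, taRNA, CI, M_GFP).
list2 = [[1.000, 0.183, 0.999, 0.115],
         [0.183, 1.000, 0.165, 0.631],
         [0.999, 0.165, 1.000, 0.084],
         [0.115, 0.631, 0.084, 1.000]]
kmo_value = kmo(list2)
```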

Result:
List 3: results of the KMO and Bartlett tests

Kaiser-Meyer-Olkin measure of sampling adequacy	.386
Bartlett's test of sphericity	Approx. chi-square	48881.528
	df	6
	Sig.	.000

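The Bartlett test is significant (Sig. below 0.05), so we reject the null hypothesis that the correlation matrix is an identity matrix and analyse the data further with principal component analysis. The KMO value of .386 is below the 0.5 level conventionally taken as the minimum for factor analysis, so the results should be read with that in mind. Through principal component analysis we obtain the principal component and analyse the share of each factor in it. The component part is given in List 4.
List 4: component part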

Factor	Proportion
RStotal	0.920
CI	0.909
M_GFP	0.447
taRNA	0.524


So far, we have analysed the primary factors contributing to the output. The relationship among RStotal, GFP, and CI is plotted below:

Plot 5: relational graph of RStotal, GFP, and CI

3. Variation of the output when X is set to 1, 10, and 100

Plot 6: X equals 1

Plot 7: X equals 10

Plot 8: X equals 100

Summary: the fitted model shows that the output does not vary with the input. As displayed in Plots 6, 7, and 8, when the input is set to 1, 10, or 100, the GFP output stays at 8 ng. The output therefore remains stable as the input changes, which confirms the effectiveness of the model.

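References

[1] Peng Yuan. Research on a university teaching evaluation model based on student evaluation [J]. Journal of Wuhan University of Science and Technology (Social Science Edition), 2005, 7(3): 67-69.
[2] Hotelling H. Analysis of a complex of statistical variables into principal components [J]. Journal of Educational Psychology, 1933, 24: 417-441.
[3] Wang Wei, Ma Qinzhong, Lin Mingzhou, et al. Principal component analysis and the reduction of seismicity parameters [J]. Acta Seismologica Sinica, 2005, 27(5): 524-531.
[4] Fu Deyin. Statistical testing problems in principal component analysis [C]. // Proceedings of the 14th National Symposium on Statistical Science. 2007: 483-488.
[5] Jiang Qiyuan, Xie Jinxing, Ye Jun. Mathematical Models. Beijing: Higher Education Press, 2003.
[6] N. R. Draper, H. Smith. Applied Regression Analysis (third edition). John Wiley & Sons, Inc., 1998.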

@@ Line 515: / Line 515: @@
 <h2>Modeling</h2>
 <span>
 On this main hyperplane, the largest variation in the data is the first principal component
-<br />
 <img src="https://static.igem.org/mediawiki/2015/4/47/WHU-China_gongshi17.png" height="20"/>
-<br />
+<div style="display:inline;margin-left:20px"></div>
 .For
-<br />
 <img src="https://static.igem.org/mediawiki/2015/d/d6/WHU-China_gongshi15.png" height="20"/>
-<br />
+<div style="display:inline;margin-left:130px"></div>
 we have
-<br />
 <img src="https://static.igem.org/mediawiki/2015/e/e2/WHU-China_gongshi16.png" height="20"/>
-<br />
+<div style="display:inline;margin-left:130px"></div>
 in turn. As a result
-<br />
 <img src="https://static.igem.org/mediawiki/2015/4/47/WHU-China_gongshi17.png" height="20"/>
-<br />
+<div style="display:inline;margin-left:20px"></div>
 can reflect most information of the original data, which implies that the main hyperplane with m dimensions is the very subspace with m dimensions which can retain the original data information to the largest extent.
 </span>
@@ Line 541: / Line 537: @@
 <img src="https://static.igem.org/mediawiki/2015/5/5e/WHU-China_gongshi18.png" height="50"/>
 <br />
 Where
-<br />
 <img src="https://static.igem.org/mediawiki/2015/a/ad/WHU-China_gongshi28.png" height="20"/>
-<br /> means the original data of the jth sample of the ith indicator.
+<div style="display:inline;margin-left:20px"></div>means the original data of the jth sample of the ith indicator.
 <br />
 <img src="https://static.igem.org/mediawiki/2015/9/9e/WHU-China_gongshi29.png" height="20"/>
-<br />
+<div style="display:inline;margin-left:20px"></div>
 and
-<br />
 <img src="https://static.igem.org/mediawiki/2015/e/e3/WHU-China_gongshi30.png" height="20"/>
-<br />respectively implies the mean and standard deviation of samples of the ith indicator.
+<div style="display:inline;margin-left:20px"></div>respectively implies the mean and standard deviation of samples of the ith indicator.
 Through the normalized data sheet
-<br />
 <img src="https://static.igem.org/mediawiki/2015/f/f5/WHU-China_gongshi31.png" height="20"/>
-<br />,
+<div style="display:inline;margin-left:40px"></div>,
 we can further calculate the correlation coefficient R with the formula
 <br />
 <img src="https://static.igem.org/mediawiki/2015/6/6a/WHU-China_gongshi32.png" height="20"/>
-<br />
+<div style="display:inline;margin-left:100px"></div>
 Where
-<br />
 <img src="https://static.igem.org/mediawiki/2015/3/30/WHU-China_gongshi19.png" height="50"/>
 <br />