Team:SYSU-Software/Techniques
Backend
Framework
CORE uses Flask, a lightweight and extensible framework, as its backend, and SQLAlchemy as its object-relational mapper, which makes CORE portable across many kinds of SQL databases (CORE works with SQLite by default). We also use gevent to provide coroutine-based concurrency behind a synchronous programming style, which greatly shortens response times when numerous users interact with CORE at the same time.
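A minimal sketch of how such a stack fits together is shown below; the module and model names are illustrative, not CORE's actual code:

```python
# A minimal sketch of the Flask + SQLAlchemy + gevent stack described above.
# Names here are illustrative, not CORE's actual code.
from gevent import monkey
monkey.patch_all()  # make blocking I/O cooperative before other imports

from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from gevent.pywsgi import WSGIServer

app = Flask(__name__)
# Portable via SQLAlchemy: swap this URI for MySQL, PostgreSQL, etc.
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///core.db"
db = SQLAlchemy(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(64), unique=True)

if __name__ == "__main__":
    with app.app_context():
        db.create_all()
    # gevent's WSGI server handles many concurrent users cooperatively
    WSGIServer(("0.0.0.0", 5000), app).serve_forever()
```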
Friendly User System
CORE has a friendly user system. Anyone can register a CORE account for free. With an account, a user can ask, answer, and comment on questions, build, share, and mark brand-new designs, schedule experiments, and more. CORE can also remind you through messages and emails when a scheduled experiment is approaching or when another user answers your question.
Flexible Deployment
CORE can be deployed at two levels. For ordinary users, we release CORE as executable programs for different platforms, so one can simply double-click the executable to deploy CORE. For advanced users and developers, we also provide a management script and detailed instructions.
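As an illustration, such a management script often looks something like the sketch below; the command set shown is an assumption, not CORE's actual interface:

```python
# Illustrative sketch of a deployment management script; the commands
# here are assumptions, not CORE's real script.
import argparse

def main():
    parser = argparse.ArgumentParser(description="Manage a CORE deployment")
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("initdb", help="create the database tables")
    run = sub.add_parser("run", help="start the server")
    run.add_argument("--port", type=int, default=5000)
    args = parser.parse_args()

    if args.command == "initdb":
        print("initializing database ...")
    elif args.command == "run":
        print("serving on port %d ..." % args.port)

if __name__ == "__main__":
    main()
```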
Extensible Interfaces and Automated Documentation
CORE’s frontend and backend communicate with each other through RESTful APIs. Every API's usage is illustrated with detailed instructions and examples, and all of the API documentation is built automatically by Sphinx. Developers can first deploy CORE on an accessible server, then access it from different platforms, such as an Android or iOS application.
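For example, a RESTful endpoint on the Flask side might look like the sketch below; the route and response fields are illustrative, not CORE's real API:

```python
# A sketch of a RESTful endpoint; the route and fields are illustrative.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/designs/<int:design_id>", methods=["GET"])
def get_design(design_id):
    """Return a single design as JSON.

    Sphinx can render docstrings like this one directly into the
    generated API documentation.
    """
    return jsonify({"id": design_id, "name": "example design"})
```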
Miscellaneous Models and Unit Tests
CORE comprises miscellaneous models that provide its various functions, such as the cooperation platform, circuit redesign, and design sharing. It also comes with all-around unit tests, with which users can find out what goes wrong before deploying CORE.
On the shoulders of giants
The primitive CORE contains the essential information needed to support the software. Users can also start with parts, devices, and tasks that our team collected from previous iGEM projects, so they begin with many useful components and feel less lonely.
Best Practices in Software Development
We have employed various techniques to ensure the quality of our software and to make the development process as smooth as possible. These techniques have proved to be software development best practices for years, and they ensure the extensibility of C.O.R.E.
Continuous Integration
We've introduced Travis CI into our development procedure, which brings us the advantages of Continuous Integration. By integrating our individual work into the mainline version several times a day, we've successfully kept our software in a working state throughout development.
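A typical Travis CI configuration for a Python project of this kind looks like the following; the exact file CORE uses may differ:

```yaml
# Minimal .travis.yml for a Python project tested with nose.
language: python
python:
  - "2.7"
install:
  - pip install -r requirements.txt
script:
  - nosetests
```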
Automatic Unit Testing
To enforce Automatic Unit Testing, we've selected `nose` as our unit testing tool. Software quality is ensured, and days are saved.
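With `nose`, any function whose name starts with `test_` is discovered and run automatically. A minimal example, with a made-up function under test:

```python
# Run with `nosetests`: nose discovers and executes all test_* functions.
def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
```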
Automatic Documentation Generation
Sphinx is employed to automatically generate the documentation of our server-side code. The online version can be found here (http://coredocs.sysusoftware.info). As you can see, the generated documentation is well organized and beautiful :-)
By using Sphinx, we've been able to concentrate on the software itself while still providing detailed and helpful documentation.
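Sphinx's autodoc extension builds such pages directly from docstrings, so keeping the docs current only requires documenting the code itself. A sketch, with a hypothetical function:

```python
def simulate(circuit, duration):
    """Simulate a circuit for a given duration.

    :param circuit: the circuit model to simulate
    :param duration: simulated time in seconds
    :returns: a dict mapping species names to concentration series
    """
    # hypothetical placeholder body, for illustration only
    return {}
```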
Bug Tracking Facilities
We use GitHub Issues for bug tracking and feature requests.
Changes Between Releases
We would like to continue developing C.O.R.E. even after the competition ends. Changes between releases will be posted on our GitHub project page.
Algorithms
Logistic Regression
When judging whether a shared design should be made public, we use logistic regression to learn the judgment criteria. From data labeled by users, logistic regression learns to distinguish a good design from a bad one. We train the classifier with stochastic gradient descent. On average, after 500 training epochs (about 0.46 seconds), the algorithm reaches high accuracy (around 90%) on the training set.
Figure 1. Increase in accuracy as the training epoch increases.
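The core of such a training loop can be sketched as follows; the learning rate and feature layout are assumptions, not CORE's actual values:

```python
# A sketch of logistic regression trained with stochastic gradient
# descent; hyperparameters are assumptions, not CORE's actual values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sgd(X, y, epochs=500, lr=0.1):
    """X: (n_samples, n_features) design features; y: 0/1 labels."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(np.dot(w, xi) + b) - yi  # prediction error
            w -= lr * err * xi                     # gradient step, weights
            b -= lr * err                          # gradient step, bias
    return w, b

def predict(w, b, X):
    return (sigmoid(X.dot(w) + b) >= 0.5).astype(int)
```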
Largest Set Matching
In parts modeling, we use largest set matching to determine which equations to include in the system. We filter candidate equations by target name, and we index equations in the database by the number of involved values to accelerate queries. When matching, we also check whether the promoter and the gene/protein are located on the same circuit to ensure the accuracy of the simulation. Our set-matching algorithm also allows users to include multiple equations as long as they do not conflict with one another. Finding the corresponding equation system for a 22-part design in a dataset of 453 equations takes only 0.2 seconds.
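The idea behind the matching step can be sketched like this; the data layout is an assumption, and CORE's real implementation also checks circuit co-location and conflicts:

```python
# A sketch of the largest-set-matching idea: keep the equations whose
# required parts are all present in the design, preferring larger sets.
def match_equations(design_parts, equations):
    """design_parts: set of part names in the design.
    equations: list of (name, required_parts) pairs, required_parts a set.
    Returns matched equations, largest required sets first."""
    matched = [(name, req) for name, req in equations
               if req <= design_parts]  # all required parts are present
    matched.sort(key=lambda e: len(e[1]), reverse=True)
    return matched

parts = {"pLac", "RBS", "GFP"}
eqs = [("transcription", {"pLac", "GFP"}),
       ("translation", {"RBS", "GFP"}),
       ("full_expression", {"pLac", "RBS", "GFP"})]
print(match_equations(parts, eqs))  # full_expression matches first
```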