
Result - Data Processing and Database Construction

① Data Processing

To search all possible paths between a set of compounds, we used NetworkX (Hagberg A., Schult D., Swart P. 2008), a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks (NetworkX n.d.). Among its many graph algorithms, our program ‘Gil’ uses the one named “all simple paths”, which generates every simple path in a graph from a source (the starting node) to a target (the ending node).
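As a rough illustration of the call described above, the sketch below runs NetworkX's `all_simple_paths` on a toy graph. The compound labels, the edges, and the choice of a directed graph are assumptions made for this example only; they are not taken from the data used by ‘Gil’.

```python
import networkx as nx

# Toy directed graph: nodes stand for hypothetical compounds,
# edges for reactions converting one compound into another.
G = nx.DiGraph()
G.add_edges_from([
    ("A", "B"), ("B", "D"),
    ("A", "C"), ("C", "D"),
    ("B", "C"), ("C", "B"),  # B and C form a cycle
])

# Enumerate every simple path (no repeated node) from A to D.
paths = sorted(nx.all_simple_paths(G, source="A", target="D"))
for path in paths:
    print(" -> ".join(path))
```

Even though B and C form a cycle, no returned path visits either node twice, which is the defining property of a simple path.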

This approach has several advantages. First, it returns only simple paths, that is, paths that do not contain repeated nodes. Since a repeated node indicates a repeated reaction, this lets us rule out such reactions. Second, the results are better than those of “all shortest paths”, another well-known algorithm. Because “all shortest paths” computes only the shortest paths in the graph, it often returns paths that are unsuitable for biologists. Finally, we can set a cutoff, which is the depth at which the search stops. We chose our cutoff value because a longer path that needs more than eight different transgenes is not a reasonable part design for a synthetic biologist.
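The difference between the two algorithms, and the effect of the cutoff, can be sketched on a small example. The graph and the cutoff of 3 below are hypothetical values chosen to keep the output short; they are not the settings used by ‘Gil’.

```python
import networkx as nx

# Three hypothetical routes from S to T of different lengths.
G = nx.DiGraph()
nx.add_path(G, ["S", "x1", "T"])                    # 2 reactions
nx.add_path(G, ["S", "y1", "y2", "T"])              # 3 reactions
nx.add_path(G, ["S", "z1", "z2", "z3", "z4", "T"])  # 5 reactions

# "All shortest paths" keeps only the routes of minimum length.
shortest = list(nx.all_shortest_paths(G, "S", "T"))

# "All simple paths" with a cutoff keeps every route of at most
# `cutoff` edges, so longer alternatives survive up to that depth.
within_cutoff = list(nx.all_simple_paths(G, "S", "T", cutoff=3))

print(len(shortest), "shortest path(s)")
print(len(within_cutoff), "simple path(s) within the cutoff")
```

Here the shortest-path search reports only the two-step route, while the simple-path search also keeps the three-step route and drops the five-step one.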

Next, we calculated scores for every output path and picked the three best paths for each scoring factor. For each set of compounds, a maximum of twelve paths are combined into a network, together with scores, BioBrick interlinking information, and other related data. We saved this information in JSON and text formats. As for the standard biological parts, there are 24,133 part IDs and sequences. We filtered out the sequences shorter than 100 bp, which left 19,808 parts. Running Nucleotide BLAST against the KEGG GENES data then found 565,163 matches (with an E-value of 1e-5). In total, 3,972 parts were linked to 20,276 gene IDs.
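The filtering and selection steps above could look roughly like the following sketch. The part IDs, sequences, factor names, and score values are all made up for illustration, and the code assumes that a higher score is better and that a sequence of exactly 100 bp is kept; the text does not specify either point.

```python
import json

MIN_LENGTH_BP = 100  # length threshold mentioned in the text


def filter_parts(parts, min_length=MIN_LENGTH_BP):
    """Keep parts whose sequence has at least min_length bases."""
    return {pid: seq for pid, seq in parts.items() if len(seq) >= min_length}


def top_paths_per_factor(scored_paths, factors, n=3):
    """Return the n best paths for each factor (higher score assumed better)."""
    return {
        factor: sorted(
            scored_paths, key=lambda p: p["scores"][factor], reverse=True
        )[:n]
        for factor in factors
    }


# Hypothetical parts: one of 120 bp, one of 40 bp.
parts = {"part_a": "ATGC" * 30, "part_b": "ATGC" * 10}
kept = filter_parts(parts)

# Hypothetical paths with scores for two placeholder factors.
scored_paths = [
    {"path": ["A", "B", "D"], "scores": {"factor_1": 0.9, "factor_2": 0.2}},
    {"path": ["A", "C", "D"], "scores": {"factor_1": 0.5, "factor_2": 0.8}},
    {"path": ["A", "B", "C", "D"], "scores": {"factor_1": 0.7, "factor_2": 0.4}},
    {"path": ["A", "C", "B", "D"], "scores": {"factor_1": 0.1, "factor_2": 0.6}},
]
best = top_paths_per_factor(scored_paths, ["factor_1", "factor_2"])

# Serialize the selection to JSON, one of the output formats named above.
serialized = json.dumps(best, indent=2)
print(serialized)
```

Because the same path can rank in the top three for more than one factor, the number of distinct paths kept is at most three times the number of factors.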

② Database Construction

After collecting and processing all the necessary data, our team stored them in MySQL tables and JSON files. A comparison between the KEGG database and ‘Gil’ is shown in the table below.