Team:Minnesota/Web Scrape
From 2015.igem.org
Biotechnology and the Web
The advent of the internet has created a common ground where the public can rapidly generate and spread ideas. Although this has undoubtedly shaped and improved our lives, the misinformation that spreads through these channels has presented serious obstacles to the advancement of biotechnology in societal applications. To address this, we have taken preliminary steps toward developing a module, powered by Google, that can probe the web by both content and time range to give researchers a reference for public outlook.
The past few years have brought biotechnology to the forefront of public science discussion. Vaccination and genetically modified organisms are the most prominent examples of this debate, and of how the public can follow emotionally driven arguments over scientific reasoning. How can we study these behaviors?
"Web scraping" is a term used in computer science to describe the process of extracting data and information from websites in a highly automated manner. With the effectively endless supply of opinions, data, and articles on the web, we can clean up the text from these websites and computationally sweep it for both objective and emotional content.
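To make the "clean up and sweep" idea concrete, here is a minimal sketch in Python using only the standard library. It is not our module: the class `TextExtractor`, the functions `clean_text` and `emotional_share`, and the tiny `EMOTIONAL_WORDS` lexicon are all hypothetical names chosen for illustration, and a real analysis would need a proper sentiment lexicon or model.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_text(html):
    """Strip markup from an HTML string and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


# Hypothetical toy lexicon; an assumption for illustration only.
EMOTIONAL_WORDS = {"dangerous", "scary", "toxic", "unnatural"}


def emotional_share(text):
    """Fraction of words in `text` that appear in the toy lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in EMOTIONAL_WORDS) / len(words)
```

Calling `clean_text` on a downloaded page and then `emotional_share` on the result gives one crude number per page; comparing that number across pages or dates is the kind of sweep described above.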
In addition to the physical database, we were interested in a meta-analysis of Biobricks. Are parts widely used in other projects? What do the usage statistics look like? We contacted the iGEM staff to ask whether they would release the complete list of Biobricks with usage data, and our request was denied.
But we had computers and we had website scraping tools, so we scraped the parts web pages of every iGEM team part from 2005 to 2014 for their usage information. Processing this information gives us a snapshot of what is happening with all the information this organization has accumulated. Biobricks are unconnected to any other Biobrick 72% of the time, and 92% are used less than once a year. 61% of part usage is within the same team in the same year. Despite these trends, the superb parts had an entirely different story to tell.
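As a rough illustration of how figures like these could be computed from scraped usage records, here is a small Python sketch. The data layout is an assumption, not our actual pipeline: `Part`, `usage_summary`, and the `(used, user)` pair format are hypothetical, "unconnected" is taken to mean a part that neither uses nor is used by any other part, and uses per year are averaged from a part's submission year through `last_year`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    team: str
    year: int


def usage_summary(parts, uses, last_year):
    """Summarize part usage.

    parts: list of Part records.
    uses: list of (used_name, user_name) pairs, meaning the first part
          appears inside the second part.
    last_year: final competition year covered by the scrape.
    """
    by_name = {p.name: p for p in parts}
    connected = set()
    use_count = {p.name: 0 for p in parts}
    same_team_year = 0

    for used, user in uses:
        connected.update((used, user))
        use_count[used] += 1
        a, b = by_name[used], by_name[user]
        if a.team == b.team and a.year == b.year:
            same_team_year += 1

    unconnected = sum(1 for p in parts if p.name not in connected)
    # Average uses per year over the years the part has existed.
    rare = sum(
        1 for p in parts
        if use_count[p.name] / (last_year - p.year + 1) < 1
    )
    return {
        "unconnected_fraction": unconnected / len(parts),
        "under_one_use_per_year_fraction": rare / len(parts),
        "same_team_year_use_fraction":
            same_team_year / len(uses) if uses else 0.0,
    }
```

With four toy parts, where part A is reused once by its own team in the same year and once by a later team, the summary reports one quarter of parts as unconnected and half of all uses as same-team, same-year.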
To the right, the top 1.3% of Biobricks by usage were mapped to all the Biobricks they were used in and processed in Gephi as a network of information. The resulting image shows the marvel of what the Registry of Standard Biological Parts has achieved. With each point representing a Biobrick and each color representing a year of competition, you can see the interconnectivity of iGEM at its best: teams across the world and across the years building on others' work and linking their own work into this network of information.
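A network like this can be prepared for Gephi as a plain edge list. The sketch below is one possible way to do it in Python, under assumptions of ours: the function names `top_part_edges` and `write_edge_csv` are hypothetical, usage is given as `(used, user)` name pairs, the top fraction is taken over all parts, and ties in usage count are broken by first appearance. The `Source` and `Target` column headers follow the edge-table format that Gephi's CSV import recognizes.

```python
import csv
import math
from collections import Counter


def top_part_edges(all_parts, uses, top_fraction=0.013):
    """Return the (used, user) edges whose used part is among the most-used.

    all_parts: list of every part name.
    uses: list of (used_name, user_name) pairs.
    top_fraction: share of all parts to keep, ranked by usage count.
    """
    counts = Counter(used for used, _ in uses)
    n_top = max(1, math.ceil(len(all_parts) * top_fraction))
    top = {name for name, _ in counts.most_common(n_top)}
    return [(used, user) for used, user in uses if used in top]


def write_edge_csv(edges, fh):
    """Write edges to an open text file as a Source,Target table."""
    writer = csv.writer(fh)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
```

Opening the output file with `open(path, "w", newline="")`, as the `csv` module documentation recommends, avoids blank rows on Windows. Node attributes such as competition year, used for coloring, would go in a separate node table.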
And this is what we believe is the best direction for the registry and iGEM as a whole. There has been an emphasis on forcing projects into the physical Biobrick standard and on punishing teams for omitting submission of a registry part. This leaves massive portions of the registry dark and untouched. The registry was a revolutionary idea initially, but it must evolve to coexist with the modern scientific environment. Drop the physical system and commit the registry to being a haven of synthetic biology research that focuses on the scientific network, not the part number.