Difference between revisions of "Team:Minnesota/Web Scrape"

 
(9 intermediate revisions by 2 users not shown)
Line 278: Line 278:
  
 
body{
 
body{
background-image:url("https://static.igem.org/mediawiki/2012/9/94/MainBanner2-4.jpg");
+
background-image:url("https://static.igem.org/mediawiki/2015/9/9d/MN_2015_BG.png");
 
background-color:gray;
 
background-color:gray;
 
}
 
}
Line 778: Line 778:
 
<b><font size="4"><center> The Rising Coalition </font></b><br></center>
 
<b><font size="4"><center> The Rising Coalition </font></b><br></center>
 
<br>
 
<br>
&nbsp; &nbsp; &nbsp; &nbsp; The past few years has brought biotechnology to the forefront of science discussion in the public. Vaccination and genetically modified organisms are the greatest examples of this debate and how the public can follow emotional driven arguments above scientific reason. How can we study these behaviors? <br><br>
+
&nbsp; &nbsp; &nbsp; &nbsp; The past few years has brought biotechnology to the forefront of science discussion in the public. Vaccination and genetically modified organisms are the greatest examples of this debate and how the public can follow emotionally driven arguments above scientific reason. How can we study these behaviors? <br><br>
  
 
&nbsp; &nbsp; &nbsp; &nbsp; "Web scraping" is a term used in computer science to describe the process of extracting data and information from websites in a highly automated manner. With the effectively endless supplies of opinions, data, and articles on the web, we can clean up text from these websites and computationally sweep them for both objective and emotional content. <br><br>
 
&nbsp; &nbsp; &nbsp; &nbsp; "Web scraping" is a term used in computer science to describe the process of extracting data and information from websites in a highly automated manner. With the effectively endless supplies of opinions, data, and articles on the web, we can clean up text from these websites and computationally sweep them for both objective and emotional content. <br><br>
  
<img src="https://static.igem.org/mediawiki/2015/a/a9/Registry_1.png" width=100% height=75% align="middle">
 
<br>
 
  
<b><font size="4"><center>Today's Registry</font></b><br></center>
+
<img src="https://static.igem.org/mediawiki/2015/0/01/Prcon.png" width=100% height=75% align="middle">
 
+
<br><br>
&nbsp; &nbsp; &nbsp; &nbsp; Maybe there still is value in the registry, and maybe it does save groups synthesis costs and time. After all, current saves researchers $325 on average a day. But when we return to this conversation in 5 years and the savings is only about $80 a day? The National Science Foundation (NSF, $37 million over 10 years), and Defense Advanced Research Projects Agency (DARPA, undisclosed grant), and National Institutes of Health (NIH, undisclosed grant) funnel hefty grants as well as the International Genetically Engineered Machines (iGEM) competition to develop and contribute to the physical stock of parts. Should we be investing in this type of technology? <br><br>
+
&nbsp; &nbsp; &nbsp; &nbsp; One example of what you can do with this software is to utilize the keyword hotspots across the web. After Google returns a list of websites with content, you can sweep the text for specific words and look at the neighborhood around these locations. This can give you an idea of how people feel and express their viewpoints about a specific technology. For the example below, we searched "GMO Benefits" and "Why to avoid GMOs" in separate queries, then used the hotspot words such as "genetically modified", "GMO", and "genetics". The resulting collection of words in the neighborhood for both of these queries were moved to Wordle to visualize the most prevalent terminology (above). This is one way to identify trends in public communication in biotechnology.
 
+
&nbsp; &nbsp; &nbsp; &nbsp; In addition to the physical database, we were interested in the meta-analysis of biobricks. Are parts highly used in other projects? What do the usage statistics look like? We contacted the iGEM staff to determine whether they would release the complete list of Biobricks with usage data and our request was denied. <br><br>
+
 
+
&nbsp; &nbsp; &nbsp; &nbsp;But we had computers and we had website scraping tools, so we scraped all  parts web pages for every iGEM team part from 2005 to 2014 for their usage information. Processing this information, we can get a snapshot of what is happening with the all the information this organization has accumulated. Biobricks are unconnected to any other Biobrick 72% the time, and 92% are used less than once a year. 61% of part usage is within the same team in the same year. Despite these trends, the superb parts had an entirely different story to tell. <br><br>
+
  
 +
<br><br>
 +
<img src="https://static.igem.org/mediawiki/2015/e/e4/Monsantoigem.png" width=100% height=75% align="middle">
 
<br>
 
<br>
<img src="https://static.igem.org/mediawiki/2015/4/45/Registry_2.png" width=60% height=100% align="middle">
+
&nbsp; &nbsp; &nbsp; &nbsp; Another example is tracking the opinions of a biotechnology company over time. If you want to apply technology developed at your company, you will always have a much easier time if there is a favorable public outlook. Tracking the sentiment of the web will illuminate impending PR problems or whether you've recovered from people actively attacking your company after a problem. To exemplify, we've used the "Google_Time_Lapse" to analyze content from the top webpages published in each month linking to the query. The sentiment analysis included in the Python module TextBlob allows us to see the trends in Monsanto's image which reaches an all-time low May 2015, which is consequently the time protest were undertaken, partially in response to Monsanto's actions involving genetically modified foods. <br><br>
<br>
+
  
<b><font size="4"><center> The Best of the Registry </font></b><br></center>
+
<br><br><center>https://github.com/PatrickHolec/PyMOL360</center><br>
<br>
+
<center>or contact:</center><br>
 
+
<center>Patrick V. Holec</center>
&nbsp; &nbsp; &nbsp; &nbsp;To the (right), the top 1.3% of Biobricks by usage were mapped to all Biobricks they were used in and processed in Gephi as a network of information. The resulting image is the marvel of what the Registry of Standard Biological Parts has achieved. With each point representing a Biobrick, each color representing a year of competition, you can see the interconnectivity of iGEM at its best. Teams across the world and years building from others work and linking their work into this network of information.<br><br>
+
<center>hole0077@umn.edu</center>
 
+
<center>University of Minnesota</center><br>
 
+
&nbsp; &nbsp; &nbsp; &nbsp;And this is what we believe the best direction is for the registry and iGEM as a whole. There has been this emphasis on forcing projects into this physical standard of biobricks and punished teams for omitting submission of a registry part. These leaves massive portions of the registry dark, untouched. The registry was a revolutionary idea initially, but the registry must evolve to coexist with the modern scientific environment. Drop the physical system and commit the registry to be a haven of synthetic biology research focusing on the scientific network, not the part number.<br><br>
+
 
+
<br>
+
<img src="https://static.igem.org/mediawiki/2015/1/17/Registry_3.png" width=70% height=100% align="middle">
+
<br>
+
  
  

Latest revision as of 03:16, 19 September 2015

Team:Minnesota/Project/Insulin - 2015.igem.org

 


Biotechnology and the Web


        The advent of the internet has created a common ground where the public can rapidly generate and spread ideas. Although this has undoubtedly shaped and improved our lives, patterns of misinformation spreading through these channels have presented serious obstacles to the advancement of biotechnology in societal applications. To address this, we have taken preliminary steps toward developing a module, powered by Google, that can probe the web across both content and time ranges to give researchers a reference for public outlook.

The Rising Coalition

        The past few years have brought biotechnology to the forefront of public science discussion. Vaccination and genetically modified organisms are the most prominent examples of this debate and of how the public can follow emotionally driven arguments over scientific reasoning. How can we study these behaviors?

        "Web scraping" is a term used in computer science to describe the process of extracting data and information from websites in a highly automated manner. With the effectively endless supply of opinions, data, and articles on the web, we can clean up the text from these websites and computationally sweep it for both objective and emotional content.
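
The "clean up text" step can be illustrated with a minimal sketch using only the Python standard library. This is not the code from the module described here; the names TextExtractor and clean_html are hypothetical, and the sketch assumes the page has already been downloaded as an HTML string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

The cleaned string can then be passed to whatever keyword or sentiment analysis follows. Real pages usually need more care (navigation menus, comments, encoding), which this sketch leaves out.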



        One example of what you can do with this software is to examine keyword hotspots across the web. After Google returns a list of websites with content, you can sweep the text for specific words and look at the neighborhood around each occurrence. This can give you an idea of how people feel and express their viewpoints about a specific technology. For this example, we searched "GMO Benefits" and "Why to avoid GMOs" in separate queries, then used hotspot words such as "genetically modified", "GMO", and "genetics". The resulting collections of neighborhood words for both queries were moved to Wordle to visualize the most prevalent terminology (above). This is one way to identify trends in public communication about biotechnology.
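
A minimal sketch of the hotspot sweep, assuming the page text has already been cleaned: it finds each occurrence of a hotspot word or phrase and counts the words within a fixed window on either side. The function name neighborhood_counts and the default window of 5 tokens are assumptions for illustration, not values taken from the module.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")


def neighborhood_counts(text, hotspots, window=5):
    """Count words within `window` tokens before or after each hotspot match."""
    tokens = TOKEN.findall(text.lower())
    # Hotspots may be single words ("GMO") or phrases ("genetically modified").
    phrases = [tuple(TOKEN.findall(h.lower())) for h in hotspots]
    counts = Counter()
    for i in range(len(tokens)):
        for phrase in phrases:
            n = len(phrase)
            if n and tuple(tokens[i:i + n]) == phrase:
                before = tokens[max(0, i - window):i]
                after = tokens[i + n:i + n + window]
                counts.update(before + after)
    return counts
```

The resulting counts (for example via counts.most_common(50)) are the kind of word list that could be handed to a word-cloud tool. A word that sits near two hotspots is counted twice, and common stop words are not filtered in this sketch.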


        Another example is tracking public opinion of a biotechnology company over time. If you want to apply technology developed at your company, you will have a much easier time if the public outlook is favorable. Tracking sentiment on the web can reveal impending PR problems, or show whether the company has recovered after being actively attacked following a problem. As an example, we used "Google_Time_Lapse" to analyze content from the top webpages published in each month that match the query. The sentiment analysis included in the Python module TextBlob lets us see the trend in Monsanto's image, which reaches an all-time low in May 2015. This coincides with the time protests were held, partially in response to Monsanto's actions involving genetically modified foods.
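
The month-by-month tracking can be sketched as follows. This is an illustration under stated assumptions, not the "Google_Time_Lapse" code: the (month, text) input format, the function names, and the tiny word lists are all hypothetical. The toy scorer stands in for a real one so that the sketch runs with the standard library alone; with TextBlob installed, its TextBlob(text).sentiment.polarity value (a float from -1.0 to 1.0) could be supplied as the scorer instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical word lists for the stand-in scorer only.
POSITIVE = {"safe", "benefit", "good", "improve"}
NEGATIVE = {"harm", "danger", "bad", "protest"}


def toy_polarity(text):
    """Stand-in scorer in [-1, 1]: (positive - negative) / matched words."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def monthly_sentiment(pages, scorer=toy_polarity):
    """Average polarity per month for an iterable of ('YYYY-MM', text) pairs."""
    by_month = defaultdict(list)
    for month, text in pages:
        by_month[month].append(scorer(text))
    return {month: mean(scores) for month, scores in sorted(by_month.items())}


def lowest_month(trend):
    """Return the month key with the lowest average polarity."""
    return min(trend, key=trend.get)
```

To use TextBlob, pass scorer=lambda t: TextBlob(t).sentiment.polarity after importing TextBlob from the textblob package. Averaging per month is one simple design choice; a median or a weighting by page rank would be equally reasonable.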



https://github.com/PatrickHolec/PyMOL360

or contact:

Patrick V. Holec
hole0077@umn.edu
University of Minnesota