Difference between revisions of "Team:Evry/Software/Predictions"

Revision as of 17:01, 21 November 2015

Protein sequences retrieval

What we produced: a shell script to retrieve all protein sequences associated with a list of gene identifiers, and a FASTA file containing all the proteins encoded by a list of genes (i.e. genes that are both overexpressed and mutated in tumor samples).

So far, our pipeline allowed us to select genes that are over-expressed (differential expression analysis) and mutated (variant discovery) in tumor samples. We retrieved the proteins encoded by these genes with a shell script that uses the Ensembl REST API and produces a FASTA file containing all the sequences.
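The retrieval step can be sketched in Python as follows. This is not the team's shell script, only an illustration under stated assumptions: it uses the Ensembl REST `sequence/id` endpoint with `type=protein` and `multiple_sequences=1`, requests FASTA output through the `Content-Type: text/x-fasta` header, and the function names (`build_sequence_request`, `fetch_proteins_fasta`, `parse_fasta`) are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server


def build_sequence_request(gene_id):
    """Build a request for every protein sequence of one Ensembl gene ID."""
    # multiple_sequences=1 asks for all translations of the gene, not just one.
    query = urlencode({"type": "protein", "multiple_sequences": 1})
    url = f"{ENSEMBL_REST}/sequence/id/{gene_id}?{query}"
    return Request(url, headers={"Content-Type": "text/x-fasta"})


def fetch_proteins_fasta(gene_ids):
    """Download the proteins of each gene and concatenate them as FASTA text."""
    chunks = []
    for gene_id in gene_ids:
        with urlopen(build_sequence_request(gene_id)) as response:
            chunks.append(response.read().decode("utf-8").strip())
    return "\n".join(chunks) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a {sequence_id: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the identifier, i.e. the first word of the header.
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records
```

Calling `fetch_proteins_fasta` with the list of selected gene identifiers and writing the result to a file would give a FASTA file comparable to the one described above; `parse_fasta` reads it back for the prediction steps.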

Two machine learning tools were used on this dataset to perform predictions related to proteasome cleavage of the protein and MHC-I affinity.

Immune system processing

What we produced: scripts to launch NetChop and NetMHCpan and parse their results, and a final result table containing candidate antigens and their final scores.

A good candidate antigen must be tumor-specific, sufficiently expressed in tumor cells, but also able to be processed efficiently by the immune system.

Proteasome cleavage prediction

The major histocompatibility complex class I (MHC-I) binds short peptides (8 to 10 amino acids). These peptides are products of proteasomal degradation, a process that does not cut proteins randomly. Candidate antigens must not contain internal proteasome cleavage sites, because a peptide that is cut again could not be presented intact to the immune system.

We predicted proteasome cleavage sites using NetChop, an open-source machine learning tool. This yields a list of short peptides (with their associated NetChop scores) that can be presented to the immune system. This list is then filtered by predicting whether these antigens can bind to the MHC-I.
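A minimal sketch of how such a peptide list could be derived from per-residue cleavage predictions is given below. It is an illustration, not the team's parsing script. The assumptions are: `cleavage_scores[i]` is the predicted probability that the proteasome cuts right after residue `i`, a site counts as cleaved when its score reaches a threshold of 0.5, and a peptide is kept only if it ends on a predicted cleavage site and contains no internal one. The function name `candidate_peptides` is hypothetical.

```python
def candidate_peptides(sequence, cleavage_scores, threshold=0.5, lengths=(8, 9, 10)):
    """List peptides of 8 to 10 residues that survive proteasomal cleavage.

    Returns (peptide, c_terminal_score) tuples. The threshold and the
    C-terminal requirement are assumptions made for this sketch.
    """
    if len(sequence) != len(cleavage_scores):
        raise ValueError("one cleavage score is required per residue")
    peptides = []
    for start in range(len(sequence)):
        for length in lengths:
            end = start + length  # exclusive end index of the peptide
            if end > len(sequence):
                break
            internal = cleavage_scores[start:end - 1]
            c_terminal = cleavage_scores[end - 1]
            # Keep the peptide if it is released at its C-terminus
            # and is not predicted to be cut anywhere inside.
            if c_terminal >= threshold and all(s < threshold for s in internal):
                peptides.append((sequence[start:end], c_terminal))
    return peptides
```

Each returned peptide carries the score of its C-terminal site, which plays the role of the "associated NetChop score" in the next filtering step.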

MHC-I affinity prediction

Not all antigens are able to bind efficiently to the MHC-I. We used NetMHCpan, an open-source machine learning tool, to predict the binding affinity of all the antigens in our list. The predicted affinity is given as an IC50 value in nM, so a lower number indicates higher affinity. Peptides with IC50 values < 50 nM are generally considered high affinity, and those < 500 nM intermediate affinity. However, binding affinity alone does not determine the strength of the immune response: some antigens might bind very effectively without triggering an intense immune response.
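The affinity thresholds above can be written as a small helper. This is only a sketch: the function name `classify_affinity` and the label "low" for peptides at or above 500 nM are assumptions, not part of the NetMHCpan output.

```python
def classify_affinity(ic50_nm):
    """Classify a predicted IC50 (in nM) with the 50 nM / 500 nM thresholds."""
    if ic50_nm < 0:
        raise ValueError("IC50 cannot be negative")
    if ic50_nm < 50:
        return "high"
    if ic50_nm < 500:
        return "intermediate"
    return "low"  # label chosen for this sketch
```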

Finally, we sorted the candidate antigens using a scoring function that combines the two types of predictions (a linear combination of normalized scores).
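One possible form of such a scoring function is sketched below. The exact normalization and weights used in the pipeline are not stated here, so everything specific is an assumption: min-max normalization, a negative log10 transform of the IC50 so that stronger binders score higher, equal weights of 0.5, and the hypothetical names `min_max` and `rank_candidates`.

```python
import math


def min_max(values):
    """Rescale values to the [0, 1] range (all zeros if they are identical)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_candidates(candidates, w_cleavage=0.5, w_affinity=0.5):
    """Rank (peptide, cleavage_score, ic50_nm) tuples by a combined score.

    The weights and the log transform are assumptions made for this sketch.
    """
    if not candidates:
        return []
    cleavage = min_max([c[1] for c in candidates])
    # A lower IC50 means stronger binding, so the sign is flipped.
    affinity = min_max([-math.log10(c[2]) for c in candidates])
    scored = [
        (candidate[0], w_cleavage * cl + w_affinity * af)
        for candidate, cl, af in zip(candidates, cleavage, affinity)
    ]
    # Best candidates first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Changing `w_cleavage` and `w_affinity` shifts the balance between the two predictions. With equal weights, a peptide needs both a good cleavage score and a low IC50 to reach the top of the table.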

The YETIpredict web application

What we produced: a web interface to explore the results of the differential expression analysis, the variant analysis and the immune predictions

In order to make our pipeline as easy-to-use as possible, we created a web application that allows users to run all the steps of our pipeline that do not require a lot of computing resources. You can reach it here.

Visualisations of the differential expression and variant analyses are available, as well as interactive tables that allow users to easily browse, sort and search the list of differentially expressed genes or the list of candidate antigens.

Figure 1: Input page of YETIpredict


@@ Line 18: / Line 18: @@
              <h1>Prediction</h1>
          </div>
-<p class="lead text-justify">We used NetChop and NetMHCpan, two machine learning tools, to predict proteasome cleavage probability and MHC-I affinity. We sorted the potential antigens by a score that combines the two types of predictions. We created a web application that allows users to run this last step of prediction. You can reach it <a href="http://igem2015.issb.genopole.fr" target="_blank">here</a>.</p>
+<section class="page-section">
-<img src="https://static.igem.org/mediawiki/2015/f/fe/Webserver.png" class='img-responsive'>
+<h2>Protein sequences retrieval</h2>
+<p class="text-justify text-muted"><strong>What we produced: </strong>a shell script to retrieve all protein sequences associated with a list of gene identifiers, and a FASTA file containing all the proteins encoded by a list of genes (<em>i.e.</em> genes that are both overexpressed and mutated in tumor samples).</p>
+<p class="text-justify">So far, our pipeline allowed us to select genes that are over-expressed (differential expression analysis) and mutated (variant discovery) in tumor samples. We retrieved the proteins encoded by these genes with a shell script that
+uses the <a href="http://rest.ensembl.org/" target="_blank">Ensembl REST API</a> and produces a FASTA file containing all the sequences.</p>
+<p class="text-justify">Two machine learning tools were used on this dataset to perform predictions related to proteasome cleavage of the protein and MHC-I affinity.</p>
+</section>
+<section class="page-section">
+<h2>Immune system processing</h2>
+<p class="text-justify text-muted"><strong>What we produced: </strong>scripts to launch NetChop and NetMHCpan and parse their results, and a final result table containing candidate antigens and their final scores.</p>
+<p class="text-justify">A good candidate antigen must be tumor-specific, sufficiently expressed in tumor cells, but also <strong>able to be processed efficiently</strong> by the immune system.</p>
+<h3>Proteasome cleavage prediction</h3>
+<p class="text-justify">The major histocompatibility complex class I (MHC-I) binds short peptides (8 to 10 amino acids). These peptides are products of proteasomal degradation, a process that does not cut proteins randomly. Candidate antigens must not contain internal proteasome cleavage sites, because a peptide that is cut again could not be presented intact to the immune system.</p>
+<p class="text-justify">We predicted proteasome cleavage sites using <a href="http://www.cbs.dtu.dk/services/NetChop/" target="_blank">NetChop</a>, an open-source machine learning tool. This yields a list of short peptides (with their associated NetChop scores) that can be presented to the immune system. This list is then filtered by predicting whether these antigens can bind to the MHC-I.</p>
+<h3>MHC-I affinity prediction</h3>
+<p class="text-justify">Not all antigens are able to bind efficiently to the MHC-I. We used <a href="http://www.cbs.dtu.dk/services/NetMHCpan/" target="_blank">NetMHCpan</a>, an open-source machine learning tool, to predict the binding affinity of all the antigens in our list. The predicted affinity is given as an IC50 value in nM, so a lower number indicates higher affinity. Peptides with IC50 values &lt; 50 nM are generally considered high affinity, and those &lt; 500 nM intermediate affinity. However, binding affinity alone does not determine the strength of the immune response: some antigens might bind very effectively without triggering an intense immune response.</p>
+<p class="text-justify">Finally, we sorted the candidate antigens using a scoring function that combines the two types of predictions (a linear combination of normalized scores).</p>
+</section>
+<section class="page-section">
+<h2>The YETIpredict web application</h2>
+<p class="text-justify text-muted"><strong>What we produced: </strong>a web interface to explore the results of the differential expression analysis, the variant analysis and the immune predictions </p>
+<p class="text-justify">In order to make our pipeline as easy-to-use as possible, we created a web application that allows users to run all the steps of our pipeline that do not require a lot of computing resources. You can reach it <a href="http://igem2015.issb.genopole.fr" target="_blank">here</a>.</p>
+<p class="text-justify">Visualisations of the differential expression and variant analyses are available, as well as interactive tables that allow users to easily browse, sort and search the list of differentially expressed genes or the list of candidate antigens.</p>
+<img src="https://static.igem.org/mediawiki/2015/f/fe/Webserver.png" class="img-responsive"/>
+<p class="text-center"><strong>Figure 1:</strong> Input page of YETIpredict</p>
+</section>
+<!--p class="lead text-justify">We used NetChop and NetMHCpan, two machine learning tools, to predict proteasome cleavage probability and MHC-I affinity. We sorted the potential antigens by a score that combines the two types of predictions. We created a web application that allows users to run this last step of prediction. You can reach it <a href="http://igem2015.issb.genopole.fr" target="_blank">here</a>.</p>
+<img src="" class='img-responsive'-->
 </div><!-- end .side-body -->
 </div> <!-- end .container -->