Difference between revisions of "Team:Evry/Software/Pipeline"

Revision as of 00:09, 21 November 2015

All the information presented on this page (quality control, differential expression analysis, data visualisation, variant discovery) is also available as a PDF file.

Data processing and quality control

What we produced: FASTQ files (when not already available), FastQC reports, and SAM and BAM files.

Figure 1: Schematic overview of the pipeline for RNA-seq data analysis.

Differential expression analysis

What we produced: a script for differential expression analysis and a table of read counts (tab-separated format, 7 columns, ENSG ids).

RNA-seq data can be difficult to interpret, especially when quantifying differential expression. We therefore adopted a simple count-based method for the analysis: for each gene and each sample, we count the number of available reads, then test for significant differences between two experimental conditions or groups.

We wrote an R script that automatically creates a PDF file (in the current directory) with all the figures necessary for visual inspection and interpretation of the results. The input is a tab-separated file with read counts.

ensembl_id	melanocyte_1	melanocyte_2	melanome_1	melanome_2
ENSG00000000003	1964	2409	2328	2451
ENSG00000000005	0	2	10	12
ENSG00000000419	15122	19592	38225	36654
ENSG00000000457	12129	14893	7483	7812
ENSG00000000460	21930	25575	13123	13840
ENSG00000000938	48	58	26	42
ENSG00000000971	125	229	124	236
ENSG00000001036	11611	14125	14067	13518
ENSG00000001084	11429	13795	3549	3279

Figure 2: Example input format for DE analysis.
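
The counting idea described above can be illustrated with a minimal Python sketch. This is not our R script: the function names (`read_counts`, `log2_fold_changes`), the sample-name prefixes used to define the two groups, and the pseudocount are assumptions made for illustration only. The sketch parses a table in the format of Figure 2, scales each sample to counts per million, and reports a log2 fold change per gene. It does not perform the significance test, and library sizes computed from a nine-gene excerpt are not meaningful beyond showing the mechanics.

```python
import csv
import io
import math

# Excerpt of the example input shown in Figure 2 (tab-separated).
EXAMPLE = (
    "ensembl_id\tmelanocyte_1\tmelanocyte_2\tmelanome_1\tmelanome_2\n"
    "ENSG00000000003\t1964\t2409\t2328\t2451\n"
    "ENSG00000000005\t0\t2\t10\t12\n"
    "ENSG00000000419\t15122\t19592\t38225\t36654\n"
    "ENSG00000000457\t12129\t14893\t7483\t7812\n"
    "ENSG00000000460\t21930\t25575\t13123\t13840\n"
    "ENSG00000000938\t48\t58\t26\t42\n"
    "ENSG00000000971\t125\t229\t124\t236\n"
    "ENSG00000001036\t11611\t14125\t14067\t13518\n"
    "ENSG00000001084\t11429\t13795\t3549\t3279\n"
)


def read_counts(handle):
    """Parse a tab-separated count table into (sample names, {gene id: counts})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column holds the Ensembl gene id
    counts = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return samples, counts


def log2_fold_changes(samples, counts, control_prefix, case_prefix, pseudocount=1.0):
    """Return {gene id: log2(mean case CPM / mean control CPM)}.

    Groups are chosen by sample-name prefix (an assumption of this sketch);
    the pseudocount avoids taking the log of zero for unexpressed genes.
    """
    n = len(samples)
    # Library size per sample = column sum over all genes in the table.
    totals = [sum(row[i] for row in counts.values()) for i in range(n)]
    control = [i for i, s in enumerate(samples) if s.startswith(control_prefix)]
    case = [i for i, s in enumerate(samples) if s.startswith(case_prefix)]
    result = {}
    for gene, row in counts.items():
        cpm = [row[i] / totals[i] * 1e6 for i in range(n)]
        mean_control = sum(cpm[i] for i in control) / len(control)
        mean_case = sum(cpm[i] for i in case) / len(case)
        result[gene] = math.log2((mean_case + pseudocount) / (mean_control + pseudocount))
    return result


samples, counts = read_counts(io.StringIO(EXAMPLE))
lfc = log2_fold_changes(samples, counts, "melanocyte_", "melanome_")
for gene, value in sorted(lfc.items(), key=lambda item: item[1]):
    print(f"{gene}\t{value:+.2f}")
```

To run it on a real file, the `io.StringIO(EXAMPLE)` argument would be replaced by an opened file handle, for example `open("counts.tsv", newline="")` (a hypothetical file name).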

After identifying genes that are both overexpressed and mutated in tumor samples, we want to know whether good candidate antigens can be predicted. Read more about the prediction step.


@@ Line 13: / Line 13: @@
 <div class="container">
      <div class="side-body" id="content-body">
-        <div id='top-menu-anchor'></div>
-        <div id="top-menu"></div>
          <div class="page-header">
              <h1>Pipeline</h1>
          </div>
-<p class='lead'>A detailed report on the genomics pipeline we created (quality-control, differential expression analysis, data visualisation, variant discovery) can be found <a href="https://static.igem.org/mediawiki/2015/c/c4/Igem_pipeline_description.pdf" _target="_blank">here</a>.</p>
+<p class="text-justify">All the information presented on this page (quality control, differential expression analysis, data visualisation, variant discovery) is also available as a <a href="https://static.igem.org/mediawiki/2015/c/c4/Igem_pipeline_description.pdf" target="_blank">PDF file</a>.</p><br>
+        <div id='top-menu-anchor'></div>
+        <div id="top-menu">
+<ul>
+  <li style="height: 34px; padding-top: 8px;" class="text-muted"><strong>Jump to:</strong> &emsp;</li>
+  <li>
+    <a href="#qc-step">Data processing &amp; quality control</a>
+  </li>
+  <li>
+    <a href="#deg-step">Differential expression analysis</a>
+  </li>
+  <li>
+    <a href="#var-step">Variant discovery</a>
+  </li>
+</ul>
+</div><br>
+<section class="page-section" id="qc-step">
+<h2>Data processing and quality control</h2>
+<p class="text-justify text-muted"><strong>What we produced: </strong>FASTQ files (when not already available), FastQC reports, and SAM and BAM files.</p>
+<img src="https://static.igem.org/mediawiki/2015/b/b7/Qc_pipeline.png" class="img-responsive" style="margin: 0 auto; max-width: 250px; height: auto;"/>
+<p class="text-center"><strong>Figure 1:</strong> Schematic overview of the pipeline for RNA-seq data analysis.</p>
+</section>
+<section class="page-section" id="deg-step">
+<h2>Differential expression analysis</h2>
+<p class="text-justify text-muted"><strong>What we produced: </strong>a script for differential expression analysis and a table of read counts (tab-separated format, 7 columns, ENSG ids).</p>
+<p class="text-justify">RNA-seq data can be difficult to interpret, especially when quantifying differential expression. We therefore adopted a simple count-based method for the analysis:
+for each gene and each sample, we count the number of available reads, then test for significant
+differences between two experimental conditions or groups.</p>
+<p class="text-justify">We wrote an R script that automatically creates a PDF file (in the current directory) with all the
+figures necessary for visual inspection and interpretation of the results. The input is a tab-separated file
+with read counts.</p>
+<br>
+<pre>ensembl_id	melanocyte_1	melanocyte_2	melanome_1	melanome_2
+ENSG00000000003	1964	2409	2328	2451
+ENSG00000000005	0	2	10	12
+ENSG00000000419	15122	19592	38225	36654
+ENSG00000000457	12129	14893	7483	7812
+ENSG00000000460	21930	25575	13123	13840
+ENSG00000000938	48	58	26	42
+ENSG00000000971	125	229	124	236
+ENSG00000001036	11611	14125	14067	13518
+ENSG00000001084	11429	13795	3549	3279</pre>
+<p class="text-center"><strong>Figure 2:</strong> Example input format for DE analysis.</p>
+</section>
+<section class="page-section" id="var-step">
+</section>
-<p class="lead">After idenfication of genes that are both overexpressed and mutated in tumor samples, we want to know if good candidate antigens can be predicted. Read more about the <a href="https://2015.igem.org/Team:Evry/Software/Prediction">prediction step</a>. </p>
+<p class="lead">After identifying genes that are both overexpressed and mutated in tumor samples, we want to know whether good candidate antigens can be predicted. Read more about the <a href="https://2015.igem.org/Team:Evry/Software/Predictions">prediction step</a>. </p>
 </div><!-- end .side-body -->
 </div> <!-- end .container -->