Pipeline
All the information presented on this page (quality-control, differential expression analysis, data visualisation, variant discovery) is also available as a PDF file.
Data processing and quality control
What we produced: FASTQ files (when they were not already available), FastQC reports, and BAM and SAM files.
Figure 1: schematic overview of the pipeline for RNA-seq data analysis.
Differential expression analysis
What we produced: script for differential expression analysis, table with read counts (tab separated format, 7 columns, ENSG ids).
RNA-seq data can be difficult to interpret, especially when quantifying differential expression. We therefore adopted a simple counting-based method: for each gene and each sample, we count the number of available reads, then test for significant differences between two experimental conditions or groups.
We wrote an R script that automatically creates a PDF file (in the current directory) with all the figures necessary for visual inspection and interpretation of the results. The input is a tab-separated file of read counts.
ensembl_id | melanocyte_1 | melanocyte_2 | melanoma_1 | melanoma_2 |
---|---|---|---|---|
ENSG00000000003 | 1964 | 2409 | 2328 | 2451 |
ENSG00000000005 | 0 | 2 | 10 | 12 |
ENSG00000000419 | 15122 | 19592 | 38225 | 36654 |
ENSG00000000457 | 12129 | 14893 | 7483 | 7812 |
ENSG00000000460 | 21930 | 25575 | 13123 | 13840 |
ENSG00000000938 | 48 | 58 | 26 | 42 |
ENSG00000000971 | 125 | 229 | 124 | 236 |
ENSG00000001036 | 11611 | 14125 | 14067 | 13518 |
ENSG00000001084 | 11429 | 13795 | 3549 | 3279 |
Figure 2: Example input format for DE analysis.
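To make the input format concrete, here is a minimal Python sketch that parses such a tab-separated count table. It is not the R script described above; the function name `read_counts` and the in-memory `EXAMPLE` string (the first rows of Figure 2) are assumptions made for illustration.

```python
import csv
import io

# First rows of the example input (Figure 2), tab separated.
EXAMPLE = (
    "ensembl_id\tmelanocyte_1\tmelanocyte_2\tmelanoma_1\tmelanoma_2\n"
    "ENSG00000000003\t1964\t2409\t2328\t2451\n"
    "ENSG00000000005\t0\t2\t10\t12\n"
    "ENSG00000000419\t15122\t19592\t38225\t36654\n"
)


def read_counts(handle):
    """Return (sample_names, {gene_id: [count per sample]}) from a TSV handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds the Ensembl gene id
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene_id}: expected {len(samples)} counts, got {len(values)}"
            )
        counts[gene_id] = [int(v) for v in values]
    return samples, counts


samples, counts = read_counts(io.StringIO(EXAMPLE))
print(samples)
print(counts["ENSG00000000005"])
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the `io.StringIO` wrapper.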
We tested two designs, shown in the tables below: normal cells vs cancerous cells (4 samples), and cancerous cells vs drug-treated cancerous cells (4 samples).
Sample name | Condition |
---|---|
melanocyte_1 | M |
melanocyte_2 | M |
melanoma_1 | C |
melanoma_2 | C |

Sample name | Condition |
---|---|
melanoma_1 | C |
melanoma_2 | C |
melanoma_drug_1 | D |
melanoma_drug_2 | D |
Tables 1 and 2: tested designs.
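As an illustration of how a design maps samples to the two groups being compared, the following Python sketch computes a per-gene log2 fold change between conditions M and C, using a few rows of the example input. It covers only the effect-size part of a counting-based comparison: it skips library-size normalisation and any significance test, and it is not the R script described above. The function name `log2_fold_change` and the pseudocount of 1 are assumptions.

```python
import math

# Design for normal vs cancerous cells (sample name -> condition).
DESIGN = {
    "melanocyte_1": "M",
    "melanocyte_2": "M",
    "melanoma_1": "C",
    "melanoma_2": "C",
}

# A few rows taken from the example input.
SAMPLES = ["melanocyte_1", "melanocyte_2", "melanoma_1", "melanoma_2"]
COUNTS = {
    "ENSG00000000419": [15122, 19592, 38225, 36654],
    "ENSG00000000457": [12129, 14893, 7483, 7812],
    "ENSG00000001084": [11429, 13795, 3549, 3279],
}


def log2_fold_change(counts, samples, design, reference, treatment, pseudocount=1.0):
    """Per-gene log2(mean treatment count / mean reference count).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one condition.
    """
    ref_idx = [i for i, s in enumerate(samples) if design[s] == reference]
    trt_idx = [i for i, s in enumerate(samples) if design[s] == treatment]
    result = {}
    for gene, values in counts.items():
        ref_mean = sum(values[i] for i in ref_idx) / len(ref_idx)
        trt_mean = sum(values[i] for i in trt_idx) / len(trt_idx)
        result[gene] = math.log2((trt_mean + pseudocount) / (ref_mean + pseudocount))
    return result


lfc = log2_fold_change(COUNTS, SAMPLES, DESIGN, reference="M", treatment="C")
for gene, value in lfc.items():
    print(f"{gene}\t{value:+.2f}")
```

A positive value means more reads in the cancerous samples than in the normal ones; a negative value means fewer. The second design can be explored the same way by swapping in its sample names and conditions.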
Visual exploration of the samples
Prior to checking distances between our samples, we applied a regularized-logarithm transformation (rlog) to stabilise the variance across the mean. The effects of the transformation are shown in the figure below.
Figure 3: Effect of the regularized-logarithm transformation on 'melanocyte_1' and 'melanocyte_2' samples.
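The idea of transforming counts before measuring distances between samples can be sketched in Python as follows. This sketch does not reproduce the rlog transformation; it swaps in a plain log2(count + 1) transform as a simpler stand-in, then computes Euclidean distances between samples on the example counts. The function names `log2_transform` and `sample_distances` are assumptions.

```python
import math

SAMPLES = ["melanocyte_1", "melanocyte_2", "melanoma_1", "melanoma_2"]
# Rows of the example input for the differential expression analysis.
COUNTS = {
    "ENSG00000000003": [1964, 2409, 2328, 2451],
    "ENSG00000000005": [0, 2, 10, 12],
    "ENSG00000000419": [15122, 19592, 38225, 36654],
    "ENSG00000000457": [12129, 14893, 7483, 7812],
    "ENSG00000000460": [21930, 25575, 13123, 13840],
    "ENSG00000000938": [48, 58, 26, 42],
    "ENSG00000000971": [125, 229, 124, 236],
    "ENSG00000001036": [11611, 14125, 14067, 13518],
    "ENSG00000001084": [11429, 13795, 3549, 3279],
}


def log2_transform(counts):
    """Apply log2(count + 1) to every value (a stand-in, not rlog)."""
    return {gene: [math.log2(v + 1) for v in values] for gene, values in counts.items()}


def sample_distances(values_by_gene, samples):
    """Euclidean distance between every pair of samples (table columns)."""
    columns = list(zip(*values_by_gene.values()))  # one tuple of gene values per sample
    distances = {}
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            squared = sum((a - b) ** 2 for a, b in zip(columns[i], columns[j]))
            distances[(samples[i], samples[j])] = math.sqrt(squared)
    return distances


distances = sample_distances(log2_transform(COUNTS), SAMPLES)
for (first, second), value in distances.items():
    print(f"{first}\t{second}\t{value:.2f}")
```

Without a transformation, the few genes with the highest counts dominate the distance. With only nine genes, the numbers printed here are illustrative only.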
After identification of genes that are both overexpressed and mutated in tumor samples, we want to know whether good candidate antigens can be predicted. Read more about the prediction step.