Team:Evry/Software

We developed a software tool to select the best tumor antigen candidates for the development of new personalized cancer immunotherapies, such as the one our iGEM team is developing.

General concept

Figure 1: general concept of our antigen prediction pipeline.

Our idea was to develop a pipeline to select the best candidate tumor antigens for a vaccine (immunotherapy). A good candidate must be tumor-specific and expressed in tumor cells at a level high enough to be presented to the immune system. It must also be processed efficiently by the antigen-processing machinery, that is, cleaved by the proteasome and presented on MHC class I molecules.
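The two selection criteria above (tumor specificity and sufficient expression) can be combined into a simple filter. The sketch below is a hypothetical illustration, not the team's code: the `Candidate` record, the gene names and the thresholds `min_tumor_expr` and `min_fold` are assumptions chosen for the example. A gene counts as tumor-specific if it carries a tumor-only variant or is strongly upregulated in the tumor.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str                 # gene identifier (hypothetical names below)
    tumor_expr: float         # normalized expression in tumor samples
    normal_expr: float        # normalized expression in normal samples
    tumor_only_variant: bool  # True if the gene carries a variant absent from normal tissue


def select_candidates(cands, min_tumor_expr=10.0, min_fold=4.0):
    """Keep candidates that are tumor-specific and expressed enough in the tumor.

    A candidate counts as tumor-specific if it carries a tumor-only variant or if
    its tumor/normal expression ratio is at least `min_fold`. Thresholds are
    illustrative assumptions, not values from the original pipeline.
    """
    selected = []
    for c in cands:
        expressed = c.tumor_expr >= min_tumor_expr
        # pseudocount of 1 avoids dividing by zero for genes silent in normal tissue
        fold = (c.tumor_expr + 1.0) / (c.normal_expr + 1.0)
        specific = c.tumor_only_variant or fold >= min_fold
        if expressed and specific:
            selected.append(c)
    # rank the survivors by tumor expression, highest first
    return sorted(selected, key=lambda c: c.tumor_expr, reverse=True)


example = [
    Candidate("GENE_A", 120.0, 5.0, False),    # strongly upregulated
    Candidate("GENE_B", 80.0, 70.0, True),     # not upregulated, but mutated
    Candidate("GENE_C", 3.0, 0.0, True),       # mutated, but barely expressed
    Candidate("GENE_D", 200.0, 150.0, False),  # expressed, but not tumor-specific
]
print([c.gene for c in select_candidates(example)])  # ['GENE_A', 'GENE_B']
```

Ranking by tumor expression is one possible ordering; a real pipeline could instead rank by fold change or by downstream MHC-binding scores.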

We intended to identify relevant targets by:

- Looking for differentially expressed genes (specifically, upregulated genes) in tumor tissue compared with normal tissue, using a transcriptomic analysis such as RNAseq
- Looking for genetic mutations found only in tumor tissue (identifying genetic variants)
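Both identification strategies can be sketched in a few lines. This is a simplified Python illustration, not the pipeline's actual code (the references point to R, DESeq and GATK-style tools). It assumes raw read counts per gene and already-called variants as `(chrom, pos, ref, alt)` tuples; all gene names, counts and coordinates are made up. Normalization uses the median-of-ratios size factors described by Anders and Huber (2010), but unlike DESeq it reports only a log2 fold change, with no negative binomial test or p-values. Variant comparison is an exact set difference, which ignores genotype and coverage differences.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, as described for DESeq (Anders and Huber, 2010).

    counts maps gene -> list of raw read counts (one per sample, same order).
    Genes with a zero count in any sample are left out of the reference.
    """
    n = len(next(iter(counts.values())))
    log_geo_mean = {g: sum(math.log(x) for x in c) / n
                    for g, c in counts.items() if all(x > 0 for x in c)}
    factors = []
    for j in range(n):
        ratios = [math.log(counts[g][j]) - lg for g, lg in log_geo_mean.items()]
        factors.append(math.exp(median(ratios)))
    return factors


def log2_fold_changes(counts, tumor_idx, normal_idx, pseudocount=1.0):
    """log2(mean normalized tumor count / mean normalized normal count) per gene."""
    sf = size_factors(counts)
    lfc = {}
    for g, c in counts.items():
        norm = [x / s for x, s in zip(c, sf)]
        tumor = sum(norm[i] for i in tumor_idx) / len(tumor_idx)
        normal = sum(norm[i] for i in normal_idx) / len(normal_idx)
        lfc[g] = math.log2((tumor + pseudocount) / (normal + pseudocount))
    return lfc


def upregulated_genes(counts, tumor_idx, normal_idx, min_lfc=1.0):
    """Genes with a tumor vs normal log2 fold change of at least min_lfc."""
    lfc = log2_fold_changes(counts, tumor_idx, normal_idx)
    return sorted(g for g, v in lfc.items() if v >= min_lfc)


def tumor_only_variants(tumor_calls, normal_calls):
    """Variants, as (chrom, pos, ref, alt) tuples, called in tumor but not in normal."""
    return set(tumor_calls) - set(normal_calls)


# Hypothetical data: samples 0-1 are normal tissue, samples 2-3 are tumor.
counts = {
    "GENE_A": [10, 10, 100, 100],
    "GENE_B": [50, 50, 50, 50],
    "GENE_C": [20, 20, 20, 20],
}
print(upregulated_genes(counts, tumor_idx=[2, 3], normal_idx=[0, 1]))  # ['GENE_A']

tumor_calls = [("chr1", 1000, "C", "T"), ("chr7", 500, "G", "A")]
normal_calls = [("chr7", 500, "G", "A")]
print(tumor_only_variants(tumor_calls, normal_calls))  # {('chr1', 1000, 'C', 'T')}
```

The pseudocount keeps genes with zero counts in normal tissue from producing an infinite fold change; a statistical test such as DESeq's would still be needed to decide which differences are significant.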

Once the targets are identified, the goal is not to express the whole corresponding genes but short fragments of them, delimited by putative proteasome cleavage sites, that could bind MHC class I (MHC-I) molecules and potentially trigger a tumor-specific immune response.
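The page does not detail how fragments are cut. As a hypothetical illustration only, the sketch below enumerates every window of 9 residues that covers a single amino-acid substitution (9-mers are a common length for MHC class I ligands; the length is an assumption). In a full pipeline these windows would then be scored with predictors such as the proteasome cleavage model of Kesmir et al. (2002) and NetMHCpan, cited below; those tools are not called here, and the protein sequence and mutation are made up.

```python
def mutant_peptides(protein, pos, alt_aa, length=9):
    """All peptides of `length` residues that overlap a single amino-acid substitution.

    protein: wild-type protein sequence (one-letter codes)
    pos:     0-based index of the substituted residue
    alt_aa:  mutant residue
    """
    if not 0 <= pos < len(protein):
        raise ValueError("position outside the protein")
    mutant = protein[:pos] + alt_aa + protein[pos + 1:]
    # a window starting at `start` covers pos when start <= pos < start + length
    first = max(0, pos - length + 1)
    last = min(pos, len(mutant) - length)
    return [mutant[start:start + length] for start in range(first, last + 1)]


peptides = mutant_peptides("ACDEFGHIKLMNPQRSTVWY", 10, "W")
print(len(peptides), peptides[0], peptides[-1])  # 9 DEFGHIKLW WNPQRSTVW
```

Every returned peptide contains the mutant residue, so any immune response it triggers would, in principle, be specific to the tumor variant rather than to the wild-type protein.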

RNAseq data

We developed the whole pipeline using RNAseq reads from melanocyte cell lines. All figures and data presented in this software section were obtained from this dataset.

Experimental procedures for data generation are described in: Vardabasso, C. et al. (2015). Histone Variant H2A.Z.2 Mediates Proliferation and Drug Sensitivity of Malignant Melanoma. Molecular Cell, 59:75-88

The data were retrieved from the NCBI Sequence Read Archive (SRA), where both the study record and the sequencing data are available.

Code availability

Please see the GitHub repository of our antigen prediction pipeline for code and documentation.


References

Anders, S. and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.
Bao, R. et al. (2014). Review of Current Methods, Applications, and Data Management for the Bioinformatics Analysis of Whole Exome Sequencing. Cancer Informatics, 13(s2):67–82
Danecek, P. et al. (2011). The Variant Call Format and VCFtools. Bioinformatics, 27:2156-2158.
DePristo, M. et al. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43:491-498.
McKenna, A. et al. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20:1297-1303.
Piskol, R. et al. (2013). Reliable identification of genomic variants from RNA-Seq data. The American Journal of Human Genetics, 93:641–651.
Quinlan, A. and Hall, I. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26:841-842.
R Development Core Team. (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org.
San Lucas, F.A. et al. (2012). Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools. Bioinformatics, 28:421-422.
Sherry, S.T. et al. (2001). dbSNP: the NCBI database of genetic variation. Nucleic Acids Research, 29:308-311.
Van der Auwera, G.A. et al. (2013). From FastQ data to high-confidence variant calls: the Genome Analysis Toolkit Best Practices Pipeline. Current Protocols in Bioinformatics, 43:11.10.1-11.10.33.
Vardabasso, C. et al. (2015). Histone variant H2A.Z.2 mediates proliferation and drug sensitivity of malignant melanoma. Molecular Cell, 59:75-88.
Wang, G., Peng, B. and Leal, S. (2014). Variant Association Tools for quality control and analysis of large-scale sequence and genotyping array data. The American Journal of Human Genetics, 94:770-783.
Wang, J., Duncan, D., Shi, Z. and Zhang, B. (2013). WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Research, 41(Web Server issue):W77-83.
Wang, K. et al. (2010). ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research, 38:e164.
Kesmir, C., Nussbaum, A.K., Schild, H., Detours, V. and Brunak, S. (2002). Prediction of proteasome cleavage motifs by neural networks. Protein Engineering, 15(4):287-296.
Nielsen, M. et al. (2007). NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS ONE, 2(8):e796.
Hoof, I. et al. (2009). NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics, 61(1):1-13.
