TCGA Reanalysis Pipeline - RNAseq
Modules included in the TCGA RNAseq Cancer pipeline.
TCGA Download
- omics_pipe.modules.TCGA_download.TCGA_download(sample, TCGA_download_flag)
Downloads and unzips TCGA data listed in a Manifest.xml file downloaded from CGHub.
- input:
- TCGA XML file
- output:
- downloaded files from TCGA
- citation:
- The Cancer Genome Atlas
- link:
- https://cghub.ucsc.edu/software/downloads.html
- parameters from parameters file:
TCGA_XML_FILE:
TCGA_KEY:
TCGA_OUTPUT_PATH:
CGATOOLS_VERSION:
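The download step is driven by the CGHub manifest given in TCGA_XML_FILE. A minimal sketch of pulling the analysis IDs out of such a manifest, assuming each download is described by an analysis_id element somewhere under the root (for example ResultSet/Result/analysis_id); it only parses the XML and is not the module's own code.

```python
import xml.etree.ElementTree as ET
from typing import List


def manifest_analysis_ids(xml_text: str) -> List[str]:
    """Return every analysis_id listed in a CGHub-style manifest.

    Assumption: each entry carries an <analysis_id> element; the exact
    nesting depth does not matter because iter() walks the whole tree.
    """
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("analysis_id")
            if el.text and el.text.strip()]
```

Each returned ID identifies one analysis (one set of files) to fetch with the CGHub download tools, authenticated with the key given in TCGA_KEY.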
FASTQC
- omics_pipe.modules.fastqc.fastqc(sample, fastqc_flag)
Runs a quality-control check of raw .fastq files using FastQC.
- input:
- .fastq file
- output:
- a folder and a zipped folder containing HTML, text, and image files
- citation:
- Babraham Bioinformatics
- link:
- http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- parameters from parameters file:
RAW_DATA_DIR:
QC_PATH:
FASTQC_VERSION:
COMPRESSION:
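The parameters above map naturally onto a FastQC command line. A minimal sketch, assuming a hypothetical naming convention of RAW_DATA_DIR/<sample>.fastq, with a .fastq.gz suffix when COMPRESSION is set to the hypothetical value "GZIP"; FastQC reads gzipped input directly, and -o sets the directory that receives the report folder and .zip file.

```python
import os
from typing import List


def fastqc_command(sample: str, params: dict) -> List[str]:
    """Build a FastQC command line for one sample (file naming is hypothetical)."""
    # Hypothetical convention: compressed input carries a .fastq.gz suffix
    suffix = ".fastq.gz" if params.get("COMPRESSION") == "GZIP" else ".fastq"
    fastq = os.path.join(params["RAW_DATA_DIR"], sample + suffix)
    # -o tells FastQC where to write the report folder and its zipped copy
    return ["fastqc", "-o", params["QC_PATH"], fastq]
```

Returning an argument list rather than a shell string avoids quoting problems with unusual paths; it can be passed straight to subprocess.run.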
STAR Aligner
- omics_pipe.modules.star.star(sample, star_flag)
Runs STAR to align .fastq files.
- input:
- .fastq file
- output:
- Aligned.out.bam
- citation:
- Dobin et al. 2012. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. doi: 10.1093/bioinformatics/bts635
- link:
- https://code.google.com/p/rna-star/
- parameters from parameters file:
ENDS:
RAW_DATA_DIR:
STAR_INDEX:
STAR_OPTIONS:
STAR_RESULTS:
SAMTOOLS_VERSION:
STAR_VERSION:
COMPRESSION:
REF_GENES:
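A sketch of how these parameters could be assembled into a STAR call. Assumptions, all hypothetical: ENDS takes the values "PE" or "SE", paired files are named <sample>_R1/_R2, and COMPRESSION is "GZIP" for gzipped input. The STAR flags used (--genomeDir, --readFilesIn, --readFilesCommand, --outFileNamePrefix) are standard; STAR writes Aligned.out.sam by default, so producing Aligned.out.bam needs either --outSAMtype BAM Unsorted in STAR_OPTIONS or a samtools conversion afterwards.

```python
import os
import shlex
from typing import List


def star_command(sample: str, params: dict) -> List[str]:
    """Sketch of a STAR alignment command; file naming is hypothetical."""
    raw = params["RAW_DATA_DIR"]
    gz = params.get("COMPRESSION") == "GZIP"
    ext = ".fastq.gz" if gz else ".fastq"
    if params["ENDS"] == "PE":
        reads = [os.path.join(raw, f"{sample}_R1{ext}"),
                 os.path.join(raw, f"{sample}_R2{ext}")]
    else:
        reads = [os.path.join(raw, sample + ext)]
    # Trailing separator: STAR then writes <dir>/Aligned.out.* inside the folder
    prefix = os.path.join(params["STAR_RESULTS"], sample) + os.sep
    cmd = ["STAR", "--genomeDir", params["STAR_INDEX"],
           "--readFilesIn", *reads,
           "--outFileNamePrefix", prefix]
    if gz:
        # STAR does not decompress by itself; it pipes each file through zcat
        cmd += ["--readFilesCommand", "zcat"]
    # Free-form user options (e.g. "--runThreadN 8") are appended unchanged
    cmd += shlex.split(params.get("STAR_OPTIONS", ""))
    return cmd
```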
HTSeq-count
- omics_pipe.modules.htseq.htseq(sample, htseq_flag)
Runs htseq-count to get raw count data from alignments.
- input:
- Aligned.out.sort.bam
- output:
- counts.txt
- citation:
- Simon Anders, EMBL
- link:
- http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html
- parameters from parameters file:
STAR_RESULTS:
HTSEQ_OPTIONS:
REF_GENES:
HTSEQ_RESULTS:
TEMP_DIR:
SAMTOOLS_VERSION:
BAM_FILE_NAME:
PYTHON_VERSION:
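htseq-count writes one tab-separated line per feature, followed by summary counters whose names start with a double underscore (__no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique). A minimal sketch of reading counts.txt, assuming single-sample, two-column output, separates the two and reports the fraction of reads assigned to genes as a quick sanity check.

```python
from typing import Dict, Tuple


def parse_htseq_counts(text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split htseq-count output into per-gene counts and summary counters."""
    genes: Dict[str, int] = {}
    summary: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        feature, count = line.rsplit("\t", 1)
        # Summary counters are prefixed with "__"; everything else is a gene
        target = summary if feature.startswith("__") else genes
        target[feature] = int(count)
    return genes, summary


def assigned_fraction(genes: Dict[str, int], summary: Dict[str, int]) -> float:
    """Fraction of counted reads (or pairs) that were assigned to a gene."""
    assigned = sum(genes.values())
    total = assigned + sum(summary.values())
    return assigned / total if total else 0.0
```

A low assigned fraction with a large __no_feature count often points to a mismatch between the alignment and the REF_GENES annotation, for example a strandedness setting in HTSEQ_OPTIONS that does not fit the library.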
RSeQC
- omics_pipe.modules.rseqc.rseqc(sample, rseqc_flag)
Runs RSeQC to determine insert size as a quality check of the alignment.
- input:
- .bam
- output:
- PDF plot
- link:
- http://rseqc.sourceforge.net/
- parameters from parameters file:
STAR_RESULTS:
QC_PATH:
BAM_FILE_NAME:
RSEQC_REF:
RSEQC_VERSION:
TEMP_DIR:
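RSeQC reports insert size as the inner distance between mates, measured along the transcript using a gene model (presumably what RSEQC_REF supplies). For intuition only, the sketch below swaps in a simpler technique: it reads the SAM TLEN field of properly paired, primary alignments. It is not RSeQC's algorithm, and for spliced RNA-seq reads TLEN also counts any introns between the mates, which is why a gene-model-aware tool is preferred.

```python
import statistics
from typing import Iterable, List


def insert_sizes(sam_lines: Iterable[str]) -> List[int]:
    """Collect genomic fragment sizes (TLEN) from properly paired SAM records."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # 0x2 = properly paired; 0x900 = secondary or supplementary alignment.
        # The leftmost mate has a positive TLEN, so TLEN > 0 counts each pair once.
        if flag & 0x2 and not flag & 0x900 and tlen > 0:
            sizes.append(tlen)
    return sizes


def median_insert_size(sizes: List[int]) -> float:
    """Median of the collected sizes, or 0.0 if nothing qualified."""
    return float(statistics.median(sizes)) if sizes else 0.0
```

Subtracting both read lengths from a fragment size gives the corresponding inner distance on unspliced fragments.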
Fusion Catcher
- omics_pipe.modules.fusion_catcher.fusion_catcher(sample, fusion_catcher_flag)
Detects fusion genes in paired-end RNAseq data.
- input:
- paired end .fastq files
- output:
- list of candidate fusion genes
- citation:
- S. Kangaspeska, S. Hultsch, H. Edgren, D. Nicorici, A. Murumägi, O.P. Kallioniemi, Reanalysis of RNA-sequencing data reveals several additional fusion genes with multiple isoforms, PLOS One, Oct. 2012. http://dx.plos.org/10.1371/journal.pone.0048745
- link:
- https://code.google.com/p/fusioncatcher
- parameters from parameters file:
ENDS:
RAW_DATA_DIR:
FUSION_RESULTS:
FUSIONCATCHERBUILD_DIR:
TEMP_DIR:
SAMTOOLS_VERSION:
FUSIONCATCHER_VERSION:
FUSIONCATCHER_OPTIONS:
TISSUE:
PYTHON_VERSION:
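FusionCatcher is typically invoked with -d (the pre-built organism data, here FUSIONCATCHERBUILD_DIR), -i (input FASTQ files or a directory of them) and -o (output directory). A minimal sketch, assuming a hypothetical per-sample input folder RAW_DATA_DIR/<sample> and a hypothetical ENDS value of "PE"; it refuses single-end input because this module is documented for paired-end data.

```python
import os
import shlex
from typing import List


def fusioncatcher_command(sample: str, params: dict) -> List[str]:
    """Sketch of a FusionCatcher call for one paired-end sample."""
    if params["ENDS"] != "PE":
        # This module is documented for paired-end RNAseq data only
        raise ValueError("fusion_catcher expects paired-end reads (ENDS = PE)")
    cmd = [
        "fusioncatcher",
        "-d", params["FUSIONCATCHERBUILD_DIR"],               # pre-built organism data
        "-i", os.path.join(params["RAW_DATA_DIR"], sample),   # hypothetical FASTQ folder
        "-o", os.path.join(params["FUSION_RESULTS"], sample), # per-sample results
    ]
    # Extra user options are appended unchanged
    return cmd + shlex.split(params.get("FUSIONCATCHER_OPTIONS", ""))
```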
BWA/SNPiR
BWA
- omics_pipe.modules.bwa.bwa1(sample, bwa1_flag)
Runs the BWA aligner on read 1 of paired-end reads.
- input:
- .fastq
- output:
- .sam
- citation:
- Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]
- link:
- http://bio-bwa.sourceforge.net/bwa.shtml
- parameters from parameters file:
BWA_RESULTS:
TEMP_DIR:
SAMTOOLS_VERSION:
BWA_VERSION:
BWA_INDEX:
RAW_DATA_DIR:
GATK_READ_GROUP_INFO:
COMPRESSION:
- omics_pipe.modules.bwa.bwa2(sample, bwa2_flag)
Runs the BWA aligner on read 2 of paired-end reads.
- input:
- .fastq
- output:
- .sam
- citation:
- Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]
- link:
- http://bio-bwa.sourceforge.net/bwa.shtml
- parameters from parameters file:
BWA_RESULTS:
TEMP_DIR:
SAMTOOLS_VERSION:
BWA_VERSION:
BWA_INDEX:
RAW_DATA_DIR:
GATK_READ_GROUP_INFO:
COMPRESSION:
- omics_pipe.modules.bwa.bwa_RNA(sample, bwa_flag)
Runs the BWA aligner on single-end reads.
- input:
- .fastq
- output:
- .sam
- citation:
- Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]
- link:
- http://bio-bwa.sourceforge.net/bwa.shtml
- parameters from parameters file:
BWA_RESULTS:
TEMP_DIR:
SAMTOOLS_VERSION:
BWA_VERSION:
BWA_INDEX:
RAW_DATA_DIR:
GATK_READ_GROUP_INFO:
COMPRESSION:
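All three BWA steps take GATK_READ_GROUP_INFO, because GATK requires every read to belong to a read group with a sample (SM) tag. A minimal sketch of building an @RG line and passing it with -R, using hypothetical ID/SM/PL values and `bwa mem` purely for illustration; the modules above may use a different BWA subcommand. BWA expects the tab separators inside -R as literal "\t" escapes and rejects real tab characters.

```python
from typing import List


def read_group(sample: str, platform: str = "ILLUMINA") -> str:
    """Build a SAM @RG header line; the ID/SM/PL values are hypothetical."""
    # Raw f-string: "\t" stays a backslash followed by "t", as BWA expects
    return rf"@RG\tID:{sample}\tSM:{sample}\tPL:{platform}"


def bwa_mem_command(sample: str, index: str, fastqs: List[str]) -> List[str]:
    """Illustrative `bwa mem` call; BWA_INDEX supplies the index prefix."""
    return ["bwa", "mem", "-R", read_group(sample), index, *fastqs]
```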
SNPiR
- omics_pipe.modules.snpir_variants.snpir_variants(sample, snpir_variants_flag)
Calls variants using the SNPiR pipeline.
- input:
- Aligned.out.sort.bam or accepted_hits.bam
- output:
- final_variants.vcf file
- citation:
- Piskol, R., et al. (2013). “Reliable Identification of Genomic Variants from RNA-Seq Data.” The American Journal of Human Genetics 93(4): 641-651.
- link:
- http://lilab.stanford.edu/SNPiR/
- parameters from parameters file:
VARIANT_RESULTS:
TEMP_DIR:
SAMTOOLS_VERSION:
BWA_VERSION:
PICARD_VERSION:
GATK_VERSION:
BEDTOOLS_VERSION:
UCSC_TOOLS_VERSION:
GENOME:
REPEAT_MASKER:
SNPIR_ANNOTATION:
RNA_EDIT:
DBSNP:
MILLS:
G1000:
WORKING_DIR:
BWA_RESULTS:
SNPIR_VERSION:
SNPIR_CONFIG:
SNPIR_DIR:
ENCODING:
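One of the SNPiR filtering steps removes candidate variants at known RNA editing sites, since A-to-I editing looks like an A>G mismatch in RNA-seq. A minimal sketch of that single step, assuming the sites referenced by RNA_EDIT have already been loaded into a set of (chromosome, 1-based position) pairs; it is an illustration, not SNPiR's own implementation, and covers none of its other filters (repeats, splice-junction proximity, homopolymers).

```python
from typing import Iterable, Iterator, Set, Tuple


def drop_known_edits(vcf_lines: Iterable[str],
                     edit_sites: Set[Tuple[str, int]]) -> Iterator[str]:
    """Yield VCF lines whose (CHROM, POS) is not a known RNA editing site.

    VCF POS is 1-based; if the editing list comes from a BED file
    (0-based starts), add 1 to each start before building edit_sites.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep header lines unchanged
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if (chrom, int(pos)) not in edit_sites:
            yield line
```

Using a set keeps each lookup constant-time, which matters when the editing database holds millions of sites.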
Filter Variants
- omics_pipe.modules.filter_variants.filter_variants(sample, filter_variants_flag)
Filters variants to remove common variants.
- input:
- .bam or .sam file
- output:
- .vcf file
- citation:
- Piskol, R., et al. (2013). “Reliable Identification of Genomic Variants from RNA-Seq Data.” The American Journal of Human Genetics 93(4): 641-651.
- link:
- http://lilab.stanford.edu/SNPiR/
- parameters from parameters file:
VARIANT_RESULTS:
TEMP_DIR:
SAMTOOLS_VERSION:
BWA_VERSION:
PICARD_VERSION:
GATK_VERSION:
BEDTOOLS_VERSION:
UCSC_TOOLS_VERSION:
GENOME:
REPEAT_MASKER:
SNPIR_ANNOTATION:
RNA_EDIT:
DBSNP:
MILLS:
G1000:
WORKING_DIR:
BWA_RESULTS:
SNPIR_VERSION:
SNPIR_CONFIG:
SNPIR_DIR:
SNPEFF_VERSION:
dbNSFP:
VCFTOOLS_VERSION:
SNP_FILTER_OUT_REF:
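Removing common variants usually means dropping sites whose population allele frequency exceeds a cutoff. A minimal sketch, assuming the VCF has already been annotated (for example from dbNSFP) with a frequency stored under a hypothetical INFO key AF_POP, and using a hypothetical 1% default cutoff; records without a frequency are kept, and at multi-allelic sites the highest frequency decides.

```python
from typing import Iterable, Iterator, Optional


def info_value(info: str, key: str) -> Optional[str]:
    """Return the value of KEY in a VCF INFO field, or None if absent."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def drop_common(vcf_lines: Iterable[str], af_key: str = "AF_POP",
                max_af: float = 0.01) -> Iterator[str]:
    """Yield header lines and records whose population frequency is below max_af."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]  # column 8 is INFO
        value = info_value(info, af_key) or ""
        # One frequency per ALT allele; skip missing values written as "."
        freqs = [float(v) for v in value.split(",") if v not in ("", ".")]
        if not freqs or max(freqs) < max_af:
            yield line
```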
IntOGen
- omics_pipe.modules.intogen.intogen(sample, intogen_flag)
Runs IntOGen to rank mutations by their implication for the cancer phenotype. Runs after variant calling.
- input:
- .vcf
- output:
- variant list
- citation:
- Gonzalez-Perez et al. 2013. IntOGen-mutations identifies cancer drivers across tumor types. Nature Methods 10, 1081-1082.
- link:
- http://www.intogen.org/
- parameters from parameters file:
VCF_FILE:
INTOGEN_OPTIONS:
INTOGEN_RESULTS:
INTOGEN_VERSION:
USERNAME:
WORKING_DIR:
TEMP_DIR:
SCHEDULER:
VARIANT_RESULTS:
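IntOGen consumes the variants produced by the calling and filtering steps (VCF_FILE). If a tab-delimited input is preferred over VCF, a conversion could look like the sketch below. The column layout (chromosome, position, strand, reference, alternate, sample) is an assumption to check against the IntOGen input documentation; VCF coordinates are on the forward strand, hence "+".

```python
from typing import Iterable, List


def vcf_to_intogen_rows(vcf_lines: Iterable[str], sample: str) -> List[str]:
    """Convert VCF records to tab-separated rows (assumed IntOGen-style layout)."""
    rows = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # headers carry no variants
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one row per alternate allele
            rows.append("\t".join([chrom, pos, "+", ref, allele, sample]))
    return rows
```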
OncoRep Cancer Report
- omics_pipe.modules.BreastCancer_RNA_report.BreastCancer_RNA_report(sample, BreastCancer_RNA_report_flag)
Runs an R script with knitr to produce a report from the RNAseq pipeline results.
- input:
- results from the other steps in the RNAseq pipeline
- output:
- HTML report
- citation:
- Meissner
- parameters from parameters file:
WORKING_DIR:
R_VERSION:
REPORT_RESULTS:
PARAMS_FILE:
TABIX_VERSION:
TUMOR_TYPE:
GENELIST:
COSMIC:
CLINVAR:
PHARMGKB_rsID:
PHARMGKB_Allele:
DRUGBANK:
CADD:
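The report step drives R from Python. A minimal sketch of building such a call, which swaps in rmarkdown::render (which runs knitr internally) for the module's own R script; the .Rmd file name and output locations are hypothetical. Arguments are quoted with JSON string escaping, which is also valid R string syntax, so paths containing quotes or backslashes survive intact.

```python
import json
from typing import List


def r_string(value: str) -> str:
    """Quote a Python string as an R string literal."""
    # JSON escapes (\" \\ \n \uXXXX) are all valid inside R double-quoted strings
    return json.dumps(value)


def report_command(rmd: str, output_file: str, output_dir: str) -> List[str]:
    """Sketch: render a knitr-based R Markdown report through Rscript."""
    expr = "rmarkdown::render({}, output_file = {}, output_dir = {})".format(
        r_string(rmd), r_string(output_file), r_string(output_dir))
    return ["Rscript", "-e", expr]
```

In practice the paths to the inputs listed above (for example COSMIC, CLINVAR, DRUGBANK) would also need to reach the R side, such as through the PARAMS_FILE the module already takes.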