TCGA Reanalysis Pipeline - RNAseq

Modules included in the TCGA RNAseq Cancer pipeline.

TCGA Download

omics_pipe.modules.TCGA_download.TCGA_download(sample, TCGA_download_flag)

Downloads and unzips TCGA data from Manifest.xml downloaded from CGHub.

input:
TCGA XML file
output:
downloaded files from TCGA
citation:
The Cancer Genome Atlas
link:
https://cghub.ucsc.edu/software/downloads.html
parameters from parameters file:

TCGA_XML_FILE:

TCGA_KEY:

TCGA_OUTPUT_PATH:

CGATOOLS_VERSION:
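
Each module reads its settings from the parameters file. As a minimal sketch, assuming the parameters file has already been loaded into a plain dict, a pre-flight check for the keys listed above could look like the following. The helper name `missing_parameters` and the example values are hypothetical and not part of omics_pipe.

```python
# Keys listed above for the TCGA download module.
TCGA_DOWNLOAD_KEYS = [
    "TCGA_XML_FILE",
    "TCGA_KEY",
    "TCGA_OUTPUT_PATH",
    "CGATOOLS_VERSION",
]


def missing_parameters(params, required_keys):
    """Return the required keys that are absent or empty in params."""
    return [key for key in required_keys if params.get(key) in (None, "")]


# Hypothetical values, for illustration only.
example_params = {
    "TCGA_XML_FILE": "/path/to/manifest.xml",
    "TCGA_OUTPUT_PATH": "/path/to/output",
}
print(missing_parameters(example_params, TCGA_DOWNLOAD_KEYS))
```

The same check can be reused for any other module by swapping in that module's key list.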

FASTQC

omics_pipe.modules.fastqc.fastqc(sample, fastqc_flag)

QC check of raw .fastq files using FASTQC.

input:
.fastq file
output:
folder and zipped folder containing html, txt and image files
citation:
Babraham Bioinformatics
link:
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
parameters from parameters file:

RAW_DATA_DIR:

QC_PATH:

FASTQC_VERSION:

COMPRESSION:
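
For orientation, this step amounts to running the `fastqc` command on one sample's reads. The sketch below only builds the argument list. The `<sample>.fastq` file naming, the per-sample output folder, and the helper name `build_fastqc_command` are assumptions, not the module's actual code; `--outdir` is FastQC's option for the output directory.

```python
import os


def build_fastqc_command(sample, params):
    """Build a fastqc argument list for one sample (illustrative only)."""
    # Assumption: raw reads are stored as <RAW_DATA_DIR>/<sample>.fastq
    fastq = os.path.join(params["RAW_DATA_DIR"], sample + ".fastq")
    # Assumption: results go to a per-sample folder under QC_PATH
    outdir = os.path.join(params["QC_PATH"], sample)
    return ["fastqc", "--outdir", outdir, fastq]


cmd = build_fastqc_command(
    "sample1", {"RAW_DATA_DIR": "/data/raw", "QC_PATH": "/data/qc"}
)
print(" ".join(cmd))
```

A list like this could be handed to `subprocess.run(cmd, check=True)` once the output folder exists.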

STAR Aligner

omics_pipe.modules.star.star(sample, star_flag)

Runs STAR to align .fastq files.

input:
.fastq file
output:
Aligned.out.bam
citation:
A. Dobin et al., Bioinformatics 2012; doi: 10.1093/bioinformatics/bts635 “STAR: ultrafast universal RNA-seq aligner”
link:
https://code.google.com/p/rna-star/
parameters from parameters file:

ENDS:

RAW_DATA_DIR:

STAR_INDEX:

STAR_OPTIONS:

STAR_RESULTS:

SAMTOOLS_VERSION:

STAR_VERSION:

COMPRESSION:

REF_GENES:
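
To show how the parameters above might combine into a STAR call, here is a hedged sketch that builds the argument list. The `ENDS` values ("PE" and "SE"), the `_1.fastq` and `_2.fastq` mate naming, and the helper name `build_star_command` are assumptions made for illustration. `--genomeDir`, `--readFilesIn`, and `--outFileNamePrefix` are standard STAR options.

```python
import os
import shlex


def build_star_command(sample, params):
    """Build a STAR argument list for one sample (illustrative only)."""
    raw = params["RAW_DATA_DIR"]
    # Assumption: ENDS is "PE" for paired-end and "SE" for single-end data.
    if params["ENDS"] == "PE":
        # Assumption: mates are named <sample>_1.fastq and <sample>_2.fastq
        reads = [
            os.path.join(raw, sample + "_1.fastq"),
            os.path.join(raw, sample + "_2.fastq"),
        ]
    else:
        reads = [os.path.join(raw, sample + ".fastq")]
    # STAR prepends this prefix to output names such as Aligned.out.*
    prefix = os.path.join(params["STAR_RESULTS"], sample) + os.sep
    cmd = ["STAR", "--genomeDir", params["STAR_INDEX"], "--readFilesIn"]
    cmd += reads
    cmd += ["--outFileNamePrefix", prefix]
    # STAR_OPTIONS is treated here as a free-form string of extra flags.
    cmd += shlex.split(params.get("STAR_OPTIONS", ""))
    return cmd


# Hypothetical parameter values, for illustration only.
star_params = {
    "ENDS": "PE",
    "RAW_DATA_DIR": "/data/raw",
    "STAR_INDEX": "/ref/star_index",
    "STAR_RESULTS": "/data/star",
    "STAR_OPTIONS": "--runThreadN 8",
}
print(" ".join(build_star_command("sample1", star_params)))
```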

HTSEQ-count

omics_pipe.modules.htseq.htseq(sample, htseq_flag)

Runs htseq-count to get raw count data from alignments.

input:
Aligned.out.sort.bam
output:
counts.txt
citation:
Simon Anders, EMBL
link:
http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html
parameters from parameters file:

STAR_RESULTS:

HTSEQ_OPTIONS:

REF_GENES:

HTSEQ_RESULTS:

TEMP_DIR:

SAMTOOLS_VERSION:

BAM_FILE_NAME:

PYTHON_VERSION:
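
The counts file written by htseq-count is a tab-separated table of feature ID and read count. Recent HTSeq releases append summary rows whose names start with two underscores (for example `__no_feature` and `__ambiguous`). The sketch below separates gene rows from those summary rows; the function name `read_htseq_counts` and the sample rows are made up for illustration.

```python
import io


def read_htseq_counts(handle):
    """Split htseq-count output into gene counts and summary counters."""
    genes, summary = {}, {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, count = line.split("\t")
        # Summary rows such as __no_feature are kept apart from gene rows.
        target = summary if name.startswith("__") else genes
        target[name] = int(count)
    return genes, summary


# Made-up rows, for illustration only.
example = io.StringIO("GENE_A\t12\nGENE_B\t0\n__no_feature\t7\n__ambiguous\t1\n")
genes, summary = read_htseq_counts(example)
print(genes)
print(summary)
```

Keeping the summary rows out of the gene table avoids feeding them into downstream normalization by mistake.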

RSEQC

omics_pipe.modules.rseqc.rseqc(sample, rseqc_flag)

Runs RSeQC to determine insert size as a QC check for the alignment.

input:
.bam
output:
pdf plot
link:
http://rseqc.sourceforge.net/
parameters from parameters file:

STAR_RESULTS:

QC_PATH:

BAM_FILE_NAME:

RSEQC_REF:

RSEQC_VERSION:

TEMP_DIR:

Fusion Catcher

omics_pipe.modules.fusion_catcher.fusion_catcher(sample, fusion_catcher_flag)

Detects fusion genes in paired-end RNAseq data.

input:
paired-end .fastq files
output:
list of candidate fusion genes
citation:
S. Kangaspeska, S. Hultsch, H. Edgren, D. Nicorici, A. Murumägi, O.P. Kallioniemi, Reanalysis of RNA-sequencing data reveals several additional fusion genes with multiple isoforms, PLOS One, Oct. 2012. http://dx.plos.org/10.1371/journal.pone.0048745
link:
https://code.google.com/p/fusioncatcher
parameters from parameters file:

ENDS:

RAW_DATA_DIR:

FUSION_RESULTS:

FUSIONCATCHERBUILD_DIR:

TEMP_DIR:

SAMTOOLS_VERSION:

FUSIONCATCHER_VERSION:

FUSIONCATCHER_OPTIONS:

TISSUE:

PYTHON_VERSION:

BWA/SNPiR

BWA

omics_pipe.modules.bwa.bwa1(sample, bwa1_flag)

BWA aligner for read 1 of paired-end reads.

input:
.fastq
output:
.sam
citation:
Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]
link:
http://bio-bwa.sourceforge.net/bwa.shtml
parameters from parameters file:

BWA_RESULTS:

TEMP_DIR:

SAMTOOLS_VERSION:

BWA_VERSION:

BWA_INDEX:

RAW_DATA_DIR:

GATK_READ_GROUP_INFO:

COMPRESSION:

omics_pipe.modules.bwa.bwa2(sample, bwa2_flag)

BWA aligner for read 2 of paired-end reads.

input:
.fastq
output:
.sam
citation:
Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]
link:
http://bio-bwa.sourceforge.net/bwa.shtml
parameters from parameters file:

BWA_RESULTS:

TEMP_DIR:

SAMTOOLS_VERSION:

BWA_VERSION:

BWA_INDEX:

RAW_DATA_DIR:

GATK_READ_GROUP_INFO:

COMPRESSION:

omics_pipe.modules.bwa.bwa_RNA(sample, bwa_flag)

BWA aligner for single-end reads.

input:
.fastq
output:
.sam
citation:
Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]
link:
http://bio-bwa.sourceforge.net/bwa.shtml
parameters from parameters file:

BWA_RESULTS:

TEMP_DIR:

SAMTOOLS_VERSION:

BWA_VERSION:

BWA_INDEX:

RAW_DATA_DIR:

GATK_READ_GROUP_INFO:

COMPRESSION:

SNPiR

omics_pipe.modules.snpir_variants.snpir_variants(sample, snpir_variants_flag)

Calls variants using the SNPiR pipeline.

input:
Aligned.out.sort.bam or accepted_hits.bam
output:
final_variants.vcf file
citation:
Piskol, R., et al. (2013). “Reliable Identification of Genomic Variants from RNA-Seq Data.” The American Journal of Human Genetics 93(4): 641-651.
link:
http://lilab.stanford.edu/SNPiR/
parameters from parameters file:

VARIANT_RESULTS:

TEMP_DIR:

SAMTOOLS_VERSION:

BWA_VERSION:

PICARD_VERSION:

GATK_VERSION:

BEDTOOLS_VERSION:

UCSC_TOOLS_VERSION:

GENOME:

REPEAT_MASKER:

SNPIR_ANNOTATION:

RNA_EDIT:

DBSNP:

MILLS:

G1000:

WORKING_DIR:

BWA_RESULTS:

SNPIR_VERSION:

SNPIR_CONFIG:

SNPIR_DIR:

ENCODING:

Filter Variants

omics_pipe.modules.filter_variants.filter_variants(sample, filter_variants_flag)

Filters variants to remove common variants.

input:
.bam or .sam file
output:
.vcf file
citation:
Piskol et al. 2013. Reliable identification of genomic variants from RNA-seq data. The American Journal of Human Genetics 93: 641-651.
link:
http://lilab.stanford.edu/SNPiR/
parameters from parameters file:

VARIANT_RESULTS:

TEMP_DIR:

SAMTOOLS_VERSION:

BWA_VERSION:

PICARD_VERSION:

GATK_VERSION:

BEDTOOLS_VERSION:

UCSC_TOOLS_VERSION:

GENOME:

REPEAT_MASKER:

SNPIR_ANNOTATION:

RNA_EDIT:

DBSNP:

MILLS:

G1000:

WORKING_DIR:

BWA_RESULTS:

SNPIR_VERSION:

SNPIR_CONFIG:

SNPIR_DIR:

SNPEFF_VERSION:

dbNSFP:

VCFTOOLS_VERSION:

SNP_FILTER_OUT_REF:
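
The description above says only that common variants are removed. One simple way to express that idea, shown purely as an illustration and not as the module's actual method, is to drop VCF records whose (chromosome, position, ref, alt) key appears in a reference set of known common variants. How SNP_FILTER_OUT_REF is really used is not stated here; the function name `filter_common_variants` and the sample records are hypothetical.

```python
def filter_common_variants(vcf_lines, common_keys):
    """Yield header lines and records that are not in common_keys."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and header lines
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if (chrom, int(pos), ref, alt) not in common_keys:
            yield line


# Made-up records, for illustration only.
vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\n",
]
common = {("chr1", 100, "A", "G")}
kept = list(filter_common_variants(vcf, common))
print("".join(kept), end="")
```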

Intogen

omics_pipe.modules.intogen.intogen(sample, intogen_flag)

Runs Intogen to rank mutations and assess their implications for the cancer phenotype. Runs after variant calling.

input:
.vcf
output:
variant list
citation:
Gonzalez-Perez et al. 2013. Intogen mutations identifies cancer drivers across tumor types. Nature Methods 10, 1081-1082.
link:
http://www.intogen.org/
parameters from parameters file:

VCF_FILE:

INTOGEN_OPTIONS:

INTOGEN_RESULTS:

INTOGEN_VERSION:

USERNAME:

WORKING_DIR:

TEMP_DIR:

SCHEDULER:

VARIANT_RESULTS:

OncoRep Cancer Report

omics_pipe.modules.BreastCancer_RNA_report.BreastCancer_RNA_report(sample, BreastCancer_RNA_report_flag)

Runs an R script with knitr to produce a report from the RNAseq pipeline results.

input:
results from other steps in RNAseq pipelines
output:
html report
citation:
T. Meissner
parameters from parameters file:

WORKING_DIR:

R_VERSION:

REPORT_RESULTS:

PARAMS_FILE:

TABIX_VERSION:

TUMOR_TYPE:

GENELIST:

COSMIC:

CLINVAR:

PHARMGKB_rsID:

PHARMGKB_Allele:

DRUGBANK:

CADD:
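
The inputs and outputs listed for each module suggest an execution order. The sketch below derives one valid order with a topological sort from the standard library. The dependency map is an assumption inferred from those input and output fields (for example, htseq reads the STAR alignment, and the report uses results from earlier steps); the actual pipeline script may wire the steps differently.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the steps assumed to run before it.
assumed_dependencies = {
    "fastqc": {"TCGA_download"},
    "star": {"TCGA_download"},
    "fusion_catcher": {"TCGA_download"},
    "htseq": {"star"},
    "rseqc": {"star"},
    "snpir_variants": {"star"},
    "filter_variants": {"snpir_variants"},
    "intogen": {"filter_variants"},
    "BreastCancer_RNA_report": {
        "fastqc",
        "htseq",
        "rseqc",
        "fusion_catcher",
        "intogen",
    },
}

order = list(TopologicalSorter(assumed_dependencies).static_order())
print(order)
```

Several orders satisfy the same constraints, so the printed list is one valid schedule rather than the only one.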