Whole Genome and Whole Exome Sequencing Modules

FASTQC

omics_pipe.modules.fastqc.fastqc(sample, fastqc_flag)

QC check of raw .fastq files using FASTQC.

input:
.fastq file
output:
folder and zipped folder containing HTML, text, and image files
citation:
Babraham Bioinformatics
link:
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
parameters from parameters file:

RAW_DATA_DIR:

QC_PATH:

FASTQC_VERSION:

COMPRESSION:
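
As a minimal sketch of how the parameter list above might be used, the snippet below checks that a parameters mapping defines every name listed for the fastqc module before a run. The helper `missing_parameters` and the example values are hypothetical and are not part of omics_pipe; only the four parameter names come from the list above.

```python
# Parameter names listed above for the fastqc module.
FASTQC_PARAMETERS = ["RAW_DATA_DIR", "QC_PATH", "FASTQC_VERSION", "COMPRESSION"]


def missing_parameters(params, required):
    """Return the required names that are not defined in params (hypothetical helper)."""
    return [name for name in required if name not in params]


# Hypothetical values, for illustration only; COMPRESSION is left out on purpose.
example_params = {
    "RAW_DATA_DIR": "/data/raw",
    "QC_PATH": "/data/qc",
    "FASTQC_VERSION": "0.10.1",
}

print(missing_parameters(example_params, FASTQC_PARAMETERS))  # ['COMPRESSION']
```

The same check can be reused for any other module by passing that module's parameter list as `required`.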

BWA-MEM

omics_pipe.modules.bwa.bwa_mem(sample, bwa_mem_flag)

Aligns reads with BWA using the BWA-MEM algorithm.

input:
.fastq
output:
.sam
citation:
Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]
link:
http://bio-bwa.sourceforge.net/bwa.shtml
parameters from parameters file:

BWA_RESULTS:

TEMP_DIR:

SAMTOOLS_VERSION:

BWA_VERSION:

GENOME:

RAW_DATA_DIR:

BWA_OPTIONS:

COMPRESSION:

PICARD Mark Duplicates

omics_pipe.modules.picard_mark_duplicates.picard_mark_duplicates(sample, picard_mark_duplicates_flag)

Marks duplicate reads with Picard tools MarkDuplicates.

input:
sorted.bam
output:
sorted.rg.md.bam
citation:
http://picard.sourceforge.net/
link:
http://picard.sourceforge.net/
parameters from parameters file:

BWA_RESULTS:

TEMP_DIR:

PICARD_VERSION:

SAMTOOLS_VERSION:

GATK Preprocessing

WES

omics_pipe.modules.GATK_preprocessing_WES.GATK_preprocessing_WES(sample, GATK_preprocessing_WES_flag)

GATK preprocessing steps for whole exome sequencing.

input:
sorted.rg.md.bam
output:
.ready.bam
citation:
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
link:
http://www.broadinstitute.org/gatk/
parameters from parameters file:

BWA_RESULTS:

TEMP_DIR:

GATK_VERSION:

GENOME:

DBSNP:

MILLS:

G1000:

CAPTURE_KIT_BED:

SAMTOOLS_VERSION:

WGS

omics_pipe.modules.GATK_preprocessing_WGS.GATK_preprocessing_WGS(sample, GATK_preprocessing_WGS_flag)

GATK preprocessing steps for whole genome sequencing.

input:
sorted.rg.md.bam
output:
.ready.bam
citation:
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
link:
http://www.broadinstitute.org/gatk/
parameters from parameters file:

BWA_RESULTS:

TEMP_DIR:

GATK_VERSION:

GENOME:

DBSNP:

MILLS:

G1000:

SAMTOOLS_VERSION:
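
The WES and WGS preprocessing modules list nearly identical parameters. A small sketch, using only the names documented above, makes the difference explicit with a set comparison. The variable names are illustrative.

```python
# Parameter names listed above for each GATK preprocessing module.
WES_PARAMETERS = {
    "BWA_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME", "DBSNP",
    "MILLS", "G1000", "CAPTURE_KIT_BED", "SAMTOOLS_VERSION",
}
WGS_PARAMETERS = {
    "BWA_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME", "DBSNP",
    "MILLS", "G1000", "SAMTOOLS_VERSION",
}

# Names required by one module but not the other.
only_wes = WES_PARAMETERS - WGS_PARAMETERS
only_wgs = WGS_PARAMETERS - WES_PARAMETERS

print(sorted(only_wes))  # ['CAPTURE_KIT_BED']
print(sorted(only_wgs))  # []
```

Going by these lists, a parameters file prepared for WES preprocessing also covers WGS preprocessing, while the reverse additionally needs CAPTURE_KIT_BED.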

GATK Variant Discovery

omics_pipe.modules.GATK_variant_discovery.GATK_variant_discovery(sample, GATK_variant_discovery_flag)

GATK variant discovery.

input:
sorted.rg.md.bam
output:
.ready.bam
citation:
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
link:
http://www.broadinstitute.org/gatk/
parameters from parameters file:

BWA_RESULTS:

TEMP_DIR:

GATK_VERSION:

GENOME:

DBSNP:

VARIANT_RESULTS:

GATK Variant Filtering

omics_pipe.modules.GATK_variant_filtering.GATK_variant_filtering(sample, GATK_variant_filtering_flag)

GATK variant filtering.

input:
sorted.rg.md.bam
output:
.ready.bam
citation:
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
link:
http://www.broadinstitute.org/gatk/
parameters from parameters file:

VARIANT_RESULTS:

TEMP_DIR:

GATK_VERSION:

GENOME:

DBSNP:

MILLS:

OMNI:

HAPMAP:

R_VERSION:

G1000_SNPs:

G1000_Indels:

omics_pipe.modules.GATK_variant_filtering.GATK_variant_filtering_group(sample, GATK_variant_filtering_group_flag)

GATK variant filtering (group version).

input:
sorted.rg.md.bam
output:
.ready.bam
citation:
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
link:
http://www.broadinstitute.org/gatk/

parameters from parameters file:

VARIANT_RESULTS:

TEMP_DIR:

GATK_VERSION:

GENOME:

DBSNP:

MILLS_G1000:

OMNI:

HAPMAP:

R_VERSION:

G1000:
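
To see which names a single parameters file would need in order to cover every module on this page, the per-module lists above can be merged. The following is a sketch only: the dictionary keys are shorthand labels chosen for this example, and the values are copied from the lists above without any claim about what each parameter means.

```python
# Parameter names as listed on this page, keyed by an illustrative module label.
MODULE_PARAMETERS = {
    "fastqc": ["RAW_DATA_DIR", "QC_PATH", "FASTQC_VERSION", "COMPRESSION"],
    "bwa_mem": ["BWA_RESULTS", "TEMP_DIR", "SAMTOOLS_VERSION", "BWA_VERSION",
                "GENOME", "RAW_DATA_DIR", "BWA_OPTIONS", "COMPRESSION"],
    "picard_mark_duplicates": ["BWA_RESULTS", "TEMP_DIR", "PICARD_VERSION",
                               "SAMTOOLS_VERSION"],
    "GATK_preprocessing_WES": ["BWA_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME",
                               "DBSNP", "MILLS", "G1000", "CAPTURE_KIT_BED",
                               "SAMTOOLS_VERSION"],
    "GATK_preprocessing_WGS": ["BWA_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME",
                               "DBSNP", "MILLS", "G1000", "SAMTOOLS_VERSION"],
    "GATK_variant_discovery": ["BWA_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME",
                               "DBSNP", "VARIANT_RESULTS"],
    "GATK_variant_filtering": ["VARIANT_RESULTS", "TEMP_DIR", "GATK_VERSION",
                               "GENOME", "DBSNP", "MILLS", "OMNI", "HAPMAP",
                               "R_VERSION", "G1000_SNPs", "G1000_Indels"],
    "GATK_variant_filtering_group": ["VARIANT_RESULTS", "TEMP_DIR", "GATK_VERSION",
                                     "GENOME", "DBSNP", "MILLS_G1000", "OMNI",
                                     "HAPMAP", "R_VERSION", "G1000"],
}


def all_parameter_names(module_parameters):
    """Return the sorted union of parameter names across all modules."""
    names = set()
    for parameter_list in module_parameters.values():
        names.update(parameter_list)
    return sorted(names)


ALL_PARAMETERS = all_parameter_names(MODULE_PARAMETERS)

# Print one name per line, e.g. as a checklist when drafting a parameters file.
for name in ALL_PARAMETERS:
    print(name)
```

Note that GATK_variant_filtering and GATK_variant_filtering_group use different names for some resources (MILLS, G1000_SNPs and G1000_Indels versus MILLS_G1000 and G1000), so both sets appear in the merged list.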