Stages

Listing of stages implemented in YMP

stage Import[source]

Imports raw read files into YMP.

>>> ymp make toy
>>> ymp make mpic
rule export_qiime_map_file[source]
stage annotate_blast[source]

Annotate sequences with BLAST

Searches a reference database for hits with blastn. Use the E flag to specify the exponent of the required E-value. Use N or Mega to specify the default. Use Best to add the -subject_besthit flag.

This stage produces blast7.gz files as output.

>>> ymp make toy.ref_genome.index_blast.annotate_blast
rule blast_db_size[source]

Determines size of BLAST database (for splitting)

rule blast_db_size_SPLIT[source]

Variant of blast_db_size for multi-file blast indices

rule blast_db_size_V4[source]

Variant of blast_db_size for V4 blast indices

rule blastn_join_result[source]

Merges BLAST results

rule blastn_query[source]

Runs BLAST

rule blastn_query_SPLIT[source]

Variant of blastn_query for multi-file blast indices

rule blastn_query_V4[source]

Variant of blastn_query for V4 blast indices

rule blastn_split_query_fasta[source]

Split FASTA query file into chunks for individual BLAST runs
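As an illustration of the chunking idea only, the following minimal sketch splits FASTA records into fixed-size groups. It is not YMP's implementation; the function names `read_fasta` and `split_records` and the `chunk_size` parameter are hypothetical, and the actual rule may split by size or by number of output files instead.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) tuples from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size entries (hypothetical helper)."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Toy input: three records, the second one wrapped over two lines.
example = [">a", "ACGT", ">b", "GG", "CC", ">c", "TTTT"]
chunks = list(split_records(read_fasta(example), chunk_size=2))
```

Each chunk could then be written to its own file and searched in a separate BLAST run.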

rule blastn_split_query_fasta_hack[source]

Workaround for a problem with snakemake checkpoints and run: statements

stage annotate_diamond[source]

FIXME

rule diamond_blastx_fasta[source]
rule diamond_view[source]

Convert Diamond binary output (daa) to BLAST6 format

stage annotate_prodigal[source]

Call genes using prodigal

>>> ymp make toy.ref_genome.annotate_prodigal
rule prodigal[source]

Predict genes using prodigal

stage annotate_tblastn[source]

Runs tblastn

rule blast7_to_gtf[source]

Convert from Blast Format 7 to GFF/GTF format

rule tblastn_query[source]

Runs a TBLASTN search against an assembly.

stage assemble_megahit[source]

Assemble metagenome using MegaHit.

>>> ymp make toy.assemble_megahit.map_bbmap
>>> ymp make toy.group_ALL.assemble_megahit.map_bbmap
>>> ymp make toy.group_Subject.assemble_megahit.map_bbmap
rule megahit[source]

Runs MegaHit.

stage assemble_spades[source]

Assemble reads using spades

>>> ymp make toy.assemble_spades
>>> ymp make toy.group_ALL.assemble_spades
>>> ymp make toy.group_Subject.assemble_spades
>>> ymp make toy.assemble_spadesMeta
>>> ymp make toy.assemble_spadesSc
>>> ymp make toy.assemble_spadesRna
>>> ymp make toy.assemble_spadesIsolate
>>> ymp make toy.assemble_spadesNC
>>> ymp make toy.assemble_spadesMetaNC
rule spades[source]

Runs Spades. Supports reads.by_COLUMN.sp/complete as a target for by-group co-assembly.

rule spades_input_yaml[source]

Prepares a dataset config for Spades. The Spades command line accepts at most 9 pairs of fq files, so the dataset config option is used to allow arbitrary numbers of input files.

The config is prepared in a separate rule so that the main spades rule can use a shell: directive rather than run:, which would preclude it from using conda environments.
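A minimal sketch of what generating such a dataset config might look like is shown below. It is not the code used by YMP. The key names (`orientation`, `type`, `left reads`, `right reads`) follow the dataset format described in the SPAdes manual as best recalled and should be checked against it; the function names and file names are hypothetical. The output is written as JSON, which is also valid YAML, to avoid a dependency on a YAML library.

```python
import json
from typing import Dict, List, Tuple


def spades_dataset(pairs: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Build a one-library dataset description from (left, right) file pairs."""
    return [{
        "orientation": "fr",       # assumed forward-reverse paired-end layout
        "type": "paired-end",
        "left reads": [left for left, _ in pairs],
        "right reads": [right for _, right in pairs],
    }]


def dump_dataset(dataset: List[Dict[str, object]]) -> str:
    """Serialize the dataset; JSON output is also parseable as YAML."""
    return json.dumps(dataset, indent=2)


# Hypothetical input: 12 read pairs, more than fit on the command line.
pairs = [(f"sample{i}.R1.fq.gz", f"sample{i}.R2.fq.gz") for i in range(12)]
dataset = spades_dataset(pairs)
text = dump_dataset(dataset)
```

Because the file is plain data, it can be produced by a separate rule, leaving the assembly rule itself free to run as a shell command.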

stage assemble_trinity[source]
rule trinity[source]
rule trinity_stats[source]
stage assemble_unicycler[source]

Assemble reads using unicycler

>>> ymp make toy.assemble_unicycler
rule unicycler[source]

Runs unicycler

stage basecov_bedtools[source]

Creates BLAST index running makeblastdb on input fasta.gz files.

>>> ymp make toy.ref_genome.index_blast
rule bedtools_genomecov[source]
stage bin_metabat2[source]

Bin metagenome assembly into MAGs

>>> ymp make mock.assemble_megahit.map_bbmap.sort_bam.bin_metabat2
>>> ymp make mock.group_ALL.assemble_megahit.map_bbmap.sort_bam.group_ALL.bin_metabat2
rule metabat2_bin[source]

Bin metagenome with MetaBat2

rule metabat2_depth[source]

Generates a depth file from BAM

stage check[source]

Verify file availability

This stage provides rules for checking file availability at a given point in the stage stack.

Mainly useful for testing and debugging.

rule check_fasta[source]

Verify availability of FastA type reference

rule check_fastp[source]

Verify availability of FastP type reference

stage cluster_cdhit[source]

Clusters protein sequences using CD-HIT

>>> ymp make toy.ref_query.cluster_cdhit
rule cdhit_clstr_to_csv[source]
rule cdhit_faa_single[source]

Clustering predicted genes using cdhit

rule cdhit_prepare_input[source]

Prepares input data for CD-HIT

  • rewrites ‘*’ to ‘X’, as the stop codon symbol is not understood by CD-HIT

  • prefixes lost ID to Fasta ID
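The stop-codon rewrite from the first bullet can be sketched as follows. This is an illustrative sketch only, not the rule's actual code; the function name `mask_stop_codons` is hypothetical, and the ID-prefixing step from the second bullet is not covered.

```python
from typing import Iterable, Iterator


def mask_stop_codons(fasta_lines: Iterable[str]) -> Iterator[str]:
    """Replace '*' with 'X' in sequence lines, leaving header lines untouched."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("*", "X")


# Toy protein FASTA; the header deliberately contains '*' to show it is kept.
masked = list(mask_stop_codons([">g1 *", "MKV*", "AA*"]))
```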

stage correct_bbmap[source]

Correct read errors by overlapping inside tails

Applies BBMap's “bbmerge.sh ecco” mode. This overlaps the inside ends of each read pair. Where the overlap alignment contains mismatches, the base with the higher quality is chosen. Where it contains matches, the quality score is increased to reflect the double observation.

>>> ymp make toy.correct_bbmap
>>> ymp make mpic.correct_bbmap
rule bbmap_error_correction[source]

Error correction with BBMerge overlapping

rule bbmap_error_correction_all[source]
rule bbmap_error_correction_se[source]

Error correction with BBMerge overlapping

stage count_diamond[source]
rule diamond_count[source]
stage count_stringtie[source]
rule stringtie[source]
rule stringtie_abundance[source]
rule stringtie_all[source]
rule stringtie_all_target[source]
rule stringtie_gather_ballgown[source]
rule stringtie_merge[source]
stage coverage_samtools[source]

Computes coverage from a sorted bam file using samtools coverage

rule samtools_coverage[source]
stage dedup_bbmap[source]

Remove duplicate reads

Applies BBMap's “dedupe.sh”

>>> ymp make toy.dedup_bbmap
>>> ymp make mpic.dedup_bbmap
rule bbmap_dedupe[source]

Deduplicate reads using BBMap’s dedupe.sh

rule bbmap_dedupe_all[source]
rule bbmap_dedupe_se[source]

Deduplicate reads using BBMap's dedupe.sh

stage dust_bbmap[source]

Perform entropy filtering on reads using BBMap's bbduk.sh

The parameter Enn gives the entropy cutoff. Higher values filter more sequences.

>>> ymp make toy.dust_bbmap
>>> ymp make toy.dust_bbmapE60
rule bbmap_dust[source]
stage extract_reads[source]

Extract reads from BAM file using samtools fastq.

Parameters fn, Fn and Gn are passed through to samtools view. Reads are output only if all bits in f are set, none of the bits in F are set, and any of the bits in G is unset.

  • 1: paired

  • 2: proper pair (both aligned in right orientation)

  • 4: unmapped

  • 8: other read unmapped

Some options include:

  • f2: correctly mapped (only proper pairs)

  • F12: both ends mapped (but potentially “improper”)

  • G12: either end mapped

  • F2: not correctly mapped (not proper pair, could also be unmapped)

  • f12: not mapped (neither read mapped)
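The flag rule described above can be checked with a small sketch. It mirrors the stated semantics (all bits in f set, no bits in F set, not all bits in G set) and is not code from YMP or samtools; the function name `keep_read` is hypothetical.

```python
def keep_read(flag: int, f: int = 0, F: int = 0, G: int = 0) -> bool:
    """Return True if a read with the given SAM flag passes the f/F/G filters."""
    all_f_set = (flag & f) == f
    none_of_F_set = (flag & F) == 0
    # G == 0 means the filter is not in use; otherwise reject only if
    # every bit in G is set in the flag.
    not_all_G_set = G == 0 or (flag & G) != G
    return all_f_set and none_of_F_set and not_all_G_set


PAIRED, PROPER, UNMAPPED, MATE_UNMAPPED = 1, 2, 4, 8

proper_pair = PAIRED | PROPER                     # both ends mapped properly
improper_pair = PAIRED                            # both mapped, not a proper pair
mate_missing = PAIRED | MATE_UNMAPPED             # this read mapped, mate unmapped
both_unmapped = PAIRED | UNMAPPED | MATE_UNMAPPED

# F12 keeps pairs with both ends mapped, proper or not.
f12_upper = [keep_read(x, F=12) for x in (proper_pair, improper_pair, mate_missing, both_unmapped)]
# G12 keeps reads where at least one end is mapped.
g12 = [keep_read(x, G=12) for x in (proper_pair, improper_pair, mate_missing, both_unmapped)]
```

Real SAM flags carry further bits (for example strand and first/second in pair), which these filters ignore because they are not part of the masks.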

rule samtools_fastq[source]
stage extract_seqs[source]

Extract sequences from .fasta.gz file using samtools faidx

Currently requires a .blast7 file as input.

Use parameter Nomatch to instead keep unmatched sequences.

rule samtools_faidx[source]
rule samtools_select_blast[source]
stage filter_bmtagger[source]

Filter(-out) contaminant reads using BMTagger

>>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger
>>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger.assemble_megahit
>>> ymp make toy.ref_phiX.index_bmtagger.filter_bmtagger
>>> ymp make mpic.ref_phiX.index_bmtagger.remove_bmtagger
rule bmtagger_filter[source]

Filter reads using reference

rule bmtagger_filter_all[source]
rule bmtagger_filter_out[source]

Filter-out reads using reference

rule bmtagger_filter_revread[source]

Filter reads using reference

rule bmtagger_find[source]

Match paired end reads against reference

rule bmtagger_find_se[source]

Match single end reads against reference

rule bmtagger_remove_all[source]
stage format_bbmap[source]

Process sequences with BBMap's format.sh

The parameter Ln sets the minimum sequence length; shorter sequences are filtered out.

>>> ymp make toy.assemble_spades.format_bbmapL200
rule bbmap_reformat[source]
stage humann2[source]

Compute functional profiles using HUMAnN2

rule humann2[source]

Runs HUMAnN2 with separately processed Metaphlan2 output.

Note

HUMAnN2 has no special support for paired end reads. As per the manual, we just feed it the concatenated forward and reverse reads.

rule humann2_all[source]
rule humann2_join_tables[source]

Joins HUMAnN2 per sample output tables

rule humann2_renorm_table[source]

Renormalizes humann2 output tables

stage index_bbmap[source]

Creates BBMap index

>>> ymp make toy.ref_genome.index_bbmap
rule bbmap_makedb[source]

Precomputes BBMap index

stage index_blast[source]
rule blast_makedb[source]

Build Blast index

stage index_bmtagger[source]
rule bmtagger_bitmask[source]
rule bmtagger_index[source]
stage index_bowtie2[source]
>>> ymp make toy.ref_genome.index_bowtie2
rule bowtie2_index[source]
stage index_diamond[source]
rule diamond_makedb[source]

Build Diamond index file

stage map_bbmap[source]

Map reads using BBMap

>>> ymp make toy.assemble_megahit.map_bbmap
>>> ymp make toy.ref_genome.map_bbmap
>>> ymp make mpic.ref_ssu.map_bbmap
rule bbmap_map[source]

Map reads from each (co-)assembly read file to the assembly

rule bbmap_map_SE[source]

Map reads from each (co-)assembly read file to the assembly

stage map_bowtie2[source]

Map reads using Bowtie2

>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2VF
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2F
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2S
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2VS
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2X800
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2I5
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2L
>>> ymp make toy.assemble_megahit.index_bowtie2.map_bowtie2
>>> ymp make toy.group_Subject.assemble_megahit.index_bowtie2.map_bowtie2
>>> ymp make mpic.ref_ssu.index_bowtie2.map_bowtie2
rule bowtie2_map[source]
rule bowtie2_map_SE[source]
stage map_diamond[source]
rule diamond_blastx_fastq[source]
rule diamond_blastx_fastq2[source]
rule diamond_view_2[source]

Convert Diamond binary output (daa) to BLAST6 format

stage map_hisat2[source]

Map reads using Hisat2

rule hisat2_map[source]

For Hisat2 we always assume a pre-built index, as providing SNPs, haplotypes, etc. is beyond this pipeline's scope.

stage map_star[source]

Map RNA-Seq reads with STAR

rule star_map[source]
stage markdup_sambamba[source]
rule sambamba_markdup[source]
stage metaphlan2[source]

Assess metagenome community composition using Metaphlan 2

rule metaphlan2[source]

Computes community profile from mapped reads and Metaphlan’s custom reference database.

rule metaphlan2_map[source]

Align reads to Metaphlan’s custom reference database.

rule metaphlan2_merge[source]

Merges Metaphlan community profiles.

stage polish_pilon[source]

Polish genomes with Pilon

Requires fasta.gz and sorted.bam files as input.

rule pilon_polish[source]
stage primermatch_bbmap[source]

Filters reads by matching reference primer using BBMap's “bbduk.sh”.

>>> ymp make mpic.ref_primers.primermatch_bbmap
rule bbduk_primer[source]

Splits reads based on primer matching into “primermatch” and “primerfail”.

rule bbduk_primer_all[source]
rule bbduk_primer_se[source]

Splits reads based on primer matching into “primermatch” and “primerfail”.

stage profile_centrifuge[source]

Classify reads using centrifuge

rule centrifuge[source]
stage qc_fastqc[source]

Quality screen reads using FastQC

>>> ymp make toy.qc_fastqc
rule qc_fastqc[source]

Run FastQC on read files

stage qc_multiqc[source]

Aggregate QC reports using MultiQC

rule multiqc_fastqc[source]

Assemble report on all FQ files in a directory

stage qc_quast[source]

Estimate assembly quality using Quast

rule metaquast_all_at_once[source]

Run quast on all assemblies in the previous stage at once.

rule metaquast_by_sample[source]

Run quast on each assembly

rule metaquast_multiq_summary[source]

Aggregate Quast per assembly reports

stage quant_rsem[source]

Quantify transcripts using RSEM

rule rsem_all[source]
rule rsem_all_for_target[source]
rule rsem_quant[source]
stage references[source]

This is a “virtual” stage. It does not process read data, but comprises rules used for reference provisioning.

rule human_db_download[source]

Download HUMAnN2 reference databases

rule prepare_reference[source]

Provisions files in <reference_dir>/<reference_name>

  • Creates symlinks to downloaded references

  • Compresses references provided uncompressed upstream

  • Connects files requested by stages with downloaded files and unpacked archives

rule unpack_archive[source]

Template rule for unpacking references provisioned upstream as archive.

rule unpack_ref_GRCh38_eaa4c10f

Unpacks ref_GRCh38 archive:

URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38_snp_tran.tar.gz

Files:

  • ALL.1.ht2

  • ALL.2.ht2

  • ALL.3.ht2

  • ALL.4.ht2

  • ALL.5.ht2

  • ALL.6.ht2

  • ALL.7.ht2

  • ALL.8.ht2

rule unpack_ref_centrifuge_0d910a96

Unpacks ref_centrifuge archive:

URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p+h+v.tar.gz

Files:

  • p+h+v.1.cf

  • p+h+v.2.cf

  • p+h+v.3.cf

rule unpack_ref_centrifuge_1ee7c028

Unpacks ref_centrifuge archive:

URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/nt.tar.gz

Files:

  • nt.1.cf

  • nt.2.cf

  • nt.3.cf

rule unpack_ref_centrifuge_43ba6165

Unpacks ref_centrifuge archive:

URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p_compressed.tar.gz

Files:

  • p_compressed.1.cf

  • p_compressed.2.cf

  • p_compressed.3.cf

rule unpack_ref_centrifuge_a9964521

Unpacks ref_centrifuge archive:

URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p_compressed+h+v.tar.gz

Files:

  • p_compressed+h+v.1.cf

  • p_compressed+h+v.2.cf

  • p_compressed+h+v.3.cf

rule unpack_ref_greengenes_305aa905

Unpacks ref_greengenes archive:

URL: ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz

Files:

  • rep_set/99_otus.fasta

  • rep_set/97_otus.fasta

  • rep_set/94_otus.fasta

rule unpack_ref_metaphlan2_a6545140

Unpacks ref_metaphlan2 archive:

URL: https://depot.galaxyproject.org/software/metaphlan2/metaphlan2_2.6.0_src_all.tar.gz

Files:

  • db_v20/mpa_v20_m200.1.bt2

  • db_v20/mpa_v20_m200.2.bt2

  • db_v20/mpa_v20_m200.3.bt2

  • db_v20/mpa_v20_m200.4.bt2

  • db_v20/mpa_v20_m200.rev.1.bt2

  • db_v20/mpa_v20_m200.rev.2.bt2

  • db_v20/mpa_v20_m200.pkl

rule unpack_ref_mothur_SEED_39c9f686

Unpacks ref_mothur_SEED archive:

URL: https://www.mothur.org/w/images/a/a4/Silva.seed_v128.tgz

Files:

  • silva.seed_v128.tax

  • silva.seed_v128.align

stage remove_bbmap[source]

Filter reads by reference

This stage aligns the reads with a given reference using BBMap in fast mode. Matching reads are collected in the stage filter_bbmap and remaining reads are collected in the stage remove_bbmap.

>>> ymp make toy.ref_phiX.index_bbmap.remove_bbmap
>>> ymp make toy.ref_phiX.index_bbmap.filter_bbmap
>>> ymp make mpic.ref_phiX.index_bbmap.remove_bbmap
rule bbmap_split[source]
rule bbmap_split_all[source]
rule bbmap_split_all_remove[source]
rule bbmap_split_se[source]
stage sort_bam[source]
rule sambamba_sort[source]
stage split_library[source]

Demultiplexes amplicon sequencing files

This rule is treated specially. If a configured project specifies a barcode_col, reads from the file (or files) are used in combination with

rule fastq_multix[source]
rule split_library_compress_sample[source]
stage trim_bbmap[source]

Trim adapters and low quality bases from reads

Applies BBMap's “bbduk.sh”.

Parameters:

  • A: append to enable adapter trimming

  • Q20: append to select phred score cutoff (default 20)

  • L20: append to select minimum read length (default 20)

>>> ymp make toy.trim_bbmap
>>> ymp make toy.trim_bbmapA
>>> ymp make toy.trim_bbmapAQ10L10
>>> ymp make mpic.trim_bbmap
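To make the suffix convention in the examples above concrete, the sketch below decodes a stage name such as trim_bbmapAQ10L10 into its settings. This is not YMP's parameter parser; the function name, the returned keys, and the assumption that the suffixes appear in the order A, Q, L are hypothetical and chosen to match the examples shown.

```python
import re
from typing import Dict, Union

# Optional "A", then optional "Q<number>", then optional "L<number>".
STAGE_PATTERN = re.compile(r"^trim_bbmap(?P<A>A)?(?:Q(?P<Q>\d+))?(?:L(?P<L>\d+))?$")


def parse_trim_bbmap(stage: str) -> Dict[str, Union[bool, int]]:
    """Decode a trim_bbmap stage name into illustrative settings."""
    match = STAGE_PATTERN.match(stage)
    if match is None:
        raise ValueError(f"not a trim_bbmap stage name: {stage!r}")
    return {
        "adapter_trimming": match.group("A") is not None,
        "min_quality": int(match.group("Q") or 20),   # documented default 20
        "min_length": int(match.group("L") or 20),    # documented default 20
    }


defaults = parse_trim_bbmap("trim_bbmap")
custom = parse_trim_bbmap("trim_bbmapAQ10L10")
```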
rule bbmap_trim[source]

Trimming and Adapter Removal using BBTools BBDuk

rule bbmap_trim_all[source]
rule bbmap_trim_se[source]

Trimming and Adapter Removal using BBTools BBDuk

stage trim_sickle[source]

Perform read trimming using Sickle

>>> ymp make toy.trim_sickle
>>> ymp make toy.trim_sickleQ10L10
>>> ymp make mpic.trim_sickleL20
rule sicke_all[source]
rule sickle[source]
rule sickle_se[source]
stage trim_trimmomatic[source]

Adapter trim reads using trimmomatic

>>> ymp make toy.trim_trimmomaticT32
>>> ymp make mpic.trim_trimmomatic
rule trimmomatic_adapter[source]

Trimming with Trimmomatic

rule trimmomatic_adapter_all[source]
rule trimmomatic_adapter_se[source]

Trimming with Trimmomatic

rule download_file_ftp[source]

Downloads remote file using wget

rule download_file_http[source]

Downloads remote file using internal downloader

rule mkdir[source]

Auto-create directories listed in ymp config.

Use these as input:

>>> input: tmpdir = ancient(ymp.get_config().dir.tmp)

Or as param:

>>> param: tmpdir = "/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/latest/doc/tmp"

rule prefetch[source]

Downloads SRA files into NCBI SRA folder (ncbi/public/sra).

rule fastq_dump[source]

Extracts FQ from SRA files

rule cdhit_fna_single[source]

Clustering predicted genes (nuc) using cdhit-est

rule normalize_16S[source]

Normalize 16S by copy number using PICRUSt. Must be run with a closed-reference OTU table.

rule predict_metagenome[source]

Predict metagenome using picrust

rule categorize_by_function[source]

Categorize PICRUSt KOs into pathways

rule rsem_index[source]

Build Genome Index for RSEM

rule star_index[source]

Build Genome Index for Star