Stages

Listing of stages implemented in YMP

.. sm:stage:: 
   :source: ymp/rules/00_import.rules:64

   Imports raw read files into YMP.

   >>> ymp make toy
   >>> ymp make mpic
stage annotate_blast[source]

Annotate sequences with BLAST

Searches a reference database for hits with blastn. Use the E flag to specify the exponent of the required E-value. Use N or Mega to specify default. Use Best to add the -subject_besthit flag.

stage annotate_diamond[source]

FIXME

stage annotate_prodigal[source]

Call genes using prodigal

>>> ymp make toy.ref_genome.annotate_prodigal
stage annotate_tblastn[source]

Runs tblastn

stage assemble_megahit[source]

Assemble metagenome using MegaHit.

>>> ymp make toy.assemble_megahit.map_bbmap
>>> ymp make toy.group_ALL.assemble_megahit.map_bbmap
>>> ymp make toy.group_Subject.assemble_megahit.map_bbmap
stage assemble_metaspades[source]

Assemble reads using metaspades

>>> ymp make toy.assemble_metaspades
>>> ymp make toy.group_ALL.assemble_metaspades
>>> ymp make toy.group_Subject.assemble_metaspades
stage assemble_trinity[source]
stage bin_metabat2[source]

Bin metagenome assembly into MAGs

stage check[source]

Verify file availability

This stage provides rules for checking the file availability at a given point in the stage stack.

Mainly useful for testing and debugging.

stage cluster_cdhit[source]

Clusters protein sequences using CD-HIT

>>> ymp make toy.ref_query.cluster_cdhit
stage correct_bbmap[source]

Correct read errors by overlapping inside tails

Applies BBMap's “bbmerge.sh ecco” mode. This overlaps the inside ends of read pairs. Where the alignment contains mismatches, the base with the higher quality is chosen. Where the alignment contains matches, the quality score is increased to reflect the double observation.

>>> ymp make toy.correct_bbmap
>>> ymp make mpic.correct_bbmap
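
The per-position behaviour described above can be pictured as follows. This is a minimal sketch, not BBMerge's actual implementation: the function name `merge_overlap`, the additive quality rule and the cap `max_qual` are assumptions made for illustration only.

```python
def merge_overlap(base1: str, qual1: int, base2: str, qual2: int,
                  max_qual: int = 41) -> tuple[str, int]:
    """Combine two observations of the same position from a read pair.

    Hypothetical rule set (an assumption, not taken from BBMerge):
    - matching bases: keep the base and add the Phred scores, capped at max_qual
    - mismatching bases: keep the observation with the higher Phred score
    """
    if base1 == base2:
        return base1, min(qual1 + qual2, max_qual)
    if qual1 >= qual2:
        return base1, qual1
    return base2, qual2


# Two agreeing calls reinforce each other; a disagreement keeps the better call.
agree = merge_overlap("A", 20, "A", 15)
disagree = merge_overlap("A", 10, "C", 30)
```

The actual quality arithmetic used by bbmerge.sh may differ; the sketch only shows the two cases the description distinguishes.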
stage count_diamond[source]
stage count_stringtie[source]
stage coverage_samtools[source]

Computes coverage from a sorted bam file using samtools coverage

stage dedup_bbmap[source]

Remove duplicate reads

Applies BBMap’s “dedupe.sh”

>>> ymp make toy.dedup_bbmap
>>> ymp make mpic.dedup_bbmap
stage dust_bbmap[source]

Perform entropy filtering on reads using BBMap’s bbduk.sh

The parameter Enn gives the entropy cutoff. Higher values filter more sequences.

>>> ymp make toy.dust_bbmap
>>> ymp make toy.dust_bbmapE60
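
To illustrate why a higher entropy cutoff removes more sequences, the sketch below computes a normalized Shannon entropy over the k-mers of a read. It is an illustration under assumptions, not bbduk.sh's implementation: the function names, the k-mer size and the way a cutoff is applied are hypothetical, and how the Enn value maps onto bbduk.sh's own entropy scale is not specified here.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 5) -> float:
    """Normalized Shannon entropy of the k-mers in seq (0 = repetitive, 1 = all distinct)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    if total < 2:
        return 0.0
    counts = Counter(kmers)
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
    # Divide by the maximum possible entropy (every k-mer seen exactly once).
    return entropy / math.log2(total)


def passes_filter(seq: str, cutoff: float) -> bool:
    """Hypothetical filter: keep a read only if its entropy reaches the cutoff."""
    return kmer_entropy(seq) >= cutoff


low = kmer_entropy("A" * 20)               # homopolymer, lowest complexity
high = kmer_entropy("ACGTTGCAAGCTGATC")    # every 5-mer is distinct
```

Raising `cutoff` can only shrink the set of reads that pass, which matches the statement that higher values filter more sequences.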
stage extract_reads[source]

Extract reads from BAM file using samtools fastq.

Parameters fn, Fn and Gn are passed through. Some options include:

  • f2: fully mapped (only proper pairs)

  • F2: not fully mapped (at least one read unmapped)

  • f12: not mapped (neither read mapped)
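
The options above follow the flag filtering semantics of samtools: f keeps reads that have all given bits set, F drops reads that have any given bit set, and G drops reads that have all given bits set. The sketch below mimics that logic on plain integers. The function `keep_read` and the example flag values are illustrative and not part of YMP or samtools.

```python
# SAM flag bits used by the examples above (values from the SAM specification).
PROPER_PAIR = 0x2     # each segment properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped


def keep_read(flag: int, f: int = 0, F: int = 0, G: int = 0) -> bool:
    """Return True if a read with this SAM flag survives the f/F/G filters."""
    if (flag & f) != f:            # f: every requested bit must be set
        return False
    if flag & F:                   # F: none of these bits may be set
        return False
    if G and (flag & G) == G:      # G: drop only if all of these bits are set
        return False
    return True


proper_pair_read = 99   # paired, proper pair, mate reverse, first in pair
both_unmapped = 77      # paired, read unmapped, mate unmapped, first in pair
mate_unmapped = 73      # paired, mate unmapped, first in pair

selected_f2 = [x for x in (99, 77, 73) if keep_read(x, f=2)]
selected_f12 = [x for x in (99, 77, 73) if keep_read(x, f=12)]
```

With these flags, f2 selects only the properly paired read and f12 selects only the pair in which neither read mapped.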

stage extract_seqs[source]

Extract sequences from .fasta.gz file using samtools faidx

Currently requires a .blast7 file as input.

Use parameter Nomatch to instead keep unmatched sequences.

stage filter_bmtagger[source]

Filter(-out) contaminant reads using BMTagger

>>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger
>>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger.assemble_megahit
>>> ymp make toy.ref_phiX.index_bmtagger.filter_bmtagger
>>> ymp make mpic.ref_phiX.index_bmtagger.remove_bmtagger
stage format_bbmap[source]

Process sequences with BBMap’s format.sh

Parameter Ln filters out sequences shorter than the minimum length n.

>>> ymp make toy.assemble_metaspades.format_bbmapL200
stage humann2[source]

Compute functional profiles using HUMAnN2

stage index_bbmap[source]
>>> ymp make toy.ref_genome.index_bbmap
stage index_blast[source]
stage index_bmtagger[source]
stage index_bowtie2[source]
>>> ymp make toy.ref_genome.index_bowtie2
stage index_diamond[source]
stage map_bbmap[source]

Map reads using BBMap

>>> ymp make toy.assemble_megahit.map_bbmap
>>> ymp make toy.ref_genome.map_bbmap
>>> ymp make mpic.ref_ssu.map_bbmap
stage map_bowtie2[source]

Map reads using Bowtie2

>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2
>>> ymp make toy.assemble_megahit.index_bowtie2.map_bowtie2
>>> ymp make toy.group_Subject.assemble_megahit.index_bowtie2.map_bowtie2
>>> ymp make mpic.ref_ssu.index_bowtie2.map_bowtie2
stage map_diamond[source]
stage map_hisat2[source]

Map reads using Hisat2

stage map_star[source]

Map RNA-Seq reads with STAR

stage metaphlan2[source]

Assess metagenome community composition using MetaPhlAn 2

stage primermatch_bbmap[source]

Filters reads by matching reference primer

>>> ymp make mpic.ref_primers.primermatch_bbmap
stage profile_centrifuge[source]

Classify reads using centrifuge

stage qc_fastqc[source]

Quality screen reads using FastQC

>>> ymp make toy.qc_fastqc
stage qc_multiqc[source]

Aggregate QC reports using MultiQC

stage qc_quast[source]

Estimate assembly quality using Quast

stage quant_rsem[source]

Quantify transcripts using RSEM

stage references[source]

This is a “virtual” stage. It does not process read data, but comprises rules used for reference provisioning.

stage remove_bbmap[source]

Filter reads by reference

This stage aligns the reads with a given reference using BBMap in fast mode. Matching reads are collected in the stage filter_bbmap and remaining reads are collected in the stage remove_bbmap.

>>> ymp make toy.ref_phiX.index_bbmap.remove_bbmap
>>> ymp make toy.ref_phiX.index_bbmap.filter_bbmap
>>> ymp make mpic.ref_phiX.index_bbmap.remove_bbmap
stage sort_bam[source]
stage split_library[source]

Demultiplexes amplicon sequencing files

This rule is treated specially. If a configured project specifies a barcode_col, reads from the file (or files) are used in combination with

stage trim_bbmap[source]

Trim adapters and low quality bases from reads

Applies BBMap’s “bbduk.sh”.

Parameters:

  • A: append to enable adapter trimming

  • Q20: append to select phred score cutoff (default 20)

  • L20: append to select minimum read length (default 20)

>>> ymp make toy.trim_bbmap
>>> ymp make toy.trim_bbmapA
>>> ymp make toy.trim_bbmapAQ10L10
>>> ymp make mpic.trim_bbmap
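
The examples above encode parameters as a suffix on the stage name. The sketch below shows one way such a suffix could be decoded. It is not YMP's own parser: the function name `parse_trim_params`, the returned keys and the assumption that parameters appear in the order A, Q, L are all hypothetical. Only the defaults (20 and 20) come from the parameter description.

```python
import re

# Assumed suffix grammar: optional "A", then optional "Q<n>", then optional "L<n>".
_SUFFIX = re.compile(r"(A)?(?:Q(\d+))?(?:L(\d+))?")


def parse_trim_params(stage: str, prefix: str = "trim_bbmap") -> dict:
    """Split a stage name such as 'trim_bbmapAQ10L10' into its parameters."""
    if not stage.startswith(prefix):
        raise ValueError(f"{stage!r} does not start with {prefix!r}")
    match = _SUFFIX.fullmatch(stage[len(prefix):])
    if match is None:
        raise ValueError(f"unrecognized parameters in {stage!r}")
    adapter, quality, length = match.groups()
    return {
        "adapter_trimming": adapter is not None,
        "min_quality": int(quality) if quality else 20,  # documented default
        "min_length": int(length) if length else 20,     # documented default
    }


params = parse_trim_params("trim_bbmapAQ10L10")
```

Here `params` has adapter trimming enabled with a quality cutoff of 10 and a minimum length of 10, while a bare `trim_bbmap` falls back to the documented defaults.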
stage trim_sickle[source]

Perform read trimming using Sickle

>>> ymp make toy.trim_sickle
>>> ymp make toy.trim_sickleQ10L10
>>> ymp make mpic.trim_sickleL20
stage trim_trimmomatic[source]

Adapter trim reads using trimmomatic

>>> ymp make toy.trim_trimmomaticT32
>>> ymp make mpic.trim_trimmomatic
rule download_file_ftp[source]

Downloads remote file using wget

rule download_file_http[source]

Downloads remote file using internal downloader

rule prefetch[source]

Downloads SRA files into NCBI SRA folder (ncbi/public/sra).

rule fastq_dump[source]

Extracts FASTQ from SRA files

rule combine_with_ref[source]
rule align_mafft[source]
rule blast7_merge[source]

Merges blast results from all samples into single file

rule blast7_extract[source]

Generates meta-data csv and sequence fasta pair from blast7 file for one gene.

rule blast7_extract_merge[source]

Merges extracted csv/fasta pairs over all samples.

rule blast7_all[source]
rule blast7_reports[source]
rule blast7_eval_hist[source]
rule blast7_eval_plot[source]
rule cdhit_fna_single[source]

Clustering predicted genes (nuc) using cdhit-est

rule 87[source]
rule 88[source]
rule 89[source]
rule 90[source]
rule 91[source]
rule 92[source]
rule faa_fastp[source]
rule fasta_to_fastp_gz[source]
rule gunzip[source]

Generic temporary gunzip

Use ruleorder: gunzip > myrule to prefer gunzipping over re-running a rule. E.g.

>>> ruleorder: gunzip > myrule
>>> rule myrule:
>>>   output: temp("some.txt"), "some.txt.gzip"
rule mkdir[source]

Auto-create directories listed in ymp config.

Use these as input:

>>> input: tmpdir = ancient(icfg.dir.tmp)

rule fq2fa[source]

Unzip and convert fastq to fasta

rule make_otu_table[source]
rule otu_to_qiime_txt[source]
rule otu_to_biom[source]
rule blast7_coverage_per_otu[source]
rule pick_open_otus[source]

Pick open reference OTUs

rule pick_closed_otus[source]

Pick closed reference OTUs

rule rarefy_table[source]
rule convert_to_closed_ref[source]

Convert open reference otu table to closed reference

rule env_wait[source]
rule ticktock[source]
rule noop[source]
rule normalize_16S[source]

Normalize 16S by copy number using PICRUSt. Must be run with a closed reference OTU table.

rule predict_metagenome[source]

Predict metagenome using picrust

rule categorize_by_function[source]

Categorize PICRUSt KOs into pathways

rule raxml_tree[source]
rule rsem_index[source]

Build Genome Index for RSEM

rule scnic_within_minsamp[source]
rule scnic_within_sparcc_filter[source]
rule star_index[source]

Build Genome Index for Star