Stages
Listing of stages implemented in YMP
-
stage
annotate_blast
Annotate sequences with BLAST
Searches a reference database for hits with blastn. Use the E flag to specify the exponent of the required E-value. Use N or Mega to specify default. Use Best to add the -subject_besthit flag.
This stage produces blast7.gz files as output.
>>> ymp make toy.ref_genome.index_blast.annotate_blast
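The blast7.gz output can be read back with a few lines of Python. The sketch below is an illustration, not part of YMP: it assumes the files use the default BLAST+ tabular column set (the stage may be configured with a different field list), and the names `DEFAULT_COLUMNS`, `parse_blast7` and `read_blast7_gz` are hypothetical.

```python
import gzip

# Default column names of BLAST+ tabular output when no custom field list is
# given. Assumption: the files produced by this stage use these defaults.
DEFAULT_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast7(lines, columns=DEFAULT_COLUMNS):
    """Yield one dict per hit, skipping '#' comment lines and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Values are kept as strings; convert them as needed downstream.
        yield dict(zip(columns, line.split("\t")))


def read_blast7_gz(path):
    """Read all hits from a gzip-compressed blast7 file at `path`."""
    with gzip.open(path, "rt") as handle:
        return list(parse_blast7(handle))
```

Keeping the line parser separate from the file handling makes it easy to try on a few in-memory lines before pointing it at real output.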
-
rule
blast_db_size_SPLIT
Variant of blast_db_size for multi-file blast indices
-
rule
blast_db_size_V4
Variant of blast_db_size for V4 blast indices
-
rule
blastn_query_SPLIT
Variant of blastn_query for multi-file blast indices
-
rule
blastn_query_V4
Variant of blastn_query for V4 blast indices
-
stage
annotate_prodigal
Call genes using prodigal
>>> ymp make toy.ref_genome.annotate_prodigal
-
stage
assemble_megahit
Assemble metagenome using MegaHit.
>>> ymp make toy.assemble_megahit.map_bbmap
>>> ymp make toy.group_ALL.assemble_megahit.map_bbmap
>>> ymp make toy.group_Subject.assemble_megahit.map_bbmap
-
stage
assemble_spades
Assemble reads using spades
>>> ymp make toy.assemble_spades
>>> ymp make toy.group_ALL.assemble_spades
>>> ymp make toy.group_Subject.assemble_spades
>>> ymp make toy.assemble_spades
>>> ymp make toy.assemble_spadesMeta
>>> ymp make toy.assemble_spadesSc
>>> ymp make toy.assemble_spadesRna
>>> ymp make toy.assemble_spadesIsolate
>>> ymp make toy.assemble_spadesNC
>>> ymp make toy.assemble_spadesMetaNC
-
rule
spades
Runs Spades. Supports reads.by_COLUMN.sp/complete as target for by-group co-assembly.
-
rule
spades_input_yaml
Prepares a dataset config for spades. The Spades command line is limited to at most 9 pairs of fq files, so to allow arbitrary numbers we need to use the dataset config option.
The config is prepared in a separate rule so that the main spades rule can use shell: and not run:, which would preclude it from using conda environments.
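As an illustration of the dataset config idea, the sketch below collects any number of read pairs into a single paired-end library description and writes it to a file. It is not YMP's implementation. The key names (`orientation`, `type`, `left reads`, `right reads`) follow the dataset layout described in the SPAdes manual as far as we are aware and should be checked against the installed SPAdes version; the function names and file names are hypothetical.

```python
import json


def spades_dataset(pairs):
    """Build a one-library dataset description from (left, right) file pairs."""
    return [{
        "orientation": "fr",
        "type": "paired-end",
        "left reads": [left for left, _ in pairs],
        "right reads": [right for _, right in pairs],
    }]


def write_dataset(pairs, path):
    """Write the dataset description to `path`.

    JSON is emitted on the assumption that the YAML reader used by SPAdes
    accepts it (JSON is valid YAML 1.2). Swap in a YAML library if it does not.
    """
    with open(path, "w") as out:
        json.dump(spades_dataset(pairs), out, indent=2)
```

Because the file list lives in a file rather than on the command line, the number of pairs is no longer bounded by the command-line limit mentioned above.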
-
stage
assemble_unicycler
Assemble reads using unicycler
>>> ymp make toy.assemble_unicycler
-
stage
basecov_bedtools
Creates BLAST index running makeblastdb on input fasta.gz files.
>>> ymp make toy.ref_genome.index_blast
-
stage
bin_metabat2
Bin metagenome assembly into MAGs
>>> ymp make mock.assemble_megahit.map_bbmap.sort_bam.bin_metabat2
>>> ymp make mock.group_ALL.assemble_megahit.map_bbmap.sort_bam.group_ALL.bin_metabat2
-
stage
check
Verify file availability
This stage provides rules for checking the file availability at a given point in the stage stack.
Mainly useful for testing and debugging.
-
stage
cluster_cdhit
Clusters protein sequences using CD-HIT
>>> ymp make toy.ref_query.cluster_cdhit
-
stage
correct_bbmap
Correct read errors by overlapping inside tails
Applies BBMap's “bbmerge.sh ecco” mode. This overlaps the inside of read pairs. Where the alignment contains mismatches, the base with the higher quality is chosen; where it contains matches, the quality score is increased as indicated by the double observation.
>>> ymp make toy.correct_bbmap
>>> ymp make mpic.correct_bbmap
-
stage
dedup_bbmap
Remove duplicate reads
Applies BBMap's “dedupe.sh”.
>>> ymp make toy.dedup_bbmap
>>> ymp make mpic.dedup_bbmap
-
stage
dust_bbmap
Perform entropy filtering on reads using BBMap's bbduk.sh
The parameter Enn gives the entropy cutoff. Higher values filter more sequences.
>>> ymp make toy.dust_bbmap
>>> ymp make toy.dust_bbmapE60
-
stage
extract_reads
Extract reads from BAM file using samtools fastq.
Parameters fn, Fn and Gn are passed through to samtools view. Reads are output only if all bits in f are set, none of the bits in F are set, and any of the bits in G is unset.
1: paired
2: proper pair (both aligned in right orientation)
4: unmapped
8: other read unmapped
Some options include:
f2: correctly mapped (only proper pairs)
F12: both ends mapped (but potentially “improper”)
G12: either end mapped
F2: not correctly mapped (not proper pair, could also be unmapped)
f12: not mapped (neither read mapped)
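The three bit tests can be written out as a short Python sketch. This only mirrors the rule stated above for illustration and is not how the stage itself is implemented; the function name `keep_read` is hypothetical, and a value of 0 is taken to mean "filter not used".

```python
# SAM flag bits referred to in the description above.
PAIRED = 1
PROPER_PAIR = 2
UNMAPPED = 4
MATE_UNMAPPED = 8


def keep_read(flag, f=0, F=0, G=0):
    """Return True if a read with SAM `flag` passes the f/F/G filters.

    f: all of these bits must be set
    F: none of these bits may be set
    G: at least one of these bits must be unset (0 disables the test)
    """
    all_required_set = (flag & f) == f
    none_excluded_set = (flag & F) == 0
    not_all_g_set = G == 0 or (flag & G) != G
    return all_required_set and none_excluded_set and not_all_g_set
```

For example, a read with flag 73 (paired, mate unmapped, first in pair) passes `G=12` because only one of the two "unmapped" bits is set, but fails `F=12`.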
-
stage
extract_seqs
Extract sequences from .fasta.gz file using samtools faidx
Currently requires a .blast7 file as input.
Use parameter Nomatch to instead keep unmatched sequences.
-
stage
filter_bmtagger
Filter(-out) contaminant reads using BMTagger
>>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger
>>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger.assemble_megahit
>>> ymp make toy.ref_phiX.index_bmtagger.filter_bmtagger
>>> ymp make mpic.ref_phiX.index_bmtagger.remove_bmtagger
-
stage
format_bbmap
Process sequences with BBMap's format.sh
Parameter Ln filters sequences at a minimum length.
>>> ymp make toy.assemble_spades.format_bbmapL200
-
stage
humann2
Compute functional profiles using HUMAnN2
-
stage
map_bbmap
Map reads using BBMap
>>> ymp make toy.assemble_megahit.map_bbmap
>>> ymp make toy.ref_genome.map_bbmap
>>> ymp make mpic.ref_ssu.map_bbmap
-
stage
map_bowtie2
Map reads using Bowtie2
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2VF
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2F
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2S
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2VS
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2X800
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2I5
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2L
>>> ymp make toy.assemble_megahit.index_bowtie2.map_bowtie2
>>> ymp make toy.group_Subject.assemble_megahit.index_bowtie2.map_bowtie2
>>> ymp make mpic.ref_ssu.index_bowtie2.map_bowtie2
-
stage
metaphlan2
Assess metagenome community composition using Metaphlan 2
-
stage
polish_pilon
Polish genomes with Pilon
Requires fasta.gz and sorted.bam files as input.
-
stage
primermatch_bbmap
Filters reads by matching reference primer using BBMap's “bbduk.sh”.
>>> ymp make mpic.ref_primers.primermatch_bbmap
-
stage
references
This is a “virtual” stage. It does not process read data, but comprises rules used for reference provisioning.
-
rule
prepare_reference
Provisions files in <reference_dir>/<reference_name>
Creates symlinks to downloaded references
Compresses references provided uncompressed upstream
Connects files requested by stages with downloaded files and unpacked archives
-
rule
unpack_archive
Template rule for unpacking references provisioned upstream as archive.
-
rule
unpack_ref_GRCh38_eaa4c10f
Unpacks ref_GRCh38 archive:
URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38_snp_tran.tar.gz
Files:
ALL.1.ht2
ALL.2.ht2
ALL.3.ht2
ALL.4.ht2
ALL.5.ht2
ALL.6.ht2
ALL.7.ht2
ALL.8.ht2
-
rule
unpack_ref_centrifuge_0d910a96
Unpacks ref_centrifuge archive:
URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p+h+v.tar.gz
Files:
p+h+v.1.cf
p+h+v.2.cf
p+h+v.3.cf
-
rule
unpack_ref_centrifuge_1ee7c028
Unpacks ref_centrifuge archive:
URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/nt.tar.gz
Files:
nt.1.cf
nt.2.cf
nt.3.cf
-
rule
unpack_ref_centrifuge_43ba6165
Unpacks ref_centrifuge archive:
URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p_compressed.tar.gz
Files:
p_compressed.1.cf
p_compressed.2.cf
p_compressed.3.cf
-
rule
unpack_ref_centrifuge_a9964521
Unpacks ref_centrifuge archive:
URL: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p_compressed+h+v.tar.gz
Files:
p_compressed+h+v.1.cf
p_compressed+h+v.2.cf
p_compressed+h+v.3.cf
-
rule
unpack_ref_greengenes_305aa905
Unpacks ref_greengenes archive:
URL: ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz
Files:
rep_set/99_otus.fasta
rep_set/97_otus.fasta
rep_set/94_otus.fasta
-
rule
unpack_ref_metaphlan2_a6545140
Unpacks ref_metaphlan2 archive:
URL: https://depot.galaxyproject.org/software/metaphlan2/metaphlan2_2.6.0_src_all.tar.gz
Files:
db_v20/mpa_v20_m200.1.bt2
db_v20/mpa_v20_m200.2.bt2
db_v20/mpa_v20_m200.3.bt2
db_v20/mpa_v20_m200.4.bt2
db_v20/mpa_v20_m200.rev.1.bt2
db_v20/mpa_v20_m200.rev.2.bt2
db_v20/mpa_v20_m200.pkl
-
rule
unpack_ref_mothur_SEED_39c9f686
Unpacks ref_mothur_SEED archive:
URL: https://www.mothur.org/w/images/a/a4/Silva.seed_v128.tgz
Files:
silva.seed_v128.tax
silva.seed_v128.align
-
rule
-
stage
remove_bbmap
Filter reads by reference
This stage aligns the reads with a given reference using BBMap in fast mode. Matching reads are collected in the stage filter_bbmap and remaining reads are collected in the stage remove_bbmap.
>>> ymp make toy.ref_phiX.index_bbmap.remove_bbmap
>>> ymp make toy.ref_phiX.index_bbmap.filter_bbmap
>>> ymp make mpic.ref_phiX.index_bbmap.remove_bbmap
-
stage
split_library
Demultiplexes amplicon sequencing files
This rule is treated specially. If a configured project specifies a barcode_col, reads from the file (or files) are used in combination with
-
stage
trim_bbmap
Trim adapters and low quality bases from reads
Applies BBMap's “bbduk.sh”.
Parameters:
A: append to enable adapter trimming
Q20: append to select phred score cutoff (default 20)
L20: append to select minimum read length (default 20)
>>> ymp make toy.trim_bbmap
>>> ymp make toy.trim_bbmapA
>>> ymp make toy.trim_bbmapAQ10L10
>>> ymp make mpic.trim_bbmap
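To make the suffix convention concrete, the sketch below decodes a parameter suffix such as `AQ10L10` into explicit settings. It is a hypothetical illustration, not YMP's own parser: the function name, the dictionary keys, and the assumption that parameters appear in the order A, Q, L (as in the example above) are all ours. Defaults are the ones listed in the parameter description.

```python
import re

# Defaults as listed in the parameter description above.
DEFAULTS = {"adapters": False, "quality": 20, "min_length": 20}

# Assumed order: optional "A", then optional "Q<int>", then optional "L<int>".
SUFFIX_PATTERN = re.compile(r"^(A?)(?:Q(\d+))?(?:L(\d+))?$")


def parse_trim_suffix(suffix):
    """Decode the text following 'trim_bbmap' in a stage name."""
    match = SUFFIX_PATTERN.match(suffix)
    if match is None:
        raise ValueError(f"unrecognized parameter suffix: {suffix!r}")
    adapters, quality, length = match.groups()
    return {
        "adapters": bool(adapters),
        "quality": int(quality) if quality else DEFAULTS["quality"],
        "min_length": int(length) if length else DEFAULTS["min_length"],
    }
```

Under these assumptions, `parse_trim_suffix("AQ10L10")` enables adapter trimming with a quality cutoff of 10 and a minimum length of 10, while an empty suffix yields the defaults.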
-
stage
trim_sickle
Perform read trimming using Sickle
>>> ymp make toy.trim_sickle
>>> ymp make toy.trim_sickleQ10L10
>>> ymp make mpic.trim_sickleL20
-
stage
trim_trimmomatic
Adapter trim reads using trimmomatic
>>> ymp make toy.trim_trimmomaticT32
>>> ymp make mpic.trim_trimmomatic
-
rule
mkdir
Auto-create directories listed in ymp config.
Use these as input:
>>> input: tmpdir = ancient(ymp.get_config().dir.tmp)
Or as param:
>>> param: tmpdir = "/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/latest/doc/tmp"