Stages
Listing of stages implemented in YMP
-
stage annotate_blast
Annotate sequences with BLAST
Searches a reference database for hits with blastn. Use the E flag to specify the exponent of the required E-value. Use N or Mega to specify the default. Use Best to add the -subject_besthit flag.
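The Python sketch below shows how these flags might translate into a blastn argument list. It is not YMP's implementation: the function build_blastn_args is hypothetical, and mapping N and Mega to the BLAST+ tasks "blastn" and "megablast", reading E<n> as an E-value threshold of 1e-<n>, and requesting "-outfmt 7" (the commented tabular "blast7" format) are all assumptions.

```python
from typing import List, Optional

# Hypothetical mapping of the stage's N/Mega flags to BLAST+ task names.
TASKS = {"N": "blastn", "Mega": "megablast"}


def build_blastn_args(query: str, db: str, evalue_exp: Optional[int] = None,
                      task: str = "N", besthit: bool = False) -> List[str]:
    """Assemble a blastn argument list from YMP-style stage flags (sketch)."""
    args = [
        "blastn",
        "-query", query,
        "-db", db,
        "-task", TASKS[task],
        "-outfmt", "7",  # tabular output with comment lines ("blast7")
    ]
    if evalue_exp is not None:
        # Assumption: E<n> requests an E-value threshold of 1e-<n>.
        args += ["-evalue", f"1e-{evalue_exp}"]
    if besthit:
        # Best adds BLAST+'s -subject_besthit option, as described above.
        args.append("-subject_besthit")
    return args


print(" ".join(build_blastn_args("contigs.fasta", "refdb", evalue_exp=10,
                                 task="Mega", besthit=True)))
```

Returning a list rather than a shell string keeps each argument intact if it is later passed to subprocess.run.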
-
stage annotate_prodigal
Call genes using Prodigal
>>> ymp make toy.ref_genome.annotate_prodigal
-
stage assemble_megahit
Assemble metagenome using MEGAHIT
>>> ymp make toy.assemble_megahit.map_bbmap
>>> ymp make toy.group_ALL.assemble_megahit.map_bbmap
>>> ymp make toy.group_Subject.assemble_megahit.map_bbmap
-
stage assemble_metaspades
Assemble reads using metaSPAdes
>>> ymp make toy.assemble_metaspades
>>> ymp make toy.group_ALL.assemble_metaspades
>>> ymp make toy.group_Subject.assemble_metaspades
-
stage check
Verify file availability
This stage provides rules for checking file availability at a given point in the stage stack. It is mainly useful for testing and debugging.
-
stage cluster_cdhit
Cluster protein sequences using CD-HIT
>>> ymp make toy.ref_query.cluster_cdhit
-
stage correct_bbmap
Correct read errors by overlapping inside tails
Applies BBMap’s “bbmerge.sh ecco” mode. This overlaps the inner ends of each read pair. Where the aligned bases disagree, the base with the higher quality is chosen; where they agree, the quality score is increased to reflect the double observation.
>>> ymp make toy.correct_bbmap
>>> ymp make mpic.correct_bbmap
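The per-position logic described above can be sketched in Python as follows. This is a simplified illustration, not bbmerge's algorithm: finding the overlap is left out (both segments are assumed to be already aligned, with the mate reverse-complemented), and the quality updates (sum capped at an assumed maximum of 41 on a match, the winning base's own quality on a mismatch) are assumptions. correct_overlap is a hypothetical name.

```python
from typing import List, Tuple

MAX_QUAL = 41  # assumed upper bound for Phred scores in this sketch


def correct_overlap(bases1: str, quals1: List[int],
                    bases2: str, quals2: List[int]) -> Tuple[str, List[int]]:
    """Correct the overlapping part of a read pair, position by position.

    Assumes both segments are already aligned: bases2/quals2 come from the
    mate, reverse-complemented (and quality-reversed) to match read 1.
    """
    out_bases: List[str] = []
    out_quals: List[int] = []
    for b1, q1, b2, q2 in zip(bases1, quals1, bases2, quals2):
        if b1 == b2:
            # Match: two observations of the same base, so raise the quality.
            out_bases.append(b1)
            out_quals.append(min(q1 + q2, MAX_QUAL))
        elif q1 >= q2:
            # Mismatch: keep the base with the higher quality (read 1 on ties).
            out_bases.append(b1)
            out_quals.append(q1)
        else:
            out_bases.append(b2)
            out_quals.append(q2)
    return "".join(out_bases), out_quals


print(correct_overlap("ACGT", [30, 30, 10, 30], "ACTT", [30, 30, 35, 5]))
```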
-
stage dedup_bbmap
Remove duplicate reads
Applies BBMap’s “dedupe.sh”.
>>> ymp make toy.dedup_bbmap
>>> ymp make mpic.dedup_bbmap
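To illustrate the idea, the Python sketch below removes exact duplicates, treating a sequence and its reverse complement as the same. It covers only exact duplicates; dedupe.sh itself is more elaborate, and dedup_exact is a hypothetical helper, not part of BBMap or YMP.

```python
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dedup_exact(seqs: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each sequence (either strand)."""
    seen = set()
    kept = []
    for seq in seqs:
        # Canonical key: the lexicographically smaller of the two strands.
        key = min(seq, reverse_complement(seq))
        if key not in seen:
            seen.add(key)
            kept.append(seq)
    return kept


# "CCGT" is the reverse complement of "ACGG", so it is dropped.
print(dedup_exact(["ACGG", "CCGT", "ACGG", "TTTT"]))
```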
-
stage dust_bbmap
Perform entropy filtering on reads using BBMap’s bbduk.sh
The parameter Enn gives the entropy cutoff. Higher values filter more sequences.
>>> ymp make toy.dust_bbmap
>>> ymp make toy.dust_bbmapE60
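Low-complexity reads contain few distinct k-mers and therefore have low entropy. The Python sketch below scores a whole read by the Shannon entropy of its 5-mer composition, scaled to the range 0 to 1. It is a simplified illustration, not bbduk's exact computation (which works on windows), and reading E60 as a cutoff of 0.60 is an assumption. The function names are hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 5) -> float:
    """Shannon entropy of the k-mer composition of seq, scaled to 0..1."""
    n = len(seq) - k + 1  # number of k-mers in the sequence
    if n <= 1:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Highest entropy reachable: every k-mer distinct (at most 4**k kinds).
    return h / math.log(min(n, 4 ** k))


def passes_entropy_filter(seq: str, cutoff: float) -> bool:
    """Keep a read only if its entropy reaches the cutoff."""
    # Assumption: a stage parameter such as E60 corresponds to cutoff 0.60.
    return kmer_entropy(seq) >= cutoff


print(passes_entropy_filter("A" * 50, 0.60))              # homopolymer
print(passes_entropy_filter("ACGTTGCATGCAAGTC", 0.60))   # all 5-mers distinct
```

Raising the cutoff rejects more reads, which matches the note that higher values filter more sequences.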
-
stage extract_reads
Extract reads from a BAM file using samtools fastq
The parameters fn, Fn and Gn are passed through. Some options include:
f2: fully mapped (only proper pairs)
F2: not fully mapped (no proper pairs, e.g. at least one read unmapped)
f12: not mapped (neither read mapped)
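The values in these options are bits of the SAM FLAG field: 2 marks a read mapped in a proper pair, 4 an unmapped read and 8 an unmapped mate, so f12 (4 + 8) selects pairs where neither read is mapped. The Python sketch below mimics samtools-style filtering on a single FLAG value. keep_read is a hypothetical helper, and treating Gn as samtools' -G option is an assumption.

```python
# SAM FLAG bits relevant to the options above (values from the SAM spec).
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def keep_read(flag: int, require: int = 0, exclude: int = 0,
              exclude_all: int = 0) -> bool:
    """Apply samtools-style FLAG filters to one record (sketch).

    require     (-f): keep only if ALL of these bits are set.
    exclude     (-F): drop if ANY of these bits is set.
    exclude_all (-G, assumed mapping for Gn): drop only if ALL are set.
    """
    if (flag & require) != require:
        return False
    if flag & exclude:
        return False
    if exclude_all and (flag & exclude_all) == exclude_all:
        return False
    return True


print(keep_read(99, require=PROPER_PAIR))               # f2, proper pair: True
print(keep_read(77, require=UNMAPPED | MATE_UNMAPPED))  # f12, both unmapped: True
```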
-
stage extract_seqs
Extract sequences from a .fasta.gz file using samtools faidx
Currently requires a .blast7 file as input. Use the parameter Nomatch to keep unmatched sequences instead.
-
stage filter_bmtagger
Filter (or filter out) contaminant reads using BMTagger
>>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger
>>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger.assemble_megahit
>>> ymp make toy.ref_phiX.index_bmtagger.filter_bmtagger
>>> ymp make mpic.ref_phiX.index_bmtagger.remove_bmtagger
-
stage format_bbmap
Process sequences with BBMap’s format.sh
The parameter Ln sets the minimum sequence length n; shorter sequences are removed.
>>> ymp make toy.assemble_metaspades.format_bbmapL200
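As a rough illustration of the length filter, the Python sketch below reads FASTA records and keeps those of at least a minimum length, as L200 would with n = 200. The helpers read_fasta and filter_min_length are hypothetical, and keeping sequences exactly n long is an assumption about the boundary.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(records: Iterable[Record], min_len: int) -> List[Record]:
    """Drop records shorter than min_len (assumed inclusive boundary)."""
    return [(h, s) for h, s in records if len(s) >= min_len]


records = list(read_fasta([">contig1\n", "ACGT\n", "AC\n", ">contig2\n", "AC\n"]))
print(filter_min_length(records, 3))
```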
-
stage map_bbmap
Map reads using BBMap
>>> ymp make toy.assemble_megahit.map_bbmap
>>> ymp make toy.ref_genome.map_bbmap
>>> ymp make mpic.ref_ssu.map_bbmap
-
stage map_bowtie2
Map reads using Bowtie2
>>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2
>>> ymp make toy.assemble_megahit.index_bowtie2.map_bowtie2
>>> ymp make toy.group_Subject.assemble_megahit.index_bowtie2.map_bowtie2
>>> ymp make mpic.ref_ssu.index_bowtie2.map_bowtie2
-
stage primermatch_bbmap
Filter reads by matching reference primers
>>> ymp make mpic.ref_primers.primermatch_bbmap
-
stage references
This is a “virtual” stage. It does not process read data, but comprises rules used for reference provisioning.
-
stage remove_bbmap
Filter reads by reference
This stage aligns the reads with a given reference using BBMap in fast mode. Matching reads are collected in the stage filter_bbmap, and the remaining reads are collected in the stage remove_bbmap.
>>> ymp make toy.ref_phiX.index_bbmap.remove_bbmap
>>> ymp make toy.ref_phiX.index_bbmap.filter_bbmap
>>> ymp make mpic.ref_phiX.index_bbmap.remove_bbmap
-
stage split_library
Demultiplex amplicon sequencing files
This rule is treated specially. If a configured project specifies a barcode_col, reads from the file (or files) are used in combination with
-
stage trim_bbmap
Trim adapters and low-quality bases from reads
Applies BBMap’s “bbduk.sh”.
Parameters:
A: append to enable adapter trimming
Q20: append to select the Phred score cutoff (default 20)
L20: append to select the minimum read length (default 20)
>>> ymp make toy.trim_bbmap
>>> ymp make toy.trim_bbmapA
>>> ymp make toy.trim_bbmapAQ10L10
>>> ymp make mpic.trim_bbmap
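The parameters above are appended directly to the stage name, as in trim_bbmapAQ10L10. For illustration only, the Python sketch below splits such a suffix into settings with the documented defaults (no adapter trimming, quality cutoff 20, minimum length 20). parse_trim_params and its key names are hypothetical; this is not YMP's actual parameter parser.

```python
import re
from typing import Dict, Union

# One token at a time: "A", "Q<digits>" or "L<digits>".
_PARAM_RE = re.compile(r"(A)|Q(\d+)|L(\d+)")


def parse_trim_params(suffix: str) -> Dict[str, Union[bool, int]]:
    """Parse a suffix such as "AQ10L10" (hypothetical helper)."""
    params: Dict[str, Union[bool, int]] = {
        "adapter": False, "quality": 20, "min_length": 20}
    pos = 0
    while pos < len(suffix):
        m = _PARAM_RE.match(suffix, pos)
        if m is None:
            raise ValueError(f"unrecognized parameter at {suffix[pos:]!r}")
        if m.group(1):
            params["adapter"] = True
        elif m.group(2) is not None:
            params["quality"] = int(m.group(2))
        else:
            params["min_length"] = int(m.group(3))
        pos = m.end()
    return params


print(parse_trim_params("AQ10L10"))
```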
-
stage trim_sickle
Perform read trimming using Sickle
>>> ymp make toy.trim_sickle
>>> ymp make toy.trim_sickleQ10L10
>>> ymp make mpic.trim_sickleL20
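Sickle trims reads with a sliding window over base qualities. The Python sketch below shows the general idea in simplified form: it scans windows from the 5' end, cuts inside the first window whose mean quality falls below the threshold, and discards reads left shorter than the minimum length. Sickle's own algorithm differs in its details. The fixed window of 5 and the reading of Q and L as quality threshold and minimum length (by analogy with trim_bbmap) are assumptions, and sliding_window_trim is a hypothetical name.

```python
from typing import List


def sliding_window_trim(quals: List[int], threshold: int = 20,
                        window: int = 5, min_length: int = 20) -> int:
    """Return how many leading bases to keep (0 = discard the read).

    Simplified sliding-window trimming: scan windows from the 5' end and,
    at the first window whose mean quality is below `threshold`, cut at the
    first base in that window that is itself below `threshold`.
    """
    keep = len(quals)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            cut = start
            # A window with a low mean must contain at least one low base,
            # so this loop stops inside the window.
            while quals[cut] >= threshold:
                cut += 1
            keep = cut
            break
    # Discard reads that end up shorter than the minimum length.
    return keep if keep >= min_length else 0


# 25 good bases followed by a low-quality tail.
print(sliding_window_trim([30] * 25 + [5] * 10))
```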
-
stage trim_trimmomatic
Adapter-trim reads using Trimmomatic
>>> ymp make toy.trim_trimmomaticT32
>>> ymp make mpic.trim_trimmomatic
-
rule blast7_extract
Generate a metadata CSV and sequence FASTA pair from a blast7 file for one gene.
-
rule gunzip
Generic temporary gunzip
Use “ruleorder: gunzip > myrule” to prefer gunzipping over re-running a rule. For example:
>>> ruleorder: gunzip > myrule
>>> rule myrule:
>>>     output: temp("some.txt"), "some.txt.gzip"
-
rule mkdir
Auto-create directories listed in the YMP config.
Use these as input:
>>> input: tmpdir = ancient(icfg.dir.tmp)