YMP - a Flexible Omics Pipeline¶
Welcome to the YMP documentation!
YMP is a tool that makes it easy to process large amounts of NGS read data. It comes "batteries included" with everything needed to preprocess your reads (QC, trimming, contaminant removal), assemble metagenomes, annotate assemblies, or assemble and quantify RNA-Seq transcripts, offering a choice of tools for each of those processing stages. When your needs exceed what the stock YMP processing stages provide, you can easily add your own, using YMP to drive novel tools, tools specific to your area of research, or tools you wrote yourself.
Features:¶
- batteries included
YMP comes with a large number of Stages implementing common read processing steps. These stages cover the most common tasks, including quality control, filtering and sorting of reads, assembly of metagenomes and transcripts, read mapping, community profiling, visualisation and pathway analysis.
For a complete list, check the documentation or the source.
- get started quickly
Simply point YMP at a folder containing read files, at a mapping file, a list of URLs or even an SRA RunTable and YMP will configure itself. Use tab expansion to complete your desired series of stages to be applied to your data. YMP will then proceed to do your bidding, downloading raw read files and reference databases as needed, installing requisite software environments and scheduling the execution of tools either locally or on your cluster.
- explore alternative workflows
Not sure which assembler works best for your data, or what the effect of more stringent quality trimming would be? YMP is made for this! By keeping the output of each stage in a folder named to match the stack of applied stages, YMP can manage many variant workflows in parallel, while minimizing the amount of duplicate computation and storage.
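The folder naming scheme behind this can be sketched in a few lines of Python. This is an illustration of the naming convention only, not YMP's actual implementation; the function name is hypothetical, and the stage names are taken from the examples in this documentation:

```python
def stack_path(project, stages):
    """Output folder for a stack of applied stages: the stage names
    are joined onto the project name with dots (illustrative only)."""
    return ".".join([project] + stages)

# Two variant workflows share the "toy.trim_bbmap" prefix, so the
# trimming output is computed and stored only once:
megahit = stack_path("toy", ["trim_bbmap", "assemble_megahit"])
spades = stack_path("toy", ["trim_bbmap", "assemble_metaspades"])
```

Because both stacks reuse whatever already exists under the shared prefix, duplicate computation and storage are avoided.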
- go beyond the beaten path
Built on top of Bioconda and Snakemake, YMP is easily extended with your own Snakefiles, allowing you to integrate any type of processing you desire into YMP, including your own, custom-made tools. Within the YMP framework, you can also make use of the extensions to the Snakemake language provided by YMP (default values, inheritance, recursive wildcard expansion, etc.), making writing rules less error-prone and repetitive.
Background¶
Bioinformatical data processing workflows can easily get very complex, even convoluted. On the way from the raw read data to publishable results, a sizeable collection of tools needs to be applied, intermediate outputs verified, reference databases selected, and summary data produced. A host of data files must be managed, processed individually or aggregated by host or spatial transect along the way. And, of course, to arrive at a workflow that is just right for a particular study, many alternative workflow variants need to be evaluated. Which tools perform best? Which parameters are right? Does re-ordering steps make a difference? Should the data be assembled individually, grouped, or should a grand co-assembly be computed? Which reference database is most appropriate?
Answering these questions is a time consuming process, justifying the plethora of published ready-made pipelines, each providing a polished workflow for a typical study type or use case. The price for the convenience of such a polished pipeline is the lack of flexibility - they are not meant to be adapted or extended to match the needs of a particular study. Workflow management systems, on the other hand, offer great flexibility by focusing on the orchestration of user defined workflows, but typically require significant initial effort as they come without predefined workflows.
YMP strives to walk the middle ground between these. It brings everything needed for classic metagenome and RNA-Seq workflows, yet, built on the workflow management system Snakemake, it can be easily expanded by simply adding Snakemake rules files. Designed around the needs of processing primarily multi-omic NGS read data, it brings a framework for handling read file meta data, provisioning reference databases, and organizing rules into semantic stages.
Installing and Updating YMP¶
Working with the Github Development Version¶
Installing from GitHub¶
Clone the repository:
git clone --recurse-submodules https://github.com/epruesse/ymp.git
Or, if you have GitHub SSH keys set up:
git clone --recurse-submodules git@github.com:epruesse/ymp.git
Create and activate conda environment:
conda env create -n ymp --file environment.yaml
source activate ymp
Install YMP into conda environment:
pip install -e .
Verify that YMP works:
source activate ymp
ymp --help
Updating Development Version¶
Usually, all you need to do is a pull:
git pull
git submodule update --recursive --remote
If environments were updated, you may want to regenerate the local installations and clean out environments no longer used to save disk space:
source activate ymp
ymp env update
ymp env clean
# alternatively, you can just delete existing envs and let YMP
# reinstall as needed:
# rm -rf ~/.ymp/conda*
conda clean -a
If you see errors before jobs are executed, the core requirements may have changed. To update the YMP conda environment, enter the folder where you installed YMP and run the following:
source activate ymp
conda env update --file environment.yaml
If something changed in setup.py, a re-install may be necessary:
source activate ymp
pip install -U -e .
Configuration¶
YMP reads its configuration from a YAML formatted file ymp.yml. To run YMP, you need to first tell it which datasets you want to process and where it can find them.
Getting Started¶
A simple configuration looks like this:
projects:
  myproject:
    data: mapping.csv
This tells YMP to look for a file mapping.csv located in the same folder as your ymp.yml listing the datasets for the project myproject. By default, YMP will use the leftmost unique column as names for your datasets and try to guess which columns point to your input data.
The matching mapping.csv might look like this:
sample,fq1,fq2
foot,sample1_1.fq.gz,sample1_2.fq.gz
hand,sample2_1.fq.gz,sample2_2.fq.gz
So we have two samples, foot and hand, and the read files for those in the same directory as the configuration file. Using relative or absolute paths you can point to any place in your filesystem. You can also use SRA references like SRR123456 or URLs pointing to remote files.
The mapping file itself may be in comma separated or tab separated format or may be an Excel file. For Excel files, you may specify the sheet to be used, separated from the file name by a % sign. For example:
projects:
  myproject:
    data: myproject.xlsx%sheet3
The matching Excel file could then have a sheet3 with this content:

sample  fq1                           fq2                           srr
foot    /data/foot1.fq.gz             /data/foot2.fq.gz
hand                                                                SRR123456
head    http://datahost/head_1.fq.gz  http://datahost/head_2.fq.gz  SRR234234

For foot, the two gzipped FastQ files are used. The data for hand is retrieved from SRA and the data for head is downloaded from datahost. The SRR number for head is ignored as the URL pair is found first.
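The file%sheet convention can be parsed with a simple helper. This is hypothetical code for illustration only; YMP's actual parsing may differ:

```python
def split_sheet_spec(spec):
    """Split a data spec like 'myproject.xlsx%sheet3' into the file
    path and the optional sheet name (illustrative helper)."""
    path, sep, sheet = spec.partition("%")
    return path, (sheet if sep else None)

# split_sheet_spec("myproject.xlsx%sheet3") -> ("myproject.xlsx", "sheet3")
# split_sheet_spec("mapping.csv") -> ("mapping.csv", None)
```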
Referencing Read Files¶
YMP will search your map file data for references to the read data files. It understands three types of references to your reads:

- Local FastQ files: data/some_1.fq.gz, data/some_2.fq.gz
  The file names should end in .fastq or .fq, optionally followed by .gz if your data is compressed. You need to provide forward and reverse reads in separate columns; the leftmost column is assumed to refer to the forward reads. If the filename is relative (does not start with a /), it is assumed to be relative to the location of ymp.yml.
- Remote FastQ files: http://myhost/some_1.fq.gz, http://myhost/some_2.fq.gz
  If the filename starts with http:// or https://, YMP will download the files automatically. Forward and reverse reads need to be either both local or both remote.
- SRA Run IDs: SRR123456
  Instead of giving names for FastQ files, you may provide SRA Run accessions, e.g. SRR123456 (or ERRnnn or DRRnnn for runs originally submitted to EMBL or DDBJ, respectively). YMP will use fastq-dump to download and extract the SRA files.
Which type to use is determined for each row in your map file data individually. From left to right, the first recognized data source is used in the order they are listed above.
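This per-row detection can be sketched roughly as follows. This is a simplification for illustration only; classify_source and pick_source are hypothetical names, not YMP's API:

```python
import re

def classify_source(value):
    """Classify one cell of a map file row (illustrative only)."""
    if not value:
        return None
    if re.match(r"https?://", value):
        return "remote"
    if re.search(r"\.(fastq|fq)(\.gz)?$", value):
        return "local"
    if re.match(r"^[SED]RR\d+$", value):
        return "sra"
    return None

def pick_source(row):
    """Scan cells left to right; the first recognized reference wins,
    which is why an SRR number to the right of a URL pair is ignored."""
    for value in row:
        kind = classify_source(value)
        if kind:
            return kind, value
    return None
```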
Configuration processing an SRA RunTable:
projects:
  smith17:
    data:
      - SraRunTable.txt
    id_col: Sample_Name_s
Project Configuration¶
Each project must have a data key defining which mapping file(s) to load. This may be a simple string referring to the file (URLs are OK as well) or a more complex configuration.
Specifying Columns¶
By default, YMP will choose the columns to use as data set name and to locate the read data automatically. You can override this behavior by specifying the columns explicitly:
- Data set names: id_col: Sample
  The leftmost unique column may not always be the most informative to use as names for the datasets. In the example above, we specify the column to use explicitly with the line id_col: Sample_Name_s, as the columns in SRA run tables are sorted alpha-numerically and the leftmost unique one may well contain random numeric data.
  Default: leftmost unique column
- Data set read columns: read_cols: [fq1, fq2]
  If your map files contain multiple references to source files, e.g. local and remote, and the order of preference used by YMP does not meet your needs, you can restrict the search for suitable data references to a set of columns using the key read_cols.
  Default: all columns
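The default "leftmost unique column" heuristic could look like this. This is an illustrative sketch of the documented behavior, not YMP's code:

```python
def leftmost_unique(rows, columns):
    """Return the first column, left to right, whose values are all
    distinct, making it usable as data set names."""
    for col in columns:
        values = [row[col] for row in rows]
        if len(set(values)) == len(values):
            return col
    return None

samples = [
    {"subject": "a", "sample": "foot"},
    {"subject": "a", "sample": "hand"},
]
# "subject" repeats, so "sample" would be chosen
```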
Multiple Mapping Files per Project¶
To combine data sets from multiple mapping files, simply list the files under the data key:
projects:
  myproject:
    data:
      - sequencing_run_1.txt
      - sequencing_run_2.txt
The files should share at least one column containing unique values to use as names for the datasets.
If you need to merge meta-data spread over multiple files, you can use the join key:
projects:
  myproject:
    data:
      - join:
          - SraRunTable.txt
          - metadata.xlsx%reference_project
      - metadata.xlsx%our_samples
This will merge rows from SraRunTable.txt with rows in the reference_project sheet in metadata.xlsx if all columns of the same name contain the same data (natural join) and add samples from the our_samples sheet to the bottom of the list.
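The semantics of join (natural join) and plain list concatenation can be illustrated in a few lines of Python. This is a hypothetical helper for illustration only; YMP builds its tables with pandas:

```python
def natural_join(left, right):
    """Join two lists of row dicts on all shared column names,
    keeping row pairs whose shared cells agree."""
    shared = set(left[0]) & set(right[0])
    joined = []
    for lrow in left:
        for rrow in right:
            if all(lrow[k] == rrow[k] for k in shared):
                merged = dict(lrow)
                merged.update(rrow)
                joined.append(merged)
    return joined

runs = [{"sample": "foot", "srr": "SRR1"}, {"sample": "hand", "srr": "SRR2"}]
meta = [{"sample": "foot", "site": "left"}, {"sample": "hand", "site": "right"}]
joined = natural_join(runs, meta)        # merge on the shared "sample" column
combined = joined + [{"sample": "head", "srr": "SRR3"}]  # append more samples
```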
Complete Example¶
projects:
  myproject:
    data:
      - join:
          - SraRunTable.txt
          - metadata.xlsx%reference_project
      - metadata.xlsx%our_samples
      - mapping.csv
    id_col: Sample
    read_cols:
      - fq1
      - fq2
      - Run_s
Command Line¶
ymp¶
Welcome to YMP!
Please find the full manual at https://ymp.readthedocs.io
ymp [OPTIONS] COMMAND [ARGS]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- --version: Show the version and exit.
- --install-completion: Install command completion for the current shell. Make sure to have psutil installed.
- --profile <profile>: Profile execution time using Yappi
env¶
Manipulate conda software environments
These commands allow accessing the conda software environments managed by YMP. Use e.g.
>>> $(ymp env activate multiqc)
to enter the software environment for multiqc.
ymp env [OPTIONS] COMMAND [ARGS]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
activate¶
source activate environment
Usage: $(ymp activate env [ENVNAME])
ymp env activate [OPTIONS] ENVNAME
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
Arguments:
- ENVNAME: Required argument
clean¶
Remove unused conda environments
ymp env clean [OPTIONS] [ENVNAMES]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -a, --all: Delete all environments
Arguments:
- ENVNAMES: Optional argument(s)
export¶
Export conda environments
Resolved package specifications for the selected conda environments can be exported either in YAML format suitable for use with conda env create -f FILE or in TXT format containing a list of URLs suitable for use with conda create --file FILE. Please note that the TXT format is platform specific.
If other formats are desired, use ymp env list to view the environments' installation path ("prefix" in conda lingo) and export the specification with the conda command line utility directly.
ymp env export [OPTIONS] [ENVNAMES]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -d, --dest <FILE>: Destination file or directory. If a directory, file names will be derived from environment names and selected export format. Default: print to standard output.
- -f, --overwrite: Overwrite existing files
- -c, --create-missing: Create environments not yet installed
- -s, --skip-missing: Skip environments not yet installed
- -t, --filetype <filetype>: Select export format (yml|txt). Default: yml unless FILE ends in '.txt'
Arguments:
- ENVNAMES: Optional argument(s)
install¶
Install conda software environments
ymp env install [OPTIONS] [ENVNAMES]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -p, --conda-prefix <conda_prefix>: Override location for conda environments
- -e, --conda-env-spec <conda_env_spec>: Override conda env specs settings
- -n, --dry-run: Only show what would be done
- -f, --force: Install environment even if it already exists
Arguments:
- ENVNAMES: Optional argument(s)
list¶
List conda environments
ymp env list [OPTIONS] [ENVNAMES]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- --static, --no-static: List environments statically defined via env.yml files
- --dynamic, --no-dynamic: List environments defined inline from rule files
- -a, --all: List all environments, including outdated ones.
- -s, --sort <sort_col>: Sort by column (name|hash|path|installed)
- -r, --reverse: Reverse sort order
Arguments:
- ENVNAMES: Optional argument(s)
prepare¶
Create envs needed to build target
ymp env prepare [OPTIONS] TARGET_FILES
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -n, --dryrun: Only show what would be done
- -p, --printshellcmds: Print shell commands to be executed on shell
- -k, --keepgoing: Don't stop after failed job
- --lock, --no-lock: Use/don't use locking to prevent clobbering of files by parallel instances of YMP running
- --rerun-incomplete, --ri: Re-run jobs left incomplete in last run
- -F, --forceall: Force rebuilding of all stages leading to target
- -f, --force: Force rebuilding of target
- --notemp: Do not remove temporary files
- -t, --touch: Only touch files, faking update
- --shadow-prefix <shadow_prefix>: Directory to place data for shadowed rules
- -r, --reason: Print reason for executing rule
- -N, --nohup: Don't die once the terminal goes away.
Arguments:
- TARGET_FILES: Optional argument(s)
remove¶
Remove conda environments
ymp env remove [OPTIONS] [ENVNAMES]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
Arguments:
- ENVNAMES: Optional argument(s)
run¶
Execute COMMAND with activated environment ENV
Usage: ymp env run <ENV> [--] <COMMAND...>
(Use the "--" if your command line contains option type parameters beginning with - or --)
ymp env run [OPTIONS] ENVNAME [COMMAND]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
Arguments:
- ENVNAME: Required argument
- COMMAND: Optional argument(s)
update¶
Update conda environments
ymp env update [OPTIONS] [ENVNAMES]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- --reinstall <reinstall>: Remove and reinstall environments rather than trying to update
Arguments:
- ENVNAMES: Optional argument(s)
init¶
Initialize YMP workspace
ymp init [OPTIONS] COMMAND [ARGS]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
cluster¶
Set up cluster
ymp init cluster [OPTIONS]
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -y, --yes: Confirm every prompt
make¶
Build target(s) locally
ymp make [OPTIONS] TARGET_FILES
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -n, --dryrun: Only show what would be done
- -p, --printshellcmds: Print shell commands to be executed on shell
- -k, --keepgoing: Don't stop after failed job
- --lock, --no-lock: Use/don't use locking to prevent clobbering of files by parallel instances of YMP running
- --rerun-incomplete, --ri: Re-run jobs left incomplete in last run
- -F, --forceall: Force rebuilding of all stages leading to target
- -f, --force: Force rebuilding of target
- --notemp: Do not remove temporary files
- -t, --touch: Only touch files, faking update
- --shadow-prefix <shadow_prefix>: Directory to place data for shadowed rules
- -r, --reason: Print reason for executing rule
- -N, --nohup: Don't die once the terminal goes away.
- -j, --cores <CORES>: The number of parallel threads used for scheduling jobs
- --dag: Print the Snakemake execution DAG and exit
- --rulegraph: Print the Snakemake rule graph and exit
- --debug-dag: Show candidates and selections made while the rule execution graph is being built
- --debug: Set the Snakemake debug flag
Arguments:
- TARGET_FILES: Optional argument(s)
show¶
Show configuration properties
ymp show [OPTIONS] PROPERTY
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -h, --help
- -s, --source: Show source
Arguments:
- PROPERTY: Optional argument
stage¶
Manipulate YMP stages
ymp stage [OPTIONS] COMMAND [ARGS]...
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
list¶
List available stages
ymp stage list [OPTIONS] STAGE
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -l, --long: Show full stage descriptions
- -s, --short: Show only stage names
- -c, --code: Show definition file name and line number
- -t, --types: Show input/output types
Arguments:
- STAGE: Optional argument(s)
submit¶
Build target(s) on cluster
The parameters for cluster execution are drawn from layered profiles. YMP includes base profiles for the “torque” and “slurm” cluster engines.
ymp submit [OPTIONS] TARGET_FILES
Options:
- -P, --pdb: Drop into debugger on uncaught exception
- -q, --quiet: Decrease log verbosity
- -v, --verbose: Increase log verbosity
- --log-file <log_file>: Specify a log file
- -n, --dryrun: Only show what would be done
- -p, --printshellcmds: Print shell commands to be executed on shell
- -k, --keepgoing: Don't stop after failed job
- --lock, --no-lock: Use/don't use locking to prevent clobbering of files by parallel instances of YMP running
- --rerun-incomplete, --ri: Re-run jobs left incomplete in last run
- -F, --forceall: Force rebuilding of all stages leading to target
- -f, --force: Force rebuilding of target
- --notemp: Do not remove temporary files
- -t, --touch: Only touch files, faking update
- --shadow-prefix <shadow_prefix>: Directory to place data for shadowed rules
- -r, --reason: Print reason for executing rule
- -N, --nohup: Don't die once the terminal goes away.
- -P, --profile <NAME>: Select cluster config profile to use. Overrides cluster.profile setting from config.
- -c, --snake-config <FILE>: Provide snakemake cluster config file
- -d, --drmaa: Use DRMAA to submit jobs to cluster. Note: Make sure you have a working DRMAA library. Set DRMAA_LIBRARY_PATH if necessary.
- -s, --sync: Use synchronous cluster submission, keeping the submit command running until the job has completed. Adds qsub_sync_arg to cluster command
- -i, --immediate: Use immediate submission, submitting all jobs to the cluster at once.
- --command <CMD>: Use CMD to submit job script to the cluster
- --wrapper <CMD>: Use CMD as script submitted to the cluster. See Snakemake documentation for more information.
- --max-jobs-per-second <N>: Limit the number of jobs submitted per second
- -l, --latency-wait <T>: Time in seconds to wait after job completion until files are expected to have appeared in the local file system view. On NFS, this time is governed by the acdirmax mount option, which defaults to 60 seconds.
- -J, --cluster-cores <N>: Limit the maximum number of cores used by jobs submitted at a time
- -j, --cores <N>: Number of local threads to use
- --args <ARGS>: Additional arguments passed to cluster submission command. Note: Make sure the first character of the argument is not '-'; prefix with ' ' as necessary.
- --scriptname <NAME>: Set the name template used for submitted jobs
Arguments:
- TARGET_FILES: Optional argument(s)
Stages¶
Listing of stages implemented in YMP
- stage annotate_blast [source]: Annotate sequences with BLAST
  Searches a reference database for hits with blastn. Use the E flag to specify the exponent of the required E-value. Use N or Mega to specify the default. Use Best to add the -subject_besthit flag.
- stage annotate_prodigal [source]: Call genes using prodigal
  >>> ymp make toy.ref_genome.annotate_prodigal
- stage assemble_megahit [source]: Assemble metagenome using MegaHit.
  >>> ymp make toy.assemble_megahit.map_bbmap
  >>> ymp make toy.group_ALL.assemble_megahit.map_bbmap
  >>> ymp make toy.group_Subject.assemble_megahit.map_bbmap
- stage assemble_metaspades [source]: Assemble reads using metaspades
  >>> ymp make toy.assemble_metaspades
  >>> ymp make toy.group_ALL.assemble_metaspades
  >>> ymp make toy.group_Subject.assemble_metaspades
- stage check [source]: Verify file availability
  This stage provides rules for checking the file availability at a given point in the stage stack. Mainly useful for testing and debugging.
- stage cluster_cdhit [source]: Clusters protein sequences using CD-HIT
  >>> ymp make toy.ref_query.cluster_cdhit
- stage correct_bbmap [source]: Correct read errors by overlapping inside tails
  Applies BBMap's "bbmerge.sh ecco" mode. This overlaps the inside of read pairs; where the alignment contains mismatches, the base with the higher quality is chosen, and where it contains matches, the quality score is increased to reflect the double observation.
  >>> ymp make toy.correct_bbmap
  >>> ymp make mpic.correct_bbmap
- stage dedup_bbmap [source]: Remove duplicate reads
  Applies BBMap's "dedupe.sh"
  >>> ymp make toy.dedup_bbmap
  >>> ymp make mpic.dedup_bbmap
- stage dust_bbmap [source]: Perform entropy filtering on reads using BBMap's bbduk.sh
  The parameter Enn gives the entropy cutoff. Higher values filter more sequences.
  >>> ymp make toy.dust_bbmap
  >>> ymp make toy.dust_bbmapE60
- stage extract_reads [source]: Extract reads from BAM file using samtools fastq.
  Parameters fn, Fn and Gn are passed through. Some options include:
  - f2: fully mapped (only proper pairs)
  - F2: not fully mapped (unmapped at least one read)
  - f12: not mapped (neither read mapped)
- stage extract_seqs [source]: Extract sequences from .fasta.gz file using samtools faidx
  Currently requires a .blast7 file as input. Use parameter Nomatch to instead keep unmatched sequences.
- stage filter_bmtagger [source]: Filter(-out) contaminant reads using BMTagger
  >>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger
  >>> ymp make toy.ref_phiX.index_bmtagger.remove_bmtagger.assemble_megahit
  >>> ymp make toy.ref_phiX.index_bmtagger.filter_bmtagger
  >>> ymp make mpic.ref_phiX.index_bmtagger.remove_bmtagger
- stage format_bbmap [source]: Process sequences with BBMap's format.sh
  Parameter Ln filters sequences at a minimum length.
  >>> ymp make toy.assemble_metaspades.format_bbmapL200
- stage map_bbmap [source]: Map reads using BBMap
  >>> ymp make toy.assemble_megahit.map_bbmap
  >>> ymp make toy.ref_genome.map_bbmap
  >>> ymp make mpic.ref_ssu.map_bbmap
- stage map_bowtie2 [source]: Map reads using Bowtie2
  >>> ymp make toy.ref_genome.index_bowtie2.map_bowtie2
  >>> ymp make toy.assemble_megahit.index_bowtie2.map_bowtie2
  >>> ymp make toy.group_Subject.assemble_megahit.index_bowtie2.map_bowtie2
  >>> ymp make mpic.ref_ssu.index_bowtie2.map_bowtie2
- stage primermatch_bbmap [source]: Filters reads by matching reference primer
  >>> ymp make mpic.ref_primers.primermatch_bbmap
- stage references [source]: This is a "virtual" stage. It does not process read data, but comprises rules used for reference provisioning.
- stage remove_bbmap [source]: Filter reads by reference
  This stage aligns the reads with a given reference using BBMap in fast mode. Matching reads are collected in the stage filter_bbmap and remaining reads are collected in the stage remove_bbmap.
  >>> ymp make toy.ref_phiX.index_bbmap.remove_bbmap
  >>> ymp make toy.ref_phiX.index_bbmap.filter_bbmap
  >>> ymp make mpic.ref_phiX.index_bbmap.remove_bbmap
- stage split_library [source]: Demultiplexes amplicon sequencing files
  This rule is treated specially. If a configured project specifies a barcode_col, reads from the file (or files) are used in combination with
- stage trim_bbmap [source]: Trim adapters and low quality bases from reads
  Applies BBMap's "bbduk.sh".
  Parameters:
  - A: append to enable adapter trimming
  - Q20: append to select phred score cutoff (default 20)
  - L20: append to select minimum read length (default 20)
  >>> ymp make toy.trim_bbmap
  >>> ymp make toy.trim_bbmapA
  >>> ymp make toy.trim_bbmapAQ10L10
  >>> ymp make mpic.trim_bbmap
- stage trim_sickle [source]: Perform read trimming using Sickle
  >>> ymp make toy.trim_sickle
  >>> ymp make toy.trim_sickleQ10L10
  >>> ymp make mpic.trim_sickleL20
- stage trim_trimmomatic [source]: Adapter trim reads using trimmomatic
  >>> ymp make toy.trim_trimmomaticT32
  >>> ymp make mpic.trim_trimmomatic
- rule blast7_extract [source]: Generates meta-data csv and sequence fasta pair from blast7 file for one gene.
- rule gunzip [source]: Generic temporary gunzip
  Use "ruleorder: gunzip > myrule" to prefer gunzipping over re-running a rule. E.g.
  >>> ruleorder: gunzip > myrule
  >>> rule myrule:
  >>>     output: temp("some.txt"), "some.txt.gzip"
- rule mkdir [source]: Auto-create directories listed in ymp config.
  Use these as input:
  >>> input: tmpdir = ancient(icfg.dir.tmp)
API¶
ymp package¶
- ymp.get_config() [source]: Access the current YMP configuration object.
  This object might change once during normal execution: it is deleted before passing control to Snakemake. During unit test execution the object is deleted between all tests.
- ymp.print_rule = 0: Set to 1 to show the YMP expansion process as it is applied to the next Snakemake rule definition.
  >>> ymp.print_rule = 1
  >>> rule broken:
  >>>     ...
  >>> ymp make broken -vvv
- ymp.snakemake_versions = ['5.20.1']: List of versions this version of YMP has been verified to work with
Subpackages¶
ymp.cli package¶
- ymp.cli.install_completion(ctx, attr, value) [source]: Installs click_completion tab expansion into the user's shell
Submodules¶
ymp.cli.env module¶
ymp.cli.make module¶
Implements subcommands for ymp make and ymp submit
- class ymp.cli.make.TargetParam [source]
  Bases: click.types.ParamType
  Handles tab expansion for build targets
- exception ymp.cli.make.YmpConfigNotFound [source]
  Bases: ymp.exceptions.YmpException
  Exception raised by YMP if no config was found in current path
ymp.cli.show module¶
Implements subcommands for ymp show
- class ymp.cli.show.ConfigPropertyParam [source]
  Bases: click.types.ParamType
  Handles tab expansion for ymp show arguments
  - complete(_ctx, incomplete) [source]: Try to complete incomplete command. This is executed on tab or tab-tab from the shell.
    Parameters: ctx – click context object; incomplete – last word in command line up until cursor
    Returns: list of words incomplete can be completed to
  - convert(value, param, ctx) [source]: Convert value of param given context
    Parameters: value – string passed on command line; param – click parameter object; ctx – click context object
  - property properties: Find properties offered by ConfigMgr
ymp.stage package¶
YMP processes data in stages, each of which is contained in its own directory.
with Stage("trim_bbmap") as S:
    S.doc("Trim reads with BBMap")
    rule bbmap_trim:
        output: "{:this:}/{sample}{:pairnames:}.fq.gz"
        input: "{:prev:}/{sample}{:pairnames:}.fq.gz"
        ...
Submodules¶
ymp.stage.base module¶
- class ymp.stage.base.BaseStage(name) [source]
  Bases: object
  Base class for stage types
  - STAMP_FILENAME = 'all_targets.stamp': The name of the stamp file that is touched to indicate completion of the stage.
  - can_provide(inputs) [source]: Determines which of inputs this stage can provide.
    Returns a dictionary with the keys a subset of inputs and the values identifying redirections. An empty string indicates that no redirection is to take place. Otherwise, the string is the suffix to be appended to the prior StageStack.
  - doc(doc) [source]: Add documentation to Stage
    Parameters: doc (str) – Docstring passed to Sphinx
    Return type: None
  - docstring: str: The docstring describing this stage. Visible via ymp stage list and in the generated sphinx documentation.
  - get_all_targets(stack) [source]: Targets to build to complete this stage given stack.
    Typically, this is the StageStack's path appended with the stamp name.
  - get_inputs() [source]: Returns the set of inputs required by this stage
    This function must return a copy, to ensure internal data is not modified.
  - get_path(stack) [source]: On disk location for this stage given stack.
    Called by StageStack to determine the real path for virtual stages (which must override this function).
  - match(name) [source]: Check if the name can refer to this stage
    As component of a StageStack, a stage may be identified by alternative names and may also be parametrized by suffix modifiers. Stage types supporting this behavior must override this function.
  - name: The name of the stage is a string uniquely identifying it among all stages.
- class ymp.stage.base.ConfigStage(name, cfg) [source]
  Bases: ymp.stage.base.BaseStage
  Base for stages created via configuration
  These Stages derive from the ymp.yml and not from a rules file.
  - cfg: The configuration object defining this Stage.
  - property defined_in: List of files defining this stage. Used to invalidate caches.
  - filename: Semi-colon separated list of file names defining this Stage.
  - lineno: Line number within the first file at which this Stage is defined.
ymp.stage.expander module¶
-
class
ymp.stage.expander.
StageExpander
[source]¶ Bases:
ymp.snakemake.ColonExpander
Registers rules with stages when they are created
-
class
Formatter
(expander)[source]¶ Bases:
ymp.snakemake.FormatExpander.Formatter
,ymp.string.PartialFormatter
ymp.stage.groupby module¶
-
class
ymp.stage.groupby.
GroupBy
(name)[source]¶ Bases:
ymp.stage.base.BaseStage
Dummy stage for grouping
ymp.stage.pipeline module¶
Pipelines Module
Contains classes for pre-configured pipelines comprising multiple stages.
-
class
ymp.stage.pipeline.
Pipeline
(name, cfg)[source]¶ Bases:
ymp.stage.base.ConfigStage
A virtual stage aggregating a sequence of stages, i.e. a pipeline or sub-workflow.
Pipelines are configured via
ymp.yml
.Example
pipelines:
  my_pipeline:
    - stage_1
    - stage_2
    - stage_3
-
can_provide
(inputs)[source]¶ Determines which of
inputs
this stage can provide. The result dictionary values will point to the “real” output.
-
get_all_targets
(stack)[source]¶ Targets to build to complete this stage given
stack
.Typically, this is the StageStack’s path appended with the stamp name.
-
get_path
(stack)[source]¶ On disk location for this stage given
stack
. Called by
StageStack
to determine the real path for virtual stages (which must override this function).
-
property
outputs
¶ The outputs of a pipeline are the sum of the outputs of each component stage. Outputs of stages further down the pipeline override those generated earlier.
TODO: Allow hiding the output of intermediary stages.
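The override semantics described above amount to an ordered dict merge; a minimal illustrative sketch (the stage names and file patterns are made up, this is not YMP's actual code):

```python
# Per-stage output maps in pipeline order; later stages override earlier
# ones simply because dict.update lets the last writer win.
stage_outputs = [
    {"{sample}.trimmed.fq.gz": ""},                        # hypothetical trim stage
    {"{sample}.trimmed.fq.gz": "", "{sample}.fasta": ""},  # hypothetical assembly stage
]

outputs = {}
for stage in stage_outputs:
    outputs.update(stage)
```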
-
property
pipeline
¶
ymp.stage.project module¶
-
class
ymp.stage.project.
PandasTableBuilder
[source]¶ Bases:
object
Builds the data table describing each sample in a project
This class implements loading and combining tabular data files as specified by the YAML configuration.
- Format:
  - string items are files
  - lists of files are concatenated top to bottom
  - dicts must have one “command” value:
    - ‘join’ contains a two-item list; the two items are joined ‘naturally’ on shared headers
    - ‘table’ contains a list of one-item dicts of the form key: value[,value...]; an in-place table is created from the keys (a list of dicts is necessary as plain dicts are unordered)
    - ‘paste’ contains a list of tables pasted left to right; pasted tables must be of equal length or length 1
  - if a value is a valid path relative to the csv/tsv/xls file’s location, it is expanded to a path relative to CWD
Example
- top.csv
- join:
    - excel.xslx%left.csv
    - right.tsv
- table:
    - sample: s1,s2,s3
    - fq1: s1.1.fq, s2.1.fq, s3.1.fq
    - fq2: s1.2.fq, s2.2.fq, s3.2.fq
-
class
ymp.stage.project.
Project
(name, cfg)[source]¶ Bases:
ymp.stage.base.ConfigStage
Contains configuration for a source dataset to be processed
-
KEY_BCCOL
= 'barcode_col'¶
-
KEY_DATA
= 'data'¶
-
KEY_IDCOL
= 'id_col'¶
-
KEY_READCOLS
= 'read_cols'¶
-
RE_FILE
= re.compile('^(?!http://).*(?:fq|fastq)(?:|\\.gz)$')¶
-
RE_REMOTE
= re.compile('^(?:https?|ftp|sftp)://(?:.*)')¶
-
RE_SRR
= re.compile('^[SED]RR[0-9]+$')¶
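The three patterns above are reproduced here verbatim to illustrate how a data source entry is classified; the sample file names are made up:

```python
import re

RE_FILE = re.compile(r'^(?!http://).*(?:fq|fastq)(?:|\.gz)$')  # local read files
RE_REMOTE = re.compile(r'^(?:https?|ftp|sftp)://(?:.*)')       # remote URLs
RE_SRR = re.compile(r'^[SED]RR[0-9]+$')                        # SRA/ENA/DDBJ run accessions

local = RE_FILE.match("sample_1.fastq.gz")          # a plain FastQ file
remote = RE_REMOTE.match("ftp://host/reads.fq.gz")  # a downloadable URL
run = RE_SRR.match("ERR000001")                     # SRR/ERR/DRR accession
```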
-
choose_id_column
()[source]¶ Configures column to use as index on runs
If explicitly configured via KEY_IDCOL, verifies that the column exists and that it is unique. Otherwise chooses the leftmost unique column in the data.
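The fallback heuristic can be sketched in pure Python (the real implementation operates on a pandas DataFrame; the helper name and sample data here are made up):

```python
# "Leftmost unique column": scan columns left to right and pick the
# first one whose values are unique across all rows.
def leftmost_unique_column(columns, rows):
    for col in columns:
        values = [row[col] for row in rows]
        if len(values) == len(set(values)):
            return col
    raise ValueError("no unique column found to use as run index")

cols = ["condition", "sample"]
rows = [{"condition": "ctrl", "sample": "s1"},
        {"condition": "ctrl", "sample": "s2"},
        {"condition": "treat", "sample": "s3"}]
idcol = leftmost_unique_column(cols, rows)  # "condition" repeats, "sample" wins
```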
-
property
data
¶ Pandas dataframe of runs
Lazy loading property, first call may take a while.
-
property
fq_names
¶ Names of all FastQ files
-
property
fwd_fq_names
¶ Names of forward FastQ files (se and pe)
-
property
fwd_pe_fq_names
¶ Names of forward FastQ files part of pair
-
get_fq_names
(only_fwd=False, only_rev=False, only_pe=False, only_se=False)[source]¶ Get pipeline names of fq files
-
property
idcol
¶
-
property
outputs
¶ Returns the set of outputs this stage is able to generate.
May return either a
set
or a dict
with the dictionary values representing redirections in the case of virtual stages such as Pipeline
or Reference
.
-
property
pe_fq_names
¶ Names of paired end FastQ files
-
property
project_name
¶
-
property
rev_pe_fq_names
¶ Names of reverse FastQ files part of pair
-
property
runs
¶ Pandas dataframe index of runs
Lazy loading property, first call may take a while.
-
property
se_fq_names
¶ Names of single end FastQ files
-
property
source_cfg
¶
-
property
variables
¶
-
ymp.stage.reference module¶
-
class
ymp.stage.reference.
Archive
(name, dirname, tar, url, strip, files)[source]¶ Bases:
object
-
dirname
= None¶
-
files
= None¶
-
hash
= None¶
-
name
= None¶
-
strip_components
= None¶
-
tar
= None¶
-
-
class
ymp.stage.reference.
Reference
(name, cfg)[source]¶ Bases:
ymp.stage.base.ConfigStage
Represents (remote) reference file/database configuration
-
get_path
(_stack)[source]¶ On disk location for this stage given
stack
. Called by
StageStack
to determine the real path for virtual stages (which must override this function).
-
ymp.stage.stack module¶
-
class
ymp.stage.stack.
StageStack
(path, stage=None)[source]¶ Bases:
object
The “head” of a processing chain - a stack of stages
-
property
defined_in
¶
-
classmethod
get
(path, stage=None)[source]¶ Cached access to StageStack
- Parameters
path – Stage path
stage – Stage object at head of stack
-
property
path
¶ On disk location of files provided by this stack
-
property
targets
¶ Returns the current targets
-
used_stacks
= {}¶
-
ymp.stage.stage module¶
-
class
ymp.stage.stage.
Param
(stage, key, name, value=None, default=None)[source]¶ Bases:
object
Stage Parameter (base class)
-
property
constraint
¶
-
-
class
ymp.stage.stage.
ParamChoice
(*args, **kwargs)[source]¶ Bases:
ymp.stage.stage.Param
Stage Choice Parameter
-
class
ymp.stage.stage.
ParamFlag
(*args, **kwargs)[source]¶ Bases:
ymp.stage.stage.Param
Stage Flag Parameter
-
class
ymp.stage.stage.
ParamInt
(*args, **kwargs)[source]¶ Bases:
ymp.stage.stage.Param
Stage Int Parameter
-
class
ymp.stage.stage.
Stage
(name, altname=None, env=None, doc=None)[source]¶ Bases:
ymp.snakemake.WorkflowObject
,ymp.stage.base.BaseStage
Creates a new stage
While entered using
with
, several stage-specific variables are expanded within rules:
{:this:} – The current stage directory
{:that:} – The alternate output stage directory
{:prev:} – The previous stage’s directory
- Parameters
-
active
= None¶ Currently active stage (“entered”)
-
add_param
(key, typ, name, value=None, default=None)[source]¶ Add parameter to stage
Example
>>> with Stage("test") as S:
>>>     S.add_param("N", "int", "nval", default=50)
>>>     rule:
>>>         shell: "echo {param.nval}"
This would add a stage “test”, which can also be invoked as e.g. “testN123”. By default, {param.nval} expands to “50”; invoked as “testN123”, it expands to “123”.
- Parameters
key – The character to use in the Stage name
typ – The type of the parameter (int, flag)
name – Name of the parameter in params
value – Value
{param.xyz}
should be set to if the parameter is given
default – Default value for
{param.xyz}
if no parameter is given
-
env
(name)[source]¶ Add package specifications to Stage environment
Note
This sets the environment for all rules within the stage, which leads to errors with Snakemake rule types not supporting conda environments
- Parameters
name (
str
) – Environment name or filename
>>> Env("blast", packages="blast =2.7*")
>>> with Stage("test") as S:
>>>     S.env("blast")
>>>     rule testing:
>>>         ...

>>> with Stage("test", env="blast") as S:
>>>     rule testing:
>>>         ...

>>> with Stage("test") as S:
>>>     rule testing:
>>>         conda: "blast"
>>>         ...
- Return type
None
-
get_inputs
()[source]¶ Returns the set of inputs required by this stage
This function must return a copy, to ensure internal data is not modified.
-
match
(name)[source]¶ Check if the
name
can refer to this stage. As a component of a
StageStack
, a stage may be identified by alternative names and may also be parametrized by suffix modifiers. Stage types supporting this behavior must override this function.
-
property
outputs
¶ Returns the set of outputs this stage is able to generate.
May return either a
set
or a dict
with the dictionary values representing redirections in the case of virtual stages such as Pipeline
or Reference
.
-
require
(**kwargs)[source]¶ Override inferred stage inputs
In theory, this should not be needed. But it’s simpler for now.
Submodules¶
ymp.blast module¶
Parsers for blast output formats 6 (CSV) and 7 (CSV with comments between queries).
-
class
ymp.blast.
BlastParser
[source]¶ Bases:
object
Base class for BLAST parsers
-
FIELD_MAP
= {'% identity': 'pident', 'alignment length': 'length', 'bit score': 'bitscore', 'evalue': 'evalue', 'gap opens': 'gapopen', 'mismatches': 'mismatch', 'q. end': 'qend', 'q. start': 'qstart', 'query acc.': 'qacc', 'query frame': 'qframe', 'query length': 'qlen', 's. end': 'send', 's. start': 'sstart', 'sbjct frame': 'sframe', 'score': 'score', 'subject acc.': 'sacc', 'subject strand': 'sstrand', 'subject tax ids': 'staxids', 'subject title': 'stitle'}¶
-
FIELD_TYPE
= {'bitscore': <class 'float'>, 'evalue': <class 'float'>, 'gapopen': <class 'int'>, 'length': <class 'int'>, 'mismatch': <class 'int'>, 'pident': <class 'float'>, 'qend': <class 'int'>, 'qframe': <class 'int'>, 'qlen': <class 'int'>, 'qstart': <class 'int'>, 'score': <class 'float'>, 'send': <class 'int'>, 'sframe': <class 'int'>, 'sstart': <class 'int'>, 'staxids': <function BlastParser.tupleofint>, 'stitle': <class 'str'>}¶
-
-
class
ymp.blast.
Fmt6Parser
(fileobj)[source]¶ Bases:
ymp.blast.BlastParser
Parser for BLAST format 6 (CSV)
-
Hit
¶ alias of
BlastHit
-
field_types
= [None, None, <class 'float'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'float'>, <class 'float'>]¶
-
fields
= ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']¶ Default field types
-
-
class
ymp.blast.
Fmt7Parser
(fileobj)[source]¶ Bases:
ymp.blast.BlastParser
Parses BLAST results in format ‘7’ (CSV with comments)
-
DATABASE
= '# Database: '¶
-
FIELDS
= '# Fields: '¶
-
HITSFOUND
= ' hits found'¶
-
QUERY
= '# Query: '¶
-
-
ymp.blast.
reader
(fileobj, t=7)[source]¶ Creates a reader for files in BLAST format
>>> with open(blast_file) as infile:
>>>     reader = blast.reader(infile)
>>>     for hit in reader:
>>>         print(hit)
- Parameters
fileobj – iterable yielding lines in blast format
t (
int
) – number of blast format type
- Return type
ymp.blast2gff module¶
ymp.cluster module¶
Module handling talking to cluster management systems
>>> python -m ymp.cluster slurm status <jobid>
-
class
ymp.cluster.
Lsf
[source]¶ Bases:
ymp.cluster.ClusterMS
Talking to LSF
-
states
= {'DONE': 'success', 'EXIT': 'failed', 'PEND': 'running', 'POST_DONE': 'success', 'POST_ERR': 'failed', 'PSUSP': 'running', 'RUN': 'running', 'SSUSP': 'running', 'UNKWN': 'running', 'USUSP': 'running', 'WAIT': 'running'}¶
-
-
class
ymp.cluster.
Slurm
[source]¶ Bases:
ymp.cluster.ClusterMS
Talking to Slurm
-
states
= {'BOOT_FAIL': 'failed', 'CANCELLED': 'failed', 'COMPLETED': 'success', 'COMPLETING': 'running', 'CONFIGURING': 'running', 'DEADLINE': 'failed', 'FAILED': 'failed', 'NODE_FAIL': 'failed', 'PENDING': 'running', 'PREEMPTED': 'failed', 'RESIZING': 'running', 'REVOKED': 'running', 'RUNNING': 'running', 'SPECIAL_EXIT': 'running', 'SUSPENDED': 'running', 'TIMEOUT': 'failed'}¶
-
static
status
(jobid)[source]¶ Print status of job @param jobid to stdout (as needed by snakemake)
Anecdotal benchmarking shows 200 ms per invocation, half spent on Python startup and half on calling sacct. Using
scontrol show job
instead ofsacct -pbs
is faster by 80 ms, but finished jobs are purged after an unknown time window.
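The state tables above feed exactly this kind of lookup. A minimal sketch of the contract a cluster status callback fulfils for Snakemake (an illustration, not YMP's actual code):

```python
# Map a scheduler-specific state to one of the three strings snakemake
# expects ("success", "running", "failed"). The subset below mirrors the
# Slurm table above; split() handles composite states such as
# "CANCELLED by 1234".
states = {"COMPLETED": "success", "PENDING": "running", "RUNNING": "running",
          "CANCELLED": "failed", "FAILED": "failed", "TIMEOUT": "failed"}

def job_status(state_line):
    return states.get(state_line.split()[0], "failed")
```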
-
ymp.common module¶
Collection of shared utility classes and methods
-
class
ymp.common.
AttrDict
[source]¶ Bases:
dict
AttrDict adds accessing stored keys as attributes to dict
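The idea can be re-implemented in a few lines (an illustrative sketch, not YMP's exact code):

```python
# Attribute access falls through to the dict keys, so d.key and d["key"]
# refer to the same entry.
class AttrDict(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

conf = AttrDict(threads=8, unit="m")
```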
-
class
ymp.common.
CacheDict
(cache, name, *args, loadfunc=None, itemloadfunc=None, itemdata=None, **kwargs)[source]¶ Bases:
ymp.common.AttrDict
-
class
ymp.common.
MkdirDict
[source]¶ Bases:
ymp.common.AttrDict
Creates directories as they are requested
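The lazy-creation behavior described above can be sketched like this (illustrative only; the directory name is made up):

```python
import os
import tempfile

# Looking up a path creates the directory on first access; makedirs with
# exist_ok makes repeated lookups idempotent.
class MkdirDict(dict):
    def __getitem__(self, key):
        path = super().__getitem__(key)
        os.makedirs(path, exist_ok=True)
        return path

dirs = MkdirDict(tmp=os.path.join(tempfile.gettempdir(), "ymp_demo"))
workdir = dirs["tmp"]  # directory now exists on disk
```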
ymp.config module¶
-
class
ymp.config.
ConfigExpander
(config_mgr)[source]¶ Bases:
ymp.snakemake.ColonExpander
-
class
Formatter
(expander)[source]¶ Bases:
ymp.snakemake.FormatExpander.Formatter
,ymp.string.PartialFormatter
-
-
class
ymp.config.
ConfigMgr
(root, conffiles)[source]¶ Bases:
object
Manages workflow configuration
This is a singleton object of which only one instance should be around at a given time. It is available in the rules files as
icfg
and via ymp.get_config()
elsewhere. ConfigMgr loads and maintains the workflow configuration as given in the
ymp.yml
files located in the workflow root directory, the user config folder (~/.ymp
) and the installation etc
folder.-
CONF_DEFAULT_FNAME
= '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/etc/defaults.yml'¶
-
CONF_FNAME
= 'ymp.yml'¶
-
CONF_USER_FNAME
= '/home/docs/.ymp/ymp.yml'¶
-
KEY_PIPELINES
= 'pipelines'¶
-
KEY_PROJECTS
= 'projects'¶
-
KEY_REFERENCES
= 'references'¶
-
RULE_MAIN_FNAME
= '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/rules/Snakefile'¶
-
property
absdir
¶ Dictionary of absolute paths of named YMP directories
-
property
cluster
¶ The YMP cluster configuration.
-
property
conda
¶
-
property
dir
¶ Dictionary of relative paths of named YMP directories
The directory paths are relative to the YMP root workdir.
-
property
ensuredir
¶ Dictionary of absolute paths of named YMP directories
Directories will be created on the fly as they are requested.
-
classmethod
find_config
()[source]¶ Locates ymp config files and ymp root
The root ymp work dir is determined as the first (parent) directory containing a file named
ConfigMgr.CONF_FNAME
(defaultymp.yml
The stack of config files comprises 1. the default config ConfigMgr.CONF_DEFAULT_FNAME (etc/defaults.yml in the ymp package directory), 2. the user config ConfigMgr.CONF_USER_FNAME (~/.ymp/ymp.yml), and 3. the ymp.yml in the ymp root.
- Returns
root – the root working directory; conffiles – list of active configuration files
- Return type
root
-
property
limits
¶ The YMP limits configuration.
-
mem
(base='0', per_thread=None, unit='m')[source]¶ Clamp memory to configuration limits
- Params:
base: base memory requested
per_thread: additional memory required per allocated thread
unit: output unit (b, k, m, g, t)
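The clamping behavior can be sketched as follows. This is a hypothetical illustration: the helper names, the threads/limit parameters, and the default limit are assumptions, not YMP's actual implementation.

```python
# Unit suffixes as described above: b, k, m, g, t.
UNITS = {"b": 1, "k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}

def parse_mem(spec):
    """Parse '4g' / '512m' / plain byte counts into bytes (hypothetical helper)."""
    spec = str(spec).lower()
    if spec and spec[-1] in UNITS:
        return int(spec[:-1] or 0) * UNITS[spec[-1]]
    return int(spec or 0)

def mem(base="0", per_thread=None, unit="m", threads=1, limit="64g"):
    total = parse_mem(base)
    if per_thread:
        total += parse_mem(per_thread) * threads
    total = min(total, parse_mem(limit))  # clamp to the configured limit
    return total // UNITS[unit]          # convert to the requested output unit
```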
-
property
pairnames
¶
-
property
pipeline
¶ Configure pipelines
-
property
platform
¶ Name of current platform (macos or linux)
-
property
ref
¶ Configure references
-
property
shell
¶ The shell used by YMP
Change by adding e.g.
shell: /path/to/shell
toymp.yml
.
-
property
snakefiles
¶ Snakefiles used under this config in parsing order
-
-
class
ymp.config.
OverrideExpander
(cfgmgr)[source]¶ Bases:
ymp.snakemake.BaseExpander
Apply rule attribute overrides from ymp.yml config
Example
Set the
wordsize
parameter in the bmtagger_bitmask
rule to 12 (ymp.yml):
overrides:
  rules:
    bmtagger_bitmask:
      params:
        wordsize: 12
-
expand
(rule, ruleinfo, **kwargs)[source]¶ Expands RuleInfo object and children recursively.
Will call :meth:format (via :meth:format_annotated) on
str
items encountered in the tree and wrap encountered functions to be called once the wildcards object is available. Set
ymp.print_rule = 1
before a rule:
statement in snakefiles to enable debug logging of recursion.- Parameters
rule – The :class:snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a :class:snakemake.workflow.RuleInfo object into which it recursively descends. May ultimately be
None
,str
,function
,int
,float
,dict
,list
ortuple
.
expand_args – Parameters passed on late expansion (when the dag tries to instantiate the rule into a job).
rec – Recursion level
-
ymp.download module¶
-
class
ymp.download.
FileDownloader
(block_size=4096, timeout=300, parallel=4, loglevel=30, alturls=None, retry=3)[source]¶ Bases:
object
Manages download of a set of URLs
Downloads happen concurrently using asynchronous network IO.
- Parameters
block_size (
int
) – Byte size of chunks to downloadtimeout (
int
) – Aiohttp cumulative timeoutparallel (
int
) – Number of files to download in parallelloglevel (
int
) – Log level for messages sent to logging (errors are sent at loglevel+10)
alturls – List of regexps modifying URLs
retry (
int
) – Number of times to retry download
-
error
(msg, *args, **kwargs)[source]¶ Send error to logger
Message is sent with a log level 10 higher than the default for this object.
- Return type
None
-
log
(msg, *args, modlvl=0, **kwargs)[source]¶ Send message to logger
Honors loglevel set for the FileDownloader object.
ymp.env module¶
This module manages the conda environments.
-
class
ymp.env.
CondaPathExpander
(config, *args, **kwargs)[source]¶ Bases:
ymp.snakemake.BaseExpander
Applies search path for conda environment specifications
File names supplied via
rule: conda: "some.yml"
are replaced with absolute paths if they are found in any searched directory. Each search_paths
entry is appended to the directory containing the top level Snakefile and the directory checked for the filename. Thereafter, the stack of including Snakefiles is traversed backwards. If no file is found, the original name is returned.
-
class
ymp.env.
Env
(env_file=None, dag=None, singularity_img=None, container_img=None, cleanup=None, name=None, packages=None, base='none', channels=None, rule=None)[source]¶ Bases:
ymp.snakemake.WorkflowObject
,snakemake.deployment.conda.Env
Represents YMP conda environment
Snakemake expects the conda environments in a per-workflow directory configured by
conda_prefix
. YMP sets this value by default to ~/.ymp/conda
, which has a greater chance of being on the same file system as the conda cache, allowing for hard linking of environment files.Within the folder
conda_prefix
, each environment is created in a folder named by the hash of the environment definition file’s contents and the conda_prefix
path. This class inherits fromsnakemake.deployment.conda.Env
to ensure that the hash we use is identical to the one Snakemake will use during workflow execution.The class provides additional features for updating environments, creating environments dynamically and executing commands within those environments.
Note
This is not called from within the execution. Snakemake instantiates its own Env object purely based on the filename.
Creates an inline defined conda environment
- Parameters
name (
Optional
[str
]) – Name of conda environment (and basename of file)packages (
Union
[list
,str
,None
]) – package(s) to be installed into environment. Version constraints can be specified in each package string separated from the package name by whitespace. E.g."blast =2.6*"
channels (
Union
[list
,str
,None
]) – channel(s) to be selected for the environmentbase (
str
) – Select a set of default channels and packages to be added to the newly created environment. Sets are defined in conda.defaults in ymp.yml
-
create
(dryrun=False, force=False)[source]¶ Ensure the conda environment has been created
Inherits from snakemake.conda.Env.create
- Behavior of super class
The environment is installed in a folder in
conda_prefix
named according to a hash of theenvironment.yaml
defining the environment and the value ofconda-prefix
(Env.hash
). The latter is included as installed environments cannot be moved.If this folder (
Env.path
) exists, nothing is done.If a folder named according to the hash of just the contents of
environment.yaml
exists, the environment is created by unpacking the tar balls in that folder.
- Handling pre-computed environment specs
In addition to freezing environments by maintaining a copy of the package binaries, we allow maintaining a copy of the package binary URLs, from which the archive folder is populated on demand.
If a file
{Env.name}.txt
exists in conda.spec
FIXME
-
property
installed
¶
ymp.exceptions module¶
Exceptions raised by YMP
-
exception
ymp.exceptions.
YmpConfigError
(obj, msg, key=None, exc=None)[source]¶ Bases:
ymp.exceptions.YmpNoStackException
Indicates an error in the ymp.yml config files
-
exception
ymp.exceptions.
YmpNoStackException
(message)[source]¶ Bases:
ymp.exceptions.YmpException
,click.exceptions.ClickException
Exception that does not lead to stack trace on CLI
Inheriting from ClickException makes
click
print only the self.msg
value of the exception, rather than allowing Python to print a full stack trace.This is useful for exceptions indicating usage or configuration errors. We use this, instead of
click.UsageError
and friends so that the exceptions can be caught and handled explicitly where needed.Note that click will call the
show
method on this object to print the exception. The default implementation from click will just prefix the msg
with Error:
.- FIXME: This does not work if the exception is raised from within
the snakemake workflow as snakemake.snakemake catches and reformats exceptions.
-
exception
ymp.exceptions.
YmpRuleError
(obj, msg)[source]¶ Bases:
ymp.exceptions.YmpNoStackException
Indicates an error in the rules files
This could e.g. be a Stage or Environment defined twice.
- Parameters
-
exception
ymp.exceptions.
YmpStageError
(msg)[source]¶ Bases:
ymp.exceptions.YmpNoStackException
Indicates an error in the requested stage stack
-
exception
ymp.exceptions.
YmpSystemError
(message)[source]¶ Bases:
ymp.exceptions.YmpNoStackException
Indicates problem running YMP with available system software
-
exception
ymp.exceptions.
YmpWorkflowError
(message)[source]¶ Bases:
ymp.exceptions.YmpNoStackException
Indicates an error during workflow execution
E.g. failures to expand dynamic variables
ymp.gff module¶
Implements simple reader and writer for GFF (general feature format) files.
Unfinished:
- only supports one version, GFF 3.2.3
- no escaping
-
class
ymp.gff.
Attributes
(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)¶ Bases:
tuple
Create new instance of Attributes(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)
-
property
Alias
¶ Alias for field number 2
-
property
Dbxref
¶ Alias for field number 8
-
property
Derives_From
¶ Alias for field number 6
-
property
Gap
¶ Alias for field number 5
-
property
ID
¶ Alias for field number 0
-
property
Is_circular
¶ Alias for field number 10
-
property
Name
¶ Alias for field number 1
-
property
Note
¶ Alias for field number 7
-
property
Ontology_term
¶ Alias for field number 9
-
property
Parent
¶ Alias for field number 3
-
property
Target
¶ Alias for field number 4
-
-
class
ymp.gff.
Feature
(seqid, source, type, start, end, score, strand, phase, attributes)¶ Bases:
tuple
Create new instance of Feature(seqid, source, type, start, end, score, strand, phase, attributes)
-
property
attributes
¶ Alias for field number 8
-
property
end
¶ Alias for field number 4
-
property
phase
¶ Alias for field number 7
-
property
score
¶ Alias for field number 5
-
property
seqid
¶ Alias for field number 0
-
property
source
¶ Alias for field number 1
-
property
start
¶ Alias for field number 3
-
property
strand
¶ Alias for field number 6
-
property
type
¶ Alias for field number 2
-
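The Feature record mirrors the nine GFF3 columns in order. A standalone illustration using a plain namedtuple (the values are made up; the real class lives in ymp.gff):

```python
from collections import namedtuple

# The nine GFF3 columns, in file order.
Feature = namedtuple(
    "Feature",
    ["seqid", "source", "type", "start", "end",
     "score", "strand", "phase", "attributes"])

feat = Feature("contig_1", "example", "CDS", 100, 400,
               ".", "+", "0", "ID=gene_1")
```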
ymp.helpers module¶
This module contains helper functions.
Not all of these are currently in use
-
class
ymp.helpers.
OrderedDictMaker
[source]¶ Bases:
object
odict creates OrderedDict objects in a dict-literal like syntax
>>> my_ordered_dict = odict[
>>>     'key': 'value'
>>> ]
Implementation: odict uses the python slice syntax which is similar to dict literals. The [] operator is implemented by overriding __getitem__. Slices passed to the operator as
object[start1:stop1:step1, start2:...]
, are passed to the implementation as a list of objects with start, stop and step members. odict simply creates an OrderedDictionary by iterating over that list.
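The slice trick described above can be sketched in a few lines (illustrative, not YMP's exact code):

```python
from collections import OrderedDict

# object[a:b, c:d] calls __getitem__ with a tuple of slice objects;
# each slice's start is the key and its stop is the value.
class OrderedDictMaker:
    def __getitem__(self, keys):
        if isinstance(keys, slice):  # single 'key': value pair
            keys = (keys,)
        return OrderedDict((s.start, s.stop) for s in keys)

odict = OrderedDictMaker()
d = odict['first': 1, 'second': 2]
```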
ymp.nuc2aa module¶
ymp.snakemake module¶
Extends Snakemake Features
-
class
ymp.snakemake.
BaseExpander
[source]¶ Bases:
object
Base class for Snakemake expansion modules.
Subclasses should override the :meth:expand method if they need to work on the entire RuleInfo object or the :meth:format and :meth:expands_field methods if they intend to modify specific fields.
-
expand
(rule, item, expand_args=None, rec=- 1, cb=False)[source]¶ Expands RuleInfo object and children recursively.
Will call :meth:format (via :meth:format_annotated) on
str
items encountered in the tree and wrap encountered functions to be called once the wildcards object is available. Set
ymp.print_rule = 1
before a rule:
statement in snakefiles to enable debug logging of recursion.- Parameters
rule – The :class:snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a :class:snakemake.workflow.RuleInfo object into which it recursively descends. May ultimately be
None
,str
,function
,int
,float
,dict
,list
ortuple
.
expand_args – Parameters passed on late expansion (when the dag tries to instantiate the rule into a job).
rec – Recursion level
-
expands_field
(field)[source]¶ Checks if this expander should expand a Rule field type
- Parameters
field – the field to check
- Returns
True if field should be expanded.
-
-
exception
ymp.snakemake.
CircularReferenceException
(deps, rule)[source]¶ Bases:
ymp.exceptions.YmpRuleError
Exception raised if parameters in rule contain a circular reference
-
class
ymp.snakemake.
ColonExpander
[source]¶ Bases:
ymp.snakemake.FormatExpander
Expander using
{:xyz:}
formatted variables.-
regex
= re.compile('\n \\{:\n (?=(\n \\s*\n (?P<name>(?:.(?!\\s*\\:\\}))*.)\n \\s*\n ))\\1\n :\\}\n ', re.VERBOSE)¶
-
spec
= '{{:{}:}}'¶
-
-
class
ymp.snakemake.
DefaultExpander
(**kwargs)[source]¶ Bases:
ymp.snakemake.InheritanceExpander
Adds default values to rules
The implementation simply makes all rules inherit from a defaults rule.
Creates DefaultExpander
Each parameter passed is considered a RuleInfo default value. Where applicable, Snakemake’s argtuples
([],{})
must be passed.
-
class
ymp.snakemake.
ExpandableWorkflow
(*args, **kwargs)[source]¶ Bases:
snakemake.workflow.Workflow
Adds hook for additional rule expansion methods to Snakemake
Constructor for ExpandableWorkflow overlay attributes
This may be called on an already initialized Workflow object.
-
classmethod
activate
()[source]¶ Installs the ExpandableWorkflow
Replaces the Workflow object in the snakemake.workflow module with an instance of this class and initializes default expanders (the snakemake syntax).
-
add_rule
(name=None, lineno=None, snakefile=None, checkpoint=False)[source]¶ Add a rule.
- Parameters
name – name of the rule
lineno – line number within the snakefile where the rule was defined
snakefile – name of file in which rule was defined
-
get_rule
(name=None)[source]¶ Get rule by name. If name is none, the last created rule is returned.
- Parameters
name – the name of the rule
-
global_workflow
= <ymp.snakemake.ExpandableWorkflow object>¶
-
classmethod
load_workflow
(snakefile='/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/rules/Snakefile')[source]¶
-
-
class
ymp.snakemake.
FormatExpander
[source]¶ Bases:
ymp.snakemake.BaseExpander
Expander using a custom formatter object.
-
class
Formatter
(expander)[source]¶ Bases:
ymp.string.ProductFormatter
-
regex
= re.compile('\n \\{\n (?=(\n (?P<name>[^{}]+)\n ))\\1\n \\}\n ', re.VERBOSE)¶
-
spec
= '{{{}}}'¶
-
-
exception
ymp.snakemake.
InheritanceException
(msg, rule, parent, include=None, lineno=None, snakefile=None)[source]¶ Bases:
snakemake.exceptions.RuleException
Exception raised for errors during rule inheritance
Creates a new instance of RuleException.
Arguments:
message – the exception message
include – iterable of other exceptions to be included
lineno – the line where the exception originates
snakefile – the file where the exception originates
-
class
ymp.snakemake.
InheritanceExpander
[source]¶ Bases:
ymp.snakemake.BaseExpander
Adds class-like inheritance to Snakemake rules
To avoid redundancy between closely related rules, e.g. rules for single ended and paired end data, YMP allows Snakemake rules to inherit from another rule.
Example
Derived rules are always created with an implicit
ruleorder
statement, making Snakemake prefer the parent rule if either parent or child rule could be used to generate the requested output file(s).Derived rules initially contain the same attributes as the parent rule. Each attribute assigned to the child rule overrides the matching attribute in the parent. Where attributes may contain named and unnamed values, specifying a named value overrides only the value of that name while specifying an unnamed value overrides all unnamed values in the parent attribute.
-
KEYWORD
= 'ymp: extends'¶ Comment keyword enabling inheritance
-
expand
(rule, ruleinfo)[source]¶ Expands RuleInfo object and children recursively.
Will call :meth:format (via :meth:format_annotated) on
str
items encountered in the tree and wrap encountered functions to be called once the wildcards object is available. Set
ymp.print_rule = 1
before a rule:
statement in snakefiles to enable debug logging of recursion.- Parameters
rule – The :class:snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a :class:snakemake.workflow.RuleInfo object into which it recursively descends. May ultimately be
None
,str
,function
,int
,float
,dict
,list
ortuple
.
expand_args – Parameters passed on late expansion (when the dag tries to instantiate the rule into a job).
rec – Recursion level
-
-
class
ymp.snakemake.
NamedList
(fromtuple=None, **kwargs)[source]¶ Bases:
snakemake.io.Namedlist
Extended version of Snakemake’s
Namedlist
Fixes array assignment operator: Writing a field via
[]
operator updates the value accessed via.
operator.Adds
fromtuple
to constructor: Builds from Snakemake’s typical (args, kwargs)
tuples as present in ruleinfo structures.Adds
update_tuple
method: Updates values in (args, kwargs)
tuples as present in ruleinfo
structures.
Create the object.
Arguments:
toclone – another Namedlist that shall be cloned
fromdict – a dict that shall be converted to a Namedlist (keys become names)
-
class
ymp.snakemake.
RecursiveExpander
[source]¶ Bases:
ymp.snakemake.BaseExpander
Recursively expands
{xyz}
wildcards in Snakemake rules.-
expands_field
(field)[source]¶ Returns true for all fields but
shell:
,message:
andwildcard_constraints
.We don’t want to mess with the regular expressions in the fields in
wildcard_constraints:
, and there is little use in expandingmessage:
orshell:
as these already have all wildcards applied just before job execution (byformat_wildcards()
).
-
-
class
ymp.snakemake.
SnakemakeExpander
[source]¶ Bases:
ymp.snakemake.BaseExpander
Expand wildcards in strings returned from functions.
Snakemake does not do this by default, leaving wildcard expansion to the functions provided themselves. Since we never want
{input}
to be in a string returned as a file, we expand those always.
-
class
ymp.snakemake.
WorkflowObject
(*args, **kwargs)[source]¶ Bases:
object
Base for extension classes defined from snakefiles
This currently encompasses
ymp.env.Env
and ymp.stage.Stage
. This mixin sets the properties
filename
and lineno
according to the definition source in the rules file. It also maintains a registry within the Snakemake workflow object and provides an accessor method to this registry.-
property
defined_in
¶
-
property
-
ymp.snakemake.print_ruleinfo(rule, ruleinfo, func=<bound method Logger.debug of <Logger ymp.snakemake (WARNING)>>)
    Logs contents of Rule and RuleInfo objects.

ymp.snakemake.ruleinfo_fields = {'benchmark': {'apply_wildcards': True, 'format': 'string'}, 'conda_env': {'apply_wildcards': True, 'format': 'string'}, 'container_img': {'format': 'string'}, 'docstring': {'format': 'string'}, 'func': {'format': 'callable'}, 'input': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards',)}, 'log': {'apply_wildcards': True, 'format': 'argstuple'}, 'message': {'format': 'string', 'format_wildcards': True}, 'norun': {'format': 'bool'}, 'output': {'apply_wildcards': True, 'format': 'argstuple'}, 'params': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'resources', 'output', 'threads')}, 'priority': {'format': 'numeric'}, 'resources': {'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'script': {'format': 'string'}, 'shadow_depth': {'format': 'string_or_true'}, 'shellcmd': {'format': 'string', 'format_wildcards': True}, 'threads': {'format': 'int', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'version': {'format': 'object'}, 'wildcard_constraints': {'format': 'argstuple'}, 'wrapper': {'format': 'string'}}
    Describes attributes of snakemake.workflow.RuleInfo
ymp.snakemakelexer module

class ymp.snakemakelexer.SnakemakeLexer(*args, **kwds)
    Bases: pygments.lexers.python.PythonLexer

    name = 'Snakemake'

    tokens = {'globalkeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'root': [('(rule|checkpoint)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'rulename'), 'rulekeyword', 'globalkeyword', ('\\n', Token.Text), ('^(\\s*)([rRuUbB]{,2})("""(?:.|\\n)*?""")', <function bygroups.<locals>.callback>), ("^(\\s*)([rRuUbB]{,2})('''(?:.|\\n)*?''')", <function bygroups.<locals>.callback>), ('\\A#!.+$', Token.Comment.Hashbang), ('#.*$', Token.Comment.Single), ('\\\\\\n', Token.Text), ('\\\\', Token.Text), 'keywords', ('(def)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'funcname'), ('(class)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'classname'), ('(from)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'fromimport'), ('(import)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'import'), 'expr'], 'rulekeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'rulename': [('[a-zA-Z_]\\w*', Token.Name.Class, '#pop')]}
ymp.sphinxext module

This module contains a Sphinx extension for documenting YMP stages and Snakemake rules.

The SnakemakeDomain (name sm) provides the following directives:

.. sm:rule:: name
    Describes a Snakemake rule

.. sm:stage:: name
    Describes a YMP stage

Both directives accept an optional source parameter. If given, a link to the source code of the stage or rule definition will be added. The format of the string passed is filename:line. Referenced Snakefiles will be highlighted with Pygments and added to the documentation when building HTML.

The extension also provides an autodoc-like directive:

.. autosnake:: filename
    Generates documentation from Snakefile filename.
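In a documentation source, the rule directive might be used like this (the rule name and path are illustrative, not from YMP):

```rst
.. sm:rule:: bbmap_map
   :source: rules/bbmap.rules:42

   Map reads against the reference using BBMap.
```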
class ymp.sphinxext.AutoSnakefileDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)
    Bases: docutils.parsers.rst.Directive

    Implements the RSt directive .. autosnake:: filename

    The directive extracts docstrings from rules in the Snakefile and auto-generates documentation.

ymp.sphinxext.BASEPATH = '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src'
    Path in which the YMP package is located

    Type: str
class ymp.sphinxext.DomainTocTreeCollector
    Bases: sphinx.environment.collectors.EnvironmentCollector

    Add Sphinx Domain entries to the TOC

    clear_doc(app, env, docname)
        Clear data from environment.

        If we have cached data in the environment for document docname, we should clear it here.

        Return type: None

    merge_other(app, env, docnames, other)
        Merge with results from parallel processes.

        Called if Sphinx is processing documents in parallel. We should merge the data for all docnames from other into env.

        Return type: None

    process_doc(app, doctree)
        Process doctree.

        This is called by read-doctree, so after the doctree has been loaded. The signal is processed in registered-first order, so we are called after built-in extensions, such as the sphinx.environment.collectors.toctree extension building the TOC.

        Return type: None

    select_doc_nodes(doctree)
        Select the nodes for which entries in the TOC are desired.

        This is a separate method so that it can be overridden by subclasses wanting to add other types of nodes to the TOC.

        Return type: List[Node]
class ymp.sphinxext.SnakemakeDomain(env)
    Bases: sphinx.domains.Domain

    Snakemake language domain

    data_version = 0

    directives = {'rule': <class 'ymp.sphinxext.SnakemakeRule'>, 'stage': <class 'ymp.sphinxext.YmpStage'>}

    get_objects()
        Return an iterable of "object descriptions".

        Object descriptions are tuples with six items:

        name
            Fully qualified name.
        dispname
            Name to display when searching/linking.
        type
            Object type, a key in self.object_types.
        docname
            The document where it is to be found.
        anchor
            The anchor name for the object.
        priority
            How "important" the object is (determines placement in search results). One of:

            1
                Default priority (placed before full-text matches).
            0
                Object is important (placed before default-priority objects).
            2
                Object is unimportant (placed after full-text matches).
            -1
                Object should not show up in search at all.

    initial_data = {'objects': {}}

    label = 'Snakemake'

    name = 'sm'

    object_types = {'rule': <sphinx.domains.ObjType object>, 'stage': <sphinx.domains.ObjType object>}

    resolve_xref(env, fromdocname, builder, typ, target, node, contnode)
        Resolve the pending_xref node with the given typ and target.

        This method should return a new node, to replace the xref node, containing the contnode, which is the markup content of the cross-reference.

        If no resolution can be found, None can be returned; the xref node will then be given to the missing-reference event, and if that yields no resolution, replaced by contnode.

        The method can also raise sphinx.environment.NoUri to suppress the missing-reference event being emitted.

    roles = {'rule': <sphinx.roles.XRefRole object>, 'stage': <sphinx.roles.XRefRole object>}
class ymp.sphinxext.SnakemakeRule(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)
    Bases: ymp.sphinxext.YmpObjectDescription

    Directive sm:rule:: describing a Snakemake rule

    typename = 'rule'
class ymp.sphinxext.YmpObjectDescription(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)
    Bases: sphinx.directives.ObjectDescription

    Base class for RSt directives in SnakemakeDomain.

    Since this inherits from Sphinx's ObjectDescription, content generated by the directive will always be inside an addnodes.desc.

    Parameters: source – Specify source position as file:line to create a link

    add_target_and_index(name, sig, signode)
        Add cross-reference IDs and entries to self.indexnode.

        Return type: None

    handle_signature(sig, signode)
        Parse rule signature sig into RST nodes and append them to signode.

        The return value identifies the object and is passed to add_target_and_index() unchanged.

    option_spec = {'source': <function unchanged>}

    typename = '[object name]'
class ymp.sphinxext.YmpStage(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)
    Bases: ymp.sphinxext.YmpObjectDescription

    Directive sm:stage:: describing a YMP stage

    typename = 'stage'
ymp.string module

exception ymp.string.FormattingError(message, fieldname)
    Bases: AttributeError

class ymp.string.GetNameFormatter
    Bases: string.Formatter

class ymp.string.OverrideJoinFormatter
    Bases: string.Formatter

    Formatter with an overridable join method.

    The default formatter joins all arguments with "".join(args). This class overrides _vformat() with identical code, changing only that line to one that can be overridden by a derived class.

class ymp.string.PartialFormatter
    Bases: string.Formatter

    Formats what it can and leaves the remainder untouched.
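A minimal re-implementation sketch of the partial-formatting idea, not YMP's actual code: expand the fields we have values for, and put unknown fields back as literal text so a later pass can fill them in.

```python
import string

class PartialFormatterSketch(string.Formatter):
    """Expand known fields; leave unknown ones as literal "{field}"."""

    def get_value(self, key, args, kwargs):
        try:
            return super().get_value(key, args, kwargs)
        except (KeyError, IndexError):
            # No value for this field: re-emit it untouched.
            return "{" + str(key) + "}"


fmt = PartialFormatterSketch()
print(fmt.format("{dir}/{sample}.fq", dir="reads"))
# reads/{sample}.fq
```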
class ymp.string.ProductFormatter
    Bases: ymp.string.OverrideJoinFormatter

    String Formatter that creates a list of strings, each expanded using one point in the cartesian product of all replacement values.

    If none of the arguments evaluate to lists, the result is a string; otherwise it is a list.

    >>> ProductFormatter().format("{A} and {B}", A=[1,2], B=[3,4])
    ['1 and 3', '1 and 4', '2 and 3', '2 and 4']
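The cartesian-product expansion can be sketched with itertools.product. This is a hypothetical simplification of the idea, not YMP's implementation (which hooks into string.Formatter's join step instead):

```python
from itertools import product

def product_format(template, **fields):
    """Expand template once per point in the product of list-valued fields."""
    # Wrap scalars in a one-element list so every field contributes an axis.
    axes = {k: v if isinstance(v, list) else [v] for k, v in fields.items()}
    keys = list(axes)
    results = [template.format(**dict(zip(keys, combo)))
               for combo in product(*(axes[k] for k in keys))]
    # A single result means no field was a list: return a plain string.
    return results[0] if len(results) == 1 else results

print(product_format("{A} and {B}", A=[1, 2], B=[3, 4]))
# ['1 and 3', '1 and 4', '2 and 3', '2 and 4']
```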
class ymp.string.RegexFormatter(regex)
    Bases: string.Formatter

    String Formatter accepting a regular expression defining the format of the expanded tags.
ymp.util module

ymp.util.R(code='', **kwargs)
    Execute R code.

    This function executes the R code given as a string. Additional arguments are injected into the R environment. The value of the last R statement is returned.

    The function requires rpy2 to be installed.

    Parameters:
        code – R code to be executed
        kwargs – variables to be injected into the R environment

    Yields: value of last R statement

    >>> R("1*1", input=input)

ymp.util.file_not_empty(fn)
    Checks if a file is not empty, accounting for the gz minimum size of 20.

ymp.util.filter_out_empty(*args)
    Removes empty sets of files from input file lists.

    Takes a variable number of file lists of equal length and removes indices where any of the files is empty. Strings are converted to lists of length 1.

    Returns a generator tuple.

    Example: r1, r2 = filter_out_empty(input.r1, input.r2)
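The paired filtering can be sketched as below. This is a hypothetical simplification, not YMP's code: it returns a tuple of lists rather than generators, and folds in the gzip detail that an empty compressed stream still occupies about 20 bytes on disk.

```python
import os

def file_not_empty(path, gz_min_size=20):
    """Treat .gz files at or below the empty-stream size as empty."""
    size = os.path.getsize(path)
    return size > gz_min_size if path.endswith(".gz") else size > 0

def filter_out_empty(*lists):
    """Drop index i from all lists when any file at that index is empty."""
    # Promote bare strings to one-element lists, as the docstring describes.
    lists = [[x] if isinstance(x, str) else list(x) for x in lists]
    keep = [i for i in range(len(lists[0]))
            if all(file_not_empty(files[i]) for files in lists)]
    return tuple([files[i] for i in keep] for files in lists)
```

Usage mirrors the example above: r1, r2 = filter_out_empty(reads1, reads2) keeps only the read pairs where both mates are non-empty.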
ymp.yaml module

class ymp.yaml.AttrItemAccessMixin
    Bases: object

    Mixin class mapping dot to bracket access.

    Added to classes implementing __getitem__, __setitem__ and __delitem__, this mixin will allow accessing items using dot notation, i.e. "object.xyz" is translated to "object[xyz]".
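A minimal sketch of the dot-to-bracket idea, not YMP's actual code: __getattr__ is only consulted when normal attribute lookup fails, so real attributes keep working while unknown names fall through to item access.

```python
class DotAccessMixin:
    """Translate failed attribute lookups into item lookups."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


class Conf(DotAccessMixin, dict):
    pass


conf = Conf(project="demo")
print(conf.project)  # same as conf["project"]
# demo
```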
exception ymp.yaml.LayeredConfAccessError
    Bases: ymp.yaml.LayeredConfError, KeyError, IndexError

    Raised when a configuration value cannot be accessed.

class ymp.yaml.LayeredConfProxy(maps, parent=None, key=None)
    Bases: ymp.yaml.MultiMapProxy

    Layered configuration.

exception ymp.yaml.LayeredConfWriteError
    Bases: ymp.yaml.LayeredConfError

    Raised when a configuration value cannot be written.

class ymp.yaml.MultiMapProxy(maps, parent=None, key=None)
    Bases: collections.abc.Mapping, ymp.yaml.MultiProxy, ymp.yaml.AttrItemAccessMixin

    Mapping proxy for layered containers.

class ymp.yaml.MultiMapProxyItemsView(mapping)
    Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.ItemsView

    ItemsView for MultiMapProxy.

class ymp.yaml.MultiMapProxyKeysView(mapping)
    Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.KeysView

    KeysView for MultiMapProxy.

class ymp.yaml.MultiMapProxyMappingView(mapping)
    Bases: collections.abc.MappingView

    MappingView for MultiMapProxy.

class ymp.yaml.MultiMapProxyValuesView(mapping)
    Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.ValuesView

    ValuesView for MultiMapProxy.

class ymp.yaml.MultiProxy(maps, parent=None, key=None)
    Bases: object

    Base class for layered container structure.

class ymp.yaml.MultiSeqProxy(maps, parent=None, key=None)
    Bases: collections.abc.Sequence, ymp.yaml.MultiProxy, ymp.yaml.AttrItemAccessMixin

    Sequence proxy for layered containers.

ymp.yaml.load(files)
    Load configuration files.

    Creates a LayeredConfProxy configuration object from a set of YAML files.
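The layering behaviour can be sketched with collections.ChainMap, which models the lookup side of the idea directly (YMP's proxies additionally handle nesting, writes, and dotted access; the keys below are illustrative):

```python
from collections import ChainMap

# Later-loaded configuration shadows earlier defaults: ChainMap searches
# its maps left to right and returns the first hit.
defaults = {"threads": 4, "project": "demo"}
user = {"threads": 8}

conf = ChainMap(user, defaults)  # user layer in front of defaults
print(conf["threads"], conf["project"])
# 8 demo
```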