ymp package


Access the current YMP configuration object.

This object might change once during normal execution: it is deleted before passing control to Snakemake. During unit test execution the object is deleted between all tests.

Return type


ymp.print_rule = 0

Set to 1 to show the YMP expansion process as it is applied to the next Snakemake rule definition.

>>> ymp.print_rule = 1
>>> rule broken:
>>>   ...
>>> ymp make broken -vvv

ymp.snakemake_versions = ['6.0.5', '6.1.0', '6.1.1', '6.2.1']

List of Snakemake versions this version of YMP has been verified to work with.



ymp.blast module

Parsers for blast output formats 6 (CSV) and 7 (CSV with comments between queries).

class ymp.blast.BlastBase[source]

Bases: object

Base class for BLAST readers and writers

FIELD_MAP = {'% identity': 'pident', 'alignment length': 'length', 'bit score': 'bitscore', 'evalue': 'evalue', 'gap opens': 'gapopen', 'mismatches': 'mismatch', 'q. end': 'qend', 'q. start': 'qstart', 'query acc.': 'qacc', 'query frame': 'qframe', 'query length': 'qlen', 's. end': 'send', 's. start': 'sstart', 'sbjct frame': 'sframe', 'score': 'score', 'subject acc.': 'sacc', 'subject strand': 'sstrand', 'subject tax ids': 'staxids', 'subject title': 'stitle'}

Map from long field names to short field names

FIELD_REV_MAP = {'bitscore': 'bit score', 'evalue': 'evalue', 'gapopen': 'gap opens', 'length': 'alignment length', 'mismatch': 'mismatches', 'pident': '% identity', 'qacc': 'query acc.', 'qend': 'q. end', 'qframe': 'query frame', 'qlen': 'query length', 'qstart': 'q. start', 'sacc': 'subject acc.', 'score': 'score', 'send': 's. end', 'sframe': 'sbjct frame', 'sstart': 's. start', 'sstrand': 'subject strand', 'staxids': 'subject tax ids', 'stitle': 'subject title'}

Reverse map, from short to long field names

FIELD_TYPE = {'bitscore': <class 'float'>, 'evalue': <class 'float'>, 'gapopen': <class 'int'>, 'length': <class 'int'>, 'mismatch': <class 'int'>, 'pident': <class 'float'>, 'qend': <class 'int'>, 'qframe': <class 'int'>, 'qlen': <class 'int'>, 'qstart': <class 'int'>, 'score': <class 'float'>, 'send': <class 'int'>, 'sframe': <class 'int'>, 'sstart': <class 'int'>, 'staxids': <function BlastBase.tupleofint>, 'stitle': <class 'str'>}

Map defining types of fields

class ymp.blast.BlastParser[source]

Bases: ymp.blast.BlastBase

Base class for BLAST readers

class ymp.blast.BlastWriter[source]

Bases: ymp.blast.BlastBase

Base class for BLAST writers

class ymp.blast.Fmt6Parser(fileobj)[source]

Bases: ymp.blast.BlastParser

Parser for BLAST format 6 (CSV)


alias of ymp.blast.BlastHit

field_types = [None, None, <class 'float'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'float'>, <class 'float'>]
fields = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']

Default field types
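BLAST format 6 output is one tab-separated line per hit, with columns in the order given by fields; field_types supplies a converter per column, where None keeps the value as a string. The following is a minimal sketch of that conversion, not ymp's own parser; parse_fmt6_line and the BlastHit namedtuple defined here are hypothetical stand-ins:

```python
from collections import namedtuple

# Column names and converters as listed for Fmt6Parser above
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FIELD_TYPES = [None, None, float, int, int, int, int, int, int, int,
               float, float]

# Hypothetical stand-in for the BlastHit record type
BlastHit = namedtuple("BlastHit", FIELDS)


def parse_fmt6_line(line):
    """Split one tab-separated format 6 line and convert each column."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    converted = [
        value if typ is None else typ(value)  # None: keep as string
        for value, typ in zip(values, FIELD_TYPES)
    ]
    return BlastHit(*converted)
```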

class ymp.blast.Fmt7Parser(fileobj)[source]

Bases: ymp.blast.BlastParser

Parses BLAST results in format ‘7’ (CSV with comments)

PAT_DATABASE = '# Database: '
PAT_FIELDS = '# Fields: '
PAT_HITSFOUND = ' hits found'
PAT_QUERY = '# Query: '

Returns list of available field names

Format 7 specifies which columns it contains in comment lines, allowing this parser to be agnostic of the selection of columns made when running BLAST.

Return type



List of field names (e.g. ['sacc', 'qacc', 'evalue'])


Returns True if the current hit is the first hit for the current query

Return type

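Because format 7 names its columns in a # Fields: comment before the hits of each query, a parser can map those long names to short ones via FIELD_MAP and convert values via FIELD_TYPE. A minimal sketch of that idea using subsets of the documented maps; parse_fmt7 is hypothetical, and the sketch assumes tab-separated data lines and column names separated by ", " in the fields comment:

```python
# Subsets of the maps documented for BlastBase above
FIELD_MAP = {"query acc.": "qacc", "subject acc.": "sacc",
             "% identity": "pident", "evalue": "evalue",
             "bit score": "bitscore"}
FIELD_TYPE = {"pident": float, "evalue": float, "bitscore": float}

PAT_QUERY = "# Query: "
PAT_FIELDS = "# Fields: "


def parse_fmt7(lines):
    """Yield (query, is_first_hit, hit_dict) for each data line."""
    query, fields, first = None, [], False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(PAT_QUERY):
            query, first = line[len(PAT_QUERY):], True
        elif line.startswith(PAT_FIELDS):
            long_names = line[len(PAT_FIELDS):].split(", ")
            # Unknown long names are kept verbatim
            fields = [FIELD_MAP.get(name, name) for name in long_names]
        elif line.startswith("#") or not line:
            continue  # other comments, e.g. "# 2 hits found"
        else:
            values = line.split("\t")
            # Fields without a known type stay strings
            hit = {f: FIELD_TYPE.get(f, str)(v) for f, v in zip(fields, values)}
            yield query, first, hit
            first = False
```

Other comment lines, such as those matching PAT_DATABASE or PAT_HITSFOUND, are simply skipped in this sketch.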

class ymp.blast.Fmt7Writer(fileobj)[source]

Bases: ymp.blast.BlastWriter


Writes BLAST7 format header

ymp.blast.reader(fileobj, t=7)[source]

Creates a reader for files in BLAST format

>>> from ymp import blast
>>> with open(blast_file) as infile:
...     reader = blast.reader(infile)
...     for hit in reader:
...         print(hit)

  • fileobj – iterable yielding lines in BLAST format

  • t (int) – BLAST output format number (6 or 7)

Return type


ymp.blast.writer(fileobj, t=7)[source]

Creates a writer for files in BLAST format

>>> from ymp import blast
>>> with open(blast_file, "w") as outfile:
...     writer = blast.writer(outfile)
...     for hit in hits:
...         writer.write_hit(hit)

Return type


ymp.blast2gff module

ymp.cluster module

Module handling communication with cluster management systems

>>> python -m ymp.cluster slurm status <jobid>

class ymp.cluster.ClusterMS[source]

Bases: object

class ymp.cluster.Lsf[source]

Bases: ymp.cluster.ClusterMS

Talking to LSF

states = {'DONE': 'success', 'EXIT': 'failed', 'PEND': 'running', 'POST_DONE': 'success', 'POST_ERR': 'failed', 'PSUSP': 'running', 'RUN': 'running', 'SSUSP': 'running', 'UNKWN': 'running', 'USUSP': 'running', 'WAIT': 'running'}
static status(jobid)[source]
static submit(args)[source]
class ymp.cluster.Slurm[source]

Bases: ymp.cluster.ClusterMS

Talking to Slurm

states = {'BOOT_FAIL': 'failed', 'CANCELLED': 'failed', 'COMPLETED': 'success', 'COMPLETING': 'running', 'CONFIGURING': 'running', 'DEADLINE': 'failed', 'FAILED': 'failed', 'NODE_FAIL': 'failed', 'PENDING': 'running', 'PREEMPTED': 'failed', 'RESIZING': 'running', 'REVOKED': 'running', 'RUNNING': 'running', 'SPECIAL_EXIT': 'running', 'SUSPENDED': 'running', 'TIMEOUT': 'failed'}
static status(jobid)[source]

Print the status of job jobid to stdout (as required by Snakemake)

Anecdotal benchmarking shows 200 ms per invocation, half used by Python startup and half by calling sacct. Using scontrol show job instead of sacct -pbs is faster by 80 ms, but finished jobs are purged after an unknown time window.
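Snakemake's cluster status interface expects one of success, failed or running on stdout, which is what the states mapping translates to. A sketch of applying a subset of that mapping to pipe-delimited output as produced by sacct --parsable; status_from_sacct is hypothetical, and treating unknown or not yet listed jobs as running is an assumption:

```python
# Subset of the Slurm.states mapping documented above
STATES = {"COMPLETED": "success", "FAILED": "failed", "TIMEOUT": "failed",
          "CANCELLED": "failed", "PENDING": "running", "RUNNING": "running"}


def status_from_sacct(output, jobid):
    """Map pipe-delimited sacct output to a Snakemake status string."""
    for line in output.splitlines()[1:]:  # skip the header row
        cols = line.split("|")
        if cols[0] == jobid:
            state = cols[1].split()[0]  # drop any trailing qualifier
            # Unknown states count as still running (an assumption)
            return STATES.get(state, "running")
    return "running"  # job not listed yet (an assumption)
```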

ymp.cluster.error(*args, **kwargs)[source]

ymp.common module

Collection of shared utility classes and methods

class ymp.common.AttrDict[source]

Bases: dict

AttrDict extends dict so that stored keys can also be accessed as attributes
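A minimal sketch of the attribute access AttrDict provides, not ymp's implementation; missing keys raise AttributeError so that hasattr() keeps working:

```python
class AttrDict(dict):
    """dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        try:
            return self[name]
        except KeyError:
            # AttributeError keeps hasattr() and getattr() well-behaved
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value
```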

class ymp.common.Cache(root)[source]

Bases: object

get_cache(name, clean=False, *args, **kwargs)[source]
load(cache, key)[source]
store(cache, key, obj)[source]
class ymp.common.CacheDict(cache, name, *args, loadfunc=None, itemloadfunc=None, itemdata=None, **kwargs)[source]

Bases: ymp.common.AttrDict

get(key, default=None)[source]

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items[source]
keys() → a set-like object providing a view on D’s keys[source]
values() → an object providing a view on D’s values[source]
class ymp.common.MkdirDict[source]

Bases: ymp.common.AttrDict

Creates directories as they are requested
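A sketch of the idea behind MkdirDict, assuming values are directory paths. The class below is a stand-in, not ymp's code, and only item access via [] triggers directory creation:

```python
import os


class MkdirDict(dict):
    """Maps names to directory paths, creating each directory on access."""

    def __getitem__(self, key):
        path = super().__getitem__(key)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        return path
```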

class ymp.common.NoCache(root)[source]

Bases: object

get_cache(name, clean=False, *args, **kwargs)[source]
load(_cache, _key)[source]
store(cache, key, obj)[source]

Wrap obj in a list as needed


Flatten lists without turning strings into letters

ymp.common.format_number(num, unit='')[source]
Return type


ymp.common.format_time(seconds, unit=None)[source]

Formats time in SLURM format

Return type



Check if object is container, considering strings not containers
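The string-aware container check and flattening described above can be sketched as follows; is_container and flatten here are hypothetical helpers, not the ymp functions themselves:

```python
def is_container(obj):
    """True for iterables other than strings and bytes."""
    if isinstance(obj, (str, bytes)):
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def flatten(item):
    """Yield the leaves of nested containers, keeping strings whole."""
    if is_container(item):
        for sub in item:
            yield from flatten(sub)
    else:
        yield item
```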


Basic 1k 1m 1g 1t parsing.

  • assumes base 2

  • returns “byte” value

  • accepts “1kib”, “1kb” or “1k”
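Under the rules listed above (base 2, result in bytes, optional b or ib after the suffix), parsing can be sketched with a single regular expression. parse_number_sketch is hypothetical; ymp's own parser may accept other inputs:

```python
import re

# Exponents of 1024, as parsing assumes base 2
_UNITS = {"": 0, "k": 1, "m": 2, "g": 3, "t": 4}
_NUMBER_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([kmgt]?)(?:i?b)?\s*$",
                        re.IGNORECASE)


def parse_number_sketch(text):
    """Parse e.g. "1k", "1kb" or "1kib" into a byte count."""
    match = _NUMBER_RE.match(str(text))
    if not match:
        raise ValueError(f"cannot parse {text!r}")
    value, unit = match.groups()
    return int(float(value) * 1024 ** _UNITS[unit.lower()])
```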


Parses time in “SLURM” format

Accepted formats:

  • <minutes>

  • <minutes>:<seconds>

  • <hours>:<minutes>:<seconds>

  • <days>-<hours>

  • <days>-<hours>:<minutes>

  • <days>-<hours>:<minutes>:<seconds>

Return type

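A sketch of converting the SLURM time formats accepted above into seconds; parse_time_sketch is hypothetical, and returning seconds is an assumption (format_time above takes seconds):

```python
def parse_time_sketch(text):
    """Convert a SLURM-style time string to seconds."""
    days = 0
    if "-" in text:
        day_str, text = text.split("-", 1)
        days = int(day_str)
        parts = [int(p) for p in text.split(":")]
        if len(parts) > 3:
            raise ValueError(text)
        # days-hours[:minutes[:seconds]]: pad missing minutes/seconds
        hours, minutes, seconds = parts + [0] * (3 - len(parts))
    else:
        parts = [int(p) for p in text.split(":")]
        if len(parts) == 1:      # minutes
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:    # minutes:seconds
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:    # hours:minutes:seconds
            hours, minutes, seconds = parts
        else:
            raise ValueError(text)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```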

ymp.config module

class ymp.config.ConfigExpander(config_mgr)[source]

Bases: ymp.snakemake.ColonExpander

class Formatter(expander)[source]

Bases: ymp.snakemake.FormatExpander.Formatter, ymp.string.PartialFormatter

get_value(field_name, args, kwargs)[source]

Checks if this expander should expand a Rule field type


field – the field to check


True if field should be expanded.

class ymp.config.ConfigMgr(root, conffiles)[source]

Bases: object

Manages workflow configuration

This is a singleton object of which only one instance should be around at a given time. It is available in the rules files as icfg and via ymp.get_config() elsewhere.

ConfigMgr loads and maintains the workflow configuration as given in the ymp.yml files located in the workflow root directory, the user config folder (~/.ymp) and the installation etc folder.

CONF_DEFAULT_FNAME = '<ymp package directory>/etc/defaults.yml'
CONF_FNAME = 'ymp.yml'
CONF_USER_FNAME = '~/.ymp/ymp.yml'
KEY_LIMITS = 'resource_limits'
KEY_PIPELINES = 'pipelines'
KEY_PROJECTS = 'projects'
KEY_REFERENCES = 'references'
RULE_MAIN_FNAME = '<ymp package directory>/rules/Snakefile'
property absdir

Dictionary of absolute paths of named YMP directories

classmethod activate()[source]
property cluster

The YMP cluster configuration.

property conda
property dir

Dictionary of relative paths of named YMP directories

The directory paths are relative to the YMP root workdir.

property ensuredir

Dictionary of absolute paths of named YMP directories

Directories will be created on the fly as they are requested.

expand(item, **kwargs)[source]
classmethod find_config()[source]

Locates ymp config files and ymp root

The root ymp work dir is determined as the first (parent) directory containing a file named ConfigMgr.CONF_FNAME (default ymp.yml).

The stack of config files comprises 1. the default config ConfigMgr.CONF_DEFAULT_FNAME (etc/defaults.yml in the ymp package directory), 2. the user config ConfigMgr.CONF_USER_FNAME (~/.ymp/ymp.yml) and 3. the ymp.yml in the ymp root.


Returns the root working directory (root) and the list of active configuration files (conffiles).

Return type

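The root lookup described above walks from the current directory up through its parents until a directory containing ymp.yml is found. A minimal sketch with pathlib; find_root is hypothetical and does not assemble the config file stack:

```python
from pathlib import Path


def find_root(start, marker="ymp.yml"):
    """Return the first of start and its parents containing marker, or None."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        if (directory / marker).is_file():
            return directory
    return None  # no ymp.yml anywhere up to the filesystem root
```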

classmethod instance()[source]

Returns the active Ymp ConfigMgr instance

property pairnames
property pipeline

Configure pipelines

property platform

Name of current platform (macos or linux)

property ref

Configure references

property rules
property shell

The shell used by YMP

Change by adding e.g. shell: /path/to/shell to ymp.yml.

property snakefiles

Snakefiles used under this config in parsing order

classmethod unload()[source]
property workflow
class ymp.config.OverrideExpander(cfgmgr)[source]

Bases: ymp.snakemake.BaseExpander

Override rule parameters, resources and threads using config values


Example: set the wordsize parameter of the bmtagger_bitmask rule to 12 (the fragment below also sets memory to 15G and threads to 12):

        wordsize: 12
        memory: 15G
      threads: 12

expand(rule, ruleinfo, **kwargs)[source]

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

  • rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item

  • item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.

  • expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).

  • rec – Recursion level

types = {'params': typing.Mapping, 'resources': typing.Mapping, 'threads': <class 'int'>}
class ymp.config.ResourceLimitsExpander(cfg)[source]

Bases: ymp.snakemake.BaseExpander

Allows adjusting resources to local compute environment

Each config item defines processing for an item in resources: or for the special resource threads. Each item may define:

  • default – value filled in for rules not defining the resource

  • min and max – lower and upper bounds

  • scale – value applied to the default to adjust resources up or down globally

  • format – time or number, causing values in time or “human readable” format to be parsed specially; such values are also reformatted on output

  • unit – optional output format (k/g/m/t for numbers, minutes/seconds for time)

  • from – generate this resource value from a configured one (e.g. to provide both mem_mb and mem_gb from a generic mem value)

static adjust_value(value, default, scale, minimum, maximum)[source]

Applies default, scale, minimum and maximum to a numeric value

Return type

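A sketch of the adjustments listed for adjust_value: fill in the default, scale, then clamp to the bounds. adjust_value_sketch is hypothetical; applying scale to every value rather than only to defaults, and the order of operations, are assumptions:

```python
def adjust_value_sketch(value, default, scale, minimum, maximum):
    """Fill in default, then scale and clamp a numeric resource value."""
    if value is None:
        value = default
    if value is None:
        return None  # nothing configured and nothing to fill in
    if scale is not None:
        value = value * scale  # assumption: scale applies to all values
    if minimum is not None:
        value = max(value, minimum)
    if maximum is not None:
        value = min(value, maximum)
    return value
```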

expand(rule, ruleinfo, **kwargs)[source]

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

  • rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item

  • item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.

  • expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).

  • rec – Recursion level

Return type



Checks if this expander should expand a Rule field type


field (str) – the field to check

Return type



True if field should be expanded.

formatters = {'number': <function format_number>, 'time': <function format_time>}

Parses limits config

parsers = {'number': <function parse_number>, 'time': <function parse_time>}

ymp.dna module


ymp.download module

class ymp.download.DownloadThread[source]

Bases: object

get(url, dest, md5)[source]
class ymp.download.FileDownloader(block_size=4096, timeout=300, parallel=4, loglevel=30, alturls=None, retry=3)[source]

Bases: object

Manages download of a set of URLs

Downloads happen concurrently using asynchronous network IO.

  • block_size (int) – Byte size of chunks to download

  • timeout (int) – Aiohttp cumulative timeout

  • parallel (int) – Number of files to download in parallel

  • loglevel (int) – Log level for messages sent to logging (errors are sent with loglevel+10)

  • alturls – List of regexps modifying URLs

  • retry (int) – Number of times to retry download
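Bounding concurrency as the parallel parameter describes can be sketched with an asyncio.Semaphore. This is not FileDownloader's code: fetch_all is hypothetical, and fetch stands in for the actual (e.g. aiohttp-based) download coroutine:

```python
import asyncio


async def fetch_all(urls, fetch, parallel=4):
    """Run fetch(url) for all urls, at most `parallel` at a time."""
    semaphore = asyncio.Semaphore(parallel)

    async def limited(url):
        async with semaphore:  # wait for a free slot
            return await fetch(url)

    # gather returns results in the order of the input urls
    return await asyncio.gather(*(limited(url) for url in urls))
```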

error(msg, *args, **kwargs)[source]

Send error to logger

Message is sent with a log level 10 higher than the default for this object.

Return type


get(urls, dest, md5s=None)[source]

Download a list of URLs

Return type


log(msg, *args, modlvl=0, **kwargs)[source]

Send message to logger

Honors loglevel set for the FileDownloader object.

  • msg (str) – The log message

  • modlvl (int) – Added to default logging level for object

Return type


static make_bar_format(desc_width=20, count_width=0, rate=False, eta=False, have_total=True)[source]

Construct bar_format for tqdm

  • desc_width (int) – minimum space allocated for description

  • count_width (int) – min space for counts

  • rate (bool) – show rate to right of progress bar

  • eta (bool) – show eta to right of progress bar

  • have_total (bool) – whether a total exists (required to add percentage)

Return type


ymp.env module

This module manages the conda environments.

class ymp.env.CondaPathExpander(config, *args, **kwargs)[source]

Bases: ymp.snakemake.BaseExpander

Applies search path for conda environment specifications

File names supplied via rule: conda: "some.yml" are replaced with absolute paths if they are found in any searched directory. Each search_paths entry is appended to the directory containing the top level Snakefile and the directory checked for the filename. Thereafter, the stack of including Snakefiles is traversed backwards. If no file is found, the original name is returned.


Checks if this expander should expand a Rule field type


field – the field to check


True if field should be expanded.

format(conda_env, *args, **kwargs)[source]

Format item using *args and **kwargs
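The search order described for CondaPathExpander can be sketched as below. find_conda_env is hypothetical; it assumes that the search paths are applied to the directory of each Snakefile in turn, first the top-level one and then the including Snakefiles from innermost outward:

```python
import os


def find_conda_env(name, search_paths, snakefile_stack):
    """Resolve a conda: file name against search paths.

    snakefile_stack lists Snakefiles from the top-level file down to the
    one currently being parsed.
    """
    if os.path.isabs(name):
        return name
    # Top-level Snakefile first, then the stack traversed backwards
    snakefiles = snakefile_stack[:1] + snakefile_stack[:0:-1]
    for snakefile in snakefiles:
        base = os.path.dirname(os.path.abspath(snakefile))
        for path in search_paths:
            candidate = os.path.join(base, path, name)
            if os.path.exists(candidate):
                return candidate
    return name  # not found: keep the original name
```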

class ymp.env.Env(env_file=None, workflow=None, env_dir=None, container_img=None, cleanup=None, name=None, packages=None, base='none', channels=None)[source]

Bases: ymp.snakemake.WorkflowObject, snakemake.deployment.conda.Env

Represents YMP conda environment

Snakemake expects the conda environments in a per-workflow directory configured by conda_prefix. YMP sets this value by default to ~/.ymp/conda, which has a greater chance of being on the same file system as the conda cache, allowing for hard linking of environment files.

Within the folder conda_prefix, each environment is created in a folder named by the hash of the environment definition file’s contents and the conda_prefix path. This class inherits from snakemake.deployment.conda.Env to ensure that the hash we use is identical to the one Snakemake will use during workflow execution.

The class provides additional features for updating environments, creating environments dynamically and executing commands within those environments.


This is not called from within the execution. Snakemake instantiates its own Env object purely based on the filename.

Creates an inline defined conda environment

  • name (Optional[str]) – Name of conda environment (and basename of file)

  • packages (Union[list, str, None]) – package(s) to be installed into environment. Version constraints can be specified in each package string separated from the package name by whitespace. E.g. "blast =2.6*"

  • channels (Union[list, str, None]) – channel(s) to be selected for the environment

  • base (str) – Select a set of default channels and packages to be added to the newly created environment. Sets are defined in conda.defaults in ymp.yml

create(dryrun=False, reinstall=False, nospec=False, noarchive=False)[source]

Ensure the conda environment has been created

Inherits from snakemake.deployment.conda.Env.create

Behavior of super class
  • Resolve remote file

  • If containerized, check environment path exists and return if true

  • Check for interrupted env create, delete if so

  • Return if environment exists

  • Install from archive if env_archive exists

  • Install using self.frontend if not_careful

Handling pre-computed environment specs

In addition to freezing environments by maintaining a copy of the package binaries, we allow maintaining a copy of the package binary URLs, from which the archive folder is populated on demand. We just download those to self.archive and pass on.

export(stream, typ='yml')[source]

Freeze environment

static get_installed_env_hashes()[source]
property installed

Execute command in environment

Returns exit code of command run.


Update conda environment

ymp.exceptions module

Exceptions raised by YMP

exception ymp.exceptions.YmpConfigError(obj, msg, key=None)[source]

Bases: ymp.exceptions.YmpLocateableError

Indicates an error in the ymp.yml config files

  • obj (object) – Subtree of config causing error

  • msg (str) – The message to display

  • key (Optional[object]) – Key indicating part of obj causing error

  • exc – Upstream exception causing error


Retrieve filename and line number from object associated with exception


Tuple of filename and line number

exception ymp.exceptions.YmpException[source]

Bases: Exception

Base class of all YMP Exceptions

exception ymp.exceptions.YmpLocateableError(obj, msg, show_includes=True)[source]

Bases: ymp.exceptions.YmpPrettyException

Errors that have a file location to be shown

  • obj (object) – The object causing the exception. Must have lineno and filename as these will be shown as part of the error message on the command line.

  • msg (str) – The message to display

  • show_includes (bool) – Whether or not the “stack” of includes should be printed.


Retrieve filename and line number from object associated with exception

Return type

Tuple[str, int]


Tuple of filename and line number

Return type


exception ymp.exceptions.YmpPrettyException(message)[source]

Bases: ymp.exceptions.YmpException, click.exceptions.ClickException, snakemake.exceptions.WorkflowError

Exception that does not lead to stack trace on CLI

Inheriting from ClickException makes click print only the self.msg value of the exception, rather than allowing Python to print a full stack trace.

This is useful for exceptions indicating usage or configuration errors. We use this, instead of click.UsageError and friends so that the exceptions can be caught and handled explicitly where needed.

Note that click will call the show method on this object to print the exception. The default implementation from click will just prefix the msg with Error:.

FIXME: This does not work if the exception is raised from within the snakemake workflow as snakemake.snakemake catches and reformats exceptions.

rule = None
snakefile = None
exception ymp.exceptions.YmpRuleError(obj, msg, show_includes=True)[source]

Bases: ymp.exceptions.YmpLocateableError

Indicates an error in the rules files

This could e.g. be a Stage or Environment defined twice.

exception ymp.exceptions.YmpStageError(msg)[source]

Bases: ymp.exceptions.YmpPrettyException

Indicates an error in the requested stage stack

Return type


exception ymp.exceptions.YmpSystemError(message)[source]

Bases: ymp.exceptions.YmpPrettyException

Indicates problem running YMP with available system software

exception ymp.exceptions.YmpUsageError(message)[source]

Bases: ymp.exceptions.YmpPrettyException

General usage error

exception ymp.exceptions.YmpWorkflowError(message)[source]

Bases: ymp.exceptions.YmpPrettyException

Indicates an error during workflow execution

E.g. failures to expand dynamic variables

ymp.gff module

Implements simple reader and writer for GFF (general feature format) files.


  • only supports one version, GFF 3.2.3.

  • no escaping

class ymp.gff.Attributes(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)

Bases: tuple

Create new instance of Attributes(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)


  • Alias – alias for field number 2

  • Dbxref – alias for field number 8

  • Derives_From – alias for field number 6

  • Gap – alias for field number 5

  • ID – alias for field number 0

  • Is_circular – alias for field number 10

  • Name – alias for field number 1

  • Note – alias for field number 7

  • Ontology_term – alias for field number 9

  • Parent – alias for field number 3

  • Target – alias for field number 4

class ymp.gff.Feature(seqid, source, type, start, end, score, strand, phase, attributes)

Bases: tuple

Create new instance of Feature(seqid, source, type, start, end, score, strand, phase, attributes)


  • attributes – alias for field number 8

  • end – alias for field number 4

  • phase – alias for field number 7

  • score – alias for field number 5

  • seqid – alias for field number 0

  • source – alias for field number 1

  • start – alias for field number 3

  • strand – alias for field number 6

  • type – alias for field number 2
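A minimal sketch of splitting one GFF3 line into the nine Feature fields. parse_gff_line is hypothetical, and unlike ymp it keeps attributes as a plain dict instead of an Attributes tuple; in line with the "no escaping" limitation above, no unescaping is done:

```python
from collections import namedtuple

Feature = namedtuple("Feature", "seqid source type start end score strand "
                     "phase attributes")


def parse_gff_line(line):
    """Parse one GFF3 feature line; returns None for comments."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    # Column 9 holds tag=value pairs separated by ';'
    attributes = dict(
        pair.split("=", 1) for pair in cols[8].split(";") if pair
    )
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attributes)
```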

class ymp.gff.reader(fileobj)[source]

Bases: object

class ymp.gff.writer(fileobj)[source]

Bases: object


ymp.helpers module

This module contains helper functions.

Not all of these are currently in use

class ymp.helpers.OrderedDictMaker[source]

Bases: object

odict creates OrderedDict objects in a dict-literal like syntax

>>> my_ordered_dict = odict[
...     'key': 'value'
... ]

Implementation: odict uses the Python slice syntax, which is similar to dict literals. The [] operator is implemented by overriding __getitem__. Slices passed to the operator as object[start1:stop1:step1, start2:...] are passed to the implementation as a tuple of slice objects with start, stop and step members (a single slice object if only one key is given). odict simply creates an OrderedDict by iterating over them.
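The slice trick described above fits in a few lines; this sketch mirrors the description but is not ymp's source:

```python
from collections import OrderedDict


class OrderedDictMaker:
    """odict['a': 1, 'b': 2] builds OrderedDict([('a', 1), ('b', 2)])."""

    def __getitem__(self, keys):
        # A single "key: value" arrives as one slice, several as a tuple
        if isinstance(keys, slice):
            keys = (keys,)
        return OrderedDict((s.start, s.stop) for s in keys)


odict = OrderedDictMaker()
```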

ymp.helpers.update_dict(dst, src)[source]

Recursively update dictionary dst with src

  • Treats a list as atomic, replacing it with new list.

  • Dictionaries are overwritten by item

  • None is replaced by empty dict if necessary
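A sketch of the merge rules listed above; update_dict_sketch is hypothetical, and replacing a non-dict value with a merged dict is an assumption beyond the documented None case:

```python
def update_dict_sketch(dst, src):
    """Recursively merge src into dst and return dst."""
    if dst is None:
        dst = {}  # None is replaced by an empty dict
    for key, value in src.items():
        if isinstance(value, dict):
            existing = dst.get(key)
            if not isinstance(existing, dict):
                existing = None  # assumption: non-dicts are replaced
            # Dictionaries are merged item by item
            dst[key] = update_dict_sketch(existing, value)
        else:
            # Lists and scalars are atomic and replaced outright
            dst[key] = value
    return dst
```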

ymp.map2otu module

class ymp.map2otu.MapfileParser(minid=0)[source]

Bases: object

class ymp.map2otu.emirge_info(line)[source]

Bases: object


ymp.nuc2aa module

ymp.nuc2aa.fasta_dna2aa(inf, outf)[source]

ymp.snakemake module

Extends Snakemake Features

class ymp.snakemake.BaseExpander[source]

Bases: object

Base class for Snakemake expansion modules.

Subclasses should override the expand() method if they need to work on the entire RuleInfo object or the format() and expands_field() methods if they intend to modify specific fields.

expand(rule, item, expand_args=None, rec=-1, cb=False)[source]

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

  • rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item

  • item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.

  • expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).

  • rec – Recursion level
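A toy analogue of the recursion expand performs, reduced to plain containers: strings are formatted, dicts, lists and tuples are descended into, and functions are wrapped so their results are expanded when called. MiniExpander is hypothetical and knows nothing about RuleInfo objects:

```python
class MiniExpander:
    """Toy analogue of BaseExpander.expand for nested containers."""

    def format(self, item, **kwargs):
        return item.format(**kwargs)

    def expand(self, item, **kwargs):
        if isinstance(item, str):
            return self.format(item, **kwargs)
        if isinstance(item, dict):
            return {k: self.expand(v, **kwargs) for k, v in item.items()}
        if isinstance(item, (list, tuple)):
            return type(item)(self.expand(v, **kwargs) for v in item)
        if callable(item):
            # Defer: expand whatever the function returns once called
            def wrapped(*args, **fkwargs):
                return self.expand(item(*args, **fkwargs), **kwargs)
            return wrapped
        return item  # None, int, float, ... pass through unchanged
```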

expand_dict(rule, item, expand_args, rec)[source]
expand_func(rule, item, expand_args, rec, debug)[source]
expand_list(rule, item, expand_args, rec, cb)[source]
expand_ruleinfo(rule, item, expand_args, rec)[source]
expand_str(rule, item, expand_args, rec, cb)[source]
expand_tuple(rule, item, expand_args, rec, cb)[source]

Checks if this expander should expand a Rule field type


field – the field to check


True if field should be expanded.

format(item, *args, **kwargs)[source]

Format item using *args and **kwargs

format_annotated(item, expand_args)[source]

Wrapper for format() preserving AnnotatedString flags

Calls format() to format item into a new string and copies flags from original item.

This is used by expand().

exception ymp.snakemake.CircularReferenceException(deps, rule)[source]

Bases: ymp.exceptions.YmpRuleError

Exception raised if parameters in rule contain a circular reference

class ymp.snakemake.ColonExpander[source]

Bases: ymp.snakemake.FormatExpander

Expander using {:xyz:} formatted variables.

regex = re.compile('\n        \\{:\n            (?=(\n                \\s*\n                 (?P<name>(?:.(?!\\s*\\:\\}))*.)\n                \\s*\n            ))\\1\n        :\\}\n        ', re.VERBOSE)
spec = '{{:{}:}}'
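The regex above matches {:name:} placeholders, allowing whitespace around the name, and spec builds such a placeholder from a name. A sketch that reuses the documented pattern for a simple substitution; expand_colon is hypothetical:

```python
import re

# Pattern copied from ColonExpander.regex above (verbose mode)
COLON_RE = re.compile(r"""
    \{:
        (?=(
            \s*
             (?P<name>(?:.(?!\s*\:\}))*.)
            \s*
        ))\1
    :\}
    """, re.VERBOSE)
SPEC = "{{:{}:}}"


def expand_colon(text, values):
    """Replace each {:name:} in text with values[name]."""
    return COLON_RE.sub(lambda m: str(values[m.group("name")]), text)
```

Plain {name} placeholders are left alone, so they remain available to other expanders.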
class ymp.snakemake.DefaultExpander(**kwargs)[source]

Bases: ymp.snakemake.InheritanceExpander

Adds default values to rules

The implementation simply makes all rules inherit from a defaults rule.

Creates DefaultExpander

Each parameter passed is considered a RuleInfo default value. Where applicable, Snakemake’s argtuples ([],{}) must be passed.

get_super(rule, ruleinfo)[source]

Find rule parent

  • rule (Rule) – Rule object being built

  • ruleinfo (RuleInfo) – RuleInfo object describing rule being built


name of parent rule and RuleInfo describing parent rule or (None, None).

Return type


exception ymp.snakemake.ExpandLateException[source]

Bases: Exception

class ymp.snakemake.ExpandableWorkflow(*args, **kwargs)[source]

Bases: snakemake.workflow.Workflow

Adds hook for additional rule expansion methods to Snakemake

Constructor for ExpandableWorkflow overlay attributes

This may be called on an already initialized Workflow object.

classmethod activate()[source]

Installs the ExpandableWorkflow

Replaces the Workflow object in the snakemake.workflow module with an instance of this class and initializes default expanders (the snakemake syntax).

add_rule(name=None, lineno=None, snakefile=None, checkpoint=False, allow_overwrite=False)[source]

Add a rule.

  • name – name of the rule

  • lineno – line number within the snakefile where the rule was defined

  • snakefile – name of file in which rule was defined

classmethod clear()[source]
classmethod ensure_global_workflow()[source]

Get rule by name. If name is None, the last created rule is returned.


name – the name of the rule

global_workflow = <ymp.snakemake.ExpandableWorkflow object>
classmethod load_workflow(snakefile='<ymp package directory>/rules/Snakefile')[source]
classmethod register_expanders(*expanders)[source]

Register objects whose expand() method will be called on each RuleInfo object before it is passed on to Snakemake.

rule(name=None, lineno=None, snakefile=None, checkpoint=None)[source]

Intercepts “rule:”; at this point the entire RuleInfo object is available.

class ymp.snakemake.FormatExpander[source]

Bases: ymp.snakemake.BaseExpander

Expander using a custom formatter object.

class Formatter(expander)[source]

Bases: ymp.string.ProductFormatter

format(*args, **kwargs)[source]

Format item using *args and **kwargs

regex = re.compile('\n        \\{\n            (?=(\n                (?P<name>[^{}]+)\n            ))\\1\n        \\}\n        ', re.VERBOSE)
spec = '{{{}}}'
exception ymp.snakemake.InheritanceException(msg, rule, parent, include=None, lineno=None, snakefile=None)[source]

Bases: snakemake.exceptions.RuleException

Exception raised for errors during rule inheritance

Creates a new instance of RuleException.

Arguments

  • message – the exception message

  • include – iterable of other exceptions to be included

  • lineno – the line the exception originates from

  • snakefile – the file the exception originates from

class ymp.snakemake.InheritanceExpander[source]

Bases: ymp.snakemake.BaseExpander

Adds class-like inheritance to Snakemake rules

To avoid redundancy between closely related rules, e.g. rules for single ended and paired end data, YMP allows Snakemake rules to inherit from another rule.


Derived rules are always created with an implicit ruleorder statement, making Snakemake prefer the parent rule if either parent or child rule could be used to generate the requested output file(s).

Derived rules initially contain the same attributes as the parent rule. Each attribute assigned to the child rule overrides the matching attribute in the parent. Where attributes may contain named and unnamed values, specifying a named value overrides only the value of that name while specifying an unnamed value overrides all unnamed values in the parent attribute.
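The override rules for attributes holding both unnamed and named values can be sketched on Snakemake-style (args, kwargs) tuples; merge_argtuples is hypothetical and ignores the other parts of inheritance, such as the implicit ruleorder:

```python
def merge_argtuples(parent, child):
    """Combine the (args, kwargs) tuples of a parent and child attribute."""
    if child is None:
        return parent  # attribute not set in child: inherited unchanged
    parent_args, parent_kwargs = parent
    child_args, child_kwargs = child
    # Any unnamed value in the child replaces all unnamed parent values
    args = list(child_args) if child_args else list(parent_args)
    # A named value overrides only the parent value of the same name
    kwargs = {**parent_kwargs, **child_kwargs}
    return args, kwargs
```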

KEYWORD = 'ymp: extends'

Comment keyword enabling inheritance

expand(rule, ruleinfo)[source]

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

  • rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item

  • item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.

  • expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).

  • rec – Recursion level


Returns the source line defining rule

Return type


get_super(rule, ruleinfo)[source]

Find rule parent

  • rule (Rule) – Rule object being built

  • ruleinfo (RuleInfo) – RuleInfo object describing rule being built


name of parent rule and RuleInfo describing parent rule or (None, None).

Return type


class ymp.snakemake.NamedList(fromtuple=None, **kwargs)[source]

Bases: snakemake.io.Namedlist

Extended version of Snakemake’s Namedlist

  • Fixes array assignment operator: Writing a field via [] operator updates the value accessed via . operator.

  • Adds fromtuple to constructor: Builds from Snakemake’s typical (args, kwargs) tuples as present in ruleinfo structures.

  • Adds update_tuple method: Updates values in (args,kwargs) tuples as present in ruleinfo structures.

get_names(*args, **kwargs)[source]

Export get_names as public func


Update values in (args, kwargs) tuple.

The tuple must be the same as used in the constructor and must not have been modified.

class ymp.snakemake.RecursiveExpander[source]

Bases: ymp.snakemake.BaseExpander

Recursively expands {xyz} wildcards in Snakemake rules.

expand(rule, ruleinfo)[source]

Recursively expand wildcards within RuleInfo object


Returns True for all fields except shell:, message: and wildcard_constraints:.

Expanding wildcard_constraints: would corrupt the regular expressions it contains, and there is little use in expanding message: or shell:, as Snakemake applies all wildcards to these just before job execution (via format_wildcards()).
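A sketch of that field selection, assuming field names as listed in ymp.snakemake.ruleinfo_fields (where shell: is stored as shellcmd). expands_field and SKIPPED_FIELDS are hypothetical names.

```python
# Fields the recursive expander leaves alone (names as in ruleinfo_fields).
SKIPPED_FIELDS = frozenset({"shellcmd", "message", "wildcard_constraints"})


def expands_field(field: str) -> bool:
    """Return True for every field except shell:, message: and wildcard_constraints:."""
    return field not in SKIPPED_FIELDS
```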

exception ymp.snakemake.RemoveValue[source]

Bases: Exception

Return to remove a value from the list

class ymp.snakemake.SnakemakeExpander[source]

Bases: ymp.snakemake.BaseExpander

Expand wildcards in strings returned from functions.

Snakemake does not do this by default, leaving wildcard expansion to the functions provided themselves. Since we never want {input} to be in a string returned as a file, we expand those always.
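A minimal sketch of the idea, not YMP's code: a hypothetical wrapper that formats strings returned by an input function with the wildcard values. Wildcards are modelled with types.SimpleNamespace instead of Snakemake's wildcards object.

```python
from types import SimpleNamespace
from typing import Any, Callable


def expand_returned_strings(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap `func` so that {wildcard} references in returned strings are expanded."""

    def expand(value: Any, fields: dict) -> Any:
        return value.format(**fields) if isinstance(value, str) else value

    def wrapper(wildcards: SimpleNamespace, *args: Any, **kwargs: Any) -> Any:
        result = func(wildcards, *args, **kwargs)
        fields = vars(wildcards)
        if isinstance(result, (list, tuple)):
            # Expand each string element, keeping the container type.
            return type(result)(expand(item, fields) for item in result)
        return expand(result, fields)

    return wrapper
```

The wrapped function can then return templates such as "reads/{sample}.fq" without formatting them itself.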


Checks if this expander should expand a Rule field type


field – the field to check


True if field should be expanded.

format(item, *args, **kwargs)[source]

Format item using *args and **kwargs

class ymp.snakemake.WorkflowObject(*args, **kwargs)[source]

Bases: object

Base for extension classes defined from snakefiles

This currently encompasses ymp.env.Env and ymp.stage.stage.Stage.

This mixin sets the properties filename and lineno according to the definition source in the rules file. It also maintains a registry within the Snakemake workflow object and provides an accessor method to this registry.
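A rough sketch of the registry and source-location bookkeeping. It assumes a class-level dictionary in place of the Snakemake workflow object YMP actually uses; RegisteredObject is a hypothetical name.

```python
import inspect
from typing import Dict, List, Optional


class RegisteredObject:
    """Hypothetical stand-in for WorkflowObject.

    Records where each instance was created and keeps one registry per class.
    """

    # One list of instances per concrete class (YMP keeps this on the workflow).
    _registries: Dict[type, List["RegisteredObject"]] = {}

    def __init__(self) -> None:
        frame = inspect.currentframe()
        # The calling frame is the code that instantiated the object; a subclass
        # __init__ calling super().__init__() would shift this by one frame.
        caller = frame.f_back if frame is not None else None
        self.filename: Optional[str] = caller.f_code.co_filename if caller else None
        self.lineno: Optional[int] = caller.f_lineno if caller else None
        del frame, caller  # avoid keeping frames alive
        self._registries.setdefault(type(self), []).append(self)

    @property
    def defined_in(self) -> Optional[str]:
        """Name of the file in which the object was defined."""
        return self.filename

    @classmethod
    def get_registry(cls) -> List["RegisteredObject"]:
        """Return all registered instances of exactly this class."""
        return list(cls._registries.get(cls, []))
```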

property defined_in

Name of file in which object was defined



classmethod get_registry(clean=False)[source]

Return all objects of this class registered with current workflow


Line number of object definition



classmethod new_registry()[source]

Add self to registry

Return type



Get active workflow, loading one if necessary


Load new workflow

ymp.snakemake.make_rule(name=None, lineno=None, snakefile=None, **kwargs)[source]
ymp.snakemake.print_ruleinfo(rule, ruleinfo, func=<bound method Logger.debug of <Logger ymp.snakemake (WARNING)>>)[source]

Logs contents of Rule and RuleInfo objects.

  • rule (Rule) – Rule object to be printed

  • ruleinfo (RuleInfo) – Matching RuleInfo object to be printed

  • func – Function used for printing (default is log.debug)

ymp.snakemake.ruleinfo_fields = {'benchmark': {'apply_wildcards': True, 'format': 'string'}, 'conda_env': {'apply_wildcards': True, 'format': 'string'}, 'container_img': {'format': 'string'}, 'docstring': {'format': 'string'}, 'func': {'format': 'callable'}, 'input': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards',)}, 'log': {'apply_wildcards': True, 'format': 'argstuple'}, 'message': {'format': 'string', 'format_wildcards': True}, 'norun': {'format': 'bool'}, 'output': {'apply_wildcards': True, 'format': 'argstuple'}, 'params': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'resources', 'output', 'threads')}, 'priority': {'format': 'numeric'}, 'resources': {'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'script': {'format': 'string'}, 'shadow_depth': {'format': 'string_or_true'}, 'shellcmd': {'format': 'string', 'format_wildcards': True}, 'threads': {'format': 'int', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'version': {'format': 'object'}, 'wildcard_constraints': {'format': 'argstuple'}, 'wrapper': {'format': 'string'}}

describes attributes of snakemake.workflow.RuleInfo
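The table can be queried directly. The dictionary below is a subset copied from the listing above. Reading funcparams as the arguments passed to a field given as a function is an assumption.

```python
# A subset of ymp.snakemake.ruleinfo_fields, copied from the listing above.
ruleinfo_fields = {
    "input": {"apply_wildcards": True, "format": "argstuple", "funcparams": ("wildcards",)},
    "message": {"format": "string", "format_wildcards": True},
    "output": {"apply_wildcards": True, "format": "argstuple"},
    "shellcmd": {"format": "string", "format_wildcards": True},
    "threads": {"format": "int", "funcparams": ("wildcards", "input", "attempt", "threads")},
    "wildcard_constraints": {"format": "argstuple"},
}

# Fields flagged with format_wildcards (formatted just before job execution).
late_formatted = sorted(k for k, v in ruleinfo_fields.items() if v.get("format_wildcards"))

# Fields listing funcparams, mapped to those parameter names.
callable_params = {k: v["funcparams"] for k, v in ruleinfo_fields.items() if "funcparams" in v}

print(late_formatted)  # ['message', 'shellcmd']
```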

ymp.snakemakelexer module


class ymp.snakemakelexer.SnakemakeLexer(*args, **kwds)[source]

Bases: pygments.lexers.python.PythonLexer

name = 'Snakemake'

Name of the lexer

tokens = {'globalkeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'root': [('(rule|checkpoint)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'rulename'), 'rulekeyword', 'globalkeyword', inherit], 'rulekeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'rulename': [('[a-zA-Z_]\\w*', Token.Name.Class, '#pop')]}

Dict of {'state': [(regex, tokentype, new_state), ...], ...}

The initial state is ‘root’. new_state can be omitted to signify no state transition. If it is a string, the state is pushed on the stack and changed. If it is a tuple of strings, all states are pushed on the stack and the current state will be the topmost. It can also be combined('state1', 'state2', ...) to signify a new, anonymous state combined from the rules of two or more existing ones. Furthermore, it can be ‘#pop’ to signify going back one step in the state stack, or ‘#push’ to push the current state on the stack again.

The tuple can also be replaced with include('state'), in which case the rules from the state named by the string are included in the current one.
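The rulename transition can be reproduced with plain re, using the two regular expressions from the tokens table above. This hand-written sketch only illustrates what the root, rulename and #pop states match; it is not how Pygments drives the lexer, and split_rule_header is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Regular expressions copied from SnakemakeLexer.tokens.
RULE_KEYWORD = re.compile(r"(rule|checkpoint)((?:\s|\\\s)+)")  # state 'root'
RULE_NAME = re.compile(r"[a-zA-Z_]\w*")  # state 'rulename'


def split_rule_header(line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (keyword, rule name) if `line` starts a rule or checkpoint."""
    keyword = RULE_KEYWORD.match(line)
    if keyword is None:
        return None
    # After the keyword the lexer pushes 'rulename', matches one identifier,
    # then pops back to 'root' ('#pop').
    name = RULE_NAME.match(line, keyword.end())
    return keyword.group(1), (name.group(0) if name else None)
```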

ymp.sphinxext module

This module contains a Sphinx extension for documenting YMP stages and Snakemake rules.

The SnakemakeDomain (name sm) provides the following directives:

.. sm:rule:: name

Describes a Snakemake rule

.. sm:stage:: name

Describes a YMP Stage

Both directives accept an optional source parameter. If given, a link to the source code of the stage or rule definition will be added. The format of the string passed is filename:line. Referenced Snakefiles will be highlighted with pygments and added to the documentation when building HTML.

The extension also provides an autodoc-like directive:

.. autosnake:: filename

Generates documentation from Snakefile filename.

class ymp.sphinxext.AutoSnakefileDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Bases: docutils.parsers.rst.Directive

Implements RSt directive .. autosnake:: filename

The directive extracts docstrings from rules in snakefile and auto-generates documentation.

has_content = False

This directive does not accept content




Load the Snakefile

Return type


parse_doc(doc, source, idt=0)[source]

Convert doc string to StringList

  • doc (str) – Documentation text

  • source (str) – Source filename

  • idt (int) – Result indentation in characters (default 0)

Return type



StringList of re-indented documentation wrapped in newlines
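A sketch of the re-indentation step. It assumes a plain list of lines in place of docutils' StringList (which additionally tracks source and line offsets); reindent_doc is a hypothetical name.

```python
import inspect
from typing import List


def reindent_doc(doc: str, idt: int = 0) -> List[str]:
    """Dedent `doc`, indent each non-blank line by `idt` spaces, wrap in blank lines."""
    lines = inspect.cleandoc(doc).splitlines()
    indent = " " * idt
    return [""] + [indent + line if line else "" for line in lines] + [""]
```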

parse_rule(rule_name, idt=0)[source]

Convert Rule to StringList

  • rule – Rule object

  • idt (int) – Result indentation in characters (default 0)


StringList containing formatted Rule documentation

Return type


parse_stage(stage, idt=0)[source]
Return type


required_arguments = 1

This directive requires one argument (the filename)




Entry point

tpl_rule = '.. sm:rule:: {name}'

Template for generated Rule RSt



tpl_source = '   :source: {filename}:{lineno}'

Template option source



tpl_stage = '.. sm:stage:: {name}'

Template for generated Stage RSt
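For illustration, the templates combine as shown below. The template strings are copied from above; the rule name and source location are made up.

```python
# Templates copied from AutoSnakefileDirective.
tpl_rule = ".. sm:rule:: {name}"
tpl_stage = ".. sm:stage:: {name}"
tpl_source = "   :source: {filename}:{lineno}"

# Hypothetical rule name and location, for illustration only.
rst_lines = [
    tpl_rule.format(name="trim_reads"),
    tpl_source.format(filename="rules/qc.rules", lineno=42),
]
print("\n".join(rst_lines))
# .. sm:rule:: trim_reads
#    :source: rules/qc.rules:42
```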



ymp.sphinxext.BASEPATH = '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/latest/src'

Path in which YMP package is located



class ymp.sphinxext.CondaDomain(env)[source]

Bases: sphinx.domains.Domain

name = 'conda'

should be short, but unique


domain name

object_types: Dict[str, ObjType] = {'package': <sphinx.domains.ObjType object>}

type (usually directive) name -> ObjType instance

roles: Dict[str, Union[RoleFunction, XRefRole]] = {'package': <sphinx.roles.XRefRole object>}

role name -> role callable

class ymp.sphinxext.DomainTocTreeCollector[source]

Bases: sphinx.environment.collectors.EnvironmentCollector

Add Sphinx Domain entries to the TOC

clear_doc(app, env, docname)[source]

Clear data from environment

If we have cached data in environment for document docname, we should clear it here.

Return type


Return type


locate_in_toc(app, node)[source]
Return type


Return type


merge_other(app, env, docnames, other)[source]

Merge with results from parallel processes

Called if Sphinx is processing documents in parallel. We should merge this from other into env for all docnames.

Return type


process_doc(app, doctree)[source]

Process doctree

This is called by read-doctree, so after the doctree has been loaded. The signal is processed in registered first order, so we are called after built-in extensions, such as the sphinx.environment.collectors.toctree extension building the TOC.

Return type



Select the nodes for which entries in the TOC are desired

This is a separate method so that it might be overridden by subclasses wanting to add other types of nodes to the TOC.

Return type


select_toc_location(app, node)[source]

Select location in TOC where node should be referenced

Return type


toc_insert(docname, tocnode, node, heading)[source]
Return type


class ymp.sphinxext.SnakemakeDomain(env)[source]

Bases: sphinx.domains.Domain

Snakemake language domain


Delete objects derived from file docname

data_version = 0

data version, bump this when the format of self.data changes

directives: Dict[str, Any] = {'rule': <class 'ymp.sphinxext.SnakemakeRule'>, 'stage': <class 'ymp.sphinxext.YmpStage'>}

directive name -> directive class


Return an iterable of “object descriptions”.

Object descriptions are tuples with six items:

  • name – Fully qualified name.

  • dispname – Name to display when searching/linking.

  • type – Object type, a key in self.object_types.

  • docname – The document where it is to be found.

  • anchor – The anchor name for the object.

  • priority – How “important” the object is (determines placement in search results). One of:

      • 1 – Default priority (placed before full-text matches).

      • 0 – Object is important (placed before default-priority objects).

      • 2 – Object is unimportant (placed after full-text matches).

      • -1 – Object should not show up in search at all.
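A sketch of a get_objects implementation producing such tuples. It assumes a hypothetical data layout mapping (object type, name) to (docname, anchor); the layout SnakemakeDomain actually stores is not documented here.

```python
from typing import Dict, Iterator, Tuple

# (name, dispname, type, docname, anchor, priority)
ObjectDescription = Tuple[str, str, str, str, str, int]


def iter_objects(objects: Dict[Tuple[str, str], Tuple[str, str]]) -> Iterator[ObjectDescription]:
    """Yield one object description per registered rule or stage.

    `objects` maps (object type, name) to (docname, anchor); this layout is
    assumed for illustration only.
    """
    for (objtype, name), (docname, anchor) in sorted(objects.items()):
        # The display name equals the full name; priority 1 is the default.
        yield name, name, objtype, docname, anchor, 1
```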

initial_data: Dict = {'objects': {}}

data value for a fresh environment

label = 'Snakemake'

longer, more descriptive (used in messages)


domain label

name = 'sm'

should be short, but unique


domain name

object_types: Dict[str, ObjType] = {'rule': <sphinx.domains.ObjType object>, 'stage': <sphinx.domains.ObjType object>}

type (usually directive) name -> ObjType instance

resolve_xref(env, fromdocname, builder, typ, target, node, contnode)[source]

Resolve the pending_xref node with the given typ and target.

This method should return a new node, to replace the xref node, containing the contnode which is the markup content of the cross-reference.

If no resolution can be found, None can be returned; the xref node will then be given to the missing-reference event, and if that yields no resolution, replaced by contnode.

The method can also raise sphinx.environment.NoUri to suppress the missing-reference event being emitted.

roles: Dict[str, Union[RoleFunction, XRefRole]] = {'rule': <sphinx.roles.XRefRole object>, 'stage': <sphinx.roles.XRefRole object>}

role name -> role callable

class ymp.sphinxext.SnakemakeRule(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Bases: sphinx.util.docutils.SphinxDirective, Generic[sphinx.directives.T]

Directive sm:rule:: describing a Snakemake rule

typename = 'rule'
class ymp.sphinxext.YmpObjectDescription(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Bases: sphinx.util.docutils.SphinxDirective, Generic[sphinx.directives.T]

Base class for RSt directives in SnakemakeDomain

Since this inherits from Sphinx’s ObjectDescription, content generated by the directive will always be inside an addnodes.desc.


source – Specify source position as file:line to create link

add_target_and_index(name, sig, signode)[source]

Add cross-reference IDs and entries to self.indexnode

Return type


get_index_text(typename, name)[source]

Formats object for entry into index

Return type


handle_signature(sig, signode)[source]

Parse rule signature sig into RST nodes and append them to signode.

The return value identifies the object and is passed to add_target_and_index() unchanged.

  • sig (str) – Signature string (i.e. string passed after directive)

  • signode (desc) – Node created for object signature

Return type



Normalized signature (white space removed)

option_spec: Dict[str, DirectiveOption] = {'source': <function unchanged>}

Mapping of option names to validator functions.

typename = '[object name]'
class ymp.sphinxext.YmpStage(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Bases: sphinx.util.docutils.SphinxDirective, Generic[sphinx.directives.T]

Directive sm:stage:: describing an YMP stage

typename = 'stage'

Add Snakefiles to documentation (in HTML mode)


Make absolute path relative to BASEPATH


path (str) – absolute path

Return type



path relative to BASEPATH
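Presumably this reduces to os.path.relpath. A minimal sketch using the BASEPATH value recorded above (relative_to_base is a hypothetical name):

```python
import os.path

# BASEPATH as recorded when these docs were built (see ymp.sphinxext.BASEPATH).
BASEPATH = "/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/latest/src"


def relative_to_base(path: str) -> str:
    """Express an absolute path relative to BASEPATH."""
    return os.path.relpath(path, BASEPATH)
```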


Register the extension with Sphinx

ymp.string module

exception ymp.string.FormattingError(message, fieldname)[source]

Bases: AttributeError

class ymp.string.GetNameFormatter[source]

Bases: string.Formatter

class ymp.string.OverrideJoinFormatter[source]

Bases: string.Formatter

Formatter with overridable join method

The default formatter joins all arguments with "".join(args). This class overrides _vformat() with identical code, changing only that line to one that can be overridden by a derived class.


Joins the expanded pieces of the template string to form the output.

This function is equivalent to ''.join(args). By overriding it, alternative methods can be implemented, e.g. to create a list of strings, each corresponding to one point in the cross product of the expanded variables.

Return type

Union[List[str], str]

class ymp.string.PartialFormatter[source]

Bases: string.Formatter

Formats what it can and leaves the remainder untouched

get_field(field_name, args, kwargs)[source]
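A minimal sketch of the partial-formatting idea. It assumes this is done by catching lookup errors in get_field (the method PartialFormatter overrides); it is an illustration, not YMP's code.

```python
import string
from typing import Any, Mapping, Sequence, Tuple


class PartialFormatterSketch(string.Formatter):
    """Leave fields that cannot be resolved as literal '{field}' text."""

    def get_field(
        self, field_name: str, args: Sequence[Any], kwargs: Mapping[str, Any]
    ) -> Tuple[Any, str]:
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError):
            # Re-emit the placeholder. This simple version does not re-emit
            # the field's conversion or format spec.
            return "{" + field_name + "}", field_name
```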
class ymp.string.ProductFormatter[source]

Bases: ymp.string.OverrideJoinFormatter

String Formatter that creates a list of strings each expanded using one point in the cartesian product of all replacement values.

If none of the arguments evaluate to lists, the result is a string, otherwise it is a list.

>>> ProductFormatter().format("{A} and {B}", A=[1,2], B=[3,4])
['1 and 3', '1 and 4', '2 and 3', '2 and 4']
format_field(value, format_spec)[source]

Joins the expanded pieces of the template string to form the output.

This function is equivalent to ''.join(args). By overriding it, alternative methods can be implemented, e.g. to create a list of strings, each corresponding to one point in the cross product of the expanded variables.

Return type

Union[List[str], str]
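An alternative sketch of the same behaviour for plain named fields. It uses itertools.product instead of overriding join() as ProductFormatter does; product_format is a hypothetical name.

```python
import itertools
import string
from typing import Any, List, Union


def product_format(template: str, **kwargs: Any) -> Union[str, List[str]]:
    """Format `template` once per point in the cartesian product of list values.

    If no referenced value is a list, a single string is returned.
    """
    names = [field for _, field, _, _ in string.Formatter().parse(template) if field]
    # Unique field names (in order of appearance) whose values are lists.
    list_names = [n for n in dict.fromkeys(names) if isinstance(kwargs.get(n), list)]
    if not list_names:
        return template.format(**kwargs)
    results = []
    for combo in itertools.product(*(kwargs[n] for n in list_names)):
        results.append(template.format(**{**kwargs, **dict(zip(list_names, combo))}))
    return results
```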

class ymp.string.QuotedElementFormatter(*args, **kwargs)[source]

Bases: snakemake.utils.SequenceFormatter

class ymp.string.RegexFormatter(regex)[source]

Bases: string.Formatter

String Formatter accepting a regular expression defining the format of the expanded tags.


Get set of field names in format_string

Return type



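Parse format_string into tuples. Each tuple contains:

  • literal_text – text to copy to the output

  • field_name – name of the field following the literal text (None if there is none)

  • format_spec – format specification of the field

  • conversion – conversion applied to the field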
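A sketch of a regex-driven parse(), assuming the field name is captured in group 1 of the regular expression. RegexFormatterSketch is hypothetical and does not handle format specs or conversions. Because string.Formatter builds its output from whatever parse() yields, overriding parse() alone changes which text counts as a field.

```python
import re
import string
from typing import Iterator, Optional, Tuple


class RegexFormatterSketch(string.Formatter):
    """Only text matching `regex` (field name in group 1) is treated as a field."""

    def __init__(self, regex: str) -> None:
        super().__init__()
        self.regex = re.compile(regex)

    def parse(
        self, format_string: str
    ) -> Iterator[Tuple[str, Optional[str], Optional[str], Optional[str]]]:
        start = 0
        for match in self.regex.finditer(format_string):
            # (literal_text, field_name, format_spec, conversion)
            yield format_string[start:match.start()], match.group(1), "", None
            start = match.end()
        if start < len(format_string):
            # Trailing literal text without a field.
            yield format_string[start:], None, None, None
```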

ymp.string.make_formatter(product=None, regex=None, partial=None, quoted=None)[source]
Return type


ymp.util module

ymp.util.R(code='', **kwargs)[source]

Execute R code

This function executes the R code given as a string. Additional arguments are injected into the R environment. The value of the last R statement is returned.

The function requires rpy2 to be installed.

  • code (str) – R code to be executed

  • **kwargs (dict) – variables to inject into R globalenv


value of last R statement

>>> R("1*1", input=input)
ymp.util.Rmd(rmd, out, **kwargs)[source]
ymp.util.check_input(names, minlines=0, minbytes=0)[source]
Return type


ymp.util.file_not_empty(fn, minsize=1)[source]

Checks if a file is not empty, accounting for the minimum gzip file size of 20 bytes
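A hypothetical reading of that rule, assuming the 20 bytes are added to minsize for files ending in .gz. 20 bytes is the size of a gzip file with an empty payload and no stored filename.

```python
import os

# Size in bytes of a gzip file whose compressed payload is empty
# (10-byte header, 2-byte empty deflate block, 8-byte trailer).
GZIP_EMPTY_SIZE = 20


def file_not_empty_sketch(fn: str, minsize: int = 1) -> bool:
    """Hypothetical version: .gz files need minsize bytes beyond an empty gzip."""
    if fn.endswith(".gz"):
        minsize += GZIP_EMPTY_SIZE
    return os.path.getsize(fn) >= minsize
```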

ymp.util.filter_input(name, also=None, join=None, minsize=None)[source]
Return type



Removes empty sets of files from input file lists.

Takes a variable number of file lists of equal length and removes indices where any of the files is empty. Strings are converted to lists of length 1.

Returns a generator tuple.

Example: r1, r2 = filter_out_empty(input.r1, input.r2)
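A sketch of that behaviour. The nonempty keyword is an assumption added so the filter can be tried without real files; by default it tests for a non-zero file size. filter_out_empty_sketch is a hypothetical name.

```python
import os
from typing import Callable, Iterator, List, Sequence, Union


def filter_out_empty_sketch(
    *filelists: Union[str, Sequence[str]],
    nonempty: Callable[[str], bool] = lambda fn: os.path.getsize(fn) > 0,
) -> Iterator[List[str]]:
    """Drop every index at which any of the parallel file lists has an empty file."""
    # Strings become lists of length 1.
    lists = [[fl] if isinstance(fl, str) else list(fl) for fl in filelists]
    if not lists:
        return iter(())
    keep = [all(nonempty(files[i]) for files in lists) for i in range(len(lists[0]))]
    # Generator yielding one filtered list per input list.
    return ([fn for fn, k in zip(files, keep) if k] for files in lists)
```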

ymp.util.glob_wildcards(pattern, files=None)[source]

Glob the values of the wildcards by matching the given pattern to the filesystem. Returns a named tuple with a list of values for each wildcard.
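A simplified sketch of the matching step when files is given, modelled on Snakemake's glob_wildcards. Every wildcard matches .+, names must be unique, and wildcard constraints and filesystem walking are not handled; glob_wildcards_sketch is a hypothetical name.

```python
import re
import string
from collections import namedtuple
from typing import Iterable


def glob_wildcards_sketch(pattern: str, files: Iterable[str]):
    """Match `files` against `pattern` and collect the value of each {wildcard}."""
    regex_parts, names = [], []
    for literal, field, _, _ in string.Formatter().parse(pattern):
        regex_parts.append(re.escape(literal))
        if field:
            names.append(field)
            regex_parts.append(f"(?P<{field}>.+)")
    regex = re.compile("".join(regex_parts))
    Wildcards = namedtuple("Wildcards", names)
    values = Wildcards(*([] for _ in names))  # one list per wildcard
    for fn in files:
        match = regex.fullmatch(fn)
        if match:
            for name in names:
                getattr(values, name).append(match.group(name))
    return values
```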

ymp.util.make_local_path(icfg, url)[source]

ymp.yaml module

class ymp.yaml.AttrItemAccessMixin[source]

Bases: object

Mixin class mapping dot to bracket access

Added to classes implementing __getitem__, __setitem__ and __delitem__, this mixin allows accessing items using dot notation, i.e. “object.xyz” is translated to “object['xyz']”.
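A minimal sketch of such a mixin, applied to a dict. The class names are hypothetical.

```python
from typing import Any


class AttrItemAccessSketch:
    """Map attribute access onto item access (obj.xyz -> obj["xyz"])."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        self[name] = value

    def __delattr__(self, name: str) -> None:
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


class AttrDict(AttrItemAccessSketch, dict):
    """Example container: a dict with dot access."""
```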

class ymp.yaml.Entry(filename, yaml, index)[source]

Bases: object

exception ymp.yaml.LayeredConfAccessError(obj, msg, key=None, stack=None)[source]

Bases: ymp.yaml.LayeredConfError, KeyError, IndexError

Can’t access

exception ymp.yaml.LayeredConfError(obj, msg, key=None, stack=None)[source]

Bases: ymp.exceptions.YmpConfigError

Error in LayeredConf


Retrieve filename and linenumber from object associated with exception


Tuple of filename and linenumber

class ymp.yaml.LayeredConfProxy(maps, root=None, parent=None, key=None)[source]

Bases: ymp.yaml.MultiMapProxy

Layered configuration

save(outstream=None, layer=0)[source]
exception ymp.yaml.LayeredConfWriteError(obj, msg, key=None, stack=None)[source]

Bases: ymp.yaml.LayeredConfError

Can’t write

exception ymp.yaml.MixedTypeError(obj, msg, key=None, stack=None)[source]

Bases: ymp.yaml.LayeredConfError

Mixed types in proxy collection

class ymp.yaml.MultiMapProxy(maps, root=None, parent=None, key=None)[source]

Bases: ymp.yaml.MultiProxy, ymp.yaml.AttrItemAccessMixin, collections.abc.Mapping

Mapping Proxy for layered containers

get(k[, d]) → D[k] if k in D, else d. d defaults to None.[source]
items() → a set-like object providing a view on D’s items[source]
keys() → a set-like object providing a view on D’s keys[source]
values() → an object providing a view on D’s values[source]
class ymp.yaml.MultiMapProxyItemsView(mapping)[source]

Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.ItemsView

ItemsView for MultiMapProxy

class ymp.yaml.MultiMapProxyKeysView(mapping)[source]

Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.KeysView

KeysView for MultiMapProxy

class ymp.yaml.MultiMapProxyMappingView(mapping)[source]

Bases: collections.abc.MappingView

MappingView for MultiMapProxy

class ymp.yaml.MultiMapProxyValuesView(mapping)[source]

Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.ValuesView

ValuesView for MultiMapProxy

class ymp.yaml.MultiProxy(maps, root=None, parent=None, key=None)[source]

Bases: object

Base class for layered container structure

add_layer(name, container)[source]
get_path(key=None, absolute=False)[source]
class ymp.yaml.MultiSeqProxy(maps, root=None, parent=None, key=None)[source]

Bases: ymp.yaml.MultiProxy, ymp.yaml.AttrItemAccessMixin, collections.abc.Sequence

Sequence Proxy for layered containers

class ymp.yaml.WorkdirTag(path)[source]

Bases: object

classmethod from_yaml(_constructor, node)[source]
classmethod to_yaml(representer, instance)[source]
yaml_tag = '!workdir'
ymp.yaml.load(files, root=None)[source]

Load configuration files

Creates a LayeredConfProxy configuration object from a set of YAML files.

Files listed later will override parts of earlier included files
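The override rule can be sketched as a read-only mapping over layers, assuming each file has already been parsed into a dict. Nested sections are merged key by key. Where layers disagree on whether a value is a mapping, this sketch raises TypeError; YMP has a dedicated MixedTypeError for mixed types, though its exact triggering conditions are not shown here. LayeredMapSketch is a hypothetical name.

```python
from collections.abc import Mapping
from typing import Any, Iterator, List


class LayeredMapSketch(Mapping):
    """Read-only view over several mappings; later layers override earlier ones."""

    def __init__(self, maps: List[Mapping]) -> None:
        self._maps = list(maps)  # earliest (lowest priority) first

    def __getitem__(self, key: Any) -> Any:
        found = [m[key] for m in reversed(self._maps) if key in m]  # latest first
        if not found:
            raise KeyError(key)
        if all(isinstance(v, Mapping) for v in found):
            return LayeredMapSketch(list(reversed(found)))  # merge nested sections
        if not any(isinstance(v, Mapping) for v in found):
            return found[0]  # the latest layer wins
        raise TypeError(f"mixed types for key {key!r}")

    def __iter__(self) -> Iterator[Any]:
        # Union of keys across layers, in first-seen order.
        return iter(dict.fromkeys(k for m in self._maps for k in m))

    def __len__(self) -> int:
        return sum(1 for _ in self)
```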

ymp.yaml.resolve_installed_package(fname, stack)[source]