ymp package

ymp.get_config()[source]

Access the current YMP configuration object.

This object might change once during normal execution: it is deleted before passing control to Snakemake. During unit test execution the object is deleted between all tests.

Return type

ConfigMgr

ymp.print_rule = 0

Set to 1 to show the YMP expansion process as it is applied to the next Snakemake rule definition.

>>> ymp.print_rule = 1
>>> rule broken:
>>>   ...
>>> ymp make broken -vvv
ymp.snakemake_versions = ['5.20.1']

List of Snakemake versions this version of YMP has been verified to work with

Subpackages

Submodules

ymp.blast module

Parsers for blast output formats 6 (CSV) and 7 (CSV with comments between queries).

class ymp.blast.BlastParser[source]

Bases: object

Base class for BLAST parsers

FIELD_MAP = {'% identity': 'pident', 'alignment length': 'length', 'bit score': 'bitscore', 'evalue': 'evalue', 'gap opens': 'gapopen', 'mismatches': 'mismatch', 'q. end': 'qend', 'q. start': 'qstart', 'query acc.': 'qacc', 'query frame': 'qframe', 'query length': 'qlen', 's. end': 'send', 's. start': 'sstart', 'sbjct frame': 'sframe', 'score': 'score', 'subject acc.': 'sacc', 'subject strand': 'sstrand', 'subject tax ids': 'staxids', 'subject title': 'stitle'}
FIELD_TYPE = {'bitscore': <class 'float'>, 'evalue': <class 'float'>, 'gapopen': <class 'int'>, 'length': <class 'int'>, 'mismatch': <class 'int'>, 'pident': <class 'float'>, 'qend': <class 'int'>, 'qframe': <class 'int'>, 'qlen': <class 'int'>, 'qstart': <class 'int'>, 'score': <class 'float'>, 'send': <class 'int'>, 'sframe': <class 'int'>, 'sstart': <class 'int'>, 'staxids': <function BlastParser.tupleofint>, 'stitle': <class 'str'>}
tupleofint()[source]
class ymp.blast.Fmt6Parser(fileobj)[source]

Bases: ymp.blast.BlastParser

Parser for BLAST format 6 (CSV)

Hit

alias of BlastHit

field_types = [None, None, <class 'float'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'float'>, <class 'float'>]
fields = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']

Default field types

get_fields()[source]
class ymp.blast.Fmt7Parser(fileobj)[source]

Bases: ymp.blast.BlastParser

Parses BLAST results in format ‘7’ (CSV with comments)

DATABASE = '# Database: '
FIELDS = '# Fields: '
HITSFOUND = ' hits found'
QUERY = '# Query: '
get_fields()[source]

Returns list of available field names

Format 7 specifies which columns it contains in comment lines, allowing this parser to be agnostic of the selection of columns made when running BLAST.

Return type

List[str]

Returns

List of field names (e.g. ['sacc', 'qacc', 'evalue'])
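As a rough illustration of how a format 7 header can drive column selection, the sketch below turns a "# Fields: " comment line into short field names. It is not YMP's implementation: the helper name parse_fields_line is hypothetical, only a few FIELD_MAP entries are copied from the listing above, and passing unknown labels through unchanged is an assumption.

```python
# Prefix of the comment line listing the columns (value of Fmt7Parser.FIELDS above).
FIELDS = "# Fields: "

# Subset of BlastParser.FIELD_MAP, copied from the listing above.
FIELD_MAP = {
    "query acc.": "qacc",
    "subject acc.": "sacc",
    "evalue": "evalue",
    "bit score": "bitscore",
}


def parse_fields_line(line):
    """Hypothetical helper: map a '# Fields: ...' line to short field names."""
    if not line.startswith(FIELDS):
        raise ValueError("not a '# Fields: ' comment line")
    labels = line[len(FIELDS):].strip().split(", ")
    # Assumption: labels missing from FIELD_MAP are passed through unchanged.
    return [FIELD_MAP.get(label, label) for label in labels]


fields = parse_fields_line("# Fields: subject acc., query acc., evalue\n")
```

With the header line used here, fields holds the names from the docstring example: sacc, qacc and evalue.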

isfirsthit()[source]

Returns True if the current hit is the first hit for the current query

Return type

bool

ymp.blast.reader(fileobj, t=7)[source]

Creates a reader for files in BLAST format

>>> with open(blast_file) as infile:
>>>    reader = blast.reader(infile)
>>>    for hit in reader:
>>>       print(hit)
Parameters
  • fileobj – iterable yielding lines in blast format

  • t (int) – number of blast format type

Return type

BlastParser

ymp.blast2gff module

ymp.cluster module

Module for talking to cluster management systems

>>> python -m ymp.cluster slurm status <jobid>
class ymp.cluster.ClusterMS[source]

Bases: object

class ymp.cluster.Lsf[source]

Bases: ymp.cluster.ClusterMS

Talking to LSF

states = {'DONE': 'success', 'EXIT': 'failed', 'PEND': 'running', 'POST_DONE': 'success', 'POST_ERR': 'failed', 'PSUSP': 'running', 'RUN': 'running', 'SSUSP': 'running', 'UNKWN': 'running', 'USUSP': 'running', 'WAIT': 'running'}
static status(jobid)[source]
static submit(args)[source]
class ymp.cluster.Slurm[source]

Bases: ymp.cluster.ClusterMS

Talking to Slurm

states = {'BOOT_FAIL': 'failed', 'CANCELLED': 'failed', 'COMPLETED': 'success', 'COMPLETING': 'running', 'CONFIGURING': 'running', 'DEADLINE': 'failed', 'FAILED': 'failed', 'NODE_FAIL': 'failed', 'PENDING': 'running', 'PREEMPTED': 'failed', 'RESIZING': 'running', 'REVOKED': 'running', 'RUNNING': 'running', 'SPECIAL_EXIT': 'running', 'SUSPENDED': 'running', 'TIMEOUT': 'failed'}
static status(jobid)[source]

Print the status of job jobid to stdout (as needed by Snakemake)

Anecdotal benchmarking shows 200ms per invocation, half used by Python startup and half by calling sacct. Using scontrol show job instead of sacct -pbs is faster by 80ms, but finished jobs are purged after an unknown time window.

ymp.cluster.error(*args, **kwargs)[source]

ymp.common module

Collection of shared utility classes and methods

class ymp.common.AttrDict[source]

Bases: dict

AttrDict adds accessing stored keys as attributes to dict
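A minimal sketch of the idea, assuming only read access via attributes is needed. The class name AttrDictSketch is hypothetical and the real AttrDict may behave differently, for example on missing keys.

```python
class AttrDictSketch(dict):
    """dict whose stored keys can also be read as attributes (illustration only)."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so dict methods such as .keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


cfg = AttrDictSketch(threads=4, shell="/bin/bash")
threads = cfg.threads  # same value as cfg["threads"]
```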

class ymp.common.Cache(root)[source]

Bases: object

close()[source]
commit()[source]
get_cache(name, clean=False, *args, **kwargs)[source]
load(cache, key)[source]
load_all(cache)[source]
store(cache, key, obj)[source]
class ymp.common.CacheDict(cache, name, *args, loadfunc=None, itemloadfunc=None, itemdata=None, **kwargs)[source]

Bases: ymp.common.AttrDict

get(k[, d]) → D[k] if k in D, else d. d defaults to None.[source]
items() → a set-like object providing a view on D’s items[source]
keys() → a set-like object providing a view on D’s keys[source]
values() → an object providing a view on D’s values[source]
class ymp.common.MkdirDict[source]

Bases: ymp.common.AttrDict

Creates directories as they are requested

ymp.common.ensure_list(obj)[source]

Wrap obj in a list as needed

ymp.common.flatten(item)[source]

Flatten lists without turning strings into letters

ymp.common.is_container(obj)[source]

Check if object is container, considering strings not containers
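The three helpers above can be sketched as follows. These are illustrative re-implementations with hypothetical names (suffix _sketch), not the code in ymp.common; in particular, treating bytes like strings and leaving tuples unwrapped by ensure_list are assumptions.

```python
def is_container_sketch(obj):
    """True for iterables, except strings (and, by assumption, bytes)."""
    if isinstance(obj, (str, bytes)):
        return False
    return hasattr(obj, "__iter__")


def ensure_list_sketch(obj):
    """Return obj unchanged if it is a list, otherwise wrap it in one."""
    return obj if isinstance(obj, list) else [obj]


def flatten_sketch(item):
    """Flatten nested containers into one list, keeping strings whole."""
    if not is_container_sketch(item):
        return [item]
    result = []
    for sub in item:
        result.extend(flatten_sketch(sub))
    return result


flat = flatten_sketch(["ab", ["cd", ("ef", 1)]])
wrapped = ensure_list_sketch("sample.fq")
```

Because strings are not treated as containers, "ab" stays one element instead of being split into "a" and "b".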

ymp.common.parse_number(s='')[source]

Basic 1k 1m 1g 1t parsing.

  • assumes base 2

  • returns “byte” value

  • accepts “1kib”, “1kb” or “1k”
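The rules listed above can be sketched like this. The function parse_number_sketch is a hypothetical stand-in, not the YMP implementation; rejecting malformed input with ValueError and truncating fractional results to int are assumptions.

```python
import re

# Exponent of 1024 for each unit prefix (base 2, as stated above).
_EXPONENT = {"": 0, "k": 1, "m": 2, "g": 3, "t": 4}
_NUMBER_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([kmgt]?)(?:i?b)?\s*")


def parse_number_sketch(text):
    """Parse strings such as '1k', '1kb' or '1kib' into a byte count."""
    match = _NUMBER_RE.fullmatch(text.lower())
    if match is None:
        # Assumption: malformed input is an error.
        raise ValueError(f"cannot parse number: {text!r}")
    number, unit = match.groups()
    return int(float(number) * 1024 ** _EXPONENT[unit])


one_k = parse_number_sketch("1k")
two_m = parse_number_sketch("2MiB")
```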

ymp.config module

class ymp.config.ConfigExpander(config_mgr)[source]

Bases: ymp.snakemake.ColonExpander

class Formatter(expander)[source]

Bases: ymp.snakemake.FormatExpander.Formatter, ymp.string.PartialFormatter

get_value(field_name, args, kwargs)[source]
expands_field(field)[source]

Checks if this expander should expand a Rule field type

Parameters

field – the field to check

Returns

True if field should be expanded.

class ymp.config.ConfigMgr(root, conffiles)[source]

Bases: object

Manages workflow configuration

This is a singleton object of which only one instance should be around at a given time. It is available in the rules files as icfg and via ymp.get_config() elsewhere.

ConfigMgr loads and maintains the workflow configuration as given in the ymp.yml files located in the workflow root directory, the user config folder (~/.ymp) and the installation etc folder.

CONF_DEFAULT_FNAME = '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/etc/defaults.yml'
CONF_FNAME = 'ymp.yml'
CONF_USER_FNAME = '/home/docs/.ymp/ymp.yml'
KEY_PIPELINES = 'pipelines'
KEY_PROJECTS = 'projects'
KEY_REFERENCES = 'references'
RULE_MAIN_FNAME = '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/rules/Snakefile'
property absdir

Dictionary of absolute paths of named YMP directories

classmethod activate()[source]
property cluster

The YMP cluster configuration.

property conda
property dir

Dictionary of relative paths of named YMP directories

The directory paths are relative to the YMP root workdir.

property ensuredir

Dictionary of absolute paths of named YMP directories

Directories will be created on the fly as they are requested.

expand(item, **kwargs)[source]
classmethod find_config()[source]

Locates ymp config files and ymp root

The root ymp work dir is determined as the first (parent) directory containing a file named ConfigMgr.CONF_FNAME (default ymp.yml).

The stack of config files comprises 1. the default config ConfigMgr.CONF_DEFAULT_FNAME (etc/defaults.yml in the ymp package directory), 2. the user config ConfigMgr.CONF_USER_FNAME (~/.ymp/ymp.yml) and 3. the ymp.yml in the ymp root.

Returns
  • root – Root working directory

  • conffiles – list of active configuration files

classmethod instance()[source]

Returns the active Ymp ConfigMgr instance

property limits

The YMP limits configuration.

mem(base='0', per_thread=None, unit='m')[source]

Clamp memory to configuration limits

Parameters
  • base – base memory requested

  • per_thread – additional memory required per allocated thread

  • unit – output unit (b, k, m, g, t)

property pairnames
property pipeline

Configure pipelines

property platform

Name of current platform (macos or linux)

property ref

Configure references

property shell

The shell used by YMP

Change by adding e.g. shell: /path/to/shell to ymp.yml.

property snakefiles

Snakefiles used under this config in parsing order

classmethod unload()[source]
class ymp.config.OverrideExpander(cfgmgr)[source]

Bases: ymp.snakemake.BaseExpander

Apply rule attribute overrides from ymp.yml config

Example

Set the wordsize parameter in the bmtagger_bitmask rule to 12:

ymp.yml
overrides:
  rules:
    bmtagger_bitmask:
      params:
        wordsize: 12
expand(rule, ruleinfo, **kwargs)[source]

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

Parameters
  • rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item

  • item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.

  • expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).

  • rec – Recursion level

ymp.dna module

ymp.dna.nuc2aa(seq)
ymp.dna.nuc2num(seq)

ymp.download module

class ymp.download.DownloadThread[source]

Bases: object

get(url, dest, md5)[source]
main()[source]
terminate()[source]
class ymp.download.FileDownloader(block_size=4096, timeout=300, parallel=4, loglevel=30, alturls=None, retry=3)[source]

Bases: object

Manages download of a set of URLs

Downloads happen concurrently using asynchronous network IO.

Parameters
  • block_size (int) – Byte size of chunks to download

  • timeout (int) – Aiohttp cumulative timeout

  • parallel (int) – Number of files to download in parallel

  • loglevel (int) – Log level for messages sent to logging (errors are sent with loglevel+10)

  • alturls – List of regexps modifying URLs

  • retry (int) – Number of times to retry download

error(msg, *args, **kwargs)[source]

Send error to logger

Message is sent with a log level 10 higher than the default for this object.

Return type

None

get(urls, dest, md5s=None)[source]

Download a list of URLs

Parameters
Return type

None

log(msg, *args, modlvl=0, **kwargs)[source]

Send message to logger

Honors loglevel set for the FileDownloader object.

Parameters
  • msg (str) – The log message

  • modlvl (int) – Added to default logging level for object

Return type

None

static make_bar_format(desc_width=20, count_width=0, rate=False, eta=False, have_total=True)[source]

Construct bar_format for tqdm

Parameters
  • desc_width (int) – minimum space allocated for description

  • count_width (int) – min space for counts

  • rate (bool) – show rate to right of progress bar

  • eta (bool) – show eta to right of progress bar

  • have_total (bool) – whether a total exists (required to add percentage)

Return type

str

ymp.env module

This module manages the conda environments.

class ymp.env.CondaPathExpander(config, *args, **kwargs)[source]

Bases: ymp.snakemake.BaseExpander

Applies search path for conda environment specifications

File names supplied via rule: conda: "some.yml" are replaced with absolute paths if they are found in any searched directory. Each search_paths entry is appended to the directory containing the top level Snakefile and the directory checked for the filename. Thereafter, the stack of including Snakefiles is traversed backwards. If no file is found, the original name is returned.

expands_field(field)[source]

Checks if this expander should expand a Rule field type

Parameters

field – the field to check

Returns

True if field should be expanded.

format(conda_env, *args, **kwargs)[source]

Format item using *args and **kwargs

class ymp.env.Env(env_file=None, dag=None, singularity_img=None, container_img=None, cleanup=None, name=None, packages=None, base='none', channels=None, rule=None)[source]

Bases: ymp.snakemake.WorkflowObject, snakemake.deployment.conda.Env

Represents YMP conda environment

Snakemake expects the conda environments in a per-workflow directory configured by conda_prefix. YMP sets this value by default to ~/.ymp/conda, which has a greater chance of being on the same file system as the conda cache, allowing for hard linking of environment files.

Within the folder conda_prefix, each environment is created in a folder named by the hash of the environment definition file’s contents and the conda_prefix path. This class inherits from snakemake.deployment.conda.Env to ensure that the hash we use is identical to the one Snakemake will use during workflow execution.

The class provides additional features for updating environments, creating environments dynamically and executing commands within those environments.

Note

This is not called from within the execution. Snakemake instantiates its own Env object purely based on the filename.

Creates an inline defined conda environment

Parameters
  • name (Optional[str]) – Name of conda environment (and basename of file)

  • packages (Union[list, str, None]) – package(s) to be installed into environment. Version constraints can be specified in each package string separated from the package name by whitespace. E.g. "blast =2.6*"

  • channels (Union[list, str, None]) – channel(s) to be selected for the environment

  • base (str) – Select a set of default channels and packages to be added to the newly created environment. Sets are defined in conda.defaults in ymp.yml

create(dryrun=False, force=False)[source]

Ensure the conda environment has been created

Inherits from snakemake.conda.Env.create

Behavior of super class

The environment is installed in a folder in conda_prefix named according to a hash of the environment.yaml defining the environment and the value of conda-prefix (Env.hash). The latter is included as installed environments cannot be moved.

  • If this folder (Env.path) exists, nothing is done.

  • If a folder named according to the hash of just the contents of environment.yaml exists, the environment is created by unpacking the tar balls in that folder.

Handling pre-computed environment specs

In addition to freezing environments by maintaining a copy of the package binaries, we allow maintaining a copy of the package binary URLs, from which the archive folder is populated on demand.

If a file {Env.name}.txt exists in conda.spec FIXME

export(stream, typ='yml')[source]

Freeze environment

static get_installed_env_hashes()[source]
property installed
run(command)[source]

Execute command in environment

Returns exit code of command run.

set_prefix(prefix)[source]
update()[source]

Update conda environment

ymp.exceptions module

Exceptions raised by YMP

exception ymp.exceptions.YmpConfigError(obj, msg, key=None, exc=None)[source]

Bases: ymp.exceptions.YmpNoStackException

Indicates an error in the ymp.yml config files

Parameters
  • obj (object) – Subtree of config causing error

  • msg (str) – The message to display

  • key (object) – Key indicating part of obj causing error

  • exc (Optional[Exception]) – Upstream exception causing error

exception ymp.exceptions.YmpException[source]

Bases: Exception

Base class of all YMP Exceptions

exception ymp.exceptions.YmpNoStackException(message)[source]

Bases: ymp.exceptions.YmpException, click.exceptions.ClickException

Exception that does not lead to stack trace on CLI

Inheriting from ClickException makes click print only the self.msg value of the exception, rather than allowing Python to print a full stack trace.

This is useful for exceptions indicating usage or configuration errors. We use this, instead of click.UsageError and friends so that the exceptions can be caught and handled explicitly where needed.

Note that click will call the show method on this object to print the exception. The default implementation from click will just prefix the msg with Error:.

FIXME: This does not work if the exception is raised from within the snakemake workflow as snakemake.snakemake catches and reformats exceptions.

exception ymp.exceptions.YmpRuleError(obj, msg)[source]

Bases: ymp.exceptions.YmpNoStackException

Indicates an error in the rules files

This could e.g. be a Stage or Environment defined twice.

Parameters
  • obj (object) – The object causing the exception. Must have lineno and filename as these will be shown as part of the error message on the command line.

  • msg (str) – The message to display

show()[source]
Return type

None

exception ymp.exceptions.YmpStageError(msg)[source]

Bases: ymp.exceptions.YmpNoStackException

Indicates an error in the requested stage stack

show()[source]
Return type

None

exception ymp.exceptions.YmpSystemError(message)[source]

Bases: ymp.exceptions.YmpNoStackException

Indicates problem running YMP with available system software

exception ymp.exceptions.YmpUsageError(message)[source]

Bases: ymp.exceptions.YmpNoStackException

exception ymp.exceptions.YmpWorkflowError(message)[source]

Bases: ymp.exceptions.YmpNoStackException

Indicates an error during workflow execution

E.g. failures to expand dynamic variables

ymp.gff module

Implements simple reader and writer for GFF (general feature format) files.

Unfinished

  • only supports one version, GFF 3.2.3.

  • no escaping

class ymp.gff.Attributes(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)

Bases: tuple

Create new instance of Attributes(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)

property Alias

Alias for field number 2

property Dbxref

Alias for field number 8

property Derives_From

Alias for field number 6

property Gap

Alias for field number 5

property ID

Alias for field number 0

property Is_circular

Alias for field number 10

property Name

Alias for field number 1

property Note

Alias for field number 7

property Ontology_term

Alias for field number 9

property Parent

Alias for field number 3

property Target

Alias for field number 4

class ymp.gff.Feature(seqid, source, type, start, end, score, strand, phase, attributes)

Bases: tuple

Create new instance of Feature(seqid, source, type, start, end, score, strand, phase, attributes)

property attributes

Alias for field number 8

property end

Alias for field number 4

property phase

Alias for field number 7

property score

Alias for field number 5

property seqid

Alias for field number 0

property source

Alias for field number 1

property start

Alias for field number 3

property strand

Alias for field number 6

property type

Alias for field number 2

class ymp.gff.reader(fileobj)[source]

Bases: object

class ymp.gff.writer(fileobj)[source]

Bases: object

write(feature)[source]

ymp.helpers module

This module contains helper functions.

Not all of these are currently in use

class ymp.helpers.OrderedDictMaker[source]

Bases: object

odict creates OrderedDict objects in a dict-literal like syntax

>>>  my_ordered_dict = odict[
>>>    'key': 'value'
>>>  ]

Implementation: odict uses the Python slice syntax, which is similar to dict literals. The [] operator is implemented by overriding __getitem__. Slices passed to the operator as object[start1:stop1:step1, start2:...] are passed to the implementation as a tuple of slice objects with start, stop and step members. odict simply creates an OrderedDict by iterating over that tuple.
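The mechanism described above can be reproduced in a few lines. This is a sketch under the stated description, not necessarily the code in ymp.helpers; the class name OrderedDictMakerSketch is hypothetical. Note that Python passes a bare slice object rather than a tuple when only one key: value pair is given, which the sketch handles explicitly.

```python
from collections import OrderedDict


class OrderedDictMakerSketch:
    """Builds an OrderedDict from slice syntax: maker['a': 1, 'b': 2]."""

    def __getitem__(self, keys):
        # A single 'key': value pair arrives as one slice, several as a tuple.
        if isinstance(keys, slice):
            keys = (keys,)
        # slice.start carries the key, slice.stop the value.
        return OrderedDict((item.start, item.stop) for item in keys)


odict_sketch = OrderedDictMakerSketch()
pair = odict_sketch["key": "value"]
several = odict_sketch["b": 2, "a": 1]
```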

ymp.helpers.update_dict(dst, src)[source]

Recursively update dictionary dst with src

  • Treats a list as atomic, replacing it with new list.

  • Dictionaries are overwritten by item

  • None is replaced by empty dict if necessary
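One way to read the three rules above is sketched below. The function update_dict_sketch is a hypothetical illustration, not the YMP implementation; in particular, replacing any non-dict destination value (not only None) with an empty dict before merging a nested dict is an assumption.

```python
def update_dict_sketch(dst, src):
    """Recursively merge src into dst in place and return dst."""
    for key, value in src.items():
        if isinstance(value, dict):
            # A missing or None slot becomes an empty dict before merging.
            if not isinstance(dst.get(key), dict):
                dst[key] = {}
            update_dict_sketch(dst[key], value)
        else:
            # Lists and scalars are atomic: the new value replaces the old.
            dst[key] = value
    return dst


config = {"limits": {"max_mem": "8g", "hosts": ["a", "b"]}, "cluster": None}
override = {"limits": {"hosts": ["c"]}, "cluster": {"profile": "slurm"}}
merged = update_dict_sketch(config, override)
```

Here the hosts list is replaced rather than extended, max_mem is kept, and the None value under cluster is replaced by a dict holding the new entry.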

ymp.map2otu module

class ymp.map2otu.MapfileParser(minid=0)[source]

Bases: object

read(mapfiles)[source]
write(outfile)[source]
class ymp.map2otu.emirge_info(line)[source]

Bases: object

ymp.map2otu.main()[source]

ymp.nuc2aa module

ymp.nuc2aa.fasta_dna2aa(inf, outf)[source]
ymp.nuc2aa.nuc2aa(seq)[source]
ymp.nuc2aa.nuc2num(seq)[source]

ymp.snakemake module

Extends Snakemake Features

class ymp.snakemake.BaseExpander[source]

Bases: object

Base class for Snakemake expansion modules.

Subclasses should override the expand() method if they need to work on the entire RuleInfo object or the format() and expands_field() methods if they intend to modify specific fields.

expand(rule, item, expand_args=None, rec=- 1, cb=False)[source]

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

Parameters
  • rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item

  • item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.

  • expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).

  • rec – Recursion level

expand_dict(rule, item, expand_args, rec)[source]
expand_func(rule, item, expand_args, rec, debug)[source]
expand_list(rule, item, expand_args, rec, cb)[source]
expand_ruleinfo(rule, item, expand_args, rec)[source]
expand_str(rule, item, expand_args, rec, cb)[source]
expand_tuple(rule, item, expand_args, rec, cb)[source]
expands_field(field)[source]

Checks if this expander should expand a Rule field type

Parameters

field – the field to check

Returns

True if field should be expanded.

format(item, *args, **kwargs)[source]

Format item using *args and **kwargs

format_annotated(item, expand_args)[source]

Wrapper for format() preserving AnnotatedString flags

Calls format() to format item into a new string and copies flags from original item.

This is used by expand()

exception ymp.snakemake.CircularReferenceException(deps, rule)[source]

Bases: ymp.exceptions.YmpRuleError

Exception raised if parameters in rule contain a circular reference

class ymp.snakemake.ColonExpander[source]

Bases: ymp.snakemake.FormatExpander

Expander using {:xyz:} formatted variables.

regex = re.compile('\n \\{:\n (?=(\n \\s*\n (?P<name>(?:.(?!\\s*\\:\\}))*.)\n \\s*\n ))\\1\n :\\}\n ', re.VERBOSE)
spec = '{{:{}:}}'
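To see what the regex and spec attributes above do, the sketch below recompiles the same pattern (laid out over several lines, which re.VERBOSE ignores) and applies it to a made-up string. The helper colon_names and the sample string are hypothetical; only the pattern and the spec string are taken from the listing.

```python
import re

# Same pattern as ColonExpander.regex above; whitespace is ignored in VERBOSE mode.
COLON_REGEX = re.compile(r"""
    \{:
        (?=(
            \s*
            (?P<name>(?:.(?!\s*\:\}))*.)
            \s*
        ))\1
    :\}
    """, re.VERBOSE)

# Same value as ColonExpander.spec above.
SPEC = "{{:{}:}}"


def colon_names(pattern):
    """Hypothetical helper: list the names of all {:name:} variables."""
    return [match.group("name") for match in COLON_REGEX.finditer(pattern)]


names = colon_names("{:dir.reports:}/{sample}.{: target :}")
placeholder = SPEC.format("target")
```

The plain Snakemake wildcard {sample} is not matched, and whitespace around target is excluded from the captured name. SPEC.format("target") rebuilds the placeholder {:target:}.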
class ymp.snakemake.DefaultExpander(**kwargs)[source]

Bases: ymp.snakemake.InheritanceExpander

Adds default values to rules

The implementation simply makes all rules inherit from a defaults rule.

Creates DefaultExpander

Each parameter passed is considered a RuleInfo default value. Where applicable, Snakemake’s argtuples ([],{}) must be passed.

get_super(rule, ruleinfo)[source]

Find rule parent

Parameters
  • rule (Rule) – Rule object being built

  • ruleinfo (RuleInfo) – RuleInfo object describing rule being built

Returns

name of parent rule and RuleInfo describing parent rule or (None, None).

Return type

2-Tuple

exception ymp.snakemake.ExpandLateException[source]

Bases: Exception

class ymp.snakemake.ExpandableWorkflow(*args, **kwargs)[source]

Bases: snakemake.workflow.Workflow

Adds hook for additional rule expansion methods to Snakemake

Constructor for ExpandableWorkflow overlay attributes

This may be called on an already initialized Workflow object.

classmethod activate()[source]

Installs the ExpandableWorkflow

Replaces the Workflow object in the snakemake.workflow module with an instance of this class and initializes default expanders (the snakemake syntax).

add_rule(name=None, lineno=None, snakefile=None, checkpoint=False)[source]

Add a rule.

Parameters
  • name – name of the rule

  • lineno – line number within the snakefile where the rule was defined

  • snakefile – name of file in which rule was defined

classmethod clear()[source]
classmethod ensure_global_workflow()[source]
get_rule(name=None)[source]

Get rule by name. If name is none, the last created rule is returned.

Parameters

name – the name of the rule

global_workflow = <ymp.snakemake.ExpandableWorkflow object>
classmethod load_workflow(snakefile='/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/rules/Snakefile')[source]
classmethod register_expanders(*expanders)[source]

Register objects whose expand() method will be called on each RuleInfo object before it is passed on to Snakemake.

rule(name=None, lineno=None, snakefile=None, checkpoint=None)[source]

Intercepts “rule:”. Here we have the entire ruleinfo object.

class ymp.snakemake.FormatExpander[source]

Bases: ymp.snakemake.BaseExpander

Expander using a custom formatter object.

class Formatter(expander)[source]

Bases: ymp.string.ProductFormatter

parse(format_string)[source]
format(*args, **kwargs)[source]

Format item using *args and **kwargs

get_names(pattern)[source]
regex = re.compile('\n \\{\n (?=(\n (?P<name>[^{}]+)\n ))\\1\n \\}\n ', re.VERBOSE)
spec = '{{{}}}'
exception ymp.snakemake.InheritanceException(msg, rule, parent, include=None, lineno=None, snakefile=None)[source]

Bases: snakemake.exceptions.RuleException

Exception raised for errors during rule inheritance

Creates a new instance of RuleException.

Arguments
  • message – the exception message

  • include – iterable of other exceptions to be included

  • lineno – the line the exception originates

  • snakefile – the file the exception originates

class ymp.snakemake.InheritanceExpander[source]

Bases: ymp.snakemake.BaseExpander

Adds class-like inheritance to Snakemake rules

To avoid redundancy between closely related rules, e.g. rules for single ended and paired end data, YMP allows Snakemake rules to inherit from another rule.

Example

Derived rules are always created with an implicit ruleorder statement, making Snakemake prefer the parent rule if either parent or child rule could be used to generate the requested output file(s).

Derived rules initially contain the same attributes as the parent rule. Each attribute assigned to the child rule overrides the matching attribute in the parent. Where attributes may contain named and unnamed values, specifying a named value overrides only the value of that name while specifying an unnamed value overrides all unnamed values in the parent attribute.
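The override semantics for attributes holding named and unnamed values can be illustrated with Snakemake-style (args, kwargs) tuples. This is a sketch of the rule as stated above, not the InheritanceExpander code; the function merge_argstuple and the file names are hypothetical.

```python
def merge_argstuple(parent, child):
    """Combine a parent and child (args, kwargs) tuple as described above."""
    parent_args, parent_kwargs = parent
    child_args, child_kwargs = child
    # Any unnamed value in the child replaces all unnamed parent values.
    args = list(child_args) if child_args else list(parent_args)
    # A named value in the child replaces only the value of that name.
    kwargs = {**parent_kwargs, **child_kwargs}
    return args, kwargs


parent_input = (["{sample}.R1.fq"], {"ref": "ref.fa", "index": "ref.idx"})
named_only = merge_argstuple(parent_input, ([], {"ref": "other.fa"}))
unnamed = merge_argstuple(parent_input, (["{sample}.fq"], {}))
```

In named_only the unnamed input and the index entry are inherited while ref is replaced; in unnamed the positional input is replaced and both named values are inherited.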

KEYWORD = 'ymp: extends'

Comment keyword enabling inheritance

expand(rule, ruleinfo)[source]

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

Parameters
  • rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item

  • item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.

  • expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).

  • rec – Recursion level

get_code_line(rule)[source]

Returns the source line defining rule

Return type

str

get_super(rule, ruleinfo)[source]

Find rule parent

Parameters
  • rule (Rule) – Rule object being built

  • ruleinfo (RuleInfo) – RuleInfo object describing rule being built

Returns

name of parent rule and RuleInfo describing parent rule or (None, None).

Return type

2-Tuple

class ymp.snakemake.NamedList(fromtuple=None, **kwargs)[source]

Bases: snakemake.io.Namedlist

Extended version of Snakemake’s Namedlist

  • Fixes array assignment operator: Writing a field via [] operator updates the value accessed via . operator.

  • Adds fromtuple to constructor: Builds from Snakemake’s typical (args, kwargs) tuples as present in ruleinfo structures.

  • Adds update_tuple method: Updates values in (args,kwargs) tuples as present in ruleinfo structures.

Create the object.

Arguments
  • toclone – another Namedlist that shall be cloned

  • fromdict – a dict that shall be converted to a Namedlist (keys become names)

get_names(*args, **kwargs)[source]

Export get_names as public func

update_tuple(totuple)[source]

Update values in (args, kwargs) tuple. The tuple must be the same as used in the constructor and must not have been modified.

class ymp.snakemake.RecursiveExpander[source]

Bases: ymp.snakemake.BaseExpander

Recursively expands {xyz} wildcards in Snakemake rules.

expand(rule, ruleinfo)[source]

Recursively expand wildcards within RuleInfo object

expands_field(field)[source]

Returns true for all fields but shell:, message: and wildcard_constraints.

We don’t want to mess with the regular expressions in the fields in wildcard_constraints:, and there is little use in expanding message: or shell: as these already have all wildcards applied just before job execution (by format_wildcards()).

class ymp.snakemake.SnakemakeExpander[source]

Bases: ymp.snakemake.BaseExpander

Expand wildcards in strings returned from functions.

Snakemake does not do this by default, leaving wildcard expansion to the functions provided themselves. Since we never want {input} to be in a string returned as a file, we expand those always.

expands_field(field)[source]

Checks if this expander should expand a Rule field type

Parameters

field – the field to check

Returns

True if field should be expanded.

format(item, *args, **kwargs)[source]

Format item using *args and **kwargs

class ymp.snakemake.WorkflowObject(*args, **kwargs)[source]

Bases: object

Base for extension classes defined from snakefiles

This currently encompasses ymp.env.Env and ymp.stage.Stage.

This mixin sets the properties filename and lineno according to the definition source in the rules file. It also maintains a registry within the Snakemake workflow object and provides an accessor method to this registry.

property defined_in
filename

Name of file in which object was defined

Type

str

classmethod get_registry(clean=False)[source]

Return all objects of this class registered with current workflow

lineno

Line number of object definition

Type

int

classmethod new_registry()[source]
register()[source]

Add self to registry

ymp.snakemake.check_snakemake()[source]
Return type

bool

ymp.snakemake.get_workflow()[source]

Get active workflow, loading one if necessary

ymp.snakemake.load_workflow(snakefile)[source]

Load new workflow

ymp.snakemake.make_rule(name=None, lineno=None, snakefile=None, **kwargs)[source]
ymp.snakemake.networkx()[source]
ymp.snakemake.print_ruleinfo(rule, ruleinfo, func=<bound method Logger.debug of <Logger ymp.snakemake (WARNING)>>)[source]

Logs contents of Rule and RuleInfo objects.

Parameters
  • rule (Rule) – Rule object to be printed

  • ruleinfo (RuleInfo) – Matching RuleInfo object to be printed

  • func – Function used for printing (default is log.debug)

ymp.snakemake.ruleinfo_fields = {'benchmark': {'apply_wildcards': True, 'format': 'string'}, 'conda_env': {'apply_wildcards': True, 'format': 'string'}, 'container_img': {'format': 'string'}, 'docstring': {'format': 'string'}, 'func': {'format': 'callable'}, 'input': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards',)}, 'log': {'apply_wildcards': True, 'format': 'argstuple'}, 'message': {'format': 'string', 'format_wildcards': True}, 'norun': {'format': 'bool'}, 'output': {'apply_wildcards': True, 'format': 'argstuple'}, 'params': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'resources', 'output', 'threads')}, 'priority': {'format': 'numeric'}, 'resources': {'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'script': {'format': 'string'}, 'shadow_depth': {'format': 'string_or_true'}, 'shellcmd': {'format': 'string', 'format_wildcards': True}, 'threads': {'format': 'int', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'version': {'format': 'object'}, 'wildcard_constraints': {'format': 'argstuple'}, 'wrapper': {'format': 'string'}}

Describes the attributes of snakemake.workflow.RuleInfo

ymp.snakemakelexer module

ymp.snakemakelexer

class ymp.snakemakelexer.SnakemakeLexer(*args, **kwds)[source]

Bases: pygments.lexers.python.PythonLexer

name = 'Snakemake'
tokens = {'globalkeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'root': [('(rule|checkpoint)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'rulename'), 'rulekeyword', 'globalkeyword', ('\\n', Token.Text), ('^(\\s*)([rRuUbB]{,2})("""(?:.|\\n)*?""")', <function bygroups.<locals>.callback>), ("^(\\s*)([rRuUbB]{,2})('''(?:.|\\n)*?''')", <function bygroups.<locals>.callback>), ('\\A#!.+$', Token.Comment.Hashbang), ('#.*$', Token.Comment.Single), ('\\\\\\n', Token.Text), ('\\\\', Token.Text), 'keywords', ('(def)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'funcname'), ('(class)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'classname'), ('(from)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'fromimport'), ('(import)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'import'), 'expr'], 'rulekeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'rulename': [('[a-zA-Z_]\\w*', Token.Name.Class, '#pop')]}

ymp.sphinxext module

This module contains a Sphinx extension for documenting YMP stages and Snakemake rules.

The SnakemakeDomain (name sm) provides the following directives:

.. sm:rule:: name

Describes a Snakemake rule

.. sm:stage:: name

Describes a YMP Stage

Both directives accept an optional source parameter. If given, a link to the source code of the stage or rule definition will be added. The format of the string passed is filename:line. Referenced Snakefiles will be highlighted with pygments and added to the documentation when building HTML.

The extension also provides an autodoc-like directive:

.. autosnake:: filename

Generates documentation from Snakefile filename.

class ymp.sphinxext.AutoSnakefileDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Bases: docutils.parsers.rst.Directive

Implements RSt directive .. autosnake:: filename

The directive extracts docstrings from rules in snakefile and auto-generates documentation.

has_content = False

This directive does not accept content

Type

bool

load_workflow(file_path)[source]

Load the Snakefile

Return type

ExpandableWorkflow

parse_doc(doc, source, idt=0)[source]

Convert doc string to StringList

Parameters
  • doc (str) – Documentation text

  • source (str) – Source filename

  • idt (int) – Result indentation in characters (default 0)

Return type

StringList

Returns

StringList of re-indented documentation wrapped in newlines

parse_rule(rule, idt=0)[source]

Convert Rule to StringList

Parameters
  • rule (Rule) – Rule object

  • idt (int) – Result indentation in characters (default 0)

Returns

StringList containing formatted Rule documentation

Return type

StringList

parse_stage(stage, idt=0)[source]
Return type

StringList

required_arguments = 1

This directive needs one argument (the filename)

Type

int

run()[source]

Entry point

tpl_rule = '.. sm:rule:: {name}'

Template for generated Rule RSt

Type

str

tpl_source = ' :source: {filename}:{lineno}'

Template option source

Type

str

tpl_stage = '.. sm:stage:: {name}'

Template for generated Stage RSt

Type

str

ymp.sphinxext.BASEPATH = '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src'

Path in which YMP package is located

Type

str

class ymp.sphinxext.DomainTocTreeCollector[source]

Bases: sphinx.environment.collectors.EnvironmentCollector

Add Sphinx Domain entries to the TOC

clear_doc(app, env, docname)[source]

Clear data from environment

If we have cached data in environment for document docname, we should clear it here.

Return type

None

get_ref(node)[source]
Return type

Optional[Node]

locate_in_toc(app, node)[source]
Return type

Optional[Node]

make_heading(node)[source]
Return type

List[Node]

merge_other(app, env, docnames, other)[source]

Merge with results from parallel processes

Called if Sphinx is processing documents in parallel. We should merge this from other into env for all docnames.

Return type

None

process_doc(app, doctree)[source]

Process doctree

This is called on the doctree-read event, that is, after the doctree has been loaded. Handlers run in order of registration, so we are called after built-in extensions such as sphinx.environment.collectors.toctree, which builds the TOC.

Return type

None

select_doc_nodes(doctree)[source]

Select the nodes for which entries in the TOC are desired

This is a separate method so that it can be overridden by subclasses wanting to add other types of nodes to the TOC.

Return type

List[Node]

select_toc_location(app, node)[source]

Select location in TOC where node should be referenced

Return type

Node

toc_insert(docname, tocnode, node, heading)[source]
Return type

None

class ymp.sphinxext.SnakemakeDomain(env)[source]

Bases: sphinx.domains.Domain

Snakemake language domain

clear_doc(docname)[source]

Delete objects derived from file docname

data_version = 0
directives = {'rule': <class 'ymp.sphinxext.SnakemakeRule'>, 'stage': <class 'ymp.sphinxext.YmpStage'>}
get_objects()[source]

Return an iterable of “object descriptions”.

Object descriptions are tuples with six items:

name

Fully qualified name.

dispname

Name to display when searching/linking.

type

Object type, a key in self.object_types.

docname

The document where it is to be found.

anchor

The anchor name for the object.

priority

How “important” the object is (determines placement in search results). One of:

1

Default priority (placed before full-text matches).

0

Object is important (placed before default-priority objects).

2

Object is unimportant (placed after full-text matches).

-1

Object should not show up in search at all.

initial_data = {'objects': {}}
label = 'Snakemake'
name = 'sm'
object_types = {'rule': <sphinx.domains.ObjType object>, 'stage': <sphinx.domains.ObjType object>}
resolve_xref(env, fromdocname, builder, typ, target, node, contnode)[source]

Resolve the pending_xref node with the given typ and target.

This method should return a new node, to replace the xref node, containing the contnode which is the markup content of the cross-reference.

If no resolution can be found, None can be returned; the xref node will then be given to the missing-reference event, and if that yields no resolution, replaced by contnode.

The method can also raise sphinx.environment.NoUri to suppress the missing-reference event being emitted.

roles = {'rule': <sphinx.roles.XRefRole object>, 'stage': <sphinx.roles.XRefRole object>}
class ymp.sphinxext.SnakemakeRule(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Bases: ymp.sphinxext.YmpObjectDescription

Directive sm:rule:: describing a Snakemake rule

typename = 'rule'
class ymp.sphinxext.YmpObjectDescription(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Bases: sphinx.directives.ObjectDescription

Base class for RSt directives in SnakemakeDomain

Since this inherits from Sphinx’ ObjectDescription, content generated by the directive will always be inside an addnodes.desc.

Parameters

source – Specify source position as file:line to create link

add_target_and_index(name, sig, signode)[source]

Add cross-reference IDs and entries to self.indexnode

Return type

None

get_index_text(typename, name)[source]

Formats object for entry into index

Return type

str

handle_signature(sig, signode)[source]

Parse rule signature sig into RST nodes and append them to signode.

The return value identifies the object and is passed to add_target_and_index() unchanged.

Parameters
  • sig (str) – Signature string (i.e. string passed after directive)

  • signode (desc) – Node created for object signature

Return type

str

Returns

Normalized signature (white space removed)

option_spec = {'source': <function unchanged>}
typename = '[object name]'
class ymp.sphinxext.YmpStage(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

Bases: ymp.sphinxext.YmpObjectDescription

Directive sm:stage:: describing a YMP stage

typename = 'stage'
ymp.sphinxext.collect_pages(app)[source]

Add Snakefiles to documentation (in HTML mode)

ymp.sphinxext.relpath(path)[source]

Make absolute path relative to BASEPATH

Parameters

path (str) – absolute path

Return type

str

Returns

path relative to BASEPATH
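
As a rough illustration of what making a path relative to a base directory looks like, the sketch below uses the standard library. The helper name relative_to_base and the directory /opt/project/src are hypothetical; this is not YMP's implementation, and the real BASEPATH depends on where the package is installed.

```python
import posixpath

# Hypothetical base directory; the real BASEPATH depends on the installation.
BASE = "/opt/project/src"


def relative_to_base(path, base=BASE):
    """Return an absolute POSIX path expressed relative to a base directory."""
    return posixpath.relpath(path, base)


print(relative_to_base("/opt/project/src/ymp/rules/Snakefile"))
# prints: ymp/rules/Snakefile
```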

ymp.sphinxext.setup(app)[source]

Register the extension with Sphinx

ymp.string module

exception ymp.string.FormattingError(message, fieldname)[source]

Bases: AttributeError

class ymp.string.GetNameFormatter[source]

Bases: string.Formatter

get_names(pattern)[source]
class ymp.string.OverrideJoinFormatter[source]

Bases: string.Formatter

Formatter with overridable join method

The default formatter joins all arguments with "".join(args). This class overrides _vformat() with identical code, changing only that line to one that can be overridden by a derived class.

join(args)[source]

Joins the expanded pieces of the template string to form the output.

This function is equivalent to ''.join(args). By overriding it, alternative methods can be implemented, e.g. to create a list of strings, each corresponding to one point in the cross product of the expanded variables.

Return type

Union[List[str], str]

class ymp.string.PartialFormatter[source]

Bases: string.Formatter

Formats what it can and leaves the remainder untouched

get_field(field_name, args, kwargs)[source]
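
The idea of formatting what can be formatted and leaving the rest untouched can be sketched with the standard library alone. The class below is a hypothetical stand-in that overrides get_field(), as the method list above suggests; it is not YMP's code, and it assumes unresolved fields carry no format spec or conversion.

```python
import string


class LeaveUnknownFormatter(string.Formatter):
    """Hypothetical stand-in, not YMP's PartialFormatter."""

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError):
            # Unknown field: hand back the placeholder text itself.
            # Assumption: the field has no format spec or conversion.
            return "{" + field_name + "}", field_name


partial = LeaveUnknownFormatter().format("{sample}.{ext}", sample="A")
print(partial)  # prints: A.{ext}
```

A second pass with the remaining values can then complete the string, for example partial.format(ext="fq").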
class ymp.string.ProductFormatter[source]

Bases: ymp.string.OverrideJoinFormatter

String Formatter that creates a list of strings, each expanded using one point in the Cartesian product of all replacement values.

If none of the arguments evaluate to lists, the result is a string, otherwise it is a list.

>>> ProductFormatter().format("{A} and {B}", A=[1,2], B=[3,4])
"1 and 3"
"1 and 4"
"2 and 3"
"2 and 4"
format_field(value, format_spec)[source]
join(args)[source]

Joins the expanded pieces of the template string to form the output.

This function is equivalent to ''.join(args). By overriding it, alternative methods can be implemented, e.g. to create a list of strings, each corresponding to one point in the cross product of the expanded variables.

Return type

Union[List[str], str]
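
To make the Cartesian-product behaviour concrete, here is a minimal sketch built on itertools.product. The function product_format is hypothetical and does not reuse YMP's formatter classes; it assumes plain field names (no attribute or index access) and treats only lists and tuples as multi-valued.

```python
import itertools
import string


def product_format(template, **values):
    """Hypothetical helper, not YMP's ProductFormatter."""
    # Collect field names in order of first appearance.
    names = []
    for _, field_name, _, _ in string.Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    # Wrap scalars so that every field offers a list of choices.
    choices = [
        values[n] if isinstance(values[n], (list, tuple)) else [values[n]]
        for n in names
    ]
    results = [
        template.format(**dict(zip(names, combo)))
        for combo in itertools.product(*choices)
    ]
    # Return a list only if at least one value was a list or tuple.
    any_list = any(isinstance(values[n], (list, tuple)) for n in names)
    return results if any_list else results[0]


print(product_format("{A} and {B}", A=[1, 2], B=[3, 4]))
# prints: ['1 and 3', '1 and 4', '2 and 3', '2 and 4']
```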

class ymp.string.QuotedElementFormatter(*args, **kwargs)[source]

Bases: snakemake.utils.SequenceFormatter

class ymp.string.RegexFormatter(regex)[source]

Bases: string.Formatter

String Formatter accepting a regular expression defining the format of the expanded tags.

get_names(format_string)[source]

Get set of field names in format_string

Return type

Set[str]

parse(format_string)[source]

Parse format_string into tuples. Each tuple contains:

  • literal_text – text to copy

  • field_name – followed by field name

  • format_spec

  • conversion
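
For orientation, the base class string.Formatter yields the same four-element tuples from its own parse() method. The sketch below shows the standard library behaviour only; RegexFormatter replaces the parsing rules with the supplied regular expression, which is not reproduced here. The template string is a made-up example.

```python
import string

# Standard library parser, shown only to illustrate the tuple layout.
template = "reads/{sample}.{pair!s:>4}.fq"
for literal_text, field_name, format_spec, conversion in string.Formatter().parse(template):
    print(repr(literal_text), field_name, repr(format_spec), conversion)

parsed = list(string.Formatter().parse(template))
```

Trailing literal text is reported with field_name, format_spec and conversion all set to None.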

ymp.string.make_formatter(product=None, regex=None, partial=None, quoted=None)[source]

ymp.util module

ymp.util.R(code='', **kwargs)[source]

Execute R code

This function executes the R code given as a string. Additional arguments are injected into the R environment. The value of the last R statement is returned.

The function requires rpy2 to be installed.

Parameters
  • code (str) – R code to be executed

  • **kwargs (dict) – variables to inject into R globalenv

Returns

value of last R statement

>>> R("1*1", input=input)
ymp.util.Rmd(rmd, out, **kwargs)[source]
ymp.util.activate_R()[source]
ymp.util.fasta_names(fasta_file)[source]
ymp.util.file_not_empty(fn)[source]

Checks if a file is not empty, accounting for the gzip minimum size of 20 bytes
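
A gzip stream that holds no data still occupies a few bytes on disk, which is why a plain size check is not enough. The sketch below is one possible reading of the description above, not YMP's implementation; the name looks_non_empty, the .gz suffix test and the strict "greater than 20" comparison are assumptions.

```python
import gzip
import os
import tempfile

GZIP_EMPTY_SIZE = 20  # assumption: bytes used by a gzip stream without data


def looks_non_empty(path):
    """Hypothetical check, not YMP's file_not_empty."""
    threshold = GZIP_EMPTY_SIZE if path.endswith(".gz") else 0
    return os.path.getsize(path) > threshold


with tempfile.TemporaryDirectory() as tmpdir:
    empty_gz = os.path.join(tmpdir, "empty.fq.gz")
    with open(empty_gz, "wb") as handle:
        handle.write(gzip.compress(b""))  # valid gzip stream, no payload
    full_gz = os.path.join(tmpdir, "reads.fq.gz")
    with open(full_gz, "wb") as handle:
        handle.write(gzip.compress(b"@r1\nACGT\n+\nIIII\n"))
    empty_result = looks_non_empty(empty_gz)
    full_result = looks_non_empty(full_gz)

print(empty_result, full_result)
```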

ymp.util.filter_out_empty(*args)[source]

Removes empty sets of files from input file lists.

Takes a variable number of file lists of equal length and removes indices where any of the files is empty. Strings are converted to lists of length 1.

Returns a generator tuple.

Example: r1, r2 = filter_out_empty(input.r1, input.r2)
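
The index-wise filtering described above can be sketched as follows. The function drop_empty_columns and its is_empty predicate are hypothetical; the predicate is injected so the example runs without touching the filesystem, and the sketch returns lists where the original returns generators.

```python
def drop_empty_columns(*file_lists, is_empty):
    """Hypothetical helper, not YMP's filter_out_empty."""
    # Strings are treated as lists of length 1.
    lists = [[fl] if isinstance(fl, str) else list(fl) for fl in file_lists]
    # Keep an index only if none of the files at that index is empty.
    keep = [
        idx
        for idx, group in enumerate(zip(*lists))
        if not any(is_empty(name) for name in group)
    ]
    return tuple([lst[idx] for idx in keep] for lst in lists)


# Made-up file sizes standing in for real files.
sizes = {"a_R1.fq": 10, "a_R2.fq": 12, "b_R1.fq": 9, "b_R2.fq": 0}
r1, r2 = drop_empty_columns(
    ["a_R1.fq", "b_R1.fq"],
    ["a_R2.fq", "b_R2.fq"],
    is_empty=lambda name: sizes[name] == 0,
)
print(r1, r2)  # prints: ['a_R1.fq'] ['a_R2.fq']
```

Because b_R2.fq is empty, its mate b_R1.fq is dropped as well, so the two lists stay aligned.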

ymp.util.glob_wildcards(pattern, files=None)[source]

Glob the values of the wildcards by matching the given pattern to the filesystem. Returns a named tuple with a list of values for each wildcard.

ymp.util.is_fq(path)[source]
ymp.util.make_local_path(icfg, url)[source]
ymp.util.read_propfiles(files)[source]

ymp.yaml module

class ymp.yaml.AttrItemAccessMixin[source]

Bases: object

Mixin class mapping dot to bracket access

Added to classes implementing __getitem__, __setitem__ and __delitem__, this mixin will allow accessing items using dot notation. I.e. “object.xyz” is translated to “object["xyz"]”.
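
The mapping from dot access to bracket access can be sketched with the attribute hooks Python provides. The classes AttrAccess and Settings are hypothetical and only illustrate the general technique; YMP's mixin may differ in detail, for example in which exceptions it translates.

```python
class AttrAccess:
    """Hypothetical mixin, not YMP's AttrItemAccessMixin."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


class Settings(AttrAccess, dict):
    """Hypothetical container used for the demonstration."""


cfg = Settings(threads=4)
cfg.memory = "8G"   # stored as cfg["memory"]
print(cfg.threads, cfg["memory"])  # prints: 4 8G
```

Translating KeyError into AttributeError keeps hasattr() and getattr() with a default working as usual.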

exception ymp.yaml.LayeredConfAccessError[source]

Bases: ymp.yaml.LayeredConfError, KeyError, IndexError

Can’t access

exception ymp.yaml.LayeredConfError[source]

Bases: Exception

Error in LayeredConf

class ymp.yaml.LayeredConfProxy(maps, parent=None, key=None)[source]

Bases: ymp.yaml.MultiMapProxy

Layered configuration

save(outstream=None, layer=0)[source]
exception ymp.yaml.LayeredConfWriteError[source]

Bases: ymp.yaml.LayeredConfError

Can’t write

exception ymp.yaml.MixedTypeError[source]

Bases: Exception

Mixed types in proxy collection

class ymp.yaml.MultiMapProxy(maps, parent=None, key=None)[source]

Bases: collections.abc.Mapping, ymp.yaml.MultiProxy, ymp.yaml.AttrItemAccessMixin

Mapping Proxy for layered containers

get(k[, d]) → D[k] if k in D, else d. d defaults to None.[source]
items() → a set-like object providing a view on D’s items[source]
keys() → a set-like object providing a view on D’s keys[source]
values() → an object providing a view on D’s values[source]
class ymp.yaml.MultiMapProxyItemsView(mapping)[source]

Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.ItemsView

ItemsView for MultiMapProxy

class ymp.yaml.MultiMapProxyKeysView(mapping)[source]

Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.KeysView

KeysView for MultiMapProxy

class ymp.yaml.MultiMapProxyMappingView(mapping)[source]

Bases: collections.abc.MappingView

MappingView for MultiMapProxy

class ymp.yaml.MultiMapProxyValuesView(mapping)[source]

Bases: ymp.yaml.MultiMapProxyMappingView, collections.abc.ValuesView

ValuesView for MultiMapProxy

class ymp.yaml.MultiProxy(maps, parent=None, key=None)[source]

Bases: object

Base class for layered container structure

add_layer(name, container)[source]
get_files()[source]
get_linenos()[source]
make_map_proxy(key, items)[source]
make_seq_proxy(key, items)[source]
remove_layer(name)[source]
to_yaml(show_source=False)[source]
class ymp.yaml.MultiSeqProxy(maps, parent=None, key=None)[source]

Bases: collections.abc.Sequence, ymp.yaml.MultiProxy, ymp.yaml.AttrItemAccessMixin

Sequence Proxy for layered containers

extend(item)[source]
ymp.yaml.load(files)[source]

Load configuration files

Creates a LayeredConfProxy configuration object from a set of YAML files.
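
The general idea of layered configuration, where several mappings are consulted in order, can be illustrated with collections.ChainMap from the standard library. This is an analogy only: the keys and values are made up, the lookup order of YMP's layers is not stated here, and ChainMap does not merge nested mappings or sequences the way a proxy for layered containers would need to.

```python
from collections import ChainMap

# Hypothetical layers: project settings placed in front of defaults.
defaults = {"threads": 1, "tmpdir": "/tmp"}
project = {"threads": 8}

layered = ChainMap(project, defaults)
print(layered["threads"])  # prints: 8
print(layered["tmpdir"])   # prints: /tmp

# Writes go to the first mapping only; the defaults stay untouched.
layered["tmpdir"] = "/scratch"
print(project["tmpdir"], defaults["tmpdir"])  # prints: /scratch /tmp
```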