ymp package¶

ymp.get_config()[source]¶

Access the current YMP configuration object.

This object might change once during normal execution: it is deleted before passing control to Snakemake. During unit test execution the object is deleted between all tests.

Return type: ConfigMgr

ymp.print_rule = 0¶

Set to 1 to show the YMP expansion process as it is applied to the next Snakemake rule definition.

>>> ymp.print_rule = 1
>>> rule broken:
>>>   ...

>>> ymp make broken -vvv

ymp.snakemake_versions = ['5.20.1']¶: List of versions this version of YMP has been verified to work with

Subpackages¶

Submodules¶

ymp.blast module¶

Parsers for BLAST output formats 6 (tabular) and 7 (tabular with comment lines between queries).

class ymp.blast.BlastParser[source]¶

Bases: object

Base class for BLAST parsers

FIELD_MAP = {'% identity': 'pident', 'alignment length': 'length', 'bit score': 'bitscore', 'evalue': 'evalue', 'gap opens': 'gapopen', 'mismatches': 'mismatch', 'q. end': 'qend', 'q. start': 'qstart', 'query acc.': 'qacc', 'query frame': 'qframe', 'query length': 'qlen', 's. end': 'send', 's. start': 'sstart', 'sbjct frame': 'sframe', 'score': 'score', 'subject acc.': 'sacc', 'subject strand': 'sstrand', 'subject tax ids': 'staxids', 'subject title': 'stitle'}¶

FIELD_TYPE = {'bitscore': <class 'float'>, 'evalue': <class 'float'>, 'gapopen': <class 'int'>, 'length': <class 'int'>, 'mismatch': <class 'int'>, 'pident': <class 'float'>, 'qend': <class 'int'>, 'qframe': <class 'int'>, 'qlen': <class 'int'>, 'qstart': <class 'int'>, 'score': <class 'float'>, 'send': <class 'int'>, 'sframe': <class 'int'>, 'sstart': <class 'int'>, 'staxids': <function BlastParser.tupleofint>, 'stitle': <class 'str'>}¶

tupleofint()[source]¶

class ymp.blast.Fmt6Parser(fileobj)[source]¶

Bases: ymp.blast.BlastParser

Parser for BLAST format 6 (tabular)

Hit¶: alias of BlastHit

field_types = [None, None, <class 'float'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'float'>, <class 'float'>]¶

fields = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']¶: Default field types

get_fields()[source]¶

class ymp.blast.Fmt7Parser(fileobj)[source]¶

Bases: ymp.blast.BlastParser

Parses BLAST results in format ‘7’ (tabular with comment lines)

DATABASE = '# Database: '¶

FIELDS = '# Fields: '¶

HITSFOUND = ' hits found'¶

QUERY = '# Query: '¶

get_fields()[source]¶

Returns list of available field names

Format 7 specifies which columns it contains in comment lines, allowing this parser to be agnostic of the selection of columns made when running BLAST.

Return type: List[str]
Returns: List of field names (e.g. ['sacc', 'qacc', 'evalue'])
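How a format 7 field header could be translated into short field names can be sketched as follows. This is a minimal illustration, not the YMP implementation: the function fields_from_comment is hypothetical, the mapping is a small subset of FIELD_MAP, and passing unknown names through unchanged is an assumption.

```python
# Subset of BlastParser.FIELD_MAP, copied from the listing above.
FIELD_MAP = {
    "query acc.": "qacc",
    "subject acc.": "sacc",
    "% identity": "pident",
    "evalue": "evalue",
    "bit score": "bitscore",
}
FIELDS = "# Fields: "  # same prefix as Fmt7Parser.FIELDS


def fields_from_comment(line):
    """Translate a '# Fields: ...' comment line into short field names.

    Hypothetical helper written for this example only.
    """
    if not line.startswith(FIELDS):
        raise ValueError("not a fields comment line")
    long_names = line[len(FIELDS):].strip().split(", ")
    # Assumption: names missing from the map are passed through unchanged.
    return [FIELD_MAP.get(name, name) for name in long_names]


header = "# Fields: query acc., subject acc., % identity, evalue, bit score"
print(fields_from_comment(header))
```

Because the column selection is read from the file itself, the same parsing code works regardless of which columns were requested when BLAST was run.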

isfirsthit()[source]¶

Returns True if the current hit is the first hit for the current query

Return type: bool

ymp.blast.reader(fileobj, t=7)[source]¶

Creates a reader for files in BLAST format

>>> with open(blast_file) as infile:
>>>    reader = blast.reader(infile)
>>>    for hit in reader:
>>>       print(hit)

Parameters

fileobj – iterable yielding lines in blast format
t (int) – number of blast format type

Return type

BlastParser

ymp.blast2gff module¶

ymp.cluster module¶

Module handling talking to cluster management systems

>>> python -m ymp.cluster slurm status <jobid>

class ymp.cluster.ClusterMS[source]¶: Bases: object

class ymp.cluster.Lsf[source]¶

Bases: ymp.cluster.ClusterMS

Talking to LSF

states = {'DONE': 'success', 'EXIT': 'failed', 'PEND': 'running', 'POST_DONE': 'success', 'POST_ERR': 'failed', 'PSUSP': 'running', 'RUN': 'running', 'SSUSP': 'running', 'UNKWN': 'running', 'USUSP': 'running', 'WAIT': 'running'}¶

static status(jobid)[source]¶

static submit(args)[source]¶

class ymp.cluster.Slurm[source]¶

Bases: ymp.cluster.ClusterMS

Talking to Slurm

states = {'BOOT_FAIL': 'failed', 'CANCELLED': 'failed', 'COMPLETED': 'success', 'COMPLETING': 'running', 'CONFIGURING': 'running', 'DEADLINE': 'failed', 'FAILED': 'failed', 'NODE_FAIL': 'failed', 'PENDING': 'running', 'PREEMPTED': 'failed', 'RESIZING': 'running', 'REVOKED': 'running', 'RUNNING': 'running', 'SPECIAL_EXIT': 'running', 'SUSPENDED': 'running', 'TIMEOUT': 'failed'}¶

static status(jobid)[source]¶

Print the status of the job jobid to stdout (as needed by Snakemake)

Anecdotal benchmarking shows 200 ms per invocation, half used by Python startup and half by calling sacct. Using scontrol show job instead of sacct -pbs is faster by 80 ms, but finished jobs are purged after an unknown time window.
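The states table reduces each scheduler state to one of three words. A minimal sketch of that lookup is shown below; the function classify is hypothetical, the table is a subset of Slurm.states, and both the handling of trailing detail and the "running" fallback for unknown states are assumptions of this example.

```python
# Subset of the Slurm.states table listed above.
STATES = {
    "COMPLETED": "success",
    "RUNNING": "running",
    "PENDING": "running",
    "FAILED": "failed",
    "TIMEOUT": "failed",
    "CANCELLED": "failed",
}


def classify(raw_state):
    """Map a raw scheduler state string to success, running or failed.

    Hypothetical helper, not part of ymp.cluster.
    """
    # Assumption: the state may carry trailing detail, so only the
    # first word is used for the lookup.
    words = raw_state.split()
    key = words[0] if words else ""
    # Assumption: unknown or empty states are reported as still running.
    return STATES.get(key, "running")


print(classify("COMPLETED"))
print(classify("CANCELLED by 1234"))
```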

ymp.cluster.error(*args, **kwargs)[source]¶

ymp.common module¶

Collection of shared utility classes and methods

class ymp.common.AttrDict[source]¶

Bases: dict

AttrDict adds accessing stored keys as attributes to dict
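The idea of exposing dictionary keys as attributes can be sketched as below. The class AttrDictSketch is a stand-in written for this example; the real AttrDict may differ in details such as error handling.

```python
class AttrDictSketch(dict):
    """dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as dictionary items.
        self[name] = value


cfg = AttrDictSketch(threads=4)
cfg.memory = "8g"
print(cfg.threads, cfg["memory"])
```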

class ymp.common.Cache(root)[source]¶

Bases: object

close()[source]¶

commit()[source]¶

get_cache(name, clean=False, *args, **kwargs)[source]¶

load(cache, key)[source]¶

load_all(cache)[source]¶

store(cache, key, obj)[source]¶

class ymp.common.CacheDict(cache, name, *args, loadfunc=None, itemloadfunc=None, itemdata=None, **kwargs)[source]¶

Bases: ymp.common.AttrDict

get(k[, d]) → D[k] if k in D, else d. d defaults to None.[source]¶

items() → a set-like object providing a view on D’s items[source]¶

keys() → a set-like object providing a view on D’s keys[source]¶

values() → an object providing a view on D’s values[source]¶

class ymp.common.MkdirDict[source]¶

Bases: ymp.common.AttrDict

Creates directories as they are requested

ymp.common.ensure_list(obj)[source]¶: Wrap obj in a list as needed

ymp.common.flatten(item)[source]¶: Flatten lists without turning strings into letters

ymp.common.is_container(obj)[source]¶: Check if object is container, considering strings not containers
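The three helpers above can be approximated as follows. This is a sketch reusing their names for readability, not the YMP source; in particular, which types count as containers and whether ensure_list converts tuples are assumptions.

```python
from collections.abc import Iterable


def is_container(obj):
    """True for iterables other than strings and bytes (assumed definition)."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def ensure_list(obj):
    """Return obj unchanged if it is a list, otherwise wrap it in one."""
    return obj if isinstance(obj, list) else [obj]


def flatten(item):
    """Flatten nested containers into one list, keeping strings whole."""
    if not is_container(item):
        return [item]
    result = []
    for sub in item:
        result.extend(flatten(sub))
    return result


print(flatten(["ab", ["cd", ("ef",)]]))
print(ensure_list("reads.fq"))
```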

ymp.common.parse_number(s='')[source]¶

Basic 1k 1m 1g 1t parsing.

assumes base 2
returns “byte” value
accepts “1kib”, “1kb” or “1k”
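The rules listed above can be illustrated with a small sketch. The function parse_number_sketch is hypothetical; it follows the stated rules (base 2, byte result, optional "b" or "ib" suffix), but its handling of empty or malformed input is an assumption and may not match parse_number.

```python
import re

# Exponent of 1024 for each unit prefix.
UNITS = {"": 0, "k": 1, "m": 2, "g": 3, "t": 4}
PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([kmgt]?)(?:i?b)?\s*")


def parse_number_sketch(text):
    """Convert strings such as '1k', '1kb' or '1kib' to a byte count."""
    match = PATTERN.fullmatch(text.lower())
    if match is None:
        # Assumption: this sketch rejects anything it cannot parse.
        raise ValueError(f"cannot parse {text!r}")
    value, prefix = match.groups()
    return int(float(value) * 1024 ** UNITS[prefix])


for text in ("1k", "1kb", "1KiB", "2g"):
    print(text, parse_number_sketch(text))
```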

ymp.config module¶

class ymp.config.ConfigExpander(config_mgr)[source]¶

Bases: ymp.snakemake.ColonExpander

class Formatter(expander)[source]¶

Bases: ymp.snakemake.FormatExpander.Formatter, ymp.string.PartialFormatter

get_value(field_name, args, kwargs)[source]¶

expands_field(field)[source]¶

Checks if this expander should expand a Rule field type

Parameters: field – the field to check
Returns: True if field should be expanded.

class ymp.config.ConfigMgr(root, conffiles)[source]¶

Bases: object

Manages workflow configuration

This is a singleton object of which only one instance should be around at a given time. It is available in the rules files as icfg and via ymp.get_config() elsewhere.

ConfigMgr loads and maintains the workflow configuration as given in the ymp.yml files located in the workflow root directory, the user config folder (~/.ymp) and the installation etc folder.

CONF_DEFAULT_FNAME = '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/etc/defaults.yml'¶

CONF_FNAME = 'ymp.yml'¶

CONF_USER_FNAME = '/home/docs/.ymp/ymp.yml'¶

KEY_PIPELINES = 'pipelines'¶

KEY_PROJECTS = 'projects'¶

KEY_REFERENCES = 'references'¶

RULE_MAIN_FNAME = '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/rules/Snakefile'¶

property absdir¶: Dictionary of absolute paths of named YMP directories

classmethod activate()[source]¶

property cluster¶: The YMP cluster configuration.

property conda¶

property dir¶

Dictionary of relative paths of named YMP directories

The directory paths are relative to the YMP root workdir.

property ensuredir¶

Dictionary of absolute paths of named YMP directories

Directories will be created on the fly as they are requested.

expand(item, **kwargs)[source]¶

classmethod find_config()[source]¶

Locates ymp config files and ymp root

The root ymp work dir is determined as the first (parent) directory containing a file named ConfigMgr.CONF_FNAME (default ymp.yml).

The stack of config files comprises 1. the default config ConfigMgr.CONF_DEFAULT_FNAME (etc/defaults.yml in the ymp package directory), 2. the user config ConfigMgr.CONF_USER_FNAME (~/.ymp/ymp.yml) and 3. the ymp.yml in the ymp root.

Returns: root – root working directory; conffiles – list of active configuration files

classmethod instance()[source]¶: Returns the active Ymp ConfigMgr instance

property limits¶: The YMP limits configuration.

mem(base='0', per_thread=None, unit='m')[source]¶

Clamp memory to configuration limits

Parameters

base – base memory requested
per_thread – additional memory required per allocated thread
unit – output unit (b, k, m, g, t)

property pairnames¶

property pipeline¶: Configure pipelines

property platform¶: Name of current platform (macos or linux)

property ref¶: Configure references

property shell¶

The shell used by YMP

Change by adding e.g. shell: /path/to/shell to ymp.yml.

property snakefiles¶: Snakefiles used under this config in parsing order

classmethod unload()[source]¶

class ymp.config.OverrideExpander(cfgmgr)[source]¶

Bases: ymp.snakemake.BaseExpander

Apply rule attribute overrides from ymp.yml config

Example

Set the wordsize parameter in the bmtagger_bitmask rule to 12:

ymp.yml¶

overrides:
  rules:
    bmtagger_bitmask:
      params:
        wordsize: 12

expand(rule, ruleinfo, **kwargs)[source]¶

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

Parameters

rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.
expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).
rec – Recursion level

ymp.dna module¶

ymp.dna.nuc2aa(seq)¶

ymp.dna.nuc2num(seq)¶

ymp.download module¶

class ymp.download.DownloadThread[source]¶

Bases: object

get(url, dest, md5)[source]¶

main()[source]¶

terminate()[source]¶

class ymp.download.FileDownloader(block_size=4096, timeout=300, parallel=4, loglevel=30, alturls=None, retry=3)[source]¶

Bases: object

Manages download of a set of URLs

Downloads happen concurrently using asynchronous network IO.

Parameters

block_size (int) – Byte size of chunks to download
timeout (int) – Aiohttp cumulative timeout
parallel (int) – Number of files to download in parallel
loglevel (int) – Log level for messages sent to logging (errors are sent with loglevel+10)
alturls – List of regexps modifying URLs
retry (int) – Number of times to retry download

error(msg, *args, **kwargs)[source]¶

Send error to logger

Message is sent with a log level 10 higher than the default for this object.

Return type: None

get(urls, dest, md5s=None)[source]¶

Download a list of URLs

Parameters

urls (Union[str, List[str]]) – List of URLs
dest (str) – Destination folder
md5s (Optional[List[str]]) – List of MD5 sums to check

Return type

None

log(msg, *args, modlvl=0, **kwargs)[source]¶

Send message to logger

Honors loglevel set for the FileDownloader object.

Parameters

msg (str) – The log message
modlvl (int) – Added to default logging level for object

Return type

None

static make_bar_format(desc_width=20, count_width=0, rate=False, eta=False, have_total=True)[source]¶

Construct bar_format for tqdm

Parameters

desc_width (int) – minimum space allocated for description
count_width (int) – min space for counts
rate (bool) – show rate to right of progress bar
eta (bool) – show eta to right of progress bar
have_total (bool) – whether a total exists (required to add percentage)

Return type

str

ymp.env module¶

This module manages the conda environments.

class ymp.env.CondaPathExpander(config, *args, **kwargs)[source]¶

Bases: ymp.snakemake.BaseExpander

Applies search path for conda environment specifications

File names supplied via rule: conda: "some.yml" are replaced with absolute paths if they are found in any searched directory. Each search_paths entry is appended to the directory containing the top level Snakefile and the directory checked for the filename. Thereafter, the stack of including Snakefiles is traversed backwards. If no file is found, the original name is returned.

expands_field(field)[source]¶

Checks if this expander should expand a Rule field type

Parameters: field – the field to check
Returns: True if field should be expanded.

format(conda_env, *args, **kwargs)[source]¶: Format item using *args and **kwargs

class ymp.env.Env(env_file=None, dag=None, singularity_img=None, container_img=None, cleanup=None, name=None, packages=None, base='none', channels=None, rule=None)[source]¶

Bases: ymp.snakemake.WorkflowObject, snakemake.deployment.conda.Env

Represents YMP conda environment

Snakemake expects the conda environments in a per-workflow directory configured by conda_prefix. YMP sets this value by default to ~/.ymp/conda, which has a greater chance of being on the same file system as the conda cache, allowing for hard linking of environment files.

Within the folder conda_prefix, each environment is created in a folder named by the hash of the environment definition file’s contents and the conda_prefix path. This class inherits from snakemake.deployment.conda.Env to ensure that the hash we use is identical to the one Snakemake will use during workflow execution.

The class provides additional features for updating environments, creating environments dynamically and executing commands within those environments.

Note

This is not called from within the execution. Snakemake instantiates its own Env object purely based on the filename.

Creates an inline defined conda environment

Parameters

name (Optional[str]) – Name of conda environment (and basename of file)
packages (Union[list, str, None]) – package(s) to be installed into environment. Version constraints can be specified in each package string separated from the package name by whitespace. E.g. "blast =2.6*"
channels (Union[list, str, None]) – channel(s) to be selected for the environment
base (str) – Select a set of default channels and packages to be added to the newly created environment. Sets are defined in conda.defaults in ymp.yml

create(dryrun=False, force=False)[source]¶

Ensure the conda environment has been created

Inherits from snakemake.conda.Env.create

Behavior of super class

The environment is installed in a folder in conda_prefix named according to a hash of the environment.yaml defining the environment and the value of conda-prefix (Env.hash). The latter is included as installed environments cannot be moved.

If this folder (Env.path) exists, nothing is done.
If a folder named according to the hash of just the contents of environment.yaml exists, the environment is created by unpacking the tar balls in that folder.

Handling pre-computed environment specs

In addition to freezing environments by maintaining a copy of the package binaries, we allow maintaining a copy of the package binary URLs, from which the archive folder is populated on demand.

If a file {Env.name}.txt exists in conda.spec FIXME

export(stream, typ='yml')[source]¶: Freeze environment

static get_installed_env_hashes()[source]¶

property installed¶

run(command)[source]¶

Execute command in environment

Returns exit code of command run.

set_prefix(prefix)[source]¶

update()[source]¶: Update conda environment

ymp.exceptions module¶

Exceptions raised by YMP

exception ymp.exceptions.YmpConfigError(obj, msg, key=None, exc=None)[source]¶

Bases: ymp.exceptions.YmpNoStackException

Indicates an error in the ymp.yml config files

Parameters

obj (object) – Subtree of config causing error
msg (str) – The message to display
key (object) – Key indicating part of obj causing error
exc (Optional[Exception]) – Upstream exception causing error

exception ymp.exceptions.YmpException[source]¶

Bases: Exception

Base class of all YMP Exceptions

exception ymp.exceptions.YmpNoStackException(message)[source]¶

Bases: ymp.exceptions.YmpException, click.exceptions.ClickException

Exception that does not lead to stack trace on CLI

Inheriting from ClickException makes click print only the self.msg value of the exception, rather than allowing Python to print a full stack trace.

This is useful for exceptions indicating usage or configuration errors. We use this, instead of click.UsageError and friends so that the exceptions can be caught and handled explicitly where needed.

Note that click will call the show method on this object to print the exception. The default implementation from click will just prefix the msg with Error:.

FIXME: This does not work if the exception is raised from within the Snakemake workflow, as snakemake.snakemake catches and reformats exceptions.

exception ymp.exceptions.YmpRuleError(obj, msg)[source]¶

Bases: ymp.exceptions.YmpNoStackException

Indicates an error in the rules files

This could e.g. be a Stage or Environment defined twice.

Parameters

obj (object) – The object causing the exception. Must have lineno and filename as these will be shown as part of the error message on the command line.
msg (str) – The message to display

show()[source]¶

Return type: None

exception ymp.exceptions.YmpStageError(msg)[source]¶

Bases: ymp.exceptions.YmpNoStackException

Indicates an error in the requested stage stack

show()[source]¶

Return type: None

exception ymp.exceptions.YmpSystemError(message)[source]¶

Bases: ymp.exceptions.YmpNoStackException

Indicates problem running YMP with available system software

exception ymp.exceptions.YmpUsageError(message)[source]¶: Bases: ymp.exceptions.YmpNoStackException

exception ymp.exceptions.YmpWorkflowError(message)[source]¶

Bases: ymp.exceptions.YmpNoStackException

Indicates an error during workflow execution

E.g. failures to expand dynamic variables

ymp.gff module¶

Implements simple reader and writer for GFF (general feature format) files.

Unfinished

only supports one version, GFF 3.2.3.

no escaping

class ymp.gff.Attributes(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)¶

Bases: tuple

Create new instance of Attributes(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)

property Alias¶: Alias for field number 2

property Dbxref¶: Alias for field number 8

property Derives_From¶: Alias for field number 6

property Gap¶: Alias for field number 5

property ID¶: Alias for field number 0

property Is_circular¶: Alias for field number 10

property Name¶: Alias for field number 1

property Note¶: Alias for field number 7

property Ontology_term¶: Alias for field number 9

property Parent¶: Alias for field number 3

property Target¶: Alias for field number 4

class ymp.gff.Feature(seqid, source, type, start, end, score, strand, phase, attributes)¶

Bases: tuple

Create new instance of Feature(seqid, source, type, start, end, score, strand, phase, attributes)

property attributes¶: Alias for field number 8

property end¶: Alias for field number 4

property phase¶: Alias for field number 7

property score¶: Alias for field number 5

property seqid¶: Alias for field number 0

property source¶: Alias for field number 1

property start¶: Alias for field number 3

property strand¶: Alias for field number 6

property type¶: Alias for field number 2
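Feature is a named tuple, so each property is simply a name for a positional field. The sketch below rebuilds an equivalent tuple with the standard library to show the correspondence; the sample line is invented for illustration, and splitting on tabs without type conversion is a simplification that says nothing about how ymp.gff.reader parses files.

```python
from collections import namedtuple

# Same field order as ymp.gff.Feature in the listing above.
Feature = namedtuple(
    "Feature",
    ["seqid", "source", "type", "start", "end",
     "score", "strand", "phase", "attributes"],
)

# Invented example record; GFF columns are tab-separated.
line = "contig1\tprodigal\tCDS\t10\t400\t.\t+\t0\tID=gene1"
feature = Feature(*line.split("\t"))

# Access by name and by position yields the same value.
print(feature.type, feature[2])
print(Feature._fields.index("attributes"))
```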

class ymp.gff.reader(fileobj)[source]¶: Bases: object

class ymp.gff.writer(fileobj)[source]¶

Bases: object

write(feature)[source]¶

ymp.helpers module¶

This module contains helper functions.

Not all of these are currently in use

class ymp.helpers.OrderedDictMaker[source]¶

Bases: object

odict creates OrderedDict objects in a dict-literal like syntax

>>>  my_ordered_dict = odict[
>>>    'key': 'value'
>>>  ]

Implementation: odict uses the python slice syntax which is similar to dict literals. The [] operator is implemented by overriding __getitem__. Slices passed to the operator as object[start1:stop1:step1, start2:...], are passed to the implementation as a list of objects with start, stop and step members. odict simply creates an OrderedDictionary by iterating over that list.
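The slice mechanism described above can be reproduced in a few lines. This is a sketch of the technique, not the YMP source; the class name OrderedDictMakerSketch is made up for the example. Note that Python passes a single slice (rather than a tuple of slices) when only one pair is given, which the sketch handles explicitly.

```python
from collections import OrderedDict


class OrderedDictMakerSketch:
    """Builds an OrderedDict from obj['key': value, ...] syntax."""

    def __getitem__(self, keys):
        # One pair arrives as a single slice, several as a tuple of slices.
        if isinstance(keys, slice):
            keys = (keys,)
        # slice.start holds the key, slice.stop the value.
        return OrderedDict((item.start, item.stop) for item in keys)


odict = OrderedDictMakerSketch()
ordered = odict["b": 2, "a": 1]
print(list(ordered.items()))
```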

ymp.helpers.update_dict(dst, src)[source]¶

Recursively update dictionary dst with src

Treats a list as atomic, replacing it with new list.
Dictionaries are overwritten by item
None is replaced by empty dict if necessary
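One way to read the three rules above is sketched below. The function update_dict_sketch is hypothetical and only approximates update_dict; in particular, replacing any non-dict destination value (not just None) with an empty dict before merging is an assumption.

```python
def update_dict_sketch(dst, src):
    """Recursively merge src into dst and return dst."""
    if dst is None:
        dst = {}
    for key, value in src.items():
        if isinstance(value, dict):
            current = dst.get(key)
            # None (or any non-dict) is replaced by an empty dict first.
            if not isinstance(current, dict):
                current = {}
            dst[key] = update_dict_sketch(current, value)
        else:
            # Lists and scalars are atomic: the new value replaces the old.
            dst[key] = value
    return dst


dst = {"a": {"x": 1, "y": [1, 2]}, "b": None}
src = {"a": {"y": [3]}, "b": {"z": 1}}
print(update_dict_sketch(dst, src))
```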

ymp.map2otu module¶

class ymp.map2otu.MapfileParser(minid=0)[source]¶

Bases: object

read(mapfiles)[source]¶

write(outfile)[source]¶

class ymp.map2otu.emirge_info(line)[source]¶: Bases: object

ymp.map2otu.main()[source]¶

ymp.nuc2aa module¶

ymp.nuc2aa.fasta_dna2aa(inf, outf)[source]¶

ymp.nuc2aa.nuc2aa(seq)[source]¶

ymp.nuc2aa.nuc2num(seq)[source]¶

ymp.snakemake module¶

Extends Snakemake Features

class ymp.snakemake.BaseExpander[source]¶

Bases: object

Base class for Snakemake expansion modules.

Subclasses should override the expand() method if they need to work on the entire RuleInfo object, or the format() and expands_field() methods if they intend to modify specific fields.

expand(rule, item, expand_args=None, rec=- 1, cb=False)[source]¶

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

Parameters

rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.
expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).
rec – Recursion level

expand_dict(rule, item, expand_args, rec)[source]¶

expand_func(rule, item, expand_args, rec, debug)[source]¶

expand_list(rule, item, expand_args, rec, cb)[source]¶

expand_ruleinfo(rule, item, expand_args, rec)[source]¶

expand_str(rule, item, expand_args, rec, cb)[source]¶

expand_tuple(rule, item, expand_args, rec, cb)[source]¶

expands_field(field)[source]¶

Checks if this expander should expand a Rule field type

Parameters: field – the field to check
Returns: True if field should be expanded.

format(item, *args, **kwargs)[source]¶: Format item using *args and **kwargs

format_annotated(item, expand_args)[source]¶

Wrapper for format() preserving AnnotatedString flags

Calls format() to format item into a new string and copies flags from original item.

This is used by expand().

link_workflow(workflow)[source]¶

Called when the Expander is associated with a workflow

May be called multiple times if a new workflow object is created.

exception ymp.snakemake.CircularReferenceException(deps, rule)[source]¶

Bases: ymp.exceptions.YmpRuleError

Exception raised if parameters in rule contain a circular reference

class ymp.snakemake.ColonExpander[source]¶

Bases: ymp.snakemake.FormatExpander

Expander using {:xyz:} formatted variables.

regex = re.compile('\n \\{:\n (?=(\n \\s*\n (?P<name>(?:.(?!\\s*\\:\\}))*.)\n \\s*\n ))\\1\n :\\}\n ', re.VERBOSE)¶

spec = '{{:{}:}}'¶
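The spec string is an ordinary str.format template that renders a variable name in colon syntax. The sketch below shows that, together with a deliberately simplified pattern for substituting such variables; the pattern is not the regex attribute listed above, and the function expand_colon, like its choice to leave unknown variables untouched, is an assumption of this example.

```python
import re

SPEC = "{{:{}:}}"  # same value as ColonExpander.spec
# Simplified stand-in for ColonExpander.regex.
PATTERN = re.compile(r"\{:\s*(?P<name>.+?)\s*:\}")


def expand_colon(text, values):
    """Replace {:name:} variables found in values (hypothetical helper)."""
    def replace(match):
        name = match.group("name")
        # Unknown variables are left as they are.
        return str(values[name]) if name in values else match.group(0)
    return PATTERN.sub(replace, text)


print(SPEC.format("target"))
print(expand_colon("{sample}.{: target :}.bam", {"target": "mapped"}))
```

Plain {sample} wildcards are not matched by the colon pattern, so they remain available for later expansion.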

class ymp.snakemake.DefaultExpander(**kwargs)[source]¶

Bases: ymp.snakemake.InheritanceExpander

Adds default values to rules

The implementation simply makes all rules inherit from a defaults rule.

Creates DefaultExpander

Each parameter passed is considered a RuleInfo default value. Where applicable, Snakemake’s argtuples ([],{}) must be passed.

get_super(rule, ruleinfo)[source]¶

Find rule parent

Parameters

rule (Rule) – Rule object being built
ruleinfo (RuleInfo) – RuleInfo object describing rule being built

Returns

name of parent rule and RuleInfo describing parent rule or (None, None).

Return type

2-Tuple

exception ymp.snakemake.ExpandLateException[source]¶: Bases: Exception

class ymp.snakemake.ExpandableWorkflow(*args, **kwargs)[source]¶

Bases: snakemake.workflow.Workflow

Adds hook for additional rule expansion methods to Snakemake

Constructor for ExpandableWorkflow overlay attributes

This may be called on an already initialized Workflow object.

classmethod activate()[source]¶

Installs the ExpandableWorkflow

Replaces the Workflow object in the snakemake.workflow module with an instance of this class and initializes default expanders (the snakemake syntax).

add_rule(name=None, lineno=None, snakefile=None, checkpoint=False)[source]¶

Add a rule.

Parameters

name – name of the rule
lineno – line number within the snakefile where the rule was defined
snakefile – name of file in which rule was defined

classmethod clear()[source]¶

classmethod ensure_global_workflow()[source]¶

get_rule(name=None)[source]¶

Get rule by name. If name is none, the last created rule is returned.

Parameters: name – the name of the rule

global_workflow = <ymp.snakemake.ExpandableWorkflow object>¶

classmethod load_workflow(snakefile='/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src/ymp/rules/Snakefile')[source]¶

classmethod register_expanders(*expanders)[source]¶: Register objects whose expand() method will be called on each RuleInfo object before it is passed on to Snakemake.

rule(name=None, lineno=None, snakefile=None, checkpoint=None)[source]¶: Intercepts “rule:”. Here we have the entire ruleinfo object.

class ymp.snakemake.FormatExpander[source]¶

Bases: ymp.snakemake.BaseExpander

Expander using a custom formatter object.

class Formatter(expander)[source]¶

Bases: ymp.string.ProductFormatter

parse(format_string)[source]¶

format(*args, **kwargs)[source]¶: Format item using *args and **kwargs

get_names(pattern)[source]¶

regex = re.compile('\n \\{\n (?=(\n (?P<name>[^{}]+)\n ))\\1\n \\}\n ', re.VERBOSE)¶

spec = '{{{}}}'¶

exception ymp.snakemake.InheritanceException(msg, rule, parent, include=None, lineno=None, snakefile=None)[source]¶

Bases: snakemake.exceptions.RuleException

Exception raised for errors during rule inheritance

Creates a new instance of RuleException.

Arguments

message – the exception message
include – iterable of other exceptions to be included
lineno – the line the exception originates from
snakefile – the file the exception originates from

class ymp.snakemake.InheritanceExpander[source]¶

Bases: ymp.snakemake.BaseExpander

Adds class-like inheritance to Snakemake rules

To avoid redundancy between closely related rules, e.g. rules for single ended and paired end data, YMP allows Snakemake rules to inherit from another rule.

Example

Derived rules are always created with an implicit ruleorder statement, making Snakemake prefer the parent rule if either parent or child rule could be used to generate the requested output file(s).

Derived rules initially contain the same attributes as the parent rule. Each attribute assigned to the child rule overrides the matching attribute in the parent. Where attributes may contain named and unnamed values, specifying a named value overrides only the value of that name while specifying an unnamed value overrides all unnamed values in the parent attribute.
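The override rule for attributes with named and unnamed values can be sketched on plain (args, kwargs) tuples. The function inherit_attribute is hypothetical and only models the behaviour described above; it does not touch Snakemake objects, and the file names are invented.

```python
def inherit_attribute(parent, child):
    """Merge one attribute of a child rule with that of its parent.

    Both arguments are (unnamed_values, named_values) tuples.
    """
    parent_args, parent_kwargs = parent
    child_args, child_kwargs = child
    # Any unnamed value in the child replaces all unnamed parent values.
    args = list(child_args) if child_args else list(parent_args)
    # A named value replaces only the parent value of the same name.
    kwargs = {**parent_kwargs, **child_kwargs}
    return args, kwargs


parent_input = (["reads.fq"], {"ref": "genome.fa", "index": "genome.idx"})
child_input = (["r1.fq", "r2.fq"], {"ref": "other.fa"})
print(inherit_attribute(parent_input, child_input))
```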

KEYWORD = 'ymp: extends'¶: Comment keyword enabling inheritance

expand(rule, ruleinfo)[source]¶

Expands RuleInfo object and children recursively.

Will call format() (via format_annotated()) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.

Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.

Parameters

rule – The snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a snakemake.workflow.RuleInfo object, which is descended into recursively. May ultimately be None, str, function, int, float, dict, list or tuple.
expand_args – Parameters passed on late expansion (when the DAG tries to instantiate the rule into a job).
rec – Recursion level

get_code_line(rule)[source]¶

Returns the source line defining rule

Return type: str

get_super(rule, ruleinfo)[source]¶

Find rule parent

Parameters

rule (Rule) – Rule object being built
ruleinfo (RuleInfo) – RuleInfo object describing rule being built

Returns

name of parent rule and RuleInfo describing parent rule or (None, None).

Return type

2-Tuple

class ymp.snakemake.NamedList(fromtuple=None, **kwargs)[source]¶

Bases: snakemake.io.Namedlist

Extended version of Snakemake’s Namedlist

Fixes array assignment operator: Writing a field via [] operator updates the value accessed via . operator.
Adds fromtuple to constructor: Builds from Snakemake’s typical (args, kwargs) tuples as present in ruleinfo structures.
Adds update_tuple method: Updates values in (args,kwargs) tuples as present in ruleinfo structures.

Create the object.

Arguments

toclone – another Namedlist that shall be cloned
fromdict – a dict that shall be converted to a Namedlist (keys become names)

get_names(*args, **kwargs)[source]¶: Export get_names as public func

update_tuple(totuple)[source]¶: Update values in (args, kwargs) tuple. The tuple must be the same as used in the constructor and must not have been modified.

class ymp.snakemake.RecursiveExpander[source]¶

Bases: ymp.snakemake.BaseExpander

Recursively expands {xyz} wildcards in Snakemake rules.

expand(rule, ruleinfo)[source]¶: Recursively expand wildcards within RuleInfo object

expands_field(field)[source]¶

Returns true for all fields but shell:, message: and wildcard_constraints.

We don’t want to mess with the regular expressions in the fields in wildcard_constraints:, and there is little use in expanding message: or shell: as these already have all wildcards applied just before job execution (by format_wildcards()).

class ymp.snakemake.SnakemakeExpander[source]¶

Bases: ymp.snakemake.BaseExpander

Expand wildcards in strings returned from functions.

Snakemake does not do this by default, leaving wildcard expansion to the functions provided themselves. Since we never want {input} to be in a string returned as a file, we expand those always.

expands_field(field)[source]¶

Checks if this expander should expand a Rule field type

Parameters: field – the field to check
Returns: True if field should be expanded.

format(item, *args, **kwargs)[source]¶: Format item using *args and **kwargs

class ymp.snakemake.WorkflowObject(*args, **kwargs)[source]¶

Bases: object

Base for extension classes defined from snakefiles

This currently encompasses ymp.env.Env and ymp.stage.Stage.

This mixin sets the properties filename and lineno according to the definition source in the rules file. It also maintains a registry within the Snakemake workflow object and provides an accessor method to this registry.

property defined_in¶

filename¶

Name of file in which object was defined

Type: str

classmethod get_registry(clean=False)[source]¶: Return all objects of this class registered with current workflow

lineno¶

Line number of object definition

Type: int

classmethod new_registry()[source]¶

register()[source]¶: Add self to registry

ymp.snakemake.check_snakemake()[source]¶

Return type: bool

ymp.snakemake.get_workflow()[source]¶: Get active workflow, loading one if necessary

ymp.snakemake.load_workflow(snakefile)[source]¶: Load new workflow

ymp.snakemake.make_rule(name=None, lineno=None, snakefile=None, **kwargs)[source]¶

ymp.snakemake.networkx()[source]¶

ymp.snakemake.print_ruleinfo(rule, ruleinfo, func=<bound method Logger.debug of <Logger ymp.snakemake (WARNING)>>)[source]¶

Logs contents of Rule and RuleInfo objects.

Parameters

rule (Rule) – Rule object to be printed
ruleinfo (RuleInfo) – Matching RuleInfo object to be printed
func – Function used for printing (default is log.debug)

ymp.snakemake.ruleinfo_fields = {'benchmark': {'apply_wildcards': True, 'format': 'string'}, 'conda_env': {'apply_wildcards': True, 'format': 'string'}, 'container_img': {'format': 'string'}, 'docstring': {'format': 'string'}, 'func': {'format': 'callable'}, 'input': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards',)}, 'log': {'apply_wildcards': True, 'format': 'argstuple'}, 'message': {'format': 'string', 'format_wildcards': True}, 'norun': {'format': 'bool'}, 'output': {'apply_wildcards': True, 'format': 'argstuple'}, 'params': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'resources', 'output', 'threads')}, 'priority': {'format': 'numeric'}, 'resources': {'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'script': {'format': 'string'}, 'shadow_depth': {'format': 'string_or_true'}, 'shellcmd': {'format': 'string', 'format_wildcards': True}, 'threads': {'format': 'int', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'version': {'format': 'object'}, 'wildcard_constraints': {'format': 'argstuple'}, 'wrapper': {'format': 'string'}}¶: describes attributes of snakemake.workflow.RuleInfo

ymp.snakemakelexer module¶

ymp.snakemakelexer¶

class ymp.snakemakelexer.SnakemakeLexer(*args, **kwds)[source]¶

Bases: pygments.lexers.python.PythonLexer

name = 'Snakemake'¶

tokens = {'globalkeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'root': [('(rule|checkpoint)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'rulename'), 'rulekeyword', 'globalkeyword', ('\\n', Token.Text), ('^(\\s*)([rRuUbB]{,2})("""(?:.|\\n)*?""")', <function bygroups.<locals>.callback>), ("^(\\s*)([rRuUbB]{,2})('''(?:.|\\n)*?''')", <function bygroups.<locals>.callback>), ('\\A#!.+$', Token.Comment.Hashbang), ('#.*$', Token.Comment.Single), ('\\\\\\n', Token.Text), ('\\\\', Token.Text), 'keywords', ('(def)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'funcname'), ('(class)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'classname'), ('(from)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'fromimport'), ('(import)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'import'), 'expr'], 'rulekeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'rulename': [('[a-zA-Z_]\\w*', Token.Name.Class, '#pop')]}¶

ymp.sphinxext module¶

This module contains a Sphinx extension for documenting YMP stages and Snakemake rules.

The SnakemakeDomain (name sm) provides the following directives:

.. sm:rule:: name¶: Describes a Snakemake rule

.. sm:stage:: name¶: Describes a YMP Stage

Both directives accept an optional source parameter. If given, a link to the source code of the stage or rule definition will be added. The format of the string passed is filename:line. Referenced Snakefiles will be highlighted with pygments and added to the documentation when building HTML.

The extension also provides an autodoc-like directive:

.. autosnake:: filename¶: Generates documentation from Snakefile filename.
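Since the module is a Sphinx extension with a setup() function, enabling it would normally be a matter of listing it in the project's conf.py. A minimal sketch, assuming YMP is importable in the documentation build environment:

```python
# conf.py (sketch): enable the extension so that the sm: domain and the
# autosnake directive become available in the project's documents.
extensions = [
    "sphinx.ext.autodoc",
    "ymp.sphinxext",
]
```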

class ymp.sphinxext.AutoSnakefileDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶

Bases: docutils.parsers.rst.Directive

Implements RSt directive .. autosnake:: filename

The directive extracts docstrings from rules in snakefile and auto-generates documentation.

has_content = False¶

This rule does not accept content

Type: bool

load_workflow(file_path)[source]¶

Load the Snakefile

Return type: ExpandableWorkflow

parse_doc(doc, source, idt=0)[source]¶

Convert doc string to StringList

Parameters

doc (str) – Documentation text
source (str) – Source filename
idt (int) – Result indentation in characters (default 0)

Return type

StringList

Returns

StringList of re-indented documentation wrapped in newlines

parse_rule(rule, idt=0)[source]¶

Convert Rule to StringList

Parameters

rule (Rule) – Rule object
idt (int) – Result indentation in characters (default 0)

Returns: StringList containing formatted Rule documentation

Return type: StringList

parse_stage(stage, idt=0)[source]¶

Return type: StringList

required_arguments = 1¶

This rule needs one argument (the filename)

Type: int

run()[source]¶: Entry point

tpl_rule = '.. sm:rule:: {name}'¶

Template for generated Rule RSt

Type: str

tpl_source = ' :source: {filename}:{lineno}'¶

Template option source

Type: str

tpl_stage = '.. sm:stage:: {name}'¶

Template for generated Stage RSt

Type: str

ymp.sphinxext.BASEPATH = '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/stable/src'¶

Path in which YMP package is located

Type: str

class ymp.sphinxext.DomainTocTreeCollector[source]¶

Bases: sphinx.environment.collectors.EnvironmentCollector

Add Sphinx Domain entries to the TOC

clear_doc(app, env, docname)[source]¶

Clear data from environment

If we have cached data in environment for document docname, we should clear it here.

Return type: None

get_ref(node)[source]¶

Return type: Optional[Node]

locate_in_toc(app, node)[source]¶

Return type: Optional[Node]

make_heading(node)[source]¶

Return type: List[Node]

merge_other(app, env, docnames, other)[source]¶

Merge with results from parallel processes

Called if Sphinx is processing documents in parallel. We should merge this from other into env for all docnames.

Return type: None

process_doc(app, doctree)[source]¶

Process doctree

This is called by the doctree-read event, that is, after the doctree has been loaded. Handlers are called in order of registration, so we are called after built-in extensions such as sphinx.environment.collectors.toctree, which builds the TOC.

Return type: None

select_doc_nodes(doctree)[source]¶

Select the nodes for which entries in the TOC are desired

This is a separate method so that it might be overridden by subclasses wanting to add other types of nodes to the TOC.

Return type: List[Node]

select_toc_location(app, node)[source]¶

Select location in TOC where node should be referenced

Return type: Node

toc_insert(docname, tocnode, node, heading)[source]¶

Return type: None

class ymp.sphinxext.SnakemakeDomain(env)[source]¶

Bases: sphinx.domains.Domain

Snakemake language domain

clear_doc(docname)[source]¶: Delete objects derived from file docname

data_version = 0¶

directives = {'rule': <class 'ymp.sphinxext.SnakemakeRule'>, 'stage': <class 'ymp.sphinxext.YmpStage'>}¶

get_objects()[source]¶

Return an iterable of “object descriptions”.

Object descriptions are tuples with six items:

name

Fully qualified name.

dispname

Name to display when searching/linking.

type

Object type, a key in self.object_types.

docname

The document where it is to be found.

anchor

The anchor name for the object.

priority

How “important” the object is (determines placement in search results). One of:

1: Default priority (placed before full-text matches).
0: Object is important (placed before default-priority objects).
2: Object is unimportant (placed after full-text matches).
-1: Object should not show up in search at all.

initial_data = {'objects': {}}¶

label = 'Snakemake'¶

name = 'sm'¶

object_types = {'rule': <sphinx.domains.ObjType object>, 'stage': <sphinx.domains.ObjType object>}¶

resolve_xref(env, fromdocname, builder, typ, target, node, contnode)[source]¶

Resolve the pending_xref node with the given typ and target.

This method should return a new node, to replace the xref node, containing the contnode which is the markup content of the cross-reference.

If no resolution can be found, None can be returned; the xref node will then be given to the missing-reference event, and if that yields no resolution, replaced by contnode.

The method can also raise sphinx.environment.NoUri to suppress the missing-reference event being emitted.

roles = {'rule': <sphinx.roles.XRefRole object>, 'stage': <sphinx.roles.XRefRole object>}¶

class ymp.sphinxext.SnakemakeRule(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶

Bases: ymp.sphinxext.YmpObjectDescription

Directive sm:rule:: describing a Snakemake rule

typename = 'rule'¶

class ymp.sphinxext.YmpObjectDescription(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶

Bases: sphinx.directives.ObjectDescription

Base class for RSt directives in SnakemakeDomain

Since this inherits from Sphinx’ ObjectDescription, content generated by the directive will always be inside an addnodes.desc.

Parameters: source – Specify source position as file:line to create link

add_source_link(signode)[source]¶

Add link to source code to signode

Return type: None

add_target_and_index(name, sig, signode)[source]¶

Add cross-reference IDs and entries to self.indexnode

Return type: None

get_index_text(typename, name)[source]¶

Formats object for entry into index

Return type: str

handle_signature(sig, signode)[source]¶

Parse rule signature sig into RST nodes and append them to signode.

The return value identifies the object and is passed to add_target_and_index() unchanged.

Parameters

sig (str) – Signature string (i.e. string passed after directive)
signode (desc) – Node created for object signature

Return type

str

Returns

Normalized signature (white space removed)

option_spec = {'source': <function unchanged>}¶

typename = '[object name]'¶

class ymp.sphinxext.YmpStage(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶

Bases: ymp.sphinxext.YmpObjectDescription

Directive sm:stage:: describing a YMP stage

typename = 'stage'¶

ymp.sphinxext.collect_pages(app)[source]¶: Add Snakefiles to documentation (in HTML mode)

ymp.sphinxext.relpath(path)[source]¶

Make absolute path relative to BASEPATH

Parameters: path (str) – absolute path
Return type: str
Returns: path relative to BASEPATH
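A minimal sketch of this kind of helper, using the standard library. The BASEPATH value here is a hypothetical placeholder, and this is not necessarily how YMP implements relpath():

```python
import os

# Hypothetical base directory, chosen only for this illustration.
BASEPATH = os.path.join(os.sep, "opt", "project", "src")


def relpath(path):
    """Return absolute path `path` relative to BASEPATH."""
    return os.path.relpath(path, BASEPATH)


example = relpath(os.path.join(BASEPATH, "ymp", "util.py"))
```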

ymp.sphinxext.setup(app)[source]¶: Register the extension with Sphinx

ymp.string module¶

exception ymp.string.FormattingError(message, fieldname)[source]¶: Bases: AttributeError

class ymp.string.GetNameFormatter[source]¶

Bases: string.Formatter

get_names(pattern)[source]¶

class ymp.string.OverrideJoinFormatter[source]¶

Bases: string.Formatter

Formatter with overridable join method

The default formatter joins all arguments with "".join(args). This class overrides _vformat() with identical code, changing only that line to one that can be overridden by a derived class.

join(args)[source]¶

Joins the expanded pieces of the template string to form the output.

This function is equivalent to ''.join(args). By overriding it, alternative methods can be implemented, e.g. to create a list of strings, each corresponding to one point in the cross product of the expanded variables.

Return type: Union[List[str], str]

class ymp.string.PartialFormatter[source]¶

Bases: string.Formatter

Formats what it can and leaves the remainder untouched

get_field(field_name, args, kwargs)[source]¶
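The "format what it can" behaviour can be sketched by overriding get_field() on string.Formatter. The class name below is hypothetical and the code is an illustration of the technique, not YMP's PartialFormatter. It assumes that unresolved fields carry no format spec or conversion, since those would be applied to the placeholder text.

```python
from string import Formatter


class KeepUnknownFormatter(Formatter):
    """Illustrative formatter that leaves unresolved fields in place."""

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError):
            # Re-emit the field as literal text so that a later
            # formatting pass can still fill it in.
            return "{" + field_name + "}", field_name


partial = KeepUnknownFormatter().format("{dir}/{sample}.fq", dir="reads")
```

Here `partial` still contains the `{sample}` placeholder, while `{dir}` has been replaced.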

class ymp.string.ProductFormatter[source]¶

Bases: ymp.string.OverrideJoinFormatter

String Formatter that creates a list of strings each expanded using one point in the cartesian product of all replacement values.

If none of the arguments evaluate to lists, the result is a string, otherwise it is a list.

>>> ProductFormatter().format("{A} and {B}", A=[1,2], B=[3,4])
['1 and 3', '1 and 4', '2 and 3', '2 and 4']

format_field(value, format_spec)[source]¶

join(args)[source]¶

Joins the expanded pieces of the template string to form the output.

This function is equivalent to ''.join(args). By overriding it, alternative methods can be implemented, e.g. to create a list of strings, each corresponding to one point in the cross product of the expanded variables.

Return type: Union[List[str], str]
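The cartesian expansion that ProductFormatter performs can be approximated with itertools.product. The function below is a hypothetical stand-alone sketch, not the class's actual implementation (which overrides join()); it assumes simple field names without attribute or index access.

```python
from itertools import product
from string import Formatter


def product_format(template, **values):
    """Expand template once per point in the product of list values."""
    # Collect field names in order of first appearance.
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    # Without any list-valued argument, behave like str.format().
    if not any(isinstance(values[n], list) for n in names):
        return template.format(**values)
    # Wrap scalars so that every field contributes one axis.
    axes = [values[n] if isinstance(values[n], list) else [values[n]]
            for n in names]
    return [template.format(**dict(zip(names, point)))
            for point in product(*axes)]


result = product_format("{A} and {B}", A=[1, 2], B=[3, 4])
```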

class ymp.string.QuotedElementFormatter(*args, **kwargs)[source]¶: Bases: snakemake.utils.SequenceFormatter

class ymp.string.RegexFormatter(regex)[source]¶

Bases: string.Formatter

String Formatter accepting a regular expression defining the format of the expanded tags.

get_names(format_string)[source]¶

Get set of field names in format_string

Return type: Set[str]

parse(format_string)[source]¶: Parse format_string into tuples. Each tuple contains literal_text (text to copy), field_name (the field name that follows it), format_spec and conversion.
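For comparison, the base class string.Formatter yields the same four-element tuple shape from its parse() method. The example below uses only the standard library with the default brace syntax; the template string is made up for illustration, and RegexFormatter's own tag syntax depends on the regular expression it is given.

```python
from string import Formatter

# Each tuple is (literal_text, field_name, format_spec, conversion).
parsed = list(Formatter().parse("{sample}.R{pair:d}.fq"))
# parsed == [('', 'sample', '', None),
#            ('.R', 'pair', 'd', None),
#            ('.fq', None, None, None)]
```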

ymp.string.make_formatter(product=None, regex=None, partial=None, quoted=None)[source]¶

ymp.util module¶

ymp.util.R(code='', **kwargs)[source]¶

Execute R code

This function executes the R code given as a string. Additional arguments are injected into the R environment. The value of the last R statement is returned.

The function requires rpy2 to be installed.

Parameters

code (str) – R code to be executed
**kwargs (dict) – variables to inject into R globalenv

Returns

value of last R statement

>>>  R("1*1", input=input)

ymp.util.Rmd(rmd, out, **kwargs)[source]¶

ymp.util.activate_R()[source]¶

ymp.util.fasta_names(fasta_file)[source]¶

ymp.util.file_not_empty(fn)[source]¶: Checks if a file is not empty, accounting for the minimum gzip file size of 20 bytes
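A minimal sketch of such a check. Detecting gzip files by their .gz extension and treating exactly 20 bytes as empty are assumptions made for this illustration; YMP's actual logic may differ.

```python
import os

# Size in bytes below which a gzip file cannot hold data
# (taken from the description above).
GZ_MIN_SIZE = 20


def file_not_empty(fn):
    """Return True if the file at path `fn` appears to contain data."""
    size = os.path.getsize(fn)
    if fn.endswith(".gz"):  # assumption: gzip detected by extension
        return size > GZ_MIN_SIZE
    return size > 0
```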

ymp.util.filter_out_empty(*args)[source]¶

Removes empty sets of files from input file lists.

Takes a variable number of file lists of equal length and removes indices where any of the files is empty. Strings are converted to lists of length 1.

Returns a generator tuple.

Example: r1, r2 = filter_out_empty(input.r1, input.r2)
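The index-wise filtering described above might look like the following sketch. It is not YMP's code: it uses a plain size check instead of the gzip-aware one and returns a tuple of lists.

```python
import os


def filter_out_empty(*file_lists):
    """Drop every index at which any of the parallel files is empty."""
    # Strings are treated as lists of length 1.
    lists = [[fl] if isinstance(fl, str) else list(fl) for fl in file_lists]
    # Keep only the positions where all files have content.
    keep = [files for files in zip(*lists)
            if all(os.path.getsize(f) > 0 for f in files)]
    if not keep:
        return tuple([] for _ in lists)
    # Transpose back into one list per input argument.
    return tuple(list(column) for column in zip(*keep))
```

With paired read files, a pair is removed from both lists as soon as either file of the pair is empty.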

ymp.util.glob_wildcards(pattern, files=None)[source]¶: Glob the values of the wildcards by matching the given pattern to the filesystem. Returns a named tuple with a list of values for each wildcard.
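The matching step can be sketched as follows. The function name match_wildcards is hypothetical, and the sketch matches against an explicit list of names rather than the filesystem. It assumes that each wildcard appears only once in the pattern and that wildcards match greedily.

```python
import re
from collections import namedtuple


def match_wildcards(pattern, files):
    """Collect the wildcard values for every name matching pattern."""
    names, regex, pos = [], "", 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # Escape the literal text, turn {name} into a named group.
        regex += re.escape(pattern[pos:m.start()])
        regex += "(?P<%s>.+)" % m.group(1)
        names.append(m.group(1))
        pos = m.end()
    regex += re.escape(pattern[pos:])

    Wildcards = namedtuple("Wildcards", names)
    result = Wildcards(*[[] for _ in names])
    for name in files:
        hit = re.fullmatch(regex, name)
        if hit:
            for key in names:
                getattr(result, key).append(hit.group(key))
    return result


found = match_wildcards("{sample}_R{pair}.fq",
                        ["A_R1.fq", "A_R2.fq", "B_R1.fq", "notes.txt"])
```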

ymp.util.is_fq(path)[source]¶

ymp.util.make_local_path(icfg, url)[source]¶

ymp.util.read_propfiles(files)[source]¶

ymp.yaml module¶

class ymp.yaml.AttrItemAccessMixin[source]¶

Bases: object

Mixin class mapping dot to bracket access

Added to classes implementing __getitem__, __setitem__ and __delitem__, this mixin will allow accessing items using dot notation. I.e. “object.xyz” is translated to “object["xyz"]”.
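The dot-to-bracket mapping can be sketched as below. The class names are hypothetical and the code shows the general technique, not YMP's mixin.

```python
class AttrItemAccess:
    """Illustrative mixin forwarding attribute access to item access."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        del self[name]


class Settings(AttrItemAccess, dict):
    pass


cfg = Settings(threads=4)
cfg.memory = "8G"  # stored as cfg["memory"]
```

Because `__getattr__` is only a fallback, regular methods such as `cfg.keys()` keep working; a key with the same name as a method is reachable only through brackets.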

exception ymp.yaml.LayeredConfAccessError[source]¶

Bases: ymp.yaml.LayeredConfError, KeyError, IndexError

Can’t access

exception ymp.yaml.LayeredConfError[source]¶

Bases: Exception

Error in LayeredConf

class ymp.yaml.LayeredConfProxy(maps, parent=None, key=None)[source]¶