ymp package¶
-
ymp.
get_config
()[source]¶ Access the current YMP configuration object.
This object might change once during normal execution: it is deleted before passing control to Snakemake. During unit test execution the object is deleted between all tests.
- Return type
-
ymp.
print_rule
= 0¶ Set to 1 to show the YMP expansion process as it is applied to the next Snakemake rule definition.
>>> ymp.print_rule = 1
>>> rule broken:
>>>   ...
>>> ymp make broken -vvv
-
ymp.
snakemake_versions
= ['6.0.5', '6.1.0', '6.1.1', '6.2.1']¶ List of versions this version of YMP has been verified to work with
Subpackages¶
- ymp.cli package
- ymp.stage package
- Submodules
- ymp.stage.base module
- ymp.stage.expander module
- ymp.stage.groupby module
- ymp.stage.params module
- ymp.stage.pipeline module
- ymp.stage.project module
PandasTableBuilder
Project
KEY_BCCOL
KEY_DATA
KEY_IDCOL
KEY_READCOLS
RE_FILE
RE_REMOTE
RE_SRR
choose_fq_columns
choose_id_column
data
do_get_ids
docstring
encode_barcode_path
fq_names
fwd_fq_names
fwd_pe_fq_names
get_all_targets
get_fq_names
get_group
get_ids
idcol
iter_samples
minimize_variables
outputs
pe_fq_names
project_name
raw_reads_source_path
rev_pe_fq_names
runs
se_fq_names
source_cfg
source_path
unsplit_path
variables
SQLiteProjectData
- ymp.stage.reference module
- ymp.stage.stack module
- ymp.stage.stage module
Submodules¶
ymp.blast module¶
Parsers for BLAST output formats 6 (tabular) and 7 (tabular with comment lines between queries).
-
class
ymp.blast.
BlastBase
[source]¶ Bases:
object
Base class for BLAST readers and writers
-
FIELD_MAP
= {'% identity': 'pident', 'alignment length': 'length', 'bit score': 'bitscore', 'evalue': 'evalue', 'gap opens': 'gapopen', 'mismatches': 'mismatch', 'q. end': 'qend', 'q. start': 'qstart', 'query acc.': 'qacc', 'query frame': 'qframe', 'query length': 'qlen', 's. end': 'send', 's. start': 'sstart', 'sbjct frame': 'sframe', 'score': 'score', 'subject acc.': 'sacc', 'subject strand': 'sstrand', 'subject tax ids': 'staxids', 'subject title': 'stitle'}¶ Map between field short and long names
-
FIELD_REV_MAP
= {'bitscore': 'bit score', 'evalue': 'evalue', 'gapopen': 'gap opens', 'length': 'alignment length', 'mismatch': 'mismatches', 'pident': '% identity', 'qacc': 'query acc.', 'qend': 'q. end', 'qframe': 'query frame', 'qlen': 'query length', 'qstart': 'q. start', 'sacc': 'subject acc.', 'score': 'score', 'send': 's. end', 'sframe': 'sbjct frame', 'sstart': 's. start', 'sstrand': 'subject strand', 'staxids': 'subject tax ids', 'stitle': 'subject title'}¶ Reversed map from short to long name
-
FIELD_TYPE
= {'bitscore': <class 'float'>, 'evalue': <class 'float'>, 'gapopen': <class 'int'>, 'length': <class 'int'>, 'mismatch': <class 'int'>, 'pident': <class 'float'>, 'qend': <class 'int'>, 'qframe': <class 'int'>, 'qlen': <class 'int'>, 'qstart': <class 'int'>, 'score': <class 'float'>, 'send': <class 'int'>, 'sframe': <class 'int'>, 'sstart': <class 'int'>, 'staxids': <function BlastBase.tupleofint>, 'stitle': <class 'str'>}¶ Map defining types of fields
-
-
class
ymp.blast.
BlastParser
[source]¶ Bases:
ymp.blast.BlastBase
Base class for BLAST readers
-
class
ymp.blast.
BlastWriter
[source]¶ Bases:
ymp.blast.BlastBase
Base class for BLAST writers
-
class
ymp.blast.
Fmt6Parser
(fileobj)[source]¶ Bases:
ymp.blast.BlastParser
Parser for BLAST format 6 (tabular)
-
Hit
¶ alias of
ymp.blast.BlastHit
-
field_types
= [None, None, <class 'float'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>, <class 'float'>, <class 'float'>]¶
-
fields
= ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']¶ Default field types
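A minimal sketch of how one line of format 6 output could be converted using the fields and field_types lists above. It assumes tab-separated columns, as written by BLAST's -outfmt 6. The helper parse_fmt6_line and the dict return value are hypothetical and only for illustration; they are not part of ymp.blast.

```python
# Column names and converters copied from the Fmt6Parser attributes above.
FIELDS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
          'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']
FIELD_TYPES = [None, None, float, int, int, int,
               int, int, int, int, float, float]


def parse_fmt6_line(line):
    """Hypothetical helper: convert one tab-separated line into a dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    # A converter of None means the column is kept as a string.
    return {
        name: (convert(value) if convert is not None else value)
        for name, convert, value in zip(FIELDS, FIELD_TYPES, values)
    }


hit = parse_fmt6_line("q1\ts1\t98.5\t100\t1\t0\t1\t100\t5\t104\t1e-50\t180.0\n")
```

Keeping names and converters in two parallel lists mirrors the attributes shown above and makes it easy to swap in a different column layout.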
-
-
class
ymp.blast.
Fmt7Parser
(fileobj)[source]¶ Bases:
ymp.blast.BlastParser
Parses BLAST results in format ‘7’ (tabular with comment lines)
-
PAT_DATABASE
= '# Database: '¶
-
PAT_FIELDS
= '# Fields: '¶
-
PAT_HITSFOUND
= ' hits found'¶
-
PAT_QUERY
= '# Query: '¶
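The comment prefixes above suggest that a format 7 reader picks up column names from the "# Fields: " line. The following sketch shows one way such a header could be translated to short names with a subset of BlastBase.FIELD_MAP. The function parse_fields_header and the ", " separator are assumptions made for illustration; how Fmt7Parser actually handles the header is not shown in this reference.

```python
# Subset of BlastBase.FIELD_MAP (long name -> short name), copied from above.
FIELD_MAP = {
    '% identity': 'pident',
    'alignment length': 'length',
    'bit score': 'bitscore',
    'evalue': 'evalue',
    'query acc.': 'qacc',
    'subject acc.': 'sacc',
}
PAT_FIELDS = '# Fields: '


def parse_fields_header(line):
    """Hypothetical helper: return short field names, or None for other lines."""
    if not line.startswith(PAT_FIELDS):
        return None
    long_names = line[len(PAT_FIELDS):].strip().split(", ")
    # Unknown long names are passed through unchanged in this sketch.
    return [FIELD_MAP.get(name, name) for name in long_names]


header = "# Fields: query acc., subject acc., % identity, evalue, bit score\n"
short_names = parse_fields_header(header)
```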
-
-
class
ymp.blast.
Fmt7Writer
(fileobj)[source]¶ Bases:
ymp.blast.BlastWriter
-
ymp.blast.
reader
(fileobj, t=7)[source]¶ Creates a reader for files in BLAST format
>>> with open(blast_file) as infile:
>>>     reader = blast.reader(infile)
>>>     for hit in reader:
>>>         print(hit)
- Parameters
fileobj – iterable yielding lines in BLAST format
t (int) – number of the BLAST format type
- Return type
ymp.blast2gff module¶
ymp.cluster module¶
Module handling talking to cluster management systems
>>> python -m ymp.cluster slurm status <jobid>
-
class
ymp.cluster.
Lsf
[source]¶ Bases:
ymp.cluster.ClusterMS
Talking to LSF
-
states
= {'DONE': 'success', 'EXIT': 'failed', 'PEND': 'running', 'POST_DONE': 'success', 'POST_ERR': 'failed', 'PSUSP': 'running', 'RUN': 'running', 'SSUSP': 'running', 'UNKWN': 'running', 'USUSP': 'running', 'WAIT': 'running'}¶
-
-
class
ymp.cluster.
Slurm
[source]¶ Bases:
ymp.cluster.ClusterMS
Talking to Slurm
-
states
= {'BOOT_FAIL': 'failed', 'CANCELLED': 'failed', 'COMPLETED': 'success', 'COMPLETING': 'running', 'CONFIGURING': 'running', 'DEADLINE': 'failed', 'FAILED': 'failed', 'NODE_FAIL': 'failed', 'PENDING': 'running', 'PREEMPTED': 'failed', 'RESIZING': 'running', 'REVOKED': 'running', 'RUNNING': 'running', 'SPECIAL_EXIT': 'running', 'SUSPENDED': 'running', 'TIMEOUT': 'failed'}¶
-
static
status
(jobid)[source]¶ Print status of job @param jobid to stdout (as needed by snakemake)
Anecdotal benchmarking shows 200 ms per invocation, half used by Python startup and half by calling sacct. Using scontrol show job instead of sacct -pbs is faster by 80 ms, but finished jobs are purged after an unknown time window.
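As a rough illustration of how the states table can be used, the sketch below translates a raw Slurm job state into one of the three status words. The function snakemake_status, the handling of suffixes such as "CANCELLED by <uid>", and the fallback to "running" for unknown states are assumptions of this sketch, not the documented behavior of Slurm.status.

```python
# Copied from the Slurm.states attribute above.
SLURM_STATES = {
    'BOOT_FAIL': 'failed', 'CANCELLED': 'failed', 'COMPLETED': 'success',
    'COMPLETING': 'running', 'CONFIGURING': 'running', 'DEADLINE': 'failed',
    'FAILED': 'failed', 'NODE_FAIL': 'failed', 'PENDING': 'running',
    'PREEMPTED': 'failed', 'RESIZING': 'running', 'REVOKED': 'running',
    'RUNNING': 'running', 'SPECIAL_EXIT': 'running', 'SUSPENDED': 'running',
    'TIMEOUT': 'failed',
}


def snakemake_status(raw_state, default="running"):
    """Hypothetical helper: map a raw Slurm state string to a status word."""
    words = raw_state.split()
    if not words:
        return default
    # Only the first word is looked up, so "CANCELLED by 1000" still matches.
    return SLURM_STATES.get(words[0], default)
```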
-
ymp.common module¶
Collection of shared utility classes and methods
-
class
ymp.common.
AttrDict
[source]¶ Bases:
dict
AttrDict adds accessing stored keys as attributes to dict
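The idea can be sketched in a few lines. AttrDictSketch below is an illustrative stand-in and not the ymp.common.AttrDict implementation; in particular, supporting only read access via attributes is an assumption.

```python
class AttrDictSketch(dict):
    """Illustrative dict subclass exposing stored keys as attributes."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so regular dict methods such as .keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


cfg = AttrDictSketch(threads=4, shell="/bin/bash")
```

Raising AttributeError rather than KeyError keeps hasattr() and getattr() with a default behaving as expected.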
-
class
ymp.common.
CacheDict
(cache, name, *args, loadfunc=None, itemloadfunc=None, itemdata=None, **kwargs)[source]¶ Bases:
ymp.common.AttrDict
-
class
ymp.common.
MkdirDict
[source]¶ Bases:
ymp.common.AttrDict
Creates directories as they are requested
-
ymp.common.
is_container
(obj)[source]¶ Check if object is container, considering strings not containers
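One way such a check could look is sketched below. The name is_container_sketch, the use of collections.abc.Iterable and the exclusion of bytes are assumptions; the actual test performed by ymp.common.is_container is not shown in this reference.

```python
from collections.abc import Iterable


def is_container_sketch(obj):
    """Illustrative check: iterable objects count, strings and bytes do not."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))
```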
ymp.config module¶
-
class
ymp.config.
ConfigExpander
(config_mgr)[source]¶ Bases:
ymp.snakemake.ColonExpander
-
class
Formatter
(expander)[source]¶ Bases:
ymp.snakemake.FormatExpander.Formatter
,ymp.string.PartialFormatter
-
class
-
class
ymp.config.
ConfigMgr
(root, conffiles)[source]¶ Bases:
object
Manages workflow configuration
This is a singleton object of which only one instance should be around at a given time. It is available in the rules files as icfg and via ymp.get_config() elsewhere.
ConfigMgr loads and maintains the workflow configuration as given in the ymp.yml files located in the workflow root directory, the user config folder (~/.ymp) and the installation etc folder.
-
CONF_DEFAULT_FNAME
= '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/latest/src/ymp/etc/defaults.yml'¶
-
CONF_FNAME
= 'ymp.yml'¶
-
CONF_USER_FNAME
= '/home/docs/.ymp/ymp.yml'¶
-
KEY_LIMITS
= 'resource_limits'¶
-
KEY_PIPELINES
= 'pipelines'¶
-
KEY_PROJECTS
= 'projects'¶
-
KEY_REFERENCES
= 'references'¶
-
RULE_MAIN_FNAME
= '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/latest/src/ymp/rules/Snakefile'¶
-
property
absdir
¶ Dictionary of absolute paths of named YMP directories
-
property
cluster
¶ The YMP cluster configuration.
-
property
conda
¶
-
property
dir
¶ Dictionary of relative paths of named YMP directories
The directory paths are relative to the YMP root workdir.
-
property
ensuredir
¶ Dictionary of absolute paths of named YMP directories
Directories will be created on the fly as they are requested.
-
classmethod
find_config
()[source]¶ Locates ymp config files and ymp root
The root ymp work dir is determined as the first (parent) directory containing a file named ConfigMgr.CONF_FNAME (default ymp.yml).
The stack of config files comprises
1. the default config ConfigMgr.CONF_DEFAULT_FNAME (etc/defaults.yml in the ymp package directory),
2. the user config ConfigMgr.CONF_USER_FNAME (~/.ymp/ymp.yml) and
3. the ymp.yml in the ymp root.
- Returns
root: Root working directory
conffiles: list of active configuration files
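The root lookup described for find_config can be sketched as a walk up the directory tree. The function find_root is hypothetical and only illustrates the search for the first directory containing the config file; starting at an arbitrary directory and returning None when nothing is found are assumptions.

```python
from pathlib import Path

CONF_FNAME = "ymp.yml"  # value of ConfigMgr.CONF_FNAME shown above


def find_root(start, conf_fname=CONF_FNAME):
    """Hypothetical helper: first directory at or above start with conf_fname."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        if (directory / conf_fname).is_file():
            return directory
    return None
```

Calling find_root(Path.cwd()) from inside a project subdirectory would return the project root in this sketch.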
-
property
pairnames
¶
-
property
pipeline
¶ Configure pipelines
-
property
platform
¶ Name of current platform (macos or linux)
-
property
ref
¶ Configure references
-
property
rules
¶
-
property
shell
¶ The shell used by YMP
Change by adding e.g.
shell: /path/to/shell
toymp.yml
.
-
property
snakefiles
¶ Snakefiles used under this config in parsing order
-
property
workflow
¶
-
-
class
ymp.config.
OverrideExpander
(cfgmgr)[source]¶ Bases:
ymp.snakemake.BaseExpander
Override rule parameters, resources and threads using config values
Example
Set the wordsize parameter in the bmtagger_bitmask rule to 12:
    overrides:
      rules:
        bmtagger_bitmask:
          params:
            wordsize: 12
          resources:
            memory: 15G
          threads: 12
-
expand
(rule, ruleinfo, **kwargs)[source]¶ Expands RuleInfo object and children recursively.
Will call :meth:format (via :meth:format_annotated) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.
Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.
- Parameters
rule – The :class:snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a :class:snakemake.workflow.RuleInfo object into which expansion recursively descends. May ultimately be None, str, function, int, float, dict, list or tuple.
expand_args – Parameters passed on late expansion (when the dag tries to instantiate the rule into a job).
rec – Recursion level
-
types
= {'params': typing.Mapping, 'resources': typing.Mapping, 'threads': <class 'int'>}¶
-
-
class
ymp.config.
ResourceLimitsExpander
(cfg)[source]¶ Bases:
ymp.snakemake.BaseExpander
Allows adjusting resources to local compute environment
Each config item defines processing for an item in resources: or the special resource threads. Each item may have a default value filled in for rules not defining the resource, min and max defining the lower and upper bounds, and a scale value applied to the default to adjust resources up or down globally. Values in time or “human readable” format may be parsed specially by passing the format values time or number, respectively. These values will also be reformatted, with the optional parameter unit defining the output format (k/g/m/t for numbers and minutes/seconds for time). Additional resource values may be generated from configured ones using the from keyword (e.g. to provide both mem_mb and mem_gb from a generic mem value).
-
static
adjust_value
(value, default, scale, minimum, maximum)[source]¶ Applies default, scale, minimum and maximum to a numeric value
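A minimal sketch of what such an adjustment might look like. The function adjust_value_sketch is illustrative only. The order of operations (default first, then scale, then clamping) and the choice to scale rule-provided values as well as defaults are assumptions that this reference does not confirm.

```python
def adjust_value_sketch(value, default=None, scale=None,
                        minimum=None, maximum=None):
    """Illustrative only: fill in default, scale, then clamp to the bounds."""
    if value is None:
        value = default
    if value is None:
        return None  # nothing to adjust
    if scale is not None:
        value = value * scale
    if minimum is not None:
        value = max(value, minimum)
    if maximum is not None:
        value = min(value, maximum)
    return value
```

With default=8, scale=0.5 and bounds of 1 and 16, a rule that does not set the resource would end up with 4.0 in this sketch.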
-
expand
(rule, ruleinfo, **kwargs)[source]¶ Expands RuleInfo object and children recursively.
Will call :meth:format (via :meth:format_annotated) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.
Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.
- Parameters
rule – The :class:snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a :class:snakemake.workflow.RuleInfo object into which expansion recursively descends. May ultimately be None, str, function, int, float, dict, list or tuple.
expand_args – Parameters passed on late expansion (when the dag tries to instantiate the rule into a job).
rec – Recursion level
- Return type
-
formatters
= {'number': <function format_number>, 'time': <function format_time>}¶
-
parsers
= {'number': <function parse_number>, 'time': <function parse_time>}¶
-
static
ymp.download module¶
-
class
ymp.download.
FileDownloader
(block_size=4096, timeout=300, parallel=4, loglevel=30, alturls=None, retry=3)[source]¶ Bases:
object
Manages download of a set of URLs
Downloads happen concurrently using asynchronous network IO.
- Parameters
block_size (int) – Byte size of chunks to download
timeout (int) – Aiohttp cumulative timeout
parallel (int) – Number of files to download in parallel
loglevel (int) – Log level for messages sent to logging (errors are sent with loglevel+10)
alturls – List of regexps modifying URLs
retry (int) – Number of times to retry download
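The interplay of the parallel and retry parameters can be illustrated with plain asyncio. The sketch below does not use aiohttp and is not the FileDownloader implementation: fetch_all and its fetch callback are hypothetical, and retrying only on OSError is an assumption.

```python
import asyncio


async def fetch_all(urls, fetch, parallel=4, retry=3):
    """Illustrative only: run fetch(url) for all urls, at most `parallel` at once."""
    semaphore = asyncio.Semaphore(parallel)

    async def fetch_one(url):
        async with semaphore:  # limits the number of concurrent downloads
            for attempt in range(1, retry + 1):
                try:
                    return await fetch(url)
                except OSError:
                    if attempt == retry:
                        raise  # give up after the last attempt

    # gather() returns results in the order of the input urls
    return await asyncio.gather(*(fetch_one(url) for url in urls))
```

A caller would pass a coroutine function that performs the actual transfer, for example asyncio.run(fetch_all(urls, my_download)), where my_download is supplied by the caller.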
-
error
(msg, *args, **kwargs)[source]¶ Send error to logger
Message is sent with a log level 10 higher than the default for this object.
- Return type
-
log
(msg, *args, modlvl=0, **kwargs)[source]¶ Send message to logger
Honors loglevel set for the FileDownloader object.
ymp.env module¶
This module manages the conda environments.
-
class
ymp.env.
CondaPathExpander
(config, *args, **kwargs)[source]¶ Bases:
ymp.snakemake.BaseExpander
Applies search path for conda environment specifications
File names supplied via rule: conda: "some.yml" are replaced with absolute paths if they are found in any searched directory. Each search_paths entry is appended to the directory containing the top level Snakefile and the directory checked for the filename. Thereafter, the stack of including Snakefiles is traversed backwards. If no file is found, the original name is returned.
-
class
ymp.env.
Env
(env_file=None, workflow=None, env_dir=None, container_img=None, cleanup=None, name=None, packages=None, base='none', channels=None)[source]¶ Bases:
ymp.snakemake.WorkflowObject
,snakemake.deployment.conda.Env
Represents YMP conda environment
Snakemake expects the conda environments in a per-workflow directory configured by conda_prefix. YMP sets this value by default to ~/.ymp/conda, which has a greater chance of being on the same file system as the conda cache, allowing for hard linking of environment files.
Within the folder conda_prefix, each environment is created in a folder named by the hash of the environment definition file’s contents and the conda_prefix path. This class inherits from snakemake.deployment.conda.Env to ensure that the hash we use is identical to the one Snakemake will use during workflow execution.
The class provides additional features for updating environments, creating environments dynamically and executing commands within those environments.
Note
This is not called from within the execution. Snakemake instantiates its own Env object purely based on the filename.
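To illustrate why both the file contents and the conda_prefix path influence the folder name, here is a sketch of such a hash. The function env_dir_name, the use of MD5, the order of the inputs and the truncation to eight characters are all assumptions; the authoritative computation is the one inherited from snakemake.deployment.conda.Env.

```python
import hashlib


def env_dir_name(conda_prefix, env_file_content, length=8):
    """Illustrative only: derive a folder name from prefix path and file contents."""
    digest = hashlib.md5()
    digest.update(conda_prefix.encode())
    digest.update(env_file_content)
    return digest.hexdigest()[:length]


spec = b"channels:\n  - bioconda\ndependencies:\n  - blast =2.6*\n"
folder = env_dir_name("/home/user/.ymp/conda", spec)
```

Because the prefix is part of the hash, the same environment file yields a different folder name under a different conda_prefix.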
Creates an inline defined conda environment
- Parameters
name (Optional[str]) – Name of conda environment (and basename of file)
packages (Union[list, str, None]) – package(s) to be installed into environment. Version constraints can be specified in each package string separated from the package name by whitespace. E.g. "blast =2.6*"
channels (Union[list, str, None]) – channel(s) to be selected for the environment
base (str) – Select a set of default channels and packages to be added to the newly created environment. Sets are defined in conda.defaults in ymp.yml
-
create
(dryrun=False, reinstall=False, nospec=False, noarchive=False)[source]¶ Ensure the conda environment has been created
Inherits from snakemake.deployment.conda.Env.create
- Behavior of super class
Resolve remote file
If containerized, check environment path exists and return if true
Check for interrupted env create, delete if so
Return if environment exists
Install from archive if env_archive exists
Install using self.frontend if not_careful
- Handling pre-computed environment specs
In addition to freezing environments by maintaining a copy of the package binaries, we allow maintaining a copy of the package binary URLs, from which the archive folder is populated on demand. We just download those to self.archive and pass on.
-
property
installed
¶
ymp.exceptions module¶
Exceptions raised by YMP
-
exception
ymp.exceptions.
YmpConfigError
(obj, msg, key=None)[source]¶ Bases:
ymp.exceptions.YmpLocateableError
Indicates an error in the ymp.yml config files
- Parameters
-
exception
ymp.exceptions.
YmpLocateableError
(obj, msg, show_includes=True)[source]¶ Bases:
ymp.exceptions.YmpPrettyException
Errors that have a file location to be shown
- Parameters
-
exception
ymp.exceptions.
YmpPrettyException
(message)[source]¶ Bases:
ymp.exceptions.YmpException
,click.exceptions.ClickException
,snakemake.exceptions.WorkflowError
Exception that does not lead to stack trace on CLI
Inheriting from ClickException makes click print only the self.msg value of the exception, rather than allowing Python to print a full stack trace.
This is useful for exceptions indicating usage or configuration errors. We use this, instead of click.UsageError and friends so that the exceptions can be caught and handled explicitly where needed.
Note that click will call the show method on this object to print the exception. The default implementation from click will just prefix the msg with Error:.
FIXME: This does not work if the exception is raised from within the snakemake workflow as snakemake.snakemake catches and reformats exceptions.
-
rule
= None¶
-
snakefile
= None¶
-
exception
ymp.exceptions.
YmpRuleError
(obj, msg, show_includes=True)[source]¶ Bases:
ymp.exceptions.YmpLocateableError
Indicates an error in the rules files
This could e.g. be a Stage or Environment defined twice.
-
exception
ymp.exceptions.
YmpStageError
(msg)[source]¶ Bases:
ymp.exceptions.YmpPrettyException
Indicates an error in the requested stage stack
-
exception
ymp.exceptions.
YmpSystemError
(message)[source]¶ Bases:
ymp.exceptions.YmpPrettyException
Indicates problem running YMP with available system software
-
exception
ymp.exceptions.
YmpUsageError
(message)[source]¶ Bases:
ymp.exceptions.YmpPrettyException
General usage error
-
exception
ymp.exceptions.
YmpWorkflowError
(message)[source]¶ Bases:
ymp.exceptions.YmpPrettyException
Indicates an error during workflow execution
E.g. failures to expand dynamic variables
ymp.gff module¶
Implements simple reader and writer for GFF (general feature format) files.
Unfinished:
- only supports one version, GFF 3.2.3.
- no escaping
-
class
ymp.gff.
Attributes
(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)¶ Bases:
tuple
Create new instance of Attributes(ID, Name, Alias, Parent, Target, Gap, Derives_From, Note, Dbxref, Ontology_term, Is_circular)
-
Alias
¶ Alias for field number 2
-
Dbxref
¶ Alias for field number 8
-
Derives_From
¶ Alias for field number 6
-
Gap
¶ Alias for field number 5
-
ID
¶ Alias for field number 0
-
Is_circular
¶ Alias for field number 10
-
Name
¶ Alias for field number 1
-
Note
¶ Alias for field number 7
-
Ontology_term
¶ Alias for field number 9
-
Parent
¶ Alias for field number 3
-
Target
¶ Alias for field number 4
-
-
class
ymp.gff.
Feature
(seqid, source, type, start, end, score, strand, phase, attributes)¶ Bases:
tuple
Create new instance of Feature(seqid, source, type, start, end, score, strand, phase, attributes)
-
attributes
¶ Alias for field number 8
-
end
¶ Alias for field number 4
-
phase
¶ Alias for field number 7
-
score
¶ Alias for field number 5
-
seqid
¶ Alias for field number 0
-
source
¶ Alias for field number 1
-
start
¶ Alias for field number 3
-
strand
¶ Alias for field number 6
-
type
¶ Alias for field number 2
-
ymp.helpers module¶
This module contains helper functions.
Not all of these are currently in use
-
class
ymp.helpers.
OrderedDictMaker
[source]¶ Bases:
object
odict creates OrderedDict objects in a dict-literal like syntax
>>> my_ordered_dict = odict[
>>>     'key': 'value'
>>> ]
Implementation: odict uses the python slice syntax which is similar to dict literals. The [] operator is implemented by overriding __getitem__. Slices passed to the operator as object[start1:stop1:step1, start2:...] are passed to the implementation as a list of objects with start, stop and step members. odict simply creates an OrderedDict by iterating over that list.
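The mechanism described above fits in a few lines. OrderedDictMakerSketch is an illustrative reimplementation based on the description, not the ymp.helpers source; the special case for a single slice is needed because Python only passes a tuple when several comma-separated slices are given.

```python
from collections import OrderedDict


class OrderedDictMakerSketch:
    """Illustrative: build an OrderedDict from slice syntax."""

    def __getitem__(self, keys):
        # obj['a': 1] passes one slice, obj['a': 1, 'b': 2] a tuple of slices
        if isinstance(keys, slice):
            keys = (keys,)
        return OrderedDict((item.start, item.stop) for item in keys)


odict = OrderedDictMakerSketch()
single = odict['key': 'value']
ordered = odict['b': 1, 'a': 2]
```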
ymp.nuc2aa module¶
ymp.snakemake module¶
Extends Snakemake Features
-
class
ymp.snakemake.
BaseExpander
[source]¶ Bases:
object
Base class for Snakemake expansion modules.
Subclasses should override the :meth:expand method if they need to work on the entire RuleInfo object or the :meth:format and :meth:expands_field methods if they intend to modify specific fields.
-
expand
(rule, item, expand_args=None, rec=- 1, cb=False)[source]¶ Expands RuleInfo object and children recursively.
Will call :meth:format (via :meth:format_annotated) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.
Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.
- Parameters
rule – The :class:snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a :class:snakemake.workflow.RuleInfo object into which expansion recursively descends. May ultimately be None, str, function, int, float, dict, list or tuple.
expand_args – Parameters passed on late expansion (when the dag tries to instantiate the rule into a job).
rec – Recursion level
-
expands_field
(field)[source]¶ Checks if this expander should expand a Rule field type
- Parameters
field – the field to check
- Returns
True if field should be expanded.
-
-
exception
ymp.snakemake.
CircularReferenceException
(deps, rule)[source]¶ Bases:
ymp.exceptions.YmpRuleError
Exception raised if parameters in rule contain a circular reference
-
class
ymp.snakemake.
ColonExpander
[source]¶ Bases:
ymp.snakemake.FormatExpander
Expander using {:xyz:} formatted variables.
-
regex
= re.compile('\n \\{:\n (?=(\n \\s*\n (?P<name>(?:.(?!\\s*\\:\\}))*.)\n \\s*\n ))\\1\n :\\}\n ', re.VERBOSE)¶
-
spec
= '{{:{}:}}'¶
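The regex and spec attributes can be tried out in isolation. The pattern below is the one shown above, written as a verbose raw string. The substitute helper and the lookup dictionary are hypothetical and only demonstrate matching; they do not reproduce how ColonExpander resolves names.

```python
import re

# Pattern copied from ColonExpander.regex; the lookahead plus backreference
# matches the variable name including optional surrounding whitespace.
REGEX = re.compile(r"""
    \{:
        (?=(
            \s*
            (?P<name>(?:.(?!\s*\:\}))*.)
            \s*
        ))\1
    :\}
    """, re.VERBOSE)
SPEC = '{{:{}:}}'  # ColonExpander.spec: formats a name back into {:name:}


def substitute(text, values):
    """Hypothetical helper: replace known {:name:} variables, keep the rest."""
    def replace(match):
        name = match.group("name")
        return str(values[name]) if name in values else match.group(0)
    return REGEX.sub(replace, text)


result = substitute("{: dir.reports :}/{sample}.html", {"dir.reports": "reports"})
```

Plain {sample} wildcards are left untouched, which is what lets the two brace styles coexist in one string.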
-
-
class
ymp.snakemake.
DefaultExpander
(**kwargs)[source]¶ Bases:
ymp.snakemake.InheritanceExpander
Adds default values to rules
The implementation simply makes all rules inherit from a defaults rule.
Creates DefaultExpander
Each parameter passed is considered a RuleInfo default value. Where applicable, Snakemake’s argtuples ([],{}) must be passed.
-
class
ymp.snakemake.
ExpandableWorkflow
(*args, **kwargs)[source]¶ Bases:
snakemake.workflow.Workflow
Adds hook for additional rule expansion methods to Snakemake
Constructor for ExpandableWorkflow overlay attributes
This may be called on an already initialized Workflow object.
-
classmethod
activate
()[source]¶ Installs the ExpandableWorkflow
Replaces the Workflow object in the snakemake.workflow module with an instance of this class and initializes default expanders (the snakemake syntax).
-
add_rule
(name=None, lineno=None, snakefile=None, checkpoint=False, allow_overwrite=False)[source]¶ Add a rule.
- Parameters
name – name of the rule
lineno – line number within the snakefile where the rule was defined
snakefile – name of file in which rule was defined
-
get_rule
(name=None)[source]¶ Get rule by name. If name is none, the last created rule is returned.
- Parameters
name – the name of the rule
-
global_workflow
= <ymp.snakemake.ExpandableWorkflow object>¶
-
classmethod
load_workflow
(snakefile='/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/latest/src/ymp/rules/Snakefile')[source]¶
-
classmethod
-
class
ymp.snakemake.
FormatExpander
[source]¶ Bases:
ymp.snakemake.BaseExpander
Expander using a custom formatter object.
-
class
Formatter
(expander)[source]¶ Bases:
ymp.string.ProductFormatter
-
regex
= re.compile('\n \\{\n (?=(\n (?P<name>[^{}]+)\n ))\\1\n \\}\n ', re.VERBOSE)¶
-
spec
= '{{{}}}'¶
-
class
-
exception
ymp.snakemake.
InheritanceException
(msg, rule, parent, include=None, lineno=None, snakefile=None)[source]¶ Bases:
snakemake.exceptions.RuleException
Exception raised for errors during rule inheritance
Creates a new instance of RuleException.
Arguments
message – the exception message
include – iterable of other exceptions to be included
lineno – the line the exception originates
snakefile – the file the exception originates
-
class
ymp.snakemake.
InheritanceExpander
[source]¶ Bases:
ymp.snakemake.BaseExpander
Adds class-like inheritance to Snakemake rules
To avoid redundancy between closely related rules, e.g. rules for single ended and paired end data, YMP allows Snakemake rules to inherit from another rule.
Example
Derived rules are always created with an implicit ruleorder statement, making Snakemake prefer the parent rule if either parent or child rule could be used to generate the requested output file(s).
Derived rules initially contain the same attributes as the parent rule. Each attribute assigned to the child rule overrides the matching attribute in the parent. Where attributes may contain named and unnamed values, specifying a named value overrides only the value of that name while specifying an unnamed value overrides all unnamed values in the parent attribute.
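The override rule for named and unnamed values can be sketched on plain (args, kwargs) tuples. The function inherit_attribute and the file names are hypothetical; the sketch only mirrors the rule stated above and is not the InheritanceExpander code.

```python
def inherit_attribute(parent, child=None):
    """Illustrative merge of one rule attribute given as (args, kwargs)."""
    parent_args, parent_kwargs = parent
    if child is None:
        # Attribute not assigned in the child: inherit the parent's values.
        return list(parent_args), dict(parent_kwargs)
    child_args, child_kwargs = child
    # Any unnamed value in the child replaces all unnamed parent values.
    args = list(child_args) if child_args else list(parent_args)
    # A named value replaces only the parent value of the same name.
    kwargs = {**parent_kwargs, **child_kwargs}
    return args, kwargs


parent_input = (["reads.fq"], {"ref": "x.fa", "index": "x.idx"})
named_override = inherit_attribute(parent_input, ([], {"ref": "y.fa"}))
unnamed_override = inherit_attribute(parent_input, (["r1.fq", "r2.fq"], {}))
```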
-
KEYWORD
= 'ymp: extends'¶ Comment keyword enabling inheritance
-
expand
(rule, ruleinfo)[source]¶ Expands RuleInfo object and children recursively.
Will call :meth:format (via :meth:format_annotated) on str items encountered in the tree and wrap encountered functions to be called once the wildcards object is available.
Set ymp.print_rule = 1 before a rule: statement in snakefiles to enable debug logging of recursion.
- Parameters
rule – The :class:snakemake.rules.Rule object to be populated with the data from the RuleInfo object passed from item
item – The item to be expanded. Initially a :class:snakemake.workflow.RuleInfo object into which expansion recursively descends. May ultimately be None, str, function, int, float, dict, list or tuple.
expand_args – Parameters passed on late expansion (when the dag tries to instantiate the rule into a job).
rec – Recursion level
-
-
class
ymp.snakemake.
NamedList
(fromtuple=None, **kwargs)[source]¶ Bases:
snakemake.io.Namedlist
Extended version of Snakemake’s Namedlist
Fixes array assignment operator: Writing a field via [] operator updates the value accessed via . operator.
Adds fromtuple to constructor: Builds from Snakemake’s typical (args, kwargs) tuples as present in ruleinfo structures.
Adds update_tuple method: Updates values in (args,kwargs) tuples as present in ruleinfo structures.
-
class
ymp.snakemake.
RecursiveExpander
[source]¶ Bases:
ymp.snakemake.BaseExpander
Recursively expands {xyz} wildcards in Snakemake rules.
-
expands_field
(field)[source]¶ Returns true for all fields but shell:, message: and wildcard_constraints.
We don’t want to mess with the regular expressions in the fields in wildcard_constraints:, and there is little use in expanding message: or shell: as these already have all wildcards applied just before job execution (by format_wildcards()).
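A rough sketch of recursive wildcard expansion with a guard against circular references. The function expand_recursive and the flat dictionary of named fields are assumptions made for illustration. RecursiveExpander works on rule objects, and circular parameters raise CircularReferenceException there rather than the ValueError used here.

```python
import re

WILDCARD = re.compile(r"\{([^{}]+)\}")


def expand_recursive(name, fields, _stack=()):
    """Illustrative: expand {xyz} in fields[name] using other entries of fields."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"circular reference: {chain}")

    def replace(match):
        key = match.group(1)
        if key in fields:
            return expand_recursive(key, fields, _stack + (name,))
        return match.group(0)  # unknown wildcards are left for later expansion

    return WILDCARD.sub(replace, fields[name])


fields = {"dir": "out/{project}", "project": "toy", "file": "{dir}/{sample}.txt"}
expanded = expand_recursive("file", fields)
```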
-
-
exception
ymp.snakemake.
RemoveValue
[source]¶ Bases:
Exception
Return to remove a value from the list
-
class
ymp.snakemake.
SnakemakeExpander
[source]¶ Bases:
ymp.snakemake.BaseExpander
Expand wildcards in strings returned from functions.
Snakemake does not do this by default, leaving wildcard expansion to the functions provided themselves. Since we never want
{input}
to be in a string returned as a file, we expand those always.
-
class
ymp.snakemake.
WorkflowObject
(*args, **kwargs)[source]¶ Bases:
object
Base for extension classes defined from snakefiles
This currently encompasses ymp.env.Env and ymp.stage.stage.Stage.
This mixin sets the properties filename and lineno according to the definition source in the rules file. It also maintains a registry within the Snakemake workflow object and provides an accessor method to this registry.
-
property
defined_in
¶
-
property
-
ymp.snakemake.
print_ruleinfo
(rule, ruleinfo, func=<bound method Logger.debug of <Logger ymp.snakemake (WARNING)>>)[source]¶ Logs contents of Rule and RuleInfo objects.
- Parameters
rule (Rule) – Rule object to be printed
ruleinfo (RuleInfo) – Matching RuleInfo object to be printed
func – Function used for printing (default is log.debug)
-
ymp.snakemake.
ruleinfo_fields
= {'benchmark': {'apply_wildcards': True, 'format': 'string'}, 'conda_env': {'apply_wildcards': True, 'format': 'string'}, 'container_img': {'format': 'string'}, 'docstring': {'format': 'string'}, 'func': {'format': 'callable'}, 'input': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards',)}, 'log': {'apply_wildcards': True, 'format': 'argstuple'}, 'message': {'format': 'string', 'format_wildcards': True}, 'norun': {'format': 'bool'}, 'output': {'apply_wildcards': True, 'format': 'argstuple'}, 'params': {'apply_wildcards': True, 'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'resources', 'output', 'threads')}, 'priority': {'format': 'numeric'}, 'resources': {'format': 'argstuple', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'script': {'format': 'string'}, 'shadow_depth': {'format': 'string_or_true'}, 'shellcmd': {'format': 'string', 'format_wildcards': True}, 'threads': {'format': 'int', 'funcparams': ('wildcards', 'input', 'attempt', 'threads')}, 'version': {'format': 'object'}, 'wildcard_constraints': {'format': 'argstuple'}, 'wrapper': {'format': 'string'}}¶ describes attributes of
snakemake.workflow.RuleInfo
ymp.snakemakelexer module¶
ymp.snakemakelexer¶
-
class
ymp.snakemakelexer.
SnakemakeLexer
(*args, **kwds)[source]¶ Bases:
pygments.lexers.python.PythonLexer
-
name
= 'Snakemake'¶ Name of the lexer
-
tokens
= {'globalkeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'root': [('(rule|checkpoint)((?:\\s|\\\\\\s)+)', <function bygroups.<locals>.callback>, 'rulename'), 'rulekeyword', 'globalkeyword', inherit], 'rulekeyword': [(<pygments.lexer.words object>, Token.Keyword)], 'rulename': [('[a-zA-Z_]\\w*', Token.Name.Class, '#pop')]}¶ Dict of
{'state': [(regex, tokentype, new_state), ...], ...}
The initial state is ‘root’. new_state can be omitted to signify no state transition. If it is a string, the state is pushed on the stack and changed. If it is a tuple of strings, all states are pushed on the stack and the current state will be the topmost. It can also be combined('state1', 'state2', ...) to signify a new, anonymous state combined from the rules of two or more existing ones. Furthermore, it can be ‘#pop’ to signify going back one step in the state stack, or ‘#push’ to push the current state on the stack again.
The tuple can also be replaced with include('state'), in which case the rules from the state named by the string are included in the current one.
-
ymp.sphinxext module¶
This module contains a Sphinx extension for documenting YMP stages and Snakemake rules.
The SnakemakeDomain (name sm) provides the following directives:
-
.. sm:rule::
name
¶ Describes a Snakemake rule
Both directives accept an optional source parameter. If given, a link to the source code of the stage or rule definition will be added. The format of the string passed is filename:line. Referenced Snakefiles will be highlighted with pygments and added to the documentation when building HTML.
The extension also provides an autodoc-like directive:
-
.. autosnake::
filename
¶ Generates documentation from Snakefile filename.
-
class
ymp.sphinxext.
AutoSnakefileDirective
(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶ Bases:
docutils.parsers.rst.Directive
Implements RSt directive
.. autosnake:: filename
The directive extracts docstrings from rules in snakefile and auto-generates documentation.
-
ymp.sphinxext.
BASEPATH
= '/home/docs/checkouts/readthedocs.org/user_builds/ymp/checkouts/latest/src'¶ Path in which YMP package is located
- Type
-
class
ymp.sphinxext.
CondaDomain
(env)[source]¶ Bases:
sphinx.domains.Domain
-
name
= 'conda'¶ should be short, but unique
- Type
domain name
-
-
class
ymp.sphinxext.
DomainTocTreeCollector
[source]¶ Bases:
sphinx.environment.collectors.EnvironmentCollector
Add Sphinx Domain entries to the TOC
-
clear_doc
(app, env, docname)[source]¶ Clear data from environment
If we have cached data in environment for document
docname
, we should clear it here.
- Return type
-
merge_other
(app, env, docnames, other)[source]¶ Merge with results from parallel processes
Called if Sphinx is processing documents in parallel. We should merge this from
other
into env
for all docnames
.
- Return type
-
process_doc
(app, doctree)[source]¶ Process
doctree
This is called by
read-doctree
, so after the doctree has been loaded. The signal is processed in registered first order, so we are called after built-in extensions, such as the sphinx.environment.collectors.toctree
extension building the TOC.
- Return type
-
select_doc_nodes
(doctree)[source]¶ Select the nodes for which entries in the TOC are desired
This is a separate method so that it might be overridden by subclasses wanting to add other types of nodes to the TOC.
- Return type
List
[Node
]
-
-
class
ymp.sphinxext.
SnakemakeDomain
(env)[source]¶ Bases:
sphinx.domains.Domain
Snakemake language domain
-
data_version
= 0¶ data version, bump this when the format of
self.data
changes
-
directives
: Dict[str, Any] = {'rule': <class 'ymp.sphinxext.SnakemakeRule'>, 'stage': <class 'ymp.sphinxext.YmpStage'>}¶ directive name -> directive class
-
get_objects
()[source]¶ Return an iterable of “object descriptions”.
Object descriptions are tuples with six items:
name
Fully qualified name.
dispname
Name to display when searching/linking.
type
Object type, a key in
self.object_types
.
docname
The document where it is to be found.
anchor
The anchor name for the object.
priority
How “important” the object is (determines placement in search results). One of:
1
Default priority (placed before full-text matches).
0
Object is important (placed before default-priority objects).
2
Object is unimportant (placed after full-text matches).
-1
Object should not show up in search at all.
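The six-item tuples described above can be produced by a small generator. The following is only a sketch: the storage layout (a dict mapping `(type, name)` to a document name), the anchor scheme, and the sample rule name are assumptions for illustration and are not taken from ymp.sphinxext.

```python
from typing import Dict, Iterator, Tuple

# (name, dispname, type, docname, anchor, priority)
ObjectDescription = Tuple[str, str, str, str, str, int]


def iter_object_descriptions(
    objects: Dict[Tuple[str, str], str]
) -> Iterator[ObjectDescription]:
    """Yield one object description per registered object."""
    for (objtype, name), docname in objects.items():
        anchor = f"{objtype}-{name}"  # hypothetical anchor naming scheme
        # Priority 1 is the default: placed before full-text matches
        yield name, name, objtype, docname, anchor, 1


# Hypothetical registry with one rule documented in "rules.rst"
registry = {("rule", "bwa_index"): "rules"}
descriptions = list(iter_object_descriptions(registry))
```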
-
initial_data
: Dict = {'objects': {}}¶ data value for a fresh environment
-
label
= 'Snakemake'¶ longer, more descriptive (used in messages)
- Type
domain label
-
name
= 'sm'¶ should be short, but unique
- Type
domain name
-
object_types
: Dict[str, ObjType] = {'rule': <sphinx.domains.ObjType object>, 'stage': <sphinx.domains.ObjType object>}¶ type (usually directive) name -> ObjType instance
-
resolve_xref
(env, fromdocname, builder, typ, target, node, contnode)[source]¶ Resolve the pending_xref node with the given typ and target.
This method should return a new node, to replace the xref node, containing the contnode which is the markup content of the cross-reference.
If no resolution can be found, None can be returned; the xref node will then be given to the :event:`missing-reference` event, and if that yields no resolution, replaced by contnode.
The method can also raise
sphinx.environment.NoUri
to suppress the :event:`missing-reference` event being emitted.
-
-
class
ymp.sphinxext.
SnakemakeRule
(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶ Bases:
sphinx.util.docutils.SphinxDirective
,Generic
[sphinx.directives.T
]
Directive
sm:rule::
describing a Snakemake rule
-
typename
= 'rule'¶
-
-
class
ymp.sphinxext.
YmpObjectDescription
(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶ Bases:
sphinx.util.docutils.SphinxDirective
,Generic
[sphinx.directives.T
]
Base class for RSt directives in SnakemakeDomain
Since this inherits from Sphinx’ ObjectDescription, content generated by the directive will always be inside an addnodes.desc.
- Parameters
source – Specify source position as
file:line
to create link
-
add_target_and_index
(name, sig, signode)[source]¶ Add cross-reference IDs and entries to
self.indexnode
- Return type
-
handle_signature
(sig, signode)[source]¶ Parse rule signature sig into RST nodes and append them to signode.
The return value identifies the object and is passed to
add_target_and_index()
unchanged
-
option_spec
: Dict[str, DirectiveOption] = {'source': <function unchanged>}¶ Mapping of option names to validator functions.
-
typename
= '[object name]'¶
-
class
ymp.sphinxext.
YmpStage
(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]¶ Bases:
sphinx.util.docutils.SphinxDirective
,Generic
[sphinx.directives.T
]
Directive
sm:stage::
describing a YMP stage
-
typename
= 'stage'¶
-
ymp.string module¶
-
exception
ymp.string.
FormattingError
(message, fieldname)[source]¶ Bases:
AttributeError
-
class
ymp.string.
GetNameFormatter
[source]¶ Bases:
string.Formatter
-
class
ymp.string.
OverrideJoinFormatter
[source]¶ Bases:
string.Formatter
Formatter with overridable join method
The default formatter joins all arguments with
"".join(args)
. This class overrides
_vformat()
with identical code, changing only that line to one that can be overridden by a derived class.
-
class
ymp.string.
PartialFormatter
[source]¶ Bases:
string.Formatter
Formats what it can and leaves the remainder untouched
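One way to get this behaviour with the standard library is sketched below. It is an illustration of the idea, not YMP's implementation: the names `PartialSketch` and `_Missing` are hypothetical, and conversions such as `!r` on unresolved fields are not preserved.

```python
import string


class _Missing:
    """Stand-in for an unresolved field; re-emits the original tag."""

    def __init__(self, field_name):
        self.field_name = field_name

    def __format__(self, spec):
        # Rebuild "{name}" or "{name:spec}" exactly as it was written
        suffix = ":" + spec if spec else ""
        return "{" + self.field_name + suffix + "}"


class PartialSketch(string.Formatter):
    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError):
            # Unknown keyword or positional argument: keep the tag
            return _Missing(field_name), field_name
```

Because the placeholder is reconstructed, the output can be passed through a second formatting step once the remaining values are known.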
-
class
ymp.string.
ProductFormatter
[source]¶ Bases:
ymp.string.OverrideJoinFormatter
String Formatter that creates a list of strings each expanded using one point in the cartesian product of all replacement values.
If none of the arguments evaluate to lists, the result is a string, otherwise it is a list.
>>> ProductFormatter().format("{A} and {B}", A=[1,2], B=[3,4])
"1 and 3"
"1 and 4"
"2 and 3"
"2 and 4"
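The expansion over the cartesian product can be reproduced with `itertools.product`. This is a sketch under stated assumptions: the function name `product_format` is hypothetical, only simple field names (no attribute or index access) are handled, and it does not go through `OverrideJoinFormatter` as the real class does.

```python
import itertools
import string


def product_format(template, **values):
    """Expand `template` once per point in the product of list values."""
    # Collect the field names the template actually uses, in order
    names = []
    for _, field, _, _ in string.Formatter().parse(template):
        if field and field not in names:
            names.append(field)
    list_names = [n for n in names if isinstance(values[n], list)]
    if not list_names:
        # No list arguments: the result is a single string
        return template.format(**values)
    results = []
    for combo in itertools.product(*(values[n] for n in list_names)):
        # Replace each list by the single value chosen for this point
        point = dict(values, **dict(zip(list_names, combo)))
        results.append(template.format(**point))
    return results
```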
-
class
ymp.string.
RegexFormatter
(regex)[source]¶ Bases:
string.Formatter
String Formatter accepting a regular expression defining the format of the expanded tags.
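A formatter of this kind can be built by overriding `string.Formatter.parse()`. The sketch below assumes that the regular expression carries a named group `name` holding the field name, and that format specs and conversions are not supported; both are assumptions for illustration, and the class name `RegexFormatterSketch` is hypothetical.

```python
import re
import string


class RegexFormatterSketch(string.Formatter):
    def __init__(self, regex):
        super().__init__()
        self.regex = re.compile(regex)

    def parse(self, format_string):
        # Yield (literal_text, field_name, format_spec, conversion)
        # tuples, the contract string.Formatter expects from parse()
        start = 0
        for match in self.regex.finditer(format_string):
            yield format_string[start:match.start()], match.group("name"), "", None
            start = match.end()
        # Trailing literal text after the last tag
        yield format_string[start:], None, None, None


# Tags written as $(name) instead of {name}
dollar = RegexFormatterSketch(r"\$\((?P<name>\w+)\)")
expanded = dollar.format("{keep}/$(sample).fq", sample="A")
```

Since braces are no longer special here, text such as `{keep}` passes through unchanged.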
ymp.util module¶
-
ymp.util.
R
(code='', **kwargs)[source]¶ Execute R code
This function executes the R code given as a string. Additional arguments are injected into the R environment. The value of the last R statement is returned.
The function requires rpy2 to be installed.
- Parameters
- Yields
value of last R statement
>>> R("1*1", input=input)
-
ymp.util.
file_not_empty
(fn, minsize=1)[source]¶ Checks if a file is not empty, accounting for the minimum gz file size of 20 bytes
-
ymp.util.
filter_out_empty
(*args)[source]¶ Removes empty sets of files from input file lists.
Takes a variable number of file lists of equal length and removes indices where any of the files is empty. Strings are converted to lists of length 1.
Returns a generator tuple.
Example: r1, r2 = filter_out_empty(input.r1, input.r2)
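The index-wise filtering can be sketched with `zip` and `os.path.getsize`. This is not the ymp.util code: the `_sketch` function names are hypothetical, the way the 20 byte gz minimum is applied is an assumption, and the sketch returns a tuple of lists rather than generators.

```python
import os
import tempfile

GZ_EMPTY_SIZE = 20  # bytes; the gz minimum size named in the docstring


def file_not_empty_sketch(fn, minsize=1):
    """True if `fn` holds at least `minsize` bytes beyond an empty file."""
    size = os.path.getsize(fn)
    if fn.endswith(".gz"):
        # Assumption: a gz file of 20 bytes or less carries no data
        return size >= GZ_EMPTY_SIZE + minsize
    return size >= minsize


def filter_out_empty_sketch(*args):
    """Drop every index at which any of the parallel lists has an empty file."""
    # Strings are converted to lists of length 1
    lists = [[a] if isinstance(a, str) else list(a) for a in args]
    keep = [row for row in zip(*lists)
            if all(file_not_empty_sketch(f) for f in row)]
    return tuple([row[i] for row in keep] for i in range(len(lists)))


# Demo with throwaway files: sample "b" has an empty reverse read file
with tempfile.TemporaryDirectory() as tmp:
    def make(name, data):
        path = os.path.join(tmp, name)
        with open(path, "wb") as fh:
            fh.write(data)
        return path

    r1 = [make("a_R1.fq", b"@read\n"), make("b_R1.fq", b"@read\n")]
    r2 = [make("a_R2.fq", b"@read\n"), make("b_R2.fq", b"")]
    kept_r1, kept_r2 = filter_out_empty_sketch(r1, r2)
    kept_names = [os.path.basename(p) for p in kept_r1 + kept_r2]
    gz_counts_as_empty = not file_not_empty_sketch(make("x.gz", b"\0" * 20))
```

Filtering by index keeps paired files aligned: if either mate of a pair is empty, both are dropped.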
ymp.yaml module¶
-
class
ymp.yaml.
AttrItemAccessMixin
[source]¶ Bases:
object
Mixin class mapping dot to bracket access
Added to classes implementing __getitem__, __setitem__ and __delitem__, this mixin will allow accessing items using dot notation. I.e. “object.xyz” is translated to “object[xyz]”.
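The dot-to-bracket translation can be sketched as a small mixin. The class names `AttrItemSketch` and `Conf` are hypothetical and the details are assumptions; the real mixin may treat private names or error types differently.

```python
class AttrItemSketch:
    """Route attribute access to item access on the host class."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real
        # methods such as dict.keys are still found first
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


class Conf(AttrItemSketch, dict):
    """dict already implements __getitem__, __setitem__, __delitem__."""


conf = Conf(threads=4)
conf.memory = "8G"   # same as conf["memory"] = "8G"
del conf.threads     # same as del conf["threads"]
```

Converting `KeyError` to `AttributeError` keeps `getattr(obj, name, default)` and `hasattr` working as callers expect.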
-
exception
ymp.yaml.
LayeredConfAccessError
(obj, msg, key=None, stack=None)[source]¶ Bases:
ymp.yaml.LayeredConfError
,KeyError
,IndexError
Can’t access
-
exception
ymp.yaml.
LayeredConfError
(obj, msg, key=None, stack=None)[source]¶ Bases:
ymp.exceptions.YmpConfigError
Error in LayeredConf
-
class
ymp.yaml.
LayeredConfProxy
(maps, root=None, parent=None, key=None)[source]¶ Bases:
ymp.yaml.MultiMapProxy
Layered configuration
-
exception
ymp.yaml.
LayeredConfWriteError
(obj, msg, key=None, stack=None)[source]¶ Bases:
ymp.yaml.LayeredConfError
Can’t write
-
exception
ymp.yaml.
MixedTypeError
(obj, msg, key=None, stack=None)[source]¶ Bases:
ymp.yaml.LayeredConfError
Mixed types in proxy collection
-
class
ymp.yaml.
MultiMapProxy
(maps, root=None, parent=None, key=None)[source]¶ Bases:
ymp.yaml.MultiProxy
,ymp.yaml.AttrItemAccessMixin
,collections.abc.Mapping
Mapping Proxy for layered containers
-
class
ymp.yaml.
MultiMapProxyItemsView
(mapping)[source]¶ Bases:
ymp.yaml.MultiMapProxyMappingView
,collections.abc.ItemsView
ItemsView for MultiMapProxy
-
class
ymp.yaml.
MultiMapProxyKeysView
(mapping)[source]¶ Bases:
ymp.yaml.MultiMapProxyMappingView
,collections.abc.KeysView
KeysView for MultiMapProxy
-
class
ymp.yaml.
MultiMapProxyMappingView
(mapping)[source]¶ Bases:
collections.abc.MappingView
MappingView for MultiMapProxy
-
class
ymp.yaml.
MultiMapProxyValuesView
(mapping)[source]¶ Bases:
ymp.yaml.MultiMapProxyMappingView
,collections.abc.ValuesView
ValuesView for MultiMapProxy
-
class
ymp.yaml.
MultiProxy
(maps, root=None, parent=None, key=None)[source]¶ Bases:
object
Base class for layered container structure
-
class
ymp.yaml.
MultiSeqProxy
(maps, root=None, parent=None, key=None)[source]¶ Bases:
ymp.yaml.MultiProxy
,ymp.yaml.AttrItemAccessMixin
,collections.abc.Sequence
Sequence Proxy for layered containers
-
ymp.yaml.
load
(files, root=None)[source]¶ Load configuration files
Creates a
LayeredConfProxy
configuration object from a set of YAML files. Files listed later will override parts of earlier included files.
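The override order can be illustrated with `collections.ChainMap` on already parsed dictionaries. This is only an analogy for top-level keys: the function name `layer_configs` and the sample settings are hypothetical, no YAML is read, and the nested merging done by the proxy classes is not modelled.

```python
from collections import ChainMap


def layer_configs(*layers):
    """Combine config dicts so that later layers win over earlier ones."""
    # ChainMap searches its maps left to right, so the last layer
    # given must come first
    return ChainMap(*reversed(layers))


defaults = {"threads": 1, "tmpdir": "/tmp"}  # e.g. from a first file
project = {"threads": 8}                     # e.g. from a later file
conf = layer_configs(defaults, project)
```

The individual layers stay untouched, so it remains possible to tell which file a value came from.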