ymp.stage package
YMP processes data in stages, each of which is contained in its own directory.
with Stage("trim_bbmap") as S:
    S.doc("Trim reads with BBMap")
    rule bbmap_trim:
        output: "{:this:}/{sample}{:pairnames:}.fq.gz"
        input: "{:prev:}/{sample}{:pairnames:}.fq.gz"
        ...
Submodules
ymp.stage.base module
-
class ymp.stage.base.BaseStage(name)
Bases: object
Base class for stage types
-
STAMP_FILENAME = 'all_targets.stamp'
The name of the stamp file that is touched to indicate completion of the stage.
-
can_provide(inputs)
Determines which of inputs this stage can provide.
Returns a dictionary whose keys are a subset of inputs and whose values identify redirections. An empty string indicates that no redirection takes place. Otherwise, the string is the suffix to be appended to the prior StageStack.
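The return convention of can_provide can be illustrated with a small stand-in class. This is only a sketch: ToyStage, its outputs mapping, the input names and the ".ref_genome" suffix are hypothetical and do not reflect YMP's implementation.

```python
class ToyStage:
    """Hypothetical stand-in for a stage; not YMP's actual class."""

    def __init__(self, outputs):
        # Maps each output name to a redirection suffix ("" = no redirection).
        self.outputs = dict(outputs)

    def can_provide(self, inputs):
        # Return only the requested inputs this toy stage is able to provide.
        return {name: self.outputs[name] for name in inputs if name in self.outputs}


stage = ToyStage({"fq": "", "fasta": ".ref_genome"})
provided = stage.can_provide({"fq", "fasta", "bam"})
```

Here "bam" is dropped because the toy stage does not offer it, "fq" maps to an empty string (no redirection), and "fasta" maps to a suffix.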
-
doc(doc)
Add documentation to Stage
- Parameters
doc (str) – Docstring passed to Sphinx
- Return type
None
-
docstring: str
The docstring describing this stage. Visible via ymp stage list and in the generated Sphinx documentation.
-
get_all_targets(stack)
Targets to build to complete this stage given stack.
Typically, this is the StageStack’s path appended with the stamp name.
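As a rough illustration of the typical case, the target is the stamp file inside the stack's directory. The helper name all_targets, the list return type and the example path are assumptions made for this sketch.

```python
import posixpath

STAMP_FILENAME = "all_targets.stamp"  # same value as BaseStage.STAMP_FILENAME


def all_targets(stack_path):
    # Hypothetical helper: stack path joined with the stamp file name.
    return [posixpath.join(stack_path, STAMP_FILENAME)]


targets = all_targets("project.trim_bbmap")  # hypothetical stack path
```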
-
get_inputs()
Returns the set of inputs required by this stage
This function must return a copy, to ensure internal data is not modified.
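The copy requirement can be sketched as follows. CopyingStage and its _inputs attribute are hypothetical names used only to show why returning the internal set directly would be unsafe.

```python
class CopyingStage:
    """Hypothetical stage that protects its internal input set."""

    def __init__(self, inputs):
        self._inputs = set(inputs)

    def get_inputs(self):
        # Return a new set so callers cannot mutate internal state.
        return set(self._inputs)


stage = CopyingStage({"fq"})
requested = stage.get_inputs()
requested.add("bam")  # modifies only the caller's copy
```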
-
get_path(stack)
On disk location for this stage given stack.
Called by StageStack to determine the real path for virtual stages (which must override this function).
- Return type
-
match(name)
Check if the name can refer to this stage
As component of a StageStack, a stage may be identified by alternative names and may also be parametrized by suffix modifiers. Stage types supporting this behavior must override this function.
- Return type
-
name
The name of the stage is a string uniquely identifying it among all stages.
-
-
class ymp.stage.base.ConfigStage(name, cfg)
Bases: ymp.stage.base.BaseStage
Base for stages created via configuration
These Stages derive from the ymp.yml and not from a rules file.
-
cfg
The configuration object defining this Stage.
-
property defined_in
List of files defining this stage
Used to invalidate caches.
-
filename
Semi-colon separated list of file names defining this Stage.
-
lineno
Line number within the first file at which this Stage is defined.
ymp.stage.expander module
-
class ymp.stage.expander.StageExpander
Bases: ymp.snakemake.ColonExpander
Registers rules with stages when they are created
-
class Formatter(expander)
Bases: ymp.snakemake.FormatExpander.Formatter, ymp.string.PartialFormatter
ymp.stage.groupby module
-
class ymp.stage.groupby.GroupBy(name)
Bases: ymp.stage.base.BaseStage
Dummy stage for grouping
ymp.stage.pipeline module
Pipelines Module
Contains classes for pre-configured pipelines comprising multiple stages.
-
class ymp.stage.pipeline.Pipeline(name, cfg)
Bases: ymp.stage.base.ConfigStage
A virtual stage aggregating a sequence of stages, i.e. a pipeline or sub-workflow.
Pipelines are configured via ymp.yml.
Example
pipelines:
  my_pipeline:
    - stage_1
    - stage_2
    - stage_3
-
can_provide(inputs)
Determines which of inputs this stage can provide.
The result dictionary values will point to the “real” output.
-
get_all_targets(stack)
Targets to build to complete this stage given stack.
Typically, this is the StageStack’s path appended with the stamp name.
-
get_path(stack)
On disk location for this stage given stack.
Called by StageStack to determine the real path for virtual stages (which must override this function).
-
property outputs
The outputs of a pipeline are the sum of the outputs of each component stage. Outputs of stages further down the pipeline override those generated earlier.
TODO: Allow hiding the output of intermediary stages.
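The override rule for pipeline outputs can be sketched with a plain dictionary. The function merge_outputs and the output names below are hypothetical; the real property may store different values.

```python
def merge_outputs(stage_outputs):
    """Merge (stage_name, outputs) pairs given in pipeline order."""
    merged = {}
    for stage_name, outputs in stage_outputs:
        for output in outputs:
            # A later stage replaces the entry written by an earlier one.
            merged[output] = stage_name
    return merged


pipeline = [
    ("stage_1", ["fq"]),
    ("stage_2", ["fq", "bam"]),
    ("stage_3", ["vcf"]),
]
merged = merge_outputs(pipeline)
```

In this sketch "fq" ends up attributed to stage_2, because stage_2 comes after stage_1 in the pipeline.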
-
property pipeline
ymp.stage.project module
-
class ymp.stage.project.PandasTableBuilder
Bases: object
Builds the data table describing each sample in a project
This class implements loading and combining tabular data files as specified by the YAML configuration.
- Format:
  - string items are files
  - lists of files are concatenated top to bottom
  - dicts must have one “command” value:
    - ‘join’ contains a two-item list; the two items are joined ‘naturally’ on shared headers
    - ‘table’ contains a list of one-item dicts; dicts have the form key:value[,value...]; an in-place table is created from the keys; a list of dicts is necessary as dicts are unordered
    - ‘paste’ contains a list of tables pasted left to right; tables pasted must be of equal length or length 1
  - if a value is a valid path relative to the csv/tsv/xls file’s location, it is expanded to a path relative to CWD
Example
- top.csv
- join:
  - excel.xlsx%left.csv
  - right.tsv
- table:
  - sample: s1,s2,s3
  - fq1: s1.1.fq, s2.1.fq, s3.1.fq
  - fq2: s1.2.fq, s2.2.fq, s3.2.fq
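The ‘table’ command from the example above can be approximated in plain Python. This is a sketch that does not use pandas and is not the class's actual code: the function build_table and its length check are assumptions.

```python
def build_table(items):
    """Build rows from a list of one-item dicts of the form {key: "v1,v2,..."}."""
    columns = {}
    for item in items:
        [(key, values)] = item.items()  # each dict must hold exactly one entry
        columns[key] = [value.strip() for value in values.split(",")]
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("all columns must have the same number of values")
    header = list(columns)  # column order follows the order of the list
    return [dict(zip(header, row)) for row in zip(*columns.values())]


rows = build_table([
    {"sample": "s1,s2,s3"},
    {"fq1": "s1.1.fq, s2.1.fq, s3.1.fq"},
    {"fq2": "s1.2.fq, s2.2.fq, s3.2.fq"},
])
```

Using a list of one-item dicts keeps the column order explicit, which matches the note that a plain dict would not guarantee ordering.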
-
class ymp.stage.project.Project(name, cfg)
Bases: ymp.stage.base.ConfigStage
Contains configuration for a source dataset to be processed
-
KEY_BCCOL = 'barcode_col'
-
KEY_DATA = 'data'
-
KEY_IDCOL = 'id_col'
-
KEY_READCOLS = 'read_cols'
-
RE_FILE = re.compile('^(?!http://).*(?:fq|fastq)(?:|\\.gz)$')
-
RE_REMOTE = re.compile('^(?:https?|ftp|sftp)://(?:.*)')
-
RE_SRR = re.compile('^[SED]RR[0-9]+$')
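The three patterns can be tried out directly. The patterns below are copied from the attributes above. The classify helper and the order in which it tests the patterns are assumptions for illustration, not the logic used by Project.

```python
import re

RE_FILE = re.compile(r"^(?!http://).*(?:fq|fastq)(?:|\.gz)$")
RE_REMOTE = re.compile(r"^(?:https?|ftp|sftp)://(?:.*)")
RE_SRR = re.compile(r"^[SED]RR[0-9]+$")


def classify(value):
    # Hypothetical helper; the order of the checks is an assumption.
    if RE_SRR.match(value):
        return "srr"
    if RE_REMOTE.match(value):
        return "remote"
    if RE_FILE.match(value):
        return "file"
    return "other"


kinds = [classify(v) for v in (
    "SRR1234567",
    "https://example.org/s1.fq.gz",
    "reads/s1.1.fq.gz",
    "s1",
)]
```

Note that RE_FILE only excludes values starting with "http://", so an "https://" URL ending in ".fq.gz" matches both RE_FILE and RE_REMOTE. That is why the sketch checks RE_REMOTE first.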
-
choose_id_column()
Configures column to use as index on runs
If explicitly configured via KEY_IDCOL, verifies that the column exists and that it is unique. Otherwise chooses the leftmost unique column in the data.
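The selection rule can be sketched without pandas. The standalone function below, its arguments and the exception types are assumptions; the real method works on the project's dataframe and configuration.

```python
def choose_id_column(columns, rows, configured=None):
    """Pick an index column from rows given as a list of dicts."""

    def is_unique(column):
        values = [row[column] for row in rows]
        return len(set(values)) == len(values)

    if configured is not None:
        # Explicit configuration: the column must exist and be unique.
        if configured not in columns:
            raise KeyError(f"column {configured!r} not found")
        if not is_unique(configured):
            raise ValueError(f"column {configured!r} is not unique")
        return configured
    # No configuration: take the leftmost column with unique values.
    for column in columns:
        if is_unique(column):
            return column
    raise ValueError("no unique column found")


columns = ["group", "sample", "fq"]
rows = [
    {"group": "a", "sample": "s1", "fq": "s1.fq"},
    {"group": "a", "sample": "s2", "fq": "s2.fq"},
]
chosen = choose_id_column(columns, rows)
```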
-
property data
Pandas dataframe of runs
Lazy loading property, first call may take a while.
-
property fq_names
Names of all FastQ files
-
property fwd_fq_names
Names of forward FastQ files (se and pe)
-
property fwd_pe_fq_names
Names of forward FastQ files part of pair
-
get_fq_names(only_fwd=False, only_rev=False, only_pe=False, only_se=False)
Get pipeline names of fq files
-
property idcol
-
property outputs
Returns the set of outputs this stage is able to generate.
May return either a set or a dict with the dictionary values representing redirections in the case of virtual stages such as Pipeline or Reference.
-
property pe_fq_names
Names of paired end FastQ files
-
property project_name
-
property rev_pe_fq_names
Names of reverse FastQ files part of pair
-
property runs
Pandas dataframe index of runs
Lazy loading property, first call may take a while.
-
property se_fq_names
Names of single end FastQ files
-
property source_cfg
-
property variables
-
ymp.stage.reference module
-
class ymp.stage.reference.Archive(name, dirname, tar, url, strip, files)
Bases: object
-
dirname = None
-
files = None
-
hash = None
-
name = None
-
strip_components = None
-
tar = None
-
-
class ymp.stage.reference.Reference(name, cfg)
Bases: ymp.stage.base.ConfigStage
Represents (remote) reference file/database configuration
-
get_path(_stack)
On disk location for this stage given stack.
Called by StageStack to determine the real path for virtual stages (which must override this function).
-
ymp.stage.stack module
-
class ymp.stage.stack.StageStack(path, stage=None)
Bases: object
The “head” of a processing chain - a stack of stages
-
property defined_in
-
classmethod get(path, stage=None)
Cached access to StageStack
- Parameters
path – Stage path
stage – Stage object at head of stack
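Cached access of this kind is commonly implemented with a class-level dictionary. The sketch below assumes the cache is keyed by path alone. ToyStack is a hypothetical class, and the real StageStack may key or populate its cache differently.

```python
class ToyStack:
    """Hypothetical stand-in showing a class-level instance cache."""

    used_stacks = {}  # cache shared by all instances, keyed by path

    def __init__(self, path, stage=None):
        self.path = path
        self.stage = stage

    @classmethod
    def get(cls, path, stage=None):
        # Create the object on first request, then reuse it.
        if path not in cls.used_stacks:
            cls.used_stacks[path] = cls(path, stage)
        return cls.used_stacks[path]


first = ToyStack.get("project.trim_bbmap")  # hypothetical stack path
second = ToyStack.get("project.trim_bbmap")
```

Both calls return the same object, so any per-stack work happens once per path.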
-
property path
On disk location of files provided by this stack
-
property targets
Returns the current targets
-
used_stacks = {}
-
property
ymp.stage.stage module
-
class ymp.stage.stage.Param(stage, key, name, value=None, default=None)
Bases: object
Stage Parameter (base class)
-
property constraint
-
property
-
class ymp.stage.stage.ParamChoice(*args, **kwargs)
Bases: ymp.stage.stage.Param
Stage Choice Parameter
-
class ymp.stage.stage.ParamFlag(*args, **kwargs)
Bases: ymp.stage.stage.Param
Stage Flag Parameter
-
class ymp.stage.stage.ParamInt(*args, **kwargs)
Bases: ymp.stage.stage.Param
Stage Int Parameter
-
class ymp.stage.stage.Stage(name, altname=None, env=None, doc=None)
Bases: ymp.snakemake.WorkflowObject, ymp.stage.base.BaseStage
Creates a new stage
While entered using with, several stage specific variables are expanded within rules:
- {:this:} – The current stage directory
- {:that:} – The alternate output stage directory
- {:prev:} – The previous stage’s directory
- Parameters
-
active = None
Currently active stage (“entered”)
-
add_param(key, typ, name, value=None, default=None)
Add parameter to stage
Example
>>> with Stage("test") as S:
>>>     S.add_param("N", "int", "nval", default=50)
>>>     rule:
>>>         shell: "echo {param.nval}"
This would add a stage “test”, optionally callable as “testN123”, printing “50” or, in the case of “testN123”, printing “123”.
- Parameters
key – The character to use in the Stage name
typ – The type of the parameter (int, flag)
name – Name of parameter in params
value – value {param.xyz} should be set to if param given
default – default value for {param.xyz} if no param given
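How a name such as “testN123” might resolve to a parameter value can be sketched with a regular expression. The function parse_stage_name is hypothetical and handles only a single integer parameter; it is not YMP's matching code.

```python
import re


def parse_stage_name(name, stage="test", key="N", default=50):
    """Return the int parameter encoded in name, or None if name does not match."""
    pattern = rf"{re.escape(stage)}(?:{re.escape(key)}(\d+))?"
    match = re.fullmatch(pattern, name)
    if match is None:
        return None  # name does not refer to this stage
    value = match.group(1)
    # Without a suffix, fall back to the default.
    return default if value is None else int(value)


plain = parse_stage_name("test")
suffixed = parse_stage_name("testN123")
```

With the defaults used here, "test" resolves to 50 and "testN123" to 123, matching the behaviour described in the example above.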
-
env(name)
Add package specifications to Stage environment
Note
This sets the environment for all rules within the stage, which leads to errors with Snakemake rule types not supporting conda environments
- Parameters
name (str) – Environment name or filename

>>> Env("blast", packages="blast =2.7*")
>>> with Stage("test") as S:
>>>     S.env("blast")
>>>     rule testing:
>>>         ...

>>> with Stage("test", env="blast") as S:
>>>     rule testing:
>>>         ...

>>> with Stage("test") as S:
>>>     rule testing:
>>>         conda: "blast"
>>>         ...

- Return type
None
-
get_inputs()
Returns the set of inputs required by this stage
This function must return a copy, to ensure internal data is not modified.
-
match(name)
Check if the name can refer to this stage
As component of a StageStack, a stage may be identified by alternative names and may also be parametrized by suffix modifiers. Stage types supporting this behavior must override this function.
-
property outputs
Returns the set of outputs this stage is able to generate.
May return either a set or a dict with the dictionary values representing redirections in the case of virtual stages such as Pipeline or Reference.
-
require(**kwargs)
Override inferred stage inputs
In theory, this should not be needed. But it’s simpler for now.