The workflow file (usually named workflow.xml) is where all the steps to execute and their parameters are set. This file uses XML syntax and is divided into 3 sections: constants, steps and global parameters.
In all parameter values you can use variables (e.g. ${variable}) that contain values for:

- Eoulsan and workflow information (${eoulsan.version}, ${eoulsan.build.number}, ${eoulsan.build.date}, ${design.file.path}, ${workflow.file.path}, ${output.path}, ${job.path}, ${job.id}, ${job.uuid} and ${available.processors})
- Design file header entries (${design.header.Project}, ${design.header.GenomeFile}...)
- Java system properties (e.g. ${java.version})
- Environment variables (e.g. ${PATH}, ${PWD})

Users can also insert the output of a shell command in parameter or attribute values by enclosing the command between backquotes ("`"):

    <value>`cat /proc/cpuinfo | grep processor | wc -l`</value>
    <value>`pwd`/tmp</value>
    <value>`basedir ${user.home}`/tmp</value>
    <step skip="false" discardoutput="true" requiredprocs="`nprocs`"></step>

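As an illustration, a variable and the output of a shell command might be mixed within a single value. The sketch below assumes that both forms can be combined in one value, as the third example above suggests, and uses a hypothetical parameter named my.work.dir; only the ${output.path} and ${job.id} variables and the backquote syntax come from the description above.

```xml
<!-- Hypothetical parameter: the name my.work.dir is an assumption for illustration -->
<parameter>
  <name>my.work.dir</name>
  <!-- ${output.path} and ${job.id} are variables listed above;
       `whoami` stands for any shell command whose output is inserted -->
  <value>${output.path}/${job.id}-`whoami`</value>
</parameter>
```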
All tags must be in lower case. The following listing shows the structure of a typical workflow.xml file:

    <analysis>
      <formatversion>1.0</formatversion>
      <name>my analysis</name>
      <description>Demo analysis</description>
      <author>Laurent Jourdren</author>

      <constants>
        <parameter>
          <name>my.constant</name>
          <value>myconstantvalue</value>
        </parameter>
      </constants>

      <steps>

        <!-- Filter reads -->
        <step id="filterreads" skip="false">
          <module>filterreads</module>
          <parameters>
            <parameter>
              <name>trim.length.threshold</name>
              <value>11</value>
            </parameter>
            <parameter>
              <name>quality.threshold</name>
              <value>12</value>
            </parameter>
          </parameters>
        </step>

        <!-- Map reads -->
        <step id="mapreads" skip="false">
          <module>mapreads</module>
          <parameters>
            <parameter>
              <name>mapper</name>
              <value>bowtie</value>
            </parameter>
            <parameter>
              <name>mapper.arguments</name>
              <value>--best -k 2</value>
            </parameter>
          </parameters>
        </step>

        <!-- SAM filter -->
        <step id="filtersam" skip="false">
          <module>filtersam</module>
          <parameters>
            <parameter>
              <name>removeunmapped</name>
              <value></value>
            </parameter>
            <parameter>
              <name>removemultimatches</name>
              <value></value>
            </parameter>
          </parameters>
        </step>

        <!-- Expression -->
        <step id="expression" skip="false">
          <module>expression</module>
          <parameters>
            <parameter>
              <name>counter</name>
              <value>htseq-count</value>
            </parameter>
            <parameter>
              <name>genomictype</name>
              <value>gene</value>
            </parameter>
            <parameter>
              <name>attributeid</name>
              <value>ID</value>
            </parameter>
            <parameter>
              <name>stranded</name>
              <value>no</value>
            </parameter>
            <parameter>
              <name>overlapmode</name>
              <value>union</value>
            </parameter>
            <parameter>
              <name>removeambiguouscases</name>
              <value>true</value>
            </parameter>
          </parameters>
        </step>

        <!-- Normalization -->
        <step id="normalization" skip="false">
          <module>normalization</module>
          <parameters/>
        </step>

        <!-- Diffana -->
        <step id="diffana" skip="false">
          <module>diffana</module>
          <parameters>
            <parameter>
              <name>disp.est.method</name>
              <value>pooled</value>
            </parameter>
            <parameter>
              <name>disp.est.sharing.mode</name>
              <value>maximum</value>
            </parameter>
            <parameter>
              <name>disp.est.fit.type</name>
              <value>local</value>
            </parameter>
          </parameters>
        </step>

      </steps>

      <globals>
        <parameter>
          <name>main.tmp.dir</name>
          <value>/tmp</value>
        </parameter>
      </globals>

    </analysis>

The first tags of the workflow file (formatversion, name, description and author in the example above) provide some information about the file.
The constants section defines additional variables that can be used in parameter values with the ${variable} syntax. Previously defined constants (and other variables) can be used in a new constant. Note that the constants section is optional.

    <constants>
      <parameter>
        <name>my.constant1</name>
        <value>foo</value>
      </parameter>
      <parameter>
        <name>my.constant2</name>
        <value>${my.constant1}-bar</value>
      </parameter>
    </constants>

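A constant is typically most useful when it is reused in a step parameter. The following is a minimal sketch of that idea, not a complete workflow: the constant name min.quality is hypothetical, while the filterreads module and its quality.threshold parameter are taken from the typical workflow shown earlier.

```xml
<analysis>
  <formatversion>1.0</formatversion>
  <name>constant demo</name>

  <constants>
    <!-- Hypothetical constant, defined once -->
    <parameter>
      <name>min.quality</name>
      <value>12</value>
    </parameter>
  </constants>

  <steps>
    <step id="filterreads" skip="false">
      <module>filterreads</module>
      <parameters>
        <parameter>
          <name>quality.threshold</name>
          <!-- Refers to the constant with the ${variable} syntax -->
          <value>${min.quality}</value>
        </parameter>
      </parameters>
    </step>
  </steps>
</analysis>
```

Changing the value of the constant then updates every parameter that refers to it.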
The steps section contains the list of all the steps to execute. Each step has a module name and parameters, and optionally a version and inputs:
Tag | Type | Optional | Description |
---|---|---|---|
module | string | False | The name of the module executed by the step |
version | string | True | The version of the step to use |
inputs | XML tags | True | Manually defines the data sources used by the step |
parameters | XML tags | True | The parameters of the step |
The step tag can have 6 optional attributes:
Attribute | Type | Default value | Description |
---|---|---|---|
id | string | The name of the module to execute | Defines the identifier of the step. The id value must be unique within a workflow. The identifier is used to name the output files of the step |
discardoutput | string | no | When this attribute is set to success, the output files of the step are saved in the working directory instead of the output directory of the workflow and are removed at the end of the workflow if it succeeds. With asap instead of success, the output files of the step are removed as soon as all the steps that require them have completed. |
skip | boolean | false | The step is skipped if this attribute is set to true |
requiredprocs | integer | -1 | Sets the number of processors used by the step. By default, one processor is used to process each task of a step (except for steps that handle their own parallelization in local mode, like the mapping step). |
requiredmemory | integer | -1 | Sets the amount of memory, in megabytes, required by the step. This value is only used in clusterexec mode. If not set, Eoulsan requests from the cluster scheduler the same amount of memory as is allocated to the Eoulsan JVM. Unit suffixes like MB, M, GB, G can be used for the required memory value (e.g. 8GB). |
dataproduct | string | cross | Sets the method used to combine data before executing a step. By default a cross product is used. If input data that have the same name must be processed together, use the match method instead. |
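To make the attribute values concrete, here is a hedged sketch of a step that uses the non-default values described in the table. The attribute names and the asap, match and 8GB values come from the table; applying them to the mapreads module is only an assumption made for illustration.

```xml
<!-- Outputs are removed as soon as no later step needs them (asap),
     4 processors and 8 GB of memory are requested, and inputs with
     the same name are processed together (match) -->
<step id="mapping" discardoutput="asap" skip="false"
      requiredprocs="4" requiredmemory="8GB" dataproduct="match">
  <module>mapreads</module>
  <parameters>
    <parameter>
      <name>mapper</name>
      <value>bowtie</value>
    </parameter>
  </parameters>
</step>
```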
If the inputs are not set by the user, the Eoulsan workflow engine takes as the data source for each input port the most recent previous step that generates data in the format requested by that port. If the user does not want to use this default source, it can be defined manually using input tags and their port, fromstep and fromport subtags.

    <steps>

      <!-- Filter reads -->
      <step id="myfilterreadstep" discardoutput="true" skip="false" dataproduct="cross">
        <module>filterreads</module>
        <version>2.6.1</version>
        <parameters>
          <parameter>
            <name>trim.length.threshold</name>
            <value>11</value>
          </parameter>
          <parameter>
            <name>quality.threshold</name>
            <value>12</value>
          </parameter>
        </parameters>
      </step>

      <!-- Map reads -->
      <step id="mapping" skip="false" requiredprocs="4" requiredmemory="8GB" dataproduct="cross">
        <module>mapreads</module>
        <version>2.6.1</version>
        <inputs>
          <input>
            <port>reads</port>
            <fromstep>myfilterreadstep</fromstep>
            <fromport>output</fromport>
          </input>
        </inputs>
        <parameters>
          <parameter>
            <name>mapper</name>
            <value>soap</value>
          </parameter>
        </parameters>
      </step>

      ...

    </steps>

The global parameter section contains parameters that are shared by all the steps. The syntax of the global parameters is the same as in the steps.

    <globals>
      <parameter>
        <name>main.tmp.dir</name>
        <value>/home/jourdren/tmp</value>
      </parameter>
    </globals>

The global parameters override the values of the configuration file. For more information about the configuration file see the configuration file page.