This module filters input reads. Among other things, it can trim polyN read tails, remove reads that are too short and discard reads with a low mean base quality. Eoulsan provides a plugin system for read filters. To enable a filter, set a parameter of this filter. If the filter takes no option, add a parameter whose name is the name of the filter and whose value is an empty string.
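As a minimal sketch, here is how the two kinds of filter parameters might appear in the `<parameters>` element of a step. `paircheck` takes no option, so its value is left empty, while `maxlength.maximum.length.threshold` needs a value. The threshold of 500 bases is an arbitrary illustration value, not a recommendation.

```xml
<parameters>
  <!-- Filter without option: the value is left empty -->
  <parameter>
    <name>paircheck</name>
    <value></value>
  </parameter>
  <!-- Filter with an option: a value is required (500 is an arbitrary example) -->
  <parameter>
    <name>maxlength.maximum.length.threshold</name>
    <value>500</value>
  </parameter>
</parameters>
```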
Filters are applied to the reads in the same order as their parameters appear in the workflow file. Therefore, the count of reads filtered by each filter, as reported in the log, depends on the order of the filter parameters in the workflow file.
When the parameter type is none, the filter does not read the value of the parameter, so the value can be left empty.
Warning: some filters modify the output reads (e.g. the trimpolynend filter removes the polyN tails of the reads). Consequently, a filter like the quality filter will not produce the same output if it is declared before or after the trim filter.
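A minimal sketch of one of the two orderings the warning refers to, showing only the `<parameters>` element of a step (the quality threshold of 30 is an arbitrary example value). Here the quality filter is declared first, so the mean quality is computed on the reads before their polyN tails are trimmed.

```xml
<parameters>
  <!-- 1. Quality filter first: mean quality is computed on untrimmed reads -->
  <parameter>
    <name>quality.threshold</name>
    <value>30</value>
  </parameter>
  <!-- 2. Then the polyN tails of the remaining reads are trimmed -->
  <parameter>
    <name>trimpolynend</name>
    <value></value>
  </parameter>
</parameters>
```

Swapping the two `<parameter>` elements makes the quality filter run on the already trimmed reads, which may keep a different set of reads.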
Parameter | Type | Description | Default value | Modifies reads |
---|---|---|---|---|
paircheck | none | If enabled, check that both ends of a read pair have the same identifier. | N/A | No |
pairedend.accept.paired.end | boolean | Remove all paired-end reads if false. | Not set | No |
pairedend.accept.single.end | boolean | Remove all single-end reads if false. | Not set | No |
illuminaid | none | If enabled, remove all reads that do not pass the Illumina filters. | N/A | No |
quality.threshold | float | The threshold for the mean base quality, as a decimal quality score. | Not set | No |
trimpolynend | none | If enabled, trim the polyN tails of the reads. | N/A | Yes |
length.minimal.length.threshold | integer | The minimal threshold for the read length, in bases. | Not set | No |
trim.length.threshold | integer | The minimal threshold for the read length, in bases. This filter also trims the polyN tails of the reads. This filter is deprecated; use trimpolynend and length.minimal.length.threshold instead. | Not set | Yes |
readnamestartwith.forbidden.prefixes | string | Remove all reads whose identifier starts with one of the comma-separated prefixes. | Not set | No |
readnamestartwith.allowed.prefixes | string | Keep only the reads whose identifier starts with one of the comma-separated prefixes. | Not set | No |
readnameregex.forbidden.regex | string | Remove all reads whose identifier matches the regular expression. | Not set | No |
readnameregex.allowed.regex | string | Keep only the reads whose identifier matches the regular expression. | Not set | No |
hadoop.reducer.task.count | integer | The number of Hadoop reducer tasks to use for this step. This parameter is only used in Hadoop mode. | Not set | N/A |
maxlength.maximum.length.threshold | integer | The maximal threshold for the read length, in bases. | Not set | No |
readsequenceregex.forbidden.regex | string | Remove all reads whose sequence matches the regular expression. | Not set | No |
readsequenceregex.allowed.regex | string | Keep only the reads whose sequence matches the regular expression. | Not set | No |
slidingwindow.arguments | string | Cut the read once the average quality within a sliding window falls below a threshold. | Not set | Yes |
trailing.arguments | string | Remove low-quality bases from the end of the read. | Not set | Yes |
leading.arguments | string | Remove low-quality bases from the beginning of the read. | Not set | Yes |
headcrop.arguments | string | Remove the specified number of bases from the beginning of the read. | Not set | Yes |
crop.arguments | string | Remove bases from the end of the read, regardless of quality. | Not set | Yes |
nanoporesequencetype.keep | string | Keep only one type of Nanopore reads. Available values are template, complement and consensus. For 1D sequencing, use the consensus value to keep all the reads. | consensus | No |
polyatail.minimal.length | integer | Minimal length of the polyA/polyT tail. This filter only adds a "tail_type" field to the read headers. | 10 | Header |
polyatail.maximal.error.rate | float | Maximal error rate allowed in polyA/polyT tails. | 0.1 | Header |
polyatail.minimal.length.for.error.rate.computation | integer | Minimal length of the tail sequence before the error rate is computed. | 5 | Header |
reversepolyt | none | Reverse complement the reads that have a "tail_type=polyT" field in their header. | N/A | Yes |
removeinvalidpolya.allowed.tail.type | string | Keep only the reads with the specified value(s) of the "tail_type" field in their header. Other reads are discarded. | polyA,polyT | No |
ggghead | none | Search for a GGG head and a CCC tail and add additional fields to the read headers. | N/A | Header |
requireggghead.allow.mismatch | boolean | Remove any sequence without a GGG head. If true, one mismatch is allowed in the GGG sequence. | true | No |
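The polyA/polyT filters in the table depend on each other through the "tail_type" header field. Since filters run in the order of their parameters, the filters that read this field (removeinvalidpolya, reversepolyt) presumably need to be declared after the polyatail parameters that add it. A sketch of such a sequence, using the documented default values:

```xml
<parameters>
  <!-- Add a "tail_type" field to the read headers (documented defaults) -->
  <parameter>
    <name>polyatail.minimal.length</name>
    <value>10</value>
  </parameter>
  <parameter>
    <name>polyatail.maximal.error.rate</name>
    <value>0.1</value>
  </parameter>
  <!-- Keep only the reads whose "tail_type" field is polyA or polyT -->
  <parameter>
    <name>removeinvalidpolya.allowed.tail.type</name>
    <value>polyA,polyT</value>
  </parameter>
  <!-- Reverse complement the reads tagged "tail_type=polyT" -->
  <parameter>
    <name>reversepolyt</name>
    <value></value>
  </parameter>
</parameters>
```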
```xml
<!-- Filter reads step -->
<step id="myfilterreadsstep" skip="false" discardoutput="true">
  <module>filterreads</module>
  <parameters>
    <parameter>
      <name>illuminaid</name>
      <value></value>
    </parameter>
    <parameter>
      <name>trimpolynend</name>
      <value></value>
    </parameter>
    <parameter>
      <name>length.minimal.length.threshold</name>
      <value>40</value>
    </parameter>
    <parameter>
      <name>quality.threshold</name>
      <value>30</value>
    </parameter>
  </parameters>
</step>
```
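Filter options are not limited to numbers. This sketch of a step's `<parameters>` element uses a boolean to remove all single-end reads and a comma-separated list of forbidden identifier prefixes. The prefixes `SRR1` and `SRR2` are hypothetical placeholder values.

```xml
<parameters>
  <!-- Remove all single-end reads -->
  <parameter>
    <name>pairedend.accept.single.end</name>
    <value>false</value>
  </parameter>
  <!-- Remove reads whose identifier starts with one of these prefixes
       (hypothetical placeholder values) -->
  <parameter>
    <name>readnamestartwith.forbidden.prefixes</name>
    <value>SRR1,SRR2</value>
  </parameter>
</parameters>
```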