The design file is the central element of the pipeline. It contains all the information and descriptions of the experiment. It is a simple tab-delimited plain text file inspired by the limma design file. The design file is usually named design.txt. Here is a sample design file:
SampleNumber  Name  Reads  Genome               Annotation         FastqFormat     Condition  RepTechGroup  Experiment   Reference
1             s1    s1.fq  mouse_build37.fasta  mouse_build37.gff  fastq-illumina  c1         repT1         Experiment1  true
2             s2    s2.fq  mouse_build37.fasta  mouse_build37.gff  fastq-illumina  c2         repT2         Experiment1  false
3             s3    s3.fq  mouse_build37.fasta  mouse_build37.gff  fastq-illumina  c1         repT3         Experiment2  false
4             s4    s4.fq  mouse_build37.fasta  mouse_build37.gff  fastq-illumina  c2         repT3         Experiment2  false
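As an illustration of the format, here is a minimal Python sketch that reads a design file into one dictionary per sample. It is not Eoulsan's own parser: it assumes the columns are separated by tab characters and that the first line holds the field names. The function name read_design and the inline sample content are hypothetical.

```python
import csv
import io

# Hypothetical sample content; columns are separated by tab characters.
SAMPLE_DESIGN = (
    "SampleNumber\tName\tReads\tGenome\tAnnotation\tFastqFormat\tCondition\n"
    "1\ts1\ts1.fq\tmouse_build37.fasta\tmouse_build37.gff\tfastq-illumina\tc1\n"
    "2\ts2\ts2.fq\tmouse_build37.fasta\tmouse_build37.gff\tfastq-illumina\tc2\n"
)


def read_design(handle):
    """Return one dict per sample, keyed by the header field names."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


samples = read_design(io.StringIO(SAMPLE_DESIGN))
for sample in samples:
    print(sample["Name"], sample["Reads"], sample["Condition"])
```

To read a real file, the same function can be given an open file handle, for example `open("design.txt", newline="")`.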
In a design file 3 fields are mandatory:
Users can add any additional fields to this file. Some optional fields are currently used by Eoulsan:
All paths in the design file can be URLs, and Eoulsan handles files compressed with gzip (.gz) or bzip2 (.bz2).
Note: For some fields, such as Genome or Annotation, all the samples of the design file must share a single value.
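The rule in the note above can be checked before launching an analysis. The following Python sketch is only an illustration: the helper name find_multi_valued_fields is hypothetical, and the choice of Genome and Annotation as the fields to check is an assumption taken from the two examples named in the note, not the full list enforced by Eoulsan.

```python
def find_multi_valued_fields(samples, fields=("Genome", "Annotation")):
    """Return the fields that hold more than one distinct value.

    samples is a list of dicts, one per sample, keyed by field name.
    The default field list is an assumption based on the note above.
    """
    problems = {}
    for field in fields:
        values = {sample[field] for sample in samples if field in sample}
        if len(values) > 1:
            problems[field] = sorted(values)
    return problems


# Hypothetical samples: the second one points to another genome file.
design = [
    {"Name": "s1", "Genome": "mouse_build37.fasta", "Annotation": "mouse_build37.gff"},
    {"Name": "s2", "Genome": "mouse_build38.fasta", "Annotation": "mouse_build37.gff"},
]

conflicts = find_multi_valued_fields(design)
for field, values in conflicts.items():
    print(f"Field {field} has several values: {', '.join(values)}")
```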
The paired-end files of a sample can be enclosed in brackets and must be separated by a comma. The following design file shows an example with paired-end files:
SampleNumber  Name  Reads                    Genome               Annotation         FastqFormat   Condition  ReplicateType  UUID
1             s1    [s1a.fq.bz2,s1b.fq.bz2]  mouse_build37.fasta  mouse_build37.gff  fastq-sanger  s1         B              705d190c-de47-4c4f-8ddf-881c9b89ca66
2             s2    [s2a.fq.bz2,s2b.fq.bz2]  mouse_build37.fasta  mouse_build37.gff  fastq-sanger  s2         B              54c8833f-77e1-4e4f-90c2-742e459df7a7
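The bracket notation can be split into individual file names with a few lines of Python. This is a minimal sketch of the syntax described above, not the code Eoulsan uses; the function name parse_reads is hypothetical, and a value without brackets is assumed to be a single-end file.

```python
def parse_reads(value):
    """Split a Reads field into a list of file names.

    "[a.fq,b.fq]" gives ["a.fq", "b.fq"]; a plain "a.fq" gives ["a.fq"].
    """
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        # Drop the brackets, then split on the comma separator.
        return [part.strip() for part in value[1:-1].split(",") if part.strip()]
    return [value]


paired = parse_reads("[s1a.fq.bz2,s1b.fq.bz2]")
single = parse_reads("s1.fq")
print(paired)
print(single)
```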
To avoid duplicating genome and annotation files (and to save disk space), Eoulsan can access central repositories dedicated to these types of data. To access these repositories, Eoulsan supports four protocols: genome, gff, gtf and additionalannotation. See the data repository page for how to define repositories. Here is an example of a design file using the genome and gff protocols:
SampleNumber  Name  Reads  Genome                  Annotation           FastqFormat     Condition  RepTechGroup  Experiment
1             s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT1         project1
12            s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT1         project1
3             s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT2         project1
54            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT2         project1
15            s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT3         project1
5             s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT4         project1
25            s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT3         project1
13            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT3         project1
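To make the notation concrete, the Python sketch below separates the protocol from the resource name in values such as genome://mouse_build37. It only illustrates the syntax shown in the design file above and says nothing about how Eoulsan resolves a name inside a repository. The function name split_repository_reference is hypothetical; the protocol names come from the list given above.

```python
# The four repository protocols named in the text above.
REPOSITORY_PROTOCOLS = {"genome", "gff", "gtf", "additionalannotation"}


def split_repository_reference(value):
    """Return (protocol, name) for a repository reference.

    Values that do not use one of the four protocols (plain paths or
    ordinary URLs) are returned unchanged as (None, value).
    """
    protocol, separator, name = value.partition("://")
    if separator and protocol in REPOSITORY_PROTOCOLS:
        return protocol, name
    return None, value


print(split_repository_reference("genome://mouse_build37"))
print(split_repository_reference("gff://mouse_build37"))
print(split_repository_reference("mouse_build37.fasta"))
```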
To handle replicates, the RepTechGroup field has been added to the design file. Here is an example of a design file using the RepTechGroup field:
SampleNumber  Name  Reads  Genome                  Annotation           FastqFormat     Condition  RepTechGroup  Experiment
1             s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT1         project1
12            s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT1         project1
3             s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT2         project1
54            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT2         project1
15            s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         na            project1
5             s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         NA            project1
25            s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT3         project1
13            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT3         project1
In the example above there are three technical replicate groups. For all samples with the same RepTechGroup value, read counts are pooled. Samples with a na, NA, Na or nA value are not pooled.
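The pooling rule can be sketched in Python as follows. This is an illustration of the grouping logic only, under stated assumptions: the read counts are invented, the function name pool_counts is hypothetical, and pooling is modelled as a plain sum of per-sample counts, which may differ from what Eoulsan actually does. Unpooled samples are keyed by their SampleNumber because sample names are not unique in the example.

```python
from collections import defaultdict


def pool_counts(samples):
    """Sum read counts per RepTechGroup, leaving 'na' samples on their own.

    samples is a list of (sample_number, rep_tech_group, read_count).
    Keys of the result are ("group", name) for pooled groups and
    ("sample", number) for samples whose group is na in any letter case.
    """
    pooled = defaultdict(int)
    for number, group, count in samples:
        if group.lower() == "na":
            pooled[("sample", number)] += count
        else:
            pooled[("group", group)] += count
    return dict(pooled)


# SampleNumber and RepTechGroup follow the example design file above;
# the read counts are hypothetical.
samples = [
    (1, "repT1", 10), (12, "repT1", 20),
    (3, "repT2", 30), (54, "repT2", 40),
    (15, "na", 50), (5, "NA", 60),
    (25, "repT3", 70), (13, "repT3", 80),
]

pooled = pool_counts(samples)
for key, total in pooled.items():
    print(key, total)
```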
For experiments without technical replicates, all RepTechGroup fields must have a na value. Here is an example:
SampleNumber  Name  Reads  Genome                  Annotation           FastqFormat     Condition  RepTechGroup  Experiment
1             s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         na            project1
12            s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         Na            project1
3             s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         na            project1
54            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         NA            project1
15            s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         na            project1
5             s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         NA            project1
25            s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         na            project1
13            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         NA            project1
SampleNumber  Name  Reads  Genome                  Annotation           FastqFormat     Condition  RepTechGroup  Experiment
1             s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT1         project1
12            s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT1         project1
3             s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT2         project1
54            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT2         project1
15            s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c3         repT3         project2
5             s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c3         repT3         project2
25            s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c4         repT4         project2
13            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c4         repT4         project2
SampleNumber  Name  Reads  Genome                  Annotation           FastqFormat     Condition  RepTechGroup  Experiment  Reference
1             s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT1         project1    true
12            s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c1         repT1         project1    false
3             s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT2         project1    false
54            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c2         repT2         project1    false
15            s1    s1.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c3         repT3         project2    false
5             s2    s2.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c3         repT3         project2    false
25            s3    s3.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c4         repT4         project2    false
13            s4    s4.fq  genome://mouse_build37  gff://mouse_build37  fastq-illumina  c4         repT4         project2    false