Aozan is a tool that automatically handles the data generated by Illumina sequencers, from the end of sequencing to demultiplexing, while also performing quality control. One of the greatest strengths of Aozan is that it does not require any user action to process data.
Each step of post-sequencing data processing is fairly easy to perform. However, each step (data transfer, demultiplexing and quality control) takes a long time, and before the data are ready for analysis, the user must watch for the end of each step to avoid wasting time. Running these tasks after every sequencing run is laborious. Aozan saves time by automating all of them. In addition, Aozan provides a Bcl2fastq CSV samplesheet generator that takes an XLS or XLSX file as input, which avoids common syntax errors in the CSV file and allows the use of aliases for the index sequences. This online tool is available here.
Aozan is not an interactive tool; it communicates with users by email. It is launched regularly (usually every hour) by a cron job. There are 5 steps in Aozan. Once the end of a run has been detected, synchronization, demultiplexing and quality control are executed automatically. However, if the end of another run is detected at the end of one of these last 4 steps, the synchronization of the new run is launched before the analysis of the previous run resumes.
The 6 native steps of Aozan are:
To simplify the installation and configuration of Aozan, we provide a shell script that creates all the directories required by Aozan and a valid configuration file for your system. This script can also download all the files required by the demo (Aozan, raw data and reference data). However, you still need to install the Aozan requirements.
The script is available in the example data section of the documentation.
To run Aozan, you need to install the following software:
On Debian/Ubuntu, you can install the requirements (except Bcl2fastq and BCL Convert) using the 'apt-get' command. Here is an example:
$ sudo apt-get install openjdk-11-jre-headless rsync
The Bcl2fastq conversion software handles BCL conversion and demultiplexing of both unzipped and zipped BCL files. Bcl2fastq 2 can be downloaded from the Illumina website here.
On CentOS, you can install Bcl2fastq using the following commands:
$ cd /tmp
$ wget http://support.illumina.com/content/dam/illumina-support/documents/downloads/software/bcl2fastq/bcl2fastq2-v2-18-0-12-linux-x86-64.zip

# Install
$ unzip bcl2fastq2-*.zip
$ sudo yum -y --nogpgcheck localinstall /tmp/bcl2fastq2-*.rpm

# Work around an error when locating the CSS file used to create the final HTML report
$ cd /usr/local/bin
$ sudo ln -s ../share/

# Install required dependencies
$ sudo yum install -y zip.x86_64
As Bcl2fastq 2 is a static binary, you can also use the RPM package on Debian/Ubuntu using the following commands:
$ cd /tmp
$ wget http://support.illumina.com/content/dam/illumina-support/documents/downloads/software/bcl2fastq/bcl2fastq2-v2-18-0-12-linux-x86-64.zip
$ unzip bcl2fastq2-*.zip
$ sudo alien -i bcl2fastq2-*.rpm
The BCL Convert software handles BCL conversion and demultiplexing of BCL files. BCL Convert can be downloaded from the Illumina website here.
On CentOS, you can install BCL Convert using the following commands:
$ cd /tmp
$ wget https://webdata.illumina.com/downloads/software/bcl-convert/bcl-convert-4.0.3-2.el7.x86_64.rpm

# Install
$ sudo yum --assumeyes install bcl-convert-4.0.3-2.el7.x86_64.rpm
As BCL Convert is just one binary in an RPM package, you can also use the RPM package on Debian/Ubuntu using the following commands:
$ cd /tmp
$ wget https://webdata.illumina.com/downloads/software/bcl-convert/bcl-convert-4.0.3-2.el7.x86_64.rpm
$ sudo alien -i bcl-convert-4.0.3-2.el7.x86_64.rpm
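After installing the requirements, it can be useful to confirm that the expected commands are reachable on the PATH before the first run. The sketch below is only an illustration: the function name `missing_tools` is hypothetical, and the command names `java`, `rsync`, `bcl2fastq` and `bcl-convert` are assumed to be the executables provided by the packages installed above.

```shell
# Print every command from the argument list that is not found on the PATH.
# Returns 0 when all commands are present, 1 otherwise.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example call (adapt the list to the demultiplexing tool you installed):
# missing_tools java rsync bcl2fastq bcl-convert
```

An empty output means that every listed command was found.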
In Aozan, the output of the "Overrepresented Sequences" module from FastQC has been improved. For sequences labelled as "No hit", we launch a blast on the NR databank and report its best hit. This greatly helps for the discovery of contaminating sequences.
Aozan can use Blast2 or Blast+ to perform the blast.
To install ncbi-blast+ on your system (Debian or Ubuntu), use the following command line:
$ sudo apt-get install ncbi-blast+
Now download the required "nt" database from NCBI:
$ wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.??.tar.gz*'
Extract all the archives. The first one, nt.00.tar.gz, contains the file nt.nal.
To update the database later, use the Perl script provided by NCBI; see the NCBI documentation for details.
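The extraction step can be scripted. The following is a minimal sketch, assuming the archives follow the two-character volume naming used in the download command (nt.00.tar.gz, nt.01.tar.gz, ...) and were saved in a single directory. The function name `extract_nt_archives` is hypothetical.

```shell
# Extract every nt.NN.tar.gz archive found in a directory, in place.
# Usage: extract_nt_archives /path/to/blast/db
extract_nt_archives() {
    local dir=$1 archive found=0
    for archive in "$dir"/nt.??.tar.gz; do
        # Skip the unexpanded pattern when no archive matches.
        [ -e "$archive" ] || continue
        tar -xzf "$archive" -C "$dir" || return 1
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no nt.??.tar.gz archive found in $dir" >&2
        return 1
    fi
    # The alias file nt.nal is expected once the first volume is extracted.
    [ -f "$dir/nt.nal" ]
}
```

The function returns a non-zero status if an archive fails to extract or if nt.nal is missing afterwards, so it can be chained with `&&` in a larger setup script.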
The installation of Aozan is very easy: you just have to uncompress the archive:
$ tar xzf aozan-3.1.1.tar.gz
Aozan is written in Python and Java. It uses the Java implementation of Python (Jython) that is bundled in Aozan.
Aozan and its dependencies are available through Docker images. You can:
To see how to install Docker on your system, go to the Docker website. Even though Docker can run in virtual machines on Windows or macOS, we recommend running Aozan only on a Linux host.
You can use a Docker image with Aozan and all its optional dependencies (Bcl2fastq and Blast) instead of installing Aozan manually.
This image is named genomicpariscentre/aozan:3.1.1.
When you use this Docker image, you need to mount all the directories required by Aozan in the Docker container.
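As an illustration of the mounting requirement, the sketch below builds a `docker run` command with one `-v` bind mount per directory, each mounted at the same path inside the container, and prints it without running it. The function name `aozan_docker_command` is hypothetical, the directory list is taken from sample values only, and the command to execute inside the container is deliberately left out because it is not specified here. Paths containing spaces are not handled.

```shell
# Print (without running it) a `docker run` command that bind-mounts each
# given path at the same location inside the container.
# Usage: aozan_docker_command IMAGE DIR...
aozan_docker_command() {
    local image=$1 dir cmd="docker run --rm"
    shift
    for dir in "$@"; do
        cmd="$cmd -v $dir:$dir"
    done
    echo "$cmd $image"
}

# Example with illustrative directories (adapt to your own configuration).
aozan_docker_command genomicpariscentre/aozan:3.1.1 \
    /var/lib/aozan /mnt/storage/bcl /mnt/storage/fastq
```

Mounting each directory at an identical path keeps the paths written in the Aozan configuration file valid both on the host and in the container.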
If you installed Aozan manually, you can still launch bcl2fastq and/or Blast inside a Docker container.
To do this, you only need to set the Aozan bcl2fastq.use.docker configuration property to True for bcl2fastq and qc.conf.fastqc.blast.use.docker to True for Blast.
If you do not use the /var/run/docker.sock socket to communicate with the Docker daemon, you must change the value of the docker.uri setting in the Aozan configuration.
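The two Docker-related properties can be added to a configuration file as follows. This is a sketch only: it assumes the configuration file uses a `key=value` syntax, and the function name `enable_docker_tools` is hypothetical. The docker.uri setting is left out because its expected value format is not described here.

```shell
# Append the Docker-related settings to an Aozan configuration file.
# Assumes a "key=value" syntax; check it against your own aozan.conf.
# Usage: enable_docker_tools /etc/aozan.conf
enable_docker_tools() {
    local conf=$1
    cat >> "$conf" <<'EOF'
bcl2fastq.use.docker=True
qc.conf.fastqc.blast.use.docker=True
EOF
}
```

Keep only the line you need if just one of the two tools should run in a container.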
Aozan is usually launched regularly as a cron job. However, Aozan can also be launched manually.
In the following examples, Aozan is installed in /usr/local/aozan and the configuration file is /etc/aozan.conf.
Note that it is better to configure your aozan.conf file before running Aozan.
The configuration file is a text file and parameters are key-value pairs. See the pages about steps for more details.
In this case, we can launch Aozan with the following command:
$ /usr/local/aozan/aozan.sh /etc/aozan.conf
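Because the configuration file is a plain text file of key-value pairs, a single setting can be inspected from the shell before launching Aozan. The helper below is a minimal sketch that assumes one `key=value` pair per line; the function name `get_conf_value` is hypothetical and is not part of Aozan.

```shell
# Print the value of one property from a key=value configuration file.
# When a key is defined several times, the last definition is printed.
# Usage: get_conf_value KEY FILE
get_conf_value() {
    awk -v key="$1" '
        index($0, "=") == 0 { next }          # not a key=value line
        {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim spaces around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim spaces around the value
            if (k == key) { value = v; found = 1 }
        }
        END { if (!found) exit 1; print value }
    ' "$2"
}

# Example call:
# get_conf_value aozan.var.path /etc/aozan.conf
```

The function exits with a non-zero status when the key is absent, which makes it usable in a pre-flight check.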
In the following lines, we configure our system to launch Aozan every hour using a script named /etc/cron.hourly/aozan (on a Debian/Ubuntu GNU/Linux distribution).

#!/bin/bash

# User to use to launch Aozan
AOZAN_USER=nobody

# Path to Aozan base directory
AOZAN_DIR=/usr/local/aozan

# Path to Aozan configuration
AOZAN_CONF=/etc/aozan.conf

su $AOZAN_USER -c "$AOZAN_DIR/aozan.sh --quiet $AOZAN_CONF"

The --quiet option prevents a message from being displayed if another Aozan instance is currently running.
Then we set the permissions and the owner of the Aozan cron script:
$ sudo chmod 755 /etc/cron.hourly/aozan && sudo chown root:root /etc/cron.hourly/aozan
Aozan can handle several sequencers. For each instrument, you must give the Aozan computer access to the HiSeq output directories. On HiSeq 2000/2500, 2 hard drives are dedicated to each flow cell slot, so you must share each hard drive with the Aozan computer.
You can also choose to force the sequencer to write its data directly to network storage such as a NAS. In this case, you must mount this network storage (preferably using a Unix network file system like NFS) on the computer where Aozan is installed.
First, on the sequencer computer, share the hard drives that contain the generated data (usually F: and G:). To do this, open the file explorer, right-click on each hard drive and select "Share...". The shares can be read-only (recommended).
Security issues: we recommend sharing the sequencer output directories in read-only mode and restricting access to the shares to the Aozan computer. To do this, you can configure the Windows firewall.
$ sudo apt-get install cifs-utils smbclient
$ smbclient -U sbsuser 'smb://hiseq01.example.com/F$'
//hiseq01.example.com/F$ /mnt/hiseq01_f cifs username=sbsuser,password=hiseqpassword 0 0
//hiseq01.example.com/G$ /mnt/hiseq01_g cifs username=sbsuser,password=hiseqpassword 0 0

$ sudo mkdir -p /mnt/hiseq01_f /mnt/hiseq01_g && \
  sudo mount /mnt/hiseq01_f && \
  sudo mount /mnt/hiseq01_g
You can also use autofs to mount the share.
To work, Aozan needs the following directories. The path of these directories must be set in the Aozan configuration file.
An example of an Aozan configuration file can be found here.
Aozan property | Sample value | description |
---|---|---|
aozan.var.path (*) | /var/lib/aozan | Aozan internal data directory. It contains log files and history of processed runs |
aozan.log.path | /var/log/aozan | Path to the Aozan log file |
hiseq.data.path | /mnt/hiseq01_f:/mnt/hiseq01_g | HiSeq output directories. Multiple values are allowed if there are several sequencers or 2 output directories for each flow cell of a HiSeq 2000 (paths separated by ':') |
bcl.data.path | /mnt/storage/bcl | Sequencer output data after synchronization. Usually cif files are not copied to this directory |
fastq.data.path | /mnt/storage/fastq | Directory for the output of demultiplexing with Bcl2fastq |
reports.data.path | /mnt/storage/reports | Directory for the QC report |
bcl2fastq.samplesheet.path | /mnt/storage/samplesheet | Directory with Bcl2fastq sample sheets (with files named like samplesheet_INSTRUMENT-SN_RUN-NUMBER.xls where INSTRUMENT-SN is the instrument serial number and RUN-NUMBER is the run number, e.g. samplesheet_SNL125_0067.xls ) for demultiplexing. If a custom script is used to generate CSV samplesheet files, this directory will not be used. |
tmp.path | /tmp | Temporary directory |
(*) The directory specified in the aozan.var.path field contains the following files. Aozan can process several runs at the same time. At the end of a step, it adds the id of the run that has just been processed to the log file of that step.
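Since several of the properties above must point to existing directories, and hiseq.data.path accepts several paths separated by ':', a quick check of these values can catch an unmounted share early. The sketch below is not part of Aozan; the function name `check_data_paths` is hypothetical.

```shell
# Report every entry of a ':'-separated path list that is not a readable
# directory. Returns 0 when all entries are usable, 1 otherwise.
# Usage: check_data_paths "/mnt/hiseq01_f:/mnt/hiseq01_g"
check_data_paths() {
    local list=$1 dir status=0
    # Split the list on ':' only, for the duration of this function.
    local IFS=:
    for dir in $list; do
        if [ ! -d "$dir" ] || [ ! -r "$dir" ]; then
            echo "not a readable directory: $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

The same function works for single-valued properties such as bcl.data.path or fastq.data.path, since a list with one entry is split into one path.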
This section describes the Aozan global configuration settings. For the step settings, see the steps documentation.
An example of an Aozan configuration file is here.
Aozan property | Type | Default value | description |
---|---|---|---|
include | string | Not set | Load the configuration entries from another configuration file path. The values loaded from this new configuration file override existing values |
aozan.enable | boolean | False | Enable Aozan |
aozan.log.level | string | INFO | Log level (ALL, FINEST, FINER, FINE, CONFIG, INFO, WARNING, SEVERE, OFF) |
aozan.log.start.stop | boolean | False | Log application start and shutdown |
aozan.debug | boolean | False | Enable debug mode |
lock.file | string | /var/lock/aozan.lock | Aozan lock file path. This file prevents two instances of Aozan from running at the same time |
index.html.template | string | Not set | HTML page template that describes a run. If not set, the default template included in the Aozan jar file will be used |
reports.url | string | Not set | Run reports URL |
hiseq.critical.min.space | integer | 1099511627776 | Threshold below which an email is sent at each Aozan start when not enough space is available on the HiSeq output disk. The default value corresponds to 1 TiB expressed in bytes |
read.only.output.files | boolean | True | Set rights of output files to read only |
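The default value of hiseq.critical.min.space is 1024 to the power of 4, that is 1 TiB in bytes. The sketch below verifies this arithmetic and shows one way to compare the space available on a file system with such a threshold. It is not Aozan's own implementation, and the function name `has_min_space` is hypothetical.

```shell
# 1 TiB expressed in bytes, the default of hiseq.critical.min.space.
threshold=$((1024 * 1024 * 1024 * 1024))
echo "$threshold"   # prints 1099511627776

# Succeed when the file system holding DIR has at least BYTES available.
# Usage: has_min_space DIR BYTES
has_min_space() {
    local dir=$1 min_bytes=$2 avail_kb
    # -P gives one POSIX-format line per file system, -k reports 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ $((avail_kb * 1024)) -ge "$min_bytes" ]
}

# Example call:
# has_min_space /mnt/hiseq01_f "$threshold" || echo "low disk space"
```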
Email is the only means Aozan has to inform users. This section shows how to configure email sending in Aozan. Aozan currently only supports sending email using SMTP without authentication and encryption.
Aozan property | Type | Default value | description |
---|---|---|---|
send.mail | boolean | False | Enable sending email |
smtp.server | string | Not set | SMTP server address |
smtp.port | integer | 25 (465 if SSL is enabled) | SMTP server port |
smtp.use.starttls | boolean | False | Use StartTLS to connect to the SMTP server |
smtp.use.ssl | boolean | False | Use SSL to connect to the SMTP server |
smtp.login | string | Not set | Login to use for the connection to the SMTP server |
smtp.password | string | Not set | Password to use for the connection to the SMTP server |
mail.from | string | Not set | Email of the sender |
mail.to | string | Not set | Email recipient |
mail.error.to | string | Not set | Email recipient when an error occurs during an Aozan run |
mail.header | string | THIS IS AN AUTOMATED MESSAGE.\n\n | Email header |
mail.footer | string | \n\nThe Aozan team.\n | Email footer |