4.1 Run Clustal Omega

For help, type clustalo --help.

Assuming that you have installed the software locally and that your working directory contains the filtered sequence file, run clustalo with the input (-i), output (-o), and verbose (-v) options:

clustalo -i spike_filtered.fa -o spike_filtered_omega.fa -v

To run it from within a docker container instead, the following command can be used (shown with the line continuation character \ for clarity):

docker run -it --rm  \
-v $(pwd):/data -w /data \
pegi3s/clustalomega \
-i spike_filtered.fa -o spike_filtered_omega.fa -v

Command Design Note:

This is a typical docker command that runs in an interactive terminal (-it) within a container that is removed upon completion of the task (--rm).

The current directory $(pwd) is mapped (-v) to a directory named /data, which is created within the container and set as the default working directory (-w).

The docker image pulled to create the temporary container is named pegi3s/clustalomega. Its internal installation of Clustal Omega (the clustalo command is implied) runs as soon as the container starts and receives the input (-i) and output (-o) options with their file names. The input file must be present in the working directory, and the output file is written there. The verbose (-v) option prints explicit progress information as the clustalo program runs.
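
The design note above can be condensed into a small wrapper. This is only a sketch: the function name clustalo_docker and the DRY_RUN switch are hypothetical conveniences, not part of docker or Clustal Omega, and the flags are copied unchanged from the command shown earlier. The only deliberate change is the quoting of "$(pwd):/data", which keeps the mapping intact if the current path contains spaces.

```shell
#!/bin/sh
# Hypothetical helper wrapping the docker invocation described above.
# Set DRY_RUN=1 to print the command instead of running it.
clustalo_docker() {
    in_file=$1
    out_file=$2
    # Build the command as a list of arguments so that quoting is preserved.
    set -- docker run -it --rm \
        -v "$(pwd):/data" -w /data \
        pegi3s/clustalomega \
        -i "$in_file" -o "$out_file" -v
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"    # show what would be executed
    else
        "$@"         # run the container
    fi
}

# Dry run: prints the full docker command, nothing is executed.
DRY_RUN=1 clustalo_docker spike_filtered.fa spike_filtered_omega.fa
```

Printing the command first is a cheap way to confirm which host directory will be mapped to /data before the container starts.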

Docker for WINDOWS:

The expression $(pwd) is a command substitution that the shell evaluates on the fly to the current directory in a Unix/Linux/macOS environment.

In PowerShell, Windows users need one more step and use curly brackets:

# step 1: define a variable with the Get-Location command:
$loc = Get-Location

# step 2: run the docker command with curly brackets
# within a PowerShell terminal (the backtick ` continues the line):

docker run -it --rm -v ${loc}:/data -w /data pegi3s/clustalomega `
-i spike_filtered.fa -o spike_filtered_omega.fa -v

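Therefore, the docker run command differs only in replacing $(pwd) with the predefined variable ${loc}, written within curly brackets rather than parentheses. The curly brackets delimit the variable name so that PowerShell does not treat the following colon as part of the variable reference.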

Note that an explicit Windows path could be used instead of the variable, for example C:\Users\someone\somewhere\.

In either case, the following output is echoed on the terminal thanks to the verbose option. The number of threads depends on your CPU.

Using 4 threads
Read 32 sequences (type: Protein) from spike_filtered.fa
not more sequences (32) than cluster-size (100), turn off mBed
Calculating pairwise ktuple-distances...
Ktuple-distance calculation progress done. 
CPU time: 0.56u 0.02s 00:00:00.58 Elapsed: 00:00:00
Guide-tree computation done.
Progressive alignment progress done. 
CPU time: 5.95u 0.68s 00:00:06.63 Elapsed: 00:00:07
Alignment written to spike_filtered_omega.fa
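
A quick sanity check on the result is possible with standard tools. The sketch below assumes only that the output is in FASTA format, as produced above. The helper name alignment_summary is hypothetical. It prints the number of records and the number of distinct sequence lengths. In a valid alignment all records have the same length once gaps are counted, so the second number should be 1.

```shell
#!/bin/sh
# Hypothetical helper: print "<record count> <number of distinct lengths>"
# for a FASTA file. Gap characters count towards the length.
alignment_summary() {
    awk '
        # New header: store the length of the previous record, then reset.
        /^>/ { if (seen) lens[len]++; seen = 1; len = 0; n++; next }
        # Sequence line: strip whitespace and add to the running length.
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { if (seen) lens[len]++
               for (l in lens) distinct++
               print n + 0, distinct + 0 }
    ' "$1"
}

# Usage, once the files exist:
#   alignment_summary spike_filtered.fa        # unaligned input, lengths may differ
#   alignment_summary spike_filtered_omega.fa  # aligned output, second number should be 1
```

For the run shown above, the first number should match the 32 sequences reported in the verbose output.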

Since the sequences are very similar, looking through the aligned sequence file does not provide much insight at first glance, for example with the command:

more spike_filtered_omega.fa

Changing the format from multiple FASTA, where sequences are listed sequentially one after another, to a format where the sequences are “meshed”, “interleaved”, or “interlaced” together in an actual alignment might be helpful.
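
To illustrate what an interleaved view looks like, the following sketch prints one block of alignment columns with the sequences stacked under each other. It is a plain awk illustration, not a substitute for a dedicated conversion tool. The helper name alignment_block and its arguments (file, first column, last column) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print columns START..END of every aligned sequence,
# one sequence per line, so that the same positions line up vertically.
alignment_block() {
    awk -v start="$2" -v end="$3" '
        function emit() {
            # Name truncated or padded to 20 characters, then the column slice.
            if (seen) printf "%-20.20s %s\n", name, substr(seq, start, end - start + 1)
        }
        /^>/ { emit(); name = substr($1, 2); seq = ""; seen = 1; next }
             { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END  { emit() }
    ' "$1"
}

# Usage, once the aligned file exists (columns 1 to 60 of each sequence):
#   alignment_block spike_filtered_omega.fa 1 60
```

Stacking the same column range of every sequence makes substitutions and gaps visible at a glance, which is the point of the interleaved formats.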

For this we can use the EMBOSS⁹ software suite, which has been developed over the years to provide tools for (old-fashioned) sequence analysis.

As with Clustal Omega, EMBOSS can be installed locally or accessed as a docker container. The latter is the easiest option. (See the Introduction, Chapter 1, for suggested material on learning how to use docker.)

⁹ http://emboss.sourceforge.net/