4.1 Run Clustal Omega

For help, type clustalo --help.

Assuming you have installed the software locally and are working in the directory that contains the filtered sequence file, run clustalo with the input (-i), output (-o), and verbose (-v) options:

clustalo -i spike_filtered.fa -o spike_filtered_omega.fa -v
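The command assumes that the input file is present and that clustalo is on the PATH. A minimal sketch of a wrapper that checks both before aligning is shown here; the function name run_clustalo is hypothetical and not part of Clustal Omega.

```shell
# Hypothetical wrapper (not part of Clustal Omega): check the inputs, then align.
run_clustalo() {
    input="$1"
    output="$2"

    # The input file must exist and be non-empty.
    if [ ! -s "$input" ]; then
        echo "Input file missing or empty: $input" >&2
        return 1
    fi

    # clustalo must be installed and reachable through the PATH.
    if ! command -v clustalo >/dev/null 2>&1; then
        echo "clustalo not found on PATH" >&2
        return 127
    fi

    # Same options as the command above: input, output, verbose.
    clustalo -i "$input" -o "$output" -v
}

# Usage:
#   run_clustalo spike_filtered.fa spike_filtered_omega.fa
```

If the output file already exists, clustalo should refuse to overwrite it unless the --force option is added.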

To run from within a Docker container, the following command can be used (shown with the line continuation character \ for clarity):

docker run -it --rm  \
-v $(pwd):/data -w /data \
pegi3s/clustalomega \
-i spike_filtered.fa -o spike_filtered_omega.fa -v 

Command Design Note:

This is a typical docker command: it runs in an interactive terminal (-it) within a container that is removed once the task completes (--rm).

The current directory, $(pwd), is mapped (-v) to a directory named /data, which is created within the container and set as the default working directory (-w).

The pulled docker image used to create the temporary container is named pegi3s/clustalomega. Its internal installation of Clustal Omega (the clustalo command is implied) runs as soon as the container starts and receives the input (-i) and output (-o) options; the input file must be present in the working directory. The final -v, placed after the image name, is the verbose option of clustalo rather than the volume option of docker, and prints explicit information as the clustalo program runs.
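To see how the parts described in this note fit together, the command can be printed instead of run. The sketch below assumes a Unix-like shell; the function name docker_clustalo_cmd is hypothetical and used only for illustration.

```shell
# Hypothetical helper: print (rather than run) the docker command
# so that the expansion of $(pwd) and the role of each option can be inspected.
docker_clustalo_cmd() {
    input="$1"
    output="$2"
    # Options before the image name belong to docker;
    # options after the image name are passed on to clustalo.
    printf 'docker run -it --rm -v "%s":/data -w /data pegi3s/clustalomega -i %s -o %s -v\n' \
        "$(pwd)" "$input" "$output"
}

docker_clustalo_cmd spike_filtered.fa spike_filtered_omega.fa
```

The printed line shows the current directory in place of $(pwd). The sketch also quotes the path, which keeps the command intact if the directory name contains spaces.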

Docker for Windows:

The expression $(pwd), which expands to the current directory, is evaluated on the fly in a Unix/Linux/macOS shell.

Windows users working in PowerShell need one more step and must use curly brackets:

# step 1: define a variable with the Get-Location command:
$loc = Get-Location

# step 2: run the docker command with the variable in curly brackets
# (PowerShell only: Get-Location and ${loc} are not understood by the cmd
# terminal, where the built-in %cd% variable holds the current directory):

docker run -it --rm -v ${loc}:/data -w /data pegi3s/clustalomega -i spike_filtered.fa -o spike_filtered_omega.fa -v

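The docker run command therefore differs only in that $(pwd) is replaced by the predefined variable ${loc}, written within curly brackets rather than parentheses. The curly brackets delimit the variable name so that PowerShell does not read the colon that follows as part of it.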

Note that a literal Windows path could be used instead of the variable, for example C:\Users\someone\somewhere\.

In either case, the verbose option causes the following output to be printed to the terminal. The number of threads depends on your CPU.

Using 4 threads
Read 32 sequences (type: Protein) from spike_filtered.fa
not more sequences (32) than cluster-size (100), turn off mBed
Calculating pairwise ktuple-distances...
Ktuple-distance calculation progress done. 
CPU time: 0.56u 0.02s 00:00:00.58 Elapsed: 00:00:00
Guide-tree computation done.
Progressive alignment progress done. 
CPU time: 5.95u 0.68s 00:00:06.63 Elapsed: 00:00:07
Alignment written to spike_filtered_omega.fa
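Before inspecting the alignment by eye, a quick check can confirm that the output file is consistent with the log. The sketch below uses only standard Unix tools and assumes the output is in FASTA format; the function names are hypothetical.

```shell
# Hypothetical helper: count the sequences (header lines start with ">").
count_sequences() {
    grep -c '^>' "$1"
}

# Hypothetical helper: print the distinct sequence lengths, gaps included.
# In a valid alignment every sequence has the same length,
# so a single number is expected.
alignment_lengths() {
    awk '
        /^>/ { if (name != "") print len; name = $0; len = 0; next }
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { if (name != "") print len }
    ' "$1" | sort -nu
}

# Usage:
#   count_sequences spike_filtered_omega.fa
#   alignment_lengths spike_filtered_omega.fa
```

For the run shown above, the sequence count should match the 32 sequences reported in the log.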

Since the sequences are very similar, looking through the aligned sequence file, for example with the command below, does not provide much insight at first glance:

more spike_filtered_omega.fa

It might help to change the format from multiple FASTA, where sequences are listed one after another, to a format where the sequences are “meshed”, “interleaved”, or “interlaced” together in an actual alignment.
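To illustrate what an interleaved layout means, the following sketch prints an aligned FASTA file in blocks, with each sequence contributing one row per block. It is a toy illustration written with awk, not the method used in this chapter; the function name interleave and the default block width of 60 columns are assumptions.

```shell
# Hypothetical helper: print an aligned FASTA file in interleaved blocks.
# $1 = aligned FASTA file, $2 = block width in columns (default 60).
interleave() {
    awk -v width="${2:-60}" '
        # Header line: store the first word, without the leading ">".
        /^>/ { n++; name[n] = substr($1, 2); seq[n] = ""; next }
        # Sequence line: append to the current sequence.
             { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }
        END {
            total = length(seq[1])
            for (start = 1; start <= total; start += width) {
                for (i = 1; i <= n; i++)
                    printf "%-12s %s\n", name[i], substr(seq[i], start, width)
                print ""   # blank line between blocks
            }
        }
    ' "$1"
}

# Usage:
#   interleave spike_filtered_omega.fa 60 | more
```

This only shows the layout; a dedicated tool remains the better choice for producing a properly formatted alignment.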

For this we can use the EMBOSS [9] software suite, which has been developed over the years to provide tools for (old-fashioned) sequence analysis.

As with Clustal Omega, EMBOSS can be installed locally or run from a Docker container. The latter is the easier option. (See the Introduction, Chapter 1, for suggested material on learning how to use Docker.)