4.1 Run Clustal Omega
For help, type clustalo --help.
Assuming that you have installed a local version of the software and are in the directory containing the filtered sequence file, run clustalo with the input (-i), output (-o), and verbose (-v) options:
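A minimal sketch of the local invocation, assuming clustalo is on your PATH and reusing the file names from the docker command in this section. The guard around the call is an addition for illustration, so that nothing runs if the program or the input file is missing:

```shell
# File names reused from the docker example in this section
in_file="spike_filtered.fa"
out_file="spike_filtered_omega.fa"

# Run only if clustalo is installed and the input file is present
if command -v clustalo >/dev/null 2>&1 && [ -f "$in_file" ]; then
    clustalo -i "$in_file" -o "$out_file" -v
else
    echo "clustalo or $in_file not found: nothing was run" >&2
fi
```

If the output file already exists from an earlier run, clustalo is expected to refuse to overwrite it unless the --force option is added.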
To run from within a docker container, the following command can be used (shown with the line-continuation character \ for clarity):
docker run -it --rm \
-v "$(pwd)":/data -w /data \
pegi3s/clustalomega \
-i spike_filtered.fa -o spike_filtered_omega.fa -v
Command Design Note:
This is a typical docker command that will run in an interactive terminal (-it) within a container that will be removed upon completion of the task (--rm).
The current directory $(pwd) is mapped (-v) to a directory named /data that will be created within the container and set as the default working directory (-w).
The pulled docker image used to create the temporary container is named pegi3s/clustalomega. Its internal installation of Clustal Omega (clustalo, implied) runs immediately when the container starts and receives the input (-i) and output (-o) options, whose files are read from and written to the working directory. The verbose (-v) option provides explicit information as the clustalo program runs.
Note that -v appears twice with two different meanings: before the image name it is the docker volume-mapping option, and after the image name it is the clustalo verbose option.
Docker for WINDOWS:
The expression $(pwd), which expands to the current directory, is evaluated on the fly by the shell in a Unix/Linux/macOS environment.
Windows users need one more step and must use curly brackets:
# step 1: define a variable with the Get-Location command:
$loc = Get-Location
# step 2: run the docker command with the variable in curly brackets
# (PowerShell syntax; this variable assignment does not work in cmd):
docker run -it --rm -v ${loc}:/data -w /data pegi3s/clustalomega -i spike_filtered.fa -o spike_filtered_omega.fa -v
Therefore the docker run command only differs by replacing $(pwd) with the predefined variable ${loc}, written within curly brackets rather than parentheses.
Note that an explicit Windows path could be used instead of the variable, for example C:\Users\someone\somewhere\.
In either case the following output will be echoed on the terminal thanks to the verbose option. The number of threads will depend on your CPU.
Using 4 threads
Read 32 sequences (type: Protein) from spike_filtered.fa
not more sequences (32) than cluster-size (100), turn off mBed
Calculating pairwise ktuple-distances...
Ktuple-distance calculation progress done.
CPU time: 0.56u 0.02s 00:00:00.58 Elapsed: 00:00:00
Guide-tree computation done.
Progressive alignment progress done.
CPU time: 5.95u 0.68s 00:00:06.63 Elapsed: 00:00:07
Alignment written to spike_filtered_omega.fa
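As a quick sanity check, the number of sequences in the output can be compared with the 32 reported in the log above. A minimal sketch, assuming standard Unix tools; count_seqs is a hypothetical helper name introduced here, not part of Clustal Omega:

```shell
# count_seqs FILE: print the number of FASTA records (lines starting with ">")
count_seqs() {
    grep -c '^>' "$1"
}

# Compare input and output only if both files exist
if [ -f spike_filtered.fa ] && [ -f spike_filtered_omega.fa ]; then
    echo "input:  $(count_seqs spike_filtered.fa) sequences"
    echo "output: $(count_seqs spike_filtered_omega.fa) sequences"
fi
```

Both counts should be identical, since an alignment adds gap characters but does not add or drop sequences.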
Since the sequences are very similar, looking through the aligned sequences file does not seem to provide much insight at first glance. For example using the command:
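One way to take such a first look, assuming standard Unix tools (an illustrative choice; peek is a hypothetical helper name introduced here):

```shell
# peek FILE [N]: print the first N lines of FILE (default 12)
peek() {
    head -n "${2:-12}" "$1"
}

# Show the beginning of the alignment if it has been produced
if [ -f spike_filtered_omega.fa ]; then
    peek spike_filtered_omega.fa
fi
```

In the aligned file, gaps introduced by the alignment appear as - characters within each sequence.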
It might help to change the format from multiple FASTA, where sequences are shown one after another, to a format where the sequences are “meshed”, “interleaved”, or “interlaced” together in an actual alignment.
For this we can use the EMBOSS [9] software suite, which has been developed over the years to provide tools for (old-fashioned) sequence analysis.
As with Clustal Omega, EMBOSS can be installed locally or accessed as a docker container. The latter is the easiest option. (See the Introduction, Chapter 1, for suggested material on learning how to use docker.)