Chapter 5 Distance matrix
The sequences are very similar to each other as we could observe in the alignment.
But how many amino acids are different between the various sequences?
Another question we could ask is “what is the largest number of differences amongst all the sequences?”
The calculation of a “distance matrix” could help, and clustalo can calculate such a matrix while performing the alignment.
The qualifier --force is only necessary if the calculation needs to be run multiple times (e.g. when testing) to allow the overwriting of a previous file.
clustalo -i spike_filtered.fa -o spike_filtered_omega.fa -v \
--distmat-out=spike_filtered_omega.dist \
--full --force
Or, if using docker: (Windows users can refer to section 4.1 for the Windows-specific command format.)
docker run -it --rm -v "$(pwd)":/data -w /data \
pegi3s/clustalomega -i spike_filtered.fa -o spike_filtered_omega.fa -v \
--distmat-out=spike_filtered_omega.dist \
--full --force
We can look at the text file of the matrix in a viewer that prevents “soft wrapping” of lines, so that each row of the matrix stays on a single line.
Here we print a few truncated lines to explore the format, showing the first 4 lines and the first 60 characters of each line:
32
QIU81885.1 0.000000 0.001571 0.001571 0.001571 0.001571
QIU80913.1 0.001571 0.000000 0.001571 0.001571 0.001571
QIU81585.1 0.001571 0.001571 0.000000 0.001571 0.001571
In this case 32 is the number of sequences and is shown alone on the first line. (The current update now has 167 sequences.)
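A truncated preview like the one above, and a quick check that the count on the first line matches the number of rows, could be sketched with standard tools. This is a minimal sketch: the function names preview_matrix and check_matrix_rows are hypothetical helpers introduced here, and the check assumes the file has no blank lines between rows.

```shell
# Hypothetical helper: print the first N lines of a file, each cut to W characters.
preview_matrix() {
    # $1 = distance matrix file, $2 = number of lines, $3 = characters per line
    head -n "$2" "$1" | cut -c "1-$3"
}

# Hypothetical helper: compare the count on line 1 with the number of matrix rows.
check_matrix_rows() {
    # $1 = distance matrix file
    declared=$(head -n 1 "$1" | tr -d '[:space:]')   # count shown alone on line 1
    rows=$(tail -n +2 "$1" | grep -c .)              # non-empty lines after line 1
    if [ "$declared" -eq "$rows" ]; then
        echo "OK: $rows rows"
    else
        echo "MISMATCH: header says $declared, found $rows rows"
        return 1
    fi
}
```

For example, preview_matrix spike_filtered_omega.dist 4 60 prints the first 4 lines cut to 60 characters. For interactive browsing, less -S spike_filtered_omega.dist chops long lines instead of wrapping them.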
But these numbers are not very useful in themselves.
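One way to start getting something more interpretable out of the matrix is to look for the largest pairwise distance. The sketch below assumes the full square layout shown above (a count on the first line, then one row per sequence with the name in the first column and no blank lines). The function name max_distance is a hypothetical helper, not part of clustalo.

```shell
# Hypothetical helper: report the largest off-diagonal distance and the pair it belongs to.
max_distance() {
    # $1 = distance matrix file in the layout described above
    awk 'NR > 1 && NF > 1 {
             name[NR - 1] = $1                 # row label; file line NR is matrix row NR-1
             for (col = 2; col <= NF; col++) {
                 # field col holds the distance to sequence number col-1; skip the diagonal
                 if (col - 1 != NR - 1 && (!seen || $col + 0 > max)) {
                     max = $col + 0; row = NR - 1; other = col - 1; seen = 1
                 }
             }
         }
         END {
             # all names are known by now, so both labels can be printed
             if (seen) printf "%s %s %.6f\n", name[row], name[other], max
         }' "$1"
}
```

Calling max_distance spike_filtered_omega.dist prints the two sequence names and their distance; when several pairs tie, the first one encountered is reported. How a distance translates into a count of differing amino acids depends on how clustalo defines the distance, which this sketch does not assume.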