1 Learning objectives

  • Select Docker images from Docker Hub
  • Use a Docker container to accomplish tasks
  • Review and use shared directories

In class these exercises will be run on the classroom iMacs.

However, I will provide Windows hints and instructions where possible, although a basic understanding of the Windows command line (e.g. knowing what DOS is) would be very useful for that.

1.1 Requirements

  • Be familiar with Docker, or have followed workshop 1 “Docker - Beginner Biologist 1”

  • Docker will be used from a command-line terminal: Terminal on a Macintosh in the classroom. A rudimentary knowledge of the bash command line is necessary.

  • If you are a Windows user: PowerShell can be used as a terminal. However, setting up Docker to run on Windows is more involved (not covered in class).

  • Docker username: downloads will require a (free) username, so registration is necessary in order to follow the tutorial. Go to https://hub.docker.com and use the button “Sign up for Docker Hub” to register.

2 Set-up

Tutorials will be held in the Biochemistry classroom 201, where Docker has already been installed.

Instructions for installation can be found via the install link¹ on the Docker web site.

Note (HTML version only):

If you are following this document in HTML format, the code is shown with a colored background:

Green background: commands from local computer bash terminal
White background: standard output of programs.
Blue background: commands and output when WITHIN a bash container 
Yellow background: commands or output for information. Do not run!

2.1 Getting started

To get started we need to open a text terminal as detailed below. In class we’ll use a Macintosh.

Do one of the following:

If you are on a Macintosh:

  1. Find the Terminal icon in the /Applications/Utilities directory. Then double-click on the icon and Terminal will open.
  2. OR use the top-right icon that looks like a magnifying glass (Spotlight Search), start typing the word Terminal and press return. Terminal will open.

If you are on a PC:

  1. Find and open PowerShell, e.g. using Windows Search or Cortana. It provides a suitable text-based terminal.

(Note: Windows cmd does not offer the appropriate commands.)

2.2 Version check

This ensures that Docker is properly installed. The exact running version itself is not very important.

At the $ or > prompt within the Terminal, cmd or PowerShell window, type docker --version to check the version currently installed.

docker --version 
Docker version 19.03.5, build 633a0ea
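If you ever want to run this check from a script (for the Mac Terminal), the version number alone can be extracted from the text printed by docker --version. This is a minimal sketch that assumes the output keeps the format shown above ("Docker version X, build Y"); the function name docker_version_number is hypothetical.

```shell
# Hypothetical helper: print only the version number from a line such as
# "Docker version 19.03.5, build 633a0ea" (assumes this output format).
docker_version_number() {
  # The third word is "19.03.5," : awk strips the trailing comma, then prints it.
  printf '%s\n' "$1" | awk '{ sub(/,$/, "", $3); print $3 }'
}

# Only call docker if the command is actually installed.
if command -v docker >/dev/null 2>&1; then
  docker_version_number "$(docker --version)"
else
  echo "docker command not found: check the installation" >&2
fi
```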

2.3 Docker login: Required!

Before going further, you now need to log in with your Docker Hub ID. You should already have created one before this workshop or the previous one. If you need to create an ID now, go to https://hub.docker.com to register.

Docker login:

docker login 
Login with your Docker ID to push and pull images from Docker Hub. 
If you don't have a Docker ID, head over to https://hub.docker.com 
to create one.
Username: YOUR_DOCKER_ID_HERE
Password: 
Login Succeeded
$ 

Note: if you do not log in first, you will receive an error message when trying to use Docker in the next steps.

3 Choosing a docker image

In due time you will be able to create your own docker image. But for now we’ll use images that are available on the Docker hub.

A docker image can contain a single useful program, or it can give access to a whole series of programs. The more software the image contains, the more disk space it is likely to require. For example, the ORCA image (Jackman et al. (2019)) is close to 30 GB in size but contains over 600 bioinformatics programs and utilities.

3.1 EMBOSS

For this series of exercises we’ll look for and use a docker image of the EMBOSS (Rice, Longden, and Bleasby (2000)) series of sequence analysis software.

“The European Molecular Biology Open Software Suite” (EMBOSS) is a free Open Source software analysis package specially developed for the needs of the molecular biology user community.²

EMBOSS contains a large number of sequence analysis tools, and we’ll sample a few of them using Docker.

The purpose of this tutorial is more about learning how to use a Docker container rather than learning EMBOSS itself. However, here are a few links for learning more about EMBOSS for reference:

EMBOSS resource        Link
Home page              http://emboss.sourceforge.net
Tutorial               http://emboss.sourceforge.net/docs/emboss_tutorial/emboss_tutorial.html
Applications           http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/index.html
Grouped by function    http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/groups.html

3.2 EMBOSS on Docker

Anyone with a Docker ID can create and upload images that other users can access, so a large number of Docker images are available. However, they are not all built in the same way, may not contain the same version of the software, and may not have been updated in a long time. Finding a suitable image may therefore require some browsing before deciding which one(s) to download and test.

Open a web browser.

  • Go to https://hub.docker.com
  • Sign-in is optional
  • In the top-left corner enter “EMBOSS” within the search field
[Figure: Docker search field]

The resulting web page will show the results. At the time of writing (Oct 2019) there were 23 results.

How many results did you get? ______________________________________________

By default the results are sorted by “Most Popular”, which may be the best option. The other option, “Recently Updated”, may be a better choice in certain cases.

For today we’ll choose the first one, named “biocontainers/emboss”.
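Optionally, the same search can also be run from the Mac Terminal with the docker search command, which lists image names, descriptions and star counts. This is a sketch assuming Docker is running and the computer is online; the order and number of results may differ from the web page. The QUERY variable name is just an illustration.

```shell
# Search Docker Hub from the terminal; --limit caps the number of results.
QUERY="emboss"

# docker info fails if the Docker daemon is not running or not reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker search --limit 10 "$QUERY"
else
  echo "Docker is not available: use https://hub.docker.com instead" >&2
fi
```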

[Figure: EMBOSS choice 1]

Important note: the full name of the docker image is biocontainers/emboss, made of two words separated by a slash (the user biocontainers and the image emboss). This complete name will be needed later to activate the image.

Click on the biocontainers/emboss box.

The user “biocontainers” provides a large number of other docker images and has well-organized pages. Once you are on that page you’ll see the following tabs:

  • Overview: details about all biocontainers
  • Tags: the available tags (important, see below)
  • Dockerfile: how the docker image was created (will be useful in a future workshop)
  • Builds: not used here; some images can be automatically updated (built)

In the next step we’ll want to pull (download from the hub) the docker image. On the default (Overview) tab you can see, on the right, the command that you can copy to pull the image onto your computer. However, if you were to run it now you would get an error:

docker pull biocontainers/emboss
Using default tag: latest
Error response from daemon: manifest for biocontainers/emboss:latest not found: manifest unknown: manifest unknown

The error occurs because Docker looks for biocontainers/emboss:latest, and no image with the tag latest exists for this repository.

Tags:

This means that we need to talk about tags. When no tag is given, Docker assumes the default tag latest, which is why it usually does not need to be typed. This works most of the time, unless the author(s) of the image decide to use only specific tags. In that case latest does not exist, and the specific tag must be stated explicitly in the pull command.

For example, in the previous workshop we pulled the image for the small Linux distribution called alpine. The command was simply docker pull alpine. Then, when we listed the images with the command docker image ls alpine, we could see that latest was entered under the TAG column:

REPOSITORY    TAG           IMAGE ID        CREATED         SIZE
alpine        latest        11cd0b38bc3c    14 months ago   4.41MB

However, for the biocontainers images, it is necessary to use a specific tag which is listed under the Tags tab of the web page for the container.
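To make the default-tag rule concrete, here is a small sketch (POSIX shell, for the Mac Terminal) that splits an image reference into repository and tag, falling back to latest when no tag is written, as Docker does. The function names image_repo and image_tag are hypothetical, and the sketch ignores digest references (name@sha256:...).

```shell
# Hypothetical helpers showing how an image reference is read:
#   repository[:tag]   (the tag defaults to "latest" when omitted)
# Only the text after the last "/" is checked for ":", so a registry
# port such as "localhost:5000/name" is not mistaken for a tag.

image_tag() {
  last="${1##*/}"                           # text after the last slash
  case "$last" in
    *:*) printf '%s\n' "${last##*:}" ;;     # explicit tag
    *)   printf '%s\n' latest ;;            # implicit default tag
  esac
}

image_repo() {
  last="${1##*/}"
  case "$last" in
    *:*) printf '%s\n' "${1%:*}" ;;         # drop the ":tag" suffix
    *)   printf '%s\n' "$1" ;;
  esac
}

for ref in alpine biocontainers/emboss biocontainers/emboss:v6.6.0dfsg-7b1-deb_cv1; do
  echo "$ref -> repository: $(image_repo "$ref"), tag: $(image_tag "$ref")"
done
```

The second reference, biocontainers/emboss with the implied tag latest, is exactly the one that failed in the pull attempt above, since that tag does not exist for this image.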

Click on the Tags tab of the biocontainers/emboss page.

You will note that by default the tags are sorted by “Latest” (pull-down menu on the right-hand side). At the time of writing the page looked like this:

[Figure: Tags page, sorted by Latest]

As of this writing the latest release of EMBOSS is 6.6.0³, and the tag reflects this in its first few characters: v6.6.0.

Consequence: The pull command must contain the complete specific tag.

Pull biocontainers/emboss image.

From the above information it follows that the pull command shown on the Overview tab will not work as is: a specific tag needs to be added to the request.

To do so, use the mouse to copy the tag and add it to the pull command after a colon, as shown below.

Note: the latest tag might change in the future and may be different from the one used below.

docker pull biocontainers/emboss:v6.6.0dfsg-7b1-deb_cv1

The tag will also be needed later to activate the image (i.e. to turn it into a usable container).
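Because the same repository and tag must be typed again later, it can help to keep them in shell variables. This is a minimal sketch for the Mac Terminal; the variable names IMAGE, TAG and REF are illustrative, and the tag is the one current at the time of writing, so check the Tags tab for yours.

```shell
# Store the repository and tag once, then build the full reference.
IMAGE="biocontainers/emboss"
TAG="v6.6.0dfsg-7b1-deb_cv1"   # copied from the Tags tab; may have changed
REF="${IMAGE}:${TAG}"          # full reference: repository:tag

# Only pull if Docker is installed and its daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker pull "$REF"
  docker image ls "$IMAGE"     # check the TAG column of the pulled image
else
  echo "Docker is not available: cannot pull $REF" >&2
fi
```

Variables set this way only last for the current terminal session.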

In a previous workshop we learned how to list docker images that are currently installed on the system. We can specifically list this one with the following command:

docker image ls biocontainers/emboss
REPOSITORY             TAG                      IMAGE ID        CREATED       SIZE
biocontainers/emboss   v6.6.0dfsg-7b1-deb_cv1   bc147a9dd825    5 weeks ago   638MB

Note the TAG column.

4 EMBOSS container

Now that we have what seems to be an appropriate image with the latest version of EMBOSS, we can activate the image and “dive into it”!

Reminder: to create a container from an image we use the command docker run, which can be altered by a number of modifiers (options). In the following command we’ll add these modifiers, as learned in the previous workshop:

  • -t: “Allocate a pseudo-TTY” (i.e. a text terminal)
  • -i: interactive
  • --rm: “Automatically remove the container when it exits”
  • see complete list with command docker run --help.

Finally, remember that the image tag is mandatory; otherwise you’ll get an error that says:

Unable to find image 'biocontainers/emboss:latest' locally
docker: Error response from daemon: manifest for biocontainers/emboss:latest not found: manifest unknown: manifest unknown.
See 'docker run --help'.
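Putting the modifiers and the mandatory tag together gives a command of the form docker run -it --rm repository:tag. The sketch below (for the Mac Terminal) assumes the tag pulled earlier; IMAGE, TAG and REF are illustrative variable names. It only starts the container when run from a real terminal, since -it needs one.

```shell
# Start an interactive, throw-away container from the tagged image:
#   -i    interactive (keep STDIN open)
#   -t    allocate a pseudo-TTY (text terminal)
#   --rm  remove the container when it exits
IMAGE="biocontainers/emboss"
TAG="v6.6.0dfsg-7b1-deb_cv1"   # mandatory: this image has no "latest" tag
REF="${IMAGE}:${TAG}"

if ! command -v docker >/dev/null 2>&1; then
  echo "docker command not found" >&2
elif [ ! -t 0 ]; then
  # Standard input is not a terminal (e.g. a pipe), so -it would fail.
  echo "not attached to a terminal: run this from Terminal" >&2
else
  docker run -it --rm "$REF"
fi
```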

We’ll now explore the inside of the container…