Steam train in the forest.
Photo by Mark Plötz from Pexels

The Problem

For some reason, certain containers take an unreasonably long time to stop or to restart. When you time the stop/restart operation, these containers all take roughly 10 seconds to complete the task.

The Causes

There are three reasons why you might experience this issue.

  1. SIGTERM never reaches the process in the target container.
  2. The containerized process does receive SIGTERM, but it doesn't stop.
  3. The containerized application has a long shutdown process.

We are going to go over causes 1 and 2.

The Underlying Causes

Let's say that your goal is to build a new Docker image. You want that image to be as small as possible so that it's fast to download and fast to start up. Therefore, you decide to use a stripped-down Linux image, such as Alpine or BusyBox, as the parent image.
example-dockerfile
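This is commonly done and quite reasonable. However, the Linux init system is one of the things that gets stripped out, and in any case Docker doesn't boot an image like a full OS: it starts the image's default command directly as PID 1. That missing init process is the source of the problem!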

The Linux Init System in a Nutshell

  • It is the first process (aka PID 1) to start running.
  • It is the ancestor of all other processes.
  • It has a number of different jobs, including
    • starting up daemon processes
    • cleaning up after orphaned child processes
    • and forwarding OS signals on to its child processes

How Docker Containers Stop, and Why They Sometimes Don't

When you run a command like docker stop mycontainer, Docker sends the TERM signal to PID 1 running in mycontainer.

  • If PID 1 is the init process - PID 1 will forward the TERM signal on to its child processes, causing them to terminate. Once they have all exited, the init process exits too, and the container stops.
  • If there is no init process - The default application (defined by the ENTRYPOINT or CMD Dockerfile instruction) is PID 1. It is then responsible for handling the TERM signal.
    • When the app doesn't handle SIGTERM - If the application does not listen for SIGTERM, or if it catches SIGTERM but does not run any termination logic, the app won't stop. Therefore, neither will the container.
    • Why a container takes 10 seconds to terminate - After you run the command docker stop mycontainer, Docker waits for a grace period of 10 seconds. If the container hasn't stopped by then, Docker sends SIGKILL to the containerized application. Unlike SIGTERM, SIGKILL cannot be caught, blocked, or ignored, so the kernel abruptly terminates the app without giving it a chance to clean up, and the container stops.
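The stop sequence above can be modelled with a small shell sketch. It is only an illustration, not Docker's implementation: `stop_with_grace` and `is_alive` are hypothetical helper names, they act on an ordinary process ID rather than a container, and the `/proc` state check assumes Linux (elsewhere it falls back to `kill -0`). The grace period is a parameter; Docker's default is 10 seconds, and `docker stop -t` changes it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PID refers to a live, non-zombie process.
# `kill -0` sends no signal; it only checks that the PID exists. Because it
# also succeeds for zombies, on Linux we additionally read the state in /proc.
is_alive() {
  kill -0 "$1" 2>/dev/null || return 1
  [ -r "/proc/$1/stat" ] || return 0
  # /proc/PID/stat starts with "PID (comm) STATE"; strip through ") ".
  state=$(sed 's/.*) //' "/proc/$1/stat" 2>/dev/null | cut -d' ' -f1)
  [ "$state" != "Z" ]
}

# Hypothetical helper modelling the `docker stop` sequence for a plain PID:
# send SIGTERM, wait up to GRACE seconds, then send SIGKILL.
# Usage: stop_with_grace PID [GRACE_SECONDS]
stop_with_grace() {
  target=$1
  grace=${2:-10}   # Docker's default grace period is 10 seconds

  kill -TERM "$target" 2>/dev/null || return 0   # nothing to stop

  waited=0
  while [ "$waited" -lt "$grace" ]; do
    if ! is_alive "$target"; then
      echo "stopped after SIGTERM"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done

  # Grace period expired. SIGKILL cannot be caught, blocked, or ignored.
  kill -KILL "$target" 2>/dev/null || true
  echo "killed after ${grace}s grace period"
}
```

Pointing `stop_with_grace` at a process that ignores SIGTERM makes it return only after the full grace period, which mirrors the 10-second delay described above.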

Why Some Containerized Applications Fail to Receive SIGTERM

If your app does not receive the TERM signal, it's probably because your app is not actually PID 1! Instead, it is a child process of the shell, and the shell is PID 1.

The problem is that the shell does not forward OS signals on to its child processes, so SIGTERM never reaches your app. That's the most common reason a containerized app does not receive the TERM signal.
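You can observe this outside Docker, too. The sketch below is a hypothetical demo that assumes a POSIX `sh` and `mktemp`: it starts a shell that waits on a child `sleep`, sends SIGTERM to the shell only, and then checks whether the child is still alive. Because this shell is not PID 1, the signal's default action kills it, but the signal is never passed on, so the child keeps running.

```shell
#!/bin/sh
# Hypothetical demo: does SIGTERM sent to a shell reach that shell's child?
demo_no_forwarding() {
  pidfile=$(mktemp)

  # The inner shell starts `sleep` as a child, records the child's PID, and waits.
  sh -c 'sleep 30 & echo "$!" > "$1"; wait' sh "$pidfile" > /dev/null &
  shell_pid=$!
  sleep 1                                 # give the inner shell time to start its child
  child_pid=$(cat "$pidfile")

  kill -TERM "$shell_pid"                 # signal the shell only
  wait "$shell_pid" 2>/dev/null || true   # the shell dies (it is not PID 1 here)

  if kill -0 "$child_pid" 2>/dev/null; then
    echo "child still running: the shell did not forward SIGTERM"
  else
    echo "child stopped"
  fi

  kill "$child_pid" 2>/dev/null || true   # clean up the orphaned sleep
  rm -f "$pidfile"
}

demo_no_forwarding
```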

The Source of the Problem

The problem originates in the Dockerfile that created the Docker image. For example, take a look at the following Dockerfile instruction.

ENTRYPOINT ./popcorn.sh

The ENTRYPOINT instruction is in shell form, so Docker runs it as /bin/sh -c ./popcorn.sh. This means that ./popcorn.sh is actually executed by the shell. Therefore, the shell is PID 1.

Solution 1: Use the Exec Form of the ENTRYPOINT Instruction

Instead of using the shell form of the ENTRYPOINT instruction, you should use the exec form, like this.

ENTRYPOINT ["./popcorn.sh"]

Now ./popcorn.sh is executed as PID 1, and it will receive all signals sent to the container. Whether ./popcorn.sh actually traps Linux signals is another matter.

Note 1 - The following exec-form ENTRYPOINT instruction is functionally equivalent to the shell-form example from the previous section.

ENTRYPOINT ["/bin/sh", "-c", "./popcorn.sh"]

Note 2 - Some Dockerfiles use CMD instead of ENTRYPOINT to specify the container's default application. All of these lessons still apply, because CMD can also be specified in both shell form and exec form.

A Common Pitfall: Untrapped Linux Signals

Let's say that we are using the shell form of the ENTRYPOINT instruction...

ENTRYPOINT ./popcorn.sh

...and that the popcorn.sh shell script--which prints the date every second--looks like this:
popcorn-script-2
Next, let's use the Dockerfile to (1) create an image named truek8s/popcorn, (2) use it to create a container...

docker build -t truek8s/popcorn .
docker run -it --name corny --rm truek8s/popcorn

...and (3) in a different terminal, time how long it takes to stop the container:

time docker stop corny

It still takes 10 seconds to terminate the container because popcorn.sh does not trap and handle SIGTERM! The way to fix this is to add signal-handling code to end the process when it traps SIGTERM.
popcorn-script-2-1
(Note that the trap doesn't run as a separate background process. It registers a handler for TERM, and the script keeps looping over the date command until the signal arrives.)
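Because the script itself isn't reproduced here, this is a minimal sketch of a popcorn.sh-style loop with a SIGTERM trap, assuming a POSIX `sh`; the function name `popcorn_with_trap` and the message text are hypothetical. The shell runs a trap handler only after the current foreground command finishes, so with `sleep 1` in the loop the script can take up to about a second to exit.

```shell
#!/bin/sh
# Hypothetical popcorn.sh-style loop: print the date every second and
# exit cleanly when SIGTERM arrives.
popcorn_with_trap() {
  # Register a TERM handler. As PID 1, a script without one ignores SIGTERM.
  trap 'echo "caught SIGTERM, exiting"; exit 0' TERM

  while true; do
    date
    sleep 1   # the handler runs once this foreground sleep returns
  done
}

# Saved as popcorn.sh on its own, the script would end by calling:
# popcorn_with_trap
```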

Solution 2: Use the Linux "exec" Command

If you still want to use the shell form of the ENTRYPOINT instruction, there is an alternative solution: prefix your command with the shell's exec command. For example:

ENTRYPOINT exec ./popcorn.sh

This works because the exec command replaces the shell process with the popcorn.sh process instead of starting popcorn.sh as a child. Now popcorn.sh is PID 1, which means it receives all signals sent to the container!
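A quick way to see what exec does is to compare process IDs. In this sketch (the function names are hypothetical, and a POSIX `sh` is assumed), each shell prints its own PID with `$$`. With exec, the second shell takes over the first shell's process, so both lines match. Without exec, the second shell runs as a child and gets a new PID.

```shell
#!/bin/sh
# Hypothetical helpers: each `sh -c` prints its own PID via $$.

# With exec, the second shell replaces the first, so both lines show the same PID.
pids_with_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Without exec, the first shell forks a child, so the PIDs differ.
# The trailing `true` keeps the inner sh from being the last command,
# which some shells would otherwise exec directly as an optimization.
pids_without_exec() {
  sh -c 'echo "$$"; sh -c "echo \$\$"; true'
}

echo "with exec:";    pids_with_exec
echo "without exec:"; pids_without_exec
```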

Solution 3: Use an Init System

What do you do if the default application does not trap SIGTERM, and you cannot modify the program to fix that? Clearly neither Solution 1 nor Solution 2 will work for you. What you need to do is add an init system to your image/container.

There are quite a few different init systems that we can use. Let's use the Tini init system, which is super lightweight and specifically intended to be used in a container.

All we have to do is update the Dockerfile to

  1. Install tini
  2. Set tini as the default application
  3. Pass popcorn.sh as an argument to tini

RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--", "./popcorn.sh"]

Now, tini runs as PID 1. Furthermore, it forwards the Linux signals that it receives on to its child process--popcorn.sh!
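To make the forwarding behavior concrete, here is a toy shell sketch of the signal-forwarding part of an init process. It is not how tini works internally: tini is a small C program that also reaps orphaned zombie processes, which this sketch does not do, and `mini_init` is a hypothetical name. It starts its command as a child, traps TERM, passes the signal on to the child, and waits until the child has exited.

```shell
#!/bin/sh
# Toy sketch (hypothetical, NOT tini): run a command as a child, forward
# SIGTERM to it, and return the child's exit status.
# Unlike a real init, it does not reap unrelated orphaned processes.
mini_init() {
  "$@" &
  child=$!

  # Pass SIGTERM on to the child instead of ignoring it.
  trap 'kill -TERM "$child" 2>/dev/null || true' TERM

  # A trapped signal makes `wait` return early with a status above 128,
  # so keep waiting until the child has really exited.
  while :; do
    status=0
    wait "$child" || status=$?
    kill -0 "$child" 2>/dev/null || break
  done

  trap - TERM
  return "$status"
}
```

Only TERM is forwarded here: in a non-interactive shell, background commands start with SIGINT ignored, so forwarding INT the same way would not reliably reach the child.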

Note 1 - The apk add --no-cache command is how you install a new package on Alpine Linux. If you want to install tini on an image with a different Linux distro, check out the Tini GitHub page.

Note 2 - You don't need to install tini in your Docker image if you're going to run it with Docker. Docker already ships with a tini-based init. All you need to do is run docker run --init myimage, and you're done. This solution does not work if you are running containers in Kubernetes! That's why I never use this solution.

Do We Still Need to Trap SIGTERM?

Here's a question for you: will popcorn.sh still stop immediately if we remove the code that exits the script when it traps SIGTERM?
popcorn-script-5
The answer is yes. It will stop immediately, and the trap code is no longer required. Here's why.

Consider what happens when a signal is sent to a process. As long as that process is not PID 1 and doesn't trap the signal, the kernel applies the default action associated with that signal. For SIGTERM, the default action is to terminate the process.
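You can check the default action outside a container, where the process is not PID 1. In this sketch (the function names are hypothetical, and a shell such as bash or dash is assumed, since those report a signal death as 128 plus the signal number), a date-printing loop with no trap is sent SIGTERM. It terminates right away, and the reported status is 143, which is 128 plus 15, the number of SIGTERM.

```shell
#!/bin/sh
# Hypothetical popcorn.sh-style loop with NO trap: print the date every second.
popcorn_no_trap() {
  while true; do
    date
    sleep 1
  done
}

# Run the loop in the background (so it is not PID 1), send SIGTERM,
# and report how it exited.
check_default_term_action() {
  popcorn_no_trap > /dev/null &
  pid=$!
  sleep 1
  kill -TERM "$pid"
  status=0
  wait "$pid" || status=$?
  # A status above 128 means "terminated by signal number (status - 128)".
  echo "exit status $status ($(kill -l $((status - 128))))"
}

check_default_term_action
```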

So why won't that same process terminate if it's PID 1? Because PID 1 is normally the init process, the kernel treats it differently from other processes: it does not apply the default action for signals that PID 1 has not installed a handler for. (SIGKILL sent from outside the container is the exception, which is how docker stop eventually succeeds.) That's why, when popcorn.sh is PID 1, we have to include code that explicitly traps the TERM signal and then ends the process, and that's why we need the trap code in Solutions 1 and 2.

Conclusion

I hope that I was able to help you to understand this a little better. Thanks for reading this article.