Optimizing Your Docker Images with Multi-Stage Builds

In the world of containerization, Docker has become a popular tool for packaging applications. However, one common issue arises when building images: they can become unnecessarily large. This is especially true for compiled languages like Java, where the tools needed to build an application differ from what is needed to run it. Fortunately, Docker multi-stage builds solve this problem.

In this post, we'll explore Docker multi-stage builds and show how they help you optimize your Docker images by producing a lightweight final image that contains only the components needed to run the application.

The Problem: Large Docker Images

Imagine building a Java application inside a Docker container. You need the Java Development Kit (JDK) to compile the source code, but only a Java Runtime Environment (JRE) to run it. If the JDK ends up in the final image, that image is much larger than production requires: it takes longer to pull and deploy, and it carries a larger attack surface because we’re shipping tools the application never uses at runtime.
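
To make the problem concrete, here is a sketch of the single-stage approach, assuming a hypothetical App.java in the build context and the openjdk:17-jdk-alpine image. The compiler, the source code, and everything else copied in all remain in the image that gets shipped.

```dockerfile
# Single-stage build (illustrative): build tools and runtime share one image
FROM openjdk:17-jdk-alpine

WORKDIR /usr/src/app

# The source code is copied in and stays in the final image
COPY . .

# javac is only needed here, yet the whole JDK ships to production
# (App.java is a hypothetical file name)
RUN javac App.java

CMD ["java", "App"]
```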

This is where multi-stage builds come in handy. They allow you to:

Use a heavyweight image (like the JDK) to compile the app.
Copy only the necessary output to a smaller image (like the JRE) for the final production container.

What is a Docker Multi-Stage Build?

A multi-stage build is a Docker feature that lets you use multiple FROM statements in a single Dockerfile. Each FROM starts a new build stage, and a later stage can copy artifacts (compiled files, binaries, etc.) from an earlier one with COPY --from. By default, only the last stage ends up in the final image; everything left behind in earlier stages is discarded, which is why the final image can be so much smaller.

Setting Up a Multi-Stage Build for a Java Application

Let’s take a simple example: a Java application called EchoApp that echoes user input.

Here’s the code for EchoApp.java:

import java.util.Scanner;

public class EchoApp {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        System.out.print("Enter something: ");
        String input = scanner.nextLine();
        System.out.println("You entered: " + input);
        scanner.close();
    }
}

To containerize this app efficiently, we’ll use a multi-stage Dockerfile.

Multi-Stage Dockerfile

# Stage 1: Build the application
FROM openjdk:17-jdk-alpine AS build

# Set working directory
WORKDIR /usr/src/app

# Copy source code to the container
COPY . .

# Compile the Java program
RUN javac EchoApp.java

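# Stage 2: Create the runtime environment
# (the official openjdk repository has no 17-jre-alpine tag, so we use Eclipse Temurin's JRE image)
FROM eclipse-temurin:17-jre-alpine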

# Set the working directory
WORKDIR /usr/src/app

# Copy the compiled class from the previous stage
COPY --from=build /usr/src/app/EchoApp.class .

# Command to run the application
CMD ["java", "EchoApp"]

Breaking Down the Dockerfile

Let’s break it down step by step:

Stage 1: Build Stage

FROM openjdk:17-jdk-alpine AS build

The first FROM statement uses the JDK to compile the Java source code. This image is heavier because it includes development tools like javac. We name this stage build.

WORKDIR /usr/src/app
COPY . .
RUN javac EchoApp.java

Here, we set the working directory, copy the application files, and compile the EchoApp.java file to produce EchoApp.class.

Stage 2: Runtime Stage

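FROM eclipse-temurin:17-jre-alpine

The second FROM statement starts a new stage. This time, we’re using a smaller JRE image, which is optimized for running Java applications without the extra overhead of build tools. We use Eclipse Temurin here because the official openjdk repository does not publish a 17-jre-alpine tag.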

WORKDIR /usr/src/app
COPY --from=build /usr/src/app/EchoApp.class .
CMD ["java", "EchoApp"]

We copy the compiled .class file from the build stage into the final runtime image, ensuring that only what’s needed is shipped.
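
Copying a single EchoApp.class works because this example compiles to exactly one class file. As a hedged sketch of how the same idea might extend to an app with several classes, the build stage could package everything into a JAR and the runtime stage could copy only that file. The echo.jar name and the eclipse-temurin:17-jre-alpine runtime image are assumptions made for illustration.

```dockerfile
# Stage 1: compile and package into a runnable JAR
FROM openjdk:17-jdk-alpine AS build
WORKDIR /usr/src/app
COPY . .
# "jar cfe" creates echo.jar (hypothetical name) with EchoApp as the entry point
RUN javac EchoApp.java && jar cfe echo.jar EchoApp *.class

# Stage 2: ship only the JAR (runtime image is an assumption)
FROM eclipse-temurin:17-jre-alpine
WORKDIR /usr/src/app
COPY --from=build /usr/src/app/echo.jar .
CMD ["java", "-jar", "echo.jar"]
```

The runtime stage still contains a single artifact, so the size and security arguments are unchanged.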

Benefits of Multi-Stage Builds

Reduced Image Size: By separating the build environment from the runtime environment, the final image contains only the necessary files to run the application. For Java, this means using the JRE instead of the heavier JDK, resulting in smaller images and faster deployment.
Security: Unnecessary build tools can introduce vulnerabilities. By removing them from the final image, you reduce the attack surface, making the container more secure.
Improved Efficiency: Docker caches the result of each instruction as a layer, stage by stage. When you rebuild, instructions whose inputs haven’t changed are served from the cache instead of being re-run, so unchanged stages cost almost nothing and the build finishes faster.
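
One way to get more out of the cache, offered as a sketch rather than a prescription: copy only the files the compile step needs, so that edits to unrelated files in the build context (a README, say) leave the cached compile layer valid. The runtime image name below is an assumption.

```dockerfile
# Stage 1: build
FROM openjdk:17-jdk-alpine AS build
WORKDIR /usr/src/app
# Copy just the source file instead of the whole context ("COPY . .");
# this layer and the javac layer are reused until EchoApp.java changes
COPY EchoApp.java .
RUN javac EchoApp.java

# Stage 2: runtime (image name is an assumption)
FROM eclipse-temurin:17-jre-alpine
WORKDIR /usr/src/app
COPY --from=build /usr/src/app/EchoApp.class .
CMD ["java", "EchoApp"]
```

Because the first stage is named build, passing --target build to docker build stops after that stage, which can be handy when you want to debug the compile step on its own.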

Building and Running the Image

To build and run your multi-stage Docker image, follow these simple steps:

Build the Image:
```
docker build -t echo-app-multi-stage .
```
Run the Container:
```
docker run -it echo-app-multi-stage
```

The -i flag keeps standard input open and -t allocates a terminal; together (-it) they let you type input into the running container.

Conclusion

Docker multi-stage builds are a powerful feature that allows you to optimize the size and performance of your container images. By separating the build and runtime stages, you can significantly reduce the size of the final image, remove unnecessary tools, and improve security.

In our example, we used this approach to package a simple Java application into a lightweight Docker container by separating the JDK and JRE environments. This method can be applied to other languages and frameworks as well, such as Go, Node.js, or .NET, to ensure you're shipping only what you need.
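
As a rough illustration of how the same pattern might carry over to Go, the sketch below compiles a static binary in one stage and copies it into an empty scratch image. The main.go file, the golang:1.22-alpine tag, and the /out/app path are all assumptions chosen for the example.

```dockerfile
# Stage 1: compile a static binary (main.go is a hypothetical source file)
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY main.go .
# CGO_ENABLED=0 avoids linking against C libraries, so the binary runs on scratch
RUN CGO_ENABLED=0 go build -o /out/app main.go

# Stage 2: the final image contains nothing but the binary
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Here the runtime stage needs no language runtime at all, so the gap between build image and final image is even wider than in the Java case.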

If you're looking to optimize your Docker images, multi-stage builds are a must-have technique in your toolbox.