Shrink Your Docker Images by ~50% with Multi-Stage Builds
We'll cover examples with Flask, Django, Rails, Node and Phoenix. The strategy is the same for any app.
Prefer video? Here it is on YouTube.
Back in 2021 I gave a talk at DockerCon where I briefly touched on the topic of multi-stage builds and how you can use them to organize your images and reduce image size.
This post is going to focus on how you can take a specific build stage and split it into multiple stages. The general pattern is splitting your build time and run-time dependencies. This is something that you can apply to any application to save maybe even 50% or more on your final image size depending on how many build time dependencies you have.
We’re also not going to do anything that makes builds perceivably slower or makes your Dockerfile more complicated in a way that hurts readability. Using multi-stage builds does require understanding a new concept, but we’re not going to do any crazy tweaking or introduce non-standard workflows.
It’s also going to work with any base image such as Debian or Alpine.
All of what we’re about to cover is already done in my example Docker starter apps. We’re going to use code snippets from there as a reference here:
- https://github.com/nickjj/docker-flask-example
- https://github.com/nickjj/docker-django-example
- https://github.com/nickjj/docker-rails-example
- https://github.com/nickjj/docker-node-example
- https://github.com/nickjj/docker-phoenix-example
To be able to split up your build time and run-time dependencies we first need to understand how to differentiate them.
# Build Time vs Run-Time Dependencies
Let’s break down both types.
## Build Time Dependencies
A number of programming languages like Python and Ruby have package dependencies that you can install as part of your project. Most of them are likely written in Python or Ruby but sometimes they have C dependencies.
When you have C dependencies or more specifically dependencies that require you to compile them then you need various system files such as a compiler and library related files.
Sometimes package authors pre-compile these for popular operating systems and CPU architectures such as Linux on an AMD64 CPU, but if a pre-built version doesn’t exist for your setup then your system will need to compile them. This process is typically automated with popular package managers and generally “just works”.
But! I’m sure you’ve tried to install certain packages and have seen error messages like missing xxx.c header file for some package, and then you went off to Google how to `apt install` whatever files you needed to fix it.
To get around some of these issues you might install a package like `build-essential` on Debian based images, which has a ton of tools and libraries built-in. This is considered a meta package because it’s a package that pulls in many other packages. It’s also quite huge, weighing in at a couple hundred MB. On Alpine there’s the `alpine-sdk` package which has a bunch of build tools too.
In addition to that, depending on your app you might still need other packages like `libpq-dev` to compile C dependencies for popular Postgres adapters in a few languages. There’s also `libvips-dev` if you want to manipulate images, such as with Rails.
In any case you can classify all of these as build time dependencies. You need them to exist to build and install your packages but once your packages are installed then you don’t need them anymore, at least not most of them.
## Run-Time Dependencies
These are required to start your application so it can run.
If you have a typical Flask / Rails app and you’ve already installed your packages somewhere then you don’t need to lug around 250 MB of system level compilers and library files.
# Multi-Stage Build Strategy
After reading the above, maybe now you can start to visualize how this is going to come together, at least conceptually. If not, that’s ok too.
In stage #1 we can `pip` / `uv` / `bundle` / `yarn` / `mix` / etc. install all of our dependencies to a certain location within our Docker image. Then in stage #2 we can `COPY` those packages into this stage at the location where your language’s package manager expects them.
Then when we build our image we can target stage #2 and Docker will start executing stage #1 because it knows it’s a dependency of stage #2. When it finishes we’ll end up with a nice and tidy Docker image on disk which was built efficiently.
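Stripped of app-specific details, the pattern looks something like this minimal sketch. The stage names, paths and `pip --user` approach here are placeholders for illustration, not taken verbatim from the example apps:

```dockerfile
# Stage 1: install dependencies while compilers are available.
FROM python:3.13-slim AS app-build
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
COPY requirements.txt .
# Installing with --user puts everything under one easily copyable directory.
RUN pip install --user -r requirements.txt

# Stage 2: the run-time image, without build-essential ever being installed.
FROM python:3.13-slim AS app
# Only the installed packages come along for the ride.
COPY --from=app-build /root/.local /root/.local
ENV PATH="${PATH}:/root/.local/bin"
COPY . .
CMD ["python", "app.py"]
```

The key detail is that `COPY --from` references the earlier stage by name, so only the files you explicitly copy survive into the final image.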
I mean, honestly, that’s really it. Python, Ruby, Node or Elixir don’t care how the files got to a specific location. They only care that they exist and are set up with proper user permissions.
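You can see that indifference outside of Docker too. Here’s a tiny self-contained sketch (the `copied_module` name is made up for illustration) showing Python importing a module that was “copied” into place by hand, exactly as it would import one installed by a package manager:

```python
import sys
import tempfile
from pathlib import Path

# Simulate "copying" a package into place, the way COPY --from does:
# write a module file into a directory Python knows nothing about yet.
pkg_dir = Path(tempfile.mkdtemp())
(pkg_dir / "copied_module.py").write_text("ANSWER = 42\n")

# Python doesn't care how the file got there; it only needs the directory
# to be importable (which is what PYTHONPATH / site-packages boil down to).
sys.path.insert(0, str(pkg_dir))
import copied_module

print(copied_module.ANSWER)  # prints 42
```
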
# How Much Shrinkage to Expect?
It depends on the project, but for my example Flask project the app’s image size went from 523MB down to 273MB which is about 50%. Django was about the same. The biggest win is avoiding `build-essential` which is around ~250MB, so you can expect to shrink your image by at least that amount.
I’ve been using multi-stage builds in all of these projects for as long as I can remember but I only ditched `build-essential` at run-time recently.
# Before / After Dockerfile for Multi-Stage Builds
I don’t know about you but my mind works best when I can see the code.
We’ll cover an example from the Flask app which also applies to Django. If you’re using Rails, Node, Phoenix or something else that’s ok too. Nothing about this is super specific to Python. You’ll see, I promise! Also, the demo video will look at those code bases afterwards.
## Before
There’s a lot of code here related to setting up a production ready Docker image that runs as a non-root user.
There are really only about 3 lines of code that are important for this concept. I’ve added comments to call those lines out.
```dockerfile
FROM python:3.13.2-slim-bookworm
LABEL maintainer="Nick Janetakis <nick.janetakis@gmail.com>"

WORKDIR /app

ARG UID=1000
ARG GID=1000

RUN apt-get update \
  && apt-get install -y --no-install-recommends build-essential curl libpq-dev \
  && rm -rf /var/lib/apt/lists/* /usr/share/doc /usr/share/man \
  && apt-get clean \
  && groupadd -g "${GID}" python \
  && useradd --create-home --no-log-init -u "${UID}" -g "${GID}" python \
  && chown python:python -R /app

COPY --from=ghcr.io/astral-sh/uv:0.6.9 /uv /uvx /usr/local/bin/

USER python

# Copy in the package dependency files by themselves to optimize layer caching.
COPY --chown=python:python pyproject.toml uv.lock* ./
COPY --chown=python:python bin/ ./bin

ARG FLASK_DEBUG="false"
ENV FLASK_DEBUG="${FLASK_DEBUG}" \
    FLASK_APP="hello.app" \
    FLASK_SKIP_DOTENV="true" \
    PYTHONUNBUFFERED="true" \
    PYTHONPATH="." \
    UV_COMPILE_BYTECODE=1 \
    UV_PYTHON="/usr/local/bin/python" \
    UV_PROJECT_ENVIRONMENT="/home/python/.local" \
    PATH="${PATH}:/home/python/.local/bin" \
    USER="python"

# This is the command to install our package dependencies.
RUN chmod 0755 bin/* && bin/uv-install

# Copy in all of our app's code.
COPY --chown=python:python . .

RUN if [ "${FLASK_DEBUG}" != "true" ]; then \
  ln -s /public /app/public && SECRET_KEY=dummy flask digest compile && rm -rf /app/public; fi

ENTRYPOINT ["/app/bin/docker-entrypoint-web"]

EXPOSE 8000

CMD ["gunicorn", "-c", "python:config.gunicorn", "hello.app:create_app()"]
```
## After
I’m going to call out some of the differences with comments.
```dockerfile
# Now we use AS to name this stage. It can be named anything that Docker allows,
# I like to add -build to the name to label what it is doing.
FROM python:3.13.2-slim-bookworm AS app-build
LABEL maintainer="Nick Janetakis <nick.janetakis@gmail.com>"

WORKDIR /app

ARG UID=1000
ARG GID=1000

RUN apt-get update \
  && apt-get install -y --no-install-recommends build-essential curl libpq-dev \
  && rm -rf /var/lib/apt/lists/* /usr/share/doc /usr/share/man \
  && apt-get clean \
  && groupadd -g "${GID}" python \
  && useradd --create-home --no-log-init -u "${UID}" -g "${GID}" python \
  && chown python:python -R /app

COPY --from=ghcr.io/astral-sh/uv:0.6.9 /uv /uvx /usr/local/bin/

USER python

COPY --chown=python:python pyproject.toml uv.lock* ./
COPY --chown=python:python bin/ ./bin

# Take note of the UV_PROJECT_ENVIRONMENT path. This is where all of our Python
# packages will be installed to. We'll copy this location in the next stage.
ENV PYTHONUNBUFFERED="true" \
    PYTHONPATH="." \
    UV_COMPILE_BYTECODE=1 \
    UV_PROJECT_ENVIRONMENT="/home/python/.local" \
    PATH="${PATH}:/home/python/.local/bin" \
    USER="python"

RUN chmod 0755 bin/* && bin/uv-install

# Given this is a build stage we don't need to run our app. Technically we can
# copy in our code and run it but it's an unnecessary step.
CMD ["bash"]

###############################################################################

# Here AS has a different name without the -build. This is our end game stage.
FROM python:3.13.2-slim-bookworm AS app
LABEL maintainer="Nick Janetakis <nick.janetakis@gmail.com>"

WORKDIR /app

ARG UID=1000
ARG GID=1000

# Notice how we're not using build-essential here. I like to keep curl around
# but it's not necessary. In this case libpq-dev is necessary because the psycopg
# package requires libpq to exist even at run-time.
RUN apt-get update \
  && apt-get install -y --no-install-recommends curl libpq-dev \
  && rm -rf /var/lib/apt/lists/* /usr/share/doc /usr/share/man \
  && apt-get clean \
  && groupadd -g "${GID}" python \
  && useradd --create-home --no-log-init -u "${UID}" -g "${GID}" python \
  && chown python:python -R /app

USER python

ARG FLASK_DEBUG="false"
ENV FLASK_DEBUG="${FLASK_DEBUG}" \
    FLASK_APP="hello.app" \
    FLASK_SKIP_DOTENV="true" \
    PYTHONUNBUFFERED="true" \
    PYTHONPATH="." \
    UV_PROJECT_ENVIRONMENT="/home/python/.local" \
    PATH="${PATH}:/home/python/.local/bin" \
    USER="python"

# Here's where we're copying in files from the app-build stage. We want the
# Python packages to exist in the same spot since this stage uses them too.
COPY --chown=python:python --from=app-build /home/python/.local /home/python/.local

# We also copy in `uv` itself so it's available to run in this stage. I use
# it for detecting outdated dependencies and also updating lock files, but
# I never use this stage directly for building new images.
COPY --from=app-build /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/

COPY --chown=python:python . .

RUN if [ "${FLASK_DEBUG}" != "true" ]; then \
  ln -s /public /app/public && SECRET_KEY=dummy flask digest compile && rm -rf /app/public; fi

ENTRYPOINT ["/app/bin/docker-entrypoint-web"]

EXPOSE 8000

# Here we run our app's server since it's a web app.
CMD ["gunicorn", "-c", "python:config.gunicorn", "hello.app:create_app()"]
```
As for the `compose.yaml` file, we can set `build.target` to `app` on our app’s service and Docker will take care of the rest for us when we `docker compose build`:
```yaml
services:
  web:
    build:
      context: "."
      target: "app"
      args:
        - "UID=${UID:-1000}"
        - "GID=${GID:-1000}"
        - "FLASK_DEBUG=${FLASK_DEBUG:-false}"
```
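If you’re not using Docker Compose, you can get the same result by targeting the stage directly with a plain `docker build` (the image tag here is a placeholder):

```shell
docker build --target app --build-arg FLASK_DEBUG=false -t myapp .
```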
Now when you build your project you should see a substantial decrease in size. I suggest running `docker image ls` before and after you do it so you can compare the sizes. You’ll also want to run your project to make sure it still starts up.
For example, when I got greedy and tried to remove `libpq-dev` from my Flask app’s `app` stage I got greeted by this error message from `psycopg` (Postgres adapter):
```
web-1  | ImportError: no pq wrapper available.
web-1  | Attempts made:
web-1  | - couldn't import psycopg 'c' implementation: No module named 'psycopg_c'
web-1  | - couldn't import psycopg 'binary' implementation: No module named 'psycopg_binary'
web-1  | - couldn't import psycopg 'python' implementation: libpq library not found
```
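As an aside, if you want to squeeze a bit further: `libpq-dev` mostly adds headers on top of the shared library, and on Debian the run-time library itself ships in the `libpq5` package. This is an assumption to verify against your own adapter, not something the example apps do:

```dockerfile
# Hypothetical slimmer run-time install: keep the libpq shared library
# (libpq5) but drop the -dev headers. Confirm your Postgres adapter still
# imports before shipping a change like this.
RUN apt-get update \
  && apt-get install -y --no-install-recommends curl libpq5 \
  && rm -rf /var/lib/apt/lists/* \
  && apt-get clean
```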
The demo video below covers all of the above in more detail and also goes over the code bases for the other example Docker starter apps to see how it works there.
# Demo Video
## Timestamps
- 1:07 – Build time vs run-time dependencies
- 2:45 – A Dockerfile without multi-stage app builds
- 7:25 – Recap of build time vs run-time dependencies
- 8:00 – A Dockerfile with multi-stage app builds
- 13:30 – Building your project by setting the stage
- 15:03 – Applying it to other projects (Django, Rails, Node, Phoenix)
Were you able to apply this pattern to your project? Let me know below.