Benchmarking Debian vs Alpine as a Base Docker Image
Most official Docker images offer both Debian and Alpine based images, but there are some surprising performance differences between the 2.
Ever since Docker announced they were gravitating towards using Alpine in their official base Docker images, I hopped on board and embraced Alpine.
I mean, what’s not to love? It’s a super minimal distribution of Linux with an extremely small attack surface, which makes it seem like a perfect match as a base image inside of a container.
I wrote about how awesome Alpine was over a year ago and have personally been using it all this time without any show-stopping issues.
Why Am I Comparing These 2 Base Images Now?
I definitely didn’t wake up today thinking “gee willikers, I wonder how Debian stacks up to Alpine as a base Docker image in 2018”.
What really happened was, about a month ago I put together an article that went over a few Docker best practices, and it made its way onto Reddit.
To my surprise, it received over 150 upvotes, a 99% upvote rating and a ton of engagement from the Reddit community.
To be honest, that’s why I wrote the post. I’ve been using Docker for a really long time and picked up what I think are some neat patterns over the years, but I really wanted to see what others have been doing so I can improve. I’m constantly learning from others.
Discovering 2 Negative Trends about Alpine
If you go through the Reddit comments, you’ll find a number of people saying that DNS lookups randomly failed in Alpine, and others reporting poor runtime performance for certain types of activities (such as a web app connecting to a database).
DNS lookup issues:
I’m not one to blindly follow 1 person’s comments, but once a few people start to mention the same thing it starts to get interesting. Still, there’s no way I would trust a few comments without some type of scientific proof.
Fortunately, someone linked to a GitHub issue related to Alpine which contains more information. Perhaps this bug will be fixed in due time, but it does look like there’s something weird going on with DNS lookups inside of Alpine.
It may also be related to a 9-year-old bug in BusyBox (Alpine uses BusyBox).
I haven’t experienced this personally, but I’m also not running anything at rofl scale which uses massive Kubernetes clusters, so maybe I’m just not affected by this problem.
Fortunately for us, Alpine has documentation on why DNS lookups might fail and it also explains why I never experienced this issue. Good to know.
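If you want to check whether your own containers are affected, one rough approach is to resolve a hostname over and over from inside the container and count the failures. The sketch below does that in Python 3 with socket.getaddrinfo from the standard library. The probe_dns name and the default of 100 attempts are my own illustrative choices, not something from the GitHub issue or the Alpine docs, and it only measures the symptom; it doesn't tell you the cause.

```python
import socket
import time
from typing import Callable, Tuple


def probe_dns(host: str, attempts: int = 100,
              resolve: Callable[..., object] = socket.getaddrinfo) -> Tuple[int, float]:
    """Hypothetical helper: resolve `host` repeatedly.

    Returns (number_of_failed_lookups, slowest_lookup_in_seconds).
    """
    failures = 0
    slowest = 0.0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolve(host, None)
        except socket.gaierror:
            # getaddrinfo raises socket.gaierror when a name can't be resolved
            failures += 1
        slowest = max(slowest, time.perf_counter() - start)
    return failures, slowest
```

Inside a container you could run probe_dns("postgres"), where "postgres" stands in for whatever hostname your app actually resolves (such as a Compose service name), once on a Debian-based image and once on an Alpine-based one. The resolve parameter is only there so the function can be exercised with a fake resolver.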
Runtime performance woes with common web app cases:
Another Reddit user mentioned their Node app ran 15% slower when using Alpine as a base image compared to Debian. He also mentioned that his Python apps were slower.
This Reddit commenter even said they saw a 35% difference in speed on real-world test suites where they run 500-700 unit tests a day. That really caught my eye.
Of course, I replied to him asking for more specific information because a line like “15% slower” doesn’t say much without knowing the context of the situation.
After a couple of back-and-forths, he supplied some hard numbers: a Python application running in one container performed 10,000 database SELECTs against PostgreSQL running in a different container.
```
# postgres:9.6.3 and python 2.7
Total test time 15.3489780426 seconds
Total test time 13.5786788464 seconds
Total test time 14.2057600021 seconds

# postgres:9.6.3-alpine and python 2.7
Total test time 14.262032032 seconds
Total test time 13.7757499218 seconds
Total test time 14.1344659328 seconds

# postgres 9.6.3 and python:2.7-alpine
Total test time 18.1418809891 seconds
Total test time 16.0904250145 seconds
Total test time 17.1380209923 seconds
```
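To put those numbers in perspective, here's a quick calculation of the mean of each set of 3 runs and how it compares to the all-Debian setup. The figures are copied straight from the results above; everything else is plain arithmetic.

```python
from statistics import mean

# Total test times in seconds, copied from the Reddit commenter's results.
runs = {
    "postgres:9.6.3 + python:2.7": [15.3489780426, 13.5786788464, 14.2057600021],
    "postgres:9.6.3-alpine + python:2.7": [14.262032032, 13.7757499218, 14.1344659328],
    "postgres:9.6.3 + python:2.7-alpine": [18.1418809891, 16.0904250145, 17.1380209923],
}

# Use the all-Debian combination as the baseline for comparison.
baseline = mean(runs["postgres:9.6.3 + python:2.7"])

for label, times in runs.items():
    avg = mean(times)
    change = (avg / baseline - 1) * 100  # percent slower (+) or faster (-)
    print(f"{label}: mean {avg:.2f}s ({change:+.1f}% vs all-Debian)")
```

By this calculation the Alpine-based Python image averages about 19% slower than the all-Debian setup, while switching only Postgres to Alpine comes out about 2% faster, which is well within the spread of the individual Debian runs.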
What this shows is that the Postgres image performs about the same on Alpine and Debian, but when an Alpine-based Python image connects to it, there's a very noticeable slowdown.
That immediately made me think that maybe certain system-level packages are the culprit here. For example, most popular PostgreSQL connection libraries in Python and Ruby (and other languages) require installing libpq-dev on Debian or postgresql-dev on Alpine. That package needs to be installed in your image before you can install libraries like psycopg2 in Python or pg in Ruby, which are what let your web app connect to PostgreSQL.
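The post doesn't include the commenter's test code, but a loop along these lines is roughly what "10,000 database selects" boils down to. This is a hypothetical sketch for Python 3: time_selects is my own name, and it accepts any DB-API 2.0 connection so it isn't tied to one driver. With psycopg2 (the library that needs libpq-dev or postgresql-dev to install), you'd pass in the connection returned by psycopg2.connect(...).

```python
import time
from typing import Any


def time_selects(conn: Any, n: int = 10_000, sql: str = "SELECT 1") -> float:
    """Hypothetical helper: run `sql` n times on a DB-API 2.0 connection.

    Returns the total elapsed time in seconds.
    """
    cur = conn.cursor()
    start = time.perf_counter()
    for _ in range(n):
        cur.execute(sql)
        cur.fetchall()  # pull the rows back so the round trip is fully measured
    elapsed = time.perf_counter() - start
    cur.close()
    return elapsed
```

For example, conn = psycopg2.connect(host="postgres", dbname="app", user="app") (connection details hypothetical) followed by time_selects(conn). Running the same script in a Debian-based and an Alpine-based Python image against the same database is the comparison the commenter described.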
And here we are. Now we have something to test and a real reason to compare both of these base images.
Testing Both Base Images
Before we get into the benchmarks, it’s worth pointing out that I’m not a fan of contrived microbenchmarks. Sure, they are good for testing regressions at the language/framework level, but they are pretty useless for testing real-world scenarios.
I always roll my eyes when people are like, “well the flim flam web framework can do 163,816 requests per second but bippity bop only does 97,471 reqs / second, it’s trash!”.
Then you look into the benchmark and all it does is return an empty 200 response. As soon as you do something like serialize JSON or render HTML from a template, both frameworks suddenly become way slower.
And then you factor in something like a single database call and, hey, what do you know, both frameworks drop an order of magnitude in speed.
Benchmarking a Real World Web Application
I’m going to take the most up-to-date and complete version of my Flask course’s code base, which is running Flask 1.0.2 and all of the latest library versions at the time of writing this article.
It’s a decently sized Flask app with 4,000+ lines of code and a couple of blueprints, backed by a SQLAlchemy-driven PostgreSQL database, with a bunch of other moving parts.
The test environment:
The test case will be to load a page that uses a server-rendered template to return HTML and also performs 1 SELECT statement to look up a user by ID in the database.
The Flask app was configured with:
- gunicorn with Flask in production mode (not debug mode)
- gunicorn running with 8 threads (gthread) and 8 workers
- Logging level INFO
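The Dockerfiles load gunicorn's settings with -c "python:config.gunicorn", but that config file itself isn't part of this post. A config module matching the settings listed above could look something like this. Treat it as a hypothetical sketch; in particular, the bind address is my own assumption based on the port wrk hits in the test.

```python
# config/gunicorn.py: hypothetical sketch, the real file isn't shown in this post.
# Loaded with: gunicorn -c "python:config.gunicorn" "snakeeyes.app:create_app()"

bind = "0.0.0.0:8000"     # assumption: port 8000 matches the URL wrk requests
workers = 8               # 8 worker processes
threads = 8               # 8 threads per worker
worker_class = "gthread"  # threaded workers, as in the test setup
loglevel = "info"         # logging level INFO
```

All five names are regular gunicorn settings, so gunicorn can read them straight from the module. Flask's production (non-debug) mode is configured on the Flask side, not here.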
The test system (my dev box) was configured with:
- i5 3.2GHz CPU with 16GB of RAM and an SSD
- Docker Compose to launch the project with no app bind volumes
- Docker for Windows to run the Docker daemon (rebooted between each test)
- WSL to run the Docker CLI
The general testing strategy:
I’m going to run an HTTP benchmark tool called wrk to make requests to the Flask app. I will run it 5 times for each base image and take the fastest result.
I will perform this test using both Debian and Alpine as a base image with no other changes to the code base, the app configuration, how the app is run or the test machine.
I will be running docker-compose up -d to launch the Compose project in the background, then wrk -t8 -c50 -d30 <url to test> to launch wrk with 8 threads while keeping 50 HTTP connections open to the page. Each test runs for 30 seconds.
Spoiler alert: in both test cases my CPU maxed out at about 65%, so we’re not CPU-bound.
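Running wrk 5 times by hand and picking the fastest result is easy to script. Here's a sketch of one way to do it in Python 3.7+ (needed for capture_output). The function names are hypothetical, it assumes wrk is on your PATH, and "fastest" is taken to mean the highest Requests/sec line in wrk's output.

```python
import re
import subprocess
from typing import List

# Matches wrk's summary line, e.g. "Requests/sec:    281.83"
RPS_PATTERN = re.compile(r"Requests/sec:\s+([\d.]+)")


def parse_requests_per_sec(output: str) -> float:
    """Extract the Requests/sec figure from wrk's text output."""
    match = RPS_PATTERN.search(output)
    if match is None:
        raise ValueError("no Requests/sec line found in wrk output")
    return float(match.group(1))


def run_wrk(url: str) -> str:
    """Run one 30 second wrk test with 8 threads and 50 open connections."""
    result = subprocess.run(
        ["wrk", "-t8", "-c50", "-d30", url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def best_of(outputs: List[str]) -> float:
    """Return the highest Requests/sec across several wrk runs."""
    return max(parse_requests_per_sec(o) for o in outputs)
```

After docker-compose up -d, something like best_of([run_wrk("localhost:8000/subscription/pricing") for _ in range(5)]) gives the best of 5 runs for one base image.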
Debian Setup
Here’s the Dockerfile I used:
```dockerfile
FROM python:2.7-slim
LABEL maintainer="Nick Janetakis <firstname.lastname@example.org>"

RUN apt-get update && apt-get install -qq -y \
  build-essential libpq-dev --no-install-recommends

WORKDIR /app

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY . .
RUN pip install --editable .

CMD gunicorn -c "python:config.gunicorn" "snakeeyes.app:create_app()"
```
Here are the benchmark results:
```
nick@workstation:/e/tmp/bsawf$ docker-compose up -d
Recreating snakeeyestest_celery_1 ... done
Starting snakeeyestest_redis_1 ... done
Starting snakeeyestest_postgres_1 ... done
Recreating snakeeyestest_website_1 ... done

nick@workstation:/e/tmp/bsawf$ wrk -t8 -c50 -d30 localhost:8000/subscription/pricing
Running 30s test @ localhost:8000/subscription/pricing
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   167.45ms  171.80ms   1.99s    87.87%
    Req/Sec    39.13     29.99    168.00     74.85%
  8468 requests in 30.05s, 56.17MB read
  Socket errors: connect 0, read 0, write 0, timeout 42
Requests/sec:    281.83
Transfer/sec:      1.87MB
```
Alpine Setup
Here’s the Dockerfile I used:
```dockerfile
FROM python:2.7-alpine
LABEL maintainer="Nick Janetakis <email@example.com>"

RUN apk update && apk add build-base postgresql-dev

WORKDIR /app

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY . .
RUN pip install --editable .

CMD gunicorn -c "python:config.gunicorn" "snakeeyes.app:create_app()"
```
Here are the benchmark results:
```
nick@workstation:/e/tmp/bsawf$ docker-compose up -d
Recreating snakeeyestest_celery_1 ... done
Starting snakeeyestest_redis_1 ... done
Starting snakeeyestest_postgres_1 ... done
Recreating snakeeyestest_website_1 ... done

nick@workstation:/e/tmp/bsawf$ wrk -t8 -c50 -d30 localhost:8000/subscription/pricing
Running 30s test @ localhost:8000/subscription/pricing
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   223.56ms  270.60ms   1.96s    88.41%
    Req/Sec    39.53     25.02    140.00     65.12%
  8667 requests in 30.05s, 57.49MB read
  Socket errors: connect 0, read 0, write 0, timeout 20
Requests/sec:    288.38
Transfer/sec:      1.91MB
```
Let’s Talk about the Results
That’s a bit anticlimactic. I expected Debian to blow the doors off Alpine and then go on a grand crusade to change everything I use to Debian.
If you account for system variance, I would be ok with saying both distributions performed about the same. Alpine's latency stdev is noticeably higher (270.60ms vs 171.80ms), but on the other hand Debian had more timeouts (42 vs 20), and both of those could still be variance with such a short test.
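Since the two runs look so close, it helps to put them side by side as percentages. The figures below are copied from the wrk output above; the snippet just computes each Alpine figure relative to the Debian one.

```python
# Figures copied from the best Debian and Alpine wrk runs above.
debian = {"req_per_sec": 281.83, "latency_avg_ms": 167.45,
          "latency_stdev_ms": 171.80, "timeouts": 42}
alpine = {"req_per_sec": 288.38, "latency_avg_ms": 223.56,
          "latency_stdev_ms": 270.60, "timeouts": 20}

for metric in debian:
    d, a = debian[metric], alpine[metric]
    # Positive means Alpine's number is higher than Debian's for this metric.
    print(f"{metric}: Debian {d}, Alpine {a}, Alpine vs Debian {(a / d - 1) * 100:+.1f}%")
```

By this arithmetic Alpine served about 2.3% more requests per second even though its average latency was about 33% higher, which is part of why a single headline number doesn't tell the whole story here.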
I don’t doubt the results those Reddit commenters had, and if anything this test shows that not all test cases are created equal.
One thing I will point out is that the Reddit user’s test performed 10,000 SELECT statements, whereas my app performed 1 SELECT statement. That is a very big difference. I wonder what would happen on a more complex page with, let’s say, 15 database queries instead of 1.
If you would like to pick up the torch and create more test cases across different languages or Python versions, then please do so and let us know how it goes.
For now I’m going to continue using Alpine, but if I ever run into a problem or see a compelling reason to switch, I will make the switch to Debian.
Which base image do you use? Let me know in the comments below.