Finding an Ansible Bug on Apt Pinning and Installing Docker Compose v2
I recently came across a bug when trying to pin Docker Compose v2 using Ansible 2.13.3. Here are the steps I took to determine it was a bug.
Prefer video? There’s a video version of this blog post on YouTube that goes into a bit more detail about certain topics listed below.
I recently made a pretty big update to my Ansible role which installs Docker and Docker Compose. Now it supports installing both Docker Compose v1 and Docker Compose v2.
The role supports Debian and Ubuntu. On Linux-based systems, if you want to use Docker Compose v2 you need to install the docker-compose-plugin package. This is an apt package that Docker manages. v2 also happens to be written in Go instead of Python.
While performing this update I noticed that Ansible 2.13.3 ignores apt pins when installing packages with apt. There’s a pull request to fix it. It may or may not be fixed by the time you read this post. If it’s not fixed, downgrading to the latest Ansible 2.12.X release works.
It took me a while to figure this one out because this might be the first time in 8 years of using Ansible that I ran into a legit bug. Before we get into debugging things, here’s a quick rundown on pinning packages.
# What Is Apt Pinning and How Does It Work?
The role already supported pinning Docker itself, which is installed with apt. All you have to do is set the specific version you want, such as docker__version: "20.10.17", and the role will create a pin in /etc/apt/preferences.d/docker.pref with:
Explanation: Pin added by Ansible role "docker"
Package: docker-ce*
Pin: version 5:20.10.17*
Pin-Priority: 600
As a quick aside, the 5: part is the epoch. It defaults to 0 when it’s left out. It’s part of Debian’s package version format and it lets developers or tool vendors change their versioning scheme or fix broken versions. A version with a higher epoch always counts as newer, which is how versions can always be ordered correctly.
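If you want to see the epoch in isolation, here’s a minimal sketch in POSIX shell that splits a version string the same way you’d read it by eye. The split_epoch function is a made-up helper for this post, not something apt or dpkg ships:

```shell
# Hypothetical helper: split a Debian-style version into "<epoch> <rest>".
# A version without a colon is treated as epoch 0, matching the default.
split_epoch() {
  version="$1"
  case "$version" in
    *:*) printf '%s %s\n' "${version%%:*}" "${version#*:}" ;;
    *)   printf '0 %s\n' "$version" ;;
  esac
}

split_epoch "5:20.10.17~3-0~ubuntu-jammy"  # prints: 5 20.10.17~3-0~ubuntu-jammy
split_epoch "2.5.0~ubuntu-jammy"           # prints: 0 2.5.0~ubuntu-jammy
```

This only pulls the string apart. For a real comparison on a Debian or Ubuntu box, dpkg --compare-versions takes the epoch into account for you.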
This pin will ensure that if you ever run apt-get update && apt-get upgrade you won’t get a newer version. This is great because it means you can keep your system up to date but still control which major or minor version of a specific package you end up with.
Normally Debian and Ubuntu do this for you because a given release’s sources list sticks to set major and minor versions of a package, but with Docker it’s different. Docker is installed and managed using Docker’s own sources list, so you can get a newer version than what’s available in your distro’s list.
Of course you can choose to use your distro’s version of Docker but it’ll probably be out of date. Docker is one of the few packages where I use its upstream repo instead of what’s available by default in Debian or Ubuntu.
## Priorities
Apt defaults to 500 as a pin priority. That means if a package is available in multiple versions or from multiple apt repositories, the one with the highest pin priority will get chosen. In my case I used 600, which is higher than 500. If you already have Docker installed you can get a list of installation candidates and a full list of available versions by running:
# If you don't have Docker installed, feel free to try a different package such as bash.
$ apt policy docker-ce
docker-ce:
  Installed: 5:20.10.17~3-0~ubuntu-jammy
  Candidate: 5:20.10.17~3-0~ubuntu-jammy
  Version table:
 *** 5:20.10.17~3-0~ubuntu-jammy 600
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
        100 /var/lib/dpkg/status
     5:20.10.16~3-0~ubuntu-jammy 500
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
     5:20.10.15~3-0~ubuntu-jammy 500
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
     5:20.10.14~3-0~ubuntu-jammy 500
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
     5:20.10.13~3-0~ubuntu-jammy 500
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
Here we can see the version we pinned has the highest priority. It means if 20.10.18 or 20.20.20 comes out in the future with a priority of 500, then our 600 will win and our pinned version will not get updated. Perfect!
That’s important because having a version updated under our feet could break things. It also means we lose control over which version we want to install. That makes things non-reproducible and introduces potential unexpected errors.
# Why Didn’t It Work with Docker Compose v2?
Since both Docker and Docker Compose v2 are apt packages there should be no difference in how these packages get installed. Apt pinning is a system-level feature, not a package-level one.
Since I knew Docker pinning worked in the past, I replicated for Docker Compose what I had for Docker by setting a new docker__compose_v2_version: "2.5.0" variable and configuring Ansible to create /etc/apt/preferences.d/docker-compose-plugin.pref:
Explanation: Pin added by Ansible role "docker"
Package: docker-compose-plugin
Pin: version 2.5.0*
Pin-Priority: 600
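The role writes these files with Ansible, but both pins have the same shape, so here’s a rough sketch of it in plain shell. The render_pin helper and the temp directory are assumptions for illustration only. They’re not how the role does it, and nothing here touches /etc/apt/preferences.d:

```shell
# Hypothetical helper: print an apt preferences stanza for one package.
# Arguments: package pattern, version prefix, optional priority (default 600).
render_pin() {
  package="$1"
  version="$2"
  priority="${3:-600}"
  printf 'Explanation: Pin added by Ansible role "docker"\n'
  printf 'Package: %s\n' "$package"
  printf 'Pin: version %s*\n' "$version"
  printf 'Pin-Priority: %s\n' "$priority"
}

# Write both pins to a scratch directory so they can be inspected safely.
pin_dir="$(mktemp -d)"
render_pin "docker-ce*" "5:20.10.17" > "$pin_dir/docker.pref"
render_pin "docker-compose-plugin" "2.5.0" > "$pin_dir/docker-compose-plugin.pref"
cat "$pin_dir/docker-compose-plugin.pref"
```

The only things that differ between the two files are the package pattern and the version, which is why I expected the Docker Compose v2 pin to behave exactly like the Docker one.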
Then while testing the patch for the Ansible role I ran Ansible to install everything and to my surprise 2.6.0 got installed, so I checked what apt policy had to say:
$ apt policy docker-compose-plugin
docker-compose-plugin:
  Installed: 2.6.0~ubuntu-jammy
  Candidate: 2.6.0~ubuntu-jammy
  Version table:
 *** 2.6.0~ubuntu-jammy 500
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
        100 /var/lib/dpkg/status
     2.5.0~ubuntu-jammy 600
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
     2.3.3~ubuntu-jammy 500
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
Yep, that’s weird. Our pin is being understood (2.5.0 shows up with a priority of 600) but it has no effect on what got installed.
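Spotting this by eye is easy enough, but a check like this could also be scripted. Here’s a minimal sketch that reads apt policy output and flags when the installed version isn’t the one with the highest priority. The sample_policy and check_pin names are hypothetical, the sample text is the output from this post, and the "highest priority wins" rule is a simplification of what apt really does (it ignores ties and apt’s downgrade rules):

```shell
# Sample output from this post. On a real host you would pipe in
# `apt policy docker-compose-plugin` instead.
sample_policy() {
cat <<'EOF'
docker-compose-plugin:
  Installed: 2.6.0~ubuntu-jammy
  Candidate: 2.6.0~ubuntu-jammy
  Version table:
 *** 2.6.0~ubuntu-jammy 500
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
        100 /var/lib/dpkg/status
     2.5.0~ubuntu-jammy 600
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
     2.3.3~ubuntu-jammy 500
        500 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages
EOF
}

# Hypothetical check: print "ok" when the installed version has the
# highest priority in the version table, otherwise print a mismatch line.
check_pin() {
  awk '
    $1 == "Installed:" { installed = $2 }
    # Version lines look like "*** <version> <priority>" or "<version> <priority>".
    $1 == "***" { ver = $2; prio = $3 }
    $1 != "***" && NF == 2 && $2 ~ /^-?[0-9]+$/ && $1 !~ /^-?[0-9]+$/ { ver = $1; prio = $2 }
    ver != "" {
      if (best == "" || prio + 0 > best_prio) { best = ver; best_prio = prio + 0 }
      ver = ""
    }
    END {
      if (installed == best) print "ok"
      else print "mismatch: installed=" installed " highest-priority=" best
    }
  '
}

sample_policy | check_pin
# prints: mismatch: installed=2.6.0~ubuntu-jammy highest-priority=2.5.0~ubuntu-jammy
```

It matches on fields rather than indentation, so it doesn’t care how the output is spaced.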
## Now What?
Like in most debugging sessions, I started by reproducing the scenario, so I ran a one-liner with Ansible to remove the package:
ansible all -m apt \
-a "name=docker-compose-plugin autoremove=true purge=true state=absent" -b \
-i ~/src/ansible/inventories/dtp/test
That’s roughly the same as running sudo apt remove docker-compose-plugin --purge followed by an autoremove. It removed the package and then I tried installing it again. Same outcome.
## The Ansible task to install Docker and Docker Compose v2
- name: Install Docker and Docker Compose v2
  retries: 20
  delay: 15
  ansible.builtin.apt:
    name:
      - "docker-{{ docker__edition }}"
      - "docker-compose-plugin"
    state: "{{ docker__state }}"
I’ll save you the boring details but I tried all sorts of things like breaking this up into 2 tasks so the packages aren’t passed in together as a list. I even renamed name to pkg because “why not?!”. pkg is an alias for name but who knows what’s under the hood.
No matter what I did, the unpinned latest version was always getting installed. I double, triple and quadruple checked the pin file for both. There was nothing wrong. Everything lined up and should be working.
I knew I had made a lot of changes since pinning last worked. I updated Ansible versions and made a bunch of patches to the role, so I scoured the internet looking for reasons why this might not work. Nope, nothing.
## Calling an IRC lifeline
After spending a solid 30 minutes on this I reached out to my friend Maciej (aka drybjed), the creator of DebOps, which is one of the longest-running Ansible projects I know of. He’s one of the most knowledgeable sysadmins that I’ve had the pleasure of meeting.
DebOps is a huge ecosystem of playbooks and roles to manage Debian and Ubuntu systems. He was a guest on my podcast a while back where he talked about building and using it to manage 40+ servers at a medical university.
I showed him a gist of everything and he also said everything looked good.
## Reducing the problem by not using Ansible
He gave a great suggestion which was to manually install the package on the system and what do you know, it worked. The pin took effect and I got the expected result.
So why would sudo apt-get install docker-compose-plugin work manually but not through Ansible? Good question. Now that I knew the problem, I Googled for “apt pinning not working with ansible” and found this PR.
I downgraded to ansible-core==2.12.8 and then Ansible respected the pin again with its apt module. Mystery solved.
# Internal Biases Can Be a Real Bitch
I don’t know if this is the correct definition from a medical or psychology standpoint but to me an internal bias is when you let negative internal thoughts from previous encounters guide you into a decision or way of thinking without treating everything on equal ground.
When working with popular tools I’m so used to this workflow:
- It’s not working
- I’m bewildered to a degree where I’ve “WTF’d” at least a few times
- Start writing a GitHub issue explaining the situation to file a bug report
- Discover I’m an idiot along the way while writing the bug report
- Find the root cause due to something being overlooked that I didn’t 100% understand
- Delete the draft bug report
- Get it working by changing my implementation and fist pump
Then I reflect on that. Oftentimes I’ll write a blog post or make a video about it and chalk it up to lessons learned. Understanding something in more depth is a very enjoyable experience. I like the whole process from the struggle to the resolution.
Anyways, in this case I’m so used to the above workflow that I completely discounted the idea that Ansible might actually have a bug, because I know I start writing far more bug reports than I ever end up posting.
The takeaway here isn’t “wow I’m so smart, finally it was a tool that was busted” but to go back and be reminded of the basics. Break the problem down and reduce variables. These are things I know but sometimes need to be reminded of. Every Ansible role I developed started with getting it working manually before automating it with Ansible.
While I didn’t get flustered during this adventure, it was another reminder to not flip your lid. You’re in control of the situation and with enough perseverance you will find the solution. If Maciej wasn’t there to help I know I would have figured it out eventually, it just would have taken 5x longer. Never underestimate having a friend to call upon.
# Demo Video
## Timestamps
- 0:39 – Adding Docker Compose v2 support to my Ansible Docker role
- 1:58 – Going over how specific package versions get installed on a system
- 4:22 – Going over the basics of pinning Apt packages
- 8:35 – Apt’s version epoch (the number before the colon)
- 9:51 – This used to work, why would Docker Compose v2 not work?
- 11:30 – Showing how I overrode the role’s defaults to pin versions
- 12:24 – The Ansible task to apt install them didn’t work but it manually works
- 13:04 – Going over the initial debug process
- 15:12 – Trying it out without Ansible
- 17:08 – I tried a few changes within Ansible
- 18:15 – The solution is often easy once you know the problem
- 19:13 – Skimming the blog post
- 19:40 – Beware of internal biases when troubleshooting
What was your latest debugging adventure? Let me know below!