Updated on November 26, 2024 in #linux

Using git bisect to Help Find Which Commit Broke Something

using-git-bisect-to-help-find-which-commit-broke-something.jpg

This automates the process of doing a manual bisection search, AKA a binary search which helps quickly find 1 item in a list of items.

Quick Jump:

Prefer video? Here it is on YouTube.

Before we automate the process with git bisect let’s quickly cover an example of when you might want to use binary search in general and how to do it manually. In conversation bisection and binary search mean the same thing.

If you already know about this then feel free to skip to using git bisect.

A common example used to be finding a name in a phone book but who uses phone books nowadays? Instead, we’ll go over a number guessing game.

Imagine you picked a random number between 1 and 100 and asked me to guess what number you picked. It would be pretty tedious if I guessed 1, 2, 3, 4, 5, etc. up until 100. If you picked 1 then it would be efficient but what if you picked 87?

With bisection searching over 100 items you would average 6 picks to guess the number. You can do log2(100) to get 6.64 which rounds down to 6. You can use this site to check other values, for example log2(50000) is 16. That really shows how powerful this method is for finding something.

That’s why the phone book example used to be popular. A big city could have a million names and it only takes on average 19 guesses to find a name with log2(1000000).

# TL;DR on How It Works

The basic idea is you can half your sample size on each guess if you guess the middle point.

For example if I asked you to pick a number between 1 and 5 and you picked 3 and I said it’s higher then we know for certain it’s 4 or 5. You don’t even need to think about picking 1 or 2 and we already know it’s not 3.

Imagine if you knew you had a million items to search through. If you took the middle point on each guess it only takes 7 guesses to reduce 1 million items down to 7812:

Hopefully that gives the gist on how it works. Here’s a practical example.

# Number Guessing Game

The rules are you think of a number between 1 and 100 and I’ll try to guess it using bisection search. If I guess too high then tell me it’s too high, likewise if I guess too low then tell me it’s too low.

Since we’re not chatting in real-time, we’ll review the results of a game that already took place. Humor me here and imagine this took place in an alternate dimension.

Step 1:

[You're thinking of 63 as a number]

Q: IS YOUR NUMBER 29?
A: No, it is higher

This is where bisection search comes into play. We know the number is not 29 and it’s not 28, 27, 26 or anything lower because you said it’s higher.

Great, that means numbers 1-29 can be completely thrown out. We’ve almost eliminated a third of our choices in 1 step. The number must be 30 to 100.

If I wanted to be the most efficient here I would guess 50 and keep picking the middle point but for the sake of this example I’m purposely semi-randomly guessing to show how it works when not everything is perfectly accounted for or able to be calculated.

Step 2:

Q: IS YOUR NUMBER 99
A: No, it is lower

Ok cool, we already know our potential lowest pick is 30 and we’ve just eliminated 99 and 100 as an option. Now our range is 30 to 98.

That didn’t help us a whole lot since we’ve only removed 2 choices but it’s better than nothing. That wasn’t an efficient guess if we know we’re doing a bisection search.

Step 3:

Q: IS YOUR NUMBER 58
A: No, it is higher

Ok, like step 1 we can take a decent chunk out. Our new range is 59 to 98.

Step 4:

Q: IS YOUR NUMBER 71
A: No, it is lower

We’re getting closer. Our new range is 59 to 70. We have a 1 in 12 chance of guessing it.

Step 5:

Q: IS YOUR NUMBER 64
A: No, it is lower

Almost, now our range is 59 to 63. We have a 1 in 5 chance (20%).

Step 6:

Q: IS YOUR NUMBER 63
A: Yes

It looks like I got lucky and guessed but if I didn’t then we would continue. Hopefully this demonstrates how it works.

# Applying Bisection Search to Git Commits

We’ve seen how it works with sorted numbers and hinted how it could work to find sorted names in a phone book but how does this apply to git commits?

Let’s say you have a project you’re working on. The HEAD (latest commit) has an area of your site that’s not working anymore.

All you know is about a week ago this area of your site was working. In between a week ago and now you’ve made 30 commits but you have no idea which one caused the problem.

We can use bisection search here right? We can jump to the 15th commit and see if that works. If it does work then we know the 16-30th commit has the problem, we’ve just halved our commits to look through.

Use Case (Debugging / Troubleshooting)

This came up recently for client work where someone asked me to help troubleshoot why their Rails app was no longer allowing them to sign in but the app was not throwing errors or warnings in the logs.

It was a pretty big app and I’ve never worked with it before. I started by asking when the last known good state was and with a little bit of thinking he said about a week ago.

We checked out a commit from a week ago and yep it worked. From there we used git bisect to identify the commit that cased the problem in only a few steps. Many thousands of lines of code across hundreds of commits was narrowed down to 1 line of code in 1 commit 20 minutes later.

Demo Project

Let’s create a dummy project with 30 files, each being added in their own commit:

#!/usr/bin/env bash

set -o errexit
set -o pipefail
set -o nounset

git init

for item in $(seq 1 30); do
  touch "${item}" && git add "${item}" && git commit -m "Add ${item} file"
done

After running that script you can show all of the commit SHAs and their message:

$ git log --pretty=oneline

16a67d1011a29570c054667202e672f3d2ec8181 (HEAD -> main) Add 30
853bce0b103a26c6d22c2ff424873083347ec17e Add 29
04fcb03e2abba96a56b07ada3574f8ec09c96112 Add 28
05d70a806beb1adfac48d0e29fcb3dad5cdc2c1b Add 27
ae20afa2203927288f94e43a6f394e088b63659f Add 26
13673a468d3f165f12bbbf19a8cbfe6d14f48483 Add 25
5dcf33e5de21f261c82fdcba48f02bcdf4f55f5c Add 24
ef7d26f98797bca41a5110342ba0332a5659a098 Add 23
e573c96d3c1b2392718317cd246c6a5a021bd55c Add 22
398f96e75006364f43820112882ce8f8f781d43d Add 21
37fd9729e490d9754fcaffc423c1f5a2e12021d4 Add 20
2479cf6a1d73895523aecc9df3848eed8e20bf5a Add 19
33ec300931af9a807e2430a28508f1962d87f67d Add 18
0bfb6529e372ed4f8efdbffb91695e9736f6e2fb Add 17
3283144f7bb3b92104e52bab42e1d0536d9491ff Add 16
b11001f0133bd656a045611ed6579eb5c7220af6 Add 15
a1fd636f9e885d974e38bf3260cdfb2196808477 Add 14
13a2af89810f85e195f7f672c752623ed542f6c6 Add 13
7da101f8b26209e9622c04d2a0c13e3a279dfe28 Add 12
5d99a9339da3d3b1b41366ffd5245d0e93a390a3 Add 11
4fe6286de507762ba3d146d8908fd740a7d8d6d2 Add 10
43d2f5cf587a9a2ad790ece6c6a4382c50e81524 Add 9
54fc853ff78d00a4d49c0d6d39c42e5b3e55f38c Add 8
90148687fb960feecf8a5f182fea1e71f6af2cc2 Add 7
929bd5d35fd2746b66caf0c76662dd3f7590076d Add 6
f6c1fdb2d110cef1f71444fd48df3b2314f32146 Add 5
f82d5fe711cfd07bc685ce09d0055099c709aec3 Add 4
1dbff4d8366fd05fbc356faf04d87389dd3fa821 Add 3
6b2c9a37541f3dd374dca8f1af4ca4016d74c2ce Add 2
21229ecc0d06a8918b0d4cb845fbea4ddb5ca9f0 Add 1

Going back to our example let’s say the bottom 21229ec commit works but the top 16a67d1 commit doesn’t. We want to find the last known good commit because then we know the commit above that has the problematic code.

Using git bisect

The basic concept is we can start doing it and then supply a good and bad commit. In our example the good is our bottom most commit and the bad is our top most commit.

It helps automate the process of choosing the middle point between 2 commits because as you’ll soon see, as we step through different commits we let git know if the currently checked out commit is good or bad. It will make more sense once we see how it works.

Step 0: Start the process

You can run:

git bisect start

That’s going to let git know we’re about to start supplying bad and good commits.

Step 1: Supply the bad commit

In our case that’s the main branch’s HEAD commit, but you may want to adjust that as necessary.

All we need to do is tell git bisect that this is our “bad” commit:

git bisect bad HEAD

Alternatively we could have used 16a67d1011a29570c054667202e672f3d2ec8181 or skipped adding an argument because it will use the currently checked out identifier by default.

Step 2: Supply the good commit

In our case we know the bottom most first commit is good:

git bisect good 21229ecc0d06a8918b0d4cb845fbea4ddb5ca9f0

Here’s the output it produces:

Bisecting: 14 revisions left to test after this (roughly 4 steps)
[b11001f0133bd656a045611ed6579eb5c7220af6] Add 15

It cut things in half and even told us roughly how many steps it will take. If you run a git log you’ll notice HEAD now points to the b11001f commit which is Add 15.

Step 3: Check if things work and continue

We have no action to take in this example because nothing is broken but here is is where you would test your code.

Let’s say things are good because your tests passed:

git bisect good

We don’t need to pass in the sha again because git will use the currently checked out sha.

Here’s the output it produces:

Bisecting: 7 revisions left to test after this (roughly 3 steps)
[e573c96d3c1b2392718317cd246c6a5a021bd55c] Add 22

We know 15 was good so it’s choosing 22 which is in between 15 and 30.

Step 4: Check if things work and continue

Let’s do it again:

git bisect good

Bisecting: 3 revisions left to test after this (roughly 2 steps)
[ae20afa2203927288f94e43a6f394e088b63659f] Add 26

Step 5: Check if things work and continue

Let’s do it again but this time say our tests failed so we’ll choose bad:

git bisect bad

Bisecting: 1 revision left to test after this (roughly 1 step)
[5dcf33e5de21f261c82fdcba48f02bcdf4f55f5c] Add 24

At this point commit 24 is checked out. Step 4 confirmed that 26 is ok. The output is letting us know we have 1 revision left to test. Either this commit or 25 contains the broken code.

Step 6: Check if things work and continue

Let’s say this is git bisect good again and again. Once we’ve reached the point where there’s no more guesses we’ve identified the bad commit:

ae20afa2203927288f94e43a6f394e088b63659f is the first bad commit
commit ae20afa2203927288f94e43a6f394e088b63659f
Author: Nick Janetakis <nick.janetakis@gmail.com>
Date:   Sat Nov 23 09:40:01 2024 -0500

    Add 26

 26 | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 26

Now you can go through whatever code you changed and hopefully find the culprit!

This is where having good commit messages helps. If you have a commit with a ... message that added 514 files then it’s going to be tough to see what’s wrong but git bisect can’t save you there, that would be a job for interactive rebasing.

The video below demonstrates going through similar steps as above. I picked different good vs bad choices near the end in the video to see how it changes.

# Demo Video

Timestamps

0:26 – How does bisection search work?
2:53 – Stepping through a number guessing game
5:49 – Applying this to troubleshoot git commits
7:34 – Setting up a demo project
9:17 – Looking at the commit SHAs
9:39 – Starting a git bisect with the first bad and good commits
10:38 – Seeing its output and moving on
11:21 – Telling git bisect to move to the next steps
12:14 – Finding a bad commit and then finding our answer

When was the last time you had to do a bisection search? Let me know below.

Like you, I'm super protective of my inbox, so don't worry about getting spammed. You can expect a few emails per year (at most), and you can 1-click unsubscribe at any time. See what else you'll get too.

Learn Docker With My Newest Course