Create an IP Address Allow List with CIDR Block Support in Python

We'll go over creating the logic for the whitelist, an optional exempt list, working with CIDR blocks and benchmarking a few solutions.

If you prefer video, I recorded a zero ad video of this on YouTube.

Before we dive into the code I think the “why” or the context around the problem is really important. If you don’t care about that then feel free to jump straight to the code.

This post is going to be a mixture of technical details, figuring out how to maximize business value based on your current problems and how asking questions can shape a feature.

# Understanding the Problem

Recently for some client work I was moving a cron job into Kubernetes and it called a service’s API that uses an IP address allow list as an extra layer of security to prevent unwanted callers.

Before it was in Kubernetes the cron job curl’d the external domain of the service so it was flowing through the public internet, such as https://example.com/api/hello. The application stored a comma separated list of IP addresses in a database which acted as the allow list.

On every request it would get the request’s IP, do a DB lookup, split the IPs on a comma, loop over each one and do a check to see if the exact IP matched. It was originally created to handle comparing exactly 1 IP address to another IP address and since there were only a few callers from known static IPs this wasn’t a problem.

I wasn’t around for that code but it doesn’t matter. The solution worked and this is a very successful company in the grand scheme of things.

I know this could be solved at many different layers too. For example at the firewall, nginx or application layer. They went for the app layer so that admins can easily adjust the allow list without dealing with infrastructure changes.

It’s Kubernetes Time

I used a Kubernetes CronJob and transplanted the same curl command over. However, I like the idea of getting upgrades instead of side grades if I can.

The Kubernetes cluster is on EKS (AWS) and has 2 NAT gateways, which means all external traffic leaving our cluster comes from the same set of IP addresses. Basically we could have 15 nodes in our cluster with hundreds of services and workloads but all outgoing traffic will come from 1 of 2 IP addresses.

If we wanted to keep things exactly the same and curl the external domain like before then the DB allow list would only need to add 2 IP addresses and we’d be good to go.

But as mentioned before, I like upgrades. Connecting over the public internet for this in a Kubernetes world didn’t seem ideal. It would be like traveling 100 miles out of the way just to cross your street.

Going over the public internet is going to be slower since every request has to leave the cluster and come back in. It needs to pass through AWS Route 53 (DNS), an AWS ALB (application load balancer) and the NAT gateway. All of these things have direct costs in money as well as time.

In the grand scheme of things maybe this takes an extra 10-15ms for something that gets called a few times per minute. That’s not a huge deal but if we can avoid it pretty easily why not do it?

Sticking to the Internal Kubernetes Network

This is nice because Kubernetes provides us internal DNS by default. You can connect to your service’s name directly such as http://example assuming you have a service named example. There’s no TLD (.com, etc.) and it’s over HTTP since the traffic never leaves your cluster. Route 53 and an ALB aren’t used and the public internet is never touched.

For internal traffic within our cluster we have a CIDR block of 10.0.0.0/16 which means there’s 65,536 possible IP addresses that a pod can run on. That’s 10.0.0.0 to 10.0.255.255. You can find a CIDR calculator here.
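If you'd rather confirm those numbers in code than in a calculator, here's a minimal sketch using Python's standard library ipaddress module. The variable name cluster_network is just for illustration.

```python
from ipaddress import ip_address, ip_network

# The cluster's internal CIDR block from the paragraph above.
cluster_network = ip_network("10.0.0.0/16")

print(cluster_network.num_addresses)  # 65536
print(cluster_network[0])             # 10.0.0.0 (first address in the block)
print(cluster_network[-1])            # 10.0.255.255 (last address in the block)

# Membership checks work with the "in" operator.
print(ip_address("10.0.8.17") in cluster_network)  # True
print(ip_address("10.1.0.1") in cluster_network)   # False
```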

Needless to say adding all 65k IP addresses to our allow list isn’t a feasible solution.

Basically we have 2 options. We can update the IP checking code to support CIDR blocks or take a shortcut and, instead of doing an exact string comparison of the whole IP, do a “string starts with” comparison.

# Hacking Up a Proof of Concept

At this point I haven’t talked to anyone else about the feature. At this place I’m pretty much the solo SRE / Platform / “DevOps Engineer” / whatever you want to classify this line of work as.

Anyways, I knew the cron job would fail using the internal network and it did. The app correctly reported that the IP address 10.0.8.17 wasn’t allowed.

The option of using “starts with” felt a little hacky but the IP allow list code was isolated to 1 function. Here’s a simplified version of that code:

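# Stand-in value so the snippet runs; in the real app this comes from a database query.
allowed_ips = "42.42.42.42,1.2.3.4"


def allow_ip_address(ipaddress):
    # allowed_ips is really a database query that returned the results.
    if ipaddress in allowed_ips.split(","):
        return True

    return False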

I figured I could modify that condition to check an IP prefix, something like:

    if ipaddress.startswith("10.0") or ipaddress in allowed_ips.split(","):
        return True

This would happen before the database was touched since Python short-circuits or conditions. It feels hacky because there’s just a magic value floating in the middle of a function. I know we have no real use case for needing a proper CIDR check but still, it made me feel a little dirty to consider shipping that to production.

But there were aspects of this solution that I liked, such as not needing to do a DB lookup.

# Asking a Developer for Their Opinion

While I’m in mostly a solo position I am in direct contact with a bunch of developers, so I asked one of them for his opinion.

He read the ticket that I created which explained the problem and also compared the pros and cons of both solutions (starts with vs CIDR checks).

He said he thought the starts with solution would work but he immediately asked me, “Will that 10.0 address ever change?”.

For some reason that flipped something in my brain. Within 1 second it felt like a whole world unraveled itself. My proof of concept rang internal warning bells and felt hacky because the implementation was brittle, had magic values floating in the middle of a function and didn’t make it easy to test different IP prefixes.

The implementation was hacky but the idea of using starts with for our use case was not.

Anyways, the 10.0 will never change unless we completely re-built our VPC and cluster and decided to use something else but a developer on the team wouldn’t know that detail. In my head I initially felt comfortable hard coding 10.0 directly because I knew there was effectively a 0% chance that would change.

But going back to that 1 second moment. It shifted my whole mental model of the problem by making me think “what if it did change?” and that made me think at the very least this value should be a config option. It didn’t stop there though because then I thought about wanting to support adding more than 1 IP or range which seems reasonable.

Then the solution became more clear. What I really want is an exemption list for the IP allow list. Basically a list of IP addresses or ranges that will always be allowed access.

Once that became a concept then it made sense to avoid the DB lookup since it’s exempt. The detail of using starts with or a CIDR block was a secondary decision.
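As a rough sketch of what "make it a config option" could look like, the exempt list might be read from an environment variable. The variable name ALLOW_LIST_EXEMPT_IPS and the load_exempt_ips helper are assumptions for illustration, not what the real app uses.

```python
import os


def load_exempt_ips(raw_value):
    """Split a comma separated string into a tuple of exempt entries.

    Empty items are dropped on purpose. An empty string is a prefix of
    every string, so letting one through would exempt every IP address.
    """
    return tuple(item.strip() for item in raw_value.split(",") if item.strip())


# ALLOW_LIST_EXEMPT_IPS is a hypothetical environment variable name.
exempt_ips = load_exempt_ips(os.environ.get("ALLOW_LIST_EXEMPT_IPS", "127.0.0.1,10.0"))

print(exempt_ips)
```

Returning a tuple is convenient because it can be passed straight to str.startswith.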

# Starts with 10.X or Add in CIDR Support?

In nearly a decade there was never a customer request to support a CIDR block or range of IP addresses to white list. It didn’t feel like it was worth adding that support for the heck of it.

The API isn’t super high traffic so execution speed of this function wasn’t critically important. I had a hunch a CIDR validation function would take longer to run but I didn’t let that play a big role in this decision.

Checking if an IP address is within a CIDR block using Python’s built in functions turned out to be ~50x slower but that benchmark is deceptive. It took about 15 microseconds per check whereas checking if a string starts with another string took about 0.3 microseconds.

That translates to ~67k vs ~3.5 million executions per second. That seems like a massive difference and would be very concerning but if it already takes 30 milliseconds to fulfill a response, a ~14.5 microsecond difference is basically nothing for our use case.
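The arithmetic behind those figures can be checked with a few lines. This sketch uses the rounded per-run timings from the benchmark results later in the post (0.29 and 14.84 microseconds) and the 30 millisecond response time mentioned above.

```python
# Rounded per-run timings taken from the benchmark results in this post.
startswith_seconds = 0.29e-6
net_seconds = 14.84e-6
response_seconds = 30e-3

slowdown = net_seconds / startswith_seconds
extra_seconds = net_seconds - startswith_seconds
share_of_response = extra_seconds / response_seconds

print(f"CIDR check is about {slowdown:.0f}x slower")
print(f"{1 / startswith_seconds:,.0f} vs {1 / net_seconds:,.0f} executions per second")
print(f"Extra time per request: {extra_seconds * 1_000_000:.2f} microseconds")
print(f"Share of a 30 ms response: {share_of_response:.4%}")
```

The extra time works out to roughly 0.05% of a 30 millisecond response, which is why the 50x figure looks scarier than it is.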

I did end up going with the starts with solution because it met our business requirements. It being faster was just an extra bonus. Plus things are now coded in such a way where switching implementations would be easy since the logic is tucked away in an easy to test always_allow_ip_address function.
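To give an idea of what "easy to test" can mean here, this is a minimal unittest sketch. It assumes the starts with version of always_allow_ip_address shown later in this post (condensed to a single return), and the test class and method names are made up for illustration.

```python
import unittest

exempt_ips = ["127.0.0.1", "10.0"]


def always_allow_ip_address(ipaddress):
    # Condensed form of the starts with check shown later in the post.
    return ipaddress.startswith(tuple(exempt_ips))


class AlwaysAllowIpAddressTest(unittest.TestCase):
    def test_internal_cluster_ip_is_exempt(self):
        self.assertTrue(always_allow_ip_address("10.0.8.17"))

    def test_localhost_is_exempt(self):
        self.assertTrue(always_allow_ip_address("127.0.0.1"))

    def test_other_private_range_is_not_exempt(self):
        self.assertFalse(always_allow_ip_address("10.1.6.112"))

    def test_public_ip_is_not_exempt(self):
        self.assertFalse(always_allow_ip_address("42.42.42.42"))


# Run the tests without calling sys.exit() so this works in a script or REPL.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlwaysAllowIpAddressTest)
result = unittest.TextTestRunner().run(suite)
```

If the implementation is swapped for a CIDR based one later, the same tests should keep passing as long as the exempt list covers the same ranges.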

# The Code and Benchmarks

In all 3 cases we’re going to use Python’s standard library, no third party dependencies are required.

I separated out 3 different solutions into 3 different files so it’s easier to demonstrate each one in a blog post and video. I also removed the idea of splitting the list of IPs on commas to focus more on the allow list code itself.

All 3 examples have the same allow_ip_address function. This is the one that reads the exempt list of IPs or does a database lookup on the IP address with a short-circuit. All this function does is return True or False depending on if the IP address is allowed.

String Starts With

# startswith.py

exempt_ips = ["127.0.0.1", "10.0"]
fake_db_allowed_ips = ["42.42.42.42", "1.2.3.4"]


def allow_ip_address(ipaddress):
    if always_allow_ip_address(ipaddress) or ipaddress in fake_db_allowed_ips:
        return True

    return False


def always_allow_ip_address(ipaddress):
    if ipaddress.startswith(tuple(exempt_ips)):
        return True

    return False

The focus point of the above code is ipaddress.startswith(tuple(exempt_ips)) in the second function. Before this adventure I didn’t know you could pass a tuple into Python’s str.startswith method and it will check all of the items. This makes the code quite concise and readable in my opinion.

It pretty much reads out loud exactly what it does. If the IP address starts with any one of the exempt_ips then we found a match and we can allow it.

There’s no CIDR support here which is why the exempt_ips has 10.0. It means 10.0.3.72 will match but 10.1.6.112 will not. That works for my use case!
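One caveat worth knowing about prefix matching is that it has no idea where an octet ends. The sketch below shows the behavior with the same exempt list. The matches_entry helper is a hypothetical stricter variant, not part of the original code: it treats entries ending with a dot as prefixes and everything else as an exact match.

```python
exempt_ips = ["127.0.0.1", "10.0"]


def always_allow_ip_address(ipaddress):
    return ipaddress.startswith(tuple(exempt_ips))


print(always_allow_ip_address("10.0.3.72"))    # True
print(always_allow_ip_address("10.1.6.112"))   # False

# The exact entry "127.0.0.1" is also a prefix of 127.0.0.10 to 127.0.0.19
# and 127.0.0.100 to 127.0.0.199, so those match too.
print(always_allow_ip_address("127.0.0.100"))  # True


def matches_entry(ipaddress, entry):
    """Hypothetical stricter check: only entries ending in "." are prefixes."""
    if entry.endswith("."):
        return ipaddress.startswith(entry)

    return ipaddress == entry


print(matches_entry("127.0.0.100", "127.0.0.1"))  # False
print(matches_entry("10.0.3.72", "10.0."))        # True
```

For this exempt list the extra matches are harmless since they're all loopback addresses, but it would matter if a public IP such as 1.2.3.4 were added as an exempt entry because 1.2.3.40 would match as well.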

Using Python’s standard library with ip_address and ip_network

# net.py

from ipaddress import ip_address
from ipaddress import ip_network


exempt_ips = ["127.0.0.1", "10.0.0.0/16"]
fake_db_allowed_ips = ["42.42.42.42", "1.2.3.4"]


def allow_ip_address(ipaddress):
    if always_allow_ip_address(ipaddress) or ipaddress in fake_db_allowed_ips:
        return True

    return False


def always_allow_ip_address(ipaddress):
    for ip in exempt_ips:
        if ip_address(ipaddress) in ip_network(ip):
            return True

    return False

This one supports a CIDR which is why we have 10.0.0.0/16 in exempt_ips. This means any 10.0.X.X IP address will be allowed but 10.4.X.X will not.
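The version above parses every exempt entry on every call. A possible variation, sketched here under the assumption that the exempt list doesn't change while the app is running, is to parse the networks once at import time. I haven't benchmarked it, so treat it as an idea rather than a measured improvement. The exempt_networks name is made up for this example.

```python
from ipaddress import ip_address, ip_network

exempt_ips = ["127.0.0.1", "10.0.0.0/16"]

# Parse each entry once instead of on every request.
exempt_networks = [ip_network(ip) for ip in exempt_ips]


def always_allow_ip_address(ipaddress):
    # Parse the incoming address once, then compare it to each network.
    address = ip_address(ipaddress)

    return any(address in network for network in exempt_networks)


print(always_allow_ip_address("10.0.8.42"))  # True
print(always_allow_ip_address("10.4.0.1"))   # False
print(always_allow_ip_address("127.0.0.1"))  # True
```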

Here’s a quick chart of what IP ranges a specific CIDR supports:

  • 10.0.0.0/32 only supports 10.0.0.0 (1 host)
  • 10.0.0.0/24 supports 10.0.0.X (256 hosts)
  • 10.0.0.0/16 supports 10.0.X.X (65,536 hosts)
  • 10.0.0.0/8 supports 10.X.X.X (16,777,216 hosts)
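The chart above can be reproduced with ip_network. This is a small sketch that prints the first address, last address and size of each block. For IPv4 the size is 2 to the power of (32 minus the prefix length).

```python
from ipaddress import ip_network

for cidr in ("10.0.0.0/32", "10.0.0.0/24", "10.0.0.0/16", "10.0.0.0/8"):
    network = ip_network(cidr)

    # Sanity check the formula: 2 ** (32 - prefix length) addresses.
    assert network.num_addresses == 2 ** (32 - network.prefixlen)

    print(f"{cidr:<12} {network[0]} to {network[-1]} "
          f"({network.num_addresses:,} addresses)")
```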

The ipaddress.ip_network function is handy. Here’s a couple of examples of using it:

>>> ipaddress.ip_address("10.0.8.42") in ipaddress.ip_network("10.0.0.0/16")
True

>>> ipaddress.ip_address("10.0.8.42") in ipaddress.ip_network("10.0.0.2")
False

>>> ipaddress.ip_address("10.0.8.42") in ipaddress.ip_network("10.0.8.42")
True

It supports matching on a CIDR block or an exact IP address which makes it pretty versatile.
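One thing to watch out for if admins will be typing CIDR blocks by hand: by default ip_network is strict and raises a ValueError when the address part has host bits set, for example 10.0.8.42/16 instead of 10.0.0.0/16. Here's a short sketch of that behavior and the strict=False escape hatch.

```python
from ipaddress import ip_network

try:
    # Host bits are set here, the network address for this /16 is 10.0.0.0.
    ip_network("10.0.8.42/16")
except ValueError as error:
    print(f"Rejected: {error}")

# strict=False masks off the host bits instead of raising.
relaxed_network = ip_network("10.0.8.42/16", strict=False)
print(relaxed_network)  # 10.0.0.0/16
```

Whether to reject those entries or quietly normalize them is a judgment call. Rejecting them surfaces typos early.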

Using Python’s standard library with IPv4Network

# netlist.py

from ipaddress import IPv4Network


exempt_ips = ["127.0.0.1", "10.0.0.0/16"]
fake_db_allowed_ips = ["42.42.42.42", "1.2.3.4"]


def allow_ip_address(ipaddress):
    if always_allow_ip_address(ipaddress) or ipaddress in fake_db_allowed_ips:
        return True

    return False


def always_allow_ip_address(ipaddress):
    for exempt_ip in exempt_ips:
        all_ips = [str(ip) for ip in IPv4Network(exempt_ip)]

        if ipaddress in all_ips:
            return True

    return False

This one is kind of brittle because it explicitly only supports IPv4 and not IPv6. It’s also really slow because the list comprehension with IPv4Network("10.0.0.0/16") is going to produce a list of ~65k IP address strings on every call.

I only included this one because if you Google around for how to check if an IP address is within a CIDR with Python you’ll find a few high ranking StackOverflow results near the top suggesting to use this solution.

IPv4Network is nice to have available but in my opinion it’s not the right tool for the job to run it on every request to compare IP addresses.
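To illustrate the IPv4 only point, here's a small sketch comparing the two approaches with IPv6 input. The fd00::/8 range and the addresses are arbitrary example values.

```python
from ipaddress import AddressValueError, IPv4Network, ip_address, ip_network

# ip_address and ip_network figure out the IP version on their own.
print(ip_address("::1") in ip_network("::1"))            # True
print(ip_address("fd00::42") in ip_network("fd00::/8"))  # True

# IPv4Network refuses anything that isn't IPv4.
try:
    IPv4Network("fd00::/8")
except AddressValueError as error:
    print(f"IPv4Network rejected it: {error}")
```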

Benchmark Results

The relative difference is the most important thing here but I ran this on a 7+ year old workstation that has an i5-4460 3.2GHz CPU with 16GB of memory and an SSD running within WSL 2 using Python 3.8.10.

[allow_ip_address_startswith]
Time per run in seconds: 0.00000028869998641312 (0.29 microseconds)
Executions per second:   3,463,803

[allow_ip_address_net]
Time per run in seconds: 0.00001484379998873919 (14.84 microseconds)
Executions per second:   67,368

[allow_ip_address_netlist]
Time per run in seconds: 0.10794336000108159523 (107943.36 microseconds)
Executions per second:   9

If you’re following along and want to run everything locally, here’s the file which calls all of the above solutions and benchmarks them:

# allow_ip_address.py

import sys
import timeit

from startswith import allow_ip_address as allow_ip_address_startswith
from net import allow_ip_address as allow_ip_address_net
from netlist import allow_ip_address as allow_ip_address_netlist


ipaddress = sys.argv[1]

print(allow_ip_address_startswith(ipaddress))
print(allow_ip_address_net(ipaddress))
print(allow_ip_address_netlist(ipaddress))


def benchmark(fn, number):
    time_per_run = timeit.timeit(f"{fn}('{ipaddress}')", globals=globals(),
                                 number=number) / number

    time_per = f"{time_per_run:.20f}"
    time_in_microseconds = round(float(time_per) * 1000000, 2)
    executions_per_second = f"{round(1 / float(time_per)):,}"

    results = f"""
[{fn}]
Time per run in seconds: {time_per} ({time_in_microseconds} microseconds)
Executions per second:   {executions_per_second}"""

    print(results)

    return None

# I'm using 10 samples for the last one because it takes a long time to run.
benchmark("allow_ip_address_startswith", 1000)
benchmark("allow_ip_address_net", 1000)
benchmark("allow_ip_address_netlist", 10)

Assuming you have all 4 files (3 code examples + this one) in the same directory you can run it with python3 allow_ip_address.py 10.0.0.5 and it will output if the IP address is allowed and also run the benchmarks.

# Video Walkthrough

Timestamps

  • 0:48 – Understanding the problem
  • 2:51 – It’s Kubernetes time and keeping it internal
  • 7:29 – Hacking up a proof of concept
  • 10:16 – Asking a developer for their opinion
  • 12:46 – Should we go with string starts with or add in CIDR support?
  • 15:14 – Beginning to look at the code and running the benchmarks
  • 16:23 – Going over common code and patterns for all 3 solutions
  • 17:40 – Looking at the code for the string starts with solution
  • 19:50 – Preparing to look at the 2nd solution with CIDR block support
  • 22:03 – Getting proper CIDR block support with the net solution
  • 24:45 – Looking at the third netlist solution
  • 26:39 – Side topic, let’s go back to the blog post for the first 2 solutions
  • 30:36 – Back to the third netlist solution
  • 32:02 – Going over the benchmark results and the code to produce it

If you had to implement this feature what would you do? Let me know below!
