Pick 1 of 2 Items Randomly with a Weighted Percent in Python and Ruby

With 1 line of code and 1 function we'll be able to randomly pick something X% of the time and something else Y% of the time.

This is really handy when you’re generating fake data, such as seeding your development database with a decent number of users or whatever resources your app needs.

Let’s say you want to generate 100 random users where about 5% of them are admins and the other 95% are regular members.

Or maybe you want 10% of users to be deactivated or not confirmed so you can quickly see various UI states. In another case maybe you want 33% of users to have a specific optional field filled out. You get the idea!

There are lots of different algorithms to solve this, but when you have a fairly simple case of picking either choice A or B with specific weights associated with them, you can do this very easily in Python, Ruby or any other language.

# Even Distributions without Weights

Both languages make this really easy to do. The basic idea is you pass in a list of 2 items and each one has a 50% chance of being picked. Technically you can pass in more than 2 items and the odds will be split evenly between them. For example, if you pass in 4 items, each one has a 25% chance of being picked.

Python

This will pick Heads half the time and Tails the other half.

from random import choice

print(choice(["Heads", "Tails"]))
=> Heads

Ruby

This will pick Heads half the time and Tails the other half.

puts ["Heads", "Tails"].sample
=> Tails
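
If you want to convince yourself that the split really is even, here is a minimal sketch that tallies a lot of picks from a 4 item list. The list of card suits, the number of draws and the seed are all made up for illustration. Each suit should land close to 25%, but not exactly on it since the picks are random.

```python
from collections import Counter
from random import Random

# A seeded generator keeps the tally repeatable. The seed value is arbitrary.
rng = Random(42)

suits = ["Hearts", "Spades", "Clubs", "Diamonds"]
draws = 10_000

# Pick 1 item per draw and count how often each one comes up.
tally = Counter(rng.choice(suits) for _ in range(draws))

for suit in suits:
    print(f"{suit}: {tally[suit] / draws:.1%}")
```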

# Using a Percent Based Weight

This is where things get a bit more interesting. As of Python 3.6 there is built-in support for this in the standard library, but with Ruby 3.1 we have to come up with a custom solution.

Python

This will pick Admin 5% of the time or Member 95% of the time.

# Notice how we're importing `choices` here not `choice`.
from random import choices

# It returns back a list which is why we're getting the first item with [0].
print(choices(population=["Admin", "Member"], weights=[0.05, 0.95])[0])
=> Member

This is really handy because you can pass in more than 2 items if you want. You can also optionally set k=N where N is how many results you want back in the list. For example, k=10 makes 10 independent weighted picks (the same item can come up more than once) and gives you back a list of 10 results. If you do this, chances are you’ll want to remove the [0] so you can see all of the results.
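
Here’s a quick sketch of what asking for 10 results at once could look like. The variable name roles is just something I picked for the example, and the output changes on every run so none is shown.

```python
from random import choices

# k=10 asks for 10 weighted picks in a single call, so there's no [0] here.
roles = choices(population=["Admin", "Member"], weights=[0.05, 0.95], k=10)

# roles is a list of 10 strings, each one is either "Admin" or "Member".
print(roles)
print(roles.count("Admin"), "admins out of", len(roles))
```

The weights are relative to each other, so weights=[5, 95] would behave the same way as weights=[0.05, 0.95].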

Ruby

I didn’t see anything built into Ruby 3.1 that supports this out of the box. I also Google’d around and saw a bunch of solutions on StackOverflow.

All of the solutions felt too complicated for my specific use case. I saw a bunch of algorithms written in a bunch of different languages but if all you really care about is picking 1 of 2 items some percentage of the time it’s not too bad.

Here’s what I ended up with and we’ll talk about getting there below the code:

# Returns `yes` roughly `percent`% of the time, otherwise it returns `no`.
def weighted_sample(percent, yes, no)
  percent >= rand(1..100) ? yes : no
end

puts weighted_sample(5, "Admin", "Member")
=> Member

We need to step back and re-think what picking a weighted percent really means. Think about Dungeons and Dragons or another game that uses 4, 6, 8, 10, or 20 sided dice.

If you have a 10 sided die you have a 1 in 10 chance of rolling any given number or, phrased another way, a 10% chance of rolling any individual number between 1 and 10.

But when you ask the question “what are the odds of rolling a 9 or higher?” you have a 2 out of 10 chance to do that (a 9 or a 10), or 1/5, in other words a 20% chance. It’s not super likely but it’s decently common.
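
The dice math is small enough to check by listing every face. This is only a sketch to illustrate the counting, and the variable names are made up for the example.

```python
from fractions import Fraction

sides = range(1, 11)  # The faces of a 10 sided die: 1 through 10.

# Keep only the faces that count as rolling "a 9 or higher".
winning = [face for face in sides if face >= 9]
odds = Fraction(len(winning), len(sides))

print(winning)      # [9, 10]
print(odds)         # 1/5
print(float(odds))  # 0.2
```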

So our algorithm now becomes: let’s roll a 100 sided die and if the percent weight you want is greater than or equal to the roll then you’ve “won”, so let’s pick the first item. Otherwise we “lost” the roll so we’ll choose the other item.

That’s exactly what we’re doing above in the code.

Instead of rolling a literal die, we use rand(1..100), which picks a random whole number from 1 to 100 (both included). I normally don’t like to use ternary conditions in Ruby (or any language) but this time it felt more readable than doing a 1 line if else end.
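
It’s worth double checking why the comparison is “greater than or equal to” rather than just “greater than”. Here’s a rough sketch in Python that counts how many faces of a 100 sided die would win for a given percent. The helper names winning_rolls and winning_rolls_strict are made up for this example.

```python
def winning_rolls(percent):
    # Count the faces of a 100 sided die where percent >= roll.
    return sum(1 for roll in range(1, 101) if percent >= roll)


def winning_rolls_strict(percent):
    # Same idea with a strict > comparison, only to show the off by 1.
    return sum(1 for roll in range(1, 101) if percent > roll)


for percent in (5, 33, 100):
    print(percent, winning_rolls(percent), winning_rolls_strict(percent))
# 5 5 4
# 33 33 32
# 100 100 99
```

With >= a weight of 5 wins on exactly 5 of the 100 faces, which is the 5% we’re after. With a strict > it would only win on 4 of them.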

Revisiting the Python approach

You could do the same thing we did in Ruby with:

from random import randint

# I chose to name it weighted_choice instead of weighted_sample because Python
# has a choice() function whereas Ruby uses "sample" for its method.
def weighted_choice(percent, yes, no):
    return yes if percent >= randint(1, 100) else no

print(weighted_choice(5, "Admin", "Member"))
=> Member
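
A single call doesn’t tell you much about the weight, so here’s a small sketch that calls the function many times and reports how often Admin came back. The seed and the number of draws are arbitrary values picked for the example. The result should hover around 5% without being exactly 5%.

```python
import random
from random import randint


def weighted_choice(percent, yes, no):
    return yes if percent >= randint(1, 100) else no


# Seeding makes the run repeatable. The seed value itself is arbitrary.
random.seed(2022)

draws = 100_000
admins = sum(
    1 for _ in range(draws) if weighted_choice(5, "Admin", "Member") == "Admin"
)

print(f"Admin was picked {admins / draws:.2%} of the time")
```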

As for which one to use, that comes down to your use case. If you find yourself doing a lot of weighted picks between 2 items it wouldn’t hurt to implement this function because using it is a lot less typing than the choices() function IMO.

I didn’t benchmark both solutions, but if you do, feel free to drop a comment below.
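
If you’re curious, one way to compare them is with Python’s timeit module. This is only a starting point and not a rigorous benchmark. The builtin_choice wrapper and the number of runs are assumptions made for this sketch, and the timings will vary by machine and Python version so no results are shown.

```python
from random import choices, randint
from timeit import timeit


def weighted_choice(percent, yes, no):
    return yes if percent >= randint(1, 100) else no


def builtin_choice():
    # Hypothetical wrapper so both approaches get timed as 1 function call.
    return choices(population=["Admin", "Member"], weights=[0.05, 0.95])[0]


runs = 100_000

# timeit returns the total number of seconds it took to make `runs` calls.
custom_seconds = timeit(lambda: weighted_choice(5, "Admin", "Member"), number=runs)
builtin_seconds = timeit(builtin_choice, number=runs)

print(f"weighted_choice(): {custom_seconds:.3f}s for {runs} calls")
print(f"choices():         {builtin_seconds:.3f}s for {runs} calls")
```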

# Demo Video

Timestamps

  • 0:38 – Starting off with an evenly split sample without weights
  • 1:52 – Going over weighted sampling, starting off with the “why”
  • 4:16 – Breaking down the problem using dice rolls as a comparison
  • 6:01 – Running the Ruby code to pick a weighted sample
  • 6:32 – Going back to how our weighted_sample function works
  • 8:31 – Doing the same thing in Python
  • 9:38 – Python has an alternative built-in option with the choices function
  • 11:44 – With Python, which solution should you use?

Which implementation will you use to pick a random weighted sample?
