Pick 1 of 2 Items Randomly with a Weighted Percent in Python and Ruby
With 1 line of code and 1 function we'll be able to randomly pick something X% of the time and something else Y% of the time.
This is really handy when you’re generating fake data such as seeding your development database with a decent amount of users or whatever resources your app needs.
You can use this when you want to, say, generate 100 random users but have 5% of them be admins whereas the other 95% will be regular members.
Or maybe you want 10% of users to be deactivated or not confirmed so you can quickly see various UI states. In another case maybe you want 33% of users to have a specific optional field filled out. You get the idea!
There are lots of different algorithms to solve this, but when you have a fairly simple case of picking either choice A or B with specific weights associated with them, you can do this very easily in Python, Ruby or any other language.
# Even Distributions without Weights
Both languages make this really easy to do. The basic idea is you can pass in a list of 2 items and each one has a 50% chance of being picked. Technically you can pass in more than 2 items and they will all be split evenly. For example, if you pass in 4 items you’ll have a 25% chance of getting any one of them.
Python
This will pick either `Heads` or `Tails` half the time.
from random import choice
print(choice(["Heads", "Tails"]))
=> Heads
Ruby
This will pick either `Heads` or `Tails` half the time.
puts ["Heads", "Tails"].sample
=> Tails
# Using a Percent Based Weight
This is where things get a bit more interesting. As of Python 3.6 there is built-in support for this in the standard library’s `random` module, but with Ruby 3.1 we have to come up with a custom solution.
Python
This will pick `Admin` 5% of the time or `Member` 95% of the time.
# Notice how we're importing `choices` here not `choice`.
from random import choices
# It returns back a list which is why we're getting the first item with [0].
print(choices(population=["Admin", "Member"], weights=[0.05, 0.95])[0])
=> Member
This is really handy because you can pass in more than 2 items if you want. You can also optionally set `k=N` where `N` is how many results you want back in the list. With `k=10`, for example, it makes 10 independent picks (the same item can come up more than once) and gives you back a list of 10 results. If you do this then chances are you’ll want to remove the `[0]` so you can see all of the results.
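To make the `k` option a bit more concrete, here’s a minimal sketch. The third `Moderator` role and the fixed seed are assumptions I made up for illustration. They aren’t needed in a real seed script.

```python
from collections import Counter
from random import Random

# A seeded generator is an assumption so repeated runs match. The
# module-level random.choices() accepts the same arguments.
rng = Random(42)

roles = ["Admin", "Moderator", "Member"]  # "Moderator" is a made-up third item
weights = [5, 15, 80]  # relative weights, they don't have to sum to 1 or 100

# k=1000 returns a list of 1000 picks, drawn with replacement.
picks = rng.choices(population=roles, weights=weights, k=1000)

print(len(picks))      # 1000
print(Counter(picks))  # counts per role, roughly following the weights
```

The counts won’t land exactly on 50 / 150 / 800 since every pick is random, but they should be in that neighborhood.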
Ruby
I didn’t see anything built into Ruby 3.1 that supports this out of the box. I also Google’d around and saw a bunch of solutions on StackOverflow.
All of the solutions felt too complicated for my specific use case. I saw a bunch of algorithms written in a bunch of different languages but if all you really care about is picking 1 of 2 items some percentage of the time it’s not too bad.
Here’s what I ended up with and we’ll talk about getting there below the code:
def weighted_sample(percent, yes, no)
  percent >= rand(1..100) ? yes : no
end
puts weighted_sample(5, "Admin", "Member")
=> Member
We need to step back and re-think what picking a weighted percent really means. Think about Dungeons & Dragons or another game that uses 4-, 6-, 8-, 10- or 20-sided dice.
If you have a 10-sided die you have a 1 in 10 chance of rolling any number or, phrased another way, a 10% chance of rolling any individual number between 1 and 10.
But when you ask the question “what are the odds of rolling a 9 or higher?” you have a 2 out of 10 chance to do that or 1/5, in other words a 20% chance. It’s not super likely but it’s decently common.
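If you want to double check that dice math, here’s a small sketch that counts the winning faces. The `odds_of_at_least` helper is a name I made up for this example and it assumes a fair die.

```python
from fractions import Fraction


def odds_of_at_least(target, sides):
    """Chance of rolling `target` or higher on a fair die with `sides` faces."""
    faces = range(1, sides + 1)
    winning = [face for face in faces if face >= target]
    return Fraction(len(winning), sides)


print(odds_of_at_least(9, 10))         # 1/5
print(float(odds_of_at_least(9, 10)))  # 0.2
```

Only the faces 9 and 10 count as a win, which is where the 2 out of 10 comes from.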
So our algorithm now becomes: let’s roll a 100-sided die and if the percent weight you want is higher than or equal to the roll then you’ve “won”, so let’s pick the first item. Otherwise we “lost” the roll so we’ll choose the other item.
That’s exactly what we’re doing above in the code.
Instead of rolling a literal die, we use the `rand(1..100)` function which will pick a whole number between 1 and 100, including both 1 and 100. I normally don’t like to use ternary conditions in Ruby (or any language) but this time it felt more readable than doing a 1 line `if else end`.
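One way to convince yourself that the comparison is right is to skip the randomness and check every possible roll. This is a Python sketch of the same `percent >= roll` check as the Ruby function above. The `wins` and `winning_rolls` names are made up for the example.

```python
def wins(percent, roll):
    # Same comparison as the one-liner: the roll "wins" when it lands
    # at or below the percent weight.
    return percent >= roll


def winning_rolls(percent):
    # Walk through every face of the 100-sided die, 1 through 100 inclusive.
    return sum(1 for roll in range(1, 101) if wins(percent, roll))


for percent in (0, 5, 33, 100):
    print(percent, winning_rolls(percent))
# 0 0
# 5 5
# 33 33
# 100 100
```

A weight of 5 wins on exactly 5 of the 100 possible rolls. If the check were `percent > roll` instead, a weight of 5 would only win on rolls 1 through 4, so the `>=` matters.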
Revisiting the Python approach
You could do the same thing we did in Ruby with:
from random import randint
# I chose to name it weighted_choice instead of weighted_sample because Python
# has a choice() function whereas Ruby uses "sample" for its method.
def weighted_choice(percent, yes, no):
    return yes if percent >= randint(1, 100) else no
print(weighted_choice(5, "Admin", "Member"))
=> Member
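Tying this back to the fake data use case from the start of the post, here’s a rough sketch of generating 100 users. The dictionary fields, the example.com emails and the fixed seed are all assumptions for illustration. A real seed script would create records through your app’s own models.

```python
from collections import Counter
from random import Random

rng = Random(2024)  # seeded only so the sketch is repeatable


def weighted_choice(percent, yes, no):
    return yes if percent >= rng.randint(1, 100) else no


# Hypothetical user records: roughly 5% admins and roughly 10% unconfirmed.
users = [
    {
        "email": f"user{n}@example.com",
        "role": weighted_choice(5, "Admin", "Member"),
        "confirmed": weighted_choice(90, True, False),
    }
    for n in range(1, 101)
]

print(len(users))  # 100
print(Counter(user["role"] for user in users))
print(Counter(user["confirmed"] for user in users))
```

With only 100 users the split won’t be exactly 5 / 95 on every run, which is usually fine for development data.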
As for which one to use, that comes down to your use case. If you find yourself picking between 2 items with a weight a lot, it wouldn’t hurt to implement this function because using it is a lot less typing than the `choices()` function IMO.
I didn’t benchmark both solutions but if you do, feel free to drop a comment below.
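If you’d like a starting point for that benchmark, something like this sketch using the standard library’s `timeit` should work. The wrapper function names and the run count are arbitrary choices on my part, and the timings will vary by machine and Python version, so I’m not claiming a winner here.

```python
from random import choices, randint
from timeit import timeit


def weighted_choice(percent, yes, no):
    return yes if percent >= randint(1, 100) else no


def builtin_choice():
    # The built-in approach from earlier in the post.
    return choices(population=["Admin", "Member"], weights=[0.05, 0.95])[0]


def custom_choice():
    # The dice roll approach.
    return weighted_choice(5, "Admin", "Member")


runs = 100_000  # arbitrary, bump it up for steadier numbers
builtin_seconds = timeit(builtin_choice, number=runs)
custom_seconds = timeit(custom_choice, number=runs)

print(f"choices():         {builtin_seconds:.3f}s for {runs} picks")
print(f"weighted_choice(): {custom_seconds:.3f}s for {runs} picks")
```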
# Demo Video
Timestamps
- 0:38 – Starting off with an evenly split sample without weights
- 1:52 – Going over weighted sampling, starting off with the “why”
- 4:16 – Breaking down the problem using dice rolls as a comparison
- 6:01 – Running the Ruby code to pick a weighted sample
- 6:32 – Going back to how our weighted_sample function works
- 8:31 – Doing the same thing in Python
- 9:38 – Python has an alternative built-in option with the choices function
- 11:44 – With Python, which solution should you use?
Which implementation will you use to pick a random weighted sample?