Generating Fake Data in Development to Populate Your Database
Being able to create dozens or thousands of records to populate your database in development has a lot of advantages.
After I get my database schema to a somewhat stable state, one of the first things I do is create scripts to automatically generate realistic fake data for most of my tables and fields.
I’ve been doing that for more than 5 years now and I always find it to be worth it.
# Populating Fake Data vs. Seeding
Before we continue on, let’s go over the difference between these 2 terms.
To me, seeding is something you would do in development and production, such as seeding an initial admin user so you don’t have to go into the database to manually create one.
But, in this article, we’re not talking about seeding (which is worth doing btw).
Today we’re talking about generating heaps of random / fake data, which is useful in development and also helps make an app feel alive when you demo it to clients.
# Why Is Generating Fake Data Worth It?
It doesn’t take much extra code to pull this off, but it’s still extra code you’ll need to write. It’s so worth it though.
## Wishful Thinking
Imagine if you could type something like `myapp add users` and 5 seconds later you have 187 fake users generated in your database.
Each user would have their own random created at time, username, name, email address and whatever profile information you want.
But it wouldn’t be fully random, like having a name of Jud6Hn-z. Instead, each field would look like what it represents. So when it came to email addresses, you would get a real looking email address.
Then, if you ran `myapp add users` again, the previously generated users would be removed and a new set of users would be generated.
Or perhaps you could run `myapp add all` and now instead of generating only users, it generated users, invoices, course lessons or whatever you need for your application.
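To make the idea concrete, here is a minimal sketch of what such a command could look like in Python. It only uses the standard library, so a couple of hand-picked name lists stand in for a real fake data library. Everything in it is an assumption made up for illustration: the `add_users` and `main` functions, the `users` table layout, the `--count` and `--db` options and the `dev.sqlite3` default.

```python
import argparse
import random
import sqlite3
from datetime import datetime, timedelta

# Tiny stand-ins for a real fake data library (illustration only).
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Dennis"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton", "Ritchie"]


def add_users(conn: sqlite3.Connection, count: int) -> int:
    """Replace all users with `count` freshly generated fake users."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, created_at TEXT)"
    )
    conn.execute("DELETE FROM users")  # remove the previously generated set
    now = datetime.now()
    rows = []
    for i in range(count):
        first, last = random.choice(FIRST_NAMES), random.choice(LAST_NAMES)
        # The index keeps email addresses unique even when names repeat.
        email = f"{first}.{last}{i}@example.com".lower()
        # Spread the created-at times randomly over the past year.
        created_at = now - timedelta(minutes=random.randint(0, 60 * 24 * 365))
        rows.append((f"{first} {last}", email, created_at.isoformat()))
    conn.executemany(
        "INSERT INTO users (name, email, created_at) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return count


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(prog="myapp")
    commands = parser.add_subparsers(dest="command", required=True)
    add = commands.add_parser("add", help="generate fake data")
    add.add_argument("resource", choices=["users", "all"])
    add.add_argument("--count", type=int, default=187)
    add.add_argument("--db", default="dev.sqlite3")
    args = parser.parse_args(argv)

    conn = sqlite3.connect(args.db)
    # "all" would call one function per resource; only users exist in this sketch.
    total = add_users(conn, args.count)
    conn.close()
    print(f"Generated {total} fake users")


# Demo run against a throwaway in-memory database.
main(["add", "users", "--count", "5", "--db", ":memory:"])
```

Calling `main()` with no arguments makes argparse read the real command line, so hooking it up to a `myapp` console script would give you `myapp add users`. Deleting before inserting is what makes it safe to run over and over.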
## Viewing Your App in Different States
This is really useful because you can go from a totally empty database to being able to see what your app looks and feels like with a bunch of data.
On the flip side, you can also reset your database and then look at how your app behaves when it’s empty. That’s really important because when no data exists for a resource, it’s a good idea to give friendly hints and links on how to add data for that resource.
But unless you routinely view your app in these states, it’s easy to overlook doing this. Without the ability to generate fake data on the fly, checking both states becomes really tedious, which is why it often gets skipped.
This strategy is especially useful when you’re designing your app because now you can generate enough data to trigger things like pagination. It may also help you uncover UI issues. For example, maybe you didn’t anticipate 38 comments being loaded in a sidebar, which makes it look weird, so now you limit it to 10 with a “read more” link.
# Generating Fake Data with Most Languages
Most popular programming languages have a “faker” library to help generate fake data.
Links to faker libraries for a few languages:
- Python https://github.com/joke2k/faker
- Ruby https://github.com/stympy/faker
- Node https://github.com/marak/Faker.js
- Elixir https://github.com/igas/faker
- PHP https://github.com/fzaninotto/Faker
For example, with Python’s Faker library you could call `fake.past_date(start_date="-30d")` to generate a random date between 30 days ago and yesterday.
You can generate everything from address fields to license plates to lorem ipsum to entire profiles, and it’s easy to create your own types if you need something very specific. There are over 20 existing types to choose from with the Python library.
Generating an entire random profile with 1 line of code:
fake.profile()
# { "address": "86523 Steven Square\nBurnston, IN 67952",
# "birthdate": datetime.date(1916, 8, 28),
# "blood_group": "B-",
# "company": "Stanley, Mitchell and Collins",
# "current_location": (Decimal("-38.2116335"), Decimal("145.281777")),
# "job": "Engineer, production",
# "mail": "kerrjoel@hotmail.com",
# "name": "John Trujillo",
# "residence": "889 Benjamin Islands Suite 753\nWest Craig, VT 89686",
# "sex": "M",
# "ssn": "208-74-8526",
# "username": "annemoss",
# "website": ["http://www.example.com/"] }
Pretty neat!
We use the Python Faker package in my Build a SAAS App with Flask course to generate fake users, payment invoices and bets (part of a game we create).
What’s your favorite faker library? Let me know below!