Dealing with Database Migrations in a Brand New Project
Most web frameworks come with a way to manage database migrations but they can be very tedious to use early on in a project.
In my case, the frameworks I work with have well supported ORMs, such as ActiveRecord with Rails, SQLAlchemy with Flask and Ecto with Phoenix. All of these frameworks and tools have the concept of database migrations, as do other frameworks such as Django, etc..
I don’t know about you, but for me, the last thing I want is to get bogged down by database migrations when I’m trying to discover a good data model for my project.
That’s because coming up with a good data model usually involves making a lot of changes.
Hacking away on a brand new project should be a fun activity where you have a ton of flexibility to try things out quickly.
Now, don’t get me wrong. I still use database migrations, but lately I’ve been finding myself using them in a slightly different way when I’m working on new projects.
Migration File Overload
When working on a new project, I’m often not sure what fields are going to be in each database table. Sometimes I also find it beneficial to get the database relationships lined up before I start filling in the fields.
That means I’m often just including 1 or 2 fields in each table, such as a
title field – or something I’m 100% sure will exist.
If I were to write a proper database migration for each change, I would end up with a ton of boilerplate migration files where I’m adding and removing fields, indexes and constraints.
This is actually a pretty big drag on productivity. Suddenly you find yourself feeling less motivated to work on your app because you dread having to make half a dozen migrations an hour to keep up with your schema’s changes.
Also, in the back of your head, you’re thinking how stupid it’s going to look if you publish your code and your “initial commit” has 30+ database migrations but only a few tables.
A Few Real World Examples
Here’s something that I ran across recently while building my own custom course hosting platform from scratch.
Adding a published field to a table:
I wanted each course to be able to be published or not, similar to how you would set a blog post as being published or not.
That’s not too bad right. You could just have a
published field that’s a boolean. Then when it’s false, the course would be hidden from everyone (draft mode) or when it’s true, the course would be visible to everyone (published).
So that’s exactly what I did. I even set up an index on the published field because it’s something you’ll use very often in your queries, because you’ll only want to list published courses, etc..
But then as I was creating the front end for this, I started thinking about the problem a little bit more. I don’t think published or unpublished is enough for a course because a course is much different than a blog post.
I think what I really needed there was a
status enum. One that allows me to set
- Unpublished would mean that no one would be able to find or enroll into the course.
- Publishing it privately would require people to know the link to find and enroll into it.
- Publishing it publicly would allow anyone to find and enroll into the course.
If I were using individual database migrations for this, I would have to create a new migration file which first drops the column and index on the
published field and then create a new column and index for
Changing name to display name:
In another case, I was saving the name of a user, but this name wasn’t being used for anything other than what is shown when they post a question for a specific lesson.
I even found myself labeling the field as “Display name” in the settings page where people can edit their profile.
That’s a simple enough change, but it also means creating another database migration file to rename
display_name and delete / create a new constraint.
23 migration files after 1 weekend:
Things like the above were happening multiple times a day and it didn’t take long to end up with a mountain of database migration files.
It was really taking its toll on productivity, especially because I’m writing my platform with Phoenix and I’m still quite new with it, so I kept having to look up the migration syntax every time I did something new.
Getting the Best of Both Worlds
Ideally, you’ll want to use your framework’s migrations because otherwise you’ll need to hand code the raw SQL and that’s probably going to cause more headaches and delays in the long run. It also goes against the grain of what your framework recommends which could make onboarding new developers a bit more difficult.
So, what I eventually discovered was to create a single migration file called
initial_schema. You can name this file whatever you want. I chose initial schema because I think it describes what it’s meant for pretty well.
Whenever you want to add, change or remove something, it all happens in this one file, but you’re never doing anything like dropping fields or indexes. Instead you just delete the old field’s line and add in the new one, then reset your database.
This means whenever you make a change, your entire database is getting reset, but when you’re early on in a project, that’s not a big deal at all. Doing this made changing my database schema WAY faster and a more pleasant experience.
That type of work flow even nudges you in a direction to create a little bit of seed data early on which is a good idea. There’s a ton of value in being able to auto-populate your database with fake data in a few seconds by running a single command.
That’s something we do in the Build a SAAS app with Flask course. Every resource we create ends up having dozens or hundreds of fake data rows generated for it.
Speaking of Flask, SQLAlchemy makes this pattern very easy to follow because you don’t even need to create that initial migration file. You can just modify the fields in your models and reset your database. You were probably doing that already.
But other frameworks like Rails and Phoenix do require creating a migration file. Although, technically with Rails you could modify your
schema.rb file directly and then use
schema:load (which resets your database), but Phoenix has no such file available.
Still, I think I’ll use the single migration file even on new Rails projects because it serves as a good working example for adding migrations and it makes it easier to follow the path of multiple migrations when you’re ready to share or deploy your code.
Sharing or Deploying Your Code
This pattern of creating a single migration file and resetting your database is really only meant for when you’re working on a new project, or something you haven’t deployed yet.
That’s because you wouldn’t want to destroy any data you’ve collected in production just to update your database.
In that case, you’ll want to use individual migration files for any changes that you make afterwards. At this point in the app’s lifecycle, that’s not too big of a deal because you’ve already done most of the fast changing schema updates early on in the project.
Have you been using this pattern all along? Let me know below.