Back when I worked at On-Site, years
ago, I screwed something up in my code. I discovered it when
everything stopped working in production.
By “everything” I mean “nearly every action a user could take would
just hang.” So that’s not great.
But I learned something important about database migrations. And that’s
nearly as good as not crashing production, right?
I’ll tell you what happened with me and crashing our servers, but first let’s
talk about database migrations.
It turns out that testing a migration on your own machine won’t show
you any bugs that happen with bad data, or too much data. You probably
migrate often, so it won’t show you any bugs that only happen when you
run several migrations back-to-back.
Luckily, you have a staging server to catch these problems. You do, right?
But it also gets migrated often. And it probably still has much less data
than production. So you’ll see fewer “bad data” bugs and lots fewer “way too much
D'oh! You meant the staging server to catch more bugs.
If you do something slow like adding a column or an index, the
database will take a break from doing anything else with that table.
It locks it down hard. That means minutes or hours of downtime before
anything touching that table can run. In the mean time, your
site is down for as long as the operation takes.
So by now you’ve figured it out. I added an index on a table. On my
dev laptop, it was instant. On staging it was instant. In production,
it took about ten minutes, but nobody was sure how long it was going
to be. Ten minutes later the site was back up. I was lucky —
On-Site sold to realtors, so ten minutes was a reasonable maintenance
window, and everything turned out okay. Of course, I sweated bullets
the whole ten minutes…
And now you know — slow database operations can take your whole
site down randomly.
I’m going to write a lot more on this topic. It turns out that there
are specific database operations that cause downtime and specific safe
changes. Sign up below for more of that (or hit the “next” button, if
it’s there yet.)