Automatic and manual Ecto migrations
- Wojtek Mach
- October 12th, 2020
- ecto
Ecto ships with built-in support for database migrations via Mix tasks and the Ecto.Migrator module. Migrations are most commonly used for database schema changes like creating tables, adding columns, etc. In fact, migrations are so convenient that developers often use them in other circumstances as well: instead of (or in conjunction with) migrating the schema, they migrate data. Below we’ll discuss some of the challenges with either approach, especially around deployment and operations.
Challenges with schema migrations
Let’s say you just built a v1 of your product, made the first deployment, and everything is working flawlessly. You then added some new features (and/or fixed some bugs!), deployed them, and the application started to throw errors. What happened? Better remember to run those migrations on new deployments! Since it’s so easy to forget manual steps like that, you go ahead and configure your deployment pipeline to run migrations automatically on each new release, and things work well again. (Ecto tracks which migrations have run in the schema_migrations table and locks it while migrating. Even if you deploy to multiple nodes and all of them automatically try to run migrations, only one node will actually run them and the remaining ones will simply wait.)
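To make the locking behaviour concrete, here is a toy model in plain Elixir with no database involved. It is a hypothetical sketch and not how Ecto implements its lock. An Agent stands in for the locked schema_migrations table, and because an Agent handles one request at a time, the check-and-record step is serialized. The module and function names (MigrationLockSketch, migrate/2, simulate/2) are made up for illustration.

```elixir
defmodule MigrationLockSketch do
  # Hypothetical model, NOT Ecto's implementation: an Agent plays the role of
  # the locked schema_migrations table. Agent requests are handled one at a
  # time, so concurrent callers of migrate/2 are serialized, like nodes
  # waiting on the migration lock.

  # Runs every version not yet recorded and returns the versions this caller ran.
  def migrate(agent, all_versions) do
    Agent.get_and_update(agent, fn applied ->
      to_run = Enum.reject(all_versions, &MapSet.member?(applied, &1))
      {to_run, Enum.into(to_run, applied)}
    end)
  end

  # Simulates `node_count` nodes booting at the same time, each trying to migrate.
  def simulate(node_count, all_versions) do
    {:ok, agent} = Agent.start_link(fn -> MapSet.new() end)

    results =
      1..node_count
      |> Enum.map(fn _node -> Task.async(fn -> migrate(agent, all_versions) end) end)
      |> Enum.map(&Task.await/1)

    Agent.stop(agent)
    results
  end
end
```

Calling `MigrationLockSketch.simulate(3, [20201001160800, 20201001160835])` returns three lists. Exactly one of them contains both versions and the other two are empty, because those "nodes" waited their turn and then found nothing left to run.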
If you have just one instance of your application and you make a new deployment, at some point you’ll have to restart your app to load the new code, which means downtime. To avoid that, you should run at least two instances of your application: the “new” one being updated and the “old” one that continues to serve traffic.
This approach, however, restricts which operations you can perform in your schema migrations. In a nutshell, as long as you only add new tables, new columns, etc., you should be fine: the “old” code doesn’t even know about them. But once you modify the existing schema (change the type of a column, drop a table, etc.), the “old” code that depends on it will no longer work. On those occasions, you should split your deployment in two. The first deployment only adds to your schema and changes the code to work with both the “old” and “new” versions of the schema. Then, after all of your instances are running the “new” code, you do a second deployment that makes the destructive changes to your DB schema.
Another challenge is schema changes that take a really long time. For instance, adding an index to a huge table can hold up the deployment. While it’s really convenient to run migrations automatically, wouldn’t it be nice to be able to run that particular migration manually?
Challenges with data migrations
Data migrations are migrations that change the data stored in the database rather than the database schema. For example, here is a migration that rewrites all users’ statuses from active to enabled:
defmodule MyApp.Repo.Migrations.UpdateUsersStatus do
  use Ecto.Migration
  # Forward: rename the "active" status to "enabled"
  def up do
    execute "UPDATE users SET status = 'enabled' WHERE status = 'active'"
  end
  # Rollback: restore the "active" status
  def down do
    execute "UPDATE users SET status = 'active' WHERE status = 'enabled'"
  end
end
We may choose to implement this as an Ecto migration for the following reasons:
- the logic lives in a well-known place, it’s versioned along with the rest of the code, it’s code reviewed, etc.
- each migration runs just once (unless rolled back)
- migrations run in order
- Ecto ensures only one node will run migrations at a time
- it’s automatically executed on deployments (if we configured it as such)
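The "runs just once" and "run in order" guarantees boil down to simple bookkeeping. The sketch below shows it in plain Elixir under a few assumptions. The version is taken to be the timestamp prefix of the file name, as in files generated by mix ecto.gen.migration. The list of applied versions stands in for what Ecto records in the schema_migrations table. The module name PendingMigrationsSketch is hypothetical, and this is not Ecto’s actual implementation.

```elixir
defmodule PendingMigrationsSketch do
  # Hypothetical model, NOT Ecto's code: decide which migration files still
  # need to run and in which order.

  # "priv/repo/migrations/20201001160835_update_users.exs" -> 20201001160835
  def version(filename) do
    [prefix, _name] = filename |> Path.basename() |> String.split("_", parts: 2)
    String.to_integer(prefix)
  end

  # Files whose version is not yet applied, in ascending version order.
  # An already-applied version is skipped, so each migration runs only once.
  def pending(filenames, applied_versions) do
    applied = MapSet.new(applied_versions)

    filenames
    |> Enum.reject(&MapSet.member?(applied, version(&1)))
    |> Enum.sort_by(&version/1)
  end
end

files = ["20201001160835_update_users.exs", "20201001160800_create_users.exs"]

PendingMigrationsSketch.pending(files, [])
# => ["20201001160800_create_users.exs", "20201001160835_update_users.exs"]

PendingMigrationsSketch.pending(files, [20201001160800])
# => ["20201001160835_update_users.exs"]
```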
On the flip side, slow data migrations will also slow down new deployments. We could forget about Ecto migrations for data changes, implement them as scripts (or just regular functions), and run them on demand. But then we’d lose the locking and versioning mechanisms that migrations give us.
In short, there’s a lot of value in using Ecto migrations, but sometimes we want to run them automatically and sometimes on demand. How can we do that?
Multiple migration directories
Fortunately, Ecto supports multiple migration directories, so all we need to do is split up our migrations accordingly, e.g.:
priv/
  repo/
    migrations/         # run “automatically”
    manual_migrations/  # run manually, on demand
When we generate a new migration, we can pass the --migrations-path option:
$ mix ecto.gen.migration --migrations-path=priv/repo/manual_migrations update_users
* creating priv/repo/manual_migrations
* creating priv/repo/manual_migrations/20201001160835_update_users.exs
We can pass it to mix ecto.migrate too:
$ mix ecto.migrate --migrations-path=priv/repo/manual_migrations
18:17:39.083 [info] == Running 20201001160835 MyApp.Repo.Migrations.UpdateUsers.change/0 forward
18:17:39.086 [info] == Migrated 20201001160835 in 0.0s
If we deploy with releases, we can define separate functions for each set of migrations:
defmodule MyApp.Release do
  @app :my_app
  # Runs the migrations in priv/repo/migrations (the automatic ones)
  def migrate do
    load_app()
    for repo <- repos() do
      path = Ecto.Migrator.migrations_path(repo)
      run_migrations(repo, path)
    end
  end
  # Runs the migrations in priv/repo/manual_migrations (triggered on demand)
  def migrate_manual do
    load_app()
    for repo <- repos() do
      # requires Ecto v3.4+:
      path = Ecto.Migrator.migrations_path(repo, "manual_migrations")
      run_migrations(repo, path)
    end
  end
  # Starts the repo, runs all pending :up migrations in `path`, then stops it
  defp run_migrations(repo, path) do
    {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, path, :up, all: true))
  end
  defp repos do
    Application.fetch_env!(@app, :ecto_repos)
  end
  defp load_app do
    Application.load(@app)
  end
end
Since Ecto v3.4, we can pass multiple migration paths at the same time:
$ mix ecto.migrate --migrations-path=priv/repo/migrations --migrations-path=priv/repo/manual_migrations
18:17:39.083 [info] == Running 20201001160800 MyApp.Repo.Migrations.CreateUsers.change/0 forward
18:17:39.083 [info] == Running 20201001160835 MyApp.Repo.Migrations.UpdateUsers.change/0 forward
(...)
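Judging by the log above, migrations from both directories run as one sequence ordered by version, so files from different directories can interleave. The hypothetical sketch below models that merge in plain Elixir. The module name MultiPathSketch and the extra file 20201001160900_add_email.exs are made up, and Ecto’s own implementation may differ.

```elixir
defmodule MultiPathSketch do
  # Hypothetical model, NOT Ecto's code: merge migration files from several
  # directories into a single list ordered by version (the timestamp prefix).
  def ordered(paths_to_files) do
    paths_to_files
    |> Enum.flat_map(fn {path, files} -> Enum.map(files, &Path.join(path, &1)) end)
    |> Enum.sort_by(&version/1)
  end

  # "priv/repo/migrations/20201001160800_create_users.exs" -> 20201001160800
  defp version(file) do
    [prefix, _name] = file |> Path.basename() |> String.split("_", parts: 2)
    String.to_integer(prefix)
  end
end

# Hypothetical file names; add_email does not appear in the article.
MultiPathSketch.ordered(%{
  "priv/repo/migrations" => ["20201001160900_add_email.exs", "20201001160800_create_users.exs"],
  "priv/repo/manual_migrations" => ["20201001160835_update_users.exs"]
})
# => ["priv/repo/migrations/20201001160800_create_users.exs",
#     "priv/repo/manual_migrations/20201001160835_update_users.exs",
#     "priv/repo/migrations/20201001160900_add_email.exs"]
```

The practical consequence is that a manual migration is not run "after" all automatic ones. Its position depends only on its timestamp.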
You may want to make running all migrations the default behaviour in dev & test. If you’re using Phoenix, you may already have ecto.setup and test Mix aliases, so let’s modify them to run all migrations:
defp aliases do
  [
    # Runs the automatic and the manual migrations together
    "ecto.migrate_all": [
      "ecto.migrate --migrations-path=priv/repo/migrations --migrations-path=priv/repo/manual_migrations"
    ],
    "ecto.setup": ["ecto.create", "ecto.migrate_all", "run priv/repo/seeds.exs"],
    test: ["ecto.create --quiet", "ecto.migrate_all --quiet", "test"]
  ]
end
Conclusion
With Ecto’s support for multiple migration directories, we can easily split up our migrations into ones that run automatically on deployment and ones that we trigger manually after the code has been updated. This technique can be useful for both schema and data migrations.
We also mentioned a situation where schema changes require us to split the deployment in two. In fact, we could even combine that into a single deployment with two steps: we make the code changes, define the “destructive” schema migration as a “manual” one, and deploy. Then, after the deployment is complete on all nodes (along with any “safe” automatic migrations), we simply trigger the manual one!
Finally, in dev & test we may actually want to run all migrations at the same time and we can easily do that by passing both migration directories.
Happy hacking!