Automatic and manual Ecto migrations

  • Wojtek Mach
  • October 12th, 2020
  • ecto

Ecto ships with built-in support for database migrations via Mix tasks and the Ecto.Migrator module. Migrations are most commonly used for database schema changes like creating tables, columns, etc. In fact, migrations are so convenient that developers often use them in other circumstances too: instead of (or in conjunction with) migrating the schema, they migrate data. Below we’ll discuss some of the challenges with either approach, especially around deployment and operations.

Challenges with schema migrations

Let’s say you just built a v1 of your product, made the first deployment, and everything is working flawlessly. You then added some new features (and/or fixed some bugs!), deployed them, and the application started to throw errors. What happened? Better remember to run those migrations on new deployments! Since it’s so easy to forget manual steps like that, you go ahead and configure your deployment pipeline to automatically run migrations on every new release and things work well again. (Ecto tracks migrations in the schema_migrations table and locks it, so even if you deploy to multiple nodes and all of them automatically try to run migrations, only one node will actually do so and the remaining ones will simply wait.)

If you have just one instance of your application and you make a new deployment, at some point you’ll have to restart your app to load the new code, which would mean downtime. Thus you should be running at least two instances of your application - the “new” application being updated and the “old” one that continues to serve traffic.

This approach, however, restricts which operations you can perform in your schema migrations. In a nutshell, as long as you only add new tables, new columns, etc., you should be fine: the “old” code doesn’t even know about them. But once you modify your schema (change the type of a column, drop a table, etc.), the “old” code that was depending on it will no longer work. On those occasions, you should split your software deployment in two. The first only adds to your schema and changes the code to work on both the “old” and “new” versions. Then, after all of your instances are using the “new” code, you’ll do a second deployment to change your DB schema.
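As a rough sketch of that two-step split, assume a hypothetical users table whose status column is being replaced by a new state column (both column names and module names are made up for illustration, and in a real project each migration would live in its own file). The first migration only adds to the schema; the second one is destructive and belongs to the later deployment:

```elixir
# Deployment 1: purely additive, so the "old" code keeps working.
defmodule MyApp.Repo.Migrations.AddStateToUsers do
  use Ecto.Migration

  def change do
    alter table(:users) do
      # Hypothetical new column that the "new" code writes to.
      add :state, :string
    end
  end
end

# Deployment 2: run only after every instance is on the "new" code.
defmodule MyApp.Repo.Migrations.RemoveStatusFromUsers do
  use Ecto.Migration

  def change do
    alter table(:users) do
      # Passing the type lets Ecto re-create the column on rollback.
      remove :status, :string
    end
  end
end
```

Between the two deployments the application code has to tolerate both columns being present.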

Another challenge is schema changes that take a really long time. For instance, you may add an index on a huge table, which holds up the deployment. While it’s really convenient to run migrations automatically, wouldn’t it be nice to be able to run that particular migration manually?
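For illustration, here is what such a long-running migration might look like. This is only a sketch, assuming PostgreSQL and a hypothetical email column on a users table. Building the index concurrently avoids blocking writes, but it cannot run inside a transaction, so the migration still takes as long as the index build does:

```elixir
defmodule MyApp.Repo.Migrations.AddUsersEmailIndex do
  use Ecto.Migration

  # CREATE INDEX CONCURRENTLY cannot run inside a transaction, so both the
  # DDL transaction and the (transaction-based) migration lock are disabled.
  # Without the lock, nothing stops two nodes from running this migration
  # at the same time.
  @disable_ddl_transaction true
  @disable_migration_lock true

  def change do
    # Assumed table and column names, for illustration only.
    create index(:users, [:email], concurrently: true)
  end
end
```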

Challenges with data migrations

Data migrations are migrations that change the data stored in the database, rather than the database schema. For example, here is a migration that rewrites all users’ statuses from active to enabled:

defmodule MyApp.Repo.Migrations.UpdateUsersStatus do
  use Ecto.Migration

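  def up do
    # Rewrite every "active" status to "enabled"
    execute "UPDATE users SET status = 'enabled' WHERE status = 'active'"
  end

  def down do
    # Revert: rewrite every "enabled" status back to "active"
    execute "UPDATE users SET status = 'active' WHERE status = 'enabled'"
  end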
end

We may choose to implement this as an Ecto migration for the following reasons:

  • the logic is located in a well known place, it’s versioned along with any other code, it’s code reviewed, etc
  • each migration runs just once (unless rolled back)
  • migrations run in order
  • Ecto ensures only one node will run migrations at a time
  • it’s automatically executed on deployments (if we configured it as such)

On the flip side, slow data migrations will also slow down new deployments. We could forget about Ecto migrations for data changes, implement these as scripts (or just regular functions), and run them on demand, but then we’d lose the locking and versioning mechanisms given by migrations.

In short, there’s a lot of value in using Ecto migrations but sometimes we want to run them automatically and sometimes on demand. How to do that?

Multiple migration directories

Fortunately, Ecto has support for multiple migrations directories. All we need to do is split up our migrations accordingly, e.g.:

priv/
  repo/
    migrations/ # run "automatically"
    manual_migrations/

When we generate a new migration we can pass a --migrations-path option:

$ mix ecto.gen.migration --migrations-path=priv/repo/manual_migrations update_users
* creating priv/repo/manual_migrations
* creating priv/repo/manual_migrations/20201001160835_update_users.exs

We can pass it to mix ecto.migrate too:

$ mix ecto.migrate --migrations-path=priv/repo/manual_migrations

18:17:39.083 [info]  == Running 20201001160835 MyApp.Repo.Migrations.UpdateUsers.change/0 forward

18:17:39.086 [info]  == Migrated 20201001160835 in 0.0s

If we deploy with releases, we can define separate functions for each set of migrations:

defmodule MyApp.Release do
  @app :my_app

  def migrate do
    load_app()

    for repo <- repos() do
      path = Ecto.Migrator.migrations_path(repo)
      run_migrations(repo, path)
    end
  end

  def migrate_manual do
    load_app()

    for repo <- repos() do
      # requires Ecto v3.4+:
      path = Ecto.Migrator.migrations_path(repo, "manual_migrations")
      run_migrations(repo, path)
    end
  end

  defp run_migrations(repo, path) do
    {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, path, :up, all: true))
  end

  defp repos do
    Application.fetch_env!(@app, :ecto_repos)
  end

  defp load_app do
    Application.load(@app)
  end
end
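Manual migrations may occasionally need to be rolled back on demand too. A minimal sketch, assuming the same manual_migrations directory and a hypothetical helper module (in practice this function would probably sit in MyApp.Release next to migrate_manual/0):

```elixir
defmodule MyApp.Release.ManualRollback do
  # Hypothetical helper, not part of the original MyApp.Release module.
  @app :my_app

  # Rolls the manual migrations of `repo` back down to `version`.
  def rollback_manual(repo, version) do
    Application.load(@app)

    # requires Ecto v3.4+:
    path = Ecto.Migrator.migrations_path(repo, "manual_migrations")

    {:ok, _, _} =
      Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, path, :down, to: version))
  end
end
```

On a release, functions like these are typically invoked with the release’s eval command, for example bin/my_app eval "MyApp.Release.migrate_manual()", where my_app is the assumed release name.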

Since Ecto v3.4 we can pass multiple migration paths at the same time:

$ mix ecto.migrate --migrations-path=priv/repo/migrations --migrations-path=priv/repo/manual_migrations

18:17:39.083 [info]  == Running 20201001160800 MyApp.Repo.Migrations.CreateUsers.change/0 forward

18:17:39.083 [info]  == Running 20201001160835 MyApp.Repo.Migrations.UpdateUsers.change/0 forward

(...)
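The same should be possible from release code, since Ecto.Migrator.run/4 also accepts a list of directories. A sketch, using a hypothetical module name (the function could equally be added to MyApp.Release):

```elixir
defmodule MyApp.Release.AllMigrations do
  # Hypothetical module, shown separately only to keep the example short.
  @app :my_app

  # Runs the automatic and the manual migrations together, ordered by version.
  def migrate_all do
    Application.load(@app)

    for repo <- Application.fetch_env!(@app, :ecto_repos) do
      # requires Ecto v3.4+:
      paths = [
        Ecto.Migrator.migrations_path(repo),
        Ecto.Migrator.migrations_path(repo, "manual_migrations")
      ]

      {:ok, _, _} =
        Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, paths, :up, all: true))
    end
  end
end
```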

You may want to make that the default behaviour in dev & test. If you’re using Phoenix, you may already have ecto.setup and test Mix aliases, so let’s modify them to run all migrations:

defp aliases() do
  [
    "ecto.migrate_all": ["ecto.migrate --migrations-path=priv/repo/migrations --migrations-path=priv/repo/manual_migrations"],
    "ecto.setup": ["ecto.create", "ecto.migrate_all", "run priv/repo/seeds.exs"],
    test: ["ecto.create --quiet", "ecto.migrate_all --quiet", "test"]
  ]
end

Conclusion

With Ecto’s support for multiple migration directories, we can easily split up our migrations into ones that run automatically on deployments and ones that we trigger manually after the code has been updated. This technique can be useful for both schema and data migrations.

We also mentioned a situation where schema changes require us to split the deployment in two. In fact, we could even combine that into one deployment with two steps: we make the code changes, define the “destructive” schema migration as a “manual” one and deploy. Then, after the deployment is complete on all nodes (along with any “safe” automatic migrations), we simply trigger the manual one!

Finally, in dev & test we may actually want to run all migrations at the same time and we can easily do that by passing both migration directories.

Happy hacking!