A sneak peek at Ecto 3.0: query improvements (part 2)

Welcome to the “A sneak peek at Ecto 3.0” series:

  1. Breaking changes
  2. Query improvements part 1
  3. Query improvements part 2 (you are here!)
  4. Performance, migrations and more

This time we are back to cover other improvements coming to Ecto.Query in Ecto 3.0.

UNION, EXCEPT and INTERSECT

With Ecto 3.0, it is now possible to add unions/excepts/intersects to queries. For example, to get all cities for both customers and suppliers, you can now do:

customer_city_query = Customer |> select([c], c.city)
Supplier |> select([s], s.city) |> union(customer_city_query)

Keep in mind that union will attempt to remove any duplicates and that can be expensive. In many cases, especially when you know duplicates cannot happen or you don’t care about returning duplicates, you should use union_all instead.

Adding support for unions has been a frequently requested feature in Ecto for quite some time. However, all previous approaches to implement this feature were misguided because all of them assumed that we would need to introduce a new data-type that holds the union of two queries.

In other words, in the approaches we had in mind, union(query1, query2) would return a new construct similar to Ecto.UnionQuery{left: query1, right: query2}. We were skeptical about this as it could push accidental complexity to users of Ecto that would now have to handle different types of queries.

All of this changed when Timofey Martynov sent a pull request that adds UNION / UNION ALL support by simply treating the UNION / UNION ALL as a field in the Ecto.Query, in the same way we store ORDER BYs, LIMITs, WHEREs and so on. While this direction seemed misguided at first, once we re-read the SQL specification, it became clear that this is the correct way to model UNION / UNION ALL.

Let’s see an example. Consider this SQL query:

SELECT city FROM suppliers UNION SELECT city FROM customers LIMIT 10

Which of the queries below is equivalent to the one above?

a) (SELECT city FROM suppliers) UNION (SELECT city FROM customers LIMIT 10) b) SELECT city FROM suppliers UNION (SELECT city FROM customers) LIMIT 10

After an informal poll, many chose a) because they expected UNION to work like some top-level, low-precedence operator, but the correct answer is b). The PostgreSQL documentation also discusses this:

The UNION clause has this general form:

select_statement UNION [ ALL | DISTINCT ] select_statement

select_statement is any SELECT statement without an ORDER BY, LIMIT, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, or FOR KEY SHARE clause. (ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in parentheses. Without parentheses, these clauses will be taken to apply to the result of the UNION, not to its right-hand input expression.)

In other words, UNION/INTERSECT/EXCEPTs should be modelled as WHERE as they are both considered clauses of a given query and not a top-level operation. This is precisely how it has been implemented in Ecto. The more you know!

WINDOW and OVER support

Ecto 3.0 finally gets support for windows. I mean WINDOWs, not Windows. We have always supported Windows. Ok. This is confusing. Let’s try again.

Ecto 3.0 finally gets support for WINDOW clauses, the OVER operator, as well as many WINDOW functions. For example, to compare each employee’s salary with the average salary in their department:

from e in Employee,
select: {e.depname, e.empno, e.salary, avg(e.salary) |> over(:department)},
windows: [department: [partition_by: e.depname]]

The over/2 operator expects either a window name or a window expression as second argument. The query below would return the same results:

from e in Employee,
select: {e.depname, e.empno, e.salary, avg(e.salary) |> over(partition_by: e.depname)}

The first argument should have an aggregator or any of the WINDOW functions. By default we support all of the built-in functions found in PostgreSQL and MySQL. They can be found in the Ecto.Query.WindowAPI module (we are linking to the source as the docs haven’t been released yet).

This work was contributed by Anton. You can read the original discussion in the issues tracker.

Other changes

There are many other exciting changes in Ecto.Query. For example, it now has built-in support for coalesce, such as select: coalesce(p.title, p.old_title), or even better with the pipe operator: p.field1 |> coalesce(p.field2) |> coalesce(p.field3).

We also support FILTER expressions, allowing you filter the value of aggregators: select: filter(count(), p.public == true)

Finally, order_by now supports :asc_nulls_last, :asc_nulls_first, :desc_nulls_last, and :desc_nulls_first, allowing you to configure exactly when NULLs are returned when ordering: order_by: [desc_nulls_first: p.title]. If you are using :desc and :asc, then the behaviour is the same as in Ecto 2.0, which is database dependent (and surprise, surprise! they won’t agree with each other).

This finishes the third article on our series about Ecto 3.0. There are many other things we would like to share with you, such as performance improvements, safer migrations and more. We are not quite sure how many articles we still have to write but we are certainly not done. See you soon!

P.S.: This post was originally published on Plataformatec’s blog.