Using Broadway at Hexdocs.pm

This is a quick blog post about our experience replacing Hexdocs.pm’s GenStage pipeline with Broadway.

To give some background information, Hexdocs.pm started out as basically just static file hosting for documentation. With the introduction of private Hexdocs it became a distinct Elixir application. Over time, we have also moved the handling of documentation tarballs there to offload the API servers. Instead of doing all the work themselves, the API servers now just upload a tarball to S3. S3 automatically sends an SQS message, which is then picked up by the Hexdocs app. The initial implementation of the Hexdocs pipeline was done with a custom GenStage producer and a consumer.

Updating the pipeline to use Broadway was really straightforward. We’ve completely removed our custom producer and replaced it with BroadwaySQS.Producer. In terms of consuming messages, our code is pretty much unchanged: instead of implementing the GenStage.handle_events/3 callback, we now implement the Broadway.handle_message/3 callback.
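To make the shape of such a pipeline concrete, here is a minimal sketch. It is not the actual Hexdocs code. It assumes the option names used by recent Broadway and BroadwaySQS releases (`producer:`, `concurrency:`, `queue_url:`), which may differ from the versions available when this post was written. The module name, the queue URL and the `process_upload/1` helper are all hypothetical.

```elixir
defmodule MyApp.Pipeline do
  use Broadway

  alias Broadway.Message

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {
          BroadwaySQS.Producer,
          # Placeholder URL, not a real queue.
          queue_url: "https://sqs.us-east-1.amazonaws.com/000000000000/example-queue"
        },
        concurrency: 1
      ],
      processors: [
        default: [concurrency: 2]
      ]
    )
  end

  # Called once per message; this takes the place of the old
  # GenStage handle_events/3 callback, which received a list of events.
  @impl true
  def handle_message(_processor, %Message{data: body} = message, _context) do
    case process_upload(body) do
      :ok -> message
      # Marking the message as failed means it is not acknowledged as successful.
      {:error, reason} -> Message.failed(message, reason)
    end
  end

  # Hypothetical stand-in for the real work of handling an uploaded tarball.
  defp process_upload(body) when is_binary(body), do: :ok
  defp process_upload(_other), do: {:error, :unexpected_payload}
end
```

The callback must return the message (possibly updated or marked as failed), which is how Broadway knows what to acknowledge.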

Previously, we needed to configure our supervision tree to start X producers and Y consumers, and to subscribe the consumers to the producers. With Broadway, we specify the desired topology and it starts all processes under a dedicated supervisor. Not only is this a more declarative approach, but Broadway also automatically adds a “Terminator” process to the supervision tree that ensures proper application shutdown. Before, the application could abort a job in the middle of processing; now Broadway ensures the job queue is drained before shutting down the app.
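As a rough illustration of how little wiring is left, the application supervisor only needs a single child entry. This sketch assumes a hypothetical `MyApp.Pipeline` module that calls `use Broadway` and defines `start_link/1`; the application and supervisor names are made up as well.

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # One entry: Broadway starts its producers, processors and the
      # Terminator under its own supervisor. MyApp.Pipeline is hypothetical.
      MyApp.Pipeline
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

There is no manual subscription step between stages here; the number of producers and processors lives in the options the pipeline module passes to `Broadway.start_link/2`.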

On the testing front, we previously didn’t start our GenStage pipeline at all during tests, to avoid making network requests, and we tested the logic through internal APIs. Now we conditionally use Broadway.DummyProducer, which doesn’t hit the network, and we trigger an event in the pipeline using Broadway.test_messages/2, which makes the tests more realistic.
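A test along these lines might look like the sketch below. It is an assumption-laden example rather than the Hexdocs test suite: `ExamplePipeline` is a hypothetical pass-through pipeline, the payload is invented, and the code uses `Broadway.test_message/3`, the single-message helper that recent Broadway releases provide in place of the `test_messages/2` function mentioned above.

```elixir
defmodule ExamplePipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      # DummyProducer emits nothing on its own, so no network is involved.
      producer: [module: {Broadway.DummyProducer, []}],
      processors: [default: []]
    )
  end

  # Pass-through processing, enough to exercise the pipeline end to end.
  @impl true
  def handle_message(_processor, message, _context), do: message
end

defmodule ExamplePipelineTest do
  use ExUnit.Case

  test "a pushed message flows through the pipeline and is acknowledged" do
    start_supervised!(ExamplePipeline)

    # Push one message into the running pipeline.
    ref = Broadway.test_message(ExamplePipeline, "docs-1.0.0.tar.gz")

    # The test process receives an ack listing successful and failed messages.
    assert_receive {:ack, ^ref, [%Broadway.Message{data: "docs-1.0.0.tar.gz"}], []}, 1_000
  end
end
```

In a real application the producer module would typically be chosen from configuration, so that tests get the dummy producer while production gets the SQS one.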

Perhaps the biggest win from moving over to Broadway is that it automatically batches and acknowledges messages. This, along with other existing and planned features like rate limiting and backoff, is what is most appealing about Broadway: the community best practices will usually be the default behaviour or just a matter of configuration.

Overall, we were very happy with updating Hexdocs to use Broadway, and we’ve been running it in production for the last few months without issues. Not only did we remove a lot of code, we also got a couple of nice features for free, and we will continue to reap the benefits as Broadway gets updated.

See hexpm/hexdocs#11 for all the required code changes.

P.S.: This post was originally published on Plataformatec’s blog.