Welcome to our blog: how it was made!

José Valim
February 3rd, 2020
elixir, phoenix

Two weeks ago we officially unveiled Dashbit and today we are glad to bring our blog to life! And in our first post as Dashbit, we want to share how we implemented the blog itself.

To be clear, we are aware it is 2020 and implementing a blog is nothing fancy nowadays. However, we chose to not rely on a database, which is a different approach than most would take, and we want to talk about this process as it may be applicable in other scenarios.

UPDATE #1: We have recently encapsulated a good chunk of this article (with some changes) into a project we called NimblePublisher. Give it a try!

Off-the-shelf or roll our own?

When implementing Dashbit’s website, our biggest question was: should we use something off-the-shelf, such as WordPress or any CMS as a service, or should we roll our own? Dashbit’s website is mostly static content, so the main discussion point turned out to be the blog engine.

In the past, I have worked with both static page generators and publishing platforms. My favorite feature of static page generators is that we typically use pull requests to manage content and write new blog posts. In this scenario, blog posts are usually files in a Git repository. Given that everyone in our team is a developer, it perfectly fits our workflow. We know how to use Git to manage changes, track history, and review code via pull requests.

However, a static page generator has to build all pages upfront, which ultimately limits the range of features and usability that can be provided by the blog. This is not a concern on publishing platforms, which typically store all of the posts in the database, allowing them to dynamically render content in multiple different ways.

What if we could have the best of both worlds? What if we could keep the blog posts as simple files in our Git repository but still serve the posts with all dynamic features that you would expect from a blog, without having to rely on a database?

Precompiling blog posts

Dashbit’s website is a regular Phoenix application. In our codebase, to get a list of all blog posts, we simply call Dashbit.Blog.list_posts(), which is not different from how most Phoenix applications interact with their business domains.

The difference, however, is that Dashbit.Blog.list_posts() returns a list of blog posts that have been precompiled and already loaded into memory. There is no database involved. In a nutshell, when our project compiles, we read all blog posts from disk and convert them into in-memory data structures.

As we will see, there are many advantages to this approach. But let’s see some code first and then we will talk about why we like it.

The big traversal

What we know so far is that our application has a Dashbit.Blog context module which exports a list_posts() function. This function will return a list of Dashbit.Blog.Post structs. Let’s see what they look like.

We define our posts as regular Elixir structs with the following fields:

defmodule Dashbit.Blog.Post do
  @enforce_keys [:id, :author, :title, :body, :description, :tags, :date]
  defstruct [:id, :author, :title, :body, :description, :tags, :date]
end
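Because the struct uses @enforce_keys, a post that is missing any field cannot be built with struct!/2. The sketch below repeats the struct definition so it runs on its own; the sample field values are made up for illustration. It shows struct!/2 building a complete post and raising ArgumentError once :title is dropped.

```elixir
defmodule Dashbit.Blog.Post do
  @enforce_keys [:id, :author, :title, :body, :description, :tags, :date]
  defstruct [:id, :author, :title, :body, :description, :tags, :date]
end

# Made-up sample attributes, for illustration only
fields = [
  id: "hello-world",
  author: "Jane Doe",
  title: "Hello world",
  body: "<p>Hi!</p>",
  description: "A sample post",
  tags: ["elixir"],
  date: ~D[2020-02-03]
]

# All enforced keys are present, so the struct is built
post = struct!(Dashbit.Blog.Post, fields)
IO.puts(post.title)

# Without :title, struct!/2 refuses to build the struct
missing_title_result =
  try do
    struct!(Dashbit.Blog.Post, Keyword.delete(fields, :title))
    :built
  rescue
    ArgumentError -> :raised
  end

IO.inspect(missing_title_result)
```

Since posts are parsed at compile time, this means a post with a missing field is caught while compiling rather than when a reader visits the page.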

When compiling the Dashbit.Blog module, we traverse a directory looking for all posts. It is roughly implemented like this:

defmodule Dashbit.Blog do
  alias Dashbit.Blog.Post

  posts_paths = "posts/**/*.md" |> Path.wildcard() |> Enum.sort()

  posts =
    for post_path <- posts_paths do
      @external_resource Path.relative_to_cwd(post_path)
      Post.parse!(post_path)
    end

  @posts Enum.sort_by(posts, & &1.date, {:desc, Date})

  def list_posts do
    @posts
  end
end

First, we traverse all posts in the filesystem. Our posts are placed in the posts directory at the root of our project. Each post follows this naming schema:

 /posts/YEAR/MONTH-DAY-ID.md

For each post found, we declare the source file as an @external_resource and then we call Post.parse!/1. Using @external_resource tells the Elixir compiler that, if the post changes on disk, it should recompile the Dashbit.Blog module. As we will see later, this plays an important role in live reloading. Then Post.parse!/1 is responsible for reading the post from disk and returning a Post struct. We will see how it is implemented soon.

Once all posts have been parsed, we sort the posts by descending date, using the new sorting feature in Elixir v1.10, and we store them in a module attribute. We read the module attribute inside the list_posts function, which will effectively embed all blog posts into the function. In other words, calling list_posts at runtime will simply return a list of all blog posts, which at that point have already been loaded into memory.
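Why pass {:desc, Date} instead of just :desc? Date values are structs, and without a module to compare them, sorting falls back to Erlang's structural term ordering, which does not follow calendar order. Here is a small sketch with made-up posts (only the :date field matters), assuming Elixir v1.10 or later:

```elixir
# Made-up posts; only the :date field matters here
posts = [
  %{id: "older", date: ~D[2019-12-31]},
  %{id: "newest", date: ~D[2020-02-03]},
  %{id: "middle", date: ~D[2020-01-15]}
]

# Passing {:desc, Date} makes Enum.sort_by/3 use Date.compare/2
by_date = Enum.sort_by(posts, & &1.date, {:desc, Date})
IO.inspect(Enum.map(by_date, & &1.id))

# Plain :desc compares the Date structs structurally,
# which does not match calendar order
by_term = Enum.sort_by(posts, & &1.date, :desc)
IO.inspect(Enum.map(by_term, & &1.id))
```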

Those 15-ish lines are pretty much the core of our blog system. They allow us to read data from disk at compilation time and embed it into our modules. Now it is time to talk about parsing.

Parsing blog posts

Now that we traverse all blog posts, we need to convert their contents on disk into Post structs. This is done by the Post.parse!/1 function. However, we do have a challenge here. Besides its body, a post is made of many fields: title, author, tags, etc. So we need a simple syntax for writing a post that can include its body and all of its attributes. In our case, we chose a simple syntax like this:

 ==FIELD==
 VALUE

For example, this blog post itself looks like this:

 ==title==
 Welcome to our blog: how it was made!

 ==author==
 José Valim

 ==description==
 Today we announce...

 ==tags==
 elixir, phoenix

 ==body==
 Two weeks ago we officially unveiled Dashbit...

Furthermore, remember that our posts are stored on disk with the following filename format:

 /posts/YEAR/MONTH-DAY-ID.md

This post in particular is placed at:

 /posts/2020/02-03-welcome-to-our-blog-how-it-was-made.md

So besides the attributes inside the post contents, we also need to extract the Post :date and :id from its filesystem path.

Overall, our parse!/1 function looks like this:

def parse!(filename) do
  # Get the last two path segments from the filename
  [year, month_day_id] = filename |> Path.split() |> Enum.take(-2)

  # Then extract the month, day and id from the filename itself
  [month, day, id_with_md] = String.split(month_day_id, "-", parts: 3)

  # Remove .md extension from id
  id = Path.rootname(id_with_md)

  # Build a Date struct from the path information
  date = Date.from_iso8601!("#{year}-#{month}-#{day}")

  # Get all attributes from the contents
  contents = parse_contents(id, File.read!(filename))

  # And finally build the post struct
  struct!(__MODULE__, [id: id, date: date] ++ contents)
end
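To see what the path handling produces on its own, the sketch below pulls it into a hypothetical PathInfoSketch.date_and_id/1 helper that never reads the file, so it runs on any path string. It uses the same calls as parse!/1 above.

```elixir
defmodule PathInfoSketch do
  # Hypothetical helper: the same path handling as parse!/1,
  # but without reading the file from disk
  def date_and_id(filename) do
    # "/posts/2020/02-03-some-id.md" -> ["2020", "02-03-some-id.md"]
    [year, month_day_id] = filename |> Path.split() |> Enum.take(-2)

    # parts: 3 keeps any hyphens that belong to the id itself
    [month, day, id_with_md] = String.split(month_day_id, "-", parts: 3)

    # Drop the ".md" extension and build the date
    id = Path.rootname(id_with_md)
    date = Date.from_iso8601!("#{year}-#{month}-#{day}")
    {date, id}
  end
end

IO.inspect(PathInfoSketch.date_and_id("/posts/2020/02-03-welcome-to-our-blog-how-it-was-made.md"))
```

The parts: 3 option is what lets an id such as "welcome-to-our-blog-how-it-was-made" keep its hyphens, and Date.from_iso8601!/1 raises on an impossible date, so a misnamed file fails loudly.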

where parse_contents/2 is a private function implemented as follows:

defp parse_contents(id, contents) do
  # Split contents into ["==title==\n", "this title", "==tags==\n", "this, tags", ...]
  parts = Regex.split(~r/^==(\w+)==\n/m, contents, include_captures: true, trim: true)

  # Now chunk each attr and value into pairs and parse them
  for [attr_with_equals, value] <- Enum.chunk_every(parts, 2) do
    [_, attr, _] = String.split(attr_with_equals, "==")
    attr = String.to_atom(attr)
    {attr, parse_attr(attr, value)}
  end
end
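To make the intermediate steps of parse_contents/2 concrete, here is a sketch that runs the same Regex.split/3 call and pairing logic on some made-up post contents and prints what comes out at each stage:

```elixir
# Made-up post contents in the ==FIELD== / VALUE format
contents = """
==title==
Hello

==tags==
elixir, phoenix

==body==
Some *markdown* here.
"""

# The m modifier makes ^ match at the start of every line;
# include_captures: true keeps each "==field==\n" marker in the result
# and trim: true drops the empty string before the first marker
parts = Regex.split(~r/^==(\w+)==\n/m, contents, include_captures: true, trim: true)
IO.inspect(parts)

# Pair each marker with the value that follows it
pairs =
  for [attr_with_equals, value] <- Enum.chunk_every(parts, 2) do
    # "==title==\n" split on "==" gives ["", "title", "\n"]
    [_, attr, _] = String.split(attr_with_equals, "==")
    {attr, value}
  end

IO.inspect(pairs)
```

Note that the values still carry their trailing newlines at this point; trimming is left to parse_attr/2, which is why the body can keep its exact whitespace.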

and finally parse_attr/2 has the logic for parsing each individual attribute:

defp parse_attr(:title, value),
  do: String.trim(value)

defp parse_attr(:author, value),
  do: String.trim(value)

defp parse_attr(:description, value),
  do: String.trim(value)

defp parse_attr(:body, value),
  do: value

defp parse_attr(:tags, value),
  do: value |> String.split(",") |> Enum.map(&String.trim/1) |> Enum.sort()
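A side effect of having no catch-all parse_attr/2 clause is that a misspelled field name fails loudly. The sketch below uses a hypothetical ParseSketch module that trims the code above to three attribute clauses and drops the unused id argument. Since the real parsing runs while Dashbit.Blog compiles, such a typo would stop compilation instead of producing a broken post.

```elixir
defmodule ParseSketch do
  # Same shape as parse_contents/2 above, minus the id argument
  def parse_contents(contents) do
    parts = Regex.split(~r/^==(\w+)==\n/m, contents, include_captures: true, trim: true)

    for [attr_with_equals, value] <- Enum.chunk_every(parts, 2) do
      [_, attr, _] = String.split(attr_with_equals, "==")
      attr = String.to_atom(attr)
      {attr, parse_attr(attr, value)}
    end
  end

  # Only three of the attribute clauses, for brevity
  defp parse_attr(:title, value), do: String.trim(value)
  defp parse_attr(:body, value), do: value

  defp parse_attr(:tags, value),
    do: value |> String.split(",") |> Enum.map(&String.trim/1) |> Enum.sort()
end

ok = ParseSketch.parse_contents("==title==\nHello\n\n==tags==\nphoenix, elixir\n\n==body==\nHi!\n")
IO.inspect(ok)

# A misspelled field name matches no parse_attr/2 clause
typo =
  try do
    ParseSketch.parse_contents("==titel==\nHello\n")
    :parsed
  rescue
    FunctionClauseError -> :rejected
  end

IO.inspect(typo)
```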

And that’s it! With the logic for parsing and handling each individual attribute, we can convert our files into structs and embed them into Dashbit.Blog.list_posts(). Now all we need to do is call Dashbit.Blog.list_posts() in our controllers and display the blog posts in the UI, as in any other Phoenix application.

Writing posts in Markdown

There is one feature missing in our blog engine: Markdown support. So far we are showing the blog post bodies exactly as they are written. Just recall the parse_attr(:body, value) implementation above:

defp parse_attr(:body, value),
  do: value

It would be nice if we could write our posts in Markdown and have them converted into HTML at compile time. And it would be even nicer if we could actually add syntax highlighting to all of the code snippets during compilation too. This would mean no need for extra .js dependencies in the front-end!

Luckily, thanks to the amazing job done by the Elixir community, we can easily support Markdown and syntax highlighting in our blog by adding two dependencies: Earmark and Makeup Elixir.

Let’s add them to the deps function in our mix.exs:

{:earmark, "~> 1.3"},
{:makeup_elixir, "~> 0.14"},

Now, because we need to use them at compilation time, let’s make sure to start them before we parse the posts. Go back to Dashbit.Blog and add this at the top:

for app <- [:earmark, :makeup_elixir] do
  Application.ensure_all_started(app)
end

Finally, let’s change the parse_attr(:body, value) clause to the following:

defp parse_attr(:body, value),
  do: value |> Earmark.as_html!() |> Dashbit.Blog.Highlighter.highlight()

Earmark will convert the post from Markdown to HTML and Dashbit.Blog.Highlighter provides syntax highlighting. Dashbit.Blog.Highlighter.highlight/1 is a literal copy of the syntax highlighter code that ships with ExDoc. You could also depend on ExDoc for this functionality; it is your call whether to take on an extra dependency.

And that’s all. Now we have a complete blog engine, with both Markdown support and syntax highlighting! In terms of syntax highlighting, Makeup supports both Elixir and Erlang. If you want to support other languages, we definitely encourage writing other Makeup lexers and contributing them to the community!

Summing up the work so far

We are quite happy with the results we got! We can write posts using our favorite editors and review new blog posts via pull requests. Git will also keep a history of all of the changes that we have done, so we got that for free too. Publishing a new blog post is simply a matter of doing a new deployment.

Because all of the blog posts are precompiled, with Markdown and syntax highlighting, serving blog posts is extremely fast and we avoid the need for syntax highlighting on the front-end. However, the blog itself is not static in nature. We still have a collection of posts in memory, which means we can sort, paginate, and filter them, using all of the functionality available in Elixir.
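As an illustration of what "paginate" can mean here, this is a minimal sketch, not part of our codebase: a hypothetical PaginationSketch.paginate/3 that slices an in-memory list of posts into 1-based pages, with made-up maps standing in for the list returned by Dashbit.Blog.list_posts().

```elixir
defmodule PaginationSketch do
  # Hypothetical helper: returns the posts on a 1-based page
  def paginate(posts, page, per_page) when page >= 1 and per_page >= 1 do
    Enum.slice(posts, (page - 1) * per_page, per_page)
  end
end

# Made-up stand-ins for the precompiled posts
posts = Enum.map(1..5, fn n -> %{id: "post-#{n}"} end)

IO.inspect(PaginationSketch.paginate(posts, 1, 2))
IO.inspect(PaginationSketch.paginate(posts, 3, 2))
IO.inspect(PaginationSketch.paginate(posts, 4, 2))
```

Because the list is already sorted and in memory, this is plain list slicing; a page past the end simply comes back empty.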

In fact, before we go, let’s take a look at two small features we can add to make our blog system even better.

Bonus feature #1: tag filtering

Since all of the posts are a collection in memory, adding a feature that lists all tags or selects all posts with a given tag (as you can see in our sidebar) is very straightforward.

Back in Dashbit.Blog, just add this code:

defmodule NotFoundError do
  defexception [:message, plug_status: 404]
end

@tags posts |> Enum.flat_map(& &1.tags) |> Enum.uniq() |> Enum.sort()

def list_tags do
  @tags
end

def get_posts_by_tag!(tag) do
  case Enum.filter(list_posts(), &(tag in &1.tags)) do
    [] -> raise NotFoundError, "posts with tag=#{tag} not found"
    posts -> posts
  end
end

And we are done! We sort and build our collection of tags at compile-time, similar to how we did with our post collection, and expose them in list_tags. Then to get all posts with a given tag, we filter the list of all posts looking for that given tag. In case we can’t find any post, we raise Dashbit.Blog.NotFoundError, which has a status of 404, allowing us to show a “Not Found” page whenever someone attempts to look for a tag that doesn’t exist.
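To try the tag functions without a full project, the sketch below wraps the same code in a hypothetical TagsSketch module, with a hand-written @posts list of made-up maps standing in for the parsed posts:

```elixir
defmodule TagsSketch do
  defmodule NotFoundError do
    defexception [:message, plug_status: 404]
  end

  # Made-up posts standing in for the precompiled blog posts
  @posts [
    %{id: "a", tags: ["elixir", "phoenix"]},
    %{id: "b", tags: ["elixir"]},
    %{id: "c", tags: ["ecto"]}
  ]

  # Computed once, at compile time
  @tags @posts |> Enum.flat_map(& &1.tags) |> Enum.uniq() |> Enum.sort()

  def list_tags, do: @tags

  def get_posts_by_tag!(tag) do
    case Enum.filter(@posts, &(tag in &1.tags)) do
      [] -> raise NotFoundError, "posts with tag=#{tag} not found"
      posts -> posts
    end
  end
end

IO.inspect(TagsSketch.list_tags())
IO.inspect(TagsSketch.get_posts_by_tag!("elixir"))
```

Asking for a tag that no post uses, such as "rust", raises TagsSketch.NotFoundError, whose plug_status field is 404.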

Bonus feature #2: live reloading

The second bonus feature is live reloading. Wouldn’t it be nice if, as we wrote our blog posts, we could see how they would appear on our site immediately? Given that we are using Phoenix, that Phoenix has support for live reloading, and that we list all posts as external resources using @external_resource, we already have this feature almost working!

All we need is a one-line change in our config files, telling Phoenix’s live reloading system to also watch the “posts” directory. Open up config/dev.exs, search for live_reload: and add this to the list of patterns:

  live_reload: [
    patterns: [
      ...,
      ~r"posts/*/.*(md)$"
    ]
  ]

and now you can enjoy live reloading as you write!
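Keep in mind that the entries in patterns are regular expressions, not globs: in posts/*/, the /* means "zero or more slashes", so the pattern effectively matches any path that contains posts/ and ends in md. A quick way to sanity-check it against a couple of made-up paths:

```elixir
# The pattern added to live_reload above
pattern = ~r"posts/*/.*(md)$"

# Made-up paths, for illustration only
IO.inspect(Regex.match?(pattern, "posts/2020/02-03-welcome-to-our-blog-how-it-was-made.md"))
IO.inspect(Regex.match?(pattern, "posts/2020/draft.txt"))
```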

We hope you have enjoyed our introduction to our blog! We have many more interesting articles in the pipeline, so subscribe to our newsletter on top of our sidebar or follow us on Twitter for further news.
