Rails’ ActiveRecord – the bad and the ugly

I’m know to not be a big fan of ActiveRecord. No, that would be a simplification: I probably hate ActiveRecord and think it adds more problems than it solves, specially after I began to work with functional programming and saw how difficult, if not utterly impossible, is to make ActiveRecord models behave like immutable structures or separate (and maybe even predict) the I/O from the rest of the code.

The ActiveRecord pattern (not the GEM) was created to hide SQL details from the users. The Gem elevates this to extremes: you never know when a query is issue, what query is issued (unless you check the logs), and sometimes a latter clause modifies the way previous clauses work. Also, to extend ActiveRecord, you need to rely on monkey-patches and other internal implementation details, and there are API changes that seem innocent but are tremendously dangerous.

Now, what I want to do in this post is to elaborate the bad and the ugly parts. I’m not gonna talk about the “good parts” because we already know: auto-discovery of fields, fast prototyping, simple CRUDs, and so on. One could argue that this “easy setup, fast prototyping” is not worth the amount of technical debt you’ll have later, but let’s focus on the bad parts instead:

Transactions

Lots of tasks in ActiveRecord implicitly open transactions on DB. This is not really an issue but nested transactions are unpredictable in ActiveRecord:

User.transaction {
  User.transaction {
    User.create!(name: "FOOBAR")
    raise ActiveRecord::Rollback
  }
}

For example, this code commits the change, even when we added a rollback at the end. I know it’s documented behavior and also that there’s a workaround, but the problem is how to know when you should use that workaroud or not, given that multiple places (before_save, after_save) implicitly creates a transaction.

The problem is not only that explicitly triggering a rollback inside these “nested transactions would not rollback anything – the bigger problem is that it does not throw an error, nor an warning, nothing: just do the exact opposite of what you want.

The N+1 problem

It is almost impossible to eliminate the N+1 process completely. Lots of commands do unexpected things, preloads things or issue sub-queries, and there’s no possible way to disable this “lazy fetch/preload/query” thing.

The problem is that, again, ActiveRecord does not make you think in queries; you need to fire up a console and check logs if you want to see when it’s hitting a database, and if the query it hits is performing well or not. For example, any? method will return different queries depending on how it’s called:

Example.where(id: 10).any?
# Will issue a COUNT(*), and will not preload result:
#   (145.1ms)  SELECT COUNT(*) FROM `examples` WHERE `examples`.`id` = 10

example = Example.first
example.children.any?
# Will hit a database with a "cheap query" to check existence, 
# and will not cache the result
#   Children Exists (142.4ms)  SELECT  1 AS one FROM `children` WHERE `children`.`example_id` = 1 LIMIT 1
example.children.any?
# Will hit DB again
#   Children Exists (144.6ms)  SELECT  1 AS one FROM `children` WHERE `children`.`example_id` = 1 LIMIT 1

Now, if the association is already preloaded by Rails, any? will not hit the database if it is being used to check for associations:

example.children.present?
# Will hit BD and preload
#   Children Load (143.6ms)  SELECT `children`.* FROM `children` WHERE `children`.`example_id` = 1
example.children.any?
#   (no queries)
example.children.any?
#   (no queries)

Now, in theory, empty? should be the same as blank? for array-like objects. It is not for ActiveRecord, for example:

example.children.empty?
  Children Exists (142.4ms)  SELECT  1 AS one FROM `children` WHERE `children`.`example_id` = 1 LIMIT 1
example.children.blank?
  Children Load (143.6ms)  SELECT `children`.* FROM `children` WHERE `children`.`example_id` = 1

Who knows what else we’ll find?

API Changes

In the past, .all triggered a query. On some version change, it now returns a relation, and it needs a .to_a or .first to trigger it. It seems innocent enough, but the problem is that querying things is not side-effect free: maybe you want to lock rows on your table, and you would use:

Person.transaction do
  people = Person.where('age < 18').lock.all
  # ... do something
end

Now, it’s notably difficult to test this code (specially as Ruby have the GIL and will not allow threads to run in parallel). This means that this API change made all your locks useless and also there’s no way to find out.

Let’s not forget the huge changes on the way errors were represented: once we had an person.errors_on(:field) that would return a string (if there was only one error), and array (if there were multiple) or nil (if there were none). Then, this API changed (for better) to just return an array. Then it disapeared!

At least, there was the promise of Arel: that it would compose queries easier. The problem is that Arel was never treated as a “real” separate library, but as an implementation detail. So, lots of people who tried to implement some “SQL query composer” over Arel (me included!) was hit with unexpected breakages (once to compose an OR, you had to pass an array, then it became multi-arity and AND you had to pass an array, then both became multi-arity), unexpected behavior changes (the API that supposedly creates a LIKE changed to create an ILIKE for PostgreSQL, maybe to keep symmetry with MySQL that’s case insensitive), and messed and undocumented API (lots of things simply didn’t compose, some had cyclical dependencies like Arel -> ActiveRecord -> Arel). The canonical way of creating queries, until today, is to add string fragments to .where.

Arbitrary Limitations, and Magic

Speaking of where, we still don’t have a way to generate OR queries. And the API is further complicated because of .where.not – a .were without arguments will return an intermediate class that does absolutely nothing just to be able to chain a .not there; also, it makes where have multiple return types.

There’s no way to make “UNION” nor “UNION ALL”, there’s no support for “WITH” queries (who cares that MySQL refuses to implement it?), there’s no way to extend ActiveRecord to support specific datatypes and operators like PostgreSQL’s JSONB, ARRAY, and also for a looong time we didn’t have foreign keys (lots of databases still suffer today because of these constraints relaxations).

There’s also some code that’s notably unpredictable: duplicated table names on joins will generate alias so you need to be aware of these names but remember that you can’t join in a different order nor decide on an alias name; .select, .group does not adds the table name for the fields, but .where does; .includes changes the query depending on future (or previous) where clauses, so it’s unpredictable; .eager_load ignores all your .select statements; .unscoped will remove all clauses (even things that don’t scope like SELECT, FROM, etc); and so on…

Let’s not forget that, for a long time, LEFT JOIN was not supported too…

What’s the Alternative?

Ever heard of Sequel? Or maybe it’s time we free Ruby from the Rails world and start to thing on something similar to Elixir’s Ecto, or Clojure’s HoneySQL, something that can help us thing differently.

If we want some answer right now, Sequel is a great choice: it’s simpler, it was designed to support extensions (and it does a pretty good job at it!), and it does not, by default, issue queries unless you ask him for it. For example, this is the way we declare a mapping to a “people” table:

class Person < Sequel::Model
end

Identical do ActiveRecord, right? The difference is that you don’t need to work with Sequel the same as you would work on ActiveRecord. You can make things immutable and only add queries on the class methods:

class Person < OpenStruct
  extend Sequel::Model::ClassMethods
  self.dataset = Sequel::DATABASES[0][:people]

  def self.call(params)
    new(params).freeze
  end

  def self.create!(params)
    now = Time.current
    fields = params.merge(
      created_at: now, updated_at: now
    )
    pk = insert(fields)
    call(fields.merge(id: pk))
  end

  def update!(params)
    fields = params.merge(updated_at: Time.current)
    Person.dataset.where(id: id).update(fields)
    Person.call(to_h.merge(fields))
  end
end

Now, update! will return a copy of a Person, but the original one is unaffected. It also allows for pluggable validation methods (you can use ActiveRecord’s or Dry::Validation, or anything really), you don’t need to worry about “stale” objects, and so on. If you want to plug some validation, you just need to add some code right before insert or Person.dataset.where(id: id).update(fields):

  def update!(params)
    fields = params.merge(updated_at: Time.current)
    errors = validate(to_h, fields)
    return FailureObject.new(errors) if errors
    Person.dataset.where(id: id).update(fields)
    Person.call(to_h.merge(fields))
  end

Things become even more interesting, because you can validate if a new set of fields is valid based on the current set of fields: no more getter-pollution like changes, name_before_last_change, name_changed?, and so on. You also control how you create the object, and this is huge: you can implement some kind of “single table inheritance” without relying on magic class names on the database; you can disable, for example, updates on an object if a Person is on status deleted; you can make specific APIs for queries and completely separate then from data representation, and so on…

We need to get rid of magical code in our systems. Just because ActiveRecord seems like something easy to use, and it’s the default choice on the Ruby world doesn’t mean that it needs to be used everywhere, and be the default for any project.

This entry was posted in Ruby and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *