I’m known to not be a big fan of ActiveRecord. No, that would be a simplification: I probably hate ActiveRecord and think it adds more problems than it solves, specially after I began to work with functional programming and saw how difficult, if not utterly impossible, is to make ActiveRecord models behave like immutable structures or separate (and maybe even predict) the I/O from the rest of the code.
The ActiveRecord pattern (not the GEM) was created to hide SQL details from the users. The Gem elevates this to extremes: you never know when a query is issue, what query is issued (unless you check the logs), and sometimes a later clause modifies the way previous clauses work. Also, to extend ActiveRecord, you need to rely on monkey-patches and other internal implementation details, and there are API changes that seem innocent but are tremendously dangerous.
Now, what I want to do in this post is to elaborate the bad and the ugly parts. I’m not gonna talk about the “good parts” because we already know: auto-discovery of fields, fast prototyping, simple CRUDs, and so on. One could argue that this “easy setup, fast prototyping” is not worth the amount of technical debt you’ll have later, but let’s focus on the bad parts instead:
Transactions
Lots of tasks in ActiveRecord implicitly open transactions on DB. This is not really an issue but nested transactions are unpredictable in ActiveRecord:
User.transaction { User.transaction { User.create!(name: "FOOBAR") raise ActiveRecord::Rollback } }
For example, this code commits the change, even when we added a rollback at the end. I know it’s documented behavior and also that there’s a workaround, but the problem is how to know when you should use that workaroud or not, given that multiple places (before_save
, after_save
) implicitly creates a transaction.
The problem is not only that explicitly triggering a rollback inside these “nested transactions would not rollback anything – the bigger problem is that it does not throw an error, nor an warning, nothing: just do the exact opposite of what you want.
The N+1 problem
It is almost impossible to eliminate the N+1 process completely. Lots of commands do unexpected things, preloads things or issue sub-queries, and there’s no possible way to disable this “lazy fetch/preload/query” thing.
The problem is that, again, ActiveRecord does not make you think in queries; you need to fire up a console and check logs if you want to see when it’s hitting a database, and if the query it hits is performing well or not. For example, any?
method will return different queries depending on how it’s called:
Example.where(id: 10).any? # Will issue a COUNT(*), and will not preload result: # (145.1ms) SELECT COUNT(*) FROM `examples` WHERE `examples`.`id` = 10 example = Example.first example.children.any? # Will hit a database with a "cheap query" to check existence, # and will not cache the result # Children Exists (142.4ms) SELECT 1 AS one FROM `children` WHERE `children`.`example_id` = 1 LIMIT 1 example.children.any? # Will hit DB again # Children Exists (144.6ms) SELECT 1 AS one FROM `children` WHERE `children`.`example_id` = 1 LIMIT 1
Now, if the association is already preloaded by Rails, any?
will not hit the database if it is being used to check for associations:
example.children.present? # Will hit BD and preload # Children Load (143.6ms) SELECT `children`.* FROM `children` WHERE `children`.`example_id` = 1 example.children.any? # (no queries) example.children.any? # (no queries)
Now, in theory, empty?
should be the same as blank?
for array-like objects. It is not for ActiveRecord, for example:
example.children.empty? Children Exists (142.4ms) SELECT 1 AS one FROM `children` WHERE `children`.`example_id` = 1 LIMIT 1 example.children.blank? Children Load (143.6ms) SELECT `children`.* FROM `children` WHERE `children`.`example_id` = 1
Who knows what else we’ll find?
API Changes
In the past, .all
triggered a query. On some version change, it now returns a relation, and it needs a .to_a
or .first
to trigger it. It seems innocent enough, but the problem is that querying things is not side-effect free: maybe you want to lock rows on your table, and you would use:
Person.transaction do people = Person.where('age < 18').lock.all # ... do something end
Now, it’s notably difficult to test this code (specially as Ruby have the GIL and will not allow threads to run in parallel). This means that this API change made all your locks useless and also there’s no way to find out.
Let’s not forget the huge changes on the way errors were represented: once we had an person.errors_on(:field)
that would return a string (if there was only one error), and array (if there were multiple) or nil
(if there were none). Then, this API changed (for better) to just return an array. Then it disapeared!
At least, there was the promise of Arel: that it would compose queries easier. The problem is that Arel was never treated as a “real” separate library, but as an implementation detail. So, lots of people who tried to implement some “SQL query composer” over Arel (me included!) was hit with unexpected breakages (once to compose an OR
, you had to pass an array, then it became multi-arity and AND
you had to pass an array, then both became multi-arity), unexpected behavior changes (the API that supposedly creates a LIKE
changed to create an ILIKE
for PostgreSQL, maybe to keep symmetry with MySQL that’s case insensitive), and messed and undocumented API (lots of things simply didn’t compose, some had cyclical dependencies like Arel -> ActiveRecord -> Arel). The canonical way of creating queries, until today, is to add string fragments to .where
.
UPDATE: Now on Rails 5, every query that used
.distinct.sum
changed the final query:
Person.group(:age).sum(:salary) # Will issue, on Rails < 5: # SELECT DISTINCT age, SUM(salary) # And on Rails >= 5: # SELECT age, SUM(DISTINCT salary)
Admittedly, the first queries makes no sense. The problem is that is common, on Rails, to pass “relations” around, so one of these “relations” could have set the .distinct
and then other places are using aggregate functions over that information. The problem is that, instead of creating a sum_distinct
function, it just broke the API in a way that’s confusing – for every part of the code, you’ll have to track back things – and trust me, it can be nested really deep.
Arbitrary Limitations, and Magic
Speaking of where, we still don’t have a way to generate OR
queries. And the API is further complicated because of .where.not
– a .were
without arguments will return an intermediate class that does absolutely nothing just to be able to chain a .not
there; also, it makes where
have multiple return types.
There’s no way to make “UNION” nor “UNION ALL”, there’s no support for “WITH” queries (who cares that MySQL refuses to implement it?), there’s no way to extend ActiveRecord to support specific datatypes and operators like PostgreSQL’s JSONB, ARRAY, and also for a looong time we didn’t have foreign keys (lots of databases still suffer today because of these constraints relaxations).
There are also some very strange confusion about which class to return, and each one of these behave differently in some very subtle ways:
Person.distinct.class # => Person::ActiveRecord_Relation Person.all.class # => Person::ActiveRecord_Relation Person.first.children.class # => Child::ActiveRecord_Associations_CollectionProxy Person.first.children.distinct.class # => Array
There’s also some code that’s notably unpredictable: duplicated table names on joins
will generate alias so you need to be aware of these names but remember that you can’t join in a different order nor decide on an alias name; .select
, .group
does not adds the table name for the fields, but .where
does; .includes
changes the query depending on future (or previous) where clauses, so it’s unpredictable; .eager_load
and .group
ignores all your .select
statements; .unscoped
will remove all clauses (even things that don’t scope like SELECT, FROM, etc); and so on…
Let’s not forget that, for a long time, LEFT JOIN
was not supported too…
What’s the Alternative?
Ever heard of Sequel? Or maybe it’s time we free Ruby from the Rails world and start to think on something similar to Elixir’s Ecto, or Clojure’s HoneySQL, something that can help us thing differently.
If we want some answer right now, Sequel is a great choice: it’s simpler, it was designed to support extensions (and it does a pretty good job at it!), and it does not, by default, issue queries unless you ask him for it. For example, this is the way we declare a mapping to a “people” table:
class Person < Sequel::Model end
Identical do ActiveRecord, right? The difference is that you don’t need to work with Sequel the same as you would work on ActiveRecord. You can make things immutable and only add queries on the class methods:
class Person < OpenStruct extend Sequel::Model::ClassMethods self.dataset = Sequel::DATABASES[0][:people] def self.call(params) new(params).freeze end def self.create!(params) now = Time.current fields = params.merge( created_at: now, updated_at: now ) pk = insert(fields) call(fields.merge(id: pk)) end def update!(params) fields = params.merge(updated_at: Time.current) Person.dataset.where(id: id).update(fields) Person.call(to_h.merge(fields)) end end
UPDATE: as with all things Ruby-related, Sequel changed the way it defines its ORM structure so now the code above don’t work anymore. So again, we’re left with a breaking-change and little to no alternative. I still think that Sequel is a better choice than ActiveRecord, but the code above will not work anymore. For the matter of completeness, here’s the rest of the post, that became outdated unfortunately…
Now, update!
will return a copy of a Person, but the original one is unaffected. It also allows for pluggable validation methods (you can use ActiveRecord’s or Dry::Validation, or anything really), you don’t need to worry about “stale” objects, and so on. If you want to plug some validation, you just need to add some code right before insert
or Person.dataset.where(id: id).update(fields)
:
def update!(params) fields = params.merge(updated_at: Time.current) errors = validate(to_h, fields) return FailureObject.new(errors) if errors Person.dataset.where(id: id).update(fields) Person.call(to_h.merge(fields)) end
Things become even more interesting, because you can validate if a new set of fields is valid based on the current set of fields: no more getter-pollution like changes
, name_before_last_change
, name_changed?
, and so on. You also control how you create the object, and this is huge: you can implement some kind of “single table inheritance” without relying on magic class names on the database; you can disable, for example, updates on an object if a Person
is on status deleted
; you can make specific APIs for queries and completely separate then from data representation, and so on…
We need to get rid of magical code in our systems. Just because ActiveRecord seems like something easy to use, and it’s the default choice on the Ruby world doesn’t mean that it needs to be used everywhere, and be the default for any project.