For some time now, we’ve been working with SQL to communicate with database systems. What we learned in these years is that SQL is not a good way to query data, and I’m going to explain why.
SQL should be a standard way of querying data, but most programmers have learned (probably the hard way) that most of databases implement SQL in a different way. What is means in practice is that any time we need to change databases we will face lots of incompatibilities and queries that simply won’t work as we expect. But this is only the beginning of our problems…
We tried lots of ways to solve this kind of problem, one of them migrating to ORMs. But, ORMs in fact solve a different problem – the one that relational databases work with row-column structures, and our programming languages use objects, hash-maps, records, and other richer ways of representing data. Ruby’s ActiveRecord was a huge step forward, promising us to deliver value simplifying our relational-object mapping, but in the end we faced the same problems – incompatible queries, SQL fragments being thrown in the code, and in the end, we ended up with another huge kind of problems – performance, complexity, and separation of concerns problems (a single ActiveRecord mapping is responsible for validation, for queries, and to define business logic). Even worse, the Arel promise (a complete library to abstract every possible SQL query) was underused – it’s now an internal library to ActiveRecord, it doesn’t really have a stable public API, and in every minor version, something changes in a bizarre and incompatible way.
So, I’ve started a simple project named relational. In the beginning, it was just a playground to learn Scala. But, right now, and faced with modern problems (I’m working with Clojure, and it doesn’t really have a good way to query relational databases – Korma is incomplete in multiple ways, HoneySQL doesn’t really delivers what I want, and other libs are just wrappers around string queries), I’m implementing a version of Relational in Clojure, and the reason I’ve started working on it is kinda simple…
SQL isn’t a standard.
Okay, if we just want to query all data from a single database, inner-joining with other, just listing the fields, it’s completely fine. Add SQL functions and pagination, and we’re in a pinch – for instance, the standard way of limiting the result to just 100
rows is:
SELECT * FROM table FETCH FIRST 100 ROWS ONLY
I don’t know a single person who wrote this kind of query, simply because almost no database supports the standard – in PostgreSQL, MySQL and Sqlite, it’s written as:
SELECT * FROM table LIMIT 100
In Oracle, it is
SELECT * FROM table WHERE rownum < 100
In Microsoft SQL Server, it is
SELECT TOP 100 * FROM table
And don't even start with GROUP_CONCAT
or other strange SQL functions…
Things become even worse when we’re dynamically filtering data: for instance, imagine we have a webpage when we have a bunch of filters. How do we write the query? Sure, we can write a function that checks if the list of filters is not empty, then adds a WHERE
clause… and if it’s more than one, we join these strings with AND
. And, if one of the filters refers another table, we add an INNER JOIN
… and now, we’re writing code to write SQL, and not to query our data. This is more than reason to create Relational for me: strings are not a good way of representing SQL. In Relational, there is no way write strings to query data – we need to use an API to query.
The idea behind relational is simple – every query is represented by a structure. This structure must respond to a function – in Clojure, partial-fn
– that will return us another function. This second function expects a single argument – an SQL adapter – that will correctly generate the SQL tuple, correctly quoting attributes, generating SQL functions, and generating windowing functions the right way. The SQL tuple is simply a string fragment, followed by a list of bindings. So, for Clojure, the query:
(-> (select :*) (from :table) (where (:< :age 10)))
Will generate a structure, that when we apply partial-fn
, will generate a function, that when applied to MySQL adapter, will generate:
["SELECT * FROM `table` WHERE `table`.`age` < ?", [10]]
And then, we can pass this to JDBC or any other system to query database.
Why care?
First of all, today we have in-memory databases that are as powerful as other systems – HSQLDB and Sqlite are wonderful for unit-tests, but we all know the dangers of working with different databases in production. When using this indirect approach, we know that things will work the same, because every specific, non-standard SQL of every database is already implemented and correctly tested in Relational.
What I really find strange is that this approach is what we’ve been doing all the time with programming languages – developing languages to run on VMs, so we can re-use code made in one language on others. I found virtually no work being done on working with SQL the same way, and sometimes I think I’m trying to solve the wrong problem. Yet, there’s a lot to be done – SQL queries are complex in multiple ways, and there are lots of things that some databases simply doesn’t support – like MySQL that doesn’t support recursive queries, or some DBs that doesn’t support full outer joins. Even worse, there are UNION ALL
, INTERSECT
, and other things that really doesn’t translate to SELECT ... FROM ... WHERE
.
There’s still lots to be done, and only time will show us if this is a correct direction, or only another shot in the dark. What we all know is that we need a better way to query databases. NoSQL isn’t the solution. Pure SQL strings aren’t solution either. Stored procedures are more a part of the problem than of the solution. Relational still have the problem that it needs to be implemented for every language, but maybe it’ll be a better approach that we have today.