Advent of Code in SQL – Day 01

Ok, this year I decided to try the Advent of Code. But to level up my SQL game, I decided to do it purely in PostgreSQL. No stored procedures, no tables, just pure logic over SQL (or, at least, PostgreSQL-flavored SQL). I’ll also try to not use PG specific datatypes like JSONB or Arrays/Ranges, but I can’t promise this – the challenge is already hard with the way it is.

So, I decided that the input will be a single table called input with a single record and field, called input. This means that I have to split the input in some way – I’m currently using regexp_split_to_table(input.input, '\n') to generate a table with all the lines when this makes sense.

So, day 1. The first thing is to generate the input:

WITH input AS (SELECT '199
200
208
210
200
207
240
269
260
263' AS input)

The Day 01 problem is kinda simple: you compare a row with the next one and when the next one is greater, you add 1 to a result. So in this case, we do need to split the input into numeric rows. There are two ways of doing this: you either first split then cast, or split and cast in a single query. The first one is easier, so let’s go with that:

WITH depths_as_str as (SELECT regexp_split_to_table(input.input, '\n') depth FROM input)
  , depths as (SELECT depth  :: integer FROM depths_as_str)

(more…)

Understanding Ruby’s Awesome Nested Set

Recently, I saw people migrating from Awesome Nested Set to Ancestry. The reasons are simple – Ancestry is very simple, it just need a new string field in your table, and it’s easier to reason about. So, why should I even consider to use an alternative?

The answer is simple, and I’m going to quote H. L. Mencken: For every complex problem there is an answer that is clear, simple, and wrong.

And the reason that using Ancestry is a bad idea is simple: it doesn’t reflect good design. It treats a single field as a multi-valored column that keeps the ancestry of your object. This means that there’s no way to fetch all records AND their parents in a single query, or fetch all records AND their children, because we need to split the string (ruby-side) and then create a query (also ruby-side). So, in this post, I’ll show some tricks of what we can do with Awesome Nested Set, or even beter, how does it implements the tree pattern (and why it calls it a “set”, not a “tree”).

First, when we think about categories and sub-categories, we think like this:

But this is, in fact, a terrible way of representing trees in SQL. So, the alternative, is to transform it in a group of sets: Parent 1 (P1, for short), is a super-set containing subsets C1 and C2 (Children 1 and Children 2, for short). Each of the children have its own grandchildren, so C1 is a super-set containing G1, and C2 is a super-set containing G2 and G3.


(more…)

Clojure with SQL databases

There are times when a specific piece of technology captures my attention in such a way that I feel the need to study more, and do things with it. Since I began to study Clojure, and after two jobs working with it, I’m still in love with it – I think it matches what I expect in a language most of the time, and also matches my repl-driven-development workflow.

So why I’m not considering it for new projects? Even personal ones?

When I need to prototype something really fast, I end up going for Ruby, with Sinatra (sorry rails, but you’re a terrible choice for me). I saw that in my last project, I worked with Sequel, postgresql, and Sinatra. For tests, I used VCR and recorded the external api calls that I had to make.

Well, I could probably prototype a simple VCR that would record a specific function call in Clojure. Sinatra, I could use Compojure. As for Sequel… Or ActiveRecord, for that matter…

There are a few libraries that can be used to communicate with SQL: Korma, that doesn’t support select ... for update and have some implicit connection problems and other issues, HoneySQL that interprets Clojure maps and converts then to SQL, and lots of other libraries that simply ask you to write your query and then, somehow, call then for you. I didn’t care too much for the later, after all, if I want to write queries I prefer to not depend on a library that does it to me – I know that there are people that don’t like to have SQL in the middle of the code, but I don’t see any problem with it.
(more…)

We need a better way to write SQL

For some time now, we’ve been working with SQL to communicate with database systems. What we learned in these years is that SQL is not a good way to query data, and I’m going to explain why.

SQL should be a standard way of querying data, but most programmers have learned (probably the hard way) that most of databases implement SQL in a different way. What is means in practice is that any time we need to change databases we will face lots of incompatibilities and queries that simply won’t work as we expect. But this is only the beginning of our problems…

We tried lots of ways to solve this kind of problem, one of them migrating to ORMs. But, ORMs in fact solve a different problem – the one that relational databases work with row-column structures, and our programming languages use objects, hash-maps, records, and other richer ways of representing data. Ruby’s ActiveRecord was a huge step forward, promising us to deliver value simplifying our relational-object mapping, but in the end we faced the same problems – incompatible queries, SQL fragments being thrown in the code, and in the end, we ended up with another huge kind of problems – performance, complexity, and separation of concerns problems (a single ActiveRecord mapping is responsible for validation, for queries, and to define business logic). Even worse, the Arel promise (a complete library to abstract every possible SQL query) was underused – it’s now an internal library to ActiveRecord, it doesn’t really have a stable public API, and in every minor version, something changes in a bizarre and incompatible way.

So, I’ve started a simple project named relational. In the beginning, it was just a playground to learn Scala. But, right now, and faced with modern problems (I’m working with Clojure, and it doesn’t really have a good way to query relational databases – Korma is incomplete in multiple ways, HoneySQL doesn’t really delivers what I want, and other libs are just wrappers around string queries), I’m implementing a version of Relational in Clojure, and the reason I’ve started working on it is kinda simple…

SQL isn’t a standard.

Okay, if we just want to query all data from a single database, inner-joining with other, just listing the fields, it’s completely fine. Add SQL functions and pagination, and we’re in a pinch – for instance, the standard way of limiting the result to just 100 rows is:

SELECT * FROM table FETCH FIRST 100 ROWS ONLY

I don’t know a single person who wrote this kind of query, simply because almost no database supports the standard – in PostgreSQL, MySQL and Sqlite, it’s written as:

SELECT * FROM table LIMIT 100

In Oracle, it is

SELECT * FROM table WHERE rownum < 100

In Microsoft SQL Server, it is

SELECT TOP 100 * FROM table

And don't even start with GROUP_CONCAT or other strange SQL functions…
(more…)

SQL Orientado a Objetos

O nome parece estranho, mas um ORM, dependendo de como ele for implementado, pode ser usado exatamente para isso.

Estou trabalhando numa lib em Scala chamada relational, na qual eu pretendo fazer um SQL inteiro virar um objeto Scala. Mais ou menos o que o Arel tenta fazer, porém de forma esquisita (meio compatível com Rails, meio compatível com álgebra relacional, e não 100% nada). Mas isso fica pra um outro momento…

No post anterior, eu falei bastante sobre SQL, e sobre todas as coisas que podemos fazer ao saber montar uma query. A idéia agora é tentar montar, de fato, uma query, mas com mais do que apenas fragmentos SQL, mas com o próprio ORM.

Vamos pensar que temos uma tabela de usuários, e uma de números de telefones. O número pertence a um usuário, um usuário tem muitos números de telefone (nada de “join-tables” e coisas mais complexas por agora). Digamos que eu queira saber números de telefone possuem o mesmo prefixo (os primeiros quatro números-vamos ignorar, por hora, os nono dígito para deixar o código mais fácil) de um determinado número.

A idéia, num primeiro momento, é fazer o código para um único número. Vamos, por simplicidade, deixar isso na classe de Telephone mesmo:

class User < ActiveRecord::Base
  has_many :telephones
end

class Telephone < ActiveRecord::Base
  belongs_to :user
  
  def self.same_prefix_of(telephone)
    where('SUBSTR(telephones.number, 0, 5) = ?', telephone.number[0...4])
  end
end

#Para usar:
Telephone.same_prefix_of(Telephone.first)

Por hora, tudo bem. Um código simples, porém é agora que a coisa começa a ficar divertida: generalização
(more…)

Programadores Precisam Aprender SQL!

De uns tempos para cá, tenho vistos muitos códigos bizarros acontecendo no mundo dos ORMs, que eventualmente me deixaram pensando: será que ORMs estão fazendo-nos esquecer como se faz SQLs?

No mundo Ruby/Rails, virou quase uma regra não-escrita que escrever SQLs na mão é errado. Quanto mais pudermos aproveitar o ActiveRecord (e algumas mágicas que ele faz por nós), melhor. Isso acaba trazendo algumas coisas bem esquisitas, tal como achar que 90% dos códigos nunca vão precisar de “OR” (e maior, menor, diferente, etc), já que a API do ActiveRecord não suporta isso, ou algumas outras práticas meio estranhas. Do outro lado, temos por exemplo o framework “Play!”, de Scala, que faz o extremo oposto: não usa nenhum ORM, e todas as queries SQL passam a ser feitas na mão…

O ponto é que quando temos um ORM, parece que abusamos dele e esquecemos que nem tudo é um “SELECT * FROM <table> WHERE <attribute> = <value>”. Por exemplo, no código abaixo:

users = User.all
users.each do |user|
  puts &quot;User #{user.name} has #{user.addresses.count} addresses&quot;
end

Temos o famoso caso do “N+1”: 1 busca para achar N usuários, e a partir daí “N” buscas para achar a contagem de endereços. O problema é que muita gente acha que apenas um “User.include(:addresses)” resolve o problema, quando na verdade não resolve: o “include” vai trazer todos os endereços, mas a contagem (se o Rails optar por usar “count”) vai continuar sendo feita em banco, ou então (se o Rails optar por não usar “count”) será feita em Ruby, e teremos trazido registros a mais do banco de dados sem necessidade. A solução, nesse caso, é usar um comando SQL mesmo:

users = User.all
num_addresses = Address.join(:user).group('user.id').count
users.each do |user|
  puts &quot;User #{user.name} has #{num_addresses[user.id]} addresses&quot;
end

Ok, mas SQL é difícil, chato, e é mais fácil fazer as coisas em Ruby, certo? Bom, sim e não. É mais fácil fazer os comandos em Ruby (ou na linguagem que você escolher) puramente por “costume”. Estamos acostumados a fazer a sequencia: buscamos uma lista, tratamos a lista, exibimos a lista. SQL não trabalha com “listas”, mas com “conjuntos”. E é isso que vamos ver a seguir:
(more…)