Recently, I saw people migrating from Awesome Nested Set to Ancestry. The reasons are simple – Ancestry is very simple, it just need a new string field in your table, and it’s easier to reason about. So, why should I even consider to use an alternative?
The answer is simple, and I’m going to quote H. L. Mencken: For every complex problem there is an answer that is clear, simple, and wrong.
And the reason that using Ancestry is a bad idea is simple: it doesn’t reflect good design. It treats a single field as a multi-valored column that keeps the ancestry of your object. This means that there’s no way to fetch all records AND their parents in a single query, or fetch all records AND their children, because we need to split the string (ruby-side) and then create a query (also ruby-side). So, in this post, I’ll show some tricks of what we can do with Awesome Nested Set, or even beter, how does it implements the tree pattern (and why it calls it a “set”, not a “tree”).
First, when we think about categories and sub-categories, we think like this:
But this is, in fact, a terrible way of representing trees in SQL. So, the alternative, is to transform it in a group of sets: Parent 1 (P1
, for short), is a super-set containing subsets C1
and C2
(Children 1 and Children 2, for short). Each of the children have its own grandchildren, so C1
is a super-set containing G1
, and C2
is a super-set containing G2
and G3
.
We, somehow, need to make our code understand this new format. A simple way to do it is to assign each node two informations: “what’s my position on the left” and “what’s my position on the right”, or lft
and rgt
as fields (not left
or right
because MySQL screams when we try to use these names to the fields). In the picture below, we see every field with left and right already defined (right is the orage-ish color):
So, we’ll transport it to our database:
+--------+--------+-------+ | NAME | LFT | RGT | +--------+--------+-------+ | P1 | 1 | 12 | | C1 | 2 | 5 | | G1 | 3 | 4 | | C2 | 6 | 11 | | G2 | 7 | 8 | | G3 | 9 | 10 | | P2 | 13 | 18 | | C3 | 14 | 17 | | G4 | 15 | 16 | +--------+--------+-------+
The GEM will also add parent_id
as a required field, but it is completely unnecessary as we’ll see. Now, querying it without any ORDER BY
clause is probably not a good idea, so we’ll not do it. But there are orders for anything: if we want to query for all nodes, and that the ones that appear first will be parents of the ones that appear after then, we can ORDER BY lft
:
SELECT * FROM categories ORDER BY lft
ID | NAME | LFT | RGT 1 | P1 | 1 | 12 3 | C1 | 2 | 5 2 | G1 | 3 | 4 4 | C2 | 6 | 11 6 | G2 | 7 | 8 5 | G3 | 9 | 10 7 | P2 | 13 | 18 9 | C3 | 14 | 17 8 | G4 | 15 | 16
Ok, so if we want all ancestors of a specific node, we can just query it in a way that left and right stays between the left and right of the parent. For example, to get all children of C2, we can use the following query:
SELECT * FROM categories WHERE lft > 6 -- 6 is "lft" of C2 AND rgt < 11 -- 11 is "rgt" of C2 ORDER BY lft
ID | NAME | LFT | RGT 6 | G2 | 7 | 8 5 | G3 | 9 | 10
If we want to query for every leaf node in the tree, no problem too: we just need to remember that if a leaf node have no children, it’ll have it’s rgt
equals to lft
, incremented by one:
SELECT * FROM categories WHERE rgt = lft + 1
ID | NAME | LFT | RGT 2 | G1 | 3 | 4 5 | G3 | 9 | 10 6 | G2 | 7 | 8 8 | G4 | 15 | 16
To get all ancestors of a specific node, you query for nodes where lft
is lower and rgt
is greater: in this case, let’s find all ancestors of node G3 (lft = 9
and rgt = 10
):
SELECT * FROM categories WHERE lft < 9 AND rgt > 10
ID | NAME | LFT | RGT 1 | P1 | 1 | 12 4 | C2 | 6 | 11
And, finally, if you want all root nodes, you just need to query for any node that have no parents – that is, where there not exists a node where lft
is lower and rgt
is greater:
SELECT * FROM categories WHERE NOT EXISTS (SELECT id FROM categories parents WHERE parents.lft < categories.lft AND parents.rgt > categories.rgt)
ID | NAME | LFT | RGT 1 | P1 | 1 | 12 7 | P2 | 13 | 18
Also, this is called Anti-Join, and every good database can optimize it to use indexes and to plan the query the correct way (that means, you’ll not have performance issues if you’re trying to use this query on your code).