Tuesday, 28 August 2007

Hierarchical data with Common Table Expressions

Today, I finally managed to crack a problem that's been bothering me for some time. Some weeks ago I started working on the database schema that the site will be using, and the Entropia auction system gave me some trouble.

You see, the auction in the game is built up hierarchically, and I eventually want to show individual items in the database under the auction section they belong to (as with a lot of items, it's unclear exacty where to find them).

That in itself was easy enough, but not if you want to store the data in third normal form, but at the same time want to use some form of a cookie-trail, so the user will know exactly where to go. (With cookie-trail I mean something along the lines of "Items \ Tools \ Misc Tools"). I chose to store my data in a table with an AuctionCategoryID, AuctionCategoryName, and an AuctionCategoryParentID, the latter of which represents a SELF-JOIN.





(The figure above doesn't actually represent my own table. I chose to give the very bottom nodes of the tree a ParentID of -1, and did not define -1 as an AuctionCategoryID. Hence, creating a Foreign Key is just not possible. The figure is just for clarification).

I first tried meddling with code a bit myself, but found that my code just didn't do the trick. Sure, I could hard-code a number of self-joins, but what if suddenly more categories were added? I'd have to alter my function. That was definitely not an option.

I tried a few more things, but found my code was not robust enough, and that there were sorting troubles (among other things).

I Googled a bit, and initially ended up with a solution like this (Joe Celko describes a similar method in his "SQL for Smarties"). Hardly elegant, and hardly dynamic (although it would be possible, of course, to ensure there would be a difference of a few numbers between each of the individual "nodes". However, that still potentially means I could run out of space, and would then have to manually alter stuff again. Not an option if you ask me).

Enter the recursive Common Table Expression. BOL doesn't seem very helpful, but hopefully the code below will make things easier to see.

The first query within the Common Table Expression defines the root level. As you see, the -1 value I mentioned above is hard-coded. The UNION ALL is the key to ensuring the self-join will work for an indefinte number of child nodes. The second query just fills in those child nodes.

Note that it is possible to potentially create an infinite loop, though CTEs also offer a way to cap this.

The code:

WITH AuctionPath -- The name of the CTE
(
[Name]
, [AuctionCategoryID]
, [Path]
, [Level]
)
AS
(
SELECT
AuctionCategoryName -- Name
, AuctionCategoryID -- AuctionCategoryID
, CONVERT(varchar(255), AuctionCategoryName) -- Path
, 1 -- Level
FROM
AuctionCategories
WHERE
AuctionCategoryParentID = -1 -- Only the parent nodes here
UNION ALL
SELECT
AuctionCategoryName -- Name
, AuctionCategories.AuctionCategoryID -- AuctionCategoryID
, CONVERT(varchar(255), Path + ' \ ' + AuctionCategoryName) -- Path
, [Level] + 1 -- Level
FROM
AuctionCategories
INNER JOIN AuctionPath ON AuctionCategories.AuctionCategoryParentID = AuctionPath.[AuctionCategoryID]
)
SELECT
[Name]
, [AuctionCategoryID]
, [Path]
, [Level]
FROM
AuctionPath
ORDER BY AuctionCategoryID
OPTION (MAXRECURSION 5); -- The maximum amount of recursions

Some things of note. There are certain conditions to recursive CTEs. From BOL:

  • The number of columns in the anchor and recursive members must be the same.
  • The data type of a column in the recursive member must be the same as the
    data type of the corresponding column in the anchor member.

In the quoted text, anchor is what I have refered to as root node, and recursive member is a child node.

I hope this helps a little if you're facing a situation where you need to do hierarchies. Once you manage to get the trick, there's nothing to it, really :)

Many thanks to both Kalman Toth and russellb over at the sqlmag forums.

No comments: