View Single Post

   
  #8 (permalink)  
Old 02-27-2008, 07:38 AM
jefftyzzer
 
Posts: n/a
Default Re: Trees, recursion, and grouping

Thank you, Joe--I look forward to digging into your suggestion, and
appreciate your considered reply to my posting.

--Jeff

--CELKO-- wrote:
> In a DB2 UDB LUW table, I have a table with pairs of equivalent ID's.
> What I want to do is assign all equivalent IDs to the same group
> number, including those that are transitively related, i.e., if A = B
> and B = C then A = C, so I'd group all three together.
>
> Although they're not related in a composition fashion per se, it seems
> like the way to go conceptually is to consider the ID relationships as
> a reporting tree (ID_A would be the manager and ID_B would be the
> employee) and assign all IDs that share the same root to the same
> group.
>
> For instance, let's say I have the following pairs
>
> ID_A ID_B
> ---- ----
> 1800 1804
> 1800 1808
> 1806 1809
> 1808 1810
> 1808 1812
> 1809 1815
> 1810 1811
>
> I'd have two trees (sideways):
>
> 1800 1804
> 1808
> 1810
> 1811
> 1812
> and
>
> 1806
> 1809
> 1815
>
> I'm struggling with the following:
>
> 1. How to group based on a shared *root* (I'd hate to have to build a
> chain column, e.g., for 1811: "1800-->1808-->1810" and do something
> like DENSE_RANK() OVER(ORDER BY SUBSTR(CHAIN,1,4)--that seems
> unreliable, and the ID is not always the same length)
>
> 2. How to write a recursive CTE that accomodates multiple,
> independent,
> trees.
>
> >> I feel like I'm close--I've read Serge's "CONNECT BY" article and Molinaro's chapter "Hierarchical Queries" in his _SQL Cookbook_, but I'm just not able to stitch it all together. <<

>
> You should have been reading Celko's TREES & HIERARCHIES IN SQL instead
> !
>
> What you have is a forest, not a tree and you can do this with a nested
> sets model
>
> CREATE TABLE FoobarForest
> (foobar_id INTEGER NOT NULL PRIMARY KEY, --assumption
> tree_nbr INTEGER NOT NULL,
> lft INTEGER NOT NULL CHECK (lft > 0),
> rgt INTEGER NOT NULL,
> CHECK (lft < rgt),
> UNIQUE (tree_nbr, lft));
>
> Instead of a recursive CTE, pick the roots and use a simple push-down
> stack for tree traversal. Here is a SQL/PSM version for one tree:
>
> - Tree holds the adjacency model
> CREATE TABLE Tree
> (node CHAR(10) NOT NULL,
> parent CHAR(10));
>
> -- Stack starts empty, will holds the nested set model
> CREATE TABLE Stack
> (stack_top INTEGER NOT NULL,
> node CHAR(10) NOT NULL,
> lft INTEGER,
> rgt INTEGER);
>
> CREATE PROCEDURE TreeTraversal ()
> LANGUAGE SQL
> DETERMINISTIC
> BEGIN ATOMIC
> DECLARE counter INTEGER;
> DECLARE max_counter INTEGER;
> DECLARE current_top INTEGER;
>
> SET counter = 2;
> SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
> SET current_top = 1;
>
> --clear the stack
> DELETE FROM Stack;
>
> -- push the root
> INSERT INTO Stack
> SELECT 1, node, 1, max_counter
> FROM Tree
> WHERE parent IS NULL;
>
> -- delete rows from tree as they are used
> DELETE FROM Tree WHERE parent IS NULL;
>
> WHILE counter <= max_counter- 1
> DO IF EXISTS (SELECT *
> FROM Stack AS S1, Tree AS T1
> WHERE S1.node = T1.parent
> AND S1.stack_top = current_top)
> THEN BEGIN -- push when top has subordinates and set lft value
> INSERT INTO Stack
> SELECT (current_top + 1), MIN(T1.node), counter, NULL
> FROM Stack AS S1, Tree AS T1
> WHERE S1.node = T1.parent
> AND S1.stack_top = current_top;
>
> -- delete rows from tree as they are used
> DELETE FROM Tree
> WHERE node = (SELECT node
> FROM Stack
> WHERE stack_top = current_top + 1);
> -- housekeeping of stack pointers and counter
> SET counter = counter + 1;
> SET current_top = current_top + 1;
> END;
> ELSE
> BEGIN -- pop the stack and set rgt value
> UPDATE Stack
> SET rgt = counter,
> stack_top = -stack_top -- pops the stack
> WHERE stack_top = current_top;
> SET counter = counter + 1;
> SET current_top = current_top - 1;
> END;
> END IF;
> END WHILE;
> -- SELECT node, lft, rgt FROM Stack;
> -- the top column is not needed in the final answer
> -- move stack contents to new tree table
> END;


Reply With Quote