How does WITH work in MS SQL Server?
September 11, 2009 3:05 PM   Subscribe

Can someone help me understand how the WITH common table expression works in Microsoft SQL Server?

I'd like to learn more about how the WITH common table expression works in Microsoft SQL Server. I saw an example in the SQL Cookbook, but don't quite understand it. Can one of you SQL gurus break it down a bit for me?

Here's the data in the emp table:

deptno  ename
------  --------
    10  CLARK
    10  KING
    10  MILLER
    20  FORD
    20  ADAMS
    30  MARTIN
    30  BLAKE
    30  ALLEN
The WITH common table expression:

with x (deptno, cnt, names, empno, len) as
(
    select deptno, count(*) over (partition by deptno),
        cast(ename as varchar(100)),
        empno, 1
    from emp
    union all
    select x.deptno, x.cnt,
        cast(x.names + ',' + e.ename as varchar(100)),
        e.empno, x.len+1
    from emp e, x
    where e.deptno = x.deptno and e.empno > x.empno
)
select deptno, names
from x
where len = cnt
order by 1
The results:

deptno  names
------  ----------------------
    10  CLARK, KING, MILLER
    20  FORD, ADAMS
    30  MARTIN, BLAKE, ALLEN
posted by DakotaPaul to Computers & Internet (9 answers total) 4 users marked this as a favorite
 
You can think of it kind of like a temporary table that only lasts for one query and then goes away by itself.

"with x (deptno, cnt, names, empno, len)..." says to create a temporary table called "x", with the 5 column names shown in the brackets.

"... as ( big complicated query in here )" says that the temporary table will be filled by the results of the big complicated query.

Then in the immediately following select statement, you use x as though it was a real table. In fact it isn't, and it will disappear by itself as soon as this query finishes, but for the purposes of that one query, x works just like a table.

The example in your question is very good for obscuring the key point about what a common table expression is, so if this is the example they used to illustrate the concept in the book, they should be shot.
posted by FishBike at 3:38 PM on September 11, 2009


Response by poster: Thanks, FishBike. So what is the "big complicated query" doing? Why are there two queries in there, and what are they doing?

I didn't realize this was a bad example of a CTE. The book used this to show how get a comma-delimited column out of a set of rows.
posted by DakotaPaul at 4:12 PM on September 11, 2009


if this is the example they used to illustrate the concept in the book

As the name implies, the "SQL Cookbook" is a cookbook, and the example is taken from a recipe titled "Creating a Delimited List from Table Rows" which, like all recipes in the book, contains sample snippets for a number of database engines, and also includes a detailed description of how the snippet works.

they should be shot.

Because you don't know what a cookbook is?
posted by effbot at 4:16 PM on September 11, 2009


Why are there two queries in there, and what are they doing?

The recursive query is explained in the "Discussion" section for DB2 and SQL Server. In the copy of the book I'm looking at, that section is numbered 6.10.3.1.

(Basically, the first select pulls out the data, the second builds joined strings of varying lengths, and the select outside the with-clause filters out the strings with the right number of members. This is just one big hack to work around the fact that SQL Server doesn't have a group_concat function. See the book for a longer explanation.)
posted by effbot at 4:27 PM on September 11, 2009


I would add that the most important part of the WITH functionality is that the definition of the table can refer to itself recursively.

with example (thecol) as
(
select 1
union all
select thecol+1 from example
where thecol <> )
select * from example

Produces:

thecol
1
2
3
4
5


The union is really nothing special, however if the second part of the union refers back to the table being constructed it will continue to run (and produce rows) as long as its conditions are met.
posted by jockc at 4:34 PM on September 11, 2009


sorry I forgot to enter the '<' symbol correctly:


with example (thecol) as
(
select 1
union all
select thecol+1 from example
where thecol < 5 )
select * from example
posted by jockc at 4:36 PM on September 11, 2009


I didn't realize this was a bad example of a CTE. The book used this to show how get a comma-delimited column out of a set of rows.

It is an excellent example of how to do that, which happens to use a common table expression. I meant that it wasn't a great first example of common table expressions from which to learn what they are, because it's quite complicated.
posted by FishBike at 4:57 PM on September 11, 2009


Best answer: Let's break this down. First thing to know is that the "WITH x ([cols]) AS ( [...] )" sets up a temporary table that will be filled by some sort of recursion. The two parts of any recursion are the initial conditions and the recursive term. In the case of SQL, the initial condition is separated from the recursive term by "UNION ALL".

Here's the initial condition:
select
    deptno,
    count(*) over (partition by deptno) AS cnt,
    cast(ename as varchar(100)) AS names,
    empno,
    1 AS len
from emp
I've added some column aliases to make it clear what is happening. The only part you might get confused by here is the "count(*) over (partition by deptno)". This is called a "window clause/function", and it's basically a more powerful, generic "group by".

For recursive queries it's helpful to think of the stages as filling three temporary tables: working table, intermediate table, and the results table. I'm going to show the results table, marking rows that are in the working table with an "*". So for this step, your results table looks like this (filling in a value for empno because it seems to be missing):
deptno  cnt  names     empno  len
------  ---  --------  -----  ---
    10    3  CLARK         1    1*
    10    3  KING          2    1*
    10    3  MILLER        3    1*
    20    2  FORD          4    1*
    20    2  ADAMS         5    1*
    30    3  MARTIN        6    1*
    30    3  BLAKE         7    1*
    30    3  ALLEN         8    1*
Keep in mind that I'm actually showing you two tables here! Now, the recursive term:
select
    x.deptno,
    x.cnt,
    cast(x.names + ',' + e.ename as varchar(100)) AS names,
    e.empno,
    x.len+1 AS len
from emp e, x
where
    e.deptno = x.deptno and
    e.empno > x.empno
The recursive term operates on your working table, and stores its results in an intermediate table. Then, after each pass the intermediate table (1) gets appended to the results table, and (2) replaces the working table.

So the results table after the first pass would be (keep in mind we're doing a CROSS JOIN of the working table with the "emp" table):
deptno  cnt  names           empno  len
------  ---  --------------  -----  ---
    10    3  CLARK               1    1
    10    3  KING                2    1
    10    3  MILLER              3    1
    20    2  FORD                4    1
    20    2  ADAMS               5    1
    30    3  MARTIN              6    1
    30    3  BLAKE               7    1
    30    3  ALLEN               8    1
    10    3  CLARK, KING         2    2*
    10    3  CLARK, MILLER       3    2*
    10    3  KING, MILLER        3    2*
    20    2  FORD, ADAMS         5    2*
    30    3  MARTIN, BLAKE       7    2*
    30    3  MARTIN, ALLEN       8    2*
    30    3  BLAKE, ALLEN        8    2*
Now, for the second pass, CROSS JOIN the working table (rows with asterisks) with the "emp" table gives:
deptno  cnt  names                   empno  len
------  ---  ----------------------  -----  ---
    10    3  CLARK                       1    1
    10    3  KING                        2    1
    10    3  MILLER                      3    1
    20    2  FORD                        4    1
    20    2  ADAMS                       5    1
    30    3  MARTIN                      6    1
    30    3  BLAKE                       7    1
    30    3  ALLEN                       8    1
    10    3  CLARK, KING                 2    2
    10    3  CLARK, MILLER               3    2
    10    3  KING, MILLER                3    2
    20    2  FORD, ADAMS                 5    2
    30    3  MARTIN, BLAKE               7    2
    30    3  MARTIN, ALLEN               8    2
    30    3  BLAKE, ALLEN                8    2
    10    3  CLARK, KING, MILLER         3    3*
    30    3  MARTIN, BLAKE, ALLEN        8    3*

Now on the third pass, nothing happens. None of the rows meet the conditions of the recursive term, so the intermediate table is empty, and the SQL Server knows the recursion is finished.

Finally, we move to the last select, the one built from the working table we just created:
select deptno, names
from x
where len = cnt
order by 1
Basically, return the rows that contain the same number of names as we determined were in the department.
posted by sbutler at 7:55 PM on September 11, 2009 [2 favorites]



Finally, we move to the last select, the one built from the working table we just created

Oops. Replace that with: "Finally, we move to the last select, from the results table we just created"

posted by sbutler at 7:59 PM on September 11, 2009


« Older What are the odds that elephants will become...   |   What will Apple replace my computer with? Newer »
This thread is closed to new comments.