Join 3,382 readers in helping fund MetaFilter (Hide)


I haven't needed SQL this ugly before
December 19, 2010 4:07 AM   Subscribe

What SQL magic do I need to turn one column into several?

I need to print some tickets, each of which has enough room to hold one set of customer details and up to five items ordered by that customer. Customers who have ordered more than 5 items get multiple tickets. So from an orders table like this,
Customer | Item
---------|------
Bob      | FTMCH
Bob      | ZORP
Bob      | KLUGE
Carol    | FTMCH
Carol    | MEEP
Carol    | ZORP
Ted      | FOON
Ted      | SMOCK
Alice    | ORGO
Carol    | SQICK
Carol    | BLECH
Carol    | KLUGE
Carol    | GLURP
I need a query that returns this:
Customer | Item1 | Item2 | Item3 | Item4 | Item5
---------|-------|-------|-------|-------|------
Alice    | ORGO  | null  | null  | null  | null
Bob      | FTMCH | ZORP  | KLUGE | null  | null
Carol    | FTMCH | MEEP  | ZORP  | SQICK | BLECH
Carol    | KLUGE | GLURP | null  | null  | null
Ted      | FOON  | SMOCK | null  | null  | null
Can some kind soul help me with the SQL for this? HSQL embedded database in OpenOffice.org Base, if it makes a difference.
posted by flabdablet to Computers & Internet (14 answers total) 2 users marked this as a favorite
 
The function you need is 'decode', I'm not familier with HSQL so I can't help you with the syntax. But it should be easy enough to look in the help files.
posted by Confess, Fletch at 4:42 AM on December 19, 2010


DECODE

DECODE( value expr main, value expr match 1, value expr result 1 [...,] [, value expr default] )

DECODE takes at least 3 arguments. The value expr main is compared with value expr match 1 and if it matches, value expr result 1 is returned. If there are additional pairs of value expr match n and value expr result n, comparison is repeated until a match is found the result is returned. If no match is found, the value expr default is returned if it is specified, otherwise NULL is returned. The type of the return value is a combination of the types of the value expr result ... arguments. (HyperSQL)
So it looks to me like DECODE in SQL is kind of like a switch or case statement in procedural languages; but I'm too dim to see its immediate application here. What have I missed?
posted by flabdablet at 5:09 AM on December 19, 2010


I suggest you ask this over at stackoverflow.com
posted by SNACKeR at 5:44 AM on December 19, 2010


Done. But I bet I get better answers quicker from AskMe.
posted by flabdablet at 6:06 AM on December 19, 2010


On my phone so I can't do the whole thing, will check back when I'm home, but

You can probably do it with a subquery that adds a rank or rownumber column, then 5 case statements on a modulo of your new column and a group by.

Shame hsql doesn't have a pivot command like the newer mssql.
posted by ish__ at 7:11 AM on December 19, 2010


Preamble: none of this is tested, I just woke up, I've never used hsql only MSSql.

So, looks like theres no rank/rownum in hsql - but you can fake it with something like http://support.microsoft.com/kb/18613 (adapt to your table/hsql):
   select rank=count(*), a1.au_lname, a1.au_fname
   from authors a1, authors a2
   where a1.au_lname + a1.au_fname >= a2.au_lname + a2.au_fname
   group by a1.au_lname, a1.au_fname
   order by rank

Using that, get a subquery with your customer and item column and a new rank column in it - ordering the rows by customer asc, item asc. Hereafter this is "sq".
select
        sq.customer,       
        case when mod(rank,5) = 0 then max(item) end,
        case when mod(rank,5) = 1 then max(item) end,
        case when mod(rank,5) = 2 then max(item) end,
        case when mod(rank,5) = 3 then max(item) end,
        case when mod(rank,5) = 4 then max(item) end
from
        sq
group by
        sq.customer,
        divide_integers(rank,5)
I don't know what sort of fanciness you have to do in hsql to divide integers and get an integer results. In MSSql its a pain in the ass so we often have a function for it to cast them both to floats then drop the decimal part and convert back to integer. I assume you can duplicate that in hsql somehow.

Let me know if this works, doesn't work, doesn't make sense or if you have a followup and I'm happy to try to work with you til it does.
posted by ish__ at 7:30 AM on December 19, 2010


What you are trying to do looks like a crosstabulation, though you will have to add an artificial column like ish__ says - try googling for that.

Another approach would be to use GROUP_CONCAT to gather all the item codes for each customer, and then select again with a splitting function to get the tickets into different columns. I don't know how you would limit the grouping to five items per customer though, it looks like your dataset needs something like an item_number column to do this in any meaningful way.

I have to ask though, why do you want to transform your dataset like that? If it's a matter of printing the tickets, can't you do it with a grouped report with the customer info in the group header?
posted by Dr Dracator at 7:30 AM on December 19, 2010 [1 favorite]


ish__, that or something very like it looks like it will work. It's way past my bedtime but I will play with it in the morning and let you know. Thanks!

Dr Dracator, I'm replacing a back end, the existing front end expects to be fed a CSV file formatted like the query result above, and I'm under moderate time pressure for a usable first cut. In the fullness of time I will indeed be replacing the front end as well, at which point grouped reports are probably indeed the Right Thing.
posted by flabdablet at 7:51 AM on December 19, 2010


A similar question (not sql, though) was asked two weeks ago here.
posted by cnanderson at 8:04 AM on December 19, 2010


I just looked at the answer on stack overflow, and your response. Is there a way to do what that guy said by first running a query that results in this:

Customer | Item   | Count
---------|--------|------
Alice    | ORGO   |  1
Bob      | FTMCH  |  1
Bob      | ZORP   |  2
Bob      | KLUGE  |  3
Carol    | FTMCH  |  1
Carol    | MEEP   |  2
Carol    | ZORP   |  3
Carol    | SQICK  |  4
Carol    | BLECH  |  5
Carol    | KLUGE  |  6
Carol    | GLURP  |  7
Ted      | FOON   |  1
Ted      | SMOCK  |  2

posted by gjc at 9:39 AM on December 19, 2010


OK, this works well enough:
SELECT
	"Customer",
	MAX(CASE WHEN "Slot" = 0 THEN "Item" END) AS "Item1",
	MAX(CASE WHEN "Slot" = 1 THEN "Item" END) AS "Item2",
	MAX(CASE WHEN "Slot" = 2 THEN "Item" END) AS "Item3",
	MAX(CASE WHEN "Slot" = 3 THEN "Item" END) AS "Item4",
	MAX(CASE WHEN "Slot" = 4 THEN "Item" END) AS "Item5"
FROM (
	SELECT
		l."Customer" AS "Customer",
		l."Item" AS "Item",
		COUNT(r."Item") / 5 AS "Ticket",
		MOD(COUNT(r."Item"), 5) AS "Slot"
	FROM "Orders" AS l
	LEFT JOIN "Orders" AS r
	ON r."Customer" = l."Customer" AND r."Item" < l."Item"
	GROUP BY "Customer", "Item"
)
GROUP BY "Customer", "Ticket"
ORDER BY "Customer", "Ticket"
It makes this:
Customer | Item1 | Item2 | Item3 | Item4 | Item5 
---------|-------|-------|-------|-------|-------
Alice    | ORGO  |       |       |       |       
Bob      | FTMCH | KLUGE | ZORP  |       |       
Carol    | BLECH | FTMCH | GLURP | KLUGE | MEEP  
Carol    | SQICK | ZORP	 |       |       |       
Ted      | FOON  | SMOCK |       |       |       
Thanks to all who helped, both here and at Stack Overflow.
posted by flabdablet at 7:37 AM on December 20, 2010


Jesus, this just gets worse :-(

Turns out the business rules allow the same customer to order the same item on multiple occasions, and that all outstanding orders are to be included on the one set of tickets. So my toy table should have looked more like this:
ID  | Customer | Item 
159 | Bob      | FTMCH
264 | Bob      | ZORP 
265 | Bob      | KLUGE
288 | Carol    | FTMCH
314 | Carol    | MEEP 
323 | Carol    | ZORP 
327 | Ted      | FOON 
338 | Ted      | SMOCK
358 | Alice    | ORGO 
419 | Carol    | SQICK
716 | Carol    | MEEP 
846 | Carol    | BLECH
939 | Carol    | MEEP 
950 | Carol    | GLURP
979 | Carol    | KLUGE
Carol's multiple MEEPs bugger the ranking logic in the original solution, and I've ended up with the following hideous monster:
SELECT
	"Customer",
	MAX(CASE WHEN "Slot" = 0 THEN "Item" END) AS "Item0",
	MAX(CASE WHEN "Slot" = 1 THEN "Item" END) AS "Item1",
	MAX(CASE WHEN "Slot" = 2 THEN "Item" END) AS "Item2",
	MAX(CASE WHEN "Slot" = 3 THEN "Item" END) AS "Item3",
	MAX(CASE WHEN "Slot" = 4 THEN "Item" END) AS "Item4",
	MAX(CASE WHEN "Slot" = 0 THEN "Quantity" END) AS "Qty0",
	MAX(CASE WHEN "Slot" = 1 THEN "Quantity" END) AS "Qty1",
	MAX(CASE WHEN "Slot" = 2 THEN "Quantity" END) AS "Qty2",
	MAX(CASE WHEN "Slot" = 3 THEN "Quantity" END) AS "Qty3",
	MAX(CASE WHEN "Slot" = 4 THEN "Quantity" END) AS "Qty4"
FROM (
	SELECT
		"Customer",
		"Item",
		COUNT("ID") AS "Quantity",
		"Rank" / 5 AS "Ticket",
		MOD("Rank", 5) AS "Slot"
	FROM (
		SELECT
			main."ID" AS "ID",
			main."Customer" AS "Customer",
			main."Item" AS "Item",
			COUNT(less."Item") AS "Rank"
		FROM "Orders" AS main
		LEFT JOIN (
			SELECT DISTINCT
				"Customer",
				"Item"
			FROM "Orders") AS less
		ON less."Customer" = main."Customer" AND less."Item" < main."Item"
		GROUP BY "Customer", "Item", "ID"
	)
	GROUP BY "Customer", "Item", "Rank"
)
GROUP BY "Customer", "Ticket"
which makes this:
Customer | Item0 | Item1 | Item2 | Item3 | Item | Qty0  | Qty1 | Qty2 | Qty3 | Qty3 | Qty4
Bob      | FTMCH | KLUGE | ZORP  |       |      | 1     | 1    | 1    | 0    | 0    | 0   
Carol    | BLECH | FTMCH | GLURP | KLUGE | MEEP | 1     | 1    | 1    | 1    | 1    | 3   
Carol    | SQICK | ZORP  |       |       |      | 1     | 1    | 0    | 0    | 0    | 0   
Ted      | FOON  | SMOCK |       |       |      | 1     | 1    | 0    | 0    | 0    | 0   
Alice    | ORGO  |       |       |       |      | 1     | 0    | 0    | 0    | 0    | 0   
It does the job, I guess, but I'm feeling pretty lucky that the database involved is always going to be quite small (a few thousand rows).

Spiritually, I'm an embedded-systems guy, not a database guy. Can anybody who does this for a living tell me whether this kind of nonsense is common? Would a query with four nested SELECTs and a LEFT JOIN merit a mention on the Daily WTF?
posted by flabdablet at 12:31 AM on December 21, 2010


Not really doing this for a living, but I can tell you I have seen much, much worse in the wild.
posted by Dr Dracator at 5:34 AM on December 21, 2010


flabadablet: the question in Dr Dracator's first response, the "why are you doing it like this at all? shouldn't your front-end do formatting stuff?" I ask that *every single day* at work. I've seen guys write a 10 page select statement with grouping 6 or 7 levels deep and have it crash the SQL server and wonder why? Much, much, MUCH worse exists in the wild for companies you wouldn't even think about.
posted by ish__ at 2:12 PM on December 21, 2010


« Older Extremely specific UK train qu...   |  Please teach me how to automat... Newer »
This thread is closed to new comments.