This sql query should be simple...
August 2, 2007 9:41 PM

I'm trying to merge simple contact data from a bunch of single tables into one big text (csv) dump. The problem is that not all fields are present for each user. e.g. one user may be missing an email. Because of the missing field, my current (big and ugly) sql query will not return all the rows.

This seems like a simple problem, but I'm stuck. Help!

Here's a simplified version of what I've got:

table: users
columns: uid, username
data: 1, bill
data: 2, joe

table: profile_fields
columns: fid, name
data: 1, Phone
data: 2, email

table: profile_values
columns: uid, fid, value
data: 1, 1, 555-1212
data: 2, 1, 999-0000
data: 2, 2, foo@foo.com

What I want my query to output:


uid, username, phone, email
1, bill, 555-1212,
2, joe, 999-0000, foo@foo.com

(notice there is no email for uid 1)

What is the sql to do this? btw, i'm using mysql.
posted by kamelhoecker to Computers & Internet (13 answers total) 1 user marked this as a favorite
Using outer joins will return all of the rows. Or you coud do something like
case when isnull(email) = 1 then 'N/A' else email end

posted by sanko at 10:09 PM on August 2, 2007


You definitely want to use an outer join. Although if possible, I would suggest reworking everything into one table, it will save a lot of server overhead and be a lot easier to work with.
posted by charmston at 11:20 PM on August 2, 2007


Specifically, you want to do a right outer join, which includes rows with NULL results.
posted by Blazecock Pileon at 12:07 AM on August 3, 2007


I'm not sure the previous answerers looked at your sample tables... Except for "username", the field names are not columns in any table; they are data. This is not really what SQL is for; this should be done in a stored procedure, or at the application level. Get the field names and IDs, then construct a query like

select *, phone, email
from users
 left join (
  select u_id, value as phone from profile_values where f_id = 1
 ) subq1 on users.u_id = subq1.u_id
 left join (
  select u_id, value as e_mail from profile_values where f_id = 2
 ) subq2 on users.u_id = subq2.u_id
order by user_id;

posted by nicwolff at 12:43 AM on August 3, 2007


Whoops, sorry: select users.*, phone, email...

posted by nicwolff at 12:46 AM on August 3, 2007


Your SQL is the problem. Why not include it?

The "profile_fields" table is kinda weird. You're turning your data 90 degrees right there. If you have power over it, change that. It would be so much easier if you instead had

table: users
columns: uid, username
data: 1, bill
data: 2, joe

table: profile_values
columns: uid, phone_value, email_value
data: 1, 555-1212, NULL
data: 2, 999-0000, foo@foo.com
posted by cmiller at 5:59 AM on August 3, 2007


My SQL is no nasty, I was embarrassed. But here goes...


SELECT u.uid as uid, u.name as name, v.value as email, v1.value as phone
FROM users u, profile_values v, profile_values v1
WHERE
u.uid = v.uid AND
u.uid = v1.uid AND
v.fid = (SELECT fid FROM profile_fields where name = 'email') AND
v1.fid = (SELECT fid FROM profile_fields where name = 'Phone')


Result:


+-----+------+-------------+----------+
| uid | name | email | phone |
+-----+------+-------------+----------+
| 2 | joe | foo@foo.com | 999-0000 |
+-----+------+-------------+----------+


The database schema is a given. (The only benefit is that it is flexible if new fields need to be added in. But it makes the queries an order of magnitude more complicated. And yes, I would rather just jam everything in 'users' and do a SELECT * FROM users.

nicwolf: yeah, that does it! (Now I just have to figure out what exactly is going on there.)
posted by kamelhoecker at 7:07 AM on August 3, 2007


SELECT u.uid as uid, u.name as name, v.value as email, v1.value as phone
FROM users u, profile_values v, profile_values v1
WHERE
u.uid = v.uid AND
u.uid = v1.uid AND
v.fid = (SELECT fid FROM profile_fields where name = 'email') AND
v1.fid = (SELECT fid FROM profile_fields where name = 'Phone')


Yeah, that's making the cartesian product of all rows of all three tables. 2 × 2 × 3. Unfortunately, you want a row that can't exist in it. That last 3 has to be at least "4" -- where the last row is
"data: 1, 2, NULL"
. See? You don't select items that don't exist.

So, part of a correct SELECT has to ask for email regardless of whether there's a value in the data table.

Dig it?
posted by cmiller at 11:00 AM on August 3, 2007


To produce CSV output in SQL you need to select the columns and concatenate each with a comma.

For simplicity I'm missing out the joins

In Oracle:
Select col1 || ',' , col2 || ',' , col3 || ',' from MyTable;

In SQL Server:
Select col1 + ',' , col2 + ',' , col3 + ',' from MyTable;

This will give you output like this which you can load straight into a spreadsheet

1, bill, 555-1212, ,
2, joe, 999-0000, foo@foo.com,
3, , 444-555,bar@bar.com,
4, joe, , foo@foo.com,
posted by Lanark at 12:21 PM on August 3, 2007


For MySQL it would be something like

SELECT concat(IFNULL(col1,') ',') , concat(IFNULL(col2,') ','), concat(IFNULL(col3,') ',') from MyTable;
posted by Lanark at 12:51 PM on August 3, 2007


Oops that mangled the double quotes, lets try adding an extra space:
MySQL
SELECT concat(IFNULL(col1,' ') ',') , concat(IFNULL(col2,' ') ','), concat(IFNULL(col3,' ') ',') from MyTable;
posted by Lanark at 12:55 PM on August 3, 2007


Well, if you're using MySQL, you could just insert them into a table with engine_type "csv" and then copy away the database file when you're finished. :)
posted by cmiller at 2:53 PM on August 3, 2007


kamelhoecker: what my query does (well, if you fix my typos!) is to use subqueries to create temporary tables that map the u_id to the value for each field. I called them subq1 and subq2 out of habit; it would be clearer if we called them "user_phone" and "user_email" . So then all we do is to left join the users table to each of these tables and select each field's value.

You're going to end up having to build this query lexically, adding one subquery for each record in profile_fields. Running them once and joining to them is a lot faster than putting subqueries in the SELECT clause where they'd be run for every row in the users table.

There might be some really clever way to do this with a single SQL query. I mostly work with PostgreSQL and can't think of any way there; maybe one of the MySQL users here knows a trick.
posted by nicwolff at 3:28 PM on August 3, 2007


« Older Finding the number of the beast   |   Remote Control Dragonfly? Newer »
This thread is closed to new comments.