This sql query should be simple...
August 2, 2007 9:41 PM
I'm trying to merge simple contact data from a bunch of single tables into one big text (csv) dump. The problem is that not all fields are present for each user. e.g. one user may be missing an email. Because of the missing field, my current (big and ugly) sql query will not return all the rows.
This seems like a simple problem, but I'm stuck. Help!
Here's a simplified version of what I've got:
What I want my query to output:
(notice there is no email for uid 1)
What is the sql to do this? btw, i'm using mysql.
This seems like a simple problem, but I'm stuck. Help!
Here's a simplified version of what I've got:
table: users
columns: uid, username
data: 1, bill
data: 2, joe
table: profile_fields
columns: fid, name
data: 1, Phone
data: 2, email
table: profile_values
columns: uid, fid, value
data: 1, 1, 555-1212
data: 2, 1, 999-0000
data: 2, 2, foo@foo.com
What I want my query to output:
uid, username, phone, email
1, bill, 555-1212,
2, joe, 999-0000, foo@foo.com
(notice there is no email for uid 1)
What is the sql to do this? btw, i'm using mysql.
You definitely want to use an outer join. Although if possible, I would suggest reworking everything into one table, it will save a lot of server overhead and be a lot easier to work with.
posted by charmston at 11:20 PM on August 2, 2007
posted by charmston at 11:20 PM on August 2, 2007
Specifically, you want to do a right outer join, which includes rows with NULL results.
posted by Blazecock Pileon at 12:07 AM on August 3, 2007
posted by Blazecock Pileon at 12:07 AM on August 3, 2007
I'm not sure the previous answerers looked at your sample tables... Except for "username", the field names are not columns in any table; they are data. This is not really what SQL is for; this should be done in a stored procedure, or at the application level. Get the field names and IDs, then construct a query like
posted by nicwolff at 12:43 AM on August 3, 2007
select *, phone, email
from users
left join (
select u_id, value as phone from profile_values where f_id = 1
) subq1 on users.u_id = subq1.u_id
left join (
select u_id, value as e_mail from profile_values where f_id = 2
) subq2 on users.u_id = subq2.u_id
order by user_id;
posted by nicwolff at 12:43 AM on August 3, 2007
Your SQL is the problem. Why not include it?
The "profile_fields" table is kinda weird. You're turning your data 90 degrees right there. If you have power over it, change that. It would be so much easier if you instead had
table: users
columns: uid, username
data: 1, bill
data: 2, joe
table: profile_values
columns: uid, phone_value, email_value
data: 1, 555-1212, NULL
data: 2, 999-0000, foo@foo.com
posted by cmiller at 5:59 AM on August 3, 2007
The "profile_fields" table is kinda weird. You're turning your data 90 degrees right there. If you have power over it, change that. It would be so much easier if you instead had
table: users
columns: uid, username
data: 1, bill
data: 2, joe
table: profile_values
columns: uid, phone_value, email_value
data: 1, 555-1212, NULL
data: 2, 999-0000, foo@foo.com
posted by cmiller at 5:59 AM on August 3, 2007
My SQL is no nasty, I was embarrassed. But here goes...
Result:
The database schema is a given. (The only benefit is that it is flexible if new fields need to be added in. But it makes the queries an order of magnitude more complicated. And yes, I would rather just jam everything in 'users' and do a
nicwolf: yeah, that does it! (Now I just have to figure out what exactly is going on there.)
posted by kamelhoecker at 7:07 AM on August 3, 2007
SELECT u.uid as uid, u.name as name, v.value as email, v1.value as phone
FROM users u, profile_values v, profile_values v1
WHERE
u.uid = v.uid AND
u.uid = v1.uid AND
v.fid = (SELECT fid FROM profile_fields where name = 'email') AND
v1.fid = (SELECT fid FROM profile_fields where name = 'Phone')
Result:
+-----+------+-------------+----------+
| uid | name | email | phone |
+-----+------+-------------+----------+
| 2 | joe | foo@foo.com | 999-0000 |
+-----+------+-------------+----------+
The database schema is a given. (The only benefit is that it is flexible if new fields need to be added in. But it makes the queries an order of magnitude more complicated. And yes, I would rather just jam everything in 'users' and do a
SELECT * FROM users
.nicwolf: yeah, that does it! (Now I just have to figure out what exactly is going on there.)
posted by kamelhoecker at 7:07 AM on August 3, 2007
SELECT u.uid as uid, u.name as name, v.value as email, v1.value as phone
FROM users u, profile_values v, profile_values v1
WHERE
u.uid = v.uid AND
u.uid = v1.uid AND
v.fid = (SELECT fid FROM profile_fields where name = 'email') AND
v1.fid = (SELECT fid FROM profile_fields where name = 'Phone')
Yeah, that's making the cartesian product of all rows of all three tables. 2 × 2 × 3. Unfortunately, you want a row that can't exist in it. That last 3 has to be at least "4" -- where the last row is
"data: 1, 2, NULL"
. See? You don't select items that don't exist.
So, part of a correct SELECT has to ask for email regardless of whether there's a value in the data table.
Dig it?
posted by cmiller at 11:00 AM on August 3, 2007
FROM users u, profile_values v, profile_values v1
WHERE
u.uid = v.uid AND
u.uid = v1.uid AND
v.fid = (SELECT fid FROM profile_fields where name = 'email') AND
v1.fid = (SELECT fid FROM profile_fields where name = 'Phone')
Yeah, that's making the cartesian product of all rows of all three tables. 2 × 2 × 3. Unfortunately, you want a row that can't exist in it. That last 3 has to be at least "4" -- where the last row is
"data: 1, 2, NULL"
. See? You don't select items that don't exist.
So, part of a correct SELECT has to ask for email regardless of whether there's a value in the data table.
Dig it?
posted by cmiller at 11:00 AM on August 3, 2007
To produce CSV output in SQL you need to select the columns and concatenate each with a comma.
For simplicity I'm missing out the joins
In Oracle:
Select col1 || ',' , col2 || ',' , col3 || ',' from MyTable;
In SQL Server:
Select col1 + ',' , col2 + ',' , col3 + ',' from MyTable;
This will give you output like this which you can load straight into a spreadsheet
1, bill, 555-1212, ,
2, joe, 999-0000, foo@foo.com,
3, , 444-555,bar@bar.com,
4, joe, , foo@foo.com,
posted by Lanark at 12:21 PM on August 3, 2007
For simplicity I'm missing out the joins
In Oracle:
Select col1 || ',' , col2 || ',' , col3 || ',' from MyTable;
In SQL Server:
Select col1 + ',' , col2 + ',' , col3 + ',' from MyTable;
This will give you output like this which you can load straight into a spreadsheet
1, bill, 555-1212, ,
2, joe, 999-0000, foo@foo.com,
3, , 444-555,bar@bar.com,
4, joe, , foo@foo.com,
posted by Lanark at 12:21 PM on August 3, 2007
For MySQL it would be something like
SELECT concat(IFNULL(col1,') ',') , concat(IFNULL(col2,') ','), concat(IFNULL(col3,') ',') from MyTable;
posted by Lanark at 12:51 PM on August 3, 2007
SELECT concat(IFNULL(col1,') ',') , concat(IFNULL(col2,') ','), concat(IFNULL(col3,') ',') from MyTable;
posted by Lanark at 12:51 PM on August 3, 2007
Oops that mangled the double quotes, lets try adding an extra space:
MySQL
SELECT concat(IFNULL(col1,' ') ',') , concat(IFNULL(col2,' ') ','), concat(IFNULL(col3,' ') ',') from MyTable;
posted by Lanark at 12:55 PM on August 3, 2007
MySQL
SELECT concat(IFNULL(col1,' ') ',') , concat(IFNULL(col2,' ') ','), concat(IFNULL(col3,' ') ',') from MyTable;
posted by Lanark at 12:55 PM on August 3, 2007
Well, if you're using MySQL, you could just insert them into a table with engine_type "csv" and then copy away the database file when you're finished. :)
posted by cmiller at 2:53 PM on August 3, 2007
posted by cmiller at 2:53 PM on August 3, 2007
kamelhoecker: what my query does (well, if you fix my typos!) is to use subqueries to create temporary tables that map the u_id to the value for each field. I called them subq1 and subq2 out of habit; it would be clearer if we called them "user_phone" and "user_email" . So then all we do is to left join the users table to each of these tables and select each field's value.
You're going to end up having to build this query lexically, adding one subquery for each record in profile_fields. Running them once and joining to them is a lot faster than putting subqueries in the SELECT clause where they'd be run for every row in the users table.
There might be some really clever way to do this with a single SQL query. I mostly work with PostgreSQL and can't think of any way there; maybe one of the MySQL users here knows a trick.
posted by nicwolff at 3:28 PM on August 3, 2007
You're going to end up having to build this query lexically, adding one subquery for each record in profile_fields. Running them once and joining to them is a lot faster than putting subqueries in the SELECT clause where they'd be run for every row in the users table.
There might be some really clever way to do this with a single SQL query. I mostly work with PostgreSQL and can't think of any way there; maybe one of the MySQL users here knows a trick.
posted by nicwolff at 3:28 PM on August 3, 2007
This thread is closed to new comments.
posted by sanko at 10:09 PM on August 2, 2007