SQL Query to find unique combinations
March 14, 2012 1:42 PM Subscribe
SQLNoobFilter: I am trying to find all unique combinations on a table. Does my simple query do the trick or is there more to it?
For the purpose of this question, say I have a simple table with two variables, Letter and Name. I am trying to output the unique combinations of both Letter and Name so I can have one instance for every combination that is on the table.
My code is as follows: select distinct(letter),name from TableA
Will this get me to the output desired on the bottom?
Example TableA:
Letter Name
A Bob
A Larry
A Bob
B Bob
B Larry
C Larry
D Larry
D Larry
Output:
Letter Name
A Bob
A Larry
B Bob
B Larry
C Larry
D Larry
Thanks in advance -
For the purpose of this question, say I have a simple table with two variables, Letter and Name. I am trying to output the unique combinations of both Letter and Name so I can have one instance for every combination that is on the table.
My code is as follows: select distinct(letter),name from TableA
Will this get me to the output desired on the bottom?
Example TableA:
Letter Name
A Bob
A Larry
A Bob
B Bob
B Larry
C Larry
D Larry
D Larry
Output:
Letter Name
A Bob
A Larry
B Bob
B Larry
C Larry
D Larry
Thanks in advance -
Seconding deezil,
select distinct letter, name from tablea
posted by teragram at 2:56 PM on March 14, 2012
select distinct letter, name from tablea
posted by teragram at 2:56 PM on March 14, 2012
Distinct will work. A better way which opens up more options requires you to have an identity/unique identifier on the field, call it id:
id letter name
1 A Bob
2 A Larry
3 A Bob
4 B Bob
5 B Larry
6 C Larry
7 D Larry
8 D Larry
Then make a subquery taking the min(id) grouped on whatever you want to dedupe on, and left join it back to the main table:
select a.*, b.minid
from TableA left join ( select min(id) minid, Letter, Name from TableA group by Letter, Name) MinTableA
on TableA.id = MinTableA.minid
Results will be:
1 A Bob 1
2 A Larry 2
3 A Bob NULL
4 B Bob 4
5 B Larry 5
6 C Larry 6
7 D Larry 7
8 D Larry NULL
If you just want the uniques, only pick where it matches to the min:
select a.*, b.minid
from TableA left join ( select min(id) minid, Letter, Name from TableA group by Letter, Name) MinTableA
on TableA.id = MinTableA.minid
where MinTableA.minid is not null
This approach lets you isolate the exact records that are duplicates if you want to clean them up or report on them, they are the records on TableA where the id is not equal to the MinID.
posted by spatula at 4:33 PM on March 14, 2012
id letter name
1 A Bob
2 A Larry
3 A Bob
4 B Bob
5 B Larry
6 C Larry
7 D Larry
8 D Larry
Then make a subquery taking the min(id) grouped on whatever you want to dedupe on, and left join it back to the main table:
select a.*, b.minid
from TableA left join ( select min(id) minid, Letter, Name from TableA group by Letter, Name) MinTableA
on TableA.id = MinTableA.minid
Results will be:
1 A Bob 1
2 A Larry 2
3 A Bob NULL
4 B Bob 4
5 B Larry 5
6 C Larry 6
7 D Larry 7
8 D Larry NULL
If you just want the uniques, only pick where it matches to the min:
select a.*, b.minid
from TableA left join ( select min(id) minid, Letter, Name from TableA group by Letter, Name) MinTableA
on TableA.id = MinTableA.minid
where MinTableA.minid is not null
This approach lets you isolate the exact records that are duplicates if you want to clean them up or report on them, they are the records on TableA where the id is not equal to the MinID.
posted by spatula at 4:33 PM on March 14, 2012
A better way which opens up more options
I don't know... This takes longer to write, is harder to read, and performs (much) worse, and for what? It doesn't solve the problem at hand any better.
One thing I've learned writing SQL is to keep it simple: Solve the problem you set out to solve and move on. More just causes headaches. (In this particular case, for example, if you wanted to select or delete duplicates, there are better ways; laying the groundwork like this wouldn't pay off.)
What the OP wants here is simply, as stated above:
Also, if you want the output nicely sorted like in the question, use
posted by kprincehouse at 11:21 PM on March 14, 2012
I don't know... This takes longer to write, is harder to read, and performs (much) worse, and for what? It doesn't solve the problem at hand any better.
One thing I've learned writing SQL is to keep it simple: Solve the problem you set out to solve and move on. More just causes headaches. (In this particular case, for example, if you wanted to select or delete duplicates, there are better ways; laying the groundwork like this wouldn't pay off.)
What the OP wants here is simply, as stated above:
SELECT DISTINCT Letter, Name FROM TableA;The real reason I stopped to comment is to clarify what DISTINCT is doing here. It's modifying SELECT, not Letter, and this is important to understand: "select distinct(letter),name from TableA" suggests to me that this may not have been clear.
Also, if you want the output nicely sorted like in the question, use
SELECT DISTINCT Letter, Name FROM TableA ORDER BY Letter, Name;
posted by kprincehouse at 11:21 PM on March 14, 2012
Response by poster: Thanks everyone - and you are right kprincehouse, my thinking was that distinct modifies the variable after it. I am going to have to read some more on the net. Thanks.
posted by amazingstill at 5:39 AM on March 15, 2012
posted by amazingstill at 5:39 AM on March 15, 2012
This thread is closed to new comments.
SELECT DISTINCT Letter, Name FROM TableA , but yeah, you've got it.
posted by deezil at 1:48 PM on March 14, 2012 [1 favorite]