Tags:


techie SQL dupe problem
August 12, 2005 8:57 AM   RSS feed for this thread Subscribe

Any SQL gurus out there? I'm stuck...

Hello.
i'm trying to do a pretty simple SQL dupe-finding operation. I'm stumped though (MS SQL Server 7 btw)

Say you've got a simple table:

messageid | message | user

and the messageid column is a unique identifier datatype, one of those GUID things (eg. {3803F70B-4C3E-4C69-8C7F-EC534BBC1A8D}), not just an auto-increment.

How can I find dupes when user and message are the same? It's easy when there's an integer-autoincrement to MAX(), but I can't work it out for a GUID...

Any help would be very much appreciated!
posted by derbs to computers & internet (13 comments total)
SELECT message, user, count(messageid) AS count
FROM table
GROUP BY message, user;
posted by milkrate at 9:04 AM on August 12, 2005


select message, user
from simple_table
group by message, user
having count(*) > 1
posted by McGuillicuddy at 9:04 AM on August 12, 2005


select user, message, count(*) 'count' from TABLENAME
group by user, message
order by 'count' desc

this third column 'count' will tell you how many instances there are of each combination.
posted by alkupe at 9:06 AM on August 12, 2005


ah thanks for all your quick replies! But i really need a unique indentifier in the results so i can then go back and delete the other duplicates!
posted by derbs at 9:38 AM on August 12, 2005


How about:

select t1.messageid
from table t1 join table t2
on (t1.user = t2.user and t1.message = t2.message)
where t1.messageid <> t2.messageid
posted by uncleozzy at 9:44 AM on August 12, 2005


delete from tablename a
where a.messageid>
(select min(messageid) from tablename b where a.user=b.user and a.messageid=b.messageid)

will delete all duplicate rows.
posted by vacapinta at 9:49 AM on August 12, 2005


Or, perhaps

create table b as (
select min(message_id) as message_id, message, user
from t
group by message, user
)

Depending on the size of the table and the use of the database, this may just be easier; create a new table of just the non-duplicate info, and then drop the old table and rename. It assumes, arbitrarily, that the message_id you want to save is the min one in each case of duplication.
posted by bemis at 9:51 AM on August 12, 2005


you know this t1,t2 and tablename a,tablename b...

are these tables you have created on the fly?
posted by derbs at 10:35 AM on August 12, 2005


oops. typo. corrected below.

delete from tablename a
where a.messageid>
(select min(messageid) from tablename b where a.user=b.user and a.message=b.message);

among duplicate rows, this will delete all of them except the one with the lowest messageid. You could also use max or any clause which guarantees a unique row among many rows (I cant think of any other clauses though)
posted by vacapinta at 10:37 AM on August 12, 2005


are these tables you have created on the fly?

No. they are alternate names for the same table. Many of these SQL statements are joining or nesting the table with itself - thus the need to give it aliases.
posted by vacapinta at 10:39 AM on August 12, 2005


This deletes all duplicate messages (except the one with the lowest messageid). It assumes your table is called "messages".

DELETE FROM messages WHERE messageid NOT IN
(
SELECT MIN(messageid)
FROM messages
GROUP BY message, user
)

You may want to back up your db before trying this or any other suggestion.
posted by blue mustard at 12:12 PM on August 12, 2005


Thanks a lot everyone - finally got it sorted! All of your answers were great, but vacapinta's was the one that was most useful (although it needed a slight amount of tweaking)

However, most of you were assuming that i had a column with a unique identifier which was a integer, which i did not have originally (the primary key was a guid). But it turns out i needed to have an unique integer as well anyway (for a separate issue) so it's all turned out fine!

thanks again!
posted by derbs at 1:41 PM on August 12, 2005


Also, you could add a compound uniqueness constraint to prevent the duplicate data getting into the DB to begin with.
posted by sad_otter at 5:22 PM on August 12, 2005


« Older Recommendations for what I sho...   |   Recommendations for a New Engl... Newer »
This thread is closed to new comments.