<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: techie SQL dupe problem</title>
	<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem/</link>
	<description>Comments on Ask MetaFilter post techie SQL dupe problem</description>
	<pubDate>Fri, 12 Aug 2005 09:04:53 -0800</pubDate>
	<lastBuildDate>Fri, 12 Aug 2005 09:04:53 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: techie SQL dupe problem</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem</link>	
		<description>Any SQL gurus out there? I&apos;m stuck... &lt;br /&gt;&lt;br /&gt; Hello.&lt;br&gt;
I&apos;m trying to do a pretty simple SQL dupe-finding operation. I&apos;m stumped though (MS SQL Server 7 btw)&lt;br&gt;
&lt;br&gt;
Say you&apos;ve got a simple table:&lt;br&gt;
&lt;br&gt;
messageid | message | user&lt;br&gt;
&lt;br&gt;
and the messageid column is a unique identifier datatype, one of those GUID things (eg. {3803F70B-4C3E-4C69-8C7F-EC534BBC1A8D}), not just an auto-increment.&lt;br&gt;
&lt;br&gt;
How can I find dupes when user and message are the same? It&apos;s easy when there&apos;s an integer-autoincrement to MAX(), but I can&apos;t work it out for a GUID...&lt;br&gt;
&lt;br&gt;
Any help would be very much appreciated!</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2005:site.22519</guid>
		<pubDate>Fri, 12 Aug 2005 08:57:26 -0800</pubDate>
		<dc:creator>derbs</dc:creator>
		
			<category>sql</category>
		
	</item> <item>
		<title>By: milkrate</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360867</link>	
		<description>SELECT message, user, count(messageid) AS count&lt;br&gt;
FROM table&lt;br&gt;
GROUP BY message, user;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360867</guid>
		<pubDate>Fri, 12 Aug 2005 09:04:53 -0800</pubDate>
		<dc:creator>milkrate</dc:creator>
	</item><item>
		<title>By: McGuillicuddy</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360868</link>	
		<description>select message, user&lt;br&gt;
from simple_table&lt;br&gt;
group by message, user&lt;br&gt;
having count(*) &amp;gt; 1</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360868</guid>
		<pubDate>Fri, 12 Aug 2005 09:04:57 -0800</pubDate>
		<dc:creator>McGuillicuddy</dc:creator>
	</item><item>
		<title>By: alkupe</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360869</link>	
		<description>select user, message, count(*) &apos;count&apos; from TABLENAME&lt;br&gt;
group by user, message&lt;br&gt;
order by &apos;count&apos; desc&lt;br&gt;
&lt;br&gt;
this third column &apos;count&apos; will tell you how many instances there are of each combination.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360869</guid>
		<pubDate>Fri, 12 Aug 2005 09:06:18 -0800</pubDate>
		<dc:creator>alkupe</dc:creator>
	</item><item>
		<title>By: derbs</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360906</link>	
		<description>Ah, thanks for all your quick replies! But I really need a unique identifier in the results so I can then go back and delete the other duplicates!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360906</guid>
		<pubDate>Fri, 12 Aug 2005 09:38:01 -0800</pubDate>
		<dc:creator>derbs</dc:creator>
	</item><item>
		<title>By: uncleozzy</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360919</link>	
		<description>How about:&lt;br&gt;
&lt;br&gt;
select t1.messageid&lt;br&gt;
from table t1 join table t2&lt;br&gt;
on (t1.user = t2.user and t1.message = t2.message)&lt;br&gt;
where t1.messageid &lt;&gt; t2.messageid</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360919</guid>
		<pubDate>Fri, 12 Aug 2005 09:44:38 -0800</pubDate>
		<dc:creator>uncleozzy</dc:creator>
	</item><item>
		<title>By: vacapinta</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360926</link>	
		<description>delete from tablename a&lt;br&gt;
where a.messageid&amp;gt;&lt;br&gt;
(select min(messageid) from tablename b where a.user=b.user and a.messageid=b.messageid)&lt;br&gt;
&lt;br&gt;
will delete all duplicate rows.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360926</guid>
		<pubDate>Fri, 12 Aug 2005 09:49:48 -0800</pubDate>
		<dc:creator>vacapinta</dc:creator>
	</item><item>
		<title>By: bemis</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360930</link>	
		<description>Or, perhaps&lt;br&gt;
&lt;br&gt;
create table b as (&lt;br&gt;
select min(message_id) as message_id, message, user&lt;br&gt;
from t&lt;br&gt;
group by message, user&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
Depending on the size of the table and the use of the database, this may just be easier; create a new table of just the non-duplicate info, and then drop the old table and rename.  It assumes, arbitrarily, that the message_id you want to save is the min one in each case of duplication.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360930</guid>
		<pubDate>Fri, 12 Aug 2005 09:51:11 -0800</pubDate>
		<dc:creator>bemis</dc:creator>
	</item><item>
		<title>By: derbs</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360939</link>	
		<description>you know this t1,t2 and tablename a,tablename b...&lt;br&gt;
&lt;br&gt;
are these tables you have created on the fly?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360939</guid>
		<pubDate>Fri, 12 Aug 2005 10:35:24 -0800</pubDate>
		<dc:creator>derbs</dc:creator>
	</item><item>
		<title>By: vacapinta</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360940</link>	
		<description>oops. typo. corrected below.&lt;br&gt;
&lt;br&gt;
delete from tablename a&lt;br&gt;
where a.messageid&amp;gt;&lt;br&gt;
(select min(messageid) from tablename b where a.user=b.user and a.message=b.message);&lt;br&gt;
&lt;br&gt;
Among duplicate rows, this will delete all of them except the one with the lowest messageid. You could also use max or any clause which guarantees a unique row among many rows (I can&apos;t think of any other clauses though)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360940</guid>
		<pubDate>Fri, 12 Aug 2005 10:37:25 -0800</pubDate>
		<dc:creator>vacapinta</dc:creator>
	</item><item>
		<title>By: vacapinta</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360942</link>	
		<description>&lt;i&gt;are these tables you have created on the fly?&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
No, they are alternate names for the same table. Many of these SQL statements join or nest the table with itself, hence the need to give it aliases.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360942</guid>
		<pubDate>Fri, 12 Aug 2005 10:39:36 -0800</pubDate>
		<dc:creator>vacapinta</dc:creator>
	</item><item>
		<title>By: blue mustard</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#360989</link>	
		<description>This deletes all duplicate messages (except the one with the lowest messageid).  It assumes your table is called &quot;messages&quot;.&lt;br&gt;
&lt;br&gt;
DELETE FROM messages WHERE messageid NOT IN &lt;br&gt;
(&lt;br&gt;
SELECT MIN(messageid)&lt;br&gt;
FROM messages &lt;br&gt;
GROUP BY message, user&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
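A minimal sketch of the same DELETE, tried from Python against an in-memory SQLite database rather than SQL Server (an assumption, so it can be run without touching real data). The sample rows and the uuid4 keys are made up for illustration, and SQL Server may not accept MIN on a uniqueidentifier column, which this sketch does not model:&lt;br&gt;

```python
import sqlite3
import uuid

# SQLite stands in for SQL Server here; table and column names follow the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (messageid TEXT PRIMARY KEY, message TEXT, user TEXT)")

# Hypothetical sample data: three copies of one message plus two distinct rows.
rows = [
    ("hello", "derbs"),
    ("hello", "derbs"),   # duplicate of the first row
    ("hello", "derbs"),   # another duplicate
    ("hello", "alkupe"),  # same message, different user: not a duplicate
    ("bye", "derbs"),
]
for message, user in rows:
    # uuid4() plays the role of the GUID primary key.
    conn.execute(
        "INSERT INTO messages VALUES (?, ?, ?)",
        (str(uuid.uuid4()), message, user),
    )

# Keep one row per (message, user) group: the one with the smallest messageid.
conn.execute(
    "DELETE FROM messages WHERE messageid NOT IN "
    "(SELECT MIN(messageid) FROM messages GROUP BY message, user)"
)

# Count what is left per group; every count should now be 1.
remaining = conn.execute(
    "SELECT message, user, COUNT(*) FROM messages "
    "GROUP BY message, user ORDER BY message, user"
).fetchall()
print(remaining)
```

&lt;br&gt;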
You may want to back up your db before trying this or any other suggestion.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-360989</guid>
		<pubDate>Fri, 12 Aug 2005 12:12:10 -0800</pubDate>
		<dc:creator>blue mustard</dc:creator>
	</item><item>
		<title>By: derbs</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#361068</link>	
		<description>Thanks a lot everyone - finally got it sorted! All of your answers were great, but vacapinta&apos;s was the one that was most useful (although it needed a slight amount of tweaking)&lt;br&gt;
&lt;br&gt;
However, most of you were assuming that I had a column with a unique identifier which was an integer, which I did not have originally (the primary key was a GUID). But it turns out I needed to have a unique integer as well anyway (for a separate issue), so it&apos;s all turned out fine!&lt;br&gt;
&lt;br&gt;
thanks again!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-361068</guid>
		<pubDate>Fri, 12 Aug 2005 13:41:12 -0800</pubDate>
		<dc:creator>derbs</dc:creator>
	</item><item>
		<title>By: sad_otter</title>
		<link>http://ask.metafilter.com/22519/techie-SQL-dupe-problem#361155</link>	
		<description>Also, you could add a compound uniqueness constraint to prevent the duplicate data getting into the DB to begin with.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.22519-361155</guid>
		<pubDate>Fri, 12 Aug 2005 17:22:56 -0800</pubDate>
		<dc:creator>sad_otter</dc:creator>
	</item>
	</channel>
</rss>
