<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Joins got me down</title>
	<link>http://ask.metafilter.com/109967/Joins-got-me-down/</link>
	<description>Comments on Ask MetaFilter post Joins got me down</description>
	<pubDate>Wed, 24 Dec 2008 13:28:04 -0800</pubDate>
	<lastBuildDate>Wed, 24 Dec 2008 13:28:04 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Joins got me down</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down</link>	
		<description>Yet another MySQL join question. &lt;br /&gt;&lt;br /&gt; Ok, so I have a hard time conceptualizing all but the most basic MySQL joins and thus I just can&apos;t seem to wrap my head around how to do this particular task.&lt;br&gt;
&lt;br&gt;
Let&apos;s say I have a page I&apos;m going to display (which is identified by its &quot;item_id&quot;) and along with the page I display comments pulled from my database. &lt;br&gt;
&lt;br&gt;
So say I have two tables:&lt;br&gt;
&lt;br&gt;
comments, which consists of three fields: &lt;br&gt;
&lt;br&gt;
&lt;b&gt;serial_number | item_id | comment_body&lt;/b&gt;&lt;br&gt;
&lt;br&gt;
(serial_number is a unique auto-incrementing key to identify comments, item_id is the page the comment belongs to, and comment_body is self-explanatory)&lt;br&gt;
&lt;br&gt;
Every time a user reads a comment I mark it off in its own table called mark_as_read, which also consists of three fields:&lt;br&gt;
&lt;br&gt;
&lt;b&gt;user_name | comment_serial | item_id&lt;/b&gt;&lt;br&gt;
&lt;br&gt;
(primary key is user_name+comment_serial)&lt;br&gt;
&lt;br&gt;
What I&apos;m having a problem with is given a particular user I want to generate a list of all the item_ids that have at least one comment that hasn&apos;t been marked as read by that user. &lt;br&gt;
&lt;br&gt;
So if given a &apos;user_name&apos; I&apos;d want to generate a list of the 15 most recent &apos;comments.item_id&apos;s where there exists a &apos;comments.serial_number&apos; that isn&apos;t listed in &apos;mark_as_read&apos; paired with the provided &apos;user_name&apos;.&lt;br&gt;
&lt;br&gt;
I know how to do basically the opposite, pull up a list of all the item_ids that the user *has* read at least one comment from:&lt;br&gt;
&lt;br&gt;
SELECT DISTINCT comments.item_id &lt;br&gt;
FROM comments, mark_as_read &lt;br&gt;
WHERE mark_as_read.user_name = &quot;$user_name&quot; AND comments.serial_number = mark_as_read.comment_serial &lt;br&gt;
ORDER BY comments.serial_number DESC LIMIT 15&lt;br&gt;
&lt;br&gt;
But I honestly have no idea how to formulate a join to do this task, when I want to find all the items where the user hasn&apos;t read at least one comment. &lt;br&gt;
&lt;br&gt;
Any folks more clever than me care to give some advice?&lt;br&gt;
&lt;br&gt;
Thanks much!&lt;br&gt;
Jeremy</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2008:site.109967</guid>
		<pubDate>Wed, 24 Dec 2008 13:03:05 -0800</pubDate>
		<dc:creator>Jezztek</dc:creator>
		
			<category>mySQL</category>
		
			<category>Joins</category>
		
	</item> <item>
		<title>By: toomuchpete</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583044</link>	
		<description>Though I&apos;m not convinced that your database is set up to handle this particular query efficiently, I would probably use a NOT IN and a subquery.&lt;br&gt;
&lt;br&gt;
SELECT DISTINCT comments.item_id FROM comments &lt;br&gt;
WHERE comments.serial_number NOT IN (SELECT mark_as_read.comment_serial FROM mark_as_read WHERE mark_as_read.user_name=&quot;$user_name&quot;)&lt;br&gt;
&lt;br&gt;
Subqueries require MySQL 4.1 or later, I think, so this might not work for you on an older version.&lt;br&gt;
&lt;br&gt;
You could probably also do this by LEFT JOINing the comments table to the mark_as_read table and then checking for rows where mark_as_read.comment_serial is NULL (indicating no matching record in the mark_as_read table).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583044</guid>
		<pubDate>Wed, 24 Dec 2008 13:28:04 -0800</pubDate>
		<dc:creator>toomuchpete</dc:creator>
	</item><item>
		<title>By: Jezztek</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583069</link>	
		<description>First option worked just fine, thanks much!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583069</guid>
		<pubDate>Wed, 24 Dec 2008 14:13:32 -0800</pubDate>
		<dc:creator>Jezztek</dc:creator>
	</item><item>
		<title>By: orthogonality</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583097</link>	
		<description>Unfortunately, NOT IN probably gets implemented as a great big inefficient OR list.&lt;br&gt;
&lt;br&gt;
try:&lt;br&gt;
select distinct a.item_id&lt;br&gt;
from comments a where not exists ( select * from mark_as_read b where b.comment_serial = a.serial_number and b.user_name = &quot;$user_name&quot;)&lt;br&gt;
&lt;br&gt;
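For anyone who wants to try the NOT EXISTS approach without touching a live database, here is a minimal sketch, not a tested MySQL setup. It assumes Python with its built-in sqlite3 module standing in for MySQL; the sample rows, the user name &quot;jeremy&quot;, and the function names build_db and unread_item_ids are made up for illustration. It also adds the &quot;15 most recent&quot; ordering from the question by grouping on item_id, which is one possible reading of &quot;most recent&quot;.&lt;br&gt;
&lt;br&gt;
```python
import sqlite3


def build_db():
    """Create the two tables from the question with hypothetical sample rows."""
    db = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    db.executescript("""
        CREATE TABLE comments (
            serial_number INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id INTEGER NOT NULL,
            comment_body TEXT NOT NULL
        );
        CREATE TABLE mark_as_read (
            user_name TEXT NOT NULL,
            comment_serial INTEGER NOT NULL,
            item_id INTEGER NOT NULL,
            PRIMARY KEY (user_name, comment_serial)
        );
        INSERT INTO comments (item_id, comment_body) VALUES
            (10, 'first on page 10'),
            (10, 'second on page 10'),
            (20, 'only comment on page 20'),
            (30, 'only comment on page 30');
        INSERT INTO mark_as_read (user_name, comment_serial, item_id) VALUES
            ('jeremy', 1, 10),
            ('jeremy', 3, 20);
    """)
    return db


def unread_item_ids(db, user_name, limit=15):
    """Return item_ids with at least one comment the user has not read,
    ordered by the newest comment on each item."""
    rows = db.execute("""
        SELECT c.item_id
        FROM comments c
        WHERE NOT EXISTS (
            SELECT 1 FROM mark_as_read m
            WHERE m.comment_serial = c.serial_number
              AND m.user_name = ?
        )
        GROUP BY c.item_id
        ORDER BY MAX(c.serial_number) DESC
        LIMIT ?
    """, (user_name, limit)).fetchall()
    return [row[0] for row in rows]


db = build_db()
print(unread_item_ids(db, "jeremy"))
```
&lt;br&gt;
With these sample rows, jeremy has read comments 1 and 3, so comment 2 (page 10) and comment 4 (page 30) are still unread and the script prints [30, 10]. The ? placeholders let the driver handle quoting instead of pasting $user_name into the SQL string.&lt;br&gt;
&lt;br&gt;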
A few notes: one, I&apos;m sick as a dog and can barely think, and this is untested, so who knows. &lt;br&gt;
&lt;br&gt;
Two, user_name should be a user_id, for greater efficiency:&lt;br&gt;
select distinct a.item_id&lt;br&gt;
from comments a where not exists ( select * from mark_as_read b where b.comment_serial = a.serial_number and b.user_id = (select user_id from user where user_name = &quot;$user_name&quot;))&lt;br&gt;
&lt;br&gt;
That second sub-select should be optimized by the DB to a constant (as it&apos;s uncorrelated), but test to be sure it is.&lt;br&gt;
&lt;br&gt;
Three, item_id in mark_as_read is a redundant de-normalization that can introduce data inconsistencies. Get rid of it. If you&apos;re actually using it for something, replace that with a view that joins to &quot;comment&quot; when you change the table name (see below).&lt;br&gt;
&lt;br&gt;
Four, column identifiers should be the same from table to table, or the same but prefixed with the owning table name. And id columns should be named &quot;id&quot;. So it should be &quot;id&quot; not &quot;serial_number&quot; in table &quot;comment&quot;, and &quot;comment_id&quot; not &quot;comment_serial&quot; in mark_as_read. &lt;br&gt;
&lt;br&gt;
Five, table names should be nouns, not verbs, so it&apos;s &quot;marked_as_read&quot; or just &quot;read&quot;, not the verb phrase &quot;mark_as_read&quot;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583097</guid>
		<pubDate>Wed, 24 Dec 2008 15:08:50 -0800</pubDate>
		<dc:creator>orthogonality</dc:creator>
	</item><item>
		<title>By: jeffamaphone</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583115</link>	
		<description>I would also suggest naming all IDs similarly, so that serial_number becomes comment_id, etc.  That way you can just refer to everything as whatever_id instead of having to change the name from table to table as you do currently: serial_number =&amp;gt; comment_serial.  Ideally it would just be comment_id everywhere so you always know exactly which data you&apos;re referring to.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583115</guid>
		<pubDate>Wed, 24 Dec 2008 15:55:35 -0800</pubDate>
		<dc:creator>jeffamaphone</dc:creator>
	</item><item>
		<title>By: hattifattener</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583136</link>	
		<description>I was going to suggest the standard SQL &quot;EXCEPT&quot; operator, which does exactly this, but apparently &lt;a href=&quot;http://bugs.mysql.com/bug.php?id=1309&quot;&gt;MySQL doesn&apos;t support it&lt;/a&gt; (yet).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583136</guid>
		<pubDate>Wed, 24 Dec 2008 16:44:40 -0800</pubDate>
		<dc:creator>hattifattener</dc:creator>
	</item><item>
		<title>By: compound eye</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583137</link>	
		<description>One thing that might be worth looking into for MySQL joins in the future: you can get around MySQL not allowing subqueries by creating a view and joining to that view instead of a subquery.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583137</guid>
		<pubDate>Wed, 24 Dec 2008 16:51:07 -0800</pubDate>
		<dc:creator>compound eye</dc:creator>
	</item><item>
		<title>By: b1tr0t</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583214</link>	
		<description>It looks like you got an answer that works. Keep in mind that subqueries and views are EXTREMELY EVIL, since they produce in-memory tables that can&apos;t have indexes. As your database grows, anything that uses a subquery or view will become a performance bottleneck.&lt;br&gt;
&lt;br&gt;
The following left outer join query should do the trick. Note that the user name test has to go in the ON clause rather than the WHERE clause; in the WHERE clause it would throw away the unmatched rows that the IS NULL check is looking for. You should also think carefully about the composite key in mark_as_read; that could get expensive over time too.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
SELECT&lt;br&gt;
FROM&lt;br&gt;
&amp;nbsp;&amp;nbsp;comments c&lt;br&gt;
LEFT OUTER JOIN&lt;br&gt;
&amp;nbsp;&amp;nbsp;mark_as_read mar&lt;br&gt;
ON&lt;br&gt;
&amp;nbsp;&amp;nbsp;c.serial_number = mar.comment_serial&lt;br&gt;
&amp;nbsp;&amp;nbsp;AND mar.user_name = &amp;lt;user name&amp;gt;&lt;br&gt;
WHERE&lt;br&gt;
&amp;nbsp;&amp;nbsp;mar.comment_serial IS NULL</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583214</guid>
		<pubDate>Wed, 24 Dec 2008 19:00:04 -0800</pubDate>
		<dc:creator>b1tr0t</dc:creator>
	</item><item>
		<title>By: b1tr0t</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583216</link>	
		<description>Of course, you also need something in your SELECT clause, like DISTINCT c.item_id.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583216</guid>
		<pubDate>Wed, 24 Dec 2008 19:00:50 -0800</pubDate>
		<dc:creator>b1tr0t</dc:creator>
	</item><item>
		<title>By: Jezztek</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583249</link>	
		<description>Wow, lots of great stuff here. I&apos;m learning, I&apos;m learning =)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583249</guid>
		<pubDate>Wed, 24 Dec 2008 20:15:52 -0800</pubDate>
		<dc:creator>Jezztek</dc:creator>
	</item><item>
		<title>By: Lame_username</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583428</link>	
		<description>I&apos;m mostly an Oracle guy, but I just wanted to give props to b1tr0t&apos;s solution.  In general, an outer join is going to be a ton more efficient than NOT IN, which is something to be avoided if you have any choice at all.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583428</guid>
		<pubDate>Thu, 25 Dec 2008 07:17:58 -0800</pubDate>
		<dc:creator>Lame_username</dc:creator>
	</item><item>
		<title>By: Doofus Magoo</title>
		<link>http://ask.metafilter.com/109967/Joins-got-me-down#1583450</link>	
		<description>I too agree with b1tr0t&apos;s answer.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.109967-1583450</guid>
		<pubDate>Thu, 25 Dec 2008 08:38:46 -0800</pubDate>
		<dc:creator>Doofus Magoo</dc:creator>
	</item>
	</channel>
</rss>
