Joins got me down
December 24, 2008 1:03 PM   RSS feed for this thread Subscribe

Yet another MySQL join question.

Ok, so I have a hard time conceptualizing all but the most basic MySQL joins and thus I just can't seem to wrap my head around how to do this particular task.

Lets say I have a page I'm going to display (which is identified by its "item_id") and along with the page I display comments pulled from my database.

So if I would have two tables:

comments, which consists of three fields:

serial_number | item_id | comment_body

(serial_number is a unique auto incrementing key to identify comments, item_id is the page the comment belongs to and comment body is self explanatory)

Every time a user reads a comment I mark it off in its own table called mark_as_read, which also consists of three fields:

user_name | comment_serial | item_id

(primary key is user_name+comment_serial)

What I'm having a problem with is given a particular user I want to generate a list of all the item_ids that have at least one comment that hasn't been marked as read by that user.

So if given a 'user_name' I'd want to generate a list of the 15 most recent 'comments.item_id's where there exists a 'comments.serial_number' that isn't listed in 'mark_as_read' paired with the provided 'user_name'.

I know how to do basically the opposite, pull up a list of all the item_ids that the user *has* read at least one comment from:

SELECT DISTINCT comments.item_id
FROM comments, mark_as_read
WHERE mark_as_read.user_name = "$user_name" AND comments.serial_number = mark_as_read.comment_serial
ORDER BY comments.serial DESC LIMIT 15

But I honestly have to idea how to formulate a join to do this task, when I want to find all the items where the user hasn't read at least one comment.

Any folks more clever then myself care to give some advice?

Thanks much!
Jeremy
posted by Jezztek to computers & internet (11 comments total) 2 users marked this as a favorite
Though I'm not convinced that your database is setup to handle this particular query efficiently, I would probably use a NOT IN and a subquery.

SELECT DISTINCT comments.item_id FROM comments
WHERE comments.user_name = "$user_name" AND
comments.serial_number NOT IN (SELECT mark_as_read.serial_number FROM mark_as_read WHERE mark_as_read.user_name="$user_name")

Subqueries require MySQL 5, I think, though so this might not work for you.

You could probably also do this by LEFT JOINing the mark_as_read table against the comments table and then checking for places where the mark_as_read.serial_number is NULL (indicating no matching record in the mark_as_read table).
posted by toomuchpete at 1:28 PM on December 24, 2008


First option worked just fine, thanks much!
posted by Jezztek at 2:13 PM on December 24, 2008


Unfortunately, NOT IN probably gets implemented as a great big inefficient OR list.

try:
select distinct a.item_id
from comment a where not exists ( select * from mark_as_read b where b.comment_serial = a.serial_number and b.user_name = $user_name)

A few notes: one, I'm sick as a dog and can barely think, and this is untested, so who knows.

Two, user_name should be a user_id, for greater efficiency:
select distinct a.item_id
from comments a where not exists ( select * from mark_as_read b where b.comment_serial = a.serial_number and b.user_id = (select user_id from user where user_name = $user _name))

That second sub-select should be optimized by the DB to a constant (as it's uncorrelated), but test to be sure it is.

Three, item_id in mark_as_read is a redundant de-normalization that can introduce data inconsistencies. Get rid of it. If you're actually using it for something, replace that with a view that joins to "comment" when you change the table name (see below).

Four, column identifiers should be the same frm table to table, or the same but prefixed with the owning table name. And id columns should be named "id". So it should be "id" not "serial_number" in table "comment", and "comment_id" not "comment_serial" in mark_as_read.

Five, table names should be nouns, not verbs, so it's "marked_as_read" or just "read", not the verb phrase "mark_as_read".
posted by orthogonality at 3:08 PM on December 24, 2008


I would also suggest naming all id's similarly, so that serial_number becomes comment_id, etc. That way you can just referer to everything as whatever_id instead of having to change it as you do currently: serial_number => comment_serial. Ideally they would just be comment_id everywhere so you always know what exactly which data you're referring to.
posted by jeffamaphone at 3:55 PM on December 24, 2008


I was going to suggest the standard SQL "EXCEPT" operator, which does exactly this, but apparently MySQL doesn't support it (yet).
posted by hattifattener at 4:44 PM on December 24, 2008


One thing that might be worth looking into with mysql joins in the future is that you can get around mysql not allowing sub queries by creating a view
and joining to that view instead of a subquery.
posted by compound eye at 4:51 PM on December 24, 2008


It looks like you got an answer that works. Keep in mind that subqueries and views are EXTREMELY EVIL, since they produce in-memory tables that can't have indexes. As your database grows, anything that uses a subquery or view will become a performance bottleneck.

The following left outer join query should do the trick. Note that the user name test in the WHERE clause might need to be anded with the on clause. You should also think carefully about the composite key in mark_as_read, that could get expensive over time too.


SELECT
FROM
  comments c,
LEFT OUTER JOIN
  mark_as_read mar
ON
  c.item_id = mar.item_id
WHERE
  mar.user_name = <user name>
  AND mar.item_id IS NULL
posted by b1tr0t at 7:00 PM on December 24, 2008


of course, you also need something in your select clause, like DISTINCT c.item_id
posted by b1tr0t at 7:00 PM on December 24, 2008


Wow, lots of great stuff here. I'm learning, I'm learning =)
posted by Jezztek at 8:15 PM on December 24, 2008


I'm mostly an Oracle guy, but I just wanted to give props to b1tr0t's solution. In general, an outer join is going to be a ton more efficient than not in, which is something to be avoided if you have any choice at all.
posted by Lame_username at 7:17 AM on December 25, 2008


I too agree with b1tr0t's answer.
posted by Doofus Magoo at 8:38 AM on December 25, 2008


« Older I'd like to know the best way ...   |   Are there dorm room freezers?... Newer »

You are not logged in, either login or create an account to post comments