MySQL String to Constrain Duplicates
October 11, 2006 7:50 PM   Subscribe

MySQLFilter: I'm looking for a way to display a list of recent entries from multiple blogs I have stored in a MySQL database, but allow only one entry per blog to be included in the list. What should I add to the WHERE clause?

I've got a SELECT statement that's working right now but doesn't prevent multiple entries from the same blog. For example, the current code would display the 4 latest entries like this:
Blog A, Entry 1 (today)
Blog C, Entry 1 (yesterday afternoon)
Blog A, Entry 2 (yesterday morning)
Blog B, Entry 1 (two days ago)

Instead I'd like it to display the latest 4 entries as:
Blog A, Entry 1 (today)
Blog C, Entry 1 (yesterday afternoon)
Blog B, Entry 1 (two days ago)
Blog D, Entry 1 (three days ago)

I'm assuming that a WHERE clause modification that focuses on the id for each blog (f.id in the statement below) can make this happen but I'm stuck on the syntax. Ideas?

Here's the statement I'm currently using:

"SELECT e.cid, e.id, e.url, e.title, SUBSTRING( e.content, 1, 250 ), e.pubdate, f.siteurl, f.title,

FROM feeds_item e INNER JOIN feeds_channels f ON (e.cid = f.id)

ORDER BY e.pubdate DESC

LIMIT 4"
posted by DIYer to Computers & Internet (12 answers total)
 
I think you just need a DISTINCT e.cid in the statement. Maybe some parens on that.
posted by dentata at 7:55 PM on October 11, 2006


Best answer: You need something in the row that distinguishes it as the latest entry for that: a timestamp, an "is_latest" column. It's a kludge, but if ids are monotonic increasing, you can use the maximum id per feed. You have pub_date, we'll use that.

First, drive the table from feed_channel, and use an eqi-join, not an inner or outer join. By making an eqi-join, you can drop the LIMIT statement, which doesn't actually do what you want.


The select statement stays the same, of course, but, it's
"FROM feed_channels of JOIN feeds_item e on ( e.cid = f.id )
WHERE e.pubdute = ( select max( g,pubdate ) from feeds_item g where g.cid = f.id )

(dentata writes "I think you just need a DISTINCT...." Distinct doesn't work like that.)
posted by orthogonality at 9:35 PM on October 11, 2006


As with most things SQL, orthogonality is right. But do keep in mind that due to MySQL being a toy database you also have to work around the fact that some features are only available in later versions. In this case, you need to be using version 4.1 or higher to use subqueries.
posted by Rhomboid at 9:50 PM on October 11, 2006


I might be totally off base here, but wouldn't using RSS be more appropriate? I'm not sure which blogging platform you use, but there are a few wordpress plugins which I think would do what you want.
posted by maxpower at 9:52 PM on October 11, 2006


I think you are totally off base. How do you think those mysql tables were populated with blog posts? Or put another way, he's not asking how to aggregate and import the data, he's already got that covered.
posted by Rhomboid at 10:01 PM on October 11, 2006


I'm tired, so I'm oversimplifying big time:

Assuming blogs A, B, C and D are all stored in table blog... whichblog is which blog (A, B, C, D) and pubdate is the date of publishing, I think you could do:

SELECT * FROM blog ORDER BY pubdate DESC GROUP BY whichblog;
posted by twiggy at 12:35 AM on October 12, 2006


You're trying to pull the record for each feed with the maximum value in the pubdate field, right? The most efficient way to do that is probably a subquery, but I think MySQL only began supporting subqueries in 4.1. Think something along the lines of the following:
SELECT f.id, e.title
FROM feeds_channels f LEFT JOIN feeds_item e ON e.cid=f.id
WHERE e.pubdate = (SELECT MAX(pubdate) FROM feeds_item WHERE feeds_item.cid=e.cid)
ORDER BY e.pubdate DESC
LIMIT 0, 4
Note the query "embedded" in the WHERE clause; its purpose is to identify only the records where the pubdate is equal to the MAX(pubdate) for each blog.

The only problem with the above syntax is that if a particular blog has two entries posted simultaneously that were its most recent entries, this query would show both. Hopefully you're using an integer/timestamp field and not just a date field.
posted by Doofus Magoo at 3:48 AM on October 12, 2006


echo... echo... echo....
posted by Rhomboid at 5:05 AM on October 12, 2006


Response by poster: I'm using 4.1 so Dofus Magoo's WHERE clause did the trick. Thanks everyone!
posted by DIYer at 9:30 AM on October 12, 2006


DIYer writes "I'm using 4.1 so Dofus Magoo's WHERE clause did the trick. Thanks everyone!"


That's great.
posted by orthogonality at 11:40 AM on October 12, 2006


So the guy that shows up last to the thread and repeats nearly verbatim the answer given 6 hours earlier gets the best answer checkmark? Howyoufigure?
posted by Rhomboid at 3:05 PM on October 12, 2006


Response by poster: So the guy that shows up last to the thread and repeats nearly verbatim the answer given 6 hours earlier gets the best answer checkmark? Howyoufigure?

Woops, bad QA on my part. I guess my PHP is bad enough that I need to read something twice before it sinks in.

I'll switch the best answer credit to orthogonality.

Thanks again everyone.
posted by DIYer at 11:30 AM on October 13, 2006


« Older Songs to shake,shake, shake, shake your booty?   |   How to keep track of coupons? Newer »
This thread is closed to new comments.