MySQL String to Constrain Duplicates
October 11, 2006 7:50 PM Subscribe
MySQLFilter: I'm looking for a way to display a list of recent entries from multiple blogs I have stored in a MySQL database, but allow only one entry per blog to be included in the list. What should I add to the WHERE clause?
I've got a SELECT statement that's working right now but doesn't prevent multiple entries from the same blog. For example, the current code would display the 4 latest entries like this:
Blog A, Entry 1 (today)
Blog C, Entry 1 (yesterday afternoon)
Blog A, Entry 2 (yesterday morning)
Blog B, Entry 1 (two days ago)
Instead I'd like it to display the latest 4 entries as:
Blog A, Entry 1 (today)
Blog C, Entry 1 (yesterday afternoon)
Blog B, Entry 1 (two days ago)
Blog D, Entry 1 (three days ago)
I'm assuming that a WHERE clause modification that focuses on the id for each blog (f.id in the statement below) can make this happen but I'm stuck on the syntax. Ideas?
Here's the statement I'm currently using:
"SELECT e.cid, e.id, e.url, e.title, SUBSTRING( e.content, 1, 250 ), e.pubdate, f.siteurl, f.title,
FROM feeds_item e INNER JOIN feeds_channels f ON (e.cid = f.id)
ORDER BY e.pubdate DESC
LIMIT 4"
I've got a SELECT statement that's working right now but doesn't prevent multiple entries from the same blog. For example, the current code would display the 4 latest entries like this:
Blog A, Entry 1 (today)
Blog C, Entry 1 (yesterday afternoon)
Blog A, Entry 2 (yesterday morning)
Blog B, Entry 1 (two days ago)
Instead I'd like it to display the latest 4 entries as:
Blog A, Entry 1 (today)
Blog C, Entry 1 (yesterday afternoon)
Blog B, Entry 1 (two days ago)
Blog D, Entry 1 (three days ago)
I'm assuming that a WHERE clause modification that focuses on the id for each blog (f.id in the statement below) can make this happen but I'm stuck on the syntax. Ideas?
Here's the statement I'm currently using:
"SELECT e.cid, e.id, e.url, e.title, SUBSTRING( e.content, 1, 250 ), e.pubdate, f.siteurl, f.title,
FROM feeds_item e INNER JOIN feeds_channels f ON (e.cid = f.id)
ORDER BY e.pubdate DESC
LIMIT 4"
Best answer: You need something in the row that distinguishes it as the latest entry for that: a timestamp, an "is_latest" column. It's a kludge, but if ids are monotonic increasing, you can use the maximum id per feed. You have pub_date, we'll use that.
First, drive the table from feed_channel, and use an eqi-join, not an inner or outer join. By making an eqi-join, you can drop the LIMIT statement, which doesn't actually do what you want.
The select statement stays the same, of course, but, it's
"FROM feed_channels of JOIN feeds_item e on ( e.cid = f.id )
WHERE e.pubdute = ( select max( g,pubdate ) from feeds_item g where g.cid = f.id )
(dentata writes "I think you just need a DISTINCT...." Distinct doesn't work like that.)
posted by orthogonality at 9:35 PM on October 11, 2006
First, drive the table from feed_channel, and use an eqi-join, not an inner or outer join. By making an eqi-join, you can drop the LIMIT statement, which doesn't actually do what you want.
The select statement stays the same, of course, but, it's
"FROM feed_channels of JOIN feeds_item e on ( e.cid = f.id )
WHERE e.pubdute = ( select max( g,pubdate ) from feeds_item g where g.cid = f.id )
(dentata writes "I think you just need a DISTINCT...." Distinct doesn't work like that.)
posted by orthogonality at 9:35 PM on October 11, 2006
As with most things SQL, orthogonality is right. But do keep in mind that due to MySQL being a toy database you also have to work around the fact that some features are only available in later versions. In this case, you need to be using version 4.1 or higher to use subqueries.
posted by Rhomboid at 9:50 PM on October 11, 2006
posted by Rhomboid at 9:50 PM on October 11, 2006
I might be totally off base here, but wouldn't using RSS be more appropriate? I'm not sure which blogging platform you use, but there are a few wordpress plugins which I think would do what you want.
posted by maxpower at 9:52 PM on October 11, 2006
posted by maxpower at 9:52 PM on October 11, 2006
I think you are totally off base. How do you think those mysql tables were populated with blog posts? Or put another way, he's not asking how to aggregate and import the data, he's already got that covered.
posted by Rhomboid at 10:01 PM on October 11, 2006
posted by Rhomboid at 10:01 PM on October 11, 2006
I'm tired, so I'm oversimplifying big time:
Assuming blogs A, B, C and D are all stored in table blog... whichblog is which blog (A, B, C, D) and pubdate is the date of publishing, I think you could do:
SELECT * FROM blog ORDER BY pubdate DESC GROUP BY whichblog;
posted by twiggy at 12:35 AM on October 12, 2006
Assuming blogs A, B, C and D are all stored in table blog... whichblog is which blog (A, B, C, D) and pubdate is the date of publishing, I think you could do:
SELECT * FROM blog ORDER BY pubdate DESC GROUP BY whichblog;
posted by twiggy at 12:35 AM on October 12, 2006
You're trying to pull the record for each feed with the maximum value in the pubdate field, right? The most efficient way to do that is probably a subquery, but I think MySQL only began supporting subqueries in 4.1. Think something along the lines of the following:
The only problem with the above syntax is that if a particular blog has two entries posted simultaneously that were its most recent entries, this query would show both. Hopefully you're using an integer/timestamp field and not just a date field.
posted by Doofus Magoo at 3:48 AM on October 12, 2006
SELECT f.id, e.title FROM feeds_channels f LEFT JOIN feeds_item e ON e.cid=f.id WHERE e.pubdate = (SELECT MAX(pubdate) FROM feeds_item WHERE feeds_item.cid=e.cid) ORDER BY e.pubdate DESC LIMIT 0, 4Note the query "embedded" in the WHERE clause; its purpose is to identify only the records where the pubdate is equal to the MAX(pubdate) for each blog.
The only problem with the above syntax is that if a particular blog has two entries posted simultaneously that were its most recent entries, this query would show both. Hopefully you're using an integer/timestamp field and not just a date field.
posted by Doofus Magoo at 3:48 AM on October 12, 2006
Response by poster: I'm using 4.1 so Dofus Magoo's WHERE clause did the trick. Thanks everyone!
posted by DIYer at 9:30 AM on October 12, 2006
posted by DIYer at 9:30 AM on October 12, 2006
DIYer writes "I'm using 4.1 so Dofus Magoo's WHERE clause did the trick. Thanks everyone!"
That's great.
posted by orthogonality at 11:40 AM on October 12, 2006
That's great.
posted by orthogonality at 11:40 AM on October 12, 2006
So the guy that shows up last to the thread and repeats nearly verbatim the answer given 6 hours earlier gets the best answer checkmark? Howyoufigure?
posted by Rhomboid at 3:05 PM on October 12, 2006
posted by Rhomboid at 3:05 PM on October 12, 2006
Response by poster: So the guy that shows up last to the thread and repeats nearly verbatim the answer given 6 hours earlier gets the best answer checkmark? Howyoufigure?
Woops, bad QA on my part. I guess my PHP is bad enough that I need to read something twice before it sinks in.
I'll switch the best answer credit to orthogonality.
Thanks again everyone.
posted by DIYer at 11:30 AM on October 13, 2006
Woops, bad QA on my part. I guess my PHP is bad enough that I need to read something twice before it sinks in.
I'll switch the best answer credit to orthogonality.
Thanks again everyone.
posted by DIYer at 11:30 AM on October 13, 2006
This thread is closed to new comments.
posted by dentata at 7:55 PM on October 11, 2006