Tricky SQL for a simple function
November 23, 2007 1:38 AM Subscribe
Been banging my head against the wall on this the whole day. I need a way to select & sort data from a single MySQL table, but it's a little bit more complicated that it seems.
Assume a table called "activity" with 4 fields - activity_id, activity_description, activity_datetime and project_id, which is a foreign key to the "project" table. This "activity" table stores all activity updates related to a project.
The "project" table is just project_id and project_name.
In a single SQL, I want to display a table that shows each project, ordered by its latest activity_datetime, like this:
Project X | "Client meeting" | 2007-11-23
Project Z | "Web design" | 2007-10-22
Project A | "Programming" | 2007-10-10
My current SQL does not work because MAX(activity_datetime) gives me the most recent activity, but activity_description still shows the first matching record.
SELECT *, MAX(activity_datetime) FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id
Assume a table called "activity" with 4 fields - activity_id, activity_description, activity_datetime and project_id, which is a foreign key to the "project" table. This "activity" table stores all activity updates related to a project.
The "project" table is just project_id and project_name.
In a single SQL, I want to display a table that shows each project, ordered by its latest activity_datetime, like this:
Project X | "Client meeting" | 2007-11-23
Project Z | "Web design" | 2007-10-22
Project A | "Programming" | 2007-10-10
My current SQL does not work because MAX(activity_datetime) gives me the most recent activity, but activity_description still shows the first matching record.
SELECT *, MAX(activity_datetime) FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id
Response by poster: Hi evariste, the SQL doesn't work because the GROUP BY function sort of messes up the results... the results show data that do not correlate to activity_datetime. So that's the problem - I still need GROUP BY in order for each project to show up only once.
posted by arrowhead at 2:42 AM on November 23, 2007
posted by arrowhead at 2:42 AM on November 23, 2007
Best answer: The join on project is superfluous to your problem; which can be reduced to:
select description, max(dt) from activity group by projectid
try this:
select * from activity where dt in (select max(dt) from activity group by projectid)
rewriting as a single query is left as an exercise for the reader :)
posted by Leon at 2:44 AM on November 23, 2007
select description, max(dt) from activity group by projectid
try this:
select * from activity where dt in (select max(dt) from activity group by projectid)
rewriting as a single query is left as an exercise for the reader :)
posted by Leon at 2:44 AM on November 23, 2007
I'm just messing around here, but would this work?
SELECT * from activity a, project p where p.project_id=a.project_id and a.activity_datetime in (SELECT MAX(activity_datetime) FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id)
Or possibly:
SELECT * from activity a, project p where p.project_id=a.project_id and a.activity_datetime in (SELECT MAX(activity_datetime) FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id) limit (select count(*) from project)
if the first one gives you too many rows.
posted by evariste at 3:03 AM on November 23, 2007
SELECT * from activity a, project p where p.project_id=a.project_id and a.activity_datetime in (SELECT MAX(activity_datetime) FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id)
Or possibly:
SELECT * from activity a, project p where p.project_id=a.project_id and a.activity_datetime in (SELECT MAX(activity_datetime) FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id) limit (select count(*) from project)
if the first one gives you too many rows.
posted by evariste at 3:03 AM on November 23, 2007
Too slow! Looks like I was thinking along the same general track as Leon.
posted by evariste at 3:04 AM on November 23, 2007
posted by evariste at 3:04 AM on November 23, 2007
BTW, that inner select assumes that the dt column is unique. With a 1-second resolution, it is guaranteed to break at some point. In the real world, I'd solve this outside SQL.
posted by Leon at 3:05 AM on November 23, 2007
posted by Leon at 3:05 AM on November 23, 2007
BTW, that inner select assumes that the dt column is unique. With a 1-second resolution, it is guaranteed to break at some point.
We're talking about activity rows with the same datetime, but different project_id, right? I tried adding an activity row with the exact same datetime as another, but a different project, to the toy sqlite database I'm using to help me try to answer this question. It still works fine and outputs something that seems correct, because I'm including the p.project_id=a.project_id clause in the inner select. Does this address what you said, or am I missing your point?
posted by evariste at 3:10 AM on November 23, 2007
We're talking about activity rows with the same datetime, but different project_id, right? I tried adding an activity row with the exact same datetime as another, but a different project, to the toy sqlite database I'm using to help me try to answer this question. It still works fine and outputs something that seems correct, because I'm including the p.project_id=a.project_id clause in the inner select. Does this address what you said, or am I missing your point?
posted by evariste at 3:10 AM on November 23, 2007
Best answer: Leon, my best query attempt so far is this one:
SELECT * from activity a, project p where p.project_id=a.project_id and a.activity_datetime in (SELECT MAX(activity_datetime) FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id) limit (select count(*) from project)
I'm sure it could be improved. I uploaded my test sqlite database here if you want some test data. (and arrowhead, feel free to improve on my probably-crummy test database, I just needed something to play with.)
posted by evariste at 3:16 AM on November 23, 2007
SELECT * from activity a, project p where p.project_id=a.project_id and a.activity_datetime in (SELECT MAX(activity_datetime) FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id) limit (select count(*) from project)
I'm sure it could be improved. I uploaded my test sqlite database here if you want some test data. (and arrowhead, feel free to improve on my probably-crummy test database, I just needed something to play with.)
posted by evariste at 3:16 AM on November 23, 2007
*sigh*... I'm half awake this morning. You guys should just ignore me while I mumble to myself. I think evariste's solution will work, but he should be dragged to the village ducking stool for using two nested selects. (Seriously - what's the limit for? I'm not seeing it).
posted by Leon at 3:29 AM on November 23, 2007
posted by Leon at 3:29 AM on November 23, 2007
Leon, the limit clause is because I was getting too many rows! It was returning every activity record, grouped by project and then ordered by datetime, desc. I'm assuming arrowhead doesn't want every activity record, but rather only wants the single latest activity from each project, so putting on a limit equal to the number of projects seemed like the way to accomplish it—and you clearly don't want to pick a number and hardcode it, because you might add or delete projects. While two nested subqueries is pretty ugly, it does the job...I hope.
posted by evariste at 3:33 AM on November 23, 2007
posted by evariste at 3:33 AM on November 23, 2007
You know, if we assume that activity rows are always added in date-order (and so the highest activity_id for a project is always the most recent one) we can do this:
SELECT * FROM activity WHERE activity_id IN (SELECT MAX(activity_id) FROM activity GROUP BY project_id)
That can be a reasonable assumption, sometimes (logging tables).
posted by Leon at 4:07 AM on November 23, 2007
SELECT * FROM activity WHERE activity_id IN (SELECT MAX(activity_id) FROM activity GROUP BY project_id)
That can be a reasonable assumption, sometimes (logging tables).
posted by Leon at 4:07 AM on November 23, 2007
Response by poster: Thank you leon & evariste - you've both been extremely helpful. A derivation of both suggested SQLs did the job - managed solve the problem in less than 10 minutes!
posted by arrowhead at 9:06 AM on November 27, 2007
posted by arrowhead at 9:06 AM on November 27, 2007
This thread is closed to new comments.
SELECT * FROM activity a, project p WHERE a.project_id = p.project_id GROUP BY p.project_id ORDER BY activity_datetime desc;
posted by evariste at 2:12 AM on November 23, 2007