Join 3,417 readers in helping fund MetaFilter (Hide)


How do I get data from multiple other entries in a database?
July 17, 2011 2:43 PM   Subscribe

SQL help: I'm very new to SQL, and need to update one field with the maximum value of all entries in another field selected based on a shared value in a third field.

I have a table like this:
A |B |C
1 |5 |
1 |4 |
2 |0 |
2 |2 |
3 |3 |
3 |1 |

I need to update field C to contain the maximum value of field B for all entries that share the same value in field A.
That is:

A |B |C
1 |5 |5
1 |4 |5
2 |0 |2
2 |2 |2
3 |3 |3
3 |1 |3
I know I need to be doing an update on C and a select based on A and a max of B from that selection, or something, but I have no idea what the syntax for this should look like.
Help?

Relevant: I am trying to do this in Open Office Base.

I am totally new to SQL, and would do this in a spreadsheet program if I thought I could. My impression is that SQL is the right solution for this problem, and that I should probably learn it.
posted by agentofselection to Computers & Internet (12 answers total) 3 users marked this as a favorite
 
Not the most elegant solution but it will do it for you:


create table temp like table_name;
insert into temp select * from table_name;
update table_name set C=(select max(B) from temp where temp.A=table_name.A);
drop table temp;

posted by babby╩╝); Drop table users; -- at 2:51 PM on July 17, 2011


Since you admit you are new to SQL, this might be one of those situations where it is appropriate to ask what are you trying to do? The answer to your question may not be the best solution to your overall problem.

for instance,
- do want column C to constantly have the same property, meaning, as new rows are inserted into the table, the various other rows are appropriately updated to keep the property (max blah blah blah) true?
posted by mulligan at 3:08 PM on July 17, 2011


Think about normalizing your database. Derivative columns like this generally aren't a good idea. Maybe you can create column C as a view dynamically when you need to?

In any case, the important key is:
SELECT A, MAX(B) FROM TEST GROUP BY A

From there, you can get the view you want with:

SELECT V.B, V.A, W.C FROM TEST AS V, (SELECT A, MAX(B) AS C FROM TEST GROUP BY A) as W WHERE V.A = W.A
posted by devilsbrigade at 3:18 PM on July 17, 2011


mulligan: This should be a one-time operation. This is data that has been collected and is now being processed. When I have completed the processing, I will be using the data for other types of analysis (GIS, stats.) Some of the other formats (e.g. GIS) do not support SQL, so I assume if I wanted to update the database with, say, next year's data, I would have to import it all to a DB program again and repeat this stuff.
I suppose there might be other ways to calculate this, this just seemed like the best way from what I knew of SQL.
posted by agentofselection at 3:28 PM on July 17, 2011


This is the kind of thing you generally don't want to do with SQL, and you don't want to store in the database, because you are essentially duplicating data, and if you make changes in the other columns, that forces you to update column C.
posted by empath at 3:31 PM on July 17, 2011


Yeah, please use a view for this. Even if you don't think this database will change, someone later may add or remove nodes, which would require you to rerun the update (which they might not know). If, in fact, you know you have a very good reason to need the duplication in data, there are ways to do this with constraints & triggers, but it's a lot of work and very tricky. Views are a very safe way to handle this sort of thing, and almost all databases will provide some form of cached or precomputed views if you're worried about computation time.
posted by devilsbrigade at 3:34 PM on July 17, 2011 [1 favorite]


And by nodes I mean rows. Too much distributed database work lately.
posted by devilsbrigade at 3:36 PM on July 17, 2011


I don't know Open Office Base but this is what I would do in MySQL (I don't know if it's especially elegant to be honest):
update mytable1 left join (select A,max(B) as C from mytable1 group by A) mytable2 using(A)
set mytable1.C=mytable2.C
This assumes you've already made an empty column C on table1.
The left join is for making sure every A from table1 gets linked to its correspondent from table2.
The subquery (in parentheses) creates a table listing, from every group of A, the maximum B - naming it as new column C.
The last line sets empty column C as the C from the subquery.

posted by The Biggest Dreamer at 4:02 PM on July 17, 2011


Everyone above asking you to use a view is quite right. If for no other reason than it makes it much easier to change the criteria you use to calculate column_c if that ever becomes necessary.

CREATE VIEW gis_view AS
SELECT AAA.column_a,
       AAA.column_b,
       (SELECT MAX(BBB.column_b)
          FROM the_table AS BBB
         WHERE BBB.column_a = AAA.column_a
       ) AS column_c
  FROM the_table AS AAA
;

However, if you absolutely insist on updating the table.

UPDATE the_table AS AAA
   SET column_c = (SELECT MAX(BBB.column_b)
                     FROM the_table AS BBB
                    WHERE BBB.column_a = AAA.column_a
                   )
;

posted by ob1quixote at 4:29 PM on July 17, 2011 [4 favorites]


What GIS software are you using? Are you sure you can't do the transformation in that?
posted by desjardins at 4:39 PM on July 17, 2011


Seconding the second half of ob1quixote's post. It gives exactly the answer I would have suggested.

Views have their use, make no mistake, but I see faaaar too many views-on-views-on-views-etc that accomplish little beyond code obfuscation that wastes CPU time. So what if you need to re-run your update? If your data only changes slowly (and by suggesting it as a one-off task, I think "slowly" probably overstates the problem), schedule a job to do exactly that on a nightly basis and call it good.

(Also seconding desjardin's comment - MSSQL2008 has some really kickarse GIS primatives available, look into them before resorting to rolling-your-own).
posted by pla at 5:46 PM on July 17, 2011


So what if you need to re-run your update? If your data only changes slowly (and by suggesting it as a one-off task, I think "slowly" probably overstates the problem), schedule a job to do exactly that on a nightly basis and call it good.

Sorry, but that's making all of my DBA instincts twitch with rage.

Do not store calculated/derived data in your database. Ever. Use a view. Views-on-views-on-views are bad, so try to avoid them, but don't ever set up your DB in such a way that it can be internally inconsistent (and that doesn't mean "I'll be really careful whenever I run queries, and will update all corresponding records by hand, I promise"). Yeah, you'll end up with a lot of JOINs and calculated columns in your SELECT statements, but modern database engines can handle that effortlessly.

Views are so ridiculously easy to create that it's not worth taking the UPDATE query shortcut.
posted by schmod at 9:04 PM on July 17, 2011 [1 favorite]


« Older Last night (Saturday July 16th...   |  Where can I learn about (and w... Newer »
This thread is closed to new comments.