Using a Database for analysis of temperatures
November 2, 2006 7:18 AM   Subscribe

I've got an interesting question about MS Acess. It involves using lat/long data... Anyone care to help?

I have a 100MB database. It contains two tables.

Table 1:
Is very lage. It contains lat/long data for much of the whole eath and a temperature reading at that location.

Table 2:
Is a much smaller table containing test sites to query against the larger dataset.

I'd like to query each record in table two against table one. I'd like to retrieve four readings (.5 of a degree surrounding the lat/long in table two) from table one.

If anyone knows how I could go about this, that would be peachy.
posted by dance to Computers & Internet (15 answers total) 1 user marked this as a favorite
Since the query is structurally simple, and since the data are spatially referenced, and since the query is a spatial one relying on extraction of distance from lat-long, then I can only recommend a GIS. Now, you probably don't need a big expensive GIS nor one with a complex query language, nor (it seems to me) one with much mapping capacity. So, going to a site like this for free GIS software might produce a simple, easy, free program explicitly designed for spatial queries. You can then dump the output back into the databse program of your choice. I am not a big Access user but this kind of query is what GIS was developed for, and yet, you don't seem to need a bells-n-whistles GIS program. If someone knows how to make access read lat-long, then so much the better. (You could also use the GIS to convert your lat long to UTM / military grid which might be easier for Access to deal with)
posted by Rumple at 7:28 AM on November 2, 2006

Also, remember when dealing with lat/long - if you retrieve your data by .5 degree increments, this will always be 30 nautical miles between prallels of latitiude, but east-west it will only be 30 nautical miles at the equator, since meridians converge. This will give you spatially heterogenous sampling areas, with an increasing density of data points approaching singularity at the poles. (Probably you know this, but, a program like Access might not and without spatial reference/display then it might not be obvious. Apologies if this is very basic and/or doesn't reflect the nature of your test sites).
posted by Rumple at 7:38 AM on November 2, 2006

Response by poster: It is decimal lat/long? I was very much hoping to do this query within Access - is there some reason this wouldn't be possible?
posted by dance at 7:50 AM on November 2, 2006

Response by poster: And by the way, thank you for your answers. I appreciate it.
posted by dance at 8:09 AM on November 2, 2006

Decimal lat-long would be easier to query, for sure, but would still suffer from the poleward convergence issue. This really only matters if you have assumptions of, or need of, equal-area sampling units since the same 0.5 degree long/lat grid can represent anything from a 30 by 30 nautical mile spacing to a 30 by 0 nautical mile spacing. UTM would help, but has its own issues of distortion - these are simplified / minimized by the breaking up of the earth into UTM Zones -- so if the military ever has reason to shell some igloos or some penguins they can get it right..

So, I can't really answer whether this is an issue for you or not, just remember that a grid of any spatial referents contains distortion necessitated by the shape of the earth. Certainly you can do it in Access, I am just highlighting that lat/long is not a rectilinear grid therefore your sampling units will be of inequal spacing/size, and if your scope is global, then this will be a non-trivial distortion of on-the-ground representation.

ESRI has great question/answer boards that cover much more than their ArcGIS software. You could ask their, though they may well say "use a GIS -- its what they're for!" -- however, their well may be a workaround for you also.
posted by Rumple at 8:33 AM on November 2, 2006

Two theirs where there should be there: they're appalled.
posted by Rumple at 8:40 AM on November 2, 2006

Not sure if your question is about more abstract query logic or about how to calculate what lat/longs would be within .5 of a degree surrounding the given location.

I don't know about the latter, but for the former, you could set up four queries (one for each fraction of a degree you want to pull), then set up a final query that left joins from Table2 to queries 1-4. Given the large initial dataset, might not be a bad idea to use make-table queries for each of the 4 queries (so the final query would join tables rather than queries).
posted by treepour at 8:51 AM on November 2, 2006

Response by poster: Thanks rumple and treepour.

Rumple, I very much accept your argument regarding the Earth's non-rectilinear grid status. However, that's not something for me to worry about _yet_!

Treepour, thanks for your help - at this stage I am looking for practical implementation tips! I'll give this a go but have a feeling I'll come unstuck pretty quickly, it's been a while since I used Access queries.
posted by dance at 10:30 AM on November 2, 2006

I'm not clear on what you mean by degree (geographical or temperature) . . . if you can describe your table structures & criteria in more detail, I can try to give more specific suggestions . . .
posted by treepour at 12:13 PM on November 2, 2006

I'm not sure if this will be helpful at all, but when I had to do some GIS-like work with lat/longs, I found that PostgreSQL has an excellent Earth Distance plugin that handles the annoying math that lat/long inevitably involves. There may be something similar for Access.
posted by Skorgu at 12:18 PM on November 2, 2006

Response by poster: hi treepour, I definitely mean degree in terms of geographical (i.e. relating to the lat/long). I guess it being the geographical degree makes things more logical.

Specific advice on queries would be wonderful!
posted by dance at 1:46 PM on November 2, 2006

Best answer: I think part of the problem here is that you're telling us how you want to do this, not what you actually need to do.

So are all lats and longs exact multiples of .5 degrees?

Likely not, so do you reallly want the four points nearest your test point? Or "the nearest point to the north/south/eastwest"? If a point has more than four nearest neighbors, how will you handle that?

Rumple pointed out that distances between any two points depends on the latitude of the points. To calculate distance, you can use this formula:
cos(lat1)*cos(lon1)*cos(lat2)*cos(lon2) +
cos(lat1)*sin(lon1)*cos(lat2)*sin(lon2) +
sin(lat1)*sin(lat2)) * radiusOfEarth;

So, first, write some views. It'll not be a fast view, as it will require a table scan of the cartesian the two tables:

Join the two tables in a cartesian:
create view data_test_cartesian
select as aid, asd alat, a.lon as alon, a.temperature, as bid, as blat. b.lon as blon
from data a, test b

Now add a view that calculates the differences:
create view orthogonal_distance
select *, alat - blat as delta_lat, alon - blon as delta_lon
from data_test_cartesian

Now filter that view for your .5 degree difference:
select * from orthogonal_distance
where abs( delta_lat <= .5 ) or abs ( delta_lon <= .5 ).

Although I suspect that you should actually calculate the distance using the Euclidean distance formula:
select * from orthogonal_distance
where square_root( delta_lat * delta_lat + delta_lon * delta_lon ) <= .5

Or, better, the spherical distance referenced above. This require converting the lat and lon to radians, of course; to make this cleaner, we'll use another view:

create view radians_lat_lon as
select *, alat * PI / 180 as alat_radians, alon * PI / 180 as alon_radians, blat * PI / 180 as blat_radians, blon * PI / 180 as blon_radians
from data_test_cartesian

create view spherical_distance as
select *,
cos(alat_radians)*cos(alon_radians)*cos(blat_radians)*cos(blon_radians) +
cos(alat_radians)*sin(alon_radians)*cos(blat_radians)*sin(blon_radians) +
sin(alat_radians)*sin(blat_radians)) * radiusOfEarth as distance
posted by orthogonality at 2:01 PM on November 2, 2006

orthogonality demonstrates why GIS were invented -- with georeferenced point data or rasters, it is a trivial matter to make a spatial query on table 1 with surrounding rasters based on table 2. All the projection issues can be dealt with via toggles.

it seems though, that 100 megs for your point data is huge. Why so big? Can you strip out non-essentials? Would it be better to grid the data to half-degree resolution, averaging temperature in each cell -- smoother data might be more realistic. (part of this is hard because we don't know the application).

Anyway, if you can shrink the 100 megs (and I mean, that size implies millions of data points if a few bytes each for tempersature -- is that necessary for your purposes? or does it mean there is all kinds of other data mixed in just taking up space) it would make life easier for you, methinks. 100 megs is going to be hard for access to swallow.

did you finish your dissertation in archaeology, BTW?
posted by Rumple at 2:53 PM on November 2, 2006

by which I mean, also, fantastic answer ortho, flagged as such.
posted by Rumple at 2:54 PM on November 2, 2006

Response by poster: If anyone wanted more info on this, my email is:

thanks a lot everyone!
posted by dance at 6:23 AM on November 3, 2006

« Older Log File Analysis on OS X?   |   What Happens When I Press This Button? Newer »
This thread is closed to new comments.