Where to start identifying relationships in a set of numerical, binary, and menu data
May 7, 2008 5:16 PM
Ok so I have this huge table of survey data - much of it numerical, much of it binary, some of it selections from menus of text items (e.g. blue, green, orange, etc.). Where do I start to find the most noticeable relationships between variables?
I have some familiarity with regression analysis and am equipped with R (a free stats package, though I'm not too familiar with all its functionality). But how do I
a) deal with the binary and menu-based data?
b) start to find the most significant dependencies? Just randomly? (I mean for example, maybe I will discover that all females between 25 and 30 who like the colour pink tend to eat lots more ice cream on Thursdays.)
Even a textbook or a tutorial telling me what stats I need to know would be useful.
posted by vizsla to science & nature (13 comments total)
The vector under discussion is in parameter space: you want to quantize your data to form a set of n-dimensional Voronoi cells. You can do lossy compression by approximating the full set by a representative element in each cell.
posted by Araucaria at 5:29 PM on May 7, 2008
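To make the quantization idea above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method the comment necessarily had in mind: the comment names no algorithm, so k-means (Lloyd's algorithm) is swapped in as one common way to build Voronoi cells with a representative element each. The survey rows, the column names, the colour list, and the farthest-point initialisation are all hypothetical choices made for the example. Menu answers are turned into 0/1 indicator columns, and age is rescaled to 0..1 so it does not dominate the distances.

```python
def one_hot(value, categories):
    """Encode a menu choice as a 0/1 indicator vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid whose Voronoi cell contains the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a simple deterministic start (an assumption):
    begin with the first point, then repeatedly add the point farthest
    from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iterations):
        # Assignment step: put each point in the cell of its nearest centroid.
        cells = [[] for _ in range(k)]
        for p in points:
            cells[nearest(p, centroids)].append(p)
        # Update step: each centroid becomes the mean of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous centroid
                centroids[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return centroids

# Hypothetical survey rows: (age, likes_pink, favourite colour).
COLOURS = ["blue", "green", "orange"]
rows = [
    (25, 1, "blue"), (27, 1, "blue"), (29, 1, "blue"),
    (52, 0, "orange"), (55, 0, "orange"), (58, 0, "green"),
]

# Rescale age to 0..1 and append the binary and one-hot columns.
ages = [r[0] for r in rows]
lo, hi = min(ages), max(ages)
points = [[(age - lo) / (hi - lo), float(pink)] + one_hot(colour, COLOURS)
          for age, pink, colour in rows]

centroids = kmeans(points, k=2)
labels = [nearest(p, centroids) for p in points]
print(labels)
```

Each centroid is the "representative element" of its cell, and replacing every row by its cell label is the lossy compression step. On real survey data the number of cells and the scaling of each column are judgment calls, and the cells found will depend on both.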