Help me analyze data about networks.
February 8, 2012 2:25 PM

How can I analyze some data about linkages between data elements in a way that will give me meaningful information about the similarity of the relationships? This has got to be a solved problem!

Let's say I've got a list of everyone who has walked through some doors, say, in a mall. In fact, I have excellent records about which doors each person has walked through. I'm looking for some sort of analysis that will help me to collect people into groups based on how similar they are in terms of which doors they mostly walk through (this group often goes to Sears and JC Penney, that group only goes to Cinnabon, etc). I'm also looking to be able to calculate some sort of degree of "aberrance" in behavior based on grouping - say, when someone who belongs to a group that would otherwise only ever shop at shoe stores also went to Tower Records that one time.

The behavioral aspect is purely illustrative - I'm strictly looking to be able to group things together with some sort of numerical "confidence", ideally one that I could analyze for anomalies or change over time.

What is this area of math called? Are there any good tool sets (say, as part of R, or MATLAB/Octave) that I can use? Can anyone recommend a good textbook? I haven't taken linear algebra, so fewer pre-reqs would be nice.
posted by TheNewWazoo to Education (9 answers total) 2 users marked this as a favorite
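
One simple way to make "similarity" and "aberrance" concrete for the door data described in the question is set overlap. The sketch below uses Jaccard similarity, which is an assumption on my part and not something named in the thread; the people, the store names other than those in the question, and the `aberrance` helper are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical by convention.
    return len(a & b) / len(union) if union else 1.0


def aberrance(person_doors, group_doors):
    """Fraction of a person's doors that the rest of the group never uses.

    This is a hypothetical, illustrative score, not a standard statistic.
    """
    person_doors = set(person_doors)
    if not person_doors:
        return 0.0
    return len(person_doors - set(group_doors)) / len(person_doors)


# Hypothetical records: person -> set of doors walked through.
visits = {
    "alice": {"Sears", "JC Penney"},
    "bob": {"Sears", "JC Penney", "Cinnabon"},
    "carol": {"Cinnabon"},
}

if __name__ == "__main__":
    print(jaccard(visits["alice"], visits["bob"]))
    print(jaccard(visits["alice"], visits["carol"]))
    # A shoe shopper who also went to Tower Records once.
    print(aberrance({"Shoe Carnival", "Tower Records"},
                    {"Shoe Carnival", "Payless"}))
```

Pairwise scores like these could then be fed to a clustering routine; which one suits the data is a separate question.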
 
You might be interested in some software recently featured on the blue: Eureqa
posted by j03 at 2:34 PM on February 8, 2012


You're describing categorical data. The area of statistics you're looking for is Analysis of Categorical Data.
posted by demiurge at 2:45 PM on February 8, 2012 [1 favorite]


Response by poster: At first glance, Eureqa seems to do something different than what I seek. Instead of having data that's
(x1,y1)
(x2,y2)
I've got data that's more like
(x, {a,a,b,c})
(y, {c,d,e})

Can I translate one into the other? Forgive me if it's a silly question.
posted by TheNewWazoo at 2:46 PM on February 8, 2012
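
On translating one shape into the other: data like (x, {a,a,b,c}) can be flattened into one row per person and one column per door, with visit counts in the cells. This is a minimal sketch assuming the records arrive as (person, list of doors) pairs; the function name `to_count_table` is made up for illustration.

```python
from collections import Counter

# The two example records from the post above.
records = [
    ("x", ["a", "a", "b", "c"]),
    ("y", ["c", "d", "e"]),
]


def to_count_table(records):
    """Turn (person, doors) pairs into a fixed-width table of visit counts."""
    counts = {}
    for person, visited in records:
        # Accumulate, so a person who appears on several lines is merged.
        counts.setdefault(person, Counter()).update(visited)
    # One column per distinct door, in sorted order.
    doors = sorted({door for c in counts.values() for door in c})
    rows = {person: [c[door] for door in doors] for person, c in counts.items()}
    return doors, rows


if __name__ == "__main__":
    doors, rows = to_count_table(records)
    print(doors)
    for person, row in rows.items():
        print(person, row)
```

Each row is then an ordinary numeric vector, which is the form most statistics packages expect.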


I believe Eureqa is designed for continuous variables (real numbers) and not for categorical data.
posted by demiurge at 2:55 PM on February 8, 2012


Social network analysis is sort of what you're talking about. Discriminant analysis is worth looking at too.
posted by k8t at 2:55 PM on February 8, 2012


How about association rule learning? It's been around forever, and I've used it before with some pretty interesting results.
posted by un petit cadeau at 4:00 PM on February 8, 2012
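
To give a feel for what association rule learning measures, here is a rough sketch of its two basic quantities, support and confidence, computed by brute force. The visit data is invented for illustration, and real tools (the Apriori algorithm, for instance) search for rules far more efficiently than this.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(basket) for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / base


# Hypothetical data: the set of doors each person walked through.
baskets = [
    {"Sears", "JC Penney"},
    {"Sears", "JC Penney", "Cinnabon"},
    {"Sears"},
    {"Cinnabon"},
]

if __name__ == "__main__":
    print(support(baskets, {"Sears"}))
    print(confidence(baskets, {"Sears"}, {"JC Penney"}))
```

A rule such as "Sears -> JC Penney" with high confidence describes a group habit; a visit that matches the left side but not the right is a candidate anomaly.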


Response by poster: Excellent! Thank you, everyone, for the keywords I can search against. Time to brew some tea and get my study on. If anyone knows of any good open courseware or study material on these subjects, I'd appreciate a pointer.
posted by TheNewWazoo at 7:48 PM on February 8, 2012


For actually plotting the association rules, I like Gephi.
posted by gregglind at 8:33 AM on February 9, 2012


I think a chi-squared test of independence might work for this: cross-tabulate sample subgroups (age cohorts, for example) by doors of interest. My understanding is that this test would tell you whether the visitation patterns you're seeing in sample subgroups are significantly different from what might occur due to chance (i.e. under the null hypothesis of no relationship between the sample subgroup and the doors). This test assumes a large sample; a common rule of thumb is an expected count of at least five in each table cell.

R or any other statistical software would be able to do this; Excel has a lot of built-in stats, so that would be worth a look as well.

Disclaimer: I'm still learning stats & data analysis, so I could be wrong.
posted by smirkette at 9:23 AM on February 9, 2012
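
For anyone who wants to see the arithmetic behind the chi-squared test described above, here is a minimal sketch in plain Python. The 2x2 table of counts is made up, and the function assumes every row and column total is nonzero. It returns only the statistic, the degrees of freedom, and the smallest expected count; a statistics package would also give the p-value.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table of counts.

    `table` is a list of rows (subgroups) by columns (doors).
    Returns (statistic, degrees_of_freedom, smallest_expected_count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if subgroup and door were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, min_expected


# Hypothetical counts: rows are two age cohorts, columns are two doors.
table = [
    [30, 10],
    [20, 40],
]

if __name__ == "__main__":
    statistic, dof, min_expected = chi_squared(table)
    print(statistic, dof, min_expected)
```

With one degree of freedom, the usual 5% critical value is about 3.84, so a statistic well above that would suggest the cohorts differ in which doors they use. The smallest expected count is there to check the rule of thumb about cell sizes.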

