# What techniques should I use to tease out leading indicators from a set of data?

March 26, 2010 8:35 AM

This is a question for the statistics / data analytics junkies out there. What techniques should I use to tease out leading indicators from a set of data?

So let me set up the question. I have a large set of data: a bunch of individual objects that transition between a finite number of discrete states. I've reduced the data to a list of state transitions, each saying that a given object moved from one state to another at a given time. What I'm trying to do is determine whether any of these objects are "influencers": objects that, when they move to a particular state, are consistently followed by a bunch of others.

Does anyone have any idea where to start looking into how to do this? While I lack statistics experience, I have a mathematics background, so I'm hoping I can handle any references you may have. I just don't know where to start looking. I can't even think of which terms I should Google!

Thanks!

Best answer: Liang et al.'s REVEAL algorithm might be of help.

posted by PMdixon at 8:52 AM on March 26, 2010
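
For anyone following up on this pointer: REVEAL, as usually described, uses mutual information between earlier and later states to work out which elements determine which. The snippet below is not REVEAL itself, only a minimal sketch of the pairwise, lagged mutual-information measure that style of analysis builds on. It assumes the transition list has already been resampled into one state per object per regular time step; the function name and the `lag` parameter are made up for illustration.

```python
from collections import Counter
from math import log2

def lagged_mutual_information(leader, follower, lag=1):
    """Estimate I(leader[t]; follower[t + lag]) in bits.

    Assumes both arguments are equal-length lists of discrete states
    sampled on the same clock, and that lag >= 1.
    """
    pairs = list(zip(leader[:-lag], follower[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Plug-in estimate: sum of p(a, b) * log2(p(a, b) / (p(a) * p(b)))
    return sum(
        (c / n) * log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )

# Toy data: the follower copies the leader's state one step later.
leader = list("ABABBAABAB")
follower = ["A"] + leader[:-1]
forward = lagged_mutual_information(leader, follower)
reverse = lagged_mutual_information(follower, leader)
```

On the toy data `forward` comes out much larger than `reverse`, which is the kind of asymmetry an "influencer" should show. On real data the plug-in estimate is biased upward for short sequences, so it is worth comparing against shuffled sequences before reading much into it.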

Best answer: I've never used it, as I don't much deal with time series, but IIRC this is one of the things vector autoregression is used for.

posted by ROU_Xenophobe at 10:27 AM on March 26, 2010
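
To make the vector autoregression idea concrete: a VAR regresses each series on the lagged values of all the series, and a large cross-coefficient suggests one series leads another. It needs numeric series, so the sketch below assumes the transitions have first been turned into counts per time bin (say, how many objects in each group entered a given state). It fits a two-series VAR(1) by ordinary least squares in plain Python; `fit_var1` is a hypothetical helper, not a library function, and a real analysis would more likely use a statistics package with lag selection and significance tests.

```python
def fit_var1(x, y):
    """Least-squares fit of a two-series VAR(1), intercept handled by centring.

    Returns [[a_xx, a_xy], [a_yx, a_yy]] where
    x[t] ~ a_xx * x[t-1] + a_xy * y[t-1] and
    y[t] ~ a_yx * x[t-1] + a_yy * y[t-1].
    """
    def centred(v):
        m = sum(v) / len(v)
        return [u - m for u in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    px, py = centred(x[:-1]), centred(y[:-1])  # lagged predictors
    sxx, sxy, syy = dot(px, px), dot(px, py), dot(py, py)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("lagged series are collinear; coefficients not identified")
    rows = []
    for target in (centred(x[1:]), centred(y[1:])):
        bx, by = dot(px, target), dot(py, target)
        # Cramer's rule on the 2x2 normal equations
        rows.append([(bx * syy - by * sxy) / det, (by * sxx - bx * sxy) / det])
    return rows

# Toy data: y is 0.8 times the previous value of x, so x leads y.
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y = [0.0] + [0.8 * v for v in x[:-1]]
coef = fit_var1(x, y)
```

Here `coef[1][0]` (the effect of lagged `x` on `y`) recovers the 0.8 used to build the toy data, while `coef[1][1]` is essentially zero. Asking whether such cross-coefficients are significantly non-zero is the basis of Granger causality testing, which is a useful search term for this problem.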

Best answer: Seconding devilsbrigade's suggestion. If you want to get fancier, you could try principal component or neural network analysis.

posted by surfgator at 3:39 AM on March 27, 2010

Or, the semi-stats way that I'd opt for would be to run the chains themselves for a long time and look for correlations in the sequences of states each object goes through (you can google around for correlation of sequences, there's some literature on it).

posted by devilsbrigade at 8:40 AM on March 26, 2010
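
One rough way to start on the "look for correlations in the sequences" idea, working straight from the transition list described in the question: for every time an object enters a state, count how many other objects enter the same state shortly afterwards. This is only a sketch under assumptions. The tuple layout `(time, object_id, from_state, to_state)`, the `window` parameter, and the function name are all invented here for illustration.

```python
from collections import defaultdict

def follower_counts(transitions, window):
    """Average number of distinct followers per transition, per object.

    transitions: iterable of (time, object_id, from_state, to_state).
    A follower is another object entering the same state strictly after
    the leader, and no more than `window` time units later.
    """
    by_state = defaultdict(list)
    for time, obj, _from_state, to_state in transitions:
        by_state[to_state].append((time, obj))

    totals, events = defaultdict(int), defaultdict(int)
    for entries in by_state.values():
        entries.sort(key=lambda e: e[0])  # order arrivals by time
        for i, (t0, leader) in enumerate(entries):
            followers = set()
            for t1, other in entries[i + 1:]:
                if t1 - t0 > window:
                    break
                if other != leader and t1 > t0:
                    followers.add(other)
            totals[leader] += len(followers)
            events[leader] += 1
    return {obj: totals[obj] / events[obj] for obj in events}

transitions = [
    (0, "a", "idle", "busy"), (1, "b", "idle", "busy"), (2, "c", "idle", "busy"),
    (10, "a", "busy", "idle"), (11, "b", "busy", "idle"), (12, "c", "busy", "idle"),
]
scores = follower_counts(transitions, window=3)
```

In the toy data object `a` always moves first, so it gets the highest score. A raw count like this cannot tell influence apart from coincidence or a shared outside cause, so a sensible next step is to compare each score against what you get after shuffling the transition times.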