October 11, 2012 7:50 AM

Statistics filter: How can I categorize time series curves into pattern categories?

I have time series data: one variable measured 6 times per case. Each case's series forms a curve with a pattern, e.g. staying constant, decreasing over time, falling and rising again, etc.

I would like to cluster these curves into pattern categories, but don't know how.

What I've done so far is simplify each pattern using a gain/loss threshold: if the curve rises by at least X% between two consecutive measurements (e.g. t0 and t1) it's a gain (/), if it falls by at least X% it's a loss (\), otherwise it's a horizontal movement (-). With 3 symbols in each of 6 positions, that gives me 3^6 = 729 possible patterns, such as:

------

--\\\-

-//-\-
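
A minimal sketch of the encoding described above, in Python. The function name `encode_curve`, the default threshold of 5%, and the handling of a zero starting value are assumptions made for illustration; the post does not specify them. Note that n values yield n-1 symbols, so in this sketch a six-symbol pattern corresponds to seven values (a baseline plus six later measurements).

```python
import math

def encode_curve(values, threshold=0.05):
    """Encode each step of a curve as '/', '\\' or '-' by relative change."""
    symbols = []
    for prev, curr in zip(values, values[1:]):
        if prev == 0:
            # Assumption: from a zero baseline, any move counts as a full gain or loss.
            change = 0.0 if curr == 0 else math.copysign(math.inf, curr)
        else:
            change = (curr - prev) / abs(prev)
        if change >= threshold:
            symbols.append("/")    # gain of at least `threshold`
        elif change <= -threshold:
            symbols.append("\\")   # loss of at least `threshold`
        else:
            symbols.append("-")    # roughly horizontal
    return "".join(symbols)
```

For instance, `encode_curve([100, 102, 110, 121, 121, 100, 100])` would produce the third pattern listed above.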

Some of them are similar and probably belong in the same category (such as --\--- and ---\--).
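
One possible way to make "similar" concrete is to collapse runs of repeated symbols, so that patterns differing only in when a move happens share a single key. This is an assumption about what should count as similar, not something stated in the post, and `collapse_runs` and `group_by_shape` are hypothetical helper names.

```python
from collections import defaultdict
from itertools import groupby

def collapse_runs(pattern):
    """Keep one symbol per run of identical symbols."""
    return "".join(symbol for symbol, _ in groupby(pattern))

def group_by_shape(patterns):
    """Group encoded patterns by their collapsed shape."""
    groups = defaultdict(list)
    for pattern in patterns:
        groups[collapse_runs(pattern)].append(pattern)
    return dict(groups)
```

With this rule, --\--- and ---\-- both collapse to -\- and land in the same group. The trade-off is that a one-step fall and a three-step fall (--\\\-) also collapse to the same shape, which may or may not be wanted.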

Is there a better, easier way than clustering the curves by hand? Is there a more sophisticated way of finding patterns, i.e. based on more than just "rising", "falling", "horizontal"?
posted by lord_yo to Science & Nature (2 answers total) 2 users marked this as a favorite

You should at least look into "latent class growth" analysis/models (other search terms: longitudinal "finite mixture model", "growth mixture model"). The method is popular in developmental psychology and criminal justice research, but can be used in many longitudinal data applications. The basic idea: your sample contains unobserved sub-populations (latent classes) of trajectories, and you decide how many classes of curves there are by combining your substantive knowledge (theory and hypotheses) with fit statistics (e.g., the Bayesian Information Criterion). If you have other data (e.g., covariates) that would explain why an individual belongs to a certain trajectory class, include them in your model. The software can provide a class assignment for each individual case (or the probability of a case belonging to each class).
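
To make the idea concrete, here is a rough sketch in Python under loudly stated assumptions. It is a plain Gaussian mixture over the raw curves (each class gets a free mean at every time point, and all classes share one variance), fitted by EM, with the BIC computed for each candidate number of classes. A real latent class growth model would typically fit a polynomial trajectory per class and can include covariates, so this is not how Mplus or PROC TRAJ work internally. All function names here are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_means(curves, k):
    """Deterministic seeding: the first curve, then the curve farthest from all seeds."""
    means = [list(curves[0])]
    while len(means) < k:
        far = max(curves, key=lambda c: min(sq_dist(c, m) for m in means))
        means.append(list(far))
    return means

def e_step(curves, means, weights, var):
    """Return per-curve class probabilities and the total log-likelihood."""
    t = len(curves[0])
    resp, loglik = [], 0.0
    for c in curves:
        logs = [math.log(w) - 0.5 * t * math.log(2 * math.pi * var)
                - sq_dist(c, m) / (2 * var) for w, m in zip(weights, means)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        total = top + math.log(sum(math.exp(v - top) for v in logs))
        loglik += total
        resp.append([math.exp(v - total) for v in logs])
    return resp, loglik

def fit_mixture(curves, k, iterations=100):
    """Fit k classes by EM; one mean curve per class, one shared variance."""
    n, t = len(curves), len(curves[0])
    means, weights = init_means(curves, k), [1.0 / k] * k
    grand = [sum(c[d] for c in curves) / n for d in range(t)]
    var = max(sum(sq_dist(c, grand) for c in curves) / (n * t), 1e-6)
    for _ in range(iterations):
        resp, _ = e_step(curves, means, weights, var)
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)  # floor avoids log(0) for empty classes
            if nj > 1e-12:
                means[j] = [sum(r[j] * c[d] for r, c in zip(resp, curves)) / nj
                            for d in range(t)]
        var = max(sum(r[j] * sq_dist(c, means[j])
                      for r, c in zip(resp, curves) for j in range(k)) / (n * t), 1e-6)
    resp, loglik = e_step(curves, means, weights, var)
    n_params = k * t + (k - 1) + 1  # mean curves + free class weights + variance
    bic = -2 * loglik + n_params * math.log(n)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return {"bic": bic, "labels": labels, "probabilities": resp, "means": means}

def bic_by_classes(curves, candidates=(1, 2, 3)):
    """BIC for each candidate number of classes; lower is better."""
    return {k: fit_mixture(curves, k)["bic"] for k in candidates}
```

Calling `bic_by_classes(curves)` on a list of equal-length curves gives one BIC per candidate class count, and `fit_mixture(curves, k)["labels"]` gives the most probable class for each case. The BIC is only one input; as noted above, the choice of class count should also make substantive sense.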

There is a freely available tutorial article by Jung and Wickrama that I think focuses on Mplus implementation. statmodel.com (homepage of Mplus software) has lots of free papers by Bengt Muthen and colleagues about this stuff (and so much more). Besides Mplus, another implementation is a user-written program for SAS, PROC TRAJ (I think by Bobby Jones at Carnegie Mellon) -- check out work by D. Nagin and Jones for more info. I think there is something of an R implementation, but can't speak to it.

This may be wrong and/or overkill for you -- it depends on the nature of your data and the project. There might be specific aspects of your data that make this approach ill-suited; I think "time series" may have certain connotations that I'm not familiar with -- more detail (e.g., type of variable measured, number of cases) would help a potential answerer.

There are also many opinions about why this method for classifying trajectories is not very good. I am personally just starting to delve into this topic for a thesis, so I wish I could be of more help -- but I felt like I should at least point you toward this, in case it would work for you.

posted by mean square error at 8:56 AM on October 11, 2012 [3 favorites]

posted by lord_yo at 7:52 AM on October 11, 2012