Which statistical test for these two variables?
January 4, 2009 12:05 PM
How to calculate the probability of a result not being due to chance? Not homework!
I'm trying to work out how I should calculate the probability of something not being due to chance.
I'm calculating the return over various periods of various stock market 'buy' signals. For instance, 'buy when condition A is true, and sell after 50 days'.
I have a control signal, which is 'buy every day and sell after 50 days'.
So, over 200 days I will have 200 buys from the control signal and can calculate a mean and standard deviation (SD) for the expected return of buy-and-hold.
Over the same period let's say I have 50 buys from the entry signal I am testing. I can also calculate a mean and SD for these.
Let's say the entry I am testing has a mean return which is better than buy-and-hold. What is the test I should use to determine the probability of this result being due to chance?
This would actually be Hard.
The simplest way would probably be through simulation -- create masses upon masses of fake data with the assumptions about correlations between prices of a stock over time, correlations between prices of different stocks, and so on. The exact form your simulations would take should follow pretty directly from your theory of what's actually driving stock prices.
Otherwise, there are a lot of time-serial applications out there. Which you would use would depend on the particular characteristics of your time series.
In any case, you're misunderstanding the way that these things get phrased. Statistical methods don't find the probability that something is not due to chance for a variety of reasons. First, there is no probability that some relationship is "due to chance" -- either there is a real relationship, or there isn't; you just don't (and more or less can't) know which. Second, standard statistical tests don't determine the confidence that something is "not due to chance." What they usually do is ask "Let's suppose there was really nothing going on. What would be the probability of, just through bad luck, drawing a simple random sample that happened to show, falsely, that there was something going on?"
posted by ROU_Xenophobe at 1:05 PM on January 4, 2009 [1 favorite]
So it sounds like you have a sample of the overall 'buy every day and sell after 50 days'. Consider bootstraping. That is, you have 200 days and you picked 50 of them. Create as many random n=50 subsets of those 200 days as you like (say, 100,000), then see how many of those sets your proposed 50 days beat in terms of return. The probability of getting your result by chance is how many times out of the total the random sets of 50 beat yours.
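A minimal sketch of this subset-resampling test in Python (NumPy assumed; the return arrays below are placeholders for the real per-trade returns, and 10,000 draws stands in for "as many as you like"):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 50-day returns for the 200 control entries, and the
# returns of the 50 entries produced by the signal under test.
control_returns = rng.normal(0.01, 0.05, size=200)
signal_returns = rng.normal(0.02, 0.05, size=50)

n_draws = 10_000
signal_mean = signal_returns.mean()

# Draw many random n=50 subsets of the control days (without replacement)
# and count how often a random subset's mean beats the signal's mean.
sample_means = np.array([
    rng.choice(control_returns, size=50, replace=False).mean()
    for _ in range(n_draws)
])
p_chance = (sample_means >= signal_mean).mean()
print(f"fraction of random 50-day subsets beating the signal: {p_chance:.4f}")
```

With real data you would replace the placeholder arrays with the observed returns of the control and signal trades.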
posted by a robot made out of meat at 1:41 PM on January 4, 2009
Best answer: Tests of greater and lesser sophistication are possible, and as the ROU demonstrates it's possible to really go wild. The simulation approaches offer the possibility of a better estimate of the underlying population statistics than your control sample mean and SD. But you don't have to go to the trouble. The simple t-test can give you the kind of answer you want, with much less computation.
What you want to do is estimate the probability that the control and experimental returns are drawn from the same distribution, i.e. that the return from the "'buy' signal" trades could be produced by 50 draws from the distribution of all 50-day trades. Equivalently: the probability that the return from the buy signal trades could be produced by a draw from the distribution of 50 50-day trades.
How to.
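If the two samples can be treated as independent draws, the comparison described above is a two-sample t-test; a sketch using SciPy's Welch variant, which does not assume equal variances (the arrays here are placeholders, not real data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(0.01, 0.05, size=200)  # placeholder control returns
signal = rng.normal(0.02, 0.05, size=50)    # placeholder signal returns

# Welch's two-sample t-test: tests whether the two samples could plausibly
# come from distributions with the same mean.
t_stat, p_value = stats.ttest_ind(signal, control, equal_var=False)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
```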
posted by grobstein at 1:44 PM on January 4, 2009
Response by poster: langedon, you have it right but the sample size isn't one as I see it. I have about 6000 results from the buy-every-day strategy and several hundred to several thousand results from the test strategy. I could easily compare the results week by week or month by month to generate more sample groups if that helps.
Xenophobe, you misunderstand the problem I think.
posted by unSane at 1:46 PM on January 4, 2009
Response by poster: grobstein, thank you. I thought it was the T-test but couldn't remember how to apply it to something like this.
posted by unSane at 1:47 PM on January 4, 2009
It's important to consider a few statistical points before blindly applying the t-test.
Firstly, the t-test is a parametric tool for comparing means which requires that the probabilities you're interested in can be modeled via a Gaussian distribution. I don't know if that holds true here or not.
The other and perhaps more pertinent point is that the regular t-test depends upon a comparison between independent populations. I suspect that this doesn't apply to your study because you are applying two strategies (in medicine we'd call this two treatments) to the same stock. In other words, your test stock is also your control stock with a sample n=1 and 50 repeated trials. Perhaps the best way to apply your analysis would be to look at 50 different stocks over a selected time period and use either the paired t-test or its non-parametric equivalent (depending on the skewness of your distributions), the Wilcoxon signed-rank test.
posted by drpynchon at 2:18 PM on January 4, 2009
IANAStatistician, so what I'm posting here is as much in hopes that someone will correct me if I'm wrong as in a belief I'm necessarily right.
You have two processes generating numbers/observations. You have 50 numbers from one process and 200 from the other. (You assume, as a simplification, that both are stationary.) Each process has a mean and a distribution. Your observations (taken as a set) also have means and distributions, which are different from the processes' because they're samples. You want to know how likely the observed difference in means is if the actual difference is zero. So you can boil this down to one random variable, (Mean1 - Mean2), where Mean1 and Mean2 are themselves random variables reflecting the expected behavior of sets of observations, and you want to know if this variable has a mean > 0.
This seems like a pretty typical statistical question, but in order to answer it, I think you need to know what the distributions of the actual processes are. And you probably don't know that a priori. Obviously the simplest assumption is that they're normal/Gaussian, which again probably isn't very close to the truth, but might be good enough as a starting point. You then work forward from these assumptions— that the two buy/sell indicators have identical means, and normal distributions with unknown SD— to compute the distribution of the (Mean1-Mean2) variable. That variable will have a mean of 0, but a nonzero deviation. You then compute how likely it is to have seen your observed deviation X (integrating this variable's probability density from X to infinity, iirc). This might be best served by a t-test, which incorporates an attempt to estimate the original processes' standard deviations based on the observations.
Anyway, the above is pretty vague, but may be enough to let you find an actual answer in a stats text. On preview, I see there've been some other answers since I started writing and it doesn't look like I'm totally off in space here.
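The (Mean1 − Mean2) reasoning above leads, under the normality assumption, to the familiar two-sample t statistic; in Welch's form (sample means \(\bar{x}_i\), sample variances \(s_i^2\), sample sizes \(n_i\), not assuming equal variances):

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```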
posted by hattifattener at 2:28 PM on January 4, 2009
langedon, you have it right but the sample size isn't one as I see it.
You are mostly wrong, or aren't describing things clearly. From your description I gather that you have one stock, and that you're comparing one stream of 200 days to another stream of 50 days. This would be nested or hierarchical data, with time series embedded within different stocks. Or if you want to think of it differently, a TSCS problem.
Presumably you are considering doing this over several or many stocks as an investment strategy. In that case, your sample space is not over days or time, it is over stocks, at least in part. So a good sample to test this strategy would not be 200 days over one stock, but 200 stocks randomly selected from whatever the parent population of interest is.
I don't mean to be rude, but given that presumably some nontrivial amount of money is on the line, you show every sign of fundamentally misunderstanding what these statistical tools are and aren't good for in a way that can cost you a whole fucking bunch of money.
At first glance, do you have some theoretical basis for thinking that one strategy should, a priori, deliver superior returns to another? If not, you need to understand, right the fuck down to your bones, that just comparing two things doesn't work -- test lots of strategies, and some of them will show up as "significantly" better even though they aren't. These sorts of statistical tests shine, absolutely shine, when you're using them to ask "I think X is the case. Is it?" They are jaw-droppingly terrible ways to ask "I don't know what's going on... what's the deal here?"
Second, even if you *do* work something out, comparing mean returns will absolutely not be enough unless you have an infinite time horizon or an infinite tolerance for risk. The standard deviation of returns matters too, especially since it can go down to "the firm in question goes bust and is dissolved/liquidated, and that money is gone forever with no chance of future return."
Third, the right way to do this is to think of different things that would be consistent with whatever thinking led you to think that this investment strategy is better than that one. Multiple things. Then look and see if those things are true. Your goal should not be "Is the return from this scheme, in my sample, higher than the return from my "control" scheme," but rather "Is the thinking that led me to expect this strategy to be better sound thinking?"
It's your and your family's money; do what you like. I would be surprised if anything as simple as a t-test of all things is the right way to do this. I assure you that the quant jocks in the investment firms are doing more than that.
posted by ROU_Xenophobe at 3:16 PM on January 4, 2009
I wouldn't rely on the Student's t-test here, because as drpynchon points out, you do not have two independent samples. For that matter, it sounds like you don't really have samples, and you cannot assume the two means are distributed normally (which you would if the means were calculated from data drawn by a simple random sample).
That latter problem should stop you from using any statistical test, unless you can reasonably specify a model of the underlying randomness.
That said, if you must do a statistical analysis, I recommend a bootstrap approach. If your re-sampling mechanism is identical to your data-gathering method, that should let you capture any departures from independence or normality.
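One way to make the re-sampling mechanism respect serial dependence in daily returns is a moving-block bootstrap, which resamples contiguous blocks rather than individual days; a sketch (the series and block length here are illustrative assumptions, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.02, size=200)  # placeholder daily return series

def moving_block_bootstrap(series, block_len, rng):
    """Resample a series in contiguous blocks so that short-range
    dependence within each block is preserved in the resample."""
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [series[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

resampled = moving_block_bootstrap(returns, block_len=10, rng=rng)
print(f"resampled mean return: {resampled.mean():.5f}")
```

The block length is a tuning choice: long enough to capture the dependence you believe is in the data, short enough to leave plenty of distinct blocks.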
posted by mikeand1 at 6:01 PM on January 4, 2009
While I stand by my answer above as a first cut, I should say that I agree with many of the reservations in this thread, especially in ROU Xenophobe's most recent comment. He is right in his note of general caution -- I would not make this t statistic the linchpin of a quantitative trading strategy -- and also in his more specific methodological notes.
The data really is time series data, and if you use a model (like that t-test) that assumes otherwise, you are liable to make errors of the sort that confuse time trends with the results of your trading strategies. Building a time-series model that distinguished this from the significance of your buy signal could be quite hard, I think.
I still think the t-test is useful and can tell you a lot, but of course you should keep its limitations and pitfalls in mind.
I think drpynchon's note on independent samples is wrong. I think you have independent samples from the distribution of returns from possible 50-day trades. The majority of opinion seems to be against me, though, and I am very rusty at this.
posted by grobstein at 8:11 PM on January 4, 2009
I'm confused by your presentation of the question. It's not clear to me whether you're actually going to just look at one stock under these two different conditions, or whether you've got a theory that you want to test over multiple stocks paired with model strategy vs. control strategy. You also don't mention what the condition(s) are that you're thinking of modeling and what level of the analysis they are associated with - e.g., would the conditions be associated with macro-level factors? With the timing of events/conditions specific to the company? The day of week? Something else? I ask this because I think that you need to consider on a conceptual level how your sample of experimental buy/sell transactions will differ from a randomly drawn sample in order to make the correct statistical choice. Plot your data!
IANAFinancePerson, but it seems to me that there's no reason to assume either normality of your distribution of profit/loss margins or the lack of weird dependencies between data points due to the temporal nature of the data. Don't stock prices tend to rise and fall in larger arcs than 50-day periods?
I think that everyone who's telling you to simulate/bootstrap the heck out of it is giving you sound advice. The simulation approach will also give you a lot more flexibility in how you construct your controls. For example, if when you plot your data (looking at the distribution of points that meet experimental condition x vs. those that don't) you find bunchiness over certain stretches of time, you might want to devise your simulation to grab random data points from within a certain distance around those points (i.e., in a loose way attempting to "match" those points on other time-period-associated factors). Either way, setting up the simulation would be only a minor hassle in the larger scheme of things and your results would be far less laughable from any academic or practical perspective than a straight t-test.
And, the whole exercise would be way more meaningful if you are planning to look at this across multiple stocks. I imagine that for any individual stock the noise level will be pretty big and unknowable. Just saying.
posted by shelbaroo at 10:20 PM on January 4, 2009
People who are telling you not to rely on a t-test aren't kidding. Also consider:
1) comparison to dart throwing may not be the most informative or relevant thing
2) the difference between one subset and another is specific to the underlying conditions. All you can say absent theory is the probability of your method working better should the exact global market state return to where it was when you tested it.
3) p-values (what you're calculating) are not a measure of strength of association. Consider generating a confidence interval.
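A percentile-bootstrap confidence interval for the mean return, in the spirit of point 3, might be sketched like this (the returns array is a placeholder; resampling here is with replacement, as standard bootstrapping requires):

```python
import numpy as np

rng = np.random.default_rng(2)
signal_returns = rng.normal(0.02, 0.05, size=50)  # placeholder signal returns

# Resample the observed returns with replacement many times and take the
# 2.5th and 97.5th percentiles of the resampled means as a 95% CI.
boot_means = np.array([
    rng.choice(signal_returns, size=len(signal_returns), replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% CI for mean return: [{lo:.4f}, {hi:.4f}]")
```

Unlike a bare p-value, the width of this interval conveys how precisely the mean return has actually been pinned down.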
posted by a robot made out of meat at 7:43 AM on January 5, 2009
Response by poster: thanks all for the many comments
the bootstrapping approach is extremely helpful and I understand why the t-test may not be appropriate
this experiment was a small response to some research done by Larry Connors in his book HOW THE STOCK MARKET REALLY WORKS. I felt that his results, while interesting, were skewed towards very short term results.
I constructed a simple test which could be run on any security and would compare both the edge ratio (defined as maximum adverse / maximum favorable excursion over a particular period) and the return (over the same period) with the edge and return of a buy and hold strategy.
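A sketch of the edge ratio as defined above (maximum adverse excursion divided by maximum favorable excursion over the holding period); the price series and indexing convention here are assumptions for illustration, not the poster's actual code:

```python
import numpy as np

def edge_ratio(prices, entry_idx, holding_period):
    """Maximum adverse excursion / maximum favorable excursion
    over `holding_period` bars following entry at `entry_idx`."""
    entry = prices[entry_idx]
    window = prices[entry_idx + 1 : entry_idx + 1 + holding_period]
    mae = entry - window.min()   # worst drop below the entry price
    mfe = window.max() - entry   # best rise above the entry price
    return mae / mfe if mfe > 0 else np.inf

prices = np.array([100.0, 101.0, 98.0, 103.0, 105.0, 102.0])
print(edge_ratio(prices, entry_idx=0, holding_period=5))
```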
The intent was not specifically to compare any entry strategy with a buy and hold strategy but to compare entry strategies with each other, with b&h as a reference point.
my results seemed to show that Connors was right over very short periods but spectacularly wrong given certain conditions over longer periods, which is something he never mentions.
the intent was NOT to construct an investment strategy but to better understand the differences between different kinds of entries given different holding periods.
the way my test is constructed, it's hard to present the variance of the results so I was looking for a shorthand way of indicating how much stock one should put in some of the outliers, where the return seemed very high but the number of entries was quite small.
(for example, a stock doesn't make a 200-day high very often, but when it does, the 200-day return is often quite high)
I'm not using these results to invest at all. I do have mechanical systems which I use to trade but they are based on LOTS of walk-forward testing, market-neutral, and highly diversified. I use a modified Sharpe ratio to compare them (the regular Sharpe ratio considers profitable variance to be a bad thing). They also have to pass my sanity test, which is that they make decisions that I can understand and defend, even if I don't agree with them.
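A modified Sharpe ratio that penalizes only downside variance is commonly known as the Sortino ratio; the sketch below is one standard form of that idea, not necessarily the poster's exact modification:

```python
import numpy as np

def sortino_ratio(returns, target=0.0):
    """Sharpe-like ratio using only downside deviation (returns below
    `target`), so profitable variance is not treated as risk."""
    excess = returns - target
    downside = np.minimum(excess, 0.0)
    downside_dev = np.sqrt(np.mean(downside ** 2))
    return excess.mean() / downside_dev if downside_dev > 0 else np.inf

returns = np.array([0.02, -0.01, 0.03, 0.015, -0.005])
print(round(sortino_ratio(returns), 3))
```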
FWIW they beat my own discretionary trading decisions by a massive margin!
Thanks again. The bootstrapping thing is excellent.
posted by unSane at 8:10 PM on January 5, 2009
This thread is closed to new comments.
posted by langedon at 1:01 PM on January 4, 2009