Statistical hypothesis testing in trading strategy development

Any seasoned trader or developer of trading strategies has experienced the exuberance of discovering a superb trading strategy that would surely make him rich, followed by setbacks and disappointment in the real performance of that wonderful strategy. In real trading, at best the strategy makes him a little money, and at worst it loses him a lot.

Our protagonist John has reached an advanced stage of trading (who inferior to him could come up with such a superb strategy?) and is known to test his strategies seriously. He is baffled by the results and can’t understand why. Then, through research, he discovers the ultimate tool called statistical hypothesis testing. Reportedly, successful hedge funds use it to validate their strategies. Plus, according to academic papers, it’s the methodology widely used by the scientific community to confirm, for instance, the discovery of the Higgs boson, or genes related to particular genetic diseases, or at least the effectiveness of drugs.

So what is statistical hypothesis testing?

From Wikipedia: “A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.”

Though the exact procedures are still debated, the general idea is this: for a hypothesis to be confirmed as true or valid, its result has to stand out from what purely random processes would produce in the same setting.

So, it sounds very logical. For instance, if you want to prove that you are good at football penalty kicks, you take, say, 100 kicks (without a goalkeeper) and compare your results with those of a thousand idiots. Say you score 97 and rank 11th among the thousand idiots, that is, in the top 1.1%. Then the committee confirms your skill, or in other words, confirms your claim of being good at penalty kicks as true or valid. Since you rank in the top 1.1%, they trust that you truly have the skill and that you will score similarly in future kicks.

Note here, 10 people were doing better than you, and you’re still calling them idiots?

Back to trading. John has thoroughly studied the testing tool and tested his strategy against a large set of strategies that trade at random. The result: his strategy ranks at 89%, meaning it beats 89% of the random strategies. By a commonly accepted rule, it needs to rank above 95% to be considered as having some merit. Now John isn’t sure whether to be disappointed or encouraged by the number. On the one hand, it tells him he doesn’t have a strategy to get rich; on the other, it sort of explains the losses he experienced in real trading.
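
For readers who want to see what such a test can look like, here is a minimal sketch in Python. It is not John’s actual tool, only one simple way to do it under my own assumptions: the function name percentile_vs_random, the synthetic daily returns, the number of trades and the strategy profit of 0.05 are all made up for illustration. A "random strategy" here simply holds the asset on randomly chosen days.

```python
import random


def percentile_vs_random(strategy_pnl, daily_returns, n_trades,
                         n_random=1000, seed=0):
    """Return the share of random strategies whose total P&L is below strategy_pnl."""
    rng = random.Random(seed)
    beaten = 0
    for _ in range(n_random):
        # One random strategy: hold the asset on n_trades randomly chosen days.
        days = rng.sample(range(len(daily_returns)), n_trades)
        pnl = sum(daily_returns[d] for d in days)
        if pnl < strategy_pnl:
            beaten += 1
    return beaten / n_random


# Synthetic data, purely illustrative (not real market returns).
data_rng = random.Random(42)
daily_returns = [data_rng.gauss(0.0, 0.01) for _ in range(250)]

# Hypothetical total P&L of the strategy under test, over 50 one-day trades.
rank = percentile_vs_random(strategy_pnl=0.05,
                            daily_returns=daily_returns,
                            n_trades=50)
print(f"Strategy beats {rank:.1%} of random strategies")
print("Passes the 95% rule:", rank >= 0.95)
```

In this sketch John’s result would correspond to a returned value of 0.89, and the 95% rule is the final comparison.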

So does the testing tool work really well? In John’s case, where he had no clue about any real advantage of his strategy, i.e. how exactly it works, and for anyone else in a similar situation, the testing tool is a blessing. At least it prevents losing money naively. But in general, the conclusion is not so simple. Let’s look at some scenarios to understand better.

In the football case, remember the 10 idiots who did better than you? They really were idiots without any football experience. Their scores were down to chance or luck alone. Since they all reached the top 1%, would you, following the testing rule, qualify them as having skill?

Let’s now look instead at betting on football penalty kicks. The idiot has a 50% chance of scoring on each kick (this is the same as flipping a coin, but let’s stick with football). Your task is to bet on every kick whether he scores or not. I know you are a football expert, but here you don’t get to watch the kicks at all; you bet blindly, together with 1000 idiots who are also betting blindly. We know the end results: some win 97%, 93%, 87%, …, 8%, 5%, 2%, etc. These are all purely by chance. So you know that in this case you have no way of really doing any differently from the idiots.
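
The spread of lucky results is easy to simulate. The sketch below is my own illustration, not part of any standard tool; the function name blind_win_rates and the values of n_kicks are assumptions, since the text above does not say how many kicks each person bets on. Each blind bet is modelled as winning with probability 50%.

```python
import random


def blind_win_rates(n_bettors, n_kicks, seed=0):
    """Simulate blind bettors; return their win rates sorted from best to worst."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_bettors):
        # A blind bet on a 50/50 kick wins half the time, whatever is bet.
        wins = sum(rng.random() < 0.5 for _ in range(n_kicks))
        rates.append(wins / n_kicks)
    return sorted(rates, reverse=True)


# n_kicks is an assumption; the spread of outcomes depends on it.
for n_kicks in (10, 100):
    rates = blind_win_rates(n_bettors=1000, n_kicks=n_kicks)
    print(f"{n_kicks} kicks: best {rates[0]:.0%}, worst {rates[-1]:.0%}")
```

How extreme the luckiest and unluckiest results get depends on n_kicks, so it is worth trying a few values.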

Now, say you somehow manage to get a peek at 10% of the kicks and the ball’s movement, and you can bet just before the ball goes in or out of the goal. So now you have an advantage and a winning strategy: bet the 10% exactly as you see them and bet the other 90% all as “out”. You will have a 55% win rate (10% won for sure, plus half of the remaining 90%). Say you win 1 dollar on every win and lose 1 dollar otherwise. Is this a superb strategy? Maybe, maybe not, depending on who makes the judgement. But it’s surely a good strategy. Even casinos would love to have it! But would your strategy stand out from the idiots in our test? No chance. Remember, some could win 97%, 93%, 87%, … simply by chance, and you only win 55%. So the test would qualify the strategy as having no merit and advise you to reject it.
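
The 55% figure can be checked with a few lines of Python. This is only a sketch under the assumptions stated in the paragraph above (50% scoring chance, 10% of kicks peeked, 1 dollar won or lost per bet); the function names expected_win_rate and simulate_peek_bettor are my own.

```python
import random


def expected_win_rate(peek_fraction, blind_rate=0.5):
    """Peeked bets always win; the rest win at the blind rate."""
    return peek_fraction * 1.0 + (1 - peek_fraction) * blind_rate


def simulate_peek_bettor(n_kicks, peek_fraction=0.10, seed=0):
    """Simulate betting what you see on peeked kicks and 'out' on all others."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_kicks):
        scored = rng.random() < 0.5          # the kicker scores half the time
        if rng.random() < peek_fraction:
            bet_scored = scored              # you saw it, so you bet correctly
        else:
            bet_scored = False               # blind: always bet "out"
        wins += bet_scored == scored
    return wins / n_kicks


p = expected_win_rate(0.10)
print(f"Expected win rate: {p:.2f}")                   # 0.55
print(f"Expected profit per 1-dollar bet: {2 * p - 1:.2f}")  # 0.10
print(f"Simulated win rate: {simulate_peek_bettor(100_000):.3f}")
```

With equal stakes on wins and losses, the expected profit per bet is the win rate minus the loss rate, which is why a 55% win rate is worth about 10 cents per dollar bet.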

Now you are back to betting blindly. But say a sequence of 0’s and 1’s somehow came into your head (an act of God, let’s say), and you bet according to that sequence, and in the end you win 98% and rank well within the top 1% of the crowd. So our test would confidently prove your strategy valid. Would you use this strategy for your next round of bets? What strategy? - you might ask. Would God keep sending you the sequences? Why would he?

Now let’s go back to trading. Say every day you receive an insider tip on some stock, you know that 60% of the tips are reliable, and the amounts of profit and loss are the same on each trade. Leaving aside whether it’s legal or not, would this be a good strategy for you? No question, right? Would you want to put it through the test to see if it’s valid or not?

Now let’s stay legal, shall we? Say you discovered a stock that in the past year moved in small trends like the following: up 5%, down 3%, up 7%, down 4%, up 6%, down 4%, up 8%, down 4%, … So you defined your buy and sell strategy based on this discovery. Put through the statistical test, the strategy ranks within the top 1% among random strategies. Would you trust it as having merit going forward?

To summarize, what I am trying to say is that whether a strategy has merit depends on how much real advantage it has in its essence. Statistical hypothesis testing doesn’t provide much help in judging that.

Why are the testing procedures widely adopted by the scientific community but not helpful in trading?

This is surely a deep question that deserves a fuller treatment. As a quick answer, I would point to one clear difference. Science applies the testing procedures to something that exists in the present, be it the Higgs boson, gene sequences, or even your football kicking skills. When you put present truths up against random outcomes, the truths stand out. In trading or betting, by contrast, the focus is on changes in the future. We human beings have figured out some ways to glimpse into the future, albeit only minimally, and great traders have learned to profit from this minimal advantage. So our grasp of the target (the future) is far weaker than our grasp of the present, while the power of randomness is just as strong over the future as over the present. Yet the testing procedures still require us to beat the power of randomness, which essentially requires extreme clarity about the future. Few of us are prophets, and few of our strategies have prophetic power. That’s where the testing fails and stops being helpful.

A takeaway from this is:

Any strategy that appears to perform superbly is likely not valid, and a seemingly poor strategy might be a real treasure.

Cheers! Santé! Prost!