Thursday, August 07, 2014

One Testicle and One Breast: When Averages Are Misleading

Hat tip to the Index Indicators site, which is the source for the above chart of the Standard and Poor's 600 small cap index and the percentage of those 600 stocks trading above their 20-day moving averages.  As you can see, by this measure we were oversold as of Tuesday's close, with a little less than 24% of small caps trading above their 20-day averages.

When I ran a three-year backtest of the indicator on the site, I found 48 non-overlapping occasions in which fewer than 30% of small cap stocks were trading above their 20 DMA.  Over the next five trading days, there were 33 profitable instances and 15 losing ones, for an average gain of 0.93%.  During that three-year period, the small cap index was up a little over 56%.  If one had simply bought on each of the oversold occasions tested above and held for five days, one would have earned almost 45%, with far less market exposure.
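
As a rough check on that arithmetic, here is a minimal Python sketch using only the figures quoted above.  It assumes the "almost 45%" is the simple sum of the 48 five-day gains (no compounding) and that a year has about 252 trading days; neither assumption comes from the backtest itself.

```python
# Figures quoted in the post
n_trades = 48          # non-overlapping oversold occasions
winners = 33           # profitable over the next five trading days
avg_gain_pct = 0.93    # average five-day gain per occasion, in percent
holding_days = 5

# Share of occasions that were profitable
hit_rate = winners / n_trades

# Assumption: total return taken as a simple sum of the trades, not compounded
summed_return_pct = n_trades * avg_gain_pct

# Assumption: about 252 trading days per year over the three-year window
days_in_market = n_trades * holding_days
exposure = days_in_market / (3 * 252)

print(f"hit rate: {hit_rate:.1%}")
print(f"summed return: {summed_return_pct:.2f}%")
print(f"time in market: {exposure:.1%}")
```

Under those assumptions the strategy is in the market roughly a third of the time, which is what "far less market exposure" amounts to.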

That looks like a decent edge, but one ingredient is missing from the mix:  What is the variability of outcomes around the average performance?

Averages can be misleading if considered in isolation, because the average of a highly variable distribution tells us little about specific outcomes we're likely to encounter.  There's the old joke about the person who couldn't swim but confidently entered the water because it averaged only 3 feet in depth.  Or, more crudely, there is little information in the truism that the average person has one breast and one testicle.

Let's look at the adverse excursions surrounding the oversold occasions involving the small cap stocks.  During the latter part of 2011, buying the oversold small caps and holding for five days would have exposed a trader to drawdowns of 13.97%, 4.94%, and 7.38% on August 2nd, 9th, and 16th, respectively.  That strategy on September 28th of 2011 encountered a drawdown of 6.02%.  On November 17th, buying the oversold market led to a drawdown of 6.85%. 

In the lower VIX markets of 2012, we still found five-day drawdowns for buyers of the oversold market of over 3% on May 14th, May 30th, and November 8th.  Had we bought the market when the strategy first triggered on July 24th of this year, we would have experienced a drawdown in excess of 3%.

Indeed, the average drawdown over this period was 1.84%, with a standard deviation of 2.84%.  Out of the 48 occasions, 19 drew down more than 1% during the subsequent five-day period and 14 drew down more than 2%.  Not exactly risk-free.

The moral of the story is that, when it comes to testing strategies or evaluating trader track records, the path matters as much as the endpoint.  That is why statistical tests are essential:  they tell us when an edge is meaningful relative to the variability surrounding the market outcomes.  That is also why trading firms and their investors look at risk-adjusted returns, not just absolute dollar gains.
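
One simple way to weigh an edge against its variability is a one-sample t-statistic: the average return divided by its standard error.  The sketch below is only an illustration; the function name and both return series are hypothetical, made up to show two strategies with the same average and very different spreads, and are not the backtest data.

```python
import math
import statistics

def t_statistic(returns_pct):
    """t-statistic for the hypothesis that the true mean return is zero."""
    n = len(returns_pct)
    mean = statistics.mean(returns_pct)
    sd = statistics.stdev(returns_pct)  # sample standard deviation
    return mean / (sd / math.sqrt(n))

# Hypothetical five-day returns in percent; both series average exactly 1.0
steady = [0.8, 1.0, 1.2, 0.9, 1.1]
volatile = [6.0, -4.0, 5.0, -3.0, 1.0]

print(f"steady:   t = {t_statistic(steady):.2f}")
print(f"volatile: t = {t_statistic(volatile):.2f}")
```

The two series have identical averages, but only the steady one produces a large t-statistic.  The average alone could not tell them apart.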

When system developers assess their systems, they don't just look at hit rates and average sizes of winning and losing trades.  They also look at maximum adverse excursions and average adverse excursions.  How much heat does the system take before it produces its results?  There's no practical edge to an idea that requires more heat than traders can prudently take.
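
For readers who want to measure heat on their own trades, here is a minimal sketch of an adverse excursion calculation.  It assumes the excursion is measured from the entry price to the lowest price seen during the holding period; the post does not say whether its drawdown figures use intraday lows or closing prices, and the function names and sample prices are hypothetical.

```python
def adverse_excursion(entry_price, prices):
    """Largest percentage drop below the entry price during the hold (0 if none)."""
    worst = min(prices)
    return max(0.0, (entry_price - worst) * 100.0 / entry_price)

def summarize(excursions_pct, threshold_pct):
    """Maximum and average adverse excursion, plus the count above a threshold."""
    return {
        "max": max(excursions_pct),
        "average": sum(excursions_pct) / len(excursions_pct),
        "count_above": sum(1 for x in excursions_pct if x > threshold_pct),
    }

# Hypothetical trade: enter at 100 and observe five subsequent prices
heat = adverse_excursion(100.0, [99.0, 97.0, 101.0, 98.0, 102.0])

# Hypothetical excursions (in percent) for a handful of trades
stats = summarize([heat, 0.0, 1.5, 6.0], threshold_pct=2.0)
print(heat, stats)
```

The hypothetical trade above finishes with a gain, yet it sat 3% under water along the way.  That is the gap between the endpoint and the path.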

