QIM’s Jaffray Woodriff

Multiple quotes that I found useful:

I discovered that it was much better to use multiple models than a single best model.

Probably why he uses ensemble methods.

I started out using market-specific models. I ended up realizing that these models were far more vulnerable to breaking down in actual trading because they were more prone to being overfitted to the past data. In 1993, I started to figure out that the more data I used to train the models, the better the performance. I found that using the same models across multiple markets provided a far more robust approach. So the big change that occurred during this period was moving from separate models for each market to common models applied across all markets. The second change that occurred was increased diversification. I started out trading only two markets and then for some time traded only three markets. But as assets under management increased, and I realized it was best to use the same models across all markets, I added substantially more markets to the portfolio. The transition to greater diversification also helped improve performance. By 1994, I was trading about 20 markets, and I was no longer using market-specific models. Those changes made a big difference.

Diversifying across multiple markets is a commonly used tactic to prevent over-fitting; however, I don’t particularly believe that a successful model is required to perform well on all markets, only on markets that are similar (such as S&P 500 and Nasdaq ETFs). A model that performs well in market A but not in market B is not necessarily overfit; it may simply capture idiosyncratic behavior that is only observable in market A. I do agree that a model that is successful across multiple markets will be more robust than one that is only successful in a handful of markets.

There are books about the predictive modeling process that specifically caution against “burning the data”—that is, you have to greatly limit the number of combinations you ever try. And I found that advice markedly stupid because I knew I could find a way to try any number of combinations and not overfit the data. You get new out-of-sample data every day. If you are rigorous about acknowledging what that new data is telling you, you can really get somewhere. It may take a while. If you are trading the system, and it is not performing in line with expectations over some reasonable time frame, look for overfit and hindsight errors. If you are expecting a Sharpe ratio above 1, and you are getting a Sharpe ratio under 0.3, it means that you have made one or more important hindsight errors, or badly misjudged trading costs. I was using the data up to a year before the current date as the training data set, the final year data as the validation data set, and the ongoing real-time data as the test. Effectively, the track record became the test data set.

Walk-forward testing before testing the system live with real money would be safer.
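To make the walk-forward idea concrete, here is a minimal sketch of rolling-origin splits: train on a window of history, test on the period immediately after it, then roll both windows forward. The window sizes below are illustrative assumptions, not anything Woodriff describes.

```python
# Walk-forward (rolling-origin) splits: each test window immediately
# follows its training window, and both roll forward through time.
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that roll forward in time."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # advance by one test window

# Example: ~10 years of daily data, train on ~4 years, test on ~1 year
splits = list(walk_forward_splits(n_obs=2520, train_size=1008, test_size=252))
```

Because every test window lies strictly after its training window, no future data leaks into model fitting, which is the whole point of testing this way before going live.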

The first thing you need to do is to get some idea of how much of the apparent edge is completely spurious. Let’s say instead of training with the target variable, which is the price change over the subsequent 24 hours, I generate random numbers that have the same distribution characteristics. I know that any models that I find that score well training on this data are 100 percent curve-fitted because they are based on intentionally bogus data. The performance of the best model on the fictitious data provides a baseline. Then, you need to come up with models that do much better than this baseline when you are training on the real data. It is only the performance difference between the models using real data and the baseline that is indicative of expected performance, not the full performance of the models in training.

This, along with Monte Carlo simulations, is a good additional test to prevent curve-fitting.
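Woodriff's baseline test can be sketched as follows: fit the same model once on the real target and many times on shuffled copies of it (same distribution, zero information), and treat the best "bogus" score as the floor that any real signal must clear. The model, features, and scores here are stand-ins, not his actual setup.

```python
# Spurious-edge baseline: the best score achievable on a target that is
# permuted (and therefore carries no information) is 100% curve-fit.
import numpy as np

rng = np.random.default_rng(0)

def fit_score(X, y):
    """In-sample R^2 of ordinary least squares -- a placeholder model."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

X = rng.normal(size=(500, 10))                # fake features
y = X[:, 0] * 0.5 + rng.normal(size=500)      # fake next-day returns

real_score = fit_score(X, y)
# Best score over many shuffled targets = the curve-fitting baseline
baseline = max(fit_score(X, rng.permutation(y)) for _ in range(200))
edge = real_score - baseline  # only this difference suggests a real signal
```

Only the gap between `real_score` and `baseline` is evidence of an edge; a model that merely matches the baseline has found nothing.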

You can look for patterns where, on average, all the models out-of-sample continue to do well. You know you are doing well if the average for the out-of-sample models is a significant percentage of the in-sample score. Generally speaking, you are really getting somewhere if the out-of-sample results are more than 50 percent of the in-sample.

Cross-validate to protect against curve-fitting.
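Woodriff's rule of thumb above is easy to check mechanically: average the scores across all models in-sample and out-of-sample, and see whether the out-of-sample average retains more than 50 percent of the in-sample average. The score lists below are made-up numbers for illustration.

```python
# Fraction of average in-sample performance retained out of sample.
def oos_retention(in_sample_scores, out_sample_scores):
    """Return avg(out-of-sample) / avg(in-sample) across all models."""
    is_avg = sum(in_sample_scores) / len(in_sample_scores)
    oos_avg = sum(out_sample_scores) / len(out_sample_scores)
    return oos_avg / is_avg

# Hypothetical scores for five models (e.g. in-sample vs. live Sharpe ratios)
retention = oos_retention([1.8, 2.1, 1.5, 2.4, 1.9], [1.1, 1.0, 0.8, 1.3, 0.9])
looks_robust = retention > 0.5  # "really getting somewhere" threshold
```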

Sometimes we give a little more weight to more recent data, but it is amazing how valuable older data still is. The stationarity of the patterns we have uncovered is amazing to me, as I would have expected predictive patterns in markets to change more over the longer term.

I think there are idiosyncratic movements that exist only in more recent data, but he uses older data to (once again) prevent curve-fitting.

The core of the risk management is evaluating the risk of each market based on an exponentially weighted moving average of the daily dollar range per contract.

This is how he manages to stay so close to his volatility target.
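The risk measure he describes can be sketched directly: an exponentially weighted moving average of each market's daily dollar range per contract, which can then be used to size positions toward a dollar-risk target. The decay factor and target below are assumptions, not QIM's actual parameters.

```python
# EWMA of the daily dollar range per contract, used for position sizing.
def ewma_dollar_range(daily_ranges, decay=0.94):
    """EWMA of a (high - low) * point_value series, newest value last."""
    ewma = daily_ranges[0]
    for r in daily_ranges[1:]:
        ewma = decay * ewma + (1 - decay) * r
    return ewma

def contracts_for_target(daily_ranges, target_dollar_risk):
    """Number of contracts so the expected daily dollar range ~= the target."""
    risk_per_contract = ewma_dollar_range(daily_ranges)
    return max(1, round(target_dollar_risk / risk_per_contract))

# e.g. a market whose daily range per contract hovers around $1,000
position = contracts_for_target([950, 1100, 1025, 980, 1200], 10_000)
```

Because the EWMA reacts to recent ranges, position size shrinks automatically when a market's daily dollar swings expand, which is what keeps realized volatility pinned near the target.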

– All quotes from Jaffray Woodriff in Hedge Fund Market Wizards

Ensemble Methods with Jaffray Woodriff

I read about Jaffray Woodriff of Quantitative Investment Management in Jack Schwager’s Hedge Fund Market Wizards right when it came out, and I was reminded of him during one of my sessions studying data mining. I googled him to see if I could find more details and found 2 articles (1,2). He uses “ensemble methods”, meaning he aggregates multiple models in order to increase predictive ability. Also, since he states that he only uses OHLC data, I assume that all of his secondary variables are some form of technical analysis.

The idea of using ensemble methods is kind of similar to using multiple trading indicators, or trading filters; however, it is more formal since it incorporates Bayesian thinking. I assume that when Woodriff states he uses ensemble methods to combine 800 trading models, the models themselves were built using data mining techniques since ensemble methods originate from the data mining field, and in Hedge Fund Market Wizards he states that he uses data mining techniques to look for historically profitable patterns.
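The combining step itself is simple to sketch: many individual models each produce a forecast, and the ensemble averages them, with the sign of the average giving the trade direction. The "models" below are trivial random stand-ins; Woodriff's 800 actual models are of course not public.

```python
# A minimal ensemble: average the forecasts of many weak models.
import numpy as np

rng = np.random.default_rng(1)

def make_model():
    """Return a hypothetical model: a random linear read of the features."""
    w = rng.normal(size=4)
    return lambda x: float(x @ w)

models = [make_model() for _ in range(800)]  # mirrors his 800-model count

def ensemble_forecast(x):
    """Average the individual forecasts; sign gives the trade direction."""
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

signal = ensemble_forecast(np.ones(4))
direction = 1 if signal > 0 else -1
```

Averaging works because uncorrelated model errors partially cancel, so the ensemble forecast is typically more stable than any single model's, which matches his observation that multiple models beat the single best one.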

Another thing he states is that his underlying approach to system design is different from that of major quant houses such as D.E. Shaw:

Rather than blindly searching through the data for patterns—an approach whose methodological dangers are widely appreciated within, for example, the natural science and medical research communities—we typically start by formulating a hypothesis based on some sort of structural theory or qualitative understanding of the market, and then test that hypothesis to see whether it is supported by the data.

– D. E. Shaw in Stock Market Wizards

I don’t do that. I read all of that just to get to the point that I do what I am not supposed to do, which is a really interesting observation because I am supposed to fail. According to almost everyone, you have to approach systematic trading (and predictive modelling in general) from the framework of ‘Here is a valid hypothesis that makes sense within the context of the markets.’ Instead, I blindly search through the data.

– Woodriff in Hedge Fund Market Wizards

The fact that he disregards commonly accepted principles makes me question whether his alpha is generated from his ensemble methods, or from his unique approach (or both). Of course there is always the possibility that he has found a way of mindlessly mining the data without burning it, but I wonder if his mindless mining approach possibly detracts from his system’s alpha. This is definitely a topic for future research.