Multiple quotes that I found useful:
I discovered that it was much better to use multiple models than a single best model.
Probably why he uses ensemble methods.
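The preference for multiple models over a single best model can be sketched as simple prediction averaging. This is a toy illustration, not Woodriff's actual models: each "model" here is a least-squares fit on a random subset of features, and the ensemble is their average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy linear signal (a stand-in for features and returns).
X = rng.normal(size=(500, 5))
true_w = np.array([0.5, -0.3, 0.2, 0.0, 0.1])
y = X @ true_w + rng.normal(scale=1.0, size=500)

X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]

def fit_predict(cols, X_train, y_train, X_test):
    # One "model": least squares on a subset of the feature columns.
    w, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    return X_test[:, cols] @ w

# Ten models, each seeing a different random 3-feature subset.
subsets = [rng.choice(5, size=3, replace=False) for _ in range(10)]
preds = np.stack([fit_predict(c, X_tr, y_tr, X_te) for c in subsets])

# The ensemble is the plain average of the individual predictions.
ensemble = preds.mean(axis=0)
mse = lambda p: np.mean((p - y_te) ** 2)
```

By convexity of squared error, the averaged prediction's MSE can never exceed the mean of the individual models' MSEs, which is one reason averaging diverse models tends to be robust.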
I started out using market-specific models. I ended up realizing that these models were far more vulnerable to breaking down in actual trading because they were more prone to being overfitted to the past data. In 1993, I started to figure out that the more data I used to train the models, the better the performance. I found that using the same models across multiple markets provided a far more robust approach. So the big change that occurred during this period was moving from separate models for each market to common models applied across all markets. The second change that occurred was increased diversification. I started out trading only two markets and then for some time traded only three markets. But as assets under management increased, and I realized it was best to use the same models across all markets, I added substantially more markets to the portfolio. The transition to greater diversification also helped improve performance. By 1994, I was trading about 20 markets, and I was no longer using market-specific models. Those changes made a big difference.
Diversifying across multiple markets is a commonly used tactic to prevent overfitting. However, I don't believe a successful model must perform well on all markets, only on similar ones (such as S&P 500 and Nasdaq ETFs). A model that performs well in market A but not in market B is not necessarily overfit; it may simply capture idiosyncratic behavior that is only observable in market A. I do agree that a model that is successful across multiple markets will be more robust than one that is successful in only a handful of markets.
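The shift from market-specific models to common models can be sketched as pooling the training data: stack (feature, target) pairs from several markets and fit once, instead of fitting one model per market. Market symbols, features, and coefficients below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-market datasets: features X and next-day returns y.
markets = {}
for name in ["ES", "NQ", "CL", "GC"]:
    X = rng.normal(size=(250, 4))
    y = X @ np.array([0.4, -0.2, 0.1, 0.0]) + rng.normal(scale=0.5, size=250)
    markets[name] = (X, y)

# Common model: pool all markets' data and fit a single set of weights.
X_all = np.vstack([X for X, _ in markets.values()])
y_all = np.concatenate([y for _, y in markets.values()])
w_common, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
# Each fit now sees 4x the data a market-specific model would.
```

More data per fit is exactly the effect Woodriff describes: the common model has less room to memorize one market's noise.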
There are books about the predictive modeling process that specifically caution against “burning the data”—that is, you have to greatly limit the number of combinations you ever try. And I found that advice markedly stupid because I knew I could find a way to try any number of combinations and not overfit the data. You get new out-of-sample data every day. If you are rigorous about acknowledging what that new data is telling you, you can really get somewhere. It may take a while. If you are trading the system, and it is not performing in line with expectations over some reasonable time frame, look for overfit and hindsight errors. If you are expecting a Sharpe ratio above 1, and you are getting a Sharpe ratio under 0.3, it means that you have made one or more important hindsight errors, or badly misjudged trading costs. I was using the data up to a year before the current date as the training data set, the final year data as the validation data set, and the ongoing real-time data as the test. Effectively, the track record became the test data set.
Walk-forward testing before trading the system live with real money would be safer.
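The split Woodriff describes (train on everything up to a year before today, validate on the final year, let live trading serve as the test set) can be written down directly. The date range here is illustrative; a real system would use the actual market calendar.

```python
import numpy as np

# Illustrative daily index; in practice this would be the trading calendar.
dates = np.arange("2010-01-01", "2015-01-01", dtype="datetime64[D]")

def train_valid_split(dates, asof):
    """Train on everything up to one year before `asof`; validate on the
    final year before `asof`. Ongoing live trading after `asof` is the
    true test set -- the track record itself."""
    asof = np.datetime64(asof)
    cutoff = asof - np.timedelta64(365, "D")  # roughly one year back
    return dates[dates < cutoff], dates[(dates >= cutoff) & (dates < asof)]

train, valid = train_valid_split(dates, "2014-12-31")
```

Rolling `asof` forward through history, refitting at each step, turns this single split into a walk-forward test.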
The first thing you need to do is to get some idea of how much of the apparent edge is completely spurious. Let’s say instead of training with the target variable, which is the price change over the subsequent 24 hours, I generate random numbers that have the same distribution characteristics. I know that any models that I find that score well training on this data are 100 percent curve-fitted because they are based on intentionally bogus data. The performance of the best model on the fictitious data provides a baseline. Then, you need to come up with models that do much better than this baseline when you are training on the real data. It is only the performance difference between the models using real data and the baseline that is indicative of expected performance, not the full performance of the models in training.
This, along with Monte Carlo simulation, is a good additional test against curve-fitting.
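The bogus-target baseline can be sketched as a permutation-style test: shuffle the target so it keeps its distribution but loses any relation to the features, record the best score achievable on that noise, and demand the real-data score clear it. The scoring model here is plain in-sample least squares, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy features and a real target with a modest genuine signal.
X = rng.normal(size=(300, 6))
y = 0.5 * X[:, 0] + rng.normal(scale=1.0, size=300)

def in_sample_r2(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    return 1 - resid.var() / y.var()

# Baseline: the best score over many fits against shuffled (bogus) targets.
# Any skill shown here is 100% curve-fit, since the target is pure noise.
baseline = max(in_sample_r2(X, rng.permutation(y)) for _ in range(50))

real = in_sample_r2(X, y)
edge = real - baseline  # only this difference suggests genuine signal
```

As the quote says, it is the gap between the real-data score and this spurious baseline that matters, not the raw training performance.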
You can look for patterns where, on average, all the models out-of-sample continue to do well. You know you are doing well if the average for the out-of-sample models is a significant percentage of the in-sample score. Generally speaking, you are really getting somewhere if the out-of-sample results are more than 50 percent of the in-sample.
Cross-validate to protect against curve-fitting.
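The 50 percent rule of thumb can be checked mechanically: score a family of models both in-sample and out-of-sample and compare the averages. The model family below (least squares on random feature subsets) and all numbers are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

X = rng.normal(size=(400, 8))
y = X[:, :2] @ np.array([0.6, -0.4]) + rng.normal(scale=1.0, size=400)
X_in, y_in, X_out, y_out = X[:300], y[:300], X[300:], y[300:]

def r2(pred, y):
    return 1 - np.mean((y - pred) ** 2) / y.var()

in_scores, out_scores = [], []
for _ in range(20):
    # Each "model" fits a random 4-feature subset on the in-sample window.
    cols = rng.choice(8, size=4, replace=False)
    w, *_ = np.linalg.lstsq(X_in[:, cols], y_in, rcond=None)
    in_scores.append(r2(X_in[:, cols] @ w, y_in))
    out_scores.append(r2(X_out[:, cols] @ w, y_out))

# Rule of thumb from the quote: the average out-of-sample score should
# retain more than ~50% of the average in-sample score.
retention = np.mean(out_scores) / np.mean(in_scores)
```

A retention well below 0.5 suggests the in-sample scores are mostly curve-fit.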
Sometimes we give a little more weight to more recent data, but it is amazing how valuable older data still is. The stationarity of the patterns we have uncovered is amazing to me, as I would have expected predictive patterns in markets to change more over the longer term.
I think there are idiosyncratic movements that exist only in more recent data, but he uses older data to (once again) prevent curve-fitting.
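Giving "a little more weight to more recent data" is often implemented as exponentially decaying sample weights; the half-life below is arbitrary, not anything Woodriff discloses.

```python
import numpy as np

def recency_weights(n, half_life):
    """Weights for n observations ordered oldest-to-newest: the newest
    point gets weight 1, and weights halve for every `half_life` steps
    further back in time."""
    age = np.arange(n)[::-1]          # age in steps; newest = 0
    return 0.5 ** (age / half_life)

# 1000 daily observations, weights halving every ~250 trading days.
w = recency_weights(1000, half_life=250)
```

With a long half-life the oldest data still carries meaningful weight, which matches the observation that older data remains valuable.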
The core of the risk management is evaluating the risk of each market based on an exponentially weighted moving average of the daily dollar range per contract.
This is how he manages to stay so close to his volatility target.
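A minimal sketch of that risk measure: an exponentially weighted moving average of each day's dollar range per contract (high minus low, times the contract's point value), which can then size positions toward a per-market dollar-risk target. The prices, point value, smoothing span, and risk target are all made up, and the 2/(span+1) smoothing convention is an assumption.

```python
import numpy as np

def ewma_dollar_range(high, low, point_value, span=20):
    """EWMA of the daily dollar range per contract.
    alpha = 2 / (span + 1) follows the common EWMA convention."""
    dollar_range = (np.asarray(high) - np.asarray(low)) * point_value
    alpha = 2.0 / (span + 1)
    ewma = np.empty(len(dollar_range), dtype=float)
    ewma[0] = dollar_range[0]
    for t in range(1, len(dollar_range)):
        ewma[t] = alpha * dollar_range[t] + (1 - alpha) * ewma[t - 1]
    return ewma

# Toy contract: a steady 10-point daily range at $50 per point = $500/day.
high = np.full(100, 105.0)
low = np.full(100, 95.0)
risk = ewma_dollar_range(high, low, point_value=50)

# Size the position so expected daily dollar risk hits a fixed target.
target_daily_risk = 10_000.0
contracts = target_daily_risk / risk[-1]
```

Because position size shrinks as the smoothed range expands (and vice versa), realized portfolio volatility stays pinned near the target.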
– All quotes from Jaffray Woodriff in Hedge Fund Market Wizards