Cumulative IBS Indicator

Inspired by Larry Connors’ Cumulative RSI(2) (found in this post) and the results of my Cumulative DV2 test (found in this post), I decided to test how a Cumulative IBS indicator would work. The formula for IBS can be found here; the cumulative IBS is the X-day simple moving average of the IBS. This is a frictionless test on SPY from 1/1/2000 – 12/26/2012. I tested the cumulative IBS using the default parameters from the original post I found on IBS (Long if IBS < 45 & Short if IBS > 95).
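As a rough illustration of the calculation, here is a minimal Python sketch. It assumes the usual IBS definition, (Close - Low) / (High - Low), scaled to 0-100 so that the 45/95 thresholds make sense. The function names, the handling of zero-range bars, and the "flat" state between the thresholds are my own assumptions, not the code behind the charts.

```python
def ibs(high, low, close):
    """Internal Bar Strength on a 0-100 scale: where the close sits in the day's range."""
    if high == low:
        return 50.0  # assumption: treat a zero-range bar as mid-range
    return 100.0 * (close - low) / (high - low)


def cumulative_ibs(bars, n):
    """n-day simple moving average of IBS; None until n bars are available.

    bars is a list of (high, low, close) tuples, oldest first.
    """
    values = [ibs(h, l, c) for h, l, c in bars]
    out = []
    for i in range(len(values)):
        if i + 1 < n:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - n:i + 1]) / n)
    return out


def signal(value, long_below=45.0, short_above=95.0):
    """+1 = long, -1 = short, 0 = no signal (assumed behaviour between thresholds)."""
    if value is None:
        return 0
    if value < long_below:
        return 1
    if value > short_above:
        return -1
    return 0


bars = [(10.0, 8.0, 9.0), (12.0, 10.0, 10.5), (11.0, 9.0, 11.0)]
print(cumulative_ibs(bars, 2))
```

With n = 1 this reduces to the plain IBS, which matches the 1-day panel being the ordinary indicator.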

Equity curves for cumulative 1-9 day IBS. Starts at 1-day in the top left and ends at 9-day on the bottom right. It counts from left to right, meaning that the top middle picture is the cumulative 2-day IBS.

Equity curves for cumulative 10-18 day IBS. Follows same structure as above.

We can see that the equity curves of the cumulative IBS reinforce the conclusion in the cumulative DV2 post.

Here are the individual graphs in case anyone is interested:

2-day

3-day

4-day

5-day

6-day

7-day

8-day

9-day

10-day

11-day

12-day

13-day

14-day

15-day

16-day

17-day

18-day

Cumulative DV2 Indicator

Inspired by Larry Connors’ Cumulative RSI(2) (found in this post), I decided to test how a Cumulative DV2 indicator would work. This is a frictionless test on SPY from 1/1/2000 – 12/26/2012. I tested the cumulative DV2 using default parameters (Buy/Cover if DV2 < 50 & Sell/Short if DV2 > 50).

Normal DV2:

2-day cumulative DV2

3-day cumulative DV2

4-day cumulative DV2

5-day cumulative DV2

6-day cumulative DV2

7-day cumulative DV2

8-day cumulative DV2

9-day cumulative DV2

10-day cumulative DV2

11-day cumulative DV2

The results make me think of one of David Varadi’s posts (link) about how mean-reversion isn’t necessarily ‘dead’; it has only shifted to mean-reverting over a longer period of time (per the equity charts of the 4-9 day cumulative DV2). An adaptive framework similar to the one in this post by Sanz Prophet could definitely be used.

Equity Curve Feedback

A concept that I thought of while showering a while back was treating the equity curve of a trading system as a tradeable security. It is analogous to using tactical asset allocation, except on trading systems instead of a portfolio of assets. Unfortunately for me, this is not a new idea. I also found this idea while going through the archives of my favourite blog, David Varadi’s CSS Analytics, and at Howard Bandy’s Quantitative Trading Systems in a post titled Equity Curve Feedback.

So far, I’ve been able to think of four ways to apply this idea to a trading system, based on two types of equity curve feedback that I call “soft” and “hard” equity curve feedback.

• Soft equity curve feedback allows new trades to be entered when a certain equity condition is met.
• Hard equity curve feedback allows new trades to be entered when a certain equity condition is met AND forces new trades to be undertaken on the second derivative level while mid-trade on the first derivative level.

For example, say you are using an equity curve feedback rule that states that you will only trade when the equity curve of System XYZ > 15 Day SMA of the equity curve of System XYZ. Your first trade on System XYZ manages to push the equity curve of System XYZ above its 15 Day SMA mid-way through the trade:

• With soft equity curve feedback entry, you are now allowed to trade System XYZ, but you do not enter the trade mid-way
• With hard equity curve feedback entry, you are not only allowed to trade System XYZ, but you also immediately enter the trade mid-way
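The difference between the two entry types can be sketched in Python. This is a minimal illustration under my own assumptions: the underlying system's position is given per day as +1/-1/0, the equity curve being filtered is the "paper" curve of the unfiltered system, the filter uses the curve up to the previous day, and exits simply follow the underlying system. The function names are hypothetical.

```python
def sma(values, n):
    """Simple moving average of the last n values."""
    return sum(values[-n:]) / n


def apply_feedback(system_pos, returns, n=15, hard_entry=False):
    """Return the positions actually traded under an equity-curve SMA filter.

    system_pos: position the underlying system holds each day (+1, -1 or 0)
    returns:    the asset's return on each of those days
    """
    equity = [1.0]      # paper equity curve of the unfiltered system
    traded = []
    in_trade = False
    prev = 0
    for pos, ret in zip(system_pos, returns):
        # Filter state is known before today's return is realised.
        allowed = len(equity) >= n and equity[-1] > sma(equity, n)
        new_trade = pos != 0 and pos != prev
        if pos == 0 or new_trade:
            in_trade = False    # the system exited or flipped, so we are out
        # Soft entry: only at the start of a trade. Hard entry: also mid-trade.
        if pos != 0 and allowed and not in_trade and (new_trade or hard_entry):
            in_trade = True
        traded.append(pos if in_trade else 0)
        equity.append(equity[-1] * (1.0 + pos * ret))
        prev = pos
    return traded


system_pos = [1, 1, 1, 0, 1, 1]
returns = [0.01] * 6
print(apply_feedback(system_pos, returns, n=3, hard_entry=False))
print(apply_feedback(system_pos, returns, n=3, hard_entry=True))
```

In this toy run the filter turns on in the middle of the first trade: the hard-entry version joins that trade on the third day, while the soft-entry version waits for the next new trade.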

With these two variations, we can come up with four different kinds of equity curve feedback overlays:

• Soft entry; Soft exit
• Hard entry; Soft exit
• Soft entry; Hard exit
• Hard entry; Hard exit

There are also two (three, but I don’t count one of them) forms of soft exits, but they aren’t different enough for me to classify them as entirely different forms of equity curve feedback.

• True soft exits – You cannot exit the system until the equity curve feedback condition is met. I do not consider this as a legitimate approach to equity curve feedback, since you are basically barred from exiting your system unless you are lucky.
• False soft exits – You can exit the system regardless of whether or not the equity curve feedback condition is met. You are not forced to exit your position if the equity curve condition turns from true to false, and you are still allowed to exit your position if the equity curve condition is false.
• Shorting soft exits – You can exit the system, but only based off of the inverse of your original trading system. For example, say System XYZ from above has the following rules
• Sell: X <10
• Short: Y > 10
• Cover: Y <10
• X and Y are random indicators

with the same 15-Day crossover equity curve feedback rules as before. You’re using hard entries and you entered the first trade, but somewhere along the way the equity curve dipped below its 15 Day SMA. Now instead of exiting when X < 10, you exit when Y < 10. This only works if System XYZ has both long and short components, and you allow the shorting of the trading system itself.

From my experience (back-testing only, no live money, and only with simple SMA crossover trading strategies), equity curve feedback doesn’t change total CAGR much (it sometimes improves or detracts, but CAGR usually stays roughly the same), but it does improve MDD and volatility.
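For reference, here is a small Python sketch of how CAGR and MDD can be computed from an equity curve. The helper names and the 252-trading-days-per-year default are assumptions for illustration; back-testing packages may define these metrics slightly differently.

```python
def cagr(equity, periods_per_year=252):
    """Compound annual growth rate of an equity curve sampled once per period."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Toy quarterly equity curve (hypothetical numbers): 4 periods = 1 year.
curve = [100.0, 120.0, 90.0, 110.0, 121.0]
print(cagr(curve, periods_per_year=4), max_drawdown(curve))
```

Dividing the first number by the second gives a CAGR/MDD style ratio, so an overlay that leaves CAGR alone but reduces MDD still improves the ratio.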

Market Identification: Theory-Crafting

The truth is that not all markets offer good opportunities to make money. A poor market will have: 1) low and sporadic liquidity 2) a high but unpredictable degree of noise 3) few if any discernible patterns or anomalies.

For some data sets, there is simply no edge available.

– Jaffray Woodriff in Hedge Fund Market Wizards

It is a commonly accepted assumption that trading systems will perform differently in different markets, most likely due to each market’s idiosyncratic behaviour. For example, as seen in this XIV trading strategy, even a simple trend-following system can produce extraordinary results within a highly trending market. It is for this reason that I believe that market selection is not only extremely important, but possibly more so than regime detection or the trading system itself.

In order for us to select a group of markets to trade successfully, we should try to decide the characteristics of a market the particular trading system will excel in. For the two most popular trading systems, trend-following and mean-reversion, this seems to be straightforward. In theory, trend-following systems as a whole should perform better in markets with positive serial correlation and extremely few whipsaws, while mean-reversion trading systems should perform better in markets with negative serial correlation and extremely few trends. Here are a couple of indicator ideas I thought of that could measure market “tradeability”:

The Y-Day rolling average of:

• X-Day Auto-Correlation Function
• X-Day Historical Standard Deviation
• X-Day R-Squared
• TSI
• X-Day Signal-to-Noise Ratio (Not the engineering one. Will be defined in subsequent post).
• Frequency of MA crossovers

For these indicators, either a static numeric threshold can be used (ex: if Y-Day rolling average of TSI < 1.6, then market is tradeable for mean-reversion systems) or some kind of adaptive approach can be used (ex: if Y-Day rolling average of TSI < 15 day MA of Y-Day rolling average of TSI, then market is tradeable for mean-reversion systems).
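The two approaches can be sketched in Python as follows. The indicator series is left generic (any list of daily values, such as TSI readings), and the function names and example numbers are hypothetical.

```python
def rolling_mean(values, n):
    """n-period rolling average; None while the window is incomplete."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i + 1 - n):i + 1]
        if len(window) < n or any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / n)
    return out


def static_filter(avg, threshold):
    """Tradeable (for mean-reversion) when the rolling average is below a fixed level."""
    return [v is not None and v < threshold for v in avg]


def adaptive_filter(avg, ma_len):
    """Tradeable when the rolling average is below its own moving average."""
    ma = rolling_mean(avg, ma_len)
    return [a is not None and m is not None and a < m for a, m in zip(avg, ma)]


indicator = [2.0, 1.0, 3.0, 1.0, 1.0, 5.0]   # made-up daily indicator readings
avg = rolling_mean(indicator, 2)              # Y = 2 for the toy example
print(static_filter(avg, 1.6))
print(adaptive_filter(avg, 2))
```

The adaptive version needs no market-specific level, but it takes longer to warm up because it stacks a second moving average on top of the first.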

There is another indicator I thought of, but it wouldn’t fit in with the rolling average category.

• Equity curve of a daily follow through system

A static approach cannot be used with this indicator since there is no boundary that the indicator usually stays between. I can only think of an adaptive approach making this indicator useful (Ex: if equity curve < 15 day MA of equity curve, then market is tradeable for mean reversion systems).

I can only think of two potential problems with this:

This is an obvious problem, which can be solved by finding better indicators. This is easier said than done.

2) Past performance isn’t indicative of future performance

Since technical indicators only measure historical tendencies, this may not work if the future differs from the past. As seen in MarketSci’s post The Simple Made Powerful with Adaptation, daily follow through was an extremely successful strategy up until ~2000. If, in 2000, we had used the historical performance of daily follow through as an indicator, it would have signalled that trend-following systems would succeed, which would have been horribly incorrect for the next 12 years. This problem should be addressed by using a rolling screening approach rather than a “screen once and forget about it” approach.

QIM’s Jaffray Woodriff

Multiple quotes that I found useful:

I discovered that it was much better to use multiple models than a single best model.

Probably why he uses ensemble methods.

I started out using market-specific models. I ended up realizing that these models were far more vulnerable to breaking down in actual trading because they were more prone to being overfitted to the past data. In 1993, I started to figure out that the more data I used to train the models, the better the performance. I found that using the same models across multiple markets provided a far more robust approach. So the big change that occurred during this period was moving from separate models for each market to common models applied across all markets. The second change that occurred was increased diversification. I started out trading only two markets and then for some time traded only three markets. But as assets under management increased, and I realized it was best to use the same models across all markets, I added substantially more markets to the portfolio. The transition to greater diversification also helped improve performance. By 1994, I was trading about 20 markets, and I was no longer using market-specific models. Those changes made a big difference.

Diversifying across multiple markets is a commonly used tactic to prevent over-fitting; however, I don’t believe that a successful model is required to perform well on all markets, only on markets that are similar (such as S&P500 & Nasdaq ETFs). A model that performs well in market A but not in market B is not necessarily overfit; it may only mean that the model captures idiosyncratic behavior that is only observable in market A. I do agree that a model that is successful across multiple markets will be more robust than one that is only successful in a handful of markets.

There are books about the predictive modeling process that specifically caution against “burning the data”—that is, you have to greatly limit the number of combinations you ever try. And I found that advice markedly stupid because I knew I could find a way to try any number of combinations and not overfit the data. You get new out-of-sample data every day. If you are rigorous about acknowledging what that new data is telling you, you can really get somewhere. It may take a while. If you are trading the system, and it is not performing in line with expectations over some reasonable time frame, look for overfit and hindsight errors. If you are expecting a Sharpe ratio above 1, and you are getting a Sharpe ratio under 0.3, it means that you have made one or more important hindsight errors, or badly misjudged trading costs. I was using the data up to a year before the current date as the training data set, the final year data as the validation data set, and the ongoing real-time data as the test. Effectively, the track record became the test data set.

Walk-forward testing before testing the system live with real money would be safer.

The first thing you need to do is to get some idea of how much of the apparent edge is completely spurious. Let’s say instead of training with the target variable, which is the price change over the subsequent 24 hours, I generate random numbers that have the same distribution characteristics. I know that any models that I find that score well training on this data are 100 percent curve-fitted because they are based on intentionally bogus data. The performance of the best model on the fictitious data provides a baseline. Then, you need to come up with models that do much better than this baseline when you are training on the real data. It is only the performance difference between the models using real data and the baseline that is indicative of expected performance, not the full performance of the models in training.

This and Monte Carlo simulations are good additional tests to prevent curve-fitting.
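Here is a toy Python sketch of the random-target baseline idea from the quote. It is only my reading of it: the "models" are trivial sign rules on candidate features, the fake targets are made by shuffling the real ones (one simple way to keep the same distribution), and all function names are hypothetical. It is not Woodriff's actual procedure.

```python
import random


def score(feature, target):
    """Average payoff of trading in the direction of the feature's sign."""
    total = sum((1 if f > 0 else -1) * t for f, t in zip(feature, target))
    return total / len(target)


def best_score(features, target):
    """Score of the best candidate model found by searching all features."""
    return max(score(f, target) for f in features)


def spurious_baseline(features, target, trials=200, seed=0):
    """Average best score when the same search is run on shuffled (bogus) targets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        fake = target[:]
        rng.shuffle(fake)   # same distribution, no real relationship left
        total += best_score(features, fake)
    return total / trials


def edge_over_baseline(features, target, trials=200, seed=0):
    """Only the part of the in-sample score above the baseline counts as edge."""
    return best_score(features, target) - spurious_baseline(features, target, trials, seed)


rng = random.Random(1)
target = [rng.gauss(0.0, 1.0) for _ in range(300)]
noise_features = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(20)]
print(spurious_baseline(noise_features, target, trials=50))
```

Even with pure noise features the baseline comes out above zero, which is the point: the more combinations you search, the better the best bogus model looks.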

You can look for patterns where, on average, all the models out-of-sample continue to do well. You know you are doing well if the average for the out-of-sample models is a significant percentage of the in-sample score. Generally speaking, you are really getting somewhere if the out-of-sample results are more than 50 percent of the in-sample.

Cross-validate to protect against curve-fitting.

Sometimes we give a little more weight to more recent data, but it is amazing how valuable older data still is. The stationarity of the patterns we have uncovered is amazing to me, as I would have expected predictive patterns in markets to change more over the longer term.

I think there are idiosyncratic movements that only exist in more recent data, but he uses older data to (once again) prevent curve-fitting.

The core of the risk management is evaluating the risk of each market based on an exponentially weighted moving average of the daily dollar range per contract.

How he manages to keep so close to his volatility target.
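A rough Python sketch of that risk measure is below. The quote only says that risk is evaluated with an exponentially weighted moving average of the daily dollar range per contract; the smoothing factor, the point value, the risk budget, and the sizing rule that divides a budget by the EWMA are all my own assumptions for illustration.

```python
def dollar_ranges(highs, lows, point_value):
    """Daily high-low range converted to dollars per contract."""
    return [(h - l) * point_value for h, l in zip(highs, lows)]


def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    out = []
    avg = None
    for v in values:
        avg = v if avg is None else alpha * v + (1.0 - alpha) * avg
        out.append(avg)
    return out


def contracts_for_risk(risk_budget, risk_per_contract):
    """Assumed sizing rule: whole contracts that fit in a daily dollar-risk budget."""
    return int(risk_budget // risk_per_contract)


highs = [102.0, 104.0, 103.0]
lows = [100.0, 100.0, 101.0]
risk = ewma(dollar_ranges(highs, lows, point_value=50.0), alpha=0.5)
print(risk, contracts_for_risk(1000.0, risk[-1]))
```

Because the EWMA reacts quickly to wider ranges, position size shrinks as a market gets more volatile, which would help keep realized volatility near a target.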

– All quotes from Jaffray Woodriff in Hedge Fund Market Wizards

Ensemble Methods with Jaffray Woodriff

I read about Jaffray Woodriff from Quantitative Investment Management in Jack Schwager’s Hedge Fund Market Wizards right when it came out, and I was reminded of him during one of my sessions studying data mining. I googled him to see if I could find more details on him and found 2 articles (1,2). He uses a method called “ensemble methods”, which means he uses an aggregate of multiple models in order to increase predictive ability. Also, since he states that he only uses OHLC data, I assume that all of his secondary variables are some kind of technical analysis.

The idea of using ensemble methods is kind of similar to using multiple trading indicators, or trading filters; however, it is more formal since it incorporates Bayesian thinking. I assume that when Woodriff states he uses ensemble methods to combine 800 trading models, the models themselves were built using data mining techniques since ensemble methods originate from the data mining field, and in Hedge Fund Market Wizards he states that he uses data mining techniques to look for historically profitable patterns.

Another thing he states is that his underlying approach to system design is different than that of major quant houses such as D.E. Shaw:

Rather than blindly searching through the data for patterns—an approach whose methodological dangers are widely appreciated within, for example, the natural science and medical research communities—we typically start by formulating a hypothesis based on some sort of structural theory or qualitative understanding of the market, and then test that hypothesis to see whether it is supported by the data.

– D. E. Shaw in Stock Market Wizards

I don’t do that. I read all of that just to get to the point that I do what I am not supposed to do, which is a really interesting observation because I am supposed to fail. According to almost everyone, you have to approach systematic trading (and predictive modelling in general) from the framework of ‘Here is a valid hypothesis that makes sense within the context of the markets.’ Instead, I blindly search through the data.

– Woodriff in Hedge Fund Market Wizards

The fact that he disregards commonly accepted principles makes me question whether his alpha is generated from his ensemble methods, or from his unique approach (or both). Of course there is always the possibility that he has found a way of mindlessly mining the data without burning it, but I wonder if his mindless mining approach possibly detracts from his system’s alpha. This is definitely a topic for future research.

Data Mining Project

I haven’t been posting as much as I’d like lately. I’ve been pretty caught up with everything: I’ve been reading a couple of books to learn R, the command line interface (I recently switched from Windows to Linux), and data mining, while taking multiple courses on Coursera.org. I’ve also been working on a business with two of my friends and on my data mining project.

For my data mining project I will try to predict the direction (and magnitude, somewhat) of daily open prices of SPY. It’s actually not restricted to SPY since I will be able to run the R code on any security of my choosing; however, I am familiar with SPY, it has plentiful data for testing, and the fact that it’s an ETF means I won’t over-fit a model as much relative to an individual security. I choose to use opening prices for my data set (sometimes I will use indicators that will use OHLC data, but I will try to predict opening prices and not closing prices) because it is much easier for me to place at-open trades versus at-close trades since often I am not near internet access during closing time. I am also trying to predict the price 24 hours in the future, and not some other time because I believe that it is easier to predict in shorter time frames. My belief is founded on the fact that it is easier to predict daily volatility versus weekly or monthly volatility using GARCH modelling techniques. I choose not to go into intra-day data because it is extremely expensive and I do not have the resources to purchase it. I could try to use open price data to predict closing prices of the same day, and try to predict the next day open prices with current day closing prices, however due to my lack of internet accessibility during closing periods, I have decided against this.

I am still not finished with my project but I am nearing completion. The hardest part, which was learning R and introductory Data Mining, is now over, and I just have to code up testing procedures. I’ve decided to make this problem a classification problem. There will be three classifications: bull, neutral, and bear. Bull will be when SPY increases by at least 1%. Bear will be when SPY decreases by at least 1%. Neutral will be everything else. I reason that this will not only be a more useful prediction (since 0.5% profits will be eaten up quickly by commissions), but it will also be easier to predict because I assume that signals will be clearer near extremes versus in the middle (this assumption has no data backing it to my knowledge). I am hesitant to use static values since they detract value from this model when using it on differing markets, but I’m not sure what market-normalized measure to use.
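The labelling step can be sketched in Python (used here only for illustration; the project itself is in R). The function name is hypothetical, and the labels are built from open-to-next-open changes as described above.

```python
def label_moves(opens, threshold=0.01):
    """Label each day by the change from its open to the next day's open.

    Returns one label per day except the last, which has no next open yet.
    """
    labels = []
    for today, tomorrow in zip(opens, opens[1:]):
        change = tomorrow / today - 1.0
        if change >= threshold:
            labels.append("bull")
        elif change <= -threshold:
            labels.append("bear")
        else:
            labels.append("neutral")
    return labels


opens = [100.0, 102.0, 101.5, 99.0]   # made-up opening prices
print(label_moves(opens))
```

A market-normalized variant could pass a threshold scaled by recent volatility instead of the fixed 1%, though which measure to use is still the open question.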

I chose classification because I assume (without evidence) that the secondary variables do not have a scalable quantitative relationship with price movements. What I mean by this is that there is a threshold value for indicators to obtain predictive powers (which may or may not have a quantitative relationship with closing prices), and below this particular threshold, all predictive power is lost.

I am confused about how to use direction-less magnitude predictors (volatility predictors). These predictors hold predictive power as to whether the market is in a bull/bear or neutral state, but not the particular direction. I’m thinking about making two prediction tasks (determining whether the next day will be low/high vol and whether the next day will be bull/bear), but this creates problems of its own, specifically the fact that the sum of the parts does not always equal the whole.

There is also the problem that bear markets are often characterized by high volatility while bull markets are often characterized by low volatility. The model should be able to catch this, but my concern is that the trading system from this model will be inactive during bull markets. Maybe I will go long SPY or XIV when this model is inactive, and exit when it is not.

Currently, my next step is to define an objective function, or something to grade the accuracy of different models by. I will probably use the CAGR/MDD of a system, Sharpe Ratio, or some other commonly used back testing metric, but I am also considering common data mining metrics such as precision and recall.
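For the data mining metrics, precision and recall can be computed per class. A minimal Python sketch, with hypothetical names and made-up labels:

```python
def precision_recall(actual, predicted, label):
    """Precision and recall for one class of a multi-class prediction."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of the calls made, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of the real moves, how many were caught
    return precision, recall


actual = ["bull", "bear", "neutral", "bull", "neutral"]
predicted = ["bull", "neutral", "neutral", "bear", "bull"]
print(precision_recall(actual, predicted, "bull"))
```

For a trading system, precision on the bull and bear classes is probably the more relevant number, since a missed move costs nothing while a wrong call costs money.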

Mean Reversion: Different Length Regimes

In my previous post, Mean-Reversion within Regimes, I discussed how volatility, trendiness, and bull/bear affected a simple mean-reversion trading rule. In this post I will elaborate upon the volatility and bull/bear regimes by testing performance using different lengths for my volatility & market direction indicators.

Volatility:

I will define high-volatility and low-volatility in two different ways. For the first method, I will define high/low volatility as

• HV: X-Day Standard Deviation of Daily Returns > Threshold Value
• LV: X-Day Standard Deviation of Daily Returns < Threshold Value

I optimized the length of the standard deviation look-back period, and the threshold value. CAGR/MDD was my objective function. The data was from Yahoo! Finance from 1/1/2000 – 12/7/2012. From now on, I will be using the same data set to conduct all further tests unless stated otherwise.
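The first definition can be sketched in Python. This is an illustration only: it assumes a sample standard deviation of simple daily returns, treats a reading exactly equal to the threshold as low volatility, and uses hypothetical function names.

```python
from statistics import stdev


def daily_returns(closes):
    """Simple close-to-close returns."""
    return [b / a - 1.0 for a, b in zip(closes, closes[1:])]


def rolling_std(returns, x):
    """X-day sample standard deviation (x >= 2); None until x returns exist."""
    return [None if i + 1 < x else stdev(returns[i + 1 - x:i + 1])
            for i in range(len(returns))]


def static_regime(returns, x, threshold):
    """'HV' when the X-day standard deviation is above the threshold, else 'LV'."""
    return [None if s is None else ("HV" if s > threshold else "LV")
            for s in rolling_std(returns, x)]


returns = [0.0, 0.05, -0.05, 0.001, 0.002, 0.001]   # made-up daily returns
print(static_regime(returns, x=3, threshold=0.04))
```

Optimizing then amounts to looping over x and threshold, trading the mean-reversion rule only on days flagged with the regime being tested, and recording CAGR/MDD for each pair.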

High Volatility:

The values in the chart are threshold values.

So from this we can draw the conclusion that a static threshold around .03-.04 is optimal for maximizing CAGR/MDD.

The values are the length of the standard deviation function.

Optimal standard deviation lengths are 10-30.

This chart was included to show that there were enough trades for the majority of the settings so that the results were statistically significant.

Low Volatility:

If we look closely towards the left of the graph we see that, generally, having a threshold value of 0 performs better than other threshold values. Since a threshold value of 0 means that no trades took place (the standard deviation would have to be less than 0 for a trade to take place, which is mathematically impossible), this shows that trading in low-volatility periods deteriorates performance.

I’ve also included this optimization chart to show that if the above reason wasn’t enough to warrant not trading during low-vol periods, then the fact that the more trades you take, the more your performance deteriorates, should provide ample evidence against trading during these periods.

This is my second definition of high-volatility and low-volatility:

• HV: X-Day Standard Deviation of Daily Returns > Y-Day Simple Moving Average of X-Day Standard Deviation of Daily Returns
• LV: X-Day Standard Deviation of Daily Returns < Y-Day Simple Moving Average of X-Day Standard Deviation of Daily Returns

Once again I optimized the length of the standard deviation look-back period, and also the look-back period of the simple moving average of the X-Day standard deviation of daily returns.
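The second definition compares the X-day standard deviation with its own Y-day simple moving average. A minimal Python sketch under the same assumptions as before (sample standard deviation, hypothetical names, ties counted as low volatility):

```python
from statistics import stdev


def rolling(values, n, fn):
    """Apply fn to each full n-length window; None where the window is incomplete."""
    out = []
    for i in range(len(values)):
        window = values[i + 1 - n:i + 1] if i + 1 >= n else []
        ok = len(window) == n and None not in window
        out.append(fn(window) if ok else None)
    return out


def adaptive_regime(returns, x, y):
    """'HV' when the X-day std dev is above its Y-day SMA, else 'LV' (x >= 2)."""
    vol = rolling(returns, x, stdev)
    vol_sma = rolling(vol, y, lambda w: sum(w) / len(w))
    return [None if v is None or m is None else ("HV" if v > m else "LV")
            for v, m in zip(vol, vol_sma)]


returns = [0.0, 0.01, 0.0, 0.01, 0.0, 0.05, -0.05, 0.0, 0.0, 0.0]   # made-up data
print(adaptive_regime(returns, x=2, y=3))
```

Unlike the static threshold, this version adapts to each market's own volatility level, at the cost of a longer warm-up period of x + y - 1 days.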

High Volatility:

Length of the standard deviation period is optimal around 10-20.

Length of MA of standard deviation look-back is optimal around 10-50.

Using an MA of length 2 has extremely positive results, but since no surrounding values produce similar results, it could just be curve-fitting. There are a lot of strongly performing results using an MA look-back of 30-45 and a standard deviation look-back of 10-20.

Low Volatility:

There is no hilly region above the break-even point; once again Mean-Reversion should not be traded during low-volatility periods.

Market Direction:

I will define bull/bear market conditions as:

• Bull: RSRank > 0
• Bear: RSRank < 0

The RSRank has two lengths within the indicator. A long look-back period and a short look-back period. I described the RSRank indicator in more detail in my previous post (actually I just linked to a post that described it).

Bull:

There is no hilly region that is above the break-even point. This reinforces the conclusion from my previous post that it is disadvantageous to trade during a bull-market.

Bear:

Optimal value of the long look back period for RSRank is around 140-160.

Optimal short look back period for RSRank is around 70-100.

There are enough trades to prove this is statistically significant.

Conclusion:

It’s best to trade mean reversion systems during bear markets and periods of high volatility. The optimal way to classify these regimes is with an RSRank indicator that has the long length at 140-160 and the short length at 70-100, together with either a 10-30 day standard deviation and a static threshold of .03-.04, or a 30-45 day simple moving average of a 10-20 day standard deviation. For further tests of robustness, one can test this system over similar but different markets (ex: QQQ).

Patience

Every investment strategy goes through periods where it works poorly. That’s life. If you have a strategy that always works well, that means:

• You haven’t run it long enough.
• You’re not running enough money.
• You’re not taking enough risk.

Survive through your bad times, and prosper during the times where your intelligent strategy is paying off. Patience is a virtue in investing for the most part.

– from the post “If you want to be Well-off in Life” by David Merkel from The Aleph Blog