**Background**

Outside of price action, two of the most popular market characteristics analyzed are volume and volatility. Volatility is often used to determine market regime, while the traditional use for volume is to confirm price movement. This post will investigate the relationship between these two characteristics, and whether using both of them may be redundant.

**First Look**

To investigate the effects of the two vols, I decided to calculate some quick stats on GDX, the popular gold miners ETF. It’s plenty volatile and has traded over $1 billion worth of daily volume, so it seemed like as good a candidate as any. And besides, who doesn’t like gold?

Using adjusted daily data for GDX since its inception from Yahoo, I calculated each day’s True Range as described on stockcharts.com. This serves as a proxy for volatility; higher true range values indicate larger daily price swings. I then normalized both the volume and true range for each day by subtracting the mean of the last 20 days and dividing by the last 20 days’ standard deviation for each. Why 20? That’s typically one month’s worth of trading days, and seemed reasonable enough to me. This brought the two values onto a similar scale, allowing for easier comparison. After transforming the data, it can be seen that the two variables are strongly correlated (Pearson coefficient = 0.75). This means that when volume is higher than average, volatility is usually higher as well (and vice versa). This is important because, unlike most technical indicators, these two measures are derived from separate data series.
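The calculation above can be sketched in a few lines of pandas. This is a minimal illustration rather than the notebook’s actual code: the column names (`high`, `low`, `close`) are placeholders, and I’m assuming the 20-day window includes the current day.

```python
import pandas as pd


def true_range(df: pd.DataFrame) -> pd.Series:
    """Largest of high - low, |high - prev close| and |low - prev close|."""
    prev_close = df["close"].shift(1)
    candidates = pd.concat(
        [
            df["high"] - df["low"],
            (df["high"] - prev_close).abs(),
            (df["low"] - prev_close).abs(),
        ],
        axis=1,
    )
    # max() skips NaN, so the first bar (no previous close) falls back to high - low
    return candidates.max(axis=1)


def rolling_zscore(series: pd.Series, window: int = 20) -> pd.Series:
    """Subtract the trailing mean and divide by the trailing standard deviation."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()  # sample standard deviation (ddof=1)
    return (series - mean) / std
```

Given a DataFrame `prices` that also has a `volume` column, `rolling_zscore(true_range(prices)).corr(rolling_zscore(prices["volume"]))` would return the Pearson coefficient between the two normalized series.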

**Residuals of Volatility Model**

There’s a long-running debate between those who believe that markets are efficient and those who believe they aren’t. I’ve adopted the wimpy stance of siding with neither camp; I would argue that *most* markets behave *mostly* efficiently *most* of the time. However, there may be anomalies that exist which can be exploited for profit by skilled traders. If this hypothesis is true, these opportunities will present themselves when markets deviate from their normal behavior. Following this logic, we should be more interested when the market diverges from the norm.

A simple way to establish the “normal” condition is to fit an Ordinary Least Squares model to the data and see how much of the variance in volatility can be explained by volume.

Turns out, the majority of variance in volatility can be explained by the variance in volume. It’s important to note this doesn’t imply causality. A Q-Q plot of the residuals from the model shows that what’s left over roughly resembles a normal distribution, with slightly fatter tails.
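For anyone who wants to reproduce the fit, here is a rough sketch using NumPy only. The function name `fit_ols` is made up for this example, and I’m assuming a single-predictor model with an intercept. One handy sanity check: with one predictor, R² equals the squared Pearson coefficient, so if the model is fit on the same normalized series, a correlation of 0.75 corresponds to an R² of about 0.56.

```python
import numpy as np


def fit_ols(x: np.ndarray, y: np.ndarray):
    """Fit y = intercept + slope * x by least squares and return the residuals."""
    # polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    # Share of the variance in y explained by x
    r_squared = 1.0 - residuals.var() / y.var()
    return slope, intercept, r_squared, residuals
```

Here `x` would be the normalized volume and `y` the normalized true range, with the warm-up NaN rows dropped before fitting.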

The residuals represent the volatility that can’t be explained by volume. Large values indicate bars where the true range was large relative to the amount of volume traded, and vice versa. Sticking to the earlier hypothesis that markets are efficient except when they’re not, we’ll examine what happens when these values deviate from the norm. If volume and volatility are typically correlated, market anomalies might occur when the pair becomes decoupled. With luck, these anomalies will present profitable trading opportunities. To begin the analysis, we’ll divide the daily residual volatility values into quartiles. From there, the data is split based on whether the market closed higher or lower than it opened. The daily (non-annualized) Sharpe ratio of returns for each bucket is then calculated, as summarized below.
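The bucketing can be sketched roughly as follows. A few things here are my assumptions for the sake of the example: the column names (`open`, `close`, `residual`), the use of the next day’s close-to-close return as the return being measured, and the treatment of unchanged days as down days.

```python
import numpy as np
import pandas as pd


def bucket_sharpe(df: pd.DataFrame) -> pd.DataFrame:
    """Daily Sharpe ratio of next-day returns by day direction and residual quartile."""
    out = df.copy()
    # Quartile 1 holds the smallest residuals, quartile 4 the largest
    out["quartile"] = pd.qcut(out["residual"], 4, labels=[1, 2, 3, 4])
    # Days that close exactly at the open are lumped in with the down days
    out["direction"] = np.where(out["close"] > out["open"], "up", "down")
    # Assumption: the return of interest is the following day's close-to-close return
    out["next_ret"] = out["close"].pct_change().shift(-1)
    grouped = out.dropna(subset=["next_ret"]).groupby(
        ["direction", "quartile"], observed=True
    )["next_ret"]
    # Non-annualized Sharpe ratio: mean return divided by its standard deviation
    return (grouped.mean() / grouped.std()).unstack("quartile")
```

The result is a two-row table (up days and down days) with one column per quartile, which is easy to eyeball or pass to a heatmap.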

It appears that there may be an edge in shorting the market following up days that had larger-than-expected ranges and going long after down days with ranges narrower than predicted by our model. However, it’s hard to make a general statement about these results, given that neither high values nor low values were consistently favored. Ideally, information would be contained in either large-range or narrow-range bars.

**Extensions to Other Markets**

The first look focused on an arbitrarily-chosen ETF. To determine if similar effects pervaded other markets, I conducted the same analysis on a dozen other ETFs. Also included in the analysis were other ways of measuring volatility. The “bar_range” field was calculated as (high – low) for each bar, while “body_range” used abs(close – open).
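The two extra range measures are simple enough to compute directly from OHLC data. A quick sketch, assuming lowercase column names (`open`, `high`, `low`, `close`), which are placeholders rather than the notebook’s actual names:

```python
import pandas as pd


def range_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Alternative per-bar volatility proxies built from OHLC columns."""
    return pd.DataFrame(
        {
            # Full extent of the bar
            "bar_range": df["high"] - df["low"],
            # Distance between open and close, ignoring the wicks
            "body_range": (df["close"] - df["open"]).abs(),
        }
    )
```

Unlike true range, neither measure looks at the previous close, so overnight gaps don’t show up in them.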

All of the ETFs studied showed significant correlation between normalized volume and normalized true range, with 10 out of 13 exhibiting a coefficient of 0.6 or greater. It would seem that this correlation between volume and volatility is not exclusive to GDX. True range was at least as correlated as each of the other two range metrics for every ETF, so it was kept for further modeling. I won’t speculate as to why this effect is greater in some symbols than others.

Next, the edge present for each category was analyzed for each of the ETFs. Rather than splitting the data into quartiles as in the original study, the data was divided into up days and down days, small range days and large range days. A small range day was any day where the residual of the model was negative (less volatile than expected); large range days were those where the residual was positive. The mean daily return for each symbol was subtracted from the series of returns, to remove the effect of each symbol’s directional drift. The results are summarized below.
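As a rough sketch of that four-way split for a single symbol: the column names (`open`, `close`, `residual`, `ret`) are placeholders, `ret` stands for whichever return series is being evaluated, and I’ve arbitrarily put a residual of exactly zero in the small-range group.

```python
import numpy as np
import pandas as pd


def category_edge(df: pd.DataFrame) -> pd.Series:
    """Mean drift-adjusted return for each direction/range category."""
    direction = pd.Series(
        np.where(df["close"] > df["open"], "up", "down"), index=df.index
    )
    # Positive residual: more volatile than volume alone would predict
    size = pd.Series(
        np.where(df["residual"] > 0, "large", "small"), index=df.index
    )
    category = direction + "_" + size
    # Remove the symbol's average daily return so drift doesn't look like an edge
    demeaned = df["ret"] - df["ret"].mean()
    return demeaned.groupby(category).mean()
```

Running this once per ETF and stacking the results gives one row per symbol and one column per category.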

No clear trends jump out from the heatmap of results.

Analyzing the distribution of results for each category, perhaps there’s a bullish edge to days with smaller than expected volatility. Performing a one-way ANOVA on the four categories indicates the difference between the groups isn’t statistically significant (p=0.337). However, the test was only performed with a sample of 13 ETFs. More data is likely needed to determine whether there’s really anything here.
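For reference, the F statistic behind a one-way ANOVA is easy to compute by hand. The sketch below uses NumPy only and stops at the statistic; getting the p-value requires the F distribution, which `scipy.stats.f_oneway` handles. The function name `one_way_f` is made up for this example, and each group would be the per-ETF results for one of the four categories.

```python
import numpy as np


def one_way_f(*groups) -> float:
    """F statistic for a one-way ANOVA across the given groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

An F value near 1 means the spread between the category means is about what you’d expect from the noise within each category.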

**Conclusion**

We’ve shown that two sources of market data, volume and volatility, are often correlated. This knowledge is beneficial by itself, as it will help to avoid using redundant information in market analysis. By fitting a linear model to these two variables and extracting the residuals, we obtained a “purified” volatility measure, without the effects of volume. Although this indicator may present an edge for GDX, it doesn’t appear to generalize well to other ETFs. However, this was only the first look at the concept and there might be more to it. If this is useful in your analysis, or you find other uses for it, let me know in the comments below.

Jupyter Notebook for this post can be found here