Correlation, Causation, and … lag()

I gave a presentation in June at a MicroStrategy Meetup and used a simple data set to illustrate that even one dimension and three data points can yield interesting results. My data included the following three series:

  • Daily closing price of gold
  • Daily closing price of oil
  • Daily closing value of the VIX (fear index)
[Figure: Oil, gold, and VIX by time (one dimension, three data points)]

Visualized, the most recent four years of data look like this:

[Figure: Three values, graphed over time]

The complete data set:

[Figure: Gold, oil, and VIX graphed; the three values back to 1983]

My thesis in working with these three data points was that somewhere in this data we could find evidence of correlation. So I built reports that correlated each pair of series year by year and plotted the results:

The VIX and oil correlation swung sharply between 2007 and 2009, but the overall trend leading up to 2007 was inching upward toward 1. The sudden drop in crude prices in 2008/2009 could partially explain the easing of the VIX since the financial crisis.

When I plotted the VIX against gold, I saw more dramatic correlation swings year over year. I found these swings more interesting than the oil and fear relationship because I had assumed that these two would stay generally correlated above 0. Seeing the VIX and gold correlation dip so low in 2008 suggests that one wasn’t keeping up with the other.

In the last step I plotted oil and gold together and found similarly precipitous changes year over year. With the first two pairs there at least seems to be a pattern; with this last one, not so much. I was looking for some consistency in the correlation (staying either above or below 0), but I did not observe it in this data.

Rather than looking for an obvious, perfect match between these variables over time, my next thought was to insert a lag into the data and see whether some sort of offset would smooth out the relationships. The thinking is that socionomic forces exist behind these data points, but the shifts are either reactive or proactive. For example, the fear index may respond to changes in oil prices, or the daily price of gold may reflect speculation that the economy is worsening and that the only good place to invest assets is a common precious metal. To accomplish this, I created a series of objects in MicroStrategy that let me quickly change the lag parameter and test my assumption.

[Figure: Metric Editor, using an embedded lag function]

My correlation metric is defined as Correlation([Gold Close (lag n)], [Fear Index Close (lag n)]) {Year}, where {Year} sets the metric's level so that one coefficient is calculated per year.

Gold Close (lag n) is defined as Lag([Gold Close], ?[Lag Value Gold (Number)], 0), where the “?” represents a prompt value, so the lag distance is chosen each time the report runs.

From this I could run a series of quick tests. Using the standard deviation of the resulting correlation coefficients, I could see that embedding a negative lag (-30, -60, then -90) into the data began to lower the dispersion of values.
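The comparison behind these tests reduces to a small calculation: collect the yearly coefficients each lag value produced, take their standard deviation, and see which lag gives the smallest spread. A minimal Python sketch, assuming the sample standard deviation (statistics.stdev) and a hypothetical dispersion_by_lag helper whose input is whatever coefficients each report run returned:

```python
from statistics import stdev


def dispersion_by_lag(runs: dict[int, dict[int, float]]) -> list[tuple[int, float]]:
    """Rank lag values by the spread of their yearly correlation coefficients.

    runs maps each lag that was tried to that run's {year: r}. Returns
    (lag, sample standard deviation) pairs, smallest spread first.
    """
    spreads = [
        (lag, stdev(list(per_year.values())))
        for lag, per_year in runs.items()
        if len(per_year) >= 2  # stdev needs at least two years
    ]
    return sorted(spreads, key=lambda item: item[1])
```

A lower standard deviation means the yearly coefficients cluster more tightly, which is the "lower dispersion" the tests were looking for.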

[Figure: Standard deviation with a lag of -90]

I could certainly do more with this data. If I were desperate to find that perfect leading indicator that could predict where commodities or the S&P were headed, I suppose I would start by extending this and looking for the variation of the data that yielded the lowest possible standard deviation in correlation coefficients. Beyond the sheer number of possibilities this small data set affords, one could easily add more variables into the mix: the Dow closing price, pork belly futures, or foreign currency exchange rates.


