Correlation, Causation, and … lag()

I gave a presentation back in June at a MicroStrategy Meetup and I used a simple data set to illustrate that even one dimension and three data points can yield interesting results.  My data included the following three things:

  • Daily closing price of gold
  • Daily closing price of oil
  • Daily closing value of the VIX (fear index)
Oil, Gold, and Vix by Time
One dimension, three data points

The most recent four years of data, when visualized, look like this:

Three values, graphed over time

The complete data set:

Gold, Oil, VIX graphed
The three values, back to 1983

My thesis in working with these three data points was that somewhere in this data we could find evidence of correlation.  So, I went about the task of building out some reports that correlated some of the combinations of the data and I plotted them out:

The VIX and Oil saw a high correlation swing between 2007 and 2009, but the overall trend leading up to 2007 was inching upwards to 1.  The sudden drop in crude prices in 2008/2009 could partially explain the easing of the VIX since the financial crisis.

When I plotted the VIX against gold, I saw more dramatic correlation swings year over year.  I found these variations more interesting than the oil and fear relationship because I had assumed that these two would stay generally correlated above 0.  To see the correlation between the VIX and gold dip so low in 2008 suggests that one wasn’t keeping up with the other.

In the last step I plotted oil and gold together, and found similar precipitous changes year over year.  With the first two correlations there at least seems to be a pattern, but with this last one not so much.   What I was looking for was some consistency (stay above or below 0) in the correlation, but I did not observe this with this data.
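For readers who want to try the year-by-year comparison outside MicroStrategy, here is a minimal Python sketch of the idea: group the daily rows by year and compute a Pearson correlation coefficient for each year. The function names and the sample rows are made up for illustration; they are not the reports or the data used in this post.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Divides by zero if either sequence is constant, so a real report
    would need to guard against that case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_correlation(rows):
    """rows: iterable of (date, close_a, close_b). Returns {year: r}."""
    by_year = defaultdict(lambda: ([], []))
    for day, a, b in rows:
        by_year[day.year][0].append(a)
        by_year[day.year][1].append(b)
    return {year: pearson(a, b) for year, (a, b) in sorted(by_year.items())}


# Made-up closing values, for illustration only (not the post's data).
sample = [
    (date(2007, 1, 2), 10.0, 1.0),
    (date(2007, 1, 3), 11.0, 2.0),
    (date(2007, 1, 4), 12.0, 3.0),
    (date(2008, 1, 2), 10.0, 3.0),
    (date(2008, 1, 3), 11.0, 2.0),
    (date(2008, 1, 4), 12.0, 1.0),
]
result = yearly_correlation(sample)
```

In the sample, the two columns move together in 2007 and in opposite directions in 2008, so the yearly coefficient flips sign. That is the kind of swing above and below 0 described in the charts.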

Rather than looking for the obvious perfect match between these variables over time, my next thought was to insert a lag into the data and see whether some sort of offset would smooth out the relationships.  The thinking behind this is that socionomic forces exist behind these data points, but the shifts are either reactive or proactive.  For example, it is possible that the fear index responds to changes in oil prices, or that the daily price of gold reflects speculation that the economy is worsening and that the only good place to invest assets is in a common precious metal.  To accomplish this I created a series of objects in MicroStrategy that allowed me to quickly change the lag parameter and test my assumption.

Metric Editor
Using an embedded lag function

My correlation metric is defined as Correlation([Gold Close (lag n)], [Fear Index Close (lag n)]) {Year}

Gold Close (lag n) is defined as:  Lag([Gold Close], ?[Lag Value Gold (Number)], 0) where the “?” represents a prompt value.
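To make the shift concrete, here is a small Python sketch of a lag with a default fill value. The `lag` helper below is hypothetical, and its sign convention (positive offset looks back, negative offset looks ahead) is my assumption for illustration; it is not a statement of how the MicroStrategy function treats negative offsets.

```python
def lag(values, offset, default=0):
    """Shift a series by `offset` rows, filling gaps with `default`.

    Assumed convention: a positive offset returns the value `offset` rows
    earlier; a negative offset returns the value `-offset` rows later.
    """
    n = len(values)
    shifted = []
    for i in range(n):
        j = i - offset  # index of the row whose value lands on row i
        shifted.append(values[j] if 0 <= j < n else default)
    return shifted


gold = [100, 101, 103, 106]  # made-up values, not real gold prices
back_one = lag(gold, 1)      # [0, 100, 101, 103]
ahead_one = lag(gold, -1)    # [101, 103, 106, 0]
```

One thing this sketch makes visible: with a default of 0, the padded rows at the edge of the series would enter a correlation as if they were real closing prices. Over a long daily series the effect is probably small, but it may be worth excluding those rows.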

From this I could run a series of quick tests.  Using the standard deviation of the yearly correlation results, I could start to see that embedding a negative lag (-30, -60, then -90) into the data started to lower the dispersion of values.

Standard Deviation - Lag -90

I could certainly do more with this data, and if I were desperate to find that perfect leading indicator that could predict where commodities or the S&P were headed, I suppose I would start by extending this and looking for the variation of the data that yielded the lowest possible standard deviation in correlation coefficients.  Beyond the sheer number of possibilities this small data set affords me, one could easily start to add more variables into the mix: the Dow closing price, pork belly futures, or foreign currency exchange rates.



Still prepping the site but mostly there…

Still tweaking the widgets and the plugins for the site.  Below was the original screenshot I used for the banner on my site.  I needed to blur it a bit, hence I reversed the colors and made the white background black.

Original dual-axis chart from the time series widget in a MicroStrategy dashboard — this was from one of the network dashboards we created.

As I have been working through the list of things I want to initially cover, I’ve realized how many sites and blogs get started and then die from lack of momentum.  Reading through other blogs and books about the topic I realize that writing like this requires a cadence and a discipline.  I can’t imagine keeping a site with the regularity that Paul Krugman does but keeping the site fresh for me means writing about a litany of topics.

I’ve been debating whether to include some of the product ideas that I’ve had over the years, but I think I can write about aspects of my ideas without revealing the whole concept.  If my readers can put the sum of the parts together, then all the better.

The last idea I mocked up was a simplified BI interface that borrowed from the Windows 8 theme of boxed simplicity.

make it easier
simplified UI for BI – too many colors?

I’ll write more on this one in a future post, but the exercise of putting together a wireframe like this was almost as much fun as coming up with the idea.

In the meantime my to-do list is getting longer by the day and the time I have to knock these items off is getting shorter.  The wish list includes:

  • MicroStrategy install on CentOS
  • Reviewing the enhancements in MicroStrategy 9.3
  • Hooking the Cloudera VM to 9.3
  • Completing the Coursera class
  • Skinning MicroStrategy Mobile in Xcode

…and next week will include the TDWI conference in Boston and the MicroStrategy Meetup.