Business vs. IT — Who Should be Doing What?

With the advent of “self-service BI” and “visual data exploration,” a shift has occurred in the business intelligence world: organizations are struggling to determine who is supposed to be doing what in their analytics practice. On the one hand, the IT side of the house is concerned (rightly so) with managing security and handling the mechanics of how data is ingested and distributed within the corporation. The business, however, finds the system laden with controls and risk mitigation, and in pursuit of faster returns on its investment it clamors for faster and better response from the business intelligence technology stack.

I am asked quite often to help delineate from a feature perspective in MicroStrategy who should be doing what.  Here’s what I think the average organization has for resources, and my recommendation for what products each person should have access to.

Role | Functional Rollup | Access Level
BI Administrator | IT | EM, CM, OM, Admin Stuff (Job Monitor, etc.)
BI Developer – Dashboards, Documents | IT or Business | Desktop, Web
BI Developer – Reports | IT or Business | Desktop, Web
BI Developer – Schema | IT | Architect, Desktop, Web
BI QA Analyst | IT or Business | Desktop, Web
Business Analyst – Non-technical | Business | Web
Business Analyst – Technical | Business | Desktop, Web
Data Modeler / Architect | IT | Desktop, Web
Data QA Analyst | IT | Web
ETL Developer | IT or Business | Web
Graphic Designer | IT or Business | Web
Product Owner | IT or Business | Web
Program Manager | Business | Web
Project Manager | IT or Business | Web
Scrum Master | IT | Web
SDK Specialist – Java Programmer | IT | Desktop, Web, some Admin (as needed)
Director Engineering | IT | Web
Director (Marketing, Finance, etc.) | Business | Web
Network Engineer | IT | Web

I’ve highlighted the role of Technical Business Analyst because it is this role that really makes organizations tick.  I have had the pleasure to work with a few incredibly talented people who meet my criteria for this role, and they are often the unsung heroes of any BI implementation.  Here’s what I look for in a technical business analyst:

  1. They know the business cold.  They can tell you at an annoyingly granular level what calculation definitions should be, spoken in coherent “if-then-else” phrasing.  They also know the people who implemented the business rules, and they know why the calculation has to be the way it is.  This kind of institutional knowledge is priceless.
  2. Technical Business Analysts write SQL because they think in SQL.  It might not be pretty, and they may use brute force tactics to get what they need, but they can explain what they need from a data model (yes, they know how to read a data model).  When they explain what they need from the data model, they inject SQL statements into their explanation.   For example, they might say something like “we need the calculation to be driven off of the main sales fact table, but we need to do a full outer join to the customer table, but only where the customer has had activity in the last six months…”
  3. They stay current.  Great business analysts stay on top of upcoming features and they are innately in tune with why those features are coming out.  “Data wrangling?  I can totally use that…”  Technical business people get the beauty of big data because they aren’t afraid to write their own scripts to get the data if need be.
  4. They are patient.  My favorite technical business analysts understand that some of this stuff can be really hard to figure out, and they run cover for the development team.

Enterprise software sales cycles often start (and end) with the technical business analyst — and if you have someone who meets the criteria I just laid out, make sure they are happy…because they tend to know the answers, and if they don’t they know how to get the answers.  I love working with people that have the technical and business acumen because they act like translators every day and keep the peace between the two sides of the house.


New Book: Discovering Business Intelligence Using MicroStrategy 9

Happy Pi day!

Last fall I was invited to be a reviewer for a new book called Discovering Business Intelligence Using MicroStrategy 9 published by PACKT. The book offers an introduction to business intelligence using the free MicroStrategy Express offering. The authors take us through the concepts of business intelligence using real world examples and show us how to use the Visual Insight technology that has been made available by MicroStrategy.

So, a copy of it arrived in the mail yesterday and I am very happy for the authors Nelson Enriquez and Samundar Singh Rathore! Great job guys!

Hard copy:

Kindle version:

Splunk for Business Intelligence

Business Intelligence environments rely on many different components working in concert in order for the application as a whole to be successful.  The layers that matter include the actual application stack — i.e., the business intelligence tool and the databases that the BI system accesses.  A fluidly running application stack is useless if the network that glues the BI tool to the data warehouse or OLTP system isn’t working, or temporarily goes down.  Even when the application stacks are fine, and the network is fully connected, the hardware itself has to be fully functional and unpegged.  This is a simplified view, but breaking the system down into three basic parts makes for a useful mental model:

  • Application
  • Network
  • Hardware

For architects, dev ops / admins, developers, and your general stakeholder audience, having a holistic view of how these three components interoperate is important.  Enter Splunk.

Splunk is software that combs logs, machine data, and basically any sort of real-time data relevant to a system, and puts it all together in a single unified (web-based) interface.  Even better, for hacks like me, you can download it for free and put it on your own environment.  Working with Splunk is a lot like the game Othello, where the box says something to the effect that it takes minutes to learn and a lifetime to master.  The lifetime-to-master bit is in reference to regular expressions, which are simple in concept but can get very complex and intricate.  Here’s a screenshot from Splunk’s online documentation:


An example I have been using to get familiar with Splunk is the DSSERRORS.log file in MicroStrategy.  Looking at this log file can be a painful experience.

2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... merge conflict, resolved by keeping both objects
2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... new object placed in folder of replacement object
2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Committing to object save of the remaining 1 object(s)
2013-09-25 17:03:34.486-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Purging client side cache at the start of Source Manipulator operations
2013-09-25 17:03:34.658-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Examining Object(Name="Imp1" Type=Metric ID=92012DC34233D24BD12BA8B1EBCB4C2F Proj=A8F127FF42EBF5B2E6B9A39C39513B1E Ver=DE38ADCC436A69BC49140FADEB5F72FF ), replacement path = "Public Objects\Metrics"
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... merge conflict, resolved by keeping both objects
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... new object placed in folder of replacement object
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Committing to object save of the remaining 1 object(s)
2013-09-25 17:04:14.574-05:00 [HOST:SERVER2008][PID:2204][THR:5492][Metadata Server][Error][0x80041023] An object with name 'Imp1' and type 'Metric' already exists in the destination folder.
2013-09-25 17:04:22.481-05:00 [HOST:SERVER2008][PID:2204][THR:5492][Metadata Server][Error][0x80041023] An object with name 'Imp1' and type 'Metric' already exists in the destination folder.
2013-09-25 17:05:23.311-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Object Server][Error][0x80041019] Item 'MergeColumnHeaderCells' cannot be found.
2013-09-25 17:05:58.264-05:00 [HOST:SERVER2008][PID:2204][THR:5608][DMX][Info] CDmxPmmlrPKG::InitMaxXMLSize(215): Maximum XML size set to 3145728 bytes.
2013-09-25 17:05:59.655-05:00 [HOST:SERVER2008][PID:8740][THR:8984][DMX][Info] CDmxPmmlrPKG::InitMaxXMLSize(215): Maximum XML size set to 3145728 bytes.
2013-09-25 17:08:47.662-05:00 [HOST:SERVER2008][PID:2204][THR:5832][Kernel][Warning] MsiUserSession::DecrementProjectSessionCount: Admin user logout from project 11. Do not decrease Project-User level session count
2013-09-25 17:08:47.662-05:00 [HOST:SERVER2008][PID:2204][THR:5832][Kernel][Warning] MsiUserSession::hLogOutFromProject: Admin user logout from project 11. Do not decrease Project level session count
2013-09-26 08:49:48.195-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server checking License compliance has the following license key: xxxxxxxxxxxxxxxxxxxxx
2013-09-26 08:49:48.445-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is checking Named User Licenses.
2013-09-26 08:49:48.945-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is in Named User compliance based on contract 300242.
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server finishes checking Named User Licenses.
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server checking License compliance for activation has the following license key: xxxxxxxxxxxxxxxxxxxxx
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is checking License Activation.
2013-09-26 08:49:50.664-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Error] Distribution Service: MSINCSEngine::hParseRetreiveAppIDFromAPNSPEMFile: the name of iAPNSPEMFile is empty.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11.
2013-09-26 09:50:16.315-05:00 [HOST:SERVER2008][PID:2204][THR:3868][Query Engine][Error][0x80042300] QueryEngine encountered error: Execute Query failed.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11..
2013-09-26 09:50:16.331-05:00 [HOST:SERVER2008][PID:2204][THR:3868][Kernel][Error] [MicroStrategy Tutorial][Administrator][JobID=992] (SQL Generation CompleteQueryEngine encountered error: Execute Query failed.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11.. Error in Process method of Component: QueryEngineServer, Project MicroStrategy Tutorial, Job 992, Error Code= -2147212544.)

Most of the entries in the log file are noise, but at the end there is an indication that a SQL error occurred.  This is just one of the many types of incidents that can happen in a BI system.  Perhaps the database blipped, or a column was modified, or, as in this instance, a bad attribute with some wonky syntax in a pass-through statement was issued.

splunk monitor

Using a search on specific keywords with a wildcard (such as SQLExecDirect*) I can isolate the events in the DSSERRORS.log file and catch them well before they ever get to Enterprise Manager.  In the example above, I can see based on the last vertical bar chart that something happened.  In the reporting view I can drill into that event and identify exactly what the spike represents.
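As a rough illustration of the kind of pattern matching that this search performs, a few lines of Python can isolate the SQLExecDirect events in a DSSERRORS.log extract.  This is just a sketch of the filtering idea, not a Splunk substitute; the sample lines are abbreviated from the log above:

```python
import re

# Sample lines in the DSSERRORS.log format shown above (abbreviated)
log_lines = [
    "2013-09-26 08:49:48.445-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is checking Named User Licenses.",
    "2013-09-26 09:50:16.315-05:00 [HOST:SERVER2008][PID:2204][THR:3868][Query Engine][Error][0x80042300] QueryEngine encountered error: Execute Query failed.",
    "Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE]",
]

# Match either an [Error] severity tag or the SQLExecDirect keyword
error_pattern = re.compile(r"\[Error\]|SQLExecDirect")

errors = [line for line in log_lines if error_pattern.search(line)]
for line in errors:
    print(line)
```

The same pattern, entered as a Splunk search term, surfaces these events as they arrive rather than after the fact.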

When Splunk is reading the I/O on the application servers, the ETL error logs, and any network logs, all of the main areas of concern to the BI system can be centralized, and we have visibility into the pulse of BI in the enterprise.  Getting back to the idea that this can take a lifetime to master: fully integrating BI logs with all of the other components in the data ecosystem is where really understanding Splunk, getting creative with regular expressions, and combining them into reports comes into play.


Compounded Annual Growth Rate (CAGR) in MicroStrategy

The idea behind using a compounded annual growth rate is to smooth out the rate of change that something sees period over period for a successive number of periods.  Out of the box in MicroStrategy there is no function that creates this value automatically, so if a CAGR calculation is needed, either a custom metric or a custom subtotal is required.

The calculation for a compounded annual growth rate looks like this:
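In standard notation, with $n$ the number of compounding periods:

```latex
\mathrm{CAGR} = \left(\frac{V_{\text{end}}}{V_{\text{begin}}}\right)^{1/n} - 1
```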


In the CAGR equation we divide the ending value by the beginning value, raise the result to a power (one over the number of periods that compounded), and subtract 1 to get a percentage value.

…and in Excel, the formula uses the caret to denote that we are raising the ratio of change to a power equal to one divided by the number of periods that the data spans.
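The same arithmetic, sketched in Python (the function name and sample values are mine, chosen for illustration):

```python
def cagr(begin_value, end_value, periods):
    """Compounded annual growth rate: (end / begin) ** (1 / periods) - 1."""
    return (end_value / begin_value) ** (1.0 / periods) - 1

# Example: revenue doubles from 100 to 200 over 5 years
growth = cagr(100.0, 200.0, 5)
print(f"{growth:.4f}")  # roughly 0.1487, i.e. ~14.87% per year
```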


To do this in MicroStrategy, the best place to apply this logic is in a subtotal.  For the uninitiated, the ability to create new subtotals is very powerful, and often the best way to shoehorn a report look and feel into a grid.

To create a new subtotal for CAGR in MicroStrategy we have to specify the exponent using the power() function, and the at sign ({@}) serves as the placeholder for the level at which the subtotal is applied.


Here’s the code for the calculation:

(POWER((LAST(x) {@}  / FIRST(x) {@} ), (1 / COUNT(YEAR) {@} )) - 1)

The second-to-last step in the process is to add the subtotal to a report.  Since this subtotal is for a year, adding the CAGR means that it should be the last column on the grid report.  To make the CAGR subtotal visible as an option, it needs to be added to the list of available subtotals for the metric.  In this example we are working with a revenue calculation, so the new subtotal needs to be made available to the Revenue metric.


The final piece once there is a report that contains both the Year attribute and the Revenue calculation is to add the subtotal at the right level.  For this, go to the advanced subtotals section of the grid, and put the CAGR subtotal across the Year level.


The final output should look something like this:


Note that the calculation used in this subtotal is for year, but the concept can apply to any time period.  With the same implementation, but using a different attribute (month) the calculation will only differ by the attribute name:

(Power((Last(x) {@}  / First(x) {@} ), (1 / Count(Month) {@} )) - 1)


The Utility of Custom Languages

I’ve spent an inordinate amount of time these past few weeks understanding how translation and languages (a.k.a. locales) work in MicroStrategy.  The concept is simple: the metadata and/or the data in the data warehouse can be represented differently for users depending on the preferred language.  This means that there has to be a mechanism for mapping what the user sees to whatever is stored in the metadata or the warehouse.  For the data warehouse, data translation involves dynamically selecting lookup tables that have descriptions tied to the surrogate keys in the warehouse, and for the metadata, translation means associating a language with an object and identifying the label that gets called when the object (attribute, metric, prompt, etc.) is called.

Consider, for example, a lookup table that has multiple columns supporting different languages, and which is accessed dynamically based on the language preference the user has been set up with.
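A minimal sketch of that lookup-table approach, using SQLite as a stand-in warehouse (the table, columns, and helper function are illustrative, not MicroStrategy's actual schema): the query simply selects the description column that matches the user's locale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE LU_PROMOTION (
        PROMOTION_ID   INTEGER PRIMARY KEY,
        PROMO_DESC_EN  TEXT,
        PROMO_DESC_DE  TEXT
    )
""")
conn.execute(
    "INSERT INTO LU_PROMOTION VALUES (1, 'Holiday Sale', 'Weihnachtsverkauf')"
)

def promo_desc(conn, promo_id, locale):
    # Pick the description column based on the user's language preference
    column = {"en": "PROMO_DESC_EN", "de": "PROMO_DESC_DE"}[locale]
    row = conn.execute(
        f"SELECT {column} FROM LU_PROMOTION WHERE PROMOTION_ID = ?", (promo_id,)
    ).fetchone()
    return row[0]

print(promo_desc(conn, 1, "de"))  # Weihnachtsverkauf
```

In MicroStrategy the column selection is handled by the engine at SQL-generation time rather than by application code, but the data shape is the same.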


Metadata translation works slightly differently; translations are created and modified by right-clicking on any attribute and choosing the “translate” option.


The reason I’ve been spending so much time looking into languages is that, unlike many things in MicroStrategy, the creation or alteration of languages and translations does not have a corresponding function available in Command Manager.  This has taken me down an obscure alley in MicroStrategy to a place called the Repository Translation Wizard.  The RTW is an executable that sets up a staging area for the translations that have to occur for selected objects.  I had seen this aspect of MicroStrategy before but never had a reason to use it; for the uninitiated, it is under the Object Manager folder:


So, essentially the RTW is a utility that can do a bulk replacement of object names.  For a selected project the developer would choose all of the objects that require translation, and the utility creates an XML dump file with the proper object IDs and names.


At this point the RTW repository is essentially a staging area where the object information is dropped, with a placeholder column for the translation.


So, a little SQL update statement…

update [MSTR_MD].[dbo].[RST_MicroStrategy_Tutorial_RTW_TESTER]
set TRANSLATION = 'Promotion for Client1'
where OBJECTID = '1B7A3F0249C78AD99C3A05A7274389BC'
and READABLEKEY = 'Object Name'

…and then check that the object has been translated…


…and when Client1 logs into the project the report will render with their custom name…


Scenarios for this type of implementation are somewhat limited to highly customized environments where every user expects the report to feel specific to their needs and their business.  For example, if the Promotion attribute was used for departmental reporting, the language could be used to make the attribute label look and feel like it was designed specifically for that user.

MicroStrategy and AWS

I was surprised (but pleased) to find that MicroStrategy is now available on the Amazon Web Services Marketplace.  For the past several months I had been watching the business intelligence offerings there but was seeing mostly Jaspersoft, BIRT, and a handful of smaller players that I was not familiar with.  Seeing MicroStrategy available, especially the free reporting suite, means that kicking the tires, spinning out a new instance for a prospect demo, or simply trying out 9.3 in a temporary environment is cheap and easy to do.  I also see MicroStrategy’s embrace of AWS, along with their very early support for Redshift, as signs that MicroStrategy is committed to cloud architecture, and not just their own cloud offering.

I took the MicroStrategy Suite on AWS for a test drive through my AWS account and had it up and running within 30 minutes.  The one trick was realizing that I had to decrypt the key pair I generated so that I could log into the server from remote desktop.  Once I had that in place I had no troubles getting to the machine, activating the server, and then connecting to it from my local 9.3 instance.

Unless I win the $550M Powerball jackpot tonight I’ll probably terminate the server before the weekend.  In most companies today, to set up a VM like this I would have had to request a ticket for a new VM, haggle with the VM admin over how much memory I was allowed (I like to have at minimum 8 GB for development, and 64 GB for production), and then have to download the MicroStrategy media, move it to the machine, unpack it, install it, and then reboot.  Using AWS I didn’t have to haggle with anyone; I opted for the $0.46/hour, 7.75 GB RAM version, and instantiated the server.  All the necessary ports were open (also a typical hassle) and I was up and running without any hiccups.

SQL Server 2008 comes pre-installed on the image, and the metadata and warehouse tables are all contained in SQL Server.  Even though this was a preconfigured instance I had to run the cubes in order to get the Visual Insight reports to work in the Tutorial.  I also tweaked the memory and report governing settings, but it’s understandable that none of these were pre-configured.

I haven’t tested the mobile piece yet, but the server was fully configured and I doubt that the setup will be different than the initial spinout.  So…again, MicroStrategy goes off and does something, neglects to tout it, and through some luck I stumble upon something they’ve done that is developer / engineer friendly.

Extended Data Access

When MicroStrategy first announced their extended data access (XDA) support many moons ago, the concept didn’t really resonate with me.  Most of the use cases for XDA were one-off hacks that didn’t really have a place in traditional data-warehouse-based environments.  In recent years, however, the number of open access APIs has exploded, and combined with multi-source capabilities, dashboards, Visual Insight, and especially Mobile, the ability to gather a data set from a REST, SOAP, or WebDAV source means that the data warehouse isn’t as critical as it used to be.

One of my favorite sites to peruse is — when the site was first created much of the data was a disparate set of Excel dumps, XML fragments, CSV, text, etc.  Not only does this make it harder to integrate these formats, but it requires someone to update the extracts.  Since I have an interest in environmental and regulatory matters, I wanted to integrate business intelligence with some EPA data.  I have also been doing some research to find ways to avoid writing ETL jobs whenever possible, and having the option to use a RESTful interface within MicroStrategy means that I can integrate data without having to write a set of data integration packages.

The high-level flow of tasks needed to create a report based on a data API is as follows:

  1. Create a database instance for XQuery
  2. Gather the proper XQuery syntax using the XQuery Editor and Generator
  3. Create a new Freeform SQL report
  4. Drop the XQuery definition into the Freeform SQL editor
  5. Create a Visual Insight document on top of the new report

So, creating a new XQuery database instance is pretty straightforward.  The only trick needed is to reference the XQuery database type:

The real effort comes when getting the proper xquery syntax together.  When I first started researching how to create a report, the documentation wasn’t very clear on what the syntax needed to look like, so I was worried that it would entail a lot of hand coding, which leads to mistakes, which leads to me getting frustrated and then giving up.  But as I started to play with the editor, I realized that it was quite flexible, and could pretty quickly spin out the syntax I needed.

So, long story short, here’s what the query looks like in a Freeform SQL report:

None of this needs to be typed.  The Xquery Generator and Editor handles it all.  In this example I am using the following URL to request data from the EPA (New Hampshire):

The full query:

declare copy-namespaces no-preserve, no-inherit;
declare namespace mstr-rest = '';
declare function mstr-rest:post($uri,$payload,$http-headers) external;

let $uri := ('')
let $http-headers := ( )
let $payload := ( )
let $result := mstr-rest:post($uri,$payload,$http-headers)

return
<Table>
<ColumnHeaders>
<ColumnHeader name='FACILITY_NAME' type='xs:string' />
<ColumnHeader name='PREF_LATITUDE' type='xs:string' />
<ColumnHeader name='PREF_LONGITUDE' type='xs:string' />
<ColumnHeader name='FAC_CLOSED_IND' type='xs:string' />
</ColumnHeaders>
<Data>{for $ancestor0 in $result/tri_facilityList/tri_facility
order by $ancestor0/FACILITY_NAME
return <Row>
<FACILITY_NAME>{fn:data($ancestor0/FACILITY_NAME)}</FACILITY_NAME>
<PREF_LATITUDE>{fn:data($ancestor0/PREF_LATITUDE)}</PREF_LATITUDE>
<PREF_LONGITUDE>{fn:data($ancestor0/PREF_LONGITUDE)}</PREF_LONGITUDE>
<FAC_CLOSED_IND>{fn:data($ancestor0/FAC_CLOSED_IND)}</FAC_CLOSED_IND>
</Row>}</Data>
</Table>
This entire string was created by the editor (found in the MicroStrategy –> Desktop folder).  The only trick that I had to figure out was that the fields in the tree view response section have to be dragged and dropped onto the Main Grid View section.  When the desired fields are in place, the code can be generated and loaded.  The full query is available in the XQuery generation window and can be copied as-is and dropped into the Freeform SQL.

Once the query is in the report editor, the number of XML ColumnHeaders has to correspond to the number of attributes / metrics defined in the report.  In my example there are four column headers, thus four attributes in my report.

Because the data that is returned by the URL is XML, MicroStrategy knows how to crosstab and render the results.  I had to make one little adjustment to the data: putting a minus sign in front of the PREF_LONGITUDE field, since otherwise the data maps to Russia…which isn’t very useful.
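A rough Python sketch of what happens to the response: parse the tri_facility elements and negate PREF_LONGITUDE so the points land in the Western Hemisphere.  The sample XML here is mocked up by me in the shape of the TRI response; the real payload comes from the EPA REST endpoint.

```python
import xml.etree.ElementTree as ET

# Mocked-up fragment in the shape of the EPA TRI response
sample_xml = """
<tri_facilityList>
  <tri_facility>
    <FACILITY_NAME>EXAMPLE PLANT</FACILITY_NAME>
    <PREF_LATITUDE>42.995</PREF_LATITUDE>
    <PREF_LONGITUDE>71.455</PREF_LONGITUDE>
    <FAC_CLOSED_IND>0</FAC_CLOSED_IND>
  </tri_facility>
</tri_facilityList>
"""

root = ET.fromstring(sample_xml)
rows = []
for fac in root.findall("tri_facility"):
    rows.append({
        "name": fac.findtext("FACILITY_NAME"),
        "lat": float(fac.findtext("PREF_LATITUDE")),
        # The feed reports longitude unsigned; negate for the Western Hemisphere
        "lon": -float(fac.findtext("PREF_LONGITUDE")),
        "closed": fac.findtext("FAC_CLOSED_IND"),
    })

print(rows[0]["name"], rows[0]["lon"])
```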

The resulting data is the lat / long coordinates of EPA toxic release sites. The high level data doesn’t indicate exactly what has happened at the site, but presumably additional extended queries to the other TRI tables would yield more information. The lat and long is easily consumed by the Visual Insight report, giving us this final output for Pennsylvania (my link gives New Hampshire data which is smaller and comes back faster):

One note: I haven’t been able to tell whether MicroStrategy will be able to consume JSON as a data format any time soon.  Doing so would presumably expand the number of native data sets that MicroStrategy could report off of.

MicroStrategy has several features that they just seem to throw out there without much fanfare or for that matter documentation.  Having SOAP and REST support is important and opens up the platform to doing pretty interesting (and real-time) things — especially on Mobile devices.  Here’s just one idea: combine NOAA and Twitter data to localize weather information and sentiment.  Is it really cold outside?  If so, just how cold?  500 tweets in your zip code in the last hour cold???


Hubway Data Challenge – Final

While the Hubway Data Challenge has been an opportunity to explore some of the newer capabilities in MicroStrategy 9.3, it was also an opportunity to work with a new data set and get some practice modeling geolocation data.  Working with other developers, and seeing some of the submissions, has informed my understanding of what tools are out there and what other people are doing with commercial and open source visualization tools.  The concept of animating this data on top of a map has particular appeal and is appropriate for this type of data.

This submission caught my attention, but there are a few others that have been submitted in recent days.

For my part, I was committed to sticking with the 10/31 deadline even though the deadline was extended.  I had no shortage of ideas, but ultimately I had to pick something to show and share.  The new network diagrams in 9.3 are clean looking and visually pleasing.  So, I picked one bike (the mayor’s) and graphed its usage.  I set the export option for dashboards to .pdf (as opposed to .mht) and was able to create a portable document from the Visual Insight interface.

The .pdf file is available here.

The other thing I wanted to try with the data was something using the maps.  It took some trial and error and an angry email to Google before I realized that I didn’t in fact have to pay $10,000 for a Google maps API key.  To get the data in a format that would work for the maps, I decided to create a running sum of the trips by station over time.  There were two things I had to do within MicroStrategy to make this work:

1. Set the VLDB property to retain all of the lookup table information

2. Create a running sum metric to accumulate trip information:

Within the grid report the output had to look something like this before I could build a cube off of it:
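The running-sum shape can be sketched in Python: sort the trips by date within each station and accumulate.  The station names and counts here are illustrative, not from the actual Hubway data.

```python
from itertools import accumulate

# Trips per station per day, already grouped: station -> [(date, trips), ...]
data = {
    "South Station": [("2012-08-01", 40), ("2012-08-02", 35), ("2012-08-03", 50)],
    "Fenway":        [("2012-08-01", 12), ("2012-08-02", 18)],
}

running = {}
for station, rows in data.items():
    dates = [d for d, _ in rows]
    totals = list(accumulate(t for _, t in rows))  # cumulative trips over time
    running[station] = list(zip(dates, totals))

print(running["South Station"])  # [('2012-08-01', 40), ('2012-08-02', 75), ('2012-08-03', 125)]
```

In MicroStrategy the RunningSum function does this accumulation at report execution time; the grid just has to present the station, the date, and the accumulated total.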

The next step was to create a Visual Insight document using the map style.  I had tried to use the density version but my data was not disparate enough to use close up, so I used the traditional map and used bubbles to illustrate the data.  I set the geographic attribute to be the start station, and because I modeled in the lat and long as part of the attribute form (as latitude and longitude data types) Visual Insight recognized the data.  The last step was to put the start date into the page by and from there I could take advantage of the play button.

I used the Quicktime player on my laptop to record my screen and then I used iMovie to speed up the movie to condense something that took ~6 minutes in MicroStrategy to play in about 24 seconds.  This became the basis for my second submission to the Hubway Data Challenge.  I posted this on Youtube:

My overall impression of 9.3 has been favorable so far.  The cleaned-up UI is pleasing, the interface is faster, and the Visual Insight enhancements not only look good, but they work pretty well too.  Having spent some time with other developers at the hack day event I realized just how many people are using R, and with the integration of R into MicroStrategy there are certainly some possibilities to consider there.  I know from past experience that the number of freely available R packages keeps growing, and that the corresponding library of visualizations for R is growing with it.  This offers many possibilities for MicroStrategy development and deployment, and certainly extends the platform in ways that I am looking forward to exploring.





Hubway Data Challenge – Part 3

I’m out of my league.

I’ve seen two submissions for the data challenge already, and yeah, some people are really good.  For the rest of us who model data and try to create a narrative out of data on a daily basis, a very discrete data set with a very precise way of looking at it is a luxury.  Usually we work with a user population who thinks “I want a dashboard, but I don’t know what I want…can you build it for me?”  Not an issue here…the developers working on this exercise know exactly what they want to see.  The Hubway data set lends itself to being mapped, and while there are a lot of ways to do this, this submission by Russell Goldenberg is very creative.  It can be downloaded as an .app or an .exe — or previewed (click on the picture to see it on Vimeo).

Then there’s this Neo4j graph database version by Max De Marzi.  I like how the width of the station corresponds to its traffic.

These aren’t dashboards, these are visualizations.  Appealing — yes — but very specific, and custom coded.  I have to keep that in mind, because working with business intelligence tools means that I am often trying to spice up relatively unexciting data.  But MicroStrategy 9.3 and the features added to Visual Insight (as well as the Report Services widget) mean that I at least have something to work with now in terms of network maps.  The three layout options are as follows:

  1. force-directed
  2. circular
  3. linear
To put myself in a position to visualize the bike system data I created a cube that had both the bikes and the stations as attributes, as well as the basic metrics (trips, distance, duration).  With these levers to work with I came up three views of the data for bike number B00079 in the months of August 2011 and 2012:
Since dashboards tend to be a collection of visualizations that, when combined, compel the user to quickly glean some information from the data set, these network diagram might have some utility.  If I wanted to create a dashboard that allowed the user to understand what bikes have been used more heavily, the network view along with some numeric stats would be useful for devising a maintenance strategy for bikes.  The trap that I fell into when I first started playing with the network graphs was to try and put a lot of data into it all at once.  Looking at all of the starting and ending stations at once was a mess, but adding the bike attribute narrowed the data set down and brought clarity.
One of the big advantages of Visual Insight is that all of the hooks and interdependencies between data sets are taken care of.  As opposed to a Report Services dashboard, I don’t need to set the selectors or worry that I’ve forgotten to establish a target data set.  The tradeoff comes with customization, or the lack thereof: I don’t have a place to drop a customer logo, and I don’t have control over many other things, like the grid formatting.
So, the question comes to this: to make a nice visualization of the Hubway data set, should I create a Report Services dashboard or use Visual Insight?

Hubway Data Challenge Part 2

Once I had the Hubway bike system data loaded into a database, and modeled into MicroStrategy, I could start to play with the data and do some basic profiling.  The more I looked at the data, the more I wanted to add things.  For example, the trips table has the birth year of subscribing riders, which lends itself to creating an age attribute.  To model in age, I created an attribute using the pass-through ApplySimple function.  This is the basic syntax needed: ApplySimple(“(year(#0)-#1)”, [start_date_desc], [birth_date]).

When added to a report by itself, the age attribute will generate the following SQL:

SELECT DISTINCT (YEAR(a11.start_date_desc)-a11.birth_date) CustCol_3
FROM trips a11
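For comparison, the derivation the database is doing is just the trip year minus the rider’s birth year. A quick Python sketch of the same logic (the function name and inputs are mine, not MicroStrategy’s):

```python
from datetime import date

def rider_age(start_date: date, birth_year: int) -> int:
    # Mirrors ApplySimple("(year(#0)-#1)", [start_date_desc], [birth_date]):
    # the year of the trip's start date minus the rider's birth year.
    return start_date.year - birth_year

print(rider_age(date(2012, 8, 15), 1985))  # 27
```

Note this is age at year granularity; a rider whose birthday falls later in the year will be counted a year older than they actually were at trip time, which is usually fine for profiling.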

As mentioned in the part 1 post, the data offers the opportunity to add more layers and texture because the dimensions are so generic.  Latitude and longitude coordinates can be used to derive elevation, which would answer one of the questions on the Hubway Data Challenge web site: do Hubway cyclists only ride downhill?  A date dimension could be used to correlate against the academic schedule, or even gas prices.  Anyway, on to the eye candy…

For those of you who read Stephen Few, you know that visual design isn’t easy.  Few’s philosophy espouses simplicity and elegance over complexity and flash.  If your audience can’t generally understand the data in less than ten seconds, you have failed them.  Basic, muted colors that make careful use of highlights are preferred over harsh and bright tones throughout.  These are all great recommendations, and as I progress through the different phases of my interaction with the data I will adhere more closely to them.  In the meantime, I simply want to profile the data using some basic charts and graphs.  The alternative to graphing the data is wide and long grids of numbers with little visual appeal.  The tradeoff is that you get to pivot the data, sort it, filter it, etc., but exceptions, trends, and a general sense of the data quality don’t readily present themselves.

So, a quick and dirty way to start to understand the data is to graph it.  I have gotten used to the MicroStrategy graphing options, but many developers will cite the core graphing technology as one of the weaker aspects of the platform.  The widgets and Visual Insight graphics have exceeded the Desktop graphing capabilities, but I still like to use the graph formatting to create vertical and horizontal bar charts, scatterplots, and time-series analyses.  So, simply to get a flavor of the data, I created a few graphs.

This graph shows the activity (trips) for a month, selected in the page-by, and I tried to see whether there was a way to quickly tell if temperature spikes led to a decrease in usage.  To do this correctly I’d likely want to average out the trips by weekday and use a rolling temperature average.  Only with those means in place can I get a true understanding of whether a 10 degree shift in temperature leads to an n% variation in usage.

Trips and Temp, dual axis
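The weekday averaging and temperature smoothing described above can be sketched in a few lines of Python. The daily rows below are invented sample values, not the actual Hubway or weather data:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily rows: (weekday, trip_count, avg_temp_f); values invented.
days = [
    ("Mon", 310, 68), ("Tue", 295, 72), ("Wed", 320, 90),
    ("Thu", 240, 95), ("Fri", 330, 70), ("Mon", 300, 66),
]

# Average trips per weekday, to normalize out day-of-week effects.
by_weekday = defaultdict(list)
for wd, trip_count, _ in days:
    by_weekday[wd].append(trip_count)
weekday_avg = {wd: mean(v) for wd, v in by_weekday.items()}

# A simple 3-day rolling temperature mean to smooth out one-day spikes.
temps = [t for _, _, t in days]
rolling_temp = [mean(temps[i - 2 : i + 1]) for i in range(2, len(temps))]
```

With both series in hand, comparing usage deviations from the weekday mean against the smoothed temperature is a more honest test than eyeballing raw dual-axis lines.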

One of the data challenge sample questions asks whether rentals after 2 AM are more likely to come from the under 25 population.  I extended this question to ask whether usage varies by gender.  I took liberties with the coloring for effect, but I would mute these tones in a dashboard.  I also incorporated another axis (trip distance) to see whether rides are longer at certain times in the day, but since I didn’t use an average metric, the second axis isn’t very meaningful.

Male Female Usage by Hour

No basic correlation study should go without a scatterplot.  The r values are included, but aren’t very telling.  To make this graph work I had to clear out the trips with 0 distance (i.e., where the rental and return stations are the same).  Because this graph also had month in the page-by, some months showed a higher r value than others.  Again, I’m simply using this to get a feel for the data and some general answers to high-level questions.

Scatterplot, male female correlation
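The same filter-then-correlate step is easy to reproduce outside the tool. Here is a small Python sketch computing Pearson’s r from scratch; the (distance, duration) pairs are made-up sample values:

```python
from math import sqrt

def pearson_r(xs, ys):
    # Textbook Pearson correlation: covariance over the product of
    # the two standard deviations (unnormalized forms cancel out).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (distance_km, duration_min) pairs; drop the zero-distance
# round trips before correlating, as in the scatterplot.
pairs = [(0.0, 12), (1.2, 9), (2.5, 18), (3.1, 24), (4.0, 30)]
filtered = [(d, m) for d, m in pairs if d > 0]
r = pearson_r(*zip(*filtered))
```

The zero-distance filter matters: round trips carry a duration but no distance, so leaving them in drags r toward zero for no interesting reason.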

Based on some feedback from a colleague, I tried labeling the axes.  I tied the color of each axis label to its metric, and this is what I got.  To me this graph is telling in that it appears to suggest that as the bike rental program became part of the city culture, people started taking longer rides.

dual axis trending

So, it’s a start.  With some basic profiling underway I am starting to compile a list of some high level questions that might be telling or informative about the data.  Station analysis and trip patterns are a good place to go with the data, and some of the questions that I’ve started to formulate go along these lines:

  • Which station sees the most usage?
  • What percent of trips end and start in the same place?
  • Which bikes see the most usage, and of them, what side of the river do they spend the most time on?
  • How has usage changed this year versus last year?  Can the data be used to illustrate the growth of the program in some neighborhoods versus others?
These questions have parallels in the business community, and represent the typical deep dive that a business analyst would do.  The next layer of analysis is to take this data set and make predictions against it.  For example, looking at the data at the end of July, how closely could I predict usage at the various stations using the historical data, especially the trends elicited from July 2011?  Given a time of day, could I predict what percentage of bikes rented from station x will wind up at station y?  If overcrowding at a station is a problem, and people can’t drop their bikes off because the racks are full, do I need to transport bikes away from certain stations at a certain time of day?
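A reasonable first baseline for the station-usage prediction is to assume each station keeps its prior-year share of system-wide traffic. A minimal sketch, with invented station names and counts:

```python
# Naive prior-year-share predictor: scale each station's August 2011 share
# of system traffic by this year's observed system total.
# All numbers here are made up for illustration.
aug_2011 = {"South Station": 400, "North Station": 250, "Harvard Sq": 350}
system_total_2012 = 1200  # observed system-wide trips for the period

total_2011 = sum(aug_2011.values())
predicted_2012 = {
    station: round(system_total_2012 * trip_count / total_2011)
    for station, trip_count in aug_2011.items()
}
```

Any fancier model (time-of-day transition matrices, neighborhood growth terms) has to beat this share-based baseline to be worth the effort, which is exactly how the business-side version of the question would be framed.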