Splunk for Business Intelligence

Business Intelligence environments rely on many different components working in concert for the application as a whole to be successful. The layers that matter include the actual application stack, i.e., the business intelligence tool and the databases that the BI system accesses. A smoothly running application stack is useless if the network that glues the BI tool to the data warehouse or OLTP system isn’t working, or temporarily goes down. And even when the application stack is fine and the network is fully connected, the hardware itself has to be functional and not pegged at capacity. This is a simplified view of the world, but breaking it down into three basic parts makes for a useful framing:

  • Application
  • Network
  • Hardware

For architects, DevOps/admins, developers, and the general stakeholder audience, having a holistic view of how these three components interoperate is important. Enter Splunk.

Splunk is software that combs logs, machine data, and basically any sort of real-time data that is relevant to a system and puts it all together in a single unified (web-based) interface. Even better, for hacks like me, you can download it for free and install it in your own environment. Working with Splunk is a lot like the game Othello, where the box says something to the effect that it takes minutes to learn and a lifetime to master. The “lifetime to master” bit is a reference to regular expressions, which are simple in concept but can get very complex and intricate. Here’s a screenshot from Splunk’s online documentation:

[Screenshot: regular expression examples from Splunk’s online documentation]

An example I have been using to get familiar with Splunk is the DSSERRORS.log file in MicroStrategy. Reading this log file directly can be a painful experience.

2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... merge conflict, resolved by keeping both objects
2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... new object placed in folder of replacement object
2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Committing to object save of the remaining 1 object(s)
2013-09-25 17:03:34.486-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Purging client side cache at the start of Source Manipulator operations
2013-09-25 17:03:34.658-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Examining Object(Name="Imp1" Type=Metric ID=92012DC34233D24BD12BA8B1EBCB4C2F Proj=A8F127FF42EBF5B2E6B9A39C39513B1E Ver=DE38ADCC436A69BC49140FADEB5F72FF ), replacement path = "Public Objects\Metrics"
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... merge conflict, resolved by keeping both objects
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... new object placed in folder of replacement object
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Committing to object save of the remaining 1 object(s)
2013-09-25 17:04:14.574-05:00 [HOST:SERVER2008][PID:2204][THR:5492][Metadata Server][Error][0x80041023] An object with name 'Imp1' and type 'Metric' already exists in the destination folder.
2013-09-25 17:04:22.481-05:00 [HOST:SERVER2008][PID:2204][THR:5492][Metadata Server][Error][0x80041023] An object with name 'Imp1' and type 'Metric' already exists in the destination folder.
2013-09-25 17:05:23.311-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Object Server][Error][0x80041019] Item 'MergeColumnHeaderCells' cannot be found.
2013-09-25 17:05:58.264-05:00 [HOST:SERVER2008][PID:2204][THR:5608][DMX][Info] CDmxPmmlrPKG::InitMaxXMLSize(215): Maximum XML size set to 3145728 bytes.
2013-09-25 17:05:59.655-05:00 [HOST:SERVER2008][PID:8740][THR:8984][DMX][Info] CDmxPmmlrPKG::InitMaxXMLSize(215): Maximum XML size set to 3145728 bytes.
2013-09-25 17:08:47.662-05:00 [HOST:SERVER2008][PID:2204][THR:5832][Kernel][Warning] MsiUserSession::DecrementProjectSessionCount: Admin user logout from project 11. Do not decrease Project-User level session count
2013-09-25 17:08:47.662-05:00 [HOST:SERVER2008][PID:2204][THR:5832][Kernel][Warning] MsiUserSession::hLogOutFromProject: Admin user logout from project 11. Do not decrease Project level session count
2013-09-26 08:49:48.195-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server checking License compliance has the following license key: xxxxxxxxxxxxxxxxxxxxx
2013-09-26 08:49:48.445-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is checking Named User Licenses.
2013-09-26 08:49:48.945-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is in Named User compliance based on contract 300242.
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server finishes checking Named User Licenses.
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server checking License compliance for activation has the following license key: xxxxxxxxxxxxxxxxxxxxx
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is checking License Activation.
2013-09-26 08:49:50.664-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Error] Distribution Service: MSINCSEngine::hParseRetreiveAppIDFromAPNSPEMFile: the name of iAPNSPEMFile is empty.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11.
2013-09-26 09:50:16.315-05:00 [HOST:SERVER2008][PID:2204][THR:3868][Query Engine][Error][0x80042300] QueryEngine encountered error: Execute Query failed.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11..
2013-09-26 09:50:16.331-05:00 [HOST:SERVER2008][PID:2204][THR:3868][Kernel][Error] [MicroStrategy Tutorial][Administrator][JobID=992] (SQL Generation CompleteQueryEngine encountered error: Execute Query failed.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11.. Error in Process method of Component: QueryEngineServer, Project MicroStrategy Tutorial, Job 992, Error Code= -2147212544.)

Most of the entries in the log file are noise, but at the end there is an indication that a SQL error occurred. This is just one of the many types of incidents that can happen in a BI system. Perhaps the database blipped, or a column was modified, or, as in this instance, a bad attribute with some wonky syntax in a pass-through statement was issued.

[Screenshot: Splunk monitoring view]

Using a search on specific keywords with a wildcard (such as SQLExecDirect*), I can isolate the events in the DSSERRORS.log file and catch them well before they ever surface in Enterprise Manager. In the example above, the last vertical bar in the chart tells me that something happened. In the reporting view I can drill into that event and identify exactly what the spike represents.

When Splunk is reading the I/O on the application servers, the ETL error logs, and any network logs, all of the main areas of concern to the BI system are centralized and we have visibility into the pulse of BI in the enterprise. Getting back to the idea that this is something that can take a lifetime to master: fully integrating BI logs with all of the other components in the data ecosystem is where really understanding Splunk, getting creative with regular expressions, and combining the results into reports comes into play.


The Utility of Custom Languages

I’ve spent an inordinate amount of time these past few weeks understanding how translation and languages (a.k.a. locales) work in MicroStrategy. The concept is simple: the metadata and/or the data in the data warehouse can be represented differently for users depending on their preferred language. This means there has to be a mechanism for mapping what the user sees to whatever is stored in the metadata or the warehouse. For the data warehouse, translation involves dynamically selecting lookup tables that have descriptions tied to the surrogate keys in the warehouse; for the metadata, translation means associating a language with an object and identifying the label that gets displayed when the object (attribute, metric, prompt, etc.) is called.

Below is an example of a lookup table that has multiple columns supporting different languages; the appropriate column is accessed dynamically based on the preferences the user has been set up with.

[Screenshot: internationalized lookup table]
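
To make the mechanics concrete, here is a minimal sketch of the kind of SQL the engine would issue against such a table. The table and column names (LU_CATEGORY, CATEGORY_DESC_EN, CATEGORY_DESC_FR) are hypothetical placeholders, not taken from the screenshot:

-- Hypothetical lookup table with one description column per language.
-- For a user whose locale is French, the engine selects the French column:
select a11.CATEGORY_ID,
       a11.CATEGORY_DESC_FR as CATEGORY_DESC
from LU_CATEGORY a11

-- The same report for an English locale would select CATEGORY_DESC_EN instead.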

Metadata translation works slightly differently, and is accessed for creation and modification by right-clicking on any attribute and choosing the “translate” option.

[Screenshot: object translation dialog]

The reason I’ve been spending so much time looking into languages is that, unlike many things in MicroStrategy, creating or altering languages and translations does not have a corresponding function available in Command Manager. This has taken me down an obscure alley in MicroStrategy to a place called the Repository Translation Wizard. The RTW is an executable that sets up a staging area for the translations that have to occur for selected objects. I had seen this corner of MicroStrategy before but never had a reason to use it; for the uninitiated, it is under the Object Manager folder:

[Screenshot: Repository Translation Wizard under the Object Manager folder]

So, essentially the RTW is a utility that can do a bulk replacement of object names.  For a selected project the developer would choose all of the objects that require translation, and the utility creates an XML dump file with the proper object IDs and names.

[Screenshot: RTW language selection]

At this point the RTW repository is essentially a staging area where the object information is dropped, with a placeholder column for the translation.

[Screenshot: RTW staging table]
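
A quick query against the staging table shows what the wizard dropped there. This is just a sketch; the table name mirrors the one used in the update statement below, and your metadata database and project names will differ:

select OBJECTID, READABLEKEY, TRANSLATION
from [MSTR_MD].[dbo].[RST_MicroStrategy_Tutorial_RTW_TESTER]
order by OBJECTID, READABLEKEY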

So, a little SQL update statement…

update [MSTR_MD].[dbo].[RST_MicroStrategy_Tutorial_RTW_TESTER]
set TRANSLATION = 'Promotion for Client1'
where OBJECTID = '1B7A3F0249C78AD99C3A05A7274389BC'
and READABLEKEY = 'Object Name'

…and then check that the object has been translated…

[Screenshot: translation applied to the object]

…and when Client1 logs into the project the report will render with their custom name…

[Screenshot: report rendered with the custom name]

Scenarios for this type of implementation are somewhat limited to highly customized environments where every user expects the report to feel specific to their needs and their business. For example, if the Promotion attribute were used for departmental reporting, a language could be used to make the attribute label look and feel like it was designed specifically for that user.

MicroStrategy and AWS

I was surprised (but pleased) to find that MicroStrategy is now available on the Amazon Web Services Marketplace. For the past several months I had been watching the business intelligence offerings there but was seeing mostly Jaspersoft, BIRT, and a handful of smaller players that I was not familiar with. Seeing MicroStrategy available, especially the free reporting suite, means that kicking the tires, spinning out a new instance for a prospect demo, or simply trying out 9.3 in a temporary environment is cheap and easy to do. I also see MicroStrategy’s embrace of AWS, along with their very early support for Redshift, as signs that MicroStrategy is committed to cloud architecture in general, and not just their own cloud offering.

I took the MicroStrategy Suite on AWS for a test drive through my AWS account and had it up and running within 30 minutes.  The one trick was realizing that I had to decrypt the key pair I generated so that I could log into the server from remote desktop.  Once I had that in place I had no troubles getting to the machine, activating the server, and then connecting to it from my local 9.3 instance.

Unless I win the $550M PowerBall jackpot tonight I’ll probably terminate the server before the weekend. In most companies today, to set up a VM like this I would have had to open a ticket for a new VM, haggle with the VM admin over how much memory I was allowed (I like to have at minimum 8 GB for development, and 64 GB for production), and then download the MicroStrategy media, move it to the machine, unpack it, install it, and reboot. Using AWS I didn’t have to haggle with anyone; I opted for the $0.46/hour, 7.75 GB RAM version and instantiated the server. All the necessary ports were open (also a typical hassle) and I was up and running without any hiccups.

SQL Server 2008 comes pre-installed on the image, and the metadata and warehouse tables are all contained in SQL Server.  Even though this was a preconfigured instance I had to run the cubes in order to get the Visual Insight reports to work in the Tutorial.  I also tweaked the memory and report governing settings, but it’s understandable that none of these were pre-configured.

I haven’t tested the mobile piece yet, but the server was fully configured and I doubt the setup will be much different from the initial spinout. So…again, MicroStrategy goes off and does something, neglects to tout it, and through some luck I stumble upon something they’ve done that is developer / engineer friendly.

Extended Data Access

When MicroStrategy first announced their extended data access (XDA) support many moons ago, the concept didn’t really resonate with me. Most of the use cases for XDA were one-off hacks that didn’t have a place in traditional data warehouse-based environments. In recent years, however, the number of open access APIs has exploded, and combined with multi-source capabilities, dashboards, Visual Insight, and especially Mobile, the ability to gather a data set from a REST, SOAP, or WebDAV source means that the data warehouse isn’t as critical as it used to be.

One of my favorite sites to peruse is data.gov. When the site was first created, much of the data was a disparate set of Excel dumps, XML fragments, CSV, text, etc. Not only are these formats harder to integrate, but they require someone to keep updating the extracts. Since I have an interest in environmental and regulatory matters, I wanted to integrate business intelligence with some EPA data. I have also been doing some research into ways to avoid writing ETL jobs whenever possible, and having the option to use a RESTful interface within MicroStrategy means that I can integrate data without having to write a set of data integration packages.

The high-level flow of tasks needed to create a report based on a data API is as follows:

  1. Create a database instance for XQuery
  2. Gather the proper XQuery syntax using the XQuery Editor and Generator
  3. Create a new Freeform SQL report
  4. Drop the XQuery definition into the Freeform SQL editor
  5. Create a Visual Insight document on top of the new report

So, creating a new XQuery database instance is pretty straightforward; the only trick is to choose the XQuery database type when defining the instance.

The real effort comes in getting the proper XQuery syntax together. When I first started researching how to create a report, the documentation wasn’t very clear on what the syntax needed to look like, so I was worried that it would entail a lot of hand coding, which leads to mistakes, which leads to me getting frustrated and then giving up. But as I started to play with the editor, I realized that it was quite flexible and could pretty quickly spin out the syntax I needed.

So, long story short, here’s what the query looks like in a Freeform SQL report:

None of this needs to be typed; the XQuery Editor and Generator handles it all. In this example I am using the following URL to request data from the EPA (New Hampshire):

http://iaspub.epa.gov/enviro/efservice/tri_facility/state_abbr/NH

The full query:

declare copy-namespaces no-preserve, no-inherit;
declare namespace mstr-rest = 'http://www.microstrategy.com/xquery/rest-functions';
declare function mstr-rest:post($uri,$payload,$http-headers) external;

let $uri := ('http://iaspub.epa.gov/enviro/efservice/tri_facility/state_abbr/NJ')
let $http-headers := ( )
let $payload := ( )
let $result := mstr-rest:post($uri,$payload,$http-headers)


return
<Table><ColumnHeaders>
<ColumnHeader  name='FACILITY_NAME' type='xs:string' />
<ColumnHeader  name='PREF_LATITUDE' type='xs:string' />
<ColumnHeader  name='PREF_LONGITUDE' type='xs:string' />
<ColumnHeader  name='FAC_CLOSED_IND' type='xs:string' />
</ColumnHeaders>
<Data>{for $ancestor0 in $result/tri_facilityList/tri_facility
order by $ancestor0/FACILITY_NAME
return <Row>
<FACILITY_NAME>{fn:data($ancestor0/FACILITY_NAME)}</FACILITY_NAME>
<PREF_LATITUDE>{fn:data($ancestor0/PREF_LATITUDE)}</PREF_LATITUDE>
<PREF_LONGITUDE>-{fn:data($ancestor0/PREF_LONGITUDE)}</PREF_LONGITUDE>
<FAC_CLOSED_IND>{fn:data($ancestor0/FAC_CLOSED_IND)}</FAC_CLOSED_IND>
</Row>}</Data></Table>

This entire string was created by the editor (found in the MicroStrategy -> Desktop folder). The only trick I had to figure out was that the fields in the tree view response section have to be dragged and dropped onto the Main Grid View section. When the desired fields are in place, the code can be generated and loaded. The full query is available in the XQuery generation window and can be copied as-is and dropped into the Freeform SQL report.

Once the query is in the report editor, the number of XML ColumnHeaders has to match the number of attributes/metrics defined in the report. In my example there are four column headers, and thus four attributes in my report.

Because the data returned by the URL is XML, MicroStrategy knows how to crosstab and render the results. I had to make one little adjustment to the data: putting a minus sign in front of the PREF_LONGITUDE field, otherwise the points map to Russia…which isn’t very useful.

The resulting data is the lat/long coordinates of EPA toxic release sites. The high-level data doesn’t indicate exactly what has happened at a site, but presumably additional queries to the other TRI tables would yield more information. The lat and long are easily consumed by the Visual Insight report, giving us this final output for Pennsylvania (my link above requests New Hampshire data, which is smaller and comes back faster):

One note: I haven’t been able to tell whether MicroStrategy will be able to consume JSON as a data format any time soon. Doing so would presumably expand the number of native data sets that MicroStrategy could report off of.

MicroStrategy has several features that they just seem to throw out there without much fanfare or, for that matter, documentation. Having SOAP and REST support is important and opens up the platform to doing pretty interesting (and real-time) things, especially on Mobile devices. Here’s just one idea: combine NOAA and Twitter data to localize weather information and sentiment. Is it really cold outside? If so, just how cold? 500 tweets in your zip code in the last hour cold???


Hubway Data Challenge Part 2

Once I had the Hubway bike system data loaded into a database and modeled into MicroStrategy, I could start to play with the data and do some basic profiling. The more I looked at the data, the more I wanted to add things. For example, the trips table has the birth year of subscribing riders, which lends itself to creating an age attribute. To model in age, I created an attribute and used the pass-through ApplySimple function. This is the basic syntax needed: ApplySimple(“(year(#0)-#1)”, [start_date_desc], [birth_date]).

When added to a report by itself, the age attribute will generate the following SQL:

SELECT DISTINCT (YEAR(a11.start_date_desc)-a11.birth_date) CustCol_3
FROM trips a11

As mentioned in the Part 1 post, the data offers the opportunity to add more layers and texture because the dimensions are so generic. Latitude and longitude coordinates can be used to derive elevation, which would answer one of the questions on the Hubway Data Challenge web site: do Hubway cyclists only ride downhill? A date dimension could be used to correlate against the academic schedule, or even gas prices. Anyway, on to the eye candy…

For those of you who read Stephen Few, you know that visual design isn’t easy. Few’s philosophy espouses simplicity and elegance over complexity and flash. If you can’t generally understand the data in less than ten seconds, you have failed your audience. Basic, muted colors with careful use of highlights are preferred over harsh and bright tones throughout. These are all great recommendations, and as I progress through the different phases of my interaction with the data I will adhere to them more closely. In the meantime, I simply want to profile the data using some basic charts and graphs. The alternative to graphing the data is wide and long grids of numbers with little visual appeal. The tradeoff is that you get to pivot the data, sort it, filter it, etc., but exceptions, trends, and a general sense of the data quality don’t readily present themselves.

So, a quick and dirty way to start to understand the data is to graph it. I have gotten used to the MicroStrategy graphing options, but many developers will cite the core graphing technology as one of the weaker aspects of the platform. The widgets and Visual Insight graphics have surpassed the Desktop graphing capabilities, but I still like to use the graph formatting to create vertical/horizontal bar charts, scatterplots, and time-series analyses. So, simply to get a flavor of the data, I created a few graphs.

This graph shows the activity (trips) for a month, selected in the page-by, and I tried to see if there was a way to quickly tell whether temperature spikes led to a decrease in usage. To do this correctly I’d likely want to average out the trips by weekday and get a rolling temperature average. Only with those means in place can I get a true understanding of whether a 10 degree shift in temperature leads to an n% variation in usage.

[Graph: Trips and Temp, dual axis]
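
As a rough sketch of that averaging, the query below computes average daily trips by weekday alongside the average temperature on those days. The date_dim and daily_weather tables (and their columns) are my own placeholders for the calendar and NOAA data, not names from the actual data set:

-- Daily trip counts first, then averages by weekday.
-- Assumes start_date_desc can be cast to a date (SQL Server-style syntax).
select d.weekday_name,
       avg(cast(daily.trip_count as float)) as avg_trips_per_day,
       avg(w.avg_temp)                      as avg_temp
from (
    select cast(t.start_date_desc as date) as trip_date,
           count(t.id)                     as trip_count
    from trips t
    group by cast(t.start_date_desc as date)
) daily
join date_dim d      on d.calendar_date = daily.trip_date
join daily_weather w on w.calendar_date = daily.trip_date
group by d.weekday_name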

One of the data challenge sample questions asks whether rentals after 2 AM are more likely to come from the under 25 population.  I extended this question to ask whether usage varies by gender.  I took liberties with the coloring for effect, but I would mute these tones in a dashboard.  I also incorporated another axis (trip distance) to see whether rides are longer at certain times in the day, but since I didn’t use an average metric, the second axis isn’t very meaningful.

[Graph: Male / Female Usage by Hour]

No basic correlation study should go without a scatterplot.  The r values are included, but aren’t very telling.  To make this graph work I had to clear out the trips that involved 0 distance (i.e., the rent and return location are the same).  Because this graph also had month in the page-by, some months showed a higher r value than others.  Again, I’m simply using this to get a feel for the data and get some general answers to high level questions.

[Graph: Scatterplot, male / female correlation]

A colleague suggested that I label the axes. I tried to tie the color of each axis to its metric, and this is what I got. To me this graph is telling in that it appears to suggest that as the bike rental program became part of the city culture, people started taking longer rides.

[Graph: dual-axis trending]

So, it’s a start.  With some basic profiling underway I am starting to compile a list of some high level questions that might be telling or informative about the data.  Station analysis and trip patterns are a good place to go with the data, and some of the questions that I’ve started to formulate go along these lines:

  • Which station sees the most usage?
  • What percent of trips end and start in the same place?
  • Which bikes see the most usage, and of those, which side of the river do they spend the most time on?
  • How has usage changed this year versus last year?  Can the data be used to illustrate the growth of the program in some neighborhoods versus others?

These questions have their parallels in the business community, and they represent the typical deep dive that a business analyst would do (a sample query for the first question appears after this paragraph). The next layer of analysis is to take this data set and make predictions against it. For example, looking at the data at the end of July, how closely could I predict usage at the various stations using the historical data, especially the trends elicited from July 2011? Given a time of day, could I predict what percentage of bikes rented from station x will wind up at station y? If overcrowding at a station is a problem, and people can’t drop their bikes off because the racks are full, do I need to transport bikes away from certain stations at certain times of day?
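
For instance, the first question might translate into something like the following against the Part 1 model. The station_name column is an assumption about the stations table, and counting trip ends as well would take a second query or a union:

-- Trips started per station, most heavily used first.
select s.station_name,
       count(t.id) as trips_started
from trips t
join stations s on t.start_station_id = s.id
group by s.station_name
order by trips_started desc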


Hubway Data Challenge Part 1

I was interested to see that a data set had been posted and a competition started to visualize data collected from the Hubway bike system. For the uninitiated, Hubway is a bike rental system with racks scattered across Boston. Users pay with a credit card or have a subscription to use the bikes. When I was working in downtown Boston I would see these bikes all over the place, especially along the Esplanade and going down Boylston Street.

The data set itself is a pair of Excel files (stations and trips) totaling about 10 MB zipped. While quite simple, the data by itself represents an opportunity to do some interesting analysis based on the lat/long pairs associated with the start and end points of each rental. The date pairs also lend themselves to time-series analysis. With date as a hinge, other data can be incorporated: in my example I added a comprehensive Date Dim table that extends the data into a time hierarchy (weeks, months, years), and I pulled weather data from noaa.gov to give myself an opportunity to do some basic correlations.
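
As a rough illustration of the “date as a hinge” idea, the sketch below rolls trips up to the month level and brings in the weather data through the date dimension. The date_dim and daily_weather names and columns are placeholders of mine, not the actual schema, and the syntax is SQL Server-flavored:

-- Monthly trip counts joined to weather via the date dimension.
-- (avg_temp here is weighted by trip volume; fine for a rough correlation check)
select d.year_num,
       d.month_num,
       count(t.id)     as trips,
       avg(w.avg_temp) as avg_temp
from trips t
join date_dim d      on cast(t.start_date_desc as date) = d.calendar_date
join daily_weather w on w.calendar_date = d.calendar_date
group by d.year_num, d.month_num
order by d.year_num, d.month_num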

Some of the challenges that I faced in working with this data in MicroStrategy included:

  1. Modeling the same table (stations) for the start and end points
  2. Calculating distance from a lat/long pair
  3. Using a web service to automate looking up the elevation of each station
  4. Plotting the lat/long coordinates on a map

I have yet to overcome items 3 & 4, but the first two were interesting problems.  The ultimate goal of this exercise is to produce a meaningful visualization, and since MicroStrategy 9.3 was just released, this data set provides an opportunity to test some of the network diagrams, mapping widgets, and Visual Insight capabilities.

For problem #1, the solution in MicroStrategy is to use table aliases.  Basically, from a modeling standpoint aliases mean that architects do not need to create views to replicate a table.

The table alias within MicroStrategy tells the SQL generation engine that the same table can be used twice.

To create a table alias, go to the Schema → Tables folder and right-click on a table that has already been modeled in the project. Select “Create Table Alias” and a new copy of the table will appear. For my purposes I created two stations tables, one referencing the start station and one the end. Within the attributes that reference the table, make sure that the mapping is set to manual; otherwise the automatic mapping will try to point to both the original and the aliased table.

The resulting SQL for a report that wants to join Start Station and End Station would look something like this:

select a11.end_station_id  id,
a11.start_station_id  id0,
count(distinct a11.id)  WJXBFS1
from trips a11
join stations a12
on (a11.end_station_id = a12.id)
join stations a13
on (a11.start_station_id = a13.id)
group by a11.end_station_id,
a11.start_station_id

By aliasing the stations table, the engine is forced to join to it twice, but the overhead on the database side is minimal. From this we can start to glean some basic information from the data. The South Station / North Station (TD Garden) ride is the most common, and this is explained by the fact that there is no good way to get from North Station to South Station or vice versa! Taking a bike probably constitutes a roughly seven-minute ride. I would speculate that these rides happen during rush hour, but I’ll table that speculation for future analysis.

The next challenge was to calculate distances between stations.  I found a good site that showed how to do this in Excel, and fortunately transposing Excel syntax into MicroStrategy is straightforward since the functions are named exactly the same.  Here is what the calculation looks like in Excel:

and here is what it looks like in MicroStrategy:

With this calculation in place, the previous report could be enhanced to include distance, and then by combining the distance with the trips you could derive a total mileage value.
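
For reference, distance-from-coordinates calculations like this are typically some form of the great-circle formula. A SQL-flavored sketch using the spherical law of cosines is below; the station coordinate column names (lat, lng) are assumptions, and 3959 is the approximate Earth radius in miles:

-- Great-circle distance between the start and end stations of each trip, in miles.
-- RADIANS, ACOS, SIN, and COS are available in SQL Server and most other databases.
select a11.id as trip_id,
       3959 * acos(
           sin(radians(a12.lat)) * sin(radians(a13.lat)) +
           cos(radians(a12.lat)) * cos(radians(a13.lat)) *
           cos(radians(a13.lng) - radians(a12.lng))
       ) as distance_miles
from trips a11
join stations a12 on a11.start_station_id = a12.id
join stations a13 on a11.end_station_id   = a13.id
-- Note: when the start and end stations are the same, floating-point rounding can push
-- the ACOS argument slightly above 1, so those rows may need a clamp or a filter.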

The downside is that if the start and end stations are the same, the total distance will be 0, as is the case with the Boston Public Library bike rack.

So, this is how I started the data analysis, and I have continued to build out other attributes to fully form the data and make it more interesting.  The next steps are to start to visualize the data.  I started to play with this, and with the availability of Cloud Personal, I threw up some data slices and created a first pass of a visualization.

In the coming weeks submissions should start coming online. I have been more focused on pulling outside data together to add flavor and color to the raw data set, and a colleague suggested I incorporate other events, like Red Sox games or holidays, into my analysis. Any other suggestions?


MicroStrategy Task API

MicroStrategy’s Task API is an advanced topic within the MicroStrategy framework, but in short, it is a protocol that allows for lightweight programmatic access to MicroStrategy. The Task API differs from the URL API, which is a very cheap and easy way to call reports and control what renders and which features are turned on or off. Bryan Brandow has a very good explanation on his blog. Rather than carry all of the overhead and expense of passing a full URL, with the user name and password embedded in it, the Task API can pass instructions to MicroStrategy and return the smallest possible result set.

My interest in the Task API came from a need to trim MicroStrategy functionality down to its lightest possible format. I needed to pass instructions to MicroStrategy either through SOAP or HTTP and get a small XML or JSON file back. MicroStrategy’s SDK provides the bulk of what is needed to set up the Task API (i.e., where the .war file is if you want to deploy on Tomcat). The base URL for the API with a Tomcat web server is:

http://localhost:8080/MicroStrategy/servlet/taskAdmin?

In release 9.2.1 the Task API included a new feature called reportDataService. The beauty of this task is that it handles the login, execution, and then logout of a user. This kind of drive-by execution in one call greatly simplifies getting data from MicroStrategy quickly. My example does not pass prompt parameters, but it could. To keep it simple I created a three row, four column static report in MicroStrategy Desktop.

With the Task API framework set up and the report created (right-click on it to get the Report ID), I could now create a simple .jsp page that called the report. My final output looks like this (some liberties taken with a .css file, plus a superfluous call to a jQuery library, and voila!):

Since I used HTTP to make the call to MicroStrategy, I had to wrap and capture the response so that I could see the output. Even though the Task API has an interface to do this on the taskAdmin -> Builder page, I wanted my own wrapper so that I could do some things with the output.

First, I need to bring in a handful of Java libraries to handle some of the HTTP calls and data parsing:

<%@ page import="java.util.*" %>
<%@ page import="java.net.*" %>
<%@ page import="java.io.*" %>

Here are the fundamental snippets:

String sUserName = request.getParameter("userName");
String sPassword = request.getParameter("password");
String sReportID = request.getParameter("reportID");

URL uUrl1 = new URL("http://localhost:8080/MicroStrategy/servlet/taskProc?"
    + "taskId=reportDataService&taskEnv=xml&taskContentType=html"
    + "&server=SERVER2008&project=MicroStrategy+Tutorial"
    + "&userid=" + sUserName
    + "&password=" + sPassword
    + "&reportID=" + sReportID
    + "&styleName=CustomXMLReportStyle");

String sUrl1 = uUrl1.toString();
String sOutputLine = "";   

sOutputLine = GetContent(sUrl1).toString() ;

This handles the construction of the URL from the form. The GetContent() function does the work:

StringBuffer GetContent(String sUrl1) throws Exception
{
    URL uURL1 = new URL(sUrl1);
    BufferedReader oReader = new BufferedReader( new InputStreamReader( uURL1.openStream()));

    StringBuffer sResult = new StringBuffer("") ;
    String sInputLine = null ;
   
    while ((sInputLine = oReader.readLine()) != null)
        sResult.append(sInputLine);
    oReader.close();
   
    return(sResult) ;
}

The output in this example will be XML with some HTML wrapping.

value="<taskResponse statusCode="200">
<?xml version="1.0" encoding="utf-8"?>
<mstr-report>
<information>
<report-name>taskProcReport1</report-name>
<report-id>82A8F038455A7B637C4504B72B39CFDF</report-id>
<report-desc></report-desc>
<number-of-rows>3</number-of-rows>
<number-of-cols>4</number-of-cols>
<message-id>5F5F2E7742C556A4D1B093BD76C3DBAC</message-id>
<state-id>0</state-id>
<block-begin>1</block-begin>
<block-count>-1</block-count>
</information>

<report-data>
   <titles>
      <col index="1">Category</col>
      <col index="2">Promotion Type</col>
      <col index="3">Gross Revenue</col>
      <col index="4">Cost</col>
      <col index="5">Profit</col>
      <col index="6">Profit Margin</col>
   </titles>
   <rows>
   <row index="1">
      <col index="1">Books</col>
      <col index="2">No Promotion</col>
      <col index="3">$893,845</col>
      <col index="4">$679,891</col>
      <col index="5">$213,954</col>
      <col index="6">23.94%</col>
   </row>
   <row index="2">
      <col index="1">Seasonal Sale</col>
      <col index="2">$237,616</col>
      <col index="3">$180,750</col>
      <col index="4">$27,377</col>
      <col index="5">13.15%</col>
   </row>
   <row index="3">
      <col index="1">Special Sale</col>
      <col index="2">$24,655</col>
      <col index="3">$18,756</col>
      <col index="4">$968</col>
      <col index="5">4.91%</col>
   </row>
   </rows>
</report-data>
</mstr-report>
</taskResponse>

There are several use cases for this level of simple query-response execution, and an external data API that extends an existing MicroStrategy environment for data consumption seems like the easiest to comprehend. Although MicroStrategy has portlets available, the Task API lets developers go one level deeper into the platform. When the result is only XML or JSON, the consuming application can do whatever it wants with the chunk that is returned.