Hubway Data Challenge Part 1

I was interested to see that a data set had been posted and that a competition had been started to visualize data collected from the Hubway bike system.  For the uninitiated, the Hubway is a bike rental system with racks scattered across boston.  Users pay with a credit card or have a subscription to use the bikes.  When I was working in downtown Boston I would see these bikes all over the place, especially along the Esplanade, and going down Boylston Street.

The data set itself is a set of two Excel files — stations, trips — totaling about 10 MB zipped.  While quite simple, the data by itself represents an opportunity to do some interesting analysis based on the lat/long pairs of associated with the start and end points of the bike rental system.  The date pairs also represent lend themselves to time-series analysis.  With date as a hinge, other data can be incorporated, and in my example I added a comprehensive Date Dim table that extends the data into a time hierarchy (weeks, months, years), and I pulled weather data from noaa.gov to give myself an opportunity to do some basic correlations.

Some of the challenges that I faced in working with this data in MicroStrategy included:

  1. Modeling the same table (stations) for the start and end points
  2. Calculating distance from a lat/long pair
  3. Using a web service to automate the elevation of the stations
  4. Plotting the lat/long coordinates on a map

I have yet to overcome items 3 & 4, but the first two were interesting problems.  The ultimate goal of this exercise is to produce a meaningful visualization, and since MicroStrategy 9.3 was just released, this data set provides an opportunity to test some of the network diagrams, mapping widgets, and Visual Insight capabilities.

For problem #1, the solution in MicroStrategy is to use table aliases.  Basically, from a modeling standpoint aliases mean that architects do not need to create views to replicate a table.

The table alias within MicroStrategy tells the SQL generation engine that the same table can be used twice.

To create a table alias, go to the schema → tables folder, and right click on a table that has already been modeled in.  Select “Create Table Alias” and a new copy of the table will appear.  For my purposes I created 2 stations tables, one that referenced the start, and one for the end.  Within the attributes that reference the table, make sure that the mapping is set to manual, otherwise the automatic mapping will try to point to both the old and aliased table.

The resulting SQL for a report that wants to join Start Station and End Station would look something like this:

1
2
3
4
5
6
7
8
9
10
select a11.end_station_id  id,
a11.start_station_id  id0,
count(distinct a11.id)  WJXBFS1
from trips a11
join stations a12
on (a11.end_station_id = a12.id)
join stations a13
on (a11.start_station_id = a13.id)
group by a11.end_station_id,
a11.start_station_id

By aliasing the stations table twice, the engine is forced to join against itself, but the overhead from the database side is minimal.  From this we can start to glean some basic information from the data.  The South Station / North Station (TD Garden) ride is the most commonly used, and this is explained by the fact that there is no good way to get to South Station from North Station or vice versa!  Taking a bike probably constitutes a ~ seven minute ride.  I would speculate that these rides happen during rush hour, but I’ll table that speculation for future analysis.

The next challenge was to calculate distances between stations.  I found a good site that showed how to do this in Excel, and fortunately transposing Excel syntax into MicroStrategy is straightforward since the functions are named exactly the same.  Here is what the calculation looks like in Excel:

and here is what it looks like in MicroStrategy:

With this calculation in place, the previous report could be enhanced to include distance, and then by combining the distance with the trips you could derive a total mileage value.

The downside of this is that unless the start and end stations are different, then the total distance will be 0, as is the case with the Boston Public Library bike rack.

So, this is how I started the data analysis, and I have continued to build out other attributes to fully form the data and make it more interesting.  The next steps are to start to visualize the data.  I started to play with this, and with the availability of Cloud Personal, I threw up some data slices and created a first pass of a visualization.

In the coming weeks there should start to be some submissions coming online.  I have been more focused on pulling outside data together to add flavor and color to the raw data set, and a colleague suggested I analyze other events like Red Sox games or holidays into my analysis.  Any other suggestions?

 

 

 

 

MicroStrategy Task API

MicroStrategy’s Task API is an advanced topic within the MicroStrategy framework, but in short, it is a protocol that can allow for programmatic access to MicroStrategy in a lightweight manner.  The Task API differs from the URL API, which is a very cheap and easy way to call reports and affect what renders and what features are turned on or off.  Bryan Brandow has a very good explanation on his blog.  Rather than carry all of the overhead and expense of pass a full URL, with the user name and password embedded in the URL, the Task API can pass instructions to MicroStrategy and return the smallest possible result set.

My interest in the Task API came from a need to trim MicroStrategy functionality down to its lightest possible format. I needed to pass instructions to MicroStrategy either through SOAP or HTTP and get a small XML or JSON file back. MicroStrategy’s SDK provides the bulk of what is needed to set up the Task API (i.e., where the .war file is if you want to deploy on Tomcat). The base URL for the API with a Tomcat web server is:

http://localhost:8080/MicroStrategy/servlet/taskAdmin?

In release 9.2.1 the Task API included a new feature called reportDataService. The beauty of this task is that it handles the login, execution, and then logout of a user. This kind of drive-by execution in one call greatly simplifies getting data from MicroStrategy quickly. My example does not pass prompt parameters, but it could. To keep it simple I created a three row, four column static report in MicroStrategy Desktop.

With the TaskAPI framework set up, and the report created (right-click on it to get the Report ID), I could now create a simple .jsp page that called the report. My final output looks like this (some liberties taken with a .css file, and a superfluous call to a jquery library and voila!):

Since I used HTTP to make the call to MicroStrategy I had to wrap and capture the output so that I could see the output. Even though the Task API has an interface to do this on the taskAmin –> Builder page, I wanted my own wrapper so that I could do some things with the output.

First, I need to bring in a handful of java libraries to handle some of the HTTP passing and data parsing:

1
2
3
<%@ page import="java.util.*" %>
<%@ page import="java.net.*" %>
<%@ page import="java.io.*" %>

Here are the fundamental snippets:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
String sUserName = request.getParameter("userName");
String sPassword = request.getParameter("password");
String sReportID = request.getParameter("reportID");

URL uUrl1 = new URL("http://localhost:8080/MicroStrategy/servlet/taskProc?"
    + "taskId=reportDataService&taskEnv=xml&taskContentType=html"
    + "&server=SERVER2008&project=MicroStrategy+Tutorial"
    + "&userid=" + sUserName
    + "&password=" + sPassword
    + "&reportID=" + sReportID
    + "&styleName=CustomXMLReportStyle");

String sUrl1 = uUrl1.toString();
String sOutputLine = "";   

sOutputLine = GetContent(sUrl1).toString() ;

This handles the construction of the URL from the form. The getContent() function does the work:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
StringBuffer GetContent(String sUrl1) throws Exception
{
    URL uURL1 = new URL(sUrl1);
    BufferedReader oReader = new BufferedReader( new InputStreamReader( uURL1.openStream()));

    StringBuffer sResult = new StringBuffer("") ;
    String sInputLine = null ;
   
    while ((sInputLine = oReader.readLine()) != null)
        sResult.append(sInputLine);
    oReader.close();
   
    return(sResult) ;
}

The output in this example will be XML with some HTML wrapping.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
value="<taskResponse statusCode="200">
<?xml version="1.0" encoding="utf-8"?>
<mstr-report>
<information>
<report-name>taskProcReport1</report-name>
<report-id>82A8F038455A7B637C4504B72B39CFDF</report-id>
<report-desc></report-desc>
<number-of-rows>3</number-of-rows>
<number-of-cols>4</number-of-cols>
<message-id>5F5F2E7742C556A4D1B093BD76C3DBAC</message-id>
<state-id>0</state-id>
<block-begin>1</block-begin>
<block-count>-1</block-count>
</information>

<report-data>
   <titles>
      <col index="1">Category</col>
      <col index="2">Promotion Type</col>
      <col index="3">Gross Revenue</col>
      <col index="4">Cost</col>
      <col index="5">Profit</col>
      <col index="6">Profit Margin</col>
   </titles>
   <rows>
   <row index="1">
      <col index="1">Books</col>
      <col index="2">No Promotion</col>
      <col index="3">$893,845</col>
      <col index="4">$679,891</col>
      <col index="5">$213,954</col>
      <col index="6">23.94%</col>
   </row>
   <row index="2">
      <col index="1">Seasonal Sale</col>
      <col index="2">$237,616</col>
      <col index="3">$180,750</col>
      <col index="4">$27,377</col>
      <col index="5">13.15%</col>
   </row>
   <row index="3">
      <col index="1">Special Sale</col>
      <col index="2">$24,655</col>
      <col index="3">$18,756</col>
      <col index="4">$968</col>
      <col index="5">4.91%</col>
   </row>
   </rows>
</report-data>
</mstr-report>
</taskResponse>

There are several use cases for this level of simple query-response execution, and an external data API to extend an existing MicroStrategy environment for data consumption seems like the easiest to comprehend. Although MicroStrategy has portlets available, the Task API let’s developers go one level deeper into the platform. When the result is only XML or JSON, the consuming application can do whatever it wants with the chunk that is returned.