Splunk for Business Intelligence

Business Intelligence environments rely on many different components working in concert in order for the application as a whole to be successful.  The layers that matter include the actual application stack — i.e., the business intelligence tool and the databases that the BI system accesses.  A fluidly running application stack is useless if the network that glues the BI tool to the data warehouse or OLTP isn’t working, or temporarily goes down.  Even when the application stacks are fine, and the network is fully connected, the hardware itself has to be fully functional and unpegged.  This is a simplified view of the world, but breaking it down into three basic parts makes for a useful view of the world:

  • Application
  • Network
  • Hardware

For architects, dev ops / admins, developers, and your general stakeholder audience, having a holistic view of how these three components interoperate is important.  Enter Splunk.

Splunk is software that can comb logs, machine data, and basically any sort of real-time data that is relevant to a  system and puts it all together in a single unified (web-based) interface.  Even better, for hacks like me, you can download it for free and put it on your own environment.  Working with Splunk is a lot like the game Othello, where on the box it says something to the effect that it takes minutes to learn and a lifetime to master.  The lifetime to master bit is in reference to regular expressions, which are simple in concept, but can get very complex and intricate.  Here’s a screenshot from Splunk’s online documentation:

regex

An example I have been using is to get familiar with Splunk is the DSSERRORS.log file in MicroStrategy.  Looking at this log file can be a painful experience.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... merge conflict, resolved by keeping both objects
2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... new object placed in folder of replacement object
2013-09-25 17:03:34.048-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Committing to object save of the remaining 1 object(s)
2013-09-25 17:03:34.486-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Purging client side cache at the start of Source Manipulator operations
2013-09-25 17:03:34.658-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Examining Object(Name="Imp1" Type=Metric ID=92012DC34233D24BD12BA8B1EBCB4C2F Proj=A8F127FF42EBF5B2E6B9A39C39513B1E Ver=DE38ADCC436A69BC49140FADEB5F72FF ), replacement path = "Public Objects\Metrics"
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... merge conflict, resolved by keeping both objects
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] ... new object placed in folder of replacement object
2013-09-25 17:03:34.673-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Client Services][Source Manipulator] Committing to object save of the remaining 1 object(s)
2013-09-25 17:04:14.574-05:00 [HOST:SERVER2008][PID:2204][THR:5492][Metadata Server][Error][0x80041023] An object with name 'Imp1' and type 'Metric' already exists in the destination folder.
2013-09-25 17:04:22.481-05:00 [HOST:SERVER2008][PID:2204][THR:5492][Metadata Server][Error][0x80041023] An object with name 'Imp1' and type 'Metric' already exists in the destination folder.
2013-09-25 17:05:23.311-05:00 [HOST:SERVER2008][PID:8740][THR:2552][Object Server][Error][0x80041019] Item 'MergeColumnHeaderCells' cannot be found.
2013-09-25 17:05:58.264-05:00 [HOST:SERVER2008][PID:2204][THR:5608][DMX][Info] CDmxPmmlrPKG::InitMaxXMLSize(215): Maximum XML size set to 3145728 bytes.
2013-09-25 17:05:59.655-05:00 [HOST:SERVER2008][PID:8740][THR:8984][DMX][Info] CDmxPmmlrPKG::InitMaxXMLSize(215): Maximum XML size set to 3145728 bytes.
2013-09-25 17:08:47.662-05:00 [HOST:SERVER2008][PID:2204][THR:5832][Kernel][Warning] MsiUserSession::DecrementProjectSessionCount: Admin user logout from project 11. Do not decrease Project-User level session count
2013-09-25 17:08:47.662-05:00 [HOST:SERVER2008][PID:2204][THR:5832][Kernel][Warning] MsiUserSession::hLogOutFromProject: Admin user logout from project 11. Do not decrease Project level session count
2013-09-26 08:49:48.195-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server checking License compliance has the following license key: xxxxxxxxxxxxxxxxxxxxx
2013-09-26 08:49:48.445-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is checking Named User Licenses.
2013-09-26 08:49:48.945-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is in Named User compliance based on contract 300242.
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server finishes checking Named User Licenses.
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server checking License compliance for activation has the following license key: xxxxxxxxxxxxxxxxxxxxx
2013-09-26 08:49:49.132-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Info] Server is checking License Activation.
2013-09-26 08:49:50.664-05:00 [HOST:SERVER2008][PID:2204][THR:5396][Kernel][Error] Distribution Service: MSINCSEngine::hParseRetreiveAppIDFromAPNSPEMFile: the name of iAPNSPEMFile is empty.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11.
2013-09-26 09:50:16.315-05:00 [HOST:SERVER2008][PID:2204][THR:3868][Query Engine][Error][0x80042300] QueryEngine encountered error: Execute Query failed.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11..
2013-09-26 09:50:16.331-05:00 [HOST:SERVER2008][PID:2204][THR:3868][Kernel][Error] [MicroStrategy Tutorial][Administrator][JobID=992] (SQL Generation CompleteQueryEngine encountered error: Execute Query failed.
Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [22018:-3030: on SQLHANDLE] [Microsoft][ODBC Microsoft Access Driver] Data type mismatch in criteria expression. Connection String: DSN=MicroStrategy_Tutorial_Data;DBQ=C:\Program Files (x86)\MicroStrategy\Tutorial Reporting\TUTORIAL_DATA_7200.MDB;DRIVERID=25;FIL=MS Access;MAXBUFFERSIZE=8192;PAGETIMEOUT=5;UID=admin;. SQL Statement: select distinct a11.[SHIPPER_DESC] + 1 AS CustCol_12
from [LU_SHIPPER] a11.. Error in Process method of Component: QueryEngineServer, Project MicroStrategy Tutorial, Job 992, Error Code= -2147212544.)

Most of the entries in the log file are noise, but at the end of the log file there is an indication that a SQL error occurred.  This is just one of the many types of incidences that can happen in a BI system.  Perhaps the database blipped, or a column was modified, or in this instance a bad attribute with some wonky syntax in an pass-through statement was issued.

splunk monitor

Using a search on specific keywords with a wildcard (such as SQLExecDirect*) I can isolate the events in the DSSERRORS.log file and catch them well before they ever get to Enterprise Manager.  In the example above, I can see based on the last vertical bar chart that something happened.  In the reporting view I can drill into that event and identify exactly what the spike represents.

When Splunk is reading the I/O on the application servers, ETL error logs, and any network logs all of the main areas of concern to the BI system can be centralized and we have visibility into the pulse of BI in the enterprise.  Getting back to the idea that this is something that can take a lifetime to master, fully integrating BI logs with all of the other components in the data ecosystem is where really understanding Splunk, getting creative with regular expressions, and combining them into reports comes into play.