Sanjay Sharma’s Weblog

July 11, 2010

Hive BI analytics: Visual Reporting

Filed under: Hadoop, Hive, HPC, Java world | indoos @ 5:23 pm

I had earlier written about using Hive as a data source for industry-proven BI reporting tools, and there have now been official announcements from Pentaho, Talend, MicroStrategy and Intellicus.

The topic is close to my heart since I firmly believe that while Hadoop and Hive are true large-scale data analytics tools, their power is currently limited to use by software programmers. The advent of BI tools in the Hadoop/Hive world would certainly bring them closer to the real end users: business users.

I am currently not too sure how these BI reporting tools decide how much of the analytics is left to MapReduce and how much is done in the reporting tool itself. I guess it will take time to find the right balance. Chances are that I will find it a bit earlier than others, as I am working closely (read here) with the Intellicus team to get the changes into the Hive JDBC driver needed for Intellicus' interoperability with Hive.
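To make that balance concrete, here is a minimal Python sketch of the two extremes. It uses the standard-library `sqlite3` module purely as a stand-in for a Hive connection; the `page_views` table, its data and all function names are hypothetical and are not taken from any of the tools mentioned above.

```python
import sqlite3
from collections import defaultdict


def make_demo_db():
    # In-memory SQLite stands in for Hive here; table and data are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [("IN", 10), ("US", 7), ("IN", 5), ("US", 3), ("UK", 1)],
    )
    return conn


def aggregate_in_tool(conn):
    # Extreme 1: the reporting tool pulls every raw row and sums locally.
    totals = defaultdict(int)
    for country, views in conn.execute("SELECT country, views FROM page_views"):
        totals[country] += views
    return dict(totals)


def aggregate_in_engine(conn):
    # Extreme 2: the GROUP BY is pushed down, so only summary rows come back.
    rows = conn.execute(
        "SELECT country, SUM(views) FROM page_views GROUP BY country"
    )
    return dict(rows.fetchall())
```

Both functions return the same totals; what differs is how many rows travel from the engine to the tool. With Hive as the engine, the pushed-down query would run as MapReduce work on the cluster, which is exactly the part of the analytics the reporting tool has to decide whether to delegate.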

February 8, 2010

BI with MapReduce

Filed under: Advanced computing, Hadoop | indoos @ 2:12 pm

Have any of you used MapReduce in the context of business intelligence?

While collating my thoughts on this LinkedIn Hadoop discussion, I found that I needed more visuals to explain it to myself first :).

So, here are the many ways in which Hadoop MapReduce offers an alternative in the big-big BI world:

Scenario 1: Use Hadoop and Hive as the interface to BI tools. Pentaho reporting is already supported as of Hive 0.4.0.

Scenario 2: Use Hadoop for initial data polishing, and then dump the result to a SQL-capable column-based database for near-real-time BI reporting. Aster Data, Vertica and Greenplum sell themselves by heavily advertising MapReduce connectors (or similar). The cost of a SQL-capable column-based database is the only pain point here (plus the risk of how these actually scale versus what they promise).

Scenario 3: Use Hadoop for initial data polishing, and then dump the result to a SQL-capable column-based database for near-real-time BI reporting. For real-time reporting, the data can be polished further and moved from the column-based database to a fast regular RDBMS with BI support.


Scenario 4: The free way :) Use Hadoop for initial data polishing, and then dump the result to a regular SQL database with BI support. The export from HDFS can be done the un-Sqoop way. The onus is then more on the developer to dump only ready-for-report data (a smaller volume), with most of the BI work already completed in additional MR steps.
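Scenario 4 can be sketched in a few lines of plain Python. This is only an illustration of the idea, not a Hadoop job: the map and reduce steps run in-process, `sqlite3` stands in for the regular SQL database, and the log format, the `daily_sales` table and every function name are assumptions made up for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw records: "<day> <country> <amount>".
RAW_LOG = [
    "2010-02-01 IN 120",
    "2010-02-01 IN 30",
    "2010-02-01 US 50",
    "2010-02-02 IN 10",
    "2010-02-02 US 20",
    "2010-02-02 US 5",
]


def map_record(line):
    # Map step: parse one raw line into a ((day, country), amount) pair.
    day, country, amount = line.split()
    return (day, country), int(amount)


def reduce_records(pairs):
    # Reduce step: sum the amounts per key, leaving only report-ready rows.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted((day, country, total) for (day, country), total in totals.items())


def export_to_sql(rows):
    # Export step: load the much smaller summary into a regular SQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (day TEXT, country TEXT, total INTEGER)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```

Calling `export_to_sql(reduce_records(map_record(line) for line in RAW_LOG))` loads four summary rows instead of six raw ones. On real data the reduction would be far larger, which is what keeps the regular database small enough to report from.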

The important fact to note is that there might be additional costs in moving the major chunk of the BI data analysis to programmatic interfaces (SQL or MR).

I am not much of a fallen-in-love-with-databases type, so I do like the way Hive can emerge as a potential BI reporting tool.
