While collating my thoughts on this LinkedIn Hadoop discussion, I found that I needed some visuals to explain it to myself first 🙂.
So, here are the ways in which Hadoop MapReduce offers an alternative in the world of really big BI:
Scenario 1: Use Hadoop and Hive as the interface to BI tools. Pentaho reporting is already supported as of Hive 0.4.0.
Scenario 2: Use Hadoop for initial data polishing, and then dump the data into a SQL-capable, column-based database for near-real-time BI reporting. Aster Data, Vertica and Greenplum market themselves heavily on their MapReduce connectors (or similar). The cost of such a column-based database is the main pain point here, along with the risk that these products may not scale as well in practice as they promise.
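To make "initial data polishing" a little more concrete, here is a minimal pure-Python sketch of a MapReduce-style pass. It does not use Hadoop or any Hadoop API; it only mimics the map, shuffle and reduce phases in memory. It assumes hypothetical tab-separated log lines of the form `date<TAB>region<TAB>amount`: the mapper drops malformed records and normalises the region, the shuffle groups by key, and the reducer sums amounts. The `to_columns` helper (also hypothetical) only illustrates the column-by-column layout a column-based database would store.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (date, region)
Row = Tuple[str, str, float]

def mapper(line: str) -> Iterator[Tuple[Key, float]]:
    """Parse one raw log line; skip lines that are malformed."""
    parts = line.strip().split("\t")
    if len(parts) != 3:
        return  # wrong number of fields: drop the record
    date, region, amount = parts
    try:
        value = float(amount)
    except ValueError:
        return  # non-numeric amount: drop the record
    yield (date, region.strip().lower()), value

def shuffle(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, List[float]]:
    """Group mapper output by key, as the MapReduce framework would."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key: Key, values: List[float]) -> Row:
    """Collapse all values for one key into a single polished row."""
    return key[0], key[1], sum(values)

def polish(lines: Iterable[str]) -> List[Row]:
    pairs = (kv for line in lines for kv in mapper(line))
    return sorted(reducer(k, vs) for k, vs in shuffle(pairs).items())

def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Lay rows out column by column, as a column-based database stores them."""
    return {
        "date": [r[0] for r in rows],
        "region": [r[1] for r in rows],
        "amount": [r[2] for r in rows],
    }

# Hypothetical raw input, including two records the mapper should reject.
RAW_LINES = [
    "2010-05-01\tEU\t10.5",
    "2010-05-01\teu\t4.5",
    "2010-05-01\tUS\t7",
    "garbage line",
    "2010-05-02\tUS\tn/a",
]

rows = polish(RAW_LINES)
columns = to_columns(rows)
```

Here `rows` ends up as two clean rows, one per (date, region), which is the kind of reduced, tidy output that would then be handed to the column-based database.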
Scenario 3: Use Hadoop for initial data polishing, and then dump the data into a SQL-capable, column-based database for near-real-time BI reporting. For real-time reporting, the data can be polished further for BI and moved from the column-based database into a fast, conventional RDBMS with BI support.
Scenario 4: The free way :) Use Hadoop for initial data polishing, and then dump the data into a regular SQL database with BI support. The export from HDFS can go the "un-Sqoop" way, i.e. in the opposite direction to a Sqoop import. The onus is then on the developer to export only ready-for-report (and therefore smaller) data, with most of the BI work already done in additional MapReduce steps.
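A minimal sketch of the "free way", assuming the polished rows from an earlier pass are already available as `(date, region, amount)` tuples. The hypothetical `report_step` plays the role of the extra MapReduce step that rolls data up to one small row per region, and Python's built-in `sqlite3` stands in for the regular SQL database. This is not Sqoop or any real HDFS export tool; it only shows why exporting pre-aggregated data keeps the SQL side small. The table name `region_report` and the sample data are made up for illustration.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical polished rows from an earlier MapReduce pass: (date, region, amount).
POLISHED: List[Tuple[str, str, float]] = [
    ("2010-05-01", "eu", 15.0),
    ("2010-05-01", "us", 7.0),
    ("2010-05-02", "eu", 3.0),
    ("2010-05-02", "us", 5.0),
]

def report_step(rows: Iterable[Tuple[str, str, float]]) -> List[Tuple[str, float, int]]:
    """Extra MR-style step: one row per region with its total and number of active days.

    Doing this before the export means only ready-for-report data reaches SQL.
    """
    totals: Dict[str, float] = defaultdict(float)
    days: Dict[str, Set[str]] = defaultdict(set)
    for date, region, amount in rows:  # "map": key each record by region
        totals[region] += amount       # "reduce": sum amounts per region
        days[region].add(date)         # "reduce": collect distinct days per region
    return sorted((r, totals[r], len(days[r])) for r in totals)

def export(rows: List[Tuple[str, float, int]], conn: sqlite3.Connection) -> None:
    """Load the ready-for-report rows into a plain SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_report "
        "(region TEXT PRIMARY KEY, total REAL, active_days INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO region_report VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the regular SQL database
export(report_step(POLISHED), conn)

# The BI/reporting side now only needs a trivial query over a tiny table.
report = conn.execute(
    "SELECT region, total, active_days FROM region_report ORDER BY total DESC"
).fetchall()
```

The design choice is the one the scenario describes: the heavy lifting stays in the MapReduce steps, and the SQL database only ever sees a handful of summary rows, so it does not need to be a costly analytic database.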
The important point to note is that moving the bulk of the BI analysis into programmatic interfaces (SQL or MapReduce) may bring additional costs.
I am not much of a fall-in-love-with-databases type, so I do like the way Hive could emerge as a BI reporting tool.