While preparing the keynote for the recent HUG India meetup on 31st July, I decided to keep my session short, but useful and relevant to the scheduled sessions on hiho, JAQL and Visual hive. I have always been a keen student of geography (and still take pride in it!) and thought it would be great to draw a visual geographical map of the Hadoop ecosystem. Here is what I came up with, along with a nice little story behind it:
- How did it all start? Huge data on the web!
- Nutch was built to crawl this web data
- This huge data had to be saved: HDFS was born!
- How to use this data?
- The MapReduce framework was built for coding and running analytics: Java, or any language via streaming/pipes
- How to get unstructured data in (web logs, click streams, Apache logs, server logs): FUSE, WebDAV, Chukwa, Flume, Scribe
- Hiho and Sqoop for loading data into HDFS: RDBMS can join the Hadoop bandwagon!
- High-level interfaces required over low-level MapReduce programming: Pig, Hive, Jaql
- BI tools with advanced UI reporting (drill-down etc.): Intellicus
- Workflow tools over MapReduce processes and high-level languages
- Monitor and manage Hadoop, run jobs/Hive, view HDFS (high-level view): Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia
- Support frameworks: Avro (serialization), ZooKeeper (coordination)
- More high-level interfaces/uses: Mahout, Elastic MapReduce
- OLTP is also possible: HBase
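For readers new to the MapReduce step in the story above, here is a minimal sketch of the idea as a word count. It is not Hadoop code: it simulates the map, shuffle and reduce phases in a single Python process, and the names `mapper`, `reducer` and `run_job` are hypothetical helpers chosen for illustration only. With Hadoop streaming, the mapper and reducer would instead be separate scripts reading from stdin and writing to stdout.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every token in a line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for a single word."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle (group values by key) and reduce in-process.

    This is a local stand-in for what the framework does across a cluster.
    """
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["hadoop stores data", "hadoop processes data"]
    print(run_job(sample))
```

The higher-level tools on the map (Pig, Hive, Jaql) exist largely so that this kind of mapper/reducer pair does not have to be written by hand.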
Would love to hear feedback on this map and how to grow it further by adding the missing parts!

Great work! Can you add the various flavors and/or commercial implementations of these tools, e.g. Greenplum, Hypertable, Cassandra, etc.?
Comment by v — August 18, 2010 @ 5:13 am
Hello Sanjay. The eco-system has come out very well and is very comprehensive. It’d have been great if I had attended the event. Good work. Keep it up.
Comment by balamurugan — August 18, 2010 @ 5:14 am
Great work! Can you elaborate on each module?
Comment by Surendra — December 14, 2011 @ 8:47 am
I would have grouped Zookeeper and the upcoming Apache Ambari under coordination frameworks, and Avro/Thrift and Protobuf under serialization.
Comment by Vikas Deolaliker — February 20, 2012 @ 10:10 pm
Hello Sanjay, I plan to include a link to your blog post about the Hadoop ecosystem on my blog – http://spawgi.wordpress.com. I will credit you as well. I hope that is OK. Please let me know otherwise.
Comment by spawgi — September 23, 2012 @ 9:31 am
It would be a pleasure!
Comment by indoos — September 23, 2012 @ 4:58 pm
I would also like to reference your ecosystem map in one of our group presentations as an intro to Hadoop. Is that okay?
Comment by Elizabeth — April 11, 2013 @ 5:23 pm