June 19, 2013
November 1, 2012
July 30, 2011
July 27, 2011
Webinar – Big Data Analytics Platform: Beyond Traditional Enterprise Data Warehouse
July 28, 2011 (10:00 am PT / 1:00 pm ET)
Register here: http://www.impetus.com/webinar?eventid=45
Free webinar on ‘Big Data Analytics Platform: Beyond Traditional Enterprise Data Warehouse’, covering:
• Traditional EDW vs. Big Data Analytics Platform – What’s missing?
• Building Big Data Analytics Platform
• Is it possible to reuse the existing EDW investments?
• Using open source effectively for enhancing/replacing EDW solutions
• Best practices/lessons learnt in building Big Data Analytics Platform
• Real-life examples
July 14, 2011
May 10, 2011
DataStax Brisk Quick Start in 10 minutes using git source
Steps:
1. git clone <brisk git url> <brisk dir> (the brisk1 branch was used)
2. cd <brisk dir>
3. ant
4. <brisk dir>/bin/brisk cassandra -t
This should get the jobtracker/tasktracker running.
5. <brisk dir>/bin/brisk hive
This should get the hive cli running.
The Hive commands from
http://www.datastax.com/docs/0.8/brisk/about_hive
can be used to test the Hive setup.
6. <brisk dir>/resources/cassandra/bin/cassandra-cli
This starts the Cassandra command-line interface.
The demo application, Portfolio Manager, works almost as documented at http://www.datastax.com/docs/0.8/brisk/brisk_demo.
It fails while running “./bin/pricer -o UPDATE_PORTFOLIOS”
This can be resolved by first running the create table commands from “<brisk dir>/demos/portfolio_manager/10_day_loss.q” to create the missing tables.
The rest works fine, in line with the website documentation.
These steps were used to run Brisk from the source code on 64-bit openSUSE in single cluster mode.
January 20, 2011
Some quotes- Unit testing in Product Development
“Unit testing is like a Health Insurance Plan or Life term plan? – Both if used correctly”
“Software testing is the same as buckling up your seat belt in an airbag-equipped car – you never know what might happen”
“TDD is like putting up an anti-theft security system at home from day 1!”
November 18, 2010
Multicore impact on software development
I recently got the chance to rub shoulders with academia and chip-design software engineers in Bengaluru at http://www.innovate-it.in/ – a conference on “Issues in design of complex Multi-Core Systems”. I spoke on “Multicore: Choice of Middleware and Framework” and was one of the few speakers from the application software world; most were experts with a chip design or hard-core hardware/system-level programming background.
A few of my revelations from the event were:
- There are no silver bullets (yet!) for migrating traditional software to multi-core
- There is certainly a huge vacant playground for new players to come up with technologies that allow software to harness multi-core power with no/minimal software changes. Azul Systems is a real-world example of this.
- One interesting finding was that Hadoop’s design is very similar to multi-core internals. Distribution of work, data sharing/sync and better cache management are problems common to both, and they are being solved in almost the same fashion. Nice to know that basic fundamental solutions fit at mega levels as well as micro levels!
Hadoop’s programming model fits quite well in the multi-core world, as evidenced by some success reported in running Hadoop on GPUs (MARS).
One practical tip for Hadoop clusters is to set the maximum count of maps + reducers on a single node according to the number of cores. 8-core machines can run more parallel maps + reducers than 4-core machines. Also keep in mind that the data node and task tracker consume some of the cores’ resources too. Refer to this paper for more details: White paper-HadoopPerformanceTuning.pdf
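As a rough illustration of the sizing tip above, here is a minimal Python sketch. The function name, the number of cores reserved for the data node and task tracker, and the map/reduce split are all my own assumptions for illustration, not Hadoop defaults or figures from the white paper. The two property names are the per-node slot limits used by classic (pre-YARN) Hadoop MapReduce.

```python
def suggest_task_slots(cores, reserved_cores=2, map_fraction=0.67):
    """Suggest per-node map and reduce slot limits from the core count.

    Hypothetical helper: reserved_cores and map_fraction are illustrative
    assumptions, not values recommended by Hadoop or the white paper.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 < map_fraction < 1.0:
        raise ValueError("map_fraction must be between 0 and 1")

    # Leave some cores for the DataNode and TaskTracker daemons, but keep
    # at least two slots so that one map and one reduce can always run
    # (on very small machines this oversubscribes the cores).
    usable = max(2, cores - reserved_cores)

    # Give most slots to maps, keeping at least one slot of each kind.
    map_slots = max(1, min(usable - 1, round(usable * map_fraction)))
    reduce_slots = usable - map_slots

    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


if __name__ == "__main__":
    # Compare a 4-core and an 8-core node.
    for cores in (4, 8):
        print(cores, "cores:", suggest_task_slots(cores))
```

The right numbers depend on how CPU-bound or I/O-bound the jobs are, so treat the output only as a starting point for tuning.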
August 16, 2010
Hadoop Ecosystem World-Map
While preparing the keynote for the recently held HUG India meetup on 31st July, I decided to keep my session short, but useful and relevant to the lined-up sessions on hiho, JAQL and Visual hive. I have always been a keen student of geography (still take pride in it!) and thought it would be great to draw a visual geographical map of the Hadoop ecosystem. Here is what I came up with, along with a nice little story behind it:
- How did it all start- huge data on the web!
- Nutch built to crawl this web data
- Huge data had to be saved- HDFS was born!
- How to use this data?
- MapReduce framework built for coding and running analytics – Java, or any language via streaming/pipes
- How to get in unstructured data – Web logs, Click streams, Apache logs, Server logs – FUSE, WebDAV, Chukwa, Flume, Scribe
- Hiho and Sqoop for loading data into HDFS – RDBMS can join the Hadoop bandwagon!
- High-level interfaces required over low-level MapReduce programming – Pig, Hive, Jaql
- BI tools with advanced UI reporting (drill-down etc.) – Intellicus
- Workflow tools over Map-Reduce processes and High level languages
- Monitor and manage Hadoop, run jobs/Hive, view HDFS at a high level – Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia
- Support frameworks – Avro (serialization), ZooKeeper (coordination)
- More high-level interfaces/uses – Mahout, Elastic MapReduce
- OLTP also possible – HBase
Would love to hear feedback about this and how to grow it further to add the missing parts!
