Sanjay Sharma’s Weblog

February 13, 2014

Hurray! My book on Cassandra Design Patterns is out!!

My attempt at summarizing my experience of using Cassandra for real-world problems is out.

Cassandra Design Patterns is a concise collection of real-world Cassandra usage and design patterns.

One of the key decisions while writing the book was whether to use a design pattern template. I finally decided on a traditional design pattern definition template along the lines of the famous GoF design patterns. This approach seemed aligned with the definition of software design patterns on Wikipedia: “In software engineering, a design pattern is a general reusable solution to a commonly occurring problem within a given context in software design.”

Hopefully, strict design traditionalists will forgive me for using this template in a new setting, one that covers Cassandra usage, applicability, architecture, data modeling and applied design patterns along with design patterns.

The book starts by describing how Cassandra helps solve post-RDBMS era challenges, using the well-known 3V model from the big data world. The patterns are modeled on the 3Vs (velocity, variety and volume) and should be a good read for people from the RDBMS world who want to decide whether to start looking beyond its boundaries. Interestingly, these problems are addressed by almost all popular NoSQL solutions and hence are not limited to Cassandra.

Next, Cassandra’s unique differentiators are modeled into patterns that show how Cassandra can solve some of the most challenging problems in the database world today.

Real-world business problems are seldom solved using a single technology stack, so the next chapter covers some usage and design patterns for combining Cassandra with popular big data technologies such as Hadoop and Solr/Elasticsearch.

The subsequent chapters continue the journey with some known patterns and anti-patterns that can be used to solve real-world problems. They include a set of additional interesting patterns based on new and upcoming Cassandra features.

Hopefully this concise book will serve data architects, solution architects, Cassandra developers and experts alike as a helpful tool and guide for using the power of Cassandra in the right way.

I do promise to listen keenly to community reviews and suggestions, and to keep improving this list of Cassandra use cases and design patterns.


September 29, 2013

Hadoop Cluster Management- Impetus’s Ankush demo at Cloud Expo New York 2013 | SYS-CON.TV

Filed under: Cloud, Hadoop, NoSQL (indoos @ 3:38 am)

Demo on Impetus’s Big Data Solution at Cloud Expo New York | SYS-CON.TV.

November 1, 2012

Big Data Technologies Landscape

Filed under: Cassandra, Cloud, Hadoop, Hive, NoSQL (indoos @ 2:24 pm)

May 10, 2011

Datastax Brisk Quick Start in 10 minutes using git source

Filed under: Cassandra, Hadoop, Hive, NoSQL (indoos @ 5:08 pm)

1. git clone -b brisk1 <brisk git url> <brisk dir>
   (the brisk1 branch was used)
2. cd <brisk dir>
3. ant
4. <brisk dir>/bin/brisk cassandra -t
   This should get the jobtracker/tasktracker running.
5. <brisk dir>/bin/brisk hive
   This should get the Hive CLI running. Hive commands can then be run from it to test the setup.
6. <brisk dir>/resources/cassandra/bin/cassandra-cli
   This starts the Cassandra command-line client.

The demo application, Portfolio Manager, works almost as documented.
It fails while running “./bin/pricer -o UPDATE_PORTFOLIOS”.
This can be resolved by first running the create table commands from “<brisk dir>/demos/portfolio_manager/10_day_loss.q” to create the missing tables.

The rest works fine, in line with the website documentation.
These steps were used to run Brisk from source on 64-bit openSUSE in single cluster mode.

August 16, 2010

Hadoop Ecosystem World-Map

While preparing the keynote for the recently held HUG India meetup on 31st July, I decided to keep my session short, but useful and relevant to the sessions lined up on hiho, JAQL and Visual hive. I have always been a keen student of geography (and still take pride in it!) and thought it would be great to draw a visual geographical map of the Hadoop ecosystem. Here is what I came up with, along with a nice little story behind it:

  1. How did it all start? Huge data on the web!
  2. Nutch was built to crawl this web data
  3. This huge data had to be saved: HDFS was born!
  4. How to use this data?
  5. The MapReduce framework was built for coding and running analytics: Java, or any language via streaming/pipes
  6. How to get unstructured data in (web logs, click streams, Apache logs, server logs): fuse, webdav, Chukwa, Flume, Scribe
  7. Hiho and Sqoop for loading data into HDFS: RDBMS can join the Hadoop bandwagon!
  8. High-level interfaces required over low-level MapReduce programming: Pig, Hive, Jaql
  9. BI tools with advanced UI reporting (drill-down etc.): Intellicus
  10. Workflow tools over MapReduce processes and high-level languages
  11. Monitor and manage Hadoop, run jobs/Hive, view HDFS at a high level: Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia
  12. Support frameworks: Avro (serialization), ZooKeeper (coordination)
  13. More high-level interfaces/uses: Mahout, Elastic MapReduce
  14. OLTP is also possible: HBase

Would love to hear feedback about this and how to grow it further to add the missing parts!

Hadoop ecosystem map

July 2, 2010

kundera- making life easy for Apache Cassandra users

Filed under: Cassandra, HPC, Java world, NoSQL (indoos @ 4:54 am)

One of my colleagues, Animesh, has been working on an annotation-based wrapper over Cassandra, and we have finally decided to open source it so that it can be nurtured as part of the bigger community.

kundera is hosted on and can be reached here –

Here is how to get started with kundera in 5 minutes –

The logic behind kundera is quite simple: provide an ORM-like wrapper over the difficult-to-use Thrift APIs. Eventually, all NoSQL databases will want similar APIs so that they are easy to use.

The initial release includes a JPA-like annotation library. The roadmap is to subsequently change it into a Cassandra-specific JPA extension. The other important feature to be added is index/search using Lucandra/Solandra.
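
To give a feel for what an annotation-based, ORM-like wrapper does, here is a minimal sketch in plain Java. It is not kundera's actual API: the annotations `@ColumnFamily`, `@RowKey` and `@Column`, the `Author` entity and the `toRow` helper are all hypothetical names made up for this illustration, and nothing here talks to Cassandra or Thrift. It only shows the core idea of reading annotations by reflection and turning an object into a row key plus named columns.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class AnnotationMapperSketch {

    // Hypothetical annotations, defined here only for illustration.
    // They are NOT the annotations shipped with kundera or JPA.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ColumnFamily { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RowKey { }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column { String name(); }

    // A made-up entity: one object corresponds to one row in a column family.
    @ColumnFamily("Authors")
    static class Author {
        @RowKey String id;
        @Column(name = "name") String name;
        @Column(name = "country") String country;

        Author(String id, String name, String country) {
            this.id = id;
            this.name = name;
            this.country = country;
        }
    }

    // Plain holder for what a wrapper would eventually hand to the storage layer.
    static final class Row {
        final String columnFamily;
        final String key;
        final Map<String, String> columns;

        Row(String columnFamily, String key, Map<String, String> columns) {
            this.columnFamily = columnFamily;
            this.key = key;
            this.columns = columns;
        }
    }

    // Reads the annotations by reflection and builds a Row from the entity.
    static Row toRow(Object entity) {
        Class<?> type = entity.getClass();
        ColumnFamily cf = type.getAnnotation(ColumnFamily.class);
        if (cf == null) {
            throw new IllegalArgumentException(type.getName() + " is not annotated with @ColumnFamily");
        }
        String key = null;
        Map<String, String> columns = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            Object value;
            try {
                field.setAccessible(true);
                value = field.get(entity);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
            if (field.isAnnotationPresent(RowKey.class) && value != null) {
                key = String.valueOf(value);
            }
            Column column = field.getAnnotation(Column.class);
            if (column != null && value != null) {
                // Null fields are simply skipped: a sparse row has no such column.
                columns.put(column.name(), String.valueOf(value));
            }
        }
        if (key == null) {
            throw new IllegalArgumentException(type.getName() + " has no non-null @RowKey field");
        }
        return new Row(cf.value(), key, columns);
    }

    public static void main(String[] args) {
        Row row = toRow(new Author("a1", "Ada", "UK"));
        System.out.println(row.columnFamily + "[" + row.key + "] = " + row.columns);
    }
}
```

A real wrapper would then pass the resulting column family name, key and columns to the Thrift client and do the reverse mapping on reads. The point of the sketch is only that the application code deals with annotated objects instead of low-level column structures.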
