Sanjay Sharma’s Weblog

October 8, 2009

My memcached experiences with Hadoop

Filed under: Advanced computing, Hadoop, HPC — indoos @ 12:44 pm

Memcached, as I have heard and agree, is the de facto leader in web-layer caching.

Here are some interesting facts from Facebook's memcached usage statistics (http://www.infoq.com/presentations/Facebook-Software-Stack):

  • Over 25 TB (whopping!) of in-memory cache
  • Average latency under 200 microseconds (wow!)
  • Caches serialized PHP data structures
  • Lots of multi-gets

Facebook memcached customizations

  • Runs over UDP
    • Reduced memory overhead of TCP connection buffers
    • Application-level flow control (an optimization for multi-gets)
  • On-demand aggregation of per-thread stats
    • Reduces global lock contention
  • Multiple kernel changes to optimize for memcached usage
    • Distributing network interrupt handling over multiple cores
    • Opportunistic polling of the network interface
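
To make the "per-thread stats" point concrete, here is a minimal Java sketch of the general idea. It is not Facebook's code (memcached itself is written in C), and the class and method names are made up for illustration. Each thread increments its own counter, so the hot path never takes a shared lock, and the counters are summed only when someone asks for the total.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: hypothetical names, not taken from memcached or Facebook's patches.
public class PerThreadStats {
    // Registry of every thread's private counter, walked only during aggregation.
    private final Queue<AtomicLong> cells = new ConcurrentLinkedQueue<>();

    // Each thread lazily creates its own counter and registers it once.
    private final ThreadLocal<AtomicLong> local = ThreadLocal.withInitial(() -> {
        AtomicLong cell = new AtomicLong();
        cells.add(cell);
        return cell;
    });

    // Hot path: touches only the calling thread's counter, no global lock.
    public void recordGet() {
        local.get().incrementAndGet();
    }

    // On-demand aggregation: sum the per-thread counters when stats are requested.
    public long totalGets() {
        long sum = 0;
        for (AtomicLong cell : cells) {
            sum += cell.get();
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadStats stats = new PerThreadStats();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    stats.recordGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(stats.totalGets()); // prints 4000
    }
}
```

The trade-off is that reading the total costs a walk over all threads' counters, which is fine when stats are read rarely and updated constantly.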

My Memcached usage experience with Hadoop

  • Problem definition: use memcached for key-value lookups in the Map class. Each map method call required lookups in around 7-8 different key-value maps, so every row of input data (a million-plus rows) triggered 7-8 lookups. The maps could not be held as an in-memory cache inside each mapper because of their size (700-800 MB overall, hierarchical value objects with simple keys).
  • Trial 1: a single memcached server running on the Namenode, holding the entire lookup data in memory as key-value pairs. The map name plus the key formed the lookup key, and the value was a serialized Java object. I also tried an Externalizable implementation for some performance boost. The cache worked as a pure persistent cache: filled by a start-up job, then used read-only by the subsequent MapReduce jobs that needed the lookups. I had trouble choosing the right Java client, but finally used the Danga client over spymemcached, as spymemcached was not working properly as a persistent read-only cache.
    • Result: a no-go. The map processes were really slow.
  • Trial 2: 15 memcached servers, 3 running on the Namenode and the rest on individual data node machines. Using the memcached command-line console, the lookup data could be seen distributed across the memcached nodes as key-value pairs. I did a lot of memcached tuning as well.
    • Result: still a no-go. Throughput was around 10,000 gets per second per memcached server, which amounts to around 150,000 (yes!) lookups per second, BUT that was still too slow for our requirements!
  • Final solution: used Tokyo Cabinet (a Berkeley DB-like file-based storage system), which is as good as it gets! (Performance was almost the same as in-memory lookups.)
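
A rough Java sketch of the key scheme and of the throughput arithmetic above. This is not the original job code: the `ReadOnlyStore` interface, the `:` separator, and the `countryMap` sample entry are assumptions made for illustration, and a plain `HashMap` stands in for the memcached or Tokyo Cabinet client.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupSketch {

    /** Hypothetical read-only store; a memcached or Tokyo Cabinet client would sit behind it. */
    public interface ReadOnlyStore {
        Object get(String key);
    }

    /** Builds the lookup key from the map name and the entry key (the ":" separator is an assumption). */
    public static String compositeKey(String mapName, String key) {
        return mapName + ":" + key;
    }

    /** Input rows processed per second when every row needs lookupsPerRow gets. */
    public static double rowsPerSecond(int servers, int getsPerSecPerServer, int lookupsPerRow) {
        return (double) servers * getsPerSecPerServer / lookupsPerRow;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the cache, filled once and then only read.
        Map<String, Object> backing = new HashMap<>();
        backing.put(compositeKey("countryMap", "IN"), "India");
        ReadOnlyStore store = backing::get;

        System.out.println(store.get(compositeKey("countryMap", "IN"))); // prints India

        // 15 servers x 10,000 gets/sec = 150,000 gets/sec, shared by 8 lookups per row.
        System.out.println(rowsPerSecond(15, 10_000, 8)); // prints 18750.0
    }
}
```

So at 8 lookups per row, 150,000 gets per second translates to under 19,000 input rows per second across the whole cluster, and every one of those gets is a network round trip.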

4 Comments »

  1. One suggestion:
    Assuming the memcached data is read-only, run one memcached cluster per datanode (you can run more than one memcached server per datanode machine on different ports, depending on the size of the data). Keep the same data on all memcached clusters (since the data is read-only, having copies on every cluster won't be an issue). Memcached lookups will then be local to the mapper tasks. Try increasing the max map tasks per datanode depending on the available CPUs.

    Comment by Tapan Singh Pratihar — November 23, 2009 @ 1:35 pm

    • Hi,

      Could you please give me some Java code where MapReduce takes its input from memcached?

      Regards,
      Amarjeet

      Comment by Amarjeet — July 12, 2011 @ 9:33 am

  2. Hi
    Can you please help me integrate Hadoop and memcached
    so that intermediate results can be cached and Hadoop performance can be improved?

    Which Hadoop classes need to be modified to support caching?

    Thanks and regards
    Kapil

    Comment by kapil bhosale (M.Tech) — October 6, 2012 @ 12:56 pm

