I've recently been reading up on ZooKeeper with a view to giving a lightning talk on it.
At first glance I found it hard to get a firm grasp on what ZooKeeper actually is or does. When I started telling people I was planning a talk on ZooKeeper it turned out that I was not alone; I got a lot of responses along the lines of "great! I heard ZooKeeper was awesome but I have no idea why or what it is".
After having spent some time playing with ZooKeeper in Python and Clojure I'm both very impressed with it and puzzled as to why I didn't get it in the first place...
It's such a simple concept and it is, pretty much, as described on the official web site. I guess I could attribute my own and others' confusion to the fact that ZooKeeper is kinda "low level": the site has patterns you can implement to build such things as distributed locks and leader election, but as it stands ZooKeeper is just a tree-based file system for metadata... It just so happens this is the exact foundation you want for these high-level constructs.
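To make the "file system for metadata" idea a bit more concrete, here is a minimal sketch using Kazoo (one of the Python libraries listed below). The server address, the /myapp paths and the helper names store_setting and read_tree are all made up for illustration; the client calls themselves (exists, set, create, get, get_children) are Kazoo's.

```python
def connect(hosts="127.0.0.1:2181"):
    """Open a Kazoo session. The address is an assumed local test server."""
    # Imported lazily so the helpers below can be exercised without Kazoo installed.
    from kazoo.client import KazooClient
    zk = KazooClient(hosts=hosts)
    zk.start()
    return zk


def store_setting(zk, path, value):
    """Create the znode (and any missing parents) or overwrite its data."""
    data = value.encode("utf-8")  # znode payloads are raw bytes
    if zk.exists(path):
        zk.set(path, data)
    else:
        zk.create(path, data, makepath=True)


def read_tree(zk, path="/"):
    """Recursively read a subtree into {znode path: decoded data}."""
    data, _stat = zk.get(path)
    tree = {path: data.decode("utf-8") if data else ""}
    for child in zk.get_children(path):
        child_path = path.rstrip("/") + "/" + child
        tree.update(read_tree(zk, child_path))
    return tree
```

Against a running server, calling store_setting(zk, "/myapp/db_url", "postgres://localhost/dev") on the client returned by connect() and then read_tree(zk, "/myapp") would walk that little subtree much as you would walk a directory. Note that, unlike a file system, every znode can hold data and have children at the same time.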
There are some great libraries for working with ZooKeeper. Here are the handful I've used so far:
- Zookeeper-clj: https://github.com/liebke/zookeeper-clj
- Kazoo, a batteries-included Python library for ZooKeeper: https://github.com/python-zk/kazoo
- Avout, a Clojure library that models software transactional memory on top of ZooKeeper: https://github.com/liebke/avout
To this end I built Enclosure, a Python command-line tool that I can point at ZooKeeper and an on-disk directory; it models znodes on the structure of that directory and loads the data from the files into those nodes.
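The directory-to-znode idea can be sketched roughly as follows. This is not Enclosure's actual code, just an illustration of the mapping: plan_znodes and load_plan are hypothetical names, the /config base path is an assumption, and the client is assumed to be a started Kazoo client (only its ensure_path and set calls are used).

```python
import os


def plan_znodes(directory, base="/config"):
    """Map each directory to an empty znode and each file to a znode
    holding that file's bytes. `base` is assumed to be a non-root path."""
    plan = {}
    for dirpath, _dirnames, filenames in os.walk(directory):
        rel = os.path.relpath(dirpath, directory)
        parts = [] if rel == "." else rel.split(os.sep)
        node = "/".join([base.rstrip("/")] + parts)
        plan.setdefault(node, b"")
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                plan[node + "/" + name] = fh.read()
    return plan


def load_plan(zk, plan):
    """Write the plan through a Kazoo-style client."""
    for path in sorted(plan):  # a parent path sorts before its children
        zk.ensure_path(path)   # creates the znode and any missing parents
        zk.set(path, plan[path])
```

Splitting the work into a pure planning step and a separate write step is just a design choice for this sketch; it makes the mapping easy to inspect before anything touches ZooKeeper. Bear in mind that znodes are meant for small pieces of metadata (the default size limit is about 1 MB), so this only suits small configuration files.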
I also built a script in Python (Clojure version coming soon) that allows an application to join an environment, download its configuration and subscribe to updates to that configuration.
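A join-and-subscribe script along those lines might look something like the sketch below. It is not the script from the repository; the /environments/<env>/members and /environments/<env>/config layout and the join_environment name are assumptions made up for the example. It relies on two Kazoo features: ephemeral znodes, which vanish when the session ends, and the DataWatch decorator, which calls your function once straight away and again on every change.

```python
def join_environment(zk, env, member, on_config):
    """Register `member` in `env` and feed config updates to `on_config`.

    `zk` is assumed to be a started Kazoo client; the znode layout is
    hypothetical.
    """
    root = "/environments/" + env

    # Ephemeral znode: ZooKeeper removes it when this session ends, so the
    # members list only ever shows applications that are still connected.
    zk.create(root + "/members/" + member, b"", ephemeral=True, makepath=True)

    @zk.DataWatch(root + "/config")
    def _config_changed(data, stat, event=None):
        # data is None while the config znode does not exist.
        if data is not None:
            on_config(data.decode("utf-8"))

    return _config_changed
```

A caller would pass in whatever should happen on a change, for example join_environment(zk, "staging", "web-1", reload_settings), where reload_settings is the application's own (hypothetical) reload function.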
You can find Enclosure here: https://github.com/jdoig/Enclosure
I was also mulling over a system for caching bulk API calls. It involved a two-stage, distributed caching mechanism in which the instances and processes would need to tell one another whether a call was:
a) not yet started
b) started and streaming into level 2 cache or
c) finished, sorted, cleaned and stored into level 1 cache.
...ZooKeeper's barrier recipe (a distributed synchronization primitive that blocks a set of processes until a condition is met) fit the bill perfectly.
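Here is a rough sketch of how those three states could be tracked with Kazoo's Barrier recipe. This is only my illustration of the idea, not a finished design: the /cache/barriers and /cache/done paths, the state names and the helper functions are all assumptions, and it deliberately ignores how a single fetcher gets picked when two processes start at once (that would need an atomic create or a lock).

```python
NOT_STARTED, STARTED, FINISHED = "not started", "started", "finished"


def _barrier_path(call_id):
    return "/cache/barriers/" + call_id  # hypothetical layout


def _done_path(call_id):
    return "/cache/done/" + call_id  # hypothetical layout


def call_state(zk, call_id):
    """Work out which of the three states a bulk call is in."""
    if zk.exists(_done_path(call_id)):
        return FINISHED      # c) stored in the level 1 cache
    if zk.exists(_barrier_path(call_id)):
        return STARTED       # b) streaming into the level 2 cache
    return NOT_STARTED       # a) nobody has picked it up yet


def run_fetch(zk, call_id, fetch):
    """Raise the barrier, run `fetch`, mark the call done, lower the barrier."""
    barrier = zk.Barrier(_barrier_path(call_id))
    barrier.create()
    try:
        fetch()
        zk.ensure_path(_done_path(call_id))  # mark done before releasing waiters
    finally:
        barrier.remove()  # always release waiters, even if the fetch failed


def wait_until_finished(zk, call_id, timeout=None):
    """Block until the barrier is gone; returns False on timeout."""
    return zk.Barrier(_barrier_path(call_id)).wait(timeout)
```

Because the done marker is written before the barrier is removed, a process released from wait_until_finished can check call_state and tell a completed call from one whose fetch failed and needs retrying.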
I'll put my lightning talk video up over the next few weeks.