LDAP Directories: The Forgotten NoSQL

Created at: 17.12.2009 20:00, source: Engine Yard Blog, tagged: Technology Cassandra key-value stores LDAP

When most Rails developers encounter LDAP, it’s usually for user authentication. And most of the time, there’s no choice: they’re working under a mandate that requires them to use it. Usually, this means Active Directory, but very occasionally something like OpenLDAP or the Sun Java System Directory Server.

It’s hard to imagine now, but there was once great excitement about the potential for LDAP-based directory servers to become more than just authentication servers and morph into general-purpose datastores. LDAP directories promised a single, scalable, high-performance data store that could be queried for common information across multiple applications. After all, directories had a lot of virtues:

  • Fast Queries: LDAP directories were heavily indexed, so query speeds were truly impressive, reliably 10x what a relational database could manage. (Write speed was much slower for the same reason: there were lots of indexes to update whenever a write happened.)
  • Replication: LDAP directories were an “eventually consistent” data store long before Dynamo or Cassandra. Multi-master replication allowed a distributed network of directories to accept writes at any node, and then relay these updates around the directory network. The last update in time always won.
  • Partitionable: directories were giant tree structures, and branches could be picked up and moved to another server if the directory got too big. There was built-in referral linking from each amputation point to the correct server, and these servers could easily be distributed geographically.
  • Standardized and efficient: coming from a telecom heritage, LDAP was an efficient wire protocol. It was internationalized and cross-platform. LDAP queries and responses were binary-encoded with ASN.1 as the data representation syntax, using a restricted form of ASN.1’s Basic Encoding Rules (BER).
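The “last update wins” rule from the replication bullet above is easy to model. The sketch below is a toy in plain Ruby and does not show how any particular directory server implements it. The `Replica` and `Update` names are hypothetical and timestamps are plain integers. Ties are broken by replica name so that every node converges on the same value. Real multi-master directories kept considerably more elaborate change bookkeeping.

```ruby
# Toy model of multi-master, last-writer-wins replication.
# Replica and Update are hypothetical names; timestamps are plain integers.

Update = Struct.new(:key, :value, :timestamp, :origin)

class Replica
  attr_reader :name

  def initialize(name)
    @name = name
    @data = {} # key => latest Update seen for that key
  end

  # Any node accepts writes (multi-master).
  def write(key, value, timestamp)
    apply(Update.new(key, value, timestamp, name))
  end

  # Keep an incoming update only if it is newer than the one we hold.
  # Ties on timestamp are broken by origin name so all replicas agree.
  def apply(update)
    current = @data[update.key]
    newer = current.nil? ||
            ([update.timestamp, update.origin] <=> [current.timestamp, current.origin]) > 0
    @data[update.key] = update if newer
  end

  # Relay every update this node holds to another node.
  def replicate_to(other)
    @data.each_value { |update| other.apply(update) }
  end

  def read(key)
    @data[key]&.value
  end
end

east = Replica.new("east")
west = Replica.new("west")

east.write("mail", "old@example.com", 100)
west.write("mail", "new@example.com", 105) # a later write at a different node

# Relay updates in both directions; both replicas converge on the later write.
east.replicate_to(west)
west.replicate_to(east)

puts east.read("mail") # => new@example.com
puts west.read("mail") # => new@example.com
```

Either node can accept the write. Convergence comes purely from every replica applying the same comparison, whatever order the updates arrive in.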

In addition to these benefits, directories like Netscape Directory Server and Microsoft Active Directory had a seemingly endless list of other features like rich, complex configurable access control rules and permissions; multiple ways to define groups; rich query semantics and more.

And yet, when we look around today, it’s not LDAP directories that have the NoSQL buzz; it’s the far looser and simpler key-value stores like Cassandra, MongoDB and Redis. So where did LDAP fall down, and is there anything to be learned from its (relative) failure? Here is my own take on why LDAP didn’t take over the world, colored by my (brief) tenure as a product manager for Netscape Directory Server.

  1. Telecom protocols FTL: LDAP, in my own humble opinion, was fatally crippled by its telecom parentage. Just reading the first page of the ASN.1 data structure specification could make your eyes bleed. Debugging a badly behaved LDAP client or query was basically a job for experts wielding binary-to-text decoders. There was a separate format, LDIF, for representing LDAP data as human-readable text, but it was a friction point. Compared to ASN.1, JSON (as an example) is severely limited and incomplete, and yet… about 1000x more popular as a result.
  2. Access control that exceeded human brain capacity: LDAP directories provided lots of rope for people who cared about security to firmly and irrevocably tie themselves in knots. Time and again, I’d see customers with five or more layers of access control rules that they found confounding, with counter-intuitive effects. Better yet, this level of complexity was indecipherable without drawing five-dimensional set diagrams. Sometimes, there are features you shouldn’t put into a product no matter how much people ask for them. They know not what they do.
  3. Interesting data wanted to be relational: it was a simple but sad truth. Data that’s interesting and important enough to be accessed often by your applications seems to want to be compared and operated on in the context of your other interesting data, and that sounds a lot like the right case for a relational database. Directories, as hierarchical data stores, couldn’t easily accommodate the kinds of queries customers ended up wanting to run once they were storing enough interesting data. So the solution was to patch in “relationy” features like aliases, which soft-linked two entries in different parts of the tree, but these were patchwork solutions. In their worst (over-used) incarnation, they turned a directory server into a weird, hard-to-maintain mutant hybrid of relational and hierarchical database.
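Item 1's point about needing binary-to-text tools is easier to see with real bytes. Here is a minimal sketch that hand-encodes an LDAPv3 simple BindRequest as BER tag-length-value elements, following the message structure in RFC 4511. The helper names (`tlv`, `ber_integer` and so on) are hypothetical. The sketch only handles short-form lengths and small integers.

```ruby
# Hand-rolled BER (Basic Encoding Rules) encoding of an LDAPv3 simple
# BindRequest. Sketch only: short-form lengths (< 128 bytes) and
# small non-negative integers.

# Every BER element is a tag byte, a length, and the content bytes.
def tlv(tag, content)
  raise ArgumentError, "long-form lengths not supported" if content.bytesize > 127
  [tag, content.bytesize].pack("CC") + content
end

def ber_integer(n)
  raise ArgumentError, "only 0..127 supported here" unless (0..127).cover?(n)
  tlv(0x02, [n].pack("C")) # universal tag 2: INTEGER
end

def ber_octet_string(str)
  tlv(0x04, str.b) # universal tag 4: OCTET STRING, as raw bytes
end

def ldap_simple_bind(message_id, dn, password)
  bind_request = tlv(0x60,                     # [APPLICATION 0], constructed: BindRequest
                     ber_integer(3) +          # protocol version 3
                     ber_octet_string(dn) +    # name (the bind DN)
                     tlv(0x80, password.b))    # [0], primitive: simple password
  tlv(0x30, ber_integer(message_id) + bind_request) # SEQUENCE: LDAPMessage
end

def hexdump(bytes)
  bytes.unpack("C*").map { |b| format("%02x", b) }.join(" ")
end

puts hexdump(ldap_simple_bind(1, "", ""))
# => 30 0c 02 01 01 60 07 02 01 03 04 00 80 00
puts hexdump(ldap_simple_bind(1, "cn=admin", "secret"))
```

Even this 14-byte, empty-credential bind is opaque without a decoder. Real LDAP operations nest these elements much more deeply.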

There were other downsides to LDAP directories, of course. The learning curve for LDAP could be steep, since it was a truly novel technology for most people used to RDBMSs and SQL. And, probably most importantly, most directories weren’t open source, so they missed the opportunity to fully leverage a community of interested developers and administrators.

Lessons for this Generation of NoSQL (?)

I hesitate to speculate on the lessons from LDAP for this generation of NoSQL stores, since open source has changed the game considerably in the last ten years. That said, I do think LDAP got a lot of things right (fast, distributable, scalable and standardized). It’s arguable whether custom binary protocols (such as MongoDB’s) will really hurt adoption as long as the data structure specifications are reasonably readable, but CouchDB’s JSON/REST/HTTP combo is certainly a little easier on the eyes.

I do know one thing: keep the access control simple. Your users will thank you later!



Key-Value Stores in Ruby: The Wrap Up

Created at: 17.11.2009 20:00, source: Engine Yard Blog, tagged: Technology couchdb javascript key-value stores mongodb ruby s3

This last article in our key-value series will briefly cover a few interesting topics that could each have had full articles of their own. If any of them interest you, follow the links I provide for more information. Finally, I’ll wrap up by introducing Moneta, written by Yehuda Katz, which provides a unified API for a wide variety of different Key-Value Stores. If you want to write code that lets the user choose which store to use, you’ll want to pay attention to Moneta.

The difficult part of discussing Key-Value Stores today is that it’s a product area seeing rapid development and constant evolution. There are more interesting stores and libraries available than can easily be covered, even in a series like this. I could probably keep writing posts every two weeks into next year without running out of subjects. So, alas, many things must be left undiscussed or underdiscussed. But let’s move on to the topics we can cover…

CouchDB

The first great Key-Value Store that isn’t going to get its own article is CouchDB. Apache’s CouchDB is a document-oriented database, like MongoDB. It does, however, expose a RESTful, JSON-based API over a built-in HTTP interface. Like MongoDB, it offers a schema-free data store. CouchDB offers solid built-in replication and uses JavaScript as its query language. It is a powerful tool.
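Because CouchDB speaks HTTP, you don’t strictly need a client library. Here is a minimal sketch using only Ruby’s standard `net/http` and `json`. Creating a document is a `POST` of JSON to `/{database}`, and fetching one is a `GET` of `/{database}/{id}`. The host and port (CouchDB’s default, 127.0.0.1:5984) and the helper names are assumptions. The requests are only built here. Sending them requires a running server.

```ruby
require 'json'
require 'net/http'

# Assumption: CouchDB on its default host and port.
COUCH_HOST = '127.0.0.1'
COUCH_PORT = 5984

# POST /{db} with a JSON body creates a document; CouchDB assigns the id.
def create_doc_request(db, doc)
  req = Net::HTTP::Post.new("/#{db}")
  req['Content-Type'] = 'application/json'
  req.body = JSON.generate(doc)
  req
end

# GET /{db}/{id} returns the stored document as JSON.
def get_doc_request(db, id)
  Net::HTTP::Get.new("/#{db}/#{id}")
end

# Sending a request needs a running CouchDB server.
def send_request(req)
  Net::HTTP.start(COUCH_HOST, COUCH_PORT) { |http| http.request(req) }
end

req = create_doc_request('exercise-log', { activity: 'pedaling', duration: '97:34' })
puts req.method # => POST
puts req.path   # => /exercise-log
puts req.body   # => {"activity":"pedaling","duration":"97:34"}

# Against a live server, something like:
#   created = JSON.parse(send_request(req).body)  # includes "id" and "rev"
#   doc = JSON.parse(send_request(get_doc_request('exercise-log', created['id'])).body)
```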

Several Ruby libraries make it easier to use CouchDB. In the examples below, I have used CouchRest, which is based on CouchDB’s own couch.js library:

require 'rubygems'
require 'couchrest'
require 'yaml'

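# Open the exercise-log database, creating it if it doesn't already exist
DBH = CouchRest.database!('exercise-log')

# Save a new document; the activity and duration come from the command line
response = DBH.save_doc({
  :date => Time.now,
  :activity => ARGV[0],
  :duration => ARGV[1]})

# The server's response includes the new document's id
stored_record = DBH.get(response['id'])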
puts "Stored:\n#{stored_record.to_yaml}"

wyhaines$ ruby /tmp/couch1.rb pedaling 97:34
Stored:
--- !map:CouchRest::Document
duration: "97:34"
_rev: 1-eb6f6e3a3e2eae0cd99f3fcbc63d29d6
_id: 0d9e71f44b3e0d3a2013c282bbccb5a0
activity: pedaling
date: 2009/11/12 21:07:45 +0000

Like MongoDB, CouchDB lets you store any set of keys and values together as a document and retrieve it later. CouchRest returns the server’s response, which contains an id field that can be used to retrieve the record that was just stored.

For more complex queries of the document store, one can use views. Views have a lot of power, because they are ultimately defined using JavaScript, but they don’t lend themselves to easy ad-hoc manipulation of the database.

# Create a design document, _design/query, containing one view named allkeys
DBH.save_doc({
  "_id" => "_design/query",
  :views => {
    :allkeys => {
      :map => "function(doc) { for (var word in doc) { if (!word.match(/^_/)) emit(word,doc[word])}}"
    }
  }
})

That inserts a design document into the database, and its view will be identified as query/allkeys. What a view does is defined by the JavaScript code it contains. Once a view is inserted into CouchDB, using it is simple:

puts DBH.view('query/allkeys').to_yaml

That particular function was lifted shamelessly from the CouchRest README, with a couple of terms renamed to make it a little clearer. The output:

---
total_rows: 3
rows:
- id: 0d9e71f44b3e0d3a2013c282bbccb5a0
  value: pedaling
  key: activity
- id: 0d9e71f44b3e0d3a2013c282bbccb5a0
  value: 2009/11/12 21:07:45 +0000
  key: date
- id: 0d9e71f44b3e0d3a2013c282bbccb5a0
  value: "97:34"
  key: duration
offset: 0
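If you don’t have a CouchDB server handy, the following plain-Ruby sketch mimics what the query/allkeys view does. It runs the map function once per document, skips fields whose names start with an underscore, and returns the emitted rows sorted by key. CouchDB orders rows by key, then by document id. This is an emulation for illustration, not CouchRest or CouchDB code. CouchDB uses Unicode collation rather than Ruby’s byte-wise string ordering. Here the document id is copied into each row directly, whereas CouchDB attaches it for you.

```ruby
# Plain-Ruby emulation of the query/allkeys view (not CouchDB code).
# The document mirrors the one stored earlier in this article.
docs = [
  { '_id' => '0d9e71f44b3e0d3a2013c282bbccb5a0',
    '_rev' => '1-eb6f6e3a3e2eae0cd99f3fcbc63d29d6',
    'duration' => '97:34',
    'activity' => 'pedaling',
    'date' => '2009/11/12 21:07:45 +0000' }
]

# Ruby counterpart of the JavaScript map function:
# emit(word, doc[word]) for every field not starting with "_".
def allkeys_map(doc)
  doc.reject { |word, _| word.start_with?('_') }
     .map { |word, value| { 'id' => doc['_id'], 'key' => word, 'value' => value } }
end

# Collect rows from every document, then order by key, then by document id.
rows = docs.flat_map { |doc| allkeys_map(doc) }.sort_by { |row| [row['key'], row['id']] }
result = { 'total_rows' => rows.size, 'offset' => 0, 'rows' => rows }

result['rows'].each { |row| puts "#{row['key']}: #{row['value']}" }
# activity: pedaling
# date: 2009/11/12 21:07:45 +0000
# duration: 97:34
```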

This is really just the tip of the iceberg with CouchDB/CouchRest; there’s a wealth of functionality. CouchDB views support map/reduce, which means you can use them to crunch some pretty complex problems on your data. Additionally, CouchRest provides CouchRest::ExtendedDocument, which your own classes can inherit from. This lets you easily create a Ruby model for your data, which is then transparently stored inside CouchDB.

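class Exercise < CouchRest::ExtendedDocument
  # Store instances in the exercise-log database opened earlier
  use_database DBH

  property :date
  property :activity
  property :duration
end

Exercise.create(:activity => "running", :date => Time.now, :duration => "23:44")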
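The map/reduce capability mentioned above can be sketched in plain Ruby as well. The question it answers (total time logged per activity) and the sample documents are hypothetical. So is the assumption that durations are minutes:seconds strings. The point is the shape of the computation: map emits key/value pairs for each document, and reduce combines the values that share a key.

```ruby
# Plain-Ruby emulation of a CouchDB-style map/reduce view (not CouchDB code).
# Hypothetical data; durations are assumed to be "minutes:seconds" strings.
docs = [
  { 'activity' => 'pedaling', 'duration' => '97:34' },
  { 'activity' => 'running',  'duration' => '23:44' },
  { 'activity' => 'running',  'duration' => '31:10' }
]

def to_seconds(mmss)
  minutes, seconds = mmss.split(':').map(&:to_i)
  minutes * 60 + seconds
end

# Map: emit zero or more [key, value] pairs for each document.
map_fn = ->(doc) { [[doc['activity'], to_seconds(doc['duration'])]] }

# Reduce: combine all values emitted under the same key.
reduce_fn = ->(values) { values.sum }

emitted = docs.flat_map(&map_fn)
totals = emitted.group_by(&:first).transform_values { |pairs| reduce_fn.(pairs.map(&:last)) }

totals.sort.each do |activity, secs|
  puts "#{activity}: #{secs / 60}m #{secs % 60}s"
end
# pedaling: 97m 34s
# running: 54m 54s
```

A summing reduce suits this job because addition is associative. CouchDB may reduce partial results and then reduce those results again, so a reduce function has to give the same answer however its inputs are grouped.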

Dig into the CouchDB and CouchRest documentation if this looks interesting to you.

S3

I just wanted to briefly mention Amazon’s Simple Storage Service. It is, fundamentally, a simple HTTP-accessible Key-Value Store that Amazon has turned into a service. Requests to S3 will have higher latency than requests to a locally hosted data store, but if you want a simple, robust store that will scale to as much data as you can push at it, you might seriously consider S3.
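S3 is essentially a key-value store spoken over HTTP, so its basic operations map directly onto HTTP verbs. PUT stores a value under a key, GET retrieves it, and DELETE removes it. The sketch below only builds those requests with Ruby’s standard `net/http`. The bucket name is hypothetical, and keys are assumed not to need URL escaping. The request signing that S3 actually requires is deliberately left out.

```ruby
require 'net/http'

# Hypothetical bucket. Real S3 requests must also be signed (an
# Authorization header or a pre-signed URL); signing is omitted here.
BUCKET_HOST = 'my-example-bucket.s3.amazonaws.com'

# Assumes keys are plain names that need no URL escaping.
def object_uri(key)
  URI("https://#{BUCKET_HOST}/#{key}")
end

# Store: PUT the value's bytes at the key's URL.
def put_request(key, value)
  req = Net::HTTP::Put.new(object_uri(key))
  req.body = value
  req
end

# Fetch: GET the key's URL.
def get_request(key)
  Net::HTTP::Get.new(object_uri(key))
end

# Remove: DELETE the key's URL.
def delete_request(key)
  Net::HTTP::Delete.new(object_uri(key))
end

req = put_request('1a_final', 'SE Cyclones by 14')
puts "#{req.method} #{req.path}" # => PUT /1a_final
```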

Moneta

Moneta is a unified interface to a variety of different key-value type data stores. That is, the same code can be run against a variety of different backing stores, and it will just work. Moneta supports the following stores as of this posting:

  • Basic File Store
  • BerkeleyDB
  • CouchDB
  • DataMapper
  • File store for xattr
  • In-memory store
  • Memcache store
  • Redis
  • S3
  • SDBM
  • Tokyo
  • Xattrs in a file system

Consider this example, which, again, uses CouchDB:

require 'rubygems'
require 'yaml'
require 'moneta'
require 'moneta/couch'

cache = Moneta::Couch.new(:db => 'football')

cache['1a_final'] = {
  :where => 'Laramie; War Memorial Stadium',
  :when => "11:30 MST",
  :who => "Southeast Cyclones & Lingle-Ft. Laramie Doggers",
  :prediction => "SE Cyclones by 14"}

puts cache['1a_final'].to_yaml

wyhaines$ ruby /tmp/moneta1.rb
---
- prediction: SE Cyclones by 14
  when: 11:30 MST
  who: Southeast Cyclones & Lingle-Ft. Laramie Doggers
  where: Laramie; War Memorial Stadium

It works, very simply. If I want to change the code to use something else, like a file-based store, it’s as simple as changing two lines:

--- couch.rb    2009-11-19 15:00:07.000000000 -0700
+++ file.rb     2009-11-19 15:01:12.000000000 -0700
@@ -1,9 +1,9 @@
 require 'rubygems'
 require 'yaml'
 require 'moneta'
-require 'moneta/couch'
+require 'moneta/file'

-cache = Moneta::Couch.new(:db => 'football')
+cache = Moneta::File.new(:path => '/tmp/football')

 cache['1a_final'] = {
   :where => 'Laramie; War Memorial Stadium',

The rest of the code works without alteration. The Moneta API is designed to be very similar to that of Hash. It has a limited feature set (for example, it doesn’t currently support iteration or partial matches), but the features it does provide work identically across all of the supported stores. If your Key-Value Store needs are simple and you want something that can work with whatever store your users want to use, definitely check out Moneta; it’s a well-written tool.
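The core idea behind Moneta, one small Hash-like API over many backends, can be illustrated without Moneta itself. In this hypothetical sketch, `MemoryStore` and `FileStore` both respond to `[]`, `[]=`, `key?` and `delete`, so `record_prediction` runs unchanged against either one. `FileStore` keeps one Marshal-serialized file per key and assumes keys are safe file names. These classes are illustrations only, not Moneta’s implementation or API.

```ruby
require 'tmpdir'

# Hypothetical, much-simplified illustration of the Moneta idea:
# different backends share one small Hash-like API.

class MemoryStore
  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  def key?(key)
    @data.key?(key)
  end

  def delete(key)
    @data.delete(key)
  end
end

# One Marshal-serialized file per key inside a directory.
class FileStore
  def initialize(dir)
    @dir = dir
    Dir.mkdir(@dir) unless Dir.exist?(@dir)
  end

  def [](key)
    key?(key) ? Marshal.load(File.binread(path_for(key))) : nil
  end

  def []=(key, value)
    File.binwrite(path_for(key), Marshal.dump(value))
  end

  def key?(key)
    File.exist?(path_for(key))
  end

  def delete(key)
    value = self[key]
    File.delete(path_for(key)) if key?(key)
    value
  end

  private

  # Assumes keys are safe to use as file names (no "/", no "..").
  def path_for(key)
    File.join(@dir, key.to_s)
  end
end

# Store-agnostic code: it only uses the shared API.
def record_prediction(cache)
  cache['1a_final'] = { :where => 'Laramie; War Memorial Stadium',
                        :prediction => 'SE Cyclones by 14' }
  cache['1a_final'][:prediction]
end

puts record_prediction(MemoryStore.new) # => SE Cyclones by 14

Dir.mktmpdir do |dir|
  puts record_prediction(FileStore.new(File.join(dir, 'football'))) # => SE Cyclones by 14
end
```

Swapping backends means changing only the line that constructs the store. This is the same property the Moneta diff above demonstrates.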

With that, we’ve reached the end of this series. It’s been fun to explore the unique features of each of these different approaches to building a non-SQL, key-value data store, as well as the threads that unify them. I hope that I’ve exposed you to new and useful tools.

The landscape of Key-Value Stores is changing rapidly, so it is difficult to stay fully informed all the time. For instance, just a couple of days ago there was a blog post implementing a SQL front end for CouchDB. It’s done in Perl, but all it would take is an interested person and a little time to have it in Ruby, too.

If you use a Key-Value Store system, or plan to, keep your eyes open for new developments, because you can bet that someone else will have something interesting next week or next month that may change the landscape again. As always, leave feedback in the comments, and thanks for reading!

