MongoMapper 0.7: Identity Map

Created at: 22.02.2010 00:00, source: RailsTips, tagged: mongomapper patterns

While it isn’t quite ready to shove down everyone’s throats, IdentityMap is a hot new plugin I have been working on, directly due to a need that arose in Harmony.

One of the things I started to notice in Harmony right away was an epic number of queries, 90% of which were simple id lookups repeated over and over. I remembered hearing about the identity map pattern before and figured I would research it a bit. Martin Fowler describes the problem well in Patterns of Enterprise Application Architecture:

An old proverb says that a man with two watches never knows what time it is. If two watches are confusing, you can get in an even bigger mess with loading objects from a database.

So true. Now that I had the plugin system in place, I knew exactly how I would go about integrating IM (identity map) with MM (MongoMapper). The great thing about the process was that the need came from a real application. Over half of the IM plugin was written while vendored in Harmony, and several test cases currently in MM exist directly because of weird bugs we hit in Harmony. I think it is really important to note that MM features grow from real applications, not theoretical fun.

Foo is Foo is Foo

The first thing I had to get working was making sure that each document in the database corresponded to exactly one object in Ruby. This means that whether I call Item.find(id) or item.parent, if both load the same document, they need to return the same Ruby object (the same object_id).

The easiest way I could think of to do this was to have one method I always call whenever loading a document from the database. The obvious method name was load. Now, whenever a document is found in the database, I call load instead of new. This seems trivial, but down the rabbit trail it has significant implications.
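To make the idea concrete, here is a minimal plain-Ruby sketch of funneling every loaded document through one class method. Page, STORE and the in-memory hash standing in for the database are hypothetical; this is not MongoMapper's implementation, only the shape of the idea.

```ruby
# Hypothetical sketch: every path that materializes a stored document goes
# through one class method, load, instead of calling new directly.
class Page
  # In-memory stand-in for a MongoDB collection, keyed by _id.
  STORE = {
    1 => { '_id' => 1, 'title' => 'Root',  'parent_id' => nil },
    2 => { '_id' => 2, 'title' => 'Child', 'parent_id' => 1 }
  }

  attr_reader :attributes

  def initialize(attrs = {})
    @attributes = attrs
  end

  # The single funnel; a plugin could later override this one method.
  def self.load(attrs)
    attrs && new(attrs)
  end

  def self.find(id)
    load(STORE[id])
  end

  def self.all
    STORE.values.map { |attrs| load(attrs) }
  end

  # Association-style lookup also goes through find, and therefore load.
  def parent
    self.class.find(attributes['parent_id'])
  end
end

puts Page.find(2).parent.attributes['title']   # => Root
puts Page.find(1).equal?(Page.find(2).parent)  # => false (no identity map yet)
```

Without an identity map, Page.find(1) and Page.find(2).parent are two different objects for the same stored document. Because both paths call load, though, changing that behavior only requires touching one method.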

Taking Advantage of Method Lookups and Super

Nothing fancy happens in load, but where it gets cool is in the IM plugin. Because every document coming from the database runs through load, all the IM plugin has to do is override load and ensure that the same document in the database is the same object in Ruby. It does this by storing each document in the map when loaded and then checking each document loaded against that store. If the document is already there, it returns the object for that document instead of creating a new one.

Each class gets an IM (just a plain old hash for now). Because each id is unique, the id is the key in the IM hash and the value is the Ruby object. Because of how method lookups work in Ruby, all I have to do is include the IM plugin after the plugin that defines load; the IM version of load is then found first, and it can still reach the original load method using super. Below is the load method in the IM plugin:

def load(attrs)
  document = identity_map[attrs['_id']]

  if document.nil? || identity_map_off?
    document = super
    identity_map[document._id] = document if identity_map_on?
  end

  document
end

Without even knowing what the other methods do, you can get a feel for what is going on. First, I look up the document in the class’s identity map by its _id. If the result is nil, the document hasn’t been loaded yet (or the map is turned off), so I call super, which calls the normal load method and returns a new instance of the class built from the database document. Then, if the identity map is turned on, I add that instance to the map, using the id as the key and the object as the value.

At the end, I return the document no matter which branch ran. With this tiny bit of code (and a few more lines for setting up the hash store), I ensure that no matter how or when a document is queried from the database, it is always the same object in memory. Soooo sweet. Like I said, huge ramifications.
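The method lookup trick can be seen in isolation with plain Ruby. In this sketch, BaseLoading and IdentityMapSketch are hypothetical stand-ins for the plugin that defines load and for the IM plugin, and the on/off switch from the real code is left out. Because IdentityMapSketch is extended after BaseLoading, its load is found first and super reaches the original.

```ruby
# Hypothetical stand-in for the plugin that defines the normal load.
module BaseLoading
  def load(attrs)
    new(attrs)                        # builds a fresh instance every time
  end
end

# Hypothetical stand-in for the identity map plugin (no on/off switch).
module IdentityMapSketch
  # One plain hash per class (a class-level instance variable), keyed by _id.
  def identity_map
    @identity_map ||= {}
  end

  def load(attrs)
    document = identity_map[attrs['_id']]
    if document.nil?
      document = super                # falls through to BaseLoading#load
      identity_map[attrs['_id']] = document
    end
    document
  end
end

class Item
  # Modules extended later are looked up first, so IdentityMapSketch#load wins.
  extend BaseLoading
  extend IdentityMapSketch

  attr_reader :attributes

  def initialize(attrs)
    @attributes = attrs
  end
end

a = Item.load('_id' => 1, 'title' => 'Root')
b = Item.load('_id' => 1, 'title' => 'Root')
puts a.equal?(b)   # => true
```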

Identity Map Lookups Zap Simple Queries

Once I had this part done, all that was left was to zap the actual queries to the database if the document had already been loaded. I refactored the MM internal methods for finding documents into two methods, find_one and find_many. This means whenever you do any kind of find query in MM, it eventually hits one of those two methods.

As with load, when everything funnels through one or two methods, changing behavior is just a matter of redefining those methods in the plugin. The best part is you still have access to the originals using super. find_one in the IM plugin looks like this:

def find_one(options={})
  criteria, query_options = to_query(options)

  if simple_find?(criteria) && identity_map.key?(criteria[:_id])
    identity_map[criteria[:_id]]
  else
    super.tap do |document|
      remove_documents_from_map(document) if selecting_fields?(query_options)
    end
  end
end

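simple_find? returns true if the query is only by _id, or by _id and _type (which happens when using single collection inheritance). This override means we can return the document from the identity map without doing a query, if it has already been loaded. Otherwise, we fall back to the normal find_one via super; and if that query only selected some of the fields, the partially loaded document is removed from the map, so a later lookup never gets back an incomplete object.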
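Here is a rough, self-contained sketch of that short circuit. FakeCollection is a hypothetical stand-in for MongoDB that counts round trips, and Node is an invented model. It only handles lookups by _id, the case simple_find? guards; the real find_one also parses criteria and falls back to super for everything else.

```ruby
# Hypothetical stand-in for a MongoDB collection that counts queries.
class FakeCollection
  attr_reader :query_count

  def initialize(docs)
    @docs = docs            # _id => attributes hash
    @query_count = 0
  end

  def find_one(id)
    @query_count += 1       # every call here counts as one round trip
    @docs[id]
  end
end

class Node
  class << self
    attr_accessor :collection

    def identity_map
      @identity_map ||= {}
    end

    # Same idea as the IM load: reuse the mapped object if there is one.
    def load(attrs)
      identity_map[attrs['_id']] ||= new(attrs)
    end

    # A lookup by _id alone is answered from the map when possible.
    def find(id)
      return identity_map[id] if identity_map.key?(id)
      attrs = collection.find_one(id)
      attrs && load(attrs)
    end
  end

  attr_reader :attributes

  def initialize(attrs)
    @attributes = attrs
  end
end

Node.collection = FakeCollection.new(1 => { '_id' => 1, 'title' => 'Root' })
first  = Node.find(1)             # goes to the collection
second = Node.find(1)             # answered from the identity map
puts first.equal?(second)         # => true
puts Node.collection.query_count  # => 1
```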

A Simple, Yet Awesome Example

I learn best with code examples, so here is one for you: a simple item class for creating an even simpler tree.

require 'logger'
require 'mongo_mapper'

MongoMapper.connection = Mongo::Connection.new('127.0.0.1', 27017, :logger => Logger.new(STDOUT))
MongoMapper.database = 'testing'

class Item
  include MongoMapper::Document
  
  key :title, String
  key :parent_id, ObjectId
  
  belongs_to :parent, :class_name => 'Item'
end

root = Item.create(:title => 'Root')
child = Item.create(:title => 'Child', :parent => root)
grand_child = Item.create(:title => 'Grand Child', :parent => child)

puts root.equal?(child.parent) # false
puts child.equal?(grand_child.parent) # false

If you run this code, you get false and false as the output, along with one query each to find the parents of child and grand_child. Below is some sample output:

MONGODB admin.$cmd.find({:ismaster=>1}, {}).limit(-1)
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000001}, {"title"=>"Root", "_id"=>4b81a114d072c40c3f000001, "parent_id"=>nil})
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000002}, {"title"=>"Child", "_id"=>4b81a114d072c40c3f000002, "parent_id"=>4b81a114d072c40c3f000001})
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000003}, {"title"=>"Grand Child", "_id"=>4b81a114d072c40c3f000003, "parent_id"=>4b81a114d072c40c3f000002})
MONGODB testing.items.find({:_id=>4b81a114d072c40c3f000001}, {}).limit(-1)
false
MONGODB testing.items.find({:_id=>4b81a114d072c40c3f000002}, {}).limit(-1)
false

Check out the same script, with the addition of the identity map plugin:

require 'logger'
require 'mongo_mapper'

MongoMapper.connection = Mongo::Connection.new('127.0.0.1', 27017, :logger => Logger.new(STDOUT))
MongoMapper.database = 'testing'

class Item
  include MongoMapper::Document
  plugin MongoMapper::Plugins::IdentityMap
  
  key :title, String
  key :parent_id, ObjectId
  
  belongs_to :parent, :class_name => 'Item'
end

root = Item.create(:title => 'Root')
child = Item.create(:title => 'Child', :parent => root)
grand_child = Item.create(:title => 'Grand Child', :parent => child)

puts root.equal?(child.parent) # true
puts child.equal?(grand_child.parent) # true

Note that we get true for both. Also, we get no queries for the parents of child and grand_child, as they are already in the identity map. The output looks something like this:

MONGODB admin.$cmd.find({:ismaster=>1}, {}).limit(-1)
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000001}, {"title"=>"Root", "_id"=>4b81a0c9d072c40c2c000001, "parent_id"=>nil})
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000002}, {"title"=>"Child", "_id"=>4b81a0c9d072c40c2c000002, "parent_id"=>4b81a0c9d072c40c2c000001})
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000003}, {"title"=>"Grand Child", "_id"=>4b81a0c9d072c40c2c000003, "parent_id"=>4b81a0c9d072c40c2c000002})
true
true

Obviously, this is a really small and simple example, but with one line added, we saved two queries. Imagine how much of a difference that makes in a big application serving lots of requests. What amazes me most is how much punch this plugin packs compared to the amount of code in the implementation. As of 0.7, the identity map plugin is only 122 lines of code.

Using the IM Plugin with 0.7

The IM plugin is by no means feature complete, so I am not automatically including it in every Document yet. That said, what I have added is production ready, and we have been using it in Harmony for over a month now. You can enable it on a model-by-model basis like this:

class Foo
  include MongoMapper::Document
  plugin MongoMapper::Plugins::IdentityMap
end

Or you can turn it on for all documents by dropping this in an initializer (stolen directly from Harmony):

module IdentityMapAddition
  def self.included(model)
    model.plugin MongoMapper::Plugins::IdentityMap
  end
end

MongoMapper::Document.append_inclusions(IdentityMapAddition)
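If you are curious how an initializer like that can reach every model, the hook behind it is Ruby's included callback. Below is a hypothetical plain-Ruby sketch of an append_inclusions-style registry; MiniDocument, TrackedAddition and Widget are invented for illustration, and MongoMapper's actual implementation may differ. In this sketch the registry has to be filled before a class includes the base module, which an initializer that runs before the models load takes care of.

```ruby
# Hypothetical base module with an append_inclusions-style registry.
module MiniDocument
  def self.extra_inclusions
    @extra_inclusions ||= []
  end

  def self.append_inclusions(*mods)
    extra_inclusions.concat(mods)
  end

  # Called by Ruby whenever a class does `include MiniDocument`.
  def self.included(model)
    super
    extra_inclusions.each { |mod| model.send(:include, mod) }
  end
end

# Plays the role of IdentityMapAddition: its own included hook configures
# the model (here by extending it, where the real code calls model.plugin).
module TrackedAddition
  def self.included(model)
    model.extend(ClassMethods)
  end

  module ClassMethods
    def tracked?
      true
    end
  end
end

# Initializer step: register the addition before any model is defined.
MiniDocument.append_inclusions(TrackedAddition)

class Widget
  include MiniDocument
end

puts Widget.tracked?   # => true
```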

Correct, Beautiful, Fast

The IM plugin, for me, is a great example of Correct, Beautiful, Fast. First, we built Harmony in a way that worked and was easy to read (correct and beautiful). Then, when we needed to make it fast, all we had to do was override the implementation in a few spots, and we cut our queries in half (or more) overnight. It is far easier to find where you need to optimize when your code is correct and beautiful.