Building an Object Mapper: Override-able Accessors »

Created at: 30.08.2010 02:04, source: RailsTips - Home, tagged: mongomapper ruby

There are several things I have learned building object mappers that I now take for granted. Last week while pairing with Jason, I was explaining a trick and he said I should blog about it, so here goes nothing.

Let’s say you are building a new object mapper named TacoMapper. A sensible place to start is with attribute accessors. One other thing to note is that we don’t care about old news, so we won’t support Rails 2 in any fashion. Deciding this, we can take advantage of ActiveModel and new features in ActiveSupport.

First Goal

First, let’s think about API. Our first goal will be to make the following work:

class User
  include TacoMapper
  attribute :email
end

user = User.new
user.email = 'John@Doe.com'
puts user.email # "John@Doe.com"

First Solution

The first thing we need is a class method named attribute.

require 'active_model'
require 'active_support/all'

module TacoMapper
  extend ActiveSupport::Concern

  module ClassMethods
    def attribute(name)
      attr_accessor(name)
    end
  end
end

class User
  include TacoMapper
  attribute :email
end

user = User.new
user.email = 'John@Doe.com'
puts user.email # "John@Doe.com"

ActiveSupport::Concern is a handy little ditty that does a lot out of the box. In our case, it will automatically call extend(ClassMethods) and add our attribute class method whenever our TacoMapper module gets included. With just a tiny bit of code, we have met our first goal.

Second Goal

Now, I am going to throw a kink in the mix. The goal of this post is to create override-able accessors. Let’s say we want to override the email writer method and make sure that we always get lowercase emails. If we override our attribute accessors right now, we have to set the instance variable for things to work and we get no benefit from TacoMapper.

require 'active_model'
require 'active_support/all'

module TacoMapper
  extend ActiveSupport::Concern

  module ClassMethods
    def attribute(name)
      attr_accessor(name)
    end
  end
end

class User
  include TacoMapper
  attribute :email

  def email=(value)
    @email = value.to_s.downcase
  end
end

user = User.new
user.email = 'John@Doe.com'
puts user.email # "john@doe.com"

In this simple example, that might be ok. But what if other things were in the mix, like dirty tracking, typecasting, etc.? Those other things would immediately stop working. Would it not be nice if we could just override our accessor and call super to get all the normal functionality of TacoMapper? I am glad you agree.

Second Solution

If you haven’t yet, you might want to read Lookin’ on Up…To the East Side, a post I wrote on how Ruby’s method lookups work. In it, I explain that if you include a module, you can override methods that were in the module and call super. We’ll do the same in TacoMapper so we can make things a bit more robust.

require 'active_model'
require 'active_support/all'

module TacoMapper
  extend ActiveSupport::Concern

  module ClassMethods
    def attribute_accessors_module
      @attribute_accessors_module ||= Module.new.tap { |mod| include(mod) }
    end

    def attribute(name)
      attribute_accessors_module.module_eval <<-CODE
        def #{name}
          @#{name}
        end

        def #{name}=(value)
          @#{name} = value
        end
      CODE
    end
  end
end

class User
  include TacoMapper
  attribute :email

  def email=(value)
    super(value.to_s.downcase)
  end
end

user = User.new
user.email = 'John@Doe.com'
puts user.email # "john@doe.com"

Note that we get the same result as the previous example, except that now we can just call super and still take advantage of all the loveliness that TacoMapper will eventually provide. We changed a couple of key things, so lets cover the differences in detail.

First, we created a method (attribute_accessors_module) that returns a memoized module and includes it in the current class. No matter how many calls you make to this method, it will return the first module we created and it will only be included once, since we memoized it (||=).

Second, since we have a module and it is included in our class, all we have to do is module_eval our accessor methods into it. This is what is happening in the attribute method.

Third, instead of setting the email instance variable in the email= method, we just call super, which will call the method we module_eval’d into our accessors module.

Pretty sweet, eh?

Third Solution

We could stop there, but we said we were going to use ActiveModel a bit, right? ActiveModel has a module for attribute accessors that does the same thing as above and a bit more (although a bit confusing).

require 'set'
require 'active_model'
require 'active_support/all'

module TacoMapper
  extend ActiveSupport::Concern
  include ActiveModel::AttributeMethods

  included do
    attribute_method_suffix('', '=')
  end

  module ClassMethods
    def attributes
      @attributes ||= Set.new
    end

    def attribute(name)
      attributes << name.to_s
    end
  end

  module InstanceMethods
    def attribute(key)
      instance_variable_get("@#{key}")
    end

    def attribute=(key, value)
      instance_variable_set("@#{key}", value)
    end

    def attributes
      self.class.attributes
    end
  end
end

class User
  include TacoMapper
  attribute :email

  def email=(value)
    super(value.to_s.downcase)
  end
end

user = User.new
user.email = 'John@Doe.com'
puts user.email # "john@doe.com"

Note that again, we get the same result and that we are using super as before. So what changed this time?

First, we included ActiveModel::AttributeMethods. We then took advantage of the attribute_method_suffix method it provides to declare that we would have a reader ('') and a writer ('='). Now all our attribute class method has to do is add the attribute to the set of attributes.

Lastly, we define methods that implement the suffix methods we defined (attribute and attribute=). Note that we also define the attributes instance method so that ActiveModel knows when it is dealing with one of our attributes. Now, it takes only a few more lines of code to add a boolean presence method.

require 'set'
require 'active_model'
require 'active_support/all'

module TacoMapper
  extend ActiveSupport::Concern
  include ActiveModel::AttributeMethods

  included do
    attribute_method_suffix('', '=', '?')
  end

  module ClassMethods
    def attributes
      @attributes ||= Set.new
    end

    def attribute(name)
      attributes << name.to_s
    end
  end

  module InstanceMethods
    def attribute(key)
      instance_variable_get("@#{key}")
    end

    def attribute=(key, value)
      instance_variable_set("@#{key}", value)
    end

    def attribute?(key)
      instance_variable_get("@#{key}").present?
    end

    def attributes
      self.class.attributes
    end
  end
end

class User
  include TacoMapper
  attribute :email

  def email=(value)
    super(value.to_s.downcase)
  end
end

user = User.new
puts user.email? # false
user.email = 'John@Doe.com'
puts user.email? # true

Using ActiveModel like this makes adding things like dirty tracking take a few minutes instead of a few hours. That said, the main point of this post is how the internals of ActiveModel actually work and how you can do it on your own if you so choose.

For those of you that are MongoMapper users, it has the same functionality built in though in a not as elegant way (which I will be updating soon). Also, your accessors are not used when loading things from the database, as that is handled internally. This means you can make your public accessors do whatever you want and it will not foobar the loading of your documents.

Hope this helps others trying to grok ActiveModel or build your own object mapper. If you enjoyed this post and would like to learn more about building an object mapper, let me know with a comment and I will try to round up a few more posts on the topic.


more »

MongoMapper 0.8: Goodies Galore »

Created at: 16.06.2010 22:00, source: RailsTips - Home, tagged: mongomapper gems

Let me tell you, this release has been a tough one. It is made up of 43 commits to Plucky and 92 commits to MongoMapper. Features added include a sexy query language, scopes, attr_accessible, a fancy cache key helper, a :typecast option for array/set keys, and a bajillion little improvements. Let’s run through each of them just for fun.

Sexy Query Language

This right here is all thanks to plucky. The goal for plucky is a fancy query language on top of MongoDB. It has been created in a such a way that other MongoDB projects (Mongoid, Candy, MongoDoc, etc.) can benefit from it if they wish. It still has a long way to go in covering edge cases and deeply nested queries, but the majority of queries one will do are covered quite nicely.

User.where(:age.gt => 27).sort(:age).all
User.where(:age.gt => 27).sort(:age.desc).all
User.where(:age.gt => 27).sort(:age).limit(1).all
User.where(:age.gt => 27).sort(:age).skip(1).limit(1).all

All of the above are supported out of the box. Each query method (limit, reverse, update, skip, fields, sort, where) returns a Plucky::Query object so they can be changed together until a kicker is hit, such as all, first, last, paginate, count, size, each, etc. It is fashioned in a similar form as ARel in this manner, but more simple as ARel has to handle a lot more than just simple queries (joins, etc).

Scopes

The main thing I was waiting for to do scopes was to get plucky to a point where scopes would be just a sprinkling of code to merge plucky queries. Thankfully that day has finally arrived and with this latest release, you can now scope away. The code is so compact, that I figured I would drop it in here for those that are curious:

module MongoMapper
  module Plugins
    module Scopes
      module ClassMethods
        def scope(name, scope_options={})
          scopes[name] = lambda do |*args|
            result = scope_options.is_a?(Proc) ? scope_options.call(*args) : scope_options
            result = self.query(result) if result.is_a?(Hash)
            self.query.merge(result)
          end
          singleton_class.send :define_method, name, &scopes[name]
        end

        def scopes
          read_inheritable_attribute(:scopes) || write_inheritable_attribute(:scopes, {})
        end
      end
    end
  end
end

Yep, that is it. With that bit of code, you can now do stuff like this:

class User
  include MongoMapper::Document

  # plain old vanilla scopes with fancy queries
  scope :johns,   where(:name => 'John')

  # plain old vanilla scopes with hashes
  scope :bills, :name => 'Bill'

  # dynamic scopes with parameters
  scope :by_name,  lambda { |name| where(:name => name) }
  scope :by_ages,  lambda { |low, high| where(:age.gte => low, :age.lte => high) }

  # Yep, even plain old methods work as long as they return a query
  def self.by_tag(tag)
    where(:tags => tag)
  end

  # You can even make a method that returns a scope
  def self.twenties; by_ages(20, 29) end

  key :name, String
  key :tags, Array
end

# simple scopes
pp User.johns.first
pp User.bills.first

# scope with arg
pp User.by_name('Frank').first

# scope with two args
pp User.by_ages(20, 29).all

# chaining class methods on scopes
pp User.by_ages(20, 40).by_tag('ruby').all

# scope made using method that returns scope
pp User.twenties.all

I am sure there are some edge cases, but I cannot wait to start swapping some of the code I have out for scopes. This is definitely one of the features I missed most from ActiveRecord.

attr_accessible

Previously, MongoMapper only supported attr_protected. The main reason was that I am lazy and someone from the community contributed the beginnings of the code. I spent some time today adding attr_accessible, so now you can really lock down your models if you want to.

class User
  include MongoMapper::Document

  attr_accessible :first_name, :last_name, :email

  key :first_name, String
  key :last_name, String
  key :email, String
  key :admin, Boolean, :default => false
end

Based on the example above, only first_name, last_name and email can be assigned when using mass assignment, such as in .new or #update_attributes.

Cache Key

On a recent MongoMapper project, I had to some caching. This led me to create bin, a MongoDB ActiveSupport cache store. The first thing you notice when you start to cache stuff is that you need a key to name the cached object or fragment. I dug around in AR and discovered the cache_key method. MongoMapper’s cache_key works the same with a little twist. You can pass arguments to it and they will become suffixes on the cache key. Lets look at an example:

class User
  include MongoMapper::Document
end

User.new.cache_key # => "User/new"
User.create.cache_key # => "User/:id"
User.create.cache_key(:foo, :bar) # => "User/:id/foo/bar"

It should also be noted that if the User model has an updated_at key, that will be appended after the id like so User/:id-:timestamp. This addition is definitely going to clean up some code on a project of mine.

Typecasting Array/Set values

A common idiom in MongoDB modeling is to use Array keys for many to many type relationships. You have a User model and a Site model. Sites can have many Users and Users can have many Sites. Typically, I make a key :user_ids, Array and store the ids of each user that has access to the site.

When this is done from web forms, everything comes in as a string, so you have to typecast those strings to object ids. The new :typecast option wraps this up in a single key/value.

class Site
  include MongoMapper::Document
  key :user_ids, Array, :typecast => 'ObjectId'
end

Now, whenever user_ids is assigned, each value gets typecast to an ObjectId. This will work with any class that defines the to_mongo class method, which means you can use it with custom types as well.

Conclusion

I learned more about Ruby while working on this release of MongoMapper than probably any other period in my brief history. I really feel like this release brings MongoMapper to the forefront of MongoDB/Ruby object mappers.

All the typical dressings are now in place and with a few more tweaks, I can smell 1.0. Hope you all find this stuff useful and as always, if you don’t, that is ok because I am enjoying the heck out of working on this stuff. :)

Oh, and if all of this above did not excite you, know that the new MongoMapper site, including full documentation, is well underway and should be ready for consumption soon. Hooray!


more »

MongoSF MongoMapper Video »

Created at: 10.05.2010 18:00, source: RailsTips - Home, tagged: mongomapper presentation

April 30th at MongoSF, I gave a presentation on MongoMapper entitled “Mapping Ruby To and From Mongo”. The slides have been up for a while, but last night, the video was finally posted. The first 15 minutes of the presentation are about using MongoMapper, but the last 15 minutes cover extending it and the future. If you are into MM at all, I would recommend watching the second 15 for sure.

Video

Slides


more »

A Nunemaker Joint »

Created at: 27.03.2010 05:00, source: RailsTips - Home, tagged: joint mongomapper gems gridfs

Since The Changelog has already scooped me (darn those guys are fast), I figured I should post something here as well. Last December I posted Getting a Grip on GridFS. Basically, I liked what Grip was doing, but made a few tweaks. Grip’s development has continued, but it is headed down a bit different path, so I thought it might be confusing to keep my fork named the same.

Today, I did some more work on the project and now that it is unrecognizable when compared to Grip, I renamed it to Joint. Yes, I realize that I now have gems named crack and joint. Joint is a tiny piece of code that joins MongoMapper and the new Ruby GridFS API.

Usage

What I love about joint is its simplicity. Simply declare the attachment and you are good to go.

class Asset
  include MongoMapper::Document
  plugin Joint # add the plugin

  attachment :file # declare an attachment named image
end

With that simple declaration, you get #file and #file= instance methods and several keys (file_id, file_name, file_type, and file_size). The #file instance method returns a simple proxy to make the API a bit prettier and sends all other calls on the proxy to the GridIO instance.

asset = Asset.create(:file => params[:file])
asset.file.id   # GridFS Object Id
asset.file.name # file name
asset.file.type # mime type as determined by wand gem
asset.file.size # size in bytes

There is no limit to the number of attachments, but each attachment uses 4 keys so I would not add more than one or two (more is a sign you are doing something wrong). As mentioned in the comment above, it uses my wand project to determine the mime type.

For those that are not familiar with wand (as I have not posted here about it), it first attempts to determine the mime type using the mime-types gem. If that fails to returning anything, it drops down to the unix file command.

What It Does Not Do

Anything else. All joint handles is assigning a file and storing it in GridFS. It doesn’t do versions, resizing, etc. For this type of stuff, I would recommend imagery with some HTTP caching (varnish, et el.) sitting in front of it.

I can certainly see having some triggers (callbacks) at some point such as after jointed or something for when you do want to do post processing. I’ll leave that for a rainy day though.


more »

MongoMapper 0.7: Identity Map »

Created at: 22.02.2010 00:00, source: RailsTips - Home, tagged: mongomapper patterns

While it isn’t quite ready to shove down everyone’s throats, IdentityMap is a hot new plugin I have been working on, directly due to a need that arose in Harmony.

One of the things I started to notice in Harmony right away was an epic amount of queries, 90% of which were simple id lookups repeated over and over. I remembered hearing about the identity map pattern before and figured I would research it a bit.

An old proverb says that a man with two watches never knows what time it is. If two watches are confusing, you can get in an even bigger mess with loading objects from a database.

So true. Now that I had the plugin system in place, I knew exactly how I would go about integrating IM (identity map) with MM. The great thing about the process was that I had a need from a real application. Over half of the IM plugin was created while vendored in Harmony. Several test cases that are currently in MM are directly due to weird bugs that we experienced in Harmony. I think it is really important to note that MM features grow from real applications, not theoretical fun.

Foo is Foo is Foo

The first thing I had to get working was to make sure that each document in the database equated to one object in Ruby. This means whether I call Item.find(id) or item.parent, if they are both loading the same document, they need to be the same object id in Ruby.

The easiest way I could think to do this was to have one method I always call whenever loading a document from the database. The obvious method name was load. Now, whenever an document is found in the database, I call load instead of new. This seems really insignificant, but down the rabbit trail it has significant implications.

Taking Advantage of Method Lookups and Super

Nothing fancy happens in load, but where it gets cool is in the IM plugin. Because every document coming from the database runs through load, all the IM plugin has to do is override load and ensure that the same document in the database is the same object in Ruby. It does this by storing each document in the map when loaded and then checking each document loaded against that store. If the document is already there, it returns the object for that document instead of creating a new one.

Each class gets an IM (just a plain old hash for now). Because each id is unique, the id is the key in the IM hash and the value is the Ruby object. Because of how method lookups work in Ruby, all that I do is include the IM plugin after the plugin that defines load and I can access the original load method using super. Below is the load method in the IM plugin:

def load(attrs)
  document = identity_map[attrs['_id']]

  if document.nil? || identity_map_off?
    document = super
    identity_map[document._id] = document if identity_map_on?
  end

  document
end

Without even knowing what the other methods do, you can get a feel for what is going on. First, I set document to where I assume it will be in the identity map for the class. If it is nil, that means it doesn’t exist, so I call super which calls the normal load method, and returns a new instance of the class based on the document from the database. Then, if the identity map is turned on, I add it to the map by setting the id as the key and the object as the value.

At the end, I return the document no matter what has happened before. With this tiny bit of code (and a few more lines for setting up hash store), I ensure that no matter how or when a document is queried from the database, it is always the same object in memory. Soooo sweet. Like I said, huge ramifications.

Identity Map Lookups Zap Simple Queries

Once I had this part done, all that was left was to zap the actual queries to the database if the document has already been loaded. I refactored the MM internal methods for finding documents to find_one and find_many. This means whenever you do any kind of find query in MM, eventually it hits one of those two methods.

Taking the same approach as load, if you are doing everything through one or two methods, all you have to do to change those methods is redefine them in the plugin to behave differently. The best part is you still have access to the originals using super. find_one in the IM plugin looks like this:

def find_one(options={})
  criteria, query_options = to_query(options)

  if simple_find?(criteria) && identity_map.key?(criteria[:_id])
    identity_map[criteria[:_id]]
  else
    super.tap do |document|
      remove_documents_from_map(document) if selecting_fields?(query_options)
    end
  end
end

simple_find? returns true if doing a query only by _id or _id and _type (which happens when using single collection inheritance). This method addition means we can return the document from the identity map without doing a query, if it has already been loaded.

A Simple, Yet Awesome Example

I learn best with code examples, so here is one for you. A simple item class for creating an even more simple tree.

MongoMapper.connection = Mongo::Connection.new('127.0.0.1', 27017, :logger => Logger.new(STDOUT))
MongoMapper.database = 'testing'

class Item
  include MongoMapper::Document
  
  key :title, String
  key :parent_id, ObjectId
  
  belongs_to :parent, :class_name => 'Item'
end

root = Item.create(:title => 'Root')
child = Item.create(:title => 'Child', :parent => root)
grand_child = Item.create(:title => 'Grand Child', :parent => child)

puts root.equal?(child.parent) # false
puts child.equal?(grand_child.parent) # false

If you run this code, you get false and false as the output, along with a few queries to find the parent of child and grand_child. Below is some sample output:

MONGODB admin.$cmd.find({:ismaster=>1}, {}).limit(-1)
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000001}, {"title"=>"Root", "_id"=>4b81a114d072c40c3f000001, "parent_id"=>nil})
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000002}, {"title"=>"Child", "_id"=>4b81a114d072c40c3f000002, "parent_id"=>4b81a114d072c40c3f000001})
MONGODB db.items.update({:_id=>4b81a114d072c40c3f000003}, {"title"=>"Grand Child", "_id"=>4b81a114d072c40c3f000003, "parent_id"=>4b81a114d072c40c3f000002})
MONGODB testing.items.find({:_id=>4b81a114d072c40c3f000001}, {}).limit(-1)
false
MONGODB testing.items.find({:_id=>4b81a114d072c40c3f000002}, {}).limit(-1)
false

Check out the same script, with the addition of the identity map plugin:

MongoMapper.connection = Mongo::Connection.new('127.0.0.1', 27017, :logger => Logger.new(STDOUT))
MongoMapper.database = 'testing'

class Item
  include MongoMapper::Document
  plugin MongoMapper::Plugins::IdentityMap
  
  key :title, String
  key :parent_id, ObjectId
  
  belongs_to :parent, :class_name => 'Item'
end

root = Item.create(:title => 'Root')
child = Item.create(:title => 'Child', :parent => root)
grand_child = Item.create(:title => 'Grand Child', :parent => child)

puts root.equal?(child.parent) # true
puts child.equal?(grand_child.parent) # true

Note that we get true for both. Also, we get no queries for the parents of child and grand_child, as they are already in the identity map. The output looks something like this:

MONGODB admin.$cmd.find({:ismaster=>1}, {}).limit(-1)
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000001}, {"title"=>"Root", "_id"=>4b81a0c9d072c40c2c000001, "parent_id"=>nil})
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000002}, {"title"=>"Child", "_id"=>4b81a0c9d072c40c2c000002, "parent_id"=>4b81a0c9d072c40c2c000001})
MONGODB db.items.update({:_id=>4b81a0c9d072c40c2c000003}, {"title"=>"Grand Child", "_id"=>4b81a0c9d072c40c2c000003, "parent_id"=>4b81a0c9d072c40c2c000002})
true
true

Obviously, this is a really small and simple example, but with one small addition, we saved a few queries. Imagine how much of a difference it makes in a big application making lots of requests. The thing I am most amazed at is how much punch this plugin adds compared to the amount of code in the implementation. As of 0.7, the identity map plugin is only 122 lines of code.

Using the IM Plugin with 0.7

The IM plugin is by no means feature complete, so I am not automatically including it in every Document yet. I will say that what I have added is production ready and we have been using it in Harmony for over a month now. You can use it on a model by model basis like this:

class Foo
  include MongoMapper::Document
  plugin MongoMapper::Plugins::IdentityMap
end

Or you can turn it on for all documents by dropping this in an initializer (stolen directly from Harmony):

module IdentityMapAddition
  def self.included(model)
    model.plugin MongoMapper::Plugins::IdentityMap
  end
end

MongoMapper::Document.append_inclusions(IdentityMapAddition)

Correct, Beautiful, Fast

The IM plugin, for me, is a great example of Correct, Beautiful, Fast. First, we built Harmony in a way that worked and was easy to read (correct and beautiful). Then, when we needed to make it fast, all we had to do was override the implementation in a few spots and we cut our queries in half (or more) over night. It is far easier to find where you need to optimize when your code is correct and beautiful.


more »