The First Step of Refactoring a Rails Application »

Created at: 08.10.2010 04:06, source: Rails Inside, tagged: Uncategorized

Refactoring is an important step in software development. It helps to make sure that existing code is kept updated and clean, even after a herd of people trample over it.

Recently I've been asked a lot as to which tools I use to refactor Rails apps. I don't use any one tool to refactor, but instead have a toolbox of several tools. Having several tools at your disposal gives you more flexibility to find what code needs to be refactored.

Intuition

The first and primary tool I use is intuition. If you know your code really well, you will already know which areas cause problems and need to be refactored.

If you are new to a code base, though, you won't have this intuition and will need to build it up before you can use it. Luckily, there are several tools for finding problem code that is just waiting for you to refactor it.

wc - the Unix word counter

The first tool is a simple Unix tool called wc. wc is the word count command, and it's installed on almost all Unixes by default. It does more than count words, though: it also counts the number of lines in a file. In my experience, classes and files with the most lines tend to need the most refactoring. If you think about it, more lines means more code, which means a greater likelihood of bad code.

Using wc -l you can easily get line counts for all of the major areas of your Rails project. I typically just look in the controllers and models since that's where most development takes place but you might also want to check your lib and vendor directories. What you are looking for are large and complex classes.

If you have a lot of classes, like Redmine does, it can be useful to pipe the output to sort -n which sorts it numerically.

$ wc -l app/controllers/*.rb | sort -n
    25 app/controllers/auto_completes_controller.rb
    25 app/controllers/ldap_auth_sources_controller.rb
    26 app/controllers/project_enumerations_controller.rb
    ... snip ...
   248 app/controllers/wiki_controller.rb
   271 app/controllers/projects_controller.rb
   272 app/controllers/account_controller.rb
   322 app/controllers/repositories_controller.rb
   324 app/controllers/timelog_controller.rb
   326 app/controllers/issues_controller.rb
   412 app/controllers/application_controller.rb
  5381 total
$ wc -l app/models/*.rb | sort -n
    22 app/models/document_observer.rb
    22 app/models/group_custom_field.rb
    22 app/models/issue_observer.rb
    ... snip ...
   200 app/models/changeset.rb
   202 app/models/wiki_page.rb
   215 app/models/repository.rb
   234 app/models/version.rb
   336 app/models/mail_handler.rb
   416 app/models/user.rb
   442 app/models/mailer.rb
   643 app/models/query.rb
   774 app/models/project.rb
   863 app/models/issue.rb
  7412 total

Reviewing the results from wc

You want to look for a few things with raw line counts:

  • is there a class with a disproportionate number of lines compared to the others?
  • are there more lines in the controllers than the models?

In general, a well-factored Rails application will have a majority of its code in the models, and most classes will be roughly the same size. There might be a few major classes that are larger than the rest, which are the primary domain objects. Using Redmine as an example, you would expect any Project, Issue, and User classes to be large since it's a project management and bug tracking application. Looking at the line counts from wc, the Timelog controller and Query model have a considerable number of lines even though they aren't the primary domain objects. In fact, because of my intuition from working with Redmine for years, I know both of these classes need some major refactoring.
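Both questions can be roughly automated. The following is a minimal Ruby sketch rather than one of the tools from this post: the helper names and the cutoff of three times the median are assumptions picked for illustration, and it expects to be run from the root of a Rails app.

```ruby
# Line count per file for a glob, smallest first (like `wc -l ... | sort -n`).
# Unlike wc, a final line without a trailing newline is still counted.
def line_counts(glob)
  Dir.glob(glob).map { |path| [path, File.foreach(path).count] }.sort_by { |_, n| n }
end

def median(numbers)
  sorted = numbers.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Files whose line count is at least `factor` times the median.
# The default factor of 3 is an arbitrary assumption, not a rule.
def outliers(counts, factor = 3)
  return [] if counts.empty?
  cutoff = median(counts.map { |_, n| n }) * factor
  counts.select { |_, n| n >= cutoff }
end

controllers = line_counts("app/controllers/*.rb")
models      = line_counts("app/models/*.rb")

puts "controller lines: #{controllers.sum { |_, n| n }}"
puts "model lines:      #{models.sum { |_, n| n }}"

(outliers(controllers) + outliers(models)).each do |path, n|
  puts "disproportionately large: #{path} (#{n} lines)"
end
```

A flagged file is only a hint about where to start reading; primary domain objects are expected to show up here.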

Reading the code to find refactorings

After you see which classes are the largest, you will want to read through the class itself. Look for how "thick" the class is and see what you can learn about its structure.

  • Are there a few large methods that contain most of the code for the class? You might be able to use extract method and extract class to shrink their size.
  • Are there a lot of short methods? Maybe the code was over-factored or the class is trying to do too much and needs to be split.
  • Are there a lot of comments? Since wc counts comments as lines too, wc might report that a class is large but there might not be that much actual code there.
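To check the last point without opening every file, a rough count that ignores comments can be compared against the wc number. This is a sketch under simple assumptions: code_line_count is a made-up helper, and it does not parse Ruby, so trailing comments and heredoc contents still count as code.

```ruby
# Rough count of code lines: skips blank lines, lines that contain only a
# `#` comment, and =begin/=end comment blocks.
def code_line_count(source)
  in_block_comment = false
  source.each_line.count do |line|
    stripped = line.strip
    if in_block_comment
      in_block_comment = false if line.start_with?("=end")
      false
    elsif line.start_with?("=begin")
      in_block_comment = true
      false
    else
      !stripped.empty? && !stripped.start_with?("#")
    end
  end
end

sample = <<~RUBY
  # A comment
  class Foo

    def bar # trailing comment
      1
    end
  end
RUBY

puts "wc-style lines: #{sample.lines.count}"
puts "code lines:     #{code_line_count(sample)}"
```

For a real file, pass File.read(path) as the source. A large gap between the two numbers suggests the class is mostly comments rather than code.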

After reading through some classes you will start to get a feel for what areas to refactor and if there are any common problems. You should also be able to start working on a few refactorings as you go.

Next time I'll talk about using some tools that understand Ruby and what makes Ruby code complex.

This is a guest post by Eric Davis. Eric is a member of the Redmine team and has written an ebook, Refactoring Redmine, to show Rails developers what refactoring real Rails code looks like. He writes about the different refactorings done to Redmine every day at http://theadmin.org.



more »

Extending Rails 3 with Railties »

Created at: 07.10.2010 14:49, source: Engine Yard Blog, tagged: Uncategorized Rails 3 Railties

This post comes from guest community contributor and Engine Yard alumnus Andre Arko. Andre has been building web applications with Ruby and Rails for five years, and is a member of the Bundler core team. He works for Plex, tweets as @indirect, and blogs at andre.arko.net.

Gem plugins in Rails 3.0

Rails 3.0 is finally released, and with it comes a fantastic new way to extend Rails: Railties. Railties are the basis of the core components of Rails 3, and are the result of months of careful refactoring by Carlhuda. It is easier to extend and expand Rails than it has ever been before, without an alias_method_chain in sight. Unfortunately, while the system for extending and expanding Rails has been completely overhauled, the documentation hasn't been updated yet. The Rails Plugins Guide only covers writing plugins in the old Rails 2 style. Ilya Grigorik wrote a Railtie & Creating Plugins blog post, but just scratched the surface of what is possible with a Railtie plugin. This post covers writing Railtie plugins, hooking into the Rails initialization process, packaging Railtie plugins as gems, and using gem plugins in a Rails 3 application.

Creating Railtie plugins

Creating a Railtie is easy. Just create a class that inherits from ::Rails::Railtie. Every subclass of Railtie is used to initialize your Rails application. Since ActionController, ActionView, and the other Rails components are also Railties, your plugin can function as a first-class member of the Rails application. It will have access to the same methods and context that are used by the official Rails components. Here is a sample minimal Railtie that will be loaded when your Rails application boots.
require 'rails'
class MyCoolRailtie < Rails::Railtie
  # railtie code goes here
end
The Railtie documentation lists all of the methods that are available inside each Railtie class, but doesn't really go into depth about what you can use Railties to do. Here are some example Railties explaining how to use the Railtie methods (in alphabetical order) to customize and extend Rails.

console

The console method allows your Railtie to add code that will be run when a Rails console is started.
console do
  Foo.console_mode!
end

generators

Rails will require any generators defined in lib/generators/*.rb automatically. If you ship Rails::Generators with your Railtie in some other directory, you can require them using this method.
generators do
  require 'path/to/generator'
end

rake_tasks

If you ship rake tasks for apps with your Railtie, load them using this method.
rake_tasks do
  require 'path/to/railtie.tasks'
end

initializer

The initializer method provides Railties with a lot of power. It creates initializers that will be run during the Rails boot process, like the files put into config/initializers in the app directory. The initializer method takes two options, :after and :before, for when there are specific initializers that you want yours to run after or before.
initializer "my_cool_railtie.boot_foo" do
  Foo.boot(Bar)
end

initializer "my_cool_railtie.boot_bar",
  :before => "my_cool_railtie.boot_foo" do
    Bar.boot!
end
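To see how :before and :after constraints can turn into a run order, here is a plain-Ruby illustration using the standard library's TSort. InitializerList is a made-up class for this sketch and is not the code Rails uses; only the initializer names come from the example above.

```ruby
require 'tsort'

# Hypothetical stand-in for a list of named initializers.
class InitializerList
  include TSort

  Entry = Struct.new(:name, :before, :after)

  def initialize
    @entries = []
  end

  def add(name, before: nil, after: nil)
    @entries << Entry.new(name, before, after)
    self
  end

  # Names in the order the initializers would run.
  def run_order
    tsort.map(&:name)
  end

  private

  def tsort_each_node(&block)
    @entries.each(&block)
  end

  # TSort emits a node's children first, so the "children" of an entry
  # are the entries that have to run before it.
  def tsort_each_child(entry, &block)
    @entries.select { |other|
      other.before == entry.name || entry.after == other.name
    }.each(&block)
  end
end

list = InitializerList.new
list.add("my_cool_railtie.boot_foo")
list.add("my_cool_railtie.boot_bar", before: "my_cool_railtie.boot_foo")

puts list.run_order
```

Even though boot_foo is declared first, boot_bar comes out ahead of it because of the :before constraint.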

Rails configuration hooks

The biggest extension hook that Railties provide is somewhat unassuming: the config method. That method returns the instance of Railtie::Configuration that belongs to the application being booted. This opens up all sorts of interesting possibilities, since the config object is the same one that is made available inside a Rails application's environment.rb file. Here are some annotated examples of using config to change how a Rails application is initialized and configured.

after_initialize

This method takes a block that will be run after Rails is completely initialized, and all of the application's initializers have run.
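A minimal sketch of the hook in use, assuming the Rails gem is available. Foo and its verify_settings! method are placeholders in the spirit of the other examples here, not a real API.

```ruby
require 'rails'

class MyCoolRailtie < Rails::Railtie
  config.after_initialize do
    # By this point every initializer has run, so settings the application
    # changed in its own config/initializers files are visible here.
    # Foo.verify_settings! is hypothetical.
    Foo.verify_settings!
  end
end
```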

app_middleware

This method exposes the MiddlewareStack that will be used to handle requests to your Rails application. You can use any of the methods defined on MiddlewareStack, including use and swap, to manage the Rails application's Rack middlewares. For example, if your Railtie included the Rack middleware MyRailtie::Middleware, you could add it to the Rails application middleware stack like this:
config.app_middleware.use MyRailtie::Middleware
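For reference, a Rack middleware is just an object that wraps the next app and responds to call(env). The sketch below is one hypothetical shape for MyRailtie::Middleware; the X-My-Railtie header is invented for illustration, and the lambda stands in for the Rails application so the class can be tried without Rails.

```ruby
module MyRailtie
  # Hypothetical middleware: adds a response header to every request.
  class Middleware
    def initialize(app)
      @app = app
    end

    def call(env)
      status, headers, body = @app.call(env)
      [status, headers.merge("X-My-Railtie" => "on"), body]
    end
  end
end

# Exercise it with a lambda as the downstream app.
app = lambda { |env| [200, { "Content-Type" => "text/plain" }, ["ok"]] }
status, headers, body = MyRailtie::Middleware.new(app).call({})

puts status
puts headers.inspect
```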

before_configuration

Code passed in a block to this method will be run immediately before the application configuration block inside application.rb is run. This is usually the best place to set default options that users of your plugin should be able to override themselves, as in the jquery-rails example below.

before_eager_load

The block passed to before_eager_load will be run before Rails requires the application’s classes. Eager load is never run in development mode. However, if you need to run code after Rails loads, but before any application code loads, this is the place to put it.
config.before_eager_load do
  SomeClass.set_important_value = "RoboCop"
end

before_initialize

This method takes a block to be run before the Rails initialization process happens -- this is basically equivalent to creating an initializer, and setting it to run before the first initializer the app has.

generators

This object holds the configuration for the generators that are invoked when you run the rails generate command.
config.generators do |g|
  g.orm             :datamapper, :migration => true
  g.template_engine :haml
  g.test_framework  :rspec
end
You can also use it to disable colorized logging in the console.
config.generators.colorize_logging = false

to_prepare

Last, but quite importantly, to_prepare allows you to do one-time setup. The block you pass to this method will run for every request in development mode, but only once in production. Use it when you need to set something up once before the app starts serving requests.
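A common reason to reach for to_prepare is extending application classes that get reloaded in development. The following is only a sketch, assuming the Rails gem is available: User and MyCoolRailtie::UserExtensions are hypothetical names and would have to exist in the app and the plugin respectively.

```ruby
require 'rails'

class MyCoolRailtie < Rails::Railtie
  config.to_prepare do
    # Runs once in production, and again before each request in development,
    # so the freshly reloaded User class receives the extension every time.
    # Both constants are hypothetical.
    User.send(:include, MyCoolRailtie::UserExtensions)
  end
end
```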

Examples

At this point, you're probably thinking "why would I actually want to do any of that stuff?". So, here are a few select examples of Railtie plugins packaged as gems.

rspec-rails

The rspec-rails plugin ships with a set of rake tasks and generators that integrate the RSpec gem with Rails.
module RSpec
  module Rails
    class Railtie < ::Rails::Railtie
      config.generators.integration_tool :rspec
      config.generators.test_framework   :rspec

      rake_tasks do
        load "rspec/rails/tasks/rspec.rake"
      end
    end
  end
end
This Railtie just does three things: First, it sets the generators that will be used for integration tests via the integration_tool method. Next, it sets the generators that will be used to generate model, controller, and view tests (via the test_framework method). Last, it loads the RSpec rake tasks to run RSpec tests instead of test-unit tests.

jquery-rails

The jquery-rails plugin ships with a generator that downloads and installs jQuery, the jquery-ujs script that enables Rails helpers with jQuery, and optionally installs jQueryUI as well.
module Jquery
  module Rails
    class Railtie < ::Rails::Railtie
      config.before_configuration do
        if ::Rails.root.join('public/javascripts/jquery-ui.min.js').exist?
          config.action_view.javascript_expansions[:defaults] =
            %w(jquery.min jquery-ui.min rails)
        else
          config.action_view.javascript_expansions[:defaults] =
            %w(jquery.min rails)
        end
      end
    end
  end
end
This Railtie only sets one setting, but checks for the jQueryUI library to determine the value to set. By using the config.before_configuration hook, it runs right before the application.rb config block runs. That means it has access to the Rails.root, which is needed to check for jQueryUI, and it means that users can still override javascript_expansions[:defaults] in their application.rb if they want something different than the new defaults that the plugin provides.

haml-rails

The haml-rails gem provides generators for views written in Haml instead of the default generated views that are written in ERB.
module Haml
  module Rails
    class Railtie < ::Rails::Railtie
      config.generators.template_engine :haml

      config.before_initialize do
        Haml.init_rails(binding)
        Haml::Template.options[:format] = :html5
      end
    end
  end
end
This Railtie simply changes the template engine that Rails invokes when you run rails generate, and then initializes Haml for Rails, and sets the Haml output format to HTML5.

Packaging up gem plugins

Railtie plugins are easy to turn into gem plugins for Rails. This makes them easy to distribute, manage, and upgrade. The first thing you need is a gem. If you don't have a gem yet, you can create a new gem easily using Bundler. Just run bundle gem my_new_gem and Bundler will generate a skeleton gem and gemspec that follow gem best practices. Once you have a gem, just make sure that your Railtie subclass is defined when lib/my_new_gem.rb is loaded. You can define the Railtie in a separate file and require that file, or define it directly. Last, add a dependency on the Rails gem (~>3.0) to your gemspec. If your gem is also a plain Ruby library, and you don't want to depend on the Rails gem, then you can put your Railtie in a separate file, and conditionally require that file inside your main library file.
# lib/my_new_gem/my_cool_railtie.rb
module MyNewGem
  class MyCoolRailtie < ::Rails::Railtie
    # Railtie code here
  end
end
# lib/my_new_gem.rb
require 'my_new_gem/my_cool_railtie.rb' if defined?(Rails)
This ensures that your gem can be loaded (without the Railtie) if it is loaded outside the context of a Rails application. Now that your gem has a Railtie, you can build it and release it to Gemcutter. Once your gem is on Gemcutter, using it with Rails 3 applications is extremely easy -- just add the gem to your Gemfile. Bundler will download and install your gem when you run bundle install, Rails will load it, and the Rails::Railtie class takes care of the rest!
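The gemspec dependency mentioned earlier might look like the sketch below. Every value is a placeholder. The spec is assigned to a local variable only so it can be inspected; a real gemspec file usually just ends with the Gem::Specification.new block.

```ruby
require 'rubygems'

# my_new_gem.gemspec (all values are placeholders)
spec = Gem::Specification.new do |s|
  s.name          = "my_new_gem"
  s.version       = "0.1.0"
  s.summary       = "Example Railtie plugin"
  s.authors       = ["Your Name"]
  s.files         = Dir["lib/**/*.rb"]
  s.require_paths = ["lib"]

  # "~> 3.0" accepts 3.0 and later 3.x releases, but not 4.0.
  s.add_dependency "rails", "~> 3.0"
end

puts spec.dependencies.map(&:to_s)
```

An application would then pick the plugin up with a single gem "my_new_gem" line in its Gemfile.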


more »

Deployment Best Practices with Engine Yard and Bundler 1.0 »

Created at: 22.09.2010 13:28, source: Engine Yard Blog, tagged: Uncategorized bundler

Bundler 1.0 was recently released and it is better than ever. The dependency resolution is now smarter and more flexible, a ton of bugs were fixed, and it has generally gotten much easier to use. We have been using Bundler at Engine Yard since the first version that was actually part of Merb. Things have changed a lot since then, but the basics are actually still pretty much the same: tell the tool what your dependencies are, it will figure out what set of gems satisfy those dependencies and make them available to your application. Now that Bundler is at 1.0 and is stable and solid, I highly recommend that everyone use it to deploy their applications. There are a few best practices that will make using Bundler to deploy work seamlessly.

Check in your Gemfile.lock

This was always a best practice when using Bundler in earlier versions. If you don't check in the lockfile, the only thing Bundler can do when deploying is resolve all of the gems again. If a new gem was released since you last tested, you could end up deploying to production with a different set of dependencies than the one you tested. In Bundler 1.0, the --deployment flag requires that you check in your lockfile. It will not let you deploy without one. At Engine Yard we use --deployment whenever you are using Bundler 1.0.

Your bundle should be shared across deploys, but not across applications

When you deploy your code, it would be better if Bundler did not have to rebuild every gem in your application's dependencies. You can do this by having Bundler install the gems in a place that is shared between deploys. At Engine Yard we automatically tell Bundler to put your gems in /data/app-name/shared/bundled_gems. This way you still get the benefit of having your application gems separate from your system gems, but you also don't have to rebuild every gem on every deploy.

Use groups to limit what is installed on production

Most applications have a set of dependencies that they may only want during development of their application. This could be test frameworks like rspec or cucumber, or a gem used for debugging locally like ruby-debug. When deploying with Engine Yard, Bundler does not automatically install or make available anything in the development or test groups. Your Gemfile could look something like this:
gem "rails"

group :test do
  gem "rspec", :require => "spec"
end

group :development do
  gem "ruby-debug"
end
Then when you deploy to production, you will get the rails gem and all of its dependencies, but ruby-debug and rspec will not be installed.

Package your gems before deployment

This recommendation is a bit more of a gray area. Bundler includes the bundle package command, which puts the packed .gem files into your application so that when you deploy, everything your app needs is already there. This makes your deployments less failure-prone when the gem servers are down. In reality, Gemcutter is not down often, and if you follow recommendation #2, most deploys should not even need to fetch any gems (except when the dependencies of your application change). Packaging also means that there are a few more megabytes of data in your git repo. I still think it is a good idea to be resilient to those kinds of failures, but there is a trade-off.

Bundler 1.0 is an amazing tool that completely changes the way you think about dependencies and deployment. I feel much more comfortable knowing that any deploy that I do using Bundler is not going to randomly fail because I forgot to install a gem on my production server. With these few tips, deploys should be even more foolproof when it comes to your applications' dependencies.


more »

Introducing JRubyConf 2010 »

Created at: 25.08.2010 23:02, source: Engine Yard Blog, tagged: Uncategorized

We're excited to join forces with our friends at EdgeCase to co-host the second annual JRubyConf, taking place October 1-3 at Quest Conference Center in Columbus, Ohio. This year, we've expanded the event to include three days of JRuby-filled goodness. JRubyConf will showcase the growing use of Ruby in the enterprise while also highlighting elements of the Java language that Ruby developers can benefit from via JRuby.

We've got a fantastic speaker lineup including: Tom Enebo, Chad Fowler, Jeremy Heingardner, Rich Kilmer, Keavy McKinn, Charles Nutter, Joe O’Brien, Nick Sieger, Brian Swan, Glenn Vanderburg, Jim Weirich, and more to be announced soon. If you're curious about what JRuby can do for you, or if you're someone who has been using it for years - join us! We've got something for everyone.

JRubyConf will begin with Java and Ruby specific talks before progressing to more advanced sessions that demonstrate the possibilities of using both languages with JRuby, all focused on bringing the Ruby and Java communities together in a collaborative environment to share best development practices. Topics to be covered include:

  • Introduction to JRuby
  • How to use Java in Ruby applications
  • Best practices for introducing Ruby to Java development teams
  • Effectively managing large agile teams that use Ruby
  • Large scale testing with Ruby
  • How to scale Ruby on Rails

Our growing sponsor list includes EdgeCase, 8th Light, ELC Technologies, Kinetic Data, O'Reilly, Terremark, and WyeWorks.

Registration is now open. Take advantage of an early bird discount for registration before September 1. If you would like information on user group discounts, give us a shout! To register to attend or to participate as a sponsor, visit the JRubyConf event site. Follow @JRubyConf on Twitter to stay on top of announcements. Hope you can join us in Columbus!


more »

A Gentle Introduction to Isolation Levels »

Created at: 21.07.2010 13:16, source: Engine Yard Blog, tagged: Uncategorized

Hello all, Our latest post is from a special guest and Engine Yard partner Xavier Shay. He'll be running a pair of training sessions on "using your database to make your Ruby on Rails applications rock solid" at Engine Yard's San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.

Bob opens a database transaction and selects everything from the books table. Tom comes along and adds a new book, then Bob, in his same transaction, repeats his same query for all the books. Does Bob see the new book that Tom added? The answer is that you get to choose! It's important to understand what your choices are (and what choice your preferred database makes for you) so that you can ensure your code executes in the way that you intend.

The SQL standard specifies levels for how "isolated" transactions running at the same time are, all the way from being able to see uncommitted changes (not isolated) to effectively running the transactions in serial (full isolation). Academically there are eight levels of isolation, but for most purposes you only need to worry about the four defined by the standard. MySQL implements all four, PostgreSQL only two. You can specify a global isolation level for your database, but also override it for individual transactions.

The easiest to understand are the extreme levels: no isolation and total isolation. The first of these is known as read uncommitted, and it allows Bob to read the new book that Tom is adding even before Tom has committed his changes. As you can imagine, this level is mostly useless; however, it can very occasionally be handy in some reporting situations.

At the other end of the spectrum is full isolation, known in the spec as serializable. Bob will never see the new book that Tom is adding until he starts a new transaction. The database Bob sees is consistent: within the one transaction, the same query will always return the same result. At first glance this level seems like a great option, but there's a lot of overhead involved, it drastically reduces the amount of concurrency you can achieve, and for most purposes the serializable level is overkill.

There are two isolation levels in between read uncommitted and serializable: read committed and repeatable read. This is where it gets interesting. Read committed is the default isolation level in PostgreSQL and Oracle, and is one step up from read uncommitted. It is the most "common sense" level: Bob will not see any changes made by Tom until Tom commits them.

MySQL defaults to repeatable read. In this level, Bob will not see any updates Tom commits, but will see any inserts. Say in Bob's first select he sees one book titled "The Odessey". Tom then fixes the spelling mistake to "The Odyssey", and also adds Homer's other epic poem "The Iliad". When Bob selects all books again, he will see "The Odessey" (old title, no spelling fix) and "The Iliad" (the inserted book).

To summarize, the four levels from least isolated to most isolated are: read uncommitted, read committed, repeatable read, and serializable. They define which types of changes made by Tom Bob will be able to see within a single transaction.

In Practice

Say the books we are selecting are ordered based on an arbitrary position column (they're on our bookshelf, for instance). Assume read committed isolation level.
Title       | Position
----------------------
The Odyssey | 1
The Iliad   | 2
The Nostoi  | 3
Bob wants to move "The Odyssey" to the bottom position. To do this, he needs to update its position to the bottom of the list (position 4), then subtract 1 from all positions. At the same time, Tom is adding a new book, "The Cypria". Working this through:

  1. Bob checks the bottom position, finds it to be 4
  2. Tom inserts "The Cypria" in the bottom position of 4
  3. Bob updates the position of "The Odyssey" to 4
  4. Bob subtracts 1 from all positions, and since he is using read committed he will "see" and update the newly inserted book.
  5. Both "The Odyssey" and "The Cypria" have a position of 3
Title       | Position
----------------------
The Iliad   | 1
The Nostoi  | 2
The Odyssey | 3
The Cypria  | 3
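The interleaving can be replayed in plain Ruby to confirm the duplicate position. This is only a simulation under simplifying assumptions: a Hash stands in for the books table, and there is no real database or transaction involved.

```ruby
# In-memory stand-in for the books table: title => position.
books = { "The Odyssey" => 1, "The Iliad" => 2, "The Nostoi" => 3 }

# 1. Bob checks the bottom position (one past the current maximum).
bobs_bottom = books.values.max + 1

# 2. Tom inserts "The Cypria" at what he also sees as the bottom.
books["The Cypria"] = books.values.max + 1

# 3. Bob moves "The Odyssey" to the position he read in step 1.
books["The Odyssey"] = bobs_bottom

# 4. Bob subtracts 1 from every position. Under read committed this
#    includes the row Tom has committed in the meantime.
books.transform_values! { |position| position - 1 }

books.sort_by { |_, position| position }.each do |title, position|
  puts "#{title.ljust(11)} | #{position}"
end

# Positions that are held by more than one book.
duplicates = books.values.group_by(&:itself).select { |_, v| v.size > 1 }.keys
puts "duplicate positions: #{duplicates.inspect}"
```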
If Bob had used the serializable level, the list would have remained consistent for his entire transaction, so his update would not have affected "The Cypria" that Tom inserted, and so would not have updated its position from 4 to 3. (In practice the way databases normally handle this is to actually abort one of the transactions with an error.)

For those using Rails, you may have recognized the above scenario as a typical acts_as_list scenario, and you'd be correct. In a default configuration, the acts_as_list plugin makes the same mistake outlined above, and will leave you with inconsistent data. The quickest fix is to wrap all list operations in a serializable transaction.
Book.transaction do
  Book.connection.execute("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE")
  @book = Book.find_by_name("The Odyssey")
  @book.move_to_bottom
end
(It may have occurred to you that some locking or a unique index on position could avoid the exact scenario above, but that breaks acts_as_list and fails to address some other edge cases left as an exercise for the reader. The main point for the purpose of this article is to understand why it breaks under read committed, but works under serializable.)

As a general rule, read committed is a sensible default. It's easy to reason about, fast, and forces you to be explicit about your locking strategy. Jump up to serializable when needed, usually when dealing with ranges. MySQL's repeatable read default can be confusing and can deadlock in unintuitive ways, so it is not recommended.

This has been a very brief introduction to the four standard SQL isolation levels: read uncommitted, read committed, repeatable read, and serializable. Hopefully it has helped you get your head around them. I'll be going into much more detail with practical hands-on exercises in my training days at Engine Yard's San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.


more »