Ruby on Rails 3 Security Updated »
Created at: 08.06.2010 15:13, source: Ruby on Rails Security Project, tagged: cross-site scripting rails Ruby on Rails security sql injection sqli web security XSS
I gave a talk about Rails 3 Security at RailsWayCon10. It covers the new Cross-Site Scripting protection in Rails 3, what is going to change in ActiveRecord, and other Rails security topics. You can find the presentation on Slideshare.
Rails Performance Needs an Overhaul »
Created at: 07.06.2010 17:32, source: igvita.com, tagged: Architecture Ruby on Rails performance rails
Browsers are getting faster; JavaScript frameworks are getting faster; MVC frameworks are getting faster; databases are getting faster. And yet, even with all of this innovation around us, it feels like there is a massive gap when it comes to the end product of delivering an effective and scalable service as a developer: the performance of most of our web stacks, when measured end to end, is poor at the best of times, and plain terrible most of the time.
The fact that a vanilla Rails application requires a dedicated worker with a 50MB stack to render a login page is nothing short of absurd. There is nothing new about this, nor is this exclusive to Rails or a function of Ruby as a language - whatever language or web framework you are using, chances are, you are stuck with a similar problem. But GIL or no GIL, we ought to do better than that. Node.js is a recent innovator in the space, and as a community, we can either learn from it, or ignore it at our own peril.
Measuring End-to-End Performance

A modern web service is composed of many moving components, all of which come together to create the final experience. First, you have to model your data layer, pick the database, and then ensure that it can get your data in and out in the required amount of time; there is lots of innovation in this space thanks to the NoSQL movement. Then, we layer our MVC frameworks on top, and fight religious wars as developers over whose DSL is more beautiful (to me, Rails 3 deserves all the hype). On the user side, we are building faster browsers with blazing-fast JavaScript interpreters and CSS engines. However, the driveshaft (the app server), which connects the engine (data & MVC) to the front-end (the browser + DOM & JavaScript), is often just a checkbox in the deployment diagram. The problem is, this checkbox is also the reason why the ‘scalability’ story of our web frameworks is nothing short of terrible.
It doesn't take much to construct a pathological example where a popular framework (Rails), combined with a popular database (MySQL), and a popular app server (Mongrel) produce less than stellar results. Now the finger pointing begins. MySQL is more than capable of serving thousands of concurrent requests, the app server also claims to be threaded, and the framework even allows us to configure a database pool!
Except that the database driver locks our VM, and both the framework and the app server still have a few mutexes deep in their guts, which impose hard limits on concurrency (read: serial processing). The problem is, this is the default behaviour! No wonder people complain about 'scalability'. The other popular choices (Passenger / Unicorn) “work around” this problem by requiring a dedicated VM per concurrent request - that's not a feature, that's a bug!
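To see why a single mutex turns a "threaded" server into a serial one, here is a minimal pure-Ruby sketch (not Rails, Mongrel, or driver code): five threads each spend 0.2 s in a simulated I/O wait, once behind one shared `Mutex` and once without it. `sleep` stands in for a driver that releases the VM while waiting; the helpers `elapsed` and `run_requests` are made-up names for illustration.

```ruby
# Hypothetical helper: time a block with the monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Run +count+ simulated requests on separate threads. Each one "waits on the
# database" for 0.2 s; if +lock+ is given, every request holds it while waiting.
def run_requests(count, lock: nil)
  threads = Array.new(count) do
    Thread.new do
      work = -> { sleep 0.2 }
      lock ? lock.synchronize(&work) : work.call
    end
  end
  threads.each(&:join)
end

serial   = elapsed { run_requests(5, lock: Mutex.new) } # at least 1.0 s: one at a time
parallel = elapsed { run_requests(5) }                  # about 0.2 s: the waits overlap

puts format("with a shared mutex: %.2fs", serial)
puts format("without the mutex:   %.2fs", parallel)
```

The threads are identical in both runs; only the lock differs, which is the point: a mutex held across I/O caps throughput at one request at a time no matter how many threads the app server spawns.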
The Rails Ecosystem
To be fair, we have come a long way since the days of WEBrick. In many ways, Mongrel made Rails viable, Rack gave us the much needed interface to become app-server independent, and the guys at Phusion gave us Passenger which both simplified the deployment, and made the resource allocation story moderately better. To complete the picture, Unicorn recently rediscovered the *nix IPC worker model, and is currently in use at Twitter. Problem is, none of this is new (at best, we are iterating on the Apache 1.x to 2.x model), nor does it solve our underlying problem.
Turns out, while all the components are separate, and it's great to treat them as such, we do need to look at the entire stack as one picture when it comes to performance: the database driver needs to be smarter, the framework should take advantage of the app server's capabilities, and the app server itself can't pretend to work in isolation.
If you are looking for a great working example of this concept in action, look no further than node.js. There is nothing about node that can't be reproduced in Ruby or Python (EventMachine and Twisted), but the fact that the framework forces you to think about, and use, the right components (fully async & non-blocking) is exactly why it is currently grabbing the mindshare of early adopters. Rubyists, Pythonistas, and others can ignore this trend at their own peril. Moving forward, the end-to-end performance and scalability of any framework will only become more important.
Fixing the "Scalability" story in Ruby
The good news is, for every outlined problem, there is already a working solution. With a little extra work, the driver story is easily addressed (the MySQL driver is just an example; the same story applies to virtually every other SQL/NoSQL driver), and the frameworks are steadily removing the bottlenecks one at a time.
After a few iterations at PostRank, we rewrote some key drivers, grabbed Thin (evented app server), and made heavy use of continuations in Ruby 1.9 to create our own API framework (Goliath) which is perfectly capable of serving hundreds of concurrent requests at a time from within a single Ruby VM. In fact, we even managed to avoid all the callback spaghetti that plagues node.js applications, which also means that the same continuation approach works just as well with a vanilla Rails application. It just baffles me that this is not a solved problem already.
The state of the art in end-to-end Rails stack performance is not good enough. We need to fix that.
Non-blocking ActiveRecord & Rails »
Created at: 15.04.2010 23:39, source: igvita.com, tagged: ruby Ruby on Rails activerecord eventmachine rails
Rails and MySQL go hand in hand. ActiveRecord is perfectly capable of using a number of different databases but MySQL is by far the most popular choice for production deployments. And therein lies a dirty secret: when it comes to performance and 'scalability' of the framework, the Ruby MySQL gem is a serious offender. The presence of the GIL means that concurrency is already somewhat of a myth in the Ruby VM, but the architecture of the driver makes the problem even worse. Let's take a look under the hood.
Dissecting Ruby MySQL drivers
The native mysql gem many of us use in production was designed to expose a blocking API: you issue a SQL query, and the library blocks until the server returns a response. So far so good, but unfortunately it also introduces a nasty side effect. Because it blocks inside native code (inside the mysql_real_query() C function), the entire Ruby VM is frozen while we wait for the response. So, if your query happens to take several seconds, no other block, fiber, or thread will be executed by the Ruby VM in the meantime. Ever wondered why your threaded Mongrel server never really flexed its threaded muscle? Well, now you know.
Fortunately, the little-known mysqlplus gem addresses the immediate problem. Instead of using a single blocking call, it forwards the query to the server, and then starts polling for the response. For the curious, there are also two implementations, one in pure Ruby with a select loop, and a native (C) one which uses rb_thread_select. The benefit? Well, now you can have multiple threads execute database queries without blocking the entire VM! In fact, with a little extra work, we can even get some concurrency out of ActiveRecord.
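The "send, then poll" idea can be sketched in plain Ruby. This is not the mysqlplus API: a pipe stands in for the MySQL socket, a helper thread plays the server, and the client waits in short `IO.select` slices instead of one long blocking call, while a second thread keeps making progress. Note that pure-Ruby IO already releases the VM while waiting, so this only shows the shape of the pattern, not the C-level freeze it fixes.

```ruby
# A pipe stands in for the MySQL socket; a thread plays the server.
reader, writer = IO.pipe

server = Thread.new do
  sleep 0.3                      # the server "executing" the query
  writer.write("1 row in set\n") # the response
  writer.close
end

# Other work in the same VM, to show it keeps running while we wait.
ticks  = 0
ticker = Thread.new { loop { ticks += 1; sleep 0.01 } }

# Instead of one long blocking call, wait in short select() slices.
response = nil
until response
  ready, = IO.select([reader], nil, nil, 0.05) # nil on timeout
  response = reader.gets if ready
end

ticker.kill
server.join
puts response
puts "the other thread ran #{ticks} times meanwhile"
```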
However, we could drop threads altogether in our quest for concurrency! Instead of making every thread poll on a socket, we could pass each of those sockets to a single event loop library (EventMachine), and let it handle all the IO scheduling for us: gem install em-mysqlplus. Same API; in fact, it uses mysqlplus under the covers, but now every query has a callback for true non-blocking database access. Take a look at a few examples in the slides.
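With threads gone from the client side, the event-loop approach boils down to one `IO.select` call watching every connection and firing a callback as each response arrives. The sketch below is a toy reactor; `ToyReactor` and `on_readable` are made-up names, not EventMachine's API, pipes stand in for database sockets, and the background threads only simulate the remote servers.

```ruby
# A toy reactor: one select loop, many connections, one callback each.
class ToyReactor
  def initialize
    @callbacks = {} # IO => callback
  end

  # Register interest in +io+; the block runs once a line is readable.
  def on_readable(io, &block)
    @callbacks[io] = block
  end

  def run
    until @callbacks.empty?
      ready, = IO.select(@callbacks.keys) # blocks until some IO is readable
      ready.each do |io|
        callback = @callbacks.delete(io)
        callback.call(io.gets)
      end
    end
  end
end

reactor = ToyReactor.new
results = []

# Three fake queries; each simulated server answers after a different delay.
[0.3, 0.1, 0.2].each_with_index do |delay, i|
  reader, writer = IO.pipe
  Thread.new { sleep delay; writer.puts("result #{i}"); writer.close }
  reactor.on_readable(reader) { |line| results << line.chomp }
end

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
reactor.run
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

p results                            # callbacks fire in completion order
puts format("%.2fs total", elapsed)  # close to the slowest query, not the sum
```

The total wait tracks the slowest query rather than the sum of all three, which is exactly what a single event loop buys over serial blocking calls.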
Non-blocking Rails with MySQL
Now we come full circle. The downside of a truly asynchronous library is that it requires callbacks, spaghetti code, and a fully asynchronous stack. Thankfully, we already have Thin as our async app server, and with the introduction of Fibers in Ruby 1.9, we can wrap our asynchronous driver to behave just as if it had a blocking API.
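A minimal sketch of that Fiber trick, using only core Ruby: `async_query` is a hypothetical callback-based driver backed by a toy timer loop, and `sync_query` parks the calling Fiber with `Fiber.yield` until the callback resumes it with the result. em-synchrony applies the same idea on top of EventMachine; none of the names below are its actual API.

```ruby
require "fiber" # Fiber.current needs this on Ruby < 3.1; harmless later

# Toy event loop that only knows about timers (a stand-in for EventMachine).
class TimerLoop
  def initialize
    @timers = [] # pairs of [fire_at, callback]
  end

  def add_timer(delay, &callback)
    @timers << [now + delay, callback]
  end

  # Fire timers in deadline order until none are left.
  def run
    until @timers.empty?
      @timers.sort_by!(&:first)
      fire_at, callback = @timers.shift
      sleep([fire_at - now, 0].max)
      callback.call
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

EVENT_LOOP = TimerLoop.new

# Hypothetical async driver: returns immediately, calls back ~0.2 s later.
def async_query(sql, &callback)
  EVENT_LOOP.add_timer(0.2) { callback.call("rows for #{sql}") }
end

# Blocking-looking wrapper: park this Fiber until the callback resumes it.
def sync_query(sql)
  fiber = Fiber.current
  async_query(sql) { |rows| fiber.resume(rows) }
  Fiber.yield # evaluates to the value passed to fiber.resume
end

results = []
# Each "request" runs in its own Fiber and reads like plain sequential code.
%w[widgets gadgets].each do |table|
  Fiber.new { results << sync_query("select * from #{table}") }.resume
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
EVENT_LOOP.run
waited = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
p results
puts format("both queries done after %.2fs", waited)
```

The request code never sees a callback; the two simulated 0.2 s queries still overlap, so both finish after roughly 0.2 s rather than 0.4 s.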
So, we install em-mysqlplus, require em-synchrony to emulate the 'blocking API', implement an ActiveRecord adapter, and we finally have a fully non-blocking ActiveRecord driver which we can drop into our Rails app! Well, almost; a few other modifications are needed: Rails provides a threaded ConnectionPool, which we need to replace with a Fiber-aware one, and finally, we need to disable the built-in Mutex (hat tip to Mike Perham for doing all the dirty work for us). Now let's give it a try:
class WidgetsController < ApplicationController
  def index
    Widget.find_by_sql("select sleep(1)")
    render :text => "Oh hai"
  end
end
thin -D start
ab -c 5 -n 10 http://127.0.0.1:3000/widgets/

Server Software: thin
Server Hostname: 127.0.0.1
Server Port: 3000

Concurrency Level: 5
Time taken for tests: 2.210 seconds
Complete requests: 10
Requests per second: 4.53 [#/sec] (mean)
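Our widget action simulates a blocking one-second query. We start up a single Thin server and run an ab test against it: 10 requests, with a maximum concurrency of 5. As you can see, the test finishes in just slightly over 2 seconds! Five queries are in flight at once, so the ten one-second queries complete in two rounds instead of the ten seconds a serial server would need.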
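The arithmetic behind that result can be checked with a scaled-down model. This is a sketch, not the actual Thin setup: 10 jobs, at most 5 in flight, each waiting 0.1 s instead of 1 s, with a fixed pool of worker threads assumed purely for illustration. Ideally that takes ceil(10 / 5) = 2 rounds, about 0.2 s here (about 2 s at the benchmark's scale), versus 10 rounds if every query ran serially.

```ruby
QUERY_TIME  = 0.1 # seconds per simulated query (the benchmark used 1 s)
REQUESTS    = 10
CONCURRENCY = 5

queue = Queue.new
REQUESTS.times { |i| queue << i }

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
workers = Array.new(CONCURRENCY) do
  Thread.new do
    loop do
      queue.pop(true) # non-blocking pop; raises ThreadError once the queue is empty
      sleep QUERY_TIME
    end
  rescue ThreadError
    # queue drained: this worker is done
  end
end
workers.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

rounds = (REQUESTS.to_f / CONCURRENCY).ceil
puts format("ideal: %.1fs (%d rounds), measured: %.2fs", rounds * QUERY_TIME, rounds, elapsed)
```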
Rails 3, Ruby 1.9 and Drizzle
By mid-summer we will see production releases of Rails 3, Ruby 1.9, and Drizzle, and that convergence is worth paying attention to. Both Rails 3 and Ruby 1.9 offer raw performance improvements across the board. In the meantime, Drizzle already provides a fully async libdrizzle driver (it talks to both MySQL & Drizzle) which we could adopt to future-proof our applications. Combine all three with a fibered ActiveRecord driver and an async application server such as Thin, and we could take some serious steps forward when it comes to the performance of Rails: a significantly lower memory footprint and much better performance across the board.
has_and_belongs_to_many double insert »
Created at: 14.04.2010 10:22, source: The Life Of A Radar, tagged: rails ruby
This is a story about my work with GetUp, in particular the past week. It’s about a problem that I’d been putting off helping one of the guys (James) solve, because it didn’t seem all that important to me. So last night I kind of promised that I’d sit down with him this morning and help him work out what it was. Hopefully it was something silly either of us did and it would only take us an hour.
You know how this story is going to end already.
It didn’t take us an hour. It’s now 5pm and I’ve only just figured out what it was.
Symptoms
We have two models whose names aren’t important, so excuse me if I use the names Person and Address to represent them. They are nothing of the sort. In their purest form, to replicate this issue, they are defined like this:
class Address < ActiveRecord::Base
  has_and_belongs_to_many :people
end

class Person < ActiveRecord::Base
  has_and_belongs_to_many :addresses
  accepts_nested_attributes_for :addresses
end
When we go to create a new Person record:
Person.create(:addresses_attributes => { "0" => { :suburb => "Camperdown" } })
It inserts 1 Person record, 1 Address record but 2 join table records.
So, wtf?
We originally thought it was a bug in our application. How, in all reality, could Rails have a bug, right?
Wrong!
I should know how many bugs Rails could have. I should have been more wary. I was not. And it bit me in the arse. So out of curiosity I googled the issue and saw that others had come across it, and then I tried checking out v2.3.4, which worked! So there was a regression between v2.3.4 and v2.3.5. A simple git bisect bad v2.3.5 followed by git bisect good v2.3.4 put me on the way to finding out what this was. A couple of bisects later, I found the offending commit was 6b2291f3, by Eloy Duran.
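For readers who haven't used it, `git bisect` is a binary search over history: starting from a known-good and a known-bad commit, it checks out a commit near the middle, you mark it good or bad, and the range halves each time. A rough Ruby model of that search follows; the commit IDs, the 200-commit history, and the position of the offending commit are all invented for illustration.

```ruby
# Binary search for the first bad commit. Invariant: commits[lo] is good,
# commits[hi] is bad. Returns the first bad commit and the number of tests run.
def first_bad(commits, &bad)
  lo, hi = 0, commits.size - 1
  steps = 0
  while hi - lo > 1
    mid = (lo + hi) / 2
    steps += 1
    if bad.call(commits[mid])
      hi = mid
    else
      lo = mid
    end
  end
  [commits[hi], steps]
end

# Hypothetical history: c000 plays the role of v2.3.4, c199 of v2.3.5.
history       = (0...200).map { |i| format("c%03d", i) }
culprit_index = 137 # hypothetical offending commit

culprit, steps = first_bad(history) { |c| history.index(c) >= culprit_index }
puts "first bad commit: #{culprit} after #{steps} test builds"
```

Around 200 commits need at most 8 test builds, which is why "a couple of bisects" is usually enough even across a whole point release.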
A “solution(?)”
So I generated an application to simply demonstrate that this was a 2.3.5 regression. As I say in the README, I suggest using 2-3-stable if this bothers you. Alternatively there’s always Rails 3, or simply specifying the :uniq => true option on your has_and_belongs_to_many.
That was a fun 7 hours.
As I found out this (the next) morning, and as Tim Riley points out in the comments, the ticket for this bug is #3575 and the related commit is 146a7505, also by Eloy Duran. Freezing Rails to v2.3.5 and git cherry-picking this commit into the frozen version fixes it.
Announcing Rails Dispatch! »
Created at: 07.04.2010 20:00, source: Engine Yard Blog, tagged: News rails Rails 3
At Engine Yard, we’re always looking for new and fun ways to give back and help grow the community. Most recently, we organized the Ruby Summer of Code, designed to help foster student participation in open source development. Today, as the next project on our long list, we’re launching RailsDispatch.com.
Rails Dispatch will provide timely releases of up-to-date educational content, in the form of blog posts, tutorial videos, screencasts and more. We’ll be working with Engine Yard and community developers to put together the resources, and aiming for weekly content pushes from now until RailsConf. Until then the bulk of the content, in the spirit of the times, will focus on Rails 3, and the changes and new features it brings. As the content library grows, we’ll also be releasing new, interactive elements and community resources, so you definitely want to hop off the RSS feed and on to the site every little bit to check out what’s new.
Check out today’s release for the technical details on how Rails 3 makes life better, and an update to the famous Rails ‘blog in 15 minutes’ screencast.
As always, leave your feedback here, and check out Rails Dispatch!
