Rails 3.1 Has Landed »
Created at: 18.10.2011 23:43, source: Engine Yard Blog, tagged: cloud Product Technology
We are ecstatic to let all of you wonderful Engine Yard Cloud and Engine Yard Managed customers know that we are officially supporting Rails 3.1! Rails 3.1 brings a metric ton (we measured) of new features and functionality that will benefit you all. Instead of just listing them out, I asked a few folks including some of our staff what they were most looking forward to in Rails 3.1.
Asset Pipeline
A whole slew of excited Twitterers said they were most looking forward to the use of the asset pipeline! Our own Evan Machnic sums it up by saying, “The asset pipeline makes it easy to organize and serve your assets. Pairing the pipeline with a well-tuned web and application server means that web sites are rendered blazing fast.” I would have said the same thing. Word for word. I’m suspicious that he copied my work.
Prepared Statements
Thomas Enebo and Nick Sieger of the JRuby crew are wildly excited about the use of Prepared Statements. PostgreSQL customers will notice a huge performance boost. It was because of Rails 3.1’s improvements for PostgreSQL that we decided to offer PostgreSQL to our customers. PostgreSQL is currently released in Alpha. MySQL customers utilizing complex statements will also notice a boost.
HTTP Streaming
Danish Khan and several folks from Facebook also chimed in for HTTP streaming response. Browsers will load pages faster than ever before, especially when used with the asset pipeline! Please note that currently we only support streaming with Unicorn. We are working as quickly as possible to support other web servers.
jQuery
Shane Becker is excited by jQuery being the default JavaScript framework for Rails with 3.1. He says, “One of my favorite Rails 3.1 features is that jQuery is the default JavaScript library. It's such a small detail. But there's something in the details, right?”
:bulk => true
Josh Lane and Tyler Poland both echoed that the use of :bulk => true is what will make their dreams come true. Using the :bulk => true option in change_table migrations ensures that all schema changes use a single ALTER statement.
Tyler explains that, “Before Rails 3.1, each statement within a migration is mapped to an individual 'ALTER TABLE' statement within MySQL. This is problematic since each of those 'ALTER' statements re-writes the entire table on disk and blocks writes while it is running. Generally, each 'ALTER' takes about the same amount of time to run as a single statement capturing all those changes written in the 'SQL' syntax. Baron Schwartz discusses the topic here. This feature enables Rails to merge the changes into a single statement automatically, thus increasing the speed of your migrations.”
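To make the difference concrete, here is a small plain-Ruby sketch of the shape of the SQL involved. It does not use Rails at all: the table name, the column changes, and both helper methods are made up for illustration, and the exact SQL that Rails emits may differ.

```ruby
# Hypothetical illustration only: this models the shape of the SQL,
# not how Rails builds it internally.
COLUMN_CHANGES = [
  "ADD `nickname` varchar(255)",
  "ADD `age` int(11)",
  "DROP `legacy_flag`"
].freeze

# Without :bulk => true: one ALTER TABLE (and one table rewrite) per change.
def separate_alters(table, changes)
  changes.map { |change| "ALTER TABLE `#{table}` #{change}" }
end

# With :bulk => true: every change is merged into a single ALTER TABLE.
def bulk_alter(table, changes)
  "ALTER TABLE `#{table}` #{changes.join(', ')}"
end

puts separate_alters("users", COLUMN_CHANGES)
puts bulk_alter("users", COLUMN_CHANGES)
```

In a real migration the same idea is expressed by passing :bulk => true to change_table and listing the column changes inside its block.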
This is just a sampling of the new features available to you with Rails 3.1. Now everyone who has been building Rails 3.1 applications can start deploying them on Engine Yard today. For all additional documentation, go here.
When you upgrade your Rails 3 application to Rails 3.1, all this goodness is available when you ey deploy!
Ruby, Concurrency, and You »
Created at: 14.10.2011 19:41, source: Engine Yard Blog, tagged: Open Source Technology 1.8 1.9 concurrency GIL implementations ironruby jruby macruby maglev MRI parallelism rubinius ruby threads
| Ruby Implementation | Concurrency | Parallelism |
|---|---|---|
| MRI 1.8 | ✔ | |
| MRI 1.9 | ✔ | |
| Rubinius 1 | ✔ | |
| Rubinius 2 | ✔ | ✔ |
| JRuby | ✔ | ✔ |
| MacRuby | ✔ | ✔ |
| Maglev | ✔ | |
| IronRuby | ✔ | ✔ |
A big topic in the world of Ruby this year has been how to get more out of Ruby, specifically, how to get more done in parallel. The topic of concurrency, though, is one fraught with misunderstanding. This is largely due to the complexities of not only thinking about multiple things at once, but the limitations of Ruby implementations and operating systems.
In this article, I’ll lay the groundwork for understanding the difference between concurrency and parallelism. Then, I’ll look at how a programmer experiences them.
Concurrency vs. Parallelism
This has been discussed many times, but I sometimes still have difficulty with it. Let’s first break down the definitions of these two words:
- Concurrent: existing, happening, or done at the same time
- Parallel: occurring or existing at the same time or in a similar way
Hmm, ok. Well, that hasn’t improved our thinking about these two topics. We need to dig deeper into how the world of computing applies to these words. Rather than looking at the abstract, let’s instead consider some real world examples.
A “Real World” Example
Let’s say you’ve sat down for the evening to complete tomorrow’s homework. This evening you’ve got both Math and History worksheets to fill out. Tonight for some reason, you decide to do one problem in Math, then one problem in History, then back to Math, etc until all the problems are done.
In the parlance of computing, you’re now doing your Math and History worksheets concurrently. This is because your Current task list includes 2 items: Math worksheet and History worksheet.
Now, clearly you the reader can see a problem here. By switching back and forth, completing your homework will probably take longer than if you did the complete Math worksheet then did the History worksheet. In other words, if you did the worksheets in serial.
So, if concurrent means “having multiple outstanding tasks at once”, then what is parallel? Parallel is the ability to make progress on multiple tasks simultaneously.
Let’s say you’ve been asked to read the book One O’Clock Jump by Lise McClendon. You also need to drive down to San Diego for Comic-Con. Thankfully you find that One O’Clock Jump is available on audiobook!
You can now listen to the book while driving. You’re simultaneously making progress on two separate tasks. This is the equivalent of parallelism in computing.
I hope that these real world examples help illustrate the difference between concurrency and parallelism. Now let's apply this newfound knowledge to Ruby.
Back to Ruby
One reason this problem can be difficult to understand is that Ruby provides only a single mechanism for concurrency: the Thread class. Whether or not these Threads run in parallel, though, depends on a number of factors.
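As a rough sketch of what that looks like in practice, here is the homework example written with Ruby's Thread class. The method name and the worksheet data are invented for this illustration. The order of the log can vary from run to run, which is the point: both worksheets are outstanding at once, and the scheduler decides who goes next.

```ruby
# Each worksheet gets its own Thread, so both are outstanding at once
# (concurrency). Whether they ever run simultaneously (parallelism)
# depends on the Ruby implementation and the machine.
def do_homework(worksheets)
  finished = Queue.new # thread-safe queue from Ruby's core library

  threads = worksheets.map do |subject, problem_count|
    Thread.new do
      1.upto(problem_count) do |n|
        finished << "#{subject} problem #{n}"
        Thread.pass # hint to the scheduler that another thread may run
      end
    end
  end
  threads.each(&:join) # wait until every worksheet is done

  Array.new(finished.size) { finished.pop }
end

puts do_homework({ "Math" => 3, "History" => 3 })
```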
MRI 1.8
Let’s look at MRI 1.8 (and MRI forks such as REE) to begin with, because it has the simplest model. MRI 1.8 uses a technique known as “green threads” to implement Threads. This means that every once in a while (around 100 milliseconds), the program says “oh, I should let another thread run now!” This saves the current info into the current thread and restores another thread. This is exactly like our homework example above. We can have as many things as we’d like in our task list, but we can only make progress on one of them at a time.
There is a wrinkle in the concurrency/parallelism game that I haven’t mentioned before now. This wrinkle is IO, namely how Threads interact when waiting for some external event. MRI 1.8.7 is quite smart, and knows that when a Thread is waiting for some external event (such as a browser to send an HTTP request), the Thread can be put to sleep and be woken up when data is detected. This simple optimization improves the usage of Threads so much that for a very long time the MRI 1.8.7 model was good enough for all Ruby programs.
MRI 1.9
Switching back to Ruby implementations, let’s look at MRI 1.9. As has been previously reported, MRI 1.9 removes the “green threads” we had in MRI 1.8 and uses native threads to implement the Thread class. Now, what are these “native threads”? These are units of concurrency that the underlying operating system is aware of. A big reason to switch to native threads is that it vastly simplifies the implementation of Threading. The operating system handles the low level parts of saving and restoring Thread information in a completely transparent way. Additionally, letting the OS know what parts of a program should be concurrent allows it to use the full resources of the computer to make that happen. In this modern world, that means using multiple cores.
Up until now, all we’ve talked about with Ruby’s Threading model was about concurrency, the ability to have multiple outstanding tasks at once. Now when we add in the idea of multiple cores, we can finally talk about parallelism. When a computer includes multiple cores (which is pretty much every computer now), those cores can run different code simultaneously, providing true parallelism. When a computer only has one core, there is no true parallelism, instead there is just simple concurrency, even at the OS level. The OS manages all the processes and threads in the system the same way you handled your Math and History worksheets, doing one for a little while, then grabbing another one.
Back to multiple cores though. Now that there is the opportunity to run things truly in parallel, we have to look at if Ruby can take advantage of that. Since MRI 1.9 uses OS threads, it can actually spread out your Ruby Threads to multiple cores!
Unfortunately, MRI 1.9 prevents the Ruby code itself from running in parallel by requiring that any thread running Ruby code hold a lock. This lock is commonly known as the GIL (Global Interpreter Lock) or GVL (Global VM Lock).
There are a few reasons the GIL exists, but for this discussion we will say that it’s because the non-Ruby parts of MRI 1.9 are not thread-safe. This means if data were manipulated by multiple threads at the same time, the data could become corrupt. The important thing for this post is how it applies to parallelism: the GIL inhibits parallelism within Ruby code.
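The idea of a lock guarding shared data can be pictured in Ruby itself with a Mutex. This is only an analogy under my own assumptions, not a description of how MRI implements the GIL; the class and method names are invented for the sketch.

```ruby
# Illustrative only: a single lock protecting shared state, in the spirit
# of (but far simpler than) a global interpreter lock.
class LockedCounter
  attr_reader :value

  def initialize
    @value = 0
    @lock = Mutex.new
  end

  # Only the thread holding the lock may touch @value.
  def increment
    @lock.synchronize { @value += 1 }
  end
end

def count_with_threads(thread_count, increments_per_thread)
  counter = LockedCounter.new
  threads = Array.new(thread_count) do
    Thread.new { increments_per_thread.times { counter.increment } }
  end
  threads.each(&:join)
  counter.value
end

puts count_with_threads(4, 1_000) # prints 4000
```

Because every increment happens while the lock is held, the total is the same on any Ruby implementation, with or without a GIL. The price is the one described above: only one thread at a time makes progress inside the locked section.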
MRI 1.9 uses the same technique as MRI 1.8 to improve the situation, namely the GIL is released if a Thread is waiting on an external event (normally IO) which improves responsiveness. MRI 1.9 also includes an experimental API that C extensions can use to run some C code without the GIL locked to utilize parallelism. This API is very restrictive though because no Ruby object may be accessed in any way while the GIL is not held by the current thread.
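Here is a small sketch of why releasing the lock during waits matters. It uses sleep as a stand-in for a blocking wait on IO (an assumption made to keep the example self-contained); the helper names are invented, and the exact timings will vary by machine.

```ruby
# Measures how long the given block takes, in seconds.
def timed
  started = Time.now
  yield
  Time.now - started
end

# sleep stands in for a blocking wait, such as reading from a socket.
def wait_serially(waits, seconds)
  waits.times { sleep seconds }
end

# Each wait gets its own Thread, so the waits overlap.
def wait_in_threads(waits, seconds)
  Array.new(waits) { Thread.new { sleep seconds } }.each(&:join)
end

serial   = timed { wait_serially(4, 0.1) }
threaded = timed { wait_in_threads(4, 0.1) }
puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```

The threaded version should finish in roughly the time of one wait rather than four, even though no Ruby code ran in parallel.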
That about sums up the situation with MRI 1.8 and 1.9 with regards to concurrency and parallelism. Both provide concurrency of Ruby code, but neither provides parallelism of Ruby code.
Rubinius
Let’s take a quick look at other Ruby implementations where things are a bit different than MRI. I’ll start with Rubinius, since it’s the one I’m most familiar with. Rubinius 1.x also had a GIL and worked pretty much the same as MRI 1.9. With the upcoming 2.0 release though, the GIL will be removed, allowing Ruby code to run fully concurrent and fully parallel. We think this opens up a lot of uses for Ruby (parallel algorithms, etc) that Rubinius couldn’t handle well previously.
JRuby
JRuby layers the Thread class on top of Java’s thread class, so the threading model is whatever the JVM supports. That being said, OpenJDK is the primary JVM; it puts a Java thread directly onto an OS thread with no GIL. Thusly, JRuby almost always has full concurrency and parallelism available to it.
MacRuby
MacRuby also uses Cocoa’s NSThread as its abstraction, which runs without a GIL. So, this is another fully parallel implementation.
Maglev
Maglev runs directly on top of a Smalltalk VM and thusly layers the Thread class on top of a concept called Smalltalk Processes. In this case, the GemStone VM implements Processes in the same way as MRI 1.8, namely via “green threads” that don’t expose concurrency to the OS, and therefore, have no parallelism.
IronRuby
Lastly, IronRuby layers Thread directly on top of CLR’s threads without a GIL.
Conclusion
I hope that this helps to clear up what concurrency and parallelism are and how the different Ruby implementations address them. Having this understanding is critical for discussing and understanding topics such as thread-safety of libraries and performance of applications.
In future posts, we’ll look to build on this knowledge to help you make the best use of Ruby!
The Resque Way »
Created at: 13.10.2011 22:07, source: Engine Yard Blog, tagged: Technology
Introduction
When we started using Resque two years ago we were impressed by two things: its power out-of-the-box and its opportunities for scalability. Over the past two years, we’ve explored Resque internals and plugins. We’d like to share what we’ve learned from our practical experience using Resque during different phases of the web application life cycle.
Working in development and running tests
The first challenge we encountered was deciding how to organize development for non-Ruby developers and test environments. Our aims were to keep most of the team free from knowledge about Resque and Workers and to avoid stubs in tests. This idea resulted in the inline mode for Resque that solved the problem perfectly.
First step in production: ActiveRecord and Resque
Since there are many articles covering how to deploy Resque, we’ll focus on issues that haven’t been thoroughly described before.
After two weeks in production it was clear to us that there was an issue with Resque and ActiveRecord. In some cases you may enqueue a Resque job while inside a database transaction, but Redis commands are independent from database transactions. Sometimes a worker starts processing a job before the transaction that creates the specific job commits. After a few ugly solutions that forced us to restructure the code, we discovered what we needed in the ar_after_transaction gem. This Resque FAQ details how to make a Resque job wait for an ActiveRecord transaction commit, so that it can see all the changes made by that transaction. Of course, if you ensure database transactions are committed prior to enqueuing jobs, you can structure your application in any manner you desire.
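The underlying idea, deferring the enqueue until the transaction has committed, can be sketched without any gems. Everything below is a hypothetical stand-in written for this illustration: FakeConnection, after_commit, and enqueue_after_commit are invented names, not the API of ar_after_transaction, ActiveRecord, or Resque.

```ruby
# Hypothetical stand-in for a database connection that can run callbacks
# once the outermost transaction has committed.
class FakeConnection
  def initialize
    @depth = 0
    @after_commit = []
  end

  def transaction
    @depth += 1
    yield
    @depth -= 1
    flush if @depth.zero?
  rescue StandardError
    @depth = 0
    @after_commit.clear # rolled back: pending jobs are never enqueued
    raise
  end

  # Runs the block now if no transaction is open, otherwise after commit.
  def after_commit(&block)
    if @depth.zero?
      block.call
    else
      @after_commit << block
    end
  end

  private

  def flush
    callbacks = @after_commit
    @after_commit = []
    callbacks.each(&:call)
  end
end

# The queue is a plain Array here; in production it would be Redis.
def enqueue_after_commit(connection, queue, job)
  connection.after_commit { queue << job }
end

db = FakeConnection.new
jobs = []
db.transaction do
  enqueue_after_commit(db, jobs, [:SendWelcomeEmail, 42])
  puts "inside transaction: #{jobs.size} job(s) visible to workers"
end
puts "after commit: #{jobs.size} job(s) visible to workers"
```

With this shape a worker can never pick up a job that refers to a row it cannot see yet, and a rolled-back transaction leaves no orphaned jobs behind.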
Second step in production: Outer HTTP APIs with Resque
External HTTP calls are a common bottleneck for web requests and need to be moved to the background because of unpredictable response time and downtime for these APIs. You may find the resque-retry plugin (and resque-scheduler plugin as a dependency) useful, allowing you to retry exceptions in workers with a customizable delay.
Here are some common HTTP errors in the "just try again to fix" category:
@retry_exceptions = [
  Timeout::Error,
  Errno::ECONNREFUSED,
  Errno::ECONNRESET,
  # errors from your favorite
  # Net::HTTP wrapping library go here
]
N.B. Errno codes are platform-specific; make sure you understand how portable your code needs to be.
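For readers who want to see the "just try again" behavior in isolation, here is a minimal plain-Ruby sketch. The with_retries helper and the RETRYABLE_ERRORS constant are hypothetical names made up for this example; this is not how resque-retry is implemented, only the same idea in a few lines.

```ruby
require "timeout"

RETRYABLE_ERRORS = [Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET].freeze

# Hypothetical helper: runs the block, retrying up to `limit` more times
# when one of the retryable errors is raised, pausing `delay` seconds
# between attempts. Any other error is raised immediately.
def with_retries(limit, delay, retryable = RETRYABLE_ERRORS)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *retryable
    raise if attempts > limit # out of retries: re-raise the last error
    sleep delay
    retry
  end
end

result = with_retries(3, 0) do |attempt|
  raise Errno::ECONNRESET if attempt < 3 # simulate two flaky responses
  "succeeded on attempt #{attempt}"
end
puts result # prints "succeeded on attempt 3"
```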
Third step in production: Email sending
If you are using an external SMTP server to send email, you will need to move the email delivery to the background -- with Resque’s help, of course. There are a number of solutions available, such as ar_mailer. We decided to use resque_mailer. We encountered an initial problem with Net::SMTPServerBusy and Timeout::Error exceptions that appeared randomly while sending email. We found resque-retry is also useful here. In the case of resque_mailer we wanted to have shared configuration for resque-retry for every Mailer class. We found that this was not easy because historically Resque is configured through instance variables in a class that are not inherited. We needed a base class that could share all instance variables across any child class:
class AsyncApplicationMailer < ActionMailer::Base
  include Resque::Mailer
  extend Resque::Plugins::Retry

  # All notifiers inherited from this class require the same resque-retry options.
  # Resque workers are classes, not instances of classes, which is why
  # resque-retry is configured through class-level instance variables,
  # and those are not inherited. To set up the same variables
  # for every inherited class we need this hack.
  def self.inherited(host)
    super(host)
    host.class_eval do
      @retry_exceptions = [Net::SMTPServerBusy, Timeout::Error, Resque::DirtyExit]
      @retry_limit = 3
      @retry_delay = 60 # seconds
    end
  end
end
Use this class as the base class for all your mailers and retry configuration will be shared among them.
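If the inheritance problem is unfamiliar, this gem-free sketch shows it in isolation. All four class names are invented for the illustration; only the `inherited` hook itself is the same technique as above.

```ruby
# A class-level instance variable belongs to exactly one class object.
class PlainBase
  @retry_limit = 3

  class << self
    attr_reader :retry_limit
  end
end

class PlainChild < PlainBase; end

# The same setting, copied onto every subclass as it is defined.
class SharingBase
  class << self
    attr_reader :retry_limit

    def inherited(subclass)
      super
      subclass.instance_variable_set(:@retry_limit, 3)
    end
  end
end

class SharingChild < SharingBase; end

puts PlainChild.retry_limit.inspect   # prints nil
puts SharingChild.retry_limit.inspect # prints 3
```

PlainChild inherits the reader method but not the value, which is exactly why each mailer subclass needs the variables set on it explicitly.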
Play Minesweeper: Bug fixing along the way
As the number of users and load grew, we decided it was a good idea to include other plugins like resque-loner (to track job uniqueness) and resque-cleaner (to clean up failed jobs). This required fixing and improving these libraries:
- Fix resque-scheduler process death after pushing invalid resque job class
- Fix infinite recursion in edge case usage of ar_after_transaction
- Make resque-mailer respect the ActionMailer::Base.perform_deliveries configuration option
- Fix resque-retry suppression in resque failures for jobs with custom identifier
- Modularize resque-loner to be compatible with other plugins
Business requirements increase: Returning results from jobs
The original Resque design does not allow you to receive something back after the worker completes. This may be beneficial for most use cases. However, for our use case (payment checkout through an outer Authorization gateway) it was important to know whether the worker was in progress or not, and if not - whether it was successful or not.
Many Resque plugins introduce a job identifier based on arguments passed to this job, but there is no standardization regarding how it should be done. Here are two examples:
resque-retry:
# @abstract You may override to implement a custom identifier,
# you should consider doing this if your job arguments
# are many/long or may not cleanly convert to strings.
#
# Builds an identifier using the job arguments. This identifier
# is used as part of the redis key.
def identifier(*args)
resque-loner:
#
# Payload is what Resque stored for this job along with the job's class name.
# On a Resque with no plugins installed, this is a hash containing :class and :args
#
def redis_key(payload)
In order to synchronize a job identifier across plugins, we implemented our own interface for jobs with completion status. This can be helpful for people that need something like this. However, don't confuse this with execution status in resque-status.
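One possible shape for such a shared identifier is sketched below. This is not our actual interface: JobIdentity, job_id, and ChargeCard are hypothetical names, and whether a given version of resque-retry or resque-loner would pick up methods defined this way is an assumption you would need to verify against the plugin source.

```ruby
require "digest/sha1"
require "json"

# Hypothetical module: derives one identifier from the job class name and
# its arguments, then exposes it under the method names quoted above.
module JobIdentity
  def job_id(*args)
    Digest::SHA1.hexdigest(JSON.generate([name, args]))
  end

  # Same name as the resque-retry hook.
  def identifier(*args)
    job_id(*args)
  end

  # Same name as the resque-loner hook; the payload is a hash with the
  # job's class name and arguments (symbol or string keys).
  def redis_key(payload)
    args = payload[:args] || payload["args"] || []
    job_id(*args)
  end
end

class ChargeCard
  extend JobIdentity
end

puts ChargeCard.identifier(42, "usd")
puts ChargeCard.redis_key({ :class => "ChargeCard", :args => [42, "usd"] })
```

Both calls print the same 40-character digest, so any code that asks for the job's identity gets one consistent answer regardless of which hook it goes through.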
Contribution to open source
Last but not least, thank you to these people responsible for supporting our patches:
- @bvandenbos/resque-scheduler: Merged. Thanks. Will ship in 1.9.8.
- @defunkt/resque: Love it. ... This is a great patch - docs, tests, and code! Thanks.
- @zapnap/resque_mailer: Good point. I just pushed a change to the repository that should take care of this and released a new gem (1.0.1)
- @jayniz/resque-loner: Awesome, thanks! Will pull ASAP :)
Rails 3.1 and JRuby »
Created at: 10.10.2011 13:03, source: Ruby Rockers, tagged: Solutions Technology jruby rails
Hi Folks,
If you are building an application that requires JRuby as a platform, you can use Rails 3.1 with it. activerecord-jdbc-adapter 1.2.0 works with Rails 3.1!
All the steps are the same as for a normal Rails application.
You can find more details here http://blog.jruby.org/2011/09/ar-jdbc-1-2-0-released/
Cheers,
Arun
Cache Money: Why Utilize Caching? »
Created at: 07.10.2011 01:41, source: Engine Yard Blog, tagged: Technology caching http caching memcached redis
Caching is extremely useful to implement for web applications. While it can be a good idea for the majority of web applications to utilize caching, there are times where caching is unnecessary and can be a time sink for developers. When is it a good idea to use caching? When an application is getting a lot of requests and New Relic detects a strain on your instances, it's probably time to look into caching.
There are a few different types of caching and some good resources to help decide which type is best for your application. We will look at memory caches using Memcached or Redis, and HTTP caches such as Varnish and Rack::Cache. If you are using Rails you can easily use its built-in caching. Check out the Rails Guide Caching with Rails for an overview.
Redis
Redis is a key-value store. With Redis, your data is held in memory and will be persisted to disk if necessary. This allows it to be useful for caching purposes. Redis is used by companies such as GitHub, craigslist, and here at Engine Yard.

Image courtesy of Redis 101 from Peter Cooper
Let us look at how Redis can be useful for a web application.
Most web application requests return a variety of different lists such as posts, comments, followers, etc. The majority of key-value stores store these lists in single units (or a "blob"). As a result, most typical list related operations, such as adding an element, are inefficient. Fortunately, Redis has native list support which allows it to perform operations on lists very efficiently.
Counter caching in Rails allows you to accelerate performance by reducing the number of SQL queries and preventing unnecessary instantiation of objects, but the Rails implementation using generic SQL features does not scale well. Engine Yard customer MUBI uses Redis to replace Rails default counter caching with speedy Redis counters. Redis allows counter caches to be implemented with extremely fast, non-blocking atomic operations.
Let's explore the benefits of the counter caching using the example that Post has_many :comments and Comment belongs_to :post
>> Post.last.comments.size
This results in the following three SQL queries:
Post Load (0.4ms) SELECT * FROM `posts` LIMIT 1
Post Columns (2.9ms) SHOW FIELDS FROM `posts`
SQL (0.3ms) SELECT count(*) AS count_all FROM `comments` WHERE post_id = 1
Caching allows us to instead use a single query by adding the following relationship in comment.rb:
belongs_to :post, :counter_cache => true
We also have to update the Post table with a new attribute:
add_column :posts, :comments_count, :integer
Now there is just one trip to the database to fetch the comment count:
Post Load (0.4ms) SELECT * FROM `posts` ORDER BY posts.id DESC LIMIT 1
Why stop there? We can also utilize Redis for counter caching. When we do a count query in SQL, we can write the result to a Redis key. For example:
SELECT count(*) AS count_all FROM `comments` WHERE post_id = 1
is also written to Redis:
redis = Redis.new
redis.incr "post:1234:comments:count"
When pulling post records from the database we can test for the existence of relevant Redis keys for whatever counts we need. If they exist then we're done. If they do not exist, then they are looked up in MySQL and pushed into Redis, ready for next time. Also, since we are utilizing the redis incr command we avoid having to do a SQL query, except to initialize it, and since it is an atomic operation we can guarantee that the count always represents the exact number of times it was called without any race conditions.
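Here is a rough sketch of that flow. To keep it runnable without a Redis server, it uses an in-memory stand-in (FakeRedis) whose few methods mirror the Redis GET, SET, INCR, and EXISTS commands; FakeRedis and CommentCounter are invented names, and the block passed to CommentCounter stands in for the SQL count query. The sketch ignores the small window between the existence check and the write, which a production version would need to think about.

```ruby
# In-memory stand-in mirroring a handful of Redis commands.
class FakeRedis
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  def incr(key)
    value = @data.fetch(key, "0").to_i + 1
    @data[key] = value.to_s
    value
  end

  def exists?(key)
    @data.key?(key)
  end
end

class CommentCounter
  # The block is called with a post id and returns the count from SQL.
  def initialize(redis, &count_from_sql)
    @redis = redis
    @count_from_sql = count_from_sql
  end

  def key(post_id)
    "post:#{post_id}:comments:count"
  end

  # Read-through: only hit SQL when the key is missing.
  def count(post_id)
    unless @redis.exists?(key(post_id))
      @redis.set(key(post_id), @count_from_sql.call(post_id))
    end
    @redis.get(key(post_id)).to_i
  end

  # Call after the comment row has been inserted.
  def comment_added(post_id)
    if @redis.exists?(key(post_id))
      @redis.incr(key(post_id))
    else
      count(post_id) # first use: the SQL count already includes the new row
    end
  end
end

counter = CommentCounter.new(FakeRedis.new) { |_post_id| 5 } # pretend SQL says 5
puts counter.count(1)         # prints 5
puts counter.comment_added(1) # prints 6
```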
Now some people have called Redis Memcached on steroids. However, it does not mean Memcached should be disregarded. There have been a lot of benchmarks done between Redis and Memcached and a lot of debate about the accuracy of those benchmarks. What it really comes down to is what you believe is best suited for your application.
Memcached
One difference to consider is that Memcached does Least-Recently-Used (LRU) eviction of values from the cache. With Redis, data is only evicted when it's explicitly removed or expired, and it will store as much data as you put into it. As of Redis 2.2 you can configure the maxmemory setting instead of setting expires to get LRU-style cache behavior, but it is an option that you have to enable and is not the default. Memcached is being used by companies like Twitter, Reddit and Zynga. A final thing to consider is that Memcached has been integrated into Rails since Rails 2.1, making it even easier to use.
Memcached keeps the values in RAM so it's a transitory cache. Keep in mind that it discards the oldest values, so you cannot assume that data stored in Memcached will still be there when you need it. As stated earlier, it's very important to make sure it's right for your application because Memcached is slower than SELECT on localhost for small sites; you should ensure you can keep up with the requests or it won't help you to use it.
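To see what "you cannot assume the data will still be there" means for your code, here is a toy LRU cache in plain Ruby. TinyLruCache is a name invented for this sketch, and Memcached's real eviction is considerably more involved; the point is only the calling pattern, where every read is prepared for a miss.

```ruby
# Toy LRU cache: holds at most `capacity` entries and evicts the least
# recently used one. Relies on Ruby hashes preserving insertion order.
class TinyLruCache
  def initialize(capacity)
    @capacity = capacity
    @store = {}
  end

  def write(key, value)
    @store.delete(key)
    @store[key] = value
    @store.delete(@store.keys.first) if @store.size > @capacity
    value
  end

  def read(key)
    return nil unless @store.key?(key)
    value = @store.delete(key)
    @store[key] = value # re-insert to mark as most recently used
  end

  # Returns the cached value, or computes and stores it on a miss.
  def fetch(key)
    cached = read(key)
    return cached unless cached.nil?
    write(key, yield)
  end
end

cache = TinyLruCache.new(2)
cache.write(:a, 1)
cache.write(:b, 2)
cache.read(:a)       # :a is now the most recently used entry
cache.write(:c, 3)   # over capacity, so :b is evicted
puts cache.read(:b).inspect               # prints nil
puts cache.fetch(:b) { "recomputed" }     # prints recomputed
```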

Image courtesy of Redis 101 from Peter Cooper
A good use for Memcached is action caching. Action caching is a lot like page caching, but the flow is slightly different: with action caching, the incoming web request goes from the web server to the Rails stack, so before filters can run before the cached content is served. One issue with page caching that the Rails guides point out is that you cannot use it for pages that need to restrict access in some way.
Example: If you want to only allow authenticated users to edit or create a Product object, but still cache those pages:
class ProductsController < ApplicationController
  before_filter :authenticate, :only => [ :edit, :create ]
  caches_page :list
  caches_action :edit, :expires_in => 1.hours

  def list; end

  def create
    expire_page :action => :list
    expire_action :action => :edit
  end

  def edit; end
end
Also, do not forget to set up your configuration for Memcached. There is a good amount of information from an older post that discusses when Rails 2.1 got better integrated caching.
# config/environments/production.rb
config.cache_store = :mem_cache_store, 'localhost:11211'

memcache_options = {
  :c_threshold => 10_000,
  :compression => false,
  :debug       => false,
  :namespace   => "app-#{RAILS_ENV}", # double quotes so the environment name is interpolated
  :readonly    => false,
  :urlencode   => false
}
CACHE = MemCache.new memcache_options
HTTP Caching
HTTP caching is another form of caching you can utilize. If you are not familiar with HTTP caching, this blog post offers a nice overview. Two useful projects worth checking out are Varnish and Rack::Cache. Now you might be asking "Why do people use them?" One reason is to reduce latency: a request satisfied from the cache takes less time, making the Web seem more responsive. For example, if you have dynamically generated content, using an HTTP cache like Varnish will result in better performance than using Memcached. This is because when using an HTTP cache your application server is not accessed for cache hits.
Varnish provides you with a default setup, which can be found in default.vcl, that will work for most applications. However, you have the ability to really go in and customize it, which is recommended since Varnish assumes things that might not be correct about your application. The only work you have to do is ensure your resources have appropriate HTTP caching parameters (Expires/max-age and ETag/Last-Modified). Do not forget to normalize the hostname to avoid caching the same resource multiple times. Another thing to remember is that Varnish was meant to run on 64-bit machines. If you try it on 32-bit it will work, but you will definitely run into some issues. Other recommendations I would make are to keep your VCL simple and to tune only when you really need to, using Varnish tips and best practices that others have found useful.
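As a sketch of what "appropriate HTTP caching parameters" can look like from the application side, here is a bare Rack-style app (a callable that takes the request environment and returns status, headers, and body). It needs no gems to run. The page content, the five-minute max-age, and the constant names are made-up values for illustration; a real application would derive the ETag from its own data.

```ruby
require "digest/md5"

PRODUCT_PAGE = "<h1>Products</h1>".freeze

# Rack-style app: sends max-age and an ETag, and answers a matching
# conditional request with 304 Not Modified and no body.
CACHEABLE_APP = lambda do |env|
  etag = %("#{Digest::MD5.hexdigest(PRODUCT_PAGE)}")
  headers = {
    "Cache-Control" => "public, max-age=300", # caches may reuse this for 5 minutes
    "ETag" => etag
  }

  if env["HTTP_IF_NONE_MATCH"] == etag
    [304, headers, []]
  else
    [200, headers.merge("Content-Type" => "text/html"), [PRODUCT_PAGE]]
  end
end

status, headers, _body = CACHEABLE_APP.call({})
puts status # prints 200
status, = CACHEABLE_APP.call({ "HTTP_IF_NONE_MATCH" => headers["ETag"] })
puts status # prints 304
```

An HTTP cache sitting in front of the app can serve the page itself for the max-age window and then revalidate cheaply with the ETag, so the application only does real work when the content has changed.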
Another useful way to utilize HTTP caching is with Ryan Tomayko's Rack::Cache. A key aspect of Rack::Cache is the middleware piece that sits in the front of each backend process and does not require any infrastructure investment of a separate daemon process like Varnish.
Varnish has plenty of great examples and real world use cases on their site that can help you truly understand the usefulness it provides. Check them out to get a feel for how you can utilize it for your application. Also, if you want to test out Varnish on AppCloud take a look at the Chef recipe we have for it.
Now, it's up to you to decide whether all or some of these types of caching will be beneficial for you. Make sure to utilize them to the fullest potential to ensure you and your customers have the most enjoyable experience possible. If you have any caching experiences or gotchas, please share them in the comments.
Resources:
Redis 101 presentation
A Collection of Redis Use Cases
To Redis or Not To Redis?
Memcached Basics for Rails
Caching, Memcached and Rails
Scaling Rails with Memcached
Things Caches Do
Caching Tutorial
You're Doing It Wrong


