Cache Money: Why Utilize Caching?

Created at: 07.10.2011 01:41, source: Engine Yard Blog, tagged: Technology caching http caching memcached redis

Caching is extremely useful for web applications. While most web applications benefit from it, there are times when caching is unnecessary and becomes a time sink for developers. So when is it a good idea to use caching? When an application is receiving a lot of requests and New Relic shows a strain on your instances, it's probably time to look into caching.

There are a few different types of caching and some good resources to help decide which type is best for your application. We will look at memory caches using Memcached or Redis, and HTTP caches such as Varnish and Rack::Cache. If you are using Rails you can easily use its built-in caching. Check out the Rails Guide Caching with Rails for an overview.

Redis

Redis is a key-value store. With Redis, your data is held in memory and will be persisted to disk if necessary. This allows it to be useful for caching purposes. Redis is used by companies such as GitHub, craigslist, and here at Engine Yard.

[Image: Redis slide, courtesy of Redis 101 from Peter Cooper]

Let us look at how Redis can be useful for a web application.

Most web application requests return a variety of lists such as posts, comments, followers, etc. The majority of key-value stores store each of these lists as a single unit (a "blob"). As a result, typical list-related operations, such as adding an element, are inefficient. Fortunately, Redis has native list support, which allows it to perform operations on lists very efficiently.

Counter caching in Rails allows you to accelerate performance by reducing the number of SQL queries and preventing unnecessary instantiation of objects, but the Rails implementation using generic SQL features does not scale well. Engine Yard customer MUBI uses Redis to replace Rails default counter caching with speedy Redis counters. Redis allows counter caches to be implemented with extremely fast, non-blocking atomic operations.

Let's explore the benefits of counter caching using an example in which Post has_many :comments and Comment belongs_to :post:

>> Post.last.comments.size

This results in the following three SQL queries:

Post Load (0.4ms) SELECT * FROM `posts` LIMIT 1
Post Columns (2.9ms) SHOW FIELDS FROM `posts`
SQL (0.3ms) SELECT count(*) AS count_all FROM `comments` WHERE post_id = 1

Counter caching lets us use a single query instead. Add the following association in comment.rb:

belongs_to :post, :counter_cache => true

We also have to add a new column to the posts table:

add_column :posts, :comments_count, :integer

Now there is just one trip to the database to fetch the comment count:

Post Load (0.4ms) SELECT * FROM `posts` ORDER BY posts.id DESC LIMIT 1

Why stop there? We can also utilize Redis for counter caching. When we do a count query in SQL, we can write the result to a Redis key. For example:

SELECT count(*) AS count_all FROM `comments` WHERE post_id = 1

is mirrored in Redis, where the counter is incremented each time a comment is added:

redis = Redis.new
redis.incr "post:1:comments:count"

When pulling post records from the database, we can check whether the relevant Redis keys exist for whatever counts we need. If they exist, we're done. If they do not exist, the counts are looked up in MySQL and pushed into Redis, ready for next time. Because we use the Redis INCR command, we avoid a SQL query except to initialize the counter. And because INCR is an atomic operation, concurrent increments are never lost, so the count always reflects exactly how many times it was called, without race conditions.
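The read-through flow described above can be sketched roughly as follows. This is a minimal illustration, not MUBI's implementation: FakeRedis is a hypothetical in-memory stand-in that mimics only the get, set and incr calls of the redis-rb client so the example runs without a server, and comments_count and comment_added are made-up helper names. The block passed to comments_count stands in for the SQL COUNT(*) query.

```ruby
# Hypothetical in-memory stand-in for the subset of redis-rb used below.
# In a real application you would pass Redis.new instead.
class FakeRedis
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  def incr(key)
    @data[key] = ((@data[key] || 0).to_i + 1).to_s
    @data[key].to_i
  end
end

# Builds the key format used in the post.
def comments_count_key(post_id)
  "post:#{post_id}:comments:count"
end

# Returns the cached count; on a miss, runs the block (the SQL count)
# and stores the result for next time.
def comments_count(redis, post_id)
  key = comments_count_key(post_id)
  cached = redis.get(key)
  return cached.to_i if cached

  count = yield
  redis.set(key, count)
  count
end

# Called after a comment is created. A missing key is left alone so the
# next read seeds it from the database instead of starting at 1.
def comment_added(redis, post_id)
  key = comments_count_key(post_id)
  redis.incr(key) if redis.get(key)
end

redis = FakeRedis.new
queries = 0
comments_count(redis, 1) { queries += 1; 3 } # miss: runs the "query", returns 3
comment_added(redis, 1)                      # counter is now 4
comments_count(redis, 1) { queries += 1; 3 } # hit: returns 4, no query
```

Note that the existence check followed by INCR in comment_added is not atomic as a pair, so a production version would need to think about the window between seeding and incrementing.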

Some people have called Redis "Memcached on steroids." However, that does not mean Memcached should be disregarded. There have been a lot of benchmarks comparing Redis and Memcached, and a lot of debate about the accuracy of those benchmarks. What it really comes down to is which one is best suited to your application.

Memcached

One difference to consider is that Memcached performs least-recently-used (LRU) eviction of values from the cache. With Redis, by default data is evicted only when it is explicitly removed or expires, and Redis will store as much data as you put into it. As of Redis 2.2 you can configure the maxmemory setting instead of setting expires in order to get an LRU cache, but it is an option you have to enable and is not the default. Memcached is used by companies like Twitter, Reddit and Zynga. A final thing to consider is that Memcached has been integrated into Rails since Rails 2.1, making it even easier to use.

Memcached keeps its values in RAM, so it is a transitory cache. Keep in mind that it discards the least recently used values when it runs out of room, so you cannot assume that data stored in Memcached will still be there when you need it. As stated earlier, it's very important to make sure it's right for your application, because Memcached can be slower than a SELECT on localhost for small sites; you should ensure you can keep up with the requests or it won't help you to use it.
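To make the "it may not be there when you need it" point concrete, here is a toy LRU cache in plain Ruby. It is only an illustration of the eviction behavior, not how Memcached is implemented; the LruCache class and its fetch method are hypothetical names. The important habit it shows is always supplying a fallback (the block) for a miss.

```ruby
# Toy LRU cache; relies on Ruby hashes preserving insertion order.
class LruCache
  def initialize(max_entries)
    @max_entries = max_entries
    @store = {}
  end

  # Returns the cached value, or computes it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # re-insert to mark as most recently used
      @store[key] = value
      return value
    end

    value = yield
    @store[key] = value
    # Over capacity: drop the entry that was used longest ago.
    @store.delete(@store.keys.first) if @store.size > @max_entries
    value
  end

  def key?(key)
    @store.key?(key)
  end
end

cache = LruCache.new(2)
cache.fetch("a") { 1 }
cache.fetch("b") { 2 }
cache.fetch("a") { :unused } # touches "a", so "b" is now least recently used
cache.fetch("c") { 3 }       # over capacity: "b" is evicted
cache.key?("b")              # => false
```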

[Image courtesy of Redis 101 from Peter Cooper]

A good use for Memcached is action caching. Action caching is a lot like page caching, but the flow is slightly different: with action caching, the incoming web request goes from the web server to the Rails stack, so filters can run before the cached content is served. One issue with page caching that the Rails guides cover is that you cannot use it for pages that need to restrict access in some way.

Example: If you want to only allow authenticated users to edit or create a Product object, but still cache those pages:

class ProductsController < ApplicationController
  before_filter :authenticate, :only => [ :edit, :create ]
  caches_page :list
  caches_action :edit, :expires_in => 1.hours

  def list; end

  def create
    expire_page :action => :list
    expire_action :action => :edit
  end

  def edit; end

end

Also, do not forget to set up your configuration for Memcached. An older post, written when Rails 2.1 gained better-integrated caching, has a good amount of information on this.

# config/environments/production.rb

config.cache_store = :mem_cache_store, 'localhost:11211'

memcache_options = {
  :c_threshold => 10_000,
  :compression => false,
  :debug       => false,
  # Double quotes are required for the environment name to be interpolated.
  :namespace   => "app-#{RAILS_ENV}",
  :readonly    => false,
  :urlencode   => false
}

CACHE = MemCache.new memcache_options

 

HTTP Caching

HTTP caching is another form of caching you can utilize. If you are not familiar with HTTP caching, this blog post offers a nice overview. Two useful projects worth checking out are Varnish and Rack::Cache. Now you might be asking, "Why do people use them?" One reason is to reduce latency: a request satisfied from the cache takes less time, which makes the Web seem more responsive. For example, if you have dynamically generated content, using an HTTP cache like Varnish can result in better performance than using Memcached, because on a cache hit your application server is not accessed at all.

Varnish provides a default setup, found in default.vcl, that will work for most applications. However, you can go in and customize it extensively, which is recommended since Varnish makes assumptions that might not be correct for your application. The only work you have to do is ensure your resources have appropriate HTTP caching parameters (Expires/max-age and ETag/Last-Modified). Do not forget to normalize the hostname to avoid caching the same resource multiple times. Also remember that Varnish was meant to run on 64-bit machines; it will work on 32-bit, but you will likely run into some issues. Finally, I recommend keeping your VCL simple and tuning only when you really need to, using the Varnish tips and best practices that others have found useful.
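As a rough illustration of what "appropriate HTTP caching parameters" means on the application side, here is a minimal Rack-style endpoint in plain Ruby (any object that responds to call(env) and returns a status, headers and body triple). The ProductPage class and the 300-second max-age are assumptions made for this sketch. It sets Cache-Control and ETag, and answers 304 Not Modified when the client's If-None-Match header matches the current ETag.

```ruby
require "digest/md5"

# Hypothetical Rack-style endpoint that emits HTTP caching headers.
class ProductPage
  MAX_AGE = 300 # seconds; illustrative value only

  def initialize(body)
    @body = body
  end

  def call(env)
    # A strong ETag derived from the response body.
    etag = %("#{Digest::MD5.hexdigest(@body)}")
    headers = {
      "Cache-Control" => "public, max-age=#{MAX_AGE}",
      "ETag"          => etag
    }

    if env["HTTP_IF_NONE_MATCH"] == etag
      # The client or intermediate cache already has the current version.
      [304, headers, []]
    else
      [200, headers.merge("Content-Type" => "text/html"), [@body]]
    end
  end
end

app = ProductPage.new("<h1>Widgets</h1>")
status, headers, _body = app.call({})
# A revalidation request carrying the ETag gets a bodyless 304.
revalidated_status, = app.call("HTTP_IF_NONE_MATCH" => headers["ETag"])
```

With headers like these in place, a cache such as Varnish can serve the page for the max-age window and revalidate cheaply afterwards.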

Another useful way to utilize HTTP caching is with Ryan Tomayko's Rack::Cache. A key aspect of Rack::Cache is that it is middleware that sits in front of each backend process, so it does not require the infrastructure investment of a separate daemon process the way Varnish does.
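To show what "middleware that sits in front of each backend process" looks like, here is a toy in-process HTTP cache written in the Rack middleware style. It is emphatically not Rack::Cache's implementation: TinyHttpCache is a hypothetical class that handles only GET requests and the max-age directive, and ignores validation, Vary, private and no-store. The injectable clock argument is an assumption added to make the expiry behavior easy to exercise.

```ruby
# Toy HTTP cache as Rack-style middleware (GET + max-age only).
class TinyHttpCache
  def initialize(app, clock = -> { Time.now.to_f })
    @app = app
    @clock = clock
    @entries = {}
  end

  def call(env)
    return @app.call(env) unless env["REQUEST_METHOD"] == "GET"

    key = "#{env["HTTP_HOST"]}#{env["PATH_INFO"]}?#{env["QUERY_STRING"]}"
    entry = @entries[key]
    # Fresh hit: the backend application is never called.
    return entry[:response] if entry && entry[:expires_at] > @clock.call

    status, headers, body = @app.call(env)
    max_age = headers["Cache-Control"].to_s[/max-age=(\d+)/, 1]
    return [status, headers, body] unless status == 200 && max_age

    # Buffer the body so it can be replayed on later hits.
    chunks = []
    body.each { |chunk| chunks << chunk }
    response = [status, headers, chunks]
    @entries[key] = { :response => response,
                      :expires_at => @clock.call + max_age.to_i }
    response
  end
end

backend_calls = 0
backend = lambda do |_env|
  backend_calls += 1
  [200, { "Cache-Control" => "public, max-age=60" }, ["hello"]]
end

cache = TinyHttpCache.new(backend)
env = { "REQUEST_METHOD" => "GET", "HTTP_HOST" => "example.com",
        "PATH_INFO" => "/", "QUERY_STRING" => "" }
cache.call(env) # miss: reaches the backend
cache.call(env) # hit: served from memory, backend_calls stays at 1
```

The trade-off compared with Varnish is visible here: the request still enters the Ruby process on a hit, but nothing beyond the middleware runs.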

Varnish has plenty of great examples and real world use cases on their site that can help you truly understand the usefulness it provides. Check them out to get a feel for how you can utilize it for your application. Also, if you want to test out Varnish on AppCloud take a look at the Chef recipe we have for it.

Now, it's up to you to decide whether all or some of these types of caching will be beneficial for you. Make sure to utilize them to the fullest potential to ensure you and your customers have the most enjoyable experience possible. If you have any caching experiences or gotchas, please share them in the comments.

Resources:

Redis 101 presentation
A Collection of Redis Use Cases
To Redis or Not To Redis?
Memcached Basics for Rails
Caching, Memcached and Rails
Scaling Rails with Memcached
Things Caches Do
Caching Tutorial
You're Doing It Wrong



Multiple Domain Page Caching

Created at: 22.01.2010 18:00, source: RailsTips - Home, tagged: caching harmony

The other day Brandon Wright emailed me about the following tweet:

Just deployed full page caching on Harmony. Our log file stopped spinning by which made me happy and sad.

Routing

It might seem like black magic, but it isn’t all that hard. The front side for Harmony is not the same as a typical Rails app as we have multiple domains pointed at Harmony and the paths are not known up front so they don’t go in the routes file. In order to get everything headed to a controller, the last route in our file is this:

map.dispatch '*path', :controller => 'the', :action => 'dispatch'

This uses Rails route globbing to send every path to an action named dispatch in a controller dubiously named “the” (because it made us laugh). From there, we determine if we can find the site and if the site has an item (page, link, blog, post, etc.) that matches the path.

Caching

Somewhere down the rabbit hole we render that item based on its Liquid template, immediately after which we call something like this:

cache_item(@item, contents)

# which looks kind of like this
def cache_item(item, contents)
  # gone for brevity
  
  FileUtils.mkdir_p(File.dirname(item.page_cache_path))
  File.open(item.page_cache_path, 'w+') { |f| f.puts(contents) }
end

Note: we could have used caches_page in Rails, but we are already using that (without including the HTTP host) for asset and theme file caching, so it was easier to just roll our own.

All cache_item does is ensure that the directory exists and then write the contents of what we are about to send back to the browser into a file. Really nothing fancy. So what does item.page_cache_path look like? For a site like railstips.org and a path of /dude/, we end up with the following cache path:

#{RAILS_ROOT}/public/cache/railstips.org/dude/index.html

Note the use of the domain in the cache path. Since we have that, we can use Apache rewrites along with conditions to tell Apache to check if a cached file exists based on the host. If it does, we serve that file, and if it doesn’t, we just hit Rails, cache the file, and return the response. We use Moonshine for our deployments, so all we need to do is set the Passenger page cache directory like this:

:passenger:
  :page_cache_directory: '/cache/%{HTTP_HOST}'

When we deploy, this sets up the following Apache rewrite rules:

# Rewrite to check for Rails non-html cached pages (i.e. xml, json, atom, etc)
RewriteCond  %{THE_REQUEST} ^(GET|HEAD)
RewriteCond  %{DOCUMENT_ROOT}/cache/%{HTTP_HOST}%{REQUEST_URI} -f
RewriteRule  ^(.*)$ /cache/%{HTTP_HOST}$1 [QSA,L]

# Rewrite to check for Rails cached html page
RewriteCond  %{THE_REQUEST} ^(GET|HEAD)
RewriteCond  %{DOCUMENT_ROOT}/cache/%{HTTP_HOST}%{REQUEST_URI}index.html -f
RewriteRule  ^(.*)$ /cache/%{HTTP_HOST}$1index.html [QSA,L]

Note that in the RewriteRule, we include the HTTP_HOST, which when visiting railstips.org, would be railstips.org.
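For reference, the cache path layout described above could be computed along these lines. This is a sketch under assumptions, not Harmony's actual code: page_cache_path is written here as a standalone function taking the application root, host and request path, whereas in the post it is a method on the item.

```ruby
# Hypothetical helper mirroring the layout described in the post:
# <root>/public/cache/<host>/<path>, with index.html appended to
# directory-style paths (those ending in a slash).
def page_cache_path(root, host, path)
  relative = path.end_with?("/") ? "#{path}index.html" : path
  File.join(root, "public", "cache", host.downcase, relative)
end

page_cache_path("/app", "railstips.org", "/dude/")
# => "/app/public/cache/railstips.org/dude/index.html"
page_cache_path("/app", "railstips.org", "/feed.xml")
# => "/app/public/cache/railstips.org/feed.xml"
```

Because the host and path both come from the request, a real implementation should also reject unexpected hosts and path segments such as ".." before writing to disk.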

One URL to Rule Them All

The key to this being effective is only having one true URL for each page. We do this right now by redirecting www to no-www and ensuring that each page has a trailing slash. First, no-www.

# no www
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1$1 [R=301,L]

Next, we ensure that there is always a trailing slash when needed. This means that /foo redirects to /foo/ and foo.json just stays as foo.json.

RewriteCond  %{THE_REQUEST} ^(GET|HEAD)
RewriteCond %{REQUEST_URI} !^/admin/
RewriteRule ^(.*/[^/\.]+)$ $1/ [R]

Ensuring that each page has one URL is better for search engines and analytics. You don’t end up with split page rank for the same page (with and without slash) and the same thing is true for pageviews.
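The intent of the two redirect rules can be restated as a small Ruby function, which is handy for checking edge cases before touching the Apache config. This is an approximation under assumptions: canonical_redirect is a hypothetical name, the rules are assumed to run in server context (so the matched path includes its leading slash), and the GET/HEAD condition is left out.

```ruby
# Returns the URL to redirect to, or nil if the request is already canonical.
def canonical_redirect(host, path)
  # Rule 1: www.example.com -> example.com (case-insensitive, like [NC]).
  bare_host = host.sub(/\Awww\./i, "")
  return "http://#{bare_host}#{path}" unless bare_host == host

  # Rule 2: add a trailing slash when the last path segment has no dot,
  # except under /admin/.
  needs_slash = path !~ %r{\A/admin/} && path =~ %r{\A.*/[^/.]+\z}
  needs_slash ? "#{path}/" : nil
end

canonical_redirect("www.railstips.org", "/foo") # => "http://railstips.org/foo"
canonical_redirect("railstips.org", "/foo")      # => "/foo/"
canonical_redirect("railstips.org", "/foo.json") # => nil
```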

Cache Clearing

Now that I’ve explained a bit how we do the caching, I’ll mention quickly how we clear it. As they say, cache expiration and naming are the two hardest things to do in programming. We opted for the simplest solution that would work for now.

I made a simple site cache clearer module that I include in any model that can affect a site on the front side. It looks something like this.

module SiteCacheClearer
  def self.included(model)
    model.after_save    :clear_item_cache
    model.after_destroy :clear_item_cache
  end
  
  def clear_item_cache
    site.clear_item_cache if site.present?
  end
end

# To use
class Item
  include MongoMapper::Document
  include SiteCacheClearer
end

All it does is remove the entire site’s cache whenever the model is saved or destroyed. Like I said, nothing fancy. It doesn’t check if the thing is published. It doesn’t check which pages it is actually shown on and remove only those. It just blows away the cache when things change.
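The post does not show the site-level clear_item_cache that the module calls, so here is one plausible shape for it, purely as a guess. The constructor, the cache_root argument and the cache_dir method are hypothetical, and the real Site is presumably a MongoMapper document with the same name; the only assumption carried over from the post is that each host gets its own directory under the cache root.

```ruby
require "fileutils"

# Hypothetical sketch: one cache directory per host, removed wholesale.
class Site
  attr_reader :host

  def initialize(host, cache_root)
    @host = host
    @cache_root = cache_root
  end

  def cache_dir
    File.join(@cache_root, host)
  end

  # Removes every cached page for this site. rm_rf does not raise if the
  # directory is already gone, so repeated calls are harmless.
  def clear_item_cache
    FileUtils.rm_rf(cache_dir)
  end
end
```

Deleting the whole directory keeps the clearing logic trivially correct, at the cost of regenerating every page for that site on the next request.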

Someday we’ll definitely do something more advanced like a reference-based cache where only the pages that need to be blown away are, but this is working great for now. Hope this is helpful to someone.

The main thing to remember is to use the host and make sure there is only one way to get to the resource.

So what does this all mean to our read heavy application? Well, we end up with Scout graphs like this:

[Image: Harmony Page Caching graph]

The blue is Apache requests and the orange is Rails requests. Notice that as our Apache requests go up, our Rails requests stay pretty steady.

