A Modern Guide to Threads »
Created at: 21.10.2011 03:49, source: Engine Yard Blog, tagged: Technology code ruby threads
NOTE: Mike Perham from Carbon Five recently wrote a blog post about using threads in Ruby. With his permission, we're reposting it here.
Carbon Five has been building state-of-the-art web applications for startups and large institutions since early 2000. Since their inception, they have focused on quality and value as the critical components of project success.
I spoke recently at RubyConf 2011 on some advanced topics in threading. What surprised me was how little experience people had with threads, so I decided to write this post to give people a little more background. Matz actually recommends not using threads (see below for why), and I think this is a big reason why Rubyists tend not to understand threading.
Simple Threading
Every time you execute ruby, rails or irb, you are creating a process. Within each process, you have something which is executing the code in your process. This is called a thread.
Your operating system starts every process with a "main" thread. Ruby allows you to create as many additional threads as you want by calling Thread.new with a block of code to be executed. Once the block of code has finished executing, the thread is considered dead. If the main thread exits, the process dies.
t1 = Thread.new do
  i = 0
  1_000_000.times do
    i += 1
  end
end
t2 = Thread.new do
  j = 0
  1_000_000.times do
    j += 1
  end
end
t1.join
t2.join
Above we have two threads independently counting up to one million, while the main thread waits for them to finish by calling join on each thread. These two threads will execute concurrently ("operating or occurring at the same time") with your process's main thread. Not so hard, right?
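As a small sketch of the thread lifecycle described earlier (a thread runs its block and is dead once the block finishes), the snippet below starts one thread, collects the block's result, and checks that the thread is no longer alive. The helper name run_in_thread is made up for this illustration; Thread#value and Thread#alive? are standard Ruby.

```ruby
# Hypothetical helper: run a small computation in its own thread.
def run_in_thread(range)
  worker = Thread.new { range.reduce(:+) }
  result = worker.value          # waits for the thread (like join), then returns the block's value
  [result, worker.alive?]        # the block has finished, so the thread is dead
end

p run_in_thread(1..10) # prints [55, false]
```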
Race Conditions
Generally your computer can execute one thread per core. I have a dual core CPU in this laptop which means I can execute two threads at the exact same time [1]. Now imagine I want to parallelize my counting above. Instead of having one thread count to two million, I will have two threads count to one million each. That should execute twice as fast because I'll be using two threads and thus both cores:
i = 0
t1 = Thread.new do
  1_000_000.times do
    i += 1
  end
end
t2 = Thread.new do
  1_000_000.times do
    i += 1
  end
end
t1.join
t2.join
puts i
You'd expect the result to print "2000000", right? Nice try.
> jruby threading.rb
1330864
Any time multiple threads try to change the same variables, they have the potential for race conditions. Why is this?
The race condition is fundamentally due to the multi-step process of changing a variable. Even a simple increment in most languages is actually a multi-step process:
register = i             # read the current value from RAM into a register
register = register + 1  # increment it by one
i = register             # write the value back to the variable in RAM
One of the features of threads is that they are controlled by the operating system; the OS can decide to stop Thread 1 and start executing Thread 2 at any point in time. This means that the OS can stop your thread after it has read the value of i into a register. Imagine this sequence of events:
i = 0
# OS is running Thread 1
register = i             # 0
register = register + 1  # 1
# OS switches to Thread 2
register = i             # 0
register = register + 1  # 1
i = register             # 1
# Now OS switches back to Thread 1
i = register             # 1
Now technically both threads have incremented i. Will the resulting value be 2? No, because Thread 2's increment was lost when Thread 1's last operation overwrote the memory. This is exactly why we saw 1330864 instead of 2000000; we lost a lot of increments due to this race condition. To avoid race conditions, any variable changes (fancy CS terminology: "mutation of shared state") must be done atomically so that other threads cannot see the change midway through the change process.
Thread Safety
Now you know the fundamental requirement for thread-safe code: mutation of shared state must be done atomically. Any time you change a variable that is shared by many threads, it needs to be done atomically. Unfortunately Ruby and most other mainstream languages only give you one tool to do this: the lock aka the mutex.
Mutex is short for "mutual exclusion" as in "only one thread can be executing this code at a time". Usage is simple:
@mutex = Mutex.new
@mutex.synchronize do
  i += 1
end
Remember that increment is a three-step process but because only one thread can be in the synchronize block at a time, we won't have any problems with race conditions; the Mutex effectively makes the increment atomic.
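To tie this back to the counting example, here is a minimal sketch of the two-thread counter with every increment wrapped in a lock. The helper name locked_count and its parameters are made up for illustration; Mutex#synchronize is the standard Ruby call used above.

```ruby
require 'thread' # needed for Mutex on older Rubies; harmless on newer ones

# Hypothetical helper: thread_count threads each add per_thread to one shared counter.
def locked_count(thread_count, per_thread)
  i = 0
  mutex = Mutex.new
  threads = Array.new(thread_count) do
    Thread.new do
      per_thread.times do
        # Only one thread at a time can run this read-increment-write sequence.
        mutex.synchronize { i += 1 }
      end
    end
  end
  threads.each(&:join)
  i
end

puts locked_count(2, 1_000_000) # prints 2000000
```

The price is that every increment now waits its turn on the lock, so the two threads no longer run that part of the work in parallel.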
Here's the dirty secret that everyone who uses threads learns eventually: Threads have such a terrible reputation because locks are very painful to use in practice.
Modern Threading
What are the alternatives? There are several:
- Atomic Instructions: turn multi-step operations into a single atomic operation
- Software Transactional Memory (STM): ensure that changes are done as part of a transaction, which guarantees atomicity
- Actors: refactor our code so that only one thread may change a variable
My take is that locks exponentially grow the complexity of your codebase and this is a major reason why Matz has always advised Rubyists to use Processes rather than Threads for concurrency. My recent RubyConf talk on Threads discusses these options. The Clojure language mandates transactional memory for all variable changes. Scala and Erlang offer Actors. Using plain old threads and locks is akin to writing in assembly language: there are better ways now.
In my opinion, the last option is the preferred option since you avoid the race condition in the first place: "Don't communicate by sharing state; share state by communicating". The fundamental idea behind actors is to give each thread a separate responsibility and pass messages between threads according to those responsibilities.
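A rough sketch of that idea using only Ruby's standard Queue, not Celluloid or any other actor library: one thread owns the counter, and the other threads send it messages instead of touching the variable. The helper name actor_count and the :stop message are assumptions made for this illustration.

```ruby
require 'thread' # Queue lives here on older Rubies; harmless on newer ones

# Hypothetical helper: a single "actor" thread owns the total; workers only send messages.
def actor_count(workers, per_worker)
  mailbox = Queue.new

  counter = Thread.new do
    total = 0
    # Only this thread ever reads or writes `total`, so our code needs no lock.
    while (message = mailbox.pop) != :stop
      total += message
    end
    total
  end

  senders = Array.new(workers) do
    Thread.new do
      per_worker.times { mailbox << 1 } # send a message instead of mutating shared state
    end
  end

  senders.each(&:join)
  mailbox << :stop # all messages are queued, so tell the actor to finish
  counter.value    # wait for the actor and return its final total
end

puts actor_count(2, 100_000) # prints 200000
```

The queue still synchronizes internally, but that locking lives in infrastructure rather than in application code.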
My first piece of advice to Rubyists: avoid Thread.new. This is exactly what Matz is saying also. Instead look for infrastructure that can abstract the use of threads into a safer concurrency model. See Celluloid and girl_friday for instance. Of course, MRI is not particularly suited to high concurrency applications; JRuby is a better choice. Other languages like Clojure or Erlang were designed with concurrency as a language feature right from the start.
I'm not saying that threads and locks should be removed completely from all software. Rather, we should treat them for what they are: low-level abstractions that developers should not be using directly. As with assembly language, I see a need for threads and locks, but they should be used very sparingly. Understanding and knowing how to use higher-level concurrency abstractions like actors and STM will make the concurrent pieces of your application easier to write and maintain. Unfortunately, not all of these options are available to MRI, but all are available to JRuby via Java libraries.
[1] True with JRuby, not true with MRI because of the infamous "Global Interpreter Lock".
Refactoring Environment »
Created at: 24.02.2011 15:33, source: Ruby Rockers, tagged: Technology code ruby
A quick note about the environment that works best when you are refactoring your code base.
Refactoring can be done at any point in your code's life. When you refactor, the following things work in your favor:
- Have good tests for the code you are going to refactor.
- Pair up when you start refactoring. Having a pair while refactoring is a great idea.
- Keep the code under version control (Git or SVN).
Using BETWEEN for SQL comparisons »
Created at: 14.11.2009 22:55, source: Robby on Rails, tagged: programming PostgreSQL code sql development PostgreSQL mysql refactoring
Recently, Carlos suggested that I should start sharing some basic SQL tips that help with performance and/or general usage. I recently came across some code that I didn’t like to read and/or write. For example, let’s take the following…
SELECT * FROM brochures WHERE published_at <= now() AND archived_at >= now()
Essentially, this is pulling back the brochures that are considered published. (We have a project that allows people to manage their brochure launch dates ahead of time.) In fact, in this project, we have no fewer than 6-8 dates in the database that we’re comparing data on, and it’s easy to get lost in the logic when trying to understand it.
Now, there isn’t anything inherently wrong with how this condition is constructed. As a matter of personal taste, I find it annoying to mentally parse. Also, having to write now() more than once in a WHERE clause feels like I’m repeating myself.
Read it out loud…
“WHERE the brochures published at date is less than and/or equal to right now AND the archived date is greater than and/or equal to now.”
Who talks like that?
Luckily, there is a better and, in my opinion, more readable way to express this: the BETWEEN construct in SQL. (postgresql docs, mysql docs)
SELECT * FROM brochures WHERE now() BETWEEN published_at AND archived_at
Let’s read this out loud…
“WHERE the current date is between the published at and archived at dates.”
This sounds more natural to me.
Additionally, you can also do the inverse with NOT.
SELECT ... WHERE now() NOT BETWEEN brochures.published_at AND brochures.archived_at
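BETWEEN is inclusive at both ends, which is why it can stand in for the <= / >= pair above. If the same check ever has to be made in application code, Ruby's Comparable#between? reads much the same way. The sketch below uses made-up dates and a hypothetical published? helper; it is not code from the project described here.

```ruby
# Hypothetical helper mirroring `now() BETWEEN published_at AND archived_at`.
def published?(now, published_at, archived_at)
  now.between?(published_at, archived_at) # inclusive at both ends, like SQL BETWEEN
end

# Made-up example dates.
published_at = Time.utc(2009, 11, 1)
archived_at  = Time.utc(2009, 12, 1)

puts published?(Time.utc(2009, 11, 14), published_at, archived_at) # prints true
puts published?(published_at, published_at, archived_at)           # prints true (boundary is included)
puts published?(Time.utc(2009, 12, 2), published_at, archived_at)  # prints false
```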
Remember kids, “code is for humans first and computers second.”—Martin Fowler
The 8-Hour Rails Code Audit »
Created at: 20.10.2009 15:13, source: Robby on Rails, tagged: Business Ruby on Rails ruby programming PLANET ARGON code codeaudit agile programming planetargon audit
While our team is typically focused on larger client and internal projects, we do get an opportunity to assist businesses on a much smaller scale. Whether this be through retainer-based consulting or through code audits, we have seen a lot of Ruby on Rails code over what has nearly been… five years!? We’ve been able to compile a fairly extensive checklist that we use in our code audit process, and we’ve decided to streamline it into a smaller product.
Historically, this service has ranged anywhere from $2000-6000, depending on the size and scope of the projects, but we want to help smaller startups[1] and projects outline a roadmap for how they can begin to refactor and optimize their existing code base so that they can be more efficient at the start of 2010. So, we’ve scaled things down into an extremely affordable flat-rate package where we work off of a pre-defined number of hours.[2]
Through the end of 2009, we’re now offering the 8-Hour Rails Code Audit package for just $1000 USD (details).
We’re currently limiting this service to just two projects per week, so reserve your spot now.
[1] Larger projects are welcome to benefit from this service and custom quotes are available upon request.
[2] As always, we’re happy to discuss longer engagements.
Flash Message Conductor now a Gem »
Created at: 13.10.2009 18:30, source: Robby on Rails, tagged: Ruby on Rails PLANET ARGON gem plugins github development code rubyonrails
We’ve been doing some early (or late… if you’re a half-full kind of person) spring cleaning on some of our projects. One of the small projects, flash_message_conductor, which we released last year as a plugin, is now a gem. We’ve been moving away from plugins in favor of gems: we like locking in specific released versions, and being able to specify them in our environment.rb file is quite convenient.
To install, just run the following:
sudo gem install flash-message-conductor --source=http://gemcutter.org
Successfully installed flash-message-conductor-1.0.0
1 gem installed
Installing ri documentation for flash-message-conductor-1.0.0...
Installing RDoc documentation for flash-message-conductor-1.0.0...
You’ll then just need to include the following in your config/environment.rb file.
Rails::Initializer.run do |config|
  # ...
  config.gem 'flash-message-conductor', :lib => 'flash_message_conductor', :source => "http://gemcutter.org"
end
You can take a peek at the README for usage examples.
We’ll be packaging up a handful of our various plugins that we reuse on projects and moving them to gems. Stay tuned… :-)


