Ruby, Concurrency, and You »
Created at: 14.10.2011 19:41, source: Engine Yard Blog, tagged: Open Source Technology 1.8 1.9 concurrency GIL implementations ironruby jruby macruby maglev MRI parallelism rubinius ruby threads
| Ruby Implementation | Concurrency | Parallelism |
|---|---|---|
| MRI 1.8 | ✔ | |
| MRI 1.9 | ✔ | |
| Rubinius 1 | ✔ | |
| Rubinius 2 | ✔ | ✔ |
| JRuby | ✔ | ✔ |
| MacRuby | ✔ | ✔ |
| Maglev | ✔ | |
| IronRuby | ✔ | ✔ |
A big topic in the world of Ruby this year has been how to get more out of Ruby, specifically, how to get more done in parallel. The topic of concurrency, though, is one fraught with misunderstanding. This is largely due not only to the complexity of thinking about multiple things at once, but also to the limitations of Ruby implementations and operating systems.
In this article, I’ll lay the groundwork for understanding the difference between concurrency and parallelism. Then, I’ll look at how a programmer experiences them.
Concurrency vs. Parallelism
This has been discussed many times, but I sometimes still have difficulty with it. Let’s first break down the definitions of these two words:
- Concurrent: existing, happening, or done at the same time
- Parallel: occurring or existing at the same time or in a similar way
Hmm, ok. Well, that hasn’t improved our thinking about these two topics. We need to dig deeper into how the world of computing applies to these words. Rather than looking at the abstract, let’s instead consider some real world examples.
A “Real World” Example
Let’s say you’ve sat down for the evening to complete tomorrow’s homework. This evening you’ve got both Math and History worksheets to fill out. Tonight for some reason, you decide to do one problem in Math, then one problem in History, then back to Math, etc until all the problems are done.
In the parlance of computing, you’re now doing your Math and History worksheets concurrently. This is because your current task list includes 2 items: Math worksheet and History worksheet.
Now, clearly you the reader can see a problem here. By switching back and forth, completing your homework will probably take longer than if you did the complete Math worksheet then did the History worksheet. In other words, if you did the worksheets in serial.
So, if concurrent means “having multiple outstanding tasks at once”, then what is parallel? Parallel is the ability to make progress on multiple tasks simultaneously.
Let’s say you’ve been asked to read the book One O’Clock Jump by Lise McClendon. You also need to drive down to San Diego for Comic-Con. Thankfully you find that One O’Clock Jump is available on audiobook!
You can now listen to the book while driving. You’re simultaneously making progress on two separate tasks. This is the equivalent of parallelism in computing.
I hope that these real world examples help illustrate the difference between concurrency and parallelism. Now let's apply this newfound knowledge to Ruby.
Back to Ruby
One reason this problem can be difficult to understand is that Ruby only provides a single mechanism for concurrency: the Thread class. But whether or not these Threads are parallel depends on a number of factors.
MRI 1.8
Let’s look at MRI 1.8 (and MRI forks such as REE) to begin with, because it has the simplest model. MRI 1.8 uses a technique known as “green threads” to implement Threads. This means that every once in a while (around 100 milliseconds), the program says “oh, I should let another thread run now!” This saves the current info into the current thread and restores another thread. This is exactly like our homework example above. We can have as many things as we’d like in our task list, but we can only make progress on one of them at a time.
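The task switching described here, and in the homework example earlier, can be sketched in a few lines of Ruby. This is only an illustration: it uses Fibers (available from Ruby 1.9), which hand over control cooperatively rather than on a timer as MRI 1.8's green threads do, and the names `worksheet` and `run_round_robin` are made up for this sketch.

```ruby
# Each worksheet is a Fiber that pauses after every problem.
def worksheet(name, problems)
  Fiber.new do
    1.upto(problems) { |i| Fiber.yield "#{name} #{i}" }
    :done # returned by the final resume, once all problems are finished
  end
end

# Resume each task in turn, saving and restoring its position every time.
# Only one task ever makes progress at any given moment.
def run_round_robin(tasks)
  log = []
  until tasks.empty?
    task = tasks.shift
    result = task.resume
    next if result == :done # finished tasks are not re-queued
    log << result
    tasks.push(task)
  end
  log
end

puts run_round_robin([worksheet("Math", 2), worksheet("History", 2)])
```

Both worksheets are "in progress" for the whole run, which is concurrency, yet the log shows strictly one problem at a time.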
There is a wrinkle in the concurrency/parallelism game that I haven’t mentioned before now. This wrinkle is IO, namely how Threads interact when waiting for some external event. MRI 1.8.7 is quite smart, and knows that when a Thread is waiting for some external event (such as a browser to send an HTTP request), the Thread can be put to sleep and be woken up when data is detected. This simple optimization improves the usage of Threads so much that for a very long time the MRI 1.8.7 model was good enough for all Ruby programs.
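A small experiment can make the effect of waiting visible. In this sketch, `sleep` stands in for waiting on an external event (an assumption made to keep the example self-contained), and `wait_for_requests` is a hypothetical helper name.

```ruby
# Start `count` threads that each wait for `delay` seconds, and return
# how long it took for all of them to finish.
def wait_for_requests(count, delay)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(count) { Thread.new { sleep delay } }
  threads.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = wait_for_requests(4, 0.2)
puts format("4 waits of 0.2s took %.2fs in total", elapsed)
```

Because a waiting thread is put to sleep, the four waits overlap, and the total should come out much closer to 0.2 seconds than to 0.8.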
MRI 1.9
Switching back to Ruby implementations, let’s look at MRI 1.9. As has been previously reported, MRI 1.9 removes the “green threads” we had in MRI 1.8 and uses native threads to implement the Thread class. Now, what are these “native threads”? These are units of concurrency that the underlying operating system is aware of. A big reason to switch to native threads is that it vastly simplifies the implementation of Threading. The operating system handles the low level parts of saving and restoring Thread information in a completely transparent way. Additionally, letting the OS know what parts of a program should be concurrent allows it to use the full resources of the computer to make that happen. In this modern world, that means using multiple cores.
Up until now, all we’ve talked about with Ruby’s Threading model was about concurrency, the ability to have multiple outstanding tasks at once. Now when we add in the idea of multiple cores, we can finally talk about parallelism. When a computer includes multiple cores (which is pretty much every computer now), those cores can run different code simultaneously, providing true parallelism. When a computer only has one core, there is no true parallelism, instead there is just simple concurrency, even at the OS level. The OS manages all the processes and threads in the system the same way you handled your Math and History worksheets, doing one for a little while, then grabbing another one.
Back to multiple cores though. Now that there is the opportunity to run things truly in parallel, we have to look at whether Ruby can take advantage of that. Since MRI 1.9 uses OS threads, it can actually spread out your Ruby Threads to multiple cores!
Unfortunately, MRI 1.9 prevents the Ruby code itself from running in parallel by requiring that any thread running Ruby code hold a lock. This lock is commonly known as the GIL (Global Interpreter Lock) or GVL (Global VM Lock).
There are a few reasons the GIL exists, but for this discussion we will say that it’s because the non-Ruby parts of MRI 1.9 are not thread-safe. This means if data were manipulated by multiple threads at the same time, the data could become corrupt. The important thing for this post is how it applies to parallelism: the GIL inhibits parallelism within Ruby code.
MRI 1.9 uses the same technique as MRI 1.8 to improve the situation, namely the GIL is released if a Thread is waiting on an external event (normally IO) which improves responsiveness. MRI 1.9 also includes an experimental API that C extensions can use to run some C code without the GIL locked to utilize parallelism. This API is very restrictive though because no Ruby object may be accessed in any way while the GIL is not held by the current thread.
That about sums up the situation with MRI 1.8 and 1.9 with regards to concurrency and parallelism. Both provide concurrency of Ruby code, but neither provides parallelism of Ruby code.
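One way to see this for yourself is to time a CPU-bound job run serially and then run across several threads. The sketch below is illustrative only: `busy_sum`, `run_serial` and `run_threaded` are hypothetical names, and the job sizes are arbitrary.

```ruby
require 'benchmark'

# A deliberately CPU-bound task: sum the integers below `limit`.
def busy_sum(limit)
  total = 0
  limit.times { |i| total += i }
  total
end

# Run the jobs one after another on the main thread.
def run_serial(jobs, limit)
  Array.new(jobs) { busy_sum(limit) }
end

# Run each job in its own Thread and collect the results.
def run_threaded(jobs, limit)
  Array.new(jobs) { Thread.new { busy_sum(limit) } }.map(&:value)
end

serial_time   = Benchmark.realtime { run_serial(4, 2_000_000) }
threaded_time = Benchmark.realtime { run_threaded(4, 2_000_000) }
puts format("serial: %.2fs, threaded: %.2fs", serial_time, threaded_time)
```

On an implementation with a GIL, the two timings should be roughly similar, since only one thread runs Ruby code at a time. On an implementation without one, the threaded run has the chance to finish sooner on a multi-core machine.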
Rubinius
Let’s take a quick look at other Ruby implementations where things are a bit different than MRI. I’ll start with Rubinius, since it’s the one I’m most familiar with. Rubinius 1.x also had a GIL and worked pretty much the same as MRI 1.9. With the upcoming 2.0 release though, the GIL will be removed, allowing Ruby code to run fully concurrent and fully parallel. We think this opens up a lot of uses for Ruby (parallel algorithms, etc) that Rubinius couldn’t handle well previously.
JRuby
JRuby layers the Thread class on top of Java’s thread class, so the threading model is whatever the JVM supports. That being said, OpenJDK is the primary JVM; it puts a Java thread directly onto an OS thread with no GIL. Thusly, JRuby almost always has full concurrency and parallelism available to it.
MacRuby
MacRuby uses Cocoa’s NSThread as its Thread abstraction, which runs without a GIL. So, this is another fully parallel implementation.
Maglev
Maglev runs directly on top of a Smalltalk VM and thusly layers the Thread class on top of a concept called Smalltalk Processes. In this case, the GemStone VM implements Processes in the same way as MRI 1.8, namely via “green threads” that don’t expose concurrency to the OS, and therefore, have no parallelism.
IronRuby
Lastly, IronRuby layers Thread directly on top of CLR’s threads without a GIL.
Conclusion
I hope that this helps to clear up what concurrency and parallelism are and how the different Ruby implementations address them. Having this understanding is critical for discussing and understanding topics such as thread-safety of libraries and performance of applications.
In future posts, we’ll look to build on this knowledge to help you make the best use of Ruby!
Concurrency in JRuby »
Created at: 23.07.2011 00:22, source: Engine Yard Blog, tagged: Open Source Technology concurrency jruby
This is a recap of my talk on the same subject at EventMachine RubyConf in Baltimore on the final day of RailsConf 2011.
Concurrency is a hotly debated subject in the Ruby community. Shared state or shared nothing? Threads or Events? Sync or Async? The fact that the standard Ruby interpreter does not provide multiple-core saturation without resorting to process management clouds the issue, causing developers to constantly evaluate new approaches for using all available CPUs.
JRuby enters the discussion, sporting its use of native (kernel) threads, allowing single-process access to all of your cores. Is true concurrently-executing Ruby code obtained simply by switching to JRuby? Before you think that JRuby will make your threaded code run faster, we need to take a step back and explain.
Mental Model
First, a new mental model is needed. Although JRuby is just another Ruby implementation, it's also a new tool running on a completely different VM, the Java Virtual Machine, which has performance characteristics much different than Ruby's VM. These characteristics vary due to the use of native threads compared to green threads, the JVM's sophisticated garbage collection facilities, and most importantly JRuby's own codebase. So your assumptions about how code works do not carry across Ruby implementations. Code that previously ran slow may now be fast and vice versa.
Uncertainty
Adding to the uncertainty of the situation is the unpredictability of native threads. Have you ever seen "should never happen" comments in code, where some programmer was convinced that a branch of code was completely unreachable? If the code branches based on a piece of shared state corrupted by multiple threads scheduled across multiple cores, the impossible code just might end up executing.
/* ruby/struct.c */
static VALUE rb_struct_equal(VALUE s, VALUE s2) {
    /* ... */
    if (RSTRUCT_LEN(s) != RSTRUCT_LEN(s2)) {
        rb_bug("inconsistent struct"); /* should never happen */
    }
    /* ... */
}
Here's a hypothetical example running on some fictitious native-threaded, optimizing Ruby VM. Say we have this singleton object that's expensive to create, so the programmer wrote it to be constructed lazily.
class ExpensiveToCreate
  def self.instance
    @instance ||= ExpensiveToCreate.new
  end
end
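For our purposes, the "or-equals" operator can be thought of as sugar for the following code (strictly speaking, it tests whether @instance is nil or false rather than whether it is defined):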
class ExpensiveToCreate
  def self.instance
    unless defined?(@instance)
      @instance = ExpensiveToCreate.new
    end
    @instance
  end
end
Now let's play the role of the optimizing VM. Let's say that this VM decides to inline the new method like so:
class ExpensiveToCreate
  def self.instance
    unless defined?(@instance) # Line 3
      @instance = ExpensiveToCreate.allocate
      @instance.initialize # Line 5
    end
    @instance
  end
end
What if two threads try to initialize the instance at the same time? Trouble! We have the potential for a race: the first thread on line 5 is not finished performing the expensive initialization, but the @instance variable has already been defined, so the second thread will happily return the uninitialized instance and try to use it. (Some of you will recognize this as a variation on the Double-checked locking problem.)
So does this mean that we need to be extra vigilant with our code, sprinkling it with mutex blocks everywhere? Will it become an unreadable, unmaintainable mess? Certainly not, as long as we follow a simple rule:
Avoid shared, mutable state.
This includes lazy initialization, which is mutating shared state at the time it is first accessed.
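As a rough sketch of what following this rule might look like for the singleton above, here are two options. The first builds the instance eagerly at load time, typically before any worker threads exist. The second keeps the laziness but guards it with a Mutex from Ruby's core library. The class name `LazyButGuarded` and the `LOCK` constant are invented for this illustration.

```ruby
class ExpensiveToCreate
  # Option 1: create the instance eagerly, while the class is being loaded.
  INSTANCE = new

  def self.instance
    INSTANCE
  end
end

class LazyButGuarded
  LOCK = Mutex.new

  # Option 2: stay lazy, but allow only one thread at a time to run the
  # check-then-create sequence.
  def self.instance
    LOCK.synchronize { @instance ||= new }
  end
end

# Every thread should see the same, fully constructed object.
ids = Array.new(8) { Thread.new { LazyButGuarded.instance.object_id } }.map(&:value)
puts ids.uniq.size
```

Option 1 avoids the problem altogether by never mutating shared state after startup. Option 2 pays for a lock acquisition on every call in exchange for deferring the expensive construction.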
The consequences of programming with real threads are difficult to conceptualize at first, especially if you're used to Ruby's green threads or Ruby 1.9's global interpreter lock (GIL). Consider this code:
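M, N = 10, 1_000 # illustrative sizes, not from the original talk
threads = []
data = []
M.times do |m|
  threads << Thread.new do
    N.times do |n|
      data << m * n
    end
  end
end
threads.each(&:join) # wait for every thread to finish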
What happens to the data array after all threads have finished? Under Ruby 1.8 and Ruby 1.9, we always get an array of integers of size M * N. There may be a little randomness in the ordering of the entries, but otherwise the array is intact and well-behaved.
Under JRuby, arrays (as well as strings, hashes and other core library data structures) are not safe for mutation by multiple threads. So when we run the code above with JRuby, the array and its internals become corrupted. The array's size is frequently less than M * N, and what's more, we often observe some of the entries are nil rather than the integers we expect. Sometimes we'll encounter a ConcurrencyError raised as well.
This uncertainty can be the cause of some nasty, hard-to-pinpoint bugs in your code. So if your code works well with Ruby but blows up with unexpected nils or otherwise unexplained behavior, you can at least start to point the blame at threaded code that mutates state.
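One possible way to restructure the array example so that no thread mutates shared state is to let each thread build its own private array and hand it back through `Thread#value`. This is a sketch under the assumption that the results only need to be combined at the end, and the values chosen for `M` and `N` are arbitrary.

```ruby
M = 10
N = 1_000

# Each thread fills an array that no other thread can see. Thread#value
# waits for the thread to finish and returns the block's result.
threads = Array.new(M) do |m|
  Thread.new do
    Array.new(N) { |n| m * n }
  end
end

# Only the main thread touches the combined array.
data = threads.flat_map(&:value)
puts data.size # prints 10000
```

Since the only shared object is assembled by a single thread after the workers have finished, there is nothing left for the threads to corrupt.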
What about metaprogramming in the presence of threads? Can we corrupt the interpreter by defining classes and/or methods from many threads at once? Fortunately the answer here is no. JRuby explicitly takes steps to ensure that class and method definition are properly synchronized internally. Also, since class variables are frequently used for sharing state between objects, they are synchronized as well.
Using Native Threads
As you might expect, using native threads in JRuby is as simple as working with the regular Ruby Thread class. (Note that there are some caveats). For example, you can easily offload some computation to the background:
require 'java'
@count = java.util.concurrent.atomic.AtomicInteger.new
def send_email(message)
  Thread.new do
    # send the message
    puts "sent #{@count.incrementAndGet} emails"
  end
end
send_email("hello")
For systems with large volumes of email, this naive approach may not work well. Native threads carry a bigger initialization cost and memory overhead than green threads, so JRuby normally cannot support more than about 10,000 threads.
$ jruby -e '100_000.times { Thread.new { sleep 1 } }'
ThreadError: unable to create new native thread
To work around this, we can use a thread pool. Using JRuby's Java integration, we can easily access the built-in Executor classes:
java_import java.util.concurrent.Executors
def send_email(message, executor)
  executor.submit do
    # send the message
    puts "sent #{@count.incrementAndGet} emails"
  end
end
executor = Executors.newCachedThreadPool
send_email("hello", executor)
executor = Executors.newFixedThreadPool(2)
10.times do
  send_email("hello", executor)
end
Here we're using two thread pools. The first, the "cached" thread pool, is a general-purpose pool that grows as needed by demand and frees up system resources by releasing threads after they have been idle. The second example uses a fixed pool of two threads for when you want to place a hard limit on the amount of background processing.
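To show what a fixed pool is doing conceptually, here is a minimal sketch in plain Ruby that runs on any implementation. `TinyPool` is a hypothetical class written for this illustration, not part of JRuby or of java.util.concurrent, and it leaves out error handling that a real pool would need.

```ruby
# A tiny fixed-size worker pool built on Ruby's thread-safe Queue.
class TinyPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; :stop ends the worker.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  # Queue a block for execution by whichever worker is free next.
  def submit(&job)
    @jobs << job
  end

  # Ask every worker to stop after the queued jobs, then wait for them.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

sent = Queue.new
pool = TinyPool.new(2)
10.times { |i| pool.submit { sent << "email #{i}" } }
pool.shutdown
puts "sent #{sent.size} emails" # prints "sent 10 emails"
```

No matter how many jobs are submitted, only two threads ever exist, which is the hard limit the fixed pool provides.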
Java's java.util.concurrent package has a number of useful utilities like these for concurrent programming including locks, semaphores, latches, queues, concurrent lists and maps, and atomic objects such as the AtomicInteger used above. And they're all trivially available to you via JRuby.
Concurrency with Actors
The shift in thinking around concurrent programming in recent years has been around the development of higher-level abstractions. This arose out of the realization that lower-level coding with fine-grained locks is hard: it's error-prone, makes code less readable and maintainable, and is difficult to troubleshoot. The upside of this is that we get to leave the hard stuff to the library programmers who create the implementations of these abstractions.
Of all the higher level ways of doing concurrent programming, the actor model has become preferred in recent years coincident with the rise in popularity of Erlang where the actor model has been proven to work well.
Ruby has had a number of Actor frameworks for some time, including a recent entry, Celluloid. (Be sure to watch Celluloid's creator Tony Arcieri in a screencast for EMRubyConf.) While these all work great on JRuby, again I'd like to focus on two Java libraries that are just as accessible from JRuby but go above and beyond what is currently possible with the pure Ruby libraries.
Jetlang/Jretlang
Jetlang isn't quite a full actor library, but instead claims to be "designed specifically for high performance in-memory messaging". (Jretlang is the JRuby wrapper around Jetlang). The main primitives in Jetlang are Fibers and Channels. Here's a "Hello World" example:
require 'java'
require 'jretlang'
fiber = JRL::Fiber.new
channel = JRL::Channel.new
fiber.start
channel.subscribe_on_fiber(fiber) do |msg|
  puts msg
end
channel.publish "Hello"
If you want a toolkit for building a message-passing framework in your application, give Jretlang a look.
Akka
Akka is a platform and toolkit for concurrent, scalable, and fault-tolerant systems. It has many features inspired by Erlang, including many flavors of actors and fault-tolerant supervisor hierarchies. Here's the simplest-possible Akka-in-Ruby example:
require 'akka'
class HelloWorld
  def hi
    puts "hello actor world"
  end
end
Actors.actorOf(HelloWorld.new).hi
puts "initiating shutdown..."
Actors.delayedShutdown 1
When run, this example prints:
$ ./runner.sh lib/everything_is_an_actor.rb
[...lots of akka logging...]
initiating shutdown...
hello actor world
Of note here is that we're creating an actor reference to the HelloWorld object and sending it the #hi message, but that does not call the method immediately. Instead, the message is routed through Akka and back to the object later.
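The idea of routing a message through a mailbox and delivering it later can be sketched in plain Ruby. The `ToyActor` and `Greeter` classes below are hypothetical and are not Akka's API. They only illustrate the general mechanism of a mailbox drained by a single thread.

```ruby
# A toy actor: messages go into a mailbox, and one dedicated thread
# delivers them to the target object later, in order.
class ToyActor
  def initialize(target)
    @target = target
    @mailbox = Queue.new
    @thread = Thread.new do
      while (message = @mailbox.pop) != :stop
        @target.public_send(*message)
      end
    end
  end

  # Returns immediately; the method itself runs on the actor's thread.
  def tell(name, *args)
    @mailbox << [name, *args]
    nil
  end

  # Deliver everything already queued, then shut the actor down.
  def stop
    @mailbox << :stop
    @thread.join
  end
end

class Greeter
  attr_reader :greetings

  def initialize
    @greetings = []
  end

  def hi(name)
    @greetings << "hello #{name}"
  end
end

greeter = Greeter.new
actor = ToyActor.new(greeter)
actor.tell(:hi, "actor world")
actor.stop
puts greeter.greetings.inspect # prints ["hello actor world"]
```

Because only the actor's own thread ever calls methods on the target, the target's state needs no locks, as long as nothing else reaches into it while the actor is running.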
JRuby Makes Concurrency Easy
Once again, the long and short of the concurrency story on JRuby is buttressed by the ease with which you can access the best of both Ruby and Java libraries. Go forth and glue together concurrency-heavy applications with JRuby, and please share them with us!
Concurrency with Actors, Goroutines & Ruby »
Created at: 02.12.2010 19:56, source: igvita.com, tagged: ruby agent concurrency go
The world of concurrent computation is a complicated one. We have to think about the hardware, the runtime, and even choose between half a dozen different models and primitives: fork/wait, threads, shared memory, message passing, semaphores, and transactions just to name a few. Hence, not surprisingly, when Bruce Tate asked Matz, in an interview for his recent book (Seven Languages in Seven Weeks) for a feature that he would like to change in Ruby if he could go back in time, the answer was telling: “I would remove the thread and add actors or some other more advanced concurrency features”.
Process Calculi & Advanced Concurrency
It is easy to read a lot into Matz's statement, but the follow-up question is: more advanced concurrency features? Thousands of programmers are graduating each year after being taught about threads, mutexes and semaphores – that is how we do concurrency! Is there a crucial lecture we all missed on the “advanced concurrency” topic?
The answer is, probably not, to a large extent due to the "success" of the shared-memory model. Process calculi is the formal name for the study of many related approaches of modeling the behavior of concurrent systems, which provides many alternatives: CCS, CSP, ACP, and Actor models just to name a few. However, few of those acronyms have found their way into our lexicon, which is somewhat surprising given that most of them have their roots dating back to the early 1970's - hardly new stuff, but also rarely mentioned until recently.
Actors, CSP and Pi-calculus
The actor concurrency model, which is now gaining traction thanks to the recent success of languages such as Erlang and Scala, is a great example of an “alternative concurrency model” that is worth exploring. However, it is also not the only one. Google’s Go also brought back a related, but also a very different model: a mix of Tony Hoare's CSP and pi-calculus. On the surface, all very different languages, but also all based around a common core principle:
“Don’t communicate by sharing state; share state by communicating”
Let that sink in for a minute. Instead of protecting our data structures by locks, and then contending to acquire the lock, this model encourages us to explicitly pass state from process to process. This guarantees that only one process can have access to data at a given time and immediately eliminates an entire class of concurrency problems.
Actors, Goroutines, Channels & Ruby
So, given the similarity, what are the actual differences between languages such as Erlang and Go? Syntax and VM implementation details aside, in Erlang we communicate by giving each individual process a name - think mailbox. By contrast, in Go, all processes are anonymous and we name the "communication channel" instead - think UNIX pipes. A subtle difference, but one that leads to very different programming models and capabilities.

With some theory out of the way, let's take a closer look at Go's concurrency model. Install the Go runtime, or as it turns out, we can also emulate much of its model directly on top of our Ruby runtime (gem install agent):
c = Agent::Channel.new(name: 'incr', type: Integer)
go(c) do |c, i=0|
  loop { c << i+= 1 }
end
p c.receive # => 1
p c.receive # => 2
A simple generator example, but one that highlights many different features. First, we create a named (‘incr’), typed channel, over which our processes will communicate. Then, the interesting bit: we call the “go” function, which takes a channel and a Ruby codeblock.
Under the hood, our “go” function actually spawns a new thread to execute our code block (the loop statement) and returns immediately to call receive on the original channel. The channel, in turn, blocks on receive until the producer has generated a value! In other words, we have just written a multi-threaded producer-consumer, except that there is not a single thread or a mutex in sight! Also take a look at a more interesting multi-worker example, sieve of Eratosthenes, and the results of a non-scientific shootout between MRI and JRuby.
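For readers without the agent gem installed, the same producer-consumer shape can be approximated with Ruby's built-in SizedQueue. This is a stand-in chosen for illustration, not a description of how Agent is implemented, and a queue of size one is a small buffer rather than a truly unbuffered channel.

```ruby
# A queue that holds at most one value: the producer blocks on push
# until the consumer has taken the previous value.
channel = SizedQueue.new(1)

# The producer runs in its own thread and generates 1, 2, 3, ...
producer = Thread.new do
  i = 0
  loop { channel << (i += 1) }
end

# The consumer blocks on pop until a value is available.
first = channel.pop
second = channel.pop
puts first, second # prints 1, then 2

producer.kill # the generator is endless, so stop it explicitly
```

As in the Agent version, the consuming code contains no explicit locks: the blocking behavior of the queue does all the coordination.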
Adopting "Advanced Concurrency"
Shared memory, threads and locks have their place and purpose. In fact, if you look under the hood of Agent, or within the source of any runtime with an “alternative concurrency model”, you will undoubtedly find them there at work. So, the question is not whether threads need to exist, but rather, whether they actually make for the best high-level interface to write, test, and manage code that requires concurrency, regardless of runtime.
Actors, and CSP/pi-calculus models may appear complicated at first sight, but mostly so because they are unfamiliar. In fact, they are all remarkably simple, powerful, and reproducible within any runtime once you have a few examples under your belt. Boot into Erlang, give Go a try, or install Agent to prototype some ideas with Ruby. It will be time well spent.

