Garbage Collection and the Ruby Heap (from railsconf) »
Created at: 08.06.2010 19:38, source: time to bleed by Joe Damato, tagged: debugging ruby scaling systems x86 debug garbage collection GC linux ltrace memory performance profiling x86_64
Download as PDF (15mb)
Garbage Collection and the Ruby Heap
Descent into Darkness: Understanding your system’s binary interface is the only way out »
Created at: 15.03.2010 21:11, source: time to bleed by Joe Damato, tagged: bugfix debugging linux ruby scaling systems x86 debug garbage collection GC memory performance syscall x86_64
Download as PDF (3mb)
Descent into Darkness: Understanding your system’s binary interface is the only way out.
Garbage Collection Slides from LA Ruby Conference »
Created at: 21.02.2010 00:03, source: time to bleed by Joe Damato, tagged: bugfix debugging ruby debug garbage collection GC memory performance profiling
Garbage Collection and the Ruby Heap
String together global offset tables to build a Ruby memory profiler »
Created at: 25.01.2010 14:59, source: time to bleed by Joe Damato, tagged: debugging linux ruby systems x86 debug memory profiling x86_64

Disclaimer
The tricks, techniques, and ugly hacks in this article are PLATFORM SPECIFIC, DANGEROUS, and NOT PORTABLE.
This is the third article in a series describing a set of low-level hacks that I used to create memprof, a Ruby-level memory profiler. You should be able to survive without reading the other articles in this series, but they are worth checking out.
How is this different from the other hooking articles/techniques?
The previous articles explained how to insert trampolines in the .text segment of a binary. This article explains a cool technique for hooking functions in the .text segment of shared libraries, allowing your handler to run, and then resuming execution. Hooking shared libraries turns out to be less work than hooking the binary (in the case of Ruby, that is), but making it all happen was a bit tricky. Read on to learn more.
The “problem” with shared libraries
The problem is that if a trampoline is inserted into the code of the shared library, the trampoline will need to invoke the dynamic linker to resolve the function that is being hooked, call the function, do whatever additional logic is desired, and then resume execution.
In other words you need to (somehow) insert a trampoline for a function that will call the function being trampolined without ending up in an infinite loop.
The additional complexity occurs because when shared libraries are loaded, the kernel decides at runtime where exactly in memory the library should be loaded. Since the exact location of symbols is not known at link time, a procedure linkage table (.plt) is created so that the program and the dynamic linker can work together to resolve symbol addresses.
I explained how .plts work in a previous article, but looking at this again is worthwhile. I’ve simplified the explanation a bit, but at a high level:
- Program calls a function in a shared object; the link editor makes sure that the program jumps to a stub function in the .plt.
- The program sets some data up for the dynamic linker and then hands control over to it.
- The dynamic linker looks at the info set up by the program and fills in the absolute address of the function that was called in the .plt in the global offset table (.got).
- Then the dynamic linker calls the function.
- Subsequent calls to the same function jump to the same stub in the .plt, but every time after the first call the absolute address is already in the .got (because when the dynamic linker is invoked the first time, it fills in the absolute address in the .got).
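The lazy binding steps above can be modeled in a few lines of Ruby. This is only a toy: the names ToyObject, resolve and call are made up for illustration, and a Ruby lambda stands in for an absolute address. It shows the one property that matters here, which is that the resolver runs on the first call only.

```ruby
# Toy model of lazy binding through a .plt stub and a .got slot.
# ToyObject, resolve and call are hypothetical names, not real linker APIs.
class ToyObject
  attr_reader :got, :resolver_calls

  def initialize(symbols)
    @symbols = symbols      # name => callable, stands in for the library's code
    @got = {}               # name => resolved "address" (here, the callable)
    @resolver_calls = 0
  end

  # Stands in for the dynamic linker: look the symbol up, fill in the .got.
  def resolve(name)
    @resolver_calls += 1
    @got[name] = @symbols.fetch(name)
  end

  # Stands in for the .plt stub: jump through the .got slot if it is filled
  # in, otherwise hand control to the resolver first.
  def call(name, *args)
    target = @got[name] || resolve(name)
    target.call(*args)
  end
end

obj = ToyObject.new({ rb_newobj: -> { :new_object } })
obj.call(:rb_newobj)   # first call goes through the resolver
obj.call(:rb_newobj)   # second call jumps straight through the .got
puts obj.resolver_calls
```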
Disassembling a short Ruby VM function that calls rb_newobj (a memory allocation routine that we’d like to hook), shows the calls to the .plt:
000000000001af10:
  . . . .
  1af14: e8 e7 c6 ff ff       callq 17600 [rb_newobj@plt]
  . . . .
Let’s take a look at the corresponding .plt stub:
0000000000017600:
  17600: ff 25 6a 9c 2c 00    jmpq *0x2c9c6a(%rip)   # 2e1270 [_GLOBAL_OFFSET_TABLE_+0x288]
  17606: 68 4e 00 00 00       pushq $0x4e
  1760b: e9 00 fb ff ff       jmpq 17110 <_init+0x18>
Important fact: The program and each shared library have their own .plt and .got sections (amongst other sections). Keep this in mind as it’ll be handy very shortly.
That is a lot of stub code to reproduce in the trampoline. Reproducing that stuff in the trampoline shouldn’t be hard, but invites a large number of bugs over to play. Is there a better way?
What is a global offset table (.got)?
The global offset table (.got) is a table of absolute addresses that can be filled in at runtime. In the assembly dump above, the .got entry for rb_newobj is referenced in the .plt stub code.
Intercepting a function call
It would be awesome if it were possible to overwrite the .got entry for rb_newobj and insert the address of a trampoline. But how would the intercepting function call rb_newobj itself without ending up in an infinite loop?
The important fact above comes in to save the day.
Since each shared object has its own .plt and .got sections, it is possible to overwrite the .got entry for rb_newobj in every shared object except for the object where the trampoline lives. Then, when rb_newobj is called, the .plt entry will redirect execution to the trampoline. The trampoline then calls out to its .plt entry for rb_newobj which is left untouched allowing rb_newobj to be resolved and called out to successfully.
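The reason this avoids the infinite loop can be sketched with a toy model in Ruby. Two hashes stand in for the .got sections of two shared objects, and lambdas stand in for addresses. All of the names (ruby_got, profiler_got, trampoline) are invented for the sketch. The real thing operates on raw memory, not Ruby hashes.

```ruby
# Toy model: each "shared object" has its own .got, so poisoning one
# leaves the other untouched. All names here are hypothetical.
real_rb_newobj = -> { :new_object }

# One .got per object, both already resolved to the real function.
ruby_got     = { rb_newobj: real_rb_newobj }
profiler_got = { rb_newobj: real_rb_newobj }

allocations = 0

# The trampoline lives in the profiler object, so it calls through the
# profiler's own, unmodified .got entry.
trampoline = lambda do
  allocations += 1
  profiler_got[:rb_newobj].call
end

# Poison only the other object's .got entry.
ruby_got[:rb_newobj] = trampoline

result = ruby_got[:rb_newobj].call   # the VM "calls rb_newobj" via its .got
puts result.inspect
puts allocations
```

Had the trampoline called through ruby_got instead, it would have called itself forever, which is exactly the loop the per-object .got avoids.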
Not as easy as it sounds, though
This solution is less work than the other hooking methods, but it has its own particular details as well:
- You’ll need to walk the link map at runtime to determine the base address for the shared library you are hooking (it could be anywhere).
- Next, you’ll need to parse the .rela.plt section, which contains information on the location of each .plt stub, relative to the base address of the shared object.
- Once you have the address of the .plt stub, you’ll need to determine the absolute address of the .got entry by parsing the first instruction of the .plt stub (a jmp) as seen in the disassembly above.
- Finally, you can write to the .got entry the address of your trampoline, as long as the trampoline lives in a different shared library.
You’ve now successfully managed to poison the .got entry of a symbol in one shared library to direct execution to your own function which can then call the intercepted function itself without getting stuck in an infinite loop.
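The arithmetic behind the third step (parsing the jmp at the start of the .plt stub) can be checked against the disassembly shown earlier. The sketch below is in Ruby purely for illustration, and plt_stub_got_address is a hypothetical helper name. It works on the object-relative addresses from the dump. At runtime the base address found by walking the link map would have to be added.

```ruby
# Decode the `jmpq *disp32(%rip)` at the start of a .plt stub and compute
# the address of the .got entry it jumps through.
# plt_stub_got_address is a hypothetical helper name.
def plt_stub_got_address(stub_address, stub_bytes)
  # Opcode byte, ModRM byte, then a signed 32-bit little-endian displacement.
  opcode, modrm, disp = stub_bytes.unpack("CCl<")
  unless opcode == 0xff && modrm == 0x25
    raise ArgumentError, "not a RIP-relative indirect jmp"
  end
  # RIP-relative displacements are measured from the end of the
  # instruction, which is 6 bytes long.
  stub_address + 6 + disp
end

# Bytes of the first instruction of the rb_newobj stub from the disassembly.
stub = [0xff, 0x25, 0x6a, 0x9c, 0x2c, 0x00].pack("C*")
got_entry = plt_stub_got_address(0x17600, stub)
puts format("0x%x", got_entry)   # 0x2e1270, the address in the objdump comment
```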
Conclusion
- There are lots of sections in each ELF object. Each section is special and important.
- ELF documentation can be difficult to obtain and understand.
- Got pretty lucky this time around. I was getting a little worried that it would get complicated. Made it out alive, though.
What is a ruby object? (introducing Memprof.dump) »
Created at: 14.12.2009 14:59, source: time to bleed by Joe Damato, tagged: debugging ruby debug garbage collection GC memory profiling

The initial Memprof release only offered a simple stats api, inspired by the one in bleak_house:
require 'memprof'
Memprof.start
o = Object.new
Memprof.stats
1 test.rb:3:Object
With the help of lloyd’s excellent yajl json library, I’ve slowly been building a full-featured heap dumper: Memprof.dump.
require 'memprof'
Memprof.start
[]
Memprof.dump
[
{
"address": "0xea52f0",
"source": "test.rb:3",
"type": "array",
"length": 0
}
]
Where can I find it?
This new heap dumper will be in the next release of Memprof. If you want to play with it, check out the heap_dump branch on github.
What else is planned?
Over the next few days, I’m going to add a Memprof.dump_all method to dump out the entire ruby heap. This full dump will contain complete knowledge of the ruby object graph (what objects point to other objects), and its json format will allow for easy analysis. I’m envisioning a set of post-processing tools that can find leaks, calculate object memory usage, and generate various visualizations of memory consumption and object hierarchies.
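As a rough idea of what such post-processing could look like, here is a small sketch using only Ruby's json standard library. It assumes the full dump would be a JSON array of objects shaped like the Memprof.dump examples in this post. That is an assumption, since dump_all does not exist yet. The sample data is hand-written from the examples further down.

```ruby
require 'json'

# Hand-written sample in the shape of the dumps shown in this post.
# The real dump_all format is an assumption here.
dump = JSON.parse(<<~DUMP)
  [
    { "address": "0x1823dd8", "type": "object", "class_name": "Object",
      "ivars": { "@pi": "0x1823da0" } },
    { "address": "0x1823da0", "type": "float", "data": 3.14159 }
  ]
DUMP

# Count objects by type.
counts = dump.group_by { |obj| obj["type"] }.transform_values(&:size)

# Collect edges of the object graph by following ivars that hold the
# address of another object in the dump.
addresses = dump.map { |obj| obj["address"] }
edges = dump.flat_map do |obj|
  (obj["ivars"] || {}).values
    .select { |value| addresses.include?(value) }
    .map { |value| [obj["address"], value] }
end

p counts
p edges
```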
Why should I care?
In building and testing Memprof.dump, I’ve learned a lot about different types of ruby objects. The rest of this post covers interesting details about common ruby objects, with examples of how they’re created and what they look like inside the MRI VM.
Objects and Floats
o = Object.new
o.instance_variable_set(:@pi, 3+0.14159)
{
"address": "0x1823dd8",
"source": "test.rb:3",
"type": "object",
"class": "0x1854b38",
"class_name": "Object",
"ivars": {
"@pi": "0x1823da0"
}
}
This ruby object points to its class (Object 0x1854b38) and has some instance variables. Here there is only one, named @pi, which points to another object at 0x1823da0.
The address 0x1823da0 belongs to a float object. This float was created on the heap when MRI executed the code 3 + 0.14159.
{
"address": "0x1823da0",
"source": "test.rb:4",
"type": "float",
"data": 3.14159
}
The float 0.14159 used in the addition also lives on the heap, but it is created upfront once when the ruby source is parsed.
Strings
Unlike floats, new string objects are created every time ruby encounters a string in its execution path.
1.times{"abc"}
{
"type": "string",
"shared": "0x15136a0",
"flags": ["elts_shared"]
}
This newly created string object has no character data associated with it; instead, it is marked elts_shared and points to 0x15136a0. In this case, 0x15136a0 is another string object, one that holds the actual data “abc” and was created earlier when the ruby source was parsed.
Arrays and Fixnums
[1,2,3,"hello"]
{
"type": "array",
"length": 4,
"data": [
1,
2,
3,
"0x12aa0c0"
]
}
The fixnums 1, 2 and 3 in the array are immediates, so they live in the array itself and do not occupy slots on the ruby heap [1]. The fourth member is the string object “hello” that lives at 0x12aa0c0.
Hashes and Symbols
{:a=>1,"b"=>:c}
{
"type": "hash",
"length": 2,
"default": null,
"data": {
"0xd13378": ":c",
":a": 1
}
}
The symbols :a and :c are also immediates, so they live directly inside the hash’s data table. The key for “b” is a pointer to that string object at 0xd13378.
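The difference between immediates and heap objects is also visible from plain Ruby, without Memprof. The sketch below assumes a current MRI. String.new is used only to force two separate heap allocations.

```ruby
# Small integers and literal symbols are immediates: equal values are
# always the very same object.
same_integer = 1.equal?(1)
same_symbol  = :a.equal?(:a)

# In MRI a small integer n is encoded as 2n + 1, which shows up in object_id.
integer_id = 1.object_id

# Heap objects are different: two separately allocated strings with the
# same contents are distinct objects.
same_string = String.new("b").equal?(String.new("b"))

p same_integer   # true
p same_symbol    # true
p integer_id     # 3
p same_string    # false
```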
Blocks and Data
Hashes can also be created with a default block.
Hash.new{|h,k| h[k] = k; h }
{
"type": "hash",
"length": 0,
"default": "0xcca208"
},
{
"address": "0xcca208",
"type": "data",
"class": "0xcced80",
"class_name": "Proc"
}
In this case, the block is converted to a new Proc data object that holds a reference to an internal struct BLOCK [2]. The new hash’s default field points to the address of the Proc.
Data objects are commonly created by C extensions to point to external memory that needs to be marked and freed using ruby’s garbage collector.
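The Proc held by the hash can also be reached from Ruby through Hash#default_proc. A minimal sketch (the block here is a simplified variant of the one above):

```ruby
h = Hash.new { |hash, key| hash[key] = key }

# The block is stored on the hash as a Proc object.
p h.default_proc.class   # Proc

# The Proc runs on a missing key and, here, stores the key as its own value.
p h[:missing]            # :missing
p h.length               # 1
```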
Classes
A simple class definition creates many objects on the heap.
class MyClass; end
First is the class itself, along with the class’s string representation (pointed to by an internal ivar __classpath__). Notice the class object holds a reference to its superclass.
{
"address":"0x29f3228",
"type": "class",
"name": "MyClass",
"super": "0x2a23b28",
"super_name": "Object",
"ivars": {
"__classpath__": "0x29f31b8"
}
},
{
"address": "0x29f31b8",
"type": "string",
"length": 7,
"data": "MyClass",
}
The class definition also creates two more objects: an internal CREF node, and another singleton class with no name that is __attached__ to MyClass.
{
"type": "node",
"node_type": "CREF",
},
{
"type": "class",
"name": null,
"super": "0x2a23a80",
"super_name": null,
"singleton": true,
"ivars": {
"__attached__": "0x29f3228"
}
}
This singleton is MyClass’s metaclass, where singleton methods and instance variables are added.
MyClass.instance_variable_set(:@a, 123)
{
"type": "class",
"name": null,
"singleton": true,
"ivars": {
"__attached__": "0x29f3228",
"@a": 123
}
}
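The singleton-method half of this can be observed from Ruby itself. The sketch below uses Module#singleton_class?, which is available on newer Rubies than the one dumped above. The method name build is made up for the example.

```ruby
class MyClass; end

# A singleton method on MyClass (build is a hypothetical name).
def MyClass.build
  new
end

meta = MyClass.singleton_class

p meta.singleton_class?                        # true
# The method lives in the metaclass's method table, not in MyClass's own.
p meta.instance_methods(false).include?(:build)
p MyClass.instance_methods(false).include?(:build)
```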
Constants, Class and Instance Variables
Classes store both constants and class variables along with the instance variables.
class MyClass
  A=1
  @@b=2
  @c=3
end
{
"type": "class",
"name": "MyClass",
"ivars": {
"@@b": 2,
"A": 1,
"@c": 3
}
}
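Although MRI keeps all three in one table internally, Ruby exposes them through three separate reflection methods. A quick sketch with the same class:

```ruby
class MyClass
  A = 1
  @@b = 2
  @c = 3
end

# Passing false limits the lookup to MyClass itself (no inherited entries).
p MyClass.constants(false)        # [:A]
p MyClass.class_variables(false)  # [:@@b]
p MyClass.instance_variables      # [:@c]
```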
Methods
Methods are stored in a separate method table and represented by METHOD node objects which hold the method body.
class MyClass
  def d() end
end
{
"type": "class",
"name": "MyClass",
"methods": {
"d": "0xb7ec30"
}
},
{
"address": "0xb7ec30",
"type": "node",
"node_type": "METHOD",
}
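From the Ruby side, the entries of that method table can be listed and looked up with standard reflection calls. This does not expose the METHOD node itself, only a handle to the method:

```ruby
class MyClass
  def d() end
end

# Methods defined directly on MyClass (the contents of its method table).
p MyClass.instance_methods(false)   # [:d]

# A handle to the method, not yet bound to any instance.
m = MyClass.instance_method(:d)
p m.class                           # UnboundMethod
p m.owner                           # MyClass
```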
Method Invocation
def test()
  a=1
  b=:b
  c='c'
  Memprof.dump # [3]
end
test()
{
"type": "scope",
"node": "0xa9bdd0",
"variables": {
"_": null,
"~": null,
"a": 1,
"b": ":b",
"c": "0xb60ce8"
}
}
During method invocation, a new scope object is created on the heap. This scope points to the node object representing the method body, and has a list of all local variables.
The local variables include the perl-style ruby magic variables $_ and $~.
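A similar view of a method's locals is available in plain Ruby by capturing a Binding while the method is running. The sketch below uses a hypothetical method name, locals_demo, and the Binding API of current Rubies. The magic variables are not listed by local_variables.

```ruby
def locals_demo
  a = 1
  b = :b
  c = 'c'
  # Capture the scope while the method is still running; the binding keeps
  # the locals reachable after the method returns.
  binding
end

scope = locals_demo
p scope.local_variables          # [:a, :b, :c]
p scope.local_variable_get(:c)   # "c"
```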
Modules and IClasses
Modules in ruby are similar to classes and have the same associated strings and CREF nodes created with them.
module MyModule; end
{
"address": "0xe82248",
"type": "module",
"name": "MyModule",
"super": false,
"ivars": {
"__classpath__": "0x208eda8",
"__classid__": ":MyModule"
}
}
When a module is included into a class, an extra iclass object is created:
class MyClass
  include MyModule
end
{
"address": "0x208ecc8",
"source": "-e:1",
"type": "iclass",
"super": "0x20bfb40",
"super_name": "Object",
"ivars": {
"__classpath__": "0x208eda8",
"__classid__": ":MyModule"
}
}
This new iclass points to MyClass’s old superclass, and shares its instance variable and method tables with MyModule. Once created, this iclass becomes MyClass’s new superclass.
{
"type": "class",
"name": "MyClass",
"super": "0x208ecc8",
"super_name": "MyModule",
}
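The iclass is internal to the VM, so Ruby-level reflection hides it. Its effect is still visible in the ancestor chain, as this short sketch shows:

```ruby
module MyModule; end
class MyClass; end

before = MyClass.ancestors.first(2)   # [MyClass, Object]

class MyClass
  include MyModule
end

# MyModule now sits between MyClass and Object in the lookup chain...
after = MyClass.ancestors.first(3)    # [MyClass, MyModule, Object]

p before
p after
# ...but #superclass skips the internal iclass and still reports Object.
p MyClass.superclass
```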
and more..
Ruby has various other internal object types, including Regexps, Matches, Bignums, Structs, Files, Varmaps, and almost 130 different types of Nodes. Memprof will eventually be able to dump out all these objects in individual detail.
[1] Fixnums can, however, still have instance variables.
[2] Future versions of memprof will print out struct BLOCKs in more detail, to show all references held by ruby procs.
[3] Memprof.dump was called in the method body, because the scope is freed explicitly when the method ends (unless it is referenced by a block).
