Hot patching inlined functions with x86_64 asm metaprogramming »

Created at: 10.12.2009 14:59, source: time to bleed by Joe Damato, tagged: debugging linux ruby systems x86 debug garbage collection GC memory profiling x86_64


If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter.

Disclaimer

The tricks, techniques, and ugly hacks in this article are PLATFORM SPECIFIC, DANGEROUS, and NOT PORTABLE.

This article will make reference to information in my previous article Rewrite your Ruby VM at runtime to hot patch useful features so be sure to check it out if you find yourself lost during this article.

Also, this might not qualify as metaprogramming in the traditional definition1, but this article will show how to generate assembly at runtime that works well with the particular instructions generated for a binary. In other words, the assembly is constructed based on data collected from the binary at runtime. When I explained this to Aman, he called it assembly metaprogramming.

TLDR

This article expands on a previous article by showing how to hook functions which are inlined by the compiler. This technique can be applied to other binaries, but the binary in question is Ruby Enterprise Edition 1.8.7. The use case is to build a memory profiler without requiring patches to the VM, but just a Ruby gem.

It’s on GitHub

The memory profiler is NOT DONE, yet. It will be soon. Stay tuned.

The code described here is incorporated into a Ruby Gem which can be found on github: http://github.com/ice799/memprof specifically at: http://github.com/ice799/memprof/blob/master/ext/memprof.c#L202-318

Overview of the plan of attack

The plan of attack is relatively straight forward:

  1. Find the inlined code.
  2. Overwrite part of it to redirect to a stub.
  3. Call out to a handler from the stub.
  4. Make sure the return path is sane.

As simple as this seems, implementing these steps is actually a bit tricky.

Finding pieces of inlined code

Before finding pieces of inlined code, let’s first examine the C code we want to hook. I’m going to be showing how to hook the inline function add_freelist.

The code for add_freelist is short:

static inline void
add_freelist(p)
    RVALUE *p;
{
    if (p->as.free.flags != 0)
        p->as.free.flags = 0;
    if (p->as.free.next != freelist)
        p->as.free.next = freelist;
    freelist = p;
}

There is one really important feature of this code which stands out almost immediately. freelist has (at least) compilation unit scope. This is awesome because freelist serves as a marker when searching for assembly instructions to overwrite. Since the freelist has compilation unit scope, it’ll live at some static memory location.

If we find writes to this static memory location, we find our inline function code.

Let’s take a look at the instructions generated from this C code (unrelated instructions snipped out):

  437f21:       48 c7 00 00 00 00 00    movq   $0x0,(%rax)
   . . . . .
  437f2c:       48 8b 05 65 de 2d 00    mov    0x2dde65(%rip),%rax  # 715d98 [freelist]
   . . . . .
  437f48:       48 89 05 49 de 2d 00    mov    %rax,0x2dde49(%rip)  # 715d98 [freelist]

The last instruction above updates freelist, it is the instruction generated for the C statement freelist = p;.

As you can see from the instruction, the destination is freelist. This makes it insanely easy to locate instances of this inline function. Just need to write a piece of C code which scans the binary image in memory, searching for mov instructions where the destination is freelist and I’ve found the inlined instances of add_freelist.

Why not insert a trampoline by overwriting that last mov instruction?

Overwriting with a jmp

The mov instruction above is 7 bytes wide. As long as the instruction we’re going to implant is 7 bytes or thinner, everything is good to go. Using a callq is out of the question because we can’t ensure the stack is 16-byte aligned as per the x86_64 ABI2. As it turns out, a jmp instruction that uses a 32bit displacement from the instruction pointer only requires 5 bytes. We’ll be able to implant the instruction that’s needed, and even have room to spare.

I created a struct to encapsulate this short 7 byte trampoline. 5 bytes for the jmp, 2 bytes for NOPs. Let’s take a look:

  struct tramp_inline tramp = {
    .jmp           = {'\xe9'},
    .displacement  = 0,
    .pad           = {'\x90', '\x90'},
  };

Let’s fill in the displacement later, after actually finding the instruction that’s going to get overwritten.

So, to find the instruction that’ll be overwritten, just look for a mov opcode and check that the destination is freelist:

    /* make sure it is a mov instruction */
    if (byte[1] == '\x89') {

      /* Read the REX byte to make sure it is a mov that we care about */
      if ( (byte[0] == '\x48') ||
          (byte[0] == '\x4c') ) {

        /* Grab the target of the mov. REMEMBER: in this case the target is
         * a 32bit displacment that gets added to RIP (where RIP is the adress of
         * the next instruction).
         */
        mov_target = *(uint32_t *)(byte + 3);

        /* Sanity check. Ensure that the displacement from freelist to the next
         * instruction matches the mov_target. If so, we know this mov is
         * updating freelist.
         */
        if ( (freelist - (void *)(byte+7) ) == mov_target) {

At this point we’ve definitely found a mov instruction with freelist as the destination. Let’s calculate the displacement to the stage 2 trampoline for our jmp instruction and write the instruction into memory.

/* Setup the stage 1 trampoline. Calculate the displacement to
 * the stage 2 trampoline from the next instruction.
 *
 * REMEMBER!!!! The next instruction will be NOP after our stage 1
 * trampoline is written. This is 5 bytes into the structure, even
 * though the original instruction we overwrote was 7 bytes.
 */
 tramp.displacement = (uint32_t)(destination - (void *)(byte+5));

/* Figure out what page the stage 1 tramp is gonna be written to, mark
 * it WRITE, write the trampoline in, and then remove WRITE permission.
 */
 aligned_addr = page_align(byte);
 mprotect(aligned_addr, (void *)byte - aligned_addr + 10,
               PROT_READ|PROT_WRITE|PROT_EXEC);
 memcpy(byte, &tramp, sizeof(struct tramp_inline));
 mprotect(aligned_addr, (void *)byte - aligned_addr + 10,
              PROT_READ|PROT_EXEC);

Cool, all that’s left is to build the stage 2 trampoline which will set everything up for the C level handler.

An assembly stub to set the stage for our C handler

So, what does the assembly need to do to call the C handler? Quite a bit actually so let’s map it out, step by step:

  1. Replicate the instruction which was overwritten so that the object is actually added to the freelist.
  2. Save the value of rdi register. This register is where the first argument to a function lives and will store the obj that was added to the freelist for the C handler to do analysis on.
  3. Load the object being added to the freelist into rdi
  4. Save the value of rbx so that we can use the register as an operand for an absolute indirect callq instruction.
  5. Save rbp and rsp to allow a way to undo the stack alignment later.
  6. Align the stack to a 16-byte boundary to comply with the x86_64 ABI.
  7. Move the address of the handler into rbx
  8. Call the handler through rbx.
  9. Restore rbp, rsp, rdi, rbx.
  10. Jump back to the instruction after the instruction which was overwritten.

To accomplish this let’s build out a structure with as much set up as possible and fill in the displacement fields later. This “base” struct looks like this:

  struct inline_tramp_tbl_entry inline_ent = {
    .rex     = {'\x48'},
    .mov     = {'\x89'},
    .src_reg = {'\x05'},
    .mov_displacement = 0,

    .frame = {
      .push_rdi = {'\x57'},
      .mov_rdi = {'\x48', '\x8b', '\x3d'},
      .rdi_source_displacement = 0,
      .push_rbx = {'\x53'},
      .push_rbp = {'\x55'},
      .save_rsp = {'\x48', '\x89', '\xe5'},
      .align_rsp = {'\x48', '\x83', '\xe4', '\xf0'},
      .mov = {'\x48', '\xbb'},
      .addr = error_tramp,
      .callq = {'\xff', '\xd3'},
      .leave = {'\xc9'},
      .rbx_restore = {'\x5b'},
      .rdi_restore = {'\x5f'},
    },

    .jmp  = {'\xe9'},
    .jmp_displacement = 0,
  };

So, what’s left to do:

  1. Copy the REX and source register bytes of the instruction which was overwritten to replicate it.
  2. Calculate the displacement to freelist to fully generate the overwritten mov.
  3. Calculate the displacement to freelist so that it can be stored in rdi as an argument to the C handler.
  4. Fill in the absolute address for the handler.
  5. Calculate the displacement to the instruction after the stage 1 trampoline in order to jmp back to resume execution as normal.

Doing that is relatively straight-forward. Let’s take a look at the C snippets that make this happen:

/* Before the stage 1 trampoline gets written, we need to generate
 * the code for the stage 2 trampoline. Let's copy over the REX byte
 * and the byte which mentions the source register into the stage 2
 * trampoline.
 */
inl_tramp_st2 = inline_tramp_table + entry;
inl_tramp_st2->rex[0] = byte[0];
inl_tramp_st2->src_reg[0] = byte[2];

. . . . . 

/* Finish setting up the stage 2 trampoline. */

/* calculate the displacement to freelist from the next instruction.
 *
 * This is used to replicate the original instruction we overwrote.
 */
inl_tramp_st2->mov_displacement = freelist - (void *)&(inl_tramp_st2->frame);

/* fill in the displacement to freelist from the next instruction.
 *
 * This is to arrange for the new value in freelist to be in %rdi, and as such
 * be the first argument to the C handler. As per the amd64 ABI.
 */
inl_tramp_st2->frame.rdi_source_displacement = freelist -
                                          (void *)&(inl_tramp_st2->frame.push_rbx);

/* jmp back to the instruction after stage 1 trampoline was inserted
 *
 * This can be 5 or 7, it doesn't matter. If its 5, we'll hit our 2
 * NOPS. If its 7, we'll land directly on the next instruction.
 */
inl_tramp_st2->jmp_displacement = (uint32_t)((void *)(byte + 7) -
                                         (void *)(inline_tramp_table + entry + 1));

/* write the address of our C level trampoline in to the structure */
inl_tramp_st2->frame.addr = freelist_tramp;

Awesome.

We’ve successfully patched the binary in memory, inserted an assembly stub which was generated at runtime, called a hook function, and ensured that execution can resume normally.

So, what’s the status on that memory profiler?

Almost done, stay tuned for more updates coming SOON.

Conclusion

  • Hackery like this is unmaintainable, unstable, stupid, but also fun to work on and think about.
  • Being able to hook add_freelist like this provides the last tool needed to implement a version of bleak_house (a Ruby memory profiler) without patching the Ruby VM.
  • x86_64 instruction set is a painful instruction set.
  • Use the GNU assembler (gas) instead of trying to generate opcodes by reading the Intel instruction set PDFs if you value your sanity.

Thanks for reading and don’t forget to subscribe (via RSS or e-mail) and follow me on twitter.

References

  1. http://en.wikipedia.org/wiki/Metaprogramming
  2. x86_64 ABI


more »

10 steps to setup a new VPS (Ubuntu 9.10) »

Created at: 30.11.2009 04:10, source: Hackido, tagged: linux ubuntu

If you use a Virtual Private Server running Ubuntu, there are a number of security related things you want to do before you start deploying production code to it. Here are a list of things I do to every fresh slice. I've tried this on both Linode and Slicehost, but instructions might be different on your VPS depending on what software is installed by default and what repositories are active when your slice first boots.


1. Assuming you only have root access to start, first thing is to add yourself as a user.

adduser myusername


2. Next give yourself sudo privileges by running visudo and adding an entry below the one for root:
myusername   ALL=(ALL) ALL


3. Assuming you already have a public/private key set up, go ahead and upload your public key to the server. From your local machine type this:
scp ~/.ssh/id_rsa.pub myusername@123.456.678.9:/home/myusername/


4. Login to the remote server and move the .pub file into your local .ssh directory
mkdir /home/myusername/.ssh
mv /home/myusername/id_rsa.pub /home/myusername/.ssh/authorized_keys


5. Set the permissions correctly on the key:
chown -R myusername:myusername /home/myusername/.ssh
chmod 700 /home/myusername/.ssh
chmod 600 /home/myusername/.ssh/authorized_keys


6. Next, let's edit the /etc/ssh/sshd_config file and disable some defaults. Here are the adjustments you should make:
PermitRootLogin no
PasswordAuthentication no
X11Forwarding no
UsePAM no
UseDNS no
AllowUsers myusername


A couple of notes here. First, if you are a little more advanced, change the port from the default 22 to something else. Say 28000. Next, if you have more than one user, be sure you separate the usernames with a space rather than a comma.

7. Now let's restart ssh.
/etc/init.d/sshd restart


Open a new terminal window and login to your slice using your regular user account. Don't exit your old shell just yet until you're sure you can ssh in and access sudo without a problem. Done? Good, from here on out I'll assume you are a regular user. Let's proceed..

8. Get some required software. As a test, let's just install a couple of packages we'll want:
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install build-essential
sudo apt-get install git-core


9. Let's get a firewall going. We'll use IPTables here. First, create a test file on your server with the rules you'll need. Let's save it as /etc/iptables.test.rules. In your scenario if you've chosen a port other than 22 for ssh be sure to adjust it below otherwise you'll be very sad..
*filter


# Allows all loopback (lo0) traffic and drop all traffic to 127/8 that doesn't use lo0
-A INPUT -i lo -j ACCEPT
-A INPUT -i ! lo -d 127.0.0.0/8 -j REJECT


# Accepts all established inbound connections
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT


# Allows all outbound traffic
# You can modify this to only allow certain traffic
-A OUTPUT -j ACCEPT


# Allows HTTP and HTTPS connections from anywhere (the normal ports for websites)
-A INPUT -p tcp --dport 80 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT


# Allows SSH connections
#
# THE -dport NUMBER IS THE SAME ONE YOU SET UP IN THE SSHD_CONFIG FILE
#
-A INPUT -p tcp -m state --state NEW --dport 22 -j ACCEPT


# Allow ping
-A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT


# log iptables denied calls
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7


# Reject all other inbound - default deny unless explicitly allowed policy
-A INPUT -j REJECT
-A FORWARD -j REJECT

COMMIT


Once we've got the rules set, let's apply them to the server:
sudo iptables-restore < /etc/iptables.test.rules 
sudo iptables-save > /etc/iptables.up.rules


Edit your /etc/network/interfaces and add this line below the loopback and above the main Ethernet. This will make sure we bring up the firewall on every reboot.

pre-up iptables-restore < /etc/iptables.up.rules


10. Let's get your locale setup next:
sudo locale-gen en_US.UTF-8
sudo /usr/sbin/update-locale LANG=en_US.UTF-8


With that done you're ready to go installing new software whether it's a webserver, mail server, git repo, or something completely different. It's all yours for the taking from here on out. Special thanks to the folks at Slicehost for their great tutorials


more »

Force Uninstall a package in Debian/Ubuntu »

Created at: 28.11.2009 20:16, source: Hackido, tagged: linux debian ubuntu

I just did an upgrade from Jaunty to Karmic and for some reason was having a problem with xulrunner-1.9 (why was this even installed on my server in the first place?). Anyway, I couldn't uninstall it using aptitude or apt-get.. and google didn't really help me out. Fortunately I remembered a post I did on my old blog related to lighttpd and that technique helped again here.


First, here's the error I was getting when trying to remove xulrunner-1.9:

The following packages will be REMOVED:
xulrunner-1.9
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
1 not fully installed or removed.
After this operation, 26.9MB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 45200 files and directories currently installed.)
Removing xulrunner-1.9 ...
update-alternatives: error: no alternatives for xulrunner.
update-alternatives: error: no alternatives for xulrunner.
dpkg: error processing xulrunner-1.9 (--remove):
subprocess installed pre-removal script returned error exit status 2
/usr/lib/xulrunner-1.9.0.15/xulrunner-bin: error while loading shared libraries: libplds4.so: cannot open shared object file: No such file or directory
dpkg: error while cleaning up:
subprocess installed post-installation script returned error exit status 127
Errors were encountered while processing:
xulrunner-1.9
E: Sub-process /usr/bin/dpkg returned an error code (1)


Ugly! The solution is to head over to /var/lib/dpkg/info and find the package in question. There should be a file with .prerm at the end of it. In the case of xulrunner-1.9 the file is called xulrunner-1.9.prerm. Edit the file and change it's contents so it just says:

#!/bin/sh
set -e


After that the standard command will work:

sudo apt-get remove xulrunner-1.9


more »

Rewrite your Ruby VM at runtime to hot patch useful features »

Created at: 23.11.2009 14:59, source: time to bleed by Joe Damato, tagged: bugfix debugging linux monitoring python ruby systems testing x86 allocator debug garbage collection GC memory x86_64


If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter.

Some notes before the blood starts flowin’

  • CAUTION: What you are about to read is dangerous, non-portable, and (in most cases) stupid.
  • The code and article below refer only to the x86_64 architecture.
  • Grab some gauze. This is going to get ugly.

TLDR

This article shows off a Ruby gem which has the power to overwrite a Ruby binary in memory while it is running to allow your code to execute in place of internal VM functions. This is useful if you’d like to hook all object allocation functions to build a memory profiler.

This gem is on GitHub

Yes, it’s on GitHub: http://github.com/ice799/memprof.

I want a memory profiler for Ruby

This whole science experiment started during RubyConf when Aman and I began brainstorming ways to build a memory profiling tool for Ruby.

The big problem in our minds was that for most tools we’d have to include patches to the Ruby VM. That process is long and somewhat difficult, so I started thinking about ways to do this without modifying the Ruby source code itself.

The memory profiler is NOT DONE just yet. I thought that the hack I wrote to let us build something without modifying Ruby source code was interesting enough that it warranted a blog post. So let’s get rolling.

What is a trampoline?

Let’s pretend you have 2 functions: functionA() and functionB(). Let’s assume that functionA() calls functionB().

Now also imagine that you’d like to insert a piece of code to execute in between the call to functionB(). You can imagine inserting a piece of code that diverts execution elsewhere, creating a flow: functionA() –> functionC() –> functionB()

You can accomplish this by inserting a trampoline.

A trampoline is a piece of code that program execution jumps into and then bounces out of and on to somewhere else1.

This hack relies on the use of multiple trampolines. We’ll see why shortly.

Two different kinds of trampolines

There are two different kinds of trampolines that I considered while writing this hack, let’s take a closer look at both.

Caller-side trampoline

A caller-side trampoline works by overwriting the opcodes in the .text segment of the program in the calling function causing it to call a different function at runtime.

The big pros of this method are:

  • You aren’t overwriting any code, only the address operand of a callq instruction.
  • Since you are only changing an operand, you can hook any function. You don’t need to build custom trampolines for each function.

This method also has some big cons too:

  • You’ll need to scan the entire binary in memory and find and overwrite all address operands of callq. This is problematic because if you overwrite any false-positives you might break your application.
  • You have to deal with the implications of callq, which can be painful as we’ll see soon.

Callee-side trampoline

A callee-side trampoline works by overwriting the opcodes in the .text segment of the program in the called function, causing it to call another function immediately

The big pro of this method is:

  • You only need to overwrite code in one place and don’t need to worry about accidentally scribbling on bytes that you didn’t mean to.

this method has some big cons too:

  • You’ll need to carefully construct your trampoline code to only overwrite as little of the function as possible (or some how restore opcodes), especially if you expect the original function to work as expected later.
  • You’ll need to special case each trampoline you build for different optimization levels of the binary you are hooking into.

I went with a caller-side trampoline because I wanted to ensure that I can hook any function and not have to worry about different Ruby binaries causing problems when they are compiled with different optimization levels.

The stage 1 trampoline

To insert my trampolines I needed to insert some binary into the process and then overwrite callq instructions like this:

  41150b:       e8 cc 4e 02 00         callq  4363dc [rb_newobj]
  411510:       48 89 45 f8             ....

In the above code snippet, the byte e8 is the callq opcode and the bytes cc 4e 02 00 are the distance to rb_newobj from the address of the next instruction, 0×411510

All I need to do is change the 4 bytes following e8 to equal the displacement between the next instruction, 0×411510 in this case, and my trampoline.

Problem.

My first cut at this code lead me to an important realization: the callq instructions used expect a 32bit displacement from the function I am calling and not absolute addresses. But, the 64bit address space is very large. The displacement between the code for the Ruby binary that lives in the .text segment is so far away from my Ruby gem that the displacement cannot be represented with only 32bits.

So what now?

Well, luckily mmap has a flag MAP_32BIT which maps a page in the first 2GB of the address space. If I map some code there, it should be well within the range of values whose displacement I can represent in 32bits.

So, why not map a second trampoline to that page which can contains code that can call an absolute address?

My stage 1 trampoline code looks something like this:

  /* the struct below is just a sequence of bytes which represent the
    *  following bit of assembly code, including 3 nops for padding:
    *
    *  mov $address, %rbx
    *  callq *%rbx
    *  ret
    *  nop
    *  nop
    *  nop
    */
  struct tramp_tbl_entry ent = {
    .mov = {'\x48','\xbb'},
    .addr = (long long)&error_tramp,
    .callq = {'\xff','\xd3'},
    .ret = '\xc3',
    .pad =  {'\x90','\x90','\x90'},
  };

  tramp_table = mmap(NULL, 4096, PROT_WRITE|PROT_READ|PROT_EXEC,
                                   MAP_32BIT|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
  if (tramp_table != MAP_FAILED) {
    for (; i < 4096/sizeof(struct tramp_tbl_entry); i ++ ) {
      memcpy(tramp_table + i, &ent, sizeof(struct tramp_tbl_entry));
    }
  }
}

It mmaps a single page and writes a table of default trampolines (like a jump table) that all call an error trampoline by default. When a new trampoline is inserted, I just go to that entry in the table and insert the address that should be called.

To get around the displacement challenge described above, the addresses I insert into the stage 1 trampoline table are addresses for stage 2 trampolines.

The stage 2 trampoline

Setting up the stage 2 trampolines are pretty simple once the stage 1 trampoline table has been written to memory. All that needs to be done is update the address field in a free stage 1 trampoline to be the address of my stage 2 trampoline. These trampolines are written in C and live in my Ruby gem.

static void
insert_tramp(char *trampee, void *tramp) {
  void *trampee_addr = find_symbol(trampee);
  int entry = tramp_size;
  tramp_table[tramp_size].addr = (long long)tramp;
  tramp_size++;
  update_image(entry, trampee_addr);
}

An example of a stage 2 trampoline for rb_newobj might be:

static VALUE
newobj_tramp() {
  /* print the ruby source and line number where the allocation is occuring */
  printf("source = %s, line = %d\n", ruby_sourcefile, ruby_sourceline);

  /* call newobj like normal so the ruby app can continue */
  return rb_newobj();
}

Programatically rewriting the Ruby binary in memory

Overwriting the Ruby binary to cause my stage 1 trampolines to get hit is pretty simple, too. I can just scan the .text segment of the binary looking for bytes which look like callq instructions. Then, I can sanity check by reading the next 4 bytes which should be the displacement to the original function. Doing that sanity check should prevent false positives.

static void
update_image(int entry, void *trampee_addr) {
  char *byte = text_segment;
  size_t count = 0;
  int fn_addr = 0;
  void *aligned_addr = NULL;

 /* check each byte in the .text segment */
  for(; count < text_segment_len; count++) {

    /* if it looks like a callq instruction... */
    if (*byte == '\xe8') {

      /* the next 4 bytes SHOULD BE the original displacement */
      fn_addr = *(int *)(byte+1);

      /* do a sanity check to make sure the next few bytes are an accurate displacement.
        * this helps to eliminate false positives.
        */
      if (trampee_addr - (void *)(byte+5) == fn_addr) {
        aligned_addr = (void*)(((long)byte+1)&~(0xffff));

        /* mark the page in the .text segment as writable so it can be modified */
        mprotect(aligned_addr, (void *)byte+1 - aligned_addr + 10,
                       PROT_READ|PROT_WRITE|PROT_EXEC);

        /* calculate the new displacement and write it */
        *(int  *)(byte+1) = (uint32_t)((void *)(tramp_table + entry)
                                     - (void *)(byte + 5));

        /* disallow writing to this page of the .text segment again  */
        mprotect(aligned_addr, (((void *)byte+1) - aligned_addr) + 10,
                      PROT_READ|PROT_EXEC);
      }
    }
    byte++;
  }
}

Sample output

After requiring my ruby gem and running a test script which creates lots of objects, I see this output:

...
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
...

Showing the file name and line number for each object getting allocated. That should be a strong enough primitive to build a Ruby memory profiler without requiring end users to build a custom version of Ruby. It should also be possible to re-implement bleak_house by using this gem (and maybe another trick or two).

Awesome.

Conclusion

  • One step closer to building a memory profiler without requiring end users to find and use patches floating around the internet.
  • It is unclear whether cheap tricks like this are useful or harmful, but they are fun to write.
  • If you understand how your system works at an intimate level, nearly anything is possible. The work required to make it happen might be difficult though.

Thanks for reading and don't forget to subscribe (via RSS or e-mail) and follow me on twitter.

References

  1. http://en.wikipedia.org/wiki/Trampoline_(computers)


more »