GCC optimization flag makes your 64bit binary fatter and slower »
Created at: 20.07.2010 15:59, source: time to bleed by Joe Damato, tagged: debugging linux systems testing x86

If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter.
The intention of this post is to highlight a subtle GCC optimization bug that leads to slower and larger code being generated than would have been generated without the optimization flag.
UPDATED: Graphs are now 0 based on the y axis. Links in the tidbits section (below conclusion) for my ugly test harness and terminal session of the build of the test case in the bug report, objdump, and corresponding system information.
Hold the #gccfail tweets, son.
Everyone fucks up. The point of this post is not to rag on GCC. If writing a C compiler was easy then every asshole with a keyboard would write one for fun.
WARNING: THERE IS MATH, SCIENCE, AND GRAPHS BELOW.
Watch yourself.
The original bug report for -fomit-frame-pointer.
I stumbled across a bug report for GCC that was very interesting. It points out a very subtle bug that occurs when the -fomit-frame-pointer flag is passed to GCC. The bug report is for 32bit code, however after some testing I found that this bug also rears its head in 64bit code.
What is -fomit-frame-pointer supposed to do?
The -fomit-frame-pointer flag is intended to direct GCC to avoid saving and restoring the frame pointer (%ebp or %rbp). This is supposed to make function calls faster, since the function is doing less work each invocation. It should also make function code take fewer bytes since there are fewer instructions being executed.
A caveat of using -fomit-frame-pointer is that it may make debugging impossible on certain systems. To combat this on Linux, .debug_frame and .eh_frame sections are added to ELF binaries to assist in the stack unwinding process when the frame pointer is omitted.
What is the bug?
The bug is that when -fomit-frame-pointer is used, GCC erroneously uses the frame pointer register as a general purpose register when a different register could be used instead.
wat.
The amd64 and i386 ABIs1 2 specify a list of caller and callee saved registers.
- The frame pointer register is callee saved. That means that if a function is going to use the frame pointer register, it must save and restore the value in the register.
- The test case provided in the bug report shows that other caller saved registers were available for use.
- Had the function used a caller saved register instead, there would be no need for the additional save and restore instructions in the function.
- Removing those instructions would take fewer bytes and execute faster.
What are the consequences?
Let’s take a look at two potential pieces of code.
The first piece is the code that would be generated if -fomit-frame-pointer is not used:
test1:
pushq %rbp ; save frame pointer
movq %rsp,%rbp ; update frame pointer to the current stack pointer
; here is where your function would do work
leave ; restore the stack pointer and frame pointer
ret ; return
Size: 6 bytes.
The above assembly sequence uses the frame pointer.
Let’s take a look at the code that is generated by GCC when -fomit-frame-pointer is used:
sub $0x8, %rsp ; make room on the stack
movq %rbp, (%rsp) ; store rbp on the stack
; here is where your function would modify and use %rbp as needed
movq (%rsp), %rbp ; restore %rbp
add $0x8, %rsp ; get rid of the extra stack space
ret ; return
Size: 17 bytes.
The above assembly sequence is what is generated when GCC decides to use the frame pointer register as a general purpose register. Since it is callee saved, it must be saved before being modified and restored after being modified.
So -fomit-frame-pointer makes your binary fatter, but does it make it slower?
Only one way to find out: do science.
I built a simple (and very ugly) testing harness to test the above pieces of code to determine which piece of code is faster. Before we get into the benchmark results, I want to tell you why my benchmark is bullshit.
Yes, bullshit.
You see, it makes me sad when people post benchmarks and neglect to tell others why their benchmark may be inaccurate. So, lemme start the trend.
This benchmark is useless because:
- Reading the CPU cycle counter is unreliable (more on this below the conclusion). I also tracked wall clock time, too.
- I don’t have the ideal test environment. I ran this on bare metal hardware, and set the CPU affinity to keep the process pinned to a single CPU… BUT
- I could have done better if I had pinned
initto CPU0 (thereby forcing all children of init to be pinned to CPU0 – remember child processes inherit the affinity mask). I would have then had an entire CPU for nothing but my benchmark. - I could have done better if I forced the CPU running my benchmark program to not handle any IRQs.
- I only tested one version of GCC: (Debian 4.3.2-1.1) 4.3.2
- I could have taken more samples.
You can find more testing harness tidbits below the conclusion.
Benchmark Results
test 1 — Code sequence simulating using the frame pointer.
test 2 — Code sequence simulating using the frame pointer as a general purpose register.
64bit results
Using -fomit-frame-pointer is SLOWER (contrary to what you’d expect) than not using it!
| cycles test 1 | cycles test 2 | microsecs test 1 | microsecs test 2 | |
| mean | 3514422987.92 | 4559685515.66 | 1882707.27 | 2442663.94 |
| median | 3507007423.5 | 4562511684.5 | 1878721.5 | 2444171.5 |
| max | 3922780211 | 4672066854 | 2101457 | 2502869 |
| min | 3502194976 | 4327782795 | 1876113 | 2318452 |
| std dev | 31927179.5632 | 15449507.8196 | 17103.7755 | 8275.49788 |
| variance | 1.02E+15 | 238687291867021 | 292539135.936 | 68483865.11835 |
32bit results
Using -fomit-frame-pointer is FASTER (as it should be) than not using it! The binary is still fatter, though.
| cycles test 1 | cycles test 2 | microsecs test 1 | microsecs test 2 | |
| mean | 3502932799.49 | 3491263364.89 | 1876553.08 | 1870301.35 |
| median | 3501486586.5 | 3492013955.5 | 1875778 | 1870702.5 |
| max | 3905163528 | 3731985243 | 2092032 | 1999259 |
| min | 3500916510 | 3408834436 | 1875472 | 1826144 |
| std dev | 10066939.1113 | 7992367.6913 | 5393.0412 | 4281.5466 |
| variance | 101343263071403 | 63877941312996.4 | 29084893.2588 | 18331640.9459 |
Conclusion
- GCC is a really complex piece of software; this bug is very subtle and may have existed for a while.
- I’ve said this a few times, but knowing and understanding your system’s ABI is crucial for catching bugs like these.
- Math and science are cool now, much like computers. You should use both.
Thanks for reading and don’t forget to subscribe (via RSS or e-mail) and follow me on twitter.
Testing harness tidbits
Each run of the benchmark executes either test1 or test2 (from above) 500,000,000 times. I did around 2500 runs for each test function.
You can get the testing harness, a build script, and a test script here: http://gist.github.com/483524
You can look at the terminal session where I build the test from the original bug report on my system: http://gist.github.com/483494
The code I used to read the CPU cycle counter looks like this:
static __inline__ unsigned long long rdtsc(void)
{
unsigned long hi = 0, lo = 0;
__asm__ __volatile__ ("lfence\n\trdtsc" : "=a"(lo), "=d"(hi));
return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
The lfence instruction is a serializing instruction that ensures that all load instructions which were issued before the lfence instruction have been executed before proceeding. I did this to make sure that the cycle counter was being read after all operations in the test functions were executed.
The values returned by this function are misleading because CPU frequency may be scaled at any time. This is why I also measured wall clock time.
References
more »
Because Gem Names Are Like Domains in the 90's »
Created at: 30.03.2010 18:00, source: RailsTips - Home, tagged: gems testing
One of my favorite parts of every new gem is naming it. The other day, when I was trying to name joint, it occurred to me that I should always check if a gem name is available before I create my project. I did a quick search on RubyGems and discovered it was available.
Last night, I decided I should whip together a tiny gem that allows you to issue a whois command to see if a gem name is taken. Why leave the command line, eh?
Installation
gem install gemwhois
This adds the whois command to gem. Which means usage is pretty fun.
Usage
$ gem whois httparty
gem name: httparty
owners: John Nunemaker, Sandro Turriate
info: Makes http fun! Also, makes consuming restful web services dead easy.
version: 0.5.2
downloads: 40714
$ gem whois somenonexistantgem
Gem not found. It will be mine. Oh yes. It will be mine. *sinister laugh*
If the gem is found, you will see some details about the project (maybe you can convince them to hand over rights if they are squatting). If the gem is not found, you will receive a creepy message in the same vein as the RubyGems 404 page.
The Fun Parts
The fun part of this gem was recently I noticed that other gems have been adding commands to the gem command. I thought that was interesting so I did a bit of research. I knew that both gemedit and gemcutter added commands so I downloaded both from Github and began to peruse the source. Turns out it is quite easy.
First, you have to have a rubygems_plugin.rb file in your gems lib directory. This is mostly ripped from gemcutter:
if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.3.6')
require File.join(File.dirname(__FILE__), 'gemwhois')
end
Next, you have to create a command. At the time of this post, here is the entirety of the whois command:
require 'rubygems/gemcutter_utilities'
class Gem::Commands::WhoisCommand < Gem::Command
include Gem::GemcutterUtilities
def description
'Perform a whois lookup based on a gem name so you can see if it is available or not'
end
def arguments
"GEM name of gem"
end
def usage
"#{program_name} GEM"
end
def initialize
super 'whois', description
end
def execute
whois get_one_gem_name
end
def whois(gem_name)
response = rubygems_api_request(:get, "api/v1/gems/#{gem_name}.json") do |request|
request.set_form_data("gem_name" => gem_name)
end
with_response(response) do |resp|
json = Crack::JSON.parse(resp.body)
puts <<-STR.unindent
gem name: #{json['name']}
owners: #{json['authors']}
info: #{json['info']}
version: #{json['version']}
downloads: #{json['downloads']}
STR
end
end
def with_response(resp)
case resp
when Net::HTTPSuccess
block_given? ? yield(resp) : say(resp.body)
else
if resp.body == 'This rubygem could not be found.'
puts '','Gem not found. It will be mine. Oh yes. It will be mine. *sinister laugh*',''
else
say resp.body
end
end
end
end
The important part is inheriting from Gem::Command. Be sure to require 'rubygems/command_manager' at some point as well. Once you have the rubygems_plugin file and a command created, you simple register the command:
Gem::CommandManager.instance.register_command(:whois)
The comments and code in RubyGems itself is pretty helpful if you are curious about what you can do.
Testing
The trickier part was testing the command. Obviously, building the gem from gemspec and installing over and over does not a happy tester make. I did a bit of research and found the following testing output helpers and the unindent gem:
module Helpers
module Output
def assert_output(expected, &block)
keep_stdout do |stdout|
block.call
if expected.is_a?(Regexp)
assert_match expected, stdout.string
else
assert_equal expected.to_s, stdout.string
end
end
end
def keep_stdout(&block)
begin
orig_stream, $stdout = $stdout, StringIO.new
block.call($stdout)
ensure
s, $stdout = $stdout.string, orig_stream
s
end
end
end
end
With these little helpers, it was quite easy to setup the command and run it in an automated way:
require 'helper'
class TestGemwhois < Test::Unit::TestCase
context 'Whois for found gem' do
setup do
@gem = 'httparty'
stub_gem(@gem)
@command = Gem::Commands::WhoisCommand.new
@command.handle_options([@gem])
end
should "work" do
output = <<-STR.unindent
gem name: httparty
owners: John Nunemaker, Sandro Turriate
info: Makes http fun! Also, makes consuming restful web services dead easy.
version: 0.5.2
downloads: 40707
STR
assert_output(output) { @command.execute }
end
end
context "Whois for missing gem" do
setup do
@gem = 'missing'
stub_gem(@gem, :status => ["404", "Not Found"])
@command = Gem::Commands::WhoisCommand.new
@command.handle_options([@gem])
end
should "work" do
output = <<-STR.unindent
Gem not found. It will be mine. Oh yes. It will be mine. *sinister laugh*
STR
assert_output(output) { @command.execute }
end
end
end
The only other piece of the puzzle was using FakeWeb to stub the http responses for the found and missing gems. You can see more on that in the test helper file.
Conclusion
At any rate, the gem is pretty tiny and possibly useless to others, but it was fun. Gave me a chance to play around with testing STDOUT and creating RubyGem commands. Plus, now I know if the gem name I want is available in just a few keystrokes.
more »
Double Shot #663 »
Created at: 08.03.2010 13:55, source: A Fresh Cup, tagged: Double Shot git heroku highlight ruby testing
I'll just be over here in the corner, trying to write some code.
- Public Beta: Deployment Stacks: You now have a choice of Debian and Ruby versions on Heroku.
- Smart HTTP Transport - Interesting, setting up your own git server might get easier now.
- Smack a Ho.st - Easier testing for Rails apps with subdomains.
- Ruby's Metaprogramming Toolbox - An introduction.
- Highlight - Syntax highlighting for Rails projects via a plugin that wraps pygments.
- Stubbing and Setting Expectations on HTTP Requests Is Now Easy With WebMock - Looks like a potential alternative to fakeweb with a bit more spice.
- identity-map - Simple Identity Mapper implementation for Active Record.
more »
Easy rcov with Test::Unit and Rails 2 »
Created at: 01.02.2010 14:23, source: A Fresh Cup, tagged: Rails Articles rails rcov testing
Yesterday, I needed test coverage numbers for an application using Rails 2.3 and plain old Test::Unit (still my test framework of choice). This proved to be a good deal more difficult than I expected: there are so many forks of the original rcov code, and so much disparate advice on how to get it hooked up, that I spent a great deal of unprofitable time reading before I got the right combination. Hopefully this short list will save you from doing the same:
- Install the relevance-rcov gem. This is the stable and maintained fork.
- Install the commondream rcov_plugin. This has the best set of working rake tasks I've found.
- Add coverage/* and test/coverage/* to your .gitignore.
- Run rake test:coverage to generate coverage results.
- Open /coverage/index.html in your project to view the results.
more »
Double Shot #619 »
Created at: 05.01.2010 14:40, source: A Fresh Cup, tagged: Double Shot cromwell firebug rubinius sphinx stencil testing
Looks like the next RailsBridge BugMash is going to focus on Rails 3, just before the beta. This could be interesting.
- Lemon - Yet another ruby unit test framework, this one focused on making sure there's a test case for every method.
- Reincarnate - How to make a ruby class inherit from itself.
- Sphinx Monitoring Plugin - The latest nice touch from the Scout guys.
- Cromwell - Wrapper for ruby signal handling to allow you to easily create unkillable scripts.
- Stencil - Free-text (ie, not assuming HTML) templating system.
- Rubinius 1.0.0-RC2 Released - Ruby in Ruby moves along.
- Firebug 1.6a1 - The next branch of Firebug (for Firefox 3.6/3.7) has gone live.
more »
