Iteration Shouldn’t Spin Your Wheels!
Created at: 27.01.2010 20:00, source: Engine Yard Blog, tagged: Technology newsletter
This article was originally included in the September issue of the Engine Yard Newsletter. To read more posts like this one, subscribe to the Engine Yard Newsletter.
In this series, Evan Phoenix, Rubinius creator and Ruby expert, presents tips and tricks to help you improve your knowledge of Ruby.
Ruby is a rich language that believes there should be more than one way to express yourself—the many ways of counting and iterating are no exception.
Most Ruby programmers are familiar with the most common one, Integer#times:
100.times { |i| p i }
Integer#times counts from 0 up to 99, yielding the current number to the block. This is a simple, expressive way to execute some code a number of times.
But sometimes you want to start counting at a number other than 0. No problem, use Integer#upto:
10.upto(20) { |i| p i }
This prints out 10, 11, 12, and so on up to 20. It increments by 1, and you'll notice it is inclusive, meaning that in this case we yield 11 items, not 10.
Going up is nice, but sometimes you need to go down, so use #upto's sister, Integer#downto:
20.downto(10) { |i| p i }
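To check exactly which values these three methods yield, here is a small sketch that collects them into arrays instead of printing them. It relies on the fact that, without a block, times, upto and downto return an Enumerator (Ruby 1.8.7 and later), so #to_a works on the result:

```ruby
# Collect the yielded values instead of printing them.
# Without a block, each method returns an Enumerator, so #to_a works.
times_values  = 100.times.to_a      # 0, 1, ..., 99
upto_values   = 10.upto(20).to_a    # 10, 11, ..., 20 (end included)
downto_values = 20.downto(10).to_a  # 20, 19, ..., 10 (end included)

p times_values.minmax       # prints [0, 99]
p upto_values.size          # prints 11
p downto_values.first(3)    # prints [20, 19, 18]
```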
If you need a little more control over your iteration, you can use Range#step:
(10..20).step(2) { |i| p i }
This will print 10, 12, 14, 16, 18, 20.
Now, in this case, we've introduced a Range, which most Ruby programmers are familiar with. It is basically an object that expresses a beginning and an end, in this case 10 and 20. Range has another trick up its sleeve:
(10...20).step(2) { |i| p i }
You'll notice the 3 dots instead of 2. This indicates that the range excludes its end rather than including it: 20 is the terminator, but it is not itself one of the valid values, so this prints 10, 12, 14, 16, and 18.
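A small sketch that makes the inclusive/exclusive difference visible by collecting the stepped values rather than printing them. It assumes Ruby 1.8.7 or later, where Range#step without a block returns an enumerable object:

```ruby
# Two dots include the end; three dots exclude it.
inclusive = (10..20).step(2).to_a
exclusive = (10...20).step(2).to_a

p inclusive                  # prints [10, 12, 14, 16, 18, 20]
p exclusive                  # prints [10, 12, 14, 16, 18]
p((10...20).include?(20))    # prints false
```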
Range also supports #each:
(10..20).each { |i| p i }
This works exactly the same as Integer#upto. I personally prefer Integer#upto, because I feel it expresses the operation better.
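A short sketch confirming the equivalence by collecting both sequences into arrays and comparing them:

```ruby
# Collect what each form yields.
via_range = []
(10..20).each { |i| via_range << i }

via_upto = []
10.upto(20) { |i| via_upto << i }

p via_range == via_upto  # prints true
```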
Another domain is counting while iterating over a collection. Before 1.8.7 and 1.9, there was pretty much only one method to help you do that: Array#each_with_index.
[:foo, :bar, :baz].each_with_index { |sym, index| p [sym, index] }
This prints out [:foo, 0], [:bar, 1], and [:baz, 2].
This is nice, but it's pretty limiting, because the only place you get that index is in simple iteration. Say you wanted to map the Array and take each element's position into account; you'd have to do:
ary = [1, 3, 5]
i = 0
ary.map { |element| x = element * i; i += 1; x }
It's kind of messy just to take the position into account. So in 1.8.7 and 1.9, Enumerator support was baked into most iteration methods, which makes this much simpler!
ary = [1, 3, 5]
ary.map.with_index { |element, index| element * index }
For those who haven't seen Enumerators yet, you may be asking, "Hey! Where did the block to map go?" Well, there isn't one. Array#map, when passed no block, returns an Enumerator object. This object, when you call #each on it, calls the original method on the original object and passes the block along. To begin with, this provides external iteration, but it also gives Ruby a place to add iteration-altering methods, such as Enumerator#with_index. Now you never need to use a while loop again!
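A runnable sketch comparing the manual-counter version with the Enumerator version, followed by the external iteration mentioned above. The offset argument to with_index is an assumption about current Ruby versions, beyond the 1.8.7/1.9 scope of this tip:

```ruby
ary = [1, 3, 5]

# Manual counter, the pre-1.8.7 approach
i = 0
manual = ary.map { |element| x = element * i; i += 1; x }

# map without a block returns an Enumerator; with_index supplies the position
indexed = ary.map.with_index { |element, index| element * index }

p manual   # prints [0, 3, 10]
p indexed  # prints [0, 3, 10]

# External iteration: drive the same kind of Enumerator with #next
enum = ary.map
p enum.next  # prints 1
p enum.next  # prints 3
p enum.next  # prints 5
begin
  enum.next
rescue StopIteration
  puts "no more elements"
end

# In current Ruby, with_index also accepts a starting offset
labels = %w[a b c].map.with_index(1) { |letter, n| "#{n}. #{letter}" }
p labels   # prints ["1. a", "2. b", "3. c"]
```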
See you next time!
Render Options in Rails 3
Created at: 18.01.2010 21:30, source: Engine Yard Blog, tagged: Technology ActionController json newsletter rails Rails 3
This article was originally included in the October issue of the Engine Yard Newsletter. To read more posts like this one, subscribe to the Engine Yard Newsletter.
In Inside Rails, Yehuda Katz, Rails expert and core team member, and Carl Lerche, Rails expert and full-time contributor, present expert advice and insight on the Rails platform and Rails development.
In previous versions of Rails, adding a new rendering option to Rails required performing an alias_method_chain on the render method, adding your new options, and hoping they didn't conflict with any of the other code in the rendering pipeline.
Rails 3 makes rendering options a first-class citizen, and internally uses the same new system that plugin authors are expected to use. Before we get into how you can use this feature yourself, let's take a look at how Rails uses it:
ActionController.add_renderer :json do |json, options|
  json = ActiveSupport::JSON.encode(json) unless json.respond_to?(:to_str)
  json = "#{options[:callback]}(#{json})" unless options[:callback].blank?
  self.content_type ||= Mime::JSON
  self.response_body = json
end
Here, we are creating a new render :json option, which behaves exactly like render :json in Rails 2.3. In the render_json method, we use the same lower-level (but still public) APIs that are used by Rails itself to set the MIME type and response body. Using a render option entirely skips the rest of the render pipeline, so you don't have to worry about blocking normal template selection and rendering, or any other internal changes that might break your added option.
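The core idea, a table of named renderers that render consults before it ever reaches template lookup, can be modeled in plain Ruby. This is a hypothetical, heavily simplified sketch: MiniController, its RENDERERS table and its render method are made up for illustration and are not Rails' implementation. Only the JSON-with-callback logic mirrors the renderer above, using the json standard library in place of ActiveSupport:

```ruby
require "json"

# Hypothetical model of a renderer registry; not Rails' code.
class MiniController
  RENDERERS = {}

  # Register a block under a key, similar in spirit to add_renderer.
  def self.add_renderer(key, &block)
    RENDERERS[key] = block
  end

  attr_accessor :content_type, :response_body

  # If an option key has a registered renderer, run it with the controller
  # as self and skip the (omitted) template pipeline entirely.
  def render(options)
    key = RENDERERS.keys.find { |k| options.key?(k) }
    if key
      instance_exec(options[key], options, &RENDERERS[key])
    else
      self.response_body = "<template rendering would happen here>"
    end
  end
end

MiniController.add_renderer :json do |obj, options|
  json = obj.respond_to?(:to_str) ? obj.to_str : JSON.generate(obj)
  json = "#{options[:callback]}(#{json})" if options[:callback]
  self.content_type ||= "application/json"
  self.response_body = json
end

controller = MiniController.new
controller.render(:json => { "ok" => true }, :callback => "cb")
puts controller.response_body  # prints cb({"ok":true})
puts controller.content_type   # prints application/json
```

Because a matching option short-circuits render, the renderer never competes with template selection, which is the property the paragraph above describes.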
Next, let's take a look at adding a new render :pdf option. We'll use JRuby and the Flying Saucer library, which takes HTML and CSS and converts it to a PDF file. The syntax for the new option will be render :pdf => "template_name", :css => %w(main.css print.css), :layout => "print".
In order to understand how to do this, let's first take a look at the code necessary to use Flying Saucer in JRuby. First, you'll need to grab the Flying Saucer jars (you can get them from my Muse git repo). Next, let's write the code necessary to convert HTML and CSS to PDF with Flying Saucer:
# JRuby's Java integration, plus StringIO for the in-memory output
require "java"
require "stringio"

# Both of these are .jar files
require "/path/to/itext"
require "/path/to/core-renderer"

module PdfUtils
  def self.string_to_pdf(input_string)
    io = StringIO.new
    renderer = org.xhtmlrenderer.pdf.ITextRenderer.new
    renderer.set_document_from_string_input input_string
    renderer.layout
    renderer.create_pdf(io.to_outputstream)
    io.string # the PDF file in a String
  end
end
It's a bit verbose, but it does work. Now that we have a way to make the PDF, let's wire it up into a render option.
ActionController.add_renderer :pdf do |template, options|
  css_files = Array.wrap(options.delete(:css))
  css = css_files.map { |file| File.read(file) }.join("\n")

  # Reuse the render semantics to get a string from the
  # template and options
  string = render_to_string template, options

  # Drop in the CSS before the </head> in a style tag.
  # In practice, you would probably cache the file reads
  # and the merging of the CSS and HTML.
  string.gsub!(%r{</head>}, "<style>#{css}</style></head>")

  send_data PdfUtils.string_to_pdf(string), :type => Mime::PDF
end
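The string manipulation in the middle of that renderer, splicing the CSS into a style tag just before the closing head tag, is plain Ruby and easy to check on its own. A minimal sketch, assuming the rendered HTML contains a single </head> tag (inject_css is a hypothetical helper name):

```ruby
# Hypothetical helper: insert CSS in a <style> tag right before </head>.
def inject_css(html, css)
  # sub touches only the first </head>; the block form keeps backslashes
  # in the CSS from being read as replacement-string escapes.
  html.sub(%r{</head>}i) { "<style>#{css}</style></head>" }
end

html = "<html><head><title>Post</title></head><body>Hi</body></html>"
css  = ["h1 { color: red; }", "p { margin: 0; }"].join("\n")

puts inject_css(html, css)
```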
You'd also need to register the PDF mime type:
Mime::Type.register "application/pdf", :pdf
You could now do the following in a controller:
class PostsController < ActionController::Base
  def show
    @post = Post.find(params[:id])
    respond_to do |format|
      format.html
      format.pdf # or format.pdf { render :pdf => "show", :css => %w(application print) }
    end
  end
end
The tricky bits of this process are now limited to figuring out how to build and return the output, not how to inject your option into render. Pretty cool, no?
Getting Started with Nokogiri
Created at: 14.01.2010 20:00, source: Engine Yard Blog, tagged: Technology css html Nokogiri ParseTree XML XPath
Nokogiri is a library for dealing with XML and HTML documents. I wrote Nokogiri along with my (more attractive) partner in crime, Mike Dalessio. We both use and enjoy working with Nokogiri for dealing with HTML and XML on a daily basis, and I’d like to share it with you! In this post, we’ll be covering:
- Getting Nokogiri installed
- Basic document parsing
- Basic data extraction
Hopefully, by the end of this article, you will be able to use and enjoy Nokogiri on a day-to-day basis too!
Installation
Nokogiri is actually a wrapper around libxml2, Daniel Veillard’s excellent HTML/XML parsing library written in C. Since Nokogiri simply wraps and builds upon this existing library, installing libxml2 is a prerequisite for installing Nokogiri. Fortunately, libxml2 has been ported to most systems, so the installation is pretty easy.
OS X
I recommend installing libxml2 on OS X from MacPorts. OS X ships with libxml2 installed, but the MacPorts version is more up to date.
To install libxml2 and libxslt from MacPorts:
$ sudo port install libxml2 libxslt
Then to install nokogiri:
$ sudo gem install nokogiri
And that should be it!
Linux
On Linux, we still need to install libxml2 (and libxslt). The command will change depending on the package manager and Linux distribution you’re using, but we’ll cover Fedora and Ubuntu here.
On Fedora:
$ sudo yum install libxml2-devel libxslt-devel
$ gem install nokogiri
On Ubuntu:
$ sudo apt-get install libxml2 libxml2-dev libxslt libxslt-dev
$ gem install nokogiri
Windows
Dealing with libxml2 on Windows is so much work that we built libxml2 for you and now ship it along with Nokogiri. On Windows, to install, simply run gem install nokogiri.
Oh Noes! Something Went Wrong!
Nokogiri ships with some basic intelligence for finding your installation of libxml2, but clever developers can easily fool it! If you have problems, first check that the libxml2 and libxslt development packages are installed. If everything seems OK, and Nokogiri still won’t install, send an email to the Nokogiri mailing list. We’re here to help!
Basic Parsing
Now that we have installation out of the way, it’s time to get Nokogiri to do some work for us. Nokogiri lets you parse an HTML or XML document using a few different strategies:
- DOM
- SAX
- Reader
- Pull
Each of these strategies has different advantages and disadvantages. We won’t go through all the differences in this post; the DOM interface is the most common, and generally regarded as the easiest to use, so that’s what we’ll focus on here.
There are two main entry points to Nokogiri depending on the kind of document you wish to parse: one for HTML documents and one for XML documents. Parsing HTML documents looks like this:
doc = Nokogiri::HTML(html_document)
Parsing XML documents looks like this:
doc = Nokogiri::XML(xml_document)
Both of these methods will take an IO object or a String object. Since both forms accept IO objects, we can even feed open-uri straight into Nokogiri like this:
doc = Nokogiri::HTML(open("http://www.google.com/search?q=doughnuts"))
Feeding Nokogiri an IO object is slightly more efficient than using a String, but you should choose the one that is most convenient.
Data Structures
To become data extraction Zen Masters, we first need to understand the data structure returned by Nokogiri. Notably, we need to understand that Nokogiri converts HTML and XML documents into a tree data structure.
For example, an HTML document that looks like this:
<html>
<head>
<title>Hello!</title>
</head>
<body id="uniq">
<h1>Hello World!</h1>
</body>
</html>
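…will be represented in memory as a tree. The html node is the root, and its children are head and body; title sits under head and h1 sits under body, with the text “Hello!” and “Hello World!” stored as text nodes beneath title and h1.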

Any data extraction technique is simply a way of traversing this in-memory tree. If we keep this structure in mind while trying to do data extraction, we can enter data extraction nirvana!
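To make the tree idea concrete, here is a sketch that builds the sample document as a tree of plain Ruby objects and walks it depth-first. TreeNode is a hypothetical stand-in written for illustration, not one of Nokogiri's classes, and text nodes are left out for brevity:

```ruby
# Hypothetical stand-in for a document tree node (not Nokogiri's API).
class TreeNode
  attr_reader :name, :attributes, :children
  attr_accessor :parent

  def initialize(name, attributes = {}, children = [])
    @name, @attributes, @children = name, attributes, children
    children.each { |child| child.parent = self }
  end

  # Depth-first, pre-order walk: visit a node, then each of its children.
  def each_node(depth = 0, &block)
    block.call(self, depth)
    children.each { |child| child.each_node(depth + 1, &block) }
  end
end

doc = TreeNode.new("html", {}, [
  TreeNode.new("head", {}, [TreeNode.new("title")]),
  TreeNode.new("body", { "id" => "uniq" }, [TreeNode.new("h1")])
])

# Prints the tag names indented by depth: html, head, title, body, h1
doc.each_node { |node, depth| puts "#{'  ' * depth}#{node.name}" }
```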
Data Extraction
We’ve seen how to turn an HTML or XML document into an in-memory tree. Now we’re going to try to do something useful with this tree: extract some data. Let’s take a look at a few different strategies for unlocking the data in our tree.
There are three different ways to traverse our in-memory tree. The first two, XPath and CSS, are small languages built specifically for tree traversal. The last one we’ll examine is the Nokogiri API for manual tree traversal.
Basic XPath
The XPath language was written to easily traverse an XML tree structure, but we can use it with HTML trees as well. Here’s a sample program for extracting search result links from a Google search. We’ll use XPath to find the data we want, and then pick apart the XPath syntax:
require 'open-uri'
require 'nokogiri'

doc = Nokogiri::HTML(open("http://www.google.com/search?q=doughnuts"))
doc.xpath('//h3/a').each do |node|
  puts node.text
end
The XPath used in this program is:
//h3/a
In English, this XPath says:
Find all “a” tags with a parent tag whose name is “h3”
Thus, our program finds all “a” tags with “h3” parents, loops over them, and prints out their text content.
XPath works like a directory structure where the leading “/” indicates the root of the tree. Slashes separate the tag matching information. When there’s nothing between two slashes (“//”), it acts as a wildcard for any number of intervening tags, so “//h3” matches an “h3” tag anywhere in the document. The “h3” and “a” are tag name matchers, and only match when the tag name matches.
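What //h3/a asks for can be spelled out by hand over a toy tree. This is a sketch only: the Node struct and the build, descendants and h3_links helpers are hypothetical names for illustration, not Nokogiri's API:

```ruby
# Hypothetical minimal node type for illustration; not Nokogiri's classes.
Node = Struct.new(:name, :attrs, :children, :parent)

# Build a node and point each child's parent back at it.
def build(name, attrs = {}, children = [])
  node = Node.new(name, attrs, children, nil)
  children.each { |child| child.parent = node }
  node
end

# All nodes below `node`, in document order.
def descendants(node)
  node.children.flat_map { |child| [child] + descendants(child) }
end

# Manual version of //h3/a: an "a" at any depth ("//")
# whose immediate parent ("/") is an "h3".
def h3_links(root)
  descendants(root).select do |n|
    n.name == "a" && n.parent && n.parent.name == "h3"
  end
end

doc = build("html", {}, [
  build("body", {}, [
    build("h3", { "class" => "r" }, [build("a", { "class" => "l" })]),
    build("div", {}, [build("a")]),                     # parent is a div
    build("h3", {}, [build("span", {}, [build("a")])])  # h3 is a grandparent
  ])
])

p h3_links(doc).size  # prints 1
```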
Finding tag names is great, but if you run the previous program, you might find that it returns more “a” tags than we actually want. We need to narrow down our search based on some attributes of the tags, specifically the “class” values. To match attribute values in XPath, we use brackets. Now let’s look at a couple of examples.
To match “h3” tags that have a class attribute, we write:
h3[@class]
To match “h3” tags whose class attribute is equal to the string “r”, we write:
h3[@class = "r"]
Using the attribute matching construct, we can modify our previous query to:
//h3[@class = "r"]/a[@class = "l"]
which in English terms is:
Find all “a” tags with a class attribute equal to “l” and an immediate parent tag “h3” that has a class attribute equal to “r”
If we substitute that XPath back in to our original program, we’ll get the expected results.
For more information on doing XPath queries, I recommend checking out the tutorial at w3schools as well as the W3C recommendation.
For more information on using XPath within Nokogiri, check out the Nokogiri tutorials as well as the RDoc.
Next, let’s look at CSS syntax.
Basic CSS
CSS is similar to XPath in that it’s another language for searching a tree data structure. In this section, we’ll perform the same task as the XPath section, but we’ll examine the CSS syntax.
CSS does not separate tag matching patterns by slashes, but rather by whitespace or “greater than” characters (actually, there are more, but we’re just going to talk about those two for now). Let’s rewrite our previous XPath as CSS and examine the syntax.
//h3/a
…can be written in CSS as:
h3 > a
The “>” character indicates that the “a” tag must be a direct child of the “h3” tag. Most CSS that I see uses space separators like this:
h3 a
Using a space indicates that there could be any number of tags (including none) between the “h3” tag and the “a” tag. The space is similar to “//” in XPath, and this CSS query could be written in XPath like this:
//h3//a
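The difference between “h3 > a” and “h3 a” comes down to whether the h3 must be the immediate parent or may be any ancestor. A sketch under a simplifying assumption: each “a” tag is described only by the list of its ancestors' names, from the root down to its parent (the link labels are hypothetical):

```ruby
# Each "a" tag described by its ancestors, root first, parent last.
links = {
  "direct" => %w[html body h3],       # <h3><a>
  "nested" => %w[html body h3 span],  # <h3><span><a>
  "no_h3"  => %w[html body div]       # <div><a>
}

# "h3 > a" (XPath //h3/a): the immediate parent must be an h3.
child_matches = links.select { |_, ancestors| ancestors.last == "h3" }.keys

# "h3 a" (XPath //h3//a): an h3 may appear anywhere above the a.
descendant_matches = links.select { |_, ancestors| ancestors.include?("h3") }.keys

p child_matches       # prints ["direct"]
p descendant_matches  # prints ["direct", "nested"]
```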
Similar to XPath, CSS can use brackets for matching attributes. Let’s do a couple more XPath to CSS translations. On the left is XPath, on the right is CSS:
h3[@class] => h3[class]
h3[@class = "r"] => h3[class = "r"]
This syntax works, but CSS provides us with a shorthand for matching the “class” attribute. To find all h3 tags whose class attribute contains the class name “r”, we can say:
h3.r
There’s a subtle difference between the two previous examples. The selector h3[@class = "r"] requires an exact match: the class value must equal the string r. The selector h3.r instead means “the class attribute must include r as one of its space-separated values”. That means h3.r will match the following tag, but h3[@class = "r"] will not:
<h3 class="r foo">Hi!</h3>
The XPath selector and our translated CSS selector would not match this tag, but the “h3.r” selector would. Most of the time, the CSS class selectors do what we want. Only when I need something very specific do I use the bracket form in my CSS selectors.
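The two matching rules can be written out as plain Ruby predicates. This is a sketch of the semantics only; exact_class? and has_class? are hypothetical helper names, not Nokogiri methods:

```ruby
# [@class = "r"] / [class = "r"]: the whole attribute value must equal "r".
def exact_class?(class_attr, name)
  class_attr == name
end

# .r: "r" must be one of the whitespace-separated class names.
def has_class?(class_attr, name)
  class_attr.to_s.split.include?(name)
end

value = "r foo"             # from <h3 class="r foo">
p exact_class?(value, "r")  # prints false
p has_class?(value, "r")    # prints true
p has_class?("rr", "r")     # prints false: it is not a substring test
```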
With this knowledge in hand, we can rewrite our original program using CSS selectors:
doc = Nokogiri::HTML(open("http://www.google.com/search?q=doughnuts"))
doc.css('h3.r > a.l').each do |node|
  puts node.text
end
I think CSS selectors usually result in more concise and clearer queries than XPath, so I usually stick to CSS queries in my code. XPath can do some things that CSS cannot, though, so it’s nice to be able to fall back to XPath queries when I need to.
Next, let’s look at some basic node APIs provided by Nokogiri.
Basic Node API
Since we’re dealing with a tree data structure, Nokogiri provides methods for navigating that tree. In fact, all of the tree traversal we’ve seen so far using XPath and CSS can be accomplished manually via Ruby. Manual tree traversal is, however, cumbersome and verbose, which is why languages like XPath and CSS exist. Sometimes a combination of XPath or CSS plus manual tree traversal is easiest, so it is still important to know the API.
Every tag in a document is represented by an object of a class called Node. Each node in the tree has 0 or more children, 0 or 1 parent, 0 or more siblings, and 0 or more attributes. Nokogiri provides methods for accessing all of these things on any particular node. We can access any of those relative nodes like so:
node.parent           #=> parent node
node.children         #=> children nodes
node.next_sibling     #=> next sibling node
node.previous_sibling #=> previous sibling node
These node access methods can be used for manually traversing a tree, but I tend to leave the hard work to XPath or CSS queries and only use manual tree access when I have to.
When it comes to accessing attributes of a tag, the node may be treated like a normal Ruby Hash. We can get and set attributes on a node like so:
node['class']         #=> the value of the class attribute
node['class'] = 'foo'
We can even get a list of attributes or values of attributes like so:
node.keys   #=> list of attribute names
node.values #=> list of attribute values
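To show how these relationships fit together, here is a toy class that mimics the method names listed above. It is a hypothetical model written for illustration, not Nokogiri's implementation. The point it makes is that siblings are found through the parent's list of children, and attributes behave like a Hash:

```ruby
# Toy model of the node relationships; NOT Nokogiri's implementation.
class ToyNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, attributes = {})
    @name = name
    @attributes = attributes
    @children = []
    @parent = nil
  end

  def add_child(child)
    child.parent = self
    @children << child
    child
  end

  # Siblings are looked up through the parent's children.
  def next_sibling
    return nil unless parent
    parent.children[parent.children.index(self) + 1]
  end

  def previous_sibling
    return nil unless parent
    i = parent.children.index(self)
    i.zero? ? nil : parent.children[i - 1]
  end

  # Hash-like attribute access
  def [](key)
    @attributes[key]
  end

  def []=(key, value)
    @attributes[key] = value
  end

  def keys
    @attributes.keys
  end

  def values
    @attributes.values
  end
end

body  = ToyNode.new("body", { "id" => "uniq" })
h1    = body.add_child(ToyNode.new("h1"))
intro = body.add_child(ToyNode.new("p", { "class" => "intro" }))

p h1.next_sibling.name        # prints "p"
p intro.previous_sibling.name # prints "h1"
intro["class"] = "foo"
p intro["class"]              # prints "foo"
p body.keys                   # prints ["id"]
```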
For more information on things you can do with Nodes, check out the Node Documentation and also the Nokogiri tutorials section.
Conclusion
I hope this article has you on your way to HTML and XML parsing nirvana. Remember the tree data structure, and remember that XPath and CSS can be performed on HTML documents and XML documents.
Make sure to check out our documentation, and if you have any problems make sure to join the mailing list!
Rails and Merb Merge: Plugin API (Part 3 of 6)
Created at: 11.01.2010 20:00, source: Engine Yard Blog, tagged: Technology ActionController ActionView rails Rails 3
I started off this series on the Rails/Merb merge (aka Rails 3) talking about modularity, and then performance. Next up, plugins!
When we announced the Rails/Merb merge, we promised to bring a more stable plugin API to Rails 3. Merb had strong opinions about an explicitly exposed plugin API, so we hoped to end the complications of alias_method_chain in favor of exposed APIs which plugin authors could rely on across versions.
Dogfooding
By far the most important element in developing the plugin APIs for Rails 3 was to eliminate the "secret" APIs used in Rails itself. Instead, we wanted to build Rails on top of the same core components that we would ask plugin authors to build on. The focal point of this effort (because we already had experience doing this with Merb) was ActionPack, and specifically ActionController.
ActionController presented us with another excellent opportunity: because ActionMailer reused a lot of the controller functionality, we could create components usable in both ActionController and ActionMailer. In a sense, ActionController and ActionMailer would become plugins of the core system we were creating.
We also took a second important conceptual step. In Rails 2.3, the idea of "metal" was that you could skip the entire ActionController chain and write raw Rack handlers where needed for performance. In Rails 3, we are exposing an incremental series of components that, in sum, make up ActionController::Base. These components include rendering, layouts, conditional get, respond_to, streaming, and others.
In Rails 3, ActionController::Metal is the simplest possible controller, with no additional components added in. ActionController::Base is simply a subclass of ActionController::Metal with all components included. Again, we are dog-fooding our componentized APIs, not simply giving you a watered down version of what we use internally.
Now, let's dig into a few specific examples of new APIs.
ActionController::Renderers
One of the most common reasons people extend controllers is to expose additional renderers. For instance, you may want to render a PDF as well as HTML. The Rails 3 API that we expose to plugin authors for this purpose is identical to the one that we use internally. (As you probably know, in addition to being able to render templates, you can render :json, :xml, or :js in normal Rails controllers.)
If you take a look at the source for ActionController::Renderers, here's how we do it:
add :json do |json, options|
  json = ActiveSupport::JSON.encode(json) unless json.respond_to?(:to_str)
  json = "#{options[:callback]}(#{json})" unless options[:callback].blank?
  self.content_type ||= Mime::JSON
  self.response_body = json
end
This method is in ActionController::Renderers, so it's like saying ActionController::Renderers.add(:json) do |json, options|. You can add additional renderers easily in Rails 3 plugins, and you don't have to figure out how to hook into Rails at the appropriate time. Like other parts of the public API, we're committing to keeping this API unchanged across multiple versions of Rails. If and when the API does change, the change will be flagged for you first, via the normal Rails API deprecation process.
ActionController::Metal
In Rails 3, there's a supported way to use a simple, stripped down controller, and opt into the functionality you want. The overhead of the completely stripped down controller is a mere 8 microseconds (not much more than the overhead for a pure Rack app), compared with 100 or so microseconds of overhead for the full ActionController::Base (as I said last time, this itself is significantly less than Rails 2.3).
Here's an example of using ActionController::Metal:
class SimpleController < ActionController::Metal
  abstract!
end

class HelloController < SimpleController
  def index
    self.response_body = "Hello World"
  end
end
A few things to note here. First of all, you don't need a superclass, but you can build one to re-use in multiple places. The abstract! declaration tells Rails 3 not to include public methods on the superclass as controller actions. It's not strictly needed here (because there are no public methods), but it's a good habit for custom ActionController::Metal subclasses. Second of all, because this is a controller, you can route to it through the normal Rails router. From the perspective of the rest of Rails, this is a perfectly normal controller.
Finally, the self.response_body usage is the simple, low-level API used in ActionController itself (for instance, during template rendering), but it's now 100% first-class and available for your use. You can also set status, content_type and headers directly, regardless of whether you use the Request and Response helpers or not.
If you want to add rendering in, you can do something like this:
class SimpleController < ActionController::Metal
  abstract!
  include ActionController::Rendering
  append_view_path "#{Rails.root}/app/views"
end

class HelloController < SimpleController
  def index
    render "hello"
  end
end
You've now added in a new module, which provides rendering support. This keeps things simple by not including layout support, because you don't need it in this case. With this capability, you can add the features that you want to add, leaving out what you don't. This means that as a metal grows, you don't need to completely rewrite it as a "normal" controller. Instead, you can pull in the specific features you need. This process is seamless, and again, you can route to it as a normal controller throughout.
If you look at the source for ActionController::Base, you can see that we aren't doing anything particularly special anymore:
module ActionController
  class Base < Metal
    abstract!

    include AbstractController::Callbacks
    include AbstractController::Layouts

    include ActionController::Helpers
    helper :all # By default, all helpers should be included

    include ActionController::HideActions
    include ActionController::UrlFor
    include ActionController::Redirecting
    include ActionController::Rendering
    include ActionController::Renderers::All
    include ActionController::ConditionalGet
    include ActionController::RackDelegation
    include ActionController::Logger
    include ActionController::Configuration
    ...
Here, ActionController::Base inherits from Metal, and then proceeds to pull in, one module at a time, all of the functionality that Metal leaves out. Although Base pulls them all in, you don't have to worry about dependencies between these modules; handling dependencies is built into the way these modules work.
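The "bare base class plus opt-in modules" idea can be modeled in a few lines of plain Ruby. This is a hypothetical sketch: MiniMetal, MiniRendering and the two controllers are invented names, and the Rack-style return value is a simplification, not how Rails dispatches:

```ruby
# Hypothetical model of the Metal idea; the names are made up.
class MiniMetal
  attr_accessor :response_body, :status, :headers

  def initialize
    @status  = 200
    @headers = {}
  end

  # Run an action and return a Rack-style [status, headers, body] triple.
  def dispatch(action)
    public_send(action)
    [status, headers, [response_body.to_s]]
  end
end

# An opt-in feature: a tiny render built on top of response_body.
module MiniRendering
  TEMPLATES = { "hello" => "Hello from a template" }

  def render(template)
    headers["Content-Type"] = "text/html"
    self.response_body = TEMPLATES.fetch(template)
  end
end

class BareController < MiniMetal
  def index
    self.response_body = "Hello World"
  end
end

class RenderingController < MiniMetal
  include MiniRendering # opt in only to what this controller needs

  def index
    render "hello"
  end
end

p BareController.new.dispatch(:index)
p RenderingController.new.dispatch(:index)
p BareController.new.respond_to?(:render) # prints false
```

Growing a controller then means including another module rather than switching to a different base class, which mirrors the incremental path described above.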
View Contexts
As one last example, the way that ActionController communicates with ActionView itself has become a fully defined, exposed API. From the perspective of ActionController, a view implements three APIs:
- Context.for_controller(controller)
- Context#render_partial(options)
- Context#render_template(template, layout, options)
By implementing this API, you can drop in any object in place of ActionView. How would you use this? Well, to date, I can think of two examples: rspec_rails is going to use this functionality to implement isolation tests, and you can use this to implement a Merb shim in which the controller and view are the same object (an implementation using an earlier version of the API).
In addition to the usefulness of exposing this functionality via a stable public API, narrowing the boundary between ActionController and ActionView significantly improved the architecture, and exposed areas of brittleness that benefited quite a bit from the definition.
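Because those three methods are the whole contract, any object that responds to them can stand in for the view. Here is a hypothetical plain-Ruby sketch of that duck typing, in the spirit of the isolation-testing use case. RecordingView and TinyController are invented for illustration, and only the three method names come from the API listed earlier:

```ruby
# A stand-in view that records what it was asked to render.
class RecordingView
  attr_reader :controller, :calls

  def self.for_controller(controller)
    new(controller)
  end

  def initialize(controller)
    @controller = controller
    @calls = []
  end

  def render_template(template, layout, options)
    @calls << [:template, template, layout]
    "rendered #{template}"
  end

  def render_partial(options)
    @calls << [:partial, options[:partial]]
    "partial #{options[:partial]}"
  end
end

# Hypothetical controller that talks to its view only through the contract.
class TinyController
  attr_reader :view

  def initialize(view_class)
    @view = view_class.for_controller(self)
  end

  def show
    @view.render_template("posts/show", "application", {})
  end
end

controller = TinyController.new(RecordingView)
p controller.show        # prints "rendered posts/show"
p controller.view.calls  # prints [[:template, "posts/show", "application"]]
```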
Conclusion
The APIs provided here, while stable at this point, are not final. If you feel that they (or any other part of Rails 3) aren't adequate for a plugin you're building, please let us know by commenting.
This post is just a taste of the new ways you can extend Rails 3. In the weeks ahead, expect a lot more documentation and discussion about the exposed plugin hooks.
Ruby Tips: Numeric Classes
Created at: 07.01.2010 20:30, source: Engine Yard Blog, tagged: Technology fixnum Numeric Classes
This article was originally included in the September issue of the Engine Yard Newsletter. To read more posts like this one, subscribe to the Engine Yard Newsletter.
In this series, Evan Phoenix, Rubinius creator and Ruby expert, presents tips and tricks to help you improve your knowledge of Ruby.
Ruby’s numeric classes form a full numeric tower, providing many different representations of numbers. At its core is a very elegant pattern that allows new classes to participate in the tower easily.
Let’s say we want to add a new numeric class called Money, which contains a number of dollars and cents:
class Money
  def initialize(dollars, cents=0)
    @dollars = dollars
    @cents = cents
  end
  attr_reader :dollars, :cents
end
Now, let’s say we’d like Money to interact nicely with all integers, with an integer representing a number of whole dollars. It’s not too hard to add a + method to do that:
class Money
  def +(other)
    case other
    when Money
      Money.new(@dollars + other.dollars, @cents + other.cents)
    when Integer
      Money.new(@dollars + other.to_i, @cents)
    else
      raise ArgumentError, "Unknown type!"
    end
  end
end
but we’d also like to be able to do:
allowance = Money.new(5)
more = 1 + allowance
If you try this straight away, you’ll get a TypeError saying that Money can’t be coerced into Fixnum. This gives you a hint as to how to allow Money to interact with Fixnum better: we need to teach Money how to interact with the rest of the numeric tower, which we do with just one method:
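class Money
  def coerce(other)
    # Put the converted Integer first so the operation is retried as
    # Money.new(other) + self, keeping the original operand order
    [Money.new(other.to_i), self]
  end
end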
Now we can do more = 1 + allowance, and we get back a Money object holding 6 dollars and 0 cents.
Wonderful! Fixnum#+, seeing that its argument isn’t a Fixnum, uses the coerce protocol and calls allowance.coerce(1). This is a simple double-dispatch protocol that gives the argument the chance to change the values being operated on before the operation is tried again. coerce simply returns an array of the new values to use: here, it converts the Integer into a Money object, and then + is called on the first element of the array, passing the second as the argument.
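Putting the pieces together, here is a complete, runnable version of the class. This sketch returns the converted Integer first and self second, the conventional coerce order, so that a non-commutative operator would still see its operands in the original order:

```ruby
class Money
  attr_reader :dollars, :cents

  def initialize(dollars, cents = 0)
    @dollars = dollars
    @cents = cents
  end

  def +(other)
    case other
    when Money   then Money.new(@dollars + other.dollars, @cents + other.cents)
    when Integer then Money.new(@dollars + other, @cents)
    else raise ArgumentError, "Unknown type!"
    end
  end

  # Called by Integer#+ when its argument is not a number it knows.
  # Ruby then evaluates result[0] + result[1].
  def coerce(other)
    [Money.new(other.to_i), self]
  end
end

allowance = Money.new(5)
more = 1 + allowance
p more.class    # prints Money
p more.dollars  # prints 6
```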
Let’s say we’d like “1 + allowance” to return the Integer 6 instead. Easy!
class Money
  def coerce(other)
    # Plain Integers, with the original left operand first
    [other.to_i, @dollars]
  end
end
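A self-contained sketch of this variant. It puts the Integer argument first and the dollar amount second, so subtraction is computed as 10 - 5 rather than 5 - 10 (cents are ignored, as in the version above):

```ruby
class Money
  attr_reader :dollars, :cents

  def initialize(dollars, cents = 0)
    @dollars = dollars
    @cents = cents
  end

  # Hand back plain Integers, so "1 + allowance" becomes "1 + 5".
  def coerce(other)
    [other.to_i, @dollars]
  end
end

allowance = Money.new(5)
p 1 + allowance   # prints 6
p 10 - allowance  # prints 5
p 2 * allowance   # prints 10
```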
Now, for your homework, make Money work with Floats too! See you next time…
