Shifting to the client again »
Created at: 16.03.2012 01:33, source: has_many :through, tagged: Architecture
This is my take on the current shift to rich, in-browser JavaScript apps.
Looking back over a few decades, this is the progression of how applications have been built:
- mainframes and dumb terminals
- minicomputers and smart terminals
- networked workstations
- workstations and shared code/data repositories
- web apps and static HTML
- web services and rich browser apps
Translation: The main body of the application code lives on the:
- server
- server + client
- client
- server + client
- server
- server + client
The server/client pendulum swings back and forth. The next logical step is apps that run on the client using standard services. Just give it a few more years...
Web-VPN: Secure Proxies with SPDY & Chrome »
Created at: 01.12.2011 19:33, source: igvita.com, tagged: Architecture http spdy ssl vpn
Amazon's recent launch of their new Kindle Fire device delivered an interesting surprise: their Silk browser routes all requests via an Amazon powered SPDY proxy! In principle, this is not new; Opera, Blackberry and many other mobile providers have been using similar approaches to "optimize" the browsing experience for some time. However, Silk's announcement immediately generated a slew of security outcries: doesn't this mean that Amazon is now effectively a man in the middle (MITM) for all of our sessions, including SSL? Is this all just a big data mining ploy, or worse?
Turns out, the answer is: none of the above. Amazon does not route SSL traffic through their SPDY proxies; instead, all HTTPS requests are routed directly to the destination. However, this incident did surface an interesting underlying question: is it, in fact, possible for a web browser to securely route SSL sessions via a web-proxy?
Tunneling SSL over HTTP
First, it is worth recalling that HTTP does allow us to tunnel SSL connections, end-to-end, via an HTTP proxy. This is a little known protocol feature, but an important one for our purposes:

To proxy an HTTPS session the client connects to the proxy by sending a special "CONNECT" HTTP method, which also includes just the host and the port of the destination. The proxy authenticates the request, completes the TCP handshake with the specified destination and returns a "200 Connection Established" response, after which, it simply becomes a dumb, two-way router. At this point, the client and the server negotiate the SSL session and start to exchange encrypted data directly through the proxy. In other words, the proxy only sees encrypted data, and the only thing it knows are the hostnames and the ports of both parties.
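The exchange described above can be sketched in a few lines of Ruby. This is only an illustration of the plain-text part of the handshake: the helper names (connect_request, tunnel_established?) and the example.com host are made up for this sketch, and no real proxy is contacted.

```ruby
# Build the plain-text CONNECT request a client sends to an HTTP proxy.
# Only the destination host and port are included, never the URL path.
def connect_request(host, port)
  "CONNECT #{host}:#{port} HTTP/1.1\r\nHost: #{host}:#{port}\r\n\r\n"
end

# True when the proxy's status line reports an open tunnel (any 2xx code).
# From this point on the proxy just relays encrypted bytes in both directions.
def tunnel_established?(status_line)
  status_line.match?(/\AHTTP\/1\.[01] 2\d\d\b/)
end

print connect_request("example.com", 443)
puts tunnel_established?("HTTP/1.1 200 Connection Established")
```

Everything printed by this sketch travels unencrypted on the first hop, which is exactly the weakness discussed next.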
This is great, but there is still one problem: the original CONNECT handshake, along with the host and the port of the destination, is sent in clear text. Even worse, the proxy authentication (if any) also runs over the same insecure channel. Unfortunately, browsers today do not support secure HTTP proxies! Hence the reason why we've historically resorted to heavyweight SSL VPNs, SOCKS tunnels, and the like.
Chrome, SPDY and Secure Web Proxies
Except, as it turns out, what we said above is not entirely true: earlier this year, Google Chrome added support for secure web proxies! Since all SPDY sessions run over SSL, a SPDY proxy, by definition, would need to be SSL capable. Hence, the Chrome team went ahead and added secure web proxy support - nice.
In the words of the Chromium developers: "Browsers today do not support secure proxies. Although proxies can tunnel SSL, connectivity to the proxy itself is only over HTTP. To support SPDY, we need to modify Chromium to support a SSL-based proxy."
Web-VPN & Chrome
Enabling SSL for the first hop to the proxy seems like a minor change - and it did go largely unnoticed - but it opens up some really interesting use cases! For one, the authentication can now be handled at a certificate level, which can be provisioned, revoked, and verified securely. In other words, your Chrome browser is now also a "Web-VPN client" - except without the annoying installation, configuration, and maintenance!
How exactly would Web-VPN work? Simply deploy a secure HTTP proxy as a gateway to any private network, point your Chrome browser at it, and voila, you will be able to resolve any internal hostname or service just as if you were running a VPN client. The first hop to the proxy runs over SSL, and all subsequent tunnels as well.
SPDY: 2012 is the year!
One of the primary motivations for SPDY is to reduce the overall page loading time. However, while speed is important, the security aspects are arguably even more critical - support for secure web proxies is one of many great examples.
Interestingly, Chrome is on track to take the #2 browser spot away from Firefox before the end of the year, and at the same time, Firefox 11 will arrive on December 20th. So what? Well, one of the less often mentioned features of Firefox 11 is the built-in SPDY support! Combined, between Chrome and Firefox, this means that over 50% of all internet sessions will be SPDY capable. Mark 2012 as the beginning of the end for HTTP.
Faster Web vs. TCP Slow-Start »
Created at: 20.10.2011 18:52, source: igvita.com, tagged: Architecture slowstart tcp
Ever obsess about trying to find the "best and fastest" internet provider? Having recently gone through the marketing vortex of comparing a dozen different plans, I was then reminded of a simple fact: the primary metric (bandwidth) used by the industry is actually highly misleading. It turns out, for most web-browsing use cases, an internet connection over several Mbps offers but a tiny improvement in performance.
One of the main culprits is "TCP slow-start", and yes that is a feature, not a bug. To understand why, we have to look inside of the TCP stack itself, and in doing so we can also learn a few interesting tips for how to build faster web-services in general.
TCP Slow-start
TCP offers a dozen different built-in features, but the two we are interested in are congestion control and congestion avoidance. TCP slow-start specifically is an implementation of congestion control within the TCP layer:
Slow start is used in conjunction with other algorithms to avoid sending more data than the network is capable of transmitting, that is, to avoid causing network congestion. (wikipedia)
The high-level flow is simple: the client sends a SYN packet which advertises its maximum buffer size (rwnd - receive window), the sender replies by sending several packets back (cwnd - congestion window), and then each time it receives an ACK from the client it grows the congestion window, which doubles the number of packets that can be "on the wire" while unacknowledged on every round trip. This phase is also known as the "exponential growth" phase of the TCP connection. OSI school has an excellent animation illustrating this phase: scroll to the bottom and hit play.
So, why do we care? Well, no matter what the size of your pipe, every TCP connection goes through this phase, which also means that more often than not, the utilized bandwidth is effectively limited by the settings of the sender's and receiver's buffer sizes.
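To see why the window, and not the pipe, sets the limit, here is a rough sketch. It assumes the sender can have at most one full window of unacknowledged data in flight per round trip; the function name and the 100ms round-trip time are illustrative choices, not measurements.

```ruby
# Upper bound on throughput when at most `window_bytes` can be in flight
# per round trip: window size divided by round-trip time, expressed in Mbps.
def max_throughput_mbps(window_bytes, rtt_seconds)
  (window_bytes * 8) / rtt_seconds.to_f / 1_000_000
end

# A 3-packet window of 1360-byte packets versus a 64 KB window, both at 100ms RTT.
puts max_throughput_mbps(3 * 1360, 0.1).round(2)
puts max_throughput_mbps(64 * 1024, 0.1).round(2)
```

Under these assumptions the bound does not depend on the advertised bandwidth of the connection at all; only a larger window or a shorter round trip raises it.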
HTTP and TCP slow start
Of course, a higher bandwidth connection will help when you are streaming a large file, or running a vanity speed test. The problem is, HTTP traffic tends to make use of short and bursty connections - in these cases we often never even reach the full capacity of our pipes!
Research done at Google shows that an increase from 5Mbps to 10Mbps results in a disappointing 5% improvement in page load times. Or put slightly differently, a 10Mbps connection, on average uses only 16% of its capacity. Yikes! As it turns out, if we want faster internet, we should focus on cutting down the round-trip time between the client and server, not necessarily just investing in bigger pipes.
The story of CWND
If TCP slow start is, well, slow, then couldn't we just make it faster? Turns out, until very recently the Linux TCP stack itself was hardcoded to start with a congestion window (cwnd) of just 3 or 4 packets, which amounts to about 4 KB (~1360 bytes per packet). Combine that with the unfortunately frequent pathological case of fetching a single resource per connection, and you have managed to severely limit your performance.
As of kernel version 2.6.39, following a protracted discussion and a number of IETF recommendations, the initial cwnd value has been raised to 10 packets, which in itself is a huge step forward. Only one problem: guess what kernel versions most servers run today? Yes, perhaps it is time to upgrade your servers.
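The effect of the initial window can be estimated with a small simulation. This is a deliberately simplified sketch: it assumes the window doubles every round trip, there is no packet loss, the receive window never limits the sender, and the handshake is not counted. The 64 KB payload and the function name are arbitrary choices for illustration.

```ruby
# Packet size assumed from the ~1360 bytes per packet figure quoted above.
SEGMENT_BYTES = 1360

# Count the round trips needed to deliver `bytes` when the congestion window
# starts at `initial_cwnd` packets and doubles after every round trip.
def round_trips(bytes, initial_cwnd)
  segments_left = (bytes.to_f / SEGMENT_BYTES).ceil
  cwnd = initial_cwnd
  trips = 0
  while segments_left > 0
    segments_left -= cwnd   # send one full window
    cwnd *= 2               # exponential growth phase
    trips += 1
  end
  trips
end

[3, 10].each do |cwnd|
  puts "initial cwnd #{cwnd}: #{round_trips(64 * 1024, cwnd)} round trips for 64 KB"
end
```

In this model a 64 KB response needs 5 round trips with an initial window of 3 packets and 3 round trips with an initial window of 10, and the saving is paid back in full latency on every new connection.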
As a practical tip, if you have been thinking about enabling SPDY for your web-service, then running on anything but some of the latest kernels won't actually give you any performance improvements! A tiny change in the TCP stack, but a big difference overall.
So what can I do about it?
TCP slow start is a feature, not a bug, and it does carry some interesting and important implications. As developers we often overlook the round trip time from the client to the server, but if we are truly interested in building a faster web, then this is a good time to investigate your options: re-use your TCP connections when talking to web-services, build web-services that support HTTP keep-alive and pipelining, and do think about end-to-end latency. Oh, and don't waste your money on that 10Mbps+ pipe, you probably don't need it.
Optimizing HTTP: Keep-alive and Pipelining »
Created at: 04.10.2011 22:16, source: igvita.com, tagged: Architecture http keepalive pipelining
The last major update to the HTTP spec dates back to 1999, at which time RFC 2616 standardized HTTP 1.1 and introduced the much needed keep-alive and pipelining support. Whereas HTTP 1.0 required a strict "single request per connection" model, HTTP 1.1 reversed this behavior: by default, an HTTP 1.1 client and server keep the connection open, unless the client indicates otherwise (via a Connection: close header).
Why bother? Setting up a TCP connection is very expensive! Even in an optimized case, a full one-way route between the client and server can take 10-50ms. Now multiply that three times to complete the TCP handshake, and we're already looking at a 150ms ceiling! Keep-alive allows us to reuse the same connection between different requests and amortize this cost.
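The arithmetic above is easy to play with in code. The sketch below uses the same simplified model as the text (three one-way trips per handshake) and additionally assumes that each request costs one trip out and one trip back; server processing time and slow-start are ignored, and the function name is made up for this example.

```ruby
# Total network latency for a number of requests, with or without keep-alive.
# Without keep-alive every request pays for its own TCP handshake.
def total_latency_ms(requests, one_way_ms, keepalive:)
  handshake_ms = 3 * one_way_ms   # handshake cost as estimated in the text
  exchange_ms  = 2 * one_way_ms   # request out, response back
  handshakes   = keepalive ? 1 : requests
  handshakes * handshake_ms + requests * exchange_ms
end

puts total_latency_ms(10, 50, keepalive: false) # => 2500
puts total_latency_ms(10, 50, keepalive: true)  # => 1150
```

For ten requests at 50ms one-way latency, reusing a single connection cuts the modeled network time by more than half.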
The only problem is, more often than not, as developers we tend to forget this. Take a look at your own code: how often do you reuse an HTTP connection? The same problem is found in most API wrappers, and even in the standard HTTP libraries of most languages, which disable keep-alive by default.
HTTP Pipelining
The good news is, keep-alive is supported by all modern browsers and mostly works out of the box. Unfortunately, support for pipelining is in much worse shape: no browsers support it officially, and few developers ever think about it. Which is unfortunate, because it can yield significant performance benefits!
While keep-alive helps us amortize the cost of creating a TCP connection, pipelining allows us to break the strict "send a request, wait for response" model. Instead, we can dispatch multiple requests, in parallel, over the same connection, without waiting for a response in serial fashion. This may seem like a minor optimization at first, but let's consider the following scenario: request 1 and request 2 are pipelined, request 1 takes 1.5s to render on the server, whereas request 2 takes 1s. What is the total runtime?
Of course, the answer depends on the amount of data sent back, but the lower bound is actually 1.5s! Because the requests are pipelined, both request 1 and request 2 can be processed by the server in parallel. Hence, request 2 completes before request 1, but is sent immediately after request 1 is complete. Fewer connections, faster response times - makes you wonder why nobody advertises that their API supports HTTP pipelining?
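The same reasoning can be written down as a tiny model. It is only a sketch: transfer time is ignored, the server is assumed to process all pipelined requests in parallel, and responses are assumed to go back strictly in request order. The function names are invented for illustration.

```ruby
# Sequential requests on one connection: each waits for the previous response.
def serial_runtime(server_times)
  server_times.sum
end

# Pipelined requests: the server works on all of them at once, but a response
# can only be sent once every earlier response has been sent.
def pipelined_delivery_times(server_times)
  ready_at = 0.0
  server_times.map { |t| ready_at = [ready_at, t].max }
end

times = [1.5, 1.0]
puts serial_runtime(times)                 # => 2.5
p pipelined_delivery_times(times)          # => [1.5, 1.5]
```

Note the flip side of in-order delivery: the 1s response is ready early but is held back until the 1.5s response ahead of it has gone out.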
HTTP Keep-alive & Pipelining in Ruby
Unfortunately, many standard HTTP libraries revert to HTTP 1.0: one connection, one request. Ruby's own net/http uses a little-known behavior where by default a "Connection: close" header is appended to each request, except when you're using the block form:
    require 'net/http'

    start = Time.now
    Net::HTTP.start('127.0.0.1', 9000) do |http|
      r1 = http.get "/?delay=1.5"
      r2 = http.get "/?delay=1.0"

      p Time.now - start # => 2.5 - doh! keepalive, but no pipelining
    end
With the example above we get the benefits of HTTP keep-alive, but unfortunately net/http offers no support for pipelining. To enable that, you'll have to use net-http-pipeline, which is a standalone library:
    require 'net/http/pipeline'

    start = Time.now
    Net::HTTP.start 'localhost', 9000 do |http|
      http.pipelining = true

      reqs = []
      reqs << Net::HTTP::Get.new('/?delay=1.5')
      reqs << Net::HTTP::Get.new('/?delay=1.0')

      http.pipeline reqs do |res|
        puts res.code
        puts res.body[0..60].inspect
      end

      p Time.now - start # => 1.5 - keep-alive + pipelining!
    end
EM-HTTP & Goliath: Keep-alive + Pipelining
While pipelining is disabled in most browsers, due to many issues related to proxies and caches, it is nonetheless a useful optimization for your own services, or for talking to your partners' APIs. The good news is, Apache, Nginx, HAProxy and others support it, but the problem is that most app servers, even the ones which claim to be "HTTP 1.1", usually don't.
True keep-alive and pipelining support is one of the reasons we built both em-http-request and Goliath for our stack at PostRank. A simple example in action:
    require 'goliath'

    class Echo < Goliath::API
      use Goliath::Rack::Params
      use Goliath::Rack::Validation::RequiredParam, {:key => 'delay'}

      def response(env)
        EM::Synchrony.sleep params['delay']
        [200, {}, params['delay']]
      end
    end
    require 'em-http-request'

    EM.run do
      conn = EM::HttpRequest.new('http://localhost:9000/')
      start = Time.now

      r1 = conn.get :query => {delay: 1.5}, :keepalive => true
      r2 = conn.get :query => {delay: 1.0}

      r2.callback do
        p Time.now - start # => 1.5 - keep-alive + pipelining
        EM.stop
      end
    end
Total runtime is, you guessed it, 1.5s. If your public or private APIs are built on top of HTTP, then keep-alive and pipelining are features you should be leveraging wherever you can.
Optimizing HTTP: Interrogate your code!
While we love to spend time optimizing our algorithms, or making the databases faster, we often forget the basics: setting up TCP connections is expensive, and pipelining can lead to big wins. Do you reuse HTTP connections in your code? Does your app server support pipelining? The answers are usually "no, and I'm not sure", which is something we need to change!
Server-Sent Event Notifications with HTML5 »
Created at: 26.08.2011 22:23, source: igvita.com, tagged: Architecture html5 eventsource sse
Server-Sent Events (SSE) have long been in the shadow of the much more often talked about WebSockets API. While WebSockets can be thought of as the "TCP for the web", SSE is best described as the "HTML5 replacement for Comet". Question is, do we really need both?
WebSockets allow bi-directional communication and to do so, they piggyback on the original HTTP connection and "upgrade" to the WebSocket protocol. By comparison, and as the name implies, SSE is strictly a server push protocol. Additionally, messages are encoded in UTF-8 and are newline delimited. In other words, the bet is that most applications primarily need to push data to the client, which means that we can simplify the API and the implementation - that is where SSE comes in. Still skeptical? I was. Let's take a look under the hood.
EventSource Client API
Unlike a WebSocket connection, SSE requires no additional HTTP handshakes or "protocol upgrades". For all intents and purposes, an SSE channel is simply a long-lived, streaming HTTP connection. This alone means that you can implement an SSE service on top of any streaming capable HTTP server.
On the browser side, the EventSource API is now supported by Chrome, Firefox 6+, Safari and Opera. To start receiving server notifications, we simply open a connection to an SSE URI and set up our callbacks:
    var source = new EventSource('/my/sse_endpoint');

    source.addEventListener('message', function(e) {
      console.log(e.data);
    }, false);

    source.addEventListener('user_login', function(e) {
      console.log(e.data);
    }, false);

    source.addEventListener('open', function(e) {
      // Connection was opened.
    }, false);

    source.addEventListener('error', function(e) {
      // readyState tells a permanent close apart from an automatic reconnect
      if (source.readyState == EventSource.CLOSED) {
        // Connection was closed.
      }
    }, false);
The API is very simple: create an EventSource connection, define your connection open, close, and message callbacks, and you are off to the races. However, the SSE API also provides a few more built-in features. In the example above we added a "user_login" listener - turns out, we can tag our messages! Likewise, we can provide message IDs, and the SSE source will even automatically reconnect for you if the connection is dropped.
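To make the tagging and ID features concrete, here is a rough Ruby sketch of what an EventSource client does with the incoming text stream. It is a simplification, not a full implementation of the spec: it only handles "\n" line endings, ignores the retry field, and does not remember the last event ID between messages. The function name parse_sse is invented for this example.

```ruby
# Split a chunk of an event stream into events. Each event is a block of
# "field:value" lines terminated by a blank line; untagged events default
# to the "message" type, and blocks without any data line are not dispatched.
def parse_sse(stream)
  stream.split(/\n\n+/).each_with_object([]) do |block, events|
    type, id, data = "message", nil, []
    block.split("\n").each do |line|
      next if line.start_with?(":")        # comment line, ignored
      field, value = line.split(":", 2)
      value = value.to_s.sub(/\A /, "")    # one optional space after the colon
      case field
      when "event" then type = value
      when "id"    then id = value
      when "data"  then data << value
      end
    end
    events << { event: type, id: id, data: data.join("\n") } unless data.empty?
  end
end

parse_sse("data:hello\n\nevent:user_login\nid:7\ndata:bob\n\n").each { |e| p e }
```

The event name is what selects the listener on the browser side: the first event here would reach the 'message' listener, the second the 'user_login' listener.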
Simple SSE Server with Goliath
The client API is nice and simple, but where the SSE spec really shines is in how simple it makes the server implementation. Whereas to power a WebSocket service you will likely need an entirely different backend, an SSE endpoint can be implemented by any streaming capable HTTP web server. As an example, let's create a Goliath powered SSE app and deploy it to Heroku:
    require 'goliath'

    class EventGenerator < Goliath::API
      def response(env)
        # untagged message every second
        EM.add_periodic_timer(1) { env.stream_send("data:hello ##{rand(100)}\n\n") }

        # tagged "signup" message every three seconds
        EM.add_periodic_timer(3) do
          env.stream_send(["event:signup", "data:signup event ##{rand(100)}\n\n"].join("\n"))
        end

        streaming_response(200, {'Content-Type' => 'text/event-stream'})
      end
    end

    class SSE < Goliath::API
      use Rack::Static, :urls => ["/index.html"], :root => Goliath::Application.app_path("public")

      get "/events" do
        run EventGenerator.new
      end
    end
Our Goliath server responds to requests to /events by returning a 200 OK response, and then starts emitting newline delimited messages every couple of seconds - we're streaming plaintext messages directly into the HTTP connection! Additionally, Goliath also serves a static index file, which opens an EventSource connection to our /events endpoint (see the full gist here). With that, we can add a Procfile and a Gemfile, and push it out to Heroku: a live SSE demo powered by Goliath.
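Since the wire format is just text, building the messages by hand gets repetitive quickly. A small helper along these lines could do the formatting; it is a hypothetical convenience function for illustration (sse_message is not part of Goliath), and it writes a space after each colon, which the format allows.

```ruby
# Format one server-sent event. Optional event name and id come first,
# every line of the payload gets its own "data:" field, and a blank line
# terminates the message.
def sse_message(data, event: nil, id: nil)
  lines = []
  lines << "event: #{event}" if event
  lines << "id: #{id}" if id
  data.to_s.split("\n", -1).each { |line| lines << "data: #{line}" }
  lines.join("\n") + "\n\n"
end

print sse_message("hello")
print sse_message("signup event", event: "signup", id: 7)
```

The resulting strings could be handed to whatever streaming call the server provides, for example the env.stream_send call used in the Goliath app.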
SSE vs. WebSockets
Technically speaking, the WebSockets spec is a superset of SSE and it is easy to wonder why we need both. Having said that, it is also hard not to like the simplicity of the EventSource API - both on the client, and especially on the server. Chances are, many apps won't actually need the bi-directional capabilities of WebSockets, and hence can benefit a great deal from the simplified setup and deployment enabled by SSE. Definitely an option to keep in mind.
