Wednesday, February 2, 2011

HTTP Parallel Connections (Firefox edition!)

Parallelism helps when
    • It hides network idleness during TCP Handshakes though persistent connections help with this too.
    • It hides network idleness during the first byte phase transactions, though pipelining can address this too.
    • It hides network idleness during TCP slow start wait-for-ack periods. This is a big one.
    • It provides a mechanism to prioritize and avoid head of line blocking problems. 
    • It steals bandwidth from competing "tcp friendly" flows by simply increasing the number of flows in one application. That's an arms race that most people think should be avoided.

 Parallelism hurts when
    • It increases the number of TCP Handshakes which are both slow and CPU intensive (at least compared to regular data packets) to execute - this assumes persistent connections are an alternative.
    • It increases the overhead of normal data processing because more flows have to be considered typically via longer hash chains
    • It increases the impact of memory overhead and processor cache pollution by increasing the number of simultaneous TCP control blocks that have to managed on both the client and the server.
    • The resulting reduced amount of data per flow makes it harder to fully open sender congestion windows.
    • Packet loss is increased due to the non correlated fluctuations of data to be sent between the parallel connections. Two competing flows that are both sending from infinite data sources will quickly adapt to share the bandwidth, but two flows that have a fluctuating demand (e.g. parallel persistent HTTP connections that periodically go idle and alive) will inherently have patterns of underutilizing and overutilizing the path. Overutilization results in either packet loss or excess buffering in the network, which leads to poor interactive response times.
When should we open a new parallel connection?
  1. when I don't have an idle connection and I need the answer with minimum latency
  2. when I expect existing connections are experiencing idleness and therefore not using all of the available bandwidth
 The approach HTTP implementations, including firefox, take to solving this quandary? They crudely enforce a constant number of connections per host and open them until they hit that limit. Variously across time that limit has commonly been 2, 4, and 6.  As server technology has evolved to the point many years ago where the impact of idleness was a bigger deal than the CPU overhead on the server we saw servers actually publish their resources under several virtual host names, even though it was all the same server, for the exclusive purpose of circumenting that per-host limit in the client.

I wonder if we can't do better in Firefox.. First, lets deal with the case of a low latency request. Right now all we do with them is to put them at the top of the waiting queue if the request cannot be dispatched immediately (because the limit of 6 has already been reached). But there are really two cases to consider:
  1. What to do when the network is not already saturated
  2. What to do when the network is saturated
 In both cases the first step for a truly low latency request is the same - open a new connection assuming there isn't an idle one available. However, note that establishing that connection is going to take at least 1 RTT for normal HTTP and 2 RTT for HTTPs - so we should actually watch for any existing HTTP transactions to complete on a different reusable persistent connection in between the time we start opening the new connection and the time the handshake is complete. If that happens the persistent connection should be used instead - that will require a change in the current logic where nsHttpConnection opens the sockets after it has been assigned a transaction. Instead nsHttpConnectionMgr should be opening the sockets as well as receiving the returned persistent connections and then should dispatch to them as they become available.

In the case of a saturated network some of the existing parallel connections should be stalled while the low latency request is satisfied in order to provide the most bandwidth for that important transaction. We can do this by temporarily slamming their recv windows to something close to 1 packet of data which will slow them down to a trickle. This can be done commensurately with the transmission of the prioritized request as it should take 1/2 RTT for the window change to reach the sender.

But what about the more common case where all transactions are of equal priority - how do we make the decision then about opening a new connection vs queueing a new transaction? Assuming we aren't concerned about head of line blocking issues (which we should be able to wrap up in a definition of priorty somehow), then we want to do this only when there is network idleness that can be covered up by parallelism. This approach is radically different than "open up to N" connections.

It isn't obvious exactly how to determine that in Necko. But then again, you are looking for data bursts followed by idleness - and its pretty obvious when you see it graphed out. This is the transfer pattern of a single http response I looked at a couple of weeks ago - it could happily overlap with another flow in order to more effectively utilize the whole pipe. (of course, if the server used a larger initial CWND, the problem would be massively reduced.)