Saturday, August 31, 2013

SSL Everywhere for HTTP/2 - A New Hope

Recently the IETF HTTP working group met in Berlin, Germany and discussed the concept of mandatory-to-offer TLS for HTTP/2, proposed by Mark Nottingham. The current approach to transport security means only about 1/5 of web transactions get the protections of TLS. Today all of the choices are made by the content owner via the scheme of the URL in the markup.

It is time that the Internet infrastructure simply gave users a secure by default transport environment - that doesn't seem like a radical statement to me. From FireSheep to the Google Sniff-Wifi-While-You-Map Car to PRISM there is ample evidence to suggest that secure transports are just necessary table stakes in today's Internet.

This movement in the IETF working group is welcome news and I'm going to do everything I can to help iron out the corner cases and build a robust solution.

My point of view for Firefox has always been that we would only implement HTTP/2 over TLS. That means https://, but it has been my hope to find a way to use it for http:// schemes too on servers that support it. This is just transport-level security - for web-level security the different schemes still represent different origins. If cleartext needed to be used it would be done with HTTP/1, and someday in the distant future HTTP/1 would be put out to pasture. This roughly matches Google Chrome's public stance on the issue.

Mandatory to offer does not ban the practice of cleartext - if the client did not want TLS, a compliant cleartext channel could still be established. That might be appropriate inside a data center, for instance, but Firefox would be unlikely to do so.

This approach also does not ban intermediaries completely. https:// URIs of course remain end-to-end and can only be tunneled (as is the case in HTTP/1), but http:// URIs could be proxied via appropriately configured clients by having the HTTP/2 stream terminate on the proxy. It would prevent "transparent proxies", which are fundamentally man-in-the-middle attacks anyhow.
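To make the proxy distinction concrete, here is a minimal sketch, purely illustrative and not Firefox code, of how a client with an explicitly configured proxy might treat the two schemes under this model. The function and hostnames are hypothetical.

```python
# Hypothetical routing logic for a client with an explicitly configured proxy.
# https:// stays end-to-end (opaque tunnel); http:// may legitimately have its
# HTTP/2 stream terminated on the proxy.

def route_request(scheme, host, proxy):
    if proxy is None:
        # No proxy configured: talk to the origin directly. Under
        # mandatory-to-offer, the client would prefer TLS even for http://
        # when the server advertises support.
        return ("direct", host)
    if scheme == "https":
        # End-to-end security: the proxy only sees an opaque tunnel,
        # exactly as with HTTP/1 CONNECT today.
        return ("tunnel via CONNECT", proxy)
    # scheme == "http": the configured proxy may terminate the stream
    # and fetch the resource on the client's behalf.
    return ("terminate stream on proxy", proxy)

print(route_request("https", "example.com", "proxy.example.net"))
print(route_request("http", "example.com", "proxy.example.net"))
```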

Comments over here: https://plus.google.com/100166083286297802191/posts/XVwhcvTyh1R

Tuesday, August 13, 2013

Go Read "Reducing Web Latency - the Virtue of Gentle Aggression"

One of the more interesting networking papers I've come across lately is Reducing Web Latency: the Virtue of Gentle Aggression. It details how poorly TCP's packet loss recovery techniques map onto HTTP and proposes a few techniques (some more backwards compatible than others) for improving things. It's a large collaborative effort from Google and USC. Go read it.

The most important piece of information is that TCP flows that experienced a loss took 5 times longer to complete than flows that did not, and 10% of flows fell into the "some loss" bucket. For the full analysis go read the paper, but the shorthand headline is that the short "mice" flows of HTTP/1 tend to be composed mostly of tail packets, and traditionally tail loss is only recovered using horrible timeout-based strategies.
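A back-of-the-envelope sketch of why that happens, using my own illustrative numbers rather than the paper's: a loss in the middle of a flow usually generates enough duplicate ACKs to trigger fast retransmit, while a loss at the tail has nothing behind it and must wait out a retransmission timeout.

```python
# Illustrative only: compare recovery time for a mid-flow loss vs. a tail loss
# in a short HTTP/1-style "mice" flow.

RTT = 0.050   # 50 ms round trip (assumed)
RTO = 0.300   # a typical-ish retransmission timeout; often much larger

def completion_time(flow_packets, lost_packet):
    base = RTT  # one round trip to deliver the burst and collect ACKs
    packets_after_loss = flow_packets - lost_packet
    if packets_after_loss >= 3:
        # Three duplicate ACKs arrive -> fast retransmit, roughly +1 RTT.
        return base + RTT
    # Tail loss: no dup ACKs are coming, so the sender waits for the RTO.
    return base + RTO

print("mid-flow loss:", completion_time(flow_packets=8, lost_packet=3))  # ~0.10 s
print("tail loss:    ", completion_time(flow_packets=8, lost_packet=8))  # ~0.35 s
```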

This is the best summary of the problem that I've seen, but it's been understood for quite a while. It is one of the motivators of HTTP/2 and is a theme underlying several of the blog posts Google has made about QUIC.

The "aggression" concept used in the paper is essentially the inclusion of redundant data in the TCP flow under some circumstances. This can, at the cost of extra bandwidth, bulk up mice flows so that their tail losses are more likely to be recovered with single RTT mechanisms such as SACK based fast recovery instead of timeouts. Conceivably this extra data could also be treated as an error correction coding which would allow some losses to be recovered independent of the RTT.

Saturday, August 3, 2013

Internet Society Briefing Panel @ IETF 87

This past week saw IETF 87, in Berlin, Germany, come and go. As usual I met a number of interesting new folks, caught up with old acquaintances I respect (a number of whom now work for Google - it seems to happen when you're not looking at them :)), and got some new ideas into my head (or refined old ones).

On Tuesday I was invited to be part of a panel for the ISOC lunch with Stuart Cheshire and Jason Livingood. We were led and moderated by the inimitable Leslie Daigle.

The panel's topic was "Improving the Internet Experience, All Together Now." I made the case that the standards community needs to think a little less about layers and a little more about the end goal - and that by doing so we can provide more efficient building blocks. Otherwise developers find themselves tempted to give up essential properties like security and congestion control in order to build more responsive applications. But we don't need to make that tradeoff.

The audio is available here: http://www.internetsociety.org/internet-society-briefing-panel-ietf-87 . Unfortunately it is full of intolerable squelch until just after the 22:00 mark (I'm the one speaking at that point). The good news is that most of that is "hold music", introductions, and a recap of the panel topic that is also described on the web page.

Thursday, June 6, 2013

RTT - a lot more than time to first byte

Lots of interesting stuff in this paper: Understanding the latency benefits of multi-cloud webservice deployments.

One of the side angles I find interesting is their finding that even if you built an idealized, fully replicated custom CDN using multiple datacenters on multiple clouds, there are still huge swaths of the Internet population with browser-to-server RTTs over 100ms (China, India, Argentina, Israel!).

RTT is the hidden driver in so much of Internet performance, and to whatever extent possible it needs to be quashed. It's so much more than time to first byte.

Bandwidth, CPU, even our ability to milk more data out of radio spectrum, all improve much more rapidly than RTT can.

Making RTT better is really a physical problem and is, in the general case at least, really really hard. Making things less reliant on RTT is a software problem and is a more normal kind of hard. Or at least it seems that way if you're a software developer.

RTT is not just time to the first byte of a web request. It's DNS. It's the TCP handshake. It's the SSL handshake. It's the time it takes to do a CORS check before an XHR. It's the time it takes to grow the congestion window (i.e. how fast you speed up from slow start) or confirm the revocation status of the SSL certificate. It's the time it takes to follow an HTTP redirect. And it's the time you're stalled while you recover from a packet loss. They're all some multiple of the RTT.
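A rough tally, with made-up but plausible counts, shows how quickly those round trips stack up on a cold fetch:

```python
# Illustrative arithmetic only: round trips burned before the first useful
# byte of a cold HTTPS page load. Counts and RTT are assumptions, not data.

RTT_MS = 100  # the kind of browser-to-server RTT still common in many regions

round_trips = {
    "DNS lookup": 1,
    "TCP handshake": 1,
    "TLS handshake": 2,               # full handshake; resumption would shave one
    "certificate revocation check": 1,
    "HTTP redirect": 1,               # each redirect repeats several of the above
    "HTTP request/response": 1,
}

total = sum(round_trips.values())
print(f"{total} round trips * {RTT_MS} ms = {total * RTT_MS} ms before first byte")
```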

HTTP/2 and SPDY take the first step of dealing with this by creating a prioritized and multiplexed protocol, which tends to make it more often bandwidth bound than HTTP/1 is. But even that protocol still has lots of hidden RTT-scaling algorithms in it.


Tuesday, May 28, 2013

If a tree falls in a forest and nobody is there to hear it..

Robert Koch's Master's thesis about WebSockets security and adoption is linked.

One interesting tidbit - he surveyed 15K free apps from the Google Play store and saw only 14 that use WebSockets. Previous surveys of traditional websites have seen similarly poor uptake.

WS certainly isn't optimal, but it has a few pretty interesting properties, and it's sad to see it underused. I suspect its reliance on a different server and infrastructure stack is part of the reason.


Thursday, March 7, 2013

Firefox Nightly on WebPageTest

Thanks to Google's amazing Pat Meenan, Firefox Nightly has been added to webpagetest.org - just make sure to select the Dulles, VA location to be able to use it.

He shared the news on twitter. Thanks Pat!

WPT is a gold-standard tool, and to be able to see it reflect (and validate or repudiate) changes in Nightly is a really terrific addition.




Monday, January 14, 2013

On Induced Latency and Loss

The estimable Mark Allman has a new paper out in the latest ACM CCR dealing with data collection around buffering in the Internet. If you respect a good characterization paper as much as I do - you should read it and a couple related mailing list threads here and here. The paper is only 7 pages; go read it -- this blog post will wait for you.

The paper's title, Comments on Bufferbloat, in my mind undersells its contributions. Colloquially I consider bufferbloat to refer specifically to deeply filled queues at or below the IP layer that create latency problems for real-time applications using different transport-layer flows (e.g. FTP/torrent/web browser traffic creating problems for perhaps unrelated VoIP). But the paper provides valuable data on real-world levels of induced latency, where it has implications for TCP Initial Window (IW) sending sizes and the web-browser-centric concern of managing connection parallelism in a world with diverse buffer sizes and IWs. That's why I'm writing about it here.
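For intuition on where induced latency comes from (my numbers below, not the paper's data): a drop-tail buffer holding B bytes on a link of rate R adds roughly B/R of queuing delay to every packet stuck behind it.

```python
# Illustrative queuing-delay arithmetic for a filled drop-tail buffer.

def induced_delay_ms(buffered_bytes, link_mbps):
    return buffered_bytes * 8 / (link_mbps * 1_000_000) * 1000

print(induced_delay_ms(64 * 1024, 8))    # ~66 ms: mild bloat on an 8 Mbps link
print(induced_delay_ms(512 * 1024, 8))   # ~524 ms: the scary tail of the distribution
```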

Servers inducing large latency and loss through parallelized sending is right now a bit of a corner-case problem in the web space, but it seems to be growing. My Chrome counterpart Will Chan talks a bit about it too over here.

For what it's worth, I'm still looking for some common amounts of network buffering to use as configurations in simulations on the downstream path. The famous Netalyzr paper has some numbers on the upstream. Let me know if you can help.

The traffic Mark looked at was not all web traffic (undoubtedly lots of P2P), but I think there are some interesting takeaways for the web space. These are my interpretations - the paper is linked above, so don't let me put words in the author's mouth:

  • There is commonly some induced delay due to buffering. The data in the paper shows residential RTTs of 78ms typically bloated to ~120ms. That's not going to mess with real-time traffic too badly, but it's an interesting data point.
  • The data set shows bufferbloat at the far end of the distribution: at the 99th percentile of round trips on residential connections you see over 900ms of induced latency.
  • The data set shows that bufferbloat is not a constant condition - the amount of induced latency is commonly changing (for better and for worse) and most of the time doesn't fluctuate into dangerous ranges.
  • There are real questions about whether other data sets would show the same thing, and even if they did it isn't clear to me what frequency of these blips would be acceptable to a real-time app like VoIP or RTC-Web.
  • IW=10 seems reasonably safe from the data in Mark's paper, but his characterizations don't necessarily match what we see in the web space. In particular, our flows are not as often less than 3 packets in size as the flows in the paper's data (which, again, is not all web traffic). There are clearly deployed corner cases on the web where servers send way too much data at once (typically through sharding) and induce train wrecks on the web transfer - see the rough sketch after this list. How do we deal with that gracefully in a browser, without giving up sharding or IW=10 for hosts that use them in a more coordinated way? That's a big question for me right now.
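To make that last bullet's "train wreck" concrete, here is a rough sketch with illustrative numbers (not measurements) of how sharding multiplies the unthrottled initial burst a page can aim at one bottleneck buffer:

```python
# Illustrative only: bytes in flight before any congestion feedback arrives
# when a sharded page opens many connections that each start at IW=10.

MSS = 1460   # bytes per segment, typical
IW = 10      # initial congestion window, in segments, per connection

def initial_burst_bytes(shard_hosts, conns_per_host):
    return shard_hosts * conns_per_host * IW * MSS

burst = initial_burst_bytes(shard_hosts=4, conns_per_host=6)
print(f"{burst} bytes (~{burst // 1024} KB) launched before the first ACK")
# Aimed at a modest tail-drop buffer, a burst like this is an easy way to
# induce exactly the kind of latency and loss the paper measures.
```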

[ Comments on this are best done at https://plus.google.com/100166083286297802191/posts/6XB59oaQzDL]