Sunday, March 21, 2010

Bandwidth vs. Latency — The World is Curved

Despite what you may have read in the press lately, neither our world nor the world of performance is flat. This is especially true of the performance metrics commonly known as bandwidth (or throughput) and latency. The performance relationship between these two metrics is curved or nonlinear. In general, this nonlinearity is a consequence of these two metrics being mutually dependent on each other: push the system throughput toward its bandwidth limit and the latency of each request grows, and vice versa. A common misconception persists, however, that bandwidth and latency are independent performance metrics. I'm going to call that view the Flat-Earth view. Where does that view come from?

Window on the World

In part, it depends on your window to the world. When we look out a window, the Earth looks flat.

Occasionally, we are reminded that the Earth is actually curved. That's something most of us accept and expect, even if we're not aware of it all of the time.

To avoid any semantic problems, let's list some basic definitions:

  • Throughput (X): A rate metric. Number of completions per unit time.
  • Bandwidth (BW): Width of the network pipe. Maximum throughput Xmax.
  • Latency (R): A time metric. Delay.
  • Response time (R): Time spent in the system; it excludes any delay (such as think time) between the completion of one request and the initiation of the next.
  • Round-trip time (RTT): Time from start of request to end of response.
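Although it isn't listed above, these metrics are tied together by Little's law, N = X · R, a standard queueing-theory identity. A minimal sketch, with made-up numbers purely for illustration:

```python
# Little's law: N = X * R relates the average number of requests in
# the system (N) to throughput (X) and time in system (R).
# The numbers below are hypothetical, not taken from the post.
X = 100.0   # completions per second
R = 0.25    # seconds each request spends in the system
N = X * R   # average number of requests in flight
print(N)    # 25.0
```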

I've tried to explain this necessary dependency between throughput and latency before, but I like to refine my explanations, if possible. I think this attempt is more succinct than the previous effort, but feel free to let me know either way.

Flat-Earth View of the Performance World

Consider the Voyager 1 spacecraft, now about 2/3 of a light-day from Earth. A CMD-ACK pair takes about 32 hours RTT. Because of Einstein, there is absolutely nothing we can do to reduce that latency. That's the first point in the Flat-Earth view of the performance world.
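That 32-hour figure follows directly from the one-way distance and the speed of light; a back-of-the-envelope check:

```python
# One-way distance ~ 2/3 of a light-day, i.e., about 16 light-hours.
one_way_hours = (2.0 / 3.0) * 24.0   # time for light to reach Voyager 1
rtt_hours = 2.0 * one_way_hours      # CMD out plus ACK back
print(rtt_hours)                     # 32.0
```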

The second point is that we could get more done by issuing more CMDs in parallel, but that would require a fatter pipe or more bandwidth. In reality, of course, the BW (single channel) was permanently fixed when the Voyager spacecraft was designed and built by NASA, more than 30 years ago.

Nonetheless, we could imagine upgrading the on-board software or tinkering with various things between here and there to increase the bandwidth of the Voyager comms channel. Unlike the speed-of-light limitation for latency, there's no immutable physical law to prevent us trying to fatten the comms pipe. But, even if we were to be successful and change the channel BW, the latency would remain unchanged.

It's in this sense that the bandwidth and latency metrics appear to be independent, and that's what gives rise to the Flat-Earth view. What the Flat-Earthers overlook is that there is another delay involved: the time to decode the CMD, act on it, and set up the ACK to be sent back to Earth. It's a vastly shorter delay than the 32 hr RTT, but it's there. It means that NASA/JPL has to insert a suitable delay between successive CMDs issued to the Voyager. In other words, the CMDs have to be clocked just like chocolates on a conveyor belt.

The simplest clocking protocol is to wait for the ACK from the previous CMD before sending the next CMD. Without a clocked delay between CMDs, the Voyager might still be in the process of repositioning its antenna dish, for example, when the next CMD arrives. So, that CMD would be missed and that, in turn, would require CMD retransmission, which would make Voyager comms performance worse!
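Under that stop-and-wait style of clocking, throughput is capped at one CMD per RTT, no matter how fat the pipe is. A quick sketch using the 32-hour RTT from above:

```python
# Stop-and-wait clocking: the next CMD waits for the previous ACK,
# so the maximum CMD rate is 1/RTT, independent of channel bandwidth.
rtt_hours = 32.0
max_cmds_per_day = 24.0 / rtt_hours  # at most 0.75 CMDs per day
print(max_cmds_per_day)              # 0.75
```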

The Performance World is Curved

The Flat-Earth view of Voyager comms is analogous to looking out of an aircraft window while the plane is still on the ground. As this plot shows, at low loads, the bandwidth or throughput (black dotted line) is increasing whereas the latency (blue dotted line) remains fairly constant or flat.

The "fifty thousand foot" view, however, shows that, just like the real world, both the throughput (X) and the latency (R) are curved, not flat.

Queues Cause Curves

A more subtle solution to the problem of missed CMDs on the Voyager would be to incorporate an on-board buffer. Such a buffer would eliminate the need to clock the CMDs because if they happened to pile up before the Voyager could decode them, they would be captured and stored in the buffer for later servicing. But, as I point out in my Perl::PDQ book, a buffer is just a queue and queues grow nonlinearly with increasing load. It's queueing that causes the throughput (X) and latency (R) profiles to be nonlinear.
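For a concrete illustration of that nonlinearity, here is the textbook open-queue (M/M/1) residence-time formula, R = S/(1 − X·S). The service time below is an assumed value for illustration, not a Voyager figure:

```python
# M/M/1 residence time: R = S / (1 - X*S), where S is the service
# time of the bottleneck and 1/S is its bandwidth (Xmax).
# As throughput X approaches Xmax, latency R blows up nonlinearly.
S = 0.010                 # 10 ms service time => Xmax = 100 req/s (assumed)
for X in (10, 50, 90, 99):
    rho = X * S           # utilization of the bottleneck resource
    R = S / (1.0 - rho)   # residence time (service + waiting)
    print(X, round(R * 1000, 1))  # latency in milliseconds
```

Doubling the load from 50 to 99 req/s multiplies the latency fifty-fold, which is exactly the curvature the plots show.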

Shown in more detail here, we see that the throughput (X) rises more or less linearly (left side) until something in the system saturates, i.e., pegs out (top). That resource becomes the system bottleneck. Since CMDs or requests cannot be handled any faster than the speed of the bottlenecked resource, any new requests simply queue up. That lengthening queue is reflected in higher response times (R) or latency (right side).

Indeed, this is how things would look on a saturated load-test system, for example, where the number of requests (N) in the system cannot be any larger than the number of clients (N) driving the system under test (SUT). Notice also that there is an interesting symmetry between the throughput profile (X) and the latency profile (R). It's easy to accept that X grows more or less linearly up to the saturation point. The symmetry means that R also grows more or less linearly beyond the saturation point, although a common mistake is to claim that it grows "exponentially."

This tendency toward linear growth under heavy load (N) is why the R profile is often referred to as a "hockey stick" shape. The complete profile is, of course, nonlinear.
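The near-linear heavy-load growth follows from the interactive response-time law, R(N) = N/X(N) − Z. Beyond saturation, X(N) ≈ 1/S_b, where S_b is the bottleneck service time, so R grows linearly in N. The think time and service time below are assumed values:

```python
# Hockey-stick asymptote for a closed system with N clients:
# beyond saturation, R(N) ~ N * S_b - Z (linear in N, not exponential).
Z   = 1.0   # client think time in seconds (assumed)
S_b = 0.1   # bottleneck service time in seconds (assumed)
N_star = (Z + S_b) / S_b   # saturation point: 11 clients here
for N in (20, 40, 80):
    R = N * S_b - Z        # asymptotic response time above N_star
    print(N, round(R, 1))
```

Doubling N from 40 to 80 merely doubles-and-a-bit the latency, confirming linear (hockey-stick) growth rather than an exponential explosion.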

What about unbounded queueing systems, such as the front-end servers on a web site? As the above plot shows, although the precise details are different, the general effect is the same: throughput increases while latency increases nonlinearly. At low loads (near the origin), the curves look very similar to those of the load-test system. Note that the x-axis here is throughput (X), not the number of load clients (N) as in the previous example. That's why the throughput profile here is simply the 1-to-1 diagonal.

How do things look for a clocked system like Voyager comms? Since a delay is inserted between successive CMDs as they are transmitted, the comms channel acts like a conveyor belt. In other words, there can never be any CMDs or packets waiting. Since there is no waiting time, the response time or latency looks flat under all loads. The only exception is if NASA were to drive the comms channel so fast that the 8-bit processor on the Voyager became overdriven. In that case, the latency would increase suddenly (the knee on the right-hand side) due to all the retries that would immediately be required.
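A sketch of why clocked (deterministic) latency stays flat: as long as the inter-CMD interval exceeds the on-board processing time, no CMD ever queues, so latency equals the fixed processing time. The times below are invented for illustration:

```python
# Deterministic (D/D/1-like) behavior: below saturation there is no
# waiting at all, so latency is flat at the processing time.
processing = 2.0  # on-board decode/act/ACK-setup time in seconds (assumed)
for interval in (10.0, 5.0, 3.0, 2.5):  # inserted delay between CMDs
    # No queue forms while the inter-CMD interval exceeds processing time.
    waiting = 0.0 if interval >= processing else float("inf")
    print(interval, processing + waiting)  # latency stays flat at 2.0
```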

The world of performance is curved, just like the real world, even though we may not always be aware of it. What you see depends on where your window is positioned relative to the rest of the world. Often, the performance world looks flat to people who always tend to work with clocked (i.e., deterministic) systems, e.g., packet networks or deep-space networks. But, just like looking out of an aircraft window while the plane is on the tarmac, that doesn't mean the performance world is actually flat.