Some specific phenomena in modern systems appear only when we scale into large systems. This note gathers observations about the most important phenomena we observe at these scales.
Tail Latency Phenomenon
Tail latency refers to the high-end response time experienced by a small percentage of requests, typically measured at the 95th or 99th percentile. When scaling services with Massively Parallel Processing and similar technologies, it is not rare that this small fraction of requests suffers significant delays that can degrade user experience or system reliability.
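To make the percentile notion concrete, here is a minimal Python sketch that computes tail percentiles from synthetic latency samples; the exponential distribution and its 10 ms mean are assumptions for illustration, not measurements.

```python
import random

random.seed(0)
# Synthetic request latencies in milliseconds (mean ~10 ms).
latencies = [random.expovariate(1 / 10) for _ in range(10_000)]

def percentile(samples, p):
    """Nearest-rank percentile: the value below which p% of samples fall."""
    ordered = sorted(samples)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

print(f"p50 = {percentile(latencies, 50):.1f} ms")  # the median request
print(f"p99 = {percentile(latencies, 99):.1f} ms")  # the "tail" discussed above
```

Even though the median sits near the mean, the 99th percentile is several times larger: that gap is the tail.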
Sources of the Bottlenecks
These outliers often result from resource contention, network variability, or workload spikes. Frequently the cause is an overload of one specific resource in the system. Recall from Cloud Storage that data centers have a limited supply of CPU cores, RAM, storage, and network bandwidth. If one of these resources is saturated, the affected machines can suffer from high latency.
Typical latencies

We observe that simply reading from disk already incurs a latency of about 20 ms.
Throughput, by contrast, is on the order of megabytes per second.
These values matter because the user's waiting time is exactly the sum of the initial latency and the transmission time (data size divided by throughput).
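As a back-of-the-envelope sketch, the waiting-time formula can be written out directly; the 20 ms latency matches the figure above, while the 100 MB/s throughput and 50 MB read size are assumed values for illustration.

```python
latency_s = 0.020        # initial disk latency (20 ms, as above)
throughput_bps = 100e6   # assumed sequential throughput: 100 MB/s
size_bytes = 50e6        # assumed amount of data to read: 50 MB

# User-perceived waiting time = initial latency + transmission time.
waiting_time = latency_s + size_bytes / throughput_bps
print(f"{waiting_time:.2f} s")  # 0.02 + 0.50 = 0.52 s
```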
Hedged Requests
Reducing tail latency is critical in latency-sensitive systems, such as cloud services, where even occasional delays can lead to cascading performance issues or customer dissatisfaction.
One big problem then becomes identifying the source of the latency and connecting to the affected machine to fix it.
One common approach is called hedged requests:
- Do every task twice, and take whichever result arrives first.
- Monitor running tasks, and start backup copies when you notice stragglers. This is what is done in MapReduce (Dean & Ghemawat, 2008) [1].
With the first solution, the cluster does twice the work, effectively halving its throughput. A common refinement is therefore to send the backup request only after a short delay, as sketched below.
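Here is a minimal sketch of that delayed-hedging idea using Python's asyncio; `fetch`, the replica names, and the 50 ms hedge delay are illustrative assumptions, not a real RPC API.

```python
import asyncio

async def fetch(replica: str, key: str) -> str:
    # Stand-in for a real RPC; one replica is artificially a straggler.
    await asyncio.sleep(0.2 if replica == "slow" else 0.01)
    return f"{key} served by {replica}"

async def hedged_request(key: str, replicas: list, hedge_after: float = 0.05) -> str:
    """Query the first replica; if no answer arrives within hedge_after
    seconds, send a backup ("hedge") request and take whichever answer
    comes back first."""
    first = asyncio.create_task(fetch(replicas[0], key))
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()
    backup = asyncio.create_task(fetch(replicas[1], key))
    done, pending = await asyncio.wait({first, backup},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # drop the loser so we do not keep doubling the load
    return done.pop().result()

print(asyncio.run(hedged_request("user:42", ["slow", "fast"])))
```

Because the hedge fires only for the slowest few percent of requests, the extra load stays far below the 2x cost of running everything twice.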
Solving Disk I/O
For example, if disk I/O is the bottleneck, using Apache Spark or MapReduce helps, since the data is then read from many disks in parallel.
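As a small illustration, a PySpark job like the one below reads its input as many partitions spread across executors; the HDFS path is hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-read").getOrCreate()

# Spark splits the input into partitions and reads them in parallel across
# the executors, so aggregate disk throughput scales with the cluster size
# instead of being capped by a single disk.
lines = spark.read.text("hdfs:///logs/*.log")  # hypothetical input path
print(lines.count())

spark.stop()
```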
Laws of speedups
Amdahl’s Law
Amdahl’s Law quantifies the potential speedup of a system when a portion of its workload is parallelized. It states that the overall improvement is limited by the fraction of the workload that remains sequential, and it assumes a constant problem size.
Mathematically, the speedup $S$ is given by $S = \frac{1}{(1 - P) + \frac{P}{N}}$, where $P$ is the parallelizable fraction, and $N$ is the number of parallel processing units. This principle highlights the diminishing returns of adding more resources when the sequential portion dominates, emphasizing the importance of optimizing both parallel and sequential components for maximum performance gains.
The takeaway is that not everything can be parallelized or sped up simply by adding more machines.
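A quick numeric check makes the diminishing returns concrete; the 95% parallel fraction is an assumed example value.

```python
def amdahl(p: float, n: int) -> float:
    """Speedup S = 1 / ((1 - P) + P / N) for parallel fraction p on n units."""
    return 1 / ((1 - p) + p / n)

for n in (10, 100, 1000):
    print(n, round(amdahl(0.95, n), 2))
# 10 -> 6.9, 100 -> 16.81, 1000 -> 19.63: capped at 1 / (1 - 0.95) = 20
```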
Gustafson’s Law
Gustafson’s Law provides an alternative perspective to Amdahl’s Law, focusing on scalability in parallel computing. It asserts that increasing the problem size can maintain efficiency as more processors are added, rather than being limited by a fixed workload. The speedup $S$ is expressed as $S = P \cdot N + (1 - P)$, where $P$ is the fraction of work that scales with parallelism, and $N$ is the number of processors. By emphasizing the expansion of parallel workloads, Gustafson’s Law highlights the potential for achieving greater performance by leveraging the computational power of modern systems effectively.
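Rerunning the same example under Gustafson's Law shows the contrast: with the workload growing alongside the cluster, the scaled speedup is nearly linear in $N$.

```python
def gustafson(p: float, n: int) -> float:
    """Scaled speedup S = P * N + (1 - P) when the parallel part grows with n."""
    return p * n + (1 - p)

for n in (10, 100, 1000):
    print(n, round(gustafson(0.95, n), 2))
# 10 -> 9.55, 100 -> 95.05, 1000 -> 950.05: near-linear scaling
```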
References
[1] Dean, J. & Ghemawat, S., "MapReduce: Simplified Data Processing on Large Clusters", Communications of the ACM, Vol. 51(1), pp. 107–113, 2008.