Introduction to cluster management

How can we allocate the resources in a cluster in an efficient manner? How can we allocate resources fairly?

Two step allocations 🟨++

Allocation happens in two steps: first you allocate resources to a process, then you place the process physically on a machine in the cluster.

Private and public cluster management 🟥++

Cluster management could be private or public.

Private means each app manages its own sub-cluster: each app receives a private, static set of resources. Here it is easier to tailor the hardware to specific needs. Public means there is one big shared cluster, as offered by standard third-party providers.

Desiderata for Cluster Management 🟨–

  • Fairness
    • Users should be granted resources proportional to how much they are paying
    • Flexibility to express priorities
  • Efficient resource usage
    • Work conservation: Resources should not idle while there are users whose demand is not fully satisfied
  • Performance isolation
    • Guarantee that misbehaving co-located applications cannot affect me “too much”

Max-Min Scheduling

The method 🟩

The algorithm is quite easy:

  • Allocate resources in order of increasing demand
  • No user gets a resource share larger than its demand
  • Users with unsatisfied demands get an equal share of the resource
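
The three rules above can be sketched in a few lines of Python. This is a minimal illustration for a single divisible resource, not a production scheduler; the function name `max_min_fair` and the example demands are made up for this note.

```python
def max_min_fair(capacity, demands):
    """Split `capacity` among users according to max-min fairness.

    `demands` maps a (hypothetical) user name to the amount it asks for.
    """
    allocation = {}
    remaining = capacity
    # Rule 1: visit users in order of increasing demand.
    pending = sorted(demands, key=demands.get)
    while pending:
        # Equal share of what is left among users not yet served.
        fair_share = remaining / len(pending)
        user = pending.pop(0)
        # Rule 2: nobody gets more than its demand.
        # Rule 3: an unsatisfied user gets the equal share instead.
        allocation[user] = min(demands[user], fair_share)
        remaining -= allocation[user]
    return allocation


print(max_min_fair(12, {"a": 2, "b": 4, "c": 10}))
```

With 12 units and demands of 2, 4 and 10, users `a` and `b` are fully satisfied and `c` receives the remaining 6 units. What `c` gets depends on the demands of `a` and `b`, not only on its own request.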

The problem arises when you have different kinds of resources to share. Another problem is that you cannot always predict how much of the resource you will get, since this depends mostly on what the other users demand.

Fairness Considerations 🟨--

One can prove that this algorithm is fair:

  • Nobody gets more than what it demands
  • When a user's request is not fully satisfied, the user gets an equal share of the remaining resources
  • It is possible to extend it with priorities.
  • It is strategy proof: users are not better off if they lie about their demands.
    • For example, a scheduler that assigns resources based on CPU utilization alone is not strategy-proof: applications could fake their usage to get more resources.

Weighted fair scheduling

In weighted fair scheduling, you assign a weight to each user and then allocate resources proportionally to the weights. It is a little different from weighted max-min fairness.
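
As a small sketch of the proportional rule, assuming a single divisible resource: the function name `weighted_shares` and the weights are hypothetical. The sketch ignores demands entirely, which is a simplification of this example.

```python
def weighted_shares(capacity, weights):
    """Split `capacity` proportionally to each user's weight."""
    total_weight = sum(weights.values())
    return {user: capacity * w / total_weight for user, w in weights.items()}


# Bob has three times Alice's weight, so he gets three times her share.
print(weighted_shares(8, {"alice": 1, "bob": 3}))
```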

Dominant Resource Fairness

The Idea 🟨

The idea is quite simple: for each user, consider the dominant resource, i.e. the resource of which the user's containers consume the largest fraction of the cluster's capacity. Then allocate containers so that every user ends up with the same dominant share (number of containers × fraction of the dominant resource used per container). For example, if each of Alice's containers uses 6% of the available CPU and each of Bob's uses 3% of the available RAM, then Bob should get twice as many containers as Alice.

Worked out Example

Suppose you have 9 CPU and 18 GB.

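  • Alice wants to run tasks that each need 1 CPU and 4 GB.
  • Bob wants to run tasks that each need 3 CPU and 1 GB.

You want to equalize the users' shares of their dominant resources, subject to the cluster's capacity constraints. Here Alice's dominant resource is memory (each task takes 4/18 of the memory but only 1/9 of the CPU), while Bob's is CPU (3/9 of the CPU versus 1/18 of the memory).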
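
One way to solve this example is progressive filling: repeatedly give one more task to the user with the smallest dominant share, as long as the task still fits. The sketch below assumes whole tasks and made-up names (`drf`, `"cpu"`, `"mem"`); a user whose next task no longer fits is simply dropped, which is a simplification.

```python
def drf(capacity, demands):
    """Dominant Resource Fairness by progressive filling.

    capacity: resource -> total amount in the cluster
    demands:  user -> {resource -> amount needed per task}
    Returns user -> number of tasks granted.
    """
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(user):
        # Largest fraction of any single resource held by this user.
        return max(tasks[user] * demands[user][r] / capacity[r] for r in capacity)

    active = set(demands)
    while active:
        # Serve the user with the smallest dominant share (ties: by name).
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            tasks[user] += 1
        else:
            # The next task does not fit any more: stop serving this user.
            active.remove(user)
    return tasks


cluster = {"cpu": 9, "mem": 18}
per_task = {"alice": {"cpu": 1, "mem": 4}, "bob": {"cpu": 3, "mem": 1}}
print(drf(cluster, per_task))
```

For the numbers above this gives Alice 3 tasks (3 CPU, 12 GB) and Bob 2 tasks (6 CPU, 2 GB). Both dominant shares are 2/3: 12/18 of the memory for Alice and 6/9 of the CPU for Bob.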

Token Bucket

  • Has a size (the maximum number of tokens the bucket can hold)
  • Has a fill rate (how quickly it is filled up)

Commonly used for controlling network and storage traffic. Basically, the bucket is filled up at a constant rate, and tokens are spent when a transmission is needed.
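
A minimal token bucket might look as follows. The class and method names are hypothetical, the bucket is assumed to start full, and the current time is passed in explicitly (instead of reading a clock) so that the behaviour is easy to follow.

```python
class TokenBucket:
    def __init__(self, size, fill_rate, now=0.0):
        self.size = size            # maximum number of tokens
        self.fill_rate = fill_rate  # tokens added per second
        self.tokens = size          # assumption: the bucket starts full
        self.last = now             # time of the last refill

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the bucket size.
        elapsed = now - self.last
        self.tokens = min(self.size, self.tokens + elapsed * self.fill_rate)
        self.last = now

    def try_consume(self, amount, now):
        """Spend `amount` tokens if available; return whether it succeeded."""
        self._refill(now)
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


bucket = TokenBucket(size=10, fill_rate=2)
print(bucket.try_consume(8, now=0))  # burst allowed, 2 tokens left
print(bucket.try_consume(5, now=1))  # only 4 tokens available, refused
print(bucket.try_consume(5, now=2))  # 6 tokens available, allowed
```

The size bounds how large a burst can be, while the fill rate bounds the long-term average rate.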

Resource Assignment

The algorithm is easy: filter the candidate machines by hard constraints, then order the remaining ones by soft constraints. Users usually do not know what their application needs, and they often ask for more than they need; this is called overprovisioning.
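
The filter-then-order step can be sketched as below. The node fields (`free_cpu`, `free_mem_gb`), the constraints, and the scoring rule are all assumptions made for this example and do not come from a specific cluster manager.

```python
def pick_node(nodes, hard_constraints, soft_score):
    """Return the best feasible node, or None if no node qualifies."""
    # Step 1: hard constraints remove nodes that cannot run the task at all.
    feasible = [n for n in nodes if all(check(n) for check in hard_constraints)]
    if not feasible:
        return None
    # Step 2: soft constraints only rank the nodes that are left.
    return max(feasible, key=soft_score)


nodes = [
    {"name": "n1", "free_cpu": 1, "free_mem_gb": 16},
    {"name": "n2", "free_cpu": 8, "free_mem_gb": 32},
    {"name": "n3", "free_cpu": 4, "free_mem_gb": 8},
]
# Hypothetical task: needs at least 2 CPUs and 4 GB of memory.
hard = [lambda n: n["free_cpu"] >= 2, lambda n: n["free_mem_gb"] >= 4]
# Hypothetical preference: pack tightly, i.e. prefer the node with least free CPU.
best = pick_node(nodes, hard, soft_score=lambda n: -n["free_cpu"])
print(best["name"])
```

Here `n1` is filtered out because it has too few CPUs, and `n3` is preferred over `n2` because it leaves less CPU unused.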

Problem of low resource usage

From the point of view of the provider this is not optimal since we will have a low resource utilization:

  • Hardware consumption and costs. This problem has improved: the Google Borg cluster manager reaches around 50% utilization by allocating almost double the CPU!

Performance Goal assignments

See (Delimitrou & Kozyrakis 2014). We want to allocate resources based on performance goals, not predicted resource usages.

  • For each new application, we need to recommend a resource allocation and assignment
  • We need to base this recommendation on the previous needs of similar applications.
  • Quasar uses collaborative filtering to decide the resource allocation.
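
To give a feeling for the idea, here is a toy nearest-neighbour form of collaborative filtering. It is not Quasar's actual method: the function names, the performance numbers, and the scaling step are all invented for illustration. A new application is profiled on a couple of configurations, matched to the most similar known application, and given the cheapest configuration predicted to reach its performance goal.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(history, probe, target):
    """Pick the cheapest configuration predicted to reach `target`.

    history: known app -> performance per configuration (cheapest first)
    probe:   configuration index -> measured performance of the new app
    """
    idx = sorted(probe)
    new_vec = [probe[i] for i in idx]
    # Find the known application that behaves most like the new one.
    best = max(history, key=lambda app: cosine([history[app][i] for i in idx], new_vec))
    ref = history[best]
    # Rescale the neighbour's profile to the magnitude of the probed points.
    scale = sum(new_vec) / sum(ref[i] for i in idx)
    predicted = [p * scale for p in ref]
    for config, perf in enumerate(predicted):
        if perf >= target:
            return config
    return None


# Hypothetical past measurements for four configurations of growing size.
history = {"web": [100, 190, 270, 300], "batch": [100, 200, 400, 800]}
# The new app was only profiled on the two smallest configurations.
print(recommend(history, probe={0: 50, 1: 100}, target=150))
```

The new application scales like `batch`, so its missing measurements are predicted as half of `batch`'s, and configuration 2 is the cheapest one expected to reach the goal of 150.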

References

[1] Delimitrou & Kozyrakis “Quasar: Resource-Efficient and QoS-aware Cluster Management” Association for Computing Machinery 2014