Cloud Computing: An Overview

Cloud computing shifted the paradigm from owning hardware to renting computing resources on demand: hardware became a service.

Key Players in the Cloud Industry 🟨

The cloud computing market is dominated by several major providers, often referred to as the “Big Seven” or hyperscalers. They are generally not interested in making their platforms interoperable: vendor lock-in works in their favor.

  1. Amazon Web Services (AWS): The largest provider, offering a comprehensive suite of cloud services.
  2. Microsoft Azure: Known for deep integration with enterprise systems and hybrid cloud solutions.
  3. Google Cloud Platform (GCP): Excels in data analytics, AI/ML, and Kubernetes-based solutions.
  4. IBM Cloud: Focuses on hybrid cloud and enterprise-grade AI.
  5. Oracle Cloud: Specializes in database solutions and enterprise applications.
  6. Alibaba Cloud: The leading provider in Asia, offering services similar to AWS.
  7. Salesforce: A major player in SaaS, particularly for CRM and business applications.

These providers collectively control the majority of the global cloud infrastructure market, enabling scalable, on-demand computing resources for businesses worldwide.

Capital and Operational Expenses in the Cloud

Definition for CapEx and OpEx 🟥

Cloud computing transforms traditional IT cost structures by shifting expenses from capital expenditures (CapEx), such as purchasing servers and data centers, to operational expenditures (OpEx), where users pay only for the resources they consume.

Capital expenses are usually depreciated (spread over time), while operational expenses are typically tax-deductible in the year they are incurred.
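
As a toy illustration of the trade-off, here is a sketch comparing buying a server (CapEx, depreciated straight-line) with renting an equivalent VM (OpEx); every figure is made up:

```python
# Toy CapEx vs. OpEx comparison; all figures are hypothetical.
server_price = 12_000                    # upfront purchase (CapEx)
years = 4
capex_per_year = server_price / years    # straight-line depreciation

vm_hourly = 0.40                         # pay-as-you-go rate (OpEx)
hours_used = 2_000                       # actual yearly usage, not 24/7
opex_per_year = vm_hourly * hours_used

print(f"CapEx/yr: ${capex_per_year:,.0f}, OpEx/yr: ${opex_per_year:,.0f}")
# CapEx/yr: $3,000, OpEx/yr: $800 -> renting wins at low utilization;
# at sustained 24/7 use (8760 h -> $3,504/yr) owning becomes cheaper.
```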

Total cost of ownership

Using the hardware better yields better performance at the same cost, so high utilization is key to keeping the cost of operating data centers low. To compute the total cost of ownership we need to take into account the following factors (a back-of-the-envelope sketch follows the list):

  • Capital expense of buying the hardware (the biggest part, about 61%)
  • Energy and cooling to keep it going
  • Networking costs
  • Other (like maintenance, cost of capital, amortized cost of building etc.)
  • People who operate the building (security staff, technicians).
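
The split might be computed as in the sketch below; the 61% hardware share comes from the list above, while the remaining shares are invented placeholders that sum to 100%:

```python
# Rough TCO split (illustrative shares; only the 61% figure is from above).
yearly_tco = 10_000_000  # hypothetical yearly datacenter budget ($)
shares = {
    "hardware (CapEx)": 0.61,
    "energy & cooling": 0.16,
    "networking":       0.08,
    "people":           0.05,
    "other":            0.10,
}
assert abs(sum(shares.values()) - 1.0) < 1e-9  # shares must cover the budget

for item, share in shares.items():
    print(f"{item:>17}: ${share * yearly_tco:,.0f}")
```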

Economic Advantages of the Cloud 🟨–

  • Increased Resource Utilization: Cloud providers optimize hardware usage by sharing resources across multiple users (multitenancy), reducing idle capacity.
  • Pay-as-You-Go Model: Organizations avoid upfront investments, paying instead for compute power, storage, and services as needed.
  • Scalability: Costs align dynamically with demand, eliminating over-provisioning.

Key Insight:
The cloud improves efficiency by enabling resource sharing. From the provider’s perspective, resource utilization increases significantly. For users, costs are proportional to actual usage, fostering financial flexibility.

Today’s platforms are still highly inefficient from this point of view: most provisioned resources go unused.
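
A small simulation hints at why sharing helps: the pooled demand of many tenants peaks far below the sum of their individual peaks, so a provider needs much less capacity than dedicated provisioning would (all numbers are illustrative):

```python
import random

# 100 tenants, each with bursty demand between 0 and 10 units per step.
random.seed(0)
tenants, steps = 100, 1_000
dedicated = tenants * 10  # capacity if each tenant owns its peak demand
pooled = max(
    sum(random.uniform(0, 10) for _ in range(tenants))
    for _ in range(steps)
)                         # worst observed aggregate demand when shared
print(f"dedicated: {dedicated}, pooled: {pooled:.0f}")
# The pooled peak stays far below the sum of individual peaks, so the
# provider serves the same tenants with much less hardware.
```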

Cloud Service Models

Cloud offerings are categorized into four primary service models, each providing distinct levels of abstraction and management responsibility.

Traditional methods 🟩

  • On-premises: The traditional model where organizations own and manage their infrastructure.
  • Colocation: Organizations rent space in a data center and manage their hardware.
    • Example: Equinix, which rents server cages.
  • Hosting: the provider also manages the hardware. Then the cloud arrived and made system administration much easier: administrators no longer need to worry about the hardware at all. This is the beauty of abstraction.

Now the second phase of the cloud's evolution, as described in (Schleier-Smith et al. 2021), is innovating how developers work: they no longer need to care about scaling and managing the infrastructure, only about the code.

Infrastructure as a Service (IaaS) 🟩

Definition: IaaS provides virtualized computing resources over the internet, allowing users to rent fundamental infrastructure components.

Features:

  • Virtual Machines (VMs): Securely isolated partitions of physical servers, enabled by hypervisor-based virtualization.
  • Deployment Options:
    • Shared VMs: Cost-effective, multitenant environments.
    • Dedicated Servers: Virtualized but not shared (single-tenant).
    • Bare Metal Servers: Physical servers without virtualization.
  • Use Case: Ideal for organizations needing full control over their software stack but lacking physical hardware.
  • Billing is fine-grained: we pay by the minute.

Security Considerations:
Users rely on the provider’s implementation of isolation, hardware security, and virtualization integrity. This aligns with the shared responsibility model, where providers secure the infrastructure, while users protect their data and applications.

Examples:

  • AWS EC2 (Elastic Compute Cloud)
  • Microsoft Azure Virtual Machines
  • Google Compute Engine
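
As a minimal sketch of renting IaaS resources programmatically, here is how a VM could be launched with AWS's Python SDK (boto3); the AMI ID and instance type are placeholders, and credentials/region are assumed to be configured:

```python
import boto3  # AWS SDK for Python; assumes credentials are configured

ec2 = boto3.client("ec2")
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder machine image
    InstanceType="t3.micro",          # small, cost-effective shared VM
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])  # the rented VM, billed per use
```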

Platform as a Service (PaaS) 🟨

Definition: PaaS delivers a managed environment for developing, testing, and deploying applications, abstracting underlying infrastructure.

Features:

  • Preconfigured middleware (e.g., databases, load balancers, autoscalers).
  • Streamlined deployment pipelines and DevOps tooling (e.g., distributed caches).
  • Automatic scaling driven by load (users specify the scaling policies) and managed resources.

Use Case: Reduces development overhead by handling infrastructure management, letting teams focus on coding and innovation.

Examples:

  • AWS Elastic Beanstalk
  • Google App Engine
  • Heroku
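
To make the autoscaling feature concrete, here is a hypothetical sketch of the kind of target-tracking policy a PaaS might evaluate on the user's behalf; the names and thresholds are invented:

```python
def desired_instances(current: int, cpu_util: float,
                      target: float = 0.6,
                      min_n: int = 2, max_n: int = 20) -> int:
    """Target tracking: pick an instance count that pushes average CPU
    utilization toward `target`, clamped to the configured bounds."""
    want = max(1, round(current * cpu_util / target))
    return max(min_n, min(max_n, want))

# E.g. 4 instances at 90% CPU -> scale out to 6; at 15% CPU -> scale in to 2.
print(desired_instances(4, 0.90), desired_instances(4, 0.15))
```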

Software as a Service (SaaS) 🟨–

Definition: SaaS delivers fully functional applications over the internet, managed entirely by the provider. These are the classical software websites on the web: you have an application and simply give it your data (Dropbox, Gmail, Snowflake).

Features:

  • Users interact solely with the application interface.
  • No maintenance or infrastructure management required.
  • Often subscription-based.
  • Users just bring their data; they do not need to write any software.

Examples:

  • Microsoft Office 365
  • Google Workspace (Gmail, Drive)
  • Salesforce CRM
  • Snowflake (cloud data warehousing)

Simple comparison of Service Models

| Model | Control              | Management       | Use Case                        |
| ----- | -------------------- | ---------------- | ------------------------------- |
| IaaS  | High (OS & apps)     | User-managed     | Custom infrastructure needs     |
| PaaS  | Medium (apps)        | Provider-managed | Streamlined app development     |
| SaaS  | Low (configuration)  | Fully managed    | Ready-to-use software           |
| FaaS  | Minimal (code)       | Fully automated  | Event-driven, short-lived tasks |

Serverless computing

We hide the hardware, but we also hide the backend!

The future evolution of serverless computing, and in our view of cloud computing, will be guided by efforts to provide abstractions that simplify cloud programming. (Schleier-Smith et al. 2021).

Functions as a Service (FaaS) / Serverless Computing 🟩

Definition: FaaS allows users to deploy event-driven, stateless functions without managing servers.

Features:

  • Event Triggers: Functions execute in response to events (e.g., file uploads, API calls). Invocations are scheduled by the cluster manager, but providing low-latency scheduling is difficult; see Cluster Resource Management#Dirigent.
  • Millisecond Billing: Costs are based on execution time and memory used (fine-grained billing!).
  • Automatic Scaling: Instances spin up/down seamlessly with demand.
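
A minimal sketch of such an event-driven function, using the standard AWS Lambda handler convention in Python; the S3 "file uploaded" event shape is the usual one, while the processing itself is a placeholder:

```python
def handler(event, context):
    # Invoked by the platform on each event; no server management involved.
    # For an S3 upload trigger, the event carries one record per object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"processing s3://{bucket}/{key}")  # placeholder work
    return {"status": "ok"}
```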

Advantages:

  • Eliminates server provisioning and capacity planning from the developer/user.
  • Ideal for parallelizable tasks (e.g., video encoding, data processing).
  • Very fine granularity of billing.

Use Cases:

  • Small highly parallelizable functions triggered by events.
  • Real-time data processing
  • Microservices architecture
  • Batch jobs (e.g., compiling code, running unit tests, compressing or decompressing videos).

Examples:

  • AWS Lambda
  • Azure Functions
  • Google Cloud Functions

These platforms can sometimes be seen as a supercomputer on demand.
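
As a sketch of that on-demand fan-out style (a real boto3 call, but a hypothetical function name and payload):

```python
import json
import boto3  # AWS SDK for Python

client = boto3.client("lambda")
for i in range(100):  # fan out 100 asynchronous, parallel invocations
    client.invoke(
        FunctionName="encode-video-chunk",        # hypothetical function
        InvocationType="Event",                   # async: returns immediately
        Payload=json.dumps({"chunk": i}).encode(),
    )
```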

Advantages and Disadvantages of Serverless Computing 🟨++

Advantages:

  • Easy to use (just add a single function + trigger)
  • Scalable (automatic scaling, very robust to bursts)
  • Cost-effective (pay only for execution time)

Limitations:

  • Stateless
  • No direct communication or shared state between function instances (they cannot address each other by IP)
  • Limited resources (execution time, bandwidth, memory, etc.)
  • Difficulty in efficiently scheduling and running the lambdas
    • The provider has little info about application characteristics to optimize scheduling
    • In (Schleier-Smith et al. 2021) the authors propose that the developers themselves could add hints for the cloud provider to know which kind of communication patterns are more prevalent in that application.
    • The cloud provider could develop technologies of static code analysis to infer the main communication patterns of the program.
  • There are no general-purpose serverless platforms yet, unlike classical serverful platforms (for some uses, serverless is too slow or inefficient).

Providing FaaS

There are three main challenges in hosting these services securely:

  • Fine-grained functions must be densely packed onto machines
  • Functions must be isolated from one another
    • Functions usually cannot communicate with each other.
    • Fixed resources are allocated to the functions.
  • We should be able to spin up the resources quickly (today on the order of hundreds of milliseconds)
    • See next section for warm and cold starts.

Warm and Cold starts 🟨–

Lambdas cannot execute out of the blue; an invocation must first:

  • Boot the function sandbox
  • Fetch the function code
  • Load the application runtime
  • And only then execute.

The first three steps are the cold start, where a lot of material needs to be downloaded and configured. A warm start keeps the sandboxes and the required dependencies around and just executes the lambda, but this consumes memory.

Microsoft has observed that warm starts need on average 16x more memory compared to cold starts, measured over 100 function runs.
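
A toy sketch of the warm-start trade-off: keeping sandboxes alive skips the cold-start steps but consumes memory, so the pool must be bounded and evicted (all names and sizes are illustrative):

```python
from collections import OrderedDict

class WarmPool:
    """Keep recently used sandboxes alive (warm) up to a memory budget,
    evicting the least recently used when the pool is full."""
    def __init__(self, max_warm=8):
        self.pool = OrderedDict()  # function_id -> live sandbox
        self.max_warm = max_warm

    def get(self, fn_id, boot_sandbox):
        if fn_id in self.pool:               # warm start: reuse and execute
            self.pool.move_to_end(fn_id)
            return self.pool[fn_id]
        sandbox = boot_sandbox(fn_id)        # cold start: boot + fetch + runtime
        self.pool[fn_id] = sandbox
        if len(self.pool) > self.max_warm:   # bound the memory cost
            self.pool.popitem(last=False)
        return sandbox
```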

Firecracker

Firecracker (Agache et al. 2020) is a lightweight, open-source Virtual Machine Monitor (VMM) designed to run microVMs with minimal overhead. Developed by AWS for services like AWS Lambda and AWS Fargate, Firecracker is optimized for secure, fast-booting, and resource-efficient virtualization. It uses KVM (Kernel-based Virtual Machine) and provides strong isolation while maintaining near-native performance. Alternatives at the time could not offer both strong security and minimal overhead. Firecracker has been in production since 2018.

Key features of Firecracker 🟨+

It offers a memory overhead of less than 5 MB per microVM, boots to application code in less than 125 ms, and allows creating up to 150 microVMs per second per host.

  • MicroVMs: Lightweight VMs with a minimal memory and CPU footprint (about 5 MB of overhead per VM).
    • They removed features that are usually not needed for cloud workloads: VM migration, the BIOS, CPU emulation, and most device handling are some examples of what was removed.
  • Fast Boot Times
    • Can start a microVM in under 125 milliseconds instead of seconds.
    • Up to about 150 microVMs can be created per second per host.
  • Security: Implements strong isolation with seccomp filters (see Container Virtualization#Linux Containers) and a minimal attack surface.
    • The security-critical part is delegated to the hardware.

It is a nice way to build machines that combine some properties of VMs (see Architettura software del OS) with those of Container Virtualization. AWS preferred VMs for security reasons.

Usages and Development

Usages:

  • API-Driven: Exposes a REST API for managing microVMs programmatically.
  • Optimized for Serverless: Used in AWS Lambda and Fargate to run function-based workloads efficiently.

Instead of using QEMU on top of KVM, they developed this lightweight version, see Architettura software del OS.
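
A minimal sketch of driving that REST API from Python over Firecracker's Unix socket. The /boot-source and /actions endpoints are the real Firecracker API; the socket and kernel paths are placeholders, and drive/network setup is omitted:

```python
import http.client
import json
import socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client over a Unix domain socket instead of TCP."""
    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def put(conn, path, body):
    conn.request("PUT", path, json.dumps(body).encode(),
                 {"Content-Type": "application/json"})
    resp = conn.getresponse()
    resp.read()                      # drain so the connection can be reused
    return resp.status

conn = UnixHTTPConnection("/tmp/firecracker.socket")   # placeholder path
put(conn, "/boot-source", {"kernel_image_path": "/path/to/vmlinux",
                           "boot_args": "console=ttyS0 reboot=k panic=1"})
put(conn, "/actions", {"action_type": "InstanceStart"})  # boot the microVM
```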

Open Questions for MicroVMs

  • Latency overhead: boot time is still non-trivial (> 100ms)
  • Memory overhead: each microVM needs its own guest OS, and warm microVMs must be kept around

Scheduling also remains hard to optimize (the user is charged even while a function is blocked on I/O).

Dandelion 🟥++

Dandelion can be seen as an alternative approach to microVMs. It requires purpose-built software that separates the computation part from the communication part. For computation, it allows executing basically anything, even untrusted code; for communication, it requires specific trusted code (an API for communication). This separation enables a lightweight yet secure function-execution system, makes functions more amenable to hardware acceleration, and facilitates dataflow-aware function orchestration: knowing the communication patterns helps decide when and where to allocate a function.

Resource allocation

We first explored some of these methods in Massive Parallel Processing. Cloud providers pack requests together to run the functions, but currently this is not always efficient.

We need to take into consideration some important key factors:

  • Requirements change quite often.
  • Fewer configurations so that they are easier to manage.
  • Constant power budget

Uncorrelated Resource usages

The difficult part is that CPU and disk usage are uncorrelated; at least, their relationship is not simply linear or inversely linear. The main implication is that it is hard to achieve balanced resource usage, which produces sunk cost: we pay for resources that we are not using.

  • Roughly 80% of servers run at less than 20% utilization (most machines sit mostly idle).
  • Occasionally big events occur (see the Twitter crash in 2009) during which the resources are actually used, but such events are very sparse.

Another notable non-correlation is between utilization and power usage, which is quite counterintuitive: old servers consumed 60% of their maximum power even at 0% utilization!

Disaggregated resource pools

Instead of fixed allocations to the various applications, a logical layer exposes a pool of each kind of resource and tries to allocate them independently. The idea is that decoupling the resources makes them easier to allocate.
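
A toy contrast of the two models (hypothetical numbers): with fixed 16-CPU/64-GB servers, a job needing 2 CPUs and 60 GB strands 14 CPUs on its machine; with disaggregated pools, each resource is drawn independently:

```python
pools = {"cpu": 1600, "ram_gb": 6400}  # one logical pool per resource type

def allocate(job):
    """Grant a job only if every resource pool can cover it, independently."""
    if all(pools[r] >= need for r, need in job.items()):
        for r, need in job.items():
            pools[r] -= need
        return True
    return False

print(allocate({"cpu": 2, "ram_gb": 60}))  # True: no CPU is stranded
```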

Software

We usually divide software into different tiers.

Main Layers

In modern software architecture, systems are commonly structured into three primary layers, closely resembling the architecture described by (Calder et al. 2011):

  1. Presentation Layer – This layer is responsible for the user interface and user experience. It handles input from users and displays the processed data in an understandable manner.
  2. Application Layer – Sometimes referred to as the business logic layer, this component processes data, executes operations, and enforces business rules. It serves as an intermediary between the presentation and data layers.
  3. Data Layer – This layer is dedicated to data storage and retrieval. It communicates with databases or other storage systems to ensure data persistence and integrity.

These layers interact with each other in a structured manner to maintain modularity and separation of concerns. However, increasing the number of levels of indirection can help solve certain system design challenges, such as scalability or maintainability. On the other hand, excessive layering introduces overhead, making the system more complex to maintain and potentially reducing its performance.

Types of Tiers 🟥

Software architectures have evolved over time to address scalability, maintainability, and efficiency. The following outlines the major transitions in architectural paradigms:

  • 1-Tier (Monolithic Architecture)
    • A single system that integrates all components into a single unit.
    • Simple to develop and manage since there are no inter-process or network communication overheads.
    • Lacks scalability, as increasing demand requires vertical scaling (adding more resources to a single server).
  • 2-Tier Architecture (Client-Server Model)
    • Introduced API-based communication through remote procedure calls (RPC).
    • Standardized API interfaces, often implemented via REST APIs over HTTP (a minimal sketch follows this list).
    • More complex than monolithic systems due to the need for synchronization between client and server.
    • Prone to compatibility issues, as both client and server must adhere to the same API specifications.
    • A well-defined interface is needed because client and server are separated.
    • Servers do not communicate with one another, so it is difficult to coordinate and synchronize them.
  • 3-Tier Architecture (Middleware-Based Systems)
    • Introduced an intermediary middleware layer that processes business logic independently of the client and data layers (a level of indirection; this is also the root of the microservices architecture).
    • Enabled better scalability by distributing workload across multiple servers.
    • Served as the foundation for micro-services architectures, where different services operate independently and communicate via lightweight protocols such as HTTP or message queues.
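
A minimal 2-tier sketch in Python: the server exposes an agreed REST-style endpoint over HTTP, and any client adhering to the same contract can call it (endpoint and data are hypothetical):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Api(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/items":            # the agreed API contract
            body = json.dumps(["a", "b"]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # A client (the first tier) would call GET http://localhost:8000/items.
    HTTPServer(("localhost", 8000), Api).serve_forever()
```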

Performance evaluation

The goal is to understand why a system behaves the way it does. This section is based on (Ousterhout 2018).

Common errors in performance evaluation

  • Bugs in the benchmarking code. These are difficult to spot, because they just produce erroneous measurements instead of failing irrecoverably.
  • Making educated guesses instead of actually measuring the quantity under discussion. Often the guessed results are simply not true; one easy countermeasure is checking for inconsistencies.
  • Not measuring everything, which leads to superficial measurements. It also often feeds confirmation bias toward the original hypothesis, if only that one thing is measured. One way to fix this is to measure at a lower level, or to measure in different ways.
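
A sketch of "measuring one level deeper" in the spirit of (Ousterhout 2018): time the whole operation and its parts, then check that the parts roughly sum to the total; a large gap signals an unmeasured component or a benchmarking bug (the workload below is a placeholder):

```python
import time

def timed(fn, *args):
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

def handle_request(payload):
    parsed, t_parse = timed(str.split, payload)   # stand-in "parse" step
    result, t_work = timed(sorted, parsed)        # stand-in "work" step
    return result, {"parse": t_parse, "work": t_work}

t0 = time.perf_counter()
_, parts = handle_request("some request payload " * 10_000)
total = time.perf_counter() - t0
print(f"total={total:.6f}s parts_sum={sum(parts.values()):.6f}s")
# If total >> sum(parts), something unmeasured dominates: measure deeper.
```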

References

[1] Calder et al. “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency” ACM 2011

[2] Agache et al. “Firecracker: Lightweight Virtualization for Serverless Applications” 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20) 2020

[3] Ousterhout “Always Measure One Level Deeper” Communications of the ACM Vol. 61(7), pp. 74–83 2018

[4] Schleier-Smith et al. “What Serverless Computing Is and Should Become: The next Phase of Cloud Computing” Communications of the ACM Vol. 64(5), pp. 76–84 2021