Cloud Computing: An Overview

Key Players in the Cloud Industry

The cloud computing market is dominated by several major providers, often referred to as the “Big Seven”:

  1. Amazon Web Services (AWS): The largest provider, offering a comprehensive suite of cloud services.
  2. Microsoft Azure: Known for deep integration with enterprise systems and hybrid cloud solutions.
  3. Google Cloud Platform (GCP): Excels in data analytics, AI/ML, and Kubernetes-based solutions.
  4. IBM Cloud: Focuses on hybrid cloud and enterprise-grade AI.
  5. Oracle Cloud: Specializes in database solutions and enterprise applications.
  6. Alibaba Cloud: The leading provider in Asia, offering services similar to AWS.
  7. Salesforce: A major player in SaaS, particularly for CRM and business applications.
These providers collectively control the majority of the global cloud infrastructure market, enabling scalable and on-demand computing resources for businesses worldwide.

Capital and Operational Expenses in the Cloud

Cloud computing transforms traditional IT cost structures by shifting expenses from capital expenditures (CapEx), such as purchasing servers and data centers, to operational expenditures (OpEx), where users pay only for the resources they consume.
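
To make the CapEx/OpEx trade-off concrete, here is a minimal back-of-the-envelope sketch. All prices and lifetimes are purely illustrative assumptions, not real quotes, and the calculation ignores power, cooling, and staff costs.

```python
# Back-of-the-envelope CapEx vs. OpEx comparison (illustrative numbers only).
server_capex = 8_000.0                  # assumed purchase price of one server (USD)
server_lifetime_hours = 3 * 365 * 24    # assumed 3-year depreciation period
cloud_rate_per_hour = 0.40              # assumed on-demand price of a comparable VM (USD/h)

# Owning the server costs the same per hour whether it is busy or idle.
owned_cost_per_hour = server_capex / server_lifetime_hours

# With pay-as-you-go, the hourly bill scales with utilization, so renting
# is cheaper whenever utilization stays below this break-even point.
break_even_utilization = owned_cost_per_hour / cloud_rate_per_hour

print(f"Owned server: {owned_cost_per_hour:.2f} USD/h, used or not")
print(f"Renting wins below {break_even_utilization:.0%} utilization")
```

With these made-up numbers, renting is cheaper whenever the machine would sit idle more than about a quarter of the time, which is exactly the low-utilization situation described later in these notes.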

Economic Advantages of the Cloud

  • Increased Resource Utilization: Cloud providers optimize hardware usage by sharing resources across multiple users (multitenancy), reducing idle capacity.
  • Pay-as-You-Go Model: Organizations avoid upfront investments, paying instead for compute power, storage, and services as needed.
  • Scalability: Costs align dynamically with demand, eliminating over-provisioning.

Key Insight:
The cloud improves efficiency by enabling resource sharing. From the provider’s perspective, resource utilization increases significantly. For users, costs are proportional to actual usage, fostering financial flexibility.

In practice, however, today’s platforms are highly inefficient from this point of view: most of the provisioned resources go unused.

Cloud Service Models

Cloud offerings are categorized into four primary service models, each providing distinct levels of abstraction and management responsibility.

Traditional methods

  • On-premises: The traditional model where organizations own and manage their infrastructure.
  • Colocation: Organizations rent space in a data center and manage their hardware.
  • Hosting: Organizations rent servers that a provider owns and maintains.

Infrastructure as a Service (IaaS)

Definition: IaaS provides virtualized computing resources over the internet, allowing users to rent fundamental infrastructure components.

Features:

  • Virtual Machines (VMs): Securely isolated partitions of physical servers, enabled by hypervisor-based virtualization.
  • Deployment Options:
    • Shared VMs: Cost-effective, multitenant environments.
    • Dedicated Servers: Virtualized but not shared (single-tenant).
    • Bare Metal Servers: Physical servers without virtualization.
  • Use Case: Ideal for organizations needing full control over their software stack but lacking physical hardware.
  • Billing: Metered usage, typically charged by the minute (or even by the second).

Security Considerations:
Users rely on the provider’s implementation of isolation, hardware security, and virtualization integrity. This aligns with the shared responsibility model, where providers secure the infrastructure, while users protect their data and applications.

Examples:

  • AWS EC2 (Elastic Compute Cloud)
  • Microsoft Azure Virtual Machines
  • Google Compute Engine
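
As a concrete illustration (not taken from these notes), the snippet below sketches how a VM might be provisioned programmatically on an IaaS platform, using AWS EC2 through the boto3 Python SDK. It assumes boto3 is installed and AWS credentials are already configured; the AMI ID is a placeholder.

```python
# Minimal sketch: renting infrastructure (a VM) through an IaaS API.
# Assumes boto3 is installed and AWS credentials/region are configured.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder machine image ID
    InstanceType="t3.micro",          # small instance type, for illustration
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
# From this point on, the user manages the OS and applications (shared
# responsibility model) and is billed for as long as the instance runs.
print(f"Launched instance {instance_id}")
```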

Platform as a Service (PaaS)

Definition: PaaS delivers a managed environment for developing, testing, and deploying applications, abstracting underlying infrastructure.

Features:

  • Preconfigured middleware (e.g., databases, load balancers, autoscalers).
  • Streamlined deployment pipelines and DevOps tooling, plus managed services such as distributed caches.
  • Automatic scaling and resource management.

Use Case: Reduces development overhead by handling infrastructure management, allowing teams to focus on coding and innovation.

Examples:

  • AWS Elastic Beanstalk
  • Google App Engine
  • Heroku
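
To illustrate what "only write the application" means in practice, here is a minimal Flask web app of the kind these platforms can run directly; the provider supplies the runtime, routing, scaling, and load balancing. Flask is assumed here purely as one common choice, and the exact deployment steps differ per platform.

```python
# Minimal web application for a PaaS deployment: this file plus a short
# dependency list is essentially the whole artifact the developer ships.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from a PaaS-managed application"

if __name__ == "__main__":
    # Local testing only; on the platform, the managed web server runs `app`.
    app.run(port=8000)
```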

Software as a Service (SaaS)

Definition: SaaS delivers fully functional applications over the internet, managed entirely by the provider.

Features:

  • Users interact solely with the application interface.
  • No maintenance or infrastructure management required.
  • Often subscription-based.
  • Users simply work with their data; they do not need to write or install software.

Examples:

  • Microsoft Office 365
  • Google Workspace (Gmail, Drive)
  • Salesforce CRM
  • Snowflake (cloud data warehousing)

Simple comparison of Service Models

| Model | Control | Management | Use Case |
| --- | --- | --- | --- |
| IaaS | High (OS & apps) | User-managed | Custom infrastructure needs |
| PaaS | Medium (apps) | Provider-managed | Streamlined app development |
| SaaS | Low (configuration) | Fully managed | Ready-to-use software |
| FaaS | Minimal (code) | Fully automated | Event-driven, short-lived tasks |

Serverless computing

Functions as a Service (FaaS) / Serverless Computing

Definition: FaaS allows users to deploy event-driven, stateless functions without managing servers.

Features:

  • Event Triggers: Functions execute in response to events (e.g., file uploads, API calls).
  • Millisecond Billing: Costs are based on execution time and memory used.
  • Automatic Scaling: Instances spin up/down seamlessly with demand.

Advantages:

  • Eliminates server provisioning and capacity planning from the developer/user.
  • Ideal for parallelizable tasks (e.g., video encoding, data processing).
  • Very fine granularity of billing.

Use Cases:

  • Small highly parallelizable functions triggered by events.
  • Real-time data processing
  • Microservices architecture
  • Batch jobs (e.g., compiling code, running unit tests, compressing or decompressing videos).

Examples:

  • AWS Lambda
  • Azure Functions
  • Google Cloud Functions

These platforms can sometimes be viewed as a supercomputer on demand.
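
As a sketch of what such a function looks like, the snippet below follows the AWS Lambda Python handler convention; the event layout (an S3 file-upload notification) is assumed here for illustration.

```python
# Minimal FaaS-style function: stateless, triggered per event, and billed
# only for its execution time and memory.
import json

def handler(event, context):
    # The trigger (here, a file upload notification) arrives in `event`.
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real work (resizing, transcoding, parsing, ...) would happen here.
        processed.append(f"{bucket}/{key}")

    return {"statusCode": 200, "body": json.dumps({"processed": processed})}
```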

Advantages and Disadvantages of Serverless Computing

Advantages:

  • Easy to use (just add a single function + trigger)
  • Scalable (automatic scaling, very robust to bursts)
  • Cost-effective (pay only for execution time)

Limitations:

  • Stateless: no state persists between invocations.
  • Functions cannot communicate directly with one another or share state (individual instances have no stable addresses to reach each other).
  • Limited resources (execution time, bandwidth, memory, etc.).
  • Efficiently scheduling and running the lambdas is difficult:
    • The provider has little information about application characteristics to optimize scheduling.

Providing FaaS

Hosting these services efficiently and securely raises three main challenges:

  • Fine-grained functions must be packed densely onto servers.
  • Functions must be isolated from one another.
  • Resources must be spun up quickly (today on the order of hundreds of milliseconds).

Warm and Cold starts

Lambdas cannot execute out of the blue; before a function can run, the platform must:

  • Boot the function sandbox
  • Fetch the function code
  • Load the application runtime

Only then can the function execute. These steps make up the cold start, during which a lot of code and dependencies must be downloaded and configured. A warm start instead keeps the sandboxes and the required dependencies around between invocations and executes the function directly, but this comes at the cost of consuming memory.

Microsoft has observed that, over 100 function runs, warm starts need on average 16x more memory than cold starts.
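
In code, the warm/cold distinction typically shows up as module-level initialization: whatever is created outside the handler survives across warm invocations of the same sandbox, while every cold start pays for it again. The sketch below is illustrative only; the "expensive" setup is simulated with a sleep.

```python
# Illustration of the warm-start trade-off: module-level setup runs once per
# sandbox (paid on a cold start) and is then reused, at the cost of keeping
# the sandbox and its dependencies resident in memory.
import time

def load_dependencies():
    # Stand-in for the expensive part of a cold start: fetching code,
    # importing libraries, opening connections, loading models, etc.
    time.sleep(0.5)
    return {"runtime": "large object kept in memory"}

# Executed once when the sandbox boots (cold start).
DEPENDENCIES = load_dependencies()

def handler(event, context):
    # Warm invocations reuse DEPENDENCIES and skip the setup above.
    return {"used": DEPENDENCIES["runtime"], "event_keys": sorted(event or {})}
```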

Other

Latency - Capacity - Bandwidth

In this section we explore the relationship between latency, bandwidth, and capacity across different storage technologies.

  • Remote memory is faster than local disk!
    • Disk accesses are slow, on the order of milliseconds.
  • Bandwidth can also become a bottleneck.

Speed of most common operations

(Figure: speed of the most common operations; not reproduced here.)

Resource allocation

We first explored some of these methods in Massive Parallel Processing.

We need to take into consideration some important key factors:

  • Requirements change quite often.
  • Fewer configurations so that they are easier to manage.
  • Constant power budget

Uncorrelated Resource usages

The difficult part is that CPU and disk usage are uncorrelated, or at least their relationship is not simply linear or inversely linear. The main implication is that it is difficult to achieve balanced resource usage, which leads to sunk cost: we pay for resources that we do not use.

  • Roughly 80% of servers run at less than 20% utilization (or similar figures; most machines are lightly utilized).
  • Occasionally large load spikes occur (see the Twitter outages of 2009), during which the resources are actually used, but such events are very rare.

Another notable mismatch is between utilization and power usage. This is quite counterintuitive: old servers consumed about 60% of their maximum power even at 0% utilization!

Disaggregated resource pools

Instead of fixed allocations to individual applications, a logical layer can expose a pool of each kind of resource, and resources are then allocated independently. The idea is that by decoupling the resources, it becomes easier to allocate them.

Total cost of ownership

Using hardware better yields better performance at the same cost, so this is important for keeping the cost of operating datacenters low. Computing the total cost of ownership must take into account:

  • Capital expense of buying hardware (the largest share, about 61%)
  • Energy and cooling to keep it going
  • Networking costs
  • Other (like maintenance, cost of capital, amortized cost of building etc.)
  • People who operate the facility (security staff, technicians).

Software

We usually divide software into different tiers.

Main Layers

In modern software architecture, systems are commonly structured into three primary layers, closely resembling the architecture described by Calder et al. [1]:

  1. Presentation Layer – This layer is responsible for the user interface and user experience. It handles input from users and displays the processed data in an understandable manner.
  2. Application Layer – Sometimes referred to as the business logic layer, this component processes data, executes operations, and enforces business rules. It serves as an intermediary between the presentation and data layers.
  3. Data Layer – This layer is dedicated to data storage and retrieval. It communicates with databases or other storage systems to ensure data persistence and integrity.

These layers interact with each other in a structured manner to maintain modularity and separation of concerns. Adding levels of indirection can help solve certain system design challenges, such as scalability or maintainability; on the other hand, excessive layering introduces overhead, making the system more complex to maintain and potentially reducing its performance.
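
A toy sketch of the three layers in code (the class names and the in-memory storage are illustrative only):

```python
# Toy three-layer structure: each layer talks only to the layer below it.

class DataLayer:
    """Data layer: owns persistence (here just an in-memory dict)."""
    def __init__(self):
        self._orders = {}

    def save_order(self, order_id, item):
        self._orders[order_id] = item

    def get_order(self, order_id):
        return self._orders.get(order_id)


class ApplicationLayer:
    """Application (business logic) layer: enforces the rules."""
    def __init__(self, data):
        self.data = data

    def place_order(self, order_id, item):
        if not item:
            raise ValueError("an order must contain an item")
        self.data.save_order(order_id, item)
        return self.data.get_order(order_id)


def presentation_layer():
    """Presentation layer: gathers input and formats output for the user."""
    app = ApplicationLayer(DataLayer())
    confirmed = app.place_order("order-1", "coffee beans")
    print(f"Order confirmed: {confirmed}")


if __name__ == "__main__":
    presentation_layer()
```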

Types of Tiers

Software architectures have evolved over time to address scalability, maintainability, and efficiency. The following outlines the major transitions in architectural paradigms:

  • 1-Tier (Monolithic Architecture)
    • A single system that integrates all components into a single unit.
    • Simple to develop and manage since there are no inter-process or network communication overheads.
    • Lacks scalability, as increasing demand requires vertical scaling (adding more resources to a single server).
  • 2-Tier Architecture (Client-Server Model)
    • Introduced API-based communication through remote procedure calls (RPC).
    • Standardized API interfaces, often implemented via REST APIs over HTTP.
    • More complex than monolithic systems due to the need for synchronization between client and server.
    • Prone to compatibility issues, as both client and server must adhere to the same API specifications.
    • An explicit interface is needed, since client and server are separated (see the minimal sketch after this list).
    • Servers do not communicate with one another, which makes coordinating and synchronizing them difficult.
  • 3-Tier Architecture (Middleware-Based Systems)
    • Introduced an intermediary middleware layer that processes business logic independently of the client and data layers (an extra level of indirection; this is also the root of the microservices architecture).
    • Enabled better scalability by distributing workload across multiple servers.
    • Served as the foundation for microservices architectures, where different services operate independently and communicate via lightweight protocols such as HTTP or message queues.
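
The sketch below illustrates the 2-tier (client-server) model with a REST-style HTTP call, using only the Python standard library; the port and endpoint path are illustrative.

```python
# Minimal 2-tier sketch: a server tier exposing an HTTP API and a client
# tier that only knows the agreed-upon interface, not the server internals.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Server tier: business logic and data access would live here.
        body = json.dumps({"message": "hello from the server tier"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

server = HTTPServer(("localhost", 8080), ApiHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client tier: consumes the API over the network.
with urllib.request.urlopen("http://localhost:8080/status") as resp:
    print(json.loads(resp.read()))

server.shutdown()
```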

References

[1] Calder et al., “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency,” ACM, 2011.