Datacenter Hardware

We want to optimize the individual parts of the datacenter hardware so that the cost of operating the datacenter as a whole is lower; to do that, we need to think about the datacenter as a whole. Datacenter CPUs Different requirements Hardware needs a high level of isolation (because it will be shared among different users). Workloads are usually heavy and move a lot of data around. Datacenter CPUs offer a spectrum of low-end and high-end cores: if you have high parallelism you can use low-end cores, while for resource-intensive tasks it's better to have high-end cores, especially for latency-critical tasks. ...

16 min · Xuanqiang 'Angelo' Huang

Green Computing

The cloud is inefficient, and it looks like we can improve a lot on this side. Computer systems have reached industrial scale and can be compared to airports, highways and metro systems in terms of public infrastructure; yet, due to their immaterial and intangible nature, most people's perception of these systems does not match their reality. While classical engineering designs physical objects, computer science designs virtual objects ~Gustavo Alonso CCA Lecture 14 May 2025 ETH Zürich ...

5 min · Xuanqiang 'Angelo' Huang

Optimizations for DNN

Mixture of Experts There is a gate that opens a subset of the experts, and the output is the weighted sum of the outputs of those experts. The weights are computed by a gating network. One problem is load balancing: the gate may assign inputs to the experts non-uniformly. There is also a lot of communication overhead when the experts are placed on different devices.
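The gating idea can be illustrated with a minimal sketch in plain Python. It assumes scalar inputs, experts given as ordinary functions, and a top-k gate with softmax weights over the selected experts; the names `moe_forward`, `gate` and `k` are hypothetical and are not taken from the original note.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate, k=2):
    """Weighted sum of the outputs of the k experts the gate opens.

    experts: list of callables (assumed stand-ins for expert networks)
    gate:    callable returning one score per expert (assumed gating network)
    """
    scores = gate(x)
    # Open only the k experts with the highest gate score.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([scores[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


# Toy usage: three experts, a fixed gate that prefers experts 1 and 2.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x]
gate = lambda x: [0.0, 1.0, 1.0]
output, chosen = moe_forward(2.0, experts, gate, k=2)
```

Counting how often each index appears in `chosen` over many inputs is one simple way to observe the load-balancing problem mentioned above.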

1 min · Xuanqiang 'Angelo' Huang

Systems for Artificial Intelligence

At the time of writing, the compute requirements for machine learning models and artificial intelligence are growing at a staggering rate of 200% every 3.5 months. Interest in the area is being quantified as 10k papers per month on the topic, while dollar investments on compute (energy, cooling, sustainability of compute in general) have had a hard time keeping up with the continuous new requests. From https://ucbrise.github.io/cs294-ai-sys-fa19/assets/lectures/lec03/03_ml-lifecycle.pdf ...

8 min · Xuanqiang 'Angelo' Huang

Cloud Computing Services

Cloud Computing: An Overview Cloud shifted the paradigm from owning hardware to renting computing resources on-demand. Hardware became a service. Key Players in the Cloud Industry 🟨 The cloud computing market is dominated by several major providers, often referred to as the “Big Seven”, also called hyper-scalers. They are usually not interested in making it interoperable (they prefer the lock-in). Amazon Web Services (AWS): The largest provider, offering a comprehensive suite of cloud services. Microsoft Azure: Known for deep integration with enterprise systems and hybrid cloud solutions. Google Cloud Platform (GCP): Excels in data analytics, AI/ML, and Kubernetes-based solutions. IBM Cloud: Focuses on hybrid cloud and enterprise-grade AI. Oracle Cloud: Specializes in database solutions and enterprise applications. Alibaba Cloud: The leading provider in Asia, offering services similar to AWS. Salesforce: A major player in SaaS, particularly for CRM and business applications. These providers collectively control the majority of the global cloud infrastructure market, enabling scalable and on-demand computing resources for businesses worldwide. Capital and Operational Expenses in the Cloud Definition for CapEx and OpEx 🟥 Cloud computing transforms traditional IT cost structures by shifting expenses from capital expenditures (CapEx), such as purchasing servers and data centers, to operational expenditures (OpEx), where users pay only for the resources they consume. ...

13 min · Xuanqiang 'Angelo' Huang

Cloud Reliability

Reliability is the ability of a system to remain operational over time, i.e., to offer the service it was designed for. Cloud hardware and software fail. In this note, we will try to find methods to analyze and predict when components fail, and how we can prevent this problem. Defining the vocabulary Availability 🟨++ $$ \text{Availability} = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} $$ MTTF: Mean Time To Failure 🟩– $$ \text{MTTF} = \frac{1}{r} $$ This definition does not include repair time, and assumes that failures are independent of each other. ...
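As a small numeric illustration of the two definitions above, here is a sketch in Python. It assumes that r is a constant failure rate expressed in failures per hour, which the excerpt does not state; the function names are hypothetical.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of time the service was up: uptime / (uptime + downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)


def mttf(failure_rate):
    """Mean time to failure as 1 / r (assumes r is failures per hour)."""
    return 1.0 / failure_rate


# Toy usage: 99 hours up and 1 hour down; one failure per 1000 hours.
a = availability(99.0, 1.0)
t = mttf(0.001)
```

Note that `mttf` ignores repair time, in line with the definition in the excerpt.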

6 min · Xuanqiang 'Angelo' Huang

Communication in the Cloud

How can we coordinate services to actually understand what they are doing, or what the user wants them to do? How do we manage network errors? This note will mainly focus on high-level communication protocols to coordinate this kind of communication. Remote Procedure Calls History and Basic Idea Remote procedure calls were introduced in 1984 and are built on the idea of stubs; see (Birrell & Nelson 1984). At a high level, the system calls the remote procedure as if it were local, but at a lower level a network request is sent. The architecture has remained the same since then. It hides all the complexity in the stub (marshaling, binding and sending, without caring about sockets and communication matters). One problem is that it might be hiding the complexity too well. Programming certainly becomes easier, but the design should still account for the overhead generated by network communication. ...

7 min · Xuanqiang 'Angelo' Huang

Queueing Theory

Queueing theory is the theory behind what happens when you have lots of jobs, scarce resources, and consequently long queues and delays. It is literally the “theory of queues”: what makes queues appear and how to make them go away. This is basically what happens in clusters, where you have a limited number of workers that need to execute a number of jobs. We need a little math to model the stochastic process of request arrivals. ...

7 min · Xuanqiang 'Angelo' Huang

Virtual Machines

The fundamental idea behind a virtual machine is to abstract the hardware of a single computer (the CPU, memory, disk drives, network interface cards, and so forth) into several different execution environments, thereby creating the illusion that each separate environment is running on its own private computer. (Silberschatz et al. 2018). Virtualization allows a single computer to host multiple virtual machines, each potentially running a completely different operating system. It is virtual in the sense that the virtual machine has the same perception of reality as a real machine: something that is not reality but appears very similar to it. ...

11 min · Xuanqiang 'Angelo' Huang

Cluster Management Policies

Introduction to cluster management How can we allocate the resources in a cluster in an efficient manner? How can we allocate resources fairly? Two step allocations 🟨++ Allocation happens in two steps: first you need to allocate resources to a process, then place the process physically in the cluster. Private and public cluster management 🟥++ Cluster management can be private or public. Private means every app manages its own sub-cluster: each app receives a private, static set of resources. Here it is easier to manage hardware for various needs. Public means there is a big cluster, like standard third party ...

4 min · Xuanqiang 'Angelo' Huang