Redundant Array of Independent Disks

Introduzione ai Redundant Array of Indipendent Disks I RAID ne abbiamo citato per la prima volta in Memoria. Come facciamo a stare su alla velocità del processore se questa va a crescere in modo esponenziale? Parallelizzazione della ricerca!. Ecco perché ci serve raid (oltre alla ridondanza quindi più sicuro). E possono anche fallire. → ammette recovery. E una altra cosa bella dei raid è che sono hot-swappable cioè li puoi sostituire anche quando stanno runnando. ...

4 min · Xuanqiang 'Angelo' Huang

Content Delivery Networks

CDNs are intermediary servers that replicate read intensive data to provide better performance when user requests them. A close relative of CDNs is edge computing (e.g. gaming stations) where lots of computation is done directly close to the user. Types of CDNs Mainly three types of CDNs: Highly distributed ones. -> Akamai Database based ones. Ad-hoc CDNs. Advantages and disadvantages The main reason we use CDNs is to lower the value of latency: we are in fact bringing the data closer to the user. We have much less data in length to be transmitted. Yet we have some disadvantages too: ...

3 min · Xuanqiang 'Angelo' Huang

HTTP e REST

HTTP is the acronym for HyperText Transfer Protocol. Caratteristiche principali (3) Comunicazioni fra client e server, e quanto sono comunicate le cose si chiude la connessione e ci sono politiche di caching molto bone (tipo con i proxy) Generico: perché è un protocollo utilizzato per caricare moltissime tipologie di risorse! Stateless, ossia non vengono mantenute informazioni su scambi vecchi, in un certo modo ne abbiamo parlato in Sicurezza delle reti quando abbiamo parlato di firewall stateless. Solitamente possiamo intendere questo protocollo come utile per scambiare risorse di cui abbiamo parlato in Uniform Resource Identifier. ...

6 min · Xuanqiang 'Angelo' Huang

Optimizations for DNN

Mixture of Experts There is a gate that opens a subset of the experts, and the output is the weighted sum of the outputs of the experts. The weights are computed by a gating network. One problem is load balancing, non uniform assignment. And there is a lot of communication overhead when you place them in different devices. LoRA: Low-Rank Adaptation We only finetune a part of the network, called lora adapters, not the whole thing. There are two matrices here, a matrix A and B, they are some sort of an Autoencoders, done for every Q nd V matrices in the LLM attention layer. The nice thing is that there are not many inference costs if adapters are merged post training: ...

9 min · Xuanqiang 'Angelo' Huang

Virtual Machines

The fundamental idea behind a virtual machine is to abstract the hardware of a single computer (the CPU, memory, disk drives, network interface cards, and so forth) into several different execution environments, thereby creating the illusion that each separate environment is running on its own private computer. (Silberschatz et al. 2018). Virtualization allows a single computer to host multiple virtual machines, each potentially running a completely different operating system. È virtuale nel senso che la macchina virtuale ha la stessa percezione della realtà di una macchina reale. Qualcosa che non è la realtà ma appare molto simile ad essa. ...

13 min · Xuanqiang 'Angelo' Huang

Communication in the Cloud

How can we coordinate services to actually understand what they are doing, or what the user wants them to do? How to manage networks errors? This note will mainly focus on high level communication protocols to coordinate this kind of communication. Remote Procedure Calls History: the Stub This has been the main idea, introduced in 1984, using the idea of stubs, see (Birrell & Nelson 1984). The system basically calls the remote procedure as if it was local on the high level, but on a lower level a network request is sent. The architecture has remained the same in these years. It hides all the complexity in the stub (marshaling, binding and sending, without caring about the sockets and communication matters). One problem is that it might be hiding the complexity too well. The programmer has surely an ease of programming, but design consideration should consider overloads generated by the network communication. ...

7 min · Xuanqiang 'Angelo' Huang

Compute Express Link

This allows us to extend the memory hierarchy (see Memoria) that we have today. The problem is that we have heterogeneous access patterns specifications and hardware. One of the main trends is disaggregation: we want to be able to scale different resources independently. Introduction to CXL (Compute Express Link) This is a new part of the memory hierarchy. NVM is a kind of non volatile memory that is used as a storage device that is close to the device (others are network attached or slower than network part anyways. It is persistent and has low latency. It is used in the memory hierarchy to extend the memory capacity. ...

4 min · Xuanqiang 'Angelo' Huang

Confidential Computing

pWith confidential computing we want to guarantee confidentiality and integrity of a user’s computation running on a remote (cloud) system, including: The program Its inputs and outputs Intermediate state, control flow, etc. Even if do not trust the cloud provider! Usually it is easy to guarantee that kind of privacy if you are storing or communicating using encryption methods (see Asymmetric Cryptography, Block Ciphers), but it’s difficult to do so if the program is running. ...

6 min · Xuanqiang 'Angelo' Huang

Green Computing

The cloud is inefficient, and it looks like we can improve a lot on this side. Computer Science with their systems have reached industrial scales and can be compared to build airports, highways and metro systems in terms of public infrastructure, yet, due to their immaterial and intangible nature, the perception of these systems do not match their perceived reality by the majority of the people. While classical engineering designs physical objects, computer science designs virtual objects ~Gustavo Alonso CCA Lecture 14 May 2025 ETH Zürich ...

5 min · Xuanqiang 'Angelo' Huang

Cloud Computing Services

Cloud Computing: An Overview Cloud shifted the paradigm from owning hardware to renting computing resources on-demand. Hardware became a service. Key Players in the Cloud Industry The cloud computing market is dominated by several major providers, often referred to as the “Big Seven”, also called hyper-scalers. They are usually not interested in making it interoperable (they prefer the lock-in). Amazon Web Services (AWS): The largest provider, offering a comprehensive suite of cloud services. Microsoft Azure: Known for deep integration with enterprise systems and hybrid cloud solutions. Google Cloud Platform (GCP): Excels in data analytics, AI/ML, and Kubernetes-based solutions. IBM Cloud: Focuses on hybrid cloud and enterprise-grade AI. Oracle Cloud: Specializes in database solutions and enterprise applications. Alibaba Cloud: The leading provider in Asia, offering services similar to AWS. Salesforce: A major player in SaaS, particularly for CRM and business applications. These providers collectively control the majority of the global cloud infrastructure market, enabling scalable and on-demand computing resources for businesses worldwide. Capital and Operational Expenses in the Cloud Definition for CapEx and OpEx Cloud computing transforms traditional IT cost structures by shifting expenses from capital expenditures (CapEx), such as purchasing servers and data centers, to operational expenditures (OpEx), where users pay only for the resources they consume. ...

14 min · Xuanqiang 'Angelo' Huang