Cloud Storage

Object Stores Characteristics of Cloud Systems Object storage design principles We don’t want the hierarchy that is common in Filesystems, so we need to simplify that and have these four principles: Black-box objects Flat and global key-value model (trivial model, easy to access, without the need to trasverse a file hierarchy). Flexible metadata Commodity hardware (the battery idea of Tesla until 2017). Object storage usages Object storage are useful to store things that are usually read-intensive. Some examples are ...

19 min · Xuanqiang 'Angelo' Huang

Notions of Security

CIAA principles of security We have already outlined these principles in Sicurezza delle reti and talked about the concepts of authentication and integrity. Here we try to deepen these concepts and delve a little bit more on the attack vectors. This note mainly focuses on the principles summarized by the acronyms CIA and AAA. Confidentiality This is one concerns about the secrecy of the sent message. We do not want others to be able to access and read what we are doing. ...

7 min · Xuanqiang 'Angelo' Huang

Queueing Theory

Queueing theory is the theory behind what happens when you have lots of jobs, scarce resources, and subsequently long queues and delays. It is literally the “theory of queues”: what makes queues appear and how to make them go away. This is basically what happens in clusters, where you have a limited number of workers that need to execute a number of jobs. We need some little maths to model the stochastic process of request arrivals. ...

8 min · Xuanqiang 'Angelo' Huang

Systems for Artificial Intelligence

At the time of writing, the compute requirements for machine learning models and artificial intelligence are growing at a staggering rate of 200% every 3.5 months. Interest in the area is being quantified as 10k papers per month on the topic, while dollar investments on compute (energy, cooling, sustainability of compute in general) have had a hard time keeping up with the continuous new requests. Image from here ...

11 min · Xuanqiang 'Angelo' Huang

Container Virtualization

Containers In this note, we introduce the famous docker containers. We also explore how #Linux Containers are implemented, and some parts of how #Docker works. What is a Container We have explored Virtual Machines in some past section. Containers do not virtualize everything, but just the environment where the application is run. This includes: Libraries Binaries We can see it as a lightweight VM, even if they do not offer the full level of isolation of traditional virtual machines. ...

6 min · Xuanqiang 'Angelo' Huang

Datacenter Hardware

We want to optimize the parts of the datacenter hardware such that the cost of operating the datacenter as a whole would be lower, we need to think about it as a whole. Datacenter CPUs Desktop CPU vs Cloud CPU Isolation: Desktop CPUs have low isolation, they are used by a single user. Cloud CPUs have high isolation, they are shared among different users. Workload and performance: usually high workloads and moving a lot of data around. They have a spectrum of low and high end cores, so that if you have high parallelism you can use lower cores, while for resource intensive tasks, its better to have high end cores, especially for latency critical tasks. ...

19 min · Xuanqiang 'Angelo' Huang

Cloud Reliability

Reliability is the ability of a system to remain operational over time, i.e., to offer the service it was designed for. Cloud Hardware and software fails. In this note, we will try to find methods to analyze and predict when components fail, and how we can prevent this problem. Defining the vocabulary Reliability and Factors of Influence Reliability is the probability that a system will perform its intended function without failure over a specified period of time. There are many factors that influence this value: ...

8 min · Xuanqiang 'Angelo' Huang

Cluster Management Policies

We have resources, but need to know how to assign these to the jobs that need them. This note presents some of the most common resource management policies for cloud clusters. Introduction to cluster management How can we allocate the resources in a cluster in an efficient manner? How can we allocate resources fairly? Two step allocations There are two main kinds of allocation: first you need to allocate resources to a process, then allocate the process physically in the cluster. We have seen an example of a working infrastructure in Cluster Resource Management. ...

8 min · Xuanqiang 'Angelo' Huang

Cluster Resource Management

We need to find an efficient and effective manner to allocate the resources around. This is what the resource management layer does. Introduction to the problem What is Cluster Resource Management? Most of the time, the user specifies an amount of resources, and then the cluster decides how much to allocate (but approaches like (Delimitrou & Kozyrakis 2014), do it differently). There are mainly two parts in cluster resource management: Allocation: deciding how many resources an application (techniques for this is presented in Cluster Management Policies. Assignment: from which physical machine you can effectively put the application. Types of management architectures We mainly divide the management architectures in three ways: ...

7 min · Xuanqiang 'Angelo' Huang