Containers

What is a Container

We explored virtual machines in Architettura software del OS#Macchine virtuali. Containers do not virtualize the whole machine, only the environment in which the application runs. This includes:

  • Libraries
  • Binaries

A container can be seen as a lightweight VM, although it does not offer the full level of isolation of a traditional virtual machine.

[Figure: container virtualization. Image from the course slides.]

Containers vs. Virtual Machines (VMs)

Docker is one of the most famous containerization tools, but there are many others like Podman, LXC, or Singularity. They have different roles and scopes.

Advantages:

  • Lightweight: Containers share the host's resources more efficiently, allowing a higher density of workloads on a single machine.
  • Fast Startup & Shutdown: Containers launch and terminate significantly faster than VMs.
    • VMs usually take seconds to minutes to boot, while containers start in milliseconds to seconds (see Firecracker in Cloud Computing Services).
  • Near Bare-Metal Performance: No Virtual Machine Monitor (VMM) traps or binary translation overhead.

Disadvantages:

  • Weaker Isolation: Security boundaries are less strict than with VMs. The shared kernel is a larger codebase than a hypervisor, so vulnerabilities are more likely, and in practice this seems to hold.
  • OS Dependency: All containers share the host's kernel, so they must target the same operating system as the host (this is a strong requirement!).

Docker and Linux Containers


Linux Containers

Linux Containers (LXC) is the technology, developed in 2008, that later enabled the development of Docker.

Namespace

Namespaces are a Linux kernel feature that isolates resources per process. This makes it possible to create a container that has its own view of the system.

They offer:

  • Process-level isolation of global resources: processes inside a namespace believe they control the whole system and are the only ones running on it.

Some examples of namespaces are:

  • PID: Process IDs
  • NET: Networking
  • IPC: Interprocess Communication
  • UTS: Hostname and domain name
  • MNT: Filesystem mount points
  • USER: User and group IDs

Every process starts in a default set of namespaces, inherited from its parent. The unshare command moves a process into newly created namespaces, so a process can both create new namespaces and switch to them.
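To make the namespace list concrete, here is a minimal sketch that prints the namespaces of the current shell by reading /proc, and then tries to enter new USER and UTS namespaces with unshare. The function name show_namespaces and the hostname demo-container are illustrative choices, not part of the course material. The unshare step assumes the util-linux unshare tool and a kernel that permits unprivileged user namespaces; where that is not the case, the script simply skips it.

```shell
#!/bin/sh
# Print the namespace identifiers of a process (default: the current shell).
# Each entry under /proc/<pid>/ns is a symlink naming the namespace type and its id.
show_namespaces() {
    pid="${1:-$$}"
    for ns in /proc/"$pid"/ns/*; do
        printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
    done
}

show_namespaces

# Assumption: unprivileged user namespaces are enabled and `hostname` is installed.
# Inside the new UTS namespace the hostname can be changed without affecting the host.
if command -v hostname >/dev/null 2>&1 &&
   unshare --user --map-root-user --uts true 2>/dev/null; then
    unshare --user --map-root-user --uts sh -c 'hostname demo-container; hostname'
    echo "host still sees: $(hostname)"
else
    echo "unshare not permitted here; skipping the demo"
fi
```

Two processes that print the same identifier for an entry share that namespace; a containerized process would show different identifiers from the host shell for the namespaces it has unshared.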

Cgroups

Cgroups have mainly two roles:

  • Fine-grained allocation of resources (CPU, memory, disk I/O, network bandwidth).
  • Limiting the maximum amount of resources available to a group of processes.

The following example uses the cgroup v1 filesystem interface and requires root privileges:

# Create a new memory cgroup directory
mkdir /sys/fs/cgroup/memory/my_cgroup
# Set a memory limit (500 MB)
echo 500M > /sys/fs/cgroup/memory/my_cgroup/memory.limit_in_bytes
# Attach a process (replace <PID> with the process ID)
echo <PID> > /sys/fs/cgroup/memory/my_cgroup/tasks
# ---------------------------
# We can also use libcgroup tools
# Create a cgroup with memory and CPU controllers
cgcreate -g memory,cpu:/my_cgroup
# Set the memory limit to 500 MB
cgset -r memory.limit_in_bytes=500M my_cgroup
# Run a command (e.g., a shell) within this cgroup
cgexec -g memory,cpu:my_cgroup /bin/bash
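The paths above follow the cgroup v1 layout, with one hierarchy per controller. Many recent distributions mount the unified cgroup v2 hierarchy instead, where the same limit is written to memory.max and processes are attached through cgroup.procs. The sketch below first detects which layout is mounted and then defines a v2 equivalent of the memory example. The function names and the group name my_cgroup are hypothetical, and the v2 function assumes root privileges and that the memory controller is enabled for child groups.

```shell
#!/bin/sh
# Report which cgroup hierarchy is mounted at /sys/fs/cgroup.
# cgroup2fs means the unified v2 hierarchy; tmpfs is the usual v1 (or hybrid) layout.
cgroup_version() {
    fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null)
    case "$fstype" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo v1 ;;
        *)         echo unknown ;;
    esac
}

# cgroup v2 version of the memory limit example (run as root).
# usage: limit_memory_v2 <group> <limit> <pid>
limit_memory_v2() {
    group_dir="/sys/fs/cgroup/$1"
    mkdir -p "$group_dir" &&
    echo "$2" > "$group_dir/memory.max" &&      # hard memory limit for the group
    echo "$3" > "$group_dir/cgroup.procs"       # move the process into the group
}

echo "cgroup hierarchy: $(cgroup_version)"
# Example call on a v2 host, limiting the current shell to 500 MB:
#   limit_memory_v2 my_cgroup 500M $$
```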

Seccomp-bpf

This feature is used to restrict the system calls that a process can make. seccomp-bpf stands for Secure Computing with Berkeley Packet Filters.

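One practical way to apply a seccomp filter is a Docker seccomp profile, a JSON file passed with --security-opt. The profile name no_chmod.json is illustrative:

# Write a profile that allows every system call except those changing file modes
cat > no_chmod.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    { "names": ["chmod", "fchmod", "fchmodat"], "action": "SCMP_ACT_ERRNO" }
  ]
}
EOF

# Run a container under that profile: chmod is expected to fail with a permission error
docker run --rm --security-opt seccomp=no_chmod.json alpine chmod 400 /etc/hostname

Outside of such tools, seccomp filters are mostly installed programmatically, for instance through the libseccomp C library.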

Docker

Docker is one of the most famous containerization technologies. Its main aim is to containerize applications, that is, to provide a reproducible set of dependencies and operations that an application needs in order to behave the same way everywhere. With Docker, and containers in general, there is no hypervisor.

Docker is built on top of Linux containers to make features like cgroups and namespaces easier to use. However, it is known to have had many security vulnerabilities: with containers the whole kernel, which is a lot of code, must be secured, whereas with VMs only the hypervisor must be secure, which is often much less code. As a rule of thumb, if you need strong isolation, go for classical VMs; if only performance matters, go for containers.

Docker Images

  • Images are divided into a sequence of layers. Each build command creates a new layer on top of the previous layers.
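As a small illustration of layering, the sketch below writes a hypothetical Dockerfile and counts the instructions that add a filesystem layer on top of the base image. It assumes the behavior of current Docker versions, where RUN, COPY, and ADD produce filesystem layers while instructions such as ENV and CMD only record metadata. The file contents, the image tag layer-demo, and the function name count_fs_layers are made up for this example.

```shell
#!/bin/sh
# Write a small, hypothetical Dockerfile into a temporary build directory.
workdir=$(mktemp -d)
cat > "$workdir/Dockerfile" <<'EOF'
FROM alpine:3.19
RUN apk add --no-cache curl
COPY app.sh /usr/local/bin/app.sh
ENV APP_MODE=production
CMD ["/usr/local/bin/app.sh"]
EOF

# Count the instructions that add a filesystem layer (RUN, COPY, ADD).
count_fs_layers() {
    grep -cE '^(RUN|COPY|ADD)[[:space:]]' "$1"
}

echo "filesystem layers added on top of the base image: $(count_fs_layers "$workdir/Dockerfile")"

# With Docker installed and an app.sh placed next to the Dockerfile,
# the layers of the built image can be inspected with:
#   docker build -t layer-demo "$workdir" && docker history layer-demo
```

Because layers are stacked in order, changing an early instruction invalidates the cached layers of every instruction after it, which is why rarely changing steps are usually placed first.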

Docker Runtime

The Docker runtime is divided into three main parts:

  • dockerd: the user-level management daemon, which creates and runs containers from images and serves the requests coming from the user CLI.
  • containerd: the high-level runtime, which supports downloading, sharing, and managing containers.
  • runc: the low-level runtime, which uses the Linux container features (namespaces, cgroups, seccomp) to provide isolation and security.
  • In addition, a client communicates with the engine to create and run containers (on Linux, the docker command is the client).
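The split between client and daemon can be observed from the command line. The sketch below asks the daemon, through the docker client, which low-level runtime it uses by default. On a standard installation this is expected to be runc, though that depends on the local configuration. The function name default_runtime and the fallback strings are illustrative, and the script degrades gracefully when Docker is not installed or the daemon is not reachable.

```shell
#!/bin/sh
# Ask dockerd (via the docker client) for its default low-level runtime.
default_runtime() {
    if command -v docker >/dev/null 2>&1; then
        # `docker info` is answered by the daemon; it fails if dockerd is not running.
        docker info --format '{{.DefaultRuntime}}' 2>/dev/null || echo "daemon-unreachable"
    else
        echo "docker-not-installed"
    fi
}

echo "default low-level runtime: $(default_runtime)"
```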

gVisor

gVisor sits between VMs and containers on the security versus performance trade-off. The main difference from ordinary containers is that gVisor intercepts the application's system calls and handles them in user space instead of passing them directly to the host kernel. It is therefore a more secure kind of container but, as you might imagine, a slower one.

WebAssembly

WebAssembly is a binary instruction format originally designed for the web, meant for fast, secure, and lightweight code; it is now being used in cloud and edge computing too. To run WebAssembly modules you need:

  • An operating system
  • A runtime, on top of the OS, that loads and runs the modules

Key Features

  • Strong security
  • Small binary sizes
  • Fast loading and running
  • Support for many operating systems and architectures
  • Interoperability with the browser/cloud services

Conclusions: Comparison between virtualization methods

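|              | VM                          | Container                         | Wasm                                |
|--------------|-----------------------------|-----------------------------------|-------------------------------------|
| Isolation    | Strong                      | Moderate                          | Strong                              |
| Performance  | Slower (emulation overhead) | Fast (lightweight, shared kernel) | Near-native (no emulation overhead) |
| Startup Time | Slow (seconds to minutes)   | Fast (milliseconds to seconds)    | Instant (milliseconds)              |
| Footprint    | Heavy (GBs)                 | Moderate (MBs)                    | Light (KBs)                         |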

VMs are more secure: to attack one, you would need to chain exploits across app -> OS -> hypervisor. Containers instead rely on a very large codebase, so it is quite likely that it contains some bugs.

It seems from the table that Wasm wins every comparison, but the difficulty is that code must be written or compiled specifically for it. Moreover, you might not have direct access to system resource APIs, and it does not provide full resource control.