Container and Cgroups

Cgroups (control groups) is a Linux kernel feature that organizes processes into hierarchical groups in order to limit and monitor their use of system resources such as CPU, memory, disk I/O, and network.

The Linux kernel exposes cgroups through a pseudo-filesystem named cgroupfs. A cgroup is a set of processes that share the settings stored for it in cgroupfs. By changing those settings, we can limit and monitor the resources the processes in the group consume.
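As a small illustration of that interface, the kernel lists the cgroups a process belongs to in /proc/<pid>/cgroup, one line per hierarchy, in the form `hierarchy-ID:controller-list:cgroup-path` (see cgroups(7)). The sketch below splits such a line into its controller list and path. The function name `parse_cgroup_line` and the `unified` label used for an empty controller field (the cgroup v2 case) are assumptions made for this example, not part of any tool.

```shell
# Split one line of /proc/<pid>/cgroup into its controller list and cgroup path.
# Expected format: hierarchy-ID:controller-list:cgroup-path
parse_cgroup_line() {
    line=$1
    rest=${line#*:}            # drop the hierarchy ID and its colon
    controllers=${rest%%:*}    # text before the next colon (empty on cgroup v2)
    path=${rest#*:}            # everything after that colon
    # "unified" is a hypothetical label for the empty cgroup v2 controller field.
    printf '%s %s\n' "${controllers:-unified}" "$path"
}

# Show the cgroups of the current shell, if /proc is available.
if [ -r /proc/self/cgroup ]; then
    while IFS= read -r l; do
        parse_cgroup_line "$l"
    done < /proc/self/cgroup
fi
```

On a system that uses cgroup v1, a process started in the container built below would be expected to show the container's cgroup path next to the `cpu,cpuacct` and `memory` controllers.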

Below is simplified code from bocker. It demonstrates that limiting the resource usage of a container comes down to creating a cgroup and executing a command in that cgroup.

# Generate a random ID. We need it to identify our container.
$ uuid="ps_$(shuf -i 42002-42254 -n 1)"

# Set the root dir for all the containers and the cgroup controllers to use.
$ btrfs_path='/var/bocker' && cgroups='cpu,cpuacct,memory'

# Prepare root filesystem based on the given `$image`.
$ btrfs subvolume snapshot "$btrfs_path/$image" "$btrfs_path/$uuid" > /dev/null

# Create a cgroup
$ cgcreate -g "$cgroups:/$uuid"

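# Set resource limits on the cgroup (cgroup v1 parameter names):
# a relative CPU weight of 512 (the default is 1024) and a memory cap of about 512 MB.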
$ cgset -r cpu.shares=512 "$uuid"
$ cgset -r memory.limit_in_bytes=512000000 "$uuid"

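# Execute a given `$cmd` in the cgroup (cgexec).
# Run it inside the container's network namespace (ip netns exec);
# this assumes netns_"$uuid" has already been created.
# Create new mount, UTS, IPC, and PID namespaces for the command (unshare).
# Change the root directory to the container's filesystem (chroot).
# Mount /proc inside the new root.
# Logging is a bonus (tee).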
$ cgexec -g "$cgroups:$uuid" \
        ip netns exec netns_"$uuid" \
        unshare -fmuip --mount-proc \
        chroot "$btrfs_path/$uuid" \
        /bin/sh -c "/bin/mount -t proc proc /proc && $cmd" \
        2>&1 | tee "$btrfs_path/$uuid/$uuid.log"
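To make the link between these commands and cgroupfs more concrete, here is a minimal sketch of what cgcreate, cgset, and cgexec roughly amount to on a cgroup v1 hierarchy: creating a directory, writing a value to a file in it, and writing a PID to its `cgroup.procs` file. The helper names `cg_create`, `cg_set`, and `cg_attach` and the `CG_ROOT` variable are hypothetical, and `/sys/fs/cgroup/memory` is only an assumed mount point for the memory controller. The real tools handle several controllers and error cases that are left out here.

```shell
# Sketch only: approximate raw-cgroupfs equivalents of cgcreate, cgset, and cgexec.
# CG_ROOT is an assumed mount point of one cgroup v1 controller. Point it at a
# scratch directory to try the helpers without root (nothing is enforced there).
CG_ROOT=${CG_ROOT:-/sys/fs/cgroup/memory}

# Like cgcreate: a cgroup is a directory inside the controller's hierarchy.
# $1 = cgroup name
cg_create() {
    mkdir -p "$CG_ROOT/$1"
}

# Like cgset: a setting is a value written to a file in that directory.
# $1 = cgroup name, $2 = parameter file, $3 = value
cg_set() {
    printf '%s\n' "$3" > "$CG_ROOT/$1/$2"
}

# Like the first half of cgexec: a process joins the cgroup when its PID
# is written to the group's cgroup.procs file.
# $1 = cgroup name, $2 = PID
cg_attach() {
    printf '%s\n' "$2" >> "$CG_ROOT/$1/cgroup.procs"
}
```

With root privileges and a mounted memory controller, `cg_create "$uuid"`, then `cg_set "$uuid" memory.limit_in_bytes 512000000`, then `cg_attach "$uuid" $$` would place the current shell, and any command it starts afterwards, under the same memory limit as in the example above.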

You don't need to memorize the commands above unless you are developing a container engine.

Container engines such as runC, rkt, and LXC provide a well-designed CLI that abstracts the above process for you. If you like watching YouTube videos, there is a talk in which Liz Rice implements a container from scratch in about 40 minutes.

See the cgroups(7) man page for an overview and "Introduction to Control Groups" for usage.