Running podman inside of a container

Podman

Before diving into how to run podman inside of a container, let’s understand what podman is and how it is different than docker.

What is Podman?

Podman is a container engine similar to docker which can be used to create, manage, and run containers and container images. Podman supports running containers in both root and rootless modes, with a strong focus on security and integration with Linux’s security features like SELinux.

You must be wondering what actually is pod in podman. Well, a pod is essentially a group of one or more containers that share the same network namespace and storage, allowing them to work closely together as a single unit. A pod can contain one or more tightly coupled containers that need to communicate with each other.

For example, in a web application, you might place a web server container and a sidecar container (such as one that logs requests) in the same pod. Podman manages these pods via a simple command-line interface (CLI) and the libpod library, which provides application programming interfaces (APIs) for managing containers, pods, container images, and volumes.

Each pod is composed of 1 infra container and any number of regular containers. The infra container keeps the pod running and maintains user namespaces, which isolate containers from the host. The other containers each have a monitor to keep track of their processes and look out for dead containers―nonfunctioning containers that can’t be taken out of the environment because some of their resources are still being used.

Namespaces in Linux

Namespaces are a kernel feature that provide a way to isolate and virtualize system resources for processes. By using namespaces, processes can have their own unique views of system resources, such as the filesystem, network interfaces, user IDs, and more. You can think of namespaces as vitual environments in your linux system.

This isolation makes namespaces secure, isolated environments where multiple applications can run on a single system without interfering with each other. Now, you must have understood why namespaces are important in containerisation as it allow each container to run in a fully isolated environment.

There are multiple types of linux namespaces such as Mount, Process ID, Network, Interprocess Communication, User, Time, etc. You can read more about these namespaces in the official documentation as this post deals mainly with podman :).

Podman vs Docker

Podman stands out from other container engines because it’s daemonless, meaning it doesn’t rely on a process with root privileges to run containers.

Daemons are processes that run in the background of your system to do the work of running containers without a user interface. Think of daemons as the intermediary communicating between the user and the container itself.

In Linux systems, the root account acts as a superuser with administrative access (while bypassing the need for admin verification) to read files, install programs, edit applications, and more. This makes daemons an ideal target for hackers who want to gain control of your containers and infiltrate the host system.

Podman Vs Docker

How to use podman inside of a container

Podman can be run in multiple ways, rootful and rootless. The combinations can be:

Rootful Podman in rootful Podman
Rootless Podman in rootful Podman
Rootful Podman in rootless Podman
Rootless Podman in rootless Podman

In order to run a container engine like Podman within a container, the first thing you need to understand is that you need a fair amount of privilege.

Containers require multiple UIDs. Most container images need more than one UID to work. For example, you might have an image with most of the files owned by root, but some owned by the apache user (UID=60).
Container engines mount file systems and use the system call clone to create user namespaces.
Containers have their own /etc/passwd and /etc/group files that define the containers users and groups, which are separate from the host’s users and groups.

Running with the –privileged flag

Rootful Podman in rootful Podman with –privileged

aditya@zephyrus:~$ podman run --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob sha256:148a3ed2f70ea72bafab6a24598d47682ae82214e6b84b09c69960fdd727465b
Copying config sha256:4f03f39cd42733419904ceba5493d183e8e606dea850e2abbcf1dba3971540ab
Writing manifest to image destination
Storing signatures
hello

If you notice, here I have used --privileged flag which lets me run the rootful podman. The --privileged flag in Podman is used to give a container elevated privileges on the host. It overrides many security restrictions that are typically applied to containers, allowing them more direct access to the host system’s resources. This mode is often used in scenarios where containers need extensive system access that isn’t normally available in a standard container environment.

While --privileged can be powerful, it also introduces security risks:

Host Compromise Risk: A privileged container has higher access to host resources, making it more susceptible to attacks or vulnerabilities that could compromise the host.
Isolation Loss: The elevated permissions reduce the isolation between the container and the host, defeating some of the primary security benefits of containerization.

Rootless Podman in rootful Podman with –privileged

aditya@zephyrus:~$ podman run --user podman --privileged quay.io/podman/stable podman run ubi8 echo hello

Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob sha256:148a3ed2f70ea72bafab6a24598d47682ae82214e6b84b09c69960fdd727465b
Copying config sha256:4f03f39cd42733419904ceba5493d183e8e606dea850e2abbcf1dba3971540ab
Writing manifest to image destination
Storing signatures
hello

The Podman running inside the container is running as the user podman. This is because the containerized Podman uses the user namespace to create a confined container within the privileged container.

Running without the –privileged flag

Notice that even though we ran the outer containers –privileged above, the inner containers are running in locked-down mode. The rootless Podman running within the container is really locked down and would have a very difficult time escaping.

Rootful Podman in rootful Podman without –privileged

aditya@zephyrus:~$ podman run --cap-add=sys_admin,mknod --device=/dev/fuse --security-opt label=disable quay.io/podman/stable podman run ubi8-minimal echo hello
Resolved "ubi8-minimal" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8-minimal:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob sha256:133388727492505cb9d292d14f00942ad2915ab7df3ec1aabbea88cd25035f7b
Copying config sha256:4e07d79182a85dd10951c19f294a7b6e9dcc7d73e02efc5a969cdbd24f37e929
Writing manifest to image destination
Storing signatures
hello

We can eliminate the --privileged flag from rootful Podman but still have to disable some security features to make rootful Podman within the container to work.

Capabilities: --cap-add=sys_admin,mknod We need to add two Linux capabilities.

a. CAP_SYS_ADMIN is required for the Podman running as root inside of the container to mount the required file systems.

b. CAP_MKNOD is required for Podman running as root inside of the container to create the devices in /dev.
Devices: The --device /dev/fuse flag must use fuse-overlayfs inside the container. This option tells Podman on the host to add /dev/fuse to the container so that containerized Podman can use it.
Disable SELinux: The --security-opt label=disable option tells the host’s Podman to disable SElinux separation for the container. SELinux does not allow containerized processes to mount all of the file systems required to run inside a container.

Rootless Podman in rootful Podman without –privileged

aditya@zephyrus:~$ podman run --user podman --security-opt label=disable --security-opt unmask=ALL --device /dev/fuse -ti quay.io/podman/stable podman run -ti docker.io/busybox echo hello
Trying to pull docker.io/library/busybox:latest...
Getting image source signatures
Copying blob a46fbb00284b done   | 
Copying config 27a71e19c9 done   | 
Writing manifest to image destination
hello

Note that unlike the rooful within rootful case before, we don’t have to add the dangerous security capabilities sys_admin and mknod
In this case, I am running with –user podman, which automatically causes the Podman within the container to run within the user namespace
Still disabling SELinux since it blocks the mounting
Still need –device /dev/fuse to use fuse-overlayfs within the container

Let’s now try running all the four combinations

Rootful podman in rootful podman

root@zephyrus:~# whoami
root
root@zephyrus:~# id
uid=0(root) gid=0(root) groups=0(root)
root@zephyrus:~# 
root@zephyrus:~# podman run -d --name rootoutside-rootinside registry.access.redhat.com/ubi8:latest tail -f /dev/null
ab7821cffbc2b11e38e9216cb2818d3a9daee6e241b3f1f2fbb75731d7aa6d60
root@zephyrus:~# podman exec -it rootoutside-rootinside /bin/bash
[root@ab7821cffbc2 /]# whoami
root
[root@ab7821cffbc2 /]# id
uid=0(root) gid=0(root) groups=0(root)
[root@ab7821cffbc2 /]# sleep 2000 &
[1] 15
[root@ab7821cffbc2 /]# exit
exit
root@zephyrus:~# 
root@zephyrus:~# ps -ef | grep "sleep 2000" | grep -v grep
root       48695   48542  0 17:33 ?        00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 2000
root@zephyrus:~# 
root@zephyrus:~# ps -ef n | grep "sleep 2000" | grep -v grep
       0   48695   48542  0 17:33 ?        S      0:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 2000

Here we have logged in a root user and run podman as root user. If you see, processes run as UID 0 inside the container and as UID 0 outside the container from the host perspective. We ran sleep command in the background inside the container and then after we exit the container we verify the pid of the process in the host view which is the root user’s pid.

Podman run as root (UID 0)
Processes in container running as root (UID 0)
UID=0 (root) outside container
UID=0 (root) inside container

Rootless podman in rootful podman

root@zephyrus:~# whoami
root
root@zephyrus:~# id
uid=0(root) gid=0(root) groups=0(root)
root@zephyrus:~# podman run -d --name rootoutside-userinside -u sync registry.access.redhat.com/ubi8:latest tail -f /dev/null
82812d8c70ff8a1218d11f1df6fc3f78de923ee417e2090c71426d45dd0c9483
root@zephyrus:~# 
root@zephyrus:~# podman exec -it rootoutside-userinside /bin/bash
bash-4.4$ whoami
sync
bash-4.4$ id
uid=5(sync) gid=0(root) groups=0(root)
bash-4.4$ sleep 4000 &
[1] 5
bash-4.4$ exit
exit
root@zephyrus:~# ps -ef | grep "sleep 4000" | grep -v grep
5          49567   49489  0 17:40 ?        00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 4000
root@zephyrus:~# ps -ef n | grep "sleep 4000" | grep -v grep
       5   49567   49489  0 17:40 ?        S      0:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 4000
root@zephyrus:~# 

Here, we have started podman as root user but we are running the processes within the container as non-root user (sync). In this scenario, whatever UID number user is assigned to in the container will also be the UID that owns the process from the host point of view. One thing to keep in mind if UID 5 was assigned to a different user on the host /etc/passwd file then the processes would appeear to be running as whatever user was assigned UID5 in the host /etc/passwd file.

Podman run as root (UID 0)
Processes in container running as sync
UID=5 outside container
UID=5 (sync) inside container

Rootful podman in rootless podman

(base) aditya@zephyrus:~$ whoami
aditya
(base) aditya@zephyrus:~$ id
uid=1000(aditya) gid=1000(aditya) groups=1000(aditya),4(adm),20(dialout),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare),996(incus-admin),998(ollama),999(docker)
(base) aditya@zephyrus:~$ podman run -d --name useroutside-rootinside registry.access.redhat.com/ubi8:latest tail -f /dev/null
837e64598182e10dc77c985175fc97c869c370f57e8af4f6c99cad3ddb5bcaae
(base) aditya@zephyrus:~$ podman exec -it useroutside-rootinside /bin/bash
[root@837e64598182 /]# whoami
root
[root@837e64598182 /]# id
uid=0(root) gid=0(root) groups=0(root)
[root@837e64598182 /]# sleep 5000 &
[1] 16
[root@837e64598182 /]# exit
exit
(base) aditya@zephyrus:~$ ps -ef | grep "sleep 5000" | grep -v grep
aditya     50576   50404  0 17:45 ?        00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 5000
(base) aditya@zephyrus:~$ 
(base) aditya@zephyrus:~$ ps -ef n | grep "sleep 5000" | grep -v grep
    1000   50576   50404  0 17:45 ?        S      0:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 5000

If we start a container as an unprivileged user with rootless podman, user namespaces are used. User namespaces are a feature in linux kernel that allow user accounts to be isolated in namepspaces and allow for processes to have different uid and gid number inside and outside of the username space. It allow for the processes to run as root within the container’s namespace but still be unprivileged outside of the container’s user namepsace. Here, the root user in the container can kill other processes running within the container but cannot reboot the host because it’s mapped back to an unprivileged user in the host.

Podman run as aditya (UID 1000) - rootless podman
Processes in container running as root
UID=1000 (aditya) outside container
UID=0 (root) inside container

Rootless podman in rootless podman

(base) aditya@zephyrus:~$ whoami
aditya
(base) aditya@zephyrus:~$ id
uid=1000(aditya) gid=1000(aditya) groups=1000(aditya),4(adm),20(dialout),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare),996(incus-admin),998(ollama),999(docker)
(base) aditya@zephyrus:~$ podman run -d --name useroutside-userinside -u sync registry.access.redhat.com/ubi8:latest tail -f /dev/null
22c36b82cb9c730ec9beb66845bc08a9d8a693d8bdd4dbcddfd0158499cea2a4
(base) aditya@zephyrus:~$ podman exec -it useroutside-userinside /bin/bash
bash-4.4$ whoami
sync
bash-4.4$ id
uid=5(sync) gid=0(root) groups=0(root)
bash-4.4$ sleep 6000 &
[1] 5
bash-4.4$ exit
exit
(base) aditya@zephyrus:~$ ps -ef | grep "sleep 6000" | grep -v grep
100004     51409   51293  0 17:48 ?        00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 6000
(base) aditya@zephyrus:~$ ps -ef n | grep "sleep 6000" | grep -v grep
  100004   51409   51293  0 17:48 ?        S      0:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 6000

Here, we are running rootless podman in the host as well as in the container. From a security perspective, this is the best option out of the four because running podman rootless from the host point of view is one layer of defense and running the processes within the container as a non-root user offers another level of defense.

Podman run as aditya (UID 1000) - rootless podman
Processes in container running as sync
UID=100004 outside container
UID=5 (sync) inside container

Conclusion

This went a bit long but I hope you are able to learn about podman, namespaces in linux and running containers with and without privileges. You can also read more about podman from the official documentation https://www.redhat.com/en/topics/containers/what-is-podman. All the above details have been taken from the official docs only.

Please drop me a mail if you feel anything in this post can be improved or changed.

Happy learning :)