End Point

Welcome to End Point's blog

Ongoing observations by End Point people.

Linux unshare -m for per-process private filesystem mount points

Private mount points with unshare

Linux offers some pretty interesting features that are new, borrowed, obscure, experimental, or some combination of those. One such feature is the unshare() function, which the unshare(2) man page says “allows a process to disassociate parts of its execution context that are currently being shared with other processes. Part of the execution context, such as the mount namespace, is shared implicitly when a new process is created using fork(2) or vfork(2)”.

I’m going to talk here about one option to unshare: per-process private filesystem mount points, also described as mount namespaces. This Linux kernel feature has been around for a few years and is easily accessible in the userland command unshare(1) in util-linux-ng 2.17 or newer (which is now simply util-linux again without the "ng" distinction because the fork took over mainline development).

Running `unshare -m` gives the calling process a private copy of its mount namespace, and also unshares file system attributes so that it no longer shares its root directory, current directory, or umask attributes with any other process.

Yes, completely private mount points for each process. Isn’t that interesting and strange?
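
On kernels newer than the one used for this post (Linux 3.8 and later), each process's mount namespace is also visible as a symlink at /proc/PID/ns/mnt. As a minimal sketch, assuming such a kernel, the two helper functions below compare those identifiers. The function names are my own, chosen for illustration, and are not part of util-linux:

```shell
#!/bin/sh
# mount_ns_id PID: print the mount namespace identifier of a process,
# for example "mnt:[4026531840]". Requires permission to inspect PID.
mount_ns_id() {
    readlink "/proc/$1/ns/mnt"
}

# same_mount_ns PID1 PID2: succeed (exit 0) if both processes are in the
# same mount namespace, fail with 1 if they differ, and with 2 if either
# identifier cannot be read.
same_mount_ns() {
    a=$(mount_ns_id "$1") || return 2
    b=$(mount_ns_id "$2") || return 2
    [ "$a" = "$b" ]
}
```

Run as root, `unshare -m sh -c 'readlink /proc/self/ns/mnt'` should print a different identifier than `readlink /proc/self/ns/mnt` does in the parent shell.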

A demonstration

Here’s a demonstration on an Ubuntu 11.04 system. In one terminal:

% su -
Password:
# unshare -m /bin/bash
# secret_dir=`mktemp -d --tmpdir=/tmp`
# echo $secret_dir
/tmp/tmp.75xu4BfiCw
# mount -n -o size=1m -t tmpfs tmpfs $secret_dir
# df -hT
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/auge-root
              ext4    451G  355G   74G  83% /

There’s no sign of /tmp/tmp.* in the df output, thanks to `mount -n`, which keeps the new mount out of /etc/mtab. But it can be seen privately from within this process:

# grep /tmp /proc/mounts
tmpfs /tmp/tmp.75xu4BfiCw tmpfs rw,relatime,size=1024k 0 0
# cd $secret_dir
# ls -lFa
total 36
drwxrwxrwt  2 root root    40 2011-11-03 22:10 ./
drwxrwxrwt 21 root root 36864 2011-11-03 22:10 ../
# touch play-file
# mkdir play-dir
# ls -lFa
total 36
drwxrwxrwt  3 root root    80 2011-11-03 22:10 ./
drwxrwxrwt 21 root root 36864 2011-11-03 22:10 ../
drwxr-xr-x  2 root root    40 2011-11-03 22:10 play-dir/
-rw-r--r--  1 root root     0 2011-11-03 22:10 play-file

Afterward, in another terminal, which is a separate process with no visibility into the first terminal’s private mount points:

% su -
Password:
# grep /tmp /proc/mounts
# cd /tmp/tmp.75xu4BfiCw
# ls -lFa
total 40
drwx------  2 root root  4096 2011-11-03 22:10 ./
drwxrwxrwt 21 root root 36864 2011-11-03 22:18 ../

It’s all secret!
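
The first terminal's steps can also be wrapped into a small script. This is only a sketch under the same assumptions as the session above (root privileges, GNU mktemp, and unshare from util-linux). The function name with_private_tmpfs and the SECRET_DIR variable are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# with_private_tmpfs SIZE CMD [ARG...]
# Hypothetical helper: run CMD in a new mount namespace with a private
# tmpfs of SIZE mounted on a fresh directory under /tmp. The directory
# path is passed to CMD in the SECRET_DIR environment variable.
# Must be run as root, like the interactive session.
with_private_tmpfs() {
    size=$1
    shift
    unshare -m sh -c '
        size=$1
        shift
        secret_dir=$(mktemp -d --tmpdir=/tmp) || exit 1
        # -n keeps the mount out of /etc/mtab, as in the demonstration
        mount -n -o "size=$size" -t tmpfs tmpfs "$secret_dir" || exit 1
        SECRET_DIR=$secret_dir "$@"
    ' sh "$size" "$@"
}
```

For example, `with_private_tmpfs 1m sh -c 'touch "$SECRET_DIR/play-file"; ls -lFa "$SECRET_DIR"'` should list play-file, while other processes see only the empty directory left behind in /tmp. The tmpfs itself goes away once the last process in the private namespace exits.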

Use cases

This feature makes it possible for us to create a private temporary filesystem that even other root-owned processes cannot see or browse through, raising the bar considerably for a naive attacker to get access to sensitive files or even see that they exist, at least when they’re not currently open and visible to e.g. lsof.

Of course a sophisticated attacker would presumably have a tool to trawl through kernel memory looking for what they need. As always, assume that a sophisticated attacker who has access to the machine will sooner or later have anything they really want from it. But we might as well make it a challenge.

Another possible use of this feature is to have a process unmount a filesystem privately, perhaps to reduce the exposure of other files on a system to a running daemon if it is compromised.

/etc/mtab vs. /proc/mounts

Experimenting with this feature also drew my attention to differences in how popular Linux distributions expose mount points. There are actually traditionally two places that the list of mounts is stored on a Linux system.

First, the classic Unix /etc/mtab, which is in essence a materialized view maintained by the mount command itself. It is the reason that, on Ubuntu 11.04, a private mount made without `mount -n` shows up everywhere on the system but with different reported disk sizes: the existence of the mount point is global in /etc/mtab, but the sizes are determined dynamically and differ based on each process’s view into the mount points themselves. The `mount -n` option tells mount not to put the new mount point into /etc/mtab, and /etc/mtab is what the df(1) command refers to, which is why the demonstration above shows no trace of the tmpfs in df. How repulsive that a file in the normally read-only /etc is written to so nonchalantly!

Second, the Linux-specific /proc/mounts, which is real-time, exact, and accurate, and can appear differently to each process. The mount invocation can’t hide anything from /proc/mounts. This is what you would think is the only place to look for mounts, but /etc/mtab is still used in some places.

Ubuntu 11.04 still has both, with a separate /etc/mtab. Fedora 16 has done away with /etc/mtab entirely and made it merely a symlink to /proc/mounts, which makes sense, but that is a newer convention and leads to the surprising difference here.
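
On a system that still keeps a separate /etc/mtab, the difference between the two lists can be checked directly. The following is a minimal sketch, not a standard tool. The function names are hypothetical, and it assumes both files use the usual whitespace-separated format with the mount point in the second field:

```shell
#!/bin/sh
# mount_points FILE: print the mount point column (field 2) of a file in
# /etc/mtab or /proc/mounts format, sorted and de-duplicated.
mount_points() {
    awk '{ print $2 }' "$1" | sort -u
}

# not_in_mtab KERNEL_LIST MTAB: print mount points that appear in
# KERNEL_LIST but are missing from MTAB, such as mounts made with
# "mount -n".
not_in_mtab() {
    kernel=$(mktemp) || return 1
    mtab=$(mktemp) || { rm -f "$kernel"; return 1; }
    mount_points "$1" > "$kernel"
    mount_points "$2" > "$mtab"
    comm -23 "$kernel" "$mtab"   # keep lines found only in the first file
    rm -f "$kernel" "$mtab"
}
```

Calling `not_in_mtab /proc/self/mounts /etc/mtab` from the shell that made the private mount should list the hidden tmpfs directory. Where /etc/mtab is only a symlink to /proc/mounts, the two lists are identical and nothing is printed.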

Linux distributions and unshare

The unshare userland command in util-linux(-ng) comes with RHEL 6, Debian 6, Ubuntu 11.04, and Fedora 16, but not on the very common RHEL 5 or CentOS 5. Because we needed it on RHEL 5, I made a simple package that contains only the unshare(1) command and peacefully coexists with the older stock RHEL 5 util-linux. It’s called util-linux-unshare and here are the RPM downloads for RHEL 5:

I hope you’ve found this as interesting as I did!

4 comments:

Anonymous said...

Is there a FreeBSD version of this system call? It seems useful but it would be a darn shame if I had to use Linux to use it.

Jon Jensen said...

I don't know, but would love to hear if you find out!

Alexey Lesovsky said...

How can I find a hidden mount point and unmount it?

Of course, I can walk every /proc/$pid/mountinfo and diff it against /proc/mounts, but is there an easier way?

Jon Jensen said...

Alexey, that's a good question. I don't know. Even if you can find the private mounts, I think you would have to kill the process that controls the private mount unless you can get that process to run the unmount code for you (in a shell or similar). Just guessing.