Linux userland API discussions
 help / color / mirror / Atom feed
From: "John Ericson" <mail@johnericson.me>
To: "Al Viro" <viro@zeniv.linux.org.uk>
Cc: "Andy Lutomirski" <luto@kernel.org>, "Li Chen" <me@linux.beauty>,
	"Cong Wang" <cwang@multikernel.io>,
	"Christian Brauner" <brauner@kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-api <linux-api@vger.kernel.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Thomas Gleixner" <tglx@kernel.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, "Jan Kara" <jack@suse.cz>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	"Kees Cook" <kees@kernel.org>,
	"Sergei Zimmerman" <sergei@zimmerman.foo>,
	"Farid Zakaria" <farid.m.zakaria@gmail.com>
Subject: Re: [RFC] Null Namespaces
Date: Wed, 24 Jun 2026 23:41:07 -0400	[thread overview]
Message-ID: <a75a9b82-a15b-4893-8f92-62b62664ea83@app.fastmail.com> (raw)
In-Reply-To: <20260625011023.GM2636677@ZenIV>

Ah, I started replying to your first email, but this is better, this
gets to the heart of the matter. Please don't mind me responding to your
two questions in reverse.

On Wed, Jun 24, 2026, at 9:10 PM, Al Viro wrote:
> What's the fundamental difference between CWD and any open descriptor
> for a directory?  Why does it make sense to ban the former, but allow
> the equivalents done via the latter?

Yes! These two notions are very close --- but that's the *problem*, not
a reason to not care about the existence of the CWD and root FS. I want
to get rid of CWD in my processes not because it is fundamentally
different (it isn't), but because it is superfluous.

If one is capability-minded like me, it's a bad mistake that we ever had
this "working directory" notion to begin with, and yet another example
of the folks at Bell Labs sticking something in the kernel that was
really only needed by the shell, and that could have just been done in
userland.

The current working directory, roughly, is *just* some global state
holding a directory file descriptor. But I don't want that global state.
If I am writing my userland program (that is not a shell), I would not
create the global variable. I do not appreciate the fact that the kernel
foists that state upon me whether I like it or not.

Now obviously we cannot have a giant breaking change removing the notion
of a current working directory altogether. But we can allow individual
processes which don't want it to opt out, and that is what nulling out
these fields (and updating the path resolution code to cope with that)
allows.

There is no loss of expressive power doing this, because one can (and
should!) just use the `*at` and file descriptors. But there is, however,
the imposition of discipline. The programmer (or coding agent) is
encouraged to do everything with file descriptors rather than path
concatenations etc., because they need to use `*at` anyways, and then
voilà, without browbeating anyone in security seminars or code review, a
bunch of TOCTOU issues disappear simply because doing the right thing is
now the path of least resistance.

> Please, start with explaining what, in your opinion, a mount namespace
> _is_, and where does "mount X is attached at path P relative to mount
> Y" belong.

Let's take a pathological example:

- Process A has `/foo` bind-mounted at `/bar/foo`

- Process B has `/bar` without that bind mount, and `/foo` mounted at
  `/baz/foo`, as is possible because it is in a different mount
  namespace.

If A opens `/bar/foo`, and sends it over (via socket) to B, and then B
does `openat(recv_fd, "..")`, B will get `/bar`, not `/baz`. This is
because `..` is resolved according to the mount referenced in the open
file. (This is, by the way, very good! Directory file descriptors would
be perilous to use if this were not the case!)

The moral of the story is that "mount X is attached at path P relative
to mount Y" is information accessed in the mounts themselves (maybe via
their containing mount namespace, per the `mnt_ns` field, or maybe not,
I am not sure, but it is immaterial). In contrast, the mount namespace
of the *opening* task (`current->nsproxy->mnt_ns`, and current is B)
doesn't matter at all for this purpose.

I am not on a crusade against `struct mnt_namespace` in general; I am
just trying to null out `(struct nsproxy)::mnt_ns` in particular. (This
is just as I am not on a crusade against `struct path`, just `root` and
`pwd` of `struct fs_struct`.)

These days, `current->nsproxy->mnt_ns` is, to me, first and foremost,
there for the legacy mount API. Again, just like our CWD example above,
this is mostly just global state.

The new mount API drastically [^1] reduces the need for it, since it
allows referring to mounts explicitly via file descriptors. That's OK!
The argument is the same as the above --- I am *not* trying to limit
what can be done if one has all the right files open with the right
perms. I am just trying to limit what works out of the box --- to reduce
the default set of privileges, *especially* where the resources involved
are implicit and/or stateful.

[^1]: It doesn't *quite* eliminate the need for `nsproxy->mnt_ns`
    entirely, since (as I understand it, from reading the `move_mount`
    man page) it is still used for some authorization checks, since
    `O_PATH` file descriptors do not grant privileges other than mere
    discoverability. But that's a problem that could be solved later
    with an `O_MOUNT` option analogous to `O_RDONLY` or `O_WRONLY`. In
    the meantime, I am perfectly happy if my processes with null mount
    namespaces get `move_mount` permission errors.

  reply	other threads:[~2026-06-25  3:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24 22:51 [RFC] Null Namespaces John Ericson
2026-06-24 23:06 ` Andy Lutomirski
2026-06-24 23:20   ` Andy Lutomirski
2026-06-24 23:53     ` John Ericson
2026-06-25  1:10       ` Al Viro
2026-06-25  3:41         ` John Ericson [this message]
2026-06-24 23:12 ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a75a9b82-a15b-4893-8f92-62b62664ea83@app.fastmail.com \
    --to=mail@johnericson.me \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=cwang@multikernel.io \
    --cc=dave.hansen@linux.intel.com \
    --cc=farid.m.zakaria@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jack@suse.cz \
    --cc=kees@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=me@linux.beauty \
    --cc=mingo@redhat.com \
    --cc=sergei@zimmerman.foo \
    --cc=skhan@linuxfoundation.org \
    --cc=tglx@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox