Generic Linux architectural discussions
 help / color / mirror / Atom feed
* [RFC] Null Namespaces
@ 2026-06-24 22:51 John Ericson
  2026-06-24 23:06 ` Andy Lutomirski
  2026-06-24 23:12 ` Al Viro
  0 siblings, 2 replies; 7+ messages in thread
From: John Ericson @ 2026-06-24 22:51 UTC (permalink / raw)
  To: Li Chen, Cong Wang, Christian Brauner, linux-arch
  Cc: linux-kernel, linux-fsdevel, linux-api, Arnd Bergmann,
	Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Jan Kara, Jonathan Corbet,
	Shuah Khan, Alexander Viro, Kees Cook, Sergei Zimmerman,
	Farid Zakaria

Hello, I am hoping to discuss an idea I've had for a while, that I am
calling "null namespaces" that has become more relevant with some recent
other discussions. First I'll discuss null namespaces in general terms,
and then I'll link those recent discussions and relate null namespaces
to them.

### Null namespaces

The essence of null namespaces is trying to give processes as little
ambient authority as possible, so they are lighter weight and allowed to
do even less than fully unshared processes today.

Namespaces as they exist today are frequently described as an isolation
mechanism, but I think this is the conflation of two different things.
*Removing* a new process from its parent's namespaces unquestionably is
increasing isolation --- no disagreement there. But putting the process
in new namespaces is something else; I would call it supporting
"delusions of grandeur" of that process. For example, namespaces allow a
process to do mounts, have `CAP_SYS_ADMIN`, create network interfaces,
look up other processes by PID, etc.

Conceptually, to remove a process from one ambient authority scope (the
very name "namespaces" indicates they are about ambient authority)
should not require putting it in some ambient authority scope. Just
because, for example, the process cannot see one mount tree, doesn't
mean it needs to see another.

Here's what I am thinking would happen concretely:

First, the simpler cases:

#### Null mount namespace

- requires:

  - null root file system: absolute paths don't work.

  - null current working directory: relative paths with traditional,
    non-`*at` system calls (and `*at` ones using `AT_FDCWD`) don't work.

- All operations relating to the "ambient" mount tree don't work.

- `*at` operations with a file descriptor do work.

- The new fd-based mount APIs with detached mounts do work, modulo
  the calling process having enough permissions (as usual).

#### Null network namespace

- No network interfaces

- No abstract Unix sockets

#### Null IPC namespace

- cannot create or look up either type of message queue

#### Null UTS namespace

- no hostname or domainname: `uname`, `gethostname`/`sethostname`, and the
  related `/proc/sys/kernel` sysctls all fail.

#### Null user namespace

- Process has no user or group ids

- All uid/gid-based authorization lookups return "denied"

- -1 / "nobody" IDs for operations we don't want to fail (like `fstat`)
  can be used.

Note how in each of these, the notion of there "existing" a "single"
null namespace or not is degenerate --- every process with a null
namespace field is as isolated from one another (in terms of the axis
that namespace regulates) as they are from processes that are in other
namespaces. It is truly a minimal permission level, and (as we shall
see) cheap too, because it is just a null pointer in `task_struct`.

Then for the nested ones --- PID and cgroup --- we cannot have quite a
null namespace in the same sense, because it is an important property
that these namespaces are hierarchical up to the root namespaces.
Instead of having a disjoint null namespace, we need a null namespace
with a parent.

#### Null PID namespace

- cannot look up other processes by PID

- current process ID lookup fails

- current process's parent process ID lookup fails

- current process still assigned IDs in parent PID namespaces, per usual

#### Null cgroup namespace

- Process still can have resources restricted according to parent cgroup

- Process unaware of cgroup hierarchy though --- blind to who/how it is
  constrained

In these cases, we cannot just implement with a null pointer, because we
still need a valid parent namespace. However, we shouldn't need any info
*but* the parent namespace. A pair of a pointer and a bool indicating
null namespace with parent namespace or actual namespace membership,
with some sort of helper to get the parent namespace in either case
(since the actual namespace has its parent), should implement this.

Finally there is the time namespace. Conceptually a null time namespace
is simple enough --- you cannot look up the time! --- but the
implementation is a bit more complex to get right because of the vDSO
for certain timing operations.

### General Motivation

Why am I so interested in this stuff?

Firstly it is because I have always been interested in a more strictly
object-capability-based userland, and projects like
Capsicum/CloudABI/WASI. I think going all in on file descriptors is
generally the direction that Linux has been going in, and it creates a
genuinely better programming model than the traditional Unix one with
all its ambient authority, and the TOCTOU and other issues that attend
it.

Today's container idioms and the "delusions of grandeur" that namespaces
provide are great for retrofitting existing software to run in a more
isolated environment. But I don't want that to be the ceiling of our
ambitions. Especially in this age of LLM refactoring, it is very easy to
get both new and existing software to abide by the more limited set of
allowed operations that null-namespace processes allow. And the
modifications that that entails (more `openat`, more socket activation,
etc.) make that software (in my view) simply *better* --- I would want
it to work that way with or without these constraints forcing the issue.

Secondly, and more concretely/imminently as a Nix developer, I am very
interested in the performance and overhead of process isolation. It is
very much my ambition to move Nix into the Bazel/Buck space of ever more
numerous and fine-grained atomic build steps (i.e. small compilation
units, not "packages"), but to do this *without* sacrificing Nix's
strong sandboxing guarantees that make our build plans so self-contained
and thus the ease of onboarding new Nix users.

I think this "null namespace" sandboxing will likely be simpler and more
performant than creating and destroying a bunch of regular namespaces
for each compilation unit. And while it will no doubt take some compiler
/ other tool patching to fix up any assumptions that get in the way of
running processes with so few permissions, I am happy to take a stab at
that too. Nix is, after all, for "tool-assisted yak shaves" as one put
it --- patching GCC / Clang / whatever and then rebuilding the world is
something we are quite good at.

Lastly, I'll add that the traditional way people have thought about
things like Capsicum/CloudABI is custom personalities/seccomp rules, but
IMO trying to tackle the massive UAPI surface area so shallowly is ugly
and unmaintainable. Nulling out namespace fields in `task_struct`,
conversely, attacks the problem at its core, much more elegantly, and
makes it easy to handle both current *and future* syscalls in a
minimally invasive and maintainable manner.

### Null namespaces and process spawning

Why bring this up now?

Recently [1], Li Chen took a stab at the venerable old goal of making a
better process spawning UAPI than fork/clone + exec. I am quite excited
to see this happen, as it generally dovetails very nicely with the
object capability goals I have above. (E.g. making it performant and
idiomatic to opt-in, rather than opt-out of sharing file descriptors
with a child process is very good for a world where all
resource/privilege sharing is done with file descriptors.)

One problem with clone that didn't yet come up is that its defaults are
not good from a security perspective: sharing by default, and unsharing
as the opt in means that one must remember and take active measures to
ensure that child processes get *less* privileges. This is very bad ---
secure practices mean that the "lazy programmer" and the "smallest
program" must always err on the side of giving the child process *less*
privileges. This is the only way economics and the "principle of least
privilege" will work together, rather than against each other (and
economics is quite likely to win when they are working against each
other).

The reason that clone *doesn't* work that way is, of course,
performance: it would be wasteful to unshare and create new namespaces
when they are just going to be thrown away because the user wants to
share after all.

Null namespaces I think elegantly work around this performance/security
trade-off, while also avoiding the need for gazillion-parameter syscalls
like clone. This is because, as the most secure option, and a cheap
option, they are the rightful default for a new process creation API.

1. When an "embryonic" (under construction, not yet ready to be
   scheduled) task is first created, it should have all null namespaces.

2. Separate syscalls (`io_uring` exists for batching, we don't need to
   reinvent an ad-hoc batch solution) can exist for setting the
   namespaces on the process, where either "sharing" (use parent process
   namespace) or "unsharing" (use fresh namespace, usually derived from
   the parent process namespace but perhaps derived from a different
   one) are choices that can be opted into instead of the null namespace
   default.

3. After all state is initialized (arguments, environment variables,
   file descriptors, namespaces, etc.), the process can be "birthed",
   and submitted as ready to be scheduled.

This design is very natural to me, but its full naturality is *only*
available with the null namespace option. Otherwise we are stuck in a
place of no good defaults, and the "builder pattern" seems more awkward.

Also in [2], I bring up a design for unix sockets without the file
system or the "abstract" socket namespace, and how I want to avoid both
in order to firmly rule out TOCTOU and other ambient authority issues. I
think those arguments stand on their own, but the possibility of a null
network namespace sharpens the issue: it forces the `O_PATH` FD stuff I
discuss to be the only viable option.

### Implementation

I've "LLM'd" out some draft patches [3] for this. I'm not submitting
them because I still need to review and test them, and I don't want
(currently, pre those steps) low-quality slop to tarnish this proposal.
What this initial exploration did, however, confirm for me is that these
changes should be quite lightweight to implement. (Also, what I propose
is slightly different from my implementation draft in a few cases where
I think the design I proposed here is better than my draft
implementation.)

If the discussion here starts moving towards consensus, I'll clean up
and rework those patches along the lines of the consensus. Ideally I
would submit them one at a time, I figure, since the implementations for
different namespaces are necessarily changes to different subsystems.

Cheers!

John

[1]: https://lore.kernel.org/all/20260528095235.2491226-1-me@linux.beauty/

[2]: https://lore.kernel.org/all/455281ec-3ee1-4f27-989b-c239f0690d8b@app.fastmail.com/

[3]: https://github.com/Ericson2314/linux/commits/null-namespace

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-25  3:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-24 22:51 [RFC] Null Namespaces John Ericson
2026-06-24 23:06 ` Andy Lutomirski
2026-06-24 23:20   ` Andy Lutomirski
2026-06-24 23:53     ` John Ericson
2026-06-25  1:10       ` Al Viro
2026-06-25  3:41         ` John Ericson
2026-06-24 23:12 ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox