From: Christian Brauner <brauner@kernel.org>
To: Ferenc Fejes <ferenc@fejes.dev>
Cc: linux-fsdevel@vger.kernel.org,
"Josef Bacik" <josef@toxicpanda.com>,
"Jeff Layton" <jlayton@kernel.org>,
"Jann Horn" <jannh@google.com>, "Mike Yuan" <me@yhndnzj.com>,
"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
"Lennart Poettering" <mzxreary@0pointer.de>,
"Daan De Meyer" <daan.j.demeyer@gmail.com>,
"Aleksa Sarai" <cyphar@cyphar.com>,
"Amir Goldstein" <amir73il@gmail.com>,
"Tejun Heo" <tj@kernel.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Jan Kara" <jack@suse.cz>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
bpf@vger.kernel.org, "Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
netdev@vger.kernel.org, "Arnd Bergmann" <arnd@arndb.de>
Subject: Re: [PATCH RFC DRAFT 00/50] nstree: listns()
Date: Fri, 24 Oct 2025 16:50:44 +0200 [thread overview]
Message-ID: <20251024-rostig-stier-0bcd991850f5@brauner> (raw)
In-Reply-To: <f708a1119b2ad8cf2514b1df128a4ef7cf21c636.camel@fejes.dev>
> > Add a new listns() system call that allows userspace to iterate through
> > namespaces in the system. This provides a programmatic interface to
> > discover and inspect namespaces, enhancing existing namespace apis.
> >
> > Currently, there is no direct way for userspace to enumerate namespaces
> > in the system. Applications must resort to scanning /proc/<pid>/ns/
> > across all processes, which is:
> >
> > 1. Inefficient - requires iterating over all processes
> > 2. Incomplete - misses inactive namespaces that aren't attached to any
> > running process but are kept alive by file descriptors, bind mounts,
> > or parent namespace references
> > 3. Permission-heavy - requires access to /proc for many processes
> > 4. No ordering or ownership.
> > 5. No filtering per namespace type: Must always iterate and check all
> > namespaces.
> >
> > The list goes on. The listns() system call solves these problems by
> > providing direct kernel-level enumeration of namespaces. It is similar
> > to listmount() but obviously tailored to namespaces.
>
> I've been waiting for such an API for years; thanks for working on it. I mostly
> deal with network namespaces, where points 2 and 3 are especially painful.
>
> Recently, I've used this eBPF snippet to discover (at most 1024, because of the
> verifier's halt checking) network namespaces, even if no process is attached.
> But I can't do anything with it in userspace since it's not possible to pass the
> inode number or netns cookie value to setns()...
I've mentioned it in the cover letter and in my earlier reply to Josef:
On v6.18+ kernels it is possible to generate and open file handles to
namespaces. This is probably an api that people outside of fs/ proper
aren't all that familiar with.
In essence it allows you to refer to files - or more-general:
kernel-object that may be referenced via files - via opaque handles
instead of paths.
For regular filesystem that are multi-instance (IOW, you can have
multiple btrfs or ext4 filesystems mounted) such file handles cannot be
used without providing a file descriptor to another object in the
filesystem that is used to resolve the file handle...
However, for single-instance filesystems like pidfs and nsfs that's not
required which is why I added:
FD_PIDFS_ROOT
FD_NSFS_ROOT
which means that you can open both pidfds and namespace via
open_by_handle_at() purely based on the file handle. I call such file
handles "exhaustive file handles" because they fully describe the object
to be resolvable without any further information.
They are also not subject to the capable(CAP_DAC_READ_SEARCH) permission
check that regular file handles are and so can be used even by
unprivileged code as long as the caller is sufficiently privileged over
the relevant object (pid resolvable in caller's pid namespace of pidfds,
or caller located in namespace or privileged over the owning user
namespace of the relevant namespace for nsfs).
File handles for namespaces have the following uapi:
struct nsfs_file_handle {
__u64 ns_id;
__u32 ns_type;
__u32 ns_inum;
};
#define NSFS_FILE_HANDLE_SIZE_VER0 16 /* sizeof first published struct */
#define NSFS_FILE_HANDLE_SIZE_LATEST sizeof(struct nsfs_file_handle) /* sizeof latest published struct */
and it is explicitly allowed to generate such file handles manually in
userspace. When the kernel generates a namespace file handle via
name_to_handle_at() till will return: ns_id, ns_type, and ns_inum but
userspace is allowed to provide the kernel with a laxer file handle
where only the ns_id is filled in but ns_type and ns_inum are zero - at
least after this patch series.
So for your case where you even know inode number, ns type, and ns id
you can fill in a struct nsfs_file_handle and either look at my reply to
Josef or in the (ugly) tests.
fd = open_by_handle_at(FD_NSFS_ROOT, file_handle, O_RDONLY);
and can open the namespace (provided it is still active).
>
> extern const void net_namespace_list __ksym;
> static void list_all_netns()
> {
> struct list_head *nslist =
> bpf_core_cast(&net_namespace_list, struct list_head);
>
> struct list_head *iter = nslist->next;
>
> bpf_repeat(1024) {
This isn't needed anymore. I've implemented it in a bpf-friendly way so
it's possible to add kfuncs that would allow you to iterate through the
various namespace trees (locklessly).
If this is merged then I'll likely design that bpf part myself.
> After this merged, do you see any chance for backports? Does it rely on recent
> bits which is hard/impossible to backport? I'm not aware of backported syscalls
> but this would be really nice to see in older kernels.
Uhm, what downstream entities, managing kernels do is not my concern but
for upstream it's certainly not an option. There's a lot of preparatory
work that would have to be backported.
next prev parent reply other threads:[~2025-10-24 14:50 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-21 11:43 [PATCH RFC DRAFT 00/50] nstree: listns() Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 01/50] libfs: allow to specify s_d_flags Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 02/50] nsfs: use inode_just_drop() Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 03/50] nsfs: raise DCACHE_DONTCACHE explicitly Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 04/50] pidfs: " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 05/50] nsfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 06/50] nstree: simplify return Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 07/50] ns: initialize ns_list_node for initial namespaces Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 08/50] ns: add __ns_ref_read() Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 09/50] ns: add active reference count Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 10/50] ns: use anonymous struct to group list member Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 11/50] nstree: introduce a unified tree Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 12/50] nstree: allow lookup solely based on inode Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 13/50] nstree: assign fixed ids to the initial namespaces Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 14/50] ns: maintain list of owned namespaces Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 15/50] nstree: add listns() Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 16/50] arch: hookup listns() system call Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 17/50] nsfs: update tools header Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 18/50] selftests/filesystems: remove CLONE_NEWPIDNS from setup_userns() helper Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 19/50] selftests/namespaces: first active reference count tests Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 20/50] selftests/namespaces: second " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 21/50] selftests/namespaces: third " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 22/50] selftests/namespaces: fourth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 23/50] selftests/namespaces: fifth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 24/50] selftests/namespaces: sixth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 25/50] selftests/namespaces: seventh " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 26/50] selftests/namespaces: eigth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 27/50] selftests/namespaces: ninth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 28/50] selftests/namespaces: tenth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 29/50] selftests/namespaces: eleventh " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 30/50] selftests/namespaces: twelth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 31/50] selftests/namespaces: thirteenth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 32/50] selftests/namespaces: fourteenth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 33/50] selftests/namespaces: fifteenth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 34/50] selftests/namespaces: add listns() wrapper Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 35/50] selftests/namespaces: first listns() test Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 36/50] selftests/namespaces: second " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 37/50] selftests/namespaces: third " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 38/50] selftests/namespaces: fourth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 39/50] selftests/namespaces: fifth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 40/50] selftests/namespaces: sixth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 41/50] selftests/namespaces: seventh " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 42/50] selftests/namespaces: ninth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 43/50] " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 44/50] selftests/namespaces: first listns() permission test Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 45/50] selftests/namespaces: second " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 46/50] selftests/namespaces: third " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 47/50] selftests/namespaces: fourth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 48/50] selftests/namespaces: fifth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 49/50] selftests/namespaces: sixth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 50/50] selftests/namespaces: seventh " Christian Brauner
2025-10-21 14:34 ` [PATCH RFC DRAFT 00/50] nstree: listns() Josef Bacik
2025-10-22 8:34 ` Christian Brauner
2025-10-21 14:41 ` [syzbot ci] " syzbot ci
2025-10-22 11:00 ` [PATCH RFC DRAFT 00/50] " Ferenc Fejes
2025-10-24 14:50 ` Christian Brauner [this message]
2025-10-27 10:49 ` Ferenc Fejes
2025-10-22 11:28 ` Jeff Layton
2025-10-24 14:54 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251024-rostig-stier-0bcd991850f5@brauner \
--to=brauner@kernel.org \
--cc=amir73il@gmail.com \
--cc=arnd@arndb.de \
--cc=bpf@vger.kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=cyphar@cyphar.com \
--cc=daan.j.demeyer@gmail.com \
--cc=edumazet@google.com \
--cc=ferenc@fejes.dev \
--cc=hannes@cmpxchg.org \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kuba@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=me@yhndnzj.com \
--cc=mzxreary@0pointer.de \
--cc=netdev@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=zbyszek@in.waw.pl \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).