From: Ferenc Fejes <ferenc@fejes.dev>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org,
	"Josef Bacik" <josef@toxicpanda.com>,
	"Jeff Layton" <jlayton@kernel.org>,
	"Jann Horn" <jannh@google.com>, "Mike Yuan" <me@yhndnzj.com>,
	"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
	"Lennart Poettering" <mzxreary@0pointer.de>,
	"Daan De Meyer" <daan.j.demeyer@gmail.com>,
	"Aleksa Sarai" <cyphar@cyphar.com>,
	"Amir Goldstein" <amir73il@gmail.com>,
	"Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Jan Kara" <jack@suse.cz>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	bpf@vger.kernel.org, "Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	netdev@vger.kernel.org, "Arnd Bergmann" <arnd@arndb.de>
Subject: Re: [PATCH RFC DRAFT 00/50] nstree: listns()
Date: Mon, 27 Oct 2025 11:49:20 +0100	[thread overview]
Message-ID: <a06ceeb57ba62aeb6df00bd49faad1bb5073321c.camel@fejes.dev> (raw)
In-Reply-To: <20251024-rostig-stier-0bcd991850f5@brauner>
On Fri, 2025-10-24 at 16:50 +0200, Christian Brauner wrote:
> > > Add a new listns() system call that allows userspace to iterate through
> > > namespaces in the system. This provides a programmatic interface to
> > > discover and inspect namespaces, enhancing existing namespace apis.
> > > 
> > > Currently, there is no direct way for userspace to enumerate namespaces
> > > in the system. Applications must resort to scanning /proc/<pid>/ns/
> > > across all processes, which is:
> > > 
> > > 1. Inefficient - requires iterating over all processes
> > > 2. Incomplete - misses inactive namespaces that aren't attached to any
> > >    running process but are kept alive by file descriptors, bind mounts,
> > >    or parent namespace references
> > > 3. Permission-heavy - requires access to /proc for many processes
> > > 4. No ordering or ownership.
> > > 5. No filtering per namespace type: Must always iterate and check all
> > >    namespaces.
> > > 
> > > The list goes on. The listns() system call solves these problems by
> > > providing direct kernel-level enumeration of namespaces. It is similar
> > > to listmount() but obviously tailored to namespaces.
> > 
> > I've been waiting for such an API for years; thanks for working on it. I
> > mostly
> > deal with network namespaces, where points 2 and 3 are especially painful.
> > 
> > Recently, I've used this eBPF snippet to discover (at most 1024, because of
> > the
> > verifier's halt checking) network namespaces, even if no process is
> > attached.
> > But I can't do anything with it in userspace since it's not possible to pass
> > the
> > inode number or netns cookie value to setns()...
> 
> I've mentioned it in the cover letter and in my earlier reply to Josef:
> 
> On v6.18+ kernels it is possible to generate and open file handles to
> namespaces. This is probably an api that people outside of fs/ proper
> aren't all that familiar with.
> 
> In essence it allows you to refer to files - or more-general:
> kernel-object that may be referenced via files - via opaque handles
> instead of paths.
> 
> For regular filesystem that are multi-instance (IOW, you can have
> multiple btrfs or ext4 filesystems mounted) such file handles cannot be
> used without providing a file descriptor to another object in the
> filesystem that is used to resolve the file handle...
> 
> However, for single-instance filesystems like pidfs and nsfs that's not
> required which is why I added:
> 
> FD_PIDFS_ROOT
> FD_NSFS_ROOT
> 
> which means that you can open both pidfds and namespace via
> open_by_handle_at() purely based on the file handle. I call such file
> handles "exhaustive file handles" because they fully describe the object
> to be resolvable without any further information.
> 
> They are also not subject to the capable(CAP_DAC_READ_SEARCH) permission
> check that regular file handles are and so can be used even by
> unprivileged code as long as the caller is sufficiently privileged over
> the relevant object (pid resolvable in caller's pid namespace of pidfds,
> or caller located in namespace or privileged over the owning user
> namespace of the relevant namespace for nsfs).
> 
> File handles for namespaces have the following uapi:
> 
> struct nsfs_file_handle {
> 	__u64 ns_id;
> 	__u32 ns_type;
> 	__u32 ns_inum;
> };
> 
> #define NSFS_FILE_HANDLE_SIZE_VER0 16 /* sizeof first published struct */
> #define NSFS_FILE_HANDLE_SIZE_LATEST sizeof(struct nsfs_file_handle) /* sizeof
> latest published struct */
> 
> and it is explicitly allowed to generate such file handles manually in
> userspace. When the kernel generates a namespace file handle via
> name_to_handle_at() till will return: ns_id, ns_type, and ns_inum but
> userspace is allowed to provide the kernel with a laxer file handle
> where only the ns_id is filled in but ns_type and ns_inum are zero - at
> least after this patch series.
> 
> So for your case where you even know inode number, ns type, and ns id
> you can fill in a struct nsfs_file_handle and either look at my reply to
> Josef or in the (ugly) tests.
> 
> fd = open_by_handle_at(FD_NSFS_ROOT, file_handle, O_RDONLY);
> 
> and can open the namespace (provided it is still active).
> 
> > 
> > extern const void net_namespace_list __ksym;
> > static void list_all_netns()
> > {
> >     struct list_head *nslist = 
> > 	bpf_core_cast(&net_namespace_list, struct list_head);
> > 
> >     struct list_head *iter = nslist->next;
> > 
> >     bpf_repeat(1024) {
> 
> This isn't needed anymore. I've implemented it in a bpf-friendly way so
> it's possible to add kfuncs that would allow you to iterate through the
> various namespace trees (locklessly).
> 
> If this is merged then I'll likely design that bpf part myself.
Excellent, thanks for the detailed explanation, noted! Well I guess I have to
keep my eyes closer on recent ns changes, I was aware of pidfs but not the
helpers you just mentioned.
> 
> > After this merged, do you see any chance for backports? Does it rely on
> > recent
> > bits which is hard/impossible to backport? I'm not aware of backported
> > syscalls
> > but this would be really nice to see in older kernels.
> 
> Uhm, what downstream entities, managing kernels do is not my concern but
> for upstream it's certainly not an option. There's a lot of preparatory
> work that would have to be backported.
I was curious about the upstream option, but I see this isn't feasible. Anyway,
its great we will have this in the future, thanks for doing it!
Ferenc
next prev parent reply	other threads:[~2025-10-27 10:49 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-21 11:43 [PATCH RFC DRAFT 00/50] nstree: listns() Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 01/50] libfs: allow to specify s_d_flags Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 02/50] nsfs: use inode_just_drop() Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 03/50] nsfs: raise DCACHE_DONTCACHE explicitly Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 04/50] pidfs: " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 05/50] nsfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 06/50] nstree: simplify return Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 07/50] ns: initialize ns_list_node for initial namespaces Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 08/50] ns: add __ns_ref_read() Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 09/50] ns: add active reference count Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 10/50] ns: use anonymous struct to group list member Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 11/50] nstree: introduce a unified tree Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 12/50] nstree: allow lookup solely based on inode Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 13/50] nstree: assign fixed ids to the initial namespaces Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 14/50] ns: maintain list of owned namespaces Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 15/50] nstree: add listns() Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 16/50] arch: hookup listns() system call Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 17/50] nsfs: update tools header Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 18/50] selftests/filesystems: remove CLONE_NEWPIDNS from setup_userns() helper Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 19/50] selftests/namespaces: first active reference count tests Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 20/50] selftests/namespaces: second " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 21/50] selftests/namespaces: third " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 22/50] selftests/namespaces: fourth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 23/50] selftests/namespaces: fifth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 24/50] selftests/namespaces: sixth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 25/50] selftests/namespaces: seventh " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 26/50] selftests/namespaces: eigth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 27/50] selftests/namespaces: ninth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 28/50] selftests/namespaces: tenth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 29/50] selftests/namespaces: eleventh " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 30/50] selftests/namespaces: twelth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 31/50] selftests/namespaces: thirteenth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 32/50] selftests/namespaces: fourteenth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 33/50] selftests/namespaces: fifteenth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 34/50] selftests/namespaces: add listns() wrapper Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 35/50] selftests/namespaces: first listns() test Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 36/50] selftests/namespaces: second " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 37/50] selftests/namespaces: third " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 38/50] selftests/namespaces: fourth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 39/50] selftests/namespaces: fifth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 40/50] selftests/namespaces: sixth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 41/50] selftests/namespaces: seventh " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 42/50] selftests/namespaces: ninth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 43/50] " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 44/50] selftests/namespaces: first listns() permission test Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 45/50] selftests/namespaces: second " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 46/50] selftests/namespaces: third " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 47/50] selftests/namespaces: fourth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 48/50] selftests/namespaces: fifth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 49/50] selftests/namespaces: sixth " Christian Brauner
2025-10-21 11:43 ` [PATCH RFC DRAFT 50/50] selftests/namespaces: seventh " Christian Brauner
2025-10-21 14:34 ` [PATCH RFC DRAFT 00/50] nstree: listns() Josef Bacik
2025-10-22  8:34   ` Christian Brauner
2025-10-21 14:41 ` [syzbot ci] " syzbot ci
2025-10-22 11:00 ` [PATCH RFC DRAFT 00/50] " Ferenc Fejes
2025-10-24 14:50   ` Christian Brauner
2025-10-27 10:49     ` Ferenc Fejes [this message]
2025-10-22 11:28 ` Jeff Layton
2025-10-24 14:54   ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=a06ceeb57ba62aeb6df00bd49faad1bb5073321c.camel@fejes.dev \
    --to=ferenc@fejes.dev \
    --cc=amir73il@gmail.com \
    --cc=arnd@arndb.de \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=cyphar@cyphar.com \
    --cc=daan.j.demeyer@gmail.com \
    --cc=edumazet@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jlayton@kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=kuba@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=me@yhndnzj.com \
    --cc=mzxreary@0pointer.de \
    --cc=netdev@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zbyszek@in.waw.pl \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).