From: Demi Marie Obenour <demiobenour@gmail.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>,
lsf-pc@lists.linux-foundation.org,
aleksandr.mikhalitsyn@futurfusion.io,
linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
stgraber@stgraber.org, ksugihara@preferred.jp,
utam0k@preferred.jp, trondmy@kernel.org, anna@kernel.org,
jlayton@kernel.org, chuck.lever@oracle.com, neilb@suse.de,
miklos@szeredi.hu, jack@suse.cz, amir73il@gmail.com,
trapexit@spawn.link
Subject: Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
Date: Tue, 24 Feb 2026 09:18:21 -0500
Message-ID: <3cc6a84f-2f97-49e8-909a-ea69f9a97fb0@gmail.com>
In-Reply-To: <20260224-bannen-waldlauf-481ad13899a9@brauner>
On 2/24/26 03:54, Christian Brauner wrote:
> On Sat, Feb 21, 2026 at 01:44:26AM -0500, Demi Marie Obenour wrote:
>> On 2/18/26 07:44, Alexander Mikhalitsyn wrote:
>>> Dear friends,
>>>
>>> I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
>>>
>>> Previously, I worked on VFS idmap support for FUSE/virtiofs [4] and cephfs [1] with support/guidance
>>> from Christian.
>>>
>>> This experience with Cephfs & FUSE has shown that VFS idmap semantics, while being very elegant and
>>> intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
>>> FSes. In case of Cephfs we had to modify its protocol (!) (see [2]) as a part of our agreement with
>>> ceph folks about the right way to support idmaps.
>>>
>>> One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
>>> In particular, system administrator can configure path-based UID/GID restrictions on a *server*-side (Ceph MDS).
>>> Basically, you can say "I expect UID 1000 and GID 2000 for all files under /stuff directory".
>>> The problem here is that these UID/GIDs are taken from a syscall-caller's creds (not from (struct file *)->f_cred)
>>> which makes cephfs FDs not very transferable through unix sockets. [3]
>>>
>>> These path-based UID/GID restrictions mean that server expects client to send UID/GID with every single request,
>>> not only for those OPs where UID/GID needs to be written to the disk (mknod, mkdir, symlink, etc).
>>> The VFS idmaps API is designed to prevent filesystem developers from making mistakes when supporting FS_ALLOW_IDMAP.
>>> For example, (struct mnt_idmap *) is not passed to every single i_op, but instead to only those where it can be
>>> used legitimately. Particularly, readlink/listxattr or rmdir are not expected to use idmapping information anyhow.
>>>
>>> We've seen very similar challenges with FUSE. Not long ago, on the Linux Containers project forum, there
>>> was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
>>> of "caller UID/GID are needed everywhere" still blocks VFS idmaps adoption in some usecases.
>>> Antonio Musumeci (mergerfs maintainer) claimed that in many cases filesystems behind mergerfs may not be fully
>>> POSIX and basically, when mergerfs does IO on the underlying FSes it needs to do UID/GID switch to caller's UID/GID
>>> (taken from FUSE request header).
>>>
>>> We don't expect NFS to be any simpler :-) I would say that supporting NFS is a final boss. It would be great
>>> to have a deep technical discussion with VFS/FSes maintainers and developers about all these challenges and
>>> make some conclusions and identify a right direction/approach to these problems. From my side, I'm going
>>> to get more familiar with high-level part of NFS (or even make PoC if time permits), identify challenges,
>>> summarize everything and prepare some slides to navigate/plan discussion.
>>>
>>> [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
>>> [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
>>> [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
>>> [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
>>> [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
>>>
>>> Kind regards,
>>> Alexander Mikhalitsyn @ futurfusion.io
>>
>> The secure case (strong authentication) has similar problems to
>> In both cases, there is no way to store the files with the UID/GID/etc
>
> It's easy to support idmapped mounts without user namespaces. They're
> completely decoupled from them already for that purpose so they can
> support id squashing and so on going forward. The only thing that's
> needed is to extend the api so that we can specify mappings to be used
> for a mount. That's not difficult and there's no need to adhere to any
> inherent limit on the number of mappings that user namespaces have.
>
> It's also useful independent of all that for local filesystems that want to
> expose files with different ownership at different locations without
> getting into namespaces at all.
Virtiofsd needs user namespaces to create a mount unless it runs as
real root.
>> that the VFS says they should have. The server (NFS) or kernel
>> (virtiofsd) simply will not (and, for security reasons, *must not*)
>> allow this.
>>
>> I proposed a workaround for virtiofsd [1] that I will also propose
>> here: store the mapped UID and GID as a user.* xattr. This requires
>
> xattrs as an ownership side-channel are an absolute clusterfuck. The
> kernel implementation for POSIX ACLs and filesystem capabilities that
> slap ownership information that the VFS must consume on arbitrary
> inodes should be set on fire. I've burned way too many cycles getting
> this into an even remotely acceptable shape and it still sucks to no
> end. Permission checking is a complete nightmare because we need to go
> fetch stuff from disk, cache it in a global format, then do an in-place
> translation having to parse ownership out of binary data stored
> alongside the inode.
Why is this so bad? What would be a better way to do this?
> However, if userspace wants to consume ownership information by storing
> arbitrary ownership information as user.* xattrs then I obviously
> couldn't care less but it won't nest, performance will suck, and it will
> be brittle to get this right imho.
It can be nested by repeatedly remapping xattrs: user.o, user.o.o,
and so on. More efficient schemes undoubtedly exist.
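
A minimal sketch of that renaming scheme, in Python. The helper names
are hypothetical, and the rule that only the exact names user.o,
user.o.o, ... are reserved is an assumption made for illustration;
none of this is an existing virtiofsd or kernel interface.

```python
from typing import Optional

OWNER_XATTR = "user.o"  # name taken from the example above
SUFFIX = ".o"           # one extra suffix per nesting level


def is_owner_chain(name: str) -> bool:
    """True for user.o, user.o.o, user.o.o.o, ... and nothing else."""
    if not name.startswith(OWNER_XATTR):
        return False
    rest = name[len(OWNER_XATTR):]
    return rest == SUFFIX * (len(rest) // len(SUFFIX))


def to_backing_name(name: str) -> str:
    """Name under which a layer stores an xattr set by the layer above it."""
    # Shift reserved names down one level so they cannot collide with
    # this layer's own ownership xattr; pass everything else through.
    return name + SUFFIX if is_owner_chain(name) else name


def from_backing_name(name: str) -> Optional[str]:
    """Name the layer above sees, or None if the xattr is hidden from it."""
    if name == OWNER_XATTR:
        return None  # this layer's own ownership record
    if is_owner_chain(name):
        return name[: -len(SUFFIX)]
    return name
```

Each layer applies to_backing_name() on writes and from_backing_name()
on reads, so a file's ownership at nesting depth N ends up in an xattr
with N extra ".o" suffixes on the outermost backing store.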
Is there a better solution? The only one I can think of is to include
subordinate UIDs/GIDs in Kerberos tickets, and that fails when the
product of (total users in an installation * number of subordinate
UIDs/GIDs per user) approaches 2^32. This can happen if both are
65536.
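
The arithmetic behind that limit, as a small Python check. It assumes
32-bit UIDs/GIDs and the 65536 figures from the paragraph above; the
variable names are only illustrative.

```python
ID_SPACE = 2**32          # number of distinct 32-bit UID (or GID) values

users = 65536             # total users in the installation
subids_per_user = 65536   # subordinate UIDs/GIDs allotted to each user

needed = users * subids_per_user

# With both factors at 65536 the allocation consumes the whole ID space,
# leaving no room for ordinary (non-subordinate) IDs.
exhausted = needed >= ID_SPACE
```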
I did suggest replacing UIDs and GIDs with NT-style SIDs, but that
is an absolutely enormous change across the entire system.
--
Sincerely,
Demi Marie Obenour (she/her/hers)