* [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
@ 2026-02-18 12:44 Alexander Mikhalitsyn
2026-02-18 13:49 ` Jeff Layton
2026-02-21 6:44 ` Demi Marie Obenour
0 siblings, 2 replies; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2026-02-18 12:44 UTC
To: lsf-pc
Cc: aleksandr.mikhalitsyn, linux-fsdevel, linux-nfs, stgraber,
brauner, ksugihara, utam0k, trondmy, anna, jlayton, chuck.lever,
neilb, miklos, jack, amir73il, trapexit
Dear friends,
I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
Previously, I worked on VFS idmap support for FUSE/virtiofs [4] and cephfs [1] with support/guidance
from Christian.
This experience with cephfs & FUSE has shown that VFS idmap semantics, while very elegant and
intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
filesystems. In the case of cephfs we had to modify its protocol (!) (see [2]) as part of our agreement with
the ceph folks about the right way to support idmaps.
One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
In particular, a system administrator can configure path-based UID/GID restrictions on the *server* side (Ceph MDS).
Basically, you can say "I expect UID 1000 and GID 2000 for all files under the /stuff directory".
The problem here is that these UID/GIDs are taken from the syscall caller's creds (not from (struct file *)->f_cred),
which makes cephfs FDs not very transferable through unix sockets [3].
These path-based UID/GID restrictions mean that the server expects the client to send a UID/GID with every single request,
not only for those ops where the UID/GID needs to be written to disk (mknod, mkdir, symlink, etc).
The VFS idmaps API is designed to prevent filesystem developers from making mistakes when supporting FS_ALLOW_IDMAP.
For example, (struct mnt_idmap *) is not passed to every single i_op, but only to those where it can be
used legitimately. In particular, readlink, listxattr and rmdir are not expected to use idmapping information at all.
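As a rough illustration of the mismatch described above, here is a toy model in Python (not kernel code). The function names, the fixed-offset idmapping and the exact-match server check are all assumptions made for the example; only the split between creation ops and the remaining ops comes from the text.

```python
# Toy model: the idmapping is only available for ops that write ownership.
CREATION_OPS = {"mknod", "mkdir", "symlink"}

def uid_for_request(op, caller_kuid, idmap_offset):
    """UID the toy client can put into a request.

    idmap_offset models a mount idmapping that shifts kernel IDs down
    by a fixed amount (an assumption made for this sketch).
    """
    if op in CREATION_OPS:
        # The idmapping is passed in, so the caller ID can be mapped.
        return caller_kuid - idmap_offset
    # No idmapping is passed in: only the raw caller ID is known.
    return caller_kuid

def server_accepts(request_uid, required_uid):
    """Hypothetical path-based restriction: the UID must match exactly."""
    return request_uid == required_uid

caller_kuid = 101000   # hypothetical container user, as seen by the host
offset = 100000        # hypothetical idmapping offset
required = 1000        # "I expect UID 1000 for all files under /stuff"

print(server_accepts(uid_for_request("mkdir", caller_kuid, offset), required))  # True
print(server_accepts(uid_for_request("rmdir", caller_kuid, offset), required))  # False
```

In this model the server-side restriction can only be satisfied for the creation ops, which is the "caller UID/GID are needed everywhere" problem in miniature.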
We've seen very similar challenges with FUSE. Not long ago, on the Linux Containers project forum, there
was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
of "caller UID/GID is needed everywhere" still blocks VFS idmaps adoption in some use cases.
Antonio Musumeci (the mergerfs maintainer) claimed that in many cases the filesystems behind mergerfs may not be fully
POSIX-compliant and that, when mergerfs does IO on the underlying filesystems, it needs to switch to the caller's UID/GID
(taken from the FUSE request header).
We don't expect NFS to be any simpler :-) I would say that supporting NFS is the final boss. It would be great
to have a deep technical discussion with VFS/filesystem maintainers and developers about all these challenges,
draw some conclusions and identify the right direction/approach to these problems. From my side, I'm going
to get more familiar with the high-level part of NFS (or even make a PoC if time permits), identify challenges,
summarize everything and prepare some slides to guide the discussion.
[1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
[2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
[3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
[4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
[5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
Kind regards,
Alexander Mikhalitsyn @ futurfusion.io
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
From: Jeff Layton @ 2026-02-18 13:49 UTC
To: Alexander Mikhalitsyn, lsf-pc
Cc: aleksandr.mikhalitsyn, linux-fsdevel, linux-nfs, stgraber,
brauner, ksugihara, utam0k, trondmy, anna, chuck.lever, neilb,
miklos, jack, amir73il, trapexit
On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> [...]
IIUC, people mostly use vfs-layer idmappings because they want to remap
the uid/gid values of files that get stored on the backing store (disk,
ceph MDS, or whatever).
I've never used idmappings myself much in practice. Could you lay out
an example of how you would use them with NFS in a real environment so
I understand the problem better? I'd start by assuming a simple setup
with AUTH_SYS and no NFSv4 idmapping involved, since that case should
be fairly straightforward.
Mixing in AUTH_GSS and real idmapping will be where things get harder,
so let's not worry about those cases for now.
--
Jeff Layton <jlayton@kernel.org>
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
From: Alexander Mikhalitsyn @ 2026-02-18 14:36 UTC
To: Jeff Layton
Cc: lsf-pc, aleksandr.mikhalitsyn, linux-fsdevel, linux-nfs, stgraber,
brauner, ksugihara, utam0k, trondmy, anna, chuck.lever, neilb,
miklos, jack, amir73il, trapexit
On Wed, 18 Feb 2026 at 14:49, Jeff Layton <jlayton@kernel.org> wrote:
>
> On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > [...]
>
Hi Jeff,
thanks for such a fast reply! ;)
>
> IIUC, people mostly use vfs-layer idmappings because they want to remap
> the uid/gid values of files that get stored on the backing store (disk,
> ceph MDS, or whatever).
yes, precisely.
>
> I've never used idmappings myself much in practice. Could you lay out
> an example of how you would use them with NFS in a real environment so
> I understand the problem better? I'd start by assuming a simple setup
> with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> be fairly straightforward.
For me, from the point of view of the LXC/Incus project, idmapped mounts are used as
a way to "delegate" filesystems (or subtrees) to containers:
1. We, of course, assume that the container uses user namespaces and that the user
can't mount the filesystem inside it, because the filesystem has no FS_USERNS_MOUNT
flag set (as is the case for Cephfs, NFS, CIFS and many others).
2. At the same time, the host's system administrator wants to avoid remapping between
the container's user namespace and sb->s_user_ns (which is init_user_ns for those
filesystems). [ The motivation here is that in many cases you may want the same
subtree to be shared with other containers and even host users too, and you want
UIDs to be "compatible", i.e. UID 1000 in one container and UID 1000 in another
container should land as UID 1000 on the filesystem's inode. ]
For this use case, when we bind-mount the filesystem into the container, we apply a
VFS idmap equal to the container's user namespace. This produces the behavior I described.
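The arithmetic behind this can be sketched as follows. This is a simplified userspace model in Python, not kernel code: the ID ranges (bases 100000 and 200000, 65536 IDs each) and all names are assumptions made for the example, and to_outside()/to_inside() only loosely mirror what make_kuid()/from_kuid() do in the kernel.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IdRange:
    """One contiguous mapping: IDs [inside, inside+count) as seen in the
    namespace correspond to kernel-global IDs [outside, outside+count)."""
    inside: int
    outside: int
    count: int

    def to_outside(self, uid):
        # Namespace-local ID -> kernel-global ID, or None if unmapped.
        if self.inside <= uid < self.inside + self.count:
            return self.outside + (uid - self.inside)
        return None

    def to_inside(self, kuid):
        # Kernel-global ID -> namespace-local ID, or None if unmapped.
        if self.outside <= kuid < self.outside + self.count:
            return self.inside + (kuid - self.outside)
        return None

def owner_on_create(caller_uid, userns, mount_idmap=None):
    """UID that lands on the inode when a container user creates a file."""
    kuid = userns.to_outside(caller_uid)  # caller's kernel-global fsuid
    if kuid is None:
        return None
    if mount_idmap is None:
        # Plain bind mount: sb->s_user_ns is init_user_ns, so the
        # kernel-global ID is stored unchanged.
        return kuid
    # Idmapped mount: translate the kernel-global ID through the mount.
    return mount_idmap.to_inside(kuid)

# Hypothetical containers with disjoint host ID ranges.
container_a = IdRange(inside=0, outside=100000, count=65536)
container_b = IdRange(inside=0, outside=200000, count=65536)

# Without an idmapped mount, UID 1000 lands as a different owner per container.
print(owner_on_create(1000, container_a))               # 101000
print(owner_on_create(1000, container_b))               # 201000
# With a mount idmap equal to each container's userns, both land as 1000.
print(owner_on_create(1000, container_a, container_a))  # 1000
print(owner_on_create(1000, container_b, container_b))  # 1000
```

The last two lines are the "compatible UIDs" property from point 2: both containers, and a host user with UID 1000, agree on the owner stored on the inode.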
But this is just one use case. I'm pretty sure there are some more around here :)
I know that folks from Preferred Networks (preferred.jp) are also interested in
VFS idmap support in NFS; they can probably share some ideas/use cases too.
>
> Mixing in AUTH_GSS and real idmapping will be where things get harder,
> so let's not worry about those cases for now.
> --
> Jeff Layton <jlayton@kernel.org>
Kind regards,
Alex
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
From: Trond Myklebust @ 2026-02-18 14:37 UTC
To: Jeff Layton, Alexander Mikhalitsyn, lsf-pc
Cc: aleksandr.mikhalitsyn, linux-fsdevel, linux-nfs, stgraber,
brauner, ksugihara, utam0k, anna, chuck.lever, neilb, miklos,
jack, amir73il, trapexit
On Wed, 2026-02-18 at 08:49 -0500, Jeff Layton wrote:
> On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > [...]
>
>
> IIUC, people mostly use vfs-layer idmappings because they want to
> remap
> the uid/gid values of files that get stored on the backing store
> (disk,
> ceph MDS, or whatever).
>
> I've never used idmappings myself much in practice. Could you lay out
> an example of how you would use them with NFS in a real environment
> so
> I understand the problem better? I'd start by assuming a simple setup
> with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> be fairly straightforward.
>
> Mixing in AUTH_GSS and real idmapping will be where things get
> harder,
> so let's not worry about those cases for now.
I think you do need to worry about those cases. As the NFS and RPC
protocols stand today, strong authentication will defeat any client
side idmapping scheme, because the server can't know what uids or gids
the client is using on its end; it just knows about the account that
was used to authenticate.
I think if you do want to implement something generic, you're going to
have to consider how the client and server can exchange (and store) the
information needed to allow the client to perform the mapping of file
owners/group owners on its end. The client would presumably also need
to be in charge of enforcing permissions for such mappings.
It would be a very different security model than the one used by NFS
today, and almost certainly require protocol extensions.
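To make the contrast concrete, here is a deliberately simplified Python model of which identity the server ends up seeing (not real RPC code). The fixed-offset shift, the principal name and the server-side account table are all hypothetical; the only point taken from the text is that with strong authentication the server derives the identity from the authenticated account, not from a client-chosen UID.

```python
# Toy model of the identity the server sees under each auth flavor.

def shift_uid(kuid, offset):
    """Client-side idmapping, modelled as subtracting a fixed offset."""
    return kuid - offset

def server_identity_auth_sys(cred):
    # AUTH_SYS: the credential carries numeric IDs chosen by the client,
    # so a client-side shift is visible to the server.
    return cred["uid"]

def server_identity_auth_gss(cred, principal_to_uid):
    # AUTH_GSS: the server derives the account from the authenticated
    # principal; no client-chosen UID is consulted.
    return principal_to_uid[cred["principal"]]

caller_kuid = 101000   # hypothetical container user on the client
offset = 100000        # hypothetical idmapping offset

sys_cred = {"uid": shift_uid(caller_kuid, offset)}
gss_cred = {"principal": "alice@EXAMPLE.ORG"}    # hypothetical principal
server_accounts = {"alice@EXAMPLE.ORG": 5000}    # hypothetical server table

print(server_identity_auth_sys(sys_cred))                    # 1000
print(server_identity_auth_gss(gss_cred, server_accounts))   # 5000
```

In the second case nothing the client does to its local UIDs changes the result, which is why a generic scheme would need the client and server to exchange the mapping information explicitly.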
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
From: Jeff Layton @ 2026-02-18 15:08 UTC
To: Trond Myklebust, Alexander Mikhalitsyn, lsf-pc
Cc: aleksandr.mikhalitsyn, linux-fsdevel, linux-nfs, stgraber,
brauner, ksugihara, utam0k, anna, chuck.lever, neilb, miklos,
jack, amir73il, trapexit
On Wed, 2026-02-18 at 09:37 -0500, Trond Myklebust wrote:
> On Wed, 2026-02-18 at 08:49 -0500, Jeff Layton wrote:
> > On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > > [...]
> >
> >
> > IIUC, people mostly use vfs-layer idmappings because they want to
> > remap
> > the uid/gid values of files that get stored on the backing store
> > (disk,
> > ceph MDS, or whatever).
> >
> > I've never used idmappings myself much in practice. Could you lay out
> > an example of how you would use them with NFS in a real environment
> > so
> > I understand the problem better? I'd start by assuming a simple setup
> > with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> > be fairly straightforward.
> >
> > Mixing in AUTH_GSS and real idmapping will be where things get
> > harder,
> > so let's not worry about those cases for now.
>
> I think you do need to worry about those cases. As the NFS and RPC
> protocols stand today, strong authentication will defeat any client
> side idmapping scheme, because the server can't know what uids or gids
> the client is using on its end; it just knows about the account that
> was used to authenticate.
>
Oh, we absolutely need to worry about them, but this is a difficult
topic to get our arms around. We can potentially have several layers
that are doing idmapping, so I want to understand a simple use-case
first. Once that's clear I plan to start throwing in monkey wrenches.
> I think if you do want to implement something generic, you're going to
> have to consider how the client and server can exchange (and store) the
> information needed to allow the client to perform the mapping of file
> owners/group owners on its end. The client would presumably also need
> to be in charge of enforcing permissions for such mappings.
> It would be a very different security model than the one used by NFS
> today, and almost certainly require protocol extensions.
That may be, but I still don't fully understand the use-case here.
Maybe they'd be content with just shifting UIDs at a higher level
without changing the protocol? Without understanding how they intend to
use this, it's hard to know what's needed.
--
Jeff Layton <jlayton@kernel.org>
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
From: Alexander Mikhalitsyn @ 2026-02-18 15:25 UTC
To: Jeff Layton
Cc: Trond Myklebust, lsf-pc, aleksandr.mikhalitsyn, linux-fsdevel,
linux-nfs, stgraber, brauner, ksugihara, utam0k, anna,
chuck.lever, neilb, miklos, jack, amir73il, trapexit
On Wed, 18 Feb 2026 at 16:08, Jeff Layton <jlayton@kernel.org> wrote:
>
> On Wed, 2026-02-18 at 09:37 -0500, Trond Myklebust wrote:
> > On Wed, 2026-02-18 at 08:49 -0500, Jeff Layton wrote:
> > > On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > > > [...]
> > >
> > >
> > > IIUC, people mostly use vfs-layer idmappings because they want to
> > > remap
> > > the uid/gid values of files that get stored on the backing store
> > > (disk,
> > > ceph MDS, or whatever).
> > >
> > > I've never used idmappings myself much in practice. Could you lay out
> > > an example of how you would use them with NFS in a real environment
> > > so
> > > I understand the problem better? I'd start by assuming a simple setup
> > > with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> > > be fairly straightforward.
> > >
> > > Mixing in AUTH_GSS and real idmapping will be where things get
> > > harder,
> > > so let's not worry about those cases for now.
> >
> > I think you do need to worry about those cases. As the NFS and RPC
> > protocols stand today, strong authentication will defeat any client
> > side idmapping scheme, because the server can't know what uids or gids
> > the client is using on its end; it just knows about the account that
> > was used to authenticate.
> >
>
> Oh, we absolutely need to worry about them, but this is a difficult
> topic to get our arms around. We can potentially have several layers
> that are doing idmapping, so I want to understand a simple use-case
> first. Once that's clear I plan to start throwing in monkey wrenches.
>
> > I think if you do want to implement something generic, you're going to
> > have to consider how the client and server can exchange (and store) the
> > information needed to allow the client to perform the mapping of file
> > owners/group owners on its end. The client would presumably also need
> > to be in charge of enforcing permissions for such mappings.
> > It would be a very different security model than the one used by NFS
> > today, and almost certainly require protocol extensions.
>
> That may be, but I still don't fully understand the use-case here.
Please let me know if my earlier reply doesn't clarify the LXC/Incus use case.
I'd be happy to prepare a more detailed explanation with command-line and
configuration examples.
> Maybe they'd be content with just shifting UIDs at a higher level
> without changing the protocol? Without understanding how they intend to
> use this, it's hard to know what's needed.
If you ask me, I have no problem with that. In fact, I'm in favor of the
"keep it high level & don't touch the NFS protocol" approach ;-)
But I remember a very intense discussion (good context in [1]) about Cephfs,
where this approach wasn't considered acceptable back then (and we had to make
a protocol extension).
We can always go iteratively: the first version can be simple, and then
we can support trickier cases on demand, if that is acceptable to you guys.
You set the rules. ;-)
[1] https://lore.kernel.org/lkml/f3864ed6-8c97-8a7a-f268-dab29eb2fb21@redhat.com/
Kind regards,
Alex
>
> --
> Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
2026-02-18 14:36 ` Alexander Mikhalitsyn
@ 2026-02-18 16:01 ` Jeff Layton
2026-02-18 16:39 ` Alexander Mikhalitsyn
2026-02-19 0:57 ` NeilBrown
0 siblings, 2 replies; 13+ messages in thread
From: Jeff Layton @ 2026-02-18 16:01 UTC (permalink / raw)
To: Alexander Mikhalitsyn
Cc: lsf-pc, aleksandr.mikhalitsyn, linux-fsdevel, linux-nfs, stgraber,
brauner, ksugihara, utam0k, trondmy, anna, chuck.lever, neilb,
miklos, jack, amir73il, trapexit
On Wed, 2026-02-18 at 15:36 +0100, Alexander Mikhalitsyn wrote:
> Am Mi., 18. Feb. 2026 um 14:49 Uhr schrieb Jeff Layton <jlayton@kernel.org>:
> >
> > On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > > Dear friends,
> > >
> > > I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
> > >
> > > Previously, I worked on VFS idmap support for FUSE/virtiofs [2] and cephfs [1] with support/guidance
> > > from Christian.
> > >
> > > This experience with Cephfs & FUSE has shown that VFS idmap semantics, while being very elegant and
> > > intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
> > > FSes. In case of Cephfs we had to modify its protocol (!) (see [2]) as a part of our agreement with
> > > ceph folks about the right way to support idmaps.
> > >
> > > One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
> > > In particular, system administrator can configure path-based UID/GID restrictions on a *server*-side (Ceph MDS).
> > > Basically, you can say "I expect UID 1000 and GID 2000 for all files under /stuff directory".
> > > The problem here is that these UID/GIDs are taken from a syscall-caller's creds (not from (struct file *)->f_cred)
> > > which makes cephfs FDs not very transferable through unix sockets. [3]
> > >
> > > These path-based UID/GID restrictions mean that server expects client to send UID/GID with every single request,
> > > not only for those OPs where UID/GID needs to be written to the disk (mknod, mkdir, symlink, etc).
> > > VFS idmaps API is designed to prevent filesystems developers from making a mistakes when supporting FS_ALLOW_IDMAP.
> > > For example, (struct mnt_idmap *) is not passed to every single i_op, but instead to only those where it can be
> > > used legitimately. Particularly, readlink/listxattr or rmdir are not expected to use idmapping information anyhow.
> > >
> > > We've seen very similar challenges with FUSE. Not a long time ago on Linux Containers project forum, there
> > > was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
> > > of "caller UID/GID are needed everywhere" still blocks VFS idmaps adoption in some usecases.
> > > Antonio Musumeci (mergerfs maintainer) claimed that in many cases filesystems behind mergerfs may not be fully
> > > POSIX and basically, when mergerfs does IO on the underlying FSes it needs to do UID/GID switch to caller's UID/GID
> > > (taken from FUSE request header).
> > >
> > > We don't expect NFS to be any simpler :-) I would say that supporting NFS is a final boss. It would be great
> > > to have a deep technical discussion with VFS/FSes maintainers and developers about all these challenges and
> > > make some conclusions and identify a right direction/approach to these problems. From my side, I'm going
> > > to get more familiar with high-level part of NFS (or even make PoC if time permits), identify challenges,
> > > summarize everything and prepare some slides to navigate/plan discussion.
> > >
> > > [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
> > > [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
> > > [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
> > > [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
> > > [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
> > >
> > > Kind regards,
> > > Alexander Mikhalitsyn @ futurfusion.io
> >
>
> Hi Jeff,
>
> thanks for such a fast reply! ;)
>
> >
> > IIUC, people mostly use vfs-layer idmappings because they want to remap
> > the uid/gid values of files that get stored on the backing store (disk,
> > ceph MDS, or whatever).
>
> yes, precisely.
>
> >
> > I've never used idmappings myself much in practice. Could you lay out
> > an example of how you would use them with NFS in a real environment so
> > I understand the problem better? I'd start by assuming a simple setup
> > with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> > be fairly straightforward.
>
> For me, from the point of LXC/Incus project, idmapped mounts are used as
> a way to "delegate" filesystems (or subtrees) to the containers:
> 1. We, of course, assume that container enables user namespaces and
> user can't mount a filesystem
> inside because it has no FS_USERNS_MOUNT flag set (like in case of Cephfs, NFS,
> CIFS and many others).
> 2. At the same time host's system administrator wants to avoid
> remapping between container's user ns and
> sb->s_user_ns (which is init_user_ns for those filesystems). [
> motivation here is that in many
> cases you may want to have the same subtree to be shared with other
> containers and even host users too and
> you want UIDs to be "compatible", i.e UID 1000 in one container and
> UID 1000 in another container should
> land as UID 1000 on the filesystem's inode ]
>
> For this usecase, when we bind-mount filesystem to container, we apply
> VFS idmap equal to container's
> user namespace. This makes a behavior I described.
>
Ok: so you have a process running in a userns as UID 2000 and you want
to use vfs layer idmapping so that when you create a file as that user,
it ends up being owned by UID 1000. Is that basically correct?
Typically, the RPC credential used in an OPEN or CREATE call is what
determines the file's ownership (at least until a SETATTR comes in). With
AUTH_SYS, the credential is just a uid and a set of gids.
So in this case, it sounds like you would just need to do that conversion
(maybe at the RPC client layer?) when issuing an RPC. You don't really
need a protocol extension for that case.
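
As a rough illustration of the conversion described above, the sketch below
shifts every id in an AUTH_SYS-style credential before it would be put on the
wire. The AuthSysCred class, the shift_cred helper and the 10000-based range
are hypothetical names and values chosen for illustration; this is not the
sunrpc client code, and refusing unmapped ids with an exception is only one
possible policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class AuthSysCred:
    """The ids carried by an AUTH_SYS credential: a uid, a gid and extra gids."""
    uid: int
    gid: int
    gids: Tuple[int, ...]


def shift_cred(cred: AuthSysCred,
               map_id: Callable[[int], Optional[int]]) -> AuthSysCred:
    """Return a copy of cred with every id passed through map_id.

    Fails closed: if any id has no mapping, no credential is produced.
    """
    def must_map(value: int) -> int:
        mapped = map_id(value)
        if mapped is None:
            raise PermissionError(f"id {value} has no mapping")
        return mapped

    return AuthSysCred(
        uid=must_map(cred.uid),
        gid=must_map(cred.gid),
        gids=tuple(must_map(g) for g in cred.gids),
    )


def example_map(value: int) -> Optional[int]:
    """Hypothetical mapping: ids 10000..19999 become 0..9999, others unmapped."""
    return value - 10000 if 10000 <= value < 20000 else None


# Ids as the client kernel sees the caller (assumed values).
caller = AuthSysCred(uid=11000, gid=11000, gids=(11000, 11027))
on_wire = shift_cred(caller, example_map)
print(on_wire)
```

The point of mapping the gid list as well as the uid is that the server only
ever sees the credential, so a partially shifted credential would mix ids from
two different id spaces.
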
As Trond points out though, AUTH_GSS and NFSv4 idmapping will make this
more complex. Once you're using Kerberos credentials for
authentication, you don't have much control over what the UIDs and GIDs
will be on newly-created files, but is that really a problem? As long
as all of the clients have a consistent view, I wouldn't think so.
> But this is just one use case. I'm pretty sure there are some more
> around here :)
> I know that folks from Preferred Networks (preferred.jp) are also
> interested in VFS idmap support in NFS,
> probably they can share some ideas/use cases too.
>
>
Yes, we don't want to focus too much on a single use-case, but I find
it helpful to focus on a single simple problem first.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
2026-02-18 16:01 ` Jeff Layton
@ 2026-02-18 16:39 ` Alexander Mikhalitsyn
2026-02-19 0:57 ` NeilBrown
1 sibling, 0 replies; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2026-02-18 16:39 UTC (permalink / raw)
To: Jeff Layton
Cc: lsf-pc, aleksandr.mikhalitsyn, linux-fsdevel, linux-nfs, stgraber,
brauner, ksugihara, utam0k, trondmy, anna, chuck.lever, neilb,
miklos, jack, amir73il, trapexit
Am Mi., 18. Feb. 2026 um 17:01 Uhr schrieb Jeff Layton <jlayton@kernel.org>:
>
> On Wed, 2026-02-18 at 15:36 +0100, Alexander Mikhalitsyn wrote:
> > Am Mi., 18. Feb. 2026 um 14:49 Uhr schrieb Jeff Layton <jlayton@kernel.org>:
> > >
> > > On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > > > Dear friends,
> > > >
> > > > I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
> > > >
> > > > Previously, I worked on VFS idmap support for FUSE/virtiofs [2] and cephfs [1] with support/guidance
> > > > from Christian.
> > > >
> > > > This experience with Cephfs & FUSE has shown that VFS idmap semantics, while being very elegant and
> > > > intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
> > > > FSes. In case of Cephfs we had to modify its protocol (!) (see [2]) as a part of our agreement with
> > > > ceph folks about the right way to support idmaps.
> > > >
> > > > One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
> > > > In particular, system administrator can configure path-based UID/GID restrictions on a *server*-side (Ceph MDS).
> > > > Basically, you can say "I expect UID 1000 and GID 2000 for all files under /stuff directory".
> > > > The problem here is that these UID/GIDs are taken from a syscall-caller's creds (not from (struct file *)->f_cred)
> > > > which makes cephfs FDs not very transferable through unix sockets. [3]
> > > >
> > > > These path-based UID/GID restrictions mean that server expects client to send UID/GID with every single request,
> > > > not only for those OPs where UID/GID needs to be written to the disk (mknod, mkdir, symlink, etc).
> > > > VFS idmaps API is designed to prevent filesystems developers from making a mistakes when supporting FS_ALLOW_IDMAP.
> > > > For example, (struct mnt_idmap *) is not passed to every single i_op, but instead to only those where it can be
> > > > used legitimately. Particularly, readlink/listxattr or rmdir are not expected to use idmapping information anyhow.
> > > >
> > > > We've seen very similar challenges with FUSE. Not a long time ago on Linux Containers project forum, there
> > > > was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
> > > > of "caller UID/GID are needed everywhere" still blocks VFS idmaps adoption in some usecases.
> > > > Antonio Musumeci (mergerfs maintainer) claimed that in many cases filesystems behind mergerfs may not be fully
> > > > POSIX and basically, when mergerfs does IO on the underlying FSes it needs to do UID/GID switch to caller's UID/GID
> > > > (taken from FUSE request header).
> > > >
> > > > We don't expect NFS to be any simpler :-) I would say that supporting NFS is a final boss. It would be great
> > > > to have a deep technical discussion with VFS/FSes maintainers and developers about all these challenges and
> > > > make some conclusions and identify a right direction/approach to these problems. From my side, I'm going
> > > > to get more familiar with high-level part of NFS (or even make PoC if time permits), identify challenges,
> > > > summarize everything and prepare some slides to navigate/plan discussion.
> > > >
> > > > [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
> > > > [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
> > > > [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
> > > > [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
> > > > [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
> > > >
> > > > Kind regards,
> > > > Alexander Mikhalitsyn @ futurfusion.io
> > >
> >
> > Hi Jeff,
> >
> > thanks for such a fast reply! ;)
> >
> > >
> > > IIUC, people mostly use vfs-layer idmappings because they want to remap
> > > the uid/gid values of files that get stored on the backing store (disk,
> > > ceph MDS, or whatever).
> >
> > yes, precisely.
> >
> > >
> > > I've never used idmappings myself much in practice. Could you lay out
> > > an example of how you would use them with NFS in a real environment so
> > > I understand the problem better? I'd start by assuming a simple setup
> > > with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> > > be fairly straightforward.
> >
> > For me, from the point of LXC/Incus project, idmapped mounts are used as
> > a way to "delegate" filesystems (or subtrees) to the containers:
> > 1. We, of course, assume that container enables user namespaces and
> > user can't mount a filesystem
> > inside because it has no FS_USERNS_MOUNT flag set (like in case of Cephfs, NFS,
> > CIFS and many others).
> > 2. At the same time host's system administrator wants to avoid
> > remapping between container's user ns and
> > sb->s_user_ns (which is init_user_ns for those filesystems). [
> > motivation here is that in many
> > cases you may want to have the same subtree to be shared with other
> > containers and even host users too and
> > you want UIDs to be "compatible", i.e UID 1000 in one container and
> > UID 1000 in another container should
> > land as UID 1000 on the filesystem's inode ]
> >
> > For this usecase, when we bind-mount filesystem to container, we apply
> > VFS idmap equal to container's
> > user namespace. This makes a behavior I described.
> >
>
> Ok: so you have a process running in a userns as UID 2000 and you want
> to use vfs layer idmapping so that when you create a file as that user
> that it ends up being owned by UID 1000. Is that basically correct?
In our case, we have UID 1000 (inside the user namespace), which is mapped to
something like 10000 + 1000 (in the init_user_ns). And then we have an
NFS mount (sb->s_user_ns = init_user_ns, of course), so if the user with
UID 1000 (inside the container) creates a file, it will be owned by
UID 11000, right? But we bind-mount that NFS mount with a VFS idmap applied,
so that once the file is created it has owner_uid = 1000. (This scenario
is covered by [1] and [2].)
[1] https://docs.kernel.org/filesystems/idmappings.html#example-3
[2] https://docs.kernel.org/filesystems/idmappings.html#example-3-reconsidered
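
The arithmetic in this scenario can be sketched with a toy model of the
"u<first>:k<first>:r<count>" notation used in the kernel idmappings
documentation. The Idmap class and the owner_on_server function are
hypothetical helpers written for this illustration, not kernel APIs, and the
container range u0:k10000:r10000 is an assumed value matching the
"10000 + 1000" example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Idmap:
    """Toy model of one idmapping range, written u<first_user>:k<first_kernel>:r<count>."""
    first_user: int
    first_kernel: int
    count: int

    def map_down(self, uid: int) -> Optional[int]:
        """Userspace id -> kernel id, or None if the id is not mapped."""
        off = uid - self.first_user
        return self.first_kernel + off if 0 <= off < self.count else None

    def map_up(self, kid: int) -> Optional[int]:
        """Kernel id -> userspace id, or None if the id is not mapped."""
        off = kid - self.first_kernel
        return self.first_user + off if 0 <= off < self.count else None


container_ns = Idmap(0, 10000, 10000)   # assumed container range
init_ns = Idmap(0, 0, 4294967295)       # identity mapping of init_user_ns


def owner_on_server(caller_uid: int, caller_ns: Idmap, fs_ns: Idmap,
                    mount_idmap: Optional[Idmap] = None) -> Optional[int]:
    """Owner uid that ends up stored for a file created by caller_uid."""
    kid = caller_ns.map_down(caller_uid)
    if kid is None:
        return None
    if mount_idmap is not None:
        # Idmapped mount: map the kernel id up in the mount's idmapping,
        # then down again in the filesystem's idmapping.
        vfs_uid = mount_idmap.map_up(kid)
        if vfs_uid is None:
            return None
        kid = fs_ns.map_down(vfs_uid)
        if kid is None:
            return None
    # The filesystem writes the id as seen through its own idmapping.
    return fs_ns.map_up(kid)


plain = owner_on_server(1000, container_ns, init_ns)
idmapped = owner_on_server(1000, container_ns, init_ns, mount_idmap=container_ns)
print(plain, idmapped)  # 11000 1000
```
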
>
> Typically, the RPC credentials used in an OPEN or CREATE call is what
> determines its ownership (at least until a SETATTR comes in). With
> AUTH_SYS, the credential is just a uid and set of gids.
>
> So in this case, it sounds like you would need just do that conversion
> (maybe at the RPC client layer?) when issuing an RPC. You don't really
> need a protocol extension for that case.
>
> As Trond points out though, AUTH_GSS and NFSv4 idmapping will make this
> more complex. Once you're using kerberos credentials for
> authentication, you don't have much control over what the UIDs and GIDs
> will be on newly-created files, but is that really a problem? As long
> as all of the clients have a consistent view, I wouldn't think so.
I absolutely agree.
>
> > But this is just one use case. I'm pretty sure there are some more
> > around here :)
> > I know that folks from Preferred Networks (preferred.jp) are also
> > interested in VFS idmap support in NFS,
> > probably they can share some ideas/use cases too.
> >
> >
>
> Yes, we don't want to focus too much on a single use-case, but I find
> it helpful to focus on a single simple problem first.
Yes, I could prepare RFC patches before LSF/MM/BPF for that simple case so
we have something to start with.
> --
> Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
2026-02-18 16:01 ` Jeff Layton
2026-02-18 16:39 ` Alexander Mikhalitsyn
@ 2026-02-19 0:57 ` NeilBrown
2026-02-19 8:53 ` Kohei Sugihara
1 sibling, 1 reply; 13+ messages in thread
From: NeilBrown @ 2026-02-19 0:57 UTC (permalink / raw)
To: Jeff Layton
Cc: Alexander Mikhalitsyn, lsf-pc, aleksandr.mikhalitsyn,
linux-fsdevel, linux-nfs, stgraber, brauner, ksugihara, utam0k,
trondmy, anna, chuck.lever, miklos, jack, amir73il, trapexit
On Thu, 19 Feb 2026, Jeff Layton wrote:
> On Wed, 2026-02-18 at 15:36 +0100, Alexander Mikhalitsyn wrote:
> > Am Mi., 18. Feb. 2026 um 14:49 Uhr schrieb Jeff Layton <jlayton@kernel.org>:
> > >
> > > On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > > > Dear friends,
> > > >
> > > > I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
> > > >
> > > > Previously, I worked on VFS idmap support for FUSE/virtiofs [2] and cephfs [1] with support/guidance
> > > > from Christian.
> > > >
> > > > This experience with Cephfs & FUSE has shown that VFS idmap semantics, while being very elegant and
> > > > intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
> > > > FSes. In case of Cephfs we had to modify its protocol (!) (see [2]) as a part of our agreement with
> > > > ceph folks about the right way to support idmaps.
> > > >
> > > > One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
> > > > In particular, system administrator can configure path-based UID/GID restrictions on a *server*-side (Ceph MDS).
> > > > Basically, you can say "I expect UID 1000 and GID 2000 for all files under /stuff directory".
> > > > The problem here is that these UID/GIDs are taken from a syscall-caller's creds (not from (struct file *)->f_cred)
> > > > which makes cephfs FDs not very transferable through unix sockets. [3]
> > > >
> > > > These path-based UID/GID restrictions mean that server expects client to send UID/GID with every single request,
> > > > not only for those OPs where UID/GID needs to be written to the disk (mknod, mkdir, symlink, etc).
> > > > VFS idmaps API is designed to prevent filesystems developers from making a mistakes when supporting FS_ALLOW_IDMAP.
> > > > For example, (struct mnt_idmap *) is not passed to every single i_op, but instead to only those where it can be
> > > > used legitimately. Particularly, readlink/listxattr or rmdir are not expected to use idmapping information anyhow.
> > > >
> > > > We've seen very similar challenges with FUSE. Not a long time ago on Linux Containers project forum, there
> > > > was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
> > > > of "caller UID/GID are needed everywhere" still blocks VFS idmaps adoption in some usecases.
> > > > Antonio Musumeci (mergerfs maintainer) claimed that in many cases filesystems behind mergerfs may not be fully
> > > > POSIX and basically, when mergerfs does IO on the underlying FSes it needs to do UID/GID switch to caller's UID/GID
> > > > (taken from FUSE request header).
> > > >
> > > > We don't expect NFS to be any simpler :-) I would say that supporting NFS is a final boss. It would be great
> > > > to have a deep technical discussion with VFS/FSes maintainers and developers about all these challenges and
> > > > make some conclusions and identify a right direction/approach to these problems. From my side, I'm going
> > > > to get more familiar with high-level part of NFS (or even make PoC if time permits), identify challenges,
> > > > summarize everything and prepare some slides to navigate/plan discussion.
> > > >
> > > > [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
> > > > [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
> > > > [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
> > > > [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
> > > > [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
> > > >
> > > > Kind regards,
> > > > Alexander Mikhalitsyn @ futurfusion.io
> > >
> >
> > Hi Jeff,
> >
> > thanks for such a fast reply! ;)
> >
> > >
> > > IIUC, people mostly use vfs-layer idmappings because they want to remap
> > > the uid/gid values of files that get stored on the backing store (disk,
> > > ceph MDS, or whatever).
> >
> > yes, precisely.
> >
> > >
> > > I've never used idmappings myself much in practice. Could you lay out
> > > an example of how you would use them with NFS in a real environment so
> > > I understand the problem better? I'd start by assuming a simple setup
> > > with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> > > be fairly straightforward.
> >
> > For me, from the point of LXC/Incus project, idmapped mounts are used as
> > a way to "delegate" filesystems (or subtrees) to the containers:
> > 1. We, of course, assume that container enables user namespaces and
> > user can't mount a filesystem
> > inside because it has no FS_USERNS_MOUNT flag set (like in case of Cephfs, NFS,
> > CIFS and many others).
> > 2. At the same time host's system administrator wants to avoid
> > remapping between container's user ns and
> > sb->s_user_ns (which is init_user_ns for those filesystems). [
> > motivation here is that in many
> > cases you may want to have the same subtree to be shared with other
> > containers and even host users too and
> > you want UIDs to be "compatible", i.e UID 1000 in one container and
> > UID 1000 in another container should
> > land as UID 1000 on the filesystem's inode ]
> >
> > For this usecase, when we bind-mount filesystem to container, we apply
> > VFS idmap equal to container's
> > user namespace. This makes a behavior I described.
> >
>
> Ok: so you have a process running in a userns as UID 2000 and you want
> to use vfs layer idmapping so that when you create a file as that user
> that it ends up being owned by UID 1000. Is that basically correct?
>
> Typically, the RPC credentials used in an OPEN or CREATE call is what
> determines its ownership (at least until a SETATTR comes in). With
> AUTH_SYS, the credential is just a uid and set of gids.
>
> So in this case, it sounds like you would need just do that conversion
> (maybe at the RPC client layer?) when issuing an RPC. You don't really
> need a protocol extension for that case.
You also need to consider the conversion when receiving an RPC.
If you use krb5 and NFSv3 then you really want the mapping between krb5
identity and uid to be the same on client and server, so that when an
application creates a file and then stats it, it sees that it owns it.
If I use a krb5 identity in an idmapped NFS filesystem I'll want the
server to map the identity to the "underlying" uid (as would be stored
in a local filesystem) and then, when the client gets a GETATTR reply,
the VFS maps it back to the uid seen by the application.
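
A small sketch of why the reply direction matters, assuming a container range
of u0:k10000:r10000 and a server that stored the "underlying" uid 1000. All
function names here are hypothetical and only model the arithmetic; 65534 is
the default overflow uid that unmapped ids are reported as.

```python
from typing import Optional

FIRST_KERNEL_ID = 10000   # assumption: container range u0:k10000:r10000
RANGE = 10000
OVERFLOW_UID = 65534      # default value of /proc/sys/kernel/overflowuid


def ns_map_up(kid: int) -> Optional[int]:
    """Kernel id -> uid seen inside the container, or None if unmapped."""
    off = kid - FIRST_KERNEL_ID
    return off if 0 <= off < RANGE else None


def mount_map_from_wire(wire_uid: int) -> Optional[int]:
    """Owner from a GETATTR reply -> kernel id, inverse of the request-side shift."""
    return wire_uid + FIRST_KERNEL_ID if 0 <= wire_uid < RANGE else None


def uid_shown_by_stat(wire_uid: int, reply_mapping: bool) -> int:
    """uid an application in the container would see for a file owner."""
    kid = mount_map_from_wire(wire_uid) if reply_mapping else wire_uid
    seen = ns_map_up(kid) if kid is not None else None
    return OVERFLOW_UID if seen is None else seen


with_mapping = uid_shown_by_stat(1000, reply_mapping=True)
without_mapping = uid_shown_by_stat(1000, reply_mapping=False)
print(with_mapping, without_mapping)  # 1000 65534
```

Without the reverse mapping, the application that just created the file as
uid 1000 would see it owned by the overflow uid, which is the mismatch the
paragraph above warns about.
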
With NFSv4 and the idmapper you wouldn't need (or want) the kernel
idmapping to be used at all. You would want the idmapper daemon to run
in the user namespace and map from on-the-wire names to the appropriate
app-level uids.
This would mean that a given NFS mount would need to be in a given user
namespace. Maybe that isn't desired.
If it is important for a given NFS mount to be available in multiple
user namespaces, then the idmapper daemon would need to map to the
underlying uid, and the VFS mapping would map that up to the app-level
uid.
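
The two NFSv4 arrangements described above can be compared with a toy model.
Everything in it is an assumption made for illustration: the name table, the
"alice@example.com" owner string, the u0:k10000:r10000 container range and
the function names. It does not reflect how the real idmapper upcall is
implemented.

```python
from typing import Optional

NAME_TABLE = {"alice@example.com": 1000}   # hypothetical on-the-wire name -> uid
FIRST_KERNEL_ID = 10000                    # assumed container range u0:k10000:r10000
RANGE = 10000


def shift_into_container_range(uid: int) -> Optional[int]:
    """uid 0..9999 -> kernel id 10000..19999, or None if out of range."""
    return uid + FIRST_KERNEL_ID if 0 <= uid < RANGE else None


def per_namespace_idmapper(name: str) -> Optional[int]:
    """Daemon runs inside the container's user namespace.

    Its answer is an app-level uid, interpreted in that namespace;
    no VFS idmapping is involved.
    """
    app_uid = NAME_TABLE.get(name)
    return None if app_uid is None else shift_into_container_range(app_uid)


def shared_mount_idmapper(name: str) -> Optional[int]:
    """Daemon answers with the underlying uid; the idmapped mount shifts it."""
    underlying_uid = NAME_TABLE.get(name)
    if underlying_uid is None:
        return None
    return shift_into_container_range(underlying_uid)  # VFS mount idmapping step


a = per_namespace_idmapper("alice@example.com")
b = shared_mount_idmapper("alice@example.com")
print(a, b)  # 11000 11000
```

In this model both arrangements end at the same kernel id. The difference is
where the shift happens, which decides whether a single NFS mount can serve
several user namespaces.
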
NeilBrown
>
> As Trond points out though, AUTH_GSS and NFSv4 idmapping will make this
> more complex. Once you're using kerberos credentials for
> authentication, you don't have much control over what the UIDs and GIDs
> will be on newly-created files, but is that really a problem? As long
> as all of the clients have a consistent view, I wouldn't think so.
>
> > But this is just one use case. I'm pretty sure there are some more
> > around here :)
> > I know that folks from Preferred Networks (preferred.jp) are also
> > interested in VFS idmap support in NFS,
> > probably they can share some ideas/use cases too.
> >
> >
>
> Yes, we don't want to focus too much on a single use-case, but I find
> it helpful to focus on a single simple problem first.
> --
> Jeff Layton <jlayton@kernel.org>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
2026-02-19 0:57 ` NeilBrown
@ 2026-02-19 8:53 ` Kohei Sugihara
0 siblings, 0 replies; 13+ messages in thread
From: Kohei Sugihara @ 2026-02-19 8:53 UTC (permalink / raw)
To: NeilBrown
Cc: Jeff Layton, Alexander Mikhalitsyn, lsf-pc, aleksandr.mikhalitsyn,
linux-fsdevel, linux-nfs, stgraber, brauner, utam0k, trondmy,
anna, chuck.lever, miklos, jack, amir73il, trapexit
On Thu, Feb 19, 2026 at 9:58 AM NeilBrown <neilb@ownmail.net> wrote:
>
> On Thu, 19 Feb 2026, Jeff Layton wrote:
> > On Wed, 2026-02-18 at 15:36 +0100, Alexander Mikhalitsyn wrote:
> > > Am Mi., 18. Feb. 2026 um 14:49 Uhr schrieb Jeff Layton <jlayton@kernel.org>:
> > > >
> > > > On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > > > > Dear friends,
> > > > >
> > > > > I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
> > > > >
> > > > > Previously, I worked on VFS idmap support for FUSE/virtiofs [2] and cephfs [1] with support/guidance
> > > > > from Christian.
> > > > >
> > > > > This experience with Cephfs & FUSE has shown that VFS idmap semantics, while being very elegant and
> > > > > intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
> > > > > FSes. In case of Cephfs we had to modify its protocol (!) (see [2]) as a part of our agreement with
> > > > > ceph folks about the right way to support idmaps.
> > > > >
> > > > > One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
> > > > > In particular, system administrator can configure path-based UID/GID restrictions on a *server*-side (Ceph MDS).
> > > > > Basically, you can say "I expect UID 1000 and GID 2000 for all files under /stuff directory".
> > > > > The problem here is that these UID/GIDs are taken from a syscall-caller's creds (not from (struct file *)->f_cred)
> > > > > which makes cephfs FDs not very transferable through unix sockets. [3]
> > > > >
> > > > > These path-based UID/GID restrictions mean that server expects client to send UID/GID with every single request,
> > > > > not only for those OPs where UID/GID needs to be written to the disk (mknod, mkdir, symlink, etc).
> > > > > VFS idmaps API is designed to prevent filesystems developers from making a mistakes when supporting FS_ALLOW_IDMAP.
> > > > > For example, (struct mnt_idmap *) is not passed to every single i_op, but instead to only those where it can be
> > > > > used legitimately. Particularly, readlink/listxattr or rmdir are not expected to use idmapping information anyhow.
> > > > >
> > > > > We've seen very similar challenges with FUSE. Not a long time ago on Linux Containers project forum, there
> > > > > was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
> > > > > of "caller UID/GID are needed everywhere" still blocks VFS idmaps adoption in some usecases.
> > > > > Antonio Musumeci (mergerfs maintainer) claimed that in many cases filesystems behind mergerfs may not be fully
> > > > > POSIX and basically, when mergerfs does IO on the underlying FSes it needs to do UID/GID switch to caller's UID/GID
> > > > > (taken from FUSE request header).
> > > > >
> > > > > We don't expect NFS to be any simpler :-) I would say that supporting NFS is a final boss. It would be great
> > > > > to have a deep technical discussion with VFS/FSes maintainers and developers about all these challenges and
> > > > > make some conclusions and identify a right direction/approach to these problems. From my side, I'm going
> > > > > to get more familiar with high-level part of NFS (or even make PoC if time permits), identify challenges,
> > > > > summarize everything and prepare some slides to navigate/plan discussion.
> > > > >
> > > > > [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
> > > > > [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
> > > > > [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
> > > > > [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
> > > > > [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
> > > > >
> > > > > Kind regards,
> > > > > Alexander Mikhalitsyn @ futurfusion.io
> > > >
> > >
> > > Hi Jeff,
> > >
> > > thanks for such a fast reply! ;)
> > >
> > > >
> > > > IIUC, people mostly use vfs-layer idmappings because they want to remap
> > > > the uid/gid values of files that get stored on the backing store (disk,
> > > > ceph MDS, or whatever).
> > >
> > > yes, precisely.
> > >
> > > >
> > > > I've never used idmappings myself much in practice. Could you lay out
> > > > an example of how you would use them with NFS in a real environment so
> > > > I understand the problem better? I'd start by assuming a simple setup
> > > > with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> > > > be fairly straightforward.
> > >
> > > For me, from the point of view of the LXC/Incus project, idmapped mounts
> > > are used as a way to "delegate" filesystems (or subtrees) to containers:
> > > 1. We assume, of course, that the container enables user namespaces and
> > > that the user can't mount the filesystem inside because it has no
> > > FS_USERNS_MOUNT flag set (as is the case for Cephfs, NFS, CIFS and many
> > > others).
> > > 2. At the same time, the host's system administrator wants to avoid
> > > remapping between the container's user ns and sb->s_user_ns (which is
> > > init_user_ns for those filesystems). [ The motivation here is that in
> > > many cases you may want the same subtree to be shared with other
> > > containers and even host users too, and you want UIDs to be
> > > "compatible", i.e. UID 1000 in one container and UID 1000 in another
> > > container should land as UID 1000 on the filesystem's inode. ]
> > >
> > > For this use case, when we bind-mount a filesystem into a container, we
> > > apply a VFS idmap equal to the container's user namespace. This produces
> > > the behavior I described.
> > >
> >
> > Ok: so you have a process running in a userns as UID 2000 and you want
> > to use vfs layer idmapping so that when you create a file as that user,
> > it ends up being owned by UID 1000. Is that basically correct?
> >
> > Typically, the RPC credentials used in an OPEN or CREATE call are what
> > determine its ownership (at least until a SETATTR comes in). With
> > AUTH_SYS, the credential is just a uid and set of gids.
> >
> > So in this case, it sounds like you would just need to do that
> > conversion (maybe at the RPC client layer?) when issuing an RPC. You
> > don't really need a protocol extension for that case.
>
> You also need to consider the conversion when receiving an RPC.
>
> If you use krb5 and NFSv3 then you really want the mapping between krb5
> identity and uid to be the same on client and server, so then when an
> application creates a file and then stats it, it sees that it owns it.
>
> If I use a krb5 identity in an idmapped NFS filesystem I'll want the
> server to map the identity to the "underlying" uid (as would be stored
> in a local filesystem) and then when the client gets a GETATTR reply,
> the VFS maps back to the uid seen by the application.
Thank you Alex for the proposal and quick follow-ups. We're really
interested in this feature and we'd like to share our use case.
> > > But this is just one use case. I'm pretty sure there are some more
> > > around here :)
> > > I know that folks from Preferred Networks (preferred.jp) are also
> > > interested in VFS idmap support in NFS,
> > > probably they can share some ideas/use cases too.
Our use case is running multi-tenant Kubernetes clusters with
Kubernetes User Namespaces [1]. Basically, we need to share a single
storage endpoint among multiple pods using the ReadWriteMany (RWX)
access mode. Implementations that support both RWX and ID-mapped
mounts are limited [2].
NFS is operationally common, so I am interested in ID-mapping support
for NFS, but NFS is complex due to its variety of mount options and
security features, as Trond mentioned. We'd like to share our use case
and define the minimum goal. Our goals are:
- 1: Mount the same NFS export as a persistent volume from multiple
Kubernetes Pods running on different compute nodes. Each tenant has
its own exports.
- 2: The UID/GID in a container in the pod can be set to an arbitrary
value by runAsUser/runAsGroup (e.g. runAsUser/Group is set to 1000).
- 3: We can access the export from the container as 1000:1000. At
minimum, ownership should be consistent from the container view (i.e.
stat shows 1000:1000 for files that the container creates). Today,
ID-mapped mounts do not support NFS. The NFS client ends up using the
host-mapped uid/gid (e.g. container 1000 becomes host 11000), so the
container view becomes inconsistent across nodes.
There are (at least) two possible models here:
a) the NFS client sends 1000:1000 on the wire and the server stores
1000:1000 (so server-side ownership matches the container uid/gid), or
b) the server stores the host uid/gid (e.g. 11000:11000) and the
client/VFS maps it so that the container still sees 1000:1000.
My intuition is that (a) is simpler for a multi-node RWX setup, but it
may have security / policy implications depending on how the server
does authorization (especially with sec=sys). I think it’s worth
discussing what the safe and reasonable minimum should be.
In this case, UID/GID in the host node is not deterministic for the
process in the container due to user_namespaces(7), so we need to do
ID-mapping to unify UID/GID between container and file system. Also,
we likely need to consider both request and reply paths (e.g. GETATTR)
to keep the view consistent.
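To make the two models above concrete, here is a minimal sketch of the
request and reply translations, not an implementation. It assumes a
single mapping range in the spirit of a /proc/<pid>/uid_map line, with
container IDs 0..65535 placed at host IDs 10000..75535 (so container
1000 is host 11000, as in the example above). All names (Extent,
CONTAINER_MAP, uid_on_wire, uid_seen_by_container) are hypothetical and
do not correspond to any kernel or NFS client interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Extent:
    """One mapping range, in the spirit of a /proc/<pid>/uid_map line."""
    inner_first: int  # first ID as seen inside the container
    outer_first: int  # first ID as seen on the host
    count: int        # number of consecutive IDs covered


def to_outer(extents: List[Extent], inner_id: int) -> Optional[int]:
    """Container ID -> host ID, or None if the ID is unmapped."""
    for e in extents:
        if e.inner_first <= inner_id < e.inner_first + e.count:
            return e.outer_first + (inner_id - e.inner_first)
    return None


def to_inner(extents: List[Extent], outer_id: int) -> Optional[int]:
    """Host ID -> container ID, or None if the ID is unmapped."""
    for e in extents:
        if e.outer_first <= outer_id < e.outer_first + e.count:
            return e.inner_first + (outer_id - e.outer_first)
    return None


# Assumed layout: container IDs 0..65535 live at host IDs 10000..75535.
CONTAINER_MAP = [Extent(inner_first=0, outer_first=10000, count=65536)]


def uid_on_wire(host_uid: int, model: str) -> Optional[int]:
    """Request path: the UID the client would send and the server store."""
    if model == "a":
        # Model (a): undo the host shift, so the wire carries 1000.
        return to_inner(CONTAINER_MAP, host_uid)
    # Model (b): the host UID goes out unchanged, so the wire carries 11000.
    return host_uid


def uid_seen_by_container(wire_uid: int, model: str) -> Optional[int]:
    """Reply path (e.g. GETATTR): the UID the containerized process sees."""
    if model == "a":
        # Model (a): the mount maps the stored UID back to the host UID.
        host_uid = to_outer(CONTAINER_MAP, wire_uid)
    else:
        # Model (b): the stored UID already is the host UID.
        host_uid = wire_uid
    if host_uid is None:
        return None
    # In both models the user namespace maps the host UID to the container.
    return to_inner(CONTAINER_MAP, host_uid)
```

In both models a file created by container UID 1000 reads back as 1000
inside the container; the difference is only what the server stores
(1000 in model (a), 11000 in model (b)), which is why (b) depends on
every node using the same host range.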
> Mixing in AUTH_GSS and real idmapping will be where things get harder,
> so let's not worry about those cases for now.
I totally agree with Jeff. We can start with a minimal PoC using AUTH_SYS.
> With NFSv4 and the idmapper you wouldn't need (or want) the kernel
> idmapping to be used at all. You would want the idmapper daemon to run
> in the user-namespace and map from on-the-wire names to the appropriate
> app-level uids.
> This would mean that a given NFS mount would need to be in a given user
> namespace. Maybe that isn't desired.
Neil, thank you for your comment. We initially expected it to be in
NFSv4. I totally agree with you, and our concern is exactly how to
make it consistent with idmapd(8). In the Kubernetes case, we cannot
grant CAP_SYS_ADMIN to pods to let them mount NFS directly, so the
mount will be done on the host. As you mentioned, we think we can share
a single NFS export from multiple hosts and pods, so I think
introducing ID-mapping into the VFS layer (referencing a local
ID-mapping table) is appropriate.
We can start by picking a small case. My concerns were whether this
could violate the NFS protocol and whether things can be done on the
client side, and I think this topic is a suitable place to work this
out with the VFS community. If things can be done on the client side,
we can cover existing NFS server implementations (e.g. OpenZFS,
proprietary appliances). I believe even this small working set can be
applied to recent containerized runtime environments.
Adding more context, Kubernetes and the container community are
actively working on host isolation using the Linux user namespace
feature. Recently they experienced RCE vulnerabilities in container
runtimes, which could be mitigated by host isolation using user
namespaces [3]. Along with migrating the runtime environment to user
namespaces, extending file system support is worth discussing.
Kind regards,
Kohei
[1] https://kubernetes.io/docs/concepts/workloads/pods/user-namespaces/
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
[3] https://lpc.events/event/19/contributions/2065/
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
2026-02-18 12:44 [LSF/MM/BPF TOPIC] VFS idmappings support in NFS Alexander Mikhalitsyn
2026-02-18 13:49 ` Jeff Layton
@ 2026-02-21 6:44 ` Demi Marie Obenour
2026-02-24 8:54 ` Christian Brauner
1 sibling, 1 reply; 13+ messages in thread
From: Demi Marie Obenour @ 2026-02-21 6:44 UTC (permalink / raw)
To: Alexander Mikhalitsyn, lsf-pc
Cc: aleksandr.mikhalitsyn, linux-fsdevel, linux-nfs, stgraber,
brauner, ksugihara, utam0k, trondmy, anna, jlayton, chuck.lever,
neilb, miklos, jack, amir73il, trapexit
On 2/18/26 07:44, Alexander Mikhalitsyn wrote:
> Dear friends,
>
> I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
>
> Previously, I worked on VFS idmap support for FUSE/virtiofs [2] and cephfs [1] with support/guidance
> from Christian.
>
> This experience with Cephfs & FUSE has shown that VFS idmap semantics, while being very elegant and
> intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
> FSes. In case of Cephfs we had to modify its protocol (!) (see [2]) as a part of our agreement with
> ceph folks about the right way to support idmaps.
>
> One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
> In particular, system administrator can configure path-based UID/GID restrictions on a *server*-side (Ceph MDS).
> Basically, you can say "I expect UID 1000 and GID 2000 for all files under /stuff directory".
> The problem here is that these UID/GIDs are taken from a syscall-caller's creds (not from (struct file *)->f_cred)
> which makes cephfs FDs not very transferable through unix sockets. [3]
>
> These path-based UID/GID restrictions mean that server expects client to send UID/GID with every single request,
> not only for those OPs where UID/GID needs to be written to the disk (mknod, mkdir, symlink, etc).
> VFS idmaps API is designed to prevent filesystems developers from making a mistakes when supporting FS_ALLOW_IDMAP.
> For example, (struct mnt_idmap *) is not passed to every single i_op, but instead to only those where it can be
> used legitimately. Particularly, readlink/listxattr or rmdir are not expected to use idmapping information anyhow.
>
> We've seen very similar challenges with FUSE. Not a long time ago on Linux Containers project forum, there
> was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
> of "caller UID/GID are needed everywhere" still blocks VFS idmaps adoption in some usecases.
> Antonio Musumeci (mergerfs maintainer) claimed that in many cases filesystems behind mergerfs may not be fully
> POSIX and basically, when mergerfs does IO on the underlying FSes it needs to do UID/GID switch to caller's UID/GID
> (taken from FUSE request header).
>
> We don't expect NFS to be any simpler :-) I would say that supporting NFS is a final boss. It would be great
> to have a deep technical discussion with VFS/FSes maintainers and developers about all these challenges and
> make some conclusions and identify a right direction/approach to these problems. From my side, I'm going
> to get more familiar with high-level part of NFS (or even make PoC if time permits), identify challenges,
> summarize everything and prepare some slides to navigate/plan discussion.
>
> [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
> [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
> [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
> [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
> [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
>
> Kind regards,
> Alexander Mikhalitsyn @ futurfusion.io
The secure case (strong authentication) has similar problems to
unprivileged virtiofsd on a system with user namespaces disabled.
In both cases, there is no way to store the files with the UID/GID/etc
that the VFS says they should have. The server (NFS) or kernel
(virtiofsd) simply will not (and, for security reasons, *must not*)
allow this.
I proposed a workaround for virtiofsd [1] that I will also propose
here: store the mapped UID and GID as a user.* xattr. This requires
no special permissions, and so it completely solves this problem.
It is also the only solution I know of that scales to NFS servers
with over 2^16 users, which might well exist.
The only better solution I can think of is to replace the numeric
UID/GID with a hierarchical identifier, such as a Windows-style SID.
Those are much more complex, though.
[1]: https://gitlab.com/virtio-fs/virtiofsd/-/issues/225
--
Sincerely,
Demi Marie Obenour (she/her/hers)
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
2026-02-21 6:44 ` Demi Marie Obenour
@ 2026-02-24 8:54 ` Christian Brauner
2026-02-24 14:18 ` Demi Marie Obenour
0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2026-02-24 8:54 UTC (permalink / raw)
To: Demi Marie Obenour
Cc: Alexander Mikhalitsyn, lsf-pc, aleksandr.mikhalitsyn,
linux-fsdevel, linux-nfs, stgraber, ksugihara, utam0k, trondmy,
anna, jlayton, chuck.lever, neilb, miklos, jack, amir73il,
trapexit
On Sat, Feb 21, 2026 at 01:44:26AM -0500, Demi Marie Obenour wrote:
> On 2/18/26 07:44, Alexander Mikhalitsyn wrote:
> > Dear friends,
> >
> > I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
> >
> > Previously, I worked on VFS idmap support for FUSE/virtiofs [2] and cephfs [1] with support/guidance
> > from Christian.
> >
> > This experience with Cephfs & FUSE has shown that VFS idmap semantics, while being very elegant and
> > intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
> > FSes. In case of Cephfs we had to modify its protocol (!) (see [2]) as a part of our agreement with
> > ceph folks about the right way to support idmaps.
> >
> > One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
> > In particular, system administrator can configure path-based UID/GID restrictions on a *server*-side (Ceph MDS).
> > Basically, you can say "I expect UID 1000 and GID 2000 for all files under /stuff directory".
> > The problem here is that these UID/GIDs are taken from a syscall-caller's creds (not from (struct file *)->f_cred)
> > which makes cephfs FDs not very transferable through unix sockets. [3]
> >
> > These path-based UID/GID restrictions mean that server expects client to send UID/GID with every single request,
> > not only for those OPs where UID/GID needs to be written to the disk (mknod, mkdir, symlink, etc).
> > VFS idmaps API is designed to prevent filesystems developers from making a mistakes when supporting FS_ALLOW_IDMAP.
> > For example, (struct mnt_idmap *) is not passed to every single i_op, but instead to only those where it can be
> > used legitimately. Particularly, readlink/listxattr or rmdir are not expected to use idmapping information anyhow.
> >
> > We've seen very similar challenges with FUSE. Not a long time ago on Linux Containers project forum, there
> > was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
> > of "caller UID/GID are needed everywhere" still blocks VFS idmaps adoption in some usecases.
> > Antonio Musumeci (mergerfs maintainer) claimed that in many cases filesystems behind mergerfs may not be fully
> > POSIX and basically, when mergerfs does IO on the underlying FSes it needs to do UID/GID switch to caller's UID/GID
> > (taken from FUSE request header).
> >
> > We don't expect NFS to be any simpler :-) I would say that supporting NFS is a final boss. It would be great
> > to have a deep technical discussion with VFS/FSes maintainers and developers about all these challenges and
> > make some conclusions and identify a right direction/approach to these problems. From my side, I'm going
> > to get more familiar with high-level part of NFS (or even make PoC if time permits), identify challenges,
> > summarize everything and prepare some slides to navigate/plan discussion.
> >
> > [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
> > [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
> > [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
> > [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
> > [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
> >
> > Kind regards,
> > Alexander Mikhalitsyn @ futurfusion.io
>
> The secure case (strong authentication) has similar problems to
> unprivileged virtiofsd on a system with user namespaces disabled.
> In both cases, there is no way to store the files with the UID/GID/etc
It's easy to support idmapped mounts without user namespaces. They're
completely decoupled from them already for that purpose so they can
support id squashing and so on going forward. The only thing that's
needed is to extend the api so that we can specify mappings to be used
for a mount. That's not difficult and there's no need to adhere to any
inherent limit on the number of mappings that user namespaces have.
It's also useful independent of all that for local filesystems that want to
expose files with different ownership at different locations without
getting into namespaces at all.
> that the VFS says they should have. The server (NFS) or kernel
> (virtiofsd) simply will not (and, for security reasons, *must not*)
> allow this.
>
> I proposed a workaround for virtiofsd [1] that I will also propose
> here: store the mapped UID and GID as a user.* xattr. This requires
xattrs as an ownership side-channel are an absolute clusterfuck. The
kernel implementation for POSIX ACLs and filesystem capabilities that
slap ownership information that the VFS must consume on arbitrary
inodes should be set on fire. I've burned way too many cycles getting
this into an even remotely acceptable shape and it still sucks to no
end. Permission checking is a complete nightmare because we need to go
fetch stuff from disk, cache it in a global format, then do an in-place
translation having to parse ownership out of binary data stored
alongside the inode.
However, if userspace wants to consume ownership information by storing
arbitrary ownership information as user.* xattrs then I obviously
couldn't care less but it won't nest, performance will suck, and it will
be brittle to get this right imho.
> no special permissions, and so it completely solves this problem.
> It is also the only solution I know of that scales to NFS servers
> with over 2^16 users, which might well exist.
>
> The only better solution I can think of is to replace the numeric
> UID/GID with hierarchical identifier, such as a Windows-style SID.
> Those are much more complex, though.
>
> [1]: https://gitlab.com/virtio-fs/virtiofsd/-/issues/225
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
* Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
2026-02-24 8:54 ` Christian Brauner
@ 2026-02-24 14:18 ` Demi Marie Obenour
0 siblings, 0 replies; 13+ messages in thread
From: Demi Marie Obenour @ 2026-02-24 14:18 UTC (permalink / raw)
To: Christian Brauner
Cc: Alexander Mikhalitsyn, lsf-pc, aleksandr.mikhalitsyn,
linux-fsdevel, linux-nfs, stgraber, ksugihara, utam0k, trondmy,
anna, jlayton, chuck.lever, neilb, miklos, jack, amir73il,
trapexit
On 2/24/26 03:54, Christian Brauner wrote:
> On Sat, Feb 21, 2026 at 01:44:26AM -0500, Demi Marie Obenour wrote:
>> On 2/18/26 07:44, Alexander Mikhalitsyn wrote:
>>> Dear friends,
>>>
>>> I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
>>>
>>> Previously, I worked on VFS idmap support for FUSE/virtiofs [2] and cephfs [1] with support/guidance
>>> from Christian.
>>>
>>> This experience with Cephfs & FUSE has shown that VFS idmap semantics, while being very elegant and
>>> intuitive for local filesystems, can be quite challenging to combine with network/network-like (e.g. FUSE)
>>> FSes. In case of Cephfs we had to modify its protocol (!) (see [2]) as a part of our agreement with
>>> ceph folks about the right way to support idmaps.
>>>
>>> One obstacle here was that cephfs has some features that are not very Linux-wayish, I would say.
>>> In particular, system administrator can configure path-based UID/GID restrictions on a *server*-side (Ceph MDS).
>>> Basically, you can say "I expect UID 1000 and GID 2000 for all files under /stuff directory".
>>> The problem here is that these UID/GIDs are taken from a syscall-caller's creds (not from (struct file *)->f_cred)
>>> which makes cephfs FDs not very transferable through unix sockets. [3]
>>>
>>> These path-based UID/GID restrictions mean that server expects client to send UID/GID with every single request,
>>> not only for those OPs where UID/GID needs to be written to the disk (mknod, mkdir, symlink, etc).
>>> VFS idmaps API is designed to prevent filesystems developers from making a mistakes when supporting FS_ALLOW_IDMAP.
>>> For example, (struct mnt_idmap *) is not passed to every single i_op, but instead to only those where it can be
>>> used legitimately. Particularly, readlink/listxattr or rmdir are not expected to use idmapping information anyhow.
>>>
>>> We've seen very similar challenges with FUSE. Not a long time ago on Linux Containers project forum, there
>>> was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
>>> of "caller UID/GID are needed everywhere" still blocks VFS idmaps adoption in some usecases.
>>> Antonio Musumeci (mergerfs maintainer) claimed that in many cases filesystems behind mergerfs may not be fully
>>> POSIX and basically, when mergerfs does IO on the underlying FSes it needs to do UID/GID switch to caller's UID/GID
>>> (taken from FUSE request header).
>>>
>>> We don't expect NFS to be any simpler :-) I would say that supporting NFS is a final boss. It would be great
>>> to have a deep technical discussion with VFS/FSes maintainers and developers about all these challenges and
>>> make some conclusions and identify a right direction/approach to these problems. From my side, I'm going
>>> to get more familiar with high-level part of NFS (or even make PoC if time permits), identify challenges,
>>> summarize everything and prepare some slides to navigate/plan discussion.
>>>
>>> [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
>>> [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
>>> [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
>>> [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
>>> [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
>>>
>>> Kind regards,
>>> Alexander Mikhalitsyn @ futurfusion.io
>>
>> The secure case (strong authentication) has similar problems to
>> unprivileged virtiofsd on a system with user namespaces disabled.
>> In both cases, there is no way to store the files with the UID/GID/etc
>
> It's easy to support idmapped mounts without user namespaces. They're
> completely decoupled from them already for that purpose so they can
> support id squashing and so on going forward. The only thing that's
> needed is to extend the api so that we can specify mappings to be used
> for a mount. That's not difficult and there's no need to adhere to any
> inherent limit on the number of mappings that user namespaces have.
>
> It's also useful independent of all that for local filesystems that want to
> expose files with different ownership at different locations without
> getting into namespaces at all.
Virtiofsd needs user namespaces to create a mount unless it runs as
real root.
>> that the VFS says they should have. The server (NFS) or kernel
>> (virtiofsd) simply will not (and, for security reasons, *must not*)
>> allow this.
>>
>> I proposed a workaround for virtiofsd [1] that I will also propose
>> here: store the mapped UID and GID as a user.* xattr. This requires
>
> xattrs as an ownership side-channel are an absolute clusterfuck. The
> kernel implementation for POSIX ACLs and filesystem capabilities that
> slap ownership information that the VFS must consume on arbitrary
> inodes should be set on fire. I've burned way too many cycles getting
> this into an even remotely acceptable shape and it still sucks to no
> end. Permission checking is a complete nightmare because we need to go
> fetch stuff from disk, cache it in a global format, then do an in-place
> translation having to parse ownership out of binary data stored
> alongside the inode.
Why is this so bad? What would be a better way to do this?
> However, if userspace wants to consume ownership information by storing
> arbitrary ownership information as user.* xattrs then I obviously
> couldn't care less but it won't nest, performance will suck, and it will
> be brittle to get this right imho.
It can be nested by repeatedly remapping xattrs: user.o, user.o.o,
and so on. More efficient schemes undoubtedly exist.
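As a rough illustration of the naming scheme just described, the sketch
below derives the xattr name for a given nesting depth and stores an
owner under it. It is only a model: the xattrs are held in a plain
dict rather than on a real file, the "uid:gid" text encoding is an
assumption made for the example, and the helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def owner_xattr_name(depth: int) -> str:
    """Name for nesting level `depth`: 1 -> user.o, 2 -> user.o.o, ..."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return "user." + ".".join(["o"] * depth)


def store_owner(xattrs: Dict[str, bytes], depth: int, uid: int, gid: int) -> None:
    """Record the mapped owner for one nesting level (assumed text encoding)."""
    xattrs[owner_xattr_name(depth)] = f"{uid}:{gid}".encode("ascii")


def load_owner(xattrs: Dict[str, bytes], depth: int) -> Optional[Tuple[int, int]]:
    """Return (uid, gid) for one nesting level, or None if nothing is stored."""
    raw = xattrs.get(owner_xattr_name(depth))
    if raw is None:
        return None
    uid, gid = raw.decode("ascii").split(":")
    return int(uid), int(gid)
```

Each additional level of nesting adds one more xattr per file, which is
the cost the "more efficient schemes" remark refers to.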
Is there a better solution? The only one I can think of is to include
subordinate UIDs/GIDs in Kerberos tickets, and that fails when the
product of (total users in an installation * number of subordinate
UIDs/GIDs per user) approaches 2^32. This can happen if both are
65536.
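The arithmetic behind that bound, as a small sketch. It assumes a
32-bit numeric ID space, matching the 2^32 figure above; the function
names are made up for the example.

```python
UID_SPACE = 2 ** 32  # size of a 32-bit numeric UID/GID space (assumed)


def ids_required(total_users: int, sub_ids_per_user: int) -> int:
    """IDs consumed if every user gets its own subordinate range."""
    return total_users * sub_ids_per_user


def exhausts_uid_space(total_users: int, sub_ids_per_user: int) -> bool:
    """True when the subordinate ranges alone need the whole ID space."""
    return ids_required(total_users, sub_ids_per_user) >= UID_SPACE
```

With both factors at 65536 the product is exactly 2^32, so nothing is
left over for the users' own UIDs.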
I did suggest replacing UIDs and GIDs with NT-style SIDs, but that
is an absolutely enormous change across the entire system.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Thread overview: 13+ messages
2026-02-18 12:44 [LSF/MM/BPF TOPIC] VFS idmappings support in NFS Alexander Mikhalitsyn
2026-02-18 13:49 ` Jeff Layton
2026-02-18 14:36 ` Alexander Mikhalitsyn
2026-02-18 16:01 ` Jeff Layton
2026-02-18 16:39 ` Alexander Mikhalitsyn
2026-02-19 0:57 ` NeilBrown
2026-02-19 8:53 ` Kohei Sugihara
2026-02-18 14:37 ` Trond Myklebust
2026-02-18 15:08 ` Jeff Layton
2026-02-18 15:25 ` Alexander Mikhalitsyn
2026-02-21 6:44 ` Demi Marie Obenour
2026-02-24 8:54 ` Christian Brauner
2026-02-24 14:18 ` Demi Marie Obenour