public inbox for linux-nfs@vger.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: lsf-pc@lists.linux-foundation.org,
	aleksandr.mikhalitsyn@futurfusion.io,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	stgraber@stgraber.org, brauner@kernel.org,
	ksugihara@preferred.jp, utam0k@preferred.jp, trondmy@kernel.org,
	anna@kernel.org, chuck.lever@oracle.com, neilb@suse.de,
	miklos@szeredi.hu, jack@suse.cz, amir73il@gmail.com,
	trapexit@spawn.link
Subject: Re: [LSF/MM/BPF TOPIC] VFS idmappings support in NFS
Date: Wed, 18 Feb 2026 11:01:16 -0500
Message-ID: <e0be58df89ffaf41763312dfffe8402fdcb9d023.camel@kernel.org>
In-Reply-To: <CAJqdLrqNzXRwMF2grTGCkaMKCEXAwemQLEi3wsL5Lp2W9D-ZVg@mail.gmail.com>

On Wed, 2026-02-18 at 15:36 +0100, Alexander Mikhalitsyn wrote:
> On Wed, 18 Feb 2026 at 14:49, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Wed, 2026-02-18 at 13:44 +0100, Alexander Mikhalitsyn wrote:
> > > Dear friends,
> > > 
> > > I would like to propose "VFS idmappings support in NFS" as a topic for discussion at the LSF/MM/BPF Summit.
> > > 
> > > Previously, I worked on VFS idmap support for FUSE/virtiofs [4] and cephfs [1] with support/guidance
> > > from Christian.
> > > 
> > > This experience with Cephfs & FUSE has shown that VFS idmap semantics, while very elegant and
> > > intuitive for local filesystems, can be quite challenging to combine with network and network-like (e.g. FUSE)
> > > filesystems. In the case of Cephfs we had to modify its protocol (!) (see [2]) as part of our agreement with
> > > the ceph folks about the right way to support idmaps.
> > > 
> > > One obstacle here was that cephfs has some features that are not very Linux-like, I would say.
> > > In particular, a system administrator can configure path-based UID/GID restrictions on the *server* side (Ceph MDS).
> > > Basically, you can say "I expect UID 1000 and GID 2000 for all files under the /stuff directory".
> > > The problem here is that these UIDs/GIDs are taken from the syscall caller's creds (not from (struct file *)->f_cred),
> > > which makes cephfs FDs not very transferable through unix sockets. [3]
> > > 
> > > These path-based UID/GID restrictions mean that the server expects the client to send a UID/GID with every single request,
> > > not only for those ops where the UID/GID needs to be written to disk (mknod, mkdir, symlink, etc).
> > > The VFS idmap API is designed to prevent filesystem developers from making mistakes when supporting FS_ALLOW_IDMAP.
> > > For example, (struct mnt_idmap *) is not passed to every single i_op, but only to those where it can be
> > > used legitimately. In particular, readlink, listxattr and rmdir are not expected to use idmapping information at all.
> > > 
> > > We've seen very similar challenges with FUSE. Not long ago, on the Linux Containers project forum, there
> > > was a discussion about mergerfs (a popular FUSE-based filesystem) & VFS idmaps [5]. And I see that this problem
> > > of "caller UID/GID is needed everywhere" still blocks VFS idmap adoption in some use cases.
> > > Antonio Musumeci (the mergerfs maintainer) said that in many cases the filesystems behind mergerfs may not be fully
> > > POSIX-compliant, so when mergerfs does IO on the underlying filesystems it needs to switch to the caller's UID/GID
> > > (taken from the FUSE request header).
> > > 
> > > We don't expect NFS to be any simpler :-) I would say that supporting NFS is the final boss. It would be great
> > > to have a deep technical discussion with VFS and filesystem maintainers and developers about all these challenges,
> > > draw some conclusions and identify the right direction/approach to these problems. From my side, I'm going
> > > to get more familiar with the high-level part of NFS (or even make a PoC if time permits), identify challenges,
> > > summarize everything and prepare some slides to guide the discussion.
> > > 
> > > [1] cephfs https://lore.kernel.org/linux-fsdevel/20230807132626.182101-1-aleksandr.mikhalitsyn@canonical.com
> > > [2] cephfs protocol changes https://github.com/ceph/ceph/pull/52575
> > > [3] cephfs & f_cred https://lore.kernel.org/lkml/CAEivzxeZ6fDgYMnjk21qXYz13tHqZa8rP-cZ2jdxkY0eX+dOjw@mail.gmail.com/
> > > [4] fuse/virtiofs https://lore.kernel.org/linux-fsdevel/20240903151626.264609-1-aleksandr.mikhalitsyn@canonical.com/
> > > [5] mergerfs https://discuss.linuxcontainers.org/t/is-it-the-case-that-you-cannot-use-shift-true-for-disk-devices-where-the-source-is-a-mergerfs-mount-is-there-a-workaround/25336/11?u=amikhalitsyn
> > > 
> > > Kind regards,
> > > Alexander Mikhalitsyn @ futurfusion.io
> > 
> 
> Hi Jeff,
> 
> thanks for such a fast reply! ;)
> 
> > 
> > IIUC, people mostly use vfs-layer idmappings because they want to remap
> > the uid/gid values of files that get stored on the backing store (disk,
> > ceph MDS, or whatever).
> 
> yes, precisely.
> 
> > 
> > I've never used idmappings myself much in practice. Could you lay out
> > an example of how you would use them with NFS in a real environment so
> > I understand the problem better? I'd start by assuming a simple setup
> > with AUTH_SYS and no NFSv4 idmapping involved, since that case should
> > be fairly straightforward.
> 
> From the point of view of the LXC/Incus project, idmapped mounts are used as
> a way to "delegate" filesystems (or subtrees) to containers:
> 1. We, of course, assume that the container enables user namespaces and
> that the user can't mount the filesystem inside it, because the filesystem
> does not have the FS_USERNS_MOUNT flag set (as is the case for Cephfs, NFS,
> CIFS and many others).
> 2. At the same time, the host's system administrator wants to avoid
> remapping between the container's user ns and sb->s_user_ns (which is
> init_user_ns for those filesystems). [ The motivation here is that in many
> cases you may want the same subtree to be shared with other containers
> and even with host users too, and you want UIDs to be "compatible", i.e.
> UID 1000 in one container and UID 1000 in another container should both
> land as UID 1000 on the filesystem's inode. ]
> 
> For this use case, when we bind-mount the filesystem into the container, we
> apply a VFS idmap equal to the container's user namespace. This produces the
> behavior I described.
> 

Ok: so you have a process running in a userns as UID 2000 and you want
to use vfs layer idmapping so that when you create a file as that user,
it ends up being owned by UID 1000. Is that basically correct?

Typically, the RPC credential used in an OPEN or CREATE call is what
determines the new file's ownership (at least until a SETATTR comes in). With
AUTH_SYS, the credential is just a uid and a set of gids.

So in this case, it sounds like you would just need to do that conversion
(maybe at the RPC client layer?) when issuing an RPC. You don't really
need a protocol extension for that case.
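
A minimal userspace sketch of that conversion, for illustration only. It
models the mount's idmap as uid_map-style extents and translates the caller's
host-side IDs into the values that would go into an AUTH_SYS credential. The
names Extent, AuthSysCred, map_up and auth_sys_cred are hypothetical and are
not kernel or sunrpc APIs. Rejecting an unmapped caller and dropping unmapped
supplementary groups are assumptions of this sketch, not settled behavior.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Extent:
    """One uid_map/gid_map-style line: `count` IDs starting at `inner_first`
    inside the mapping correspond to IDs starting at `outer_first` outside."""
    inner_first: int
    outer_first: int
    count: int


@dataclass(frozen=True)
class AuthSysCred:
    """The ID-carrying part of an AUTH_SYS credential (hypothetical model)."""
    uid: int
    gid: int
    gids: Tuple[int, ...]


def map_up(extents: Sequence[Extent], outer_id: int) -> Optional[int]:
    """Translate a host-side ID into the mapping; None if it has no mapping."""
    for e in extents:
        if e.outer_first <= outer_id < e.outer_first + e.count:
            return e.inner_first + (outer_id - e.outer_first)
    return None


def auth_sys_cred(uid_map: Sequence[Extent], gid_map: Sequence[Extent],
                  fsuid: int, fsgid: int,
                  groups: Sequence[int]) -> AuthSysCred:
    """Build the wire credential from the caller's host-side IDs."""
    uid = map_up(uid_map, fsuid)
    gid = map_up(gid_map, fsgid)
    if uid is None or gid is None:
        # Assumption: a caller without a mapping is refused outright.
        raise ValueError("caller has no mapping in the mount's idmap")
    # Assumption: supplementary groups without a mapping are dropped.
    mapped = (map_up(gid_map, g) for g in groups)
    return AuthSysCred(uid, gid, tuple(g for g in mapped if g is not None))


# Container userns: inner IDs 0..65535 live at host IDs 100000..165535.
container_map = [Extent(inner_first=0, outer_first=100000, count=65536)]

# Container UID 1000 is host-side ID 101000; host group 5 has no mapping.
cred = auth_sys_cred(container_map, container_map,
                     fsuid=101000, fsgid=101000,
                     groups=[101000, 100027, 5])
print(cred)
```

With an idmap equal to the container's user namespace, container UID 1000
goes out on the wire as UID 1000, which matches the LXC/Incus use case
described above.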

As Trond points out though, AUTH_GSS and NFSv4 idmapping will make this
more complex. Once you're using Kerberos credentials for
authentication, you don't have much control over what the UIDs and GIDs
will be on newly-created files, but is that really a problem? As long
as all of the clients have a consistent view, I wouldn't think so.

> But this is just one use case. I'm pretty sure there are some more
> around here :)
> I know that folks from Preferred Networks (preferred.jp) are also
> interested in VFS idmap support in NFS,
> probably they can share some ideas/use cases too.
> 
> 

Yes, we don't want to focus too much on a single use-case, but I find
it helpful to focus on a single simple problem first.
-- 
Jeff Layton <jlayton@kernel.org>
