Linux NFS development
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: NeilBrown <neil@brown.name>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	 David Howells <dhowells@redhat.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	 Jeff Layton <jlayton@kernel.org>,
	linux-nfs@vger.kernel.org, netfs@lists.linux.dev,
	 linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/8 RFC] tidy up various VFS lookup functions
Date: Tue, 18 Mar 2025 14:57:33 +0100	[thread overview]
Message-ID: <20250318-audienz-radeln-17745f4c6b8e@brauner> (raw)
In-Reply-To: <174217721714.9342.9504907056839144338@noble.neil.brown.name>

On Mon, Mar 17, 2025 at 01:06:57PM +1100, NeilBrown wrote:
> On Fri, 14 Mar 2025, Christian Brauner wrote:
> > On Fri, Mar 14, 2025 at 11:34:06AM +1100, NeilBrown wrote:
> > > VFS has some functions with names containing "lookup_one_len" and others
> > > without the "_len".  This difference has nothing to do with "len".
> > > 
> > > The functions without "_len" take a "mnt_idmap" pointer.  This is found
> > 
> > When we added idmapped mounts there were all these *_len() helpers and I
> > orignally had just ported them to pass mnt_idmap. But we decided to not
> > do this. The argument might have been that most callers don't need to be
> > switched (I'm not actually sure if that's still true now that we have
> > quite a few filesystems that do support idmapped mounts.).
> > 
> > So then we added new helper and then we decided to use better naming
> > then that *_len() stuff. That's about it.
> > 
> > > in the "vfsmount" and that is an important question when choosing which
> > > to use: do you have a vfsmount, or are you "inside" the filesystem.  A
> > > related question is "is permission checking relevant here?".
> > > 
> > > nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
> > > functions.  They pass nop_mnt_idmap which is not correct if the vfsmount
> > > is actually idmaped.  For cachefiles it probably (?) doesn't matter as
> > > the accesses to the backing filesystem are always does with elevated privileged (?).
> > 
> > Cachefiles explicitly refuse being mounted on top of an idmapped mount
> > and they require that the mount is attached (check_mnt()) and an
> > attached mount can never be idmapped as it has already been exposed in
> > the filesystem hierarchy.
> > 
> > > 
> > > For nfsd it would matter if anyone exported an idmapped filesystem.  I
> > > wonder if anyone has tried...
> > 
> > nfsd doesn't support exporting idmapped mounts. See check_export() where
> > that's explicitly checked.
> > 
> > If there are ways to circumvent this I'd be good to know.
> 
> I should have checked that they rejected idmapped mounts
> (is_idmapped_mnt()).  But I think that just changes my justification for
> the change, not my desire to make the change.
> 
> There are two contexts in which lookup is done.  One is the common
> context when there is a vfsmount present and permission checking is
> expected.  nfsd and cachefiles both fit this context.
> 
> The other is when there is no vfsmount and/or permission checking is not
> relevant.  This happens after lookup_parentat when the permission check
> has already been performed, and in various virtual filesystems when the
> filesystem itself is adding/removing files or in normal filesystems
> where dedicated names like "lost+found" and "quota" are being accessed.
> 
> I would like to make a clear distinction between these, and for that to
> be done nfsd and cachefiles need to be changed to clearly fit the first
> context.  Whether they should allow idmapped mounts or not is to some
> extent a separate question.  They do want to do permission checking
> (certainly nfsd does) so they should use the same API as other
> permission-checking code.
> 
> > 
> > > 
> > > These patches change the "lookup_one" functions to take a vfsmount
> > > instead of a mnt_idmap because I think that makes the intention clearer.
> > 
> > Please don't!
> > 
> > These internal lookup helpers intentionally do not take a vfsmount.
> > First, because they can be called in places where access to a vfsmount
> > isn't possible and we don't want to pass vfsmounts down to filesystems
> > ever!
> 
> There are two sorts of internal lookup helpers.
> Those that currently don't even take a mnt_idmap and are called, as you
> say, in places where a vfsmount isn't available.
> And those that are currently called with a mnt_idmap and called (after a
> few cleanup) in places where a vfsmount is readily available.

That's not the point. The vfsmount is the wrong data structure to pass
to lookup_one(). The mount is completely immaterial to what it does. The
only thing that matters is the idmap. Passing the vfsmount is actively
misleading.

> 
> 
> > 
> > Second, the mnt_idmap pointer is - with few safe exceptions - is
> > retrieved once in the VFS and then passed down so that e.g., permission
> > checking and file creation are guaranteed to use the same mnt_idmap
> > pointer.
> 
> In every case that I changed a call to pass a vfsmount instead of a
> mnt_idmap, the mnt_idmap had recently been fetched from the vfsmount,
> often by mnt_idmap() in the first argument to lookup_one().  Sometimes
> by file_mnt_idmap() or similar.  So the patch never changed the safety
> of the idmap.

Taking btrfs:

        dentry = lookup_one(parent->mnt, QSTR_LEN(name, namelen), parent->dentry);
        error = PTR_ERR(dentry);
        if (IS_ERR(dentry))
                goto out_unlock;

        error = btrfs_may_create(idmap, dir, dentry);
        if (error)
                goto out_dput;

You fetch the idmap pointer twice here. The only reason that this is
safe is because I made it so that while an active writer is on a mount
the idmap cannot be changed. But that's subtle knowledge. By passing the
idmap pointer directly it is visually trivial to figure out that it is
guaranteed to be the same idmap without having to know how
mnt_want_write_file() interacts with mount_setattr().

The changes here also make following the logic complex. The idmap
pointer is fetched once and passed down everywhere it is needed. But
suddenly that's not true anymore for lookup_one() where its the
vfsmount. First question this raises is whether the mount topology
somehow matters for the lookup when really it doesn't.

Your cleanup series is really good with or without stuffing vfsmount
into all of these helpers. So please just either resend it and continue
passing struct mnt_idmap or I'll change it when I apply.

      reply	other threads:[~2025-03-18 13:57 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-14  0:34 [PATCH 0/8 RFC] tidy up various VFS lookup functions NeilBrown
2025-03-14  0:34 ` [PATCH 1/8] VFS: improve interface for lookup_one functions NeilBrown
2025-03-14  0:34 ` [PATCH 2/8] nfsd: Use lookup_one() rather than lookup_one_len() NeilBrown
2025-03-14  0:34 ` [PATCH 3/8] nfsd: use correct idmap for all accesses NeilBrown
2025-03-14  0:34 ` [PATCH 4/8] cachefiles: Use lookup_one() rather than lookup_one_len() NeilBrown
2025-03-14  0:34 ` [PATCH 5/8] cachefiles: use correct mnt_idmap NeilBrown
2025-03-14  0:34 ` [PATCH 6/8] VFS: rename lookup_one_len() family to lookup_noperm() and remove permission check NeilBrown
2025-03-14  0:34 ` [PATCH 7/8] Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS NeilBrown
2025-03-14  0:34 ` [PATCH 8/8] VFS: change lookup_one_common and lookup_noperm_common to take a qstr NeilBrown
2025-03-14 10:38 ` [PATCH 0/8 RFC] tidy up various VFS lookup functions Christian Brauner
2025-03-17  2:06   ` NeilBrown
2025-03-18 13:57     ` Christian Brauner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250318-audienz-radeln-17745f4c6b8e@brauner \
    --to=brauner@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=dhowells@redhat.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=netfs@lists.linux.dev \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox