Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Mickaël Salaün" <mic@digikod.net>
To: Tingmao Wang <m@maowtm.org>
Cc: "Al Viro" <viro@zeniv.linux.org.uk>,
	"Eric Van Hensbergen" <ericvh@kernel.org>,
	"Dominique Martinet" <asmadeus@codewreck.org>,
	"Latchesar Ionkov" <lucho@ionkov.net>,
	"Christian Schoenebeck" <linux_oss@crudebyte.com>,
	v9fs@lists.linux.dev, "Günther Noack" <gnoack@google.com>,
	linux-security-module@vger.kernel.org, "Jan Kara" <jack@suse.cz>,
	"Amir Goldstein" <amir73il@gmail.com>,
	"Matthew Bobrowski" <repnop@google.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
Date: Fri, 8 Aug 2025 10:32:44 +0200	[thread overview]
Message-ID: <20250808.oog4xee5Pee2@digikod.net> (raw)
In-Reply-To: <b32e2088-92c0-43e0-8c90-cb20d4567973@maowtm.org>

On Fri, Jul 11, 2025 at 08:11:44PM +0100, Tingmao Wang wrote:
> Hi Al, thanks for the review :)  I haven't had the chance to properly
> think about this until today, so apologies for the delay.
> 
> On 7/5/25 01:19, Al Viro wrote:
> > On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> > 
> >> +struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> >> +{
> >> +	struct v9fs_ino_path *path;
> >> +	size_t path_components = 0;
> >> +	struct dentry *curr = dentry;
> >> +	ssize_t i;
> >> +
> >> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >> +
> >> +	rcu_read_lock();
> >> +
> >> +    /* Don't include the root dentry */
> >> +	while (curr->d_parent != curr) {
> >> +		path_components++;
> >> +		curr = curr->d_parent;
> >> +	}
> >> +	if (WARN_ON(path_components > SSIZE_MAX)) {
> 
> (Looking at this again I think this check is a bit bogus.  I don't know
> how would it be possible at all for us to have > SSIZE_MAX deep
> directories especially since each level requires a dentry allocation, but
> even if this check is actually useful, it should be in the while loop,
> before each path_components++)

WARN_ON_ONCE() would be better, especially in a while loop.  I avoid
using WARN_ON(), especially when that can be triggered by users.

> 
> >> +		rcu_read_unlock();
> >> +		return NULL;
> >> +	}
> >> +
> >> +	path = kmalloc(struct_size(path, names, path_components),
> >> +		       GFP_KERNEL);
> > 
> > Blocking allocation under rcu_read_lock().
> 
> I think my first instinct of how to fix this, if the original code is
> correct barring this allocation issue, would be to take rcu read lock
> twice (first walk to calculate how much to allocate, then second walk to
> actually take the snapshots).  We should be safe to rcu_read_unlock() in
> the middle as long as caller has a reference to the target dentry (this
> needs to be true even if we just do one rcu_read_lock() anyway), and we
> can start a parent walk again.  The v9fs rename_sem should ensure we see
> the same path again.
> 
> Alternatively, we can use dget_parent to do the walk, and not lock RCU at
> all.  We still need to walk twice tho, to know how much to allocate.  But
> for now I will keep the current approach.
> 
> New version:
> 
> /*
>  * Must hold rename_sem due to traversing parents.  Caller must hold
>  * reference to dentry.
>  */
> struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> {
> 	struct v9fs_ino_path *path;
> 	size_t path_components = 0;
> 	struct dentry *curr = dentry;
> 	ssize_t i;
> 
> 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> 	might_sleep(); /* Allocation below might block */
> 
> 	rcu_read_lock();
> 
> 	/* Don't include the root dentry */
> 	while (curr->d_parent != curr) {
> 		if (WARN_ON(path_components >= SSIZE_MAX)) {
> 			rcu_read_unlock();
> 			return NULL;
> 		}
> 		path_components++;
> 		curr = curr->d_parent;
> 	}
> 
> 	/*
> 	 * Allocation can block so don't do it in RCU (and because the
> 	 * allocation might be large, since name_snapshot leaves space for
> 	 * inline str, not worth trying GFP_ATOMIC)
> 	 */
> 	rcu_read_unlock();
> 
> 	path = kmalloc(struct_size(path, names, path_components), GFP_KERNEL);
> 	if (!path) {
> 		rcu_read_unlock();

This unlock is wrong.

> 		return NULL;
> 	}
> 
> 	path->nr_components = path_components;
> 	curr = dentry;
> 
> 	rcu_read_lock();
> 	for (i = path_components - 1; i >= 0; i--) {
> 		take_dentry_name_snapshot(&path->names[i], curr);
> 		curr = curr->d_parent;
> 	}
> 	WARN_ON(curr != curr->d_parent);
> 	rcu_read_unlock();
> 	return path;
> }
> 
> How does this look?

Looks good to me overall.  Please sent a new patch series.

> 
> On 7/5/25 01:25, Al Viro wrote:
> > On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> >> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
> >> +			     struct dentry *dentry)
> >> +{
> >> +	struct dentry *curr = dentry;
> >> +	struct qstr *curr_name;
> >> +	struct name_snapshot *compare;
> >> +	ssize_t i;
> >> +
> >> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >> +
> >> +	rcu_read_lock();
> >> +	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> >> +		if (curr->d_parent == curr) {
> >> +			/* We're supposed to have more components to walk */
> >> +			rcu_read_unlock();
> >> +			return false;
> >> +		}
> >> +		curr_name = &curr->d_name;
> >> +		compare = &ino_path->names[i];
> >> +		/*
> >> +		 * We can't use hash_len because it is salted with the parent
> >> +		 * dentry pointer.  We could make this faster by pre-computing our
> >> +		 * own hashlen for compare and ino_path outside, probably.
> >> +		 */
> >> +		if (curr_name->len != compare->name.len) {
> >> +			rcu_read_unlock();
> >> +			return false;
> >> +		}
> >> +		if (strncmp(curr_name->name, compare->name.name,
> >> +			    curr_name->len) != 0) {
> > 
> > ... without any kind of protection for curr_name.  Incidentally,
> > what about rename()?  Not a cross-directory one, just one that
> > changes the name of a subdirectory within the same parent?
> 
> As far as I can tell, in v9fs_vfs_rename, v9ses->rename_sem is taken for
> both same-parent and different parent renames, so I think we're safe here
> (and hopefully for any v9fs dentries, nobody should be causing d_name to
> change except for ourselves when we call d_move in v9fs_vfs_rename?  If
> yes then because we also take v9ses->rename_sem, in theory we should be
> fine here...?)

A lockdep_assert_held() or similar and a comment would make this clear.

> 
> (Let me know if I missed anything.  I'm assuming only the filesystem
> "owning" a dentry should d_move/d_exchange the dentry.)
> 
> However, I see that there is a d_same_name function in dcache.c which is
> slightly more careful (but still requires the caller to check the dentry
> seqcount, which we do not need to because of the reasoning above), and in
> hindsight I think that is probably the more proper way to do this
> comparison (and will also handle case-insensitivity, although I've not
> explored if this is applicable to 9pfs).
> 
> New version:
> 
> /*
>  * Must hold rename_sem due to traversing parents
>  */
> bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
> {
> 	struct dentry *curr = dentry;
> 	struct name_snapshot *compare;
> 	ssize_t i;
> 
> 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> 
> 	rcu_read_lock();
> 	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> 		if (curr->d_parent == curr) {
> 			/* We're supposed to have more components to walk */
> 			rcu_read_unlock();
> 			return false;
> 		}
> 		compare = &ino_path->names[i];
> 		if (!d_same_name(curr, curr->d_parent, &compare->name)) {
> 			rcu_read_unlock();
> 			return false;
> 		}
> 		curr = curr->d_parent;
> 	}
> 	rcu_read_unlock();
> 	if (curr != curr->d_parent) {
> 		/* dentry is deeper than ino_path */
> 		return false;
> 	}
> 	return true;
> }

I like this new version.

> 
> If you think this is not enough, can you suggest what would be needed?
> I'm thinking maybe we can check dentry seqcount to be safe, but from
> earlier discussion in "bpf path iterator" my impression is that that is
> VFS internal data - can we use it here (if needed)?
> 
> (I assume, from looking at the code, just having a reference does not
> prevent d_name from changing)
> 
>

next prev parent reply	other threads:[~2025-08-08  8:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
2025-07-05  0:19   ` Al Viro
2025-07-05  0:25   ` Al Viro
2025-07-11 19:11     ` Tingmao Wang
2025-08-08  8:32       ` Mickaël Salaün [this message]
2025-08-12 23:57         ` Tingmao Wang
2025-08-13  7:47           ` Mickaël Salaün
2025-07-11 19:11   ` Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 2/6] fs/9p: add default option for path-based inodes Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 3/6] fs/9p: Hide inodeident=path from show_options as it is the default Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 4/6] fs/9p: Add ability to identify inode by path for non-.L Tingmao Wang
2025-07-11 19:12   ` Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 5/6] fs/9p: .L: Refresh stale inodes on reuse Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 6/6] fs/9p: non-.L: " Tingmao Wang
2025-07-04 18:37 ` [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Mickaël Salaün
2025-08-08 10:27 ` Dominique Martinet
2025-08-08 10:52   ` Christian Schoenebeck
2025-08-12 23:53     ` Tingmao Wang
2025-08-13  5:30       ` Dominique Martinet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250808.oog4xee5Pee2@digikod.net \
    --to=mic@digikod.net \
    --cc=amir73il@gmail.com \
    --cc=asmadeus@codewreck.org \
    --cc=ericvh@kernel.org \
    --cc=gnoack@google.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linux_oss@crudebyte.com \
    --cc=lucho@ionkov.net \
    --cc=m@maowtm.org \
    --cc=repnop@google.com \
    --cc=v9fs@lists.linux.dev \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.