public inbox for linux-fsdevel@vger.kernel.org
From: Al Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
Cc: agruenba@redhat.com, amir73il@gmail.com, brauner@kernel.org,
	ceph-devel@vger.kernel.org, dhowells@redhat.com,
	hubcap@omnibond.com, jack@suse.cz, krisman@kernel.org,
	linux-nfs@vger.kernel.org, miklos@szeredi.hu,
	torvalds@linux-foundation.org
Subject: [PATCH 20/20] 9p: fix ->rename_sem exclusion
Date: Fri, 10 Jan 2025 02:43:03 +0000
Message-ID: <20250110024303.4157645-20-viro@zeniv.linux.org.uk>
In-Reply-To: <20250110024303.4157645-1-viro@zeniv.linux.org.uk>

9p wants to be able to build a path from a given dentry to the fs root and
keep it valid over a blocking operation.

->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that stability and have no way to tell if ->s_vfs_rename_mutex
is already held deeper in the call chain.  Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.

Solution:
	* have d_move() done in ->rename() rather than in its caller
	* maintain a 9p-private rwsem (per-filesystem)
	* hold it exclusive over the relevant part of ->rename()
	* hold it shared over the places where we want the path.

That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control.  However, there's
also __d_unalias(), which isn't covered by any of that.

If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.

An approach to fixing that would be a couple of optional methods, so that
__d_unalias() would do
	if (alias->d_op->d_unalias_trylock != NULL)
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if (alias->d_op->d_unalias_unlock != NULL)
		alias->d_op->d_unalias_unlock(alias)
where it currently does __d_move().  9p instances would be down_write_trylock()
and up_write() of ->rename_sem.

However, to reduce dentry_operations bloat, let's add a single method:
->d_want_unalias(alias, true) in place of ->d_unalias_trylock(alias) and
->d_want_unalias(alias, false) in place of ->d_unalias_unlock(alias).

Another possible variant would be to hold ->rename_sem exclusive around
d_splice_alias() calls in 9p ->lookup(), but that would cause a lot of
contention on that rwsem and it's filesystem-wide, so let's not go there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/locking.rst |  2 ++
 Documentation/filesystems/vfs.rst     | 19 +++++++++++++++++++
 fs/9p/v9fs.h                          |  2 +-
 fs/9p/vfs_dentry.c                    | 13 +++++++++++++
 fs/dcache.c                           |  6 ++++++
 include/linux/dcache.h                |  1 +
 6 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..6e20282447a0 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,7 @@ prototypes::
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_want_unalias)(const struct dentry *, bool);
 
 locking rules:
 
@@ -50,6 +51,7 @@ d_dname:	   no		no		no		no
 d_automount:	   no		no		yes		no
 d_manage:	   no		no		yes (ref-walk)	maybe
 d_real		   no		no		yes 		no
+d_want_unalias	   yes		no		no 		no
 ================== ===========	========	==============	========
 
 inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..07d4b4deb252 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,7 @@ defined:
 		struct vfsmount *(*d_automount)(struct path *);
 		int (*d_manage)(const struct path *, bool);
 		struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+		bool (*d_want_unalias)(const struct dentry *, bool);
 	};
 
 ``d_revalidate``
@@ -1428,6 +1429,24 @@ defined:
 
 	For non-regular files, the 'dentry' argument is returned.
 
+``d_want_unalias``
+	if present, will be called by d_splice_alias() before and after
+	moving a preexisting attached alias.  The second argument is
+	true for call before __d_move() and false for the call after.
+	Returning false on the first call prevents __d_move(), making
+	d_splice_alias() fail with -ESTALE; return value on the second
+	call is ignored.
+
+	Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+	and d_exchange() calls from the outside of filesystem methods;
+	however, it does not guarantee that attached dentries won't
+	be renamed or moved by d_splice_alias() finding a preexisting
+	alias for a directory inode.  Normally we would not care;
+	however, something that wants to stabilize the entire path to
+	root over a blocking operation might need that.  See 9p for one
+	(and hopefully only) example.
+
+
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries.  Child dentries are basically like files in a
 directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
 	return inode->i_sb->s_fs_info;
 }
 
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
 {
 	return dentry->d_sb->s_fs_info;
 }
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..b2222df318d0 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,27 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	return __v9fs_lookup_revalidate(dentry, flags);
 }
 
+static bool v9fs_dentry_want_unalias(const struct dentry *dentry, bool lock)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+
+	if (lock)
+		return down_write_trylock(&v9ses->rename_sem);
+
+	up_write(&v9ses->rename_sem);
+	return true;
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
 	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
+	.d_want_unalias = v9fs_dentry_want_unalias,
 };
 
 const struct dentry_operations v9fs_dentry_operations = {
 	.d_delete = always_delete_dentry,
 	.d_release = v9fs_dentry_release,
+	.d_want_unalias = v9fs_dentry_want_unalias,
 };
diff --git a/fs/dcache.c b/fs/dcache.c
index 7d42ca367522..efbfbc1bc5d4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2947,6 +2947,7 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 {
 	struct mutex *m1 = NULL;
 	struct rw_semaphore *m2 = NULL;
+	bool (*extra_trylock)(const struct dentry *, bool);
 	int ret = -ESTALE;
 
 	/* If alias and dentry share a parent, then no extra locks required */
@@ -2961,7 +2962,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_rwsem;
 out_unalias:
+	extra_trylock = alias->d_op->d_want_unalias;
+	if (extra_trylock && !extra_trylock(alias, true))
+		goto out_err;
 	__d_move(alias, dentry, false);
+	if (extra_trylock)
+		extra_trylock(alias, false);
 	ret = 0;
 out_err:
 	if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..2b33b9d04a8f 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,7 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_want_unalias)(const struct dentry *, bool);
 } ____cacheline_aligned;
 
 /*
-- 
2.39.5

