From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, agruenba@redhat.com,
amir73il@gmail.com, brauner@kernel.org,
ceph-devel@vger.kernel.org, dhowells@redhat.com,
hubcap@omnibond.com, jack@suse.cz, krisman@kernel.org,
linux-nfs@vger.kernel.org, miklos@szeredi.hu
Subject: Re: [PATCH 20/20] 9p: fix ->rename_sem exclusion
Date: Fri, 10 Jan 2025 05:53:21 +0000 [thread overview]
Message-ID: <20250110055321.GU1977892@ZenIV> (raw)
In-Reply-To: <CAHk-=wh2i3j3GUYpiTBteS7Ln02endudZCqc9DRz==3p8T__yQ@mail.gmail.com>
On Thu, Jan 09, 2025 at 07:11:46PM -0800, Linus Torvalds wrote:
> On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > However, to reduce dentry_operations bloat, let's add one method instead -
> > ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and
> > ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).
>
> Ugh.
>
> So of all the patches, this is the one that I hate.
>
> I absolutely detest interfaces with random true/false arguments, and
> when it is about locking, the "detest" becomes something even darker
> and more visceral.
>
> I think it would be a lot better as separate ops, considering that
>
> (a) we'll probably have only one or two actual users, so it's not
> like it complicates things on that side
>
> (b) we don't have *that* many "struct dentry_operations" structures:
> I think they are all statically generated constant structures
> (typically one or two per filesystem), so it's not like we're saving
> memory by merging those pointers into one.
ACK.
> Please?
Done and force-pushed; see below for updated variant of that commit
commit 1f28d77e868e63a07ab50e7fe161fc366b2fb23b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Sun Jan 5 21:33:17 2025 -0500
9p: fix ->rename_sem exclusion
9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.
->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain. Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.
Solution:
* have d_move() done in ->rename() rather than in its caller
* maintain a 9p-private rwsem (per-filesystem)
* hold it exclusive over the relevant part of ->rename()
* hold it shared over the places where we want the path.
That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control. However, there's
also __d_unalias(), which isn't covered by any of that.
If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.
Add a couple of optional methods, so that __d_unalias() would do
if alias->d_op->d_unalias_trylock != NULL
if (!alias->d_op->d_unalias_trylock(alias))
fail (resulting in -ESTALE from lookup)
__d_move(...)
if alias->d_op->d_unalias_unlock != NULL
alias->d_unalias_unlock(alias)
where it currently does __d_move(). 9p instances do down_write_trylock()
and up_write() of ->rename_mutex.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..d20a32b77b60 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,8 @@ prototypes::
struct vfsmount *(*d_automount)(struct path *path);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+ bool (*d_unalias_trylock)(const struct dentry *);
+ void (*d_unalias_unlock)(const struct dentry *);
locking rules:
@@ -50,6 +52,8 @@ d_dname: no no no no
d_automount: no no yes no
d_manage: no no yes (ref-walk) maybe
d_real no no yes no
+d_unalias_trylock yes no no no
+d_unalias_unlock yes no no no
================== =========== ======== ============== ========
inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..31eea688609a 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,8 @@ defined:
struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+ bool (*d_unalias_trylock)(const struct dentry *);
+ void (*d_unalias_unlock)(const struct dentry *);
};
``d_revalidate``
@@ -1428,6 +1430,25 @@ defined:
For non-regular files, the 'dentry' argument is returned.
+``d_unalias_trylock``
+ if present, will be called by d_splice_alias() before moving a
+ preexisting attached alias. Returning false prevents __d_move(),
+ making d_splice_alias() fail with -ESTALE.
+
+ Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+ and d_exchange() calls from the outside of filesystem methods;
+ however, it does not guarantee that attached dentries won't
+ be renamed or moved by d_splice_alias() finding a preexisting
+ alias for a directory inode. Normally we would not care;
+ however, something that wants to stabilize the entire path to
+ root over a blocking operation might need that. See 9p for one
+ (and hopefully only) example.
+
+``d_unalias_unlock``
+ should be paired with ``d_unalias_trylock``; that one is called after
+ __d_move() call in __d_unalias().
+
+
Each dentry has a pointer to its parent dentry, as well as a hash list
of child dentries. Child dentries are basically like files in a
directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
return inode->i_sb->s_fs_info;
}
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
{
return dentry->d_sb->s_fs_info;
}
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..5061f192eafd 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,30 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
return __v9fs_lookup_revalidate(dentry, flags);
}
+static bool v9fs_dentry_unalias_trylock(const struct dentry *dentry)
+{
+ struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+ return down_write_trylock(&v9ses->rename_sem);
+}
+
+static void v9fs_dentry_unalias_unlock(const struct dentry *dentry)
+{
+ struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+ up_write(&v9ses->rename_sem);
+}
+
const struct dentry_operations v9fs_cached_dentry_operations = {
.d_revalidate = v9fs_lookup_revalidate,
.d_weak_revalidate = __v9fs_lookup_revalidate,
.d_delete = v9fs_cached_dentry_delete,
.d_release = v9fs_dentry_release,
+ .d_unalias_trylock = v9fs_dentry_unalias_trylock,
+ .d_unalias_unlock = v9fs_dentry_unalias_unlock,
};
const struct dentry_operations v9fs_dentry_operations = {
.d_delete = always_delete_dentry,
.d_release = v9fs_dentry_release,
+ .d_unalias_trylock = v9fs_dentry_unalias_trylock,
+ .d_unalias_unlock = v9fs_dentry_unalias_unlock,
};
diff --git a/fs/dcache.c b/fs/dcache.c
index 7d42ca367522..2ac614fc8bba 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2961,7 +2961,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
goto out_err;
m2 = &alias->d_parent->d_inode->i_rwsem;
out_unalias:
+ if (alias->d_op->d_unalias_trylock &&
+ !alias->d_op->d_unalias_trylock(alias))
+ goto out_err;
__d_move(alias, dentry, false);
+ if (alias->d_op->d_unalias_unlock)
+ alias->d_op->d_unalias_unlock(alias);
ret = 0;
out_err:
if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..9a1a30857763 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,8 @@ struct dentry_operations {
struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+ bool (*d_unalias_trylock)(const struct dentry *);
+ void (*d_unalias_unlock)(const struct dentry *);
} ____cacheline_aligned;
/*
next prev parent reply other threads:[~2025-01-10 5:53 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-10 2:38 [PATCHES][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-10 2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-10 2:42 ` [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-10 9:35 ` Jan Kara
2025-01-10 16:24 ` Al Viro
2025-01-10 2:42 ` [PATCH 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-10 9:45 ` Jan Kara
2025-01-10 2:42 ` [PATCH 04/20] dissolve external_name.u into separate members Al Viro
2025-01-10 7:34 ` David Howells
2025-01-10 16:46 ` Al Viro
2025-01-10 2:42 ` [PATCH 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-10 9:15 ` Jan Kara
2025-01-10 2:42 ` [PATCH 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-10 2:42 ` [PATCH 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-10 2:42 ` [PATCH 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-10 2:42 ` [PATCH 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-10 19:45 ` Viacheslav Dubeyko
2025-01-10 2:42 ` [PATCH 10/20] ceph_d_revalidate(): propagate stable name down into request enconding Al Viro
2025-01-10 2:42 ` [PATCH 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-10 2:42 ` [PATCH 12/20] exfat_d_revalidate(): " Al Viro
2025-01-10 2:42 ` [PATCH 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-10 2:42 ` [PATCH 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-10 2:42 ` [PATCH 15/20] gfs2_drevalidate(): " Al Viro
2025-01-10 19:20 ` Andreas Grünbacher
2025-01-10 2:42 ` [PATCH 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-10 2:43 ` [PATCH 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-10 2:43 ` [PATCH 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-10 9:54 ` Jan Kara
2025-01-10 2:43 ` [PATCH 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-10 3:06 ` Linus Torvalds
2025-01-10 2:43 ` [PATCH 20/20] 9p: fix ->rename_sem exclusion Al Viro
2025-01-10 3:11 ` Linus Torvalds
2025-01-10 5:53 ` Al Viro [this message]
2025-01-10 9:21 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Jan Kara
2025-01-16 5:21 ` [PATCHES v2][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-16 5:22 ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-16 5:22 ` [PATCH v2 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-16 5:23 ` [PATCH v2 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-16 5:23 ` [PATCH v2 04/20] dissolve external_name.u into separate members Al Viro
2025-01-16 10:06 ` Jan Kara
2025-01-16 5:23 ` [PATCH v2 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-16 5:23 ` [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-16 15:38 ` Gabriel Krisman Bertazi
2025-01-16 15:46 ` Al Viro
2025-01-16 15:53 ` Gabriel Krisman Bertazi
2025-01-16 5:23 ` [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-16 15:15 ` Gabriel Krisman Bertazi
2025-01-17 18:55 ` Jeff Layton
2025-01-17 19:00 ` Al Viro
2025-01-16 5:23 ` [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-22 20:27 ` David Howells
2025-01-22 21:01 ` Al Viro
2025-01-22 21:24 ` Al Viro
2025-01-22 21:55 ` David Howells
2025-01-16 5:23 ` [PATCH v2 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-17 18:35 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 10/20] ceph_d_revalidate(): propagate stable name down into request enconding Al Viro
2025-01-17 18:35 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-17 15:20 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 12/20] exfat_d_revalidate(): " Al Viro
2025-01-16 5:23 ` [PATCH v2 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-17 15:22 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-17 15:18 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 15/20] gfs2_drevalidate(): " Al Viro
2025-01-16 5:23 ` [PATCH v2 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-17 14:05 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-17 15:12 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-16 5:23 ` [PATCH v2 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-16 5:23 ` [PATCH v2 20/20] 9p: fix ->rename_sem exclusion Al Viro
2025-01-23 1:45 ` [PATCHES v3][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-23 1:46 ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-23 1:46 ` [PATCH v3 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-23 1:46 ` [PATCH v3 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-23 1:46 ` [PATCH v3 04/20] dissolve external_name.u into separate members Al Viro
2025-01-23 1:46 ` [PATCH v3 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-23 1:46 ` [PATCH v3 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-23 1:46 ` [PATCH v3 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-23 1:46 ` [PATCH v3 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-23 1:46 ` [PATCH v3 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-23 1:46 ` [PATCH v3 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
2025-01-23 1:46 ` [PATCH v3 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-23 1:46 ` [PATCH v3 12/20] exfat_d_revalidate(): " Al Viro
2025-01-23 1:46 ` [PATCH v3 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-23 1:46 ` [PATCH v3 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-23 10:51 ` Miklos Szeredi
2025-01-23 1:46 ` [PATCH v3 15/20] gfs2_drevalidate(): " Al Viro
2025-01-23 1:46 ` [PATCH v3 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-23 1:46 ` [PATCH v3 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-23 1:46 ` [PATCH v3 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-23 1:46 ` [PATCH v3 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-25 16:25 ` Mike Marshall
2025-01-23 1:46 ` [PATCH v3 20/20] 9p: fix ->rename_sem exclusion Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250110055321.GU1977892@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=agruenba@redhat.com \
--cc=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=ceph-devel@vger.kernel.org \
--cc=dhowells@redhat.com \
--cc=hubcap@omnibond.com \
--cc=jack@suse.cz \
--cc=krisman@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox