From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, agruenba@redhat.com,
amir73il@gmail.com, brauner@kernel.org,
ceph-devel@vger.kernel.org, dhowells@redhat.com,
hubcap@omnibond.com, jack@suse.cz, krisman@kernel.org,
linux-nfs@vger.kernel.org, miklos@szeredi.hu
Subject: Re: [PATCH 20/20] 9p: fix ->rename_sem exclusion
Date: Fri, 10 Jan 2025 05:53:21 +0000 [thread overview]
Message-ID: <20250110055321.GU1977892@ZenIV> (raw)
In-Reply-To: <CAHk-=wh2i3j3GUYpiTBteS7Ln02endudZCqc9DRz==3p8T__yQ@mail.gmail.com>
On Thu, Jan 09, 2025 at 07:11:46PM -0800, Linus Torvalds wrote:
> On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > However, to reduce dentry_operations bloat, let's add one method instead -
> > ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and
> > ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).
>
> Ugh.
>
> So of all the patches, this is the one that I hate.
>
> I absolutely detest interfaces with random true/false arguments, and
> when it is about locking, the "detest" becomes something even darker
> and more visceral.
>
> I think it would be a lot better as separate ops, considering that
>
> (a) we'll probably have only one or two actual users, so it's not
> like it complicates things on that side
>
> (b) we don't have *that* many "struct dentry_operations" structures:
> I think they are all statically generated constant structures
> (typically one or two per filesystem), so it's not like we're saving
> memory by merging those pointers into one.
ACK.
> Please?
Done and force-pushed; see below for updated variant of that commit
commit 1f28d77e868e63a07ab50e7fe161fc366b2fb23b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Sun Jan 5 21:33:17 2025 -0500
9p: fix ->rename_sem exclusion
9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.
->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain. Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.
Solution:
* have d_move() done in ->rename() rather than in its caller
* maintain a 9p-private rwsem (per-filesystem)
* hold it exclusive over the relevant part of ->rename()
* hold it shared over the places where we want the path.
That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control. However, there's
also __d_unalias(), which isn't covered by any of that.
If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.
Add a couple of optional methods, so that __d_unalias() would do
if alias->d_op->d_unalias_trylock != NULL
if (!alias->d_op->d_unalias_trylock(alias))
fail (resulting in -ESTALE from lookup)
__d_move(...)
if alias->d_op->d_unalias_unlock != NULL
alias->d_unalias_unlock(alias)
where it currently does __d_move(). 9p instances do down_write_trylock()
and up_write() of ->rename_mutex.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..d20a32b77b60 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,8 @@ prototypes::
struct vfsmount *(*d_automount)(struct path *path);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+ bool (*d_unalias_trylock)(const struct dentry *);
+ void (*d_unalias_unlock)(const struct dentry *);
locking rules:
@@ -50,6 +52,8 @@ d_dname: no no no no
d_automount: no no yes no
d_manage: no no yes (ref-walk) maybe
d_real no no yes no
+d_unalias_trylock yes no no no
+d_unalias_unlock yes no no no
================== =========== ======== ============== ========
inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..31eea688609a 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,8 @@ defined:
struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+ bool (*d_unalias_trylock)(const struct dentry *);
+ void (*d_unalias_unlock)(const struct dentry *);
};
``d_revalidate``
@@ -1428,6 +1430,25 @@ defined:
For non-regular files, the 'dentry' argument is returned.
+``d_unalias_trylock``
+ if present, will be called by d_splice_alias() before moving a
+ preexisting attached alias. Returning false prevents __d_move(),
+ making d_splice_alias() fail with -ESTALE.
+
+ Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+ and d_exchange() calls from the outside of filesystem methods;
+ however, it does not guarantee that attached dentries won't
+ be renamed or moved by d_splice_alias() finding a preexisting
+ alias for a directory inode. Normally we would not care;
+ however, something that wants to stabilize the entire path to
+ root over a blocking operation might need that. See 9p for one
+ (and hopefully only) example.
+
+``d_unalias_unlock``
+ should be paired with ``d_unalias_trylock``; that one is called after
+ __d_move() call in __d_unalias().
+
+
Each dentry has a pointer to its parent dentry, as well as a hash list
of child dentries. Child dentries are basically like files in a
directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
return inode->i_sb->s_fs_info;
}
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
{
return dentry->d_sb->s_fs_info;
}
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..5061f192eafd 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,30 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
return __v9fs_lookup_revalidate(dentry, flags);
}
+static bool v9fs_dentry_unalias_trylock(const struct dentry *dentry)
+{
+ struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+ return down_write_trylock(&v9ses->rename_sem);
+}
+
+static void v9fs_dentry_unalias_unlock(const struct dentry *dentry)
+{
+ struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+ up_write(&v9ses->rename_sem);
+}
+
const struct dentry_operations v9fs_cached_dentry_operations = {
.d_revalidate = v9fs_lookup_revalidate,
.d_weak_revalidate = __v9fs_lookup_revalidate,
.d_delete = v9fs_cached_dentry_delete,
.d_release = v9fs_dentry_release,
+ .d_unalias_trylock = v9fs_dentry_unalias_trylock,
+ .d_unalias_unlock = v9fs_dentry_unalias_unlock,
};
const struct dentry_operations v9fs_dentry_operations = {
.d_delete = always_delete_dentry,
.d_release = v9fs_dentry_release,
+ .d_unalias_trylock = v9fs_dentry_unalias_trylock,
+ .d_unalias_unlock = v9fs_dentry_unalias_unlock,
};
diff --git a/fs/dcache.c b/fs/dcache.c
index 7d42ca367522..2ac614fc8bba 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2961,7 +2961,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
goto out_err;
m2 = &alias->d_parent->d_inode->i_rwsem;
out_unalias:
+ if (alias->d_op->d_unalias_trylock &&
+ !alias->d_op->d_unalias_trylock(alias))
+ goto out_err;
__d_move(alias, dentry, false);
+ if (alias->d_op->d_unalias_unlock)
+ alias->d_op->d_unalias_unlock(alias);
ret = 0;
out_err:
if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..9a1a30857763 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,8 @@ struct dentry_operations {
struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+ bool (*d_unalias_trylock)(const struct dentry *);
+ void (*d_unalias_unlock)(const struct dentry *);
} ____cacheline_aligned;
/*
next prev parent reply other threads:[~2025-01-10 5:53 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-10 2:38 [PATCHES][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-10 2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-10 2:42 ` [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-10 9:35 ` Jan Kara
2025-01-10 16:24 ` Al Viro
2025-01-10 2:42 ` [PATCH 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-10 9:45 ` Jan Kara
2025-01-10 2:42 ` [PATCH 04/20] dissolve external_name.u into separate members Al Viro
2025-01-10 7:34 ` David Howells
2025-01-10 16:46 ` Al Viro
2025-01-10 2:42 ` [PATCH 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-10 9:15 ` Jan Kara
2025-01-10 2:42 ` [PATCH 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-10 2:42 ` [PATCH 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-10 2:42 ` [PATCH 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-10 2:42 ` [PATCH 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-10 19:45 ` Viacheslav Dubeyko
2025-01-10 2:42 ` [PATCH 10/20] ceph_d_revalidate(): propagate stable name down into request enconding Al Viro
2025-01-10 2:42 ` [PATCH 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-10 2:42 ` [PATCH 12/20] exfat_d_revalidate(): " Al Viro
2025-01-10 2:42 ` [PATCH 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-10 2:42 ` [PATCH 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-10 2:42 ` [PATCH 15/20] gfs2_drevalidate(): " Al Viro
2025-01-10 19:20 ` Andreas Grünbacher
2025-01-10 2:42 ` [PATCH 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-10 2:43 ` [PATCH 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-10 2:43 ` [PATCH 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-10 9:54 ` Jan Kara
2025-01-10 2:43 ` [PATCH 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-10 3:06 ` Linus Torvalds
2025-01-10 2:43 ` [PATCH 20/20] 9p: fix ->rename_sem exclusion Al Viro
2025-01-10 3:11 ` Linus Torvalds
2025-01-10 5:53 ` Al Viro [this message]
2025-01-10 9:21 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Jan Kara
2025-01-16 5:21 ` [PATCHES v2][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-16 5:22 ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-16 5:22 ` [PATCH v2 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-16 5:23 ` [PATCH v2 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-16 5:23 ` [PATCH v2 04/20] dissolve external_name.u into separate members Al Viro
2025-01-16 10:06 ` Jan Kara
2025-01-16 5:23 ` [PATCH v2 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-16 5:23 ` [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-16 15:38 ` Gabriel Krisman Bertazi
2025-01-16 15:46 ` Al Viro
2025-01-16 15:53 ` Gabriel Krisman Bertazi
2025-01-16 5:23 ` [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-16 15:15 ` Gabriel Krisman Bertazi
2025-01-17 18:55 ` Jeff Layton
2025-01-17 19:00 ` Al Viro
2025-01-16 5:23 ` [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-22 20:27 ` David Howells
2025-01-22 21:01 ` Al Viro
2025-01-22 21:24 ` Al Viro
2025-01-22 21:55 ` David Howells
2025-01-16 5:23 ` [PATCH v2 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-17 18:35 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 10/20] ceph_d_revalidate(): propagate stable name down into request enconding Al Viro
2025-01-17 18:35 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-17 15:20 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 12/20] exfat_d_revalidate(): " Al Viro
2025-01-16 5:23 ` [PATCH v2 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-17 15:22 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-17 15:18 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 15/20] gfs2_drevalidate(): " Al Viro
2025-01-16 5:23 ` [PATCH v2 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-17 14:05 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-17 15:12 ` Jeff Layton
2025-01-16 5:23 ` [PATCH v2 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-16 5:23 ` [PATCH v2 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-16 5:23 ` [PATCH v2 20/20] 9p: fix ->rename_sem exclusion Al Viro
2025-01-23 1:45 ` [PATCHES v3][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-23 1:46 ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-23 1:46 ` [PATCH v3 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-23 1:46 ` [PATCH v3 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-23 1:46 ` [PATCH v3 04/20] dissolve external_name.u into separate members Al Viro
2025-01-23 1:46 ` [PATCH v3 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-23 1:46 ` [PATCH v3 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-23 1:46 ` [PATCH v3 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-23 1:46 ` [PATCH v3 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-23 1:46 ` [PATCH v3 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-23 1:46 ` [PATCH v3 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
2025-01-23 1:46 ` [PATCH v3 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-23 1:46 ` [PATCH v3 12/20] exfat_d_revalidate(): " Al Viro
2025-01-23 1:46 ` [PATCH v3 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-23 1:46 ` [PATCH v3 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-23 10:51 ` Miklos Szeredi
2025-01-23 1:46 ` [PATCH v3 15/20] gfs2_drevalidate(): " Al Viro
2025-01-23 1:46 ` [PATCH v3 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-23 1:46 ` [PATCH v3 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-23 1:46 ` [PATCH v3 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-23 1:46 ` [PATCH v3 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-25 16:25 ` Mike Marshall
2025-01-23 1:46 ` [PATCH v3 20/20] 9p: fix ->rename_sem exclusion Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250110055321.GU1977892@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=agruenba@redhat.com \
--cc=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=ceph-devel@vger.kernel.org \
--cc=dhowells@redhat.com \
--cc=hubcap@omnibond.com \
--cc=jack@suse.cz \
--cc=krisman@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.