linux-nfs.vger.kernel.org archive mirror
* [PATCHES][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems)
@ 2025-01-10  2:38 Al Viro
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  2025-01-16  5:21 ` [PATCHES v2][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
  0 siblings, 2 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:38 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Gabriel Krisman Bertazi, Christian Brauner,
	Jan Kara, David Howells, ceph-devel, linux-nfs, Amir Goldstein,
	Miklos Szeredi, Andreas Gruenbacher, Mike Marshall

[this had been more than a year in the making; my apologies for getting
sidetracked several times]

	Locking rules for dentry->d_name and dentry->d_parent are seriously
unpleasant and historically we had quite a few races in that area.  Filesystem
methods mostly don't have to care about that since the locking in VFS callers
is enough to guarantee that dentries passed to the methods won't be renamed
or moved behind the method's back.

	Directory-modifying methods (->mknod(), ->mkdir(), ->unlink(), ->rename(),
->link(), etc.) are guaranteed that their dentry arguments will have names and
parents unchanging through the method execution.  For ->lookup() the guarantees
are slightly weaker (they disappear once an inode has been attached to dentry),
but they still cover most of the execution.  As a result, all these methods
can safely access ->d_parent and ->d_name.

	->d_revalidate() is an exception - the caller might be holding no locks
whatsoever and both the name and parent may be changing right under us.
Locally you can hold dentry->d_lock - that'll stabilize both ->d_parent and
->d_name, but you obviously can't hold that over any IO and as soon as you drop
->d_lock, you are on your own.

	There is a rather convoluted dance needed to get a safe reference
to parent -
	if not in RCU mode
		parent = dget_parent(dentry)
		dir = d_inode(parent)
	else
		parent = READ_ONCE(dentry->d_parent)
		dir = d_inode_rcu(parent)
		if (!dir)
			return -ECHILD
	<do actual work>
	if not in RCU mode
		dput(parent)
and it's duplicated in a bunch of instances (not all of them - quite a few
->d_revalidate() instances do not care about the parent *or* name in the
first place).  For names... you can safely access that under ->d_lock
(including copying it someplace safe) or you can use take_dentry_name_snapshot(),
but blind dereferencing of ->d_name.name is really asking for trouble -
for a long name you might end up accessing freed memory.

	An obvious improvement would be to pass safe references to parent
and name as explicit arguments of ->d_revalidate().  Examining the in-tree
instances shows that we have 4 groups:
	1) really don't care about parent at all: hfs, jfs, all procfs ones,
tracefs.
	2) want only the inode of parent directory: afs, ceph, exfat and
vfat ones, fscrypt, fuse, gfs2, nfs, ocfs2, orangefs
	3) don't use the parent directly, so this change of calling conventions
does not help them: smb, 9p, vboxsf, coda, kernfs.
	4) really special: ecryptfs, overlayfs.
In other words, passing the parent's inode is more useful than passing its
dentry, ending up with
	int (*d_revalidate)(struct inode *dir, const struct qstr *name,
                            struct dentry *dentry, unsigned int flags);
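
	To make the shape of that prototype concrete, here is a minimal
userspace sketch.  The struct definitions are simplified stand-ins, not the
kernel ones; demo_revalidate(), revalidate_found() and the cached_dir_ino
field are hypothetical names made up for illustration.  The point it tries
to show is that the instance works only from the arguments supplied by the
caller and never looks at dentry->d_parent or dentry->d_name.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct inode { unsigned long i_ino; };
struct qstr { const char *name; size_t len; };
struct dentry;

struct dentry_operations {
	/* Prototype as proposed in the cover letter. */
	int (*d_revalidate)(struct inode *dir, const struct qstr *name,
			    struct dentry *dentry, unsigned int flags);
};

struct dentry {
	struct dentry *d_parent;	/* unstable under a real ->d_revalidate() */
	struct qstr d_name;		/* ditto */
	const struct dentry_operations *d_op;
	unsigned long cached_dir_ino;	/* hypothetical fs-private state */
};

/*
 * Hypothetical instance: uses only the caller-supplied parent inode and
 * name.  Returns 1 if the cached entry still looks valid, 0 otherwise.
 */
static int demo_revalidate(struct inode *dir, const struct qstr *name,
			   struct dentry *dentry, unsigned int flags)
{
	(void)flags;
	if (name->len == 0)
		return 0;
	return dentry->cached_dir_ino == dir->i_ino;
}

static const struct dentry_operations demo_dops = {
	.d_revalidate = demo_revalidate,
};

/*
 * Model of a caller that has just found the dentry by (dir, name) and
 * therefore already holds stable values for both extra arguments.
 * A dentry without the method is treated as valid.
 */
static int revalidate_found(struct inode *dir, const struct qstr *name,
			    struct dentry *dentry, unsigned int flags)
{
	if (!dentry->d_op || !dentry->d_op->d_revalidate)
		return 1;
	return dentry->d_op->d_revalidate(dir, name, dentry, flags);
}
```

	Calling revalidate_found() with a dentry whose d_op points at demo_dops
exercises the method; none of the parent-obtaining boilerplate quoted above
is needed in the instance itself.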

	That, however, presumes that we can get these stable references in
the callers without a serious overhead.  Thankfully, there are only 3 callers
of ->d_revalidate() in the entire tree.  The regular one is in
fs/namei.c:d_revalidate() and that's what the pathname resolution is using.
Additionally, ecryptfs and overlayfs instances of ->d_revalidate() may
want to call that method for dentries in underlying filesystems.

	fs/namei.c:d_revalidate() callers already have stable references
to parent and name - we are calling that right after we'd found our dentry
in dcache and we bloody well know which parent/name combination we'd been
looking for.  So in this case no convolutions are needed - we already have
the values of extra arguments for ->d_revalidate().

	In case of ecryptfs and overlayfs deciding to call ->d_revalidate()
for underlying dentries we can just use take_dentry_name_snapshot() on that
underlying dentry to get the stable name and either do the aforementioned
convoluted dance to get a stable reference to parent (in case of overlayfs)
or use the directory underlying the parent of our dentry (in case of ecryptfs).

	That allows us to get rid of boilerplate in the instances and to
close some actual races wrt ->d_name uses.  The series below attempts to
do just that.  It lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.d_revalidate
itself on top of #work.dcache.

	Individual patches in followups; please, review.

Shortlog (including #work.dcache):
Al Viro (20):
      make sure that DNAME_INLINE_LEN is a multiple of word size
      dcache: back inline names with a struct-wrapped array of unsigned long
      make take_dentry_name_snapshot() lockless
      dissolve external_name.u into separate members
      ext4 fast_commit: make use of name_snapshot primitives
      generic_ci_d_compare(): use shortname_storage
      Pass parent directory inode and expected name to ->d_revalidate()
      afs_d_revalidate(): use stable name and parent inode passed by caller
      ceph_d_revalidate(): use stable parent inode passed by caller
      ceph_d_revalidate(): propagate stable name down into request encoding
      fscrypt_d_revalidate(): use stable parent inode passed by caller
      exfat_d_revalidate(): use stable parent inode passed by caller
      vfat_revalidate{,_ci}(): use stable parent inode passed by caller
      fuse_dentry_revalidate(): use stable parent inode and name passed by caller
      gfs2_drevalidate(): use stable parent inode and name passed by caller
      nfs{,4}_lookup_validate(): use stable parent inode passed by caller
      nfs: fix ->d_revalidate() UAF on ->d_name accesses
      ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
      orangefs_d_revalidate(): use stable parent inode and name passed by caller
      9p: fix ->rename_sem exclusion

Diffstat (again, including #work.dcache):
 Documentation/filesystems/locking.rst        |  5 +-
 Documentation/filesystems/porting.rst        | 13 ++++
 Documentation/filesystems/vfs.rst            | 22 ++++++-
 fs/9p/v9fs.h                                 |  2 +-
 fs/9p/vfs_dentry.c                           | 23 ++++++-
 fs/afs/dir.c                                 | 40 ++++--------
 fs/ceph/dir.c                                | 25 ++------
 fs/ceph/mds_client.c                         |  9 ++-
 fs/ceph/mds_client.h                         |  2 +
 fs/coda/dir.c                                |  3 +-
 fs/crypto/fname.c                            | 22 ++-----
 fs/dcache.c                                  | 96 ++++++++++++++++------------
 fs/ecryptfs/dentry.c                         | 18 ++++--
 fs/exfat/namei.c                             | 11 +---
 fs/ext4/fast_commit.c                        | 29 ++-------
 fs/ext4/fast_commit.h                        |  3 +-
 fs/fat/namei_vfat.c                          | 19 +++---
 fs/fuse/dir.c                                | 14 ++--
 fs/gfs2/dentry.c                             | 31 ++++-----
 fs/hfs/sysdep.c                              |  3 +-
 fs/jfs/namei.c                               |  3 +-
 fs/kernfs/dir.c                              |  3 +-
 fs/libfs.c                                   | 15 +++--
 fs/namei.c                                   | 18 +++---
 fs/nfs/dir.c                                 | 62 ++++++++----------
 fs/nfs/namespace.c                           |  2 +-
 fs/nfs/nfs3proc.c                            |  5 +-
 fs/nfs/nfs4proc.c                            | 20 +++---
 fs/nfs/proc.c                                |  6 +-
 fs/ocfs2/dcache.c                            | 14 ++--
 fs/orangefs/dcache.c                         | 20 +++---
 fs/overlayfs/super.c                         | 22 ++++++-
 fs/proc/base.c                               |  6 +-
 fs/proc/fd.c                                 |  3 +-
 fs/proc/generic.c                            |  6 +-
 fs/proc/proc_sysctl.c                        |  3 +-
 fs/smb/client/dir.c                          |  3 +-
 fs/tracefs/inode.c                           |  3 +-
 fs/vboxsf/dir.c                              |  3 +-
 include/linux/dcache.h                       | 22 +++++--
 include/linux/fscrypt.h                      |  7 +-
 include/linux/nfs_xdr.h                      |  2 +-
 tools/testing/selftests/bpf/progs/find_vma.c |  2 +-
 43 files changed, 336 insertions(+), 304 deletions(-)

	Overview (#work.dcache is the first 6 commits in there):

Part 1: hopefully cheaper take_dentry_name_snapshot() and handling of inline
(short) names.  One surprising thing was that gcc __builtin_memcpy() does
*not* make use of the alignment information; turns out that it's better to
wrap the entire short name into an object that can be copied by assignment.

01/20)   make sure that DNAME_INLINE_LEN is a multiple of word size
	Linus' suggestion to define the size of shortname in terms of
unsigned long words and derive its size in bytes from that.  Cleaner
that way.
02/20)   dcache: back inline names with a struct-wrapped array of unsigned long
	... so that they can be copied with struct assignment (which
generates better code) and accessed word-by-word.
	The type is union shortname_store; it's a union of arrays of
unsigned char and unsigned long.
	struct name_snapshot.inline_name turned into union
shortname_store; users (all in fs/dcache.c) adjusted.
	struct dentry.d_iname has some users outside of fs/dcache.c;
to reduce the amount of noise in commit, it is replaced with union
shortname_store d_shortname and d_iname is turned into a macro that
expands to d_shortname.string (similar to d_lock handling).
03/20)   make take_dentry_name_snapshot() lockless
	Use ->d_seq instead of grabbing ->d_lock; in case of shortname
dentries that avoids any stores to shared data objects and in case of
long names we are down to (unavoidable) atomic_inc on the external_name
refcount.  Makes the thing safer as well - the areas where ->d_seq is held
odd are all nested inside the areas where ->d_lock is held, and the latter
are much more numerous.  NOTE: now that there is a lockless path where
we might try to grab a reference to an already doomed external_name
instance, it is no longer possible for external_name.u.count and
external_name.u.head to share space (kudos to Linus for spotting that).
To reduce the noise this commit just makes external_name.u a struct
(instead of union); the next commit will dissolve it.
04/20)   dissolve external_name.u into separate members
	kept separate from the previous commit to keep the noise separate
from actual changes...
05/20)   ext4 fast_commit: make use of name_snapshot primitives
	... rather than open-coding them.  As a bonus, that avoids the
pointless work with extra allocations, etc. for long names.
06/20)   generic_ci_d_compare(): use shortname_storage
	... and check the "name might be unstable" predicate the right way.

Part 2: ->d_revalidate() calling conventions change.  The first commit
adds the method prototype and has the extra arguments supplied by the callers;
making use of those extra arguments is done in followup patches, so that
they can be reviewed separately.

07/20)   Pass parent directory inode and expected name to ->d_revalidate()
	->d_revalidate() often needs to access dentry parent and name;
that has to be done carefully, since the locking environment varies from
caller to caller.  We are not guaranteed that dentry in question will
not be moved right under us - not unless the filesystem is such that
nothing on it ever gets renamed.
	It can be dealt with, but that results in boilerplate code that
isn't even needed - the callers normally have just found the dentry
via dcache lookup and want to verify that it's in the right place; they
already have the values of ->d_parent and ->d_name stable.  There are a
couple of exceptions (overlayfs and, to a lesser extent, ecryptfs), but for
the majority of calls that song and dance is not needed at all.
	It's easier to make ecryptfs and overlayfs find and pass those
values if there's a ->d_revalidate() instance to be called, rather than
doing that in the instances.
	This commit only changes the calling conventions; making use of
supplied values is left to followups.
	NOTE: some instances need more than just the parent - things like
CIFS may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate.  This series does nothing
to address that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).


Part 3: making use of the new arguments - getting rid of parent-obtaining
boilerplate in the instances and getting rid of racy uses of ->d_name in
some of those.  Split on per-filesystem basis.

08/20)   afs_d_revalidate(): use stable name and parent inode passed by caller
09/20)   ceph_d_revalidate(): use stable parent inode passed by caller
10/20)   ceph_d_revalidate(): propagate stable name down into request encoding
11/20)   fscrypt_d_revalidate(): use stable parent inode passed by caller
12/20)   exfat_d_revalidate(): use stable parent inode passed by caller
13/20)   vfat_revalidate{,_ci}(): use stable parent inode passed by caller
14/20)   fuse_dentry_revalidate(): use stable parent inode and name passed by caller
15/20)   gfs2_drevalidate(): use stable parent inode and name passed by caller
16/20)   nfs{,4}_lookup_validate(): use stable parent inode passed by caller
17/20)   nfs: fix ->d_revalidate() UAF on ->d_name accesses
18/20)   ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
19/20)   orangefs_d_revalidate(): use stable parent inode and name passed by caller


Part 4: dealing with races in access to ancestors' names/parents.  Changing the
calling conventions above helps with the name and parent of dentry being revalidated;
it doesn't do anything for its ancestors and some instances want to deal with the
entire path to filesystem root.  The way it's done varies from filesystem to
filesystem and often isn't limited to ->d_revalidate().  There is a safe helper
(dentry_path()) and e.g. smb avoids the races by using it.  9p and ceph do not,
for various reasons, and both have problems.  I've done a 9p fix (see below);
ceph one is trickier and I'd prefer to discuss it with ceph folks first.

20/20)   9p: fix ->rename_sem exclusion
	9p wants to be able to build a path from given dentry to fs root
and keep it valid over a blocking operation.
	->s_vfs_rename_mutex would be a natural candidate, but there
are places where we want that and where we have no way to tell if
->s_vfs_rename_mutex has already been taken deeper in callchain.
Moreover, it's only held for cross-directory renames; name changes within
the same directory happen without touching that lock.
Current mainline solution:
	* have d_move() done in ->rename() rather than in its caller
	* maintain a 9p-private rwsem (->rename_sem, per-filesystem)
	* hold it exclusive over the relevant part of ->rename()
	* hold it shared over the places where we want the path.
That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control.  However, there's
also __d_unalias(), which isn't covered by any of that.
	If ->lookup() hits a directory inode with preexisting dentry
elsewhere (due to e.g. rename done on server behind our back),
d_splice_alias() called by ->lookup() will move/rename that alias.
	One approach to fixing that would be to add a couple of optional
methods, so that __d_unalias() would do
	if alias->d_op->d_unalias_trylock != NULL
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if alias->d_op->d_unalias_unlock != NULL
		alias->d_op->d_unalias_unlock(alias)
where it currently does __d_move().  9p instances would be down_write_trylock()
and up_write() of ->rename_sem.
	However, to reduce dentry_operations bloat, let's add one method
instead - ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias)
and ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).
	Another possible variant would be to hold ->rename_sem exclusive
around d_splice_alias() calls in 9p ->lookup(), but that would cause a lot
of contention on that rwsem (every lookup rather than only ones that end
up with __d_unalias()) and rwsem is filesystem-wide.  Let's not go there.
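
	The single-method variant can be modelled in userspace roughly as
follows.  This is only a sketch under stated assumptions: toy_rwsem is a
single-threaded stand-in for a real rwsem, the struct layouts are made up,
and unalias_model() merely imitates the step described above; it is not the
kernel's __d_unalias().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a filesystem-wide rwsem; single-threaded model only. */
struct toy_rwsem { int readers; bool writer; };

static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *sem)
{
	sem->writer = false;
}

struct dentry;
struct dentry_operations {
	/* trylock == true: try to get exclusion; false: drop it again */
	bool (*d_want_unalias)(const struct dentry *alias, bool trylock);
};

struct dentry {
	const struct dentry_operations *d_op;
	struct toy_rwsem *rename_sem;	/* hypothetical placement of the rwsem */
	int moved;			/* counts simulated __d_move() calls */
};

/* What a 9p-like instance might look like in this model. */
static bool demo_want_unalias(const struct dentry *alias, bool trylock)
{
	if (trylock)
		return toy_down_write_trylock(alias->rename_sem);
	toy_up_write(alias->rename_sem);
	return true;
}

static const struct dentry_operations demo_unalias_ops = {
	.d_want_unalias = demo_want_unalias,
};

/* Model of the unalias step; -ESTALE if exclusion could not be taken. */
static int unalias_model(struct dentry *alias)
{
	const struct dentry_operations *ops = alias->d_op;
	bool have_method = ops && ops->d_want_unalias;

	if (have_method && !ops->d_want_unalias(alias, true))
		return -ESTALE;
	alias->moved++;			/* stands in for __d_move(...) */
	if (have_method)
		ops->d_want_unalias(alias, false);
	return 0;
}
```

	With no holder of the toy rwsem the move goes through and the
exclusion is dropped afterwards; with a reader present the trylock fails and
the model returns -ESTALE without moving anything.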


* [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size
  2025-01-10  2:38 [PATCHES][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
@ 2025-01-10  2:42 ` Al Viro
  2025-01-10  2:42   ` [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
                     ` (19 more replies)
  2025-01-16  5:21 ` [PATCHES v2][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
  1 sibling, 20 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... calling the number of words DNAME_INLINE_WORDS.

The next step will be to have a structure to hold inline name arrays
(both in dentry and in name_snapshot) and use that to alias the
existing arrays of unsigned char there.  That will allow both
full-structure copies and convenient word-by-word accesses.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c            | 4 +---
 include/linux/dcache.h | 8 +++++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b4d5e9e1e43d..ea0f0bea511b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2748,9 +2748,7 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			/*
 			 * Both are internal.
 			 */
-			unsigned int i;
-			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
-			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
 				swap(((long *) &dentry->d_iname)[i],
 				     ((long *) &target->d_iname)[i]);
 			}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index bff956f7b2b9..42dd89beaf4e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -68,15 +68,17 @@ extern const struct qstr dotdot_name;
  * large memory footprint increase).
  */
 #ifdef CONFIG_64BIT
-# define DNAME_INLINE_LEN 40 /* 192 bytes */
+# define DNAME_INLINE_WORDS 5 /* 192 bytes */
 #else
 # ifdef CONFIG_SMP
-#  define DNAME_INLINE_LEN 36 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 9 /* 128 bytes */
 # else
-#  define DNAME_INLINE_LEN 44 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 11 /* 128 bytes */
 # endif
 #endif
 
+#define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
+
 #define d_lock	d_lockref.lock
 
 struct dentry {
-- 
2.39.5



* [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  9:35     ` Jan Kara
  2025-01-10  2:42   ` [PATCH 03/20] make take_dentry_name_snapshot() lockless Al Viro
                     ` (18 subsequent siblings)
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... so that they can be copied with struct assignment (which generates
better code) and accessed word-by-word.

The type is union shortname_store; it's a union of arrays of
unsigned char and unsigned long.

struct name_snapshot.inline_name turned into union shortname_store;
users (all in fs/dcache.c) adjusted.

struct dentry.d_iname has some users outside of fs/dcache.c; to
reduce the amount of noise in commit, it is replaced with
union shortname_store d_shortname and d_iname is turned into a macro
that expands to d_shortname.string (similar to d_lock handling, hopefully
temporary - most, if not all, users shouldn't be messing with it).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c                                  | 43 +++++++++-----------
 include/linux/dcache.h                       | 10 ++++-
 tools/testing/selftests/bpf/progs/find_vma.c |  2 +-
 3 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ea0f0bea511b..52662a5d08e4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -324,7 +324,7 @@ static void __d_free_external(struct rcu_head *head)
 
 static inline int dname_external(const struct dentry *dentry)
 {
-	return dentry->d_name.name != dentry->d_iname;
+	return dentry->d_name.name != dentry->d_shortname.string;
 }
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
@@ -334,9 +334,8 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
 	if (unlikely(dname_external(dentry))) {
 		atomic_inc(&external_name(dentry)->u.count);
 	} else {
-		memcpy(name->inline_name, dentry->d_iname,
-		       dentry->d_name.len + 1);
-		name->name.name = name->inline_name;
+		name->inline_name = dentry->d_shortname;
+		name->name.name = name->inline_name.string;
 	}
 	spin_unlock(&dentry->d_lock);
 }
@@ -344,7 +343,7 @@ EXPORT_SYMBOL(take_dentry_name_snapshot);
 
 void release_dentry_name_snapshot(struct name_snapshot *name)
 {
-	if (unlikely(name->name.name != name->inline_name)) {
+	if (unlikely(name->name.name != name->inline_name.string)) {
 		struct external_name *p;
 		p = container_of(name->name.name, struct external_name, name[0]);
 		if (unlikely(atomic_dec_and_test(&p->u.count)))
@@ -1654,10 +1653,10 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	 * will still always have a NUL at the end, even if we might
 	 * be overwriting an internal NUL character
 	 */
-	dentry->d_iname[DNAME_INLINE_LEN-1] = 0;
+	dentry->d_shortname.string[DNAME_INLINE_LEN-1] = 0;
 	if (unlikely(!name)) {
 		name = &slash_name;
-		dname = dentry->d_iname;
+		dname = dentry->d_shortname.string;
 	} else if (name->len > DNAME_INLINE_LEN-1) {
 		size_t size = offsetof(struct external_name, name[1]);
 		struct external_name *p = kmalloc(size + name->len,
@@ -1670,7 +1669,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 		atomic_set(&p->u.count, 1);
 		dname = p->name;
 	} else  {
-		dname = dentry->d_iname;
+		dname = dentry->d_shortname.string;
 	}	
 
 	dentry->d_name.len = name->len;
@@ -2729,10 +2728,9 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * dentry:internal, target:external.  Steal target's
 			 * storage and make target internal.
 			 */
-			memcpy(target->d_iname, dentry->d_name.name,
-					dentry->d_name.len + 1);
 			dentry->d_name.name = target->d_name.name;
-			target->d_name.name = target->d_iname;
+			target->d_shortname = dentry->d_shortname;
+			target->d_name.name = target->d_shortname.string;
 		}
 	} else {
 		if (unlikely(dname_external(dentry))) {
@@ -2740,18 +2738,16 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * dentry:external, target:internal.  Give dentry's
 			 * storage to target and make dentry internal
 			 */
-			memcpy(dentry->d_iname, target->d_name.name,
-					target->d_name.len + 1);
 			target->d_name.name = dentry->d_name.name;
-			dentry->d_name.name = dentry->d_iname;
+			dentry->d_shortname = target->d_shortname;
+			dentry->d_name.name = dentry->d_shortname.string;
 		} else {
 			/*
 			 * Both are internal.
 			 */
-			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
-				swap(((long *) &dentry->d_iname)[i],
-				     ((long *) &target->d_iname)[i]);
-			}
+			for (int i = 0; i < DNAME_INLINE_WORDS; i++)
+				swap(dentry->d_shortname.words[i],
+				     target->d_shortname.words[i]);
 		}
 	}
 	swap(dentry->d_name.hash_len, target->d_name.hash_len);
@@ -2766,9 +2762,8 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 		atomic_inc(&external_name(target)->u.count);
 		dentry->d_name = target->d_name;
 	} else {
-		memcpy(dentry->d_iname, target->d_name.name,
-				target->d_name.len + 1);
-		dentry->d_name.name = dentry->d_iname;
+		dentry->d_shortname = target->d_shortname;
+		dentry->d_name.name = dentry->d_shortname.string;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
 	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
@@ -3101,12 +3096,12 @@ void d_mark_tmpfile(struct file *file, struct inode *inode)
 {
 	struct dentry *dentry = file->f_path.dentry;
 
-	BUG_ON(dentry->d_name.name != dentry->d_iname ||
+	BUG_ON(dname_external(dentry) ||
 		!hlist_unhashed(&dentry->d_u.d_alias) ||
 		!d_unlinked(dentry));
 	spin_lock(&dentry->d_parent->d_lock);
 	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-	dentry->d_name.len = sprintf(dentry->d_iname, "#%llu",
+	dentry->d_name.len = sprintf(dentry->d_shortname.string, "#%llu",
 				(unsigned long long)inode->i_ino);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dentry->d_parent->d_lock);
@@ -3194,7 +3189,7 @@ static void __init dcache_init(void)
 	 */
 	dentry_cache = KMEM_CACHE_USERCOPY(dentry,
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
-		d_iname);
+		d_shortname.string);
 
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 42dd89beaf4e..8bc567a35718 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -79,7 +79,13 @@ extern const struct qstr dotdot_name;
 
 #define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
 
+union shortname_store {
+	unsigned char string[DNAME_INLINE_LEN];
+	unsigned long words[DNAME_INLINE_WORDS];
+};
+
 #define d_lock	d_lockref.lock
+#define d_iname d_shortname.string
 
 struct dentry {
 	/* RCU lookup touched fields */
@@ -90,7 +96,7 @@ struct dentry {
 	struct qstr d_name;
 	struct inode *d_inode;		/* Where the name belongs to - NULL is
 					 * negative */
-	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+	union shortname_store d_shortname;
 	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
 
 	/* Ref lookup also touches following */
@@ -591,7 +597,7 @@ static inline struct inode *d_real_inode(const struct dentry *dentry)
 
 struct name_snapshot {
 	struct qstr name;
-	unsigned char inline_name[DNAME_INLINE_LEN];
+	union shortname_store inline_name;
 };
 void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);
diff --git a/tools/testing/selftests/bpf/progs/find_vma.c b/tools/testing/selftests/bpf/progs/find_vma.c
index 38034fb82530..02b82774469c 100644
--- a/tools/testing/selftests/bpf/progs/find_vma.c
+++ b/tools/testing/selftests/bpf/progs/find_vma.c
@@ -25,7 +25,7 @@ static long check_vma(struct task_struct *task, struct vm_area_struct *vma,
 {
 	if (vma->vm_file)
 		bpf_probe_read_kernel_str(d_iname, DNAME_INLINE_LEN - 1,
-					  vma->vm_file->f_path.dentry->d_iname);
+					  vma->vm_file->f_path.dentry->d_shortname.string);
 
 	/* check for VM_EXEC */
 	if (vma->vm_flags & VM_EXEC)
-- 
2.39.5
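
For readers who want to play with the idea outside the kernel, here is a
small userspace sketch of the same layout.  The demo_* names and the fixed
word count of 5 are assumptions made for the example; only the shape of the
union follows the patch above.  It shows the two access patterns the commit
message mentions: whole-object copy by assignment and word-by-word access.

```c
#include <string.h>

/* Hypothetical userspace stand-ins; the kernel picks the count per config. */
#define DEMO_INLINE_WORDS 5
#define DEMO_INLINE_LEN (DEMO_INLINE_WORDS * sizeof(unsigned long))

/* Same shape as the union in the patch, under a demo name. */
union demo_shortname {
	unsigned char string[DEMO_INLINE_LEN];
	unsigned long words[DEMO_INLINE_WORDS];
};

/* Length is derived from the word count, so both views cover the object. */
_Static_assert(sizeof(union demo_shortname) == DEMO_INLINE_LEN,
	       "both arrays should span the whole union");

/* Store a NUL-terminated name, truncating it to fit; the rest is zeroed. */
static void demo_set(union demo_shortname *s, const char *name)
{
	size_t n = strlen(name);

	if (n > DEMO_INLINE_LEN - 1)
		n = DEMO_INLINE_LEN - 1;
	memset(s, 0, sizeof(*s));
	memcpy(s->string, name, n);
}

/* Whole-object copy by plain assignment instead of a length-based memcpy(). */
static void demo_copy(union demo_shortname *dst, const union demo_shortname *src)
{
	*dst = *src;
}

/* Word-by-word swap, in the spirit of the both-internal case of swap_names(). */
static void demo_swap(union demo_shortname *a, union demo_shortname *b)
{
	for (int i = 0; i < DEMO_INLINE_WORDS; i++) {
		unsigned long tmp = a->words[i];

		a->words[i] = b->words[i];
		b->words[i] = tmp;
	}
}
```

Note that the assignment always copies the full array regardless of the name
length; that is the trade the patch makes in exchange for simpler generated
code.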



* [PATCH 03/20] make take_dentry_name_snapshot() lockless
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  2025-01-10  2:42   ` [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  9:45     ` Jan Kara
  2025-01-10  2:42   ` [PATCH 04/20] dissolve external_name.u into separate members Al Viro
                     ` (17 subsequent siblings)
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Use ->d_seq instead of grabbing ->d_lock; in case of shortname dentries
that avoids any stores to shared data objects and in case of long names
we are down to (unavoidable) atomic_inc on the external_name refcount.

Makes the thing safer as well - the areas where ->d_seq is held odd are
all nested inside the areas where ->d_lock is held, and the latter are
much more numerous.

NOTE: now that there is a lockless path where we might try to grab
a reference to an already doomed external_name instance, it is no
longer possible for external_name.u.count and external_name.u.head
to share space (kudos to Linus for spotting that).

To reduce the noise this commit just turns external_name.u into
a struct (instead of union); the next commit will dissolve it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 52662a5d08e4..f387dc97df86 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,9 +296,9 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	union {
-		atomic_t count;
-		struct rcu_head head;
+	struct {
+		atomic_t count;		// ->count and ->head can't be combined
+		struct rcu_head head;	// see take_dentry_name_snapshot()
 	} u;
 	unsigned char name[];
 };
@@ -329,15 +329,30 @@ static inline int dname_external(const struct dentry *dentry)
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
 {
-	spin_lock(&dentry->d_lock);
-	name->name = dentry->d_name;
-	if (unlikely(dname_external(dentry))) {
-		atomic_inc(&external_name(dentry)->u.count);
-	} else {
+	unsigned seq;
+	const unsigned char *s;
+
+	rcu_read_lock();
+retry:
+	seq = read_seqcount_begin(&dentry->d_seq);
+	s = READ_ONCE(dentry->d_name.name);
+	name->name.hash_len = dentry->d_name.hash_len;
+	name->name.name = name->inline_name.string;
+	if (likely(s == dentry->d_shortname.string)) {
 		name->inline_name = dentry->d_shortname;
-		name->name.name = name->inline_name.string;
+	} else {
+		struct external_name *p;
+		p = container_of(s, struct external_name, name[0]);
+		// get a valid reference
+		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+			goto retry;
+		name->name.name = s;
 	}
-	spin_unlock(&dentry->d_lock);
+	if (read_seqcount_retry(&dentry->d_seq, seq)) {
+		release_dentry_name_snapshot(name);
+		goto retry;
+	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(take_dentry_name_snapshot);
 
-- 
2.39.5



* [PATCH 04/20] dissolve external_name.u into separate members
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  2025-01-10  2:42   ` [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
  2025-01-10  2:42   ` [PATCH 03/20] make take_dentry_name_snapshot() lockless Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  7:34     ` David Howells
  2025-01-10  2:42   ` [PATCH 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
                     ` (16 subsequent siblings)
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

kept separate from the previous commit to keep the noise separate
from actual changes...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f387dc97df86..7d42ca367522 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,10 +296,8 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	struct {
-		atomic_t count;		// ->count and ->head can't be combined
-		struct rcu_head head;	// see take_dentry_name_snapshot()
-	} u;
+	atomic_t count;		// ->count and ->head can't be combined
+	struct rcu_head head;	// see take_dentry_name_snapshot()
 	unsigned char name[];
 };
 
@@ -344,7 +342,7 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
 		struct external_name *p;
 		p = container_of(s, struct external_name, name[0]);
 		// get a valid reference
-		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+		if (unlikely(!atomic_inc_not_zero(&p->count)))
 			goto retry;
 		name->name.name = s;
 	}
@@ -361,8 +359,8 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
 	if (unlikely(name->name.name != name->inline_name.string)) {
 		struct external_name *p;
 		p = container_of(name->name.name, struct external_name, name[0]);
-		if (unlikely(atomic_dec_and_test(&p->u.count)))
-			kfree_rcu(p, u.head);
+		if (unlikely(atomic_dec_and_test(&p->count)))
+			kfree_rcu(p, head);
 	}
 }
 EXPORT_SYMBOL(release_dentry_name_snapshot);
@@ -400,7 +398,7 @@ static void dentry_free(struct dentry *dentry)
 	WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
 	if (unlikely(dname_external(dentry))) {
 		struct external_name *p = external_name(dentry);
-		if (likely(atomic_dec_and_test(&p->u.count))) {
+		if (likely(atomic_dec_and_test(&p->count))) {
 			call_rcu(&dentry->d_u.d_rcu, __d_free_external);
 			return;
 		}
@@ -1681,7 +1679,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 			kmem_cache_free(dentry_cache, dentry); 
 			return NULL;
 		}
-		atomic_set(&p->u.count, 1);
+		atomic_set(&p->count, 1);
 		dname = p->name;
 	} else  {
 		dname = dentry->d_shortname.string;
@@ -2774,15 +2772,15 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 	if (unlikely(dname_external(dentry)))
 		old_name = external_name(dentry);
 	if (unlikely(dname_external(target))) {
-		atomic_inc(&external_name(target)->u.count);
+		atomic_inc(&external_name(target)->count);
 		dentry->d_name = target->d_name;
 	} else {
 		dentry->d_shortname = target->d_shortname;
 		dentry->d_name.name = dentry->d_shortname.string;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
-	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
-		kfree_rcu(old_name, u.head);
+	if (old_name && likely(atomic_dec_and_test(&old_name->count)))
+		kfree_rcu(old_name, head);
 }
 
 /*
-- 
2.39.5


* [PATCH 05/20] ext4 fast_commit: make use of name_snapshot primitives
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (2 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 04/20] dissolve external_name.u into separate members Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  9:15     ` Jan Kara
  2025-01-10  2:42   ` [PATCH 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
                     ` (15 subsequent siblings)
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... rather than open-coding them.  As a bonus, that avoids the pointless
work with extra allocations, etc. for long names.
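
As a rough userspace illustration of why the snapshot primitives avoid
the extra allocation: a short name is copied into the snapshot's own
inline buffer, while a long name is shared by bumping a reference count
on the existing external storage.  Everything below (the toy_* names,
TOY_INLINE_LEN, the plain int counter, the explicit ext pointer) is a
simplified stand-in invented for this sketch, not the kernel's actual
definitions, and it ignores concurrency entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_INLINE_LEN 32		/* stand-in for DNAME_INLINE_LEN */

struct toy_ext_name {			/* models struct external_name */
	int count;			/* atomic_t in the kernel; plain int here */
	char name[];
};

struct toy_dentry {			/* hypothetical, not struct dentry */
	const char *name;		/* inline_buf or ext->name */
	char inline_buf[TOY_INLINE_LEN];
	struct toy_ext_name *ext;	/* NULL for short names */
};

struct toy_snapshot {			/* models struct name_snapshot */
	const char *name;
	char inline_buf[TOY_INLINE_LEN];
	struct toy_ext_name *ext;
};

static int toy_dentry_init(struct toy_dentry *d, const char *s)
{
	size_t len = strlen(s);

	memset(d, 0, sizeof(*d));
	if (len < TOY_INLINE_LEN) {
		memcpy(d->inline_buf, s, len + 1);
		d->name = d->inline_buf;
		return 0;
	}
	d->ext = malloc(sizeof(*d->ext) + len + 1);
	if (!d->ext)
		return -1;
	d->ext->count = 1;		/* the dentry's own reference */
	memcpy(d->ext->name, s, len + 1);
	d->name = d->ext->name;
	return 0;
}

/* Never allocates: copy the inline buffer or share the external name. */
static void toy_take_snapshot(struct toy_snapshot *n, const struct toy_dentry *d)
{
	n->ext = d->ext;
	if (d->ext) {
		d->ext->count++;
		n->name = d->ext->name;
	} else {
		memcpy(n->inline_buf, d->inline_buf, sizeof(n->inline_buf));
		n->name = n->inline_buf;
	}
}

static void toy_release_snapshot(struct toy_snapshot *n)
{
	if (n->ext && --n->ext->count == 0)
		free(n->ext);
	n->ext = NULL;
}

/* Drop the dentry's reference; a live snapshot keeps the name alive. */
static void toy_dentry_free(struct toy_dentry *d)
{
	if (d->ext && --d->ext->count == 0)
		free(d->ext);
	d->ext = NULL;
}
```

In this model the fast_commit node would simply embed a toy_snapshot,
take it when the update is tracked and release it when the node is
freed, with no -ENOMEM path for long names.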

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ext4/fast_commit.c | 29 +++++------------------------
 fs/ext4/fast_commit.h |  3 +--
 2 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 26c4fc37edcf..da4263a14a20 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -322,9 +322,7 @@ void ext4_fc_del(struct inode *inode)
 	WARN_ON(!list_empty(&ei->i_fc_dilist));
 	spin_unlock(&sbi->s_fc_lock);
 
-	if (fc_dentry->fcd_name.name &&
-		fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
-		kfree(fc_dentry->fcd_name.name);
+	release_dentry_name_snapshot(&fc_dentry->fcd_name);
 	kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
 
 	return;
@@ -449,22 +447,7 @@ static int __track_dentry_update(handle_t *handle, struct inode *inode,
 	node->fcd_op = dentry_update->op;
 	node->fcd_parent = dir->i_ino;
 	node->fcd_ino = inode->i_ino;
-	if (dentry->d_name.len > DNAME_INLINE_LEN) {
-		node->fcd_name.name = kmalloc(dentry->d_name.len, GFP_NOFS);
-		if (!node->fcd_name.name) {
-			kmem_cache_free(ext4_fc_dentry_cachep, node);
-			ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, handle);
-			mutex_lock(&ei->i_fc_lock);
-			return -ENOMEM;
-		}
-		memcpy((u8 *)node->fcd_name.name, dentry->d_name.name,
-			dentry->d_name.len);
-	} else {
-		memcpy(node->fcd_iname, dentry->d_name.name,
-			dentry->d_name.len);
-		node->fcd_name.name = node->fcd_iname;
-	}
-	node->fcd_name.len = dentry->d_name.len;
+	take_dentry_name_snapshot(&node->fcd_name, dentry);
 	INIT_LIST_HEAD(&node->fcd_dilist);
 	spin_lock(&sbi->s_fc_lock);
 	if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
@@ -832,7 +815,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 {
 	struct ext4_fc_dentry_info fcd;
 	struct ext4_fc_tl tl;
-	int dlen = fc_dentry->fcd_name.len;
+	int dlen = fc_dentry->fcd_name.name.len;
 	u8 *dst = ext4_fc_reserve_space(sb,
 			EXT4_FC_TAG_BASE_LEN + sizeof(fcd) + dlen, crc);
 
@@ -847,7 +830,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 	dst += EXT4_FC_TAG_BASE_LEN;
 	memcpy(dst, &fcd, sizeof(fcd));
 	dst += sizeof(fcd);
-	memcpy(dst, fc_dentry->fcd_name.name, dlen);
+	memcpy(dst, fc_dentry->fcd_name.name.name, dlen);
 
 	return true;
 }
@@ -1328,9 +1311,7 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 		list_del_init(&fc_dentry->fcd_dilist);
 		spin_unlock(&sbi->s_fc_lock);
 
-		if (fc_dentry->fcd_name.name &&
-			fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
-			kfree(fc_dentry->fcd_name.name);
+		release_dentry_name_snapshot(&fc_dentry->fcd_name);
 		kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
 		spin_lock(&sbi->s_fc_lock);
 	}
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index 2fadb2c4780c..3bd534e4dbbf 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -109,8 +109,7 @@ struct ext4_fc_dentry_update {
 	int fcd_op;		/* Type of update create / unlink / link */
 	int fcd_parent;		/* Parent inode number */
 	int fcd_ino;		/* Inode number */
-	struct qstr fcd_name;	/* Dirent name */
-	unsigned char fcd_iname[DNAME_INLINE_LEN];	/* Dirent name string */
+	struct name_snapshot fcd_name;	/* Dirent name */
 	struct list_head fcd_list;
 	struct list_head fcd_dilist;
 };
-- 
2.39.5


* [PATCH 06/20] generic_ci_d_compare(): use shortname_storage
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (3 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:42   ` [PATCH 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
                     ` (14 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... and check the "name might be unstable" predicate
the right way.
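
A minimal userspace sketch of that predicate, for illustration only:
the name counts as "possibly unstable" exactly when the string pointer
refers to the dentry's inline storage, and in that case the whole
inline buffer is copied to a local before it is compared.  The names
ci_dentry, CI_INLINE_LEN and ci_compare() are made up for this sketch,
strncasecmp() merely stands in for utf8_strncasecmp(), and a
single-threaded program cannot show the race itself (nor the barrier()
the kernel code needs).

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

#define CI_INLINE_LEN 32		/* stand-in for DNAME_INLINE_LEN */

struct ci_dentry {			/* hypothetical, not struct dentry */
	char shortname[CI_INLINE_LEN];	/* inline storage, NUL-terminated */
};

/* Returns 0 on match and 1 on mismatch, like ->d_compare(). */
static int ci_compare(const struct ci_dentry *d,
		      const char *str, size_t len,
		      const char *wanted, size_t wanted_len)
{
	char copy[CI_INLINE_LEN];

	/* Pointer identity, not length, tells us the name is stored inline. */
	if (str == d->shortname) {
		memcpy(copy, d->shortname, sizeof(copy));
		str = copy;
	}
	if (len != wanted_len)
		return 1;
	return strncasecmp(str, wanted, len) != 0;
}
```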

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/libfs.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 748ac5923154..3ad1b1b7fed6 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1789,7 +1789,7 @@ int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 {
 	const struct dentry *parent;
 	const struct inode *dir;
-	char strbuf[DNAME_INLINE_LEN];
+	union shortname_store strbuf;
 	struct qstr qstr;
 
 	/*
@@ -1809,22 +1809,23 @@ int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 	if (!dir || !IS_CASEFOLDED(dir))
 		return 1;
 
+	qstr.len = len;
+	qstr.name = str;
 	/*
 	 * If the dentry name is stored in-line, then it may be concurrently
 	 * modified by a rename.  If this happens, the VFS will eventually retry
 	 * the lookup, so it doesn't matter what ->d_compare() returns.
 	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
 	 * string.  Therefore, we have to copy the name into a temporary buffer.
+	 * As above, len is guaranteed to match str, so the shortname case
+	 * is exactly when str points to ->d_shortname.
 	 */
-	if (len <= DNAME_INLINE_LEN - 1) {
-		memcpy(strbuf, str, len);
-		strbuf[len] = 0;
-		str = strbuf;
+	if (qstr.name == dentry->d_shortname.string) {
+		strbuf = dentry->d_shortname; // NUL is guaranteed to be in there
+		qstr.name = strbuf.string;
 		/* prevent compiler from optimizing out the temporary buffer */
 		barrier();
 	}
-	qstr.len = len;
-	qstr.name = str;
 
 	return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr);
 }
-- 
2.39.5


* [PATCH 07/20] Pass parent directory inode and expected name to ->d_revalidate()
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (4 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:42   ` [PATCH 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
                     ` (13 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller.  We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.

It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable.  There are a couple
of exceptions (overlayfs and, to a lesser extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.

It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.

This commit only changes the calling conventions; making use of supplied
values is left to followups.

NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate.  This series doesn't
do anything about that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).
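
For illustration, here is a userspace model of what an instance can do
once the caller hands it the parent and the name: it checks the cached
result against the backing store using only its arguments, without
touching ->d_parent or ->d_name.  The argument order follows the new
prototype (dir, name, dentry, flags); all types and helpers here
(rv_qstr, rv_inode, rv_dentry, rv_server_lookup(), the table contents)
are assumptions invented for the sketch and do not correspond to any
real filesystem.

```c
#include <assert.h>
#include <string.h>

struct rv_qstr { const char *name; unsigned int len; };	/* models struct qstr */
struct rv_inode { unsigned long ino; };			/* models struct inode */
struct rv_dentry { unsigned long cached_ino; };		/* what the cache believes */

/* Hypothetical backing store: what (dir, name) currently maps to. */
struct rv_dirent { unsigned long dir_ino; const char *name; unsigned long ino; };

static const struct rv_dirent rv_server[] = {
	{ 2, "etc",  11 },
	{ 2, "home", 12 },
};

static unsigned long rv_server_lookup(const struct rv_inode *dir,
				      const struct rv_qstr *name)
{
	for (size_t i = 0; i < sizeof(rv_server) / sizeof(rv_server[0]); i++) {
		const struct rv_dirent *e = &rv_server[i];

		if (e->dir_ino == dir->ino &&
		    strlen(e->name) == name->len &&
		    !memcmp(e->name, name->name, name->len))
			return e->ino;
	}
	return 0;	/* no such entry */
}

/* Returns 1 if the cached dentry can still be trusted, 0 otherwise. */
static int rv_d_revalidate(struct rv_inode *dir, const struct rv_qstr *name,
			   struct rv_dentry *dentry, unsigned int flags)
{
	(void)flags;	/* RCU mode handling is left out of this sketch */
	return rv_server_lookup(dir, name) == dentry->cached_ino;
}
```

A stacking filesystem in this model would look up the lower directory
and a stable copy of the lower name itself and pass them down, which is
the part that ecryptfs and overlayfs take on in the patch.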

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/locking.rst |  3 ++-
 Documentation/filesystems/porting.rst | 13 +++++++++++++
 Documentation/filesystems/vfs.rst     |  3 ++-
 fs/9p/vfs_dentry.c                    | 10 ++++++++--
 fs/afs/dir.c                          |  6 ++++--
 fs/ceph/dir.c                         |  5 +++--
 fs/coda/dir.c                         |  3 ++-
 fs/crypto/fname.c                     |  3 ++-
 fs/ecryptfs/dentry.c                  | 18 ++++++++++++++----
 fs/exfat/namei.c                      |  3 ++-
 fs/fat/namei_vfat.c                   |  6 ++++--
 fs/fuse/dir.c                         |  3 ++-
 fs/gfs2/dentry.c                      |  7 +++++--
 fs/hfs/sysdep.c                       |  3 ++-
 fs/jfs/namei.c                        |  3 ++-
 fs/kernfs/dir.c                       |  3 ++-
 fs/namei.c                            | 18 ++++++++++--------
 fs/nfs/dir.c                          |  9 ++++++---
 fs/ocfs2/dcache.c                     |  3 ++-
 fs/orangefs/dcache.c                  |  3 ++-
 fs/overlayfs/super.c                  | 22 ++++++++++++++++++++--
 fs/proc/base.c                        |  6 ++++--
 fs/proc/fd.c                          |  3 ++-
 fs/proc/generic.c                     |  6 ++++--
 fs/proc/proc_sysctl.c                 |  3 ++-
 fs/smb/client/dir.c                   |  3 ++-
 fs/tracefs/inode.c                    |  3 ++-
 fs/vboxsf/dir.c                       |  3 ++-
 include/linux/dcache.h                |  3 ++-
 include/linux/fscrypt.h               |  7 ++++---
 30 files changed, 133 insertions(+), 51 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index f5e3676db954..146e7d8aa736 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -17,7 +17,8 @@ dentry_operations
 
 prototypes::
 
-	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate)(struct inode *, const struct qstr *,
+			    struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 9ab2a3d6f2b4..b50c3ce36ef2 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1141,3 +1141,16 @@ pointer are gone.
 
 set_blocksize() takes opened struct file instead of struct block_device now
 and it *must* be opened exclusive.
+
+---
+
+**mandatory**
+
+->d_revalidate() gets two extra arguments - inode of parent directory and
+name our dentry is expected to have.  Both are stable (dir is pinned in
+non-RCU case and will stay around during the call in RCU case, and name
+is guaranteed to stay unchanging).  Your instance doesn't have to use
+either, but it often helps to avoid a lot of painful boilerplate.
+NOTE: if you need something like full path from the root of filesystem,
+you are still on your own - this assists with simple cases, but it's not
+magic.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 0b18af3f954e..7c352ebaae98 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1251,7 +1251,8 @@ defined:
 .. code-block:: c
 
 	struct dentry_operations {
-		int (*d_revalidate)(struct dentry *, unsigned int);
+		int (*d_revalidate)(struct inode *, const struct qstr *,
+				    struct dentry *, unsigned int);
 		int (*d_weak_revalidate)(struct dentry *, unsigned int);
 		int (*d_hash)(const struct dentry *, struct qstr *);
 		int (*d_compare)(const struct dentry *,
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 01338d4c2d9e..872c1abe3295 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -61,7 +61,7 @@ static void v9fs_dentry_release(struct dentry *dentry)
 		p9_fid_put(hlist_entry(p, struct p9_fid, dlist));
 }
 
-static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int __v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct p9_fid *fid;
 	struct inode *inode;
@@ -99,9 +99,15 @@ static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 	return 1;
 }
 
+static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
+{
+	return __v9fs_lookup_revalidate(dentry, flags);
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
-	.d_weak_revalidate = v9fs_lookup_revalidate,
+	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
 };
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index ada363af5aab..9780013cd83a 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -22,7 +22,8 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 				 unsigned int flags);
 static int afs_dir_open(struct inode *inode, struct file *file);
 static int afs_readdir(struct file *file, struct dir_context *ctx);
-static int afs_d_revalidate(struct dentry *dentry, unsigned int flags);
+static int afs_d_revalidate(struct inode *dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags);
 static int afs_d_delete(const struct dentry *dentry);
 static void afs_d_iput(struct dentry *dentry, struct inode *inode);
 static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name, int nlen,
@@ -1093,7 +1094,8 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
  * - NOTE! the hit can be a negative hit too, so we can't assume we have an
  *   inode
  */
-static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags)
 {
 	struct afs_vnode *vnode, *dir;
 	struct afs_fid fid;
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 0bf388e07a02..c4c71c24221b 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1940,7 +1940,8 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
 /*
  * Check if cached dentry can be trusted.
  */
-static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
 	struct ceph_client *cl = mdsc->fsc->client;
@@ -1948,7 +1949,7 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 	struct dentry *parent;
 	struct inode *dir, *inode;
 
-	valid = fscrypt_d_revalidate(dentry, flags);
+	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
 	if (valid <= 0)
 		return valid;
 
diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index 4e552ba7bd43..a3e2dfeedfbf 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -445,7 +445,8 @@ static int coda_readdir(struct file *coda_file, struct dir_context *ctx)
 }
 
 /* called when a cache lookup succeeds */
-static int coda_dentry_revalidate(struct dentry *de, unsigned int flags)
+static int coda_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *de, unsigned int flags)
 {
 	struct inode *inode;
 	struct coda_inode_info *cii;
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 0ad52fbe51c9..389f5b2bf63b 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -574,7 +574,8 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
  * Validate dentries in encrypted directories to make sure we aren't potentially
  * caching stale dentries after a key has been added.
  */
-int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
+int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *dir;
 	int err;
diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index acaa0825e9bb..1dfd5b81d831 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -17,7 +17,9 @@
 
 /**
  * ecryptfs_d_revalidate - revalidate an ecryptfs dentry
- * @dentry: The ecryptfs dentry
+ * @dir: inode of expected parent
+ * @name: expected name
+ * @dentry: dentry to revalidate
  * @flags: lookup flags
  *
  * Called when the VFS needs to revalidate a dentry. This
@@ -28,7 +30,8 @@
  * Returns 1 if valid, 0 otherwise.
  *
  */
-static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int ecryptfs_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	int rc = 1;
@@ -36,8 +39,15 @@ static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE)
-		rc = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
+	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE) {
+		struct inode *lower_dir = ecryptfs_inode_to_lower(dir);
+		struct name_snapshot n;
+
+		take_dentry_name_snapshot(&n, lower_dentry);
+		rc = lower_dentry->d_op->d_revalidate(lower_dir, &n.name,
+						      lower_dentry, flags);
+		release_dentry_name_snapshot(&n);
+	}
 
 	if (d_really_is_positive(dentry)) {
 		struct inode *inode = d_inode(dentry);
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 97d2774760fe..e3b4feccba07 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -31,7 +31,8 @@ static inline void exfat_d_version_set(struct dentry *dentry,
  * If it happened, the negative dentry isn't actually negative anymore.  So,
  * drop it.
  */
-static int exfat_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
+			      struct dentry *dentry, unsigned int flags)
 {
 	int ret;
 
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 15bf32c21ac0..f9cbd5c6f932 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -53,7 +53,8 @@ static int vfat_revalidate_shortname(struct dentry *dentry)
 	return ret;
 }
 
-static int vfat_revalidate(struct dentry *dentry, unsigned int flags)
+static int vfat_revalidate(struct inode *dir, const struct qstr *name,
+			   struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
@@ -64,7 +65,8 @@ static int vfat_revalidate(struct dentry *dentry, unsigned int flags)
 	return vfat_revalidate_shortname(dentry);
 }
 
-static int vfat_revalidate_ci(struct dentry *dentry, unsigned int flags)
+static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
+			      struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 494ac372ace0..d9e9f26917eb 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -192,7 +192,8 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
  * the lookup once more.  If the lookup results in the same inode,
  * then refresh the attributes, timeouts and mark the dentry valid.
  */
-static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
+static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *entry, unsigned int flags)
 {
 	struct inode *inode;
 	struct dentry *parent;
diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
index 2e215e8c3c88..86c338901fab 100644
--- a/fs/gfs2/dentry.c
+++ b/fs/gfs2/dentry.c
@@ -21,7 +21,9 @@
 
 /**
  * gfs2_drevalidate - Check directory lookup consistency
- * @dentry: the mapping to check
+ * @dir: expected parent directory inode
+ * @name: expected name
+ * @dentry: dentry to check
  * @flags: lookup flags
  *
  * Check to make sure the lookup necessary to arrive at this inode from its
@@ -30,7 +32,8 @@
  * Returns: 1 if the dentry is ok, 0 if it isn't
  */
 
-static int gfs2_drevalidate(struct dentry *dentry, unsigned int flags)
+static int gfs2_drevalidate(struct inode *dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *parent;
 	struct gfs2_sbd *sdp;
diff --git a/fs/hfs/sysdep.c b/fs/hfs/sysdep.c
index 76fa02e3835b..ef54fc8093cf 100644
--- a/fs/hfs/sysdep.c
+++ b/fs/hfs/sysdep.c
@@ -13,7 +13,8 @@
 
 /* dentry case-handling: just lowercase everything */
 
-static int hfs_revalidate_dentry(struct dentry *dentry, unsigned int flags)
+static int hfs_revalidate_dentry(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int diff;
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index d68a4e6ac345..fc8ede43afde 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -1576,7 +1576,8 @@ static int jfs_ci_compare(const struct dentry *dentry,
 	return result;
 }
 
-static int jfs_ci_revalidate(struct dentry *dentry, unsigned int flags)
+static int jfs_ci_revalidate(struct inode *dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	/*
 	 * This is not negative dentry. Always valid.
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 458519e416fe..5f0f8b95f44c 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1109,7 +1109,8 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent,
 	return ERR_PTR(rc);
 }
 
-static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
+static int kernfs_dop_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct kernfs_node *kn;
 	struct kernfs_root *root;
diff --git a/fs/namei.c b/fs/namei.c
index 9d30c7aa9aa6..77e5d136faaf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -921,10 +921,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 	return false;
 }
 
-static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
+static inline int d_revalidate(struct inode *dir, const struct qstr *name,
+			       struct dentry *dentry, unsigned int flags)
 {
 	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
-		return dentry->d_op->d_revalidate(dentry, flags);
+		return dentry->d_op->d_revalidate(dir, name, dentry, flags);
 	else
 		return 1;
 }
@@ -1652,7 +1653,7 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 {
 	struct dentry *dentry = d_lookup(dir, name);
 	if (dentry) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(dir->d_inode, name, dentry, flags);
 		if (unlikely(error <= 0)) {
 			if (!error)
 				d_invalidate(dentry);
@@ -1737,19 +1738,20 @@ static struct dentry *lookup_fast(struct nameidata *nd)
 		if (read_seqcount_retry(&parent->d_seq, nd->seq))
 			return ERR_PTR(-ECHILD);
 
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
 		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
-			status = d_revalidate(dentry, nd->flags);
+			status = d_revalidate(nd->inode, &nd->last,
+					      dentry, nd->flags);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
 			return NULL;
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
@@ -1777,7 +1779,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(inode, name, dentry, flags);
 		if (unlikely(error <= 0)) {
 			if (!error) {
 				d_invalidate(dentry);
@@ -3575,7 +3577,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		if (d_in_lookup(dentry))
 			break;
 
-		error = d_revalidate(dentry, nd->flags);
+		error = d_revalidate(dir_inode, &nd->last, dentry, nd->flags);
 		if (likely(error > 0))
 			break;
 		if (error)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 492cffd9d3d8..9910d9796f4c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1814,7 +1814,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
 	return ret;
 }
 
-static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int nfs_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate);
 }
@@ -2025,7 +2026,8 @@ void nfs_d_prune_case_insensitive_aliases(struct inode *inode)
 EXPORT_SYMBOL_GPL(nfs_d_prune_case_insensitive_aliases);
 
 #if IS_ENABLED(CONFIG_NFS_V4)
-static int nfs4_lookup_revalidate(struct dentry *, unsigned int);
+static int nfs4_lookup_revalidate(struct inode *, const struct qstr *,
+				  struct dentry *, unsigned int);
 
 const struct dentry_operations nfs4_dentry_operations = {
 	.d_revalidate	= nfs4_lookup_revalidate,
@@ -2260,7 +2262,8 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 	return nfs_do_lookup_revalidate(dir, dentry, flags);
 }
 
-static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	return __nfs_lookup_revalidate(dentry, flags,
 			nfs4_do_lookup_revalidate);
diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
index a9b8688aaf30..ecb1ce6301c4 100644
--- a/fs/ocfs2/dcache.c
+++ b/fs/ocfs2/dcache.c
@@ -32,7 +32,8 @@ void ocfs2_dentry_attach_gen(struct dentry *dentry)
 }
 
 
-static int ocfs2_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				   struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int ret = 0;    /* if all else fails, just return false */
diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
index 395a00ed8ac7..c32c9a86e8d0 100644
--- a/fs/orangefs/dcache.c
+++ b/fs/orangefs/dcache.c
@@ -92,7 +92,8 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
  *
  * Should return 1 if dentry can still be trusted, else 0.
  */
-static int orangefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int orangefs_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	int ret;
 	unsigned long time = (unsigned long) dentry->d_fsdata;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index fe511192f83c..86ae6f6da36b 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -91,7 +91,24 @@ static int ovl_revalidate_real(struct dentry *d, unsigned int flags, bool weak)
 		if (d->d_flags & DCACHE_OP_WEAK_REVALIDATE)
 			ret =  d->d_op->d_weak_revalidate(d, flags);
 	} else if (d->d_flags & DCACHE_OP_REVALIDATE) {
-		ret = d->d_op->d_revalidate(d, flags);
+		struct dentry *parent;
+		struct inode *dir;
+		struct name_snapshot n;
+
+		if (flags & LOOKUP_RCU) {
+			parent = READ_ONCE(d->d_parent);
+			dir = d_inode_rcu(parent);
+			if (!dir)
+				return -ECHILD;
+		} else {
+			parent = dget_parent(d);
+			dir = d_inode(parent);
+		}
+		take_dentry_name_snapshot(&n, d);
+		ret = d->d_op->d_revalidate(dir, &n.name, d, flags);
+		release_dentry_name_snapshot(&n);
+		if (!(flags & LOOKUP_RCU))
+			dput(parent);
 		if (!ret) {
 			if (!(flags & LOOKUP_RCU))
 				d_invalidate(d);
@@ -127,7 +144,8 @@ static int ovl_dentry_revalidate_common(struct dentry *dentry,
 	return ret;
 }
 
-static int ovl_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int ovl_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return ovl_dentry_revalidate_common(dentry, flags, false);
 }
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0edf14a9840e..fb5493d0edf0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2058,7 +2058,8 @@ void pid_update_inode(struct task_struct *task, struct inode *inode)
  * performed a setuid(), etc.
  *
  */
-static int pid_revalidate(struct dentry *dentry, unsigned int flags)
+static int pid_revalidate(struct inode *dir, const struct qstr *name,
+			  struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	struct task_struct *task;
@@ -2191,7 +2192,8 @@ static int dname_to_vma_addr(struct dentry *dentry,
 	return 0;
 }
 
-static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int map_files_d_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	unsigned long vm_start, vm_end;
 	bool exact_vma_exists = false;
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 24baf23e864f..37aa778d1af7 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -140,7 +140,8 @@ static void tid_fd_update_inode(struct task_struct *task, struct inode *inode,
 	security_task_to_inode(task, inode);
 }
 
-static int tid_fd_revalidate(struct dentry *dentry, unsigned int flags)
+static int tid_fd_revalidate(struct inode *dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	struct task_struct *task;
 	struct inode *inode;
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index dbe82cf23ee4..8ec90826a49e 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -216,7 +216,8 @@ void proc_free_inum(unsigned int inum)
 	ida_free(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
 }
 
-static int proc_misc_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_misc_d_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
@@ -343,7 +344,8 @@ static const struct file_operations proc_dir_operations = {
 	.iterate_shared		= proc_readdir,
 };
 
-static int proc_net_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_net_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return 0;
 }
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 27a283d85a6e..cc9d74a06ff0 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -884,7 +884,8 @@ static const struct inode_operations proc_sys_dir_operations = {
 	.getattr	= proc_sys_getattr,
 };
 
-static int proc_sys_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_sys_revalidate(struct inode *dir, const struct qstr *name,
+			       struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c
index 864b194dbaa0..8c5d44ee91ed 100644
--- a/fs/smb/client/dir.c
+++ b/fs/smb/client/dir.c
@@ -737,7 +737,8 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
 }
 
 static int
-cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
+cifs_d_revalidate(struct inode *dir, const struct qstr *name,
+		  struct dentry *direntry, unsigned int flags)
 {
 	struct inode *inode;
 	int rc;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index cfc614c638da..53214499e384 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -457,7 +457,8 @@ static void tracefs_d_release(struct dentry *dentry)
 		eventfs_d_release(dentry);
 }
 
-static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int tracefs_d_revalidate(struct inode *inode, const struct qstr *name,
+				struct dentry *dentry, unsigned int flags)
 {
 	struct eventfs_inode *ei = dentry->d_fsdata;
 
diff --git a/fs/vboxsf/dir.c b/fs/vboxsf/dir.c
index 5f1a14d5b927..a859ac9b74ba 100644
--- a/fs/vboxsf/dir.c
+++ b/fs/vboxsf/dir.c
@@ -192,7 +192,8 @@ const struct file_operations vboxsf_dir_fops = {
  * This is called during name resolution/lookup to check if the @dentry in
  * the cache is still valid. the job is handled by vboxsf_inode_revalidate.
  */
-static int vboxsf_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int vboxsf_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				    struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 8bc567a35718..4a6bdadf2f29 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -144,7 +144,8 @@ enum d_real_type {
 };
 
 struct dentry_operations {
-	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate)(struct inode *, const struct qstr *,
+			    struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 772f822dc6b8..18855cb44b1c 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -192,7 +192,8 @@ struct fscrypt_operations {
 					     unsigned int *num_devs);
 };
 
-int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
+int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags);
 
 static inline struct fscrypt_inode_info *
 fscrypt_get_inode_info(const struct inode *inode)
@@ -711,8 +712,8 @@ static inline u64 fscrypt_fname_siphash(const struct inode *dir,
 	return 0;
 }
 
-static inline int fscrypt_d_revalidate(struct dentry *dentry,
-				       unsigned int flags)
+static inline int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
+				       struct dentry *dentry, unsigned int flags)
 {
 	return 1;
 }
-- 
2.39.5


* [PATCH 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (5 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:42   ` [PATCH 09/20] ceph_d_revalidate(): use stable " Al Viro
                     ` (12 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to bother with boilerplate for obtaining the latter and for
the former we really should not count upon ->d_name.name remaining
stable under us.
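
For illustration only (this is not part of the patch): a minimal userspace
sketch of the calling convention, assuming mock types.  Everything prefixed
with model_ is hypothetical and merely stands in for the kernel's struct qstr,
struct inode and struct dentry.  The point it tries to show is that the
callback consults only the directory and name handed in by the caller, so a
concurrent change of the dentry's own name does not affect the lookup.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace stand-ins for kernel types; not kernel code. */
struct model_qstr {
	const char *name;
	unsigned int len;
};

struct model_inode {
	unsigned long ino;
	const char *const *entries;	/* names present in this directory */
	unsigned int nr;
};

struct model_dentry {
	struct model_qstr d_name;	/* may change behind our back */
};

/*
 * New-style revalidation: look the caller-supplied name up in the
 * caller-supplied directory.  dentry->d_name is deliberately not read.
 * Returns 1 if the name is still present, 0 otherwise.
 */
static int model_d_revalidate(const struct model_inode *dir,
			      const struct model_qstr *name,
			      const struct model_dentry *dentry)
{
	assert(dir && name);
	(void)dentry;

	for (unsigned int i = 0; i < dir->nr; i++) {
		if (strlen(dir->entries[i]) == name->len &&
		    memcmp(dir->entries[i], name->name, name->len) == 0)
			return 1;
	}
	return 0;
}
```

In this model the caller copies the name before calling in, which loosely
mirrors what a name snapshot gives the real callers; the kernel-side lifetime
rules are of course more involved than a struct copy.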

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/afs/dir.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 9780013cd83a..c6ee6257d4c6 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -607,19 +607,19 @@ static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name,
  * Do a lookup of a single name in a directory
  * - just returns the FID the dentry name maps to if found
  */
-static int afs_do_lookup_one(struct inode *dir, struct dentry *dentry,
+static int afs_do_lookup_one(struct inode *dir, const struct qstr *name,
 			     struct afs_fid *fid, struct key *key,
 			     afs_dataversion_t *_dir_version)
 {
 	struct afs_super_info *as = dir->i_sb->s_fs_info;
 	struct afs_lookup_one_cookie cookie = {
 		.ctx.actor = afs_lookup_one_filldir,
-		.name = dentry->d_name,
+		.name = *name,
 		.fid.vid = as->volume->vid
 	};
 	int ret;
 
-	_enter("{%lu},%p{%pd},", dir->i_ino, dentry, dentry);
+	_enter("{%lu},{%s},", dir->i_ino, name->name);
 
 	/* search the directory */
 	ret = afs_dir_iterate(dir, &cookie.ctx, key, _dir_version);
@@ -1052,21 +1052,12 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 /*
  * Check the validity of a dentry under RCU conditions.
  */
-static int afs_d_revalidate_rcu(struct dentry *dentry)
+static int afs_d_revalidate_rcu(struct afs_vnode *dvnode, struct dentry *dentry)
 {
-	struct afs_vnode *dvnode;
-	struct dentry *parent;
-	struct inode *dir;
 	long dir_version, de_version;
 
 	_enter("%p", dentry);
 
-	/* Check the parent directory is still valid first. */
-	parent = READ_ONCE(dentry->d_parent);
-	dir = d_inode_rcu(parent);
-	if (!dir)
-		return -ECHILD;
-	dvnode = AFS_FS_I(dir);
 	if (test_bit(AFS_VNODE_DELETED, &dvnode->flags))
 		return -ECHILD;
 
@@ -1097,9 +1088,8 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
 static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 			    struct dentry *dentry, unsigned int flags)
 {
-	struct afs_vnode *vnode, *dir;
+	struct afs_vnode *vnode, *dir = AFS_FS_I(parent_dir);
 	struct afs_fid fid;
-	struct dentry *parent;
 	struct inode *inode;
 	struct key *key;
 	afs_dataversion_t dir_version, invalid_before;
@@ -1107,7 +1097,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	int ret;
 
 	if (flags & LOOKUP_RCU)
-		return afs_d_revalidate_rcu(dentry);
+		return afs_d_revalidate_rcu(dir, dentry);
 
 	if (d_really_is_positive(dentry)) {
 		vnode = AFS_FS_I(d_inode(dentry));
@@ -1122,14 +1112,9 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	if (IS_ERR(key))
 		key = NULL;
 
-	/* Hold the parent dentry so we can peer at it */
-	parent = dget_parent(dentry);
-	dir = AFS_FS_I(d_inode(parent));
-
 	/* validate the parent directory */
 	ret = afs_validate(dir, key);
 	if (ret == -ERESTARTSYS) {
-		dput(parent);
 		key_put(key);
 		return ret;
 	}
@@ -1157,7 +1142,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	afs_stat_v(dir, n_reval);
 
 	/* search the directory for this vnode */
-	ret = afs_do_lookup_one(&dir->netfs.inode, dentry, &fid, key, &dir_version);
+	ret = afs_do_lookup_one(&dir->netfs.inode, name, &fid, key, &dir_version);
 	switch (ret) {
 	case 0:
 		/* the filename maps to something */
@@ -1201,22 +1186,19 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 		goto out_valid;
 
 	default:
-		_debug("failed to iterate dir %pd: %d",
-		       parent, ret);
+		_debug("failed to iterate parent %pd2: %d", dentry, ret);
 		goto not_found;
 	}
 
 out_valid:
 	dentry->d_fsdata = (void *)(unsigned long)dir_version;
 out_valid_noupdate:
-	dput(parent);
 	key_put(key);
 	_leave(" = 1 [valid]");
 	return 1;
 
 not_found:
 	_debug("dropping dentry %pd2", dentry);
-	dput(parent);
 	key_put(key);
 
 	_leave(" = 0 [bad]");
-- 
2.39.5


* [PATCH 09/20] ceph_d_revalidate(): use stable parent inode passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (6 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10 19:45     ` Viacheslav Dubeyko
  2025-01-10  2:42   ` [PATCH 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
                     ` (11 subsequent siblings)
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with the boilerplate for obtaining what we already
have.  Note that ceph is one of the "will want a path from filesystem
root if we want to talk to server" cases, so the name of the last
component is of little use - it is passed to fscrypt_d_revalidate(),
and it is used to deal with the (also crypt-related) case in request
marshalling where the encrypted name turns out to be too long.  The
former is not a problem, but the latter is racy; that part will be
handled in the next commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ceph/dir.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index c4c71c24221b..dc5f55bebad7 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1940,30 +1940,19 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
 /*
  * Check if cached dentry can be trusted.
  */
-static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
 			     struct dentry *dentry, unsigned int flags)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
 	struct ceph_client *cl = mdsc->fsc->client;
 	int valid = 0;
-	struct dentry *parent;
-	struct inode *dir, *inode;
+	struct inode *inode;
 
-	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
+	valid = fscrypt_d_revalidate(dir, name, dentry, flags);
 	if (valid <= 0)
 		return valid;
 
-	if (flags & LOOKUP_RCU) {
-		parent = READ_ONCE(dentry->d_parent);
-		dir = d_inode_rcu(parent);
-		if (!dir)
-			return -ECHILD;
-		inode = d_inode_rcu(dentry);
-	} else {
-		parent = dget_parent(dentry);
-		dir = d_inode(parent);
-		inode = d_inode(dentry);
-	}
+	inode = d_inode_rcu(dentry);
 
 	doutc(cl, "%p '%pd' inode %p offset 0x%llx nokey %d\n",
 	      dentry, dentry, inode, ceph_dentry(dentry)->offset,
@@ -2039,9 +2028,6 @@ static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	doutc(cl, "%p '%pd' %s\n", dentry, dentry, valid ? "valid" : "invalid");
 	if (!valid)
 		ceph_dir_clear_complete(dir);
-
-	if (!(flags & LOOKUP_RCU))
-		dput(parent);
 	return valid;
 }
 
-- 
2.39.5


* [PATCH 10/20] ceph_d_revalidate(): propagate stable name down into request encoding
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (7 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 09/20] ceph_d_revalidate(): use stable " Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:42   ` [PATCH 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
                     ` (10 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Currently get_fscrypt_altname() requires ->r_dentry->d_name to be stable
and it gets that in almost all cases.  The only exception is ->d_revalidate(),
where we have a stable name, but it's passed separately - dentry->d_name
is not stable there.

Propagate it down to get_fscrypt_altname() as a new field of struct
ceph_mds_request - ->r_dname, to be used instead of ->r_dentry->d_name
when non-NULL.
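
For illustration only (not part of the patch): a userspace sketch of the
"optional stable name with fallback" selection described above.  The model_
types and the helper model_request_name() are hypothetical; only the field
names r_dentry and r_dname are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the real ceph structures. */
struct model_qstr {
	const char *name;
	unsigned int len;
};

struct model_dentry {
	struct model_qstr d_name;
};

struct model_request {
	struct model_dentry *r_dentry;
	const struct model_qstr *r_dname;	/* stable name, may be NULL */
};

/*
 * Pick the name to encode: the explicitly passed stable name if the
 * submitter provided one, otherwise the name stored in the dentry.
 */
static const struct model_qstr *
model_request_name(const struct model_request *req)
{
	assert(req && req->r_dentry);
	return req->r_dname ? req->r_dname : &req->r_dentry->d_name;
}
```

Callers that already hold locks keeping ->d_name stable leave the field NULL
and behave as before; only the ->d_revalidate() path needs to fill it in.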

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ceph/dir.c        | 2 ++
 fs/ceph/mds_client.c | 9 ++++++---
 fs/ceph/mds_client.h | 2 ++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index dc5f55bebad7..62e99e65250d 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1998,6 +1998,8 @@ static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
 			req->r_parent = dir;
 			ihold(dir);
 
+			req->r_dname = name;
+
 			mask = CEPH_STAT_CAP_INODE | CEPH_CAP_AUTH_SHARED;
 			if (ceph_security_xattr_wanted(dir))
 				mask |= CEPH_CAP_XATTR_SHARED;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 219a2cc2bf3c..3b766b984713 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2621,6 +2621,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 {
 	struct inode *dir = req->r_parent;
 	struct dentry *dentry = req->r_dentry;
+	const struct qstr *name = req->r_dname;
 	u8 *cryptbuf = NULL;
 	u32 len = 0;
 	int ret = 0;
@@ -2641,8 +2642,10 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 	if (!fscrypt_has_encryption_key(dir))
 		goto success;
 
-	if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX,
-					  &len)) {
+	if (!name)
+		name = &dentry->d_name;
+
+	if (!fscrypt_fname_encrypted_size(dir, name->len, NAME_MAX, &len)) {
 		WARN_ON_ONCE(1);
 		return ERR_PTR(-ENAMETOOLONG);
 	}
@@ -2657,7 +2660,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 	if (!cryptbuf)
 		return ERR_PTR(-ENOMEM);
 
-	ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
+	ret = fscrypt_fname_encrypt(dir, name, cryptbuf, len);
 	if (ret) {
 		kfree(cryptbuf);
 		return ERR_PTR(ret);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 38bb7e0d2d79..7c9fee9e80d4 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -299,6 +299,8 @@ struct ceph_mds_request {
 	struct inode *r_target_inode;       /* resulting inode */
 	struct inode *r_new_inode;	    /* new inode (for creates) */
 
+	const struct qstr *r_dname;	    /* stable name (for ->d_revalidate) */
+
 #define CEPH_MDS_R_DIRECT_IS_HASH	(1) /* r_direct_hash is valid */
 #define CEPH_MDS_R_ABORTED		(2) /* call was aborted */
 #define CEPH_MDS_R_GOT_UNSAFE		(3) /* got an unsafe reply */
-- 
2.39.5


* [PATCH 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (8 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:42   ` [PATCH 12/20] exfat_d_revalidate(): " Al Viro
                     ` (9 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

The only thing it's using is parent directory inode and we are already
given a stable reference to that - no need to bother with boilerplate.
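
For illustration only (not part of the patch): a userspace sketch of the
decision the function is left with once the parent directory is supplied by
the caller.  The model_ types, the is_nokey_name and has_key fields and
MODEL_LOOKUP_RCU are hypothetical simplifications; in particular the real
function first tries to set up the directory's encryption info, which may
block, and that step is not modelled here.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_LOOKUP_RCU 0x1u	/* hypothetical stand-in for LOOKUP_RCU */

/* Hypothetical userspace stand-ins; not the fscrypt structures. */
struct model_inode {
	int has_key;		/* directory's encryption key is available */
};

struct model_dentry {
	int is_nokey_name;	/* dentry was created from a no-key name */
};

/*
 * Returns 1 if the dentry may still be trusted, 0 if it must be dropped,
 * and -ECHILD if the answer cannot be computed in RCU mode.
 */
static int model_fscrypt_d_revalidate(const struct model_inode *dir,
				      const struct model_dentry *dentry,
				      unsigned int flags)
{
	assert(dir && dentry);

	/* Plaintext names are always valid. */
	if (!dentry->is_nokey_name)
		return 1;

	/* The real check may block, so bail out of RCU mode. */
	if (flags & MODEL_LOOKUP_RCU)
		return -ECHILD;

	/* A no-key name stays valid only while the key is unavailable. */
	return !dir->has_key;
}
```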

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/crypto/fname.c | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 389f5b2bf63b..010f9c0a4c2f 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -574,12 +574,10 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
  * Validate dentries in encrypted directories to make sure we aren't potentially
  * caching stale dentries after a key has been added.
  */
-int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
 			 struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *dir;
 	int err;
-	int valid;
 
 	/*
 	 * Plaintext names are always valid, since fscrypt doesn't support
@@ -592,30 +590,21 @@ int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	/*
 	 * No-key name; valid if the directory's key is still unavailable.
 	 *
-	 * Although fscrypt forbids rename() on no-key names, we still must use
-	 * dget_parent() here rather than use ->d_parent directly.  That's
-	 * because a corrupted fs image may contain directory hard links, which
-	 * the VFS handles by moving the directory's dentry tree in the dcache
-	 * each time ->lookup() finds the directory and it already has a dentry
-	 * elsewhere.  Thus ->d_parent can be changing, and we must safely grab
-	 * a reference to some ->d_parent to prevent it from being freed.
+	 * Note in RCU mode we have to bail if we get here -
+	 * fscrypt_get_encryption_info() may block.
 	 */
 
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	dir = dget_parent(dentry);
 	/*
 	 * Pass allow_unsupported=true, so that files with an unsupported
 	 * encryption policy can be deleted.
 	 */
-	err = fscrypt_get_encryption_info(d_inode(dir), true);
-	valid = !fscrypt_has_encryption_key(d_inode(dir));
-	dput(dir);
-
+	err = fscrypt_get_encryption_info(dir, true);
 	if (err < 0)
 		return err;
 
-	return valid;
+	return !fscrypt_has_encryption_key(dir);
 }
 EXPORT_SYMBOL_GPL(fscrypt_d_revalidate);
-- 
2.39.5


* [PATCH 12/20] exfat_d_revalidate(): use stable parent inode passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (9 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:42   ` [PATCH 13/20] vfat_revalidate{,_ci}(): " Al Viro
                     ` (8 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... no need to bother with ->d_lock and ->d_parent->d_inode.
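
For illustration only (not part of the patch): a userspace sketch of the
version comparison that remains once the parent inode is passed in.  The
model_ types and field names are hypothetical; the real code compares the
directory's i_version against a value stashed in the dentry when it was
cached.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; not the exfat structures. */
struct model_inode {
	unsigned long long i_version;	/* bumped when the directory changes */
};

struct model_dentry {
	unsigned long long cached_dir_version;	/* recorded at lookup time */
};

/*
 * A cached negative dentry is trusted only if the parent directory has
 * not changed since the dentry was recorded.  With the directory given
 * by the caller there is nothing to lock and no ->d_parent to chase.
 */
static int model_negative_dentry_valid(const struct model_inode *dir,
				       const struct model_dentry *dentry)
{
	assert(dir && dentry);
	return dir->i_version == dentry->cached_dir_version;
}
```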

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/exfat/namei.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index e3b4feccba07..61c7164b85b3 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -34,8 +34,6 @@ static inline void exfat_d_version_set(struct dentry *dentry,
 static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
 			      struct dentry *dentry, unsigned int flags)
 {
-	int ret;
-
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
@@ -59,11 +57,7 @@ static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
 
-	spin_lock(&dentry->d_lock);
-	ret = inode_eq_iversion(d_inode(dentry->d_parent),
-			exfat_d_version(dentry));
-	spin_unlock(&dentry->d_lock);
-	return ret;
+	return inode_eq_iversion(dir, exfat_d_version(dentry));
 }
 
 /* returns the length of a struct qstr, ignoring trailing dots if necessary */
-- 
2.39.5


* [PATCH 13/20] vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (10 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 12/20] exfat_d_revalidate(): " Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:42   ` [PATCH 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
                     ` (7 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/fat/namei_vfat.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index f9cbd5c6f932..926c26e90ef8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -43,14 +43,9 @@ static inline void vfat_d_version_set(struct dentry *dentry,
  * If it happened, the negative dentry isn't actually negative
  * anymore.  So, drop it.
  */
-static int vfat_revalidate_shortname(struct dentry *dentry)
+static bool vfat_revalidate_shortname(struct dentry *dentry, struct inode *dir)
 {
-	int ret = 1;
-	spin_lock(&dentry->d_lock);
-	if (!inode_eq_iversion(d_inode(dentry->d_parent), vfat_d_version(dentry)))
-		ret = 0;
-	spin_unlock(&dentry->d_lock);
-	return ret;
+	return inode_eq_iversion(dir, vfat_d_version(dentry));
 }
 
 static int vfat_revalidate(struct inode *dir, const struct qstr *name,
@@ -62,7 +57,7 @@ static int vfat_revalidate(struct inode *dir, const struct qstr *name,
 	/* This is not negative dentry. Always valid. */
 	if (d_really_is_positive(dentry))
 		return 1;
-	return vfat_revalidate_shortname(dentry);
+	return vfat_revalidate_shortname(dentry, dir);
 }
 
 static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
@@ -99,7 +94,7 @@ static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
 
-	return vfat_revalidate_shortname(dentry);
+	return vfat_revalidate_shortname(dentry, dir);
 }
 
 /* returns the length of a struct qstr, ignoring trailing dots */
-- 
2.39.5


* [PATCH 14/20] fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (11 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 13/20] vfat_revalidate{,_ci}(): " Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:42   ` [PATCH 15/20] gfs2_drevalidate(): " Al Viro
                     ` (6 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable - it's a real-life UAF.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/fuse/dir.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d9e9f26917eb..7e93a8470c36 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -196,7 +196,6 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 				  struct dentry *entry, unsigned int flags)
 {
 	struct inode *inode;
-	struct dentry *parent;
 	struct fuse_mount *fm;
 	struct fuse_inode *fi;
 	int ret;
@@ -228,11 +227,9 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 
 		attr_version = fuse_get_attr_version(fm->fc);
 
-		parent = dget_parent(entry);
-		fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
-				 &entry->d_name, &outarg);
+		fuse_lookup_init(fm->fc, &args, get_node_id(dir),
+				 name, &outarg);
 		ret = fuse_simple_request(fm, &args);
-		dput(parent);
 		/* Zero nodeid is same as -ENOENT */
 		if (!ret && !outarg.nodeid)
 			ret = -ENOENT;
@@ -266,9 +263,7 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 			if (test_bit(FUSE_I_INIT_RDPLUS, &fi->state))
 				return -ECHILD;
 		} else if (test_and_clear_bit(FUSE_I_INIT_RDPLUS, &fi->state)) {
-			parent = dget_parent(entry);
-			fuse_advise_use_readdirplus(d_inode(parent));
-			dput(parent);
+			fuse_advise_use_readdirplus(dir);
 		}
 	}
 	ret = 1;
-- 
2.39.5


* [PATCH 15/20] gfs2_drevalidate(): use stable parent inode and name passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (12 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10 19:20     ` Andreas Grünbacher
  2025-01-10  2:42   ` [PATCH 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
                     ` (5 subsequent siblings)
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable.  Again, a UAF there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/gfs2/dentry.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
index 86c338901fab..95050e719233 100644
--- a/fs/gfs2/dentry.c
+++ b/fs/gfs2/dentry.c
@@ -35,48 +35,40 @@
 static int gfs2_drevalidate(struct inode *dir, const struct qstr *name,
 			    struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *parent;
-	struct gfs2_sbd *sdp;
-	struct gfs2_inode *dip;
+	struct gfs2_sbd *sdp = GFS2_SB(dir);
+	struct gfs2_inode *dip = GFS2_I(dir);
 	struct inode *inode;
 	struct gfs2_holder d_gh;
 	struct gfs2_inode *ip = NULL;
-	int error, valid = 0;
+	int error, valid;
 	int had_lock = 0;
 
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	parent = dget_parent(dentry);
-	sdp = GFS2_SB(d_inode(parent));
-	dip = GFS2_I(d_inode(parent));
 	inode = d_inode(dentry);
 
 	if (inode) {
 		if (is_bad_inode(inode))
-			goto out;
+			return 0;
 		ip = GFS2_I(inode);
 	}
 
-	if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL) {
-		valid = 1;
-		goto out;
-	}
+	if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
+		return 1;
 
 	had_lock = (gfs2_glock_is_locked_by_me(dip->i_gl) != NULL);
 	if (!had_lock) {
 		error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
 		if (error)
-			goto out;
+			return 0;
 	}
 
-	error = gfs2_dir_check(d_inode(parent), &dentry->d_name, ip);
+	error = gfs2_dir_check(dir, name, ip);
 	valid = inode ? !error : (error == -ENOENT);
 
 	if (!had_lock)
 		gfs2_glock_dq_uninit(&d_gh);
-out:
-	dput(parent);
 	return valid;
 }
 
-- 
2.39.5


* [PATCH 16/20] nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (13 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 15/20] gfs2_drevalidate(): " Al Viro
@ 2025-01-10  2:42   ` Al Viro
  2025-01-10  2:43   ` [PATCH 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
                     ` (4 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:42 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

We can't kill __nfs_lookup_revalidate() completely, but the ->d_parent
boilerplate in it is gone.
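
For illustration only (not part of the patch): a userspace sketch of the
resulting shape, where the helper shrinks to a pre-check and the ->d_revalidate
instance calls the real worker directly instead of passing it in as a
callback.  All model_ names are hypothetical.  The non-RCU branch of the real
helper waits for a pending unlink to complete; the model cannot block, so it
simply lets that case through.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_LOOKUP_RCU 0x1u	/* hypothetical stand-in for LOOKUP_RCU */

/* Hypothetical userspace stand-ins; not the NFS structures. */
struct model_inode {
	unsigned long ino;
};

struct model_dentry {
	int blocked;	/* models d_fsdata == NFS_FSDATA_BLOCKED */
};

/* Pre-check only: nonzero means "cannot proceed in this mode". */
static int model_precheck(const struct model_dentry *dentry,
			  unsigned int flags)
{
	if ((flags & MODEL_LOOKUP_RCU) && dentry->blocked)
		return -ECHILD;
	return 0;
}

/* Placeholder for the real revalidation work; always says "valid". */
static int model_do_revalidate(const struct model_inode *dir,
			       const struct model_dentry *dentry,
			       unsigned int flags)
{
	(void)dir;
	(void)dentry;
	(void)flags;
	return 1;
}

/* The method instance: run the pre-check, then call the worker directly. */
static int model_lookup_revalidate(const struct model_inode *dir,
				   const struct model_dentry *dentry,
				   unsigned int flags)
{
	assert(dir && dentry);
	if (model_precheck(dentry, flags))
		return -ECHILD;
	return model_do_revalidate(dir, dentry, flags);
}
```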

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/nfs/dir.c | 43 +++++++++++++------------------------------
 1 file changed, 13 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 9910d9796f4c..c28983ee75ca 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1732,8 +1732,8 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
  * cached dentry and do a new lookup.
  */
 static int
-nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
-			 unsigned int flags)
+nfs_do_lookup_revalidate(struct inode *dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int error = 0;
@@ -1785,39 +1785,26 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 }
 
 static int
-__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
-			int (*reval)(struct inode *, struct dentry *, unsigned int))
+__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *parent;
-	struct inode *dir;
-	int ret;
-
 	if (flags & LOOKUP_RCU) {
 		if (dentry->d_fsdata == NFS_FSDATA_BLOCKED)
 			return -ECHILD;
-		parent = READ_ONCE(dentry->d_parent);
-		dir = d_inode_rcu(parent);
-		if (!dir)
-			return -ECHILD;
-		ret = reval(dir, dentry, flags);
-		if (parent != READ_ONCE(dentry->d_parent))
-			return -ECHILD;
 	} else {
 		/* Wait for unlink to complete - see unblock_revalidate() */
 		wait_var_event(&dentry->d_fsdata,
 			       smp_load_acquire(&dentry->d_fsdata)
 			       != NFS_FSDATA_BLOCKED);
-		parent = dget_parent(dentry);
-		ret = reval(d_inode(parent), dentry, flags);
-		dput(parent);
 	}
-	return ret;
+	return 0;
 }
 
 static int nfs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 				 struct dentry *dentry, unsigned int flags)
 {
-	return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate);
+	if (__nfs_lookup_revalidate(dentry, flags))
+		return -ECHILD;
+	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
 }
 
 static void block_revalidate(struct dentry *dentry)
@@ -2216,11 +2203,14 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 EXPORT_SYMBOL_GPL(nfs_atomic_open);
 
 static int
-nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
-			  unsigned int flags)
+nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
+		       struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 
+	if (__nfs_lookup_revalidate(dentry, flags))
+		return -ECHILD;
+
 	trace_nfs_lookup_revalidate_enter(dir, dentry, flags);
 
 	if (!(flags & LOOKUP_OPEN) || (flags & LOOKUP_DIRECTORY))
@@ -2259,14 +2249,7 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
 
 full_reval:
-	return nfs_do_lookup_revalidate(dir, dentry, flags);
-}
-
-static int nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
-				  struct dentry *dentry, unsigned int flags)
-{
-	return __nfs_lookup_revalidate(dentry, flags,
-			nfs4_do_lookup_revalidate);
+	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
 }
 
 #endif /* CONFIG_NFSV4 */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (14 preceding siblings ...)
  2025-01-10  2:42   ` [PATCH 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
@ 2025-01-10  2:43   ` Al Viro
  2025-01-10  2:43   ` [PATCH 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
                     ` (3 subsequent siblings)
  19 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Pass the stable name all the way down to ->rpc_ops->lookup() instances.
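
A rough userspace model of why that matters (not kernel code; every
model_* name below is a hypothetical stand-in, and the "rename" simply
frees and replaces the name buffer): the lookup helper builds its request
from the name the caller handed in and never looks at the dentry's own
name, so a rename that happens in between cannot leave it reading freed
memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins; these are NOT the kernel structures. */
struct model_qstr {
	const char *name;
	size_t len;
};

struct model_dentry {
	char *name;		/* may be replaced by a concurrent rename */
	size_t len;
};

/* Hypothetical stand-in for the RPC argument block. */
struct model_lookup_args {
	char name[64];
	size_t len;
};

/*
 * Fills the request from the caller-supplied name only.  The dentry is
 * accepted (as in the patched signature) but its name is never read.
 */
static int model_proc_lookup(const struct model_dentry *dentry,
			     const struct model_qstr *name,
			     struct model_lookup_args *args)
{
	(void)dentry;
	if (name->len >= sizeof(args->name))
		return -1;
	memcpy(args->name, name->name, name->len);
	args->name[name->len] = '\0';
	args->len = name->len;
	return 0;
}

/* Models a rename: the old name buffer is freed and replaced. */
static void model_rename(struct model_dentry *dentry, const char *new_name)
{
	free(dentry->name);
	dentry->len = strlen(new_name);
	dentry->name = malloc(dentry->len + 1);
	if (!dentry->name)
		abort();
	memcpy(dentry->name, new_name, dentry->len + 1);
}
```

In this model the caller copies the name into storage it owns before
calling model_proc_lookup(); in the series itself it is the caller of
->d_revalidate() that is responsible for handing down a name that stays
valid for the duration of the call.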

Note that passing &dentry->d_name is safe in e.g. nfs_lookup() - it *is*
stable there, as it is in ->create() et al.

dget_parent() in nfs_instantiate() should be redundant - it'd better be
stable there; if it's not, we have more trouble, since ->d_name would
also be unsafe in such case.

nfs_submount() and nfs4_submount() may or may not require fixes - if
they ever get moved on server with fhandle preserved, we are in trouble
there...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/nfs/dir.c            | 14 ++++++++------
 fs/nfs/namespace.c      |  2 +-
 fs/nfs/nfs3proc.c       |  5 ++---
 fs/nfs/nfs4proc.c       | 20 ++++++++++----------
 fs/nfs/proc.c           |  6 +++---
 include/linux/nfs_xdr.h |  2 +-
 6 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c28983ee75ca..2b04038b0e40 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1672,7 +1672,7 @@ nfs_lookup_revalidate_delegated(struct inode *dir, struct dentry *dentry,
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 }
 
-static int nfs_lookup_revalidate_dentry(struct inode *dir,
+static int nfs_lookup_revalidate_dentry(struct inode *dir, const struct qstr *name,
 					struct dentry *dentry,
 					struct inode *inode, unsigned int flags)
 {
@@ -1690,7 +1690,7 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
 		goto out;
 
 	dir_verifier = nfs_save_change_attribute(dir);
-	ret = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+	ret = NFS_PROTO(dir)->lookup(dir, dentry, name, fhandle, fattr);
 	if (ret < 0)
 		goto out;
 
@@ -1775,7 +1775,7 @@ nfs_do_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	if (NFS_STALE(inode))
 		goto out_bad;
 
-	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
+	return nfs_lookup_revalidate_dentry(dir, name, dentry, inode, flags);
 out_valid:
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 out_bad:
@@ -1970,7 +1970,8 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
 
 	dir_verifier = nfs_save_change_attribute(dir);
 	trace_nfs_lookup_enter(dir, dentry, flags);
-	error = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+	error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
+				       fhandle, fattr);
 	if (error == -ENOENT) {
 		if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
 			dir_verifier = inode_peek_iversion_raw(dir);
@@ -2246,7 +2247,7 @@ nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
 reval_dentry:
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
-	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
+	return nfs_lookup_revalidate_dentry(dir, name, dentry, inode, flags);
 
 full_reval:
 	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
@@ -2305,7 +2306,8 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
 	d_drop(dentry);
 
 	if (fhandle->size == 0) {
-		error = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+		error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
+					       fhandle, fattr);
 		if (error)
 			goto out_error;
 	}
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index 2d53574da605..973aed9cc5fe 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -308,7 +308,7 @@ int nfs_submount(struct fs_context *fc, struct nfs_server *server)
 	int err;
 
 	/* Look it up again to get its attributes */
-	err = server->nfs_client->rpc_ops->lookup(d_inode(parent), dentry,
+	err = server->nfs_client->rpc_ops->lookup(d_inode(parent), dentry, &dentry->d_name,
 						  ctx->mntfh, ctx->clone_data.fattr);
 	dput(parent);
 	if (err != 0)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 1566163c6d85..ce70768e0201 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -192,7 +192,7 @@ __nfs3_proc_lookup(struct inode *dir, const char *name, size_t len,
 }
 
 static int
-nfs3_proc_lookup(struct inode *dir, struct dentry *dentry,
+nfs3_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 		 struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	unsigned short task_flags = 0;
@@ -202,8 +202,7 @@ nfs3_proc_lookup(struct inode *dir, struct dentry *dentry,
 		task_flags |= RPC_TASK_TIMEOUT;
 
 	dprintk("NFS call  lookup %pd2\n", dentry);
-	return __nfs3_proc_lookup(dir, dentry->d_name.name,
-				  dentry->d_name.len, fhandle, fattr,
+	return __nfs3_proc_lookup(dir, name->name, name->len, fhandle, fattr,
 				  task_flags);
 }
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 405f17e6e0b4..4d85068e820d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4536,15 +4536,15 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
 }
 
 static int _nfs4_proc_lookup(struct rpc_clnt *clnt, struct inode *dir,
-		struct dentry *dentry, struct nfs_fh *fhandle,
-		struct nfs_fattr *fattr)
+		struct dentry *dentry, const struct qstr *name,
+		struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs_server *server = NFS_SERVER(dir);
 	int		       status;
 	struct nfs4_lookup_arg args = {
 		.bitmask = server->attr_bitmask,
 		.dir_fh = NFS_FH(dir),
-		.name = &dentry->d_name,
+		.name = name,
 	};
 	struct nfs4_lookup_res res = {
 		.server = server,
@@ -4586,17 +4586,16 @@ static void nfs_fixup_secinfo_attributes(struct nfs_fattr *fattr)
 }
 
 static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
-				   struct dentry *dentry, struct nfs_fh *fhandle,
-				   struct nfs_fattr *fattr)
+				   struct dentry *dentry, const struct qstr *name,
+				   struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs4_exception exception = {
 		.interruptible = true,
 	};
 	struct rpc_clnt *client = *clnt;
-	const struct qstr *name = &dentry->d_name;
 	int err;
 	do {
-		err = _nfs4_proc_lookup(client, dir, dentry, fhandle, fattr);
+		err = _nfs4_proc_lookup(client, dir, dentry, name, fhandle, fattr);
 		trace_nfs4_lookup(dir, name, err);
 		switch (err) {
 		case -NFS4ERR_BADNAME:
@@ -4631,13 +4630,13 @@ static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
 	return err;
 }
 
-static int nfs4_proc_lookup(struct inode *dir, struct dentry *dentry,
+static int nfs4_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 			    struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	int status;
 	struct rpc_clnt *client = NFS_CLIENT(dir);
 
-	status = nfs4_proc_lookup_common(&client, dir, dentry, fhandle, fattr);
+	status = nfs4_proc_lookup_common(&client, dir, dentry, name, fhandle, fattr);
 	if (client != NFS_CLIENT(dir)) {
 		rpc_shutdown_client(client);
 		nfs_fixup_secinfo_attributes(fattr);
@@ -4652,7 +4651,8 @@ nfs4_proc_lookup_mountpoint(struct inode *dir, struct dentry *dentry,
 	struct rpc_clnt *client = NFS_CLIENT(dir);
 	int status;
 
-	status = nfs4_proc_lookup_common(&client, dir, dentry, fhandle, fattr);
+	status = nfs4_proc_lookup_common(&client, dir, dentry, &dentry->d_name,
+					 fhandle, fattr);
 	if (status < 0)
 		return ERR_PTR(status);
 	return (client == NFS_CLIENT(dir)) ? rpc_clone_client(client) : client;
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 6c09cd090c34..77920a2e3cef 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -153,13 +153,13 @@ nfs_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
 }
 
 static int
-nfs_proc_lookup(struct inode *dir, struct dentry *dentry,
+nfs_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 		struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs_diropargs	arg = {
 		.fh		= NFS_FH(dir),
-		.name		= dentry->d_name.name,
-		.len		= dentry->d_name.len
+		.name		= name->name,
+		.len		= name->len
 	};
 	struct nfs_diropok	res = {
 		.fh		= fhandle,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 559273a0f16d..08b62bbf59f0 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1785,7 +1785,7 @@ struct nfs_rpc_ops {
 			    struct nfs_fattr *, struct inode *);
 	int	(*setattr) (struct dentry *, struct nfs_fattr *,
 			    struct iattr *);
-	int	(*lookup)  (struct inode *, struct dentry *,
+	int	(*lookup)  (struct inode *, struct dentry *, const struct qstr *,
 			    struct nfs_fh *, struct nfs_fattr *);
 	int	(*lookupp) (struct inode *, struct nfs_fh *,
 			    struct nfs_fattr *);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (15 preceding siblings ...)
  2025-01-10  2:43   ` [PATCH 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
@ 2025-01-10  2:43   ` Al Viro
  2025-01-10  9:54     ` Jan Kara
  2025-01-10  2:43   ` [PATCH 19/20] orangefs_d_revalidate(): " Al Viro
                     ` (2 subsequent siblings)
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

theoretically, ->d_name use in there is a UAF, but only if you are messing with
tracepoints...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ocfs2/dcache.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
index ecb1ce6301c4..1873bbbb7e5b 100644
--- a/fs/ocfs2/dcache.c
+++ b/fs/ocfs2/dcache.c
@@ -45,8 +45,7 @@ static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
 	inode = d_inode(dentry);
 	osb = OCFS2_SB(dentry->d_sb);
 
-	trace_ocfs2_dentry_revalidate(dentry, dentry->d_name.len,
-				      dentry->d_name.name);
+	trace_ocfs2_dentry_revalidate(dentry, name->len, name->name);
 
 	/* For a negative dentry -
 	 * check the generation number of the parent and compare with the
@@ -54,12 +53,8 @@ static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
 	 */
 	if (inode == NULL) {
 		unsigned long gen = (unsigned long) dentry->d_fsdata;
-		unsigned long pgen;
-		spin_lock(&dentry->d_lock);
-		pgen = OCFS2_I(d_inode(dentry->d_parent))->ip_dir_lock_gen;
-		spin_unlock(&dentry->d_lock);
-		trace_ocfs2_dentry_revalidate_negative(dentry->d_name.len,
-						       dentry->d_name.name,
+		unsigned long pgen = OCFS2_I(dir)->ip_dir_lock_gen;
+		trace_ocfs2_dentry_revalidate_negative(name->len, name->name,
 						       pgen, gen);
 		if (gen != pgen)
 			goto bail;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 19/20] orangefs_d_revalidate(): use stable parent inode and name passed by caller
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (16 preceding siblings ...)
  2025-01-10  2:43   ` [PATCH 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
@ 2025-01-10  2:43   ` Al Viro
  2025-01-10  3:06     ` Linus Torvalds
  2025-01-10  2:43   ` [PATCH 20/20] 9p: fix ->rename_sem exclusion Al Viro
  2025-01-10  9:21   ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Jan Kara
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

->d_name use is a UAF.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/orangefs/dcache.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
index c32c9a86e8d0..060c94e9759b 100644
--- a/fs/orangefs/dcache.c
+++ b/fs/orangefs/dcache.c
@@ -13,10 +13,9 @@
 #include "orangefs-kernel.h"
 
 /* Returns 1 if dentry can still be trusted, else 0. */
-static int orangefs_revalidate_lookup(struct dentry *dentry)
+static int orangefs_revalidate_lookup(struct inode *parent_inode, const struct qstr *name,
+				      struct dentry *dentry)
 {
-	struct dentry *parent_dentry = dget_parent(dentry);
-	struct inode *parent_inode = parent_dentry->d_inode;
 	struct orangefs_inode_s *parent = ORANGEFS_I(parent_inode);
 	struct inode *inode = dentry->d_inode;
 	struct orangefs_kernel_op_s *new_op;
@@ -26,14 +25,12 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
 	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s: attempting lookup.\n", __func__);
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_LOOKUP);
-	if (!new_op) {
-		ret = -ENOMEM;
-		goto out_put_parent;
-	}
+	if (!new_op)
+		return -ENOMEM;
 
 	new_op->upcall.req.lookup.sym_follow = ORANGEFS_LOOKUP_LINK_NO_FOLLOW;
 	new_op->upcall.req.lookup.parent_refn = parent->refn;
-	strscpy(new_op->upcall.req.lookup.d_name, dentry->d_name.name);
+	strscpy(new_op->upcall.req.lookup.d_name, name->name);
 
 	gossip_debug(GOSSIP_DCACHE_DEBUG,
 		     "%s:%s:%d interrupt flag [%d]\n",
@@ -78,8 +75,6 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
 	ret = 1;
 out_release_op:
 	op_release(new_op);
-out_put_parent:
-	dput(parent_dentry);
 	return ret;
 out_drop:
 	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s:%s:%d revalidate failed\n",
@@ -115,7 +110,7 @@ static int orangefs_d_revalidate(struct inode *dir, const struct qstr *name,
 	 * If this passes, the positive dentry still exists or the negative
 	 * dentry still does not exist.
 	 */
-	if (!orangefs_revalidate_lookup(dentry))
+	if (!orangefs_revalidate_lookup(dir, name, dentry))
 		return 0;
 
 	/* We do not need to continue with negative dentries. */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 20/20] 9p: fix ->rename_sem exclusion
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (17 preceding siblings ...)
  2025-01-10  2:43   ` [PATCH 19/20] orangefs_d_revalidate(): " Al Viro
@ 2025-01-10  2:43   ` Al Viro
  2025-01-10  3:11     ` Linus Torvalds
  2025-01-10  9:21   ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Jan Kara
  19 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-10  2:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.

->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain.  Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.

Solution:
	* have d_move() done in ->rename() rather than in its caller
	* maintain a 9p-private rwsem (per-filesystem)
	* hold it exclusive over the relevant part of ->rename()
	* hold it shared over the places where we want the path.

That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control.  However, there's
also __d_unalias(), which isn't covered by any of that.

If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.

An approach to fixing that would be a couple of optional methods, so that
__d_unalias() would do
	if alias->d_op->d_unalias_trylock != NULL
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if alias->d_op->d_unalias_unlock != NULL
		alias->d_op->d_unalias_unlock(alias)
where it currently does __d_move().  9p instances would be down_write_trylock()
and up_write() of ->rename_sem.
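
The pairing can be modelled in userspace roughly as follows.  This is only
a single-threaded sketch under stated assumptions: model_rwsem is a toy
replacement for a real rwsem, the moves counter stands in for __d_move(),
and none of the model_* names exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model of an rwsem: just enough state for trylock. */
struct model_rwsem {
	int readers;
	bool writer;
};

static bool model_down_write_trylock(struct model_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void model_up_write(struct model_rwsem *sem)
{
	sem->writer = false;
}

struct model_dentry;

/* Both hooks are optional, as in the pseudocode above. */
struct model_dentry_ops {
	bool (*unalias_trylock)(const struct model_dentry *);
	void (*unalias_unlock)(const struct model_dentry *);
};

struct model_dentry {
	const struct model_dentry_ops *ops;
	struct model_rwsem *rename_sem;	/* per-filesystem in the real code */
	int moves;			/* counts simulated __d_move() calls */
};

static bool model_trylock(const struct model_dentry *d)
{
	return model_down_write_trylock(d->rename_sem);
}

static void model_unlock(const struct model_dentry *d)
{
	model_up_write(d->rename_sem);
}

static const struct model_dentry_ops model_ops = {
	.unalias_trylock = model_trylock,
	.unalias_unlock = model_unlock,
};

/* Returns 0 after a move, -ESTALE if the trylock hook refused. */
static int model_unalias(struct model_dentry *alias)
{
	const struct model_dentry_ops *ops = alias->ops;

	if (ops && ops->unalias_trylock && !ops->unalias_trylock(alias))
		return -ESTALE;
	alias->moves++;		/* stands in for __d_move() */
	if (ops && ops->unalias_unlock)
		ops->unalias_unlock(alias);
	return 0;
}
```

With a reader holding the semaphore (somebody building a path), the
trylock hook refuses and the move is skipped with -ESTALE instead of
blocking; a dentry with no hooks is moved unconditionally.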

However, to reduce dentry_operations bloat, let's add one method instead -
->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and
->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).

Another possible variant would be to hold ->rename_sem exclusive around
d_splice_alias() calls in 9p ->lookup(), but that would cause a lot of
contention on that rwsem and it's filesystem-wide, so let's not go there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/locking.rst |  2 ++
 Documentation/filesystems/vfs.rst     | 19 +++++++++++++++++++
 fs/9p/v9fs.h                          |  2 +-
 fs/9p/vfs_dentry.c                    | 13 +++++++++++++
 fs/dcache.c                           |  6 ++++++
 include/linux/dcache.h                |  1 +
 6 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..6e20282447a0 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,7 @@ prototypes::
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_want_unalias)(const struct dentry *, bool);
 
 locking rules:
 
@@ -50,6 +51,7 @@ d_dname:	   no		no		no		no
 d_automount:	   no		no		yes		no
 d_manage:	   no		no		yes (ref-walk)	maybe
 d_real		   no		no		yes 		no
+d_want_unalias	   yes		no		no 		no
 ================== ===========	========	==============	========
 
 inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..07d4b4deb252 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,7 @@ defined:
 		struct vfsmount *(*d_automount)(struct path *);
 		int (*d_manage)(const struct path *, bool);
 		struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+		bool (*d_want_unalias)(const struct dentry *, bool);
 	};
 
 ``d_revalidate``
@@ -1428,6 +1429,24 @@ defined:
 
 	For non-regular files, the 'dentry' argument is returned.
 
+``d_want_unalias``
+	if present, will be called by d_splice_alias() before and after
+	moving a preexisting attached alias.  The second argument is
+	true for call before __d_move() and false for the call after.
+	Returning false on the first call prevents __d_move(), making
+	d_splice_alias() fail with -ESTALE; return value on the second
+	call is ignored.
+
+	Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+	and d_exchange() calls from the outside of filesystem methods;
+	however, it does not guarantee that attached dentries won't
+	be renamed or moved by d_splice_alias() finding a preexisting
+	alias for a directory inode.  Normally we would not care;
+	however, something that wants to stabilize the entire path to
+	root over a blocking operation might need that.  See 9p for one
+	(and hopefully only) example.
+
+
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries.  Child dentries are basically like files in a
 directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
 	return inode->i_sb->s_fs_info;
 }
 
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
 {
 	return dentry->d_sb->s_fs_info;
 }
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..b2222df318d0 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,27 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	return __v9fs_lookup_revalidate(dentry, flags);
 }
 
+static bool v9fs_dentry_want_unalias(const struct dentry *dentry, bool lock)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+
+	if (lock)
+		return down_write_trylock(&v9ses->rename_sem);
+
+	up_write(&v9ses->rename_sem);
+	return true;
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
 	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
+	.d_want_unalias = v9fs_dentry_want_unalias,
 };
 
 const struct dentry_operations v9fs_dentry_operations = {
 	.d_delete = always_delete_dentry,
 	.d_release = v9fs_dentry_release,
+	.d_want_unalias = v9fs_dentry_want_unalias,
 };
diff --git a/fs/dcache.c b/fs/dcache.c
index 7d42ca367522..efbfbc1bc5d4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2947,6 +2947,7 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 {
 	struct mutex *m1 = NULL;
 	struct rw_semaphore *m2 = NULL;
+	bool (*extra_trylock)(const struct dentry *, bool);
 	int ret = -ESTALE;
 
 	/* If alias and dentry share a parent, then no extra locks required */
@@ -2961,7 +2962,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_rwsem;
 out_unalias:
+	extra_trylock = alias->d_op->d_want_unalias;
+	if (extra_trylock && !extra_trylock(alias, true))
+		goto out_err;
 	__d_move(alias, dentry, false);
+	if (extra_trylock)
+		extra_trylock(alias, false);
 	ret = 0;
 out_err:
 	if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..2b33b9d04a8f 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,7 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_want_unalias)(const struct dentry *, bool);
 } ____cacheline_aligned;
 
 /*
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [PATCH 19/20] orangefs_d_revalidate(): use stable parent inode and name passed by caller
  2025-01-10  2:43   ` [PATCH 19/20] orangefs_d_revalidate(): " Al Viro
@ 2025-01-10  3:06     ` Linus Torvalds
  0 siblings, 0 replies; 96+ messages in thread
From: Linus Torvalds @ 2025-01-10  3:06 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos

On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> ->d_name use is a UAF.

.. let's change "is a UAF" to "can be a potential UAF" in that sentence, ok?

The way you phrase it, it sounds like it's an acute problem, rather
than a "nobody has ever seen it in practice, but in theory with just
the right patterns and memory pressure".

Anyway, apart from this (and similar wording in one or two others,
iirc) ack on all the patches up until the last one. I'll write a
separate note for that one.

          Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 20/20] 9p: fix ->rename_sem exclusion
  2025-01-10  2:43   ` [PATCH 20/20] 9p: fix ->rename_sem exclusion Al Viro
@ 2025-01-10  3:11     ` Linus Torvalds
  2025-01-10  5:53       ` Al Viro
  0 siblings, 1 reply; 96+ messages in thread
From: Linus Torvalds @ 2025-01-10  3:11 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos

On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> However, to reduce dentry_operations bloat, let's add one method instead -
> ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and
> ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).

Ugh.

So of all the patches, this is the one that I hate.

I absolutely detest interfaces with random true/false arguments, and
when it is about locking, the "detest" becomes something even darker
and more visceral.

I think it would be a lot better as separate ops, considering that

 (a) we'll probably have only one or two actual users, so it's not
like it complicates things on that side

 (b) we don't have *that* many "struct dentry_operations" structures:
I think they are all statically generated constant structures
(typically one or two per filesystem), so it's not like we're saving
memory by merging those pointers into one.

Please?

           Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 20/20] 9p: fix ->rename_sem exclusion
  2025-01-10  3:11     ` Linus Torvalds
@ 2025-01-10  5:53       ` Al Viro
  0 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10  5:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos

On Thu, Jan 09, 2025 at 07:11:46PM -0800, Linus Torvalds wrote:
> On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > However, to reduce dentry_operations bloat, let's add one method instead -
> > ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and
> > ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).
> 
> Ugh.
> 
> So of all the patches, this is the one that I hate.
> 
> I absolutely detest interfaces with random true/false arguments, and
> when it is about locking, the "detest" becomes something even darker
> and more visceral.
> 
> I think it would be a lot better as separate ops, considering that
> 
>  (a) we'll probably have only one or two actual users, so it's not
> like it complicates things on that side
> 
>  (b) we don't have *that* many "struct dentry_operations" structures:
> I think they are all statically generated constant structures
> (typically one or two per filesystem), so it's not like we're saving
> memory by merging those pointers into one.

ACK.

> Please?

Done and force-pushed; see below for updated variant of that commit

commit 1f28d77e868e63a07ab50e7fe161fc366b2fb23b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Jan 5 21:33:17 2025 -0500

    9p: fix ->rename_sem exclusion
    
    9p wants to be able to build a path from given dentry to fs root and keep
    it valid over a blocking operation.
    
    ->s_vfs_rename_mutex would be a natural candidate, but there are places
    where we need that and where we have no way to tell if ->s_vfs_rename_mutex
    is already held deeper in callchain.  Moreover, it's only held for
    cross-directory renames; name changes within the same directory happen
    without it.
    
    Solution:
            * have d_move() done in ->rename() rather than in its caller
            * maintain a 9p-private rwsem (per-filesystem)
            * hold it exclusive over the relevant part of ->rename()
            * hold it shared over the places where we want the path.
    
    That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
    and d_exchange() calls under filesystem's control.  However, there's
    also __d_unalias(), which isn't covered by any of that.
    
    If ->lookup() hits a directory inode with preexisting dentry elsewhere
    (due to e.g. rename done on server behind our back), d_splice_alias()
    called by ->lookup() will move/rename that alias.
    
    Add a couple of optional methods, so that __d_unalias() would do
            if alias->d_op->d_unalias_trylock != NULL
                    if (!alias->d_op->d_unalias_trylock(alias))
                            fail (resulting in -ESTALE from lookup)
            __d_move(...)
            if alias->d_op->d_unalias_unlock != NULL
                    alias->d_op->d_unalias_unlock(alias)
    where it currently does __d_move().  9p instances do down_write_trylock()
    and up_write() of ->rename_sem.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..d20a32b77b60 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,8 @@ prototypes::
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_unalias_trylock)(const struct dentry *);
+	void (*d_unalias_unlock)(const struct dentry *);
 
 locking rules:
 
@@ -50,6 +52,8 @@ d_dname:	   no		no		no		no
 d_automount:	   no		no		yes		no
 d_manage:	   no		no		yes (ref-walk)	maybe
 d_real		   no		no		yes 		no
+d_unalias_trylock  yes		no		no 		no
+d_unalias_unlock   yes		no		no 		no
 ================== ===========	========	==============	========
 
 inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..31eea688609a 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,8 @@ defined:
 		struct vfsmount *(*d_automount)(struct path *);
 		int (*d_manage)(const struct path *, bool);
 		struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+		bool (*d_unalias_trylock)(const struct dentry *);
+		void (*d_unalias_unlock)(const struct dentry *);
 	};
 
 ``d_revalidate``
@@ -1428,6 +1430,25 @@ defined:
 
 	For non-regular files, the 'dentry' argument is returned.
 
+``d_unalias_trylock``
+	if present, will be called by d_splice_alias() before moving a
+	preexisting attached alias.  Returning false prevents __d_move(),
+	making d_splice_alias() fail with -ESTALE.
+
+	Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+	and d_exchange() calls from the outside of filesystem methods;
+	however, it does not guarantee that attached dentries won't
+	be renamed or moved by d_splice_alias() finding a preexisting
+	alias for a directory inode.  Normally we would not care;
+	however, something that wants to stabilize the entire path to
+	root over a blocking operation might need that.  See 9p for one
+	(and hopefully only) example.
+
+``d_unalias_unlock``
+	should be paired with ``d_unalias_trylock``; that one is called after
+	__d_move() call in __d_unalias().
+
+
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries.  Child dentries are basically like files in a
 directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
 	return inode->i_sb->s_fs_info;
 }
 
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
 {
 	return dentry->d_sb->s_fs_info;
 }
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..5061f192eafd 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,30 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	return __v9fs_lookup_revalidate(dentry, flags);
 }
 
+static bool v9fs_dentry_unalias_trylock(const struct dentry *dentry)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+	return down_write_trylock(&v9ses->rename_sem);
+}
+
+static void v9fs_dentry_unalias_unlock(const struct dentry *dentry)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+	up_write(&v9ses->rename_sem);
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
 	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
+	.d_unalias_trylock = v9fs_dentry_unalias_trylock,
+	.d_unalias_unlock = v9fs_dentry_unalias_unlock,
 };
 
 const struct dentry_operations v9fs_dentry_operations = {
 	.d_delete = always_delete_dentry,
 	.d_release = v9fs_dentry_release,
+	.d_unalias_trylock = v9fs_dentry_unalias_trylock,
+	.d_unalias_unlock = v9fs_dentry_unalias_unlock,
 };
diff --git a/fs/dcache.c b/fs/dcache.c
index 7d42ca367522..2ac614fc8bba 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2961,7 +2961,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_rwsem;
 out_unalias:
+	if (alias->d_op->d_unalias_trylock &&
+	    !alias->d_op->d_unalias_trylock(alias))
+		goto out_err;
 	__d_move(alias, dentry, false);
+	if (alias->d_op->d_unalias_unlock)
+		alias->d_op->d_unalias_unlock(alias);
 	ret = 0;
 out_err:
 	if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..9a1a30857763 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,8 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_unalias_trylock)(const struct dentry *);
+	void (*d_unalias_unlock)(const struct dentry *);
 } ____cacheline_aligned;
 
 /*

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [PATCH 04/20] dissolve external_name.u into separate members
  2025-01-10  2:42   ` [PATCH 04/20] dissolve external_name.u into separate members Al Viro
@ 2025-01-10  7:34     ` David Howells
  2025-01-10 16:46       ` Al Viro
  0 siblings, 1 reply; 96+ messages in thread
From: David Howells @ 2025-01-10  7:34 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, linux-fsdevel, agruenba, amir73il, brauner, ceph-devel,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

Al Viro <viro@zeniv.linux.org.uk> wrote:

>  struct external_name {
> -	struct {
> -		atomic_t count;		// ->count and ->head can't be combined
> -		struct rcu_head head;	// see take_dentry_name_snapshot()
> -	} u;
> +	atomic_t count;		// ->count and ->head can't be combined
> +	struct rcu_head head;	// see take_dentry_name_snapshot()
>  	unsigned char name[];
>  };

This gets you a 4-byte hole between count and head on a 64-bit system.  Did
you want to flip the order of count and head?

David


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 05/20] ext4 fast_commit: make use of name_snapshot primitives
  2025-01-10  2:42   ` [PATCH 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
@ 2025-01-10  9:15     ` Jan Kara
  0 siblings, 0 replies; 96+ messages in thread
From: Jan Kara @ 2025-01-10  9:15 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

On Fri 10-01-25 02:42:48, Al Viro wrote:
> ... rather than open-coding them.  As a bonus, that avoids the pointless
> work with extra allocations, etc. for long names.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Nice! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/fast_commit.c | 29 +++++------------------------
>  fs/ext4/fast_commit.h |  3 +--
>  2 files changed, 6 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 26c4fc37edcf..da4263a14a20 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -322,9 +322,7 @@ void ext4_fc_del(struct inode *inode)
>  	WARN_ON(!list_empty(&ei->i_fc_dilist));
>  	spin_unlock(&sbi->s_fc_lock);
>  
> -	if (fc_dentry->fcd_name.name &&
> -		fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> -		kfree(fc_dentry->fcd_name.name);
> +	release_dentry_name_snapshot(&fc_dentry->fcd_name);
>  	kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
>  
>  	return;
> @@ -449,22 +447,7 @@ static int __track_dentry_update(handle_t *handle, struct inode *inode,
>  	node->fcd_op = dentry_update->op;
>  	node->fcd_parent = dir->i_ino;
>  	node->fcd_ino = inode->i_ino;
> -	if (dentry->d_name.len > DNAME_INLINE_LEN) {
> -		node->fcd_name.name = kmalloc(dentry->d_name.len, GFP_NOFS);
> -		if (!node->fcd_name.name) {
> -			kmem_cache_free(ext4_fc_dentry_cachep, node);
> -			ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, handle);
> -			mutex_lock(&ei->i_fc_lock);
> -			return -ENOMEM;
> -		}
> -		memcpy((u8 *)node->fcd_name.name, dentry->d_name.name,
> -			dentry->d_name.len);
> -	} else {
> -		memcpy(node->fcd_iname, dentry->d_name.name,
> -			dentry->d_name.len);
> -		node->fcd_name.name = node->fcd_iname;
> -	}
> -	node->fcd_name.len = dentry->d_name.len;
> +	take_dentry_name_snapshot(&node->fcd_name, dentry);
>  	INIT_LIST_HEAD(&node->fcd_dilist);
>  	spin_lock(&sbi->s_fc_lock);
>  	if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
> @@ -832,7 +815,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
>  {
>  	struct ext4_fc_dentry_info fcd;
>  	struct ext4_fc_tl tl;
> -	int dlen = fc_dentry->fcd_name.len;
> +	int dlen = fc_dentry->fcd_name.name.len;
>  	u8 *dst = ext4_fc_reserve_space(sb,
>  			EXT4_FC_TAG_BASE_LEN + sizeof(fcd) + dlen, crc);
>  
> @@ -847,7 +830,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
>  	dst += EXT4_FC_TAG_BASE_LEN;
>  	memcpy(dst, &fcd, sizeof(fcd));
>  	dst += sizeof(fcd);
> -	memcpy(dst, fc_dentry->fcd_name.name, dlen);
> +	memcpy(dst, fc_dentry->fcd_name.name.name, dlen);
>  
>  	return true;
>  }
> @@ -1328,9 +1311,7 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>  		list_del_init(&fc_dentry->fcd_dilist);
>  		spin_unlock(&sbi->s_fc_lock);
>  
> -		if (fc_dentry->fcd_name.name &&
> -			fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> -			kfree(fc_dentry->fcd_name.name);
> +		release_dentry_name_snapshot(&fc_dentry->fcd_name);
>  		kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
>  		spin_lock(&sbi->s_fc_lock);
>  	}
> diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
> index 2fadb2c4780c..3bd534e4dbbf 100644
> --- a/fs/ext4/fast_commit.h
> +++ b/fs/ext4/fast_commit.h
> @@ -109,8 +109,7 @@ struct ext4_fc_dentry_update {
>  	int fcd_op;		/* Type of update create / unlink / link */
>  	int fcd_parent;		/* Parent inode number */
>  	int fcd_ino;		/* Inode number */
> -	struct qstr fcd_name;	/* Dirent name */
> -	unsigned char fcd_iname[DNAME_INLINE_LEN];	/* Dirent name string */
> +	struct name_snapshot fcd_name;	/* Dirent name */
>  	struct list_head fcd_list;
>  	struct list_head fcd_dilist;
>  };
> -- 
> 2.39.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                     ` (18 preceding siblings ...)
  2025-01-10  2:43   ` [PATCH 20/20] 9p: fix ->rename_sem exclusion Al Viro
@ 2025-01-10  9:21   ` Jan Kara
  19 siblings, 0 replies; 96+ messages in thread
From: Jan Kara @ 2025-01-10  9:21 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

On Fri 10-01-25 02:42:44, Al Viro wrote:
> ... calling the number of words DNAME_INLINE_WORDS.
> 
> The next step will be to have a structure to hold inline name arrays
> (both in dentry and in name_snapshot) and use that to alias the
> existing arrays of unsigned char there.  That will allow both
> full-structure copies and convenient word-by-word accesses.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/dcache.c            | 4 +---
>  include/linux/dcache.h | 8 +++++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b4d5e9e1e43d..ea0f0bea511b 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2748,9 +2748,7 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
>  			/*
>  			 * Both are internal.
>  			 */
> -			unsigned int i;
> -			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
> -			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
> +			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
>  				swap(((long *) &dentry->d_iname)[i],
>  				     ((long *) &target->d_iname)[i]);
>  			}
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index bff956f7b2b9..42dd89beaf4e 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -68,15 +68,17 @@ extern const struct qstr dotdot_name;
>   * large memory footprint increase).
>   */
>  #ifdef CONFIG_64BIT
> -# define DNAME_INLINE_LEN 40 /* 192 bytes */
> +# define DNAME_INLINE_WORDS 5 /* 192 bytes */
>  #else
>  # ifdef CONFIG_SMP
> -#  define DNAME_INLINE_LEN 36 /* 128 bytes */
> +#  define DNAME_INLINE_WORDS 9 /* 128 bytes */
>  # else
> -#  define DNAME_INLINE_LEN 44 /* 128 bytes */
> +#  define DNAME_INLINE_WORDS 11 /* 128 bytes */
>  # endif
>  #endif
>  
> +#define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
> +
>  #define d_lock	d_lockref.lock
>  
>  struct dentry {
> -- 
> 2.39.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long
  2025-01-10  2:42   ` [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
@ 2025-01-10  9:35     ` Jan Kara
  2025-01-10 16:24       ` Al Viro
  0 siblings, 1 reply; 96+ messages in thread
From: Jan Kara @ 2025-01-10  9:35 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

On Fri 10-01-25 02:42:45, Al Viro wrote:
> ... so that they can be copied with struct assignment (which generates
> better code) and accessed word-by-word.
> 
> The type is union shortname_store; it's a union of arrays of
> unsigned char and unsigned long.
> 
> struct name_snapshot.inline_name turned into union shortname_store;
> users (all in fs/dcache.c) adjusted.
> 
> struct dentry.d_iname has some users outside of fs/dcache.c; to
> reduce the amount of noise in commit, it is replaced with
> union shortname_store d_shortname and d_iname is turned into a macro
> that expands to d_shortname.string (similar to d_lock handling, hopefully
> temporary - most, if not all, users shouldn't be messing with it).
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

I was wondering for a while whether always copying 40 bytes instead of
only d_name.len bytes could have adverse performance effects (an additional
cacheline fetched / dirtied), but I don't think any path copying the name
is performance-critical enough for that to matter, if it is noticeable
at all.

								Honza


> ---
>  fs/dcache.c                                  | 43 +++++++++-----------
>  include/linux/dcache.h                       | 10 ++++-
>  tools/testing/selftests/bpf/progs/find_vma.c |  2 +-
>  3 files changed, 28 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index ea0f0bea511b..52662a5d08e4 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -324,7 +324,7 @@ static void __d_free_external(struct rcu_head *head)
>  
>  static inline int dname_external(const struct dentry *dentry)
>  {
> -	return dentry->d_name.name != dentry->d_iname;
> +	return dentry->d_name.name != dentry->d_shortname.string;
>  }
>  
>  void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
> @@ -334,9 +334,8 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
>  	if (unlikely(dname_external(dentry))) {
>  		atomic_inc(&external_name(dentry)->u.count);
>  	} else {
> -		memcpy(name->inline_name, dentry->d_iname,
> -		       dentry->d_name.len + 1);
> -		name->name.name = name->inline_name;
> +		name->inline_name = dentry->d_shortname;
> +		name->name.name = name->inline_name.string;
>  	}
>  	spin_unlock(&dentry->d_lock);
>  }
> @@ -344,7 +343,7 @@ EXPORT_SYMBOL(take_dentry_name_snapshot);
>  
>  void release_dentry_name_snapshot(struct name_snapshot *name)
>  {
> -	if (unlikely(name->name.name != name->inline_name)) {
> +	if (unlikely(name->name.name != name->inline_name.string)) {
>  		struct external_name *p;
>  		p = container_of(name->name.name, struct external_name, name[0]);
>  		if (unlikely(atomic_dec_and_test(&p->u.count)))
> @@ -1654,10 +1653,10 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
>  	 * will still always have a NUL at the end, even if we might
>  	 * be overwriting an internal NUL character
>  	 */
> -	dentry->d_iname[DNAME_INLINE_LEN-1] = 0;
> +	dentry->d_shortname.string[DNAME_INLINE_LEN-1] = 0;
>  	if (unlikely(!name)) {
>  		name = &slash_name;
> -		dname = dentry->d_iname;
> +		dname = dentry->d_shortname.string;
>  	} else if (name->len > DNAME_INLINE_LEN-1) {
>  		size_t size = offsetof(struct external_name, name[1]);
>  		struct external_name *p = kmalloc(size + name->len,
> @@ -1670,7 +1669,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
>  		atomic_set(&p->u.count, 1);
>  		dname = p->name;
>  	} else  {
> -		dname = dentry->d_iname;
> +		dname = dentry->d_shortname.string;
>  	}	
>  
>  	dentry->d_name.len = name->len;
> @@ -2729,10 +2728,9 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
>  			 * dentry:internal, target:external.  Steal target's
>  			 * storage and make target internal.
>  			 */
> -			memcpy(target->d_iname, dentry->d_name.name,
> -					dentry->d_name.len + 1);
>  			dentry->d_name.name = target->d_name.name;
> -			target->d_name.name = target->d_iname;
> +			target->d_shortname = dentry->d_shortname;
> +			target->d_name.name = target->d_shortname.string;
>  		}
>  	} else {
>  		if (unlikely(dname_external(dentry))) {
> @@ -2740,18 +2738,16 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
>  			 * dentry:external, target:internal.  Give dentry's
>  			 * storage to target and make dentry internal
>  			 */
> -			memcpy(dentry->d_iname, target->d_name.name,
> -					target->d_name.len + 1);
>  			target->d_name.name = dentry->d_name.name;
> -			dentry->d_name.name = dentry->d_iname;
> +			dentry->d_shortname = target->d_shortname;
> +			dentry->d_name.name = dentry->d_shortname.string;
>  		} else {
>  			/*
>  			 * Both are internal.
>  			 */
> -			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
> -				swap(((long *) &dentry->d_iname)[i],
> -				     ((long *) &target->d_iname)[i]);
> -			}
> +			for (int i = 0; i < DNAME_INLINE_WORDS; i++)
> +				swap(dentry->d_shortname.words[i],
> +				     target->d_shortname.words[i]);
>  		}
>  	}
>  	swap(dentry->d_name.hash_len, target->d_name.hash_len);
> @@ -2766,9 +2762,8 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
>  		atomic_inc(&external_name(target)->u.count);
>  		dentry->d_name = target->d_name;
>  	} else {
> -		memcpy(dentry->d_iname, target->d_name.name,
> -				target->d_name.len + 1);
> -		dentry->d_name.name = dentry->d_iname;
> +		dentry->d_shortname = target->d_shortname;
> +		dentry->d_name.name = dentry->d_shortname.string;
>  		dentry->d_name.hash_len = target->d_name.hash_len;
>  	}
>  	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
> @@ -3101,12 +3096,12 @@ void d_mark_tmpfile(struct file *file, struct inode *inode)
>  {
>  	struct dentry *dentry = file->f_path.dentry;
>  
> -	BUG_ON(dentry->d_name.name != dentry->d_iname ||
> +	BUG_ON(dname_external(dentry) ||
>  		!hlist_unhashed(&dentry->d_u.d_alias) ||
>  		!d_unlinked(dentry));
>  	spin_lock(&dentry->d_parent->d_lock);
>  	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> -	dentry->d_name.len = sprintf(dentry->d_iname, "#%llu",
> +	dentry->d_name.len = sprintf(dentry->d_shortname.string, "#%llu",
>  				(unsigned long long)inode->i_ino);
>  	spin_unlock(&dentry->d_lock);
>  	spin_unlock(&dentry->d_parent->d_lock);
> @@ -3194,7 +3189,7 @@ static void __init dcache_init(void)
>  	 */
>  	dentry_cache = KMEM_CACHE_USERCOPY(dentry,
>  		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
> -		d_iname);
> +		d_shortname.string);
>  
>  	/* Hash may have been set up in dcache_init_early */
>  	if (!hashdist)
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 42dd89beaf4e..8bc567a35718 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -79,7 +79,13 @@ extern const struct qstr dotdot_name;
>  
>  #define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
>  
> +union shortname_store {
> +	unsigned char string[DNAME_INLINE_LEN];
> +	unsigned long words[DNAME_INLINE_WORDS];
> +};
> +
>  #define d_lock	d_lockref.lock
> +#define d_iname d_shortname.string
>  
>  struct dentry {
>  	/* RCU lookup touched fields */
> @@ -90,7 +96,7 @@ struct dentry {
>  	struct qstr d_name;
>  	struct inode *d_inode;		/* Where the name belongs to - NULL is
>  					 * negative */
> -	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
> +	union shortname_store d_shortname;
>  	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
>  
>  	/* Ref lookup also touches following */
> @@ -591,7 +597,7 @@ static inline struct inode *d_real_inode(const struct dentry *dentry)
>  
>  struct name_snapshot {
>  	struct qstr name;
> -	unsigned char inline_name[DNAME_INLINE_LEN];
> +	union shortname_store inline_name;
>  };
>  void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
>  void release_dentry_name_snapshot(struct name_snapshot *);
> diff --git a/tools/testing/selftests/bpf/progs/find_vma.c b/tools/testing/selftests/bpf/progs/find_vma.c
> index 38034fb82530..02b82774469c 100644
> --- a/tools/testing/selftests/bpf/progs/find_vma.c
> +++ b/tools/testing/selftests/bpf/progs/find_vma.c
> @@ -25,7 +25,7 @@ static long check_vma(struct task_struct *task, struct vm_area_struct *vma,
>  {
>  	if (vma->vm_file)
>  		bpf_probe_read_kernel_str(d_iname, DNAME_INLINE_LEN - 1,
> -					  vma->vm_file->f_path.dentry->d_iname);
> +					  vma->vm_file->f_path.dentry->d_shortname.string);
>  
>  	/* check for VM_EXEC */
>  	if (vma->vm_flags & VM_EXEC)
> -- 
> 2.39.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 03/20] make take_dentry_name_snapshot() lockless
  2025-01-10  2:42   ` [PATCH 03/20] make take_dentry_name_snapshot() lockless Al Viro
@ 2025-01-10  9:45     ` Jan Kara
  0 siblings, 0 replies; 96+ messages in thread
From: Jan Kara @ 2025-01-10  9:45 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

On Fri 10-01-25 02:42:46, Al Viro wrote:
> Use ->d_seq instead of grabbing ->d_lock; in case of shortname dentries
> that avoids any stores to shared data objects and in case of long names
> we are down to (unavoidable) atomic_inc on the external_name refcount.
> 
> Makes the thing safer as well - the areas where ->d_seq is held odd are
> all nested inside the areas where ->d_lock is held, and the latter are
> much more numerous.
> 
> NOTE: now that there is a lockless path where we might try to grab
> a reference to an already doomed external_name instance, it is no
> longer possible for external_name.u.count and external_name.u.head
> to share space (kudos to Linus for spotting that).
> 
> To reduce the noise this commit just turns external_name.u into
> a struct (instead of union); the next commit will dissolve it.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Cool. One less lock roundtrip on relatively hot fsnotify path :). Feel free
to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
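
For reference, the two ingredients named in the quoted commit message (a
sequence-counter read section and a "get a reference unless it is already
zero" step) can be sketched in userspace C11.  This is a rough illustration,
not the kernel implementation: the toy_* names are hypothetical, and default
sequentially consistent atomics are used instead of the kernel's barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy sequence counter: the value is odd while a writer is mid-update. */
struct toy_seqcount {
	atomic_uint sequence;
};

/* Reader side: wait until no writer is active, then remember the value. */
static unsigned toy_read_begin(struct toy_seqcount *s)
{
	unsigned seq;

	while ((seq = atomic_load(&s->sequence)) & 1)
		;	/* writer in progress, spin */
	return seq;
}

/* Reader side: true if a writer ran since toy_read_begin(). */
static bool toy_read_retry(struct toy_seqcount *s, unsigned seq)
{
	return atomic_load(&s->sequence) != seq;
}

/* Writer side: make the counter odd for the duration of the update. */
static void toy_write_begin(struct toy_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

static void toy_write_end(struct toy_seqcount *s)
{
	assert(atomic_load(&s->sequence) & 1);	/* must pair with write_begin */
	atomic_fetch_add(&s->sequence, 1);
}

/*
 * Take a reference only if the count has not already dropped to zero.
 * A zero count means the object is doomed and must not be revived.
 */
static bool toy_inc_not_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}
```

The second helper is also why the counter needs storage of its own: if it
overlapped the RCU head, a reader arriving after the last reference was
dropped could see a nonzero value that is no longer a count at all.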

> ---
>  fs/dcache.c | 35 +++++++++++++++++++++++++----------
>  1 file changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 52662a5d08e4..f387dc97df86 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -296,9 +296,9 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
>  }
>  
>  struct external_name {
> -	union {
> -		atomic_t count;
> -		struct rcu_head head;
> +	struct {
> +		atomic_t count;		// ->count and ->head can't be combined
> +		struct rcu_head head;	// see take_dentry_name_snapshot()
>  	} u;
>  	unsigned char name[];
>  };
> @@ -329,15 +329,30 @@ static inline int dname_external(const struct dentry *dentry)
>  
>  void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
>  {
> -	spin_lock(&dentry->d_lock);
> -	name->name = dentry->d_name;
> -	if (unlikely(dname_external(dentry))) {
> -		atomic_inc(&external_name(dentry)->u.count);
> -	} else {
> +	unsigned seq;
> +	const unsigned char *s;
> +
> +	rcu_read_lock();
> +retry:
> +	seq = read_seqcount_begin(&dentry->d_seq);
> +	s = READ_ONCE(dentry->d_name.name);
> +	name->name.hash_len = dentry->d_name.hash_len;
> +	name->name.name = name->inline_name.string;
> +	if (likely(s == dentry->d_shortname.string)) {
>  		name->inline_name = dentry->d_shortname;
> -		name->name.name = name->inline_name.string;
> +	} else {
> +		struct external_name *p;
> +		p = container_of(s, struct external_name, name[0]);
> +		// get a valid reference
> +		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
> +			goto retry;
> +		name->name.name = s;
>  	}
> -	spin_unlock(&dentry->d_lock);
> +	if (read_seqcount_retry(&dentry->d_seq, seq)) {
> +		release_dentry_name_snapshot(name);
> +		goto retry;
> +	}
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(take_dentry_name_snapshot);
>  
> -- 
> 2.39.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-10  2:43   ` [PATCH 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
@ 2025-01-10  9:54     ` Jan Kara
  0 siblings, 0 replies; 96+ messages in thread
From: Jan Kara @ 2025-01-10  9:54 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

On Fri 10-01-25 02:43:01, Al Viro wrote:
> theoretically, ->d_name use in there is a UAF, but only if you are messing with
> tracepoints...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ocfs2/dcache.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
> index ecb1ce6301c4..1873bbbb7e5b 100644
> --- a/fs/ocfs2/dcache.c
> +++ b/fs/ocfs2/dcache.c
> @@ -45,8 +45,7 @@ static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
>  	inode = d_inode(dentry);
>  	osb = OCFS2_SB(dentry->d_sb);
>  
> -	trace_ocfs2_dentry_revalidate(dentry, dentry->d_name.len,
> -				      dentry->d_name.name);
> +	trace_ocfs2_dentry_revalidate(dentry, name->len, name->name);
>  
>  	/* For a negative dentry -
>  	 * check the generation number of the parent and compare with the
> @@ -54,12 +53,8 @@ static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
>  	 */
>  	if (inode == NULL) {
>  		unsigned long gen = (unsigned long) dentry->d_fsdata;
> -		unsigned long pgen;
> -		spin_lock(&dentry->d_lock);
> -		pgen = OCFS2_I(d_inode(dentry->d_parent))->ip_dir_lock_gen;
> -		spin_unlock(&dentry->d_lock);
> -		trace_ocfs2_dentry_revalidate_negative(dentry->d_name.len,
> -						       dentry->d_name.name,
> +		unsigned long pgen = OCFS2_I(dir)->ip_dir_lock_gen;
> +		trace_ocfs2_dentry_revalidate_negative(name->len, name->name,
>  						       pgen, gen);
>  		if (gen != pgen)
>  			goto bail;
> -- 
> 2.39.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long
  2025-01-10  9:35     ` Jan Kara
@ 2025-01-10 16:24       ` Al Viro
  0 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10 16:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, krisman, linux-nfs, miklos, torvalds

On Fri, Jan 10, 2025 at 10:35:14AM +0100, Jan Kara wrote:

> I was wondering for a while whether always copying 40 bytes instead of
> only d_name.len bytes could have adverse performance effects (an additional
> cacheline fetched / dirtied), but I don't think any path copying the name
> is performance-critical enough for that to matter, if it is noticeable
> at all.

FWIW, I'd expect it to be a slight win overall; we'll see if profiling shows
otherwise, but...
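
To make the trade-off concrete, here is a small userspace sketch of the
union from patch 02 and the two access styles it enables: a whole-storage
struct assignment and a word-by-word swap.  The toy_* names and the helper
functions are assumptions made for illustration; only the union layout and
the 5-word size for 64-bit are taken from the patches.

```c
#include <assert.h>
#include <string.h>

#define TOY_INLINE_WORDS 5	/* the CONFIG_64BIT value from patch 01 */
#define TOY_INLINE_LEN (TOY_INLINE_WORDS * sizeof(unsigned long))

/* Same shape as union shortname_store: bytes and words share storage. */
union toy_shortname {
	unsigned char string[TOY_INLINE_LEN];
	unsigned long words[TOY_INLINE_WORDS];
};

/* Hypothetical helper: store a short NUL-terminated name, zero the rest. */
static void toy_set(union toy_shortname *dst, const char *s)
{
	assert(strlen(s) < TOY_INLINE_LEN);
	memset(dst, 0, sizeof(*dst));
	memcpy(dst->string, s, strlen(s) + 1);
}

/*
 * Fixed-size copy of the whole storage.  The size is known at compile
 * time, so no length needs to be loaded before copying.
 */
static void toy_copy(union toy_shortname *dst, const union toy_shortname *src)
{
	*dst = *src;
}

/* Word-by-word swap, as in the "both are internal" case of swap_names(). */
static void toy_swap(union toy_shortname *a, union toy_shortname *b)
{
	for (int i = 0; i < TOY_INLINE_WORDS; i++) {
		unsigned long tmp = a->words[i];

		a->words[i] = b->words[i];
		b->words[i] = tmp;
	}
}
```

The copy always moves the full inline buffer regardless of name length,
which is the extra traffic Jan asks about; in exchange the compiler sees a
constant-size assignment instead of a variable-length memcpy().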

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 04/20] dissolve external_name.u into separate members
  2025-01-10  7:34     ` David Howells
@ 2025-01-10 16:46       ` Al Viro
  0 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-10 16:46 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, hubcap,
	jack, krisman, linux-nfs, miklos, torvalds

On Fri, Jan 10, 2025 at 07:34:11AM +0000, David Howells wrote:
> Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> >  struct external_name {
> > -	struct {
> > -		atomic_t count;		// ->count and ->head can't be combined
> > -		struct rcu_head head;	// see take_dentry_name_snapshot()
> > -	} u;
> > +	atomic_t count;		// ->count and ->head can't be combined
> > +	struct rcu_head head;	// see take_dentry_name_snapshot()
> >  	unsigned char name[];
> >  };
> 
> This gets you a 4-byte hole between count and head on a 64-bit system.  Did
> you want to flip the order of count and head?

Umm...  Could do, but that probably wouldn't be that much of a win - we use
those for names >= 40 characters long, and currently the size is 25 + len
bytes.  And it's kmalloc'ed, so anything in range 40...71 goes into kmalloc-96.

Reordering those would have 40..43 land in kmalloc-64, leaving the rest as-is.
Might as well...
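
The arithmetic above can be checked with a short userspace sketch.  The
assumptions are loud ones: toy_atomic_t and toy_rcu_head are stand-ins sized
like a 64-bit kernel (4 and 16 bytes), and the bucket table is a simplified
guess at the small kmalloc size classes, not something read from a running
system.  The allocation size follows the offsetof(..., name[1]) + len
formula visible in __d_alloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins sized like a 64-bit kernel; not the real definitions. */
typedef struct { int32_t counter; } toy_atomic_t;
struct toy_rcu_head { _Alignas(8) uint64_t next; uint64_t func; };

struct toy_name_count_first {	/* member order as posted in patch 04 */
	toy_atomic_t count;
	struct toy_rcu_head head;
	unsigned char name[];
};

struct toy_name_head_first {	/* member order David suggests */
	struct toy_rcu_head head;
	toy_atomic_t count;
	unsigned char name[];
};

/* The 4-byte hole: name[] starts at 24 in one layout and 20 in the other. */
static_assert(offsetof(struct toy_name_count_first, name) == 24, "hole after count");
static_assert(offsetof(struct toy_name_head_first, name) == 20, "no hole");

/* Bytes requested for a name of 'len' characters plus its NUL. */
static size_t toy_size_count_first(size_t len)
{
	return offsetof(struct toy_name_count_first, name) + 1 + len;
}

static size_t toy_size_head_first(size_t len)
{
	return offsetof(struct toy_name_head_first, name) + 1 + len;
}

/* Assumed small size classes; returns 0 for sizes beyond this sketch. */
static size_t toy_kmalloc_bucket(size_t size)
{
	static const size_t buckets[] = { 8, 16, 32, 64, 96, 128, 192, 256 };

	for (size_t i = 0; i < sizeof(buckets) / sizeof(buckets[0]); i++)
		if (size <= buckets[i])
			return buckets[i];
	return 0;
}
```

With these assumptions, names of 40..71 characters need 65..96 bytes in the
posted layout and land in the 96-byte class, while the reordered layout
needs 61..64 bytes for 40..43 characters and fits the 64-byte class.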

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 15/20] gfs2_drevalidate(): use stable parent inode and name passed by caller
  2025-01-10  2:42   ` [PATCH 15/20] gfs2_drevalidate(): " Al Viro
@ 2025-01-10 19:20     ` Andreas Grünbacher
  0 siblings, 0 replies; 96+ messages in thread
From: Andreas Grünbacher @ 2025-01-10 19:20 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

Am Fr., 10. Jan. 2025 um 03:44 Uhr schrieb Al Viro <viro@zeniv.linux.org.uk>:
> No need to mess with dget_parent() for the former; for the latter we really should
> not rely upon ->d_name.name remaining stable.  Again, a UAF there.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/gfs2/dentry.c | 24 ++++++++----------------
>  1 file changed, 8 insertions(+), 16 deletions(-)
>
> diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
> index 86c338901fab..95050e719233 100644
> --- a/fs/gfs2/dentry.c
> +++ b/fs/gfs2/dentry.c
> @@ -35,48 +35,40 @@
>  static int gfs2_drevalidate(struct inode *dir, const struct qstr *name,
>                             struct dentry *dentry, unsigned int flags)
>  {
> -       struct dentry *parent;
> -       struct gfs2_sbd *sdp;
> -       struct gfs2_inode *dip;
> +       struct gfs2_sbd *sdp = GFS2_SB(dir);
> +       struct gfs2_inode *dip = GFS2_I(dir);
>         struct inode *inode;
>         struct gfs2_holder d_gh;
>         struct gfs2_inode *ip = NULL;
> -       int error, valid = 0;
> +       int error, valid;
>         int had_lock = 0;
>
>         if (flags & LOOKUP_RCU)
>                 return -ECHILD;
>
> -       parent = dget_parent(dentry);
> -       sdp = GFS2_SB(d_inode(parent));
> -       dip = GFS2_I(d_inode(parent));
>         inode = d_inode(dentry);
>
>         if (inode) {
>                 if (is_bad_inode(inode))
> -                       goto out;
> +                       return 0;
>                 ip = GFS2_I(inode);
>         }
>
> -       if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL) {
> -               valid = 1;
> -               goto out;
> -       }
> +       if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
> +               return 1;
>
>         had_lock = (gfs2_glock_is_locked_by_me(dip->i_gl) != NULL);
>         if (!had_lock) {
>                 error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
>                 if (error)
> -                       goto out;
> +                       return 0;
>         }
>
> -       error = gfs2_dir_check(d_inode(parent), &dentry->d_name, ip);
> +       error = gfs2_dir_check(dir, name, ip);
>         valid = inode ? !error : (error == -ENOENT);
>
>         if (!had_lock)
>                 gfs2_glock_dq_uninit(&d_gh);
> -out:
> -       dput(parent);
>         return valid;
>  }
>
> --
> 2.39.5

Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re:  [PATCH 09/20] ceph_d_revalidate(): use stable parent inode passed by caller
  2025-01-10  2:42   ` [PATCH 09/20] ceph_d_revalidate(): use stable " Al Viro
@ 2025-01-10 19:45     ` Viacheslav Dubeyko
  0 siblings, 0 replies; 96+ messages in thread
From: Viacheslav Dubeyko @ 2025-01-10 19:45 UTC (permalink / raw)
  To: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk
  Cc: jack@suse.cz, hubcap@omnibond.com, brauner@kernel.org,
	David Howells, ceph-devel@vger.kernel.org, miklos@szeredi.hu,
	Andreas Gruenbacher, torvalds@linux-foundation.org,
	krisman@kernel.org, amir73il@gmail.com, linux-nfs@vger.kernel.org

On Fri, 2025-01-10 at 02:42 +0000, Al Viro wrote:
> No need to mess with the boilerplate for obtaining what we already
> have.  Note that ceph is one of the "will want a path from filesystem
> root if we want to talk to server" cases, so the name of the last
> component is of little use - it is passed to fscrypt_d_revalidate()
> and it's used to deal with (also crypt-related) case in request
> marshalling, when encrypted name turns out to be too long.  The former
> is not a problem, but the latter is racy; that part will be handled
> in the next commit.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/ceph/dir.c | 22 ++++------------------
>  1 file changed, 4 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index c4c71c24221b..dc5f55bebad7 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1940,30 +1940,19 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
>  /*
>   * Check if cached dentry can be trusted.
>   */
> -static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
> +static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
>  			     struct dentry *dentry, unsigned int flags)
>  {
>  	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
>  	struct ceph_client *cl = mdsc->fsc->client;
>  	int valid = 0;
> -	struct dentry *parent;
> -	struct inode *dir, *inode;
> +	struct inode *inode;
>  
> -	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
> +	valid = fscrypt_d_revalidate(dir, name, dentry, flags);
>  	if (valid <= 0)
>  		return valid;
>  
> -	if (flags & LOOKUP_RCU) {
> -		parent = READ_ONCE(dentry->d_parent);
> -		dir = d_inode_rcu(parent);
> -		if (!dir)
> -			return -ECHILD;
> -		inode = d_inode_rcu(dentry);
> -	} else {
> -		parent = dget_parent(dentry);
> -		dir = d_inode(parent);
> -		inode = d_inode(dentry);
> -	}
> +	inode = d_inode_rcu(dentry);
>  
>  	doutc(cl, "%p '%pd' inode %p offset 0x%llx nokey %d\n",
>  	      dentry, dentry, inode, ceph_dentry(dentry)->offset,
> @@ -2039,9 +2028,6 @@ static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
>  	doutc(cl, "%p '%pd' %s\n", dentry, dentry, valid ? "valid" : "invalid");
>  	if (!valid)
>  		ceph_dir_clear_complete(dir);
> -
> -	if (!(flags & LOOKUP_RCU))
> -		dput(parent);
>  	return valid;
>  }
>  

Looks much better now.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Thanks,
Slava.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCHES v2][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems)
  2025-01-10  2:38 [PATCHES][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
  2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
@ 2025-01-16  5:21 ` Al Viro
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  2025-01-23  1:45   ` [PATCHES v3][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
  1 sibling, 2 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:21 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Gabriel Krisman Bertazi, Christian Brauner,
	Jan Kara, David Howells, ceph-devel, linux-nfs, Amir Goldstein,
	Miklos Szeredi, Andreas Gruenbacher, Mike Marshall

	Series updated and force-pushed to the same place:
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.d_revalidate
itself on top of #work.dcache.
 
	Individual patches in followups; please, review.

	Changes since v1:
* reordered external_name members to get rid of hole on 64bit, as suggested by
dhowells.
* split the added method in two in the last commit ("9p: fix ->rename_sem exclusion")

	Folks, if no objections materialize, into #for-next it goes.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size
  2025-01-16  5:21 ` [PATCHES v2][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
@ 2025-01-16  5:22   ` Al Viro
  2025-01-16  5:22     ` [PATCH v2 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
                       ` (18 more replies)
  2025-01-23  1:45   ` [PATCHES v3][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
  1 sibling, 19 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... calling the number of words DNAME_INLINE_WORDS.

The next step will be to have a structure to hold inline name arrays
(both in dentry and in name_snapshot) and use that to alias the
existing arrays of unsigned char there.  That will allow both
full-structure copies and convenient word-by-word accesses.
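
A rough userspace illustration of the idea, not the kernel code itself:
define the word count first and derive the byte length from it, so the
length is a whole number of words by construction.  INLINE_WORDS,
INLINE_LEN and swap_inline_words() are names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for DNAME_INLINE_WORDS / DNAME_INLINE_LEN. */
#define INLINE_WORDS 5
#define INLINE_LEN (INLINE_WORDS * sizeof(unsigned long))

/* Holds by construction; kept only to document the invariant. */
static_assert(INLINE_LEN % sizeof(unsigned long) == 0,
	      "inline name length must be a whole number of words");

/* Swap two inline-name buffers one word at a time. */
static void swap_inline_words(unsigned long *a, unsigned long *b)
{
	for (size_t i = 0; i < INLINE_WORDS; i++) {
		unsigned long tmp = a[i];

		a[i] = b[i];
		b[i] = tmp;
	}
}
```

With the length derived this way the loop bound is the word count itself,
and no separate alignment check is needed at the point of use.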

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c            | 4 +---
 include/linux/dcache.h | 8 +++++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b4d5e9e1e43d..ea0f0bea511b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2748,9 +2748,7 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			/*
 			 * Both are internal.
 			 */
-			unsigned int i;
-			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
-			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
 				swap(((long *) &dentry->d_iname)[i],
 				     ((long *) &target->d_iname)[i]);
 			}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index bff956f7b2b9..42dd89beaf4e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -68,15 +68,17 @@ extern const struct qstr dotdot_name;
  * large memory footprint increase).
  */
 #ifdef CONFIG_64BIT
-# define DNAME_INLINE_LEN 40 /* 192 bytes */
+# define DNAME_INLINE_WORDS 5 /* 192 bytes */
 #else
 # ifdef CONFIG_SMP
-#  define DNAME_INLINE_LEN 36 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 9 /* 128 bytes */
 # else
-#  define DNAME_INLINE_LEN 44 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 11 /* 128 bytes */
 # endif
 #endif
 
+#define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
+
 #define d_lock	d_lockref.lock
 
 struct dentry {
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 02/20] dcache: back inline names with a struct-wrapped array of unsigned long
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
@ 2025-01-16  5:22     ` Al Viro
  2025-01-16  5:23     ` [PATCH v2 03/20] make take_dentry_name_snapshot() lockless Al Viro
                       ` (17 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... so that they can be copied with struct assignment (which generates
better code) and accessed word-by-word.

The type is union shortname_store; it's a union of arrays of
unsigned char and unsigned long.

struct name_snapshot.inline_name turned into union shortname_store;
users (all in fs/dcache.c) adjusted.

struct dentry.d_iname has some users outside of fs/dcache.c; to
reduce the amount of noise in commit, it is replaced with
union shortname_storage d_shortname and d_iname is turned into a macro
that expands to d_shortname.string (similar to d_lock handling).
That compat macro is temporary - most of the remaining instances will
be taken out by debugfs series, and once that is merged and a few others
are taken care of this will go away.
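
A minimal userspace sketch of the same layout, assuming a 5-word inline
buffer; shortname_sketch and copy_shortname() are illustrative names, not
kernel symbols.  It shows the byte view and the word view sharing storage,
and a copy done with plain assignment instead of memcpy().

```c
#include <assert.h>
#include <string.h>

#define INLINE_WORDS 5
#define INLINE_LEN (INLINE_WORDS * sizeof(unsigned long))

/* Same shape as the union in the patch; the name is local to this sketch. */
union shortname_sketch {
	unsigned char string[INLINE_LEN];	/* byte view, NUL-terminated name */
	unsigned long words[INLINE_WORDS];	/* word view of the same bytes */
};

/* Copy a short name with struct assignment; the whole buffer is copied. */
static void copy_shortname(union shortname_sketch *dst,
			   const union shortname_sketch *src)
{
	/* Both views alias the same storage, so their sizes must agree. */
	assert(sizeof(dst->string) == sizeof(dst->words));
	*dst = *src;
}
```

The copy always moves the full buffer rather than len + 1 bytes; the
length is fixed at compile time, which is what lets the compiler emit
straight-line word moves.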

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c                                  | 43 +++++++++-----------
 include/linux/dcache.h                       | 10 ++++-
 tools/testing/selftests/bpf/progs/find_vma.c |  2 +-
 3 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ea0f0bea511b..52662a5d08e4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -324,7 +324,7 @@ static void __d_free_external(struct rcu_head *head)
 
 static inline int dname_external(const struct dentry *dentry)
 {
-	return dentry->d_name.name != dentry->d_iname;
+	return dentry->d_name.name != dentry->d_shortname.string;
 }
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
@@ -334,9 +334,8 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
 	if (unlikely(dname_external(dentry))) {
 		atomic_inc(&external_name(dentry)->u.count);
 	} else {
-		memcpy(name->inline_name, dentry->d_iname,
-		       dentry->d_name.len + 1);
-		name->name.name = name->inline_name;
+		name->inline_name = dentry->d_shortname;
+		name->name.name = name->inline_name.string;
 	}
 	spin_unlock(&dentry->d_lock);
 }
@@ -344,7 +343,7 @@ EXPORT_SYMBOL(take_dentry_name_snapshot);
 
 void release_dentry_name_snapshot(struct name_snapshot *name)
 {
-	if (unlikely(name->name.name != name->inline_name)) {
+	if (unlikely(name->name.name != name->inline_name.string)) {
 		struct external_name *p;
 		p = container_of(name->name.name, struct external_name, name[0]);
 		if (unlikely(atomic_dec_and_test(&p->u.count)))
@@ -1654,10 +1653,10 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	 * will still always have a NUL at the end, even if we might
 	 * be overwriting an internal NUL character
 	 */
-	dentry->d_iname[DNAME_INLINE_LEN-1] = 0;
+	dentry->d_shortname.string[DNAME_INLINE_LEN-1] = 0;
 	if (unlikely(!name)) {
 		name = &slash_name;
-		dname = dentry->d_iname;
+		dname = dentry->d_shortname.string;
 	} else if (name->len > DNAME_INLINE_LEN-1) {
 		size_t size = offsetof(struct external_name, name[1]);
 		struct external_name *p = kmalloc(size + name->len,
@@ -1670,7 +1669,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 		atomic_set(&p->u.count, 1);
 		dname = p->name;
 	} else  {
-		dname = dentry->d_iname;
+		dname = dentry->d_shortname.string;
 	}	
 
 	dentry->d_name.len = name->len;
@@ -2729,10 +2728,9 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * dentry:internal, target:external.  Steal target's
 			 * storage and make target internal.
 			 */
-			memcpy(target->d_iname, dentry->d_name.name,
-					dentry->d_name.len + 1);
 			dentry->d_name.name = target->d_name.name;
-			target->d_name.name = target->d_iname;
+			target->d_shortname = dentry->d_shortname;
+			target->d_name.name = target->d_shortname.string;
 		}
 	} else {
 		if (unlikely(dname_external(dentry))) {
@@ -2740,18 +2738,16 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * dentry:external, target:internal.  Give dentry's
 			 * storage to target and make dentry internal
 			 */
-			memcpy(dentry->d_iname, target->d_name.name,
-					target->d_name.len + 1);
 			target->d_name.name = dentry->d_name.name;
-			dentry->d_name.name = dentry->d_iname;
+			dentry->d_shortname = target->d_shortname;
+			dentry->d_name.name = dentry->d_shortname.string;
 		} else {
 			/*
 			 * Both are internal.
 			 */
-			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
-				swap(((long *) &dentry->d_iname)[i],
-				     ((long *) &target->d_iname)[i]);
-			}
+			for (int i = 0; i < DNAME_INLINE_WORDS; i++)
+				swap(dentry->d_shortname.words[i],
+				     target->d_shortname.words[i]);
 		}
 	}
 	swap(dentry->d_name.hash_len, target->d_name.hash_len);
@@ -2766,9 +2762,8 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 		atomic_inc(&external_name(target)->u.count);
 		dentry->d_name = target->d_name;
 	} else {
-		memcpy(dentry->d_iname, target->d_name.name,
-				target->d_name.len + 1);
-		dentry->d_name.name = dentry->d_iname;
+		dentry->d_shortname = target->d_shortname;
+		dentry->d_name.name = dentry->d_shortname.string;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
 	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
@@ -3101,12 +3096,12 @@ void d_mark_tmpfile(struct file *file, struct inode *inode)
 {
 	struct dentry *dentry = file->f_path.dentry;
 
-	BUG_ON(dentry->d_name.name != dentry->d_iname ||
+	BUG_ON(dname_external(dentry) ||
 		!hlist_unhashed(&dentry->d_u.d_alias) ||
 		!d_unlinked(dentry));
 	spin_lock(&dentry->d_parent->d_lock);
 	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-	dentry->d_name.len = sprintf(dentry->d_iname, "#%llu",
+	dentry->d_name.len = sprintf(dentry->d_shortname.string, "#%llu",
 				(unsigned long long)inode->i_ino);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dentry->d_parent->d_lock);
@@ -3194,7 +3189,7 @@ static void __init dcache_init(void)
 	 */
 	dentry_cache = KMEM_CACHE_USERCOPY(dentry,
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
-		d_iname);
+		d_shortname.string);
 
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 42dd89beaf4e..8bc567a35718 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -79,7 +79,13 @@ extern const struct qstr dotdot_name;
 
 #define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
 
+union shortname_store {
+	unsigned char string[DNAME_INLINE_LEN];
+	unsigned long words[DNAME_INLINE_WORDS];
+};
+
 #define d_lock	d_lockref.lock
+#define d_iname d_shortname.string
 
 struct dentry {
 	/* RCU lookup touched fields */
@@ -90,7 +96,7 @@ struct dentry {
 	struct qstr d_name;
 	struct inode *d_inode;		/* Where the name belongs to - NULL is
 					 * negative */
-	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+	union shortname_store d_shortname;
 	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
 
 	/* Ref lookup also touches following */
@@ -591,7 +597,7 @@ static inline struct inode *d_real_inode(const struct dentry *dentry)
 
 struct name_snapshot {
 	struct qstr name;
-	unsigned char inline_name[DNAME_INLINE_LEN];
+	union shortname_store inline_name;
 };
 void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);
diff --git a/tools/testing/selftests/bpf/progs/find_vma.c b/tools/testing/selftests/bpf/progs/find_vma.c
index 38034fb82530..02b82774469c 100644
--- a/tools/testing/selftests/bpf/progs/find_vma.c
+++ b/tools/testing/selftests/bpf/progs/find_vma.c
@@ -25,7 +25,7 @@ static long check_vma(struct task_struct *task, struct vm_area_struct *vma,
 {
 	if (vma->vm_file)
 		bpf_probe_read_kernel_str(d_iname, DNAME_INLINE_LEN - 1,
-					  vma->vm_file->f_path.dentry->d_iname);
+					  vma->vm_file->f_path.dentry->d_shortname.string);
 
 	/* check for VM_EXEC */
 	if (vma->vm_flags & VM_EXEC)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 03/20] make take_dentry_name_snapshot() lockless
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  2025-01-16  5:22     ` [PATCH v2 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16  5:23     ` [PATCH v2 04/20] dissolve external_name.u into separate members Al Viro
                       ` (16 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Use ->d_seq instead of grabbing ->d_lock; in case of shortname dentries
that avoids any stores to shared data objects and in case of long names
we are down to (unavoidable) atomic_inc on the external_name refcount.

Makes the thing safer as well - the areas where ->d_seq is held odd are
all nested inside the areas where ->d_lock is held, and the latter are
much more numerous.

NOTE: now that there is a lockless path where we might try to grab
a reference to an already doomed external_name instance, it is no
longer possible for external_name.u.count and external_name.u.head
to share space (kudos to Linus for spotting that).

To reduce the noise this commit just makes external_name.u a struct
(instead of union); the next commit will dissolve it.
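
The "doomed instance" problem in the NOTE above can be sketched in
userspace with C11 atomics.  This is an assumption-laden illustration,
not the kernel's atomic_inc_not_zero(); ref_get_not_zero() and ref_put()
are hypothetical helpers.  A lockless reader may only take a reference if
the count has not already reached zero, and for that check to be
meaningful the counter must stay readable after it hits zero, which is why
it cannot share storage with the RCU head used to free the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still live (count > 0). */
static bool ref_get_not_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;	/* already doomed; the caller has to retry its lookup */
}

/* Drop a reference; returns true when the caller released the last one. */
static bool ref_put(atomic_int *count)
{
	int old = atomic_fetch_sub(count, 1);

	assert(old > 0);	/* dropping a reference nobody holds is a bug */
	return old == 1;
}
```

A failed ref_get_not_zero() corresponds to the "goto retry" path in the
patch: the name pointer that was read is stale and must be re-read.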

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 52662a5d08e4..f387dc97df86 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,9 +296,9 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	union {
-		atomic_t count;
-		struct rcu_head head;
+	struct {
+		atomic_t count;		// ->count and ->head can't be combined
+		struct rcu_head head;	// see take_dentry_name_snapshot()
 	} u;
 	unsigned char name[];
 };
@@ -329,15 +329,30 @@ static inline int dname_external(const struct dentry *dentry)
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
 {
-	spin_lock(&dentry->d_lock);
-	name->name = dentry->d_name;
-	if (unlikely(dname_external(dentry))) {
-		atomic_inc(&external_name(dentry)->u.count);
-	} else {
+	unsigned seq;
+	const unsigned char *s;
+
+	rcu_read_lock();
+retry:
+	seq = read_seqcount_begin(&dentry->d_seq);
+	s = READ_ONCE(dentry->d_name.name);
+	name->name.hash_len = dentry->d_name.hash_len;
+	name->name.name = name->inline_name.string;
+	if (likely(s == dentry->d_shortname.string)) {
 		name->inline_name = dentry->d_shortname;
-		name->name.name = name->inline_name.string;
+	} else {
+		struct external_name *p;
+		p = container_of(s, struct external_name, name[0]);
+		// get a valid reference
+		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+			goto retry;
+		name->name.name = s;
 	}
-	spin_unlock(&dentry->d_lock);
+	if (read_seqcount_retry(&dentry->d_seq, seq)) {
+		release_dentry_name_snapshot(name);
+		goto retry;
+	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(take_dentry_name_snapshot);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 04/20] dissolve external_name.u into separate members
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  2025-01-16  5:22     ` [PATCH v2 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
  2025-01-16  5:23     ` [PATCH v2 03/20] make take_dentry_name_snapshot() lockless Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16 10:06       ` Jan Kara
  2025-01-16  5:23     ` [PATCH v2 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
                       ` (15 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

kept separate from the previous commit to keep the noise separate
from actual changes...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f387dc97df86..6f36d3e8c739 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,10 +296,8 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	struct {
-		atomic_t count;		// ->count and ->head can't be combined
-		struct rcu_head head;	// see take_dentry_name_snapshot()
-	} u;
+	struct rcu_head head;	// ->head and ->count can't be combined
+	atomic_t count;		// see take_dentry_name_snapshot()
 	unsigned char name[];
 };
 
@@ -344,7 +342,7 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
 		struct external_name *p;
 		p = container_of(s, struct external_name, name[0]);
 		// get a valid reference
-		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+		if (unlikely(!atomic_inc_not_zero(&p->count)))
 			goto retry;
 		name->name.name = s;
 	}
@@ -361,8 +359,8 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
 	if (unlikely(name->name.name != name->inline_name.string)) {
 		struct external_name *p;
 		p = container_of(name->name.name, struct external_name, name[0]);
-		if (unlikely(atomic_dec_and_test(&p->u.count)))
-			kfree_rcu(p, u.head);
+		if (unlikely(atomic_dec_and_test(&p->count)))
+			kfree_rcu(p, head);
 	}
 }
 EXPORT_SYMBOL(release_dentry_name_snapshot);
@@ -400,7 +398,7 @@ static void dentry_free(struct dentry *dentry)
 	WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
 	if (unlikely(dname_external(dentry))) {
 		struct external_name *p = external_name(dentry);
-		if (likely(atomic_dec_and_test(&p->u.count))) {
+		if (likely(atomic_dec_and_test(&p->count))) {
 			call_rcu(&dentry->d_u.d_rcu, __d_free_external);
 			return;
 		}
@@ -1681,7 +1679,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 			kmem_cache_free(dentry_cache, dentry); 
 			return NULL;
 		}
-		atomic_set(&p->u.count, 1);
+		atomic_set(&p->count, 1);
 		dname = p->name;
 	} else  {
 		dname = dentry->d_shortname.string;
@@ -2774,15 +2772,15 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 	if (unlikely(dname_external(dentry)))
 		old_name = external_name(dentry);
 	if (unlikely(dname_external(target))) {
-		atomic_inc(&external_name(target)->u.count);
+		atomic_inc(&external_name(target)->count);
 		dentry->d_name = target->d_name;
 	} else {
 		dentry->d_shortname = target->d_shortname;
 		dentry->d_name.name = dentry->d_shortname.string;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
-	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
-		kfree_rcu(old_name, u.head);
+	if (old_name && likely(atomic_dec_and_test(&old_name->count)))
+		kfree_rcu(old_name, head);
 }
 
 /*
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 05/20] ext4 fast_commit: make use of name_snapshot primitives
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (2 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 04/20] dissolve external_name.u into separate members Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16  5:23     ` [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
                       ` (14 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... rather than open-coding them.  As a bonus, that avoids the pointless
work with extra allocations, etc. for long names.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ext4/fast_commit.c | 29 +++++------------------------
 fs/ext4/fast_commit.h |  3 +--
 2 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 26c4fc37edcf..da4263a14a20 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -322,9 +322,7 @@ void ext4_fc_del(struct inode *inode)
 	WARN_ON(!list_empty(&ei->i_fc_dilist));
 	spin_unlock(&sbi->s_fc_lock);
 
-	if (fc_dentry->fcd_name.name &&
-		fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
-		kfree(fc_dentry->fcd_name.name);
+	release_dentry_name_snapshot(&fc_dentry->fcd_name);
 	kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
 
 	return;
@@ -449,22 +447,7 @@ static int __track_dentry_update(handle_t *handle, struct inode *inode,
 	node->fcd_op = dentry_update->op;
 	node->fcd_parent = dir->i_ino;
 	node->fcd_ino = inode->i_ino;
-	if (dentry->d_name.len > DNAME_INLINE_LEN) {
-		node->fcd_name.name = kmalloc(dentry->d_name.len, GFP_NOFS);
-		if (!node->fcd_name.name) {
-			kmem_cache_free(ext4_fc_dentry_cachep, node);
-			ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, handle);
-			mutex_lock(&ei->i_fc_lock);
-			return -ENOMEM;
-		}
-		memcpy((u8 *)node->fcd_name.name, dentry->d_name.name,
-			dentry->d_name.len);
-	} else {
-		memcpy(node->fcd_iname, dentry->d_name.name,
-			dentry->d_name.len);
-		node->fcd_name.name = node->fcd_iname;
-	}
-	node->fcd_name.len = dentry->d_name.len;
+	take_dentry_name_snapshot(&node->fcd_name, dentry);
 	INIT_LIST_HEAD(&node->fcd_dilist);
 	spin_lock(&sbi->s_fc_lock);
 	if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
@@ -832,7 +815,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 {
 	struct ext4_fc_dentry_info fcd;
 	struct ext4_fc_tl tl;
-	int dlen = fc_dentry->fcd_name.len;
+	int dlen = fc_dentry->fcd_name.name.len;
 	u8 *dst = ext4_fc_reserve_space(sb,
 			EXT4_FC_TAG_BASE_LEN + sizeof(fcd) + dlen, crc);
 
@@ -847,7 +830,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 	dst += EXT4_FC_TAG_BASE_LEN;
 	memcpy(dst, &fcd, sizeof(fcd));
 	dst += sizeof(fcd);
-	memcpy(dst, fc_dentry->fcd_name.name, dlen);
+	memcpy(dst, fc_dentry->fcd_name.name.name, dlen);
 
 	return true;
 }
@@ -1328,9 +1311,7 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 		list_del_init(&fc_dentry->fcd_dilist);
 		spin_unlock(&sbi->s_fc_lock);
 
-		if (fc_dentry->fcd_name.name &&
-			fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
-			kfree(fc_dentry->fcd_name.name);
+		release_dentry_name_snapshot(&fc_dentry->fcd_name);
 		kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
 		spin_lock(&sbi->s_fc_lock);
 	}
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index 2fadb2c4780c..3bd534e4dbbf 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -109,8 +109,7 @@ struct ext4_fc_dentry_update {
 	int fcd_op;		/* Type of update create / unlink / link */
 	int fcd_parent;		/* Parent inode number */
 	int fcd_ino;		/* Inode number */
-	struct qstr fcd_name;	/* Dirent name */
-	unsigned char fcd_iname[DNAME_INLINE_LEN];	/* Dirent name string */
+	struct name_snapshot fcd_name;	/* Dirent name */
 	struct list_head fcd_list;
 	struct list_head fcd_dilist;
 };
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (3 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16 15:38       ` Gabriel Krisman Bertazi
  2025-01-16  5:23     ` [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
                       ` (13 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... and check the "name might be unstable" predicate
the right way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/libfs.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 748ac5923154..3ad1b1b7fed6 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1789,7 +1789,7 @@ int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 {
 	const struct dentry *parent;
 	const struct inode *dir;
-	char strbuf[DNAME_INLINE_LEN];
+	union shortname_store strbuf;
 	struct qstr qstr;
 
 	/*
@@ -1809,22 +1809,23 @@ int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 	if (!dir || !IS_CASEFOLDED(dir))
 		return 1;
 
+	qstr.len = len;
+	qstr.name = str;
 	/*
 	 * If the dentry name is stored in-line, then it may be concurrently
 	 * modified by a rename.  If this happens, the VFS will eventually retry
 	 * the lookup, so it doesn't matter what ->d_compare() returns.
 	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
 	 * string.  Therefore, we have to copy the name into a temporary buffer.
+	 * As above, len is guaranteed to match str, so the shortname case
+	 * is exactly when str points to ->d_shortname.
 	 */
-	if (len <= DNAME_INLINE_LEN - 1) {
-		memcpy(strbuf, str, len);
-		strbuf[len] = 0;
-		str = strbuf;
+	if (qstr.name == dentry->d_shortname.string) {
+		strbuf = dentry->d_shortname; // NUL is guaranteed to be in there
+		qstr.name = strbuf.string;
 		/* prevent compiler from optimizing out the temporary buffer */
 		barrier();
 	}
-	qstr.len = len;
-	qstr.name = str;
 
 	return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr);
 }
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate()
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (4 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16 15:15       ` Gabriel Krisman Bertazi
  2025-01-17 18:55       ` Jeff Layton
  2025-01-16  5:23     ` [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
                       ` (12 subsequent siblings)
  18 siblings, 2 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller.  We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.

It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable.  There are a couple
of exceptions (overlayfs and, to a lesser extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.

It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.

This commit only changes the calling conventions; making use of supplied
values is left to followups.

NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate.  This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/locking.rst |  3 ++-
 Documentation/filesystems/porting.rst | 13 +++++++++++++
 Documentation/filesystems/vfs.rst     |  3 ++-
 fs/9p/vfs_dentry.c                    | 10 ++++++++--
 fs/afs/dir.c                          |  6 ++++--
 fs/ceph/dir.c                         |  5 +++--
 fs/coda/dir.c                         |  3 ++-
 fs/crypto/fname.c                     |  3 ++-
 fs/ecryptfs/dentry.c                  | 18 ++++++++++++++----
 fs/exfat/namei.c                      |  3 ++-
 fs/fat/namei_vfat.c                   |  6 ++++--
 fs/fuse/dir.c                         |  3 ++-
 fs/gfs2/dentry.c                      |  7 +++++--
 fs/hfs/sysdep.c                       |  3 ++-
 fs/jfs/namei.c                        |  3 ++-
 fs/kernfs/dir.c                       |  3 ++-
 fs/namei.c                            | 18 ++++++++++--------
 fs/nfs/dir.c                          |  9 ++++++---
 fs/ocfs2/dcache.c                     |  3 ++-
 fs/orangefs/dcache.c                  |  3 ++-
 fs/overlayfs/super.c                  | 22 ++++++++++++++++++++--
 fs/proc/base.c                        |  6 ++++--
 fs/proc/fd.c                          |  3 ++-
 fs/proc/generic.c                     |  6 ++++--
 fs/proc/proc_sysctl.c                 |  3 ++-
 fs/smb/client/dir.c                   |  3 ++-
 fs/tracefs/inode.c                    |  3 ++-
 fs/vboxsf/dir.c                       |  3 ++-
 include/linux/dcache.h                |  3 ++-
 include/linux/fscrypt.h               |  7 ++++---
 30 files changed, 133 insertions(+), 51 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index f5e3676db954..146e7d8aa736 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -17,7 +17,8 @@ dentry_operations
 
 prototypes::
 
-	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate)(struct inode *, const struct qstr *,
+			    struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 9ab2a3d6f2b4..b50c3ce36ef2 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1141,3 +1141,16 @@ pointer are gone.
 
 set_blocksize() takes opened struct file instead of struct block_device now
 and it *must* be opened exclusive.
+
+---
+
+**mandatory**
+
+->d_revalidate() gets two extra arguments - inode of parent directory and
+name our dentry is expected to have.  Both are stable (dir is pinned in
+non-RCU case and will stay around during the call in RCU case, and name
+is guaranteed to stay unchanging).  Your instance doesn't have to use
+either, but it often helps to avoid a lot of painful boilerplate.
+NOTE: if you need something like full path from the root of filesystem,
+you are still on your own - this assists with simple cases, but it's not
+magic.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 0b18af3f954e..7c352ebaae98 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1251,7 +1251,8 @@ defined:
 .. code-block:: c
 
 	struct dentry_operations {
-		int (*d_revalidate)(struct dentry *, unsigned int);
+		int (*d_revalidate)(struct inode *, const struct qstr *,
+				    struct dentry *, unsigned int);
 		int (*d_weak_revalidate)(struct dentry *, unsigned int);
 		int (*d_hash)(const struct dentry *, struct qstr *);
 		int (*d_compare)(const struct dentry *,
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 01338d4c2d9e..872c1abe3295 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -61,7 +61,7 @@ static void v9fs_dentry_release(struct dentry *dentry)
 		p9_fid_put(hlist_entry(p, struct p9_fid, dlist));
 }
 
-static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int __v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct p9_fid *fid;
 	struct inode *inode;
@@ -99,9 +99,15 @@ static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 	return 1;
 }
 
+static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
+{
+	return __v9fs_lookup_revalidate(dentry, flags);
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
-	.d_weak_revalidate = v9fs_lookup_revalidate,
+	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
 };
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index ada363af5aab..9780013cd83a 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -22,7 +22,8 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 				 unsigned int flags);
 static int afs_dir_open(struct inode *inode, struct file *file);
 static int afs_readdir(struct file *file, struct dir_context *ctx);
-static int afs_d_revalidate(struct dentry *dentry, unsigned int flags);
+static int afs_d_revalidate(struct inode *dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags);
 static int afs_d_delete(const struct dentry *dentry);
 static void afs_d_iput(struct dentry *dentry, struct inode *inode);
 static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name, int nlen,
@@ -1093,7 +1094,8 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
  * - NOTE! the hit can be a negative hit too, so we can't assume we have an
  *   inode
  */
-static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags)
 {
 	struct afs_vnode *vnode, *dir;
 	struct afs_fid fid;
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 0bf388e07a02..c4c71c24221b 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1940,7 +1940,8 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
 /*
  * Check if cached dentry can be trusted.
  */
-static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
 	struct ceph_client *cl = mdsc->fsc->client;
@@ -1948,7 +1949,7 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 	struct dentry *parent;
 	struct inode *dir, *inode;
 
-	valid = fscrypt_d_revalidate(dentry, flags);
+	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
 	if (valid <= 0)
 		return valid;
 
diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index 4e552ba7bd43..a3e2dfeedfbf 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -445,7 +445,8 @@ static int coda_readdir(struct file *coda_file, struct dir_context *ctx)
 }
 
 /* called when a cache lookup succeeds */
-static int coda_dentry_revalidate(struct dentry *de, unsigned int flags)
+static int coda_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *de, unsigned int flags)
 {
 	struct inode *inode;
 	struct coda_inode_info *cii;
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 0ad52fbe51c9..389f5b2bf63b 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -574,7 +574,8 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
  * Validate dentries in encrypted directories to make sure we aren't potentially
  * caching stale dentries after a key has been added.
  */
-int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
+int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *dir;
 	int err;
diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index acaa0825e9bb..1dfd5b81d831 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -17,7 +17,9 @@
 
 /**
  * ecryptfs_d_revalidate - revalidate an ecryptfs dentry
- * @dentry: The ecryptfs dentry
+ * @dir: inode of expected parent
+ * @name: expected name
+ * @dentry: dentry to revalidate
  * @flags: lookup flags
  *
  * Called when the VFS needs to revalidate a dentry. This
@@ -28,7 +30,8 @@
  * Returns 1 if valid, 0 otherwise.
  *
  */
-static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int ecryptfs_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	int rc = 1;
@@ -36,8 +39,15 @@ static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE)
-		rc = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
+	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE) {
+		struct inode *lower_dir = ecryptfs_inode_to_lower(dir);
+		struct name_snapshot n;
+
+		take_dentry_name_snapshot(&n, lower_dentry);
+		rc = lower_dentry->d_op->d_revalidate(lower_dir, &n.name,
+						      lower_dentry, flags);
+		release_dentry_name_snapshot(&n);
+	}
 
 	if (d_really_is_positive(dentry)) {
 		struct inode *inode = d_inode(dentry);
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 97d2774760fe..e3b4feccba07 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -31,7 +31,8 @@ static inline void exfat_d_version_set(struct dentry *dentry,
  * If it happened, the negative dentry isn't actually negative anymore.  So,
  * drop it.
  */
-static int exfat_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
+			      struct dentry *dentry, unsigned int flags)
 {
 	int ret;
 
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 15bf32c21ac0..f9cbd5c6f932 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -53,7 +53,8 @@ static int vfat_revalidate_shortname(struct dentry *dentry)
 	return ret;
 }
 
-static int vfat_revalidate(struct dentry *dentry, unsigned int flags)
+static int vfat_revalidate(struct inode *dir, const struct qstr *name,
+			   struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
@@ -64,7 +65,8 @@ static int vfat_revalidate(struct dentry *dentry, unsigned int flags)
 	return vfat_revalidate_shortname(dentry);
 }
 
-static int vfat_revalidate_ci(struct dentry *dentry, unsigned int flags)
+static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
+			      struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 494ac372ace0..d9e9f26917eb 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -192,7 +192,8 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
  * the lookup once more.  If the lookup results in the same inode,
  * then refresh the attributes, timeouts and mark the dentry valid.
  */
-static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
+static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *entry, unsigned int flags)
 {
 	struct inode *inode;
 	struct dentry *parent;
diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
index 2e215e8c3c88..86c338901fab 100644
--- a/fs/gfs2/dentry.c
+++ b/fs/gfs2/dentry.c
@@ -21,7 +21,9 @@
 
 /**
  * gfs2_drevalidate - Check directory lookup consistency
- * @dentry: the mapping to check
+ * @dir: expected parent directory inode
+ * @name: expected name
+ * @dentry: dentry to check
  * @flags: lookup flags
  *
  * Check to make sure the lookup necessary to arrive at this inode from its
@@ -30,7 +32,8 @@
  * Returns: 1 if the dentry is ok, 0 if it isn't
  */
 
-static int gfs2_drevalidate(struct dentry *dentry, unsigned int flags)
+static int gfs2_drevalidate(struct inode *dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *parent;
 	struct gfs2_sbd *sdp;
diff --git a/fs/hfs/sysdep.c b/fs/hfs/sysdep.c
index 76fa02e3835b..ef54fc8093cf 100644
--- a/fs/hfs/sysdep.c
+++ b/fs/hfs/sysdep.c
@@ -13,7 +13,8 @@
 
 /* dentry case-handling: just lowercase everything */
 
-static int hfs_revalidate_dentry(struct dentry *dentry, unsigned int flags)
+static int hfs_revalidate_dentry(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int diff;
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index d68a4e6ac345..fc8ede43afde 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -1576,7 +1576,8 @@ static int jfs_ci_compare(const struct dentry *dentry,
 	return result;
 }
 
-static int jfs_ci_revalidate(struct dentry *dentry, unsigned int flags)
+static int jfs_ci_revalidate(struct inode *dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	/*
 	 * This is not negative dentry. Always valid.
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 458519e416fe..5f0f8b95f44c 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1109,7 +1109,8 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent,
 	return ERR_PTR(rc);
 }
 
-static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
+static int kernfs_dop_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct kernfs_node *kn;
 	struct kernfs_root *root;
diff --git a/fs/namei.c b/fs/namei.c
index 9d30c7aa9aa6..77e5d136faaf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -921,10 +921,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 	return false;
 }
 
-static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
+static inline int d_revalidate(struct inode *dir, const struct qstr *name,
+			       struct dentry *dentry, unsigned int flags)
 {
 	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
-		return dentry->d_op->d_revalidate(dentry, flags);
+		return dentry->d_op->d_revalidate(dir, name, dentry, flags);
 	else
 		return 1;
 }
@@ -1652,7 +1653,7 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 {
 	struct dentry *dentry = d_lookup(dir, name);
 	if (dentry) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(dir->d_inode, name, dentry, flags);
 		if (unlikely(error <= 0)) {
 			if (!error)
 				d_invalidate(dentry);
@@ -1737,19 +1738,20 @@ static struct dentry *lookup_fast(struct nameidata *nd)
 		if (read_seqcount_retry(&parent->d_seq, nd->seq))
 			return ERR_PTR(-ECHILD);
 
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
 		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
-			status = d_revalidate(dentry, nd->flags);
+			status = d_revalidate(nd->inode, &nd->last,
+					      dentry, nd->flags);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
 			return NULL;
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
@@ -1777,7 +1779,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(inode, name, dentry, flags);
 		if (unlikely(error <= 0)) {
 			if (!error) {
 				d_invalidate(dentry);
@@ -3575,7 +3577,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		if (d_in_lookup(dentry))
 			break;
 
-		error = d_revalidate(dentry, nd->flags);
+		error = d_revalidate(dir_inode, &nd->last, dentry, nd->flags);
 		if (likely(error > 0))
 			break;
 		if (error)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 492cffd9d3d8..9910d9796f4c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1814,7 +1814,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
 	return ret;
 }
 
-static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int nfs_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate);
 }
@@ -2025,7 +2026,8 @@ void nfs_d_prune_case_insensitive_aliases(struct inode *inode)
 EXPORT_SYMBOL_GPL(nfs_d_prune_case_insensitive_aliases);
 
 #if IS_ENABLED(CONFIG_NFS_V4)
-static int nfs4_lookup_revalidate(struct dentry *, unsigned int);
+static int nfs4_lookup_revalidate(struct inode *, const struct qstr *,
+				  struct dentry *, unsigned int);
 
 const struct dentry_operations nfs4_dentry_operations = {
 	.d_revalidate	= nfs4_lookup_revalidate,
@@ -2260,7 +2262,8 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 	return nfs_do_lookup_revalidate(dir, dentry, flags);
 }
 
-static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	return __nfs_lookup_revalidate(dentry, flags,
 			nfs4_do_lookup_revalidate);
diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
index a9b8688aaf30..ecb1ce6301c4 100644
--- a/fs/ocfs2/dcache.c
+++ b/fs/ocfs2/dcache.c
@@ -32,7 +32,8 @@ void ocfs2_dentry_attach_gen(struct dentry *dentry)
 }
 
 
-static int ocfs2_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				   struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int ret = 0;    /* if all else fails, just return false */
diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
index 395a00ed8ac7..c32c9a86e8d0 100644
--- a/fs/orangefs/dcache.c
+++ b/fs/orangefs/dcache.c
@@ -92,7 +92,8 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
  *
  * Should return 1 if dentry can still be trusted, else 0.
  */
-static int orangefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int orangefs_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	int ret;
 	unsigned long time = (unsigned long) dentry->d_fsdata;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index fe511192f83c..86ae6f6da36b 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -91,7 +91,24 @@ static int ovl_revalidate_real(struct dentry *d, unsigned int flags, bool weak)
 		if (d->d_flags & DCACHE_OP_WEAK_REVALIDATE)
 			ret =  d->d_op->d_weak_revalidate(d, flags);
 	} else if (d->d_flags & DCACHE_OP_REVALIDATE) {
-		ret = d->d_op->d_revalidate(d, flags);
+		struct dentry *parent;
+		struct inode *dir;
+		struct name_snapshot n;
+
+		if (flags & LOOKUP_RCU) {
+			parent = READ_ONCE(d->d_parent);
+			dir = d_inode_rcu(parent);
+			if (!dir)
+				return -ECHILD;
+		} else {
+			parent = dget_parent(d);
+			dir = d_inode(parent);
+		}
+		take_dentry_name_snapshot(&n, d);
+		ret = d->d_op->d_revalidate(dir, &n.name, d, flags);
+		release_dentry_name_snapshot(&n);
+		if (!(flags & LOOKUP_RCU))
+			dput(parent);
 		if (!ret) {
 			if (!(flags & LOOKUP_RCU))
 				d_invalidate(d);
@@ -127,7 +144,8 @@ static int ovl_dentry_revalidate_common(struct dentry *dentry,
 	return ret;
 }
 
-static int ovl_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int ovl_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return ovl_dentry_revalidate_common(dentry, flags, false);
 }
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0edf14a9840e..fb5493d0edf0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2058,7 +2058,8 @@ void pid_update_inode(struct task_struct *task, struct inode *inode)
  * performed a setuid(), etc.
  *
  */
-static int pid_revalidate(struct dentry *dentry, unsigned int flags)
+static int pid_revalidate(struct inode *dir, const struct qstr *name,
+			  struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	struct task_struct *task;
@@ -2191,7 +2192,8 @@ static int dname_to_vma_addr(struct dentry *dentry,
 	return 0;
 }
 
-static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int map_files_d_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	unsigned long vm_start, vm_end;
 	bool exact_vma_exists = false;
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 24baf23e864f..37aa778d1af7 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -140,7 +140,8 @@ static void tid_fd_update_inode(struct task_struct *task, struct inode *inode,
 	security_task_to_inode(task, inode);
 }
 
-static int tid_fd_revalidate(struct dentry *dentry, unsigned int flags)
+static int tid_fd_revalidate(struct inode *dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	struct task_struct *task;
 	struct inode *inode;
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index dbe82cf23ee4..8ec90826a49e 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -216,7 +216,8 @@ void proc_free_inum(unsigned int inum)
 	ida_free(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
 }
 
-static int proc_misc_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_misc_d_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
@@ -343,7 +344,8 @@ static const struct file_operations proc_dir_operations = {
 	.iterate_shared		= proc_readdir,
 };
 
-static int proc_net_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_net_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return 0;
 }
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 27a283d85a6e..cc9d74a06ff0 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -884,7 +884,8 @@ static const struct inode_operations proc_sys_dir_operations = {
 	.getattr	= proc_sys_getattr,
 };
 
-static int proc_sys_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_sys_revalidate(struct inode *dir, const struct qstr *name,
+			       struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c
index 864b194dbaa0..8c5d44ee91ed 100644
--- a/fs/smb/client/dir.c
+++ b/fs/smb/client/dir.c
@@ -737,7 +737,8 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
 }
 
 static int
-cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
+cifs_d_revalidate(struct inode *dir, const struct qstr *name,
+		  struct dentry *direntry, unsigned int flags)
 {
 	struct inode *inode;
 	int rc;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index cfc614c638da..53214499e384 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -457,7 +457,8 @@ static void tracefs_d_release(struct dentry *dentry)
 		eventfs_d_release(dentry);
 }
 
-static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int tracefs_d_revalidate(struct inode *inode, const struct qstr *name,
+				struct dentry *dentry, unsigned int flags)
 {
 	struct eventfs_inode *ei = dentry->d_fsdata;
 
diff --git a/fs/vboxsf/dir.c b/fs/vboxsf/dir.c
index 5f1a14d5b927..a859ac9b74ba 100644
--- a/fs/vboxsf/dir.c
+++ b/fs/vboxsf/dir.c
@@ -192,7 +192,8 @@ const struct file_operations vboxsf_dir_fops = {
  * This is called during name resolution/lookup to check if the @dentry in
  * the cache is still valid. the job is handled by vboxsf_inode_revalidate.
  */
-static int vboxsf_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int vboxsf_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				    struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 8bc567a35718..4a6bdadf2f29 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -144,7 +144,8 @@ enum d_real_type {
 };
 
 struct dentry_operations {
-	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate)(struct inode *, const struct qstr *,
+			    struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 772f822dc6b8..18855cb44b1c 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -192,7 +192,8 @@ struct fscrypt_operations {
 					     unsigned int *num_devs);
 };
 
-int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
+int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags);
 
 static inline struct fscrypt_inode_info *
 fscrypt_get_inode_info(const struct inode *inode)
@@ -711,8 +712,8 @@ static inline u64 fscrypt_fname_siphash(const struct inode *dir,
 	return 0;
 }
 
-static inline int fscrypt_d_revalidate(struct dentry *dentry,
-				       unsigned int flags)
+static inline int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
+				       struct dentry *dentry, unsigned int flags)
 {
 	return 1;
 }
-- 
2.39.5



* [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (5 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-22 20:27       ` David Howells
  2025-01-16  5:23     ` [PATCH v2 09/20] ceph_d_revalidate(): use stable " Al Viro
                       ` (11 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to bother with the boilerplate for obtaining the parent inode,
and for the name we really should not count upon ->d_name.name remaining
stable under us.
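
A minimal userspace sketch of the pattern, with the lookup helper taking
the name from its caller instead of reading it out of a dentry.  None of
this is AFS code: struct qstr is a simplified stand-in, and demo_dirent,
demo_cookie and demo_lookup_one() are hypothetical names for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct qstr. */
struct qstr { const char *name; unsigned int len; };

/* Hypothetical directory entry: a name and the file ID it maps to. */
struct demo_dirent { const char *name; unsigned long fid; };

/* Hypothetical search state, holding its own copy of the qstr. */
struct demo_cookie {
	struct qstr name;
	unsigned long fid;
	int found;
};

/*
 * Look up a single name in a directory table.  The name comes from the
 * caller, so no dentry is consulted here at all.
 * Returns 0 and fills *fid on success, -ENOENT if the name is absent.
 */
static int demo_lookup_one(const struct demo_dirent *ents, size_t n,
			   const struct qstr *name, unsigned long *fid)
{
	struct demo_cookie cookie = { .name = *name };
	size_t i;

	for (i = 0; i < n && !cookie.found; i++) {
		if (strlen(ents[i].name) == cookie.name.len &&
		    memcmp(ents[i].name, cookie.name.name,
			   cookie.name.len) == 0) {
			cookie.fid = ents[i].fid;
			cookie.found = 1;
		}
	}
	if (!cookie.found)
		return -ENOENT;
	*fid = cookie.fid;
	return 0;
}
```

The struct copy only duplicates the pointer and length; the sketch, like
the patch, relies on the caller keeping the name itself stable for the
duration of the call.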

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/afs/dir.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 9780013cd83a..c6ee6257d4c6 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -607,19 +607,19 @@ static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name,
  * Do a lookup of a single name in a directory
  * - just returns the FID the dentry name maps to if found
  */
-static int afs_do_lookup_one(struct inode *dir, struct dentry *dentry,
+static int afs_do_lookup_one(struct inode *dir, const struct qstr *name,
 			     struct afs_fid *fid, struct key *key,
 			     afs_dataversion_t *_dir_version)
 {
 	struct afs_super_info *as = dir->i_sb->s_fs_info;
 	struct afs_lookup_one_cookie cookie = {
 		.ctx.actor = afs_lookup_one_filldir,
-		.name = dentry->d_name,
+		.name = *name,
 		.fid.vid = as->volume->vid
 	};
 	int ret;
 
-	_enter("{%lu},%p{%pd},", dir->i_ino, dentry, dentry);
+	_enter("{%lu},{%s},", dir->i_ino, name->name);
 
 	/* search the directory */
 	ret = afs_dir_iterate(dir, &cookie.ctx, key, _dir_version);
@@ -1052,21 +1052,12 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 /*
  * Check the validity of a dentry under RCU conditions.
  */
-static int afs_d_revalidate_rcu(struct dentry *dentry)
+static int afs_d_revalidate_rcu(struct afs_vnode *dvnode, struct dentry *dentry)
 {
-	struct afs_vnode *dvnode;
-	struct dentry *parent;
-	struct inode *dir;
 	long dir_version, de_version;
 
 	_enter("%p", dentry);
 
-	/* Check the parent directory is still valid first. */
-	parent = READ_ONCE(dentry->d_parent);
-	dir = d_inode_rcu(parent);
-	if (!dir)
-		return -ECHILD;
-	dvnode = AFS_FS_I(dir);
 	if (test_bit(AFS_VNODE_DELETED, &dvnode->flags))
 		return -ECHILD;
 
@@ -1097,9 +1088,8 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
 static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 			    struct dentry *dentry, unsigned int flags)
 {
-	struct afs_vnode *vnode, *dir;
+	struct afs_vnode *vnode, *dir = AFS_FS_I(parent_dir);
 	struct afs_fid fid;
-	struct dentry *parent;
 	struct inode *inode;
 	struct key *key;
 	afs_dataversion_t dir_version, invalid_before;
@@ -1107,7 +1097,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	int ret;
 
 	if (flags & LOOKUP_RCU)
-		return afs_d_revalidate_rcu(dentry);
+		return afs_d_revalidate_rcu(dir, dentry);
 
 	if (d_really_is_positive(dentry)) {
 		vnode = AFS_FS_I(d_inode(dentry));
@@ -1122,14 +1112,9 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	if (IS_ERR(key))
 		key = NULL;
 
-	/* Hold the parent dentry so we can peer at it */
-	parent = dget_parent(dentry);
-	dir = AFS_FS_I(d_inode(parent));
-
 	/* validate the parent directory */
 	ret = afs_validate(dir, key);
 	if (ret == -ERESTARTSYS) {
-		dput(parent);
 		key_put(key);
 		return ret;
 	}
@@ -1157,7 +1142,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	afs_stat_v(dir, n_reval);
 
 	/* search the directory for this vnode */
-	ret = afs_do_lookup_one(&dir->netfs.inode, dentry, &fid, key, &dir_version);
+	ret = afs_do_lookup_one(&dir->netfs.inode, name, &fid, key, &dir_version);
 	switch (ret) {
 	case 0:
 		/* the filename maps to something */
@@ -1201,22 +1186,19 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 		goto out_valid;
 
 	default:
-		_debug("failed to iterate dir %pd: %d",
-		       parent, ret);
+		_debug("failed to iterate parent %pd2: %d", dentry, ret);
 		goto not_found;
 	}
 
 out_valid:
 	dentry->d_fsdata = (void *)(unsigned long)dir_version;
 out_valid_noupdate:
-	dput(parent);
 	key_put(key);
 	_leave(" = 1 [valid]");
 	return 1;
 
 not_found:
 	_debug("dropping dentry %pd2", dentry);
-	dput(parent);
 	key_put(key);
 
 	_leave(" = 0 [bad]");
-- 
2.39.5



* [PATCH v2 09/20] ceph_d_revalidate(): use stable parent inode passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (6 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-17 18:35       ` Jeff Layton
  2025-01-16  5:23     ` [PATCH v2 10/20] ceph_d_revalidate(): propagate stable name down into request enconding Al Viro
                       ` (10 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with the boilerplate for obtaining what we already
have.  Note that ceph is one of the "will want a path from filesystem
root if we want to talk to server" cases, so the name of the last
component is of little use - it is passed to fscrypt_d_revalidate()
and it is used to deal with the (also crypt-related) case in request
marshalling where the encrypted name turns out to be too long.  The
former is not a problem, but the latter is racy; that part will be
handled in the next commit.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ceph/dir.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index c4c71c24221b..dc5f55bebad7 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1940,30 +1940,19 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
 /*
  * Check if cached dentry can be trusted.
  */
-static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
 			     struct dentry *dentry, unsigned int flags)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
 	struct ceph_client *cl = mdsc->fsc->client;
 	int valid = 0;
-	struct dentry *parent;
-	struct inode *dir, *inode;
+	struct inode *inode;
 
-	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
+	valid = fscrypt_d_revalidate(dir, name, dentry, flags);
 	if (valid <= 0)
 		return valid;
 
-	if (flags & LOOKUP_RCU) {
-		parent = READ_ONCE(dentry->d_parent);
-		dir = d_inode_rcu(parent);
-		if (!dir)
-			return -ECHILD;
-		inode = d_inode_rcu(dentry);
-	} else {
-		parent = dget_parent(dentry);
-		dir = d_inode(parent);
-		inode = d_inode(dentry);
-	}
+	inode = d_inode_rcu(dentry);
 
 	doutc(cl, "%p '%pd' inode %p offset 0x%llx nokey %d\n",
 	      dentry, dentry, inode, ceph_dentry(dentry)->offset,
@@ -2039,9 +2028,6 @@ static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	doutc(cl, "%p '%pd' %s\n", dentry, dentry, valid ? "valid" : "invalid");
 	if (!valid)
 		ceph_dir_clear_complete(dir);
-
-	if (!(flags & LOOKUP_RCU))
-		dput(parent);
 	return valid;
 }
 
-- 
2.39.5



* [PATCH v2 10/20] ceph_d_revalidate(): propagate stable name down into request encoding
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (7 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 09/20] ceph_d_revalidate(): use stable " Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-17 18:35       ` Jeff Layton
  2025-01-16  5:23     ` [PATCH v2 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
                       ` (9 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Currently get_fscrypt_altname() requires ->r_dentry->d_name to be stable
and it gets that in almost all cases.  The only exception is ->d_revalidate(),
where we have a stable name, but it's passed separately - dentry->d_name
is not stable there.

Propagate it down to get_fscrypt_altname() as a new field of struct
ceph_mds_request - ->r_dname, to be used instead of ->r_dentry->d_name
when non-NULL.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
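The fallback described in the commit message can be sketched in userspace
as follows.  This is a minimal model under stated assumptions, not the ceph
code: struct qstr_model, struct dentry_model, struct request_model and
effective_name() are hypothetical stand-ins invented for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct qstr, dentry and ceph_mds_request. */
struct qstr_model {
	const char *name;
	size_t len;
};

struct dentry_model {
	struct qstr_model d_name;	/* may be unstable in ->d_revalidate() */
};

struct request_model {
	struct dentry_model *r_dentry;
	const struct qstr_model *r_dname;	/* stable name, or NULL */
};

/* Prefer the explicitly passed stable name; fall back to the dentry's. */
static const struct qstr_model *effective_name(const struct request_model *req)
{
	if (req->r_dname)
		return req->r_dname;
	return &req->r_dentry->d_name;
}
```

Callers that already have a stable ->d_name leave the new field NULL and
keep their old behaviour; only the revalidation path needs to set it.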
 fs/ceph/dir.c        | 2 ++
 fs/ceph/mds_client.c | 9 ++++++---
 fs/ceph/mds_client.h | 2 ++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index dc5f55bebad7..62e99e65250d 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1998,6 +1998,8 @@ static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
 			req->r_parent = dir;
 			ihold(dir);
 
+			req->r_dname = name;
+
 			mask = CEPH_STAT_CAP_INODE | CEPH_CAP_AUTH_SHARED;
 			if (ceph_security_xattr_wanted(dir))
 				mask |= CEPH_CAP_XATTR_SHARED;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 219a2cc2bf3c..3b766b984713 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2621,6 +2621,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 {
 	struct inode *dir = req->r_parent;
 	struct dentry *dentry = req->r_dentry;
+	const struct qstr *name = req->r_dname;
 	u8 *cryptbuf = NULL;
 	u32 len = 0;
 	int ret = 0;
@@ -2641,8 +2642,10 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 	if (!fscrypt_has_encryption_key(dir))
 		goto success;
 
-	if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX,
-					  &len)) {
+	if (!name)
+		name = &dentry->d_name;
+
+	if (!fscrypt_fname_encrypted_size(dir, name->len, NAME_MAX, &len)) {
 		WARN_ON_ONCE(1);
 		return ERR_PTR(-ENAMETOOLONG);
 	}
@@ -2657,7 +2660,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 	if (!cryptbuf)
 		return ERR_PTR(-ENOMEM);
 
-	ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
+	ret = fscrypt_fname_encrypt(dir, name, cryptbuf, len);
 	if (ret) {
 		kfree(cryptbuf);
 		return ERR_PTR(ret);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 38bb7e0d2d79..7c9fee9e80d4 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -299,6 +299,8 @@ struct ceph_mds_request {
 	struct inode *r_target_inode;       /* resulting inode */
 	struct inode *r_new_inode;	    /* new inode (for creates) */
 
+	const struct qstr *r_dname;	    /* stable name (for ->d_revalidate) */
+
 #define CEPH_MDS_R_DIRECT_IS_HASH	(1) /* r_direct_hash is valid */
 #define CEPH_MDS_R_ABORTED		(2) /* call was aborted */
 #define CEPH_MDS_R_GOT_UNSAFE		(3) /* got an unsafe reply */
-- 
2.39.5



* [PATCH v2 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (8 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-17 15:20       ` Jeff Layton
  2025-01-16  5:23     ` [PATCH v2 12/20] exfat_d_revalidate(): " Al Viro
                       ` (8 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

The only thing it's using is parent directory inode and we are already
given a stable reference to that - no need to bother with boilerplate.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
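The resulting control flow for a no-key name can be modelled in userspace
roughly as below.  This is a sketch under stated assumptions: struct
dir_model, LOOKUP_RCU_MODEL and revalidate_nokey_name() are hypothetical
names, and the two fields stand in for the results of
fscrypt_get_encryption_info() and fscrypt_has_encryption_key().

```c
#include <errno.h>
#include <stdbool.h>

#define LOOKUP_RCU_MODEL 0x1	/* hypothetical flag value for this sketch */

struct dir_model {
	bool has_key;	/* is the directory's key available? */
	int info_err;	/* result of the (possibly blocking) key setup */
};

/*
 * Models the no-key-name branch only.
 * Returns 1 (valid), 0 (invalid) or a negative errno.
 */
static int revalidate_nokey_name(const struct dir_model *dir,
				 unsigned int flags)
{
	if (flags & LOOKUP_RCU_MODEL)
		return -ECHILD;		/* key setup may block; retry outside RCU mode */
	if (dir->info_err < 0)
		return dir->info_err;
	return !dir->has_key;		/* valid only while the key is unavailable */
}
```

Since the directory is handed in by the caller, nothing here has to take or
drop a reference on a parent dentry.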
 fs/crypto/fname.c | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 389f5b2bf63b..010f9c0a4c2f 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -574,12 +574,10 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
  * Validate dentries in encrypted directories to make sure we aren't potentially
  * caching stale dentries after a key has been added.
  */
-int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
 			 struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *dir;
 	int err;
-	int valid;
 
 	/*
 	 * Plaintext names are always valid, since fscrypt doesn't support
@@ -592,30 +590,21 @@ int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	/*
 	 * No-key name; valid if the directory's key is still unavailable.
 	 *
-	 * Although fscrypt forbids rename() on no-key names, we still must use
-	 * dget_parent() here rather than use ->d_parent directly.  That's
-	 * because a corrupted fs image may contain directory hard links, which
-	 * the VFS handles by moving the directory's dentry tree in the dcache
-	 * each time ->lookup() finds the directory and it already has a dentry
-	 * elsewhere.  Thus ->d_parent can be changing, and we must safely grab
-	 * a reference to some ->d_parent to prevent it from being freed.
+	 * Note in RCU mode we have to bail if we get here -
+	 * fscrypt_get_encryption_info() may block.
 	 */
 
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	dir = dget_parent(dentry);
 	/*
 	 * Pass allow_unsupported=true, so that files with an unsupported
 	 * encryption policy can be deleted.
 	 */
-	err = fscrypt_get_encryption_info(d_inode(dir), true);
-	valid = !fscrypt_has_encryption_key(d_inode(dir));
-	dput(dir);
-
+	err = fscrypt_get_encryption_info(dir, true);
 	if (err < 0)
 		return err;
 
-	return valid;
+	return !fscrypt_has_encryption_key(dir);
 }
 EXPORT_SYMBOL_GPL(fscrypt_d_revalidate);
-- 
2.39.5



* [PATCH v2 12/20] exfat_d_revalidate(): use stable parent inode passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (9 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16  5:23     ` [PATCH v2 13/20] vfat_revalidate{,_ci}(): " Al Viro
                       ` (7 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... no need to bother with ->d_lock and ->d_parent->d_inode.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
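The check that remains once ->d_lock is out of the picture can be sketched
in userspace as a plain version comparison.  This is a simplified model:
struct dir_model, struct negative_dentry_model and
negative_dentry_still_valid() are hypothetical, and the internal details of
inode_eq_iversion() are deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parent directory inode. */
struct dir_model {
	uint64_t i_version;		/* bumped on every directory change */
};

/* Hypothetical stand-in for a negative dentry. */
struct negative_dentry_model {
	uint64_t cached_dir_version;	/* directory version seen at lookup time */
};

/* A negative dentry stays valid while the directory has not changed. */
static bool negative_dentry_still_valid(const struct dir_model *dir,
					const struct negative_dentry_model *dentry)
{
	return dir->i_version == dentry->cached_dir_version;
}
```

With the directory passed in directly there is no ->d_parent dereference
left that would need to be protected.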
 fs/exfat/namei.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index e3b4feccba07..61c7164b85b3 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -34,8 +34,6 @@ static inline void exfat_d_version_set(struct dentry *dentry,
 static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
 			      struct dentry *dentry, unsigned int flags)
 {
-	int ret;
-
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
@@ -59,11 +57,7 @@ static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
 
-	spin_lock(&dentry->d_lock);
-	ret = inode_eq_iversion(d_inode(dentry->d_parent),
-			exfat_d_version(dentry));
-	spin_unlock(&dentry->d_lock);
-	return ret;
+	return inode_eq_iversion(dir, exfat_d_version(dentry));
 }
 
 /* returns the length of a struct qstr, ignoring trailing dots if necessary */
-- 
2.39.5



* [PATCH v2 13/20] vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (10 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 12/20] exfat_d_revalidate(): " Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-17 15:22       ` Jeff Layton
  2025-01-16  5:23     ` [PATCH v2 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
                       ` (6 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/fat/namei_vfat.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index f9cbd5c6f932..926c26e90ef8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -43,14 +43,9 @@ static inline void vfat_d_version_set(struct dentry *dentry,
  * If it happened, the negative dentry isn't actually negative
  * anymore.  So, drop it.
  */
-static int vfat_revalidate_shortname(struct dentry *dentry)
+static bool vfat_revalidate_shortname(struct dentry *dentry, struct inode *dir)
 {
-	int ret = 1;
-	spin_lock(&dentry->d_lock);
-	if (!inode_eq_iversion(d_inode(dentry->d_parent), vfat_d_version(dentry)))
-		ret = 0;
-	spin_unlock(&dentry->d_lock);
-	return ret;
+	return inode_eq_iversion(dir, vfat_d_version(dentry));
 }
 
 static int vfat_revalidate(struct inode *dir, const struct qstr *name,
@@ -62,7 +57,7 @@ static int vfat_revalidate(struct inode *dir, const struct qstr *name,
 	/* This is not negative dentry. Always valid. */
 	if (d_really_is_positive(dentry))
 		return 1;
-	return vfat_revalidate_shortname(dentry);
+	return vfat_revalidate_shortname(dentry, dir);
 }
 
 static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
@@ -99,7 +94,7 @@ static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
 
-	return vfat_revalidate_shortname(dentry);
+	return vfat_revalidate_shortname(dentry, dir);
 }
 
 /* returns the length of a struct qstr, ignoring trailing dots */
-- 
2.39.5



* [PATCH v2 14/20] fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (11 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 13/20] vfat_revalidate{,_ci}(): " Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-17 15:18       ` Jeff Layton
  2025-01-16  5:23     ` [PATCH v2 15/20] gfs2_drevalidate(): " Al Viro
                       ` (5 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable - it's a real-life UAF.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/fuse/dir.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d9e9f26917eb..7e93a8470c36 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -196,7 +196,6 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 				  struct dentry *entry, unsigned int flags)
 {
 	struct inode *inode;
-	struct dentry *parent;
 	struct fuse_mount *fm;
 	struct fuse_inode *fi;
 	int ret;
@@ -228,11 +227,9 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 
 		attr_version = fuse_get_attr_version(fm->fc);
 
-		parent = dget_parent(entry);
-		fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
-				 &entry->d_name, &outarg);
+		fuse_lookup_init(fm->fc, &args, get_node_id(dir),
+				 name, &outarg);
 		ret = fuse_simple_request(fm, &args);
-		dput(parent);
 		/* Zero nodeid is same as -ENOENT */
 		if (!ret && !outarg.nodeid)
 			ret = -ENOENT;
@@ -266,9 +263,7 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 			if (test_bit(FUSE_I_INIT_RDPLUS, &fi->state))
 				return -ECHILD;
 		} else if (test_and_clear_bit(FUSE_I_INIT_RDPLUS, &fi->state)) {
-			parent = dget_parent(entry);
-			fuse_advise_use_readdirplus(d_inode(parent));
-			dput(parent);
+			fuse_advise_use_readdirplus(dir);
 		}
 	}
 	ret = 1;
-- 
2.39.5



* [PATCH v2 15/20] gfs2_drevalidate(): use stable parent inode and name passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (12 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16  5:23     ` [PATCH v2 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
                       ` (4 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable.  Theoretically a UAF, but it's
hard to exfiltrate the information...

Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
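The validity rule kept by this patch (valid = inode ? !error :
(error == -ENOENT)) can be exercised in isolation.  A minimal userspace
sketch, assuming the directory check returns 0 when the entry is found and
a negative errno otherwise; dentry_valid() is a hypothetical helper, not a
gfs2 function.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * has_inode: the dentry is positive.
 * check_err: result of the directory check (0 on a match, -ENOENT if the
 *            name is absent, another negative errno on failure).
 */
static int dentry_valid(bool has_inode, int check_err)
{
	return has_inode ? !check_err : (check_err == -ENOENT);
}
```

A positive dentry is valid only if the check succeeds; a negative dentry is
valid only if the name is confirmed absent, so any other error invalidates
both kinds.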
 fs/gfs2/dentry.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
index 86c338901fab..95050e719233 100644
--- a/fs/gfs2/dentry.c
+++ b/fs/gfs2/dentry.c
@@ -35,48 +35,40 @@
 static int gfs2_drevalidate(struct inode *dir, const struct qstr *name,
 			    struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *parent;
-	struct gfs2_sbd *sdp;
-	struct gfs2_inode *dip;
+	struct gfs2_sbd *sdp = GFS2_SB(dir);
+	struct gfs2_inode *dip = GFS2_I(dir);
 	struct inode *inode;
 	struct gfs2_holder d_gh;
 	struct gfs2_inode *ip = NULL;
-	int error, valid = 0;
+	int error, valid;
 	int had_lock = 0;
 
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	parent = dget_parent(dentry);
-	sdp = GFS2_SB(d_inode(parent));
-	dip = GFS2_I(d_inode(parent));
 	inode = d_inode(dentry);
 
 	if (inode) {
 		if (is_bad_inode(inode))
-			goto out;
+			return 0;
 		ip = GFS2_I(inode);
 	}
 
-	if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL) {
-		valid = 1;
-		goto out;
-	}
+	if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
+		return 1;
 
 	had_lock = (gfs2_glock_is_locked_by_me(dip->i_gl) != NULL);
 	if (!had_lock) {
 		error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
 		if (error)
-			goto out;
+			return 0;
 	}
 
-	error = gfs2_dir_check(d_inode(parent), &dentry->d_name, ip);
+	error = gfs2_dir_check(dir, name, ip);
 	valid = inode ? !error : (error == -ENOENT);
 
 	if (!had_lock)
 		gfs2_glock_dq_uninit(&d_gh);
-out:
-	dput(parent);
 	return valid;
 }
 
-- 
2.39.5



* [PATCH v2 16/20] nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (13 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 15/20] gfs2_drevalidate(): " Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-17 14:05       ` Jeff Layton
  2025-01-16  5:23     ` [PATCH v2 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
                       ` (3 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

we can't kill __nfs_lookup_revalidate() completely, but ->d_parent boilerplate
in it is gone

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
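The new shape (a gate helper that no longer takes a callback, followed by
the actual revalidation in each instance) can be modelled in userspace as
below.  This is a sketch under stated assumptions: all names are
hypothetical, the blocked flag stands in for d_fsdata ==
NFS_FSDATA_BLOCKED, and the wait for unlink is replaced by simply clearing
that flag.

```c
#include <errno.h>
#include <stdbool.h>

#define LOOKUP_RCU_MODEL 0x1	/* hypothetical flag value for this sketch */

struct dentry_model {
	bool blocked;	/* models d_fsdata == NFS_FSDATA_BLOCKED */
};

/*
 * Gate only: nonzero means "give up on RCU mode".  In non-RCU mode the
 * real helper sleeps until the unlink completes; the model simply clears
 * the flag to stand in for that wait.
 */
static int revalidate_gate(struct dentry_model *dentry, unsigned int flags)
{
	if (flags & LOOKUP_RCU_MODEL) {
		if (dentry->blocked)
			return -ECHILD;
	} else {
		dentry->blocked = false;
	}
	return 0;
}

/* Placeholder for the real revalidation work; always reports "valid". */
static int do_revalidate(const struct dentry_model *dentry, unsigned int flags)
{
	(void)dentry;
	(void)flags;
	return 1;
}

/* Each ->d_revalidate() instance calls the gate, then does its own work. */
static int lookup_revalidate_model(struct dentry_model *dentry,
				   unsigned int flags)
{
	if (revalidate_gate(dentry, flags))
		return -ECHILD;
	return do_revalidate(dentry, flags);
}
```

The parent lookup and the re-check of ->d_parent that the old helper did
around the callback have no counterpart here, since the directory arrives
as an argument.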
 fs/nfs/dir.c | 43 +++++++++++++------------------------------
 1 file changed, 13 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 9910d9796f4c..c28983ee75ca 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1732,8 +1732,8 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
  * cached dentry and do a new lookup.
  */
 static int
-nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
-			 unsigned int flags)
+nfs_do_lookup_revalidate(struct inode *dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int error = 0;
@@ -1785,39 +1785,26 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 }
 
 static int
-__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
-			int (*reval)(struct inode *, struct dentry *, unsigned int))
+__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *parent;
-	struct inode *dir;
-	int ret;
-
 	if (flags & LOOKUP_RCU) {
 		if (dentry->d_fsdata == NFS_FSDATA_BLOCKED)
 			return -ECHILD;
-		parent = READ_ONCE(dentry->d_parent);
-		dir = d_inode_rcu(parent);
-		if (!dir)
-			return -ECHILD;
-		ret = reval(dir, dentry, flags);
-		if (parent != READ_ONCE(dentry->d_parent))
-			return -ECHILD;
 	} else {
 		/* Wait for unlink to complete - see unblock_revalidate() */
 		wait_var_event(&dentry->d_fsdata,
 			       smp_load_acquire(&dentry->d_fsdata)
 			       != NFS_FSDATA_BLOCKED);
-		parent = dget_parent(dentry);
-		ret = reval(d_inode(parent), dentry, flags);
-		dput(parent);
 	}
-	return ret;
+	return 0;
 }
 
 static int nfs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 				 struct dentry *dentry, unsigned int flags)
 {
-	return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate);
+	if (__nfs_lookup_revalidate(dentry, flags))
+		return -ECHILD;
+	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
 }
 
 static void block_revalidate(struct dentry *dentry)
@@ -2216,11 +2203,14 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 EXPORT_SYMBOL_GPL(nfs_atomic_open);
 
 static int
-nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
-			  unsigned int flags)
+nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
+		       struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 
+	if (__nfs_lookup_revalidate(dentry, flags))
+		return -ECHILD;
+
 	trace_nfs_lookup_revalidate_enter(dir, dentry, flags);
 
 	if (!(flags & LOOKUP_OPEN) || (flags & LOOKUP_DIRECTORY))
@@ -2259,14 +2249,7 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
 
 full_reval:
-	return nfs_do_lookup_revalidate(dir, dentry, flags);
-}
-
-static int nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
-				  struct dentry *dentry, unsigned int flags)
-{
-	return __nfs_lookup_revalidate(dentry, flags,
-			nfs4_do_lookup_revalidate);
+	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
 }
 
 #endif /* CONFIG_NFSV4 */
-- 
2.39.5



* [PATCH v2 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (14 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-17 15:12       ` Jeff Layton
  2025-01-16  5:23     ` [PATCH v2 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
                       ` (2 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Pass the stable name all the way down to ->rpc_ops->lookup() instances.

Note that passing &dentry->d_name is safe in e.g. nfs_lookup() - it *is*
stable there, as it is in ->create() et al.

dget_parent() in nfs_instantiate() should be redundant - it'd better be
stable there; if it's not, we have more trouble, since ->d_name would
also be unsafe in such case.

nfs_submount() and nfs4_submount() may or may not require fixes - if
they ever get moved on server with fhandle preserved, we are in trouble
there...

UAF window is fairly narrow here and exfiltration requires the ability
to watch the traffic.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
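The calling convention change for ->lookup() can be sketched in userspace
as an ops table whose method takes the name separately from the dentry.
This is a minimal model under stated assumptions: struct rpc_ops_model,
lookup_model() and the wire buffer are hypothetical and only illustrate
that the encoded name comes from the explicit argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct qstr and struct dentry. */
struct qstr_model {
	const char *name;
	size_t len;
};

struct dentry_model {
	struct qstr_model d_name;	/* not trusted by the method below */
};

struct rpc_ops_model {
	int (*lookup)(const struct dentry_model *dentry,
		      const struct qstr_model *name,
		      char *wire, size_t wire_len);
};

/* Encode the request from the explicit name, never from dentry->d_name. */
static int lookup_model(const struct dentry_model *dentry,
			const struct qstr_model *name,
			char *wire, size_t wire_len)
{
	(void)dentry;
	if (name->len >= wire_len)
		return -1;		/* does not fit in the buffer */
	memcpy(wire, name->name, name->len);
	wire[name->len] = '\0';
	return 0;
}

static const struct rpc_ops_model ops_model = { .lookup = lookup_model };
```

Callers that hold a stable ->d_name pass &dentry->d_name; the revalidation
path passes the name it was given by its caller.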
 fs/nfs/dir.c            | 14 ++++++++------
 fs/nfs/namespace.c      |  2 +-
 fs/nfs/nfs3proc.c       |  5 ++---
 fs/nfs/nfs4proc.c       | 20 ++++++++++----------
 fs/nfs/proc.c           |  6 +++---
 include/linux/nfs_xdr.h |  2 +-
 6 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c28983ee75ca..2b04038b0e40 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1672,7 +1672,7 @@ nfs_lookup_revalidate_delegated(struct inode *dir, struct dentry *dentry,
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 }
 
-static int nfs_lookup_revalidate_dentry(struct inode *dir,
+static int nfs_lookup_revalidate_dentry(struct inode *dir, const struct qstr *name,
 					struct dentry *dentry,
 					struct inode *inode, unsigned int flags)
 {
@@ -1690,7 +1690,7 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
 		goto out;
 
 	dir_verifier = nfs_save_change_attribute(dir);
-	ret = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+	ret = NFS_PROTO(dir)->lookup(dir, dentry, name, fhandle, fattr);
 	if (ret < 0)
 		goto out;
 
@@ -1775,7 +1775,7 @@ nfs_do_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	if (NFS_STALE(inode))
 		goto out_bad;
 
-	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
+	return nfs_lookup_revalidate_dentry(dir, name, dentry, inode, flags);
 out_valid:
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 out_bad:
@@ -1970,7 +1970,8 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
 
 	dir_verifier = nfs_save_change_attribute(dir);
 	trace_nfs_lookup_enter(dir, dentry, flags);
-	error = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+	error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
+				       fhandle, fattr);
 	if (error == -ENOENT) {
 		if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
 			dir_verifier = inode_peek_iversion_raw(dir);
@@ -2246,7 +2247,7 @@ nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
 reval_dentry:
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
-	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
+	return nfs_lookup_revalidate_dentry(dir, name, dentry, inode, flags);
 
 full_reval:
 	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
@@ -2305,7 +2306,8 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
 	d_drop(dentry);
 
 	if (fhandle->size == 0) {
-		error = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+		error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
+					       fhandle, fattr);
 		if (error)
 			goto out_error;
 	}
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index 2d53574da605..973aed9cc5fe 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -308,7 +308,7 @@ int nfs_submount(struct fs_context *fc, struct nfs_server *server)
 	int err;
 
 	/* Look it up again to get its attributes */
-	err = server->nfs_client->rpc_ops->lookup(d_inode(parent), dentry,
+	err = server->nfs_client->rpc_ops->lookup(d_inode(parent), dentry, &dentry->d_name,
 						  ctx->mntfh, ctx->clone_data.fattr);
 	dput(parent);
 	if (err != 0)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 1566163c6d85..ce70768e0201 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -192,7 +192,7 @@ __nfs3_proc_lookup(struct inode *dir, const char *name, size_t len,
 }
 
 static int
-nfs3_proc_lookup(struct inode *dir, struct dentry *dentry,
+nfs3_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 		 struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	unsigned short task_flags = 0;
@@ -202,8 +202,7 @@ nfs3_proc_lookup(struct inode *dir, struct dentry *dentry,
 		task_flags |= RPC_TASK_TIMEOUT;
 
 	dprintk("NFS call  lookup %pd2\n", dentry);
-	return __nfs3_proc_lookup(dir, dentry->d_name.name,
-				  dentry->d_name.len, fhandle, fattr,
+	return __nfs3_proc_lookup(dir, name->name, name->len, fhandle, fattr,
 				  task_flags);
 }
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 405f17e6e0b4..4d85068e820d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4536,15 +4536,15 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
 }
 
 static int _nfs4_proc_lookup(struct rpc_clnt *clnt, struct inode *dir,
-		struct dentry *dentry, struct nfs_fh *fhandle,
-		struct nfs_fattr *fattr)
+		struct dentry *dentry, const struct qstr *name,
+		struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs_server *server = NFS_SERVER(dir);
 	int		       status;
 	struct nfs4_lookup_arg args = {
 		.bitmask = server->attr_bitmask,
 		.dir_fh = NFS_FH(dir),
-		.name = &dentry->d_name,
+		.name = name,
 	};
 	struct nfs4_lookup_res res = {
 		.server = server,
@@ -4586,17 +4586,16 @@ static void nfs_fixup_secinfo_attributes(struct nfs_fattr *fattr)
 }
 
 static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
-				   struct dentry *dentry, struct nfs_fh *fhandle,
-				   struct nfs_fattr *fattr)
+				   struct dentry *dentry, const struct qstr *name,
+				   struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs4_exception exception = {
 		.interruptible = true,
 	};
 	struct rpc_clnt *client = *clnt;
-	const struct qstr *name = &dentry->d_name;
 	int err;
 	do {
-		err = _nfs4_proc_lookup(client, dir, dentry, fhandle, fattr);
+		err = _nfs4_proc_lookup(client, dir, dentry, name, fhandle, fattr);
 		trace_nfs4_lookup(dir, name, err);
 		switch (err) {
 		case -NFS4ERR_BADNAME:
@@ -4631,13 +4630,13 @@ static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
 	return err;
 }
 
-static int nfs4_proc_lookup(struct inode *dir, struct dentry *dentry,
+static int nfs4_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 			    struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	int status;
 	struct rpc_clnt *client = NFS_CLIENT(dir);
 
-	status = nfs4_proc_lookup_common(&client, dir, dentry, fhandle, fattr);
+	status = nfs4_proc_lookup_common(&client, dir, dentry, name, fhandle, fattr);
 	if (client != NFS_CLIENT(dir)) {
 		rpc_shutdown_client(client);
 		nfs_fixup_secinfo_attributes(fattr);
@@ -4652,7 +4651,8 @@ nfs4_proc_lookup_mountpoint(struct inode *dir, struct dentry *dentry,
 	struct rpc_clnt *client = NFS_CLIENT(dir);
 	int status;
 
-	status = nfs4_proc_lookup_common(&client, dir, dentry, fhandle, fattr);
+	status = nfs4_proc_lookup_common(&client, dir, dentry, &dentry->d_name,
+					 fhandle, fattr);
 	if (status < 0)
 		return ERR_PTR(status);
 	return (client == NFS_CLIENT(dir)) ? rpc_clone_client(client) : client;
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 6c09cd090c34..77920a2e3cef 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -153,13 +153,13 @@ nfs_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
 }
 
 static int
-nfs_proc_lookup(struct inode *dir, struct dentry *dentry,
+nfs_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 		struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs_diropargs	arg = {
 		.fh		= NFS_FH(dir),
-		.name		= dentry->d_name.name,
-		.len		= dentry->d_name.len
+		.name		= name->name,
+		.len		= name->len
 	};
 	struct nfs_diropok	res = {
 		.fh		= fhandle,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 559273a0f16d..08b62bbf59f0 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1785,7 +1785,7 @@ struct nfs_rpc_ops {
 			    struct nfs_fattr *, struct inode *);
 	int	(*setattr) (struct dentry *, struct nfs_fattr *,
 			    struct iattr *);
-	int	(*lookup)  (struct inode *, struct dentry *,
+	int	(*lookup)  (struct inode *, struct dentry *, const struct qstr *,
 			    struct nfs_fh *, struct nfs_fattr *);
 	int	(*lookupp) (struct inode *, struct nfs_fh *,
 			    struct nfs_fattr *);
-- 
2.39.5



* [PATCH v2 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (15 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16  5:23     ` [PATCH v2 19/20] orangefs_d_revalidate(): " Al Viro
  2025-01-16  5:23     ` [PATCH v2 20/20] 9p: fix ->rename_sem exclusion Al Viro
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

theoretically, ->d_name use in there is a UAF, but only if you are messing with
tracepoints...

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ocfs2/dcache.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
index ecb1ce6301c4..1873bbbb7e5b 100644
--- a/fs/ocfs2/dcache.c
+++ b/fs/ocfs2/dcache.c
@@ -45,8 +45,7 @@ static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
 	inode = d_inode(dentry);
 	osb = OCFS2_SB(dentry->d_sb);
 
-	trace_ocfs2_dentry_revalidate(dentry, dentry->d_name.len,
-				      dentry->d_name.name);
+	trace_ocfs2_dentry_revalidate(dentry, name->len, name->name);
 
 	/* For a negative dentry -
 	 * check the generation number of the parent and compare with the
@@ -54,12 +53,8 @@ static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
 	 */
 	if (inode == NULL) {
 		unsigned long gen = (unsigned long) dentry->d_fsdata;
-		unsigned long pgen;
-		spin_lock(&dentry->d_lock);
-		pgen = OCFS2_I(d_inode(dentry->d_parent))->ip_dir_lock_gen;
-		spin_unlock(&dentry->d_lock);
-		trace_ocfs2_dentry_revalidate_negative(dentry->d_name.len,
-						       dentry->d_name.name,
+		unsigned long pgen = OCFS2_I(dir)->ip_dir_lock_gen;
+		trace_ocfs2_dentry_revalidate_negative(name->len, name->name,
 						       pgen, gen);
 		if (gen != pgen)
 			goto bail;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 19/20] orangefs_d_revalidate(): use stable parent inode and name passed by caller
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (16 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
@ 2025-01-16  5:23     ` Al Viro
  2025-01-16  5:23     ` [PATCH v2 20/20] 9p: fix ->rename_sem exclusion Al Viro
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

->d_name use is a UAF if the userland side of things can be slowed down
by an attacker.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/orangefs/dcache.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
index c32c9a86e8d0..060c94e9759b 100644
--- a/fs/orangefs/dcache.c
+++ b/fs/orangefs/dcache.c
@@ -13,10 +13,9 @@
 #include "orangefs-kernel.h"
 
 /* Returns 1 if dentry can still be trusted, else 0. */
-static int orangefs_revalidate_lookup(struct dentry *dentry)
+static int orangefs_revalidate_lookup(struct inode *parent_inode, const struct qstr *name,
+				      struct dentry *dentry)
 {
-	struct dentry *parent_dentry = dget_parent(dentry);
-	struct inode *parent_inode = parent_dentry->d_inode;
 	struct orangefs_inode_s *parent = ORANGEFS_I(parent_inode);
 	struct inode *inode = dentry->d_inode;
 	struct orangefs_kernel_op_s *new_op;
@@ -26,14 +25,12 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
 	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s: attempting lookup.\n", __func__);
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_LOOKUP);
-	if (!new_op) {
-		ret = -ENOMEM;
-		goto out_put_parent;
-	}
+	if (!new_op)
+		return -ENOMEM;
 
 	new_op->upcall.req.lookup.sym_follow = ORANGEFS_LOOKUP_LINK_NO_FOLLOW;
 	new_op->upcall.req.lookup.parent_refn = parent->refn;
-	strscpy(new_op->upcall.req.lookup.d_name, dentry->d_name.name);
+	strscpy(new_op->upcall.req.lookup.d_name, name->name);
 
 	gossip_debug(GOSSIP_DCACHE_DEBUG,
 		     "%s:%s:%d interrupt flag [%d]\n",
@@ -78,8 +75,6 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
 	ret = 1;
 out_release_op:
 	op_release(new_op);
-out_put_parent:
-	dput(parent_dentry);
 	return ret;
 out_drop:
 	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s:%s:%d revalidate failed\n",
@@ -115,7 +110,7 @@ static int orangefs_d_revalidate(struct inode *dir, const struct qstr *name,
 	 * If this passes, the positive dentry still exists or the negative
 	 * dentry still does not exist.
 	 */
-	if (!orangefs_revalidate_lookup(dentry))
+	if (!orangefs_revalidate_lookup(dir, name, dentry))
 		return 0;
 
 	/* We do not need to continue with negative dentries. */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 20/20] 9p: fix ->rename_sem exclusion
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                       ` (17 preceding siblings ...)
  2025-01-16  5:23     ` [PATCH v2 19/20] orangefs_d_revalidate(): " Al Viro
@ 2025-01-16  5:23     ` Al Viro
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-16  5:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.

->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain.  Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.

Solution:
	* have d_move() done in ->rename() rather than in its caller
	* maintain a 9p-private rwsem (per-filesystem)
	* hold it exclusive over the relevant part of ->rename()
	* hold it shared over the places where we want the path.

That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control.  However, there's
also __d_unalias(), which isn't covered by any of that.

If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.

Add a couple of optional methods, so that __d_unalias() would do
	if alias->d_op->d_unalias_trylock != NULL
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if alias->d_op->d_unalias_unlock != NULL
		alias->d_op->d_unalias_unlock(alias)
where it currently does __d_move().  9p instances do down_write_trylock()
and up_write() of ->rename_sem.
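
The hook pairing described above can be modelled in userspace roughly as
follows.  This is only a sketch under loud assumptions: every toy_* name is
hypothetical, the "rwsem" is a single-threaded counter that never blocks,
and moves++ merely stands in for __d_move().  It is not the kernel code,
just the control flow: an absent hook means "no extra exclusion", a failed
trylock means -ESTALE and no move.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-threaded stand-in for an rwsem: tracks holders, never blocks. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers)
		return false;
	sem->writer = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *sem)
{
	sem->writer = false;
}

struct toy_dentry;

/* Both hooks are optional, as in the patch. */
struct toy_dentry_ops {
	bool (*d_unalias_trylock)(const struct toy_dentry *);
	void (*d_unalias_unlock)(const struct toy_dentry *);
};

struct toy_dentry {
	const struct toy_dentry_ops *d_op;
	struct toy_rwsem *rename_sem;	/* per-filesystem in 9p; hypothetical here */
	int moves;			/* counts simulated __d_move() calls */
};

static bool toy_unalias_trylock(const struct toy_dentry *d)
{
	return toy_down_write_trylock(d->rename_sem);
}

static void toy_unalias_unlock(const struct toy_dentry *d)
{
	toy_up_write(d->rename_sem);
}

/* A filesystem that wants exclusion against path builders... */
static const struct toy_dentry_ops toy_9p_like_ops = {
	.d_unalias_trylock = toy_unalias_trylock,
	.d_unalias_unlock = toy_unalias_unlock,
};

/* ... and one that does not care. */
static const struct toy_dentry_ops toy_plain_ops = { NULL, NULL };

/* Mirrors the shape of the out_unalias: part of __d_unalias(). */
static int toy_unalias(struct toy_dentry *alias)
{
	if (alias->d_op->d_unalias_trylock &&
	    !alias->d_op->d_unalias_trylock(alias))
		return -ESTALE;
	alias->moves++;			/* stands in for __d_move() */
	if (alias->d_op->d_unalias_unlock)
		alias->d_op->d_unalias_unlock(alias);
	return 0;
}
```

In this model, a caller that took the semaphore shared (as a path builder
would) makes toy_unalias() return -ESTALE without touching the dentry;
once the shared hold is dropped, the move goes through.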

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/locking.rst |  4 ++++
 Documentation/filesystems/vfs.rst     | 21 +++++++++++++++++++++
 fs/9p/v9fs.h                          |  2 +-
 fs/9p/vfs_dentry.c                    | 16 ++++++++++++++++
 fs/dcache.c                           |  5 +++++
 include/linux/dcache.h                |  2 ++
 6 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..d20a32b77b60 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,8 @@ prototypes::
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_unalias_trylock)(const struct dentry *);
+	void (*d_unalias_unlock)(const struct dentry *);
 
 locking rules:
 
@@ -50,6 +52,8 @@ d_dname:	   no		no		no		no
 d_automount:	   no		no		yes		no
 d_manage:	   no		no		yes (ref-walk)	maybe
 d_real		   no		no		yes 		no
+d_unalias_trylock  yes		no		no 		no
+d_unalias_unlock   yes		no		no 		no
 ================== ===========	========	==============	========
 
 inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..31eea688609a 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,8 @@ defined:
 		struct vfsmount *(*d_automount)(struct path *);
 		int (*d_manage)(const struct path *, bool);
 		struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+		bool (*d_unalias_trylock)(const struct dentry *);
+		void (*d_unalias_unlock)(const struct dentry *);
 	};
 
 ``d_revalidate``
@@ -1428,6 +1430,25 @@ defined:
 
 	For non-regular files, the 'dentry' argument is returned.
 
+``d_unalias_trylock``
+	if present, will be called by d_splice_alias() before moving a
+	preexisting attached alias.  Returning false prevents __d_move(),
+	making d_splice_alias() fail with -ESTALE.
+
+	Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+	and d_exchange() calls from the outside of filesystem methods;
+	however, it does not guarantee that attached dentries won't
+	be renamed or moved by d_splice_alias() finding a preexisting
+	alias for a directory inode.  Normally we would not care;
+	however, something that wants to stabilize the entire path to
+	root over a blocking operation might need that.  See 9p for one
+	(and hopefully only) example.
+
+``d_unalias_unlock``
+	should be paired with ``d_unalias_trylock``; that one is called after
+	__d_move() call in __d_unalias().
+
+
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries.  Child dentries are basically like files in a
 directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
 	return inode->i_sb->s_fs_info;
 }
 
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
 {
 	return dentry->d_sb->s_fs_info;
 }
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..5061f192eafd 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,30 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	return __v9fs_lookup_revalidate(dentry, flags);
 }
 
+static bool v9fs_dentry_unalias_trylock(const struct dentry *dentry)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+	return down_write_trylock(&v9ses->rename_sem);
+}
+
+static void v9fs_dentry_unalias_unlock(const struct dentry *dentry)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+	up_write(&v9ses->rename_sem);
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
 	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
+	.d_unalias_trylock = v9fs_dentry_unalias_trylock,
+	.d_unalias_unlock = v9fs_dentry_unalias_unlock,
 };
 
 const struct dentry_operations v9fs_dentry_operations = {
 	.d_delete = always_delete_dentry,
 	.d_release = v9fs_dentry_release,
+	.d_unalias_trylock = v9fs_dentry_unalias_trylock,
+	.d_unalias_unlock = v9fs_dentry_unalias_unlock,
 };
diff --git a/fs/dcache.c b/fs/dcache.c
index 6f36d3e8c739..695406e48937 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2961,7 +2961,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_rwsem;
 out_unalias:
+	if (alias->d_op->d_unalias_trylock &&
+	    !alias->d_op->d_unalias_trylock(alias))
+		goto out_err;
 	__d_move(alias, dentry, false);
+	if (alias->d_op->d_unalias_unlock)
+		alias->d_op->d_unalias_unlock(alias);
 	ret = 0;
 out_err:
 	if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..9a1a30857763 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,8 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_unalias_trylock)(const struct dentry *);
+	void (*d_unalias_unlock)(const struct dentry *);
 } ____cacheline_aligned;
 
 /*
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 04/20] dissolve external_name.u into separate members
  2025-01-16  5:23     ` [PATCH v2 04/20] dissolve external_name.u into separate members Al Viro
@ 2025-01-16 10:06       ` Jan Kara
  0 siblings, 0 replies; 96+ messages in thread
From: Jan Kara @ 2025-01-16 10:06 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

On Thu 16-01-25 05:23:01, Al Viro wrote:
> kept separate from the previous commit to keep the noise separate
> from actual changes...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/dcache.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index f387dc97df86..6f36d3e8c739 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -296,10 +296,8 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
>  }
>  
>  struct external_name {
> -	struct {
> -		atomic_t count;		// ->count and ->head can't be combined
> -		struct rcu_head head;	// see take_dentry_name_snapshot()
> -	} u;
> +	struct rcu_head head;	// ->head and ->count can't be combined
> +	atomic_t count;		// see take_dentry_name_snapshot()
>  	unsigned char name[];
>  };
>  
> @@ -344,7 +342,7 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
>  		struct external_name *p;
>  		p = container_of(s, struct external_name, name[0]);
>  		// get a valid reference
> -		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
> +		if (unlikely(!atomic_inc_not_zero(&p->count)))
>  			goto retry;
>  		name->name.name = s;
>  	}
> @@ -361,8 +359,8 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
>  	if (unlikely(name->name.name != name->inline_name.string)) {
>  		struct external_name *p;
>  		p = container_of(name->name.name, struct external_name, name[0]);
> -		if (unlikely(atomic_dec_and_test(&p->u.count)))
> -			kfree_rcu(p, u.head);
> +		if (unlikely(atomic_dec_and_test(&p->count)))
> +			kfree_rcu(p, head);
>  	}
>  }
>  EXPORT_SYMBOL(release_dentry_name_snapshot);
> @@ -400,7 +398,7 @@ static void dentry_free(struct dentry *dentry)
>  	WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
>  	if (unlikely(dname_external(dentry))) {
>  		struct external_name *p = external_name(dentry);
> -		if (likely(atomic_dec_and_test(&p->u.count))) {
> +		if (likely(atomic_dec_and_test(&p->count))) {
>  			call_rcu(&dentry->d_u.d_rcu, __d_free_external);
>  			return;
>  		}
> @@ -1681,7 +1679,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
>  			kmem_cache_free(dentry_cache, dentry); 
>  			return NULL;
>  		}
> -		atomic_set(&p->u.count, 1);
> +		atomic_set(&p->count, 1);
>  		dname = p->name;
>  	} else  {
>  		dname = dentry->d_shortname.string;
> @@ -2774,15 +2772,15 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
>  	if (unlikely(dname_external(dentry)))
>  		old_name = external_name(dentry);
>  	if (unlikely(dname_external(target))) {
> -		atomic_inc(&external_name(target)->u.count);
> +		atomic_inc(&external_name(target)->count);
>  		dentry->d_name = target->d_name;
>  	} else {
>  		dentry->d_shortname = target->d_shortname;
>  		dentry->d_name.name = dentry->d_shortname.string;
>  		dentry->d_name.hash_len = target->d_name.hash_len;
>  	}
> -	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
> -		kfree_rcu(old_name, u.head);
> +	if (old_name && likely(atomic_dec_and_test(&old_name->count)))
> +		kfree_rcu(old_name, head);
>  }
>  
>  /*
> -- 
> 2.39.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate()
  2025-01-16  5:23     ` [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
@ 2025-01-16 15:15       ` Gabriel Krisman Bertazi
  2025-01-17 18:55       ` Jeff Layton
  1 sibling, 0 replies; 96+ messages in thread
From: Gabriel Krisman Bertazi @ 2025-01-16 15:15 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, linux-nfs, miklos, torvalds

Al Viro <viro@zeniv.linux.org.uk> writes:

> ->d_revalidate() often needs to access dentry parent and name; that has
> to be done carefully, since the locking environment varies from caller
> to caller.  We are not guaranteed that dentry in question will not be
> moved right under us - not unless the filesystem is such that nothing
> on it ever gets renamed.
>
> It can be dealt with, but that results in boilerplate code that isn't
> even needed - the callers normally have just found the dentry via dcache
> lookup and want to verify that it's in the right place; they already
> have the values of ->d_parent and ->d_name stable.  There are a couple
> of exceptions (overlayfs and, to a lesser extent, ecryptfs), but for the
> majority of calls that song and dance is not needed at all.
>
> It's easier to make ecryptfs and overlayfs find and pass those values if
> there's a ->d_revalidate() instance to be called, rather than doing that
> in the instances.
>
> This commit only changes the calling conventions; making use of supplied
> values is left to followups.
>
> NOTE: some instances need more than just the parent - things like CIFS
> may need to build an entire path from filesystem root, so they need
> more precautions than the usual boilerplate.  This series doesn't
> do anything to that need - these filesystems have to keep their locking
> mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
> a-la v9fs).
>
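
For illustration, the calling convention quoted above might look like this
in a userspace mock.  Everything here is an assumption made for the sketch:
toy_qstr, toy_inode, toy_dentry and toy_d_revalidate() are hypothetical
stand-ins, and the "directory" holds at most one entry.  The point is only
that the instance works from the parent and name handed in by the caller
and never looks at the dentry's own parent or name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct qstr / inode / dentry. */
struct toy_qstr {
	unsigned int len;
	const char *name;
};

struct toy_inode {
	const char *only_entry;	/* what the "server" has in this directory */
};

struct toy_dentry {
	int positive;		/* non-zero if an inode is attached */
};

/*
 * Returns 1 if the dentry can still be trusted, 0 otherwise.
 * The parent directory and the expected name are supplied by the caller
 * and are stable for the duration of the call.
 */
static int toy_d_revalidate(const struct toy_inode *dir,
			    const struct toy_qstr *name,
			    const struct toy_dentry *dentry,
			    unsigned int flags)
{
	int found;

	(void)flags;	/* unused in this sketch */

	found = dir->only_entry != NULL &&
		strlen(dir->only_entry) == name->len &&
		memcmp(dir->only_entry, name->name, name->len) == 0;

	/* positive dentry must still exist, negative must still be absent */
	return found == (dentry->positive != 0);
}
```

A positive dentry for "foo" stays valid while the directory still has
"foo"; a negative one for the same name becomes valid only once the entry
is gone.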

Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>

Thanks for this. It is a requirement for the negative dentry patchset I
sent a while ago that I'll revive now.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage
  2025-01-16  5:23     ` [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
@ 2025-01-16 15:38       ` Gabriel Krisman Bertazi
  2025-01-16 15:46         ` Al Viro
  0 siblings, 1 reply; 96+ messages in thread
From: Gabriel Krisman Bertazi @ 2025-01-16 15:38 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, linux-nfs, miklos, torvalds

Al Viro <viro@zeniv.linux.org.uk> writes:

> ... and check the "name might be unstable" predicate
> the right way.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/libfs.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 748ac5923154..3ad1b1b7fed6 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1789,7 +1789,7 @@ int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
>  {
>  	const struct dentry *parent;
>  	const struct inode *dir;
> -	char strbuf[DNAME_INLINE_LEN];
> +	union shortname_store strbuf;
>  	struct qstr qstr;
>  
>  	/*
> @@ -1809,22 +1809,23 @@ int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
>  	if (!dir || !IS_CASEFOLDED(dir))
>  		return 1;
>  
> +	qstr.len = len;
> +	qstr.name = str;
>  	/*
>  	 * If the dentry name is stored in-line, then it may be concurrently
>  	 * modified by a rename.  If this happens, the VFS will eventually retry
>  	 * the lookup, so it doesn't matter what ->d_compare() returns.
>  	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
>  	 * string.  Therefore, we have to copy the name into a temporary buffer.

This part of the comment needs updating since there is no more copying.

> +	 * As above, len is guaranteed to match str, so the shortname case
> +	 * is exactly when str points to ->d_shortname.
>  	 */
> -	if (len <= DNAME_INLINE_LEN - 1) {
> -		memcpy(strbuf, str, len);
> -		strbuf[len] = 0;
> -		str = strbuf;
> +	if (qstr.name == dentry->d_shortname.string) {
> +		strbuf = dentry->d_shortname; // NUL is guaranteed to be in there
> +		qstr.name = strbuf.string;
>  		/* prevent compiler from optimizing out the temporary buffer */
>  		barrier();

If I read the code correctly, I admit I don't understand how this
guarantees the stability.  Aren't you just assigning qstr.name back the
same value it had in case of an inlined name through a bounce pointer?
The previous implementation made sense to me, since the memcpy only
accessed each character once, and we guaranteed the terminating
character explicitly, but I'm having a hard time with this version.

>  	}
> -	qstr.len = len;
> -	qstr.name = str;
>  
>  	return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr);
>  }

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage
  2025-01-16 15:38       ` Gabriel Krisman Bertazi
@ 2025-01-16 15:46         ` Al Viro
  2025-01-16 15:53           ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-16 15:46 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, linux-nfs, miklos, torvalds

On Thu, Jan 16, 2025 at 10:38:53AM -0500, Gabriel Krisman Bertazi wrote:
> >  	 * If the dentry name is stored in-line, then it may be concurrently
> >  	 * modified by a rename.  If this happens, the VFS will eventually retry
> >  	 * the lookup, so it doesn't matter what ->d_compare() returns.
> >  	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
> >  	 * string.  Therefore, we have to copy the name into a temporary buffer.
> 
> This part of the comment needs updating since there is no more copying.
> 
> > +	 * As above, len is guaranteed to match str, so the shortname case
> > +	 * is exactly when str points to ->d_shortname.
> >  	 */
> > -	if (len <= DNAME_INLINE_LEN - 1) {
> > -		memcpy(strbuf, str, len);
> > -		strbuf[len] = 0;
> > -		str = strbuf;
> > +	if (qstr.name == dentry->d_shortname.string) {
> > +		strbuf = dentry->d_shortname; // NUL is guaranteed to be in there
> > +		qstr.name = strbuf.string;
> >  		/* prevent compiler from optimizing out the temporary buffer */
> >  		barrier();
> 
> If I read the code correctly, I admit I don't understand how this
> guarantees the stability.  Aren't you just assigning qstr.name back the
> same value it had in case of an inlined name through a bounce pointer?
> The previous implementation made sense to me, since the memcpy only
> accessed each character once, and we guaranteed the terminating
> character explicitly, but I'm having a hard time with this version.

This
		strbuf = dentry->d_shortname; // NUL is guaranteed to be in there
copies the entire array.  No bounce pointers of any sort; we copy
the array contents, all 40 bytes of it.  And yes, struct (or union,
in this case) assignment generates better code than manual memcpy()
here.
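
A small userspace sketch of that point, under stated assumptions: the union
below is a stand-in for the kernel's union shortname_store, the 40-byte
size is taken from the message above, and the second strcpy() merely plays
the role of a rename changing the source after the snapshot was taken.

```c
#include <assert.h>
#include <string.h>

#define DNAME_INLINE_LEN 40	/* assumed, per "all 40 bytes of it" */

/* Userspace stand-in for the kernel's union shortname_store. */
union shortname_store {
	unsigned char string[DNAME_INLINE_LEN];
	unsigned long words[DNAME_INLINE_LEN / sizeof(unsigned long)];
};

/* Returns 1 if the snapshot is unaffected by a later change to the source. */
static int snapshot_is_private(void)
{
	union shortname_store src, copy;

	memset(&src, 0, sizeof(src));
	strcpy((char *)src.string, "old-name");

	copy = src;	/* union assignment: copies the array contents */

	/* the copy carries its own NUL terminator */
	assert(memchr(copy.string, 0, sizeof(copy.string)) != NULL);

	strcpy((char *)src.string, "renamed");	/* simulated rename */

	/* distinct storage, and the old contents are still in the copy */
	return copy.string != src.string &&
	       strcmp((const char *)copy.string, "old-name") == 0;
}
```

Had strbuf been a pointer into the dentry instead, the last comparison
would have seen the new name.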

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage
  2025-01-16 15:46         ` Al Viro
@ 2025-01-16 15:53           ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 96+ messages in thread
From: Gabriel Krisman Bertazi @ 2025-01-16 15:53 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, linux-nfs, miklos, torvalds

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Thu, Jan 16, 2025 at 10:38:53AM -0500, Gabriel Krisman Bertazi wrote:
>> >  	 * If the dentry name is stored in-line, then it may be concurrently
>> >  	 * modified by a rename.  If this happens, the VFS will eventually retry
>> >  	 * the lookup, so it doesn't matter what ->d_compare() returns.
>> >  	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
>> >  	 * string.  Therefore, we have to copy the name into a temporary buffer.
>> 
>> This part of the comment needs updating since there is no more copying.
>> 
>> > +	 * As above, len is guaranteed to match str, so the shortname case
>> > +	 * is exactly when str points to ->d_shortname.
>> >  	 */
>> > -	if (len <= DNAME_INLINE_LEN - 1) {
>> > -		memcpy(strbuf, str, len);
>> > -		strbuf[len] = 0;
>> > -		str = strbuf;
>> > +	if (qstr.name == dentry->d_shortname.string) {
>> > +		strbuf = dentry->d_shortname; // NUL is guaranteed to be in there
>> > +		qstr.name = strbuf.string;
>> >  		/* prevent compiler from optimizing out the temporary buffer */
>> >  		barrier();
>> 
>> If I read the code correctly, I admit I don't understand how this
>> guarantees the stability.  Aren't you just assigning qstr.name back the
>> same value it had in case of an inlined name through a bounce pointer?
>> The previous implementation made sense to me, since the memcpy only
>> accessed each character once, and we guaranteed the terminating
>> character explicitly, but I'm having a hard time with this version.
>
> This
> 		strbuf = dentry->d_shortname; // NUL is guaranteed to be in there
> copies the entire array.  No bounce pointers of any sort; we copy
> the array contents, all 40 bytes of it.  And yes, struct (or union,
> in this case) assignment generates better code than manual memcpy()
> here.

Ah. I read that as:

unsigned char *strbuf = &dentry->d_shortname

Thanks for explaining.  Makes sense to me:

Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 16/20] nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  2025-01-16  5:23     ` [PATCH v2 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
@ 2025-01-17 14:05       ` Jeff Layton
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Layton @ 2025-01-17 14:05 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

On Thu, 2025-01-16 at 05:23 +0000, Al Viro wrote:
> we can't kill __nfs_lookup_revalidate() completely, but ->d_parent boilerplate
> in it is gone
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/nfs/dir.c | 43 +++++++++++++------------------------------
>  1 file changed, 13 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 9910d9796f4c..c28983ee75ca 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1732,8 +1732,8 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
>   * cached dentry and do a new lookup.
>   */
>  static int
> -nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
> -			 unsigned int flags)
> +nfs_do_lookup_revalidate(struct inode *dir, const struct qstr *name,
> +			 struct dentry *dentry, unsigned int flags)
>  {
>  	struct inode *inode;
>  	int error = 0;
> @@ -1785,39 +1785,26 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
>  }
>  
>  static int
> -__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
> -			int (*reval)(struct inode *, struct dentry *, unsigned int))
> +__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
>  {
> -	struct dentry *parent;
> -	struct inode *dir;
> -	int ret;
> -
>  	if (flags & LOOKUP_RCU) {
>  		if (dentry->d_fsdata == NFS_FSDATA_BLOCKED)
>  			return -ECHILD;
> -		parent = READ_ONCE(dentry->d_parent);
> -		dir = d_inode_rcu(parent);
> -		if (!dir)
> -			return -ECHILD;
> -		ret = reval(dir, dentry, flags);
> -		if (parent != READ_ONCE(dentry->d_parent))
> -			return -ECHILD;
>  	} else {
>  		/* Wait for unlink to complete - see unblock_revalidate() */
>  		wait_var_event(&dentry->d_fsdata,
>  			       smp_load_acquire(&dentry->d_fsdata)
>  			       != NFS_FSDATA_BLOCKED);
> -		parent = dget_parent(dentry);
> -		ret = reval(d_inode(parent), dentry, flags);
> -		dput(parent);
>  	}
> -	return ret;
> +	return 0;
>  }
>  
>  static int nfs_lookup_revalidate(struct inode *dir, const struct qstr *name,
>  				 struct dentry *dentry, unsigned int flags)
>  {
> -	return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate);
> +	if (__nfs_lookup_revalidate(dentry, flags))
> +		return -ECHILD;
> +	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
>  }
>  
>  static void block_revalidate(struct dentry *dentry)
> @@ -2216,11 +2203,14 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>  EXPORT_SYMBOL_GPL(nfs_atomic_open);
>  
>  static int
> -nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
> -			  unsigned int flags)
> +nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
> +		       struct dentry *dentry, unsigned int flags)
>  {
>  	struct inode *inode;
>  
> +	if (__nfs_lookup_revalidate(dentry, flags))
> +		return -ECHILD;
> +
>  	trace_nfs_lookup_revalidate_enter(dir, dentry, flags);
>  
>  	if (!(flags & LOOKUP_OPEN) || (flags & LOOKUP_DIRECTORY))
> @@ -2259,14 +2249,7 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
>  	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
>  
>  full_reval:
> -	return nfs_do_lookup_revalidate(dir, dentry, flags);
> -}
> -
> -static int nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
> -				  struct dentry *dentry, unsigned int flags)
> -{
> -	return __nfs_lookup_revalidate(dentry, flags,
> -			nfs4_do_lookup_revalidate);
> +	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
>  }
>  
>  #endif /* CONFIG_NFSV4 */

Much nicer without the "reval" callback.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses
  2025-01-16  5:23     ` [PATCH v2 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
@ 2025-01-17 15:12       ` Jeff Layton
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Layton @ 2025-01-17 15:12 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

On Thu, 2025-01-16 at 05:23 +0000, Al Viro wrote:
> Pass the stable name all the way down to ->rpc_ops->lookup() instances.
> 
> Note that passing &dentry->d_name is safe in e.g. nfs_lookup() - it *is*
> stable there, as it is in ->create() et al.
> 
> dget_parent() in nfs_instantiate() should be redundant - it'd better be
> stable there; if it's not, we have more trouble, since ->d_name would
> also be unsafe in such case.
> 
> nfs_submount() and nfs4_submount() may or may not require fixes - if
> they ever get moved on server with fhandle preserved, we are in trouble
> there...
> 
> UAF window is fairly narrow here and exfiltration requires the ability
> to watch the traffic.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/nfs/dir.c            | 14 ++++++++------
>  fs/nfs/namespace.c      |  2 +-
>  fs/nfs/nfs3proc.c       |  5 ++---
>  fs/nfs/nfs4proc.c       | 20 ++++++++++----------
>  fs/nfs/proc.c           |  6 +++---
>  include/linux/nfs_xdr.h |  2 +-
>  6 files changed, 25 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index c28983ee75ca..2b04038b0e40 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1672,7 +1672,7 @@ nfs_lookup_revalidate_delegated(struct inode *dir, struct dentry *dentry,
>  	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
>  }
>  
> -static int nfs_lookup_revalidate_dentry(struct inode *dir,
> +static int nfs_lookup_revalidate_dentry(struct inode *dir, const struct qstr *name,
>  					struct dentry *dentry,
>  					struct inode *inode, unsigned int flags)
>  {
> @@ -1690,7 +1690,7 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
>  		goto out;
>  
>  	dir_verifier = nfs_save_change_attribute(dir);
> -	ret = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
> +	ret = NFS_PROTO(dir)->lookup(dir, dentry, name, fhandle, fattr);
>  	if (ret < 0)
>  		goto out;
>  
> @@ -1775,7 +1775,7 @@ nfs_do_lookup_revalidate(struct inode *dir, const struct qstr *name,
>  	if (NFS_STALE(inode))
>  		goto out_bad;
>  
> -	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
> +	return nfs_lookup_revalidate_dentry(dir, name, dentry, inode, flags);
>  out_valid:
>  	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
>  out_bad:
> @@ -1970,7 +1970,8 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
>  
>  	dir_verifier = nfs_save_change_attribute(dir);
>  	trace_nfs_lookup_enter(dir, dentry, flags);
> -	error = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
> +	error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
> +				       fhandle, fattr);
>  	if (error == -ENOENT) {
>  		if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
>  			dir_verifier = inode_peek_iversion_raw(dir);
> @@ -2246,7 +2247,7 @@ nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
>  reval_dentry:
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
> -	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
> +	return nfs_lookup_revalidate_dentry(dir, name, dentry, inode, flags);
>  
>  full_reval:
>  	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
> @@ -2305,7 +2306,8 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
>  	d_drop(dentry);
>  
>  	if (fhandle->size == 0) {
> -		error = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
> +		error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
> +					       fhandle, fattr);
>  		if (error)
>  			goto out_error;
>  	}
> diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> index 2d53574da605..973aed9cc5fe 100644
> --- a/fs/nfs/namespace.c
> +++ b/fs/nfs/namespace.c
> @@ -308,7 +308,7 @@ int nfs_submount(struct fs_context *fc, struct nfs_server *server)
>  	int err;
>  
>  	/* Look it up again to get its attributes */
> -	err = server->nfs_client->rpc_ops->lookup(d_inode(parent), dentry,
> +	err = server->nfs_client->rpc_ops->lookup(d_inode(parent), dentry, &dentry->d_name,
>  						  ctx->mntfh, ctx->clone_data.fattr);
>  	dput(parent);
>  	if (err != 0)
> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> index 1566163c6d85..ce70768e0201 100644
> --- a/fs/nfs/nfs3proc.c
> +++ b/fs/nfs/nfs3proc.c
> @@ -192,7 +192,7 @@ __nfs3_proc_lookup(struct inode *dir, const char *name, size_t len,
>  }
>  
>  static int
> -nfs3_proc_lookup(struct inode *dir, struct dentry *dentry,
> +nfs3_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
>  		 struct nfs_fh *fhandle, struct nfs_fattr *fattr)
>  {
>  	unsigned short task_flags = 0;
> @@ -202,8 +202,7 @@ nfs3_proc_lookup(struct inode *dir, struct dentry *dentry,
>  		task_flags |= RPC_TASK_TIMEOUT;
>  
>  	dprintk("NFS call  lookup %pd2\n", dentry);
> -	return __nfs3_proc_lookup(dir, dentry->d_name.name,
> -				  dentry->d_name.len, fhandle, fattr,
> +	return __nfs3_proc_lookup(dir, name->name, name->len, fhandle, fattr,
>  				  task_flags);
>  }
>  
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 405f17e6e0b4..4d85068e820d 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -4536,15 +4536,15 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
>  }
>  
>  static int _nfs4_proc_lookup(struct rpc_clnt *clnt, struct inode *dir,
> -		struct dentry *dentry, struct nfs_fh *fhandle,
> -		struct nfs_fattr *fattr)
> +		struct dentry *dentry, const struct qstr *name,
> +		struct nfs_fh *fhandle, struct nfs_fattr *fattr)
>  {
>  	struct nfs_server *server = NFS_SERVER(dir);
>  	int		       status;
>  	struct nfs4_lookup_arg args = {
>  		.bitmask = server->attr_bitmask,
>  		.dir_fh = NFS_FH(dir),
> -		.name = &dentry->d_name,
> +		.name = name,
>  	};
>  	struct nfs4_lookup_res res = {
>  		.server = server,
> @@ -4586,17 +4586,16 @@ static void nfs_fixup_secinfo_attributes(struct nfs_fattr *fattr)
>  }
>  
>  static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
> -				   struct dentry *dentry, struct nfs_fh *fhandle,
> -				   struct nfs_fattr *fattr)
> +				   struct dentry *dentry, const struct qstr *name,
> +				   struct nfs_fh *fhandle, struct nfs_fattr *fattr)
>  {
>  	struct nfs4_exception exception = {
>  		.interruptible = true,
>  	};
>  	struct rpc_clnt *client = *clnt;
> -	const struct qstr *name = &dentry->d_name;
>  	int err;
>  	do {
> -		err = _nfs4_proc_lookup(client, dir, dentry, fhandle, fattr);
> +		err = _nfs4_proc_lookup(client, dir, dentry, name, fhandle, fattr);
>  		trace_nfs4_lookup(dir, name, err);
>  		switch (err) {
>  		case -NFS4ERR_BADNAME:
> @@ -4631,13 +4630,13 @@ static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
>  	return err;
>  }
>  
> -static int nfs4_proc_lookup(struct inode *dir, struct dentry *dentry,
> +static int nfs4_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
>  			    struct nfs_fh *fhandle, struct nfs_fattr *fattr)
>  {
>  	int status;
>  	struct rpc_clnt *client = NFS_CLIENT(dir);
>  
> -	status = nfs4_proc_lookup_common(&client, dir, dentry, fhandle, fattr);
> +	status = nfs4_proc_lookup_common(&client, dir, dentry, name, fhandle, fattr);
>  	if (client != NFS_CLIENT(dir)) {
>  		rpc_shutdown_client(client);
>  		nfs_fixup_secinfo_attributes(fattr);
> @@ -4652,7 +4651,8 @@ nfs4_proc_lookup_mountpoint(struct inode *dir, struct dentry *dentry,
>  	struct rpc_clnt *client = NFS_CLIENT(dir);
>  	int status;
>  
> -	status = nfs4_proc_lookup_common(&client, dir, dentry, fhandle, fattr);
> +	status = nfs4_proc_lookup_common(&client, dir, dentry, &dentry->d_name,
> +					 fhandle, fattr);
>  	if (status < 0)
>  		return ERR_PTR(status);
>  	return (client == NFS_CLIENT(dir)) ? rpc_clone_client(client) : client;
> diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
> index 6c09cd090c34..77920a2e3cef 100644
> --- a/fs/nfs/proc.c
> +++ b/fs/nfs/proc.c
> @@ -153,13 +153,13 @@ nfs_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
>  }
>  
>  static int
> -nfs_proc_lookup(struct inode *dir, struct dentry *dentry,
> +nfs_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
>  		struct nfs_fh *fhandle, struct nfs_fattr *fattr)
>  {
>  	struct nfs_diropargs	arg = {
>  		.fh		= NFS_FH(dir),
> -		.name		= dentry->d_name.name,
> -		.len		= dentry->d_name.len
> +		.name		= name->name,
> +		.len		= name->len
>  	};
>  	struct nfs_diropok	res = {
>  		.fh		= fhandle,
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 559273a0f16d..08b62bbf59f0 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1785,7 +1785,7 @@ struct nfs_rpc_ops {
>  			    struct nfs_fattr *, struct inode *);
>  	int	(*setattr) (struct dentry *, struct nfs_fattr *,
>  			    struct iattr *);
> -	int	(*lookup)  (struct inode *, struct dentry *,
> +	int	(*lookup)  (struct inode *, struct dentry *, const struct qstr *,
>  			    struct nfs_fh *, struct nfs_fattr *);
>  	int	(*lookupp) (struct inode *, struct nfs_fh *,
>  			    struct nfs_fattr *);

Reviewed-by: Jeff Layton <jlayton@kernel.org>
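
For readers following along, the shape of the change can be sketched in
userspace roughly as below.  This is only an illustration under stated
assumptions: the sk_* types and functions are hypothetical stand-ins, not
the NFS client API.  The bottom-level encoder takes the name as an explicit
argument, the revalidation path forwards the caller's stable name, and the
lookup path (where ->d_name is stable) keeps passing &dentry->d_name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct qstr / struct dentry; not kernel types. */
struct sk_qstr {
	const char *name;
	size_t len;
};

struct sk_dentry {
	struct sk_qstr d_name;	/* may change during revalidation */
};

/*
 * Bottom of the call chain: encodes whatever name it is handed and never
 * looks at a dentry.  Returns the encoded length, or 0 if it does not fit.
 */
static size_t sk_encode_lookup(char *buf, size_t buflen,
			       const struct sk_qstr *name)
{
	if (name->len >= buflen)
		return 0;
	memcpy(buf, name->name, name->len);
	buf[name->len] = '\0';
	return name->len;
}

/* Revalidation path: the name comes from the caller; ->d_name is not read. */
static size_t sk_revalidate(struct sk_dentry *dentry,
			    const struct sk_qstr *name,
			    char *buf, size_t buflen)
{
	(void)dentry;
	return sk_encode_lookup(buf, buflen, name);
}

/* Lookup path: ->d_name is stable here, so passing its address is fine. */
static size_t sk_lookup(struct sk_dentry *dentry, char *buf, size_t buflen)
{
	return sk_encode_lookup(buf, buflen, &dentry->d_name);
}
```

The point of threading the name through every level is that no helper in
the chain has to decide for itself whether ->d_name is safe to touch.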

* Re: [PATCH v2 14/20] fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-16  5:23     ` [PATCH v2 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
@ 2025-01-17 15:18       ` Jeff Layton
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Layton @ 2025-01-17 15:18 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

On Thu, 2025-01-16 at 05:23 +0000, Al Viro wrote:
> No need to mess with dget_parent() for the former; for the latter we really should
> not rely upon ->d_name.name remaining stable - it's a real-life UAF.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/fuse/dir.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index d9e9f26917eb..7e93a8470c36 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -196,7 +196,6 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
>  				  struct dentry *entry, unsigned int flags)
>  {
>  	struct inode *inode;
> -	struct dentry *parent;
>  	struct fuse_mount *fm;
>  	struct fuse_inode *fi;
>  	int ret;
> @@ -228,11 +227,9 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
>  
>  		attr_version = fuse_get_attr_version(fm->fc);
>  
> -		parent = dget_parent(entry);
> -		fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
> -				 &entry->d_name, &outarg);
> +		fuse_lookup_init(fm->fc, &args, get_node_id(dir),
> +				 name, &outarg);
>  		ret = fuse_simple_request(fm, &args);
> -		dput(parent);
>  		/* Zero nodeid is same as -ENOENT */
>  		if (!ret && !outarg.nodeid)
>  			ret = -ENOENT;
> @@ -266,9 +263,7 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
>  			if (test_bit(FUSE_I_INIT_RDPLUS, &fi->state))
>  				return -ECHILD;
>  		} else if (test_and_clear_bit(FUSE_I_INIT_RDPLUS, &fi->state)) {
> -			parent = dget_parent(entry);
> -			fuse_advise_use_readdirplus(d_inode(parent));
> -			dput(parent);
> +			fuse_advise_use_readdirplus(dir);
>  		}
>  	}
>  	ret = 1;

Reviewed-by: Jeff Layton <jlayton@kernel.org>

* Re: [PATCH v2 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller
  2025-01-16  5:23     ` [PATCH v2 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
@ 2025-01-17 15:20       ` Jeff Layton
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Layton @ 2025-01-17 15:20 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

On Thu, 2025-01-16 at 05:23 +0000, Al Viro wrote:
> The only thing it's using is parent directory inode and we are already
> given a stable reference to that - no need to bother with boilerplate.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/crypto/fname.c | 21 +++++----------------
>  1 file changed, 5 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
> index 389f5b2bf63b..010f9c0a4c2f 100644
> --- a/fs/crypto/fname.c
> +++ b/fs/crypto/fname.c
> @@ -574,12 +574,10 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
>   * Validate dentries in encrypted directories to make sure we aren't potentially
>   * caching stale dentries after a key has been added.
>   */
> -int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
> +int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
>  			 struct dentry *dentry, unsigned int flags)
>  {
> -	struct dentry *dir;
>  	int err;
> -	int valid;
>  
>  	/*
>  	 * Plaintext names are always valid, since fscrypt doesn't support
> @@ -592,30 +590,21 @@ int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
>  	/*
>  	 * No-key name; valid if the directory's key is still unavailable.
>  	 *
> -	 * Although fscrypt forbids rename() on no-key names, we still must use
> -	 * dget_parent() here rather than use ->d_parent directly.  That's
> -	 * because a corrupted fs image may contain directory hard links, which
> -	 * the VFS handles by moving the directory's dentry tree in the dcache
> -	 * each time ->lookup() finds the directory and it already has a dentry
> -	 * elsewhere.  Thus ->d_parent can be changing, and we must safely grab
> -	 * a reference to some ->d_parent to prevent it from being freed.
> +	 * Note in RCU mode we have to bail if we get here -
> +	 * fscrypt_get_encryption_info() may block.
>  	 */
>  
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
>  
> -	dir = dget_parent(dentry);
>  	/*
>  	 * Pass allow_unsupported=true, so that files with an unsupported
>  	 * encryption policy can be deleted.
>  	 */
> -	err = fscrypt_get_encryption_info(d_inode(dir), true);
> -	valid = !fscrypt_has_encryption_key(d_inode(dir));
> -	dput(dir);
> -
> +	err = fscrypt_get_encryption_info(dir, true);
>  	if (err < 0)
>  		return err;
>  
> -	return valid;
> +	return !fscrypt_has_encryption_key(dir);
>  }
>  EXPORT_SYMBOL_GPL(fscrypt_d_revalidate);

Reviewed-by: Jeff Layton <jlayton@kernel.org>

* Re: [PATCH v2 13/20] vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  2025-01-16  5:23     ` [PATCH v2 13/20] vfat_revalidate{,_ci}(): " Al Viro
@ 2025-01-17 15:22       ` Jeff Layton
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Layton @ 2025-01-17 15:22 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

On Thu, 2025-01-16 at 05:23 +0000, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/fat/namei_vfat.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
> index f9cbd5c6f932..926c26e90ef8 100644
> --- a/fs/fat/namei_vfat.c
> +++ b/fs/fat/namei_vfat.c
> @@ -43,14 +43,9 @@ static inline void vfat_d_version_set(struct dentry *dentry,
>   * If it happened, the negative dentry isn't actually negative
>   * anymore.  So, drop it.
>   */
> -static int vfat_revalidate_shortname(struct dentry *dentry)
> +static bool vfat_revalidate_shortname(struct dentry *dentry, struct inode *dir)
>  {
> -	int ret = 1;
> -	spin_lock(&dentry->d_lock);
> -	if (!inode_eq_iversion(d_inode(dentry->d_parent), vfat_d_version(dentry)))
> -		ret = 0;
> -	spin_unlock(&dentry->d_lock);
> -	return ret;
> +	return inode_eq_iversion(dir, vfat_d_version(dentry));
>  }
>  
>  static int vfat_revalidate(struct inode *dir, const struct qstr *name,
> @@ -62,7 +57,7 @@ static int vfat_revalidate(struct inode *dir, const struct qstr *name,
>  	/* This is not negative dentry. Always valid. */
>  	if (d_really_is_positive(dentry))
>  		return 1;
> -	return vfat_revalidate_shortname(dentry);
> +	return vfat_revalidate_shortname(dentry, dir);
>  }
>  
>  static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
> @@ -99,7 +94,7 @@ static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
>  	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
>  		return 0;
>  
> -	return vfat_revalidate_shortname(dentry);
> +	return vfat_revalidate_shortname(dentry, dir);
>  }
>  
>  /* returns the length of a struct qstr, ignoring trailing dots */

Reviewed-by: Jeff Layton <jlayton@kernel.org>

Al, you can add my R-b to 1-12 as well.

Nice work!
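
As an aside, the check that vfat_revalidate_shortname() boils down to after
this patch can be sketched in userspace as follows.  The sketch_* names are
hypothetical and only model the idea; they are not the kernel's iversion
API.  A negative dentry remembers the parent directory's version at lookup
time and is valid only while the directory's version is unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct inode / struct dentry; not kernel types. */
struct sketch_inode {
	uint64_t i_version;	/* bumped on every directory modification */
};

struct sketch_dentry {
	uint64_t d_version;	/* parent's version when the miss was cached */
};

/* Record the directory version at the time of a failed lookup. */
static void sketch_cache_negative(struct sketch_dentry *dentry,
				  const struct sketch_inode *dir)
{
	dentry->d_version = dir->i_version;
}

/*
 * With a stable parent inode supplied by the caller there is no need to
 * take a lock or dereference a parent pointer: compare and return.
 */
static bool sketch_revalidate_shortname(const struct sketch_dentry *dentry,
					const struct sketch_inode *dir)
{
	return dentry->d_version == dir->i_version;
}
```

Any later change to the directory bumps its version, so the cached miss
stops matching and gets dropped on the next revalidation.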

* Re: [PATCH v2 10/20] ceph_d_revalidate(): propagate stable name down into request encoding
  2025-01-16  5:23     ` [PATCH v2 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
@ 2025-01-17 18:35       ` Jeff Layton
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Layton @ 2025-01-17 18:35 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

On Thu, 2025-01-16 at 05:23 +0000, Al Viro wrote:
> Currently get_fscrypt_altname() requires ->r_dentry->d_name to be stable
> and it gets that in almost all cases.  The only exception is ->d_revalidate(),
> where we have a stable name, but it's passed separately - dentry->d_name
> is not stable there.
> 
> Propagate it down to get_fscrypt_altname() as a new field of struct
> ceph_mds_request - ->r_dname, to be used instead of ->r_dentry->d_name
> when non-NULL.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/ceph/dir.c        | 2 ++
>  fs/ceph/mds_client.c | 9 ++++++---
>  fs/ceph/mds_client.h | 2 ++
>  3 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index dc5f55bebad7..62e99e65250d 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1998,6 +1998,8 @@ static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
>  			req->r_parent = dir;
>  			ihold(dir);
>  
> +			req->r_dname = name;
> +
>  			mask = CEPH_STAT_CAP_INODE | CEPH_CAP_AUTH_SHARED;
>  			if (ceph_security_xattr_wanted(dir))
>  				mask |= CEPH_CAP_XATTR_SHARED;
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 219a2cc2bf3c..3b766b984713 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2621,6 +2621,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
>  {
>  	struct inode *dir = req->r_parent;
>  	struct dentry *dentry = req->r_dentry;
> +	const struct qstr *name = req->r_dname;
>  	u8 *cryptbuf = NULL;
>  	u32 len = 0;
>  	int ret = 0;
> @@ -2641,8 +2642,10 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
>  	if (!fscrypt_has_encryption_key(dir))
>  		goto success;
>  
> -	if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX,
> -					  &len)) {
> +	if (!name)
> +		name = &dentry->d_name;
> +
> +	if (!fscrypt_fname_encrypted_size(dir, name->len, NAME_MAX, &len)) {
>  		WARN_ON_ONCE(1);
>  		return ERR_PTR(-ENAMETOOLONG);
>  	}
> @@ -2657,7 +2660,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
>  	if (!cryptbuf)
>  		return ERR_PTR(-ENOMEM);
>  
> -	ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
> +	ret = fscrypt_fname_encrypt(dir, name, cryptbuf, len);
>  	if (ret) {
>  		kfree(cryptbuf);
>  		return ERR_PTR(ret);
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 38bb7e0d2d79..7c9fee9e80d4 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -299,6 +299,8 @@ struct ceph_mds_request {
>  	struct inode *r_target_inode;       /* resulting inode */
>  	struct inode *r_new_inode;	    /* new inode (for creates) */
>  
> +	const struct qstr *r_dname;	    /* stable name (for ->d_revalidate) */
> +
>  #define CEPH_MDS_R_DIRECT_IS_HASH	(1) /* r_direct_hash is valid */
>  #define CEPH_MDS_R_ABORTED		(2) /* call was aborted */
>  #define CEPH_MDS_R_GOT_UNSAFE		(3) /* got an unsafe reply */

Reviewed-by: Jeff Layton <jlayton@kernel.org>
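
The "use ->r_dname when non-NULL" rule from the patch description can be
sketched in userspace like this.  It is a minimal model, assuming
hypothetical mds_sketch_* types rather than the real struct
ceph_mds_request: the request carries an optional pointer to a stable name,
and the encoder falls back to the dentry's own name only when that pointer
is unset.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real structures carry far more state. */
struct mds_sketch_qstr {
	const char *name;
	size_t len;
};

struct mds_sketch_dentry {
	struct mds_sketch_qstr d_name;	/* may change under ->d_revalidate() */
};

struct mds_sketch_request {
	struct mds_sketch_dentry *r_dentry;
	const struct mds_sketch_qstr *r_dname;	/* stable name, or NULL */
};

/* Prefer the caller-supplied stable name; fall back to the dentry's own. */
static const struct mds_sketch_qstr *
mds_sketch_request_name(const struct mds_sketch_request *req)
{
	return req->r_dname ? req->r_dname : &req->r_dentry->d_name;
}
```

Callers that already have a stable ->d_name leave the new field NULL and
behave exactly as before; only the revalidation path needs to set it.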

* Re: [PATCH v2 09/20] ceph_d_revalidate(): use stable parent inode passed by caller
  2025-01-16  5:23     ` [PATCH v2 09/20] ceph_d_revalidate(): use stable " Al Viro
@ 2025-01-17 18:35       ` Jeff Layton
  0 siblings, 0 replies; 96+ messages in thread
From: Jeff Layton @ 2025-01-17 18:35 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

On Thu, 2025-01-16 at 05:23 +0000, Al Viro wrote:
> No need to mess with the boilerplate for obtaining what we already
> have.  Note that ceph is one of the "will want a path from filesystem
> root if we want to talk to server" cases, so the name of the last
> component is of little use - it is passed to fscrypt_d_revalidate()
> and it's used to deal with (also crypt-related) case in request
> marshalling, when encrypted name turns out to be too long.  The former
> is not a problem, but the latter is racy; that part will be handled
> in the next commit.
> 
> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/ceph/dir.c | 22 ++++------------------
>  1 file changed, 4 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index c4c71c24221b..dc5f55bebad7 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1940,30 +1940,19 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
>  /*
>   * Check if cached dentry can be trusted.
>   */
> -static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
> +static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
>  			     struct dentry *dentry, unsigned int flags)
>  {
>  	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
>  	struct ceph_client *cl = mdsc->fsc->client;
>  	int valid = 0;
> -	struct dentry *parent;
> -	struct inode *dir, *inode;
> +	struct inode *inode;
>  
> -	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
> +	valid = fscrypt_d_revalidate(dir, name, dentry, flags);
>  	if (valid <= 0)
>  		return valid;
>  
> -	if (flags & LOOKUP_RCU) {
> -		parent = READ_ONCE(dentry->d_parent);
> -		dir = d_inode_rcu(parent);
> -		if (!dir)
> -			return -ECHILD;
> -		inode = d_inode_rcu(dentry);
> -	} else {
> -		parent = dget_parent(dentry);
> -		dir = d_inode(parent);
> -		inode = d_inode(dentry);
> -	}
> +	inode = d_inode_rcu(dentry);
>  
>  	doutc(cl, "%p '%pd' inode %p offset 0x%llx nokey %d\n",
>  	      dentry, dentry, inode, ceph_dentry(dentry)->offset,
> @@ -2039,9 +2028,6 @@ static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
>  	doutc(cl, "%p '%pd' %s\n", dentry, dentry, valid ? "valid" : "invalid");
>  	if (!valid)
>  		ceph_dir_clear_complete(dir);
> -
> -	if (!(flags & LOOKUP_RCU))
> -		dput(parent);
>  	return valid;
>  }
>  

Reviewed-by: Jeff Layton <jlayton@kernel.org>

* Re: [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate()
  2025-01-16  5:23     ` [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
  2025-01-16 15:15       ` Gabriel Krisman Bertazi
@ 2025-01-17 18:55       ` Jeff Layton
  2025-01-17 19:00         ` Al Viro
  1 sibling, 1 reply; 96+ messages in thread
From: Jeff Layton @ 2025-01-17 18:55 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

On Thu, 2025-01-16 at 05:23 +0000, Al Viro wrote:
> ->d_revalidate() often needs to access dentry parent and name; that has
> to be done carefully, since the locking environment varies from caller
> to caller.  We are not guaranteed that dentry in question will not be
> moved right under us - not unless the filesystem is such that nothing
> on it ever gets renamed.
> 
> It can be dealt with, but that results in boilerplate code that isn't
> even needed - the callers normally have just found the dentry via dcache
> lookup and want to verify that it's in the right place; they already
> have the values of ->d_parent and ->d_name stable.  There are a couple
> of exceptions (overlayfs and, to a lesser extent, ecryptfs), but for the
> majority of calls that song and dance is not needed at all.
> 
> It's easier to make ecryptfs and overlayfs find and pass those values if
> there's a ->d_revalidate() instance to be called, rather than doing that
> in the instances.
> 
> This commit only changes the calling conventions; making use of supplied
> values is left to followups.
> 
> NOTE: some instances need more than just the parent - things like CIFS
> may need to build an entire path from filesystem root, so they need
> more precautions than the usual boilerplate.  This series doesn't
> do anything to that need - these filesystems have to keep their locking
> mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
> a-la v9fs).
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  Documentation/filesystems/locking.rst |  3 ++-
>  Documentation/filesystems/porting.rst | 13 +++++++++++++
>  Documentation/filesystems/vfs.rst     |  3 ++-
>  fs/9p/vfs_dentry.c                    | 10 ++++++++--
>  fs/afs/dir.c                          |  6 ++++--
>  fs/ceph/dir.c                         |  5 +++--
>  fs/coda/dir.c                         |  3 ++-
>  fs/crypto/fname.c                     |  3 ++-
>  fs/ecryptfs/dentry.c                  | 18 ++++++++++++++----
>  fs/exfat/namei.c                      |  3 ++-
>  fs/fat/namei_vfat.c                   |  6 ++++--
>  fs/fuse/dir.c                         |  3 ++-
>  fs/gfs2/dentry.c                      |  7 +++++--
>  fs/hfs/sysdep.c                       |  3 ++-
>  fs/jfs/namei.c                        |  3 ++-
>  fs/kernfs/dir.c                       |  3 ++-
>  fs/namei.c                            | 18 ++++++++++--------
>  fs/nfs/dir.c                          |  9 ++++++---
>  fs/ocfs2/dcache.c                     |  3 ++-
>  fs/orangefs/dcache.c                  |  3 ++-
>  fs/overlayfs/super.c                  | 22 ++++++++++++++++++++--
>  fs/proc/base.c                        |  6 ++++--
>  fs/proc/fd.c                          |  3 ++-
>  fs/proc/generic.c                     |  6 ++++--
>  fs/proc/proc_sysctl.c                 |  3 ++-
>  fs/smb/client/dir.c                   |  3 ++-
>  fs/tracefs/inode.c                    |  3 ++-
>  fs/vboxsf/dir.c                       |  3 ++-
>  include/linux/dcache.h                |  3 ++-
>  include/linux/fscrypt.h               |  7 ++++---
>  30 files changed, 133 insertions(+), 51 deletions(-)
> 
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index f5e3676db954..146e7d8aa736 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -17,7 +17,8 @@ dentry_operations
>  
>  prototypes::
>  
> -	int (*d_revalidate)(struct dentry *, unsigned int);
> +	int (*d_revalidate)(struct inode *, const struct qstr *,
> +			    struct dentry *, unsigned int);
>  	int (*d_weak_revalidate)(struct dentry *, unsigned int);
>  	int (*d_hash)(const struct dentry *, struct qstr *);
>  	int (*d_compare)(const struct dentry *,
> diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> index 9ab2a3d6f2b4..b50c3ce36ef2 100644
> --- a/Documentation/filesystems/porting.rst
> +++ b/Documentation/filesystems/porting.rst
> @@ -1141,3 +1141,16 @@ pointer are gone.
>  
>  set_blocksize() takes opened struct file instead of struct block_device now
>  and it *must* be opened exclusive.
> +
> +---
> +
> +**mandatory**
> +
> +->d_revalidate() gets two extra arguments - inode of parent directory and
> +name our dentry is expected to have.  Both are stable (dir is pinned in
> +non-RCU case and will stay around during the call in RCU case, and name
> +is guaranteed to stay unchanging).  Your instance doesn't have to use
> +either, but it often helps to avoid a lot of painful boilerplate.
> +NOTE: if you need something like full path from the root of filesystem,
> +you are still on your own - this assists with simple cases, but it's not
> +magic.
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 0b18af3f954e..7c352ebaae98 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -1251,7 +1251,8 @@ defined:
>  .. code-block:: c
>  
>  	struct dentry_operations {
> -		int (*d_revalidate)(struct dentry *, unsigned int);
> +		int (*d_revalidate)(struct inode *, const struct qstr *,
> +				    struct dentry *, unsigned int);
>  		int (*d_weak_revalidate)(struct dentry *, unsigned int);
>  		int (*d_hash)(const struct dentry *, struct qstr *);
>  		int (*d_compare)(const struct dentry *,
> diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
> index 01338d4c2d9e..872c1abe3295 100644
> --- a/fs/9p/vfs_dentry.c
> +++ b/fs/9p/vfs_dentry.c
> @@ -61,7 +61,7 @@ static void v9fs_dentry_release(struct dentry *dentry)
>  		p9_fid_put(hlist_entry(p, struct p9_fid, dlist));
>  }
>  
> -static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
> +static int __v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
>  {
>  	struct p9_fid *fid;
>  	struct inode *inode;
> @@ -99,9 +99,15 @@ static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
>  	return 1;
>  }
>  
> +static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
> +				  struct dentry *dentry, unsigned int flags)
> +{
> +	return __v9fs_lookup_revalidate(dentry, flags);
> +}
> +
>  const struct dentry_operations v9fs_cached_dentry_operations = {
>  	.d_revalidate = v9fs_lookup_revalidate,
> -	.d_weak_revalidate = v9fs_lookup_revalidate,
> +	.d_weak_revalidate = __v9fs_lookup_revalidate,
>  	.d_delete = v9fs_cached_dentry_delete,
>  	.d_release = v9fs_dentry_release,
>  };
> diff --git a/fs/afs/dir.c b/fs/afs/dir.c
> index ada363af5aab..9780013cd83a 100644
> --- a/fs/afs/dir.c
> +++ b/fs/afs/dir.c
> @@ -22,7 +22,8 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
>  				 unsigned int flags);
>  static int afs_dir_open(struct inode *inode, struct file *file);
>  static int afs_readdir(struct file *file, struct dir_context *ctx);
> -static int afs_d_revalidate(struct dentry *dentry, unsigned int flags);
> +static int afs_d_revalidate(struct inode *dir, const struct qstr *name,
> +			    struct dentry *dentry, unsigned int flags);
>  static int afs_d_delete(const struct dentry *dentry);
>  static void afs_d_iput(struct dentry *dentry, struct inode *inode);
>  static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name, int nlen,
> @@ -1093,7 +1094,8 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
>   * - NOTE! the hit can be a negative hit too, so we can't assume we have an
>   *   inode
>   */
> -static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
> +			    struct dentry *dentry, unsigned int flags)
>  {
>  	struct afs_vnode *vnode, *dir;
>  	struct afs_fid fid;
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 0bf388e07a02..c4c71c24221b 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1940,7 +1940,8 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
>  /*
>   * Check if cached dentry can be trusted.
>   */
> -static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
> +			     struct dentry *dentry, unsigned int flags)
>  {
>  	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
>  	struct ceph_client *cl = mdsc->fsc->client;
> @@ -1948,7 +1949,7 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
>  	struct dentry *parent;
>  	struct inode *dir, *inode;
>  
> -	valid = fscrypt_d_revalidate(dentry, flags);
> +	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
>  	if (valid <= 0)
>  		return valid;
>  
> diff --git a/fs/coda/dir.c b/fs/coda/dir.c
> index 4e552ba7bd43..a3e2dfeedfbf 100644
> --- a/fs/coda/dir.c
> +++ b/fs/coda/dir.c
> @@ -445,7 +445,8 @@ static int coda_readdir(struct file *coda_file, struct dir_context *ctx)
>  }
>  
>  /* called when a cache lookup succeeds */
> -static int coda_dentry_revalidate(struct dentry *de, unsigned int flags)
> +static int coda_dentry_revalidate(struct inode *dir, const struct qstr *name,
> +				  struct dentry *de, unsigned int flags)
>  {
>  	struct inode *inode;
>  	struct coda_inode_info *cii;
> diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
> index 0ad52fbe51c9..389f5b2bf63b 100644
> --- a/fs/crypto/fname.c
> +++ b/fs/crypto/fname.c
> @@ -574,7 +574,8 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
>   * Validate dentries in encrypted directories to make sure we aren't potentially
>   * caching stale dentries after a key has been added.
>   */
> -int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
> +int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
> +			 struct dentry *dentry, unsigned int flags)
>  {
>  	struct dentry *dir;
>  	int err;
> diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
> index acaa0825e9bb..1dfd5b81d831 100644
> --- a/fs/ecryptfs/dentry.c
> +++ b/fs/ecryptfs/dentry.c
> @@ -17,7 +17,9 @@
>  
>  /**
>   * ecryptfs_d_revalidate - revalidate an ecryptfs dentry
> - * @dentry: The ecryptfs dentry
> + * @dir: inode of expected parent
> + * @name: expected name
> + * @dentry: dentry to revalidate
>   * @flags: lookup flags
>   *
>   * Called when the VFS needs to revalidate a dentry. This
> @@ -28,7 +30,8 @@
>   * Returns 1 if valid, 0 otherwise.
>   *
>   */
> -static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int ecryptfs_d_revalidate(struct inode *dir, const struct qstr *name,
> +				 struct dentry *dentry, unsigned int flags)
>  {
>  	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
>  	int rc = 1;
> @@ -36,8 +39,15 @@ static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
>  
> -	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE)
> -		rc = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
> +	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE) {
> +		struct inode *lower_dir = ecryptfs_inode_to_lower(dir);
> +		struct name_snapshot n;
> +
> +		take_dentry_name_snapshot(&n, lower_dentry);
> +		rc = lower_dentry->d_op->d_revalidate(lower_dir, &n.name,
> +						      lower_dentry, flags);
> +		release_dentry_name_snapshot(&n);
> +	}
>  
>  	if (d_really_is_positive(dentry)) {
>  		struct inode *inode = d_inode(dentry);
> diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
> index 97d2774760fe..e3b4feccba07 100644
> --- a/fs/exfat/namei.c
> +++ b/fs/exfat/namei.c
> @@ -31,7 +31,8 @@ static inline void exfat_d_version_set(struct dentry *dentry,
>   * If it happened, the negative dentry isn't actually negative anymore.  So,
>   * drop it.
>   */
> -static int exfat_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
> +			      struct dentry *dentry, unsigned int flags)
>  {
>  	int ret;
>  
> diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
> index 15bf32c21ac0..f9cbd5c6f932 100644
> --- a/fs/fat/namei_vfat.c
> +++ b/fs/fat/namei_vfat.c
> @@ -53,7 +53,8 @@ static int vfat_revalidate_shortname(struct dentry *dentry)
>  	return ret;
>  }
>  
> -static int vfat_revalidate(struct dentry *dentry, unsigned int flags)
> +static int vfat_revalidate(struct inode *dir, const struct qstr *name,
> +			   struct dentry *dentry, unsigned int flags)
>  {
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
> @@ -64,7 +65,8 @@ static int vfat_revalidate(struct dentry *dentry, unsigned int flags)
>  	return vfat_revalidate_shortname(dentry);
>  }
>  
> -static int vfat_revalidate_ci(struct dentry *dentry, unsigned int flags)
> +static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
> +			      struct dentry *dentry, unsigned int flags)
>  {
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 494ac372ace0..d9e9f26917eb 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -192,7 +192,8 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
>   * the lookup once more.  If the lookup results in the same inode,
>   * then refresh the attributes, timeouts and mark the dentry valid.
>   */
> -static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
> +static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
> +				  struct dentry *entry, unsigned int flags)
>  {
>  	struct inode *inode;
>  	struct dentry *parent;
> diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
> index 2e215e8c3c88..86c338901fab 100644
> --- a/fs/gfs2/dentry.c
> +++ b/fs/gfs2/dentry.c
> @@ -21,7 +21,9 @@
>  
>  /**
>   * gfs2_drevalidate - Check directory lookup consistency
> - * @dentry: the mapping to check
> + * @dir: expected parent directory inode
> + * @name: expected name
> + * @dentry: dentry to check
>   * @flags: lookup flags
>   *
>   * Check to make sure the lookup necessary to arrive at this inode from its
> @@ -30,7 +32,8 @@
>   * Returns: 1 if the dentry is ok, 0 if it isn't
>   */
>  
> -static int gfs2_drevalidate(struct dentry *dentry, unsigned int flags)
> +static int gfs2_drevalidate(struct inode *dir, const struct qstr *name,
> +			    struct dentry *dentry, unsigned int flags)
>  {
>  	struct dentry *parent;
>  	struct gfs2_sbd *sdp;
> diff --git a/fs/hfs/sysdep.c b/fs/hfs/sysdep.c
> index 76fa02e3835b..ef54fc8093cf 100644
> --- a/fs/hfs/sysdep.c
> +++ b/fs/hfs/sysdep.c
> @@ -13,7 +13,8 @@
>  
>  /* dentry case-handling: just lowercase everything */
>  
> -static int hfs_revalidate_dentry(struct dentry *dentry, unsigned int flags)
> +static int hfs_revalidate_dentry(struct inode *dir, const struct qstr *name,
> +				 struct dentry *dentry, unsigned int flags)
>  {
>  	struct inode *inode;
>  	int diff;
> diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
> index d68a4e6ac345..fc8ede43afde 100644
> --- a/fs/jfs/namei.c
> +++ b/fs/jfs/namei.c
> @@ -1576,7 +1576,8 @@ static int jfs_ci_compare(const struct dentry *dentry,
>  	return result;
>  }
>  
> -static int jfs_ci_revalidate(struct dentry *dentry, unsigned int flags)
> +static int jfs_ci_revalidate(struct inode *dir, const struct qstr *name,
> +			     struct dentry *dentry, unsigned int flags)
>  {
>  	/*
>  	 * This is not negative dentry. Always valid.
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 458519e416fe..5f0f8b95f44c 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1109,7 +1109,8 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent,
>  	return ERR_PTR(rc);
>  }
>  
> -static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
> +static int kernfs_dop_revalidate(struct inode *dir, const struct qstr *name,
> +				 struct dentry *dentry, unsigned int flags)
>  {
>  	struct kernfs_node *kn;
>  	struct kernfs_root *root;
> diff --git a/fs/namei.c b/fs/namei.c
> index 9d30c7aa9aa6..77e5d136faaf 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -921,10 +921,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
>  	return false;
>  }
>  
> -static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
> +static inline int d_revalidate(struct inode *dir, const struct qstr *name,
> +			       struct dentry *dentry, unsigned int flags)
>  {
>  	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
> -		return dentry->d_op->d_revalidate(dentry, flags);
> +		return dentry->d_op->d_revalidate(dir, name, dentry, flags);

I know I sent a R-b for this, but one question:

Suppose we get back a positive result (dentry is still good), but the
name and dentry->d_name no longer match. Do we need to do any special
handling in that case?

>  	else
>  		return 1;
>  }
> @@ -1652,7 +1653,7 @@ static struct dentry *lookup_dcache(const struct qstr *name,
>  {
>  	struct dentry *dentry = d_lookup(dir, name);
>  	if (dentry) {
> -		int error = d_revalidate(dentry, flags);
> +		int error = d_revalidate(dir->d_inode, name, dentry, flags);
>  		if (unlikely(error <= 0)) {
>  			if (!error)
>  				d_invalidate(dentry);
> @@ -1737,19 +1738,20 @@ static struct dentry *lookup_fast(struct nameidata *nd)
>  		if (read_seqcount_retry(&parent->d_seq, nd->seq))
>  			return ERR_PTR(-ECHILD);
>  
> -		status = d_revalidate(dentry, nd->flags);
> +		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
>  		if (likely(status > 0))
>  			return dentry;
>  		if (!try_to_unlazy_next(nd, dentry))
>  			return ERR_PTR(-ECHILD);
>  		if (status == -ECHILD)
>  			/* we'd been told to redo it in non-rcu mode */
> -			status = d_revalidate(dentry, nd->flags);
> +			status = d_revalidate(nd->inode, &nd->last,
> +					      dentry, nd->flags);
>  	} else {
>  		dentry = __d_lookup(parent, &nd->last);
>  		if (unlikely(!dentry))
>  			return NULL;
> -		status = d_revalidate(dentry, nd->flags);
> +		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
>  	}
>  	if (unlikely(status <= 0)) {
>  		if (!status)
> @@ -1777,7 +1779,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
>  	if (IS_ERR(dentry))
>  		return dentry;
>  	if (unlikely(!d_in_lookup(dentry))) {
> -		int error = d_revalidate(dentry, flags);
> +		int error = d_revalidate(inode, name, dentry, flags);
>  		if (unlikely(error <= 0)) {
>  			if (!error) {
>  				d_invalidate(dentry);
> @@ -3575,7 +3577,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>  		if (d_in_lookup(dentry))
>  			break;
>  
> -		error = d_revalidate(dentry, nd->flags);
> +		error = d_revalidate(dir_inode, &nd->last, dentry, nd->flags);
>  		if (likely(error > 0))
>  			break;
>  		if (error)
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 492cffd9d3d8..9910d9796f4c 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1814,7 +1814,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
>  	return ret;
>  }
>  
> -static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
> +static int nfs_lookup_revalidate(struct inode *dir, const struct qstr *name,
> +				 struct dentry *dentry, unsigned int flags)
>  {
>  	return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate);
>  }
> @@ -2025,7 +2026,8 @@ void nfs_d_prune_case_insensitive_aliases(struct inode *inode)
>  EXPORT_SYMBOL_GPL(nfs_d_prune_case_insensitive_aliases);
>  
>  #if IS_ENABLED(CONFIG_NFS_V4)
> -static int nfs4_lookup_revalidate(struct dentry *, unsigned int);
> +static int nfs4_lookup_revalidate(struct inode *, const struct qstr *,
> +				  struct dentry *, unsigned int);
>  
>  const struct dentry_operations nfs4_dentry_operations = {
>  	.d_revalidate	= nfs4_lookup_revalidate,
> @@ -2260,7 +2262,8 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
>  	return nfs_do_lookup_revalidate(dir, dentry, flags);
>  }
>  
> -static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags)
> +static int nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
> +				  struct dentry *dentry, unsigned int flags)
>  {
>  	return __nfs_lookup_revalidate(dentry, flags,
>  			nfs4_do_lookup_revalidate);
> diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
> index a9b8688aaf30..ecb1ce6301c4 100644
> --- a/fs/ocfs2/dcache.c
> +++ b/fs/ocfs2/dcache.c
> @@ -32,7 +32,8 @@ void ocfs2_dentry_attach_gen(struct dentry *dentry)
>  }
>  
>  
> -static int ocfs2_dentry_revalidate(struct dentry *dentry, unsigned int flags)
> +static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
> +				   struct dentry *dentry, unsigned int flags)
>  {
>  	struct inode *inode;
>  	int ret = 0;    /* if all else fails, just return false */
> diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
> index 395a00ed8ac7..c32c9a86e8d0 100644
> --- a/fs/orangefs/dcache.c
> +++ b/fs/orangefs/dcache.c
> @@ -92,7 +92,8 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
>   *
>   * Should return 1 if dentry can still be trusted, else 0.
>   */
> -static int orangefs_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int orangefs_d_revalidate(struct inode *dir, const struct qstr *name,
> +				 struct dentry *dentry, unsigned int flags)
>  {
>  	int ret;
>  	unsigned long time = (unsigned long) dentry->d_fsdata;
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index fe511192f83c..86ae6f6da36b 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -91,7 +91,24 @@ static int ovl_revalidate_real(struct dentry *d, unsigned int flags, bool weak)
>  		if (d->d_flags & DCACHE_OP_WEAK_REVALIDATE)
>  			ret =  d->d_op->d_weak_revalidate(d, flags);
>  	} else if (d->d_flags & DCACHE_OP_REVALIDATE) {
> -		ret = d->d_op->d_revalidate(d, flags);
> +		struct dentry *parent;
> +		struct inode *dir;
> +		struct name_snapshot n;
> +
> +		if (flags & LOOKUP_RCU) {
> +			parent = READ_ONCE(d->d_parent);
> +			dir = d_inode_rcu(parent);
> +			if (!dir)
> +				return -ECHILD;
> +		} else {
> +			parent = dget_parent(d);
> +			dir = d_inode(parent);
> +		}
> +		take_dentry_name_snapshot(&n, d);
> +		ret = d->d_op->d_revalidate(dir, &n.name, d, flags);
> +		release_dentry_name_snapshot(&n);
> +		if (!(flags & LOOKUP_RCU))
> +			dput(parent);
>  		if (!ret) {
>  			if (!(flags & LOOKUP_RCU))
>  				d_invalidate(d);
> @@ -127,7 +144,8 @@ static int ovl_dentry_revalidate_common(struct dentry *dentry,
>  	return ret;
>  }
>  
> -static int ovl_dentry_revalidate(struct dentry *dentry, unsigned int flags)
> +static int ovl_dentry_revalidate(struct inode *dir, const struct qstr *name,
> +				 struct dentry *dentry, unsigned int flags)
>  {
>  	return ovl_dentry_revalidate_common(dentry, flags, false);
>  }
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 0edf14a9840e..fb5493d0edf0 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2058,7 +2058,8 @@ void pid_update_inode(struct task_struct *task, struct inode *inode)
>   * performed a setuid(), etc.
>   *
>   */
> -static int pid_revalidate(struct dentry *dentry, unsigned int flags)
> +static int pid_revalidate(struct inode *dir, const struct qstr *name,
> +			  struct dentry *dentry, unsigned int flags)
>  {
>  	struct inode *inode;
>  	struct task_struct *task;
> @@ -2191,7 +2192,8 @@ static int dname_to_vma_addr(struct dentry *dentry,
>  	return 0;
>  }
>  
> -static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int map_files_d_revalidate(struct inode *dir, const struct qstr *name,
> +				  struct dentry *dentry, unsigned int flags)
>  {
>  	unsigned long vm_start, vm_end;
>  	bool exact_vma_exists = false;
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index 24baf23e864f..37aa778d1af7 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -140,7 +140,8 @@ static void tid_fd_update_inode(struct task_struct *task, struct inode *inode,
>  	security_task_to_inode(task, inode);
>  }
>  
> -static int tid_fd_revalidate(struct dentry *dentry, unsigned int flags)
> +static int tid_fd_revalidate(struct inode *dir, const struct qstr *name,
> +			     struct dentry *dentry, unsigned int flags)
>  {
>  	struct task_struct *task;
>  	struct inode *inode;
> diff --git a/fs/proc/generic.c b/fs/proc/generic.c
> index dbe82cf23ee4..8ec90826a49e 100644
> --- a/fs/proc/generic.c
> +++ b/fs/proc/generic.c
> @@ -216,7 +216,8 @@ void proc_free_inum(unsigned int inum)
>  	ida_free(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
>  }
>  
> -static int proc_misc_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int proc_misc_d_revalidate(struct inode *dir, const struct qstr *name,
> +				  struct dentry *dentry, unsigned int flags)
>  {
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
> @@ -343,7 +344,8 @@ static const struct file_operations proc_dir_operations = {
>  	.iterate_shared		= proc_readdir,
>  };
>  
> -static int proc_net_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int proc_net_d_revalidate(struct inode *dir, const struct qstr *name,
> +				 struct dentry *dentry, unsigned int flags)
>  {
>  	return 0;
>  }
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 27a283d85a6e..cc9d74a06ff0 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -884,7 +884,8 @@ static const struct inode_operations proc_sys_dir_operations = {
>  	.getattr	= proc_sys_getattr,
>  };
>  
> -static int proc_sys_revalidate(struct dentry *dentry, unsigned int flags)
> +static int proc_sys_revalidate(struct inode *dir, const struct qstr *name,
> +			       struct dentry *dentry, unsigned int flags)
>  {
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
> diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c
> index 864b194dbaa0..8c5d44ee91ed 100644
> --- a/fs/smb/client/dir.c
> +++ b/fs/smb/client/dir.c
> @@ -737,7 +737,8 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
>  }
>  
>  static int
> -cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
> +cifs_d_revalidate(struct inode *dir, const struct qstr *name,
> +		  struct dentry *direntry, unsigned int flags)
>  {
>  	struct inode *inode;
>  	int rc;
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index cfc614c638da..53214499e384 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -457,7 +457,8 @@ static void tracefs_d_release(struct dentry *dentry)
>  		eventfs_d_release(dentry);
>  }
>  
> -static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
> +static int tracefs_d_revalidate(struct inode *inode, const struct qstr *name,
> +				struct dentry *dentry, unsigned int flags)
>  {
>  	struct eventfs_inode *ei = dentry->d_fsdata;
>  
> diff --git a/fs/vboxsf/dir.c b/fs/vboxsf/dir.c
> index 5f1a14d5b927..a859ac9b74ba 100644
> --- a/fs/vboxsf/dir.c
> +++ b/fs/vboxsf/dir.c
> @@ -192,7 +192,8 @@ const struct file_operations vboxsf_dir_fops = {
>   * This is called during name resolution/lookup to check if the @dentry in
>   * the cache is still valid. the job is handled by vboxsf_inode_revalidate.
>   */
> -static int vboxsf_dentry_revalidate(struct dentry *dentry, unsigned int flags)
> +static int vboxsf_dentry_revalidate(struct inode *dir, const struct qstr *name,
> +				    struct dentry *dentry, unsigned int flags)
>  {
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 8bc567a35718..4a6bdadf2f29 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -144,7 +144,8 @@ enum d_real_type {
>  };
>  
>  struct dentry_operations {
> -	int (*d_revalidate)(struct dentry *, unsigned int);
> +	int (*d_revalidate)(struct inode *, const struct qstr *,
> +			    struct dentry *, unsigned int);
>  	int (*d_weak_revalidate)(struct dentry *, unsigned int);
>  	int (*d_hash)(const struct dentry *, struct qstr *);
>  	int (*d_compare)(const struct dentry *,
> diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
> index 772f822dc6b8..18855cb44b1c 100644
> --- a/include/linux/fscrypt.h
> +++ b/include/linux/fscrypt.h
> @@ -192,7 +192,8 @@ struct fscrypt_operations {
>  					     unsigned int *num_devs);
>  };
>  
> -int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
> +int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
> +			 struct dentry *dentry, unsigned int flags);
>  
>  static inline struct fscrypt_inode_info *
>  fscrypt_get_inode_info(const struct inode *inode)
> @@ -711,8 +712,8 @@ static inline u64 fscrypt_fname_siphash(const struct inode *dir,
>  	return 0;
>  }
>  
> -static inline int fscrypt_d_revalidate(struct dentry *dentry,
> -				       unsigned int flags)
> +static inline int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
> +				       struct dentry *dentry, unsigned int flags)
>  {
>  	return 1;
>  }

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate()
  2025-01-17 18:55       ` Jeff Layton
@ 2025-01-17 19:00         ` Al Viro
  0 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-17 19:00 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

On Fri, Jan 17, 2025 at 01:55:01PM -0500, Jeff Layton wrote:
> > +static inline int d_revalidate(struct inode *dir, const struct qstr *name,
> > +			       struct dentry *dentry, unsigned int flags)
> >  {
> >  	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
> > -		return dentry->d_op->d_revalidate(dentry, flags);
> > +		return dentry->d_op->d_revalidate(dir, name, dentry, flags);
> 
> I know I sent a R-b for this, but one question:
> 
> Suppose we get back a positive result (dentry is still good), but the
> name and dentry->d_name no longer match. Do we need to do any special
> handling in that case?

Not really - it's the same situation we'd have if it got renamed right
after we'd concluded that everything's fine, after all.  We can't spin
there indefinitely, rechecking the name for changes ;-)

In RCU mode a ->d_seq mismatch is guaranteed to catch that; in non-RCU...
well, there's really nothing we could do - rename could've bloody
well happened just as we'd completed pathname resolution.


* Re: [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller
  2025-01-16  5:23     ` [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
@ 2025-01-22 20:27       ` David Howells
  2025-01-22 21:01         ` Al Viro
  0 siblings, 1 reply; 96+ messages in thread
From: David Howells @ 2025-01-22 20:27 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, linux-fsdevel, agruenba, amir73il, brauner, ceph-devel,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

Al Viro <viro@zeniv.linux.org.uk> wrote:

> -	_enter("{%lu},%p{%pd},", dir->i_ino, dentry, dentry);
> +	_enter("{%lu},{%s},", dir->i_ino, name->name);

I don't think that name->name is guaranteed to be NUL-terminated after
name->len characters.  The following:

	_enter("{%lu},{%*s},", dir->i_ino, name->len, name->name);

might be better, though:

	_enter("{%lu},{%*.*s},", dir->i_ino, name->len, name->len, name->name);

might be necessary.

Apart from that:

Acked-by: David Howells <dhowells@redhat.com>



* Re: [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller
  2025-01-22 20:27       ` David Howells
@ 2025-01-22 21:01         ` Al Viro
  2025-01-22 21:24           ` Al Viro
  0 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-22 21:01 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, hubcap,
	jack, krisman, linux-nfs, miklos, torvalds

On Wed, Jan 22, 2025 at 08:27:41PM +0000, David Howells wrote:
> Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> > -	_enter("{%lu},%p{%pd},", dir->i_ino, dentry, dentry);
> > +	_enter("{%lu},{%s},", dir->i_ino, name->name);
> 
> I don't think that name->name is guaranteed to be NUL-terminated after
> name->len characters.  The following:
> 
> 	_enter("{%lu},{%*s},", dir->i_ino, name->len, name->name);
> 
> might be better, though:
> 
> 	_enter("{%lu},{%*.*s},", dir->i_ino, name->len, name->len, name->name);
> 
> might be necessary.

Good catch (and that definitely needs to be documented in previous commit),
but what's wrong with
	_enter("{%lu},{%.*s},", dir->i_ino, name->len, name->name);

After looking through the rest of the series, the fuse and orangefs patches
need to be adjusted as well.  Not caught in testing, since there a similar
braino manifests as stray invalidates ;-/


* Re: [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller
  2025-01-22 21:01         ` Al Viro
@ 2025-01-22 21:24           ` Al Viro
  2025-01-22 21:55             ` David Howells
  0 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-22 21:24 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, hubcap,
	jack, krisman, linux-nfs, miklos, torvalds

On Wed, Jan 22, 2025 at 09:01:24PM +0000, Al Viro wrote:
> On Wed, Jan 22, 2025 at 08:27:41PM +0000, David Howells wrote:
> > Al Viro <viro@zeniv.linux.org.uk> wrote:
> > 
> > > -	_enter("{%lu},%p{%pd},", dir->i_ino, dentry, dentry);
> > > +	_enter("{%lu},{%s},", dir->i_ino, name->name);
> > 
> > I don't think that name->name is guaranteed to be NUL-terminated after
> > name->len characters.  The following:
> > 
> > 	_enter("{%lu},{%*s},", dir->i_ino, name->len, name->name);
> > 
> > might be better, though:
> > 
> > 	_enter("{%lu},{%*.*s},", dir->i_ino, name->len, name->len, name->name);
> > 
> > might be necessary.
> 
> Good catch (and that definitely needs to be documented in previous commit),
> but what's wrong with
> 	_enter("{%lu},{%.*s},", dir->i_ino, name->len, name->name);

IOW, are you OK with the following?

commit bf61e4013ab1cb9a819303faca018e7b7cbaf3e7
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Fri Jan 3 00:27:27 2025 -0500

    afs_d_revalidate(): use stable name and parent inode passed by caller
    
    No need to bother with boilerplate for obtaining the latter and for
    the former we really should not count upon ->d_name.name remaining
    stable under us.
    
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 9780013cd83a..e04cffe4beb1 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -607,19 +607,19 @@ static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name,
  * Do a lookup of a single name in a directory
  * - just returns the FID the dentry name maps to if found
  */
-static int afs_do_lookup_one(struct inode *dir, struct dentry *dentry,
+static int afs_do_lookup_one(struct inode *dir, const struct qstr *name,
 			     struct afs_fid *fid, struct key *key,
 			     afs_dataversion_t *_dir_version)
 {
 	struct afs_super_info *as = dir->i_sb->s_fs_info;
 	struct afs_lookup_one_cookie cookie = {
 		.ctx.actor = afs_lookup_one_filldir,
-		.name = dentry->d_name,
+		.name = *name,
 		.fid.vid = as->volume->vid
 	};
 	int ret;
 
-	_enter("{%lu},%p{%pd},", dir->i_ino, dentry, dentry);
+	_enter("{%lu},{%.*s},", dir->i_ino, name->len, name->name);
 
 	/* search the directory */
 	ret = afs_dir_iterate(dir, &cookie.ctx, key, _dir_version);
@@ -1052,21 +1052,12 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 /*
  * Check the validity of a dentry under RCU conditions.
  */
-static int afs_d_revalidate_rcu(struct dentry *dentry)
+static int afs_d_revalidate_rcu(struct afs_vnode *dvnode, struct dentry *dentry)
 {
-	struct afs_vnode *dvnode;
-	struct dentry *parent;
-	struct inode *dir;
 	long dir_version, de_version;
 
 	_enter("%p", dentry);
 
-	/* Check the parent directory is still valid first. */
-	parent = READ_ONCE(dentry->d_parent);
-	dir = d_inode_rcu(parent);
-	if (!dir)
-		return -ECHILD;
-	dvnode = AFS_FS_I(dir);
 	if (test_bit(AFS_VNODE_DELETED, &dvnode->flags))
 		return -ECHILD;
 
@@ -1097,9 +1088,8 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
 static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 			    struct dentry *dentry, unsigned int flags)
 {
-	struct afs_vnode *vnode, *dir;
+	struct afs_vnode *vnode, *dir = AFS_FS_I(parent_dir);
 	struct afs_fid fid;
-	struct dentry *parent;
 	struct inode *inode;
 	struct key *key;
 	afs_dataversion_t dir_version, invalid_before;
@@ -1107,7 +1097,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	int ret;
 
 	if (flags & LOOKUP_RCU)
-		return afs_d_revalidate_rcu(dentry);
+		return afs_d_revalidate_rcu(dir, dentry);
 
 	if (d_really_is_positive(dentry)) {
 		vnode = AFS_FS_I(d_inode(dentry));
@@ -1122,14 +1112,9 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	if (IS_ERR(key))
 		key = NULL;
 
-	/* Hold the parent dentry so we can peer at it */
-	parent = dget_parent(dentry);
-	dir = AFS_FS_I(d_inode(parent));
-
 	/* validate the parent directory */
 	ret = afs_validate(dir, key);
 	if (ret == -ERESTARTSYS) {
-		dput(parent);
 		key_put(key);
 		return ret;
 	}
@@ -1157,7 +1142,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	afs_stat_v(dir, n_reval);
 
 	/* search the directory for this vnode */
-	ret = afs_do_lookup_one(&dir->netfs.inode, dentry, &fid, key, &dir_version);
+	ret = afs_do_lookup_one(&dir->netfs.inode, name, &fid, key, &dir_version);
 	switch (ret) {
 	case 0:
 		/* the filename maps to something */
@@ -1201,22 +1186,19 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 		goto out_valid;
 
 	default:
-		_debug("failed to iterate dir %pd: %d",
-		       parent, ret);
+		_debug("failed to iterate parent %pd2: %d", dentry, ret);
 		goto not_found;
 	}
 
 out_valid:
 	dentry->d_fsdata = (void *)(unsigned long)dir_version;
 out_valid_noupdate:
-	dput(parent);
 	key_put(key);
 	_leave(" = 1 [valid]");
 	return 1;
 
 not_found:
 	_debug("dropping dentry %pd2", dentry);
-	dput(parent);
 	key_put(key);
 
 	_leave(" = 0 [bad]");


* Re: [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller
  2025-01-22 21:24           ` Al Viro
@ 2025-01-22 21:55             ` David Howells
  0 siblings, 0 replies; 96+ messages in thread
From: David Howells @ 2025-01-22 21:55 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, linux-fsdevel, agruenba, amir73il, brauner, ceph-devel,
	hubcap, jack, krisman, linux-nfs, miklos, torvalds

Al Viro <viro@zeniv.linux.org.uk> wrote:

> IOW, are you OK with the following?
> 
> commit bf61e4013ab1cb9a819303faca018e7b7cbaf3e7
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Fri Jan 3 00:27:27 2025 -0500
> 
>     afs_d_revalidate(): use stable name and parent inode passed by caller
>     
>     No need to bother with boilerplate for obtaining the latter and for
>     the former we really should not count upon ->d_name.name remaining
>     stable under us.
>     
>     Reviewed-by: Jeff Layton <jlayton@kernel.org>
>     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Acked-by: David Howells <dhowells@redhat.com>



* [PATCHES v3][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems)
  2025-01-16  5:21 ` [PATCHES v2][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
  2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
@ 2025-01-23  1:45   ` Al Viro
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  1 sibling, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:45 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Gabriel Krisman Bertazi, Christian Brauner,
	Jan Kara, David Howells, ceph-devel, linux-nfs, Amir Goldstein,
	Miklos Szeredi, Andreas Gruenbacher, Mike Marshall

 	Series updated and force-pushed to the same place:
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.d_revalidate
itself on top of #work.dcache.
 
 	Individual patches in followups; please, review.
 
	Changes since v2:
* document that the stable name passed to ->d_revalidate() may be followed by
'/' rather than NUL - in the normal case it's given a component of the
pathname being resolved and that component doesn't have to be the last one.
Basically, it's the same situation as for ->d_hash() and ->d_compare() -
->len should not be ignored.  The AFS, FUSE and orangefs patches in the
series ran afoul of that; spotted (in the AFS case) by dhowells.  Fixed; in
the case of afs it used to end up with an incorrect debugging printk, in the
case of fuse and orangefs - with stray invalidations, unfortunately not
caught by testing.
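
To illustrate the ->len point above, here is a minimal userspace sketch, not
kernel code.  struct name_ref is a hypothetical stand-in for struct qstr, and
both helpers are made-up names; the only thing taken from the series is the
rule that the component is ->len bytes long and may be followed by '/'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct qstr (only the fields used here). */
struct name_ref {
	const char *name;	/* may point into the middle of a longer pathname */
	size_t len;		/* length of this component only */
};

/* Compare exactly ->len bytes; never rely on a NUL right after the component. */
static bool name_equals(const struct name_ref *n, const char *expected)
{
	size_t elen = strlen(expected);

	return n->len == elen && memcmp(n->name, expected, elen) == 0;
}

/* Format a component for a debug message; "%.*s" stops after ->len bytes. */
static int format_name(char *buf, size_t size, const struct name_ref *n)
{
	return snprintf(buf, size, "%.*s", (int)n->len, n->name);
}
```

With name pointing at the start of "foo/bar" and len equal to 3, a plain
strcmp() against "foo" reports a mismatch and "%s" prints the rest of the
path, while the helpers above see just "foo".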

 	Changes since v1:
* reordered external_name members to get rid of hole on 64bit, as suggested by
dhowells.
* split the added method in two in the last commit ("9p: fix ->rename_sem exclusion")


* [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size
  2025-01-23  1:45   ` [PATCHES v3][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
@ 2025-01-23  1:46     ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
                         ` (18 more replies)
  0 siblings, 19 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... calling the number of words DNAME_INLINE_WORDS.

The next step will be to have a structure to hold inline name arrays
(both in dentry and in name_snapshot) and use that to alias the
existing arrays of unsigned char there.  That will allow both
full-structure copies and convenient word-by-word accesses.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
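A userspace sketch of the idea, under assumed names: INLINE_WORDS and
INLINE_LEN are illustrative stand-ins for DNAME_INLINE_WORDS and
DNAME_INLINE_LEN, and swap_inline() is hypothetical.  Deriving the byte length
from the word count makes the "multiple of word size" property hold by
construction, so a word-by-word loop needs no separate alignment check.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value; the patch uses 5, 9 or 11 depending on the config. */
#define INLINE_WORDS 5
#define INLINE_LEN (INLINE_WORDS * sizeof(unsigned long))

/* True by construction, shown only to make the invariant explicit. */
_Static_assert(INLINE_LEN % sizeof(unsigned long) == 0,
	       "inline length is a whole number of words");

/* Swap two inline buffers one word at a time. */
static void swap_inline(unsigned long *a, unsigned long *b)
{
	for (size_t i = 0; i < INLINE_WORDS; i++) {
		unsigned long tmp = a[i];

		a[i] = b[i];
		b[i] = tmp;
	}
}
```
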
 fs/dcache.c            | 4 +---
 include/linux/dcache.h | 8 +++++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b4d5e9e1e43d..ea0f0bea511b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2748,9 +2748,7 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			/*
 			 * Both are internal.
 			 */
-			unsigned int i;
-			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
-			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
 				swap(((long *) &dentry->d_iname)[i],
 				     ((long *) &target->d_iname)[i]);
 			}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index bff956f7b2b9..42dd89beaf4e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -68,15 +68,17 @@ extern const struct qstr dotdot_name;
  * large memory footprint increase).
  */
 #ifdef CONFIG_64BIT
-# define DNAME_INLINE_LEN 40 /* 192 bytes */
+# define DNAME_INLINE_WORDS 5 /* 192 bytes */
 #else
 # ifdef CONFIG_SMP
-#  define DNAME_INLINE_LEN 36 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 9 /* 128 bytes */
 # else
-#  define DNAME_INLINE_LEN 44 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 11 /* 128 bytes */
 # endif
 #endif
 
+#define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
+
 #define d_lock	d_lockref.lock
 
 struct dentry {
-- 
2.39.5



* [PATCH v3 02/20] dcache: back inline names with a struct-wrapped array of unsigned long
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 03/20] make take_dentry_name_snapshot() lockless Al Viro
                         ` (17 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... so that they can be copied with struct assignment (which generates
better code) and accessed word-by-word.

The type is union shortname_store; it's a union of arrays of
unsigned char and unsigned long.

struct name_snapshot.inline_name is turned into union shortname_store;
users (all in fs/dcache.c) adjusted.

struct dentry.d_iname has some users outside of fs/dcache.c; to
reduce the amount of noise in the commit, it is replaced with
union shortname_store d_shortname and d_iname is turned into a macro
that expands to d_shortname.string (similar to d_lock handling).
That compat macro is temporary - most of the remaining instances will
be taken out by the debugfs series, and once that is merged and a few
others are taken care of, this will go away.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
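A userspace sketch of the pattern, with assumed names: union short_store
mirrors the shape of union shortname_store, while copy_short() and
name_is_external() are hypothetical helpers loosely modelled on the
assignments and on dname_external() in the diff below.

```c
#include <assert.h>
#include <string.h>

#define SHORT_WORDS 5	/* illustrative; matches the 64-bit value in the series */
#define SHORT_LEN (SHORT_WORDS * sizeof(unsigned long))

/* Same bytes viewed either as a string or as an array of words. */
union short_store {
	unsigned char string[SHORT_LEN];
	unsigned long words[SHORT_WORDS];
};

/* Whole-buffer copy by plain assignment instead of memcpy(dst, src, len + 1). */
static void copy_short(union short_store *dst, const union short_store *src)
{
	*dst = *src;
}

/* A name is external when its pointer is not the inline buffer itself. */
static int name_is_external(const unsigned char *name,
			    const union short_store *inline_store)
{
	return name != inline_store->string;
}
```

The assignment always copies the full buffer, so the terminating NUL comes
along without any length arithmetic.
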
 fs/dcache.c                                  | 43 +++++++++-----------
 include/linux/dcache.h                       | 10 ++++-
 tools/testing/selftests/bpf/progs/find_vma.c |  2 +-
 3 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ea0f0bea511b..52662a5d08e4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -324,7 +324,7 @@ static void __d_free_external(struct rcu_head *head)
 
 static inline int dname_external(const struct dentry *dentry)
 {
-	return dentry->d_name.name != dentry->d_iname;
+	return dentry->d_name.name != dentry->d_shortname.string;
 }
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
@@ -334,9 +334,8 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
 	if (unlikely(dname_external(dentry))) {
 		atomic_inc(&external_name(dentry)->u.count);
 	} else {
-		memcpy(name->inline_name, dentry->d_iname,
-		       dentry->d_name.len + 1);
-		name->name.name = name->inline_name;
+		name->inline_name = dentry->d_shortname;
+		name->name.name = name->inline_name.string;
 	}
 	spin_unlock(&dentry->d_lock);
 }
@@ -344,7 +343,7 @@ EXPORT_SYMBOL(take_dentry_name_snapshot);
 
 void release_dentry_name_snapshot(struct name_snapshot *name)
 {
-	if (unlikely(name->name.name != name->inline_name)) {
+	if (unlikely(name->name.name != name->inline_name.string)) {
 		struct external_name *p;
 		p = container_of(name->name.name, struct external_name, name[0]);
 		if (unlikely(atomic_dec_and_test(&p->u.count)))
@@ -1654,10 +1653,10 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	 * will still always have a NUL at the end, even if we might
 	 * be overwriting an internal NUL character
 	 */
-	dentry->d_iname[DNAME_INLINE_LEN-1] = 0;
+	dentry->d_shortname.string[DNAME_INLINE_LEN-1] = 0;
 	if (unlikely(!name)) {
 		name = &slash_name;
-		dname = dentry->d_iname;
+		dname = dentry->d_shortname.string;
 	} else if (name->len > DNAME_INLINE_LEN-1) {
 		size_t size = offsetof(struct external_name, name[1]);
 		struct external_name *p = kmalloc(size + name->len,
@@ -1670,7 +1669,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 		atomic_set(&p->u.count, 1);
 		dname = p->name;
 	} else  {
-		dname = dentry->d_iname;
+		dname = dentry->d_shortname.string;
 	}	
 
 	dentry->d_name.len = name->len;
@@ -2729,10 +2728,9 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * dentry:internal, target:external.  Steal target's
 			 * storage and make target internal.
 			 */
-			memcpy(target->d_iname, dentry->d_name.name,
-					dentry->d_name.len + 1);
 			dentry->d_name.name = target->d_name.name;
-			target->d_name.name = target->d_iname;
+			target->d_shortname = dentry->d_shortname;
+			target->d_name.name = target->d_shortname.string;
 		}
 	} else {
 		if (unlikely(dname_external(dentry))) {
@@ -2740,18 +2738,16 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * dentry:external, target:internal.  Give dentry's
 			 * storage to target and make dentry internal
 			 */
-			memcpy(dentry->d_iname, target->d_name.name,
-					target->d_name.len + 1);
 			target->d_name.name = dentry->d_name.name;
-			dentry->d_name.name = dentry->d_iname;
+			dentry->d_shortname = target->d_shortname;
+			dentry->d_name.name = dentry->d_shortname.string;
 		} else {
 			/*
 			 * Both are internal.
 			 */
-			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
-				swap(((long *) &dentry->d_iname)[i],
-				     ((long *) &target->d_iname)[i]);
-			}
+			for (int i = 0; i < DNAME_INLINE_WORDS; i++)
+				swap(dentry->d_shortname.words[i],
+				     target->d_shortname.words[i]);
 		}
 	}
 	swap(dentry->d_name.hash_len, target->d_name.hash_len);
@@ -2766,9 +2762,8 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 		atomic_inc(&external_name(target)->u.count);
 		dentry->d_name = target->d_name;
 	} else {
-		memcpy(dentry->d_iname, target->d_name.name,
-				target->d_name.len + 1);
-		dentry->d_name.name = dentry->d_iname;
+		dentry->d_shortname = target->d_shortname;
+		dentry->d_name.name = dentry->d_shortname.string;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
 	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
@@ -3101,12 +3096,12 @@ void d_mark_tmpfile(struct file *file, struct inode *inode)
 {
 	struct dentry *dentry = file->f_path.dentry;
 
-	BUG_ON(dentry->d_name.name != dentry->d_iname ||
+	BUG_ON(dname_external(dentry) ||
 		!hlist_unhashed(&dentry->d_u.d_alias) ||
 		!d_unlinked(dentry));
 	spin_lock(&dentry->d_parent->d_lock);
 	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-	dentry->d_name.len = sprintf(dentry->d_iname, "#%llu",
+	dentry->d_name.len = sprintf(dentry->d_shortname.string, "#%llu",
 				(unsigned long long)inode->i_ino);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dentry->d_parent->d_lock);
@@ -3194,7 +3189,7 @@ static void __init dcache_init(void)
 	 */
 	dentry_cache = KMEM_CACHE_USERCOPY(dentry,
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
-		d_iname);
+		d_shortname.string);
 
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 42dd89beaf4e..8bc567a35718 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -79,7 +79,13 @@ extern const struct qstr dotdot_name;
 
 #define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
 
+union shortname_store {
+	unsigned char string[DNAME_INLINE_LEN];
+	unsigned long words[DNAME_INLINE_WORDS];
+};
+
 #define d_lock	d_lockref.lock
+#define d_iname d_shortname.string
 
 struct dentry {
 	/* RCU lookup touched fields */
@@ -90,7 +96,7 @@ struct dentry {
 	struct qstr d_name;
 	struct inode *d_inode;		/* Where the name belongs to - NULL is
 					 * negative */
-	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+	union shortname_store d_shortname;
 	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
 
 	/* Ref lookup also touches following */
@@ -591,7 +597,7 @@ static inline struct inode *d_real_inode(const struct dentry *dentry)
 
 struct name_snapshot {
 	struct qstr name;
-	unsigned char inline_name[DNAME_INLINE_LEN];
+	union shortname_store inline_name;
 };
 void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);
diff --git a/tools/testing/selftests/bpf/progs/find_vma.c b/tools/testing/selftests/bpf/progs/find_vma.c
index 38034fb82530..02b82774469c 100644
--- a/tools/testing/selftests/bpf/progs/find_vma.c
+++ b/tools/testing/selftests/bpf/progs/find_vma.c
@@ -25,7 +25,7 @@ static long check_vma(struct task_struct *task, struct vm_area_struct *vma,
 {
 	if (vma->vm_file)
 		bpf_probe_read_kernel_str(d_iname, DNAME_INLINE_LEN - 1,
-					  vma->vm_file->f_path.dentry->d_iname);
+					  vma->vm_file->f_path.dentry->d_shortname.string);
 
 	/* check for VM_EXEC */
 	if (vma->vm_flags & VM_EXEC)
-- 
2.39.5



* [PATCH v3 03/20] make take_dentry_name_snapshot() lockless
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  2025-01-23  1:46       ` [PATCH v3 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 04/20] dissolve external_name.u into separate members Al Viro
                         ` (16 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Use ->d_seq instead of grabbing ->d_lock; in case of shortname dentries
that avoids any stores to shared data objects and in case of long names
we are down to (unavoidable) atomic_inc on the external_name refcount.

Makes the thing safer as well - the areas where ->d_seq is held odd are
all nested inside the areas where ->d_lock is held, and the latter are
much more numerous.

NOTE: now that there is a lockless path where we might try to grab
a reference to an already doomed external_name instance, it is no
longer possible for external_name.u.count and external_name.u.head
to share space (kudos to Linus for spotting that).

To reduce the noise this commit just makes external_name.u a struct
(instead of union); the next commit will dissolve it.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
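A userspace sketch of the "get a valid reference" step, written with C11
atomics rather than the kernel's atomic_t API.  ref_get_not_zero() and
ref_put() are hypothetical names; the first is meant as an analogue of
atomic_inc_not_zero().  A lockless reader can reach an object whose count has
already dropped to zero, so it may only take a reference if the count is still
positive, and the counter has to stay readable while the object waits to be
freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference unless the count has already reached zero. */
static bool ref_get_not_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;	/* object is doomed; caller has to retry its lookup */
}

/* Drop a reference; returns true if this was the last one. */
static bool ref_put(atomic_int *count)
{
	return atomic_fetch_sub(count, 1) == 1;
}
```
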
 fs/dcache.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 52662a5d08e4..f387dc97df86 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,9 +296,9 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	union {
-		atomic_t count;
-		struct rcu_head head;
+	struct {
+		atomic_t count;		// ->count and ->head can't be combined
+		struct rcu_head head;	// see take_dentry_name_snapshot()
 	} u;
 	unsigned char name[];
 };
@@ -329,15 +329,30 @@ static inline int dname_external(const struct dentry *dentry)
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
 {
-	spin_lock(&dentry->d_lock);
-	name->name = dentry->d_name;
-	if (unlikely(dname_external(dentry))) {
-		atomic_inc(&external_name(dentry)->u.count);
-	} else {
+	unsigned seq;
+	const unsigned char *s;
+
+	rcu_read_lock();
+retry:
+	seq = read_seqcount_begin(&dentry->d_seq);
+	s = READ_ONCE(dentry->d_name.name);
+	name->name.hash_len = dentry->d_name.hash_len;
+	name->name.name = name->inline_name.string;
+	if (likely(s == dentry->d_shortname.string)) {
 		name->inline_name = dentry->d_shortname;
-		name->name.name = name->inline_name.string;
+	} else {
+		struct external_name *p;
+		p = container_of(s, struct external_name, name[0]);
+		// get a valid reference
+		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+			goto retry;
+		name->name.name = s;
 	}
-	spin_unlock(&dentry->d_lock);
+	if (read_seqcount_retry(&dentry->d_seq, seq)) {
+		release_dentry_name_snapshot(name);
+		goto retry;
+	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(take_dentry_name_snapshot);
 
-- 
2.39.5



* [PATCH v3 04/20] dissolve external_name.u into separate members
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
  2025-01-23  1:46       ` [PATCH v3 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
  2025-01-23  1:46       ` [PATCH v3 03/20] make take_dentry_name_snapshot() lockless Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
                         ` (15 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

kept separate from the previous commit to keep the noise separate
from actual changes...

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f387dc97df86..6f36d3e8c739 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,10 +296,8 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	struct {
-		atomic_t count;		// ->count and ->head can't be combined
-		struct rcu_head head;	// see take_dentry_name_snapshot()
-	} u;
+	struct rcu_head head;	// ->head and ->count can't be combined
+	atomic_t count;		// see take_dentry_name_snapshot()
 	unsigned char name[];
 };
 
@@ -344,7 +342,7 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
 		struct external_name *p;
 		p = container_of(s, struct external_name, name[0]);
 		// get a valid reference
-		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+		if (unlikely(!atomic_inc_not_zero(&p->count)))
 			goto retry;
 		name->name.name = s;
 	}
@@ -361,8 +359,8 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
 	if (unlikely(name->name.name != name->inline_name.string)) {
 		struct external_name *p;
 		p = container_of(name->name.name, struct external_name, name[0]);
-		if (unlikely(atomic_dec_and_test(&p->u.count)))
-			kfree_rcu(p, u.head);
+		if (unlikely(atomic_dec_and_test(&p->count)))
+			kfree_rcu(p, head);
 	}
 }
 EXPORT_SYMBOL(release_dentry_name_snapshot);
@@ -400,7 +398,7 @@ static void dentry_free(struct dentry *dentry)
 	WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
 	if (unlikely(dname_external(dentry))) {
 		struct external_name *p = external_name(dentry);
-		if (likely(atomic_dec_and_test(&p->u.count))) {
+		if (likely(atomic_dec_and_test(&p->count))) {
 			call_rcu(&dentry->d_u.d_rcu, __d_free_external);
 			return;
 		}
@@ -1681,7 +1679,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 			kmem_cache_free(dentry_cache, dentry); 
 			return NULL;
 		}
-		atomic_set(&p->u.count, 1);
+		atomic_set(&p->count, 1);
 		dname = p->name;
 	} else  {
 		dname = dentry->d_shortname.string;
@@ -2774,15 +2772,15 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 	if (unlikely(dname_external(dentry)))
 		old_name = external_name(dentry);
 	if (unlikely(dname_external(target))) {
-		atomic_inc(&external_name(target)->u.count);
+		atomic_inc(&external_name(target)->count);
 		dentry->d_name = target->d_name;
 	} else {
 		dentry->d_shortname = target->d_shortname;
 		dentry->d_name.name = dentry->d_shortname.string;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
-	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
-		kfree_rcu(old_name, u.head);
+	if (old_name && likely(atomic_dec_and_test(&old_name->count)))
+		kfree_rcu(old_name, head);
 }
 
 /*
-- 
2.39.5



* [PATCH v3 05/20] ext4 fast_commit: make use of name_snapshot primitives
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (2 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 04/20] dissolve external_name.u into separate members Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
                         ` (14 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... rather than open-coding them.  As a bonus, that avoids the pointless
work with extra allocations, etc. for long names.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ext4/fast_commit.c | 29 +++++------------------------
 fs/ext4/fast_commit.h |  3 +--
 2 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 26c4fc37edcf..da4263a14a20 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -322,9 +322,7 @@ void ext4_fc_del(struct inode *inode)
 	WARN_ON(!list_empty(&ei->i_fc_dilist));
 	spin_unlock(&sbi->s_fc_lock);
 
-	if (fc_dentry->fcd_name.name &&
-		fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
-		kfree(fc_dentry->fcd_name.name);
+	release_dentry_name_snapshot(&fc_dentry->fcd_name);
 	kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
 
 	return;
@@ -449,22 +447,7 @@ static int __track_dentry_update(handle_t *handle, struct inode *inode,
 	node->fcd_op = dentry_update->op;
 	node->fcd_parent = dir->i_ino;
 	node->fcd_ino = inode->i_ino;
-	if (dentry->d_name.len > DNAME_INLINE_LEN) {
-		node->fcd_name.name = kmalloc(dentry->d_name.len, GFP_NOFS);
-		if (!node->fcd_name.name) {
-			kmem_cache_free(ext4_fc_dentry_cachep, node);
-			ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, handle);
-			mutex_lock(&ei->i_fc_lock);
-			return -ENOMEM;
-		}
-		memcpy((u8 *)node->fcd_name.name, dentry->d_name.name,
-			dentry->d_name.len);
-	} else {
-		memcpy(node->fcd_iname, dentry->d_name.name,
-			dentry->d_name.len);
-		node->fcd_name.name = node->fcd_iname;
-	}
-	node->fcd_name.len = dentry->d_name.len;
+	take_dentry_name_snapshot(&node->fcd_name, dentry);
 	INIT_LIST_HEAD(&node->fcd_dilist);
 	spin_lock(&sbi->s_fc_lock);
 	if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
@@ -832,7 +815,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 {
 	struct ext4_fc_dentry_info fcd;
 	struct ext4_fc_tl tl;
-	int dlen = fc_dentry->fcd_name.len;
+	int dlen = fc_dentry->fcd_name.name.len;
 	u8 *dst = ext4_fc_reserve_space(sb,
 			EXT4_FC_TAG_BASE_LEN + sizeof(fcd) + dlen, crc);
 
@@ -847,7 +830,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 	dst += EXT4_FC_TAG_BASE_LEN;
 	memcpy(dst, &fcd, sizeof(fcd));
 	dst += sizeof(fcd);
-	memcpy(dst, fc_dentry->fcd_name.name, dlen);
+	memcpy(dst, fc_dentry->fcd_name.name.name, dlen);
 
 	return true;
 }
@@ -1328,9 +1311,7 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 		list_del_init(&fc_dentry->fcd_dilist);
 		spin_unlock(&sbi->s_fc_lock);
 
-		if (fc_dentry->fcd_name.name &&
-			fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
-			kfree(fc_dentry->fcd_name.name);
+		release_dentry_name_snapshot(&fc_dentry->fcd_name);
 		kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
 		spin_lock(&sbi->s_fc_lock);
 	}
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index 2fadb2c4780c..3bd534e4dbbf 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -109,8 +109,7 @@ struct ext4_fc_dentry_update {
 	int fcd_op;		/* Type of update create / unlink / link */
 	int fcd_parent;		/* Parent inode number */
 	int fcd_ino;		/* Inode number */
-	struct qstr fcd_name;	/* Dirent name */
-	unsigned char fcd_iname[DNAME_INLINE_LEN];	/* Dirent name string */
+	struct name_snapshot fcd_name;	/* Dirent name */
 	struct list_head fcd_list;
 	struct list_head fcd_dilist;
 };
-- 
2.39.5



* [PATCH v3 06/20] generic_ci_d_compare(): use shortname_storage
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (3 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
                         ` (13 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... and check the "name might be unstable" predicate
the right way.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
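A userspace sketch of the predicate, with assumed names (union ci_store and
stable_name() are hypothetical).  Instead of guessing from the length, it
tests whether the string pointer is the inline buffer itself and, if so,
copies the whole buffer to a private temporary before anything reads it.

```c
#include <assert.h>
#include <string.h>

#define CI_WORDS 5	/* illustrative size only */

union ci_store {
	unsigned char string[CI_WORDS * sizeof(unsigned long)];
	unsigned long words[CI_WORDS];
};

/*
 * Return a pointer that is safe to hand to a string routine: a private copy
 * when str is the (possibly changing) inline buffer, str itself otherwise.
 */
static const unsigned char *stable_name(const unsigned char *str,
					const union ci_store *inline_store,
					union ci_store *tmp)
{
	if (str == inline_store->string) {
		*tmp = *inline_store;	/* full copy, NUL included */
		return tmp->string;
	}
	return str;
}
```
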
 fs/libfs.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 748ac5923154..3ad1b1b7fed6 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1789,7 +1789,7 @@ int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 {
 	const struct dentry *parent;
 	const struct inode *dir;
-	char strbuf[DNAME_INLINE_LEN];
+	union shortname_store strbuf;
 	struct qstr qstr;
 
 	/*
@@ -1809,22 +1809,23 @@ int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 	if (!dir || !IS_CASEFOLDED(dir))
 		return 1;
 
+	qstr.len = len;
+	qstr.name = str;
 	/*
 	 * If the dentry name is stored in-line, then it may be concurrently
 	 * modified by a rename.  If this happens, the VFS will eventually retry
 	 * the lookup, so it doesn't matter what ->d_compare() returns.
 	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
 	 * string.  Therefore, we have to copy the name into a temporary buffer.
+	 * As above, len is guaranteed to match str, so the shortname case
+	 * is exactly when str points to ->d_shortname.
 	 */
-	if (len <= DNAME_INLINE_LEN - 1) {
-		memcpy(strbuf, str, len);
-		strbuf[len] = 0;
-		str = strbuf;
+	if (qstr.name == dentry->d_shortname.string) {
+		strbuf = dentry->d_shortname; // NUL is guaranteed to be in there
+		qstr.name = strbuf.string;
 		/* prevent compiler from optimizing out the temporary buffer */
 		barrier();
 	}
-	qstr.len = len;
-	qstr.name = str;
 
 	return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr);
 }
-- 
2.39.5



* [PATCH v3 07/20] Pass parent directory inode and expected name to ->d_revalidate()
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (4 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
                         ` (12 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller.  We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.

It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable.  There are a couple
of exceptions (overlayfs and, to a lesser extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.

It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.

This commit only changes the calling conventions; making use of supplied
values is left to followups.

NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate.  This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).

One thing to keep in mind when using name is that name->name will normally
point into the pathname being resolved; the filename in question occupies
name->len bytes starting at name->name, and there is a NUL somewhere after
it, but the next byte might very well be '/' rather than '\0'.  Do not
ignore name->len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
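A userspace mock of the new calling convention, for orientation only.  The
struct definitions are minimal stand-ins rather than the kernel types, and
old_style_check(), wrapped_revalidate() and checking_revalidate() are
hypothetical.  The wrapper shows how an instance that needs neither extra
argument can keep its old body (the approach taken for 9p in this patch); the
second instance uses the caller-supplied parent and name instead of reading
->d_parent and ->d_name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins; the real definitions live in the kernel headers. */
struct inode { unsigned long i_ino; };
struct qstr { const char *name; unsigned int len; };
struct dentry { unsigned long cached_parent_ino; const char *cached_name; };

/* New-style prototype: parent inode and expected name come from the caller. */
typedef int (*d_revalidate_t)(struct inode *, const struct qstr *,
			      struct dentry *, unsigned int);

/* Old-style body that only looks at the dentry. */
static int old_style_check(struct dentry *dentry, unsigned int flags)
{
	(void)flags;
	return dentry->cached_name != NULL;
}

/* Wrapper pattern: accept the extra arguments and ignore them. */
static int wrapped_revalidate(struct inode *dir, const struct qstr *name,
			      struct dentry *dentry, unsigned int flags)
{
	(void)dir;
	(void)name;
	return old_style_check(dentry, flags);
}

/* Uses the stable values; returns 1 for "valid", 0 for "invalid". */
static int checking_revalidate(struct inode *dir, const struct qstr *name,
			       struct dentry *dentry, unsigned int flags)
{
	(void)flags;
	if (dir->i_ino != dentry->cached_parent_ino)
		return 0;
	return strlen(dentry->cached_name) == name->len &&
	       memcmp(dentry->cached_name, name->name, name->len) == 0;
}
```
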
 Documentation/filesystems/locking.rst |  3 ++-
 Documentation/filesystems/porting.rst | 16 ++++++++++++++++
 Documentation/filesystems/vfs.rst     |  3 ++-
 fs/9p/vfs_dentry.c                    | 10 ++++++++--
 fs/afs/dir.c                          |  6 ++++--
 fs/ceph/dir.c                         |  5 +++--
 fs/coda/dir.c                         |  3 ++-
 fs/crypto/fname.c                     |  3 ++-
 fs/ecryptfs/dentry.c                  | 18 ++++++++++++++----
 fs/exfat/namei.c                      |  3 ++-
 fs/fat/namei_vfat.c                   |  6 ++++--
 fs/fuse/dir.c                         |  3 ++-
 fs/gfs2/dentry.c                      |  7 +++++--
 fs/hfs/sysdep.c                       |  3 ++-
 fs/jfs/namei.c                        |  3 ++-
 fs/kernfs/dir.c                       |  3 ++-
 fs/namei.c                            | 18 ++++++++++--------
 fs/nfs/dir.c                          |  9 ++++++---
 fs/ocfs2/dcache.c                     |  3 ++-
 fs/orangefs/dcache.c                  |  3 ++-
 fs/overlayfs/super.c                  | 22 ++++++++++++++++++++--
 fs/proc/base.c                        |  6 ++++--
 fs/proc/fd.c                          |  3 ++-
 fs/proc/generic.c                     |  6 ++++--
 fs/proc/proc_sysctl.c                 |  3 ++-
 fs/smb/client/dir.c                   |  3 ++-
 fs/tracefs/inode.c                    |  3 ++-
 fs/vboxsf/dir.c                       |  3 ++-
 include/linux/dcache.h                |  3 ++-
 include/linux/fscrypt.h               |  7 ++++---
 30 files changed, 136 insertions(+), 51 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index f5e3676db954..146e7d8aa736 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -17,7 +17,8 @@ dentry_operations
 
 prototypes::
 
-	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate)(struct inode *, const struct qstr *,
+			    struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 9ab2a3d6f2b4..568e7ea3c4ae 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1141,3 +1141,19 @@ pointer are gone.
 
 set_blocksize() takes opened struct file instead of struct block_device now
 and it *must* be opened exclusive.
+
+---
+
+**mandatory**
+
+->d_revalidate() gets two extra arguments - inode of parent directory and
+name our dentry is expected to have.  Both are stable (dir is pinned in
+non-RCU case and will stay around during the call in RCU case, and name
+is guaranteed to stay unchanging).  Your instance doesn't have to use
+either, but it often helps to avoid a lot of painful boilerplate.
+Note that while name->name is stable and NUL-terminated, it may (and
+often will) have name->name[name->len] equal to '/' rather than '\0' -
+in normal case it points into the pathname being looked up.
+NOTE: if you need something like full path from the root of filesystem,
+you are still on your own - this assists with simple cases, but it's not
+magic.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 0b18af3f954e..7c352ebaae98 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1251,7 +1251,8 @@ defined:
 .. code-block:: c
 
 	struct dentry_operations {
-		int (*d_revalidate)(struct dentry *, unsigned int);
+		int (*d_revalidate)(struct inode *, const struct qstr *,
+				    struct dentry *, unsigned int);
 		int (*d_weak_revalidate)(struct dentry *, unsigned int);
 		int (*d_hash)(const struct dentry *, struct qstr *);
 		int (*d_compare)(const struct dentry *,
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 01338d4c2d9e..872c1abe3295 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -61,7 +61,7 @@ static void v9fs_dentry_release(struct dentry *dentry)
 		p9_fid_put(hlist_entry(p, struct p9_fid, dlist));
 }
 
-static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int __v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct p9_fid *fid;
 	struct inode *inode;
@@ -99,9 +99,15 @@ static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 	return 1;
 }
 
+static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
+{
+	return __v9fs_lookup_revalidate(dentry, flags);
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
-	.d_weak_revalidate = v9fs_lookup_revalidate,
+	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
 };
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index ada363af5aab..9780013cd83a 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -22,7 +22,8 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 				 unsigned int flags);
 static int afs_dir_open(struct inode *inode, struct file *file);
 static int afs_readdir(struct file *file, struct dir_context *ctx);
-static int afs_d_revalidate(struct dentry *dentry, unsigned int flags);
+static int afs_d_revalidate(struct inode *dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags);
 static int afs_d_delete(const struct dentry *dentry);
 static void afs_d_iput(struct dentry *dentry, struct inode *inode);
 static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name, int nlen,
@@ -1093,7 +1094,8 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
  * - NOTE! the hit can be a negative hit too, so we can't assume we have an
  *   inode
  */
-static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags)
 {
 	struct afs_vnode *vnode, *dir;
 	struct afs_fid fid;
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 0bf388e07a02..c4c71c24221b 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1940,7 +1940,8 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
 /*
  * Check if cached dentry can be trusted.
  */
-static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
 	struct ceph_client *cl = mdsc->fsc->client;
@@ -1948,7 +1949,7 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 	struct dentry *parent;
 	struct inode *dir, *inode;
 
-	valid = fscrypt_d_revalidate(dentry, flags);
+	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
 	if (valid <= 0)
 		return valid;
 
diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index 4e552ba7bd43..a3e2dfeedfbf 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -445,7 +445,8 @@ static int coda_readdir(struct file *coda_file, struct dir_context *ctx)
 }
 
 /* called when a cache lookup succeeds */
-static int coda_dentry_revalidate(struct dentry *de, unsigned int flags)
+static int coda_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *de, unsigned int flags)
 {
 	struct inode *inode;
 	struct coda_inode_info *cii;
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 0ad52fbe51c9..389f5b2bf63b 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -574,7 +574,8 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
  * Validate dentries in encrypted directories to make sure we aren't potentially
  * caching stale dentries after a key has been added.
  */
-int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
+int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *dir;
 	int err;
diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index acaa0825e9bb..1dfd5b81d831 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -17,7 +17,9 @@
 
 /**
  * ecryptfs_d_revalidate - revalidate an ecryptfs dentry
- * @dentry: The ecryptfs dentry
+ * @dir: inode of expected parent
+ * @name: expected name
+ * @dentry: dentry to revalidate
  * @flags: lookup flags
  *
  * Called when the VFS needs to revalidate a dentry. This
@@ -28,7 +30,8 @@
  * Returns 1 if valid, 0 otherwise.
  *
  */
-static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int ecryptfs_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	int rc = 1;
@@ -36,8 +39,15 @@ static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE)
-		rc = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
+	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE) {
+		struct inode *lower_dir = ecryptfs_inode_to_lower(dir);
+		struct name_snapshot n;
+
+		take_dentry_name_snapshot(&n, lower_dentry);
+		rc = lower_dentry->d_op->d_revalidate(lower_dir, &n.name,
+						      lower_dentry, flags);
+		release_dentry_name_snapshot(&n);
+	}
 
 	if (d_really_is_positive(dentry)) {
 		struct inode *inode = d_inode(dentry);
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 97d2774760fe..e3b4feccba07 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -31,7 +31,8 @@ static inline void exfat_d_version_set(struct dentry *dentry,
  * If it happened, the negative dentry isn't actually negative anymore.  So,
  * drop it.
  */
-static int exfat_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
+			      struct dentry *dentry, unsigned int flags)
 {
 	int ret;
 
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 15bf32c21ac0..f9cbd5c6f932 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -53,7 +53,8 @@ static int vfat_revalidate_shortname(struct dentry *dentry)
 	return ret;
 }
 
-static int vfat_revalidate(struct dentry *dentry, unsigned int flags)
+static int vfat_revalidate(struct inode *dir, const struct qstr *name,
+			   struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
@@ -64,7 +65,8 @@ static int vfat_revalidate(struct dentry *dentry, unsigned int flags)
 	return vfat_revalidate_shortname(dentry);
 }
 
-static int vfat_revalidate_ci(struct dentry *dentry, unsigned int flags)
+static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
+			      struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 494ac372ace0..d9e9f26917eb 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -192,7 +192,8 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
  * the lookup once more.  If the lookup results in the same inode,
  * then refresh the attributes, timeouts and mark the dentry valid.
  */
-static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
+static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *entry, unsigned int flags)
 {
 	struct inode *inode;
 	struct dentry *parent;
diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
index 2e215e8c3c88..86c338901fab 100644
--- a/fs/gfs2/dentry.c
+++ b/fs/gfs2/dentry.c
@@ -21,7 +21,9 @@
 
 /**
  * gfs2_drevalidate - Check directory lookup consistency
- * @dentry: the mapping to check
+ * @dir: expected parent directory inode
+ * @name: expected name
+ * @dentry: dentry to check
  * @flags: lookup flags
  *
  * Check to make sure the lookup necessary to arrive at this inode from its
@@ -30,7 +32,8 @@
  * Returns: 1 if the dentry is ok, 0 if it isn't
  */
 
-static int gfs2_drevalidate(struct dentry *dentry, unsigned int flags)
+static int gfs2_drevalidate(struct inode *dir, const struct qstr *name,
+			    struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *parent;
 	struct gfs2_sbd *sdp;
diff --git a/fs/hfs/sysdep.c b/fs/hfs/sysdep.c
index 76fa02e3835b..ef54fc8093cf 100644
--- a/fs/hfs/sysdep.c
+++ b/fs/hfs/sysdep.c
@@ -13,7 +13,8 @@
 
 /* dentry case-handling: just lowercase everything */
 
-static int hfs_revalidate_dentry(struct dentry *dentry, unsigned int flags)
+static int hfs_revalidate_dentry(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int diff;
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index d68a4e6ac345..fc8ede43afde 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -1576,7 +1576,8 @@ static int jfs_ci_compare(const struct dentry *dentry,
 	return result;
 }
 
-static int jfs_ci_revalidate(struct dentry *dentry, unsigned int flags)
+static int jfs_ci_revalidate(struct inode *dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	/*
 	 * This is not negative dentry. Always valid.
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 458519e416fe..5f0f8b95f44c 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1109,7 +1109,8 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent,
 	return ERR_PTR(rc);
 }
 
-static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
+static int kernfs_dop_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	struct kernfs_node *kn;
 	struct kernfs_root *root;
diff --git a/fs/namei.c b/fs/namei.c
index 9d30c7aa9aa6..77e5d136faaf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -921,10 +921,11 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 	return false;
 }
 
-static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
+static inline int d_revalidate(struct inode *dir, const struct qstr *name,
+			       struct dentry *dentry, unsigned int flags)
 {
 	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
-		return dentry->d_op->d_revalidate(dentry, flags);
+		return dentry->d_op->d_revalidate(dir, name, dentry, flags);
 	else
 		return 1;
 }
@@ -1652,7 +1653,7 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 {
 	struct dentry *dentry = d_lookup(dir, name);
 	if (dentry) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(dir->d_inode, name, dentry, flags);
 		if (unlikely(error <= 0)) {
 			if (!error)
 				d_invalidate(dentry);
@@ -1737,19 +1738,20 @@ static struct dentry *lookup_fast(struct nameidata *nd)
 		if (read_seqcount_retry(&parent->d_seq, nd->seq))
 			return ERR_PTR(-ECHILD);
 
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
 		if (likely(status > 0))
 			return dentry;
 		if (!try_to_unlazy_next(nd, dentry))
 			return ERR_PTR(-ECHILD);
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
-			status = d_revalidate(dentry, nd->flags);
+			status = d_revalidate(nd->inode, &nd->last,
+					      dentry, nd->flags);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
 			return NULL;
-		status = d_revalidate(dentry, nd->flags);
+		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
@@ -1777,7 +1779,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
-		int error = d_revalidate(dentry, flags);
+		int error = d_revalidate(inode, name, dentry, flags);
 		if (unlikely(error <= 0)) {
 			if (!error) {
 				d_invalidate(dentry);
@@ -3575,7 +3577,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		if (d_in_lookup(dentry))
 			break;
 
-		error = d_revalidate(dentry, nd->flags);
+		error = d_revalidate(dir_inode, &nd->last, dentry, nd->flags);
 		if (likely(error > 0))
 			break;
 		if (error)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 492cffd9d3d8..9910d9796f4c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1814,7 +1814,8 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
 	return ret;
 }
 
-static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int nfs_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate);
 }
@@ -2025,7 +2026,8 @@ void nfs_d_prune_case_insensitive_aliases(struct inode *inode)
 EXPORT_SYMBOL_GPL(nfs_d_prune_case_insensitive_aliases);
 
 #if IS_ENABLED(CONFIG_NFS_V4)
-static int nfs4_lookup_revalidate(struct dentry *, unsigned int);
+static int nfs4_lookup_revalidate(struct inode *, const struct qstr *,
+				  struct dentry *, unsigned int);
 
 const struct dentry_operations nfs4_dentry_operations = {
 	.d_revalidate	= nfs4_lookup_revalidate,
@@ -2260,7 +2262,8 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 	return nfs_do_lookup_revalidate(dir, dentry, flags);
 }
 
-static int nfs4_lookup_revalidate(struct dentry *dentry, unsigned int flags)
+static int nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	return __nfs_lookup_revalidate(dentry, flags,
 			nfs4_do_lookup_revalidate);
diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
index a9b8688aaf30..ecb1ce6301c4 100644
--- a/fs/ocfs2/dcache.c
+++ b/fs/ocfs2/dcache.c
@@ -32,7 +32,8 @@ void ocfs2_dentry_attach_gen(struct dentry *dentry)
 }
 
 
-static int ocfs2_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				   struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int ret = 0;    /* if all else fails, just return false */
diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
index 395a00ed8ac7..c32c9a86e8d0 100644
--- a/fs/orangefs/dcache.c
+++ b/fs/orangefs/dcache.c
@@ -92,7 +92,8 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
  *
  * Should return 1 if dentry can still be trusted, else 0.
  */
-static int orangefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int orangefs_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	int ret;
 	unsigned long time = (unsigned long) dentry->d_fsdata;
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index fe511192f83c..86ae6f6da36b 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -91,7 +91,24 @@ static int ovl_revalidate_real(struct dentry *d, unsigned int flags, bool weak)
 		if (d->d_flags & DCACHE_OP_WEAK_REVALIDATE)
 			ret =  d->d_op->d_weak_revalidate(d, flags);
 	} else if (d->d_flags & DCACHE_OP_REVALIDATE) {
-		ret = d->d_op->d_revalidate(d, flags);
+		struct dentry *parent;
+		struct inode *dir;
+		struct name_snapshot n;
+
+		if (flags & LOOKUP_RCU) {
+			parent = READ_ONCE(d->d_parent);
+			dir = d_inode_rcu(parent);
+			if (!dir)
+				return -ECHILD;
+		} else {
+			parent = dget_parent(d);
+			dir = d_inode(parent);
+		}
+		take_dentry_name_snapshot(&n, d);
+		ret = d->d_op->d_revalidate(dir, &n.name, d, flags);
+		release_dentry_name_snapshot(&n);
+		if (!(flags & LOOKUP_RCU))
+			dput(parent);
 		if (!ret) {
 			if (!(flags & LOOKUP_RCU))
 				d_invalidate(d);
@@ -127,7 +144,8 @@ static int ovl_dentry_revalidate_common(struct dentry *dentry,
 	return ret;
 }
 
-static int ovl_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int ovl_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return ovl_dentry_revalidate_common(dentry, flags, false);
 }
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0edf14a9840e..fb5493d0edf0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2058,7 +2058,8 @@ void pid_update_inode(struct task_struct *task, struct inode *inode)
  * performed a setuid(), etc.
  *
  */
-static int pid_revalidate(struct dentry *dentry, unsigned int flags)
+static int pid_revalidate(struct inode *dir, const struct qstr *name,
+			  struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	struct task_struct *task;
@@ -2191,7 +2192,8 @@ static int dname_to_vma_addr(struct dentry *dentry,
 	return 0;
 }
 
-static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int map_files_d_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	unsigned long vm_start, vm_end;
 	bool exact_vma_exists = false;
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 24baf23e864f..37aa778d1af7 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -140,7 +140,8 @@ static void tid_fd_update_inode(struct task_struct *task, struct inode *inode,
 	security_task_to_inode(task, inode);
 }
 
-static int tid_fd_revalidate(struct dentry *dentry, unsigned int flags)
+static int tid_fd_revalidate(struct inode *dir, const struct qstr *name,
+			     struct dentry *dentry, unsigned int flags)
 {
 	struct task_struct *task;
 	struct inode *inode;
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index dbe82cf23ee4..8ec90826a49e 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -216,7 +216,8 @@ void proc_free_inum(unsigned int inum)
 	ida_free(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
 }
 
-static int proc_misc_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_misc_d_revalidate(struct inode *dir, const struct qstr *name,
+				  struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
@@ -343,7 +344,8 @@ static const struct file_operations proc_dir_operations = {
 	.iterate_shared		= proc_readdir,
 };
 
-static int proc_net_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_net_d_revalidate(struct inode *dir, const struct qstr *name,
+				 struct dentry *dentry, unsigned int flags)
 {
 	return 0;
 }
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 27a283d85a6e..cc9d74a06ff0 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -884,7 +884,8 @@ static const struct inode_operations proc_sys_dir_operations = {
 	.getattr	= proc_sys_getattr,
 };
 
-static int proc_sys_revalidate(struct dentry *dentry, unsigned int flags)
+static int proc_sys_revalidate(struct inode *dir, const struct qstr *name,
+			       struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c
index 864b194dbaa0..8c5d44ee91ed 100644
--- a/fs/smb/client/dir.c
+++ b/fs/smb/client/dir.c
@@ -737,7 +737,8 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
 }
 
 static int
-cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
+cifs_d_revalidate(struct inode *dir, const struct qstr *name,
+		  struct dentry *direntry, unsigned int flags)
 {
 	struct inode *inode;
 	int rc;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index cfc614c638da..53214499e384 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -457,7 +457,8 @@ static void tracefs_d_release(struct dentry *dentry)
 		eventfs_d_release(dentry);
 }
 
-static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+static int tracefs_d_revalidate(struct inode *inode, const struct qstr *name,
+				struct dentry *dentry, unsigned int flags)
 {
 	struct eventfs_inode *ei = dentry->d_fsdata;
 
diff --git a/fs/vboxsf/dir.c b/fs/vboxsf/dir.c
index 5f1a14d5b927..a859ac9b74ba 100644
--- a/fs/vboxsf/dir.c
+++ b/fs/vboxsf/dir.c
@@ -192,7 +192,8 @@ const struct file_operations vboxsf_dir_fops = {
  * This is called during name resolution/lookup to check if the @dentry in
  * the cache is still valid. the job is handled by vboxsf_inode_revalidate.
  */
-static int vboxsf_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+static int vboxsf_dentry_revalidate(struct inode *dir, const struct qstr *name,
+				    struct dentry *dentry, unsigned int flags)
 {
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 8bc567a35718..4a6bdadf2f29 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -144,7 +144,8 @@ enum d_real_type {
 };
 
 struct dentry_operations {
-	int (*d_revalidate)(struct dentry *, unsigned int);
+	int (*d_revalidate)(struct inode *, const struct qstr *,
+			    struct dentry *, unsigned int);
 	int (*d_weak_revalidate)(struct dentry *, unsigned int);
 	int (*d_hash)(const struct dentry *, struct qstr *);
 	int (*d_compare)(const struct dentry *,
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 772f822dc6b8..18855cb44b1c 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -192,7 +192,8 @@ struct fscrypt_operations {
 					     unsigned int *num_devs);
 };
 
-int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
+int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags);
 
 static inline struct fscrypt_inode_info *
 fscrypt_get_inode_info(const struct inode *inode)
@@ -711,8 +712,8 @@ static inline u64 fscrypt_fname_siphash(const struct inode *dir,
 	return 0;
 }
 
-static inline int fscrypt_d_revalidate(struct dentry *dentry,
-				       unsigned int flags)
+static inline int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
+				       struct dentry *dentry, unsigned int flags)
 {
 	return 1;
 }
-- 
2.39.5
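
[Illustration only, not part of the patch.  A minimal userspace sketch of
the caveat in the porting.rst hunk of this patch: name->name[name->len]
may be '/' rather than '\0', so an instance should compare exactly
name->len bytes and not rely on strlen(name->name).  struct qstr_model,
name_matches() and first_component() are hypothetical stand-ins made up
for this example; none of them is kernel API.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct qstr (hash omitted). */
struct qstr_model {
	unsigned int len;
	const char *name;
};

/*
 * Compare the expected name against a candidate of known length.
 * Only the first name->len bytes are meaningful: the byte right after
 * them may be '/' when the name points into a longer pathname.
 */
static int name_matches(const struct qstr_model *name,
			const char *candidate, size_t candidate_len)
{
	assert(name && name->name && candidate);
	if (name->len != candidate_len)
		return 0;
	return memcmp(name->name, candidate, candidate_len) == 0;
}

/* Hypothetical helper: carve the first component out of a pathname. */
static struct qstr_model first_component(const char *path)
{
	struct qstr_model q = { .len = 0, .name = path };

	while (path[q.len] != '\0' && path[q.len] != '/')
		q.len++;
	return q;
}
```

With the pathname "foo/bar" the first component has len 3 and is followed
by '/', so strlen() on name->name would see the whole remaining path.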


* [PATCH v3 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (5 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 09/20] ceph_d_revalidate(): use stable " Al Viro
                         ` (11 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to bother with the boilerplate for obtaining the latter, and for
the former we really should not count upon ->d_name.name remaining
stable under us.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
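
[Illustration only, not part of the commit.  A minimal userspace model of
the boilerplate this patch removes: the old shape has to pin the parent
and drop it again on every exit path, the new shape is simply handed the
directory inode by the caller.  All names here (inode_model, dentry_model,
get_parent(), put_dentry(), revalidate_old(), revalidate_new()) are
hypothetical stand-ins, not the afs code.]

```c
#include <assert.h>
#include <stddef.h>

struct inode_model {
	int deleted;
};

struct dentry_model {
	struct dentry_model *d_parent;
	struct inode_model *d_inode;
	int refcount;
};

/* Models dget_parent(): pin the parent and return it. */
static struct dentry_model *get_parent(struct dentry_model *d)
{
	d->d_parent->refcount++;
	return d->d_parent;
}

/* Models dput(): drop a reference taken earlier. */
static void put_dentry(struct dentry_model *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Old shape: find the parent ourselves, drop it before returning. */
static int revalidate_old(struct dentry_model *dentry)
{
	struct dentry_model *parent = get_parent(dentry);
	struct inode_model *dir = parent->d_inode;
	int valid = !dir->deleted;

	put_dentry(parent);
	return valid;
}

/* New shape: the caller passes the (stable) parent directory inode. */
static int revalidate_new(struct inode_model *dir,
			  struct dentry_model *dentry)
{
	(void)dentry;
	return !dir->deleted;
}
```

Both variants give the same answer in this model; the second one has no
reference to balance, which is where the deleted lines in the diffstat
below come from.
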
 fs/afs/dir.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 9780013cd83a..e04cffe4beb1 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -607,19 +607,19 @@ static bool afs_lookup_one_filldir(struct dir_context *ctx, const char *name,
  * Do a lookup of a single name in a directory
  * - just returns the FID the dentry name maps to if found
  */
-static int afs_do_lookup_one(struct inode *dir, struct dentry *dentry,
+static int afs_do_lookup_one(struct inode *dir, const struct qstr *name,
 			     struct afs_fid *fid, struct key *key,
 			     afs_dataversion_t *_dir_version)
 {
 	struct afs_super_info *as = dir->i_sb->s_fs_info;
 	struct afs_lookup_one_cookie cookie = {
 		.ctx.actor = afs_lookup_one_filldir,
-		.name = dentry->d_name,
+		.name = *name,
 		.fid.vid = as->volume->vid
 	};
 	int ret;
 
-	_enter("{%lu},%p{%pd},", dir->i_ino, dentry, dentry);
+	_enter("{%lu},{%.*s},", dir->i_ino, name->len, name->name);
 
 	/* search the directory */
 	ret = afs_dir_iterate(dir, &cookie.ctx, key, _dir_version);
@@ -1052,21 +1052,12 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
 /*
  * Check the validity of a dentry under RCU conditions.
  */
-static int afs_d_revalidate_rcu(struct dentry *dentry)
+static int afs_d_revalidate_rcu(struct afs_vnode *dvnode, struct dentry *dentry)
 {
-	struct afs_vnode *dvnode;
-	struct dentry *parent;
-	struct inode *dir;
 	long dir_version, de_version;
 
 	_enter("%p", dentry);
 
-	/* Check the parent directory is still valid first. */
-	parent = READ_ONCE(dentry->d_parent);
-	dir = d_inode_rcu(parent);
-	if (!dir)
-		return -ECHILD;
-	dvnode = AFS_FS_I(dir);
 	if (test_bit(AFS_VNODE_DELETED, &dvnode->flags))
 		return -ECHILD;
 
@@ -1097,9 +1088,8 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
 static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 			    struct dentry *dentry, unsigned int flags)
 {
-	struct afs_vnode *vnode, *dir;
+	struct afs_vnode *vnode, *dir = AFS_FS_I(parent_dir);
 	struct afs_fid fid;
-	struct dentry *parent;
 	struct inode *inode;
 	struct key *key;
 	afs_dataversion_t dir_version, invalid_before;
@@ -1107,7 +1097,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	int ret;
 
 	if (flags & LOOKUP_RCU)
-		return afs_d_revalidate_rcu(dentry);
+		return afs_d_revalidate_rcu(dir, dentry);
 
 	if (d_really_is_positive(dentry)) {
 		vnode = AFS_FS_I(d_inode(dentry));
@@ -1122,14 +1112,9 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	if (IS_ERR(key))
 		key = NULL;
 
-	/* Hold the parent dentry so we can peer at it */
-	parent = dget_parent(dentry);
-	dir = AFS_FS_I(d_inode(parent));
-
 	/* validate the parent directory */
 	ret = afs_validate(dir, key);
 	if (ret == -ERESTARTSYS) {
-		dput(parent);
 		key_put(key);
 		return ret;
 	}
@@ -1157,7 +1142,7 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	afs_stat_v(dir, n_reval);
 
 	/* search the directory for this vnode */
-	ret = afs_do_lookup_one(&dir->netfs.inode, dentry, &fid, key, &dir_version);
+	ret = afs_do_lookup_one(&dir->netfs.inode, name, &fid, key, &dir_version);
 	switch (ret) {
 	case 0:
 		/* the filename maps to something */
@@ -1201,22 +1186,19 @@ static int afs_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 		goto out_valid;
 
 	default:
-		_debug("failed to iterate dir %pd: %d",
-		       parent, ret);
+		_debug("failed to iterate parent %pd2: %d", dentry, ret);
 		goto not_found;
 	}
 
 out_valid:
 	dentry->d_fsdata = (void *)(unsigned long)dir_version;
 out_valid_noupdate:
-	dput(parent);
 	key_put(key);
 	_leave(" = 1 [valid]");
 	return 1;
 
 not_found:
 	_debug("dropping dentry %pd2", dentry);
-	dput(parent);
 	key_put(key);
 
 	_leave(" = 0 [bad]");
-- 
2.39.5


* [PATCH v3 09/20] ceph_d_revalidate(): use stable parent inode passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (6 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
                         ` (10 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with the boilerplate for obtaining what we already
have.  Note that ceph is one of the "will want a path from filesystem
root if we want to talk to server" cases, so the name of the last
component is of little use.  It is passed to fscrypt_d_revalidate(),
and it is used in request marshalling to deal with the (also
crypt-related) case where the encrypted name turns out to be too long.
The former is not a problem, but the latter is racy; that part will be
handled in the next commit.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ceph/dir.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index c4c71c24221b..dc5f55bebad7 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1940,30 +1940,19 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry,
 /*
  * Check if cached dentry can be trusted.
  */
-static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
 			     struct dentry *dentry, unsigned int flags)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_fs_client(dentry->d_sb)->mdsc;
 	struct ceph_client *cl = mdsc->fsc->client;
 	int valid = 0;
-	struct dentry *parent;
-	struct inode *dir, *inode;
+	struct inode *inode;
 
-	valid = fscrypt_d_revalidate(parent_dir, name, dentry, flags);
+	valid = fscrypt_d_revalidate(dir, name, dentry, flags);
 	if (valid <= 0)
 		return valid;
 
-	if (flags & LOOKUP_RCU) {
-		parent = READ_ONCE(dentry->d_parent);
-		dir = d_inode_rcu(parent);
-		if (!dir)
-			return -ECHILD;
-		inode = d_inode_rcu(dentry);
-	} else {
-		parent = dget_parent(dentry);
-		dir = d_inode(parent);
-		inode = d_inode(dentry);
-	}
+	inode = d_inode_rcu(dentry);
 
 	doutc(cl, "%p '%pd' inode %p offset 0x%llx nokey %d\n",
 	      dentry, dentry, inode, ceph_dentry(dentry)->offset,
@@ -2039,9 +2028,6 @@ static int ceph_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	doutc(cl, "%p '%pd' %s\n", dentry, dentry, valid ? "valid" : "invalid");
 	if (!valid)
 		ceph_dir_clear_complete(dir);
-
-	if (!(flags & LOOKUP_RCU))
-		dput(parent);
 	return valid;
 }
 
-- 
2.39.5


* [PATCH v3 10/20] ceph_d_revalidate(): propagate stable name down into request encoding
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (7 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 09/20] ceph_d_revalidate(): use stable " Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
                         ` (9 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Currently get_fscrypt_altname() requires ->r_dentry->d_name to be stable
and it gets that in almost all cases.  The only exception is ->d_revalidate(),
where we have a stable name, but it's passed separately - dentry->d_name
is not stable there.

Propagate it down to get_fscrypt_altname() as a new field of struct
ceph_mds_request - ->r_dname, to be used instead of ->r_dentry->d_name
when non-NULL.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
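Reviewer note, not part of the commit: a minimal userspace sketch of the
fallback rule described in the commit message. The structs are cut-down
stand-ins rather than the kernel definitions, and pick_name() is a
hypothetical helper; in the patch the same check is open-coded in
get_fscrypt_altname().

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct qstr {
	unsigned int len;
	const char *name;
};

struct dentry {
	struct qstr d_name;
};

struct mds_request {
	struct dentry *r_dentry;
	const struct qstr *r_dname;	/* stable name, or NULL */
};

/*
 * Hypothetical helper: prefer the stable name supplied by the caller
 * and fall back to ->r_dentry->d_name only when none was given.
 */
static const struct qstr *pick_name(const struct mds_request *req)
{
	if (req->r_dname)
		return req->r_dname;
	assert(req->r_dentry != NULL);	/* the fallback needs a dentry */
	return &req->r_dentry->d_name;
}
```

Only the ->d_revalidate() path sets the new field in this patch, so every
other caller leaves it NULL and keeps reading ->d_name exactly as before.
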
 fs/ceph/dir.c        | 2 ++
 fs/ceph/mds_client.c | 9 ++++++---
 fs/ceph/mds_client.h | 2 ++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index dc5f55bebad7..62e99e65250d 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1998,6 +1998,8 @@ static int ceph_d_revalidate(struct inode *dir, const struct qstr *name,
 			req->r_parent = dir;
 			ihold(dir);
 
+			req->r_dname = name;
+
 			mask = CEPH_STAT_CAP_INODE | CEPH_CAP_AUTH_SHARED;
 			if (ceph_security_xattr_wanted(dir))
 				mask |= CEPH_CAP_XATTR_SHARED;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 219a2cc2bf3c..3b766b984713 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2621,6 +2621,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 {
 	struct inode *dir = req->r_parent;
 	struct dentry *dentry = req->r_dentry;
+	const struct qstr *name = req->r_dname;
 	u8 *cryptbuf = NULL;
 	u32 len = 0;
 	int ret = 0;
@@ -2641,8 +2642,10 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 	if (!fscrypt_has_encryption_key(dir))
 		goto success;
 
-	if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX,
-					  &len)) {
+	if (!name)
+		name = &dentry->d_name;
+
+	if (!fscrypt_fname_encrypted_size(dir, name->len, NAME_MAX, &len)) {
 		WARN_ON_ONCE(1);
 		return ERR_PTR(-ENAMETOOLONG);
 	}
@@ -2657,7 +2660,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
 	if (!cryptbuf)
 		return ERR_PTR(-ENOMEM);
 
-	ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
+	ret = fscrypt_fname_encrypt(dir, name, cryptbuf, len);
 	if (ret) {
 		kfree(cryptbuf);
 		return ERR_PTR(ret);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 38bb7e0d2d79..7c9fee9e80d4 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -299,6 +299,8 @@ struct ceph_mds_request {
 	struct inode *r_target_inode;       /* resulting inode */
 	struct inode *r_new_inode;	    /* new inode (for creates) */
 
+	const struct qstr *r_dname;	    /* stable name (for ->d_revalidate) */
+
 #define CEPH_MDS_R_DIRECT_IS_HASH	(1) /* r_direct_hash is valid */
 #define CEPH_MDS_R_ABORTED		(2) /* call was aborted */
 #define CEPH_MDS_R_GOT_UNSAFE		(3) /* got an unsafe reply */
-- 
2.39.5


* [PATCH v3 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (8 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 12/20] exfat_d_revalidate(): " Al Viro
                         ` (8 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

The only thing it's using is parent directory inode and we are already
given a stable reference to that - no need to bother with boilerplate.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/crypto/fname.c | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 389f5b2bf63b..010f9c0a4c2f 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -574,12 +574,10 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
  * Validate dentries in encrypted directories to make sure we aren't potentially
  * caching stale dentries after a key has been added.
  */
-int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
+int fscrypt_d_revalidate(struct inode *dir, const struct qstr *name,
 			 struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *dir;
 	int err;
-	int valid;
 
 	/*
 	 * Plaintext names are always valid, since fscrypt doesn't support
@@ -592,30 +590,21 @@ int fscrypt_d_revalidate(struct inode *parent_dir, const struct qstr *name,
 	/*
 	 * No-key name; valid if the directory's key is still unavailable.
 	 *
-	 * Although fscrypt forbids rename() on no-key names, we still must use
-	 * dget_parent() here rather than use ->d_parent directly.  That's
-	 * because a corrupted fs image may contain directory hard links, which
-	 * the VFS handles by moving the directory's dentry tree in the dcache
-	 * each time ->lookup() finds the directory and it already has a dentry
-	 * elsewhere.  Thus ->d_parent can be changing, and we must safely grab
-	 * a reference to some ->d_parent to prevent it from being freed.
+	 * Note in RCU mode we have to bail if we get here -
+	 * fscrypt_get_encryption_info() may block.
 	 */
 
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	dir = dget_parent(dentry);
 	/*
 	 * Pass allow_unsupported=true, so that files with an unsupported
 	 * encryption policy can be deleted.
 	 */
-	err = fscrypt_get_encryption_info(d_inode(dir), true);
-	valid = !fscrypt_has_encryption_key(d_inode(dir));
-	dput(dir);
-
+	err = fscrypt_get_encryption_info(dir, true);
 	if (err < 0)
 		return err;
 
-	return valid;
+	return !fscrypt_has_encryption_key(dir);
 }
 EXPORT_SYMBOL_GPL(fscrypt_d_revalidate);
-- 
2.39.5


* [PATCH v3 12/20] exfat_d_revalidate(): use stable parent inode passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (9 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 13/20] vfat_revalidate{,_ci}(): " Al Viro
                         ` (7 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

... no need to bother with ->d_lock and ->d_parent->d_inode.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/exfat/namei.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index e3b4feccba07..61c7164b85b3 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -34,8 +34,6 @@ static inline void exfat_d_version_set(struct dentry *dentry,
 static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
 			      struct dentry *dentry, unsigned int flags)
 {
-	int ret;
-
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
@@ -59,11 +57,7 @@ static int exfat_d_revalidate(struct inode *dir, const struct qstr *name,
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
 
-	spin_lock(&dentry->d_lock);
-	ret = inode_eq_iversion(d_inode(dentry->d_parent),
-			exfat_d_version(dentry));
-	spin_unlock(&dentry->d_lock);
-	return ret;
+	return inode_eq_iversion(dir, exfat_d_version(dentry));
 }
 
 /* returns the length of a struct qstr, ignoring trailing dots if necessary */
-- 
2.39.5


* [PATCH v3 13/20] vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (10 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 12/20] exfat_d_revalidate(): " Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
                         ` (6 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/fat/namei_vfat.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index f9cbd5c6f932..926c26e90ef8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -43,14 +43,9 @@ static inline void vfat_d_version_set(struct dentry *dentry,
  * If it happened, the negative dentry isn't actually negative
  * anymore.  So, drop it.
  */
-static int vfat_revalidate_shortname(struct dentry *dentry)
+static bool vfat_revalidate_shortname(struct dentry *dentry, struct inode *dir)
 {
-	int ret = 1;
-	spin_lock(&dentry->d_lock);
-	if (!inode_eq_iversion(d_inode(dentry->d_parent), vfat_d_version(dentry)))
-		ret = 0;
-	spin_unlock(&dentry->d_lock);
-	return ret;
+	return inode_eq_iversion(dir, vfat_d_version(dentry));
 }
 
 static int vfat_revalidate(struct inode *dir, const struct qstr *name,
@@ -62,7 +57,7 @@ static int vfat_revalidate(struct inode *dir, const struct qstr *name,
 	/* This is not negative dentry. Always valid. */
 	if (d_really_is_positive(dentry))
 		return 1;
-	return vfat_revalidate_shortname(dentry);
+	return vfat_revalidate_shortname(dentry, dir);
 }
 
 static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
@@ -99,7 +94,7 @@ static int vfat_revalidate_ci(struct inode *dir, const struct qstr *name,
 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
 		return 0;
 
-	return vfat_revalidate_shortname(dentry);
+	return vfat_revalidate_shortname(dentry, dir);
 }
 
 /* returns the length of a struct qstr, ignoring trailing dots */
-- 
2.39.5


* [PATCH v3 14/20] fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (11 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 13/20] vfat_revalidate{,_ci}(): " Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23 10:51         ` Miklos Szeredi
  2025-01-23  1:46       ` [PATCH v3 15/20] gfs2_drevalidate(): " Al Viro
                         ` (5 subsequent siblings)
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable - it's a real-life UAF.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
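Reviewer note, not part of the commit: the fuse_lookup_init() hunk splits
the name into two request segments. My reading (an assumption, the commit
message does not spell it out) is that the name handed to ->d_revalidate()
need not be followed by a NUL, so the terminator is supplied from a separate
one-byte segment instead of being read from name->name[name->len]. A
userspace sketch of that layout follows; struct in_arg, lookup_name_args()
and gather() are hypothetical stand-ins, not the FUSE API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-ins; only the fields used here. */
struct qstr {
	unsigned int len;
	const char *name;
};

struct in_arg {
	unsigned int size;
	const void *value;
};

/* Describe the name as len bytes of payload plus a separate NUL byte. */
static void lookup_name_args(struct in_arg args[2], const struct qstr *name)
{
	args[0].size = name->len;
	args[0].value = name->name;
	args[1].size = 1;
	args[1].value = "";
}

/*
 * Concatenate the segments into buf.  Returns the number of bytes
 * written, or 0 if buf is too small.
 */
static size_t gather(char *buf, size_t buflen,
		     const struct in_arg *args, int nargs)
{
	size_t off = 0;

	assert(buf != NULL);
	for (int i = 0; i < nargs; i++) {
		if (args[i].size > buflen - off)
			return 0;
		memcpy(buf + off, args[i].value, args[i].size);
		off += args[i].size;
	}
	return off;
}
```

With a name that points into the middle of "foo/bar" (len 3), the gathered
buffer is "foo" plus a NUL, and the '/' after the name is never read.
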
 fs/fuse/dir.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d9e9f26917eb..3019bc1d9f9d 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -175,9 +175,11 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 	memset(outarg, 0, sizeof(struct fuse_entry_out));
 	args->opcode = FUSE_LOOKUP;
 	args->nodeid = nodeid;
-	args->in_numargs = 1;
-	args->in_args[0].size = name->len + 1;
+	args->in_numargs = 2;
+	args->in_args[0].size = name->len;
 	args->in_args[0].value = name->name;
+	args->in_args[1].size = 1;
+	args->in_args[1].value = "";
 	args->out_numargs = 1;
 	args->out_args[0].size = sizeof(struct fuse_entry_out);
 	args->out_args[0].value = outarg;
@@ -196,7 +198,6 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 				  struct dentry *entry, unsigned int flags)
 {
 	struct inode *inode;
-	struct dentry *parent;
 	struct fuse_mount *fm;
 	struct fuse_inode *fi;
 	int ret;
@@ -228,11 +229,9 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 
 		attr_version = fuse_get_attr_version(fm->fc);
 
-		parent = dget_parent(entry);
-		fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
-				 &entry->d_name, &outarg);
+		fuse_lookup_init(fm->fc, &args, get_node_id(dir),
+				 name, &outarg);
 		ret = fuse_simple_request(fm, &args);
-		dput(parent);
 		/* Zero nodeid is same as -ENOENT */
 		if (!ret && !outarg.nodeid)
 			ret = -ENOENT;
@@ -266,9 +265,7 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 			if (test_bit(FUSE_I_INIT_RDPLUS, &fi->state))
 				return -ECHILD;
 		} else if (test_and_clear_bit(FUSE_I_INIT_RDPLUS, &fi->state)) {
-			parent = dget_parent(entry);
-			fuse_advise_use_readdirplus(d_inode(parent));
-			dput(parent);
+			fuse_advise_use_readdirplus(dir);
 		}
 	}
 	ret = 1;
-- 
2.39.5


* [PATCH v3 15/20] gfs2_drevalidate(): use stable parent inode and name passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (12 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
                         ` (4 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable.  Theoretically a UAF, but it's
hard to exfiltrate the information...

Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
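Reviewer note, not part of the commit: the validity rule that survives the
cleanup unchanged is "valid = inode ? !error : (error == -ENOENT)". A small
userspace sketch of it is below; dentry_still_valid() is a hypothetical name
and "error" stands in for the gfs2_dir_check() result.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * A positive dentry stays valid only if the directory check succeeded;
 * a negative dentry stays valid only if the name is still absent.
 */
static bool dentry_still_valid(bool positive, int error)
{
	assert(error <= 0);	/* kernel-style: 0 or a negative errno */
	return positive ? !error : (error == -ENOENT);
}
```

Any other error (say -EIO) invalidates the dentry in both cases, which
matches the early "return 0" paths added by the patch.
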
 fs/gfs2/dentry.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
index 86c338901fab..95050e719233 100644
--- a/fs/gfs2/dentry.c
+++ b/fs/gfs2/dentry.c
@@ -35,48 +35,40 @@
 static int gfs2_drevalidate(struct inode *dir, const struct qstr *name,
 			    struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *parent;
-	struct gfs2_sbd *sdp;
-	struct gfs2_inode *dip;
+	struct gfs2_sbd *sdp = GFS2_SB(dir);
+	struct gfs2_inode *dip = GFS2_I(dir);
 	struct inode *inode;
 	struct gfs2_holder d_gh;
 	struct gfs2_inode *ip = NULL;
-	int error, valid = 0;
+	int error, valid;
 	int had_lock = 0;
 
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	parent = dget_parent(dentry);
-	sdp = GFS2_SB(d_inode(parent));
-	dip = GFS2_I(d_inode(parent));
 	inode = d_inode(dentry);
 
 	if (inode) {
 		if (is_bad_inode(inode))
-			goto out;
+			return 0;
 		ip = GFS2_I(inode);
 	}
 
-	if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL) {
-		valid = 1;
-		goto out;
-	}
+	if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
+		return 1;
 
 	had_lock = (gfs2_glock_is_locked_by_me(dip->i_gl) != NULL);
 	if (!had_lock) {
 		error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
 		if (error)
-			goto out;
+			return 0;
 	}
 
-	error = gfs2_dir_check(d_inode(parent), &dentry->d_name, ip);
+	error = gfs2_dir_check(dir, name, ip);
 	valid = inode ? !error : (error == -ENOENT);
 
 	if (!had_lock)
 		gfs2_glock_dq_uninit(&d_gh);
-out:
-	dput(parent);
 	return valid;
 }
 
-- 
2.39.5


* [PATCH v3 16/20] nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (13 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 15/20] gfs2_drevalidate(): " Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
                         ` (3 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

We can't kill __nfs_lookup_revalidate() completely, but the ->d_parent
boilerplate in it is gone.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/nfs/dir.c | 43 +++++++++++++------------------------------
 1 file changed, 13 insertions(+), 30 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 9910d9796f4c..c28983ee75ca 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1732,8 +1732,8 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
  * cached dentry and do a new lookup.
  */
 static int
-nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
-			 unsigned int flags)
+nfs_do_lookup_revalidate(struct inode *dir, const struct qstr *name,
+			 struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	int error = 0;
@@ -1785,39 +1785,26 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 }
 
 static int
-__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags,
-			int (*reval)(struct inode *, struct dentry *, unsigned int))
+__nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 {
-	struct dentry *parent;
-	struct inode *dir;
-	int ret;
-
 	if (flags & LOOKUP_RCU) {
 		if (dentry->d_fsdata == NFS_FSDATA_BLOCKED)
 			return -ECHILD;
-		parent = READ_ONCE(dentry->d_parent);
-		dir = d_inode_rcu(parent);
-		if (!dir)
-			return -ECHILD;
-		ret = reval(dir, dentry, flags);
-		if (parent != READ_ONCE(dentry->d_parent))
-			return -ECHILD;
 	} else {
 		/* Wait for unlink to complete - see unblock_revalidate() */
 		wait_var_event(&dentry->d_fsdata,
 			       smp_load_acquire(&dentry->d_fsdata)
 			       != NFS_FSDATA_BLOCKED);
-		parent = dget_parent(dentry);
-		ret = reval(d_inode(parent), dentry, flags);
-		dput(parent);
 	}
-	return ret;
+	return 0;
 }
 
 static int nfs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 				 struct dentry *dentry, unsigned int flags)
 {
-	return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate);
+	if (__nfs_lookup_revalidate(dentry, flags))
+		return -ECHILD;
+	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
 }
 
 static void block_revalidate(struct dentry *dentry)
@@ -2216,11 +2203,14 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 EXPORT_SYMBOL_GPL(nfs_atomic_open);
 
 static int
-nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
-			  unsigned int flags)
+nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
+		       struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 
+	if (__nfs_lookup_revalidate(dentry, flags))
+		return -ECHILD;
+
 	trace_nfs_lookup_revalidate_enter(dir, dentry, flags);
 
 	if (!(flags & LOOKUP_OPEN) || (flags & LOOKUP_DIRECTORY))
@@ -2259,14 +2249,7 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
 
 full_reval:
-	return nfs_do_lookup_revalidate(dir, dentry, flags);
-}
-
-static int nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
-				  struct dentry *dentry, unsigned int flags)
-{
-	return __nfs_lookup_revalidate(dentry, flags,
-			nfs4_do_lookup_revalidate);
+	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
 }
 
 #endif /* CONFIG_NFSV4 */
-- 
2.39.5


* [PATCH v3 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (14 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
                         ` (2 subsequent siblings)
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Pass the stable name all the way down to ->rpc_ops->lookup() instances.

Note that passing &dentry->d_name is safe in e.g. nfs_lookup() - it *is*
stable there, as it is in ->create() et al.

dget_parent() in nfs_instantiate() should be redundant - it'd better be
stable there; if it's not, we have more trouble, since ->d_name would
also be unsafe in such case.

nfs_submount() and nfs4_submount() may or may not require fixes - if
they ever get moved on the server with the fhandle preserved, we are in
trouble there...

The UAF window is fairly narrow here and exfiltration requires the ability
to watch the traffic.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/nfs/dir.c            | 14 ++++++++------
 fs/nfs/namespace.c      |  2 +-
 fs/nfs/nfs3proc.c       |  5 ++---
 fs/nfs/nfs4proc.c       | 20 ++++++++++----------
 fs/nfs/proc.c           |  6 +++---
 include/linux/nfs_xdr.h |  2 +-
 6 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c28983ee75ca..2b04038b0e40 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1672,7 +1672,7 @@ nfs_lookup_revalidate_delegated(struct inode *dir, struct dentry *dentry,
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 }
 
-static int nfs_lookup_revalidate_dentry(struct inode *dir,
+static int nfs_lookup_revalidate_dentry(struct inode *dir, const struct qstr *name,
 					struct dentry *dentry,
 					struct inode *inode, unsigned int flags)
 {
@@ -1690,7 +1690,7 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
 		goto out;
 
 	dir_verifier = nfs_save_change_attribute(dir);
-	ret = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+	ret = NFS_PROTO(dir)->lookup(dir, dentry, name, fhandle, fattr);
 	if (ret < 0)
 		goto out;
 
@@ -1775,7 +1775,7 @@ nfs_do_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	if (NFS_STALE(inode))
 		goto out_bad;
 
-	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
+	return nfs_lookup_revalidate_dentry(dir, name, dentry, inode, flags);
 out_valid:
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 out_bad:
@@ -1970,7 +1970,8 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
 
 	dir_verifier = nfs_save_change_attribute(dir);
 	trace_nfs_lookup_enter(dir, dentry, flags);
-	error = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+	error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
+				       fhandle, fattr);
 	if (error == -ENOENT) {
 		if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
 			dir_verifier = inode_peek_iversion_raw(dir);
@@ -2246,7 +2247,7 @@ nfs4_lookup_revalidate(struct inode *dir, const struct qstr *name,
 reval_dentry:
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
-	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
+	return nfs_lookup_revalidate_dentry(dir, name, dentry, inode, flags);
 
 full_reval:
 	return nfs_do_lookup_revalidate(dir, name, dentry, flags);
@@ -2305,7 +2306,8 @@ nfs_add_or_obtain(struct dentry *dentry, struct nfs_fh *fhandle,
 	d_drop(dentry);
 
 	if (fhandle->size == 0) {
-		error = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
+		error = NFS_PROTO(dir)->lookup(dir, dentry, &dentry->d_name,
+					       fhandle, fattr);
 		if (error)
 			goto out_error;
 	}
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index 2d53574da605..973aed9cc5fe 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -308,7 +308,7 @@ int nfs_submount(struct fs_context *fc, struct nfs_server *server)
 	int err;
 
 	/* Look it up again to get its attributes */
-	err = server->nfs_client->rpc_ops->lookup(d_inode(parent), dentry,
+	err = server->nfs_client->rpc_ops->lookup(d_inode(parent), dentry, &dentry->d_name,
 						  ctx->mntfh, ctx->clone_data.fattr);
 	dput(parent);
 	if (err != 0)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 1566163c6d85..ce70768e0201 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -192,7 +192,7 @@ __nfs3_proc_lookup(struct inode *dir, const char *name, size_t len,
 }
 
 static int
-nfs3_proc_lookup(struct inode *dir, struct dentry *dentry,
+nfs3_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 		 struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	unsigned short task_flags = 0;
@@ -202,8 +202,7 @@ nfs3_proc_lookup(struct inode *dir, struct dentry *dentry,
 		task_flags |= RPC_TASK_TIMEOUT;
 
 	dprintk("NFS call  lookup %pd2\n", dentry);
-	return __nfs3_proc_lookup(dir, dentry->d_name.name,
-				  dentry->d_name.len, fhandle, fattr,
+	return __nfs3_proc_lookup(dir, name->name, name->len, fhandle, fattr,
 				  task_flags);
 }
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 405f17e6e0b4..4d85068e820d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4536,15 +4536,15 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
 }
 
 static int _nfs4_proc_lookup(struct rpc_clnt *clnt, struct inode *dir,
-		struct dentry *dentry, struct nfs_fh *fhandle,
-		struct nfs_fattr *fattr)
+		struct dentry *dentry, const struct qstr *name,
+		struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs_server *server = NFS_SERVER(dir);
 	int		       status;
 	struct nfs4_lookup_arg args = {
 		.bitmask = server->attr_bitmask,
 		.dir_fh = NFS_FH(dir),
-		.name = &dentry->d_name,
+		.name = name,
 	};
 	struct nfs4_lookup_res res = {
 		.server = server,
@@ -4586,17 +4586,16 @@ static void nfs_fixup_secinfo_attributes(struct nfs_fattr *fattr)
 }
 
 static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
-				   struct dentry *dentry, struct nfs_fh *fhandle,
-				   struct nfs_fattr *fattr)
+				   struct dentry *dentry, const struct qstr *name,
+				   struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs4_exception exception = {
 		.interruptible = true,
 	};
 	struct rpc_clnt *client = *clnt;
-	const struct qstr *name = &dentry->d_name;
 	int err;
 	do {
-		err = _nfs4_proc_lookup(client, dir, dentry, fhandle, fattr);
+		err = _nfs4_proc_lookup(client, dir, dentry, name, fhandle, fattr);
 		trace_nfs4_lookup(dir, name, err);
 		switch (err) {
 		case -NFS4ERR_BADNAME:
@@ -4631,13 +4630,13 @@ static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
 	return err;
 }
 
-static int nfs4_proc_lookup(struct inode *dir, struct dentry *dentry,
+static int nfs4_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 			    struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	int status;
 	struct rpc_clnt *client = NFS_CLIENT(dir);
 
-	status = nfs4_proc_lookup_common(&client, dir, dentry, fhandle, fattr);
+	status = nfs4_proc_lookup_common(&client, dir, dentry, name, fhandle, fattr);
 	if (client != NFS_CLIENT(dir)) {
 		rpc_shutdown_client(client);
 		nfs_fixup_secinfo_attributes(fattr);
@@ -4652,7 +4651,8 @@ nfs4_proc_lookup_mountpoint(struct inode *dir, struct dentry *dentry,
 	struct rpc_clnt *client = NFS_CLIENT(dir);
 	int status;
 
-	status = nfs4_proc_lookup_common(&client, dir, dentry, fhandle, fattr);
+	status = nfs4_proc_lookup_common(&client, dir, dentry, &dentry->d_name,
+					 fhandle, fattr);
 	if (status < 0)
 		return ERR_PTR(status);
 	return (client == NFS_CLIENT(dir)) ? rpc_clone_client(client) : client;
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 6c09cd090c34..77920a2e3cef 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -153,13 +153,13 @@ nfs_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
 }
 
 static int
-nfs_proc_lookup(struct inode *dir, struct dentry *dentry,
+nfs_proc_lookup(struct inode *dir, struct dentry *dentry, const struct qstr *name,
 		struct nfs_fh *fhandle, struct nfs_fattr *fattr)
 {
 	struct nfs_diropargs	arg = {
 		.fh		= NFS_FH(dir),
-		.name		= dentry->d_name.name,
-		.len		= dentry->d_name.len
+		.name		= name->name,
+		.len		= name->len
 	};
 	struct nfs_diropok	res = {
 		.fh		= fhandle,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 559273a0f16d..08b62bbf59f0 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1785,7 +1785,7 @@ struct nfs_rpc_ops {
 			    struct nfs_fattr *, struct inode *);
 	int	(*setattr) (struct dentry *, struct nfs_fattr *,
 			    struct iattr *);
-	int	(*lookup)  (struct inode *, struct dentry *,
+	int	(*lookup)  (struct inode *, struct dentry *, const struct qstr *,
 			    struct nfs_fh *, struct nfs_fattr *);
 	int	(*lookupp) (struct inode *, struct nfs_fh *,
 			    struct nfs_fattr *);
-- 
2.39.5


* [PATCH v3 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (15 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-23  1:46       ` [PATCH v3 19/20] orangefs_d_revalidate(): " Al Viro
  2025-01-23  1:46       ` [PATCH v3 20/20] 9p: fix ->rename_sem exclusion Al Viro
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

Theoretically, the ->d_name use in there is a UAF, but only if you are
messing with tracepoints...

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ocfs2/dcache.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c
index ecb1ce6301c4..1873bbbb7e5b 100644
--- a/fs/ocfs2/dcache.c
+++ b/fs/ocfs2/dcache.c
@@ -45,8 +45,7 @@ static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
 	inode = d_inode(dentry);
 	osb = OCFS2_SB(dentry->d_sb);
 
-	trace_ocfs2_dentry_revalidate(dentry, dentry->d_name.len,
-				      dentry->d_name.name);
+	trace_ocfs2_dentry_revalidate(dentry, name->len, name->name);
 
 	/* For a negative dentry -
 	 * check the generation number of the parent and compare with the
@@ -54,12 +53,8 @@ static int ocfs2_dentry_revalidate(struct inode *dir, const struct qstr *name,
 	 */
 	if (inode == NULL) {
 		unsigned long gen = (unsigned long) dentry->d_fsdata;
-		unsigned long pgen;
-		spin_lock(&dentry->d_lock);
-		pgen = OCFS2_I(d_inode(dentry->d_parent))->ip_dir_lock_gen;
-		spin_unlock(&dentry->d_lock);
-		trace_ocfs2_dentry_revalidate_negative(dentry->d_name.len,
-						       dentry->d_name.name,
+		unsigned long pgen = OCFS2_I(dir)->ip_dir_lock_gen;
+		trace_ocfs2_dentry_revalidate_negative(name->len, name->name,
 						       pgen, gen);
 		if (gen != pgen)
 			goto bail;
-- 
2.39.5


* [PATCH v3 19/20] orangefs_d_revalidate(): use stable parent inode and name passed by caller
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (16 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
@ 2025-01-23  1:46       ` Al Viro
  2025-01-25 16:25         ` Mike Marshall
  2025-01-23  1:46       ` [PATCH v3 20/20] 9p: fix ->rename_sem exclusion Al Viro
  18 siblings, 1 reply; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

->d_name use is a UAF if the userland side of things can be slowed down
by an attacker.
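
For illustration only, a userspace sketch of the kind of copy the patch
below switches to.  struct toy_qstr, TOY_NAME_MAX and copy_name() are
made-up stand-ins, not kernel or orangefs definitions; the assumption is
merely that the destination buffer has been zeroed beforehand, as the
in-patch comment says about op_alloc().  The callee copies at most
TOY_NAME_MAX - 1 bytes from the caller-supplied, length-counted name, so
the result stays NUL-terminated and dentry->d_name is never consulted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAME_MAX 8	/* hypothetical; stands in for ORANGEFS_NAME_MAX */

/* Hypothetical stand-in for struct qstr: a length-counted name. */
struct toy_qstr {
	const char *name;
	size_t len;
};

/*
 * Copy a length-counted name into a fixed-size buffer that the caller
 * has already zeroed.  At most TOY_NAME_MAX - 1 bytes are written, so
 * the last byte stays 0 and the result is always NUL-terminated.
 * Returns the number of bytes copied.
 */
static size_t copy_name(char dst[TOY_NAME_MAX], const struct toy_qstr *name)
{
	size_t max = TOY_NAME_MAX - 1;
	size_t n = name->len < max ? name->len : max;

	memcpy(dst, name->name, n);
	return n;
}
```

A name longer than the buffer is silently truncated here; whether that is
acceptable depends on the caller, and the sketch takes no position on it.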

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/orangefs/dcache.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
index c32c9a86e8d0..a19d1ad705db 100644
--- a/fs/orangefs/dcache.c
+++ b/fs/orangefs/dcache.c
@@ -13,10 +13,9 @@
 #include "orangefs-kernel.h"
 
 /* Returns 1 if dentry can still be trusted, else 0. */
-static int orangefs_revalidate_lookup(struct dentry *dentry)
+static int orangefs_revalidate_lookup(struct inode *parent_inode, const struct qstr *name,
+				      struct dentry *dentry)
 {
-	struct dentry *parent_dentry = dget_parent(dentry);
-	struct inode *parent_inode = parent_dentry->d_inode;
 	struct orangefs_inode_s *parent = ORANGEFS_I(parent_inode);
 	struct inode *inode = dentry->d_inode;
 	struct orangefs_kernel_op_s *new_op;
@@ -26,14 +25,14 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
 	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s: attempting lookup.\n", __func__);
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_LOOKUP);
-	if (!new_op) {
-		ret = -ENOMEM;
-		goto out_put_parent;
-	}
+	if (!new_op)
+		return -ENOMEM;
 
 	new_op->upcall.req.lookup.sym_follow = ORANGEFS_LOOKUP_LINK_NO_FOLLOW;
 	new_op->upcall.req.lookup.parent_refn = parent->refn;
-	strscpy(new_op->upcall.req.lookup.d_name, dentry->d_name.name);
+	/* op_alloc() leaves ->upcall zeroed */
+	memcpy(new_op->upcall.req.lookup.d_name, name->name,
+			min(name->len, ORANGEFS_NAME_MAX - 1));
 
 	gossip_debug(GOSSIP_DCACHE_DEBUG,
 		     "%s:%s:%d interrupt flag [%d]\n",
@@ -78,8 +77,6 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
 	ret = 1;
 out_release_op:
 	op_release(new_op);
-out_put_parent:
-	dput(parent_dentry);
 	return ret;
 out_drop:
 	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s:%s:%d revalidate failed\n",
@@ -115,7 +112,7 @@ static int orangefs_d_revalidate(struct inode *dir, const struct qstr *name,
 	 * If this passes, the positive dentry still exists or the negative
 	 * dentry still does not exist.
 	 */
-	if (!orangefs_revalidate_lookup(dentry))
+	if (!orangefs_revalidate_lookup(dir, name, dentry))
 		return 0;
 
 	/* We do not need to continue with negative dentries. */
-- 
2.39.5



* [PATCH v3 20/20] 9p: fix ->rename_sem exclusion
  2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
                         ` (17 preceding siblings ...)
  2025-01-23  1:46       ` [PATCH v3 19/20] orangefs_d_revalidate(): " Al Viro
@ 2025-01-23  1:46       ` Al Viro
  18 siblings, 0 replies; 96+ messages in thread
From: Al Viro @ 2025-01-23  1:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: agruenba, amir73il, brauner, ceph-devel, dhowells, hubcap, jack,
	krisman, linux-nfs, miklos, torvalds

9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.

->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain.  Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.

Solution:
	* have d_move() done in ->rename() rather than in its caller
	* maintain a 9p-private rwsem (per-filesystem)
	* hold it exclusive over the relevant part of ->rename()
	* hold it shared over the places where we want the path.

That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under the filesystem's control.  However, there's
also __d_unalias(), which isn't covered by any of that.

If ->lookup() hits a directory inode with a preexisting dentry elsewhere
(due to e.g. a rename done on the server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.

Add a couple of optional methods, so that __d_unalias() would do
	if alias->d_op->d_unalias_trylock != NULL
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if alias->d_op->d_unalias_unlock != NULL
		alias->d_op->d_unalias_unlock(alias)
where it currently does __d_move().  9p instances do down_write_trylock()
and up_write() of ->rename_sem.
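
The control flow above can be modelled in userspace roughly as follows.
This is a sketch under loud assumptions: struct toy_rwsem is a
single-threaded counter that never sleeps (not the kernel rwsem), and
struct toy_dentry, struct toy_dentry_ops and every toy_*() helper are
hypothetical names invented for the illustration.  Only the shape of the
hook calls and the -ESTALE result are taken from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy single-threaded model of an rwsem: counts holders, never sleeps. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers)
		return false;
	sem->writer = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *sem)
{
	sem->writer = false;
}

/* Hypothetical dentry: only the optional hooks, a lock and a move counter. */
struct toy_dentry;

struct toy_dentry_ops {
	bool (*d_unalias_trylock)(const struct toy_dentry *);
	void (*d_unalias_unlock)(const struct toy_dentry *);
};

struct toy_dentry {
	const struct toy_dentry_ops *d_op;
	struct toy_rwsem *rename_sem;	/* per-filesystem in the real thing */
	int moves;			/* how many times we were "moved" */
};

/* What a filesystem instance would plug into the two hooks. */
static bool toy_unalias_trylock(const struct toy_dentry *d)
{
	return toy_down_write_trylock(d->rename_sem);
}

static void toy_unalias_unlock(const struct toy_dentry *d)
{
	toy_up_write(d->rename_sem);
}

/* Mirrors the described control flow; returns 0 or -ESTALE. */
static int toy_unalias(struct toy_dentry *alias)
{
	if (alias->d_op->d_unalias_trylock &&
	    !alias->d_op->d_unalias_trylock(alias))
		return -ESTALE;
	alias->moves++;		/* stands in for __d_move() */
	if (alias->d_op->d_unalias_unlock)
		alias->d_op->d_unalias_unlock(alias);
	return 0;
}
```

While a reader (the path builder) holds the semaphore shared,
toy_unalias() refuses to move the alias and returns -ESTALE; once the
reader drops it, the move goes through and the lock is released again.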

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/locking.rst |  4 ++++
 Documentation/filesystems/vfs.rst     | 21 +++++++++++++++++++++
 fs/9p/v9fs.h                          |  2 +-
 fs/9p/vfs_dentry.c                    | 16 ++++++++++++++++
 fs/dcache.c                           |  5 +++++
 include/linux/dcache.h                |  2 ++
 6 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..d20a32b77b60 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,8 @@ prototypes::
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_unalias_trylock)(const struct dentry *);
+	void (*d_unalias_unlock)(const struct dentry *);
 
 locking rules:
 
@@ -50,6 +52,8 @@ d_dname:	   no		no		no		no
 d_automount:	   no		no		yes		no
 d_manage:	   no		no		yes (ref-walk)	maybe
 d_real		   no		no		yes 		no
+d_unalias_trylock  yes		no		no 		no
+d_unalias_unlock   yes		no		no 		no
 ================== ===========	========	==============	========
 
 inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..31eea688609a 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,8 @@ defined:
 		struct vfsmount *(*d_automount)(struct path *);
 		int (*d_manage)(const struct path *, bool);
 		struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+		bool (*d_unalias_trylock)(const struct dentry *);
+		void (*d_unalias_unlock)(const struct dentry *);
 	};
 
 ``d_revalidate``
@@ -1428,6 +1430,25 @@ defined:
 
 	For non-regular files, the 'dentry' argument is returned.
 
+``d_unalias_trylock``
+	if present, will be called by d_splice_alias() before moving a
+	preexisting attached alias.  Returning false prevents __d_move(),
+	making d_splice_alias() fail with -ESTALE.
+
+	Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+	and d_exchange() calls from the outside of filesystem methods;
+	however, it does not guarantee that attached dentries won't
+	be renamed or moved by d_splice_alias() finding a preexisting
+	alias for a directory inode.  Normally we would not care;
+	however, something that wants to stabilize the entire path to
+	root over a blocking operation might need that.  See 9p for one
+	(and hopefully only) example.
+
+``d_unalias_unlock``
+	should be paired with ``d_unalias_trylock``; that one is called after
+	__d_move() call in __d_unalias().
+
+
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries.  Child dentries are basically like files in a
 directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
 	return inode->i_sb->s_fs_info;
 }
 
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
 {
 	return dentry->d_sb->s_fs_info;
 }
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..5061f192eafd 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,30 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	return __v9fs_lookup_revalidate(dentry, flags);
 }
 
+static bool v9fs_dentry_unalias_trylock(const struct dentry *dentry)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+	return down_write_trylock(&v9ses->rename_sem);
+}
+
+static void v9fs_dentry_unalias_unlock(const struct dentry *dentry)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+	up_write(&v9ses->rename_sem);
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
 	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
+	.d_unalias_trylock = v9fs_dentry_unalias_trylock,
+	.d_unalias_unlock = v9fs_dentry_unalias_unlock,
 };
 
 const struct dentry_operations v9fs_dentry_operations = {
 	.d_delete = always_delete_dentry,
 	.d_release = v9fs_dentry_release,
+	.d_unalias_trylock = v9fs_dentry_unalias_trylock,
+	.d_unalias_unlock = v9fs_dentry_unalias_unlock,
 };
diff --git a/fs/dcache.c b/fs/dcache.c
index 6f36d3e8c739..695406e48937 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2961,7 +2961,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_rwsem;
 out_unalias:
+	if (alias->d_op->d_unalias_trylock &&
+	    !alias->d_op->d_unalias_trylock(alias))
+		goto out_err;
 	__d_move(alias, dentry, false);
+	if (alias->d_op->d_unalias_unlock)
+		alias->d_op->d_unalias_unlock(alias);
 	ret = 0;
 out_err:
 	if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..9a1a30857763 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,8 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_unalias_trylock)(const struct dentry *);
+	void (*d_unalias_unlock)(const struct dentry *);
 } ____cacheline_aligned;
 
 /*
-- 
2.39.5



* Re: [PATCH v3 14/20] fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  2025-01-23  1:46       ` [PATCH v3 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
@ 2025-01-23 10:51         ` Miklos Szeredi
  0 siblings, 0 replies; 96+ messages in thread
From: Miklos Szeredi @ 2025-01-23 10:51 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	hubcap, jack, krisman, linux-nfs, torvalds

On Thu, 23 Jan 2025 at 02:46, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> No need to mess with dget_parent() for the former; for the latter we really should
> not rely upon ->d_name.name remaining stable - it's a real-life UAF.
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Acked-by: Miklos Szeredi <mszeredi@redhat.com>


* Re: [PATCH v3 19/20] orangefs_d_revalidate(): use stable parent inode and name passed by caller
  2025-01-23  1:46       ` [PATCH v3 19/20] orangefs_d_revalidate(): " Al Viro
@ 2025-01-25 16:25         ` Mike Marshall
  0 siblings, 0 replies; 96+ messages in thread
From: Mike Marshall @ 2025-01-25 16:25 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, agruenba, amir73il, brauner, ceph-devel, dhowells,
	jack, krisman, linux-nfs, miklos, torvalds

Tested-by: Mike Marshall <hubcap@omnibond.com>

On Wed, Jan 22, 2025 at 8:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> ->d_name use is a UAF if the userland side of things can be slowed down
> by attacker.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/orangefs/dcache.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
> index c32c9a86e8d0..a19d1ad705db 100644
> --- a/fs/orangefs/dcache.c
> +++ b/fs/orangefs/dcache.c
> @@ -13,10 +13,9 @@
>  #include "orangefs-kernel.h"
>
>  /* Returns 1 if dentry can still be trusted, else 0. */
> -static int orangefs_revalidate_lookup(struct dentry *dentry)
> +static int orangefs_revalidate_lookup(struct inode *parent_inode, const struct qstr *name,
> +                                     struct dentry *dentry)
>  {
> -       struct dentry *parent_dentry = dget_parent(dentry);
> -       struct inode *parent_inode = parent_dentry->d_inode;
>         struct orangefs_inode_s *parent = ORANGEFS_I(parent_inode);
>         struct inode *inode = dentry->d_inode;
>         struct orangefs_kernel_op_s *new_op;
> @@ -26,14 +25,14 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
>         gossip_debug(GOSSIP_DCACHE_DEBUG, "%s: attempting lookup.\n", __func__);
>
>         new_op = op_alloc(ORANGEFS_VFS_OP_LOOKUP);
> -       if (!new_op) {
> -               ret = -ENOMEM;
> -               goto out_put_parent;
> -       }
> +       if (!new_op)
> +               return -ENOMEM;
>
>         new_op->upcall.req.lookup.sym_follow = ORANGEFS_LOOKUP_LINK_NO_FOLLOW;
>         new_op->upcall.req.lookup.parent_refn = parent->refn;
> -       strscpy(new_op->upcall.req.lookup.d_name, dentry->d_name.name);
> +       /* op_alloc() leaves ->upcall zeroed */
> +       memcpy(new_op->upcall.req.lookup.d_name, name->name,
> +                       min(name->len, ORANGEFS_NAME_MAX - 1));
>
>         gossip_debug(GOSSIP_DCACHE_DEBUG,
>                      "%s:%s:%d interrupt flag [%d]\n",
> @@ -78,8 +77,6 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
>         ret = 1;
>  out_release_op:
>         op_release(new_op);
> -out_put_parent:
> -       dput(parent_dentry);
>         return ret;
>  out_drop:
>         gossip_debug(GOSSIP_DCACHE_DEBUG, "%s:%s:%d revalidate failed\n",
> @@ -115,7 +112,7 @@ static int orangefs_d_revalidate(struct inode *dir, const struct qstr *name,
>          * If this passes, the positive dentry still exists or the negative
>          * dentry still does not exist.
>          */
> -       if (!orangefs_revalidate_lookup(dentry))
> +       if (!orangefs_revalidate_lookup(dir, name, dentry))
>                 return 0;
>
>         /* We do not need to continue with negative dentries. */
> --
> 2.39.5
>


Thread overview: 96+ messages
2025-01-10  2:38 [PATCHES][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-10  2:42 ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-10  2:42   ` [PATCH 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-10  9:35     ` Jan Kara
2025-01-10 16:24       ` Al Viro
2025-01-10  2:42   ` [PATCH 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-10  9:45     ` Jan Kara
2025-01-10  2:42   ` [PATCH 04/20] dissolve external_name.u into separate members Al Viro
2025-01-10  7:34     ` David Howells
2025-01-10 16:46       ` Al Viro
2025-01-10  2:42   ` [PATCH 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-10  9:15     ` Jan Kara
2025-01-10  2:42   ` [PATCH 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-10  2:42   ` [PATCH 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-10  2:42   ` [PATCH 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-10  2:42   ` [PATCH 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-10 19:45     ` Viacheslav Dubeyko
2025-01-10  2:42   ` [PATCH 10/20] ceph_d_revalidate(): propagate stable name down into request enconding Al Viro
2025-01-10  2:42   ` [PATCH 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-10  2:42   ` [PATCH 12/20] exfat_d_revalidate(): " Al Viro
2025-01-10  2:42   ` [PATCH 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-10  2:42   ` [PATCH 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-10  2:42   ` [PATCH 15/20] gfs2_drevalidate(): " Al Viro
2025-01-10 19:20     ` Andreas Grünbacher
2025-01-10  2:42   ` [PATCH 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-10  2:43   ` [PATCH 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-10  2:43   ` [PATCH 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-10  9:54     ` Jan Kara
2025-01-10  2:43   ` [PATCH 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-10  3:06     ` Linus Torvalds
2025-01-10  2:43   ` [PATCH 20/20] 9p: fix ->rename_sem exclusion Al Viro
2025-01-10  3:11     ` Linus Torvalds
2025-01-10  5:53       ` Al Viro
2025-01-10  9:21   ` [PATCH 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Jan Kara
2025-01-16  5:21 ` [PATCHES v2][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-16  5:22   ` [PATCH v2 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-16  5:22     ` [PATCH v2 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-16  5:23     ` [PATCH v2 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-16  5:23     ` [PATCH v2 04/20] dissolve external_name.u into separate members Al Viro
2025-01-16 10:06       ` Jan Kara
2025-01-16  5:23     ` [PATCH v2 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-16  5:23     ` [PATCH v2 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-16 15:38       ` Gabriel Krisman Bertazi
2025-01-16 15:46         ` Al Viro
2025-01-16 15:53           ` Gabriel Krisman Bertazi
2025-01-16  5:23     ` [PATCH v2 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-16 15:15       ` Gabriel Krisman Bertazi
2025-01-17 18:55       ` Jeff Layton
2025-01-17 19:00         ` Al Viro
2025-01-16  5:23     ` [PATCH v2 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-22 20:27       ` David Howells
2025-01-22 21:01         ` Al Viro
2025-01-22 21:24           ` Al Viro
2025-01-22 21:55             ` David Howells
2025-01-16  5:23     ` [PATCH v2 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-17 18:35       ` Jeff Layton
2025-01-16  5:23     ` [PATCH v2 10/20] ceph_d_revalidate(): propagate stable name down into request enconding Al Viro
2025-01-17 18:35       ` Jeff Layton
2025-01-16  5:23     ` [PATCH v2 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-17 15:20       ` Jeff Layton
2025-01-16  5:23     ` [PATCH v2 12/20] exfat_d_revalidate(): " Al Viro
2025-01-16  5:23     ` [PATCH v2 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-17 15:22       ` Jeff Layton
2025-01-16  5:23     ` [PATCH v2 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-17 15:18       ` Jeff Layton
2025-01-16  5:23     ` [PATCH v2 15/20] gfs2_drevalidate(): " Al Viro
2025-01-16  5:23     ` [PATCH v2 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-17 14:05       ` Jeff Layton
2025-01-16  5:23     ` [PATCH v2 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-17 15:12       ` Jeff Layton
2025-01-16  5:23     ` [PATCH v2 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-16  5:23     ` [PATCH v2 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-16  5:23     ` [PATCH v2 20/20] 9p: fix ->rename_sem exclusion Al Viro
2025-01-23  1:45   ` [PATCHES v3][RFC][CFT] ->d_revalidate() calling conventions changes (->d_parent/->d_name stability problems) Al Viro
2025-01-23  1:46     ` [PATCH v3 01/20] make sure that DNAME_INLINE_LEN is a multiple of word size Al Viro
2025-01-23  1:46       ` [PATCH v3 02/20] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2025-01-23  1:46       ` [PATCH v3 03/20] make take_dentry_name_snapshot() lockless Al Viro
2025-01-23  1:46       ` [PATCH v3 04/20] dissolve external_name.u into separate members Al Viro
2025-01-23  1:46       ` [PATCH v3 05/20] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2025-01-23  1:46       ` [PATCH v3 06/20] generic_ci_d_compare(): use shortname_storage Al Viro
2025-01-23  1:46       ` [PATCH v3 07/20] Pass parent directory inode and expected name to ->d_revalidate() Al Viro
2025-01-23  1:46       ` [PATCH v3 08/20] afs_d_revalidate(): use stable name and parent inode passed by caller Al Viro
2025-01-23  1:46       ` [PATCH v3 09/20] ceph_d_revalidate(): use stable " Al Viro
2025-01-23  1:46       ` [PATCH v3 10/20] ceph_d_revalidate(): propagate stable name down into request encoding Al Viro
2025-01-23  1:46       ` [PATCH v3 11/20] fscrypt_d_revalidate(): use stable parent inode passed by caller Al Viro
2025-01-23  1:46       ` [PATCH v3 12/20] exfat_d_revalidate(): " Al Viro
2025-01-23  1:46       ` [PATCH v3 13/20] vfat_revalidate{,_ci}(): " Al Viro
2025-01-23  1:46       ` [PATCH v3 14/20] fuse_dentry_revalidate(): use stable parent inode and name " Al Viro
2025-01-23 10:51         ` Miklos Szeredi
2025-01-23  1:46       ` [PATCH v3 15/20] gfs2_drevalidate(): " Al Viro
2025-01-23  1:46       ` [PATCH v3 16/20] nfs{,4}_lookup_validate(): use stable parent inode " Al Viro
2025-01-23  1:46       ` [PATCH v3 17/20] nfs: fix ->d_revalidate() UAF on ->d_name accesses Al Viro
2025-01-23  1:46       ` [PATCH v3 18/20] ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller Al Viro
2025-01-23  1:46       ` [PATCH v3 19/20] orangefs_d_revalidate(): " Al Viro
2025-01-25 16:25         ` Mike Marshall
2025-01-23  1:46       ` [PATCH v3 20/20] 9p: fix ->rename_sem exclusion Al Viro
