* [PATCH] fuse: prevent exchange/revalidate races
@ 2025-09-17 15:30 Miklos Szeredi
2025-09-17 15:36 ` Al Viro
2025-09-18 1:43 ` NeilBrown
0 siblings, 2 replies; 10+ messages in thread
From: Miklos Szeredi @ 2025-09-17 15:30 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Al Viro, NeilBrown
If a path component is revalidated while taking part in a
rename(RENAME_EXCHANGE) request, userspace might find the already exchanged
files, while the kernel still has the old ones in dcache. This mismatch
will cause the dentry to be invalidated (unhashed), resulting in
"(deleted)" being appended to proc paths.
Prevent this by taking the inode lock shared for the dentry being
revalidated.
Another race introduced by commit 5be1fa8abd7b ("Pass parent directory
inode and expected name to ->d_revalidate()") is that the name passed to
revalidate can be stale (rename succeeded after the dentry was looked up in
the dcache).
By checking the name and the parent while the inode is locked, this issue
can also be solved.
This doesn't deal with revalidate/d_splice_alias() races, which happens if
a directory (which is cached) is moved on the server and the new location
discovered by a lookup. In this case the inode is not locked during the
new lookup.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
fs/fuse/dir.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 5c569c3cb53f..7148b2a7611a 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -235,9 +235,18 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
attr_version = fuse_get_attr_version(fm->fc);
+ inode_lock_shared(inode);
+ if (entry->d_parent->d_inode != dir ||
+ !d_same_name(entry, entry, name)) {
+ /* raced with rename, assume revalidated */
+ inode_unlock_shared(inode);
+ return 1;
+ }
+
fuse_lookup_init(fm->fc, &args, get_node_id(dir),
name, &outarg);
ret = fuse_simple_request(fm, &args);
+ inode_unlock_shared(inode);
/* Zero nodeid is same as -ENOENT */
if (!ret && !outarg.nodeid)
ret = -ENOENT;
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-17 15:30 [PATCH] fuse: prevent exchange/revalidate races Miklos Szeredi @ 2025-09-17 15:36 ` Al Viro 2025-09-17 15:42 ` Miklos Szeredi 2025-09-18 1:43 ` NeilBrown 1 sibling, 1 reply; 10+ messages in thread From: Al Viro @ 2025-09-17 15:36 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-fsdevel, NeilBrown On Wed, Sep 17, 2025 at 05:30:24PM +0200, Miklos Szeredi wrote: > If a path component is revalidated while taking part in a > rename(RENAME_EXCHANGE) request, userspace might find the already exchanged > files, while the kernel still has the old ones in dcache. This mismatch > will cause the dentry to be invalidated (unhashed), resulting in > "(deleted)" being appended to proc paths. > > Prevent this by taking the inode lock shared for the dentry being > revalidated. > > Another race introduced by commit 5be1fa8abd7b ("Pass parent directory > inode and expected name to ->d_revalidate()") is that the name passed to > revalidate can be stale (rename succeeded after the dentry was looked up in > the dcache). > > By checking the name and the parent while the inode is locked, this issue > can also be solved. > > This doesn't deal with revalidate/d_splice_alias() races, which happens if > a directory (which is cached) is moved on the server and the new location > discovered by a lookup. In this case the inode is not locked during the > new lookup. > + inode_lock_shared(inode); > + if (entry->d_parent->d_inode != dir || > + !d_same_name(entry, entry, name)) { > + /* raced with rename, assume revalidated */ ... and if the call of ->d_revalidate() had been with parent locked, you've just got a deadlock in that case. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-17 15:36 ` Al Viro @ 2025-09-17 15:42 ` Miklos Szeredi 2025-09-17 16:41 ` Al Viro 0 siblings, 1 reply; 10+ messages in thread From: Miklos Szeredi @ 2025-09-17 15:42 UTC (permalink / raw) To: Al Viro; +Cc: Miklos Szeredi, linux-fsdevel, NeilBrown On Wed, 17 Sept 2025 at 17:37, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Wed, Sep 17, 2025 at 05:30:24PM +0200, Miklos Szeredi wrote: > > If a path component is revalidated while taking part in a > > rename(RENAME_EXCHANGE) request, userspace might find the already exchanged > > files, while the kernel still has the old ones in dcache. This mismatch > > will cause the dentry to be invalidated (unhashed), resulting in > > "(deleted)" being appended to proc paths. > > > > Prevent this by taking the inode lock shared for the dentry being > > revalidated. > > > > Another race introduced by commit 5be1fa8abd7b ("Pass parent directory > > inode and expected name to ->d_revalidate()") is that the name passed to > > revalidate can be stale (rename succeeded after the dentry was looked up in > > the dcache). > > > > By checking the name and the parent while the inode is locked, this issue > > can also be solved. > > > > This doesn't deal with revalidate/d_splice_alias() races, which happens if > > a directory (which is cached) is moved on the server and the new location > > discovered by a lookup. In this case the inode is not locked during the > > new lookup. > > > + inode_lock_shared(inode); > > + if (entry->d_parent->d_inode != dir || > > + !d_same_name(entry, entry, name)) { > > + /* raced with rename, assume revalidated */ > > ... and if the call of ->d_revalidate() had been with parent locked, you've > just got a deadlock in that case. Why? Thanks, Miklos ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-17 15:42 ` Miklos Szeredi @ 2025-09-17 16:41 ` Al Viro 2025-09-17 17:27 ` Miklos Szeredi 0 siblings, 1 reply; 10+ messages in thread From: Al Viro @ 2025-09-17 16:41 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Miklos Szeredi, linux-fsdevel, NeilBrown On Wed, Sep 17, 2025 at 05:42:18PM +0200, Miklos Szeredi wrote: > > ... and if the call of ->d_revalidate() had been with parent locked, you've > > just got a deadlock in that case. > > Why? Because the locking order on directories is "ancestors first"; you are trying to grab an inode that is also a directory and might be anywhere in the tree by that point - that's precisely what you are trying to check, isn't it? What's to prevent several threads from mutually deadlocking that way? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-17 16:41 ` Al Viro @ 2025-09-17 17:27 ` Miklos Szeredi 2025-09-17 17:43 ` Al Viro 0 siblings, 1 reply; 10+ messages in thread From: Miklos Szeredi @ 2025-09-17 17:27 UTC (permalink / raw) To: Al Viro; +Cc: Miklos Szeredi, linux-fsdevel, NeilBrown On Wed, 17 Sept 2025 at 18:41, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Wed, Sep 17, 2025 at 05:42:18PM +0200, Miklos Szeredi wrote: > > > ... and if the call of ->d_revalidate() had been with parent locked, you've > > > just got a deadlock in that case. > > > > Why? > > Because the locking order on directories is "ancestors first"; you are trying > to grab an inode that is also a directory and might be anywhere in the tree > by that point - that's precisely what you are trying to check, isn't it? But if parent is locked, it must not have been renamed, hence parent-child relationship holds. Thanks, Miklos ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-17 17:27 ` Miklos Szeredi @ 2025-09-17 17:43 ` Al Viro 2025-09-17 17:45 ` Al Viro 0 siblings, 1 reply; 10+ messages in thread From: Al Viro @ 2025-09-17 17:43 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Miklos Szeredi, linux-fsdevel, NeilBrown On Wed, Sep 17, 2025 at 07:27:25PM +0200, Miklos Szeredi wrote: > On Wed, 17 Sept 2025 at 18:41, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > On Wed, Sep 17, 2025 at 05:42:18PM +0200, Miklos Szeredi wrote: > > > > ... and if the call of ->d_revalidate() had been with parent locked, you've > > > > just got a deadlock in that case. > > > > > > Why? > > > > Because the locking order on directories is "ancestors first"; you are trying > > to grab an inode that is also a directory and might be anywhere in the tree > > by that point - that's precisely what you are trying to check, isn't it? > > But if parent is locked, it must not have been renamed, hence > parent-child relationship holds. Not if parent is held shared. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-17 17:43 ` Al Viro @ 2025-09-17 17:45 ` Al Viro 0 siblings, 0 replies; 10+ messages in thread From: Al Viro @ 2025-09-17 17:45 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Miklos Szeredi, linux-fsdevel, NeilBrown On Wed, Sep 17, 2025 at 06:43:50PM +0100, Al Viro wrote: > On Wed, Sep 17, 2025 at 07:27:25PM +0200, Miklos Szeredi wrote: > > On Wed, 17 Sept 2025 at 18:41, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > On Wed, Sep 17, 2025 at 05:42:18PM +0200, Miklos Szeredi wrote: > > > > > ... and if the call of ->d_revalidate() had been with parent locked, you've > > > > > just got a deadlock in that case. > > > > > > > > Why? > > > > > > Because the locking order on directories is "ancestors first"; you are trying > > > to grab an inode that is also a directory and might be anywhere in the tree > > > by that point - that's precisely what you are trying to check, isn't it? > > > > But if parent is locked, it must not have been renamed, hence > > parent-child relationship holds. > > Not if parent is held shared. Note that lookup will reparent a subdirectory just fine - exclusive lock on that directory's parent would make it fail, but shared one is not a problem. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-17 15:30 [PATCH] fuse: prevent exchange/revalidate races Miklos Szeredi 2025-09-17 15:36 ` Al Viro @ 2025-09-18 1:43 ` NeilBrown 2025-09-18 10:02 ` Miklos Szeredi 1 sibling, 1 reply; 10+ messages in thread From: NeilBrown @ 2025-09-18 1:43 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-fsdevel, Al Viro On Thu, 18 Sep 2025, Miklos Szeredi wrote: > If a path component is revalidated while taking part in a > rename(RENAME_EXCHANGE) request, userspace might find the already exchanged > files, while the kernel still has the old ones in dcache. This mismatch > will cause the dentry to be invalidated (unhashed), resulting in > "(deleted)" being appended to proc paths. An inappropriate d_invalidate() on a positive dentry can do worse than just adding "(deleted)" to paths in proc. Any mount points at or below the target of d_invalidate() are unmounted. So I think it is really important to avoid invalidating a positive dentry if at all possible. If I understand the race correctly, the problem happens if an unlocked d_revalidate() returns zero so d_invalidate() is called immediately *after* d_exchange() has succeeded. If it is called before, d_exchange() will rehash the dentry. Could the same thing happen on rename without RENAME_EXCHANGE? The unlocked d_revalidate() finds that the name has already been removed so it requests d_invalidate() which runs immediately *after* d_move() has succeeded? I think the race should be addressed in VFS code as it could affect any filesystem which provides ->d_revalidate. I'm not sure how. Maybe we could make use of LOOKUP_REVAL retries and mandate that if LOOKUP_REVAL is set then ->d_revalidate is always called with an exclusive lock. We would still need some way for d_invalidate() it decide whether to trigger the retry by causing -ESTALE to be returned. Maybe try-lock on s_vfs_rename_mutex could be used. Or maybe we could sample d_seq before calling ->d_revalidate() and only allow d_invalidate() to succeed if d_seq is unchanged. If it changed, we repeat .... something. Maybe the revalidate, maybe the lookup, maybe the whole path ?? Thanks, NeilBrown > > Prevent this by taking the inode lock shared for the dentry being > revalidated. > > Another race introduced by commit 5be1fa8abd7b ("Pass parent directory > inode and expected name to ->d_revalidate()") is that the name passed to > revalidate can be stale (rename succeeded after the dentry was looked up in > the dcache). > > By checking the name and the parent while the inode is locked, this issue > can also be solved. > > This doesn't deal with revalidate/d_splice_alias() races, which happens if > a directory (which is cached) is moved on the server and the new location > discovered by a lookup. In this case the inode is not locked during the > new lookup. > > Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> > --- > fs/fuse/dir.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index 5c569c3cb53f..7148b2a7611a 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -235,9 +235,18 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name, > > attr_version = fuse_get_attr_version(fm->fc); > > + inode_lock_shared(inode); > + if (entry->d_parent->d_inode != dir || > + !d_same_name(entry, entry, name)) { > + /* raced with rename, assume revalidated */ > + inode_unlock_shared(inode); > + return 1; > + } > + > fuse_lookup_init(fm->fc, &args, get_node_id(dir), > name, &outarg); > ret = fuse_simple_request(fm, &args); > + inode_unlock_shared(inode); > /* Zero nodeid is same as -ENOENT */ > if (!ret && !outarg.nodeid) > ret = -ENOENT; > -- > 2.51.0 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-18 1:43 ` NeilBrown @ 2025-09-18 10:02 ` Miklos Szeredi 2025-09-19 8:20 ` Miklos Szeredi 0 siblings, 1 reply; 10+ messages in thread From: Miklos Szeredi @ 2025-09-18 10:02 UTC (permalink / raw) To: NeilBrown; +Cc: Miklos Szeredi, linux-fsdevel, Al Viro [-- Attachment #1: Type: text/plain, Size: 2460 bytes --] On Thu, 18 Sept 2025 at 03:43, NeilBrown <neilb@ownmail.net> wrote: > > On Thu, 18 Sep 2025, Miklos Szeredi wrote: > > If a path component is revalidated while taking part in a > > rename(RENAME_EXCHANGE) request, userspace might find the already exchanged > > files, while the kernel still has the old ones in dcache. This mismatch > > will cause the dentry to be invalidated (unhashed), resulting in > > "(deleted)" being appended to proc paths. > > An inappropriate d_invalidate() on a positive dentry can do worse than > just adding "(deleted)" to paths in proc. > Any mount points at or below the target of d_invalidate() are unmounted. > So I think it is really important to avoid invalidating a positive > dentry if at all possible. > > If I understand the race correctly, the problem happens if an unlocked > d_revalidate() returns zero so d_invalidate() is called immediately > *after* d_exchange() has succeeded. If it is called before, d_exchange() > will rehash the dentry. Right. But as you say, d_invalidate() can have irreversible effects, so it doesn't really matter when it happens. > Could the same thing happen on rename without RENAME_EXCHANGE? > The unlocked d_revalidate() finds that the name has already been removed > so it requests d_invalidate() which runs immediately *after* d_move() > has succeeded? For some reason I thought -ENOENT won't invalidate the dentry, but I was wrong. > I think the race should be addressed in VFS code as it could affect any > filesystem which provides ->d_revalidate. I'm not sure how. Attaching reproducer. It will trigger immediately on "passthrough_ll -ocache=never /mnt/fuse", which always revalidates lookups. Without "-ocache=never" revalidatation happens after 1s, which makes the race harder to hit, but it still triggers within half a minute for me. > Maybe we could make use of LOOKUP_REVAL retries and mandate that if > LOOKUP_REVAL is set then ->d_revalidate is always called with an > exclusive lock. We would still need some way for d_invalidate() it > decide whether to trigger the retry by causing -ESTALE to be returned. > Maybe try-lock on s_vfs_rename_mutex could be used. > > Or maybe we could sample d_seq before calling ->d_revalidate() and only > allow d_invalidate() to succeed if d_seq is unchanged. If it changed, > we repeat .... something. Maybe the revalidate, maybe the lookup, maybe > the whole path ?? Sounds good, will look into this. Thanks, Miklos [-- Attachment #2: rename_reval_race.c --] [-- Type: text/x-c-code, Size: 1476 bytes --] #include <unistd.h> #include <string.h> #include <pthread.h> #include <stdio.h> #include <fcntl.h> #include <errno.h> #include <err.h> #include <sys/stat.h> static const char *name1 = "test_file1"; static const char *name2 = "test_file2"; static void *stat_loop(void *arg) { struct stat buf; int res; (void) arg; for (;;) { res = stat(name1, &buf); if (res == -1 && errno != ENOENT) err(1, "stat(%s)", name1); } } static void rename_loop(int fd) { int res; unsigned long count = 0; char proc_path[64]; char link[256]; static const char del_str[] = "(deleted)"; static const size_t del_len = sizeof(del_str) - 1; snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd); for (;;) { res = rename(name1, name2); if (res == -1) err(1, "rename(%s, %s)", name1, name2); res = readlink(proc_path, link, sizeof(link)); if (res == -1) err(1, "readlink(%s)", proc_path); if ((size_t) res > del_len && memcmp(link + res - del_len, del_str, del_len) == 0) { fprintf(stderr, "\n"); errx(1, "open file invalidated!"); } res = rename(name2, name1); if (res == -1) err(1, "rename(%s, %s)", name2, name1); count++; if (count % 10000 == 0) fprintf(stderr, "."); } } int main(void) { int fd; pthread_t tid; fd = open(name1, O_CREAT | O_RDONLY, 0644); if (fd == -1) err(1, "open %s", name1); if (pthread_create(&tid, NULL, stat_loop, NULL) != 0) errx(1, "pthread_create failed"); rename_loop(fd); } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuse: prevent exchange/revalidate races 2025-09-18 10:02 ` Miklos Szeredi @ 2025-09-19 8:20 ` Miklos Szeredi 0 siblings, 0 replies; 10+ messages in thread From: Miklos Szeredi @ 2025-09-19 8:20 UTC (permalink / raw) To: NeilBrown; +Cc: Miklos Szeredi, linux-fsdevel, Al Viro On Thu, 18 Sept 2025 at 12:02, Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Thu, 18 Sept 2025 at 03:43, NeilBrown <neilb@ownmail.net> wrote: > > Or maybe we could sample d_seq before calling ->d_revalidate() and only > > allow d_invalidate() to succeed if d_seq is unchanged. If it changed, > > we repeat .... something. Maybe the revalidate, maybe the lookup, maybe > > the whole path ?? Checking d_seq makes sense. But that doesn't deal with the case of a rename that modified the fs on the server but didn't yet get to d_move/d_exchange. In that case d_seq would be unchanged, yet invalidation could be triggered. To detect this I think a dentry flag (DENTRY_RENAMING) is sufficient. This flag would be set before calling ->rename() and cleared after d_move. A d_invalidate_reval() variant would check both for the sampled d_seq and the dentry flag and if either indicates that the dentry has been renamed after starting the revalidation or is in the process of being renamed, then skip the invalidation. Sending a patch (done with help from me pal Claude). Thanks, Miklos ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-09-19 8:20 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-17 15:30 [PATCH] fuse: prevent exchange/revalidate races Miklos Szeredi 2025-09-17 15:36 ` Al Viro 2025-09-17 15:42 ` Miklos Szeredi 2025-09-17 16:41 ` Al Viro 2025-09-17 17:27 ` Miklos Szeredi 2025-09-17 17:43 ` Al Viro 2025-09-17 17:45 ` Al Viro 2025-09-18 1:43 ` NeilBrown 2025-09-18 10:02 ` Miklos Szeredi 2025-09-19 8:20 ` Miklos Szeredi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).