From: Luis Henriques <luis@igalia.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
fuse-devel@lists.linux.dev, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
Matt Harvey <mharvey@jumptrading.com>,
kernel-dev@igalia.com
Subject: Re: [PATCH] fuse: fix race between inode/dentry invalidation and readdir
Date: Mon, 27 Apr 2026 10:23:50 +0100 [thread overview]
Message-ID: <87mryokb89.fsf@igalia.com> (raw)
In-Reply-To: <CAJnrk1bv4NK7MF4msYMAmbS=+vL=k8NLoLiNyM4QB+amccpOww@mail.gmail.com> (Joanne Koong's message of "Fri, 24 Apr 2026 12:35:11 -0700")
Hi Joanne!
On Fri, Apr 24 2026, Joanne Koong wrote:
> On Fri, Apr 24, 2026 at 6:53 AM Luis Henriques <luis@igalia.com> wrote:
>>
>> When there's a readdir in progress, doing a FUSE_NOTIFY_INVAL_{INODE,ENTRY}
>> on an inode or dentry may result in stale directory info being cached. This
>> is because the invalidation does not reset the readdir cache.
>>
>> This patch fixes this issue by adding a call to fuse_rdc_reset() (modified
>> to include the required locking) to these two operations, allowing the
>> readdir cache to be invalidated while it's being filled-in.
>
> Hi Luis,
>
> Just curious, are you hitting this issue in practice or is this mostly
> theoretical?
>
> afaict for fuse_notify_inval_entry(), it calls into
> fuse_reverse_inval_entry() -> fuse_dir_changed(parent), which calls
> inode_maybe_inc_iversion(). afaict, this actually increments i_version
> (since I_VERSION_QUERIED flag was set when the cache's iversion was
> initialized with inode_query_iversion() in fuse_readdir_cached()),
> which means the next readdir call will detect this version change and
> call fuse_rdc_reset() (in fuse_readdir_cached()). I'm not sure I see
> how this leads to stale directory info lingering in the cache after a
> concurrent fuse_notify_inval_entry()?
>
> For teh fuse_notify_inval_inode() case, which I'm assuming is the case
> you're running into where the directory is the inode being
> invalidated, I see the call to fuse_reverse_inval_inode() which calls
> invalidate_inode_pages2_range() if the offset was non-negative, which
> will invalidate the readdir cache's pages, which means on the next
> readdir call, will already call fuse_rdc_reset() when it detects the
> missing page in the cache (in fuse_readdir_cached()). So I'm not
> really seeing how this can happen either for the
> fuse_notify_inval_inode() case? Unless you are passing a negative
> offset, but as I understand it, passing a negative offset is used only
> if the server wants attributes invalidated [1], not any data.
>
> afaics, the onlyy stale directory info returned would be for the case
> for a concurrent readdir that has already passed the pos == 0
> iversion/mtime check when the invalidation arrives, but that seems
> like a server synchronization issue, eg if the server wants uptodate
> data when doing a concurrent readdir and invalidation, they have to
> order that themselves. ANy fresh lookup after that though, I think
> wouldalways return fresh/non-stale data for the reasons mentioned
> above.
>
> Does this align with what you're seeing in the code or am I missing
> something here?
First of all, thanks a lot for looking into this and for doing such a
great description of the issue.
So, I did had a report regarding a possible race between a readdir and
invalidation when using keep_cache and cache_readdir. But, unfortunately,
I don't have a lot of information regarding the actual issue, and it isn't
something reproducible.
Then, looking at the code (and, for full-disclosure, I've also looked at a
claude analysis that was handed over to me) I could see a race that I'm
trying to fix with this patch. But I believe it's the race that you claim
above that it's a server synchronisation problem. For example, with a
NOTIFY_INVAL_INODE operation, when fuse_reverse_inval_inode() is called
while fuse_add_dirent_to_cache() is being executed in parallel, the
iversion/mtime update could be missed.
It is possible to hit this small race by instrumenting the code, and I
could occasionally (and momentarily) see stale data while running readdir
in such instrumented testing environment. Do you think that's something
inherent to the usage of the INVAL_INODE op, and this race will need to be
handled by user-space?
In fact, the report I got seemed to indicate that the issue was not going
away with a fresh lookup (though an 'echo 1 > /proc/sys/vm/drop_cache'
would fix it). But maybe that's another indication that this is a problem
in the user-space server.
Cheers,
--
Luís
>
>>
>> Assisted-by: Claude:claude-opus-4-5
>> Signed-off-by: Luis Henriques <luis@igalia.com>
>> ---
>> fs/fuse/dir.c | 5 +++--
>> fs/fuse/fuse_i.h | 13 +++++++++++++
>> fs/fuse/inode.c | 1 +
>> fs/fuse/readdir.c | 6 +++---
>> 4 files changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>> index 7ac6b232ef12..6e5851de3613 100644
>> --- a/fs/fuse/dir.c
>> +++ b/fs/fuse/dir.c
>> @@ -1615,6 +1615,7 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
>> if (!(flags & FUSE_EXPIRE_ONLY))
>> d_invalidate(entry);
>> fuse_invalidate_entry_cache(entry);
>> + fuse_rdc_reset(entry->d_inode);
>
> Hmm... I think this resets the child's readdir cache but it's the
> parent's readdir cache that would have to be invalidated, so would
> this have to be fuse_rdc_reset(parent)?
>
> Thanks,
> Joanne
>
> [1] https://libfuse.github.io/doxygen/fuse__lowlevel_8h.html#a9cb974af9745294ff446d11cba2422f1
>
>>
>> if (child_nodeid != 0) {
>> inode_lock(d_inode(entry));
>> @@ -1637,7 +1638,7 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
>> dont_mount(entry);
>> clear_nlink(d_inode(entry));
>> err = 0;
>> - badentry:
>> +badentry:
>> inode_unlock(d_inode(entry));
>> if (!err)
>> d_delete(entry);
next prev parent reply other threads:[~2026-04-27 9:23 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 13:49 [PATCH] fuse: fix race between inode/dentry invalidation and readdir Luis Henriques
2026-04-24 19:35 ` Joanne Koong
2026-04-27 9:23 ` Luis Henriques [this message]
2026-04-27 12:48 ` Joanne Koong
2026-04-27 17:06 ` Luis Henriques
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mryokb89.fsf@igalia.com \
--to=luis@igalia.com \
--cc=fuse-devel@lists.linux.dev \
--cc=joannelkoong@gmail.com \
--cc=kernel-dev@igalia.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mharvey@jumptrading.com \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.