From: Luis Henriques <luis@igalia.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
fuse-devel@lists.linux.dev, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
Matt Harvey <mharvey@jumptrading.com>,
kernel-dev@igalia.com
Subject: Re: [PATCH] fuse: fix race between inode/dentry invalidation and readdir
Date: Mon, 27 Apr 2026 18:06:21 +0100 [thread overview]
Message-ID: <87ik9cjpte.fsf@igalia.com> (raw)
In-Reply-To: <CAJnrk1ZCyHtL6NEaSuww+RQvNNfAW8NkvRA=+=HXQwdG_4aE_Q@mail.gmail.com> (Joanne Koong's message of "Mon, 27 Apr 2026 13:48:19 +0100")
On Mon, Apr 27 2026, Joanne Koong wrote:
> On Mon, Apr 27, 2026 at 10:23 AM Luis Henriques <luis@igalia.com> wrote:
>>
>> Hi Joanne!
>>
>> On Fri, Apr 24 2026, Joanne Koong wrote:
>>
>> > On Fri, Apr 24, 2026 at 6:53 AM Luis Henriques <luis@igalia.com> wrote:
>> >>
>> >> When there's a readdir in progress, doing a FUSE_NOTIFY_INVAL_{INODE,ENTRY}
>> >> on an inode or dentry may result in stale directory info being cached. This
>> >> is because the invalidation does not reset the readdir cache.
>> >>
>> >> This patch fixes this issue by adding a call to fuse_rdc_reset() (modified
>> >> to include the required locking) to these two operations, allowing the
>> >> readdir cache to be invalidated while it's being filled-in.
>> >
>> > Hi Luis,
>> >
>> > Just curious, are you hitting this issue in practice or is this mostly
>> > theoretical?
>> >
>> > afaict for fuse_notify_inval_entry(), it calls into
>> > fuse_reverse_inval_entry() -> fuse_dir_changed(parent), which calls
>> > inode_maybe_inc_iversion(). afaict, this actually increments i_version
>> > (since I_VERSION_QUERIED flag was set when the cache's iversion was
>> > initialized with inode_query_iversion() in fuse_readdir_cached()),
>> > which means the next readdir call will detect this version change and
>> > call fuse_rdc_reset() (in fuse_readdir_cached()). I'm not sure I see
>> > how this leads to stale directory info lingering in the cache after a
>> > concurrent fuse_notify_inval_entry()?
>> >
>> > For teh fuse_notify_inval_inode() case, which I'm assuming is the case
>> > you're running into where the directory is the inode being
>> > invalidated, I see the call to fuse_reverse_inval_inode() which calls
>> > invalidate_inode_pages2_range() if the offset was non-negative, which
>> > will invalidate the readdir cache's pages, which means on the next
>> > readdir call, will already call fuse_rdc_reset() when it detects the
>> > missing page in the cache (in fuse_readdir_cached()). So I'm not
>> > really seeing how this can happen either for the
>> > fuse_notify_inval_inode() case? Unless you are passing a negative
>> > offset, but as I understand it, passing a negative offset is used only
>> > if the server wants attributes invalidated [1], not any data.
>> >
>> > afaics, the onlyy stale directory info returned would be for the case
>> > for a concurrent readdir that has already passed the pos == 0
>> > iversion/mtime check when the invalidation arrives, but that seems
>> > like a server synchronization issue, eg if the server wants uptodate
>> > data when doing a concurrent readdir and invalidation, they have to
>> > order that themselves. ANy fresh lookup after that though, I think
>> > wouldalways return fresh/non-stale data for the reasons mentioned
>> > above.
>> >
>> > Does this align with what you're seeing in the code or am I missing
>> > something here?
>
> Hi Luis,
>
>>
>> First of all, thanks a lot for looking into this and for doing such a
>> great description of the issue.
>>
>> So, I did had a report regarding a possible race between a readdir and
>> invalidation when using keep_cache and cache_readdir. But, unfortunately,
>> I don't have a lot of information regarding the actual issue, and it isn't
>> something reproducible.
>>
>> Then, looking at the code (and, for full-disclosure, I've also looked at a
>> claude analysis that was handed over to me) I could see a race that I'm
>> trying to fix with this patch. But I believe it's the race that you claim
>> above that it's a server synchronisation problem. For example, with a
>> NOTIFY_INVAL_INODE operation, when fuse_reverse_inval_inode() is called
>> while fuse_add_dirent_to_cache() is being executed in parallel, the
>> iversion/mtime update could be missed.
>>
>> It is possible to hit this small race by instrumenting the code, and I
>> could occasionally (and momentarily) see stale data while running readdir
>> in such instrumented testing environment. Do you think that's something
>> inherent to the usage of the INVAL_INODE op, and this race will need to be
>> handled by user-space?
>
> imo yes, that is not a bug in the kernel and userspace is responsible
> for synchronizing/coordinating that. I think the kernel is just
> responsible for ensuring that any subsequent readdirs are not stale,
> but afaict the existing code handles that.
Ack, thanks.
>> In fact, the report I got seemed to indicate that the issue was not going
>> away with a fresh lookup (though an 'echo 1 > /proc/sys/vm/drop_cache'
>> would fix it). But maybe that's another indication that this is a problem
>> in the user-space server.
>
> that seems weird to me, maybe there's something else at play here in
> addition to the concurrent race? Is there a repro for where the stale
> data survives a fresh lookup?
Unfortunately no, I do not have any reproducer. And from looking at the
code I couldn't find anything else. I'll have to look closer into the
user-space code doing the invalidation to try to understand what else
could be at play here. And again, thank you for your feedback, Joanne.
Cheers,
--
Luís
prev parent reply other threads:[~2026-04-27 17:06 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 13:49 [PATCH] fuse: fix race between inode/dentry invalidation and readdir Luis Henriques
2026-04-24 19:35 ` Joanne Koong
2026-04-27 9:23 ` Luis Henriques
2026-04-27 12:48 ` Joanne Koong
2026-04-27 17:06 ` Luis Henriques [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ik9cjpte.fsf@igalia.com \
--to=luis@igalia.com \
--cc=fuse-devel@lists.linux.dev \
--cc=joannelkoong@gmail.com \
--cc=kernel-dev@igalia.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mharvey@jumptrading.com \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox