From: Luis Henriques <luis@igalia.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
fuse-devel@lists.linux.dev, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
Matt Harvey <mharvey@jumptrading.com>,
kernel-dev@igalia.com
Subject: Re: [PATCH] fuse: fix race between inode/dentry invalidation and readdir
Date: Mon, 27 Apr 2026 18:06:21 +0100 [thread overview]
Message-ID: <87ik9cjpte.fsf@igalia.com> (raw)
In-Reply-To: <CAJnrk1ZCyHtL6NEaSuww+RQvNNfAW8NkvRA=+=HXQwdG_4aE_Q@mail.gmail.com> (Joanne Koong's message of "Mon, 27 Apr 2026 13:48:19 +0100")
On Mon, Apr 27 2026, Joanne Koong wrote:
> On Mon, Apr 27, 2026 at 10:23 AM Luis Henriques <luis@igalia.com> wrote:
>>
>> Hi Joanne!
>>
>> On Fri, Apr 24 2026, Joanne Koong wrote:
>>
>> > On Fri, Apr 24, 2026 at 6:53 AM Luis Henriques <luis@igalia.com> wrote:
>> >>
>> >> When there's a readdir in progress, doing a FUSE_NOTIFY_INVAL_{INODE,ENTRY}
>> >> on an inode or dentry may result in stale directory info being cached. This
>> >> is because the invalidation does not reset the readdir cache.
>> >>
>> >> This patch fixes this issue by adding a call to fuse_rdc_reset() (modified
>> >> to include the required locking) to these two operations, allowing the
>> >> readdir cache to be invalidated while it's being filled-in.
>> >
>> > Hi Luis,
>> >
>> > Just curious, are you hitting this issue in practice or is this mostly
>> > theoretical?
>> >
>> > afaict for fuse_notify_inval_entry(), it calls into
>> > fuse_reverse_inval_entry() -> fuse_dir_changed(parent), which calls
>> > inode_maybe_inc_iversion(). afaict, this actually increments i_version
>> > (since I_VERSION_QUERIED flag was set when the cache's iversion was
>> > initialized with inode_query_iversion() in fuse_readdir_cached()),
>> > which means the next readdir call will detect this version change and
>> > call fuse_rdc_reset() (in fuse_readdir_cached()). I'm not sure I see
>> > how this leads to stale directory info lingering in the cache after a
>> > concurrent fuse_notify_inval_entry()?
>> >
>> > For teh fuse_notify_inval_inode() case, which I'm assuming is the case
>> > you're running into where the directory is the inode being
>> > invalidated, I see the call to fuse_reverse_inval_inode() which calls
>> > invalidate_inode_pages2_range() if the offset was non-negative, which
>> > will invalidate the readdir cache's pages, which means on the next
>> > readdir call, will already call fuse_rdc_reset() when it detects the
>> > missing page in the cache (in fuse_readdir_cached()). So I'm not
>> > really seeing how this can happen either for the
>> > fuse_notify_inval_inode() case? Unless you are passing a negative
>> > offset, but as I understand it, passing a negative offset is used only
>> > if the server wants attributes invalidated [1], not any data.
>> >
>> > afaics, the onlyy stale directory info returned would be for the case
>> > for a concurrent readdir that has already passed the pos == 0
>> > iversion/mtime check when the invalidation arrives, but that seems
>> > like a server synchronization issue, eg if the server wants uptodate
>> > data when doing a concurrent readdir and invalidation, they have to
>> > order that themselves. ANy fresh lookup after that though, I think
>> > wouldalways return fresh/non-stale data for the reasons mentioned
>> > above.
>> >
>> > Does this align with what you're seeing in the code or am I missing
>> > something here?
>
> Hi Luis,
>
>>
>> First of all, thanks a lot for looking into this and for doing such a
>> great description of the issue.
>>
>> So, I did had a report regarding a possible race between a readdir and
>> invalidation when using keep_cache and cache_readdir. But, unfortunately,
>> I don't have a lot of information regarding the actual issue, and it isn't
>> something reproducible.
>>
>> Then, looking at the code (and, for full-disclosure, I've also looked at a
>> claude analysis that was handed over to me) I could see a race that I'm
>> trying to fix with this patch. But I believe it's the race that you claim
>> above that it's a server synchronisation problem. For example, with a
>> NOTIFY_INVAL_INODE operation, when fuse_reverse_inval_inode() is called
>> while fuse_add_dirent_to_cache() is being executed in parallel, the
>> iversion/mtime update could be missed.
>>
>> It is possible to hit this small race by instrumenting the code, and I
>> could occasionally (and momentarily) see stale data while running readdir
>> in such instrumented testing environment. Do you think that's something
>> inherent to the usage of the INVAL_INODE op, and this race will need to be
>> handled by user-space?
>
> imo yes, that is not a bug in the kernel and userspace is responsible
> for synchronizing/coordinating that. I think the kernel is just
> responsible for ensuring that any subsequent readdirs are not stale,
> but afaict the existing code handles that.
Ack, thanks.
>> In fact, the report I got seemed to indicate that the issue was not going
>> away with a fresh lookup (though an 'echo 1 > /proc/sys/vm/drop_cache'
>> would fix it). But maybe that's another indication that this is a problem
>> in the user-space server.
>
> that seems weird to me, maybe there's something else at play here in
> addition to the concurrent race? Is there a repro for where the stale
> data survives a fresh lookup?
Unfortunately no, I do not have any reproducer. And from looking at the
code I couldn't find anything else. I'll have to look closer into the
user-space code doing the invalidation to try to understand what else
could be at play here. And again, thank you for your feedback, Joanne.
Cheers,
--
Luís
prev parent reply other threads:[~2026-04-27 17:06 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 13:49 [PATCH] fuse: fix race between inode/dentry invalidation and readdir Luis Henriques
2026-04-24 19:35 ` Joanne Koong
2026-04-27 9:23 ` Luis Henriques
2026-04-27 12:48 ` Joanne Koong
2026-04-27 17:06 ` Luis Henriques [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ik9cjpte.fsf@igalia.com \
--to=luis@igalia.com \
--cc=fuse-devel@lists.linux.dev \
--cc=joannelkoong@gmail.com \
--cc=kernel-dev@igalia.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mharvey@jumptrading.com \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.