public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Luis Henriques <luis@igalia.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	 fuse-devel@lists.linux.dev, linux-fsdevel@vger.kernel.org,
	 linux-kernel@vger.kernel.org,
	 Matt Harvey <mharvey@jumptrading.com>,
	 kernel-dev@igalia.com
Subject: Re: [PATCH] fuse: fix race between inode/dentry invalidation and readdir
Date: Mon, 27 Apr 2026 10:23:50 +0100	[thread overview]
Message-ID: <87mryokb89.fsf@igalia.com> (raw)
In-Reply-To: <CAJnrk1bv4NK7MF4msYMAmbS=+vL=k8NLoLiNyM4QB+amccpOww@mail.gmail.com> (Joanne Koong's message of "Fri, 24 Apr 2026 12:35:11 -0700")

Hi Joanne!

On Fri, Apr 24 2026, Joanne Koong wrote:

> On Fri, Apr 24, 2026 at 6:53 AM Luis Henriques <luis@igalia.com> wrote:
>>
>> When there's a readdir in progress, doing a FUSE_NOTIFY_INVAL_{INODE,ENTRY}
>> on an inode or dentry may result in stale directory info being cached.  This
>> is because the invalidation does not reset the readdir cache.
>>
>> This patch fixes this issue by adding a call to fuse_rdc_reset() (modified
>> to include the required locking) to these two operations, allowing the
>> readdir cache to be invalidated while it's being filled-in.
>
> Hi Luis,
>
> Just curious, are you hitting this issue in practice or is this mostly
> theoretical?
>
> afaict for fuse_notify_inval_entry(), it calls into
> fuse_reverse_inval_entry() -> fuse_dir_changed(parent), which calls
> inode_maybe_inc_iversion(). afaict, this actually increments i_version
> (since I_VERSION_QUERIED flag was set when the cache's iversion was
> initialized with inode_query_iversion() in fuse_readdir_cached()),
> which means the next readdir call will detect this version change and
> call fuse_rdc_reset() (in fuse_readdir_cached()). I'm not sure I see
> how this leads to stale directory info lingering in the cache after a
> concurrent fuse_notify_inval_entry()?
>
> For teh fuse_notify_inval_inode() case, which I'm assuming is the case
> you're running into where the directory is the inode being
> invalidated, I see the call to fuse_reverse_inval_inode() which calls
> invalidate_inode_pages2_range() if the offset was non-negative, which
> will invalidate the readdir cache's pages, which means on the next
> readdir call, will already call fuse_rdc_reset() when it detects the
> missing page in the cache (in fuse_readdir_cached()). So I'm not
> really seeing how this can happen either for the
> fuse_notify_inval_inode() case? Unless you are passing a negative
> offset, but as I understand it, passing a negative offset is used only
> if the server wants attributes invalidated [1], not any data.
>
> afaics, the onlyy stale directory info returned would be for the case
> for a concurrent readdir that has already passed the pos == 0
> iversion/mtime check when the invalidation arrives, but that seems
> like a server synchronization issue, eg if the server wants uptodate
> data when doing a concurrent readdir and invalidation, they have to
> order that themselves. ANy fresh lookup after that though, I think
> wouldalways return fresh/non-stale data for the reasons mentioned
> above.
>
> Does this align with what you're seeing in the code or am I missing
> something here?

First of all, thanks a lot for looking into this and for doing such a
great description of the issue.

So, I did had a report regarding a possible race between a readdir and
invalidation when using keep_cache and cache_readdir.  But, unfortunately,
I don't have a lot of information regarding the actual issue, and it isn't
something reproducible.

Then, looking at the code (and, for full-disclosure, I've also looked at a
claude analysis that was handed over to me) I could see a race that I'm
trying to fix with this patch.  But I believe it's the race that you claim
above that it's a server synchronisation problem.  For example, with a
NOTIFY_INVAL_INODE operation, when fuse_reverse_inval_inode() is called
while fuse_add_dirent_to_cache() is being executed in parallel, the
iversion/mtime update could be missed.

It is possible to hit this small race by instrumenting the code, and I
could occasionally (and momentarily) see stale data while running readdir
in such instrumented testing environment.  Do you think that's something
inherent to the usage of the INVAL_INODE op, and this race will need to be
handled by user-space?

In fact, the report I got seemed to indicate that the issue was not going
away with a fresh lookup (though an 'echo 1 > /proc/sys/vm/drop_cache'
would fix it).  But maybe that's another indication that this is a problem
in the user-space server.

Cheers,
-- 
Luís

>
>>
>> Assisted-by: Claude:claude-opus-4-5
>> Signed-off-by: Luis Henriques <luis@igalia.com>
>> ---
>>  fs/fuse/dir.c     |  5 +++--
>>  fs/fuse/fuse_i.h  | 13 +++++++++++++
>>  fs/fuse/inode.c   |  1 +
>>  fs/fuse/readdir.c |  6 +++---
>>  4 files changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>> index 7ac6b232ef12..6e5851de3613 100644
>> --- a/fs/fuse/dir.c
>> +++ b/fs/fuse/dir.c
>> @@ -1615,6 +1615,7 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
>>         if (!(flags & FUSE_EXPIRE_ONLY))
>>                 d_invalidate(entry);
>>         fuse_invalidate_entry_cache(entry);
>> +       fuse_rdc_reset(entry->d_inode);
>
> Hmm... I think this resets the child's readdir cache but it's the
> parent's readdir cache that would have to be invalidated, so would
> this have to be fuse_rdc_reset(parent)?
>
> Thanks,
> Joanne
>
> [1] https://libfuse.github.io/doxygen/fuse__lowlevel_8h.html#a9cb974af9745294ff446d11cba2422f1
>
>>
>>         if (child_nodeid != 0) {
>>                 inode_lock(d_inode(entry));
>> @@ -1637,7 +1638,7 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
>>                 dont_mount(entry);
>>                 clear_nlink(d_inode(entry));
>>                 err = 0;
>> - badentry:
>> +badentry:
>>                 inode_unlock(d_inode(entry));
>>                 if (!err)
>>                         d_delete(entry);

  reply	other threads:[~2026-04-27  9:23 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-24 13:49 [PATCH] fuse: fix race between inode/dentry invalidation and readdir Luis Henriques
2026-04-24 19:35 ` Joanne Koong
2026-04-27  9:23   ` Luis Henriques [this message]
2026-04-27 12:48     ` Joanne Koong
2026-04-27 17:06       ` Luis Henriques

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mryokb89.fsf@igalia.com \
    --to=luis@igalia.com \
    --cc=fuse-devel@lists.linux.dev \
    --cc=joannelkoong@gmail.com \
    --cc=kernel-dev@igalia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mharvey@jumptrading.com \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox