From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Horst Birthelmer <hbirthelmer@ddn.com>,
Joanne Koong <joannelkoong@gmail.com>,
Miklos Szeredi <mszeredi@redhat.com>,
Sasha Levin <sashal@kernel.org>,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-6.19] fuse: fix inode initialization race
Date: Mon, 20 Apr 2026 09:21:04 -0400 [thread overview]
Message-ID: <20260420132314.1023554-270-sashal@kernel.org> (raw)
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>
From: Horst Birthelmer <hbirthelmer@ddn.com>
[ Upstream commit aff12041b4b2f4f2c164a0cf1b9688408515d036 ]
Fix a race between fuse_iget() and fuse_reverse_inval_inode() where
invalidation can arrive while an inode is being initialized, causing
the invalidation to be lost.
By keeping the inode state I_NEW as long as the attributes are not valid
the invalidation can wait until the inode is fully initialized.
Suggested-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Based on my thorough investigation, I will now provide my analysis.
## Phase 1: Commit Message Forensics
**Step 1.1: Parse the subject line**
- Record: subsystem=`fuse`, action="fix", summary="inode initialization
race"
**Step 1.2: Parse tags**
- Record:
- `Suggested-by: Joanne Koong <joannelkoong@gmail.com>` (known FUSE
contributor)
- `Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>` (DDN, works
on distributed FUSE)
- `Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>` (FUSE
maintainer)
- No Reported-by, no Link, no Cc: stable, no Fixes:
- Note: "Suggested-by" indicates a reviewer proposed this exact
approach
**Step 1.3: Analyze the commit body**
- Record: Describes a race between `fuse_iget()` and
`fuse_reverse_inval_inode()` where invalidation arrives while inode is
being initialized, causing the invalidation to be lost. The fix keeps
I_NEW set during attribute initialization so invalidation waits via
ilookup5's wait_on_new_inode.
**Step 1.4: Hidden bug fix detection**
- Record: Explicitly labeled as "fix", not hidden.
## Phase 2: Diff Analysis
**Step 2.1: Inventory**
- Record: 1 file (fs/fuse/inode.c), 5 additions, 2 deletions, single
function (`fuse_iget`). Surgical fix.
**Step 2.2: Code flow change**
- Record:
- BEFORE: `if (I_NEW) { fuse_init_inode(); unlock_new_inode(); } ...
fuse_change_attributes_i();`
- AFTER: `is_new_inode = I_NEW; if (is_new_inode) { fuse_init_inode();
} ... fuse_change_attributes_i(); if (is_new_inode)
unlock_new_inode();`
- Effect: The I_NEW lock now protects the full initialization
including attribute setting.
**Step 2.3: Bug mechanism**
- Record: Category (b) synchronization / race condition fix. Mechanism:
Extends the I_NEW window so concurrent `ilookup5()` in `fuse_ilookup()
-> fuse_reverse_inval_inode()` waits (via `wait_on_new_inode()` in
`ilookup5()`) until inode is fully initialized.
**Step 2.4: Fix quality**
- Record: Obviously correct, minimal, no unrelated changes. Low
regression risk because: (1) Joanne Koong's review verified
`fuse_change_attributes_i()` for I_NEW inodes is quick (no synchronous
requests, `truncate_pagecache()` gated by oldsize != attr->size is
always false, `invalidate_inode_pages2()` gated similarly). (2) The
`fi->lock` in `fuse_change_attributes_i` is separate from `i_state`,
so no deadlock risk.
## Phase 3: Git History Investigation
**Step 3.1: blame the code**
- Record: The `unlock_new_inode()` -> `fuse_change_attributes()` pattern
has existed since at least 2009 when `fuse_reverse_inval_inode()` was
added (commit 3b463ae0c6264, v2.6.31). The race pattern is present in
all stable trees.
**Step 3.2: Fixes tag**
- Record: No Fixes: tag. The race has been latent since the invalidation
notification mechanism was introduced.
**Step 3.3: Related changes**
- Record: Related recent fix: `69efbff69f89c fuse: fix race between
concurrent setattrs from multiple nodes` (also from a DDN engineer),
confirming distributed FUSE users encounter such races.
**Step 3.4: Author's relationship**
- Record: Horst Birthelmer works at DDN (distributed storage), deals
with DLM-based FUSE where invalidations are frequent.
**Step 3.5: Dependencies**
- Record: Standalone. No dependencies. Uses existing helpers
(`unlock_new_inode`, `fuse_change_attributes_i`).
## Phase 4: Mailing List Research
**Step 4.1: original discussion**
- Record: `b4 dig -c aff12041b4b2f` returned
`https://lore.kernel.org/all/20260327-fix-inode-init-
race-v3-1-73766b91b415@ddn.com/`
**Step 4.2: Recipients**
- Record: Miklos Szeredi (maintainer), Bernd Schubert (regular FUSE
reviewer), Joanne Koong (regular FUSE contributor), linux-fsdevel.
**Step 4.3: Series evolution**
- Record: v1 added a dedicated waitqueue. Reviewers (Miklos, Joanne)
suggested a simpler approach: just hold I_NEW longer. Joanne
explicitly analyzed safety: for I_NEW inodes,
`fuse_change_attributes_i` is fast (no pagecache work because
oldsize==attr->size and old_mtime==new_mtime from fuse_init_inode).
v2/v3 implement this approach.
**Step 4.4: Reviewer feedback**
- Record: Miklos: "Applied, thanks." Bernd: Reviewed-by (v1). Joanne:
Suggested-by. No NAKs. No stable nomination.
**Step 4.5: Stable discussion**
- Record: No stable-specific discussion found.
## Phase 5: Code Semantic Analysis
**Step 5.1: Key functions**
- Record: `fuse_iget()`.
**Step 5.2: Callers**
- Record: Called from `fuse_lookup_name` (dir.c:587), `fuse_create_open`
(dir.c:888), `fuse_atomic_open` (dir.c:1015), `fuse_get_root_inode`
(inode.c:1065), `fuse_fill_super_submount` (inode.c:1744),
`fuse_direntplus_link` (readdir.c:236). Called on every FUSE
lookup/create/readdirplus - hot path for FUSE.
**Step 5.3: Callees**
- Record: `iget5_locked`, `fuse_init_inode`, `unlock_new_inode`,
`fuse_change_attributes_i`.
**Step 5.4: Call chain / reachability**
- Record: `fuse_reverse_inval_inode` reachable via
`fuse_notify_inval_inode` from `/dev/fuse` ioctl read path
(FUSE_NOTIFY_INVAL_INODE from userspace daemon). Triggerable any time
the FUSE server sends a notification. Realistic for distributed FUSE
filesystems with DLM/coherency protocols.
**Step 5.5: Similar patterns**
- Record: Standard I_NEW pattern used throughout VFS. The fix aligns
`fuse_iget` with the common practice of holding I_NEW during full
inode setup.
## Phase 6: Cross-Referencing and Stable Tree Analysis
**Step 6.1: Does buggy code exist in stable?**
- Record: YES. Verified in v5.15, v6.1, v6.6, v6.12, v6.17 - all have
the pattern `unlock_new_inode()` called before
`fuse_change_attributes[_i]()`.
**Step 6.2: Backport complications**
- Record: Minor. For v6.14 and earlier, `inode_state_read_once(inode) &
I_NEW` was `inode->i_state & I_NEW` (pre b4dbfd8653b34). For v6.12 and
earlier, `fuse_change_attributes_i` was `fuse_change_attributes`
without `evict_ctr`. Trivial adjustments needed.
**Step 6.3: Related fixes already in stable?**
- Record: No prior fix for this specific race found.
## Phase 7: Subsystem Context
**Step 7.1: Criticality**
- Record: fs/fuse - IMPORTANT (heavily used by containers, Docker,
Android, network FS gateways like s3fs/gvfs/rclone, distributed
filesystems, glusterfs, AWS EFS client, etc.).
**Step 7.2: Activity**
- Record: Actively developed (recent work on io-uring, timeouts,
epochs).
## Phase 8: Impact and Risk Assessment
**Step 8.1: Affected users**
- Record: All FUSE users that receive FUSE_NOTIFY_INVAL_INODE
notifications. Most critical for distributed/networked FUSE
filesystems using cache coherency protocols.
**Step 8.2: Trigger conditions**
- Record: Race window between `unlock_new_inode()` and
`fuse_change_attributes_i()` - small but real. Triggering requires
concurrent lookup and invalidation on same nodeid, which author states
happens with DLM-based systems ("relatively many notifications since
they are bound to the DLM system").
**Step 8.3: Failure mode severity**
- Record: MEDIUM. Result is stale cached attributes / stale page cache.
Not a crash, not corruption of on-disk data, but cache coherency
violation that can surface as application seeing old data/size/mtime
after a remote change should have invalidated it.
**Step 8.4: Risk-benefit**
- Record: BENEFIT: fixes real coherency bug affecting distributed FUSE
users. RISK: very low - 5-line change in init path, I_NEW held
slightly longer (microseconds), no new locks taken, approach vetted by
subsystem experts. Ratio favors backport.
## Phase 9: Final Synthesis
**Step 9.1: Evidence**
- FOR: Real race fixed, small scope, maintainer-applied, suggested by
domain expert (Joanne Koong), DDN engineers have real production
motivation, well-discussed approach, safe pattern, fix is well-
understood.
- AGAINST: No explicit stable tag, no Fixes: tag, not a crash/security
fix, severity is correctness not corruption.
- UNVERIFIED: Exact frequency of race in non-DLM FUSE setups; whether
other distros have reported this.
**Step 9.2: Stable rules checklist**
1. Obviously correct and tested? YES - maintainer reviewed, small scope
2. Fixes a real bug? YES - race condition in invalidation handling
3. Important issue? MEDIUM - cache coherency, not crash/corruption
4. Small and contained? YES - 5/2 lines, single function
5. No new features? YES - pure fix
6. Can apply to stable? YES with trivial tweaks for older trees
**Step 9.3: Exception categories**
- Not a device ID or quirk, but a legitimate race condition fix.
**Step 9.4: Decision**
This is a small, well-reviewed race condition fix in a critical, widely-
used subsystem (FUSE). The fix prevents lost invalidations - a real
correctness bug affecting distributed FUSE users. It was suggested by
Joanne Koong (a top FUSE contributor), approved by the FUSE maintainer
Miklos Szeredi, and reviewed by Bernd Schubert. The approach is
defensive and the regression risk is minimal.
## Verification
- [Phase 1] Parsed tags: found Suggested-by (Joanne Koong), Signed-off-
by (Horst Birthelmer, Miklos Szeredi). No Reported-by, no Fixes:, no
Cc: stable.
- [Phase 2] Diff analysis: 5 additions, 2 deletions in `fuse_iget()`
only. Confirmed by reading the commit in the repository (`git show
aff12041b4b2f --stat`).
- [Phase 3] `git log --grep="fuse: fix inode initialization race"` found
commit `aff12041b4b2f4f2c164a0cf1b9688408515d036` in the tree.
- [Phase 3] Read `fs/fuse/inode.c` to see current state of
`fuse_iget()`; read `fuse_change_attributes_i` and
`fuse_change_attributes_common`.
- [Phase 3] Confirmed `fuse_reverse_inval_inode` was added in 2009
(commit 3b463ae0c6264) - race has been latent since then.
- [Phase 4] `b4 dig -c aff12041b4b2f` found original submission.
- [Phase 4] `b4 dig -c aff12041b4b2f -a` showed v1 (2026-03-18), v2
(2026-03-27), v3 (2026-03-27) - applied is the latest.
- [Phase 4] `b4 dig -c aff12041b4b2f -w` showed recipients including
Miklos, Bernd, Joanne.
- [Phase 4] Downloaded full thread via `b4 mbox` and read reviewer
discussion: Joanne proposed the exact approach used; Miklos approved
and applied it. Bernd gave Reviewed-by on v1. No NAKs.
- [Phase 5] `grep -rn "fuse_iget"` found 6 call sites confirming
fuse_iget is on the hot path (lookup, create, readdirplus).
- [Phase 5] Confirmed `ilookup5()` calls `wait_on_new_inode()` in
`fs/inode.c:1662` - so keeping I_NEW set is effective at blocking
concurrent invalidation.
- [Phase 6] Checked v5.15, v6.1, v6.6, v6.12, v6.17 of
`fs/fuse/inode.c`: race pattern (`unlock_new_inode()` before
`fuse_change_attributes[_i]()`) exists in all.
- [Phase 6] v6.14 and earlier uses `inode->i_state & I_NEW` rather than
`inode_state_read_once(inode) & I_NEW` - minor backport adjustment
needed.
- [Phase 7] Subsystem fs/fuse - heavily used; recent related fix
(69efbff69f89c by another DDN engineer) confirms distributed FUSE
users hit such coherency races.
- [Phase 8] Failure mode: lost invalidation -> stale cached attrs/pages
-> user sees old data after remote modification. Severity MEDIUM (not
crash, but coherency).
- UNVERIFIED: Exact numeric frequency of the race in the field; my
detailed attr_version trace suggests the early-return in
fuse_change_attributes_i would usually catch the race, but I could not
reproduce the exact scenario where invalidation is truly lost - I
defer to the authors' analysis since the fix is clearly defensive
against a real race window.
The fix is small, surgical, maintainer-approved, and addresses a real
race condition in a widely-used subsystem. It follows a well-known VFS
synchronization pattern (holding I_NEW during full initialization).
Regression risk is minimal.
**YES**
fs/fuse/inode.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index c795abe47a4f4..9b67bd3a1097d 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -470,6 +470,7 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
struct inode *inode;
struct fuse_inode *fi;
struct fuse_conn *fc = get_fuse_conn_super(sb);
+ bool is_new_inode = false;
/*
* Auto mount points get their node id from the submount root, which is
@@ -505,13 +506,13 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
if (!inode)
return NULL;
- if ((inode_state_read_once(inode) & I_NEW)) {
+ is_new_inode = inode_state_read_once(inode) & I_NEW;
+ if (is_new_inode) {
inode->i_flags |= S_NOATIME;
if (!fc->writeback_cache || !S_ISREG(attr->mode))
inode->i_flags |= S_NOCMTIME;
inode->i_generation = generation;
fuse_init_inode(inode, attr, fc);
- unlock_new_inode(inode);
} else if (fuse_stale_inode(inode, generation, attr)) {
/* nodeid was reused, any I/O on the old inode should fail */
fuse_make_bad(inode);
@@ -528,6 +529,8 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
done:
fuse_change_attributes_i(inode, attr, NULL, attr_valid, attr_version,
evict_ctr);
+ if (is_new_inode)
+ unlock_new_inode(inode);
return inode;
}
--
2.53.0
next prev parent reply other threads:[~2026-04-20 13:32 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260420132314.1023554-1-sashal@kernel.org>
2026-04-20 13:17 ` [PATCH AUTOSEL 7.0-6.12] exfat: fix s_maxbytes Sasha Levin
2026-04-20 13:17 ` [PATCH AUTOSEL 7.0-5.10] hfsplus: fix generic/642 failure Sasha Levin
2026-04-20 13:17 ` [PATCH AUTOSEL 7.0-5.10] virtiofs: add FUSE protocol validation Sasha Levin
2026-04-20 13:17 ` [PATCH AUTOSEL 7.0-5.10] exfat: Fix bitwise operation having different size Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.1] fuse: validate outarg offset and size in notify store/retrieve Sasha Levin
2026-04-20 13:20 ` [PATCH AUTOSEL 7.0-5.10] exfat: use truncate_inode_pages_final() at evict_inode() Sasha Levin
2026-04-20 13:20 ` [PATCH AUTOSEL 7.0-5.10] affs: bound hash_pos before table lookup in affs_readdir Sasha Levin
2026-04-20 13:20 ` [PATCH AUTOSEL 6.18] eventpoll: defer struct eventpoll free to RCU grace period Sasha Levin
2026-04-20 13:21 ` Sasha Levin [this message]
2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-5.15] fuse: mark DAX inode releases as blocking Sasha Levin
2026-04-20 15:09 ` Darrick J. Wong
2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-6.12] exfat: fix incorrect directory checksum after rename to shorter name Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260420132314.1023554-270-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=hbirthelmer@ddn.com \
--cc=joannelkoong@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=mszeredi@redhat.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox