From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: robbieko <robbieko@synology.com>,
Johannes Thumshirn <johannes.thumshirn@wdc.com>,
David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>,
clm@fb.com, linux-btrfs@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-6.18] btrfs: fix raid stripe search missing entries at leaf boundaries
Date: Tue, 5 May 2026 05:51:41 -0400 [thread overview]
Message-ID: <20260505095149.512052-25-sashal@kernel.org> (raw)
In-Reply-To: <20260505095149.512052-1-sashal@kernel.org>
From: robbieko <robbieko@synology.com>
[ Upstream commit 2aef5cb1dcf9b3e1be3895a6477dc065e618aab8 ]
In btrfs_delete_raid_extent(), the search key uses offset=0. When the
target stripe entry is the first item on a leaf, btrfs_search_slot()
may land on the previous leaf and decrementing the slot from nritems
still points to the wrong entry, causing the stripe extent to be
silently missed.
Fix this by searching with offset=(u64)-1 instead. Since no real stripe
entry has this offset, btrfs_search_slot() always returns 1 with the
slot pointing past the last matching objectid entry. Then unconditionally
decrement the slot with a proper slots[0]==0 early-exit check to handle
the case where no matching entry exists.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Decision Summary
Backport this to stable trees that contain the
`btrfs_delete_raid_extent()` `offset = 0` search logic, verified locally
in `v6.13` through `v7.0`. It is a small Btrfs filesystem metadata
correctness fix: a leaf-boundary search miss can leave a RAID stripe-
tree entry undeleted, and that tree is used later to derive read-side
physical stripe mappings.
The main caveat is scope: RAID stripe tree support is behind
`CONFIG_BTRFS_EXPERIMENTAL`, so the affected population is config and
filesystem-feature specific. Still, for those users, the benefit
outweighs the low patch risk.
## Phase Walkthrough
Phase 1: Commit Message Forensics
- Step 1.1 Record: subsystem `btrfs`; action verb `fix`; intent is to
prevent RAID stripe-tree deletion from missing entries at B-tree leaf
boundaries.
- Step 1.2 Record: tags are `Reviewed-by: Johannes Thumshirn
<johannes.thumshirn@wdc.com>`, `Signed-off-by: robbieko
<robbieko@synology.com>`, `Signed-off-by: David Sterba
<dsterba@suse.com>`. No `Fixes:`, `Reported-by:`, `Tested-by:`, `Cc:
stable`, or `Link:` tags.
- Step 1.3 Record: the described bug is that searching with `offset = 0`
can land on the previous leaf when the target stripe item is the first
item in the next leaf; the existing decrement then inspects the wrong
entry and silently misses the stripe extent. No affected kernel
versions are stated in the message.
- Step 1.4 Record: this is not hidden cleanup; it is an explicit
correctness bug fix.
Phase 2: Diff Analysis
- Step 2.1 Record: one file changed, `fs/btrfs/raid-stripe-tree.c`,
+15/-3 in `btrfs_delete_raid_extent()`. Scope is single-function
surgical.
- Step 2.2 Record: before, the delete loop searched `(objectid=start,
type=RAID_STRIPE, offset=0)` and only decremented the slot when the
slot was at `nritems`. After, it searches with `offset=(u64)-1`,
checks `slots[0] == 0`, then always decrements to the last candidate
item.
- Step 2.3 Record: bug category is logic/correctness in B-tree search
positioning. The fix uses Btrfs key ordering by `objectid`, then
`type`, then `offset`, and `btrfs_bin_search()` semantics where a
missing key returns the insertion slot.
- Step 2.4 Record: fix quality is good for valid filesystem state:
small, local, no API changes. Regression risk is low. One review
concern exists: David Sterba noted that an exact corrupt
`offset=(u64)-1` key should ideally be handled as `-EUCLEAN`; that is
robustness against corrupted/fuzzed images, not the normal valid-state
bug being fixed.
Phase 3: Git History Investigation
- Step 3.1 Record: blame shows the delete function was introduced by
`ca41504efda646` in `v6.7`, but the specific `key.offset = 0` logic
and `nritems` decrement pattern were introduced by `6aea95ee318890`
(`btrfs: implement partial deletion of RAID stripe extents`), first
contained in `v6.13`.
- Step 3.2 Record: no `Fixes:` tag, so no tagged original commit to
follow. Blame nevertheless identifies `6aea95ee318890` as the relevant
introducer.
- Step 3.3 Record: related recent commits in master are a six-patch
deletion-path fix series: copy missing devid, this leaf-boundary
search fix, wrong `btrfs_previous_item()` min objectid, ASSERT-to-
error handling, `-EAGAIN`/stale leaf handling, and return-value
checking.
- Step 3.4 Record: author `robbieko` authored all six related RAID
stripe-tree deletion fixes in that series. I did not verify them as a
subsystem maintainer; David Sterba committed the series and Johannes
reviewed this patch.
- Step 3.5 Record: candidate applies standalone to the current `7.0.3`
worktree via `git apply --check`; companion patches are related fixes
but not prerequisites for this hunk.
Phase 4: Mailing List And External Research
- Step 4.1 Record: `b4 dig -c 2aef5cb1dcf9b` found the original
submission at `https://patch.msgid.link/20260413065249.2320122-3-
robbieko@synology.com`. `b4 dig -a` found only v1 of the six-patch
series.
- Step 4.2 Record: `b4 dig -w` showed original recipients were
`robbieko` and `linux-btrfs@vger.kernel.org`. The thread later
includes Johannes Thumshirn and David Sterba review/maintainer
feedback.
- Step 4.3 Record: no separate bug report or syzbot report was linked.
- Step 4.4 Record: series context confirms this is patch 2/6 in “fix
multiple bugs in raid-stripe-tree deletion path.” Johannes asked for
tests for the conditions; Johannes also gave `Reviewed-by` on this
patch. David later said the series was added to `for-next`.
- Step 4.5 Record: direct lore stable search was blocked by Anubis; web
search did not find stable-specific discussion. This remains
unverified.
Phase 5: Code Semantic Analysis
- Step 5.1 Record: modified function is `btrfs_delete_raid_extent()`.
- Step 5.2 Record: callers are `do_free_extent_accounting()` in
`fs/btrfs/extent-tree.c` and RAID stripe-tree selftests.
`do_free_extent_accounting()` is reached when a data extent’s refs
drop to zero in `__btrfs_free_extent()`.
- Step 5.3 Record: key callees include `btrfs_search_slot()`,
`btrfs_item_key_to_cpu()`, `btrfs_del_item()`,
`btrfs_previous_item()`, and partial deletion helpers.
- Step 5.4 Record: user reachability is through Btrfs extent
deletion/accounting paths; `file.c` calls `btrfs_free_extent()` while
dropping file extents. Exact VFS entrypoints were not fully traced,
but the path is filesystem-operation reachable on RST-enabled
filesystems.
- Step 5.5 Record: similar verified patterns include
`btrfs_search_prev_slot()` checking `slot == 0` before decrementing.
`zoned.c` also demonstrates the Btrfs pattern of treating impossible
exact matches as `-EUCLEAN`, matching David’s review concern.
Phase 6: Stable Tree Analysis
- Step 6.1 Record: `v6.6` lacks this file/feature; `v6.12` has
`key.offset = length`; `v6.13` through `v7.0` have `key.offset = 0`
plus the vulnerable `nritems` decrement. Candidate is in `v7.1-rc2`,
not in `v7.0`.
- Step 6.2 Record: backport difficulty is clean for current `7.0.3` by
`git apply --check`; older affected trees appear to have the same
local search pattern, but exact apply was not checked on separate
worktrees.
- Step 6.3 Record: no alternate stable fix for the same subject was
found locally; stable-list search was blocked, so external stable
history is partially unverified.
Phase 7: Subsystem Context
- Step 7.1 Record: subsystem is Btrfs filesystem, specifically RAID
stripe tree. Criticality is important for users of that filesystem
feature, but not universal.
- Step 7.2 Record: file history shows active development and several
recent bug fixes in the same deletion path, indicating this code is
relatively new and still being hardened.
Phase 8: Impact And Risk
- Step 8.1 Record: affected population is config-specific and feature-
specific: Btrfs filesystems using `RAID_STRIPE_TREE`, which is
supported only when `CONFIG_BTRFS_EXPERIMENTAL` is enabled.
- Step 8.2 Record: trigger is deleting/freeing data extents where the
matching stripe item is first in a leaf. Commonness of the leaf-
boundary condition is data-layout dependent. User triggering is
partially verified via file extent deletion paths, subject to
filesystem permissions and the RST feature being enabled.
- Step 8.3 Record: failure mode is stale/missed RAID stripe-tree
deletion. Severity is high for affected users because
`btrfs_get_raid_extent_offset()` is used for read mapping and sets
`stripe->physical` from stripe-tree contents.
- Step 8.4 Record: benefit is medium-high for affected RST users; risk
is low because the change is +15/-3, one function, no ABI/API change,
and accepted by Btrfs maintainers.
Phase 9: Final Synthesis
- Step 9.1 Record: evidence for backporting: real metadata correctness
bug, verified affected stable code in `v6.13`-`v7.0`, small fix,
reviewed by Johannes, committed by David, clean apply to current
stable worktree. Evidence against: experimental/config-specific
feature, no `Reported-by`/`Tested-by`, no explicit stable nomination,
companion fixes in same area, and one unaddressed corrupt-image
robustness comment.
- Step 9.2 Record: stable rules: obviously correct for valid state yes;
fixes real bug yes; important issue yes for RST metadata/read mapping
correctness; small and contained yes; no new feature/API yes; applies
to current stable tree yes.
- Step 9.3 Record: no exception category like device ID, quirk, DT,
build, or docs applies.
- Step 9.4 Record: decision is YES, limited to stable trees containing
the `offset = 0` deletion search logic.
## Verification
- Phase 1: Parsed `git show 2aef5cb1dcf9b` and confirmed tags, subject,
message, and +15/-3 diff.
- Phase 2: Read `fs/btrfs/raid-stripe-tree.c` and confirmed the exact
modified code in `btrfs_delete_raid_extent()`.
- Phase 3: Ran blame around the changed lines; confirmed
`6aea95ee318890` introduced `offset = 0`, and tag containment starts
at `v6.13`.
- Phase 3: Inspected `ca41504efda646` and confirmed the function existed
earlier with a different `offset = length` search.
- Phase 4: `b4 dig -c 2aef5cb1dcf9b` found the patch thread; `-a` found
only v1; `-w` showed original recipients.
- Phase 4: Saved and searched the mbox; confirmed Johannes’ `Reviewed-
by`, test request, David’s `EUCLEAN` robustness comment, and David’s
“Added to for-next”.
- Phase 5: Used code search and reads to trace
`btrfs_delete_raid_extent()` from `do_free_extent_accounting()` and
`__btrfs_free_extent()`, with file deletion paths calling
`btrfs_free_extent()`.
- Phase 5: Verified read-side RST use: `set_io_stripe()` calls
`btrfs_get_raid_extent_offset()` for reads, and that function sets
`stripe->physical` from the stripe-tree item.
- Phase 6: Checked historical tags: `v6.12` uses `offset = length`;
`v6.13`-`v7.0` use `offset = 0` and the vulnerable slot handling.
- Phase 6: Ran `git apply --check` for the upstream patch against
current `7.0.3`; it succeeded.
- UNVERIFIED: Direct lore stable search was blocked by Anubis, so
stable-list discussion could not be confirmed.
- UNVERIFIED: Exact apply status on every older affected stable tree was
not checked in separate worktrees.
**YES**
fs/btrfs/raid-stripe-tree.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index d2b8995febec9..dd924048c6659 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -95,14 +95,26 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
while (1) {
key.objectid = start;
key.type = BTRFS_RAID_STRIPE_KEY;
- key.offset = 0;
+ key.offset = (u64)-1;
ret = btrfs_search_slot(trans, stripe_root, &key, path, -1, 1);
if (ret < 0)
break;
- if (path->slots[0] == btrfs_header_nritems(path->nodes[0]))
- path->slots[0]--;
+ /*
+ * Search with offset=(u64)-1 ensures we land on the correct
+ * leaf even when the target entry is the first item on a leaf.
+ * Since no real entry has offset=(u64)-1, ret is always 1 and
+ * slot points past the last entry with objectid==start (or
+ * past the end of the leaf if that entry is the last item).
+ * Back up one slot to find the actual entry.
+ */
+ if (path->slots[0] == 0) {
+ /* No entry with objectid <= start exists. */
+ ret = 0;
+ break;
+ }
+ path->slots[0]--;
leaf = path->nodes[0];
slot = path->slots[0];
--
2.53.0
next prev parent reply other threads:[~2026-05-05 9:53 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] drm/amd/pm: Update emit clock logic Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] smb: client: change allocation requirements in smb2_compound_op Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] dpll: export __dpll_pin_change_ntf() for use under dpll_lock Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] ASoC: spacemit: move hw constraints from hw_params to startup Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] rculist: add list_splice_rcu() for private lists Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] ALSA: hda/realtek: enable mute LED support on ThinkBook 16p Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] mailbox: cix: Add IRQF_NO_SUSPEND to mailbox interrupt Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.12] ASoC: codecs: wcd937x: fix AUX PA sequencing and mixer controls Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: replace ASSERT with proper error handling in stripe lookup fallback Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] btrfs: handle unexpected free-space-tree key types Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] md/raid5: Fix UAF on IO across the reshape position Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.6] btrfs: apply first key check for readahead when possible Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.6] ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix wrong min_objectid in btrfs_previous_item() call Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: check return value of btrfs_partially_delete_raid_extent() Sasha Levin
2026-05-05 9:51 ` Sasha Levin [this message]
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: copy devid in btrfs_partially_delete_raid_extent() Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] btrfs: abort transaction in do_remap_reloc_trans() on failure Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] drm/amdkfd: check if vm ready in svm map and unmap to gpu Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260505095149.512052-25-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=clm@fb.com \
--cc=dsterba@suse.com \
--cc=johannes.thumshirn@wdc.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=patches@lists.linux.dev \
--cc=robbieko@synology.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox