public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: robbieko <robbieko@synology.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>,
	clm@fb.com, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-6.18] btrfs: fix raid stripe search missing entries at leaf boundaries
Date: Tue,  5 May 2026 05:51:41 -0400	[thread overview]
Message-ID: <20260505095149.512052-25-sashal@kernel.org> (raw)
In-Reply-To: <20260505095149.512052-1-sashal@kernel.org>

From: robbieko <robbieko@synology.com>

[ Upstream commit 2aef5cb1dcf9b3e1be3895a6477dc065e618aab8 ]

In btrfs_delete_raid_extent(), the search key uses offset=0. When the
target stripe entry is the first item on a leaf, btrfs_search_slot()
may land on the previous leaf and decrementing the slot from nritems
still points to the wrong entry, causing the stripe extent to be
silently missed.

Fix this by searching with offset=(u64)-1 instead. Since no real stripe
entry has this offset, btrfs_search_slot() always returns 1 with the
slot pointing past the last matching objectid entry. Then unconditionally
decrement the slot with a proper slots[0]==0 early-exit check to handle
the case where no matching entry exists.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Decision Summary
Backport this to stable trees that contain the
`btrfs_delete_raid_extent()` `offset = 0` search logic, verified locally
in `v6.13` through `v7.0`. It is a small Btrfs filesystem metadata
correctness fix: a leaf-boundary search miss can leave a RAID stripe-
tree entry undeleted, and that tree is used later to derive read-side
physical stripe mappings.

The main caveat is scope: RAID stripe tree support is behind
`CONFIG_BTRFS_EXPERIMENTAL`, so the affected population is config and
filesystem-feature specific. Still, for those users, the benefit
outweighs the low patch risk.

## Phase Walkthrough
Phase 1: Commit Message Forensics

- Step 1.1 Record: subsystem `btrfs`; action verb `fix`; intent is to
  prevent RAID stripe-tree deletion from missing entries at B-tree leaf
  boundaries.
- Step 1.2 Record: tags are `Reviewed-by: Johannes Thumshirn
  <johannes.thumshirn@wdc.com>`, `Signed-off-by: robbieko
  <robbieko@synology.com>`, `Signed-off-by: David Sterba
  <dsterba@suse.com>`. No `Fixes:`, `Reported-by:`, `Tested-by:`, `Cc:
  stable`, or `Link:` tags.
- Step 1.3 Record: the described bug is that searching with `offset = 0`
  can land on the previous leaf when the target stripe item is the first
  item in the next leaf; the existing decrement then inspects the wrong
  entry and silently misses the stripe extent. No affected kernel
  versions are stated in the message.
- Step 1.4 Record: this is not hidden cleanup; it is an explicit
  correctness bug fix.

Phase 2: Diff Analysis

- Step 2.1 Record: one file changed, `fs/btrfs/raid-stripe-tree.c`,
  +15/-3 in `btrfs_delete_raid_extent()`. Scope is single-function
  surgical.
- Step 2.2 Record: before, the delete loop searched `(objectid=start,
  type=RAID_STRIPE, offset=0)` and only decremented the slot when the
  slot was at `nritems`. After, it searches with `offset=(u64)-1`,
  checks `slots[0] == 0`, then always decrements to the last candidate
  item.
- Step 2.3 Record: bug category is logic/correctness in B-tree search
  positioning. The fix uses Btrfs key ordering by `objectid`, then
  `type`, then `offset`, and `btrfs_bin_search()` semantics where a
  missing key returns the insertion slot.
- Step 2.4 Record: fix quality is good for valid filesystem state:
  small, local, no API changes. Regression risk is low. One review
  concern exists: David Sterba noted that an exact corrupt
  `offset=(u64)-1` key should ideally be handled as `-EUCLEAN`; that is
  robustness against corrupted/fuzzed images, not the normal valid-state
  bug being fixed.

Phase 3: Git History Investigation

- Step 3.1 Record: blame shows the delete function was introduced by
  `ca41504efda646` in `v6.7`, but the specific `key.offset = 0` logic
  and `nritems` decrement pattern were introduced by `6aea95ee318890`
  (`btrfs: implement partial deletion of RAID stripe extents`), first
  contained in `v6.13`.
- Step 3.2 Record: no `Fixes:` tag, so no tagged original commit to
  follow. Blame nevertheless identifies `6aea95ee318890` as the relevant
  introducer.
- Step 3.3 Record: related recent commits in master are a six-patch
  deletion-path fix series: copy missing devid, this leaf-boundary
  search fix, wrong `btrfs_previous_item()` min objectid, ASSERT-to-
  error handling, `-EAGAIN`/stale leaf handling, and return-value
  checking.
- Step 3.4 Record: author `robbieko` authored all six related RAID
  stripe-tree deletion fixes in that series. I did not verify them as a
  subsystem maintainer; David Sterba committed the series and Johannes
  reviewed this patch.
- Step 3.5 Record: candidate applies standalone to the current `7.0.3`
  worktree via `git apply --check`; companion patches are related fixes
  but not prerequisites for this hunk.

Phase 4: Mailing List And External Research

- Step 4.1 Record: `b4 dig -c 2aef5cb1dcf9b` found the original
  submission at `https://patch.msgid.link/20260413065249.2320122-3-
  robbieko@synology.com`. `b4 dig -a` found only v1 of the six-patch
  series.
- Step 4.2 Record: `b4 dig -w` showed original recipients were
  `robbieko` and `linux-btrfs@vger.kernel.org`. The thread later
  includes Johannes Thumshirn and David Sterba review/maintainer
  feedback.
- Step 4.3 Record: no separate bug report or syzbot report was linked.
- Step 4.4 Record: series context confirms this is patch 2/6 in “fix
  multiple bugs in raid-stripe-tree deletion path.” Johannes asked for
  tests for the conditions; Johannes also gave `Reviewed-by` on this
  patch. David later said the series was added to `for-next`.
- Step 4.5 Record: direct lore stable search was blocked by Anubis; web
  search did not find stable-specific discussion. This remains
  unverified.

Phase 5: Code Semantic Analysis

- Step 5.1 Record: modified function is `btrfs_delete_raid_extent()`.
- Step 5.2 Record: callers are `do_free_extent_accounting()` in
  `fs/btrfs/extent-tree.c` and RAID stripe-tree selftests.
  `do_free_extent_accounting()` is reached when a data extent’s refs
  drop to zero in `__btrfs_free_extent()`.
- Step 5.3 Record: key callees include `btrfs_search_slot()`,
  `btrfs_item_key_to_cpu()`, `btrfs_del_item()`,
  `btrfs_previous_item()`, and partial deletion helpers.
- Step 5.4 Record: user reachability is through Btrfs extent
  deletion/accounting paths; `file.c` calls `btrfs_free_extent()` while
  dropping file extents. Exact VFS entrypoints were not fully traced,
  but the path is filesystem-operation reachable on RST-enabled
  filesystems.
- Step 5.5 Record: similar verified patterns include
  `btrfs_search_prev_slot()` checking `slot == 0` before decrementing.
  `zoned.c` also demonstrates the Btrfs pattern of treating impossible
  exact matches as `-EUCLEAN`, matching David’s review concern.

Phase 6: Stable Tree Analysis

- Step 6.1 Record: `v6.6` lacks this file/feature; `v6.12` has
  `key.offset = length`; `v6.13` through `v7.0` have `key.offset = 0`
  plus the vulnerable `nritems` decrement. Candidate is in `v7.1-rc2`,
  not in `v7.0`.
- Step 6.2 Record: backport difficulty is clean for current `7.0.3` by
  `git apply --check`; older affected trees appear to have the same
  local search pattern, but exact apply was not checked on separate
  worktrees.
- Step 6.3 Record: no alternate stable fix for the same subject was
  found locally; stable-list search was blocked, so external stable
  history is partially unverified.

Phase 7: Subsystem Context

- Step 7.1 Record: subsystem is Btrfs filesystem, specifically RAID
  stripe tree. Criticality is important for users of that filesystem
  feature, but not universal.
- Step 7.2 Record: file history shows active development and several
  recent bug fixes in the same deletion path, indicating this code is
  relatively new and still being hardened.

Phase 8: Impact And Risk

- Step 8.1 Record: affected population is config-specific and feature-
  specific: Btrfs filesystems using `RAID_STRIPE_TREE`, which is
  supported only when `CONFIG_BTRFS_EXPERIMENTAL` is enabled.
- Step 8.2 Record: trigger is deleting/freeing data extents where the
  matching stripe item is first in a leaf. Commonness of the leaf-
  boundary condition is data-layout dependent. User triggering is
  partially verified via file extent deletion paths, subject to
  filesystem permissions and the RST feature being enabled.
- Step 8.3 Record: failure mode is stale/missed RAID stripe-tree
  deletion. Severity is high for affected users because
  `btrfs_get_raid_extent_offset()` is used for read mapping and sets
  `stripe->physical` from stripe-tree contents.
- Step 8.4 Record: benefit is medium-high for affected RST users; risk
  is low because the change is +15/-3, one function, no ABI/API change,
  and accepted by Btrfs maintainers.

Phase 9: Final Synthesis

- Step 9.1 Record: evidence for backporting: real metadata correctness
  bug, verified affected stable code in `v6.13`-`v7.0`, small fix,
  reviewed by Johannes, committed by David, clean apply to current
  stable worktree. Evidence against: experimental/config-specific
  feature, no `Reported-by`/`Tested-by`, no explicit stable nomination,
  companion fixes in same area, and one unaddressed corrupt-image
  robustness comment.
- Step 9.2 Record: stable rules: obviously correct for valid state yes;
  fixes real bug yes; important issue yes for RST metadata/read mapping
  correctness; small and contained yes; no new feature/API yes; applies
  to current stable tree yes.
- Step 9.3 Record: no exception category like device ID, quirk, DT,
  build, or docs applies.
- Step 9.4 Record: decision is YES, limited to stable trees containing
  the `offset = 0` deletion search logic.

## Verification
- Phase 1: Parsed `git show 2aef5cb1dcf9b` and confirmed tags, subject,
  message, and +15/-3 diff.
- Phase 2: Read `fs/btrfs/raid-stripe-tree.c` and confirmed the exact
  modified code in `btrfs_delete_raid_extent()`.
- Phase 3: Ran blame around the changed lines; confirmed
  `6aea95ee318890` introduced `offset = 0`, and tag containment starts
  at `v6.13`.
- Phase 3: Inspected `ca41504efda646` and confirmed the function existed
  earlier with a different `offset = length` search.
- Phase 4: `b4 dig -c 2aef5cb1dcf9b` found the patch thread; `-a` found
  only v1; `-w` showed original recipients.
- Phase 4: Saved and searched the mbox; confirmed Johannes’ `Reviewed-
  by`, test request, David’s `EUCLEAN` robustness comment, and David’s
  “Added to for-next”.
- Phase 5: Used code search and reads to trace
  `btrfs_delete_raid_extent()` from `do_free_extent_accounting()` and
  `__btrfs_free_extent()`, with file deletion paths calling
  `btrfs_free_extent()`.
- Phase 5: Verified read-side RST use: `set_io_stripe()` calls
  `btrfs_get_raid_extent_offset()` for reads, and that function sets
  `stripe->physical` from the stripe-tree item.
- Phase 6: Checked historical tags: `v6.12` uses `offset = length`;
  `v6.13`-`v7.0` use `offset = 0` and the vulnerable slot handling.
- Phase 6: Ran `git apply --check` for the upstream patch against
  current `7.0.3`; it succeeded.
- UNVERIFIED: Direct lore stable search was blocked by Anubis, so
  stable-list discussion could not be confirmed.
- UNVERIFIED: Exact apply status on every older affected stable tree was
  not checked in separate worktrees.

**YES**

 fs/btrfs/raid-stripe-tree.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index d2b8995febec9..dd924048c6659 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -95,14 +95,26 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 	while (1) {
 		key.objectid = start;
 		key.type = BTRFS_RAID_STRIPE_KEY;
-		key.offset = 0;
+		key.offset = (u64)-1;
 
 		ret = btrfs_search_slot(trans, stripe_root, &key, path, -1, 1);
 		if (ret < 0)
 			break;
 
-		if (path->slots[0] == btrfs_header_nritems(path->nodes[0]))
-			path->slots[0]--;
+		/*
+		 * Search with offset=(u64)-1 ensures we land on the correct
+		 * leaf even when the target entry is the first item on a leaf.
+		 * Since no real entry has offset=(u64)-1, ret is always 1 and
+		 * slot points past the last entry with objectid==start (or
+		 * past the end of the leaf if that entry is the last item).
+		 * Back up one slot to find the actual entry.
+		 */
+		if (path->slots[0] == 0) {
+			/* No entry with objectid <= start exists. */
+			ret = 0;
+			break;
+		}
+		path->slots[0]--;
 
 		leaf = path->nodes[0];
 		slot = path->slots[0];
-- 
2.53.0


  parent reply	other threads:[~2026-05-05  9:53 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] drm/amd/pm: Update emit clock logic Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] smb: client: change allocation requirements in smb2_compound_op Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] dpll: export __dpll_pin_change_ntf() for use under dpll_lock Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] ASoC: spacemit: move hw constraints from hw_params to startup Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] rculist: add list_splice_rcu() for private lists Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] ALSA: hda/realtek: enable mute LED support on ThinkBook 16p Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] mailbox: cix: Add IRQF_NO_SUSPEND to mailbox interrupt Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] ASoC: codecs: wcd937x: fix AUX PA sequencing and mixer controls Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: replace ASSERT with proper error handling in stripe lookup fallback Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] btrfs: handle unexpected free-space-tree key types Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] md/raid5: Fix UAF on IO across the reshape position Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.6] btrfs: apply first key check for readahead when possible Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.6] ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix wrong min_objectid in btrfs_previous_item() call Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: check return value of btrfs_partially_delete_raid_extent() Sasha Levin
2026-05-05  9:51 ` Sasha Levin [this message]
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: copy devid in btrfs_partially_delete_raid_extent() Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] btrfs: abort transaction in do_remap_reloc_trans() on failure Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] drm/amdkfd: check if vm ready in svm map and unmap to gpu Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260505095149.512052-25-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=robbieko@synology.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox