Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Michal Grzedzicki <mge@meta.com>, Qu Wenruo <wqu@suse.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Mark Harmstone <mark@harmstone.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>,
	clm@fb.com, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-6.12] btrfs: fix silent IO error loss in encoded writes and zoned split
Date: Mon, 20 Apr 2026 09:21:14 -0400	[thread overview]
Message-ID: <20260420132314.1023554-280-sashal@kernel.org> (raw)
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Michal Grzedzicki <mge@meta.com>

[ Upstream commit 3cd181cc46d36aa7bd4af85f14639d86a25beaec ]

can_finish_ordered_extent() and btrfs_finish_ordered_zoned() set
BTRFS_ORDERED_IOERR via bare set_bit(). Later,
btrfs_mark_ordered_extent_error() in btrfs_finish_one_ordered() uses
test_and_set_bit(), finds it already set, and skips
mapping_set_error(). The error is never recorded on the inode's
address_space, making it invisible to fsync. For encoded writes this
causes btrfs receive to silently produce files with zero-filled holes.

Fix: replace bare set_bit(BTRFS_ORDERED_IOERR) with
btrfs_mark_ordered_extent_error() which pairs test_and_set_bit() with
mapping_set_error(), guaranteeing the error is recorded exactly once.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: Michal Grzedzicki <mge@meta.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed. Let me compile the full analysis.

---

# PHASE 1: COMMIT MESSAGE FORENSICS

## Step 1.1: PARSE THE SUBJECT LINE
- **Subsystem prefix**: `btrfs`
- **Action verb**: "fix"
- **Summary**: Fix silent IO error loss in encoded writes and zoned
  split
- Record: [btrfs] [fix] [silent IO error loss in encoded writes and
  zoned split paths]

## Step 1.2: PARSE ALL COMMIT MESSAGE TAGS
- **Reviewed-by**: Qu Wenruo `<wqu@suse.com>` — well-known btrfs
  developer/maintainer
- **Reviewed-by**: Johannes Thumshirn `<johannes.thumshirn@wdc.com>` —
  prominent btrfs contributor
- **Reviewed-by**: Mark Harmstone `<mark@harmstone.com>` — btrfs
  developer
- **Signed-off-by**: Michal Grzedzicki `<mge@meta.com>` — author (Meta)
- **Signed-off-by**: David Sterba `<dsterba@suse.com>` — btrfs subsystem
  maintainer
- No Fixes: tag, no Cc: stable — expected for candidate review
- No Reported-by — likely found via code review/testing at Meta
- Record: 3 Reviewed-by from expert btrfs devs, signed off by subsystem
  maintainer.

## Step 1.3: ANALYZE THE COMMIT BODY TEXT
The commit describes a clear, precise bug mechanism:
1. `can_finish_ordered_extent()` and `btrfs_finish_ordered_zoned()` set
   `BTRFS_ORDERED_IOERR` via bare `set_bit()`.
2. Later, `btrfs_mark_ordered_extent_error()` in
   `btrfs_finish_one_ordered()` uses `test_and_set_bit()`, finds the bit
   already set, and **skips** `mapping_set_error()`.
3. The IO error is never recorded on the inode's address_space, making
   it invisible to `fsync`.
4. For encoded writes, `btrfs receive` silently produces files with
   zero-filled holes.

- **Failure mode**: Silent data corruption (zero-filled holes instead of
  actual data)
- **Root cause**: bare `set_bit()` pre-empts the `test_and_set_bit()` in
  the helper that actually records the error
- Record: [Silent data loss bug] [fsync misses IO errors] [Encoded write
  files get zero-filled holes] [Author clearly explains the root cause
  mechanism]

## Step 1.4: DETECT HIDDEN BUG FIXES
This is an explicit bug fix, not hidden. The subject and body directly
describe a data integrity bug.
Record: [Not a hidden fix — explicitly described as a data loss fix]

---

# PHASE 2: DIFF ANALYSIS - LINE BY LINE

## Step 2.1: INVENTORY THE CHANGES
- `fs/btrfs/ordered-data.c`: 1 line changed (line 388)
- `fs/btrfs/zoned.c`: 1 line changed (line 2139)
- Total: 2 lines changed, 0 lines added/removed net
- Functions modified: `can_finish_ordered_extent()`,
  `btrfs_finish_ordered_zoned()`
- Record: [2 files, 2 lines changed, single-file-equivalent surgical
  fix]

## Step 2.2: UNDERSTAND THE CODE FLOW CHANGE
**Hunk 1** (`ordered-data.c:388`):
- Before: `set_bit(BTRFS_ORDERED_IOERR, &ordered->flags)` — sets flag
  only
- After: `btrfs_mark_ordered_extent_error(ordered)` — sets flag AND
  calls `mapping_set_error()`

**Hunk 2** (`zoned.c:2139`):
- Before: `set_bit(BTRFS_ORDERED_IOERR, &ordered->flags)` — sets flag
  only
- After: `btrfs_mark_ordered_extent_error(ordered)` — sets flag AND
  calls `mapping_set_error()`
- Record: [Both hunks: bare set_bit → helper that also records error on
  mapping]

## Step 2.3: IDENTIFY THE BUG MECHANISM
**Category**: Logic / correctness bug → silent data corruption

The bug is a race between setting the error flag and recording the
error:
1. `can_finish_ordered_extent()` uses bare `set_bit()` to set
   `BTRFS_ORDERED_IOERR`
2. `btrfs_finish_one_ordered()` (line 3363) later calls
   `btrfs_mark_ordered_extent_error()`
3. `btrfs_mark_ordered_extent_error()` does `if (!test_and_set_bit(...))
   mapping_set_error()`
4. Since the bit was already set by step 1, step 3 thinks the error was
   already recorded and skips `mapping_set_error()`
5. But the `mapping_set_error()` was NEVER called — the bare `set_bit()`
   didn't do it

Record: [Logic/correctness bug] [test_and_set_bit finds bit already set,
skips recording error to mapping]

## Step 2.4: ASSESS THE FIX QUALITY
- **Obviously correct**: YES. The helper function
  `btrfs_mark_ordered_extent_error()` at line 336-340 does exactly
  what's needed:

```336:340:fs/btrfs/ordered-data.c
void btrfs_mark_ordered_extent_error(struct btrfs_ordered_extent
*ordered)
{
        if (!test_and_set_bit(BTRFS_ORDERED_IOERR, &ordered->flags))
                mapping_set_error(ordered->inode->vfs_inode.i_mapping,
-EIO);
}
```

- **Minimal/surgical**: YES. 2 lines, identical transformation pattern.
- **Regression risk**: Essentially zero. The helper does a superset of
  what the bare `set_bit()` did: same flag setting + additional error
  recording. The `test_and_set_bit()` ensures `mapping_set_error()` is
  called at most once.
- Record: [Obviously correct, minimal, zero regression risk]

---

# PHASE 3: GIT HISTORY INVESTIGATION

## Step 3.1: BLAME THE CHANGED LINES
- **ordered-data.c:388**: Introduced by commit `53df25869a5659`
  (Christoph Hellwig, 2023-05-31) — "btrfs: factor out a
  can_finish_ordered_extent helper". Present since v6.5.
- **zoned.c:2139**: Introduced by commit `71df088c1cc090` (Christoph
  Hellwig, 2023-05-24) — "btrfs: defer splitting of ordered extents
  until I/O completion". Present since v6.5.
- **Helper function** (`btrfs_mark_ordered_extent_error()`): Introduced
  by commit `aa5ccf29173acf` (Josef Bacik, 2024-04-03) — "btrfs: handle
  errors in btrfs_reloc_clone_csums properly". Present since v6.10.

Record: [Buggy code introduced in v6.5 by refactoring commits; helper
exists from v6.10]

## Step 3.2: FOLLOW THE FIXES: TAG
No Fixes: tag present — expected. The commits that introduced the bare
`set_bit()` calls (`53df25869a5659` and `71df088c1cc090`) are the
implicit "fixes targets."
Record: [Implicit fixes: 53df25869a5659 and 71df088c1cc090, both in
v6.5+]

## Step 3.3: CHECK FILE HISTORY FOR RELATED CHANGES
Recent ordered-data.c changes are refactoring (folio conversion, lock
relaxation). No conflicting changes to the `can_finish_ordered_extent()`
function in this area. The fix is self-contained.
Record: [No conflicting recent changes; fix is standalone]

## Step 3.4: CHECK THE AUTHOR'S OTHER COMMITS
Michal Grzedzicki from Meta. Other commits in this tree are in SCSI.
This appears to be a cross-subsystem contributor who found the bug
likely through `btrfs receive` usage at Meta.
Record: [Author is from Meta, likely found bug through production btrfs
receive usage]

## Step 3.5: CHECK FOR DEPENDENT/PREREQUISITE COMMITS
The helper `btrfs_mark_ordered_extent_error()` already exists in the 7.0
tree (and all trees since v6.10). The fix has NO dependencies for v6.10+
trees. For v6.6-v6.9, the helper would need to be backported first.
Record: [No dependencies for 7.0 tree; for older stable trees, helper
(aa5ccf29173acf) may be needed]

---

# PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

## Step 4.1: FIND THE ORIGINAL PATCH DISCUSSION
b4 dig did not find a match (commit not yet in mainline at time of
search). Web search also did not locate the specific patch thread. This
is likely a recently submitted patch that was accepted into David
Sterba's btrfs tree but not yet pushed to mainline.
Record: [Could not locate lore discussion — likely very recent
submission]

## Step 4.2: CHECK WHO REVIEWED THE PATCH
Three Reviewed-by tags from Qu Wenruo, Johannes Thumshirn, and Mark
Harmstone — three core btrfs developers. David Sterba (btrfs maintainer)
signed off, indicating it passed through the official btrfs tree.
Record: [Reviewed by 3 expert btrfs developers; signed off by
maintainer]

## Step 4.3-4.5: SEARCH FOR BUG REPORT / RELATED PATCHES / STABLE
HISTORY
No external bug report found. The commit message specifically mentions
`btrfs receive` producing files with zero-filled holes, suggesting this
was found in a production environment at Meta.
Record: [Likely production-found bug at Meta; no external reports found]

---

# PHASE 5: CODE SEMANTIC ANALYSIS

## Step 5.1: IDENTIFY KEY FUNCTIONS
- `can_finish_ordered_extent()` — processes ordered extent completion
- `btrfs_finish_ordered_zoned()` — handles zoned write completion
- `btrfs_mark_ordered_extent_error()` — the helper being used as the fix

## Step 5.2: TRACE CALLERS
`can_finish_ordered_extent()` is called by:
- `btrfs_finish_ordered_extent()` — called from IO completion paths (bio
  endio)
- `btrfs_mark_ordered_io_finished()` — called from writeback paths

`btrfs_finish_ordered_zoned()` is called by:
- `btrfs_finish_ordered_io()` — the main ordered extent completion
  function

These are core IO completion paths — every data write goes through them.
Record: [Core IO completion path; called for every write operation]

## Step 5.3-5.4: TRACE CALLEES AND CALL CHAIN
The encoded write path is: `btrfs_do_encoded_write()` →
`btrfs_submit_compressed_write()` → bio completion →
`end_bbio_compressed_write()` → `btrfs_finish_ordered_extent()` →
`can_finish_ordered_extent()`. This is the path that `btrfs receive`
uses.
Record: [btrfs receive → encoded write → bio completion → buggy
function]

## Step 5.5: SEARCH FOR SIMILAR PATTERNS
The third `set_bit(BTRFS_ORDERED_IOERR)` in `disk-io.c:4598` is in
`btrfs_destroy_ordered_extents()`, a cleanup path during filesystem
abort/unmount. This is intentionally different — during umount there's
no fsync concern, so bare `set_bit()` is acceptable there.
Record: [disk-io.c case is in cleanup path, doesn't need the fix]

---

# PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

## Step 6.1: DOES THE BUGGY CODE EXIST IN STABLE TREES?
- Buggy `set_bit()` calls: present since v6.5
- Helper function: present since v6.10
- For 7.0 tree: both the buggy code AND the helper exist. Fix applies
  directly.
- For v6.12.y, v6.6.y: buggy code exists; v6.12 has the helper, v6.6
  does not.
Record: [Bug exists in 7.0 tree; fix applies cleanly]

## Step 6.2: CHECK FOR BACKPORT COMPLICATIONS
The fix is a trivial 2-line substitution. No contextual conflicts
expected for 7.0.
Record: [Clean apply expected for 7.0]

## Step 6.3: CHECK IF RELATED FIXES ARE ALREADY IN STABLE
No related fixes found in this tree for this specific issue.
Record: [No existing fix in stable]

---

# PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

## Step 7.1: IDENTIFY THE SUBSYSTEM AND ITS CRITICALITY
- **Subsystem**: fs/btrfs — filesystem
- **Criticality**: IMPORTANT — btrfs is widely used, and data integrity
  is its primary value proposition
Record: [btrfs filesystem, IMPORTANT criticality]

## Step 7.2: ASSESS SUBSYSTEM ACTIVITY
The btrfs subsystem is very actively developed (48+ commits since v6.6
in ordered-data.c alone).
Record: [Very active subsystem]

---

# PHASE 8: IMPACT AND RISK ASSESSMENT

## Step 8.1: DETERMINE WHO IS AFFECTED
- All btrfs users who encounter IO errors during:
  1. Encoded writes (`btrfs receive` with stream v2/v3)
  2. Zoned device writes where ordered extent splitting fails
Record: [btrfs receive users, zoned device users]

## Step 8.2: DETERMINE THE TRIGGER CONDITIONS
- **Encoded writes**: Any IO error during `btrfs receive` (e.g., disk
  error, corruption)
- **Zoned split**: Memory allocation failure during zoned ordered extent
  splitting
- These are not obscure conditions — disk errors happen, and memory
  pressure with `btrfs receive` on large datasets is common in
  production
Record: [IO error during btrfs receive or zoned write; realistic trigger
conditions]

## Step 8.3: DETERMINE THE FAILURE MODE SEVERITY
- **Silent data corruption**: Files produced by `btrfs receive` have
  zero-filled holes where data should be
- `fsync` returns success because `mapping_set_error()` was never called
- User believes data is safely written when it is not
- **Severity: CRITICAL** — silent data loss is the worst possible
  outcome for a filesystem
Record: [Silent data loss; fsync returns success with corrupted data;
CRITICAL]

## Step 8.4: CALCULATE RISK-BENEFIT RATIO
- **BENEFIT**: Very high — prevents silent data corruption for `btrfs
  receive` users
- **RISK**: Very low — 2-line change, replacing bare `set_bit()` with
  existing helper that does strictly more, obviously correct
Record: [Very high benefit, very low risk — strongly favorable ratio]

---

# PHASE 9: FINAL SYNTHESIS

## Step 9.1: COMPILE THE EVIDENCE

**Evidence FOR backporting:**
- Fixes silent data corruption (the most critical bug category for
  filesystems)
- Extremely small and surgical (2 lines)
- Uses existing well-tested helper function
- Obviously correct by code inspection
- 3 Reviewed-by from expert btrfs developers
- Signed off by btrfs subsystem maintainer (David Sterba)
- Bug affects `btrfs receive` — a critical data migration/backup tool
- All needed infrastructure (the helper) already exists in the 7.0 tree
- Zero regression risk

**Evidence AGAINST backporting:**
- None identified

**Unresolved questions:**
- Could not access lore.kernel.org to read full patch discussion (Anubis
  bot protection)
- Could not find the commit in mainline (likely very recently merged or
  in-flight to next merge window)

## Step 9.2: APPLY THE STABLE RULES CHECKLIST
1. **Obviously correct and tested?** YES — trivial substitution of bare
   `set_bit()` with existing helper; 3 reviews
2. **Fixes a real bug?** YES — silent data loss in `btrfs receive`
3. **Important issue?** YES — data corruption (CRITICAL severity)
4. **Small and contained?** YES — 2 lines, 2 files
5. **No new features or APIs?** YES — uses existing helper
6. **Can apply to stable trees?** YES — clean apply for 7.0 (helper
   exists)

## Step 9.3: CHECK FOR EXCEPTION CATEGORIES
Not an exception category — this is a standard bug fix, and a strong
one.

## Step 9.4: DECISION
This is a textbook stable backport candidate: a tiny, obviously correct
fix for silent data corruption in a widely used filesystem feature
(`btrfs receive`). It uses existing infrastructure, has been thoroughly
reviewed, and carries essentially zero regression risk.

---

# Verification

- [Phase 1] Parsed tags: 3 Reviewed-by (Qu Wenruo, Johannes Thumshirn,
  Mark Harmstone), 2 Signed-off-by (author + David Sterba maintainer)
- [Phase 2] Diff analysis: 2 lines changed —
  `set_bit(BTRFS_ORDERED_IOERR)` → `btrfs_mark_ordered_extent_error()`
  in two locations
- [Phase 2] Verified `btrfs_mark_ordered_extent_error()` at line 336-340
  does `test_and_set_bit() + mapping_set_error()` — confirmed exact
  mechanism described in commit message
- [Phase 2] Verified `btrfs_finish_one_ordered()` at inode.c:3363 calls
  `btrfs_mark_ordered_extent_error()` which finds bit already set and
  skips `mapping_set_error()` — confirmed the "double-set" bug path
- [Phase 3] git blame: `ordered-data.c:388` introduced by 53df25869a5659
  (Christoph Hellwig, v6.5)
- [Phase 3] git blame: `zoned.c:2139` introduced by 71df088c1cc090
  (Christoph Hellwig, v6.5)
- [Phase 3] git show aa5ccf29173acf: confirmed
  `btrfs_mark_ordered_extent_error()` introduced in v6.10 by Josef Bacik
- [Phase 3] git merge-base: confirmed helper exists in v6.10+, buggy
  code in v6.5+
- [Phase 4] b4 dig: no match found (likely very recent patch not yet
  indexed)
- [Phase 4] UNVERIFIED: Could not access lore.kernel.org discussion due
  to bot protection
- [Phase 5] Traced call chain: `btrfs_do_encoded_write()` → bio
  completion → `btrfs_finish_ordered_extent()` →
  `can_finish_ordered_extent()` — confirmed encoded writes reach buggy
  code
- [Phase 5] Verified disk-io.c:4598 `set_bit()` is in
  `btrfs_destroy_ordered_extents()` (cleanup path) — correctly not fixed
- [Phase 5] Verified 3 remaining bare `set_bit(BTRFS_ORDERED_IOERR)`
  calls: 2 fixed by this commit, 1 in cleanup path (acceptable)
- [Phase 6] Confirmed buggy code exists in 7.0 tree (read both files
  directly)
- [Phase 6] Confirmed fix applies cleanly — helper exists, code context
  matches
- [Phase 8] Failure mode: silent data loss (zero-filled holes) —
  CRITICAL severity

**YES**

 fs/btrfs/ordered-data.c | 2 +-
 fs/btrfs/zoned.c        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 5df02c707aee6..b65c1f1e2956e 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -385,7 +385,7 @@ static bool can_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
 	}
 
 	if (!uptodate)
-		set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
+		btrfs_mark_ordered_extent_error(ordered);
 
 	if (ordered->bytes_left)
 		return false;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 0cd7fd3fcfa3a..d728c3bafc092 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2136,7 +2136,7 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
 			continue;
 		}
 		if (!btrfs_zoned_split_ordered(ordered, logical, len)) {
-			set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
+			btrfs_mark_ordered_extent_error(ordered);
 			btrfs_err(fs_info, "failed to split ordered extent");
 			goto out;
 		}
-- 
2.53.0


      parent reply	other threads:[~2026-04-20 13:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260420132314.1023554-1-sashal@kernel.org>
2026-04-20 13:16 ` [PATCH AUTOSEL 7.0-5.15] btrfs: don't allow log trees to consume global reserve or overcommit metadata Sasha Levin
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0] btrfs: avoid GFP_ATOMIC allocations in qgroup free paths Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] btrfs: replace BUG_ON() with error return in cache_save_setup() Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] btrfs: be less aggressive with metadata overcommit when we can do full flushing Sasha Levin
2026-04-22 12:24   ` Aleksandar Gerasimovski
2026-04-22 12:28     ` Aleksandar Gerasimovski
2026-04-22 19:14     ` David Sterba
2026-04-20 13:20 ` [PATCH AUTOSEL 6.18] btrfs: fix zero size inode with non-zero size after log replay Sasha Levin
2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-6.19] btrfs: zoned: cap delayed refs metadata reservation to avoid overcommit Sasha Levin
2026-04-20 13:21 ` Sasha Levin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260420132314.1023554-280-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark@harmstone.com \
    --cc=mge@meta.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox