* [PATCH AUTOSEL 6.19-5.15] btrfs: replace BUG() with error handling in __btrfs_balance()
[not found] <20260223123738.1532940-1-sashal@kernel.org>
@ 2026-02-23 12:37 ` Sasha Levin
2026-02-23 12:37 ` [PATCH AUTOSEL 6.19] btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure() Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2026-02-23 12:37 UTC (permalink / raw)
To: patches, stable
Cc: Adarsh Das, Qu Wenruo, David Sterba, Sasha Levin, clm,
linux-btrfs, linux-kernel
From: Adarsh Das <adarshdas950@gmail.com>
[ Upstream commit be6324a809dbda76d5fdb23720ad9b20e5c1905c ]
We search with offset (u64)-1 which should never match exactly.
Previously this was handled with BUG(). Now logs an error
and return -EUCLEAN.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Adarsh Das <adarshdas950@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Analysis
### What the commit does
This commit replaces a `BUG()` call with proper error handling in
`__btrfs_balance()`. The `BUG()` is triggered when `btrfs_search_slot()`
returns an exact match (ret == 0) for key offset `(u64)-1`, which should
theoretically never happen. The existing code has a `/* FIXME break ?
*/` comment from 2012, indicating the developers always knew `BUG()` was
wrong here.
The fix:
1. Replaces `BUG()` with `btrfs_err()` logging + return `-EUCLEAN`
2. Properly releases `reclaim_bgs_lock` mutex before `goto error`
(fixing what would have been a mutex held across a panic)
3. Uses `unlikely()` to indicate this is an exceptional path
### Bug severity assessment
**The existing `BUG()` crashes the kernel** (panic/oops). While the
condition "should never happen," if it does occur (e.g., due to
filesystem corruption, a prior failed relocate as the comment says, or a
metadata inconsistency), the result is a full kernel crash instead of a
graceful error return. This is in the btrfs balance path, which is user-
triggered via `btrfs balance start`.
Key points:
- **BUG() = kernel crash** - This is a real fix that prevents a kernel
panic
- **User-triggerable**: The balance operation is initiated by userspace,
so a corrupted filesystem could trigger this crash
- **The fix is small and surgical**: Only changes the error handling for
one condition
- **Properly handles mutex**: The new code correctly unlocks
`reclaim_bgs_lock` before the error path
- **Well-reviewed**: Reviewed by Qu Wenruo and David Sterba (btrfs
maintainer)
- **BUG() has existed since 2012** (commit c9e9f97bdfb64d), affecting
all stable trees
### Stable criteria evaluation
- **Obviously correct**: Yes - replacing BUG() with error handling is
well-understood
- **Fixes a real bug**: Yes - a kernel crash/panic on a theoretically-
impossible-but-not-actually-impossible condition
- **Small and contained**: Yes - one file, simple logic change in a
single function
- **No new features**: Correct - pure error handling improvement
- **Risk**: Very low - the only change is what happens when ret == 0,
and the new behavior (return error) is strictly better than crashing
### Verification
- `git blame` confirmed the BUG() has been present since commit
c9e9f97bdfb64d (2012, "Btrfs: add basic restriper infrastructure")
- Read the code at lines 4104-4116: confirmed `reclaim_bgs_lock` is held
when BUG() fires, so the panic would also leave a mutex locked
- The `/* FIXME break ? */` comment confirms this was a known issue
- The new code properly calls `mutex_unlock()` before `goto error`,
matching the pattern used at line 4107 for `ret < 0`
- Reviewed-by from Qu Wenruo (btrfs developer) and David Sterba (btrfs
maintainer)
- The commit exists as be6324a809dbd in the tree, dated 2026-02-03
- The affected function `__btrfs_balance()` has existed for many years
and is present in all stable trees
### Risk vs Benefit
- **Benefit**: Prevents kernel crash (BUG/panic) on a condition that
could occur with corrupted filesystems
- **Risk**: Near-zero - the condition was previously a crash; now it's a
graceful error return. No behavioral change for the normal (ret != 0)
path.
This is a textbook stable candidate: a small, well-reviewed fix that
replaces a kernel crash with proper error handling in a long-standing
code path. BUG() removal in favor of error handling is one of the most
common and safest types of stable backports.
**YES**
fs/btrfs/volumes.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8a08412f3529a..14d988c3ef4f3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4112,8 +4112,14 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
* this shouldn't happen, it means the last relocate
* failed
*/
- if (ret == 0)
- BUG(); /* FIXME break ? */
+ if (unlikely(ret == 0)) {
+ btrfs_err(fs_info,
+ "unexpected exact match of CHUNK_ITEM in chunk tree, offset 0x%llx",
+ key.offset);
+ mutex_unlock(&fs_info->reclaim_bgs_lock);
+ ret = -EUCLEAN;
+ goto error;
+ }
ret = btrfs_previous_item(chunk_root, path, 0,
BTRFS_CHUNK_ITEM_KEY);
--
2.51.0
^ permalink raw reply related [flat|nested] 2+ messages in thread* [PATCH AUTOSEL 6.19] btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure()
[not found] <20260223123738.1532940-1-sashal@kernel.org>
2026-02-23 12:37 ` [PATCH AUTOSEL 6.19-5.15] btrfs: replace BUG() with error handling in __btrfs_balance() Sasha Levin
@ 2026-02-23 12:37 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2026-02-23 12:37 UTC (permalink / raw)
To: patches, stable
Cc: Qu Wenruo, Christoph Hellwig, David Sterba, Sasha Levin, clm,
linux-btrfs, linux-kernel
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit 8ceaad6cd6e7fa5f73b0b2796a2e85d75d37e9f3 ]
[BUG]
There is a bug report that when btrfs hits ENOSPC error in a critical
path, btrfs flips RO (this part is expected, although the ENOSPC bug
still needs to be addressed).
The problem is after the RO flip, if there is a read repair pending, we
can hit the ASSERT() inside btrfs_repair_io_failure() like the following:
BTRFS info (device vdc): relocating block group 30408704 flags metadata|raid1
------------[ cut here ]------------
BTRFS: Transaction aborted (error -28)
WARNING: fs/btrfs/extent-tree.c:3235 at __btrfs_free_extent.isra.0+0x453/0xfd0, CPU#1: btrfs/383844
Modules linked in: kvm_intel kvm irqbypass
[...]
---[ end trace 0000000000000000 ]---
BTRFS info (device vdc state EA): 2 enospc errors during balance
BTRFS info (device vdc state EA): balance: ended with status: -30
BTRFS error (device vdc state EA): parent transid verify failed on logical 30556160 mirror 2 wanted 8 found 6
BTRFS error (device vdc state EA): bdev /dev/nvme0n1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[...]
assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
------------[ cut here ]------------
assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
kernel BUG at fs/btrfs/bio.c:938!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 868 Comm: kworker/u8:13 Tainted: G W N 6.19.0-rc6+ #4788 PREEMPT(full)
Tainted: [W]=WARN, [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Workqueue: btrfs-endio simple_end_io_work
RIP: 0010:btrfs_repair_io_failure.cold+0xb2/0x120
RSP: 0000:ffffc90001d2bcf0 EFLAGS: 00010246
RAX: 0000000000000051 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8305cf42 RDI: 00000000ffffffff
RBP: 0000000000000002 R08: 00000000fffeffff R09: ffffffff837fa988
R10: ffffffff8327a9e0 R11: 6f69747265737361 R12: ffff88813018d310
R13: ffff888168b8a000 R14: ffffc90001d2bd90 R15: ffff88810a169000
FS: 0000000000000000(0000) GS:ffff8885e752c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
------------[ cut here ]------------
[CAUSE]
The cause of -ENOSPC error during the test case btrfs/124 is still
unknown, although it's known that we still have cases where metadata can
be over-committed but can not be fulfilled correctly, thus if we hit
such ENOSPC error inside a critical path, we have no choice but abort
the current transaction.
This will mark the fs read-only.
The problem is inside the btrfs_repair_io_failure() path that we require
the fs not to be mount read-only. This is normally fine, but if we are
doing a read-repair meanwhile the fs flips RO due to a critical error,
we can enter btrfs_repair_io_failure() with super block set to
read-only, thus triggering the above crash.
[FIX]
Just replace the ASSERT() with a proper return if the fs is already
read-only.
Reported-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-btrfs/20260126045555.GB31641@lst.de/
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
The same ASSERT exists in v6.1 too (in extent_io.c instead of bio.c).
The bug is present across all stable trees.
## Analysis
### What problem does this commit solve?
This commit fixes a **kernel BUG/crash** (not just a warning) that
occurs when:
1. The btrfs filesystem encounters a critical error (e.g., ENOSPC in a
critical path)
2. The filesystem flips to read-only mode (transaction abort)
3. Meanwhile, a concurrent read-repair operation calls
`btrfs_repair_io_failure()`
4. The ASSERT checks `!(fs_info->sb->s_flags & SB_RDONLY)` and fails
5. Since btrfs's `ASSERT()` macro calls `BUG()`, this causes a kernel
oops/crash
The crash is confirmed by the stack trace in the commit message showing
`kernel BUG at fs/btrfs/bio.c:938!`.
### Does it meet stable kernel rules?
1. **Obviously correct and tested**: Yes. The fix replaces an ASSERT
(which calls BUG()) with a graceful `return 0` when the filesystem is
read-only. It has `Tested-by: Christoph Hellwig` and `Reviewed-by:
David Sterba` (btrfs maintainer).
2. **Fixes a real bug**: Yes. A kernel BUG/crash is triggered. The bug
report is linked and comes from Christoph Hellwig, a prominent kernel
developer.
3. **Fixes an important issue**: Yes. This is a kernel crash (oops via
BUG()). If a filesystem encounters a critical error and goes RO, a
concurrent read-repair shouldn't crash the entire system.
4. **Small and contained**: Yes. The change removes one line
(`ASSERT(...)`) and adds a 4-line `if (unlikely(sb_rdonly(...)))
return 0;` check. Total: ~6 lines changed in one file.
5. **Does NOT introduce new features**: Correct. It only changes error
handling.
### Risk vs Benefit
- **Risk**: Extremely low. Replacing a crash (BUG()) with a graceful
return (0 = success, skip repair) is safe. If the filesystem is
already RO, skipping the repair write is correct behavior — you can't
write to a RO filesystem anyway.
- **Benefit**: High. Prevents kernel crash in a race condition that real
users hit (reported by Christoph Hellwig during btrfs/124 test).
### Backport considerations
The function signature has changed significantly across versions:
- **v6.6**: Uses `struct page *page` parameter — different signature but
the ASSERT is identical
- **v6.12**: Uses `struct folio *folio` parameter — different signature
but the ASSERT is identical
- **Current**: Uses `const phys_addr_t paddrs[]` parameter
However, the **fix itself** (remove ASSERT, add `if (sb_rdonly(...))
return 0`) is signature-independent and applies to all versions. A minor
adaptation to the surrounding context will be needed for each stable
tree, but the core fix is trivial.
### Verification
- **Verified**: The ASSERT macro in btrfs calls `BUG()` (confirmed in
`fs/btrfs/messages.h` lines 139-152)
- **Verified**: The buggy ASSERT exists in v6.12 and v6.6 stable trees
(checked via `git show v6.12:fs/btrfs/bio.c` and `git show
v6.6:fs/btrfs/bio.c`)
- **Verified**: The buggy ASSERT also exists in v6.1 (in
`fs/btrfs/extent_io.c` rather than `bio.c`, function named
`repair_io_failure`)
- **Verified**: The function has two callers: `btrfs_end_repair_bio()`
(bio.c) and `btrfs_repair_eb_io_failure()` (disk-io.c), both in I/O
completion paths that can race with filesystem RO transitions
- **Verified**: The commit has `Tested-by: Christoph Hellwig`,
`Reviewed-by: David Sterba`, and `Reported-by: Christoph Hellwig` with
a link to the bug report
- **Verified**: The fix is a clean 6-line change — remove ASSERT, add
early return with comment
- **Verified**: The function signature differs across stable trees (page
vs folio vs phys_addr_t), but the fix concept (remove ASSERT, add
sb_rdonly check) is identical and straightforward to adapt
This is a textbook stable backport candidate: a small, well-reviewed fix
for a kernel crash in a race condition, reported by a trusted developer,
in a widely-used filesystem. The fix is obviously correct — when the fs
is RO, there's nothing to repair and crashing is worse than skipping.
**YES**
fs/btrfs/bio.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index fa1d321a2fb83..45bd3c3b6d793 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -935,7 +935,6 @@ int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 fileoff,
struct bio *bio = NULL;
int ret = 0;
- ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
BUG_ON(!mirror_num);
/* Basic alignment checks. */
@@ -947,6 +946,13 @@ int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 fileoff,
ASSERT(step <= length);
ASSERT(is_power_of_2(step));
+ /*
+ * The fs either mounted RO or hit critical errors, no need
+ * to continue repairing.
+ */
+ if (unlikely(sb_rdonly(fs_info->sb)))
+ return 0;
+
if (btrfs_repair_one_zone(fs_info, logical))
return 0;
--
2.51.0
^ permalink raw reply related [flat|nested] 2+ messages in thread