* [PATCH v2] ext4: synchronize free block counter when detecting corruption
@ 2025-10-10 7:38 Albin Babu Varghese
2025-11-06 15:30 ` Theodore Ts'o
0 siblings, 1 reply; 3+ messages in thread
From: Albin Babu Varghese @ 2025-10-10 7:38 UTC (permalink / raw)
To: Theodore Ts'o, Andreas Dilger
Cc: Albin Babu Varghese, syzbot+f3185be57d7e8dda32b8,
Ahmet Eray Karadag, linux-ext4, linux-kernel

When ext4_mb_generate_buddy() detects block group descriptor
corruption (free block count mismatch between descriptor and
bitmap), it corrects the in-memory group descriptor (grp->bb_free)
but does not synchronize the percpu free clusters counter.

This causes delayed allocation to read stale counter values when
checking for available space. The allocator believes space is
available based on the stale counter, makes reservation promises,
but later fails during writeback when trying to allocate actual
blocks from the bitmap. This results in "Delayed block allocation
failed" errors and potential system crashes.

Fix by updating the percpu counter with the correction delta when
corruption is detected:

    s64 correction = (s64)free - (s64)grp->bb_free;
    grp->bb_free = free;
    percpu_counter_add(&sbi->s_freeclusters_counter, correction);

This ensures the global counter stays synchronized with the
corrected group descriptor, preventing false promises and crashes.

Reported-by: syzbot+f3185be57d7e8dda32b8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3185be57d7e8dda32b8
Tested-by: syzbot+f3185be57d7e8dda32b8@syzkaller.appspotmail.com
Co-developed-by: Ahmet Eray Karadag <eraykrdg1@gmail.com>
Signed-off-by: Ahmet Eray Karadag <eraykrdg1@gmail.com>
Signed-off-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com>
---
Changes in v2:
- v1 added bounds checking in ext4_write_inline_data_end() to reject
writes beyond inline capacity
- v2 fixes the root cause by synchronizing the percpu free clusters
counter when corruption is detected in ext4_mb_generate_buddy()
- Addresses review feedback from Ted Ts'o and Darrick Wong
Link to v1:
https://lore.kernel.org/all/20251007234221.28643-2-eraykrdg1@gmail.com/T/
---
fs/ext4/mballoc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9087183602e4..956e5fa307ca 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1290,8 +1290,11 @@ void ext4_mb_generate_buddy(struct super_block *sb,
 		/*
 		 * If we intend to continue, we consider group descriptor
 		 * corrupt and update bb_free using bitmap value
+		 * Also update the global free clusters counter to stay in sync.
 		 */
+		s64 correction = (s64)free - (s64)grp->bb_free;
 		grp->bb_free = free;
+		percpu_counter_add(&sbi->s_freeclusters_counter, correction);
 		ext4_mark_group_bitmap_corrupted(sb, group,
 					EXT4_GROUP_INFO_BBITMAP_CORRUPT);
 	}
--
2.51.0

* Re: [PATCH v2] ext4: synchronize free block counter when detecting corruption
2025-10-10 7:38 [PATCH v2] ext4: synchronize free block counter when detecting corruption Albin Babu Varghese
@ 2025-11-06 15:30 ` Theodore Ts'o
2025-11-11 8:45 ` Albin Babu Varghese
0 siblings, 1 reply; 3+ messages in thread
From: Theodore Ts'o @ 2025-11-06 15:30 UTC (permalink / raw)
To: Albin Babu Varghese
Cc: Andreas Dilger, syzbot+f3185be57d7e8dda32b8, Ahmet Eray Karadag,
linux-ext4, linux-kernel

On Fri, Oct 10, 2025 at 03:38:00AM -0400, Albin Babu Varghese wrote:
> When ext4_mb_generate_buddy() detects block group descriptor
> corruption (free block count mismatch between descriptor and
> bitmap), it corrects the in-memory group descriptor (grp->bb_free)
> but does not synchronize the percpu free clusters counter.

Actually, we do. This happens in ext4_mark_group_bitmap_corrupted in
fs/ext4/super.c.

	if (flags & EXT4_GROUP_INFO_BBITMAP_CORRUPT) {
		ret = ext4_test_and_set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT,
					    &grp->bb_state);
		if (!ret)
			percpu_counter_sub(&sbi->s_freeclusters_counter,
					   grp->bb_free);
	}

So we've *already* subtracted out the blocks that were in the block
group which we've busied out.

> This causes delayed allocation to read stale counter values when
> checking for available space. The allocator believes space is
> available based on the stale counter, makes reservation promises,
> but later fails during writeback when trying to allocate actual
> blocks from the bitmap. This results in "Delayed block allocation
> failed" errors and potential system crashes.

I suspect there is something else going on with s_freeclusters_counter
being incorrect, but adding an additional correction to
s_freeclusters_counter is not the answer.

How is the system crashing? If we have errors=continue, then we
really shouldn't let the system crash. If there are delayed allocation
failures, the user might lose data, but if the user really cares about
that, they shouldn't be using errors=continue.

- Ted

* Re: [PATCH v2] ext4: synchronize free block counter when detecting corruption
2025-11-06 15:30 ` Theodore Ts'o
@ 2025-11-11 8:45 ` Albin Babu Varghese
0 siblings, 0 replies; 3+ messages in thread
From: Albin Babu Varghese @ 2025-11-11 8:45 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Andreas Dilger, syzbot+f3185be57d7e8dda32b8, Ahmet Eray Karadag,
linux-ext4, linux-kernel

Hey Ted,

Thanks for the feedback.

On Thu, Nov 06, 2025 at 10:30:35AM -0500, Theodore Ts'o wrote:
> On Fri, Oct 10, 2025 at 03:38:00AM -0400, Albin Babu Varghese wrote:
> > When ext4_mb_generate_buddy() detects block group descriptor
> > corruption (free block count mismatch between descriptor and
> > bitmap), it corrects the in-memory group descriptor (grp->bb_free)
> > but does not synchronize the percpu free clusters counter.
>
> Actually, we do. This happens in ext4_mark_group_bitmap_corrupted in
> fs/ext4/super.c.
>
> 	if (flags & EXT4_GROUP_INFO_BBITMAP_CORRUPT) {
> 		ret = ext4_test_and_set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT,
> 					    &grp->bb_state);
> 		if (!ret)
> 			percpu_counter_sub(&sbi->s_freeclusters_counter,
> 					   grp->bb_free);
> 	}
>
> So we've *already* subtracted out the blocks that were in the block
> group which we've busied out.

Thanks for pointing that out. It was naive of me to overlook the other
occurrences.

> > This causes delayed allocation to read stale counter values when
> > checking for available space. The allocator believes space is
> > available based on the stale counter, makes reservation promises,
> > but later fails during writeback when trying to allocate actual
> > blocks from the bitmap. This results in "Delayed block allocation
> > failed" errors and potential system crashes.
>
> I suspect there is something else going on with s_freeclusters_counter
> being incorrect, but adding an additional correction to
> s_freeclusters_counter is not the answer.
>
> How is the system crashing? If we have errors=continue, then we
> really shouldn't let the system crash. If there are delayed allocation
> failures, the user might lose data, but if the user really cares about
> that, they shouldn't be using errors=continue.

I think the existing check in `ext4_mb_generate_buddy` is for runtime
errors, and the issue here is happening at mount time due to an already
corrupted filesystem. The value of `grp->bb_free` and
`s_freeclusters_counter` was `150994969` vs `25`, which is the actual
free clusters count. The existing update call subtracts `grp->bb_free`
from `s_freeclusters_counter` assuming that the group descriptor is
accurate, but in this case it is not. So we still end up with an
incorrect global counter.

I tried the patch here:
https://syzkaller.appspot.com/text?tag=Patch&x=1771a7cd980000

In that version, I attempted to compute and pass the corrected value to
the update function, but it failed with the warning at
`ext4_dirty_folio`:
https://syzkaller.appspot.com/x/report.txt?x=1306a7cd980000

From what I understand, even after adjusting the counter, the dirty
buffers had already been created, so it returns an error that unmapped
dirty buffers remain.

My earlier fix ended up making `s_freeclusters_counter` become 0 due to
the update in `ext4_mark_group_bitmap_corrupted()` that I had
overlooked. As a result, no warnings or errors were triggered at that
time.

I might be off here, and I’m not sure how best to proceed.

Best,
Albin