* [PATCH v2] ext4: synchronize free block counter when detecting corruption
@ 2025-10-10 7:38 Albin Babu Varghese
2025-11-06 15:30 ` Theodore Ts'o
0 siblings, 1 reply; 3+ messages in thread
From: Albin Babu Varghese @ 2025-10-10 7:38 UTC (permalink / raw)
To: Theodore Ts'o, Andreas Dilger
Cc: Albin Babu Varghese, syzbot+f3185be57d7e8dda32b8,
Ahmet Eray Karadag, linux-ext4, linux-kernel
When ext4_mb_generate_buddy() detects block group descriptor
corruption (free block count mismatch between descriptor and
bitmap), it corrects the in-memory group descriptor (grp->bb_free)
but does not synchronize the percpu free clusters counter.
This causes delayed allocation to read stale counter values when
checking for available space. The allocator believes space is
available based on the stale counter, makes reservation promises,
but later fails during writeback when trying to allocate actual
blocks from the bitmap. This results in "Delayed block allocation
failed" errors and potential system crashes.
Fix by updating the percpu counter with the correction delta when
corruption is detected:
    s64 correction = (s64)free - (s64)grp->bb_free;
    grp->bb_free = free;
    percpu_counter_add(&sbi->s_freeclusters_counter, correction);
This ensures the global counter stays synchronized with the
corrected group descriptor, preventing false promises and crashes.
Reported-by: syzbot+f3185be57d7e8dda32b8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3185be57d7e8dda32b8
Tested-by: syzbot+f3185be57d7e8dda32b8@syzkaller.appspotmail.com
Co-developed-by: Ahmet Eray Karadag <eraykrdg1@gmail.com>
Signed-off-by: Ahmet Eray Karadag <eraykrdg1@gmail.com>
Signed-off-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com>
---
Changes in v2:
- v1 added bounds checking in ext4_write_inline_data_end() to reject
writes beyond inline capacity
- v2 fixes the root cause by synchronizing the percpu free clusters
counter when corruption is detected in ext4_mb_generate_buddy()
- Addresses review feedback from Ted Ts'o and Darrick Wong
Link to v1:
https://lore.kernel.org/all/20251007234221.28643-2-eraykrdg1@gmail.com/T/
---
fs/ext4/mballoc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9087183602e4..956e5fa307ca 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1290,8 +1290,11 @@ void ext4_mb_generate_buddy(struct super_block *sb,
 		/*
 		 * If we intend to continue, we consider group descriptor
 		 * corrupt and update bb_free using bitmap value
+		 * Also update the global free clusters counter to stay in sync.
 		 */
+		s64 correction = (s64)free - (s64)grp->bb_free;
 		grp->bb_free = free;
+		percpu_counter_add(&sbi->s_freeclusters_counter, correction);
 		ext4_mark_group_bitmap_corrupted(sb, group,
 					EXT4_GROUP_INFO_BBITMAP_CORRUPT);
 	}
--
2.51.0
* Re: [PATCH v2] ext4: synchronize free block counter when detecting corruption
2025-10-10 7:38 [PATCH v2] ext4: synchronize free block counter when detecting corruption Albin Babu Varghese
@ 2025-11-06 15:30 ` Theodore Ts'o
2025-11-11 8:45 ` Albin Babu Varghese
0 siblings, 1 reply; 3+ messages in thread
From: Theodore Ts'o @ 2025-11-06 15:30 UTC (permalink / raw)
To: Albin Babu Varghese
Cc: Andreas Dilger, syzbot+f3185be57d7e8dda32b8, Ahmet Eray Karadag,
linux-ext4, linux-kernel
On Fri, Oct 10, 2025 at 03:38:00AM -0400, Albin Babu Varghese wrote:
> When ext4_mb_generate_buddy() detects block group descriptor
> corruption (free block count mismatch between descriptor and
> bitmap), it corrects the in-memory group descriptor (grp->bb_free)
> but does not synchronize the percpu free clusters counter.
Actually, we do. This happens in ext4_mark_group_bitmap_corrupted in
fs/ext4/super.c.
	if (flags & EXT4_GROUP_INFO_BBITMAP_CORRUPT) {
		ret = ext4_test_and_set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT,
					    &grp->bb_state);
		if (!ret)
			percpu_counter_sub(&sbi->s_freeclusters_counter,
					   grp->bb_free);
	}
So we've *already* subtracted out the blocks that were in the block
group which we've busied out.
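
A minimal userspace sketch of the subtract-once behaviour in the snippet above, assuming a plain integer in place of the percpu counter and a boolean in place of the `bb_state` bit. The `model_*` names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real ext4 types. */
struct model_group {
	int64_t bb_free;     /* free clusters recorded for this group */
	bool bitmap_corrupt; /* models EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT */
};

struct model_sb {
	int64_t freeclusters; /* models s_freeclusters_counter as a plain integer */
};

/* Models test-and-set: sets the flag and returns its previous value. */
static bool model_test_and_set(bool *flag)
{
	bool old = *flag;

	*flag = true;
	return old;
}

/*
 * Models only the block-bitmap branch quoted above: the group's free
 * count is removed from the global counter the first time the group is
 * marked corrupt, and never again.
 */
static void model_mark_bitmap_corrupted(struct model_sb *sb,
					struct model_group *grp)
{
	assert(sb != NULL && grp != NULL);
	if (!model_test_and_set(&grp->bitmap_corrupt))
		sb->freeclusters -= grp->bb_free;
}
```

Calling `model_mark_bitmap_corrupted()` twice on the same group changes the counter only on the first call, and the amount removed is whatever `bb_free` holds at that moment.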
> This causes delayed allocation to read stale counter values when
> checking for available space. The allocator believes space is
> available based on the stale counter, makes reservation promises,
> but later fails during writeback when trying to allocate actual
> blocks from the bitmap. This results in "Delayed block allocation
> failed" errors and potential system crashes.
I suspect there is something else going on with s_freeclusters_counter
being incorrect, but adding an additional correction to
s_freeclusters_counter is not the answer.
How is the system crashing? If we have errors=continue, then we
really shouldn't let the system crash. If there are delayed allocation
failures, the user might lose data, but if the user really cares about
that, they shouldn't be using errors=continue.
- Ted
* Re: [PATCH v2] ext4: synchronize free block counter when detecting corruption
2025-11-06 15:30 ` Theodore Ts'o
@ 2025-11-11 8:45 ` Albin Babu Varghese
0 siblings, 0 replies; 3+ messages in thread
From: Albin Babu Varghese @ 2025-11-11 8:45 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Andreas Dilger, syzbot+f3185be57d7e8dda32b8, Ahmet Eray Karadag,
linux-ext4, linux-kernel
Hey Ted, Thanks for the feedback.
On Thu, Nov 06, 2025 at 10:30:35AM -0500, Theodore Ts'o wrote:
> On Fri, Oct 10, 2025 at 03:38:00AM -0400, Albin Babu Varghese wrote:
> > When ext4_mb_generate_buddy() detects block group descriptor
> > corruption (free block count mismatch between descriptor and
> > bitmap), it corrects the in-memory group descriptor (grp->bb_free)
> > but does not synchronize the percpu free clusters counter.
>
> Actually, we do. This happens in ext4_mark_group_bitmap_corrupted in
> fs/ext4/super.c.
>
> 	if (flags & EXT4_GROUP_INFO_BBITMAP_CORRUPT) {
> 		ret = ext4_test_and_set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT,
> 					    &grp->bb_state);
> 		if (!ret)
> 			percpu_counter_sub(&sbi->s_freeclusters_counter,
> 					   grp->bb_free);
> 	}
>
> So we've *already* subtracted out the blocks that were in the block
> group which we've busied out.
Thanks for pointing that out. It was naive of me to overlook the other
occurrences.
> > This causes delayed allocation to read stale counter values when
> > checking for available space. The allocator believes space is
> > available based on the stale counter, makes reservation promises,
> > but later fails during writeback when trying to allocate actual
> > blocks from the bitmap. This results in "Delayed block allocation
> > failed" errors and potential system crashes.
>
> I suspect there is something else going on with s_freeclusters_counter
> being incorrect, but adding an additional correction to
> s_freeclusters_counter is not the answer.
>
> How is the system crashing? If we have errors=continue, then we
> really shouldn't let the system crash. If there are delayed allocation
> failures, the user might lose data, but if the user really cares about
> that, they shouldn't be using errors=continue.
I think the existing check in `ext4_mb_generate_buddy` is for runtime errors,
and the issue here is happening at mount time due to an already corrupted
filesystem. Both `grp->bb_free` and `s_freeclusters_counter` held
`150994969`, whereas the actual free cluster count is `25`. The existing
update call subtracts `grp->bb_free` from `s_freeclusters_counter` assuming
that the group descriptor is accurate, but in this case it is not. So we
still end up with an incorrect global counter.
I tried the patch here:
https://syzkaller.appspot.com/text?tag=Patch&x=1771a7cd980000
In that version, I attempted to compute and pass the corrected value to the
update function, but it failed with the warning at `ext4_dirty_folio`:
https://syzkaller.appspot.com/x/report.txt?x=1306a7cd980000
From what I understand, even after adjusting the counter, the dirty buffers had
already been created, so it returns an error that unmapped dirty buffers
remain.
My earlier fix ended up making `s_freeclusters_counter` become 0 due to the
update in `ext4_mark_group_bitmap_corrupted()` that I had overlooked. As a
result, no warnings or errors were triggered at that time.
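
The two outcomes described above can be reproduced with a small userspace model. It is only a sketch under stated assumptions: a single block group, a plain integer standing in for `s_freeclusters_counter`, and a counter that starts equal to the on-disk descriptor value. All `model_*` names are hypothetical; the numbers are the ones quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-group model; not the real ext4 data structures. */
struct model_state {
	int64_t bb_free;      /* models grp->bb_free */
	int64_t freeclusters; /* models s_freeclusters_counter */
	bool bitmap_corrupt;  /* models the BBITMAP_CORRUPT bit */
};

/* Subtract the group's free count from the global counter, once only. */
static void model_mark_corrupted_once(struct model_state *s)
{
	if (!s->bitmap_corrupt) {
		s->bitmap_corrupt = true;
		s->freeclusters -= s->bb_free;
	}
}

/*
 * Models the mismatch branch of ext4_mb_generate_buddy().
 * Assumption: the global counter starts out equal to the (corrupt)
 * descriptor value. apply_v2 toggles the extra correction from the patch.
 * Returns the global counter after the group has been marked corrupt.
 */
static int64_t model_generate_buddy(int64_t descriptor_free,
				    int64_t bitmap_free, bool apply_v2)
{
	struct model_state s = {
		.bb_free = descriptor_free,
		.freeclusters = descriptor_free,
		.bitmap_corrupt = false,
	};

	assert(descriptor_free != bitmap_free); /* only the mismatch path */

	int64_t correction = bitmap_free - s.bb_free;

	s.bb_free = bitmap_free;                /* trust the bitmap */
	if (apply_v2)
		s.freeclusters += correction;   /* the v2 addition */
	model_mark_corrupted_once(&s);          /* subtracts the new bb_free */
	return s.freeclusters;
}
```

Under these assumptions, `model_generate_buddy(150994969, 25, true)` returns 0, matching the behaviour reported for the v2 patch, while passing `false` leaves 150994944 in the counter, far above the 25 clusters that are actually free.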
I might be off here, and I’m not sure how best to proceed.
Best,
Albin