public inbox for linux-ext4@vger.kernel.org
* [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
@ 2026-03-02 13:46 Ye Bin
  2026-03-02 16:27 ` Jan Kara
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Ye Bin @ 2026-03-02 13:46 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4; +Cc: jack

From: Ye Bin <yebin10@huawei.com>

The following issue was observed:
...
EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost

EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost

EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost

EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost

EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost

EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117
EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost

EXT4-fs (mmcblk0p1): error count since last fsck: 1
EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760
EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760
...

According to the log analysis, blocks are always requested from the
corrupted block group. This may happen as follows:
ext4_mb_find_by_goal
  ext4_mb_load_buddy
   ext4_mb_load_buddy_gfp
     ext4_mb_init_cache
      ext4_read_block_bitmap_nowait
      ext4_wait_block_bitmap
       ext4_validate_block_bitmap
        if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
         return -EFSCORRUPTED; // There's no logs.
 if (err)
  return err;  // Will return error
ext4_lock_group(ac->ac_sb, group);
  if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable
   goto out;

After commit 9008a58e5dce ("ext4: make the bitmap read routines return
real error codes") was merged, commit 163a203ddb36 ("ext4: mark block group
as corrupt on block bitmap error") is no longer an effective guard against
allocating blocks from corrupted block groups. If
'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true,
'ext4_mb_load_buddy()' may return an error, so ext4_mb_find_by_goal()
returns that error before it reaches its own corruption check, and the
block allocation fails.
Therefore, when ext4_mb_load_buddy() returns an error, check whether the
block group is corrupted. If it is, and the allocation is not goal-only,
return 0 so that other block groups can be tried.

Fixes: 163a203ddb36 ("ext4: mark block group as corrupt on block bitmap error")
Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes")
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/ext4/mballoc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e2341489f4d0..ffa6886de8a3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2443,8 +2443,12 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
 		return 0;
 
 	err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
-	if (err)
+	if (err) {
+		if (EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info) &&
+		    !(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
+			return 0;
 		return err;
+	}
 
 	ext4_lock_group(ac->ac_sb, group);
 	if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
-- 
2.34.1
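
For readers following the thread, here is a minimal user-space sketch of the control-flow change. It is not kernel code and not part of the posted patch: the `model_*` names, the flag value and the struct layout are assumptions made for illustration. Only the decision logic mirrors the diff above. Error 117 in the log is the Linux errno value EUCLEAN, which ext4 uses as EFSCORRUPTED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Linux errno value that ext4 uses as EFSCORRUPTED (EUCLEAN == 117). */
#define MODEL_EFSCORRUPTED 117
/* Hypothetical flag bit standing in for EXT4_MB_HINT_GOAL_ONLY. */
#define MODEL_HINT_GOAL_ONLY 0x1u

/* Hypothetical stand-in for the per-group info; only the corrupt bit is modeled. */
struct model_group {
	bool bbitmap_corrupt;
};

/* Stand-in for ext4_mb_load_buddy(): fails when the bitmap is marked corrupt. */
static int model_load_buddy(const struct model_group *grp)
{
	assert(grp != NULL);
	return grp->bbitmap_corrupt ? -MODEL_EFSCORRUPTED : 0;
}

/* Behaviour before the patch: every load error is passed to the caller. */
static int model_find_by_goal_old(const struct model_group *grp,
				  unsigned int flags)
{
	(void)flags;
	return model_load_buddy(grp);
}

/*
 * Behaviour after the patch: a corrupt goal group is reported as
 * "nothing found here" (0) unless the caller asked for the goal only.
 */
static int model_find_by_goal_new(const struct model_group *grp,
				  unsigned int flags)
{
	int err = model_load_buddy(grp);

	if (err) {
		if (grp->bbitmap_corrupt && !(flags & MODEL_HINT_GOAL_ONLY))
			return 0;
		return err;
	}
	return 0;
}
```

In this model a return value of 0 only means "no error". The caller is still expected to scan other groups for free blocks, as the real allocator does.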



* Re: [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
  2026-03-02 13:46 [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() Ye Bin
@ 2026-03-02 16:27 ` Jan Kara
  2026-03-02 19:41 ` Andreas Dilger
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2026-03-02 16:27 UTC (permalink / raw)
  To: Ye Bin; +Cc: tytso, adilger.kernel, linux-ext4, jack

On Mon 02-03-26 21:46:19, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
> 
> There's issue as follows:
> ...
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): error count since last fsck: 1
> EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760
> EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760
> ...
> 
> According to the log analysis, blocks are always requested from the
> corrupted block group. This may happen as follows:
> ext4_mb_find_by_goal
>   ext4_mb_load_buddy
>    ext4_mb_load_buddy_gfp
>      ext4_mb_init_cache
>       ext4_read_block_bitmap_nowait
>       ext4_wait_block_bitmap
>        ext4_validate_block_bitmap
>         if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
>          return -EFSCORRUPTED; // There's no logs.
>  if (err)
>   return err;  // Will return error
> ext4_lock_group(ac->ac_sb, group);
>   if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable
>    goto out;
> 
> After commit 9008a58e5dce ("ext4: make the bitmap read routines return
> real error codes") merged, Commit 163a203ddb36 ("ext4: mark block group
> as corrupt on block bitmap error") is no real solution for allocating
> blocks from corrupted block groups. This is because if
> 'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then
> 'ext4_mb_load_buddy()' may return an error. This means that the block
> allocation will fail.
> Therefore, check block group if corrupted when ext4_mb_load_buddy()
> returns error.
> 
> Fixes: 163a203ddb36 ("ext4: mark block group as corrupt on block bitmap error")
> Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes")
> Signed-off-by: Ye Bin <yebin10@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2341489f4d0..ffa6886de8a3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2443,8 +2443,12 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
>  		return 0;
>  
>  	err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
> -	if (err)
> +	if (err) {
> +		if (EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info) &&
> +		    !(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
> +			return 0;
>  		return err;
> +	}
>  
>  	ext4_lock_group(ac->ac_sb, group);
>  	if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
> -- 
> 2.34.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
  2026-03-02 13:46 [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() Ye Bin
  2026-03-02 16:27 ` Jan Kara
@ 2026-03-02 19:41 ` Andreas Dilger
  2026-03-03  2:36 ` Baokun Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Andreas Dilger @ 2026-03-02 19:41 UTC (permalink / raw)
  To: Ye Bin; +Cc: tytso, linux-ext4, jack

On Mar 2, 2026, at 06:46, Ye Bin <yebin@huaweicloud.com> wrote:
> 
> From: Ye Bin <yebin10@huawei.com>
> 
> There's issue as follows:
> ...
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): error count since last fsck: 1
> EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760
> EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760
> ...
> 
> According to the log analysis, blocks are always requested from the
> corrupted block group. This may happen as follows:
> ext4_mb_find_by_goal
>  ext4_mb_load_buddy
>   ext4_mb_load_buddy_gfp
>     ext4_mb_init_cache
>      ext4_read_block_bitmap_nowait
>      ext4_wait_block_bitmap
>       ext4_validate_block_bitmap
>        if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
>         return -EFSCORRUPTED; // There's no logs.
> if (err)
>  return err;  // Will return error
> ext4_lock_group(ac->ac_sb, group);
>  if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable
>   goto out;
> 
> After commit 9008a58e5dce ("ext4: make the bitmap read routines return
> real error codes") merged, Commit 163a203ddb36 ("ext4: mark block group
> as corrupt on block bitmap error") is no real solution for allocating
> blocks from corrupted block groups. This is because if
> 'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then
> 'ext4_mb_load_buddy()' may return an error. This means that the block
> allocation will fail.
> Therefore, check block group if corrupted when ext4_mb_load_buddy()
> returns error.
> 
> Fixes: 163a203ddb36 ("ext4: mark block group as corrupt on block bitmap error")
> Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes")
> Signed-off-by: Ye Bin <yebin10@huawei.com>

Reviewed-by: Andreas Dilger <adilger@dilger.ca>

> ---
> fs/ext4/mballoc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2341489f4d0..ffa6886de8a3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2443,8 +2443,12 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
>  		return 0;
>  
>  	err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
> -	if (err)
> +	if (err) {
> +		if (EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info) &&
> +		    !(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
> +			return 0;
>  		return err;
> +	}
>  
>  	ext4_lock_group(ac->ac_sb, group);
>  	if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
> -- 
> 2.34.1
> 



* Re: [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
  2026-03-02 13:46 [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() Ye Bin
  2026-03-02 16:27 ` Jan Kara
  2026-03-02 19:41 ` Andreas Dilger
@ 2026-03-03  2:36 ` Baokun Li
  2026-03-03  7:55   ` yebin (H)
  2026-03-07  8:07 ` Zhang Yi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: Baokun Li @ 2026-03-03  2:36 UTC (permalink / raw)
  To: Ye Bin; +Cc: jack, tytso, adilger.kernel, linux-ext4


On 3/2/26 9:46 PM, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> There's issue as follows:
> ...
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): error count since last fsck: 1
> EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760
> EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760
> ...
>
> According to the log analysis, blocks are always requested from the
> corrupted block group. This may happen as follows:
> ext4_mb_find_by_goal
>   ext4_mb_load_buddy
>    ext4_mb_load_buddy_gfp
>      ext4_mb_init_cache
>       ext4_read_block_bitmap_nowait
>       ext4_wait_block_bitmap
>        ext4_validate_block_bitmap
>         if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
>          return -EFSCORRUPTED; // There's no logs.
>  if (err)
>   return err;  // Will return error
> ext4_lock_group(ac->ac_sb, group);
>   if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable
>    goto out;
>
> After commit 9008a58e5dce ("ext4: make the bitmap read routines return
> real error codes") merged, Commit 163a203ddb36 ("ext4: mark block group
> as corrupt on block bitmap error") is no real solution for allocating
> blocks from corrupted block groups. This is because if
> 'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then
> 'ext4_mb_load_buddy()' may return an error. This means that the block
> allocation will fail.
> Therefore, check block group if corrupted when ext4_mb_load_buddy()
> returns error.

Good catch!

Agreed, we should try other groups upon failure unless it's a goal-only
allocation.

But note that e4b->bd_info might be uninitialized if ext4_mb_load_buddy()
fails.

I think we can optimize this in ext4_mb_regular_allocator(): we can record
the error from ext4_mb_find_by_goal() but avoid an early exit.

Specifically, after checking that EXT4_MB_HINT_GOAL_ONLY is not set,
we can assign the error to ac->ac_first_err. This way, if subsequent
allocation attempts still fail, we can preserve the original error.
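
A rough user-space sketch of this alternative follows. It is an illustration only, not the real ext4_mb_regular_allocator(): the `model_*` names, the array-based group table and the flag value are assumptions. The `first_err` field stands in for ac->ac_first_err. The goal-group error is remembered instead of returned, and it is surfaced only if no other group can satisfy the request.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EFSCORRUPTED 117      /* EUCLEAN on Linux */
#define MODEL_HINT_GOAL_ONLY 0x1u   /* hypothetical flag bit */

/* Hypothetical, heavily reduced allocation context. */
struct model_ac {
	unsigned int flags;
	int first_err;    /* stands in for ac->ac_first_err */
	int found_group;  /* -1 until a group with free blocks is found */
};

/*
 * corrupt[i] != 0 marks group i as having a corrupt block bitmap;
 * free_blocks[i] is the number of free blocks in group i.
 */
static int model_regular_allocator(struct model_ac *ac, const int *corrupt,
				   const int *free_blocks, size_t ngroups,
				   size_t goal)
{
	int err = 0;
	size_t i;

	assert(goal < ngroups);
	ac->first_err = 0;
	ac->found_group = -1;

	/* Step 1: try the goal group. */
	if (corrupt[goal]) {
		err = -MODEL_EFSCORRUPTED;
	} else if (free_blocks[goal] > 0) {
		ac->found_group = (int)goal;
		return 0;
	}

	/* Goal-only requests never look at other groups. */
	if (ac->flags & MODEL_HINT_GOAL_ONLY)
		return err;

	/* Record the error, but do not exit early. */
	if (err)
		ac->first_err = err;

	/* Step 2: scan the remaining groups, skipping corrupt ones. */
	for (i = 0; i < ngroups; i++) {
		if (i == goal || corrupt[i])
			continue;
		if (free_blocks[i] > 0) {
			ac->found_group = (int)i;
			return 0;
		}
	}

	/* Nothing found: report the first recorded error, if any. */
	return ac->first_err;
}
```

With this shape the caller still sees the corruption error when every fallback fails, but a single corrupt goal group no longer aborts the whole allocation.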


Cheers,
Baokun

> Fixes: 163a203ddb36 ("ext4: mark block group as corrupt on block bitmap error")
> Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes")
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
>  fs/ext4/mballoc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2341489f4d0..ffa6886de8a3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2443,8 +2443,12 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
>  		return 0;
>  
>  	err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
> -	if (err)
> +	if (err) {
> +		if (EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info) &&
> +		    !(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
> +			return 0;
>  		return err;
> +	}
>  
>  	ext4_lock_group(ac->ac_sb, group);
>  	if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))


* Re: [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
  2026-03-03  2:36 ` Baokun Li
@ 2026-03-03  7:55   ` yebin (H)
  0 siblings, 0 replies; 8+ messages in thread
From: yebin (H) @ 2026-03-03  7:55 UTC (permalink / raw)
  To: Baokun Li, Ye Bin; +Cc: jack, tytso, adilger.kernel, linux-ext4



On 2026/3/3 10:36, Baokun Li wrote:
>
> On 3/2/26 9:46 PM, Ye Bin wrote:
>> From: Ye Bin <yebin10@huawei.com>
>>
>> There's issue as follows:
>> ...
>> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
>> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>>
>> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
>> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>>
>> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
>> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>>
>> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
>> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>>
>> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117
>> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>>
>> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117
>> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>>
>> EXT4-fs (mmcblk0p1): error count since last fsck: 1
>> EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760
>> EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760
>> ...
>>
>> According to the log analysis, blocks are always requested from the
>> corrupted block group. This may happen as follows:
>> ext4_mb_find_by_goal
>>    ext4_mb_load_buddy
>>     ext4_mb_load_buddy_gfp
>>       ext4_mb_init_cache
>>        ext4_read_block_bitmap_nowait
>>        ext4_wait_block_bitmap
>>         ext4_validate_block_bitmap
>>          if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
>>           return -EFSCORRUPTED; // There's no logs.
>>   if (err)
>>    return err;  // Will return error
>> ext4_lock_group(ac->ac_sb, group);
>>    if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable
>>     goto out;
>>
>> After commit 9008a58e5dce ("ext4: make the bitmap read routines return
>> real error codes") merged, Commit 163a203ddb36 ("ext4: mark block group
>> as corrupt on block bitmap error") is no real solution for allocating
>> blocks from corrupted block groups. This is because if
>> 'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then
>> 'ext4_mb_load_buddy()' may return an error. This means that the block
>> allocation will fail.
>> Therefore, check block group if corrupted when ext4_mb_load_buddy()
>> returns error.
>
> Good catch!
>
> Agreed, we should try other groups upon failure unless it's a goal-only
> allocation.
>
> But note that e4b->bd_info might be uninitialized if ext4_mb_load_buddy()
> fails.
>
The situation you mentioned probably doesn't exist.
ext4_mb_find_by_goal
   struct ext4_group_info *grp = ext4_get_group_info(ac->ac_sb, group);
   if (!grp)   // The possibility that e4b->bd_info is not initialized has been avoided.
     return -EFSCORRUPTED;
   err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
      ext4_mb_load_buddy_gfp(sb, group, e4b, GFP_NOFS);
        grp = ext4_get_group_info(sb, group);
        if (!grp)   // This condition probably will not be met.
          return -EFSCORRUPTED;
        e4b->bd_info = grp;
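
The ordering argument above can be checked with a small user-space model. This is a sketch under assumptions, not the kernel function: the `model_*` types and the single corrupt flag are illustrative. The point it demonstrates is that bd_info is assigned before the step that can fail on a corrupt bitmap, so it is valid whenever a non-NULL group was looked up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EFSCORRUPTED 117   /* EUCLEAN on Linux */

/* Hypothetical stand-ins for ext4_group_info and ext4_buddy. */
struct model_group_info {
	bool bbitmap_corrupt;
};

struct model_buddy {
	struct model_group_info *bd_info;
};

/*
 * Models the ordering quoted above: the group pointer is checked and
 * stored in e4b->bd_info first; the bitmap validation failure comes later.
 */
static int model_load_buddy(struct model_group_info *grp,
			    struct model_buddy *e4b)
{
	assert(e4b != NULL);

	if (!grp)
		return -MODEL_EFSCORRUPTED;   /* bd_info is left untouched */

	e4b->bd_info = grp;                   /* set before any bitmap check */

	if (grp->bbitmap_corrupt)
		return -MODEL_EFSCORRUPTED;   /* bd_info is already valid here */

	return 0;
}
```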
> I think we can optimize this in ext4_mb_regular_allocator(): we can record
> the error from ext4_mb_find_by_goal() but avoid an early exit.
>
> Specifically, after checking that EXT4_MB_HINT_GOAL_ONLY is not set,
> we can assign the error to ac->ac_first_err. This way, if subsequent
> allocation attempts still fail, we can preserve the original.
>
>
> Cheers,
> Baokun
>
>> Fixes: 163a203ddb36 ("ext4: mark block group as corrupt on block bitmap error")
>> Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes")
>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>> ---
>>   fs/ext4/mballoc.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e2341489f4d0..ffa6886de8a3 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2443,8 +2443,12 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
>>   		return 0;
>>
>>   	err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
>> -	if (err)
>> +	if (err) {
>> +		if (EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info) &&
>> +		    !(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
>> +			return 0;
>>   		return err;
>> +	}
>>
>>   	ext4_lock_group(ac->ac_sb, group);
>>   	if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
>
>
> .
>


* Re: [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
  2026-03-02 13:46 [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() Ye Bin
                   ` (2 preceding siblings ...)
  2026-03-03  2:36 ` Baokun Li
@ 2026-03-07  8:07 ` Zhang Yi
  2026-03-14  8:39 ` Ritesh Harjani
  2026-03-27  4:06 ` Theodore Ts'o
  5 siblings, 0 replies; 8+ messages in thread
From: Zhang Yi @ 2026-03-07  8:07 UTC (permalink / raw)
  To: Ye Bin, tytso, adilger.kernel, linux-ext4; +Cc: jack

On 3/2/2026 9:46 PM, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
> 
> There's issue as follows:
> ...
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): error count since last fsck: 1
> EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760
> EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760
> ...
> 
> According to the log analysis, blocks are always requested from the
> corrupted block group. This may happen as follows:
> ext4_mb_find_by_goal
>    ext4_mb_load_buddy
>     ext4_mb_load_buddy_gfp
>       ext4_mb_init_cache
>        ext4_read_block_bitmap_nowait
>        ext4_wait_block_bitmap
>         ext4_validate_block_bitmap
>          if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
>           return -EFSCORRUPTED; // There's no logs.
>   if (err)
>    return err;  // Will return error
> ext4_lock_group(ac->ac_sb, group);
>    if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable
>     goto out;
> 
> After commit 9008a58e5dce ("ext4: make the bitmap read routines return
> real error codes") merged, Commit 163a203ddb36 ("ext4: mark block group
> as corrupt on block bitmap error") is no real solution for allocating
> blocks from corrupted block groups. This is because if
> 'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then
> 'ext4_mb_load_buddy()' may return an error. This means that the block
> allocation will fail.
> Therefore, check block group if corrupted when ext4_mb_load_buddy()
> returns error.
> 
> Fixes: 163a203ddb36 ("ext4: mark block group as corrupt on block bitmap error")
> Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes")
> Signed-off-by: Ye Bin <yebin10@huawei.com>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>   fs/ext4/mballoc.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2341489f4d0..ffa6886de8a3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2443,8 +2443,12 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
>   		return 0;
>   
>   	err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
> -	if (err)
> +	if (err) {
> +		if (EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info) &&
> +		    !(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
> +			return 0;
>   		return err;
> +	}
>   
>   	ext4_lock_group(ac->ac_sb, group);
>   	if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))



* Re: [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
  2026-03-02 13:46 [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() Ye Bin
                   ` (3 preceding siblings ...)
  2026-03-07  8:07 ` Zhang Yi
@ 2026-03-14  8:39 ` Ritesh Harjani
  2026-03-27  4:06 ` Theodore Ts'o
  5 siblings, 0 replies; 8+ messages in thread
From: Ritesh Harjani @ 2026-03-14  8:39 UTC (permalink / raw)
  To: Ye Bin, tytso, adilger.kernel, linux-ext4; +Cc: jack

Ye Bin <yebin@huaweicloud.com> writes:

> From: Ye Bin <yebin10@huawei.com>
>
> There's issue as follows:
> ...
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
>
> EXT4-fs (mmcblk0p1): error count since last fsck: 1
> EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760
> EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760
> ...
>
> According to the log analysis, blocks are always requested from the
> corrupted block group. This may happen as follows:
> ext4_mb_find_by_goal
>   ext4_mb_load_buddy
>    ext4_mb_load_buddy_gfp
>      ext4_mb_init_cache
>       ext4_read_block_bitmap_nowait
>       ext4_wait_block_bitmap
>        ext4_validate_block_bitmap
>         if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
>          return -EFSCORRUPTED; // There's no logs.
>  if (err)
>   return err;  // Will return error
> ext4_lock_group(ac->ac_sb, group);
>   if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable
>    goto out;
>
> After commit 9008a58e5dce ("ext4: make the bitmap read routines return
> real error codes") merged, Commit 163a203ddb36 ("ext4: mark block group
> as corrupt on block bitmap error") is no real solution for allocating
> blocks from corrupted block groups. This is because if
> 'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then
> 'ext4_mb_load_buddy()' may return an error. This means that the block
> allocation will fail.
> Therefore, check block group if corrupted when ext4_mb_load_buddy()
> returns error.
>
> Fixes: 163a203ddb36 ("ext4: mark block group as corrupt on block bitmap error")
> Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes")
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
>  fs/ext4/mballoc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2341489f4d0..ffa6886de8a3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2443,8 +2443,12 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
>  		return 0;
>  
>  	err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
> -	if (err)
> +	if (err) {
> +		if (EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info) &&
> +		    !(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
> +			return 0;
>  		return err;
> +	}


So, if we had to load the buddy info and the group's block bitmap was
marked as corrupted, we always returned an error instead of
searching for free blocks in other block groups (even for non-goal-only
allocations).

This patch fixes that path.
Nice catch! Was this happening as part of some xfstests?


Feel free to add:
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>


* Re: [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
  2026-03-02 13:46 [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() Ye Bin
                   ` (4 preceding siblings ...)
  2026-03-14  8:39 ` Ritesh Harjani
@ 2026-03-27  4:06 ` Theodore Ts'o
  5 siblings, 0 replies; 8+ messages in thread
From: Theodore Ts'o @ 2026-03-27  4:06 UTC (permalink / raw)
  To: adilger.kernel, linux-ext4, Ye Bin; +Cc: Theodore Ts'o, jack


On Mon, 02 Mar 2026 21:46:19 +0800, Ye Bin wrote:
> There's issue as follows:
> ...
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117
> EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost
> 
> [...]

Applied, thanks!

[1/1] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal()
      commit: 4a1e038b056fca4a9644de1af8009c4980e158e3

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>


Thread overview: 8+ messages
2026-03-02 13:46 [PATCH] ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() Ye Bin
2026-03-02 16:27 ` Jan Kara
2026-03-02 19:41 ` Andreas Dilger
2026-03-03  2:36 ` Baokun Li
2026-03-03  7:55   ` yebin (H)
2026-03-07  8:07 ` Zhang Yi
2026-03-14  8:39 ` Ritesh Harjani
2026-03-27  4:06 ` Theodore Ts'o
