public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
@ 2026-03-10 12:28 Theodore Ts'o
  2026-03-11  2:38 ` Baokun Li
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2026-03-10 12:28 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o, Jan Kara

Commit 4865c768b563 ("ext4: always allocate blocks only from groups
inode can use") restricts what blocks will be allocated for indirect
block based files to block numbers that fit within 32-bit block
numbers.

However, when using a review bot running on the latest Gemini LLM to
check this commit when backporting into an LTS based kernel, it raised
the following concerns:

   If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal
   group was populated via stream allocation from s_mb_last_groups),
   then start will be >= ngroups.

   Does this allow allocating blocks beyond the 32-bit limit for
   indirect block mapped files? The commit message mentions that
   ext4_mb_scan_groups_linear() takes care to not select unsupported
   groups. However, its loop uses group = *start, and the very first
   iteration will call ext4_mb_scan_group() with this unsupported
   group because next_linear_group() is only called at the end of the
   iteration.

   Also, in the optimized scanning paths like
   ext4_mb_scan_groups_p2_aligned(), start is passed into
   ext4_mb_scan_groups_xa_range(). If start >= ngroups, will this
   trigger the WARN_ON_ONCE(end > ngroups || start >= end) because end
   is set to ngroups? Furthermore, after wrapping around, end will be
   set to the original start which is > ngroups, triggering the
   warning a second time. Should this code clamp start to < ngroups
   before scanning?

After reviewing the code paths involved and considering the LLM
review, I believe that while it is unlikely, it is possible that how
commit 4865c768b563 added ext4_get_allocation_groups_count() doesn't
completely address all of the possible situtions that might show up
when using indirect-mapped files and allocating blocks on a file
system with more than 2**32 blocks.

So this commit adds some safety checks and wrap around logic to some
functions if the multiblock allocator: ext4_mb_scan_groups_xa_range(),
ext4_mb_scan_groups_best_avail(), ext4_mb_scan_groups_p2_aligned(),
and ext4_mb_scan_groups().

Fixes: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 20e9fdaf4301..454d5a1b4803 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -915,13 +915,17 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	enum criteria cr = ac->ac_criteria;
 	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
-	unsigned long group = start;
+	unsigned long group, end_range;
 	struct ext4_group_info *grp;
 
-	if (WARN_ON_ONCE(end > ngroups || start >= end))
-		return 0;
-
-	xa_for_each_range(xa, group, grp, start, end - 1) {
+	if (start >= ngroups)
+		start = 0;
+	group = start;
+wrap_around:
+	end_range = end;
+	if (end_range > ngroups)
+		end_range = ngroups;
+	xa_for_each_range(xa, group, grp, start, end_range - 1) {
 		int err;
 
 		if (sbi->s_mb_stats)
@@ -933,7 +937,11 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
 
 		cond_resched();
 	}
-
+	if (start) {
+		end = start;
+		start = 0;
+		goto wrap_around;
+	}
 	return 0;
 }
 
@@ -967,6 +975,8 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
 
 	start = group;
 	end = ext4_get_allocation_groups_count(ac);
+	if (start >= end)
+		return 0;
 wrap_around:
 	for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
 		ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
@@ -1099,6 +1109,8 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
 
 	start = group;
 	end = ext4_get_allocation_groups_count(ac);
+	if (start >= end)
+		start = 0;
 wrap_around:
 	for (i = order; i >= min_order; i--) {
 		int frag_order;
@@ -1199,6 +1211,8 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
 
 	/* searching for the right group start from the goal value specified */
 	start = ac->ac_g_ex.fe_group;
+	if (start >= ngroups)
+		start = 0;
 	ac->ac_prefetch_grp = start;
 	ac->ac_prefetch_nr = 0;
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
  2026-03-10 12:28 [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks Theodore Ts'o
@ 2026-03-11  2:38 ` Baokun Li
  2026-03-12 14:23   ` Theodore Tso
  0 siblings, 1 reply; 6+ messages in thread
From: Baokun Li @ 2026-03-11  2:38 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Kara, libaokun, Ext4 Developers List

Hi Ted,

On 3/10/26 8:28 PM, Theodore Ts'o wrote:
> Commit 4865c768b563 ("ext4: always allocate blocks only from groups
> inode can use") restricts what blocks will be allocated for indirect
> block based files to block numbers that fit within 32-bit block
> numbers.
>
> However, when using a review bot running on the latest Gemini LLM to
> check this commit when backporting into an LTS based kernel, it raised
> the following concerns:
>
>    If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal
>    group was populated via stream allocation from s_mb_last_groups),
>    then start will be >= ngroups.
>
>    Does this allow allocating blocks beyond the 32-bit limit for
>    indirect block mapped files? The commit message mentions that
>    ext4_mb_scan_groups_linear() takes care to not select unsupported
>    groups. However, its loop uses group = *start, and the very first
>    iteration will call ext4_mb_scan_group() with this unsupported
>    group because next_linear_group() is only called at the end of the
>    iteration.
>
>    Also, in the optimized scanning paths like
>    ext4_mb_scan_groups_p2_aligned(), start is passed into
>    ext4_mb_scan_groups_xa_range(). If start >= ngroups, will this
>    trigger the WARN_ON_ONCE(end > ngroups || start >= end) because end
>    is set to ngroups? Furthermore, after wrapping around, end will be
>    set to the original start which is > ngroups, triggering the
>    warning a second time. Should this code clamp start to < ngroups
>    before scanning?
>
> After reviewing the code paths involved and considering the LLM
> review, I believe that while it is unlikely, it is possible that how
> commit 4865c768b563 added ext4_get_allocation_groups_count() doesn't
> completely address all of the possible situtions that might show up
> when using indirect-mapped files and allocating blocks on a file
> system with more than 2**32 blocks.
>
> So this commit adds some safety checks and wrap around logic to some
> functions if the multiblock allocator: ext4_mb_scan_groups_xa_range(),
> ext4_mb_scan_groups_best_avail(), ext4_mb_scan_groups_p2_aligned(),
> and ext4_mb_scan_groups().
>
> Fixes: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: Jan Kara <jack@suse.cz>
> ---

Good spotting! ext4_find_goal() ensures that the goal block obtained for
indirect-block-based files will not exceed EXT4_MAX_BLOCK_FILE_PHYS.
However, on an ext4 filesystem where the two file formats are mixed,
it is indeed possible to get an excessively large goal group via stream
allocation.

Since the mixed-format case is quite rare, I think we can simply validate
start in ext4_mb_scan_groups() and reset it to 0 when it exceeds the limit,
like this:

        /* searching for the right group start from the goal value
specified */
        start = ac->ac_g_ex.fe_group;
+       if (start >= ngroups)
+               start = 0;
        ac->ac_prefetch_grp = start;
        ac->ac_prefetch_nr = 0;

This way, we don’t need to add similar checks in every CR.


Besides, ext4_mb_scan_groups_xa_range relies on the caller to handle the
wrap-around logic so as to ensure an overall one-way traversal of groups;
see commit a3ce570a5d6a ("ext4: implement linear-like traversal across
order xarrays") for details. So in any case, I don’t think it is necessary
to add wrap-around logic here. It should be sufficient for the caller to
guarantee that end <= ngroups and start < end.


Cheers,
Baokun

>  fs/ext4/mballoc.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 20e9fdaf4301..454d5a1b4803 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -915,13 +915,17 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
>  	enum criteria cr = ac->ac_criteria;
>  	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
> -	unsigned long group = start;
> +	unsigned long group, end_range;
>  	struct ext4_group_info *grp;
>  
> -	if (WARN_ON_ONCE(end > ngroups || start >= end))
> -		return 0;
> -
> -	xa_for_each_range(xa, group, grp, start, end - 1) {
> +	if (start >= ngroups)
> +		start = 0;
> +	group = start;
> +wrap_around:
> +	end_range = end;
> +	if (end_range > ngroups)
> +		end_range = ngroups;
> +	xa_for_each_range(xa, group, grp, start, end_range - 1) {
>  		int err;
>  
>  		if (sbi->s_mb_stats)
> @@ -933,7 +937,11 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
>  
>  		cond_resched();
>  	}
> -
> +	if (start) {
> +		end = start;
> +		start = 0;
> +		goto wrap_around;
> +	}
>  	return 0;
>  }
>  
> @@ -967,6 +975,8 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
>  
>  	start = group;
>  	end = ext4_get_allocation_groups_count(ac);
> +	if (start >= end)
> +		return 0;
>  wrap_around:
>  	for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
>  		ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
> @@ -1099,6 +1109,8 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
>  
>  	start = group;
>  	end = ext4_get_allocation_groups_count(ac);
> +	if (start >= end)
> +		start = 0;
>  wrap_around:
>  	for (i = order; i >= min_order; i--) {
>  		int frag_order;
> @@ -1199,6 +1211,8 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
>  
>  	/* searching for the right group start from the goal value specified */
>  	start = ac->ac_g_ex.fe_group;
> +	if (start >= ngroups)
> +		start = 0;
>  	ac->ac_prefetch_grp = start;
>  	ac->ac_prefetch_nr = 0;
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
  2026-03-11  2:38 ` Baokun Li
@ 2026-03-12 14:23   ` Theodore Tso
  2026-03-13  2:00     ` Baokun Li
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Tso @ 2026-03-12 14:23 UTC (permalink / raw)
  To: Baokun Li; +Cc: Jan Kara, Ext4 Developers List

On Wed, Mar 11, 2026 at 10:38:20AM +0800, Baokun Li wrote:
> 
> Good spotting! ext4_find_goal() ensures that the goal block obtained for
> indirect-block-based files will not exceed EXT4_MAX_BLOCK_FILE_PHYS.
> However, on an ext4 filesystem where the two file formats are mixed,
> it is indeed possible to get an excessively large goal group via stream
> allocation.

Well, I didn't spot it; an LLM AI noticed.  :-) Arguably I should have
noticed it when doing my review, but I didn't.

> Since the mixed-format case is quite rare, I think we can simply validate
> start in ext4_mb_scan_groups() and reset it to 0 when it exceeds the limit,
> like this:

No, I don't think that's enough.  It's not just that we could have an
excessively large goal group.  The goal group could also just be, say,
2**32 - 5.  If the next 5 groups are full, then when we do an
optimized scan, we will end up beyond the 2**32 limit.  That's why we
need to add some kidn of wraparound logic to *any* caller to
ext4_get_allocation_groups_count().

							- Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
  2026-03-12 14:23   ` Theodore Tso
@ 2026-03-13  2:00     ` Baokun Li
  2026-03-13 14:31       ` Theodore Tso
  0 siblings, 1 reply; 6+ messages in thread
From: Baokun Li @ 2026-03-13  2:00 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jan Kara, Ext4 Developers List, libaokun


On 3/12/26 10:23 PM, Theodore Tso wrote:
> On Wed, Mar 11, 2026 at 10:38:20AM +0800, Baokun Li wrote:
>> Good spotting! ext4_find_goal() ensures that the goal block obtained for
>> indirect-block-based files will not exceed EXT4_MAX_BLOCK_FILE_PHYS.
>> However, on an ext4 filesystem where the two file formats are mixed,
>> it is indeed possible to get an excessively large goal group via stream
>> allocation.
> Well, I didn't spot it; an LLM AI noticed.  :-) Arguably I should have
> noticed it when doing my review, but I didn't.
>
>> Since the mixed-format case is quite rare, I think we can simply validate
>> start in ext4_mb_scan_groups() and reset it to 0 when it exceeds the limit,
>> like this:
> No, I don't think that's enough.  It's not just that we could have an
> excessively large goal group.  The goal group could also just be, say,
> 2**32 - 5.  If the next 5 groups are full, then when we do an
> optimized scan, we will end up beyond the 2**32 limit.  That's why we
> need to add some kidn of wraparound logic to *any* caller to
> ext4_get_allocation_groups_count().
>
IIUC, ngroups/end here only depend on filesystem size and whether the inode
is extent-based, and both should stay unchanged during block allocation.
So doing the check once at the beginning should be sufficient. Am I missing
anything?


Cheers,
Baokun


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
  2026-03-13  2:00     ` Baokun Li
@ 2026-03-13 14:31       ` Theodore Tso
  2026-03-14  7:41         ` Baokun Li
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Tso @ 2026-03-13 14:31 UTC (permalink / raw)
  To: Baokun Li; +Cc: Jan Kara, Ext4 Developers List

On Fri, Mar 13, 2026 at 10:00:04AM +0800, Baokun Li wrote:
> IIUC, ngroups/end here only depend on filesystem size and whether the inode
> is extent-based, and both should stay unchanged during block allocation.
> So doing the check once at the beginning should be sufficient. Am I missing
> anything?

The problem here is that case where we use
ext4_get_allocation_groups_count as ngroups and end are different.  In
some places we do this:

        ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);

and in others we do this:

        end = ext4_get_allocation_groups_count(ac);

Now, if start is zero, then these two are equivalent.  But if start is
not zero, but say, is 2**32 - 8, then where we use
ext4_allocaiton_groups_count() as the last block group to search, then
we only will search exactly 8 block groups, and if
there are free blocks in the first 8 block groups, then the scanning
function will fail.

Alternatively, if we just do the check at the beginning, then 2**32 -
8 is a valid starting point, but if we just search forward by ngroups,
then we may end up returning a block group which won't work for
indirect mapped inodes.

Hence, in *every* function where we call
ext4_get_allocaiton_groups_count(), if the goal is to search all block
groups that are valid for indirect mapped inodes, and start might be
greater than 0, we *have* to handle wraparound.

Does that make sense?

						- Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
  2026-03-13 14:31       ` Theodore Tso
@ 2026-03-14  7:41         ` Baokun Li
  0 siblings, 0 replies; 6+ messages in thread
From: Baokun Li @ 2026-03-14  7:41 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jan Kara, Ext4 Developers List, libaokun

On 3/13/26 10:31 PM, Theodore Tso wrote:

> On Fri, Mar 13, 2026 at 10:00:04AM +0800, Baokun Li wrote:
>> IIUC, ngroups/end here only depend on filesystem size and whether the inode
>> is extent-based, and both should stay unchanged during block allocation.
>> So doing the check once at the beginning should be sufficient. Am I missing
>> anything?
> The problem here is that case where we use
> ext4_get_allocation_groups_count as ngroups and end are different.  In
> some places we do this:
>
>         ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
>
> and in others we do this:
>
>         end = ext4_get_allocation_groups_count(ac);
>
> Now, if start is zero, then these two are equivalent.  But if start is
> not zero, but say, is 2**32 - 8, then where we use
> ext4_allocaiton_groups_count() as the last block group to search, then
> we only will search exactly 8 block groups, and if
> there are free blocks in the first 8 block groups, then the scanning
> function will fail.

ext4_mb_scan_groups_p2_aligned(), ext4_mb_scan_groups_goal_fast(), and
ext4_mb_scan_groups_best_avail() will all wrap around and retry
[0, 2^32 - 8) after failing to search [2^32 - 8, 2^32). So if there are
free blocks in the first 8 block groups, the allocation will not fail.

>
> Alternatively, if we just do the check at the beginning, then 2**32 -
> 8 is a valid starting point, but if we just search forward by ngroups,
> then we may end up returning a block group which won't work for
> indirect mapped inodes.

Here, ngroups refers to the number of block groups available for
allocation, i.e. the range [0, ngroups), rather than implying that
ngroups groups are scanned on each pass. In general, we first scan the
[start, ngroups) portion of the xarrays, and then the [0, start) portion.

For indirect-mapped inodes, ext4_get_allocation_groups_count() returns
s_blockfile_groups. The end of those groups should not exceed 2^32 blocks;
otherwise, the s_blockfile_groups logic itself would be broken.

> Hence, in *every* function where we call
> ext4_get_allocaiton_groups_count(), if the goal is to search all block
> groups that are valid for indirect mapped inodes, and start might be
> greater than 0, we *have* to handle wraparound.
>
> Does that make sense?
>
Thanks for the detailed explanation, but I’m afraid I still don’t
quite follow.

Take `ext4_mb_scan_groups_p2_aligned` as an example:

ext4_mb_scan_groups
  ngroups = ext4_get_allocation_groups_count(ac); // safe
  start = ac->ac_g_ex.fe_group; // may be greater than ngroups
  ext4_mb_scan_groups_p2_aligned()
    start = group; // may be greater than ngroups
    end = ext4_get_allocation_groups_count(ac); // == ngroups
wrap_around:
    ext4_mb_scan_groups_xa_range(start, end)
      ngroups = ext4_get_allocation_groups_count(ac); // == end
        // `start` may exceed `end`, but `end` is always within bounds.
        // So we only need to make sure the `start` computed by
        // ext4_mb_scan_groups() is less than `ngroups`.
    // Forward scan failed; let the caller handle wrap-around.
    if (start) {
        end = start; // `end` may be invalid if `start` is invalid.
        start = 0;
        goto wrap_around;
    }

So it looks like the later start/end values only go wrong if the initial
start is already >= ngroups.

Did I miss something here?


Thanks,
Baokun


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-14  7:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-10 12:28 [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks Theodore Ts'o
2026-03-11  2:38 ` Baokun Li
2026-03-12 14:23   ` Theodore Tso
2026-03-13  2:00     ` Baokun Li
2026-03-13 14:31       ` Theodore Tso
2026-03-14  7:41         ` Baokun Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox