* [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
@ 2026-03-10 12:28 Theodore Ts'o
2026-03-11 2:38 ` Baokun Li
0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2026-03-10 12:28 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o, Jan Kara
Commit 4865c768b563 ("ext4: always allocate blocks only from groups
inode can use") restricts what blocks will be allocated for indirect
block based files to block numbers that fit within 32-bit block
numbers.
However, when using a review bot running on the latest Gemini LLM to
check this commit when backporting into an LTS based kernel, it raised
the following concerns:
If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal
group was populated via stream allocation from s_mb_last_groups),
then start will be >= ngroups.
Does this allow allocating blocks beyond the 32-bit limit for
indirect block mapped files? The commit message mentions that
ext4_mb_scan_groups_linear() takes care to not select unsupported
groups. However, its loop uses group = *start, and the very first
iteration will call ext4_mb_scan_group() with this unsupported
group because next_linear_group() is only called at the end of the
iteration.
Also, in the optimized scanning paths like
ext4_mb_scan_groups_p2_aligned(), start is passed into
ext4_mb_scan_groups_xa_range(). If start >= ngroups, will this
trigger the WARN_ON_ONCE(end > ngroups || start >= end) because end
is set to ngroups? Furthermore, after wrapping around, end will be
set to the original start which is > ngroups, triggering the
warning a second time. Should this code clamp start to < ngroups
before scanning?
After reviewing the code paths involved and considering the LLM
review, I believe that while it is unlikely, it is possible that how
commit 4865c768b563 added ext4_get_allocation_groups_count() doesn't
completely address all of the possible situtions that might show up
when using indirect-mapped files and allocating blocks on a file
system with more than 2**32 blocks.
So this commit adds some safety checks and wrap around logic to some
functions if the multiblock allocator: ext4_mb_scan_groups_xa_range(),
ext4_mb_scan_groups_best_avail(), ext4_mb_scan_groups_p2_aligned(),
and ext4_mb_scan_groups().
Fixes: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
---
fs/ext4/mballoc.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 20e9fdaf4301..454d5a1b4803 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -915,13 +915,17 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
struct ext4_sb_info *sbi = EXT4_SB(sb);
enum criteria cr = ac->ac_criteria;
ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
- unsigned long group = start;
+ unsigned long group, end_range;
struct ext4_group_info *grp;
- if (WARN_ON_ONCE(end > ngroups || start >= end))
- return 0;
-
- xa_for_each_range(xa, group, grp, start, end - 1) {
+ if (start >= ngroups)
+ start = 0;
+ group = start;
+wrap_around:
+ end_range = end;
+ if (end_range > ngroups)
+ end_range = ngroups;
+ xa_for_each_range(xa, group, grp, start, end_range - 1) {
int err;
if (sbi->s_mb_stats)
@@ -933,7 +937,11 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
cond_resched();
}
-
+ if (start) {
+ end = start;
+ start = 0;
+ goto wrap_around;
+ }
return 0;
}
@@ -967,6 +975,8 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
start = group;
end = ext4_get_allocation_groups_count(ac);
+ if (start >= end)
+ return 0;
wrap_around:
for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
@@ -1099,6 +1109,8 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
start = group;
end = ext4_get_allocation_groups_count(ac);
+ if (start >= end)
+ start = 0;
wrap_around:
for (i = order; i >= min_order; i--) {
int frag_order;
@@ -1199,6 +1211,8 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
/* searching for the right group start from the goal value specified */
start = ac->ac_g_ex.fe_group;
+ if (start >= ngroups)
+ start = 0;
ac->ac_prefetch_grp = start;
ac->ac_prefetch_nr = 0;
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
2026-03-10 12:28 [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks Theodore Ts'o
@ 2026-03-11 2:38 ` Baokun Li
2026-03-12 14:23 ` Theodore Tso
0 siblings, 1 reply; 6+ messages in thread
From: Baokun Li @ 2026-03-11 2:38 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Jan Kara, libaokun, Ext4 Developers List
Hi Ted,
On 3/10/26 8:28 PM, Theodore Ts'o wrote:
> Commit 4865c768b563 ("ext4: always allocate blocks only from groups
> inode can use") restricts what blocks will be allocated for indirect
> block based files to block numbers that fit within 32-bit block
> numbers.
>
> However, when using a review bot running on the latest Gemini LLM to
> check this commit when backporting into an LTS based kernel, it raised
> the following concerns:
>
> If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal
> group was populated via stream allocation from s_mb_last_groups),
> then start will be >= ngroups.
>
> Does this allow allocating blocks beyond the 32-bit limit for
> indirect block mapped files? The commit message mentions that
> ext4_mb_scan_groups_linear() takes care to not select unsupported
> groups. However, its loop uses group = *start, and the very first
> iteration will call ext4_mb_scan_group() with this unsupported
> group because next_linear_group() is only called at the end of the
> iteration.
>
> Also, in the optimized scanning paths like
> ext4_mb_scan_groups_p2_aligned(), start is passed into
> ext4_mb_scan_groups_xa_range(). If start >= ngroups, will this
> trigger the WARN_ON_ONCE(end > ngroups || start >= end) because end
> is set to ngroups? Furthermore, after wrapping around, end will be
> set to the original start which is > ngroups, triggering the
> warning a second time. Should this code clamp start to < ngroups
> before scanning?
>
> After reviewing the code paths involved and considering the LLM
> review, I believe that while it is unlikely, it is possible that how
> commit 4865c768b563 added ext4_get_allocation_groups_count() doesn't
> completely address all of the possible situtions that might show up
> when using indirect-mapped files and allocating blocks on a file
> system with more than 2**32 blocks.
>
> So this commit adds some safety checks and wrap around logic to some
> functions if the multiblock allocator: ext4_mb_scan_groups_xa_range(),
> ext4_mb_scan_groups_best_avail(), ext4_mb_scan_groups_p2_aligned(),
> and ext4_mb_scan_groups().
>
> Fixes: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: Jan Kara <jack@suse.cz>
> ---
Good spotting! ext4_find_goal() ensures that the goal block obtained for
indirect-block-based files will not exceed EXT4_MAX_BLOCK_FILE_PHYS.
However, on an ext4 filesystem where the two file formats are mixed,
it is indeed possible to get an excessively large goal group via stream
allocation.
Since the mixed-format case is quite rare, I think we can simply validate
start in ext4_mb_scan_groups() and reset it to 0 when it exceeds the limit,
like this:
/* searching for the right group start from the goal value
specified */
start = ac->ac_g_ex.fe_group;
+ if (start >= ngroups)
+ start = 0;
ac->ac_prefetch_grp = start;
ac->ac_prefetch_nr = 0;
This way, we don’t need to add similar checks in every CR.
Besides, ext4_mb_scan_groups_xa_range relies on the caller to handle the
wrap-around logic so as to ensure an overall one-way traversal of groups;
see commit a3ce570a5d6a ("ext4: implement linear-like traversal across
order xarrays") for details. So in any case, I don’t think it is necessary
to add wrap-around logic here. It should be sufficient for the caller to
guarantee that end <= ngroups and start < end.
Cheers,
Baokun
> fs/ext4/mballoc.c | 26 ++++++++++++++++++++------
> 1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 20e9fdaf4301..454d5a1b4803 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -915,13 +915,17 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> enum criteria cr = ac->ac_criteria;
> ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
> - unsigned long group = start;
> + unsigned long group, end_range;
> struct ext4_group_info *grp;
>
> - if (WARN_ON_ONCE(end > ngroups || start >= end))
> - return 0;
> -
> - xa_for_each_range(xa, group, grp, start, end - 1) {
> + if (start >= ngroups)
> + start = 0;
> + group = start;
> +wrap_around:
> + end_range = end;
> + if (end_range > ngroups)
> + end_range = ngroups;
> + xa_for_each_range(xa, group, grp, start, end_range - 1) {
> int err;
>
> if (sbi->s_mb_stats)
> @@ -933,7 +937,11 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
>
> cond_resched();
> }
> -
> + if (start) {
> + end = start;
> + start = 0;
> + goto wrap_around;
> + }
> return 0;
> }
>
> @@ -967,6 +975,8 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
>
> start = group;
> end = ext4_get_allocation_groups_count(ac);
> + if (start >= end)
> + return 0;
> wrap_around:
> for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
> ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
> @@ -1099,6 +1109,8 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
>
> start = group;
> end = ext4_get_allocation_groups_count(ac);
> + if (start >= end)
> + start = 0;
> wrap_around:
> for (i = order; i >= min_order; i--) {
> int frag_order;
> @@ -1199,6 +1211,8 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
>
> /* searching for the right group start from the goal value specified */
> start = ac->ac_g_ex.fe_group;
> + if (start >= ngroups)
> + start = 0;
> ac->ac_prefetch_grp = start;
> ac->ac_prefetch_nr = 0;
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
2026-03-11 2:38 ` Baokun Li
@ 2026-03-12 14:23 ` Theodore Tso
2026-03-13 2:00 ` Baokun Li
0 siblings, 1 reply; 6+ messages in thread
From: Theodore Tso @ 2026-03-12 14:23 UTC (permalink / raw)
To: Baokun Li; +Cc: Jan Kara, Ext4 Developers List
On Wed, Mar 11, 2026 at 10:38:20AM +0800, Baokun Li wrote:
>
> Good spotting! ext4_find_goal() ensures that the goal block obtained for
> indirect-block-based files will not exceed EXT4_MAX_BLOCK_FILE_PHYS.
> However, on an ext4 filesystem where the two file formats are mixed,
> it is indeed possible to get an excessively large goal group via stream
> allocation.
Well, I didn't spot it; an LLM AI noticed. :-) Arguably I should have
noticed it when doing my review, but I didn't.
> Since the mixed-format case is quite rare, I think we can simply validate
> start in ext4_mb_scan_groups() and reset it to 0 when it exceeds the limit,
> like this:
No, I don't think that's enough. It's not just that we could have an
excessively large goal group. The goal group could also just be, say,
2**32 - 5. If the next 5 groups are full, then when we do an
optimized scan, we will end up beyond the 2**32 limit. That's why we
need to add some kidn of wraparound logic to *any* caller to
ext4_get_allocation_groups_count().
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
2026-03-12 14:23 ` Theodore Tso
@ 2026-03-13 2:00 ` Baokun Li
2026-03-13 14:31 ` Theodore Tso
0 siblings, 1 reply; 6+ messages in thread
From: Baokun Li @ 2026-03-13 2:00 UTC (permalink / raw)
To: Theodore Tso; +Cc: Jan Kara, Ext4 Developers List, libaokun
On 3/12/26 10:23 PM, Theodore Tso wrote:
> On Wed, Mar 11, 2026 at 10:38:20AM +0800, Baokun Li wrote:
>> Good spotting! ext4_find_goal() ensures that the goal block obtained for
>> indirect-block-based files will not exceed EXT4_MAX_BLOCK_FILE_PHYS.
>> However, on an ext4 filesystem where the two file formats are mixed,
>> it is indeed possible to get an excessively large goal group via stream
>> allocation.
> Well, I didn't spot it; an LLM AI noticed. :-) Arguably I should have
> noticed it when doing my review, but I didn't.
>
>> Since the mixed-format case is quite rare, I think we can simply validate
>> start in ext4_mb_scan_groups() and reset it to 0 when it exceeds the limit,
>> like this:
> No, I don't think that's enough. It's not just that we could have an
> excessively large goal group. The goal group could also just be, say,
> 2**32 - 5. If the next 5 groups are full, then when we do an
> optimized scan, we will end up beyond the 2**32 limit. That's why we
> need to add some kidn of wraparound logic to *any* caller to
> ext4_get_allocation_groups_count().
>
IIUC, ngroups/end here only depend on filesystem size and whether the inode
is extent-based, and both should stay unchanged during block allocation.
So doing the check once at the beginning should be sufficient. Am I missing
anything?
Cheers,
Baokun
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
2026-03-13 2:00 ` Baokun Li
@ 2026-03-13 14:31 ` Theodore Tso
2026-03-14 7:41 ` Baokun Li
0 siblings, 1 reply; 6+ messages in thread
From: Theodore Tso @ 2026-03-13 14:31 UTC (permalink / raw)
To: Baokun Li; +Cc: Jan Kara, Ext4 Developers List
On Fri, Mar 13, 2026 at 10:00:04AM +0800, Baokun Li wrote:
> IIUC, ngroups/end here only depend on filesystem size and whether the inode
> is extent-based, and both should stay unchanged during block allocation.
> So doing the check once at the beginning should be sufficient. Am I missing
> anything?
The problem here is that case where we use
ext4_get_allocation_groups_count as ngroups and end are different. In
some places we do this:
ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
and in others we do this:
end = ext4_get_allocation_groups_count(ac);
Now, if start is zero, then these two are equivalent. But if start is
not zero, but say, is 2**32 - 8, then where we use
ext4_allocaiton_groups_count() as the last block group to search, then
we only will search exactly 8 block groups, and if
there are free blocks in the first 8 block groups, then the scanning
function will fail.
Alternatively, if we just do the check at the beginning, then 2**32 -
8 is a valid starting point, but if we just search forward by ngroups,
then we may end up returning a block group which won't work for
indirect mapped inodes.
Hence, in *every* function where we call
ext4_get_allocaiton_groups_count(), if the goal is to search all block
groups that are valid for indirect mapped inodes, and start might be
greater than 0, we *have* to handle wraparound.
Does that make sense?
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
2026-03-13 14:31 ` Theodore Tso
@ 2026-03-14 7:41 ` Baokun Li
0 siblings, 0 replies; 6+ messages in thread
From: Baokun Li @ 2026-03-14 7:41 UTC (permalink / raw)
To: Theodore Tso; +Cc: Jan Kara, Ext4 Developers List, libaokun
On 3/13/26 10:31 PM, Theodore Tso wrote:
> On Fri, Mar 13, 2026 at 10:00:04AM +0800, Baokun Li wrote:
>> IIUC, ngroups/end here only depend on filesystem size and whether the inode
>> is extent-based, and both should stay unchanged during block allocation.
>> So doing the check once at the beginning should be sufficient. Am I missing
>> anything?
> The problem here is that case where we use
> ext4_get_allocation_groups_count as ngroups and end are different. In
> some places we do this:
>
> ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
>
> and in others we do this:
>
> end = ext4_get_allocation_groups_count(ac);
>
> Now, if start is zero, then these two are equivalent. But if start is
> not zero, but say, is 2**32 - 8, then where we use
> ext4_allocaiton_groups_count() as the last block group to search, then
> we only will search exactly 8 block groups, and if
> there are free blocks in the first 8 block groups, then the scanning
> function will fail.
ext4_mb_scan_groups_p2_aligned(), ext4_mb_scan_groups_goal_fast(), and
ext4_mb_scan_groups_best_avail() will all wrap around and retry
[0, 2^32 - 8) after failing to search [2^32 - 8, 2^32). So if there are
free blocks in the first 8 block groups, the allocation will not fail.
>
> Alternatively, if we just do the check at the beginning, then 2**32 -
> 8 is a valid starting point, but if we just search forward by ngroups,
> then we may end up returning a block group which won't work for
> indirect mapped inodes.
Here, ngroups refers to the number of block groups available for
allocation, i.e. the range [0, ngroups), rather than implying that
ngroups groups are scanned on each pass. In general, we first scan the
[start, ngroups) portion of the xarrays, and then the [0, start) portion.
For indirect-mapped inodes, ext4_get_allocation_groups_count() returns
s_blockfile_groups. The end of those groups should not exceed 2^32 blocks;
otherwise, the s_blockfile_groups logic itself would be broken.
> Hence, in *every* function where we call
> ext4_get_allocaiton_groups_count(), if the goal is to search all block
> groups that are valid for indirect mapped inodes, and start might be
> greater than 0, we *have* to handle wraparound.
>
> Does that make sense?
>
Thanks for the detailed explanation, but I’m afraid I still don’t
quite follow.
Take `ext4_mb_scan_groups_p2_aligned` as an example:
ext4_mb_scan_groups
ngroups = ext4_get_allocation_groups_count(ac); // safe
start = ac->ac_g_ex.fe_group; // may be greater than ngroups
ext4_mb_scan_groups_p2_aligned()
start = group; // may be greater than ngroups
end = ext4_get_allocation_groups_count(ac); // == ngroups
wrap_around:
ext4_mb_scan_groups_xa_range(start, end)
ngroups = ext4_get_allocation_groups_count(ac); // == end
// `start` may exceed `end`, but `end` is always within bounds.
// So we only need to make sure the `start` computed by
// ext4_mb_scan_groups() is less than `ngroups`.
// Forward scan failed; let the caller handle wrap-around.
if (start) {
end = start; // `end` may be invalid if `start` is invalid.
start = 0;
goto wrap_around;
}
So it looks like the later start/end values only go wrong if the initial
start is already >= ngroups.
Did I miss something here?
Thanks,
Baokun
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-03-14 7:41 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-10 12:28 [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks Theodore Ts'o
2026-03-11 2:38 ` Baokun Li
2026-03-12 14:23 ` Theodore Tso
2026-03-13 2:00 ` Baokun Li
2026-03-13 14:31 ` Theodore Tso
2026-03-14 7:41 ` Baokun Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox