* Re: [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
@ 2026-01-10 0:59 ` Baokun Li
2026-01-10 1:36 ` Zhang Yi
2026-01-13 16:28 ` Pedro Falcato
2 siblings, 0 replies; 10+ messages in thread
From: Baokun Li @ 2026-01-10 0:59 UTC (permalink / raw)
To: Jan Kara; +Cc: Ted Tso, linux-ext4
On 2026-01-09 18:53, Jan Kara wrote:
> For filesystems with more than 2^32 blocks, inodes using the indirect block
> based format cannot use blocks beyond the 32-bit limit.
> ext4_mb_scan_groups_linear() takes care not to select these unsupported
> groups for such inodes, but other functions selecting groups for
> allocation don't. So far this is harmless because the other selection
> functions are used only with mb_optimize_scan, which is currently
> disabled for inodes with indirect blocks. However, in the following patch
> we want to enable mb_optimize_scan regardless of inode format.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Looks good, thanks for the patch!
Reviewed-by: Baokun Li <libaokun1@huawei.com>
> ---
> fs/ext4/mballoc.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..f0e07bf11a93 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
> }
> }
>
> +static ext4_group_t ext4_get_allocation_groups_count(
> + struct ext4_allocation_context *ac)
> +{
> + ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> +
> + /* non-extent files are limited to low blocks/groups */
> + if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> + ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
> +
> + return ngroups;
> +}
> +
> static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
> struct xarray *xa,
> ext4_group_t start, ext4_group_t end)
> @@ -899,7 +911,7 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
> struct super_block *sb = ac->ac_sb;
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> enum criteria cr = ac->ac_criteria;
> - ext4_group_t ngroups = ext4_get_groups_count(sb);
> + ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
> unsigned long group = start;
> struct ext4_group_info *grp;
>
> @@ -951,7 +963,7 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
> ext4_group_t start, end;
>
> start = group;
> - end = ext4_get_groups_count(ac->ac_sb);
> + end = ext4_get_allocation_groups_count(ac);
> wrap_around:
> for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
> ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
> @@ -1001,7 +1013,7 @@ static int ext4_mb_scan_groups_goal_fast(struct ext4_allocation_context *ac,
> ext4_group_t start, end;
>
> start = group;
> - end = ext4_get_groups_count(ac->ac_sb);
> + end = ext4_get_allocation_groups_count(ac);
> wrap_around:
> i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len);
> for (; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
> @@ -1083,7 +1095,7 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
> min_order = fls(ac->ac_o_ex.fe_len);
>
> start = group;
> - end = ext4_get_groups_count(ac->ac_sb);
> + end = ext4_get_allocation_groups_count(ac);
> wrap_around:
> for (i = order; i >= min_order; i--) {
> int frag_order;
> @@ -1182,11 +1194,7 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
> int ret = 0;
> ext4_group_t start;
> struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
> - ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> -
> - /* non-extent files are limited to low blocks/groups */
> - if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> - ngroups = sbi->s_blockfile_groups;
> + ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
>
> /* searching for the right group start from the goal value specified */
> start = ac->ac_g_ex.fe_group;
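
For readers following along, here is a minimal userspace sketch of the clamp the new helper applies. It assumes that indirect-mapped files store 32-bit physical block numbers; `MAX_BLOCK_FILE_PHYS`, `blockfile_groups()` and `allocation_groups_count()` are hypothetical names that only mirror the logic quoted above, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: indirect-mapped files store 32-bit physical block numbers. */
#define MAX_BLOCK_FILE_PHYS 0xFFFFFFFFULL

/* Number of leading groups that lie entirely below the 32-bit block limit. */
static uint32_t blockfile_groups(uint32_t ngroups, uint32_t blocks_per_group)
{
	uint64_t limit;

	assert(blocks_per_group > 0);
	limit = MAX_BLOCK_FILE_PHYS / blocks_per_group;
	return limit < ngroups ? (uint32_t)limit : ngroups;
}

/* Mirrors the helper in the patch: clamp only for non-extent inodes. */
static uint32_t allocation_groups_count(uint32_t ngroups,
					uint32_t blocks_per_group,
					bool uses_extents)
{
	if (!uses_extents)
		return blockfile_groups(ngroups, blocks_per_group);
	return ngroups;
}
```

With 32768 blocks per group, a filesystem of 200000 groups would leave a non-extent inode 131071 usable groups in this model, while an extent-mapped inode can still use all 200000.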
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
2026-01-10 0:59 ` Baokun Li
@ 2026-01-10 1:36 ` Zhang Yi
2026-01-13 16:28 ` Pedro Falcato
2 siblings, 0 replies; 10+ messages in thread
From: Zhang Yi @ 2026-01-10 1:36 UTC (permalink / raw)
To: Jan Kara, Ted Tso; +Cc: linux-ext4, Baokun Li
On 1/9/2026 6:53 PM, Jan Kara wrote:
> For filesystems with more than 2^32 blocks, inodes using the indirect block
> based format cannot use blocks beyond the 32-bit limit.
> ext4_mb_scan_groups_linear() takes care not to select these unsupported
> groups for such inodes, but other functions selecting groups for
> allocation don't. So far this is harmless because the other selection
> functions are used only with mb_optimize_scan, which is currently
> disabled for inodes with indirect blocks. However, in the following patch
> we want to enable mb_optimize_scan regardless of inode format.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Looks good to me.
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> fs/ext4/mballoc.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..f0e07bf11a93 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
> }
> }
>
> +static ext4_group_t ext4_get_allocation_groups_count(
> + struct ext4_allocation_context *ac)
> +{
> + ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> +
> + /* non-extent files are limited to low blocks/groups */
> + if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> + ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
> +
> + return ngroups;
> +}
> +
> static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
> struct xarray *xa,
> ext4_group_t start, ext4_group_t end)
> @@ -899,7 +911,7 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
> struct super_block *sb = ac->ac_sb;
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> enum criteria cr = ac->ac_criteria;
> - ext4_group_t ngroups = ext4_get_groups_count(sb);
> + ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
> unsigned long group = start;
> struct ext4_group_info *grp;
>
> @@ -951,7 +963,7 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
> ext4_group_t start, end;
>
> start = group;
> - end = ext4_get_groups_count(ac->ac_sb);
> + end = ext4_get_allocation_groups_count(ac);
> wrap_around:
> for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
> ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
> @@ -1001,7 +1013,7 @@ static int ext4_mb_scan_groups_goal_fast(struct ext4_allocation_context *ac,
> ext4_group_t start, end;
>
> start = group;
> - end = ext4_get_groups_count(ac->ac_sb);
> + end = ext4_get_allocation_groups_count(ac);
> wrap_around:
> i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len);
> for (; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
> @@ -1083,7 +1095,7 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
> min_order = fls(ac->ac_o_ex.fe_len);
>
> start = group;
> - end = ext4_get_groups_count(ac->ac_sb);
> + end = ext4_get_allocation_groups_count(ac);
> wrap_around:
> for (i = order; i >= min_order; i--) {
> int frag_order;
> @@ -1182,11 +1194,7 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
> int ret = 0;
> ext4_group_t start;
> struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
> - ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> -
> - /* non-extent files are limited to low blocks/groups */
> - if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> - ngroups = sbi->s_blockfile_groups;
> + ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
>
> /* searching for the right group start from the goal value specified */
> start = ac->ac_g_ex.fe_group;
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
2026-01-10 0:59 ` Baokun Li
2026-01-10 1:36 ` Zhang Yi
@ 2026-01-13 16:28 ` Pedro Falcato
2026-01-14 17:26 ` Jan Kara
2 siblings, 1 reply; 10+ messages in thread
From: Pedro Falcato @ 2026-01-13 16:28 UTC (permalink / raw)
To: Jan Kara; +Cc: Ted Tso, linux-ext4, Baokun Li
On Fri, Jan 09, 2026 at 11:53:37AM +0100, Jan Kara wrote:
> For filesystems with more than 2^32 blocks, inodes using the indirect block
> based format cannot use blocks beyond the 32-bit limit.
> ext4_mb_scan_groups_linear() takes care not to select these unsupported
> groups for such inodes, but other functions selecting groups for
> allocation don't. So far this is harmless because the other selection
> functions are used only with mb_optimize_scan, which is currently
> disabled for inodes with indirect blocks. However, in the following patch
> we want to enable mb_optimize_scan regardless of inode format.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/ext4/mballoc.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..f0e07bf11a93 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
> }
> }
>
> +static ext4_group_t ext4_get_allocation_groups_count(
> + struct ext4_allocation_context *ac)
> +{
> + ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> +
> + /* non-extent files are limited to low blocks/groups */
> + if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> + ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
> +
> + return ngroups;
> +}
I know you're mostly only moving code around, but I think I see a problem here.
Namely, we (probably?) need an smp_rmb() right after the s_blockfile_groups
read to pair with the one in ext4_update_super(). The pre-existing smp_rmb()
in ext4_get_groups_count() after the s_groups_count load perhaps *incidentally*
works here, but it seems to me like we need a new barrier. So fundamentally
something like:

static ext4_group_t ext4_get_allocation_groups_count(...)
{
	struct ext4_sb_info *sb = EXT4_SB(ac->ac_sb);
	ext4_group_t ngroups;

	ngroups = sb->s_groups_count;
	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
		ngroups = sb->s_blockfile_groups;
	/* pairs with ext4_group_add() logic */
	smp_rmb();
	return ngroups;
}

and to be even more technically correct, we probably want READ_ONCE()
and WRITE_ONCE() here as well.
Does this make sense?
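
To make the pairing concrete, here is a userspace model that uses C11 fences in place of smp_wmb()/smp_rmb(). It is only a sketch under assumptions: `struct fake_sb`, `resize_publish()` and `get_groups_count()` are hypothetical names, `group_info` stands in for whatever per-group state the resize path sets up before raising the count, and release/acquire fences only approximate the kernel barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_GROUPS 64

struct fake_sb {
	uint32_t group_info[MAX_GROUPS]; /* data that must be visible first */
	_Atomic uint32_t groups_count;   /* published last */
};

/* Writer: initialise the new group, then publish the larger count. */
static void resize_publish(struct fake_sb *sb, uint32_t new_group,
			   uint32_t info)
{
	assert(new_group < MAX_GROUPS);
	sb->group_info[new_group] = info;
	/* stands in for smp_wmb(): data store ordered before count store */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&sb->groups_count, new_group + 1,
			      memory_order_relaxed);
}

/* Reader: load the count, then fence before touching per-group data. */
static uint32_t get_groups_count(struct fake_sb *sb)
{
	uint32_t n = atomic_load_explicit(&sb->groups_count,
					  memory_order_relaxed);

	/* stands in for smp_rmb(): count load ordered before later loads */
	atomic_thread_fence(memory_order_acquire);
	return n;
}
```

In this model, a reader that observes a count of n may then read group_info[0..n-1] and see the values the writer stored before publishing. Without the reader-side fence, nothing would order those later loads after the count load.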
--
Pedro
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
2026-01-13 16:28 ` Pedro Falcato
@ 2026-01-14 17:26 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2026-01-14 17:26 UTC (permalink / raw)
To: Pedro Falcato; +Cc: Jan Kara, Ted Tso, linux-ext4, Baokun Li
On Tue 13-01-26 16:28:07, Pedro Falcato wrote:
> On Fri, Jan 09, 2026 at 11:53:37AM +0100, Jan Kara wrote:
> > For filesystems with more than 2^32 blocks, inodes using the indirect block
> > based format cannot use blocks beyond the 32-bit limit.
> > ext4_mb_scan_groups_linear() takes care not to select these unsupported
> > groups for such inodes, but other functions selecting groups for
> > allocation don't. So far this is harmless because the other selection
> > functions are used only with mb_optimize_scan, which is currently
> > disabled for inodes with indirect blocks. However, in the following patch
> > we want to enable mb_optimize_scan regardless of inode format.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/ext4/mballoc.c | 26 +++++++++++++++++---------
> > 1 file changed, 17 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 56d50fd3310b..f0e07bf11a93 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
> > }
> > }
> >
> > +static ext4_group_t ext4_get_allocation_groups_count(
> > + struct ext4_allocation_context *ac)
> > +{
> > + ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> > +
> > + /* non-extent files are limited to low blocks/groups */
> > + if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> > + ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
> > +
> > + return ngroups;
> > +}
>
> I know you're mostly only moving code around, but I think I see a problem here.
> Namely, we (probably?) need an smp_rmb() right after the s_blockfile_groups
> read to pair with the one in ext4_update_super(). The pre-existing smp_rmb()
> in ext4_get_groups_count() after the s_groups_count load perhaps *incidentally*
> works here, but it seems to me like we need a new barrier. So fundamentally
> something like:
>
> static ext4_group_t ext4_get_allocation_groups_count(...)
> {
> 	struct ext4_sb_info *sb = EXT4_SB(ac->ac_sb);
> 	ext4_group_t ngroups;
>
> 	ngroups = sb->s_groups_count;
> 	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
> 		ngroups = sb->s_blockfile_groups;
> 	/* pairs with ext4_group_add() logic */
> 	smp_rmb();
> 	return ngroups;
> }
>
> and to be even more technically correct, we probably want READ_ONCE()
> and WRITE_ONCE() here as well.
>
> Does this make sense?
I agree with both, although I'd note this isn't strictly related to this
patch, as the problem already exists in the code. I think smp_rmb() is good
to add while we are touching the code. As for READ_ONCE / WRITE_ONCE, that
will require modifying all the places touching s_blockfile_groups /
s_groups_count, so I'd leave it for a separate series, as that's going to be
more intrusive.
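
For illustration, a userspace approximation of what annotating these accesses could look like. The macros below are simplified stand-ins for the kernel's READ_ONCE / WRITE_ONCE and rely on the GCC/Clang `__typeof__` extension; `struct fake_sbi`, `set_counts()` and `get_count()` are hypothetical names, and the sketch deliberately contains no barriers.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace approximations; rely on the GCC/Clang __typeof__ extension. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct fake_sbi {
	uint32_t s_groups_count;
	uint32_t s_blockfile_groups;
};

/* Every store to the shared fields goes through WRITE_ONCE(). */
static void set_counts(struct fake_sbi *sbi, uint32_t groups,
		       uint32_t blockfile)
{
	assert(blockfile <= groups);
	WRITE_ONCE(sbi->s_blockfile_groups, blockfile);
	WRITE_ONCE(sbi->s_groups_count, groups);
}

/* Each field is loaded exactly once; the compiler may not re-read it. */
static uint32_t get_count(struct fake_sbi *sbi, int uses_extents)
{
	return uses_extents ? READ_ONCE(sbi->s_groups_count)
			    : READ_ONCE(sbi->s_blockfile_groups);
}
```

The point of the sketch is the scope of the change: every reader and writer of the two fields has to be converted, which is why it fits better in a separate series.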
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread