public inbox for linux-ext4@vger.kernel.org
* [PATCH 0/2 v2] ext4: use mb_optimize_scan regardless of inode format
@ 2026-01-09 10:53 Jan Kara
  2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
  2026-01-09 10:53 ` [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format Jan Kara
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Kara @ 2026-01-09 10:53 UTC
  To: Ted Tso; +Cc: linux-ext4, Baokun Li, Jan Kara

Hello,

this patch series enables use of mballoc optimizations regardless of the inode
format. See patch 2 for details.

Changes since v1:
* Added a patch to make sure mballoc doesn't select groups with block
  numbers beyond the 32-bit limit for indirect block based inodes
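
To make the limit concrete, here is a minimal userspace sketch (not the
kernel code) of how the number of usable groups differs by inode format.
It assumes the limit is the largest 32-bit block number; all names
(MODEL_MAX_BLOCK_FILE_PHYS, groups_count, blockfile_groups,
allocation_groups_count) are hypothetical stand-ins chosen for this
illustration.

```c
#include <stdint.h>

/* Assumption: largest physical block number a 32-bit pointer can hold. */
#define MODEL_MAX_BLOCK_FILE_PHYS 0xFFFFFFFFULL

/* Total number of block groups, rounding up; blocks_per_group must be > 0. */
static uint32_t groups_count(uint64_t blocks, uint32_t blocks_per_group)
{
	return (uint32_t)((blocks + blocks_per_group - 1) / blocks_per_group);
}

/* Number of leading groups whose blocks all fit in 32 bits. */
static uint32_t blockfile_groups(uint64_t blocks, uint32_t blocks_per_group)
{
	uint32_t total = groups_count(blocks, blocks_per_group);
	uint64_t low = MODEL_MAX_BLOCK_FILE_PHYS / blocks_per_group;

	return low < total ? (uint32_t)low : total;
}

/* Groups an inode may allocate from, depending on its format. */
static uint32_t allocation_groups_count(uint64_t blocks,
					uint32_t blocks_per_group,
					int uses_extents)
{
	if (uses_extents)
		return groups_count(blocks, blocks_per_group);
	/* non-extent files are limited to low blocks/groups */
	return blockfile_groups(blocks, blocks_per_group);
}
```

With 2^33 blocks and 32768 blocks per group, an extent based inode can
use all 262144 groups while an indirect block based inode is limited to
the first 131071.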

Previous versions:
v1: https://lore.kernel.org/all/20260108160907.24892-2-jack@suse.cz/

								Honza


* [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
  2026-01-09 10:53 [PATCH 0/2 v2] ext4: use mb_optimize_scan regardless of inode format Jan Kara
@ 2026-01-09 10:53 ` Jan Kara
  2026-01-10  0:59   ` Baokun Li
                     ` (2 more replies)
  2026-01-09 10:53 ` [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format Jan Kara
  1 sibling, 3 replies; 10+ messages in thread
From: Jan Kara @ 2026-01-09 10:53 UTC
  To: Ted Tso; +Cc: linux-ext4, Baokun Li, Jan Kara

For filesystems with more than 2^32 blocks, inodes using the indirect
block based format cannot use blocks beyond the 32-bit limit.
ext4_mb_scan_groups_linear() takes care not to select these unsupported
groups for such inodes, but the other functions selecting groups for
allocation don't. So far this is harmless because the other selection
functions are used only with mb_optimize_scan, which is currently
disabled for inodes with indirect blocks. However, in the following
patch we want to enable mb_optimize_scan regardless of inode format, so
introduce ext4_get_allocation_groups_count() and use it in every group
selection function.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 56d50fd3310b..f0e07bf11a93 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
 	}
 }
 
+static ext4_group_t ext4_get_allocation_groups_count(
+				struct ext4_allocation_context *ac)
+{
+	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
+
+	/* non-extent files are limited to low blocks/groups */
+	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
+		ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
+
+	return ngroups;
+}
+
 static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
 					struct xarray *xa,
 					ext4_group_t start, ext4_group_t end)
@@ -899,7 +911,7 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
 	struct super_block *sb = ac->ac_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	enum criteria cr = ac->ac_criteria;
-	ext4_group_t ngroups = ext4_get_groups_count(sb);
+	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
 	unsigned long group = start;
 	struct ext4_group_info *grp;
 
@@ -951,7 +963,7 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
 	ext4_group_t start, end;
 
 	start = group;
-	end = ext4_get_groups_count(ac->ac_sb);
+	end = ext4_get_allocation_groups_count(ac);
 wrap_around:
 	for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
 		ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
@@ -1001,7 +1013,7 @@ static int ext4_mb_scan_groups_goal_fast(struct ext4_allocation_context *ac,
 	ext4_group_t start, end;
 
 	start = group;
-	end = ext4_get_groups_count(ac->ac_sb);
+	end = ext4_get_allocation_groups_count(ac);
 wrap_around:
 	i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len);
 	for (; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
@@ -1083,7 +1095,7 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
 		min_order = fls(ac->ac_o_ex.fe_len);
 
 	start = group;
-	end = ext4_get_groups_count(ac->ac_sb);
+	end = ext4_get_allocation_groups_count(ac);
 wrap_around:
 	for (i = order; i >= min_order; i--) {
 		int frag_order;
@@ -1182,11 +1194,7 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
 	int ret = 0;
 	ext4_group_t start;
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
-	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
-
-	/* non-extent files are limited to low blocks/groups */
-	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
-		ngroups = sbi->s_blockfile_groups;
+	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
 
 	/* searching for the right group start from the goal value specified */
 	start = ac->ac_g_ex.fe_group;
-- 
2.51.0



* [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format
  2026-01-09 10:53 [PATCH 0/2 v2] ext4: use mb_optimize_scan regardless of inode format Jan Kara
  2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
@ 2026-01-09 10:53 ` Jan Kara
  2026-01-10  1:00   ` Baokun Li
  2026-01-10  1:38   ` Zhang Yi
  1 sibling, 2 replies; 10+ messages in thread
From: Jan Kara @ 2026-01-09 10:53 UTC
  To: Ted Tso; +Cc: linux-ext4, Baokun Li, Jan Kara

Currently we don't use mballoc optimized scanning (using the max free
extent order and avg free extent order group lists) for inodes with the
indirect block based format. This is confusing for users and I don't see
a good reason for it. Even with the indirect block based inode format we
can spend a large amount of time searching for free blocks on large
filesystems with fragmented free space. To add to the confusion, before
commit 077d0c2c78df ("ext4: make mb_optimize_scan performance mount
option work with extents") optimized scanning was applied *only* to
indirect block based inodes, so that commit appears as a performance
regression to some users. Just use optimized scanning whenever it is
enabled by mount options.
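
As an illustration only, the change in the decision can be modelled in
userspace as below. The struct alloc_ctx_model and both function names
are hypothetical, the enum values are assumed to mirror the kernel's
allocation criteria ordering, and the mount option is reduced to a
single boolean.

```c
#include <stdbool.h>

/* Assumed to mirror the ordering of the kernel's allocation criteria. */
enum criteria_model {
	CR_POWER2_ALIGNED,
	CR_GOAL_LEN_FAST,
	CR_BEST_AVAIL_LEN,
	CR_GOAL_LEN_SLOW,
	CR_ANY_FREE,
};

/* Hypothetical, reduced stand-in for the allocation context. */
struct alloc_ctx_model {
	bool optimize_scan_enabled;	/* mb_optimize_scan mount option */
	enum criteria_model criteria;	/* current allocation criteria */
	bool uses_extents;		/* inode has the extents flag set */
};

/* Behaviour before this patch: indirect block based inodes are excluded. */
static int should_optimize_scan_old(const struct alloc_ctx_model *ac)
{
	if (!ac->optimize_scan_enabled)
		return 0;
	if (ac->criteria >= CR_GOAL_LEN_SLOW)
		return 0;
	if (!ac->uses_extents)
		return 0;
	return 1;
}

/* Behaviour after this patch: the inode format no longer matters. */
static int should_optimize_scan_new(const struct alloc_ctx_model *ac)
{
	if (!ac->optimize_scan_enabled)
		return 0;
	if (ac->criteria >= CR_GOAL_LEN_SLOW)
		return 0;
	return 1;
}
```

The two functions differ only for inodes without the extents flag; for
all other inputs they return the same value.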

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f0e07bf11a93..cd98c472631e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1145,8 +1145,6 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
 		return 0;
 	if (ac->ac_criteria >= CR_GOAL_LEN_SLOW)
 		return 0;
-	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
-		return 0;
 	return 1;
 }
 
-- 
2.51.0



* Re: [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
  2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
@ 2026-01-10  0:59   ` Baokun Li
  2026-01-10  1:36   ` Zhang Yi
  2026-01-13 16:28   ` Pedro Falcato
  2 siblings, 0 replies; 10+ messages in thread
From: Baokun Li @ 2026-01-10  0:59 UTC
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On 2026-01-09 18:53, Jan Kara wrote:
> For filesystems with more than 2^32 blocks inodes using indirect block
> based format cannot use blocks beyond the 32-bit limit.
> ext4_mb_scan_groups_linear() takes care to not select these unsupported
> groups for such inodes however other functions selecting groups for
> allocation don't. So far this is harmless because the other selection
> functions are used only with mb_optimize_scan and this is currently
> disabled for inodes with indirect blocks however in the following patch
> we want to enable mb_optimize_scan regardless of inode format.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks good, thanks for the patch!

Reviewed-by: Baokun Li <libaokun1@huawei.com>

> ---
>  fs/ext4/mballoc.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..f0e07bf11a93 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
>  	}
>  }
>  
> +static ext4_group_t ext4_get_allocation_groups_count(
> +				struct ext4_allocation_context *ac)
> +{
> +	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> +
> +	/* non-extent files are limited to low blocks/groups */
> +	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> +		ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
> +
> +	return ngroups;
> +}
> +
>  static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
>  					struct xarray *xa,
>  					ext4_group_t start, ext4_group_t end)
> @@ -899,7 +911,7 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
>  	struct super_block *sb = ac->ac_sb;
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
>  	enum criteria cr = ac->ac_criteria;
> -	ext4_group_t ngroups = ext4_get_groups_count(sb);
> +	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
>  	unsigned long group = start;
>  	struct ext4_group_info *grp;
>  
> @@ -951,7 +963,7 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
>  	ext4_group_t start, end;
>  
>  	start = group;
> -	end = ext4_get_groups_count(ac->ac_sb);
> +	end = ext4_get_allocation_groups_count(ac);
>  wrap_around:
>  	for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
>  		ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
> @@ -1001,7 +1013,7 @@ static int ext4_mb_scan_groups_goal_fast(struct ext4_allocation_context *ac,
>  	ext4_group_t start, end;
>  
>  	start = group;
> -	end = ext4_get_groups_count(ac->ac_sb);
> +	end = ext4_get_allocation_groups_count(ac);
>  wrap_around:
>  	i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len);
>  	for (; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
> @@ -1083,7 +1095,7 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
>  		min_order = fls(ac->ac_o_ex.fe_len);
>  
>  	start = group;
> -	end = ext4_get_groups_count(ac->ac_sb);
> +	end = ext4_get_allocation_groups_count(ac);
>  wrap_around:
>  	for (i = order; i >= min_order; i--) {
>  		int frag_order;
> @@ -1182,11 +1194,7 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
>  	int ret = 0;
>  	ext4_group_t start;
>  	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
> -	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> -
> -	/* non-extent files are limited to low blocks/groups */
> -	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> -		ngroups = sbi->s_blockfile_groups;
> +	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
>  
>  	/* searching for the right group start from the goal value specified */
>  	start = ac->ac_g_ex.fe_group;




* Re: [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format
  2026-01-09 10:53 ` [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format Jan Kara
@ 2026-01-10  1:00   ` Baokun Li
  2026-01-10  1:38   ` Zhang Yi
  1 sibling, 0 replies; 10+ messages in thread
From: Baokun Li @ 2026-01-10  1:00 UTC
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On 2026-01-09 18:53, Jan Kara wrote:
> Currently we don't use mballoc optimized scanning (using max free
> extent order and avg free extent order group lists) for inodes with
> indirect block based format. This is confusing for users and I don't see
> a good reason for that. Even with indirect block based inode format we
> can spend big amount of time searching for free blocks for large
> filesystems with fragmented free space. To add to the confusion before
> commit 077d0c2c78df ("ext4: make mb_optimize_scan performance mount
> option work with extents") optimized scanning was applied *only* to
> indirect block based inodes so that commit appears as a performance
> regression to some users. Just use optimized scanning whenever it is
> enabled by mount options.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Makes sense. Feel free to add:

Reviewed-by: Baokun Li <libaokun1@huawei.com>

> ---
>  fs/ext4/mballoc.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f0e07bf11a93..cd98c472631e 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1145,8 +1145,6 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
>  		return 0;
>  	if (ac->ac_criteria >= CR_GOAL_LEN_SLOW)
>  		return 0;
> -	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
> -		return 0;
>  	return 1;
>  }
>  




* Re: [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
  2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
  2026-01-10  0:59   ` Baokun Li
@ 2026-01-10  1:36   ` Zhang Yi
  2026-01-13 16:28   ` Pedro Falcato
  2 siblings, 0 replies; 10+ messages in thread
From: Zhang Yi @ 2026-01-10  1:36 UTC
  To: Jan Kara, Ted Tso; +Cc: linux-ext4, Baokun Li

On 1/9/2026 6:53 PM, Jan Kara wrote:
> For filesystems with more than 2^32 blocks inodes using indirect block
> based format cannot use blocks beyond the 32-bit limit.
> ext4_mb_scan_groups_linear() takes care to not select these unsupported
> groups for such inodes however other functions selecting groups for
> allocation don't. So far this is harmless because the other selection
> functions are used only with mb_optimize_scan and this is currently
> disabled for inodes with indirect blocks however in the following patch
> we want to enable mb_optimize_scan regardless of inode format.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>   fs/ext4/mballoc.c | 26 +++++++++++++++++---------
>   1 file changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..f0e07bf11a93 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
>   	}
>   }
>   
> +static ext4_group_t ext4_get_allocation_groups_count(
> +				struct ext4_allocation_context *ac)
> +{
> +	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> +
> +	/* non-extent files are limited to low blocks/groups */
> +	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> +		ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
> +
> +	return ngroups;
> +}
> +
>   static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
>   					struct xarray *xa,
>   					ext4_group_t start, ext4_group_t end)
> @@ -899,7 +911,7 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
>   	struct super_block *sb = ac->ac_sb;
>   	struct ext4_sb_info *sbi = EXT4_SB(sb);
>   	enum criteria cr = ac->ac_criteria;
> -	ext4_group_t ngroups = ext4_get_groups_count(sb);
> +	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
>   	unsigned long group = start;
>   	struct ext4_group_info *grp;
>   
> @@ -951,7 +963,7 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,
>   	ext4_group_t start, end;
>   
>   	start = group;
> -	end = ext4_get_groups_count(ac->ac_sb);
> +	end = ext4_get_allocation_groups_count(ac);
>   wrap_around:
>   	for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
>   		ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
> @@ -1001,7 +1013,7 @@ static int ext4_mb_scan_groups_goal_fast(struct ext4_allocation_context *ac,
>   	ext4_group_t start, end;
>   
>   	start = group;
> -	end = ext4_get_groups_count(ac->ac_sb);
> +	end = ext4_get_allocation_groups_count(ac);
>   wrap_around:
>   	i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len);
>   	for (; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
> @@ -1083,7 +1095,7 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,
>   		min_order = fls(ac->ac_o_ex.fe_len);
>   
>   	start = group;
> -	end = ext4_get_groups_count(ac->ac_sb);
> +	end = ext4_get_allocation_groups_count(ac);
>   wrap_around:
>   	for (i = order; i >= min_order; i--) {
>   		int frag_order;
> @@ -1182,11 +1194,7 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)
>   	int ret = 0;
>   	ext4_group_t start;
>   	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
> -	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> -
> -	/* non-extent files are limited to low blocks/groups */
> -	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> -		ngroups = sbi->s_blockfile_groups;
> +	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
>   
>   	/* searching for the right group start from the goal value specified */
>   	start = ac->ac_g_ex.fe_group;



* Re: [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format
  2026-01-09 10:53 ` [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format Jan Kara
  2026-01-10  1:00   ` Baokun Li
@ 2026-01-10  1:38   ` Zhang Yi
  1 sibling, 0 replies; 10+ messages in thread
From: Zhang Yi @ 2026-01-10  1:38 UTC
  To: Jan Kara, Ted Tso; +Cc: linux-ext4, Baokun Li

On 1/9/2026 6:53 PM, Jan Kara wrote:
> Currently we don't use mballoc optimized scanning (using max free
> extent order and avg free extent order group lists) for inodes with
> indirect block based format. This is confusing for users and I don't see
> a good reason for that. Even with indirect block based inode format we
> can spend big amount of time searching for free blocks for large
> filesystems with fragmented free space. To add to the confusion before
> commit 077d0c2c78df ("ext4: make mb_optimize_scan performance mount
> option work with extents") optimized scanning was applied *only* to
> indirect block based inodes so that commit appears as a performance
> regression to some users. Just use optimized scanning whenever it is
> enabled by mount options.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Makes sense to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>   fs/ext4/mballoc.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f0e07bf11a93..cd98c472631e 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1145,8 +1145,6 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
>   		return 0;
>   	if (ac->ac_criteria >= CR_GOAL_LEN_SLOW)
>   		return 0;
> -	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
> -		return 0;
>   	return 1;
>   }
>   



* Re: [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
  2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
  2026-01-10  0:59   ` Baokun Li
  2026-01-10  1:36   ` Zhang Yi
@ 2026-01-13 16:28   ` Pedro Falcato
  2026-01-14 17:26     ` Jan Kara
  2 siblings, 1 reply; 10+ messages in thread
From: Pedro Falcato @ 2026-01-13 16:28 UTC
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Baokun Li

On Fri, Jan 09, 2026 at 11:53:37AM +0100, Jan Kara wrote:
> For filesystems with more than 2^32 blocks inodes using indirect block
> based format cannot use blocks beyond the 32-bit limit.
> ext4_mb_scan_groups_linear() takes care to not select these unsupported
> groups for such inodes however other functions selecting groups for
> allocation don't. So far this is harmless because the other selection
> functions are used only with mb_optimize_scan and this is currently
> disabled for inodes with indirect blocks however in the following patch
> we want to enable mb_optimize_scan regardless of inode format.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/mballoc.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..f0e07bf11a93 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
>  	}
>  }
>  
> +static ext4_group_t ext4_get_allocation_groups_count(
> +				struct ext4_allocation_context *ac)
> +{
> +	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> +
> +	/* non-extent files are limited to low blocks/groups */
> +	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> +		ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
> +
> +	return ngroups;
> +}

I know you're mostly only moving code around, but I think I see a problem here.
Namely, we (probably?) need an smp_rmb() right after the s_blockfile_groups
read to pair with the barrier in ext4_update_super(). The pre-existing smp_rmb()
in ext4_get_groups_count() after the s_groups_count load perhaps *incidentally*
works here, but it seems to me like we need a new barrier. So fundamentally
something like:

static ext4_group_t ext4_get_allocation_groups_count(
				struct ext4_allocation_context *ac)
{
	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
	ext4_group_t ngroups;

	ngroups = sbi->s_groups_count;
	/* non-extent files are limited to low blocks/groups */
	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
		ngroups = sbi->s_blockfile_groups;
	/* pairs with ext4_group_add() logic */
	smp_rmb();
	return ngroups;
}

and to be even more technically correct, we probably want READ_ONCE()
and WRITE_ONCE() here as well.

Does this make sense?

-- 
Pedro


* Re: [PATCH 1/2] ext4: always allocate blocks only from groups inode can use
  2026-01-13 16:28   ` Pedro Falcato
@ 2026-01-14 17:26     ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2026-01-14 17:26 UTC
  To: Pedro Falcato; +Cc: Jan Kara, Ted Tso, linux-ext4, Baokun Li

On Tue 13-01-26 16:28:07, Pedro Falcato wrote:
> On Fri, Jan 09, 2026 at 11:53:37AM +0100, Jan Kara wrote:
> > For filesystems with more than 2^32 blocks inodes using indirect block
> > based format cannot use blocks beyond the 32-bit limit.
> > ext4_mb_scan_groups_linear() takes care to not select these unsupported
> > groups for such inodes however other functions selecting groups for
> > allocation don't. So far this is harmless because the other selection
> > functions are used only with mb_optimize_scan and this is currently
> > disabled for inodes with indirect blocks however in the following patch
> > we want to enable mb_optimize_scan regardless of inode format.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/ext4/mballoc.c | 26 +++++++++++++++++---------
> >  1 file changed, 17 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 56d50fd3310b..f0e07bf11a93 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -892,6 +892,18 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
> >  	}
> >  }
> >  
> > +static ext4_group_t ext4_get_allocation_groups_count(
> > +				struct ext4_allocation_context *ac)
> > +{
> > +	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
> > +
> > +	/* non-extent files are limited to low blocks/groups */
> > +	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
> > +		ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
> > +
> > +	return ngroups;
> > +}
> 
> I know you're mostly only moving code around, but I think I see a problem here.
> Namely, we (probably?) need an smp_rmb() right after the s_blockfile_groups
> read to pair with the one in ext4_update_super(). The pre-existing smp_rmb()
> in ext4_get_groups_count() after the s_groups_count load perhaps *incidentally*
> works here, but it seems to me like we need a new barrier. So fundamentally
> something like:
> 
> static ext4_group_t ext4_get_allocation_groups_count(...)
> {
> 	struct ext4_sb_info *sb = EXT4_SB(ac->ac_sb);
> 	ext4_group_t ngroups;
> 
> 	ngroups = sb->s_groups_count;
> 	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
> 		ngroups = sb->s_blockfile_groups;
> 	/* pairs with ext4_group_add() logic */
> 	smp_rmb();
> 	return ngroups;
> }
> 
> and to be even more technically correct, we probably want READ_ONCE()
> and WRITE_ONCE() here as well.
> 
> Does this make sense?

I agree with both, although I'd note this isn't strictly related to this
patch as the problem already exists in the code. I think smp_rmb() is
good to add while we are touching the code. As for READ_ONCE /
WRITE_ONCE, that will require modifying all the places touching
s_blockfile_groups / s_groups_count, so I'd leave that for a separate
series as it's going to be more intrusive.
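
For readers following the barrier discussion, a userspace sketch of the
pairing might look as follows. It is not kernel code: C11 fences stand
in for smp_wmb() / smp_rmb() (they are somewhat stronger than the kernel
primitives), relaxed atomic accesses stand in for WRITE_ONCE() /
READ_ONCE(), and fs_model, model_group_add and
model_allocation_groups_count are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_MAX_GROUPS 8

/* Hypothetical stand-in for the superblock info fields discussed above. */
struct fs_model {
	uint32_t group_free[MODEL_MAX_GROUPS];	/* per-group data */
	_Atomic uint32_t groups_count;		/* models s_groups_count */
	_Atomic uint32_t blockfile_groups;	/* models s_blockfile_groups */
};

/* Writer side: set up the new group first, then publish the counts. */
static int model_group_add(struct fs_model *fs, uint32_t free_blocks, int low)
{
	uint32_t n = atomic_load_explicit(&fs->groups_count,
					  memory_order_relaxed);

	if (n >= MODEL_MAX_GROUPS)
		return -1;
	fs->group_free[n] = free_blocks;
	/* models smp_wmb(): group data is visible before the new counts */
	atomic_thread_fence(memory_order_release);
	/* relaxed stores model WRITE_ONCE() */
	atomic_store_explicit(&fs->groups_count, n + 1, memory_order_relaxed);
	if (low)
		atomic_store_explicit(&fs->blockfile_groups, n + 1,
				      memory_order_relaxed);
	return 0;
}

/* Reader side: load the count that applies, then order later reads. */
static uint32_t model_allocation_groups_count(struct fs_model *fs,
					      int uses_extents)
{
	uint32_t n;

	/* relaxed loads model READ_ONCE() */
	if (uses_extents)
		n = atomic_load_explicit(&fs->groups_count,
					 memory_order_relaxed);
	else
		n = atomic_load_explicit(&fs->blockfile_groups,
					 memory_order_relaxed);
	/* models smp_rmb(): per-group reads happen after the count load */
	atomic_thread_fence(memory_order_acquire);
	return n;
}
```

The point of the sketch is that the read barrier follows whichever
counter was actually loaded, so a reader that sees a new count also sees
the data of the groups it covers.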

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format
  2026-01-14 18:28 [PATCH 0/2 v3] ext4: use mb_optimize_scan " Jan Kara
@ 2026-01-14 18:28 ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2026-01-14 18:28 UTC
  To: Ted Tso; +Cc: linux-ext4, Baokun Li, Pedro Falcato, Jan Kara, Zhang Yi

Currently we don't use mballoc optimized scanning (using the max free
extent order and avg free extent order group lists) for inodes with the
indirect block based format. This is confusing for users and I don't see
a good reason for it. Even with the indirect block based inode format we
can spend a large amount of time searching for free blocks on large
filesystems with fragmented free space. To add to the confusion, before
commit 077d0c2c78df ("ext4: make mb_optimize_scan performance mount
option work with extents") optimized scanning was applied *only* to
indirect block based inodes, so that commit appears as a performance
regression to some users. Just use optimized scanning whenever it is
enabled by mount options.

Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a88fbaa4f5f4..bca62cc2be1c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1148,8 +1148,6 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
 		return 0;
 	if (ac->ac_criteria >= CR_GOAL_LEN_SLOW)
 		return 0;
-	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
-		return 0;
 	return 1;
 }
 
-- 
2.51.0



Thread overview: 10+ messages
2026-01-09 10:53 [PATCH 0/2 v2] ext4: use mb_optimize_scan regardless of inode format Jan Kara
2026-01-09 10:53 ` [PATCH 1/2] ext4: always allocate blocks only from groups inode can use Jan Kara
2026-01-10  0:59   ` Baokun Li
2026-01-10  1:36   ` Zhang Yi
2026-01-13 16:28   ` Pedro Falcato
2026-01-14 17:26     ` Jan Kara
2026-01-09 10:53 ` [PATCH 2/2] ext4: use optimized mballoc scanning regardless of inode format Jan Kara
2026-01-10  1:00   ` Baokun Li
2026-01-10  1:38   ` Zhang Yi
  -- strict thread matches above, loose matches on Subject: below --
2026-01-14 18:28 [PATCH 0/2 v3] ext4: use mb_optimize_scan " Jan Kara
2026-01-14 18:28 ` [PATCH 2/2] ext4: use optimized mballoc scanning " Jan Kara
