public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks
@ 2025-11-07 11:58 Yang Erkun
  2025-11-07 11:58 ` [PATCH v3 2/3] ext4: rename EXT4_GET_BLOCKS_PRE_IO Yang Erkun
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Yang Erkun @ 2025-11-07 11:58 UTC (permalink / raw)
  To: linux-ext4, tytso, adilger.kernel, jack
  Cc: yi.zhang, libaokun1, yangerkun, yangerkun

IO path with EXT4_GET_BLOCKS_PRE_IO means dio within i_size or
dioread_nolock buffer writeback, they all means we need a unwritten
extent(or this extent has already been initialized), and the split won't
zero the range we really write. So this check seems useless. Besides,
even if we repeatedly execute ext4_es_insert_extent, there won't
actually be any issues.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
---
 fs/ext4/inode.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e99306a8f47c..e8bac93ca668 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -583,7 +583,6 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
 static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
 				  struct ext4_map_blocks *map, int flags)
 {
-	struct extent_status es;
 	unsigned int status;
 	int err, retval = 0;
 
@@ -644,16 +643,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
 			return err;
 	}
 
-	/*
-	 * If the extent has been zeroed out, we don't need to update
-	 * extent status tree.
-	 */
-	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
-	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
-		if (ext4_es_is_written(&es))
-			return retval;
-	}
-
 	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
 			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
 	ext4_es_insert_extent(inode, map->m_lblk, map->m_len, map->m_pblk,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/3] ext4: rename EXT4_GET_BLOCKS_PRE_IO
  2025-11-07 11:58 [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks Yang Erkun
@ 2025-11-07 11:58 ` Yang Erkun
  2025-11-08  1:49   ` Baokun Li
  2025-11-07 11:58 ` [PATCH v3 3/3] ext4: cleanup for ext4_map_blocks Yang Erkun
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Yang Erkun @ 2025-11-07 11:58 UTC (permalink / raw)
  To: linux-ext4, tytso, adilger.kernel, jack
  Cc: yi.zhang, libaokun1, yangerkun, yangerkun

This flag has been generalized to split an unwritten extent when we do
dio or dioread_nolock writeback, or to avoid merge new extents which was
created by extents split. Update some related comments too.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
---
 fs/ext4/ext4.h              | 21 +++++++++++++++------
 fs/ext4/extents.c           | 16 ++++++++--------
 include/trace/events/ext4.h |  2 +-
 3 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 57087da6c7be..6ee0bc072589 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -694,13 +694,22 @@ enum {
 	/* Caller is from the delayed allocation writeout path
 	 * finally doing the actual allocation of delayed blocks */
 #define EXT4_GET_BLOCKS_DELALLOC_RESERVE	0x0004
-	/* caller is from the direct IO path, request to creation of an
-	unwritten extents if not allocated, split the unwritten
-	extent if blocks has been preallocated already*/
-#define EXT4_GET_BLOCKS_PRE_IO			0x0008
-#define EXT4_GET_BLOCKS_CONVERT			0x0010
-#define EXT4_GET_BLOCKS_IO_CREATE_EXT		(EXT4_GET_BLOCKS_PRE_IO|\
+	/*
+	 * This means that we cannot merge newly allocated extents, and if we
+	 * found an unwritten extent, we need to split it.
+	 */
+#define EXT4_GET_BLOCKS_SPLIT_NOMERGE		0x0008
+	/*
+	 * Caller is from the dio or dioread_nolock buffered IO, reqest to
+	 * create an unwritten extent if it does not exist or split the
+	 * found unwritten extent. Also do not merge the newly created
+	 * unwritten extent, io end will convert unwritten to written,
+	 * and try to merge the written extent.
+	 */
+#define EXT4_GET_BLOCKS_IO_CREATE_EXT		(EXT4_GET_BLOCKS_SPLIT_NOMERGE|\
 					 EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT)
+	/* Convert unwritten extent to initialized. */
+#define EXT4_GET_BLOCKS_CONVERT			0x0010
 	/* Eventual metadata allocation (due to growing extent tree)
 	 * should not fail, so try to use reserved blocks for that.*/
 #define EXT4_GET_BLOCKS_METADATA_NOFAIL		0x0020
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ca5499e9412b..241b5f5d29ad 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -333,7 +333,7 @@ ext4_force_split_extent_at(handle_t *handle, struct inode *inode,
 			   int nofail)
 {
 	int unwritten = ext4_ext_is_unwritten(path[path->p_depth].p_ext);
-	int flags = EXT4_EX_NOCACHE | EXT4_GET_BLOCKS_PRE_IO;
+	int flags = EXT4_EX_NOCACHE | EXT4_GET_BLOCKS_SPLIT_NOMERGE;
 
 	if (nofail)
 		flags |= EXT4_GET_BLOCKS_METADATA_NOFAIL | EXT4_EX_NOFAIL;
@@ -2002,7 +2002,7 @@ ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
 	}
 
 	/* try to insert block into found extent and return */
-	if (ex && !(gb_flags & EXT4_GET_BLOCKS_PRE_IO)) {
+	if (ex && !(gb_flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE)) {
 
 		/*
 		 * Try to see whether we should rather test the extent on
@@ -2181,7 +2181,7 @@ ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
 
 merge:
 	/* try to merge extents */
-	if (!(gb_flags & EXT4_GET_BLOCKS_PRE_IO))
+	if (!(gb_flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE))
 		ext4_ext_try_to_merge(handle, inode, path, nearex);
 
 	/* time to correct all indexes above */
@@ -3224,7 +3224,7 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
 		else
 			ext4_ext_mark_initialized(ex);
 
-		if (!(flags & EXT4_GET_BLOCKS_PRE_IO))
+		if (!(flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE))
 			ext4_ext_try_to_merge(handle, inode, path, ex);
 
 		err = ext4_ext_dirty(handle, inode, path + path->p_depth);
@@ -3368,7 +3368,7 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
 
 	if (map->m_lblk + map->m_len < ee_block + ee_len) {
 		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
-		flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
+		flags1 = flags | EXT4_GET_BLOCKS_SPLIT_NOMERGE;
 		if (unwritten)
 			split_flag1 |= EXT4_EXT_MARK_UNWRIT1 |
 				       EXT4_EXT_MARK_UNWRIT2;
@@ -3739,7 +3739,7 @@ static struct ext4_ext_path *ext4_split_convert_extents(handle_t *handle,
 			      EXT4_EXT_MAY_ZEROOUT : 0;
 		split_flag |= (EXT4_EXT_MARK_UNWRIT2 | EXT4_EXT_DATA_VALID2);
 	}
-	flags |= EXT4_GET_BLOCKS_PRE_IO;
+	flags |= EXT4_GET_BLOCKS_SPLIT_NOMERGE;
 	return ext4_split_extent(handle, inode, path, map, split_flag, flags,
 				 allocated);
 }
@@ -3911,7 +3911,7 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
 						*allocated, newblock);
 
 	/* get_block() before submitting IO, split the extent */
-	if (flags & EXT4_GET_BLOCKS_PRE_IO) {
+	if (flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE) {
 		path = ext4_split_convert_extents(handle, inode, map, path,
 				flags | EXT4_GET_BLOCKS_CONVERT, allocated);
 		if (IS_ERR(path))
@@ -5618,7 +5618,7 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 			path = ext4_split_extent_at(handle, inode, path,
 					start_lblk, split_flag,
 					EXT4_EX_NOCACHE |
-					EXT4_GET_BLOCKS_PRE_IO |
+					EXT4_GET_BLOCKS_SPLIT_NOMERGE |
 					EXT4_GET_BLOCKS_METADATA_NOFAIL);
 		}
 
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index a374e7ea7e57..ada2b9223df5 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -39,7 +39,7 @@ struct partial_cluster;
 	{ EXT4_GET_BLOCKS_CREATE,		"CREATE" },		\
 	{ EXT4_GET_BLOCKS_UNWRIT_EXT,		"UNWRIT" },		\
 	{ EXT4_GET_BLOCKS_DELALLOC_RESERVE,	"DELALLOC" },		\
-	{ EXT4_GET_BLOCKS_PRE_IO,		"PRE_IO" },		\
+	{ EXT4_GET_BLOCKS_SPLIT_NOMERGE,	"SPLIT_NOMERGE" },	\
 	{ EXT4_GET_BLOCKS_CONVERT,		"CONVERT" },		\
 	{ EXT4_GET_BLOCKS_METADATA_NOFAIL,	"METADATA_NOFAIL" },	\
 	{ EXT4_GET_BLOCKS_NO_NORMALIZE,		"NO_NORMALIZE" },	\
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 3/3] ext4: cleanup for ext4_map_blocks
  2025-11-07 11:58 [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks Yang Erkun
  2025-11-07 11:58 ` [PATCH v3 2/3] ext4: rename EXT4_GET_BLOCKS_PRE_IO Yang Erkun
@ 2025-11-07 11:58 ` Yang Erkun
  2025-11-08  1:54   ` Baokun Li
  2025-11-08  1:46 ` [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks Baokun Li
  2025-11-12  4:46 ` Zhang Yi
  3 siblings, 1 reply; 9+ messages in thread
From: Yang Erkun @ 2025-11-07 11:58 UTC (permalink / raw)
  To: linux-ext4, tytso, adilger.kernel, jack
  Cc: yi.zhang, libaokun1, yangerkun, yangerkun

Retval from ext4_map_create_blocks means we really create some blocks,
cannot happened with m_flags without EXT4_MAP_UNWRITTEN and
EXT4_MAP_MAPPED.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
---
 fs/ext4/inode.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e8bac93ca668..3d8ada26d5cd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -799,7 +799,13 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	down_write(&EXT4_I(inode)->i_data_sem);
 	retval = ext4_map_create_blocks(handle, inode, map, flags);
 	up_write((&EXT4_I(inode)->i_data_sem));
-	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
+
+	if (retval < 0)
+		ext_debug(inode, "failed with err %d\n", retval);
+	if (retval <= 0)
+		return retval;
+
+	if (map->m_flags & EXT4_MAP_MAPPED) {
 		ret = check_block_validity(inode, map);
 		if (ret != 0)
 			return ret;
@@ -828,12 +834,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 				return ret;
 		}
 	}
-	if (retval > 0 && (map->m_flags & EXT4_MAP_UNWRITTEN ||
-				map->m_flags & EXT4_MAP_MAPPED))
-		ext4_fc_track_range(handle, inode, map->m_lblk,
-					map->m_lblk + map->m_len - 1);
-	if (retval < 0)
-		ext_debug(inode, "failed with err %d\n", retval);
+	ext4_fc_track_range(handle, inode, map->m_lblk, map->m_lblk +
+			    map->m_len - 1);
 	return retval;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks
  2025-11-07 11:58 [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks Yang Erkun
  2025-11-07 11:58 ` [PATCH v3 2/3] ext4: rename EXT4_GET_BLOCKS_PRE_IO Yang Erkun
  2025-11-07 11:58 ` [PATCH v3 3/3] ext4: cleanup for ext4_map_blocks Yang Erkun
@ 2025-11-08  1:46 ` Baokun Li
  2025-11-12  4:46 ` Zhang Yi
  3 siblings, 0 replies; 9+ messages in thread
From: Baokun Li @ 2025-11-08  1:46 UTC (permalink / raw)
  To: Yang Erkun
  Cc: linux-ext4, yi.zhang, yangerkun, tytso, adilger.kernel, jack,
	Baokun Li

On 2025-11-07 19:58, Yang Erkun wrote:
> IO path with EXT4_GET_BLOCKS_PRE_IO means dio within i_size or
> dioread_nolock buffer writeback, they all means we need a unwritten
> extent(or this extent has already been initialized), and the split won't
> zero the range we really write. So this check seems useless. Besides,
> even if we repeatedly execute ext4_es_insert_extent, there won't
> actually be any issues.
>
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Yang Erkun <yangerkun@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Baokun Li <libaokun1@huawei.com>

> ---
>  fs/ext4/inode.c | 11 -----------
>  1 file changed, 11 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e99306a8f47c..e8bac93ca668 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -583,7 +583,6 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
>  static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>  				  struct ext4_map_blocks *map, int flags)
>  {
> -	struct extent_status es;
>  	unsigned int status;
>  	int err, retval = 0;
>  
> @@ -644,16 +643,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>  			return err;
>  	}
>  
> -	/*
> -	 * If the extent has been zeroed out, we don't need to update
> -	 * extent status tree.
> -	 */
> -	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
> -	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> -		if (ext4_es_is_written(&es))
> -			return retval;
> -	}
> -
>  	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>  			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>  	ext4_es_insert_extent(inode, map->m_lblk, map->m_len, map->m_pblk,



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/3] ext4: rename EXT4_GET_BLOCKS_PRE_IO
  2025-11-07 11:58 ` [PATCH v3 2/3] ext4: rename EXT4_GET_BLOCKS_PRE_IO Yang Erkun
@ 2025-11-08  1:49   ` Baokun Li
  0 siblings, 0 replies; 9+ messages in thread
From: Baokun Li @ 2025-11-08  1:49 UTC (permalink / raw)
  To: Yang Erkun
  Cc: yi.zhang, yangerkun, linux-ext4, tytso, adilger.kernel, jack,
	Baokun Li

On 2025-11-07 19:58, Yang Erkun wrote:
> This flag has been generalized to split an unwritten extent when we do
> dio or dioread_nolock writeback, or to avoid merge new extents which was
> created by extents split. Update some related comments too.
>
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Yang Erkun <yangerkun@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Baokun Li <libaokun1@huawei.com>

> ---
>  fs/ext4/ext4.h              | 21 +++++++++++++++------
>  fs/ext4/extents.c           | 16 ++++++++--------
>  include/trace/events/ext4.h |  2 +-
>  3 files changed, 24 insertions(+), 15 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 57087da6c7be..6ee0bc072589 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -694,13 +694,22 @@ enum {
>  	/* Caller is from the delayed allocation writeout path
>  	 * finally doing the actual allocation of delayed blocks */
>  #define EXT4_GET_BLOCKS_DELALLOC_RESERVE	0x0004
> -	/* caller is from the direct IO path, request to creation of an
> -	unwritten extents if not allocated, split the unwritten
> -	extent if blocks has been preallocated already*/
> -#define EXT4_GET_BLOCKS_PRE_IO			0x0008
> -#define EXT4_GET_BLOCKS_CONVERT			0x0010
> -#define EXT4_GET_BLOCKS_IO_CREATE_EXT		(EXT4_GET_BLOCKS_PRE_IO|\
> +	/*
> +	 * This means that we cannot merge newly allocated extents, and if we
> +	 * found an unwritten extent, we need to split it.
> +	 */
> +#define EXT4_GET_BLOCKS_SPLIT_NOMERGE		0x0008
> +	/*
> +	 * Caller is from the dio or dioread_nolock buffered IO, reqest to
> +	 * create an unwritten extent if it does not exist or split the
> +	 * found unwritten extent. Also do not merge the newly created
> +	 * unwritten extent, io end will convert unwritten to written,
> +	 * and try to merge the written extent.
> +	 */
> +#define EXT4_GET_BLOCKS_IO_CREATE_EXT		(EXT4_GET_BLOCKS_SPLIT_NOMERGE|\
>  					 EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT)
> +	/* Convert unwritten extent to initialized. */
> +#define EXT4_GET_BLOCKS_CONVERT			0x0010
>  	/* Eventual metadata allocation (due to growing extent tree)
>  	 * should not fail, so try to use reserved blocks for that.*/
>  #define EXT4_GET_BLOCKS_METADATA_NOFAIL		0x0020
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ca5499e9412b..241b5f5d29ad 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -333,7 +333,7 @@ ext4_force_split_extent_at(handle_t *handle, struct inode *inode,
>  			   int nofail)
>  {
>  	int unwritten = ext4_ext_is_unwritten(path[path->p_depth].p_ext);
> -	int flags = EXT4_EX_NOCACHE | EXT4_GET_BLOCKS_PRE_IO;
> +	int flags = EXT4_EX_NOCACHE | EXT4_GET_BLOCKS_SPLIT_NOMERGE;
>  
>  	if (nofail)
>  		flags |= EXT4_GET_BLOCKS_METADATA_NOFAIL | EXT4_EX_NOFAIL;
> @@ -2002,7 +2002,7 @@ ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
>  	}
>  
>  	/* try to insert block into found extent and return */
> -	if (ex && !(gb_flags & EXT4_GET_BLOCKS_PRE_IO)) {
> +	if (ex && !(gb_flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE)) {
>  
>  		/*
>  		 * Try to see whether we should rather test the extent on
> @@ -2181,7 +2181,7 @@ ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
>  
>  merge:
>  	/* try to merge extents */
> -	if (!(gb_flags & EXT4_GET_BLOCKS_PRE_IO))
> +	if (!(gb_flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE))
>  		ext4_ext_try_to_merge(handle, inode, path, nearex);
>  
>  	/* time to correct all indexes above */
> @@ -3224,7 +3224,7 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
>  		else
>  			ext4_ext_mark_initialized(ex);
>  
> -		if (!(flags & EXT4_GET_BLOCKS_PRE_IO))
> +		if (!(flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE))
>  			ext4_ext_try_to_merge(handle, inode, path, ex);
>  
>  		err = ext4_ext_dirty(handle, inode, path + path->p_depth);
> @@ -3368,7 +3368,7 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
>  
>  	if (map->m_lblk + map->m_len < ee_block + ee_len) {
>  		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
> -		flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
> +		flags1 = flags | EXT4_GET_BLOCKS_SPLIT_NOMERGE;
>  		if (unwritten)
>  			split_flag1 |= EXT4_EXT_MARK_UNWRIT1 |
>  				       EXT4_EXT_MARK_UNWRIT2;
> @@ -3739,7 +3739,7 @@ static struct ext4_ext_path *ext4_split_convert_extents(handle_t *handle,
>  			      EXT4_EXT_MAY_ZEROOUT : 0;
>  		split_flag |= (EXT4_EXT_MARK_UNWRIT2 | EXT4_EXT_DATA_VALID2);
>  	}
> -	flags |= EXT4_GET_BLOCKS_PRE_IO;
> +	flags |= EXT4_GET_BLOCKS_SPLIT_NOMERGE;
>  	return ext4_split_extent(handle, inode, path, map, split_flag, flags,
>  				 allocated);
>  }
> @@ -3911,7 +3911,7 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
>  						*allocated, newblock);
>  
>  	/* get_block() before submitting IO, split the extent */
> -	if (flags & EXT4_GET_BLOCKS_PRE_IO) {
> +	if (flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE) {
>  		path = ext4_split_convert_extents(handle, inode, map, path,
>  				flags | EXT4_GET_BLOCKS_CONVERT, allocated);
>  		if (IS_ERR(path))
> @@ -5618,7 +5618,7 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
>  			path = ext4_split_extent_at(handle, inode, path,
>  					start_lblk, split_flag,
>  					EXT4_EX_NOCACHE |
> -					EXT4_GET_BLOCKS_PRE_IO |
> +					EXT4_GET_BLOCKS_SPLIT_NOMERGE |
>  					EXT4_GET_BLOCKS_METADATA_NOFAIL);
>  		}
>  
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index a374e7ea7e57..ada2b9223df5 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -39,7 +39,7 @@ struct partial_cluster;
>  	{ EXT4_GET_BLOCKS_CREATE,		"CREATE" },		\
>  	{ EXT4_GET_BLOCKS_UNWRIT_EXT,		"UNWRIT" },		\
>  	{ EXT4_GET_BLOCKS_DELALLOC_RESERVE,	"DELALLOC" },		\
> -	{ EXT4_GET_BLOCKS_PRE_IO,		"PRE_IO" },		\
> +	{ EXT4_GET_BLOCKS_SPLIT_NOMERGE,	"SPLIT_NOMERGE" },	\
>  	{ EXT4_GET_BLOCKS_CONVERT,		"CONVERT" },		\
>  	{ EXT4_GET_BLOCKS_METADATA_NOFAIL,	"METADATA_NOFAIL" },	\
>  	{ EXT4_GET_BLOCKS_NO_NORMALIZE,		"NO_NORMALIZE" },	\



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 3/3] ext4: cleanup for ext4_map_blocks
  2025-11-07 11:58 ` [PATCH v3 3/3] ext4: cleanup for ext4_map_blocks Yang Erkun
@ 2025-11-08  1:54   ` Baokun Li
  0 siblings, 0 replies; 9+ messages in thread
From: Baokun Li @ 2025-11-08  1:54 UTC (permalink / raw)
  To: Yang Erkun
  Cc: yi.zhang, yangerkun, linux-ext4, tytso, adilger.kernel, jack,
	Baokun Li

On 2025-11-07 19:58, Yang Erkun wrote:
> Retval from ext4_map_create_blocks means we really create some blocks,
> cannot happened with m_flags without EXT4_MAP_UNWRITTEN and
> EXT4_MAP_MAPPED.
>
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Yang Erkun <yangerkun@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Baokun Li <libaokun1@huawei.com>

> ---
>  fs/ext4/inode.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e8bac93ca668..3d8ada26d5cd 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -799,7 +799,13 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  	down_write(&EXT4_I(inode)->i_data_sem);
>  	retval = ext4_map_create_blocks(handle, inode, map, flags);
>  	up_write((&EXT4_I(inode)->i_data_sem));
> -	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
> +
> +	if (retval < 0)
> +		ext_debug(inode, "failed with err %d\n", retval);
> +	if (retval <= 0)
> +		return retval;
> +
> +	if (map->m_flags & EXT4_MAP_MAPPED) {
>  		ret = check_block_validity(inode, map);
>  		if (ret != 0)
>  			return ret;
> @@ -828,12 +834,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  				return ret;
>  		}
>  	}
> -	if (retval > 0 && (map->m_flags & EXT4_MAP_UNWRITTEN ||
> -				map->m_flags & EXT4_MAP_MAPPED))
> -		ext4_fc_track_range(handle, inode, map->m_lblk,
> -					map->m_lblk + map->m_len - 1);
> -	if (retval < 0)
> -		ext_debug(inode, "failed with err %d\n", retval);
> +	ext4_fc_track_range(handle, inode, map->m_lblk, map->m_lblk +
> +			    map->m_len - 1);
>  	return retval;
>  }
>  



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks
  2025-11-07 11:58 [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks Yang Erkun
                   ` (2 preceding siblings ...)
  2025-11-08  1:46 ` [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks Baokun Li
@ 2025-11-12  4:46 ` Zhang Yi
  2025-11-12  8:30   ` yangerkun
  2025-11-13 11:27   ` Jan Kara
  3 siblings, 2 replies; 9+ messages in thread
From: Zhang Yi @ 2025-11-12  4:46 UTC (permalink / raw)
  To: Yang Erkun, linux-ext4, tytso, adilger.kernel, jack; +Cc: libaokun1, yangerkun

Hi!

On 11/7/2025 7:58 PM, Yang Erkun wrote:
> IO path with EXT4_GET_BLOCKS_PRE_IO means dio within i_size or
> dioread_nolock buffer writeback, they all means we need a unwritten
> extent(or this extent has already been initialized), and the split won't
> zero the range we really write. So this check seems useless. Besides,
> even if we repeatedly execute ext4_es_insert_extent, there won't
> actually be any issues.
> 
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Yang Erkun <yangerkun@huawei.com>
> ---
>  fs/ext4/inode.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e99306a8f47c..e8bac93ca668 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -583,7 +583,6 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
>  static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>  				  struct ext4_map_blocks *map, int flags)
>  {
> -	struct extent_status es;
>  	unsigned int status;
>  	int err, retval = 0;
>  
> @@ -644,16 +643,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>  			return err;
>  	}
>  
> -	/*
> -	 * If the extent has been zeroed out, we don't need to update
> -	 * extent status tree.
> -	 */
> -	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
> -	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> -		if (ext4_es_is_written(&es))
> -			return retval;
> -	}
> -

Sorry, I think I was wrong and I now realize that we need to keep this code
snippet. Because ext4_split_extent() may convert the on-disk extent to written
with the EXT4_EXT_MAY_ZEROOUT flag set. If we drop this check, it will add an
unwritten extent into the extent status tree, which is inconsistent with the
real one.

Although this might not seem like a practical issue now, it's a potential
problem and conflicts with the ext4_es_cache_extent() extension I am currently
developing[1], which triggers a mismatch alarm when caching extents.

Besides, I also notice there is a potential stale data issue about the
EXT4_EXT_MAY_ZEROOUT flag.

Assume we have an unwritten file, and then DIO writes the second half.

   [UUUUUUUUUUUUUUUU] on-disk extent
   [UUUUUUUUUUUUUUUU] extent status tree
            |<----->| dio write

1. ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO,
   EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set.
2. ext4_map_blocks() find this extent and call ext4_split_convert_extents()
   with EXT4_GET_BLOCKS_CONVERT and the above flags set.
3. call ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and
   EXT4_EXT_DATA_VALID2 flags set.
4. call ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2,
   EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags
   set.
5. We failed to insert extent since -NOSPC in ext4_split_extent_at().
6. ext4_split_extent_at() zero out the first half but convert the entire on-disk
   extent to written since the EXT4_EXT_DATA_VALID2 flag set, and left the second
   half as unwritten in the extent status tree.

   [0000000000SSSSSS]  data
   [WWWWWWWWWWWWWWWW]  on-disk extent
   [WWWWWWWWWWUUUUUU]  extent status tree

7. If the dio failed to write data to the disk, If DIO fails to write data, the
   stale data in the second half will be exposed.

Therefore, I think we should zero out the entire extent range to zero for this
case, and also mark the extent as written in the extent status tree (that is the
another reason I think we should keep this code snippet). I was still confirming
this issue and thinking about how to fix it, but I believe it would be better
not to merge this patch for now. What do you think?

[1]. https://lore.kernel.org/linux-ext4/20251031062905.4135909-1-yi.zhang@huaweicloud.com/

Thanks,
Yi.

>  	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>  			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>  	ext4_es_insert_extent(inode, map->m_lblk, map->m_len, map->m_pblk,


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks
  2025-11-12  4:46 ` Zhang Yi
@ 2025-11-12  8:30   ` yangerkun
  2025-11-13 11:27   ` Jan Kara
  1 sibling, 0 replies; 9+ messages in thread
From: yangerkun @ 2025-11-12  8:30 UTC (permalink / raw)
  To: Zhang Yi, Yang Erkun, linux-ext4, tytso, adilger.kernel, jack; +Cc: libaokun1



在 2025/11/12 12:46, Zhang Yi 写道:
> Hi!
> 
> On 11/7/2025 7:58 PM, Yang Erkun wrote:
>> IO path with EXT4_GET_BLOCKS_PRE_IO means dio within i_size or
>> dioread_nolock buffer writeback, they all means we need a unwritten
>> extent(or this extent has already been initialized), and the split won't
>> zero the range we really write. So this check seems useless. Besides,
>> even if we repeatedly execute ext4_es_insert_extent, there won't
>> actually be any issues.
>>
>> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
>> Reviewed-by: Jan Kara <jack@suse.cz>
>> Signed-off-by: Yang Erkun <yangerkun@huawei.com>
>> ---
>>   fs/ext4/inode.c | 11 -----------
>>   1 file changed, 11 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index e99306a8f47c..e8bac93ca668 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -583,7 +583,6 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
>>   static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>>   				  struct ext4_map_blocks *map, int flags)
>>   {
>> -	struct extent_status es;
>>   	unsigned int status;
>>   	int err, retval = 0;
>>   
>> @@ -644,16 +643,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>>   			return err;
>>   	}
>>   
>> -	/*
>> -	 * If the extent has been zeroed out, we don't need to update
>> -	 * extent status tree.
>> -	 */
>> -	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
>> -	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
>> -		if (ext4_es_is_written(&es))
>> -			return retval;
>> -	}
>> -
> 
> Sorry, I think I was wrong and I now realize that we need to keep this code
> snippet. Because ext4_split_extent() may convert the on-disk extent to written
> with the EXT4_EXT_MAY_ZEROOUT flag set. If we drop this check, it will add an
> unwritten extent into the extent status tree, which is inconsistent with the
> real one.
> 
> Although this might not seem like a practical issue now, it's a potential
> problem and conflicts with the ext4_es_cache_extent() extension I am currently
> developing[1], which triggers a mismatch alarm when caching extents.
> 
> Besides, I also notice there is a potential stale data issue about the
> EXT4_EXT_MAY_ZEROOUT flag.
> 
> Assume we have an unwritten file, and then DIO writes the second half.
> 
>     [UUUUUUUUUUUUUUUU] on-disk extent
>     [UUUUUUUUUUUUUUUU] extent status tree
>              |<----->| dio write
> 
> 1. ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO,
>     EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set.
> 2. ext4_map_blocks() find this extent and call ext4_split_convert_extents()
>     with EXT4_GET_BLOCKS_CONVERT and the above flags set.
> 3. call ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and
>     EXT4_EXT_DATA_VALID2 flags set.
> 4. call ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2,
>     EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags
>     set.
> 5. We failed to insert extent since -NOSPC in ext4_split_extent_at().
> 6. ext4_split_extent_at() zero out the first half but convert the entire on-disk
>     extent to written since the EXT4_EXT_DATA_VALID2 flag set, and left the second
>     half as unwritten in the extent status tree.
> 
>     [0000000000SSSSSS]  data
>     [WWWWWWWWWWWWWWWW]  on-disk extent
>     [WWWWWWWWWWUUUUUU]  extent status tree
> 
> 7. If the dio failed to write data to the disk, If DIO fails to write data, the
>     stale data in the second half will be exposed.
> 
> Therefore, I think we should zero out the entire extent range to zero for this
> case, and also mark the extent as written in the extent status tree (that is the
> another reason I think we should keep this code snippet). I was still confirming
> this issue and thinking about how to fix it, but I believe it would be better
> not to merge this patch for now. What do you think?


Hi,

Thank you very much for your detailed analysis!

Yes, ext4_split_extent with EXT4_EXT_DATA_VALID2 indicates that the
mapped range has been written and contains valid data. It seems correct
that ext4_convert_unwritten_extents_endio calls
ext4_split_convert_extents with EXT4_GET_BLOCKS_CONVERT. However, for
the I/O submit context, which calls ext4_split_convert_extents with
EXT4_GET_BLOCKS_CONVERT from ext4_ext_handle_unwritten_extents, the
mapped range is still an unwritten extent and does not contain valid
data. This could lead to the stale data issue you described. Therefore,
the root cause is that we cannot use EXT4_GET_BLOCKS_CONVERT in the I/O
submit context, but it appears difficult to prevent this.

Additionally, I agree with you that merging this patch should be
avoided. Thanks again for your thorough review!

Thanks,
Erkun.

> 
> [1]. https://lore.kernel.org/linux-ext4/20251031062905.4135909-1-yi.zhang@huaweicloud.com/
> 
> Thanks,
> Yi.
> 
>>   	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>>   			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>>   	ext4_es_insert_extent(inode, map->m_lblk, map->m_len, map->m_pblk,


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks
  2025-11-12  4:46 ` Zhang Yi
  2025-11-12  8:30   ` yangerkun
@ 2025-11-13 11:27   ` Jan Kara
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Kara @ 2025-11-13 11:27 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Yang Erkun, linux-ext4, tytso, adilger.kernel, jack, libaokun1,
	yangerkun

Hi!

On Wed 12-11-25 12:46:26, Zhang Yi wrote:
> On 11/7/2025 7:58 PM, Yang Erkun wrote:
> > IO path with EXT4_GET_BLOCKS_PRE_IO means dio within i_size or
> > dioread_nolock buffer writeback, they all means we need a unwritten
> > extent(or this extent has already been initialized), and the split won't
> > zero the range we really write. So this check seems useless. Besides,
> > even if we repeatedly execute ext4_es_insert_extent, there won't
> > actually be any issues.
> > 
> > Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Yang Erkun <yangerkun@huawei.com>
> > ---
> >  fs/ext4/inode.c | 11 -----------
> >  1 file changed, 11 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index e99306a8f47c..e8bac93ca668 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -583,7 +583,6 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
> >  static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
> >  				  struct ext4_map_blocks *map, int flags)
> >  {
> > -	struct extent_status es;
> >  	unsigned int status;
> >  	int err, retval = 0;
> >  
> > @@ -644,16 +643,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
> >  			return err;
> >  	}
> >  
> > -	/*
> > -	 * If the extent has been zeroed out, we don't need to update
> > -	 * extent status tree.
> > -	 */
> > -	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
> > -	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> > -		if (ext4_es_is_written(&es))
> > -			return retval;
> > -	}
> > -
> 
> Sorry, I think I was wrong and I now realize that we need to keep this code
> snippet. Because ext4_split_extent() may convert the on-disk extent to written
> with the EXT4_EXT_MAY_ZEROOUT flag set. If we drop this check, it will add an
> unwritten extent into the extent status tree, which is inconsistent with the
> real one.
> 
> Although this might not seem like a practical issue now, it's a potential
> problem and conflicts with the ext4_es_cache_extent() extension I am currently
> developing[1], which triggers a mismatch alarm when caching extents.
> 
> Besides, I also notice there is a potential stale data issue about the
> EXT4_EXT_MAY_ZEROOUT flag.
> 
> Assume we have an unwritten file, and then DIO writes the second half.
> 
>    [UUUUUUUUUUUUUUUU] on-disk extent
>    [UUUUUUUUUUUUUUUU] extent status tree
>             |<----->| dio write
> 
> 1. ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO,
>    EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set.
> 2. ext4_map_blocks() find this extent and call ext4_split_convert_extents()
>    with EXT4_GET_BLOCKS_CONVERT and the above flags set.
> 3. call ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and
>    EXT4_EXT_DATA_VALID2 flags set.
> 4. call ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2,
>    EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags
>    set.
> 5. We failed to insert extent since -NOSPC in ext4_split_extent_at().
> 6. ext4_split_extent_at() zero out the first half but convert the entire on-disk
>    extent to written since the EXT4_EXT_DATA_VALID2 flag set, and left the second
>    half as unwritten in the extent status tree.
> 
>    [0000000000SSSSSS]  data
>    [WWWWWWWWWWWWWWWW]  on-disk extent
>    [WWWWWWWWWWUUUUUU]  extent status tree
> 
> 7. If the dio failed to write data to the disk, If DIO fails to write data, the
>    stale data in the second half will be exposed.

Right. That looks like a bug.

> Therefore, I think we should zero out the entire extent range to zero for this
> case, and also mark the extent as written in the extent status tree (that is the
> another reason I think we should keep this code snippet).

I agree this is probably the easiest fix.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-11-13 11:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-07 11:58 [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks Yang Erkun
2025-11-07 11:58 ` [PATCH v3 2/3] ext4: rename EXT4_GET_BLOCKS_PRE_IO Yang Erkun
2025-11-08  1:49   ` Baokun Li
2025-11-07 11:58 ` [PATCH v3 3/3] ext4: cleanup for ext4_map_blocks Yang Erkun
2025-11-08  1:54   ` Baokun Li
2025-11-08  1:46 ` [PATCH v3 1/3] ext4: remove useless code in ext4_map_create_blocks Baokun Li
2025-11-12  4:46 ` Zhang Yi
2025-11-12  8:30   ` yangerkun
2025-11-13 11:27   ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox