From: Zhang Yi <yi.zhang@huaweicloud.com>
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com
Subject: [PATCH v2 04/13] ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O
Date: Fri, 21 Nov 2025 14:08:02 +0800 [thread overview]
Message-ID: <20251121060811.1685783-5-yi.zhang@huaweicloud.com> (raw)
In-Reply-To: <20251121060811.1685783-1-yi.zhang@huaweicloud.com>
From: Zhang Yi <yi.zhang@huawei.com>
When allocating blocks during within-EOF DIO and writeback with
dioread_nolock enabled, EXT4_GET_BLOCKS_PRE_IO was set to split an
existing large unwritten extent. However, EXT4_GET_BLOCKS_CONVERT was
set when calling ext4_split_convert_extents(), which may potentially
result in stale data issues.
Assume we have an unwritten extent, and then DIO writes the second half.
[UUUUUUUUUUUUUUUU] on-disk extent U: unwritten extent
[UUUUUUUUUUUUUUUU] extent status tree
|<- ->| ----> dio write this range
First, ext4_iomap_alloc() call ext4_map_blocks() with
EXT4_GET_BLOCKS_PRE_IO, EXT4_GET_BLOCKS_UNWRIT_EXT and
EXT4_GET_BLOCKS_CREATE flags set. ext4_map_blocks() find this extent and
call ext4_split_convert_extents() with EXT4_GET_BLOCKS_CONVERT and the
above flags set.
Then, ext4_split_convert_extents() calls ext4_split_extent() with
EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and EXT4_EXT_DATA_VALID2
flags set, and it calls ext4_split_extent_at() to split the second half
with EXT4_EXT_DATA_VALID2, EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT
and EXT4_EXT_MARK_UNWRIT2 flags set. However, ext4_split_extent_at()
failed to insert extent since a temporary lack -ENOSPC. It zeroes out
the first half but convert the entire on-disk extent to written since
the EXT4_EXT_DATA_VALID2 flag set, but left the second half as unwritten
in the extent status tree.
[0000000000SSSSSS] data S: stale data, 0: zeroed
[WWWWWWWWWWWWWWWW] on-disk extent W: written extent
[WWWWWWWWWWUUUUUU] extent status tree
Finally, if the DIO failed to write data to the disk, the stale data in
the second half will be exposed once the cached extent entry is gone.
Fix this issue by not passing EXT4_GET_BLOCKS_CONVERT when splitting
an unwritten extent before submitting I/O, and make
ext4_split_convert_extents() to zero out the entire extent range
to zero for this case, and also mark the extent in the extent status
tree for consistency.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/extents.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index cafe66cb562f..2db84f6b0056 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3733,11 +3733,15 @@ static struct ext4_ext_path *ext4_split_convert_extents(handle_t *handle,
/* Convert to unwritten */
if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) {
split_flag |= EXT4_EXT_DATA_ENTIRE_VALID1;
- /* Convert to initialized */
- } else if (flags & EXT4_GET_BLOCKS_CONVERT) {
+ /* Split the existing unwritten extent */
+ } else if (flags & (EXT4_GET_BLOCKS_UNWRIT_EXT |
+ EXT4_GET_BLOCKS_CONVERT)) {
split_flag |= ee_block + ee_len <= eof_block ?
EXT4_EXT_MAY_ZEROOUT : 0;
- split_flag |= (EXT4_EXT_MARK_UNWRIT2 | EXT4_EXT_DATA_VALID2);
+ split_flag |= EXT4_EXT_MARK_UNWRIT2;
+ /* Convert to initialized */
+ if (flags & EXT4_GET_BLOCKS_CONVERT)
+ split_flag |= EXT4_EXT_DATA_VALID2;
}
flags |= EXT4_GET_BLOCKS_PRE_IO;
return ext4_split_extent(handle, inode, path, map, split_flag, flags,
@@ -3913,7 +3917,7 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
/* get_block() before submitting IO, split the extent */
if (flags & EXT4_GET_BLOCKS_PRE_IO) {
path = ext4_split_convert_extents(handle, inode, map, path,
- flags | EXT4_GET_BLOCKS_CONVERT, allocated);
+ flags, allocated);
if (IS_ERR(path))
return path;
/*
--
2.46.1
next prev parent reply other threads:[~2025-11-21 6:10 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-21 6:07 [PATCH v2 00/13] ext4: replace ext4_es_insert_extent() when caching on-disk extents Zhang Yi
2025-11-21 6:07 ` [PATCH v2 01/13] ext4: cleanup zeroout in ext4_split_extent_at() Zhang Yi
2025-11-26 11:26 ` Ojaswin Mujoo
2025-11-27 12:02 ` Jan Kara
2025-11-28 2:22 ` Zhang Yi
2025-11-28 1:54 ` Baokun Li
2025-11-21 6:08 ` [PATCH v2 02/13] ext4: subdivide EXT4_EXT_DATA_VALID1 Zhang Yi
2025-11-26 11:27 ` Ojaswin Mujoo
2025-11-26 11:55 ` Ojaswin Mujoo
2025-11-27 6:09 ` Zhang Yi
2025-11-21 6:08 ` [PATCH v2 03/13] ext4: don't zero the entire extent if EXT4_EXT_DATA_PARTIAL_VALID1 Zhang Yi
2025-11-26 11:29 ` Ojaswin Mujoo
2025-11-27 6:13 ` Zhang Yi
2025-11-27 13:41 ` Jan Kara
2025-11-28 3:45 ` Zhang Yi
2025-11-28 10:58 ` Jan Kara
2025-11-28 7:28 ` Ojaswin Mujoo
2025-11-28 11:14 ` Jan Kara
2025-11-28 14:20 ` Ojaswin Mujoo
2025-11-28 19:52 ` Andreas Dilger
2025-11-29 18:41 ` Ojaswin Mujoo
2025-11-21 6:08 ` Zhang Yi [this message]
2025-11-26 11:50 ` [PATCH v2 04/13] ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O Ojaswin Mujoo
2025-11-21 6:08 ` [PATCH v2 05/13] ext4: correct the mapping status if the extent has been zeroed Zhang Yi
2025-11-26 11:56 ` Ojaswin Mujoo
2025-11-21 6:08 ` [PATCH v2 06/13] ext4: don't cache extent during splitting extent Zhang Yi
2025-11-26 12:04 ` Ojaswin Mujoo
2025-11-27 7:01 ` Zhang Yi
2025-11-28 8:18 ` Ojaswin Mujoo
2025-11-21 6:08 ` [PATCH v2 07/13] ext4: drop extent cache before " Zhang Yi
2025-11-26 12:24 ` Ojaswin Mujoo
2025-11-27 7:27 ` Zhang Yi
2025-11-28 8:16 ` Ojaswin Mujoo
2025-11-29 1:36 ` Zhang Yi
2025-11-21 6:08 ` [PATCH v2 08/13] ext4: cleanup useless out tag in __es_remove_extent() Zhang Yi
2025-11-27 12:43 ` Jan Kara
2025-11-28 8:20 ` Ojaswin Mujoo
2025-11-21 6:08 ` [PATCH v2 09/13] ext4: make __es_remove_extent() check extent status Zhang Yi
2025-11-21 6:08 ` [PATCH v2 10/13] ext4: make ext4_es_cache_extent() support overwrite existing extents Zhang Yi
2025-11-21 6:08 ` [PATCH v2 11/13] ext4: adjust the debug info in ext4_es_cache_extent() Zhang Yi
2025-11-21 6:08 ` [PATCH v2 12/13] ext4: replace ext4_es_insert_extent() when caching on-disk extents Zhang Yi
2025-11-21 6:08 ` [PATCH v2 13/13] ext4: drop the TODO comment in ext4_es_insert_extent() Zhang Yi
2025-11-23 10:55 ` [PATCH v2 00/13] ext4: replace ext4_es_insert_extent() when caching on-disk extents Ojaswin Mujoo
2025-11-24 5:04 ` Zhang Yi
2025-11-24 12:50 ` Ojaswin Mujoo
2025-11-24 14:05 ` Zhang Yi
2025-11-27 12:24 ` Jan Kara
2025-11-28 4:37 ` Zhang Yi
2025-11-28 16:49 ` Theodore Tso
2025-11-29 1:32 ` Zhang Yi
2025-11-29 3:52 ` Theodore Tso
2025-11-29 4:44 ` Zhang Yi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251121060811.1685783-5-yi.zhang@huaweicloud.com \
--to=yi.zhang@huaweicloud.com \
--cc=adilger.kernel@dilger.ca \
--cc=jack@suse.cz \
--cc=libaokun1@huawei.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=yangerkun@huawei.com \
--cc=yi.zhang@huawei.com \
--cc=yizhang089@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).