From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, yi.zhang@huawei.com,
libaokun1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v2 03/13] ext4: introduce seq counter for the extent status entry
Date: Thu, 9 Oct 2025 14:52:36 +0800
Message-ID: <1daf2836-f497-4de7-ac8c-32d4d5e68f83@huaweicloud.com>
In-Reply-To: <ympvfypw3222g2k4xzd5pba4zhkz5jihw4td67iixvrqhuu43y@wse63ntv4s6u>
On 10/8/2025 7:44 PM, Jan Kara wrote:
> On Thu 25-09-25 17:25:59, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> In the iomap_write_iter(), the iomap buffered write frame does not hold
>> any locks between querying the inode extent mapping info and performing
>> page cache writes. As a result, the extent mapping can be changed due to
>> concurrent I/O in flight. Similarly, in the iomap_writepage_map(), the
>> write-back process faces a similar problem: concurrent changes can
>> invalidate the extent mapping before the I/O is submitted.
>>
>> Therefore, both of these processes must recheck the mapping info after
>> acquiring the folio lock. To address this, similar to XFS, we propose
>> introducing an extent sequence number to serve as a validity cookie for
>> the extent. After commit 24b7a2331fcd ("ext4: clairfy the rules for
>> modifying extents"), we can ensure the extent information should always
>> be processed through the extent status tree, and the extent status tree
>> is always uptodate under i_rwsem or invalidate_lock or folio lock, so
>> it's safe to introduce this sequence number. The sequence number will be
>> increased whenever the extent status tree changes, preparing for the
>> buffered write iomap conversion.
>>
>> Besides, this mechanism is also applicable for the moving extents case.
>> In move_extent_per_page(), it also needs to reacquire data_sem and check
>> the mapping info again under the folio lock.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>
> One idea for future optimization as I'm reading the series:
Hi, Jan!
Thank you very much for reviewing this series!
>
>> @@ -955,6 +961,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>> }
>> pending = err3;
>> }
>> + ext4_es_inc_seq(inode);
>> error:
>> write_unlock(&EXT4_I(inode)->i_es_lock);
>> /*
>
> ext4_es_insert_extent() doesn't always need to increment the sequence
> counter. It is used in two situations:
>
> 1) When we found the extent in the on-disk extent tree and want to cache it
> in memory. No increment is needed in this case.
>
> 2) When we allocated new blocks or changed their status. Increment needed
> in this case.
>
> Case 1) can be actually pretty frequent on large files and we would be
> unnecessarily invalidating mapping information for operations happening in
> other parts of the file although no allocation information changes are
> actually happening.
>
Indeed, the sequence count increment in Case 1 can be omitted because it
does not change any real extent. This increment can cause unnecessary
invalidation, potentially incurring additional overhead in some concurrency
scenarios.
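
To make the unnecessary invalidation concrete, here is a minimal user-space
sketch. It is not the ext4 code: toy_inode, toy_mapping, toy_lookup() and
toy_mapping_valid() are hypothetical names, and the single counter only
models the idea of a per-inode sequence number used as a validity cookie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-inode extent status state. */
struct toy_inode {
	uint64_t es_seq;	/* bumped whenever the extent status tree changes */
};

/* Hypothetical mapping snapshot carrying the cookie sampled at lookup time. */
struct toy_mapping {
	uint32_t lblk;
	uint32_t len;
	uint64_t seq;
};

/* Look up a range and remember the sequence number seen at that moment. */
static struct toy_mapping toy_lookup(const struct toy_inode *inode,
				     uint32_t lblk, uint32_t len)
{
	struct toy_mapping map = { .lblk = lblk, .len = len,
				   .seq = inode->es_seq };
	return map;
}

/* Recheck done later, e.g. after taking the folio lock. */
static bool toy_mapping_valid(const struct toy_inode *inode,
			      const struct toy_mapping *map)
{
	return map->seq == inode->es_seq;
}

/*
 * Returns true if merely caching an unrelated on-disk extent (Case 1)
 * invalidated a mapping that did not actually change.
 */
static bool toy_spurious_invalidation(void)
{
	struct toy_inode inode = { .es_seq = 0 };
	struct toy_mapping map = toy_lookup(&inode, 0, 16);

	assert(toy_mapping_valid(&inode, &map));

	/* Case 1 elsewhere in the file, with an unconditional increment. */
	inode.es_seq++;

	return !toy_mapping_valid(&inode, &map);
}
```

In this model the mapping for blocks 0..15 is rejected even though nothing
about it changed, so the caller would have to look it up again.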
Distinguishing between these two cases does not seem complicated. Since
the iomap conversion has not been completed yet and defragmentation is
currently the only user of this mechanism, I can add a TODO comment here
for now and then send a separate series to optimize it.
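
One possible shape for that follow-up, purely as a sketch under my own
assumptions: the caller tells the insert path whether it is only caching an
extent found on disk or recording a real mapping change. The enum
toy_insert_reason, struct toy_es_tree and toy_es_insert() are hypothetical
and do not exist in ext4; the real change may look quite different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the extent status tree state. */
struct toy_es_tree {
	uint64_t seq;		/* validity cookie handed out to lookups */
	unsigned int nr_cached;	/* number of entries inserted so far */
};

/* Hypothetical: why the caller is inserting an entry. */
enum toy_insert_reason {
	TOY_ES_CACHE_ONLY,	/* Case 1: extent read from the on-disk tree */
	TOY_ES_MAPPING_CHANGED,	/* Case 2: blocks allocated or status changed */
};

/* Insert an entry and bump the counter only when the mapping changed. */
static void toy_es_insert(struct toy_es_tree *tree,
			  enum toy_insert_reason reason)
{
	tree->nr_cached++;
	if (reason == TOY_ES_MAPPING_CHANGED)
		tree->seq++;
}
```

With this split, caching extents while reading a large file would leave
the counter untouched, and only allocations or status changes would force
other users to revalidate their mappings.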
Thanks,
Yi.