From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, yi.zhang@huawei.com,
libaokun1@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH 1/4] ext4: make ext4_es_cache_extent() support overwrite existing extents
Date: Wed, 19 Nov 2025 17:36:48 +0800
Message-ID: <cfd95673-d0e6-44e6-86af-04bf2e0a9a8f@huaweicloud.com>
In-Reply-To: <hmfdz3arnmmmrvar2266ye4vb64txvxsa4hrpzppb4sp354b25@tnpvja7o7uww>

On 11/11/2025 6:33 PM, Jan Kara wrote:
> Hi!
>
> On Thu 06-11-25 21:02:35, Zhang Yi wrote:
>> On 11/6/2025 5:15 PM, Jan Kara wrote:
>>> On Fri 31-10-25 14:29:02, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> Currently, ext4_es_cache_extent() is used to load extents into the
>>>> extent status tree when reading on-disk extent blocks. Since it may be
>>>> called while moving or modifying the extent tree, it does not
>>>> overwrite existing extents in the extent status tree and is only used
>>>> for the initial loading.
>>>>
>>>> There are many other places in ext4 where on-disk extents are inserted
>>>> into the extent status tree, such as in ext4_map_query_blocks().
>>>> Currently, they call ext4_es_insert_extent() to perform the insertion,
>>>> but they don't modify the extents, so ext4_es_cache_extent() would be a
>>>> more appropriate choice. However, when ext4_map_query_blocks() inserts
>>>> an extent, it may overwrite a short existing extent of the same type.
>>>> Therefore, to prepare for the replacements, we need to extend
>>>> ext4_es_cache_extent() to allow it to overwrite existing extents with
>>>> the same type.
>>>>
>>>> In addition, since cached extents can be more lenient than the extents
>>>> they modify and do not involve modifying reserved blocks, it is not
>>>> necessary to ensure that the insertion operation succeeds as strictly as
>>>> in the ext4_es_insert_extent() function.
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>>
>>> Thanks for writing this series! I think we can actually simplify things
>>> even further. Extent status tree operations can be divided into three
>>> groups:
>>> 1) Lookups in es tree - protected only by i_es_lock.
>>> 2) Caching of on-disk state into es tree - protected by i_es_lock and
>>> i_data_sem (at least in read mode).
>>> 3) Modification of existing state - protected by i_es_lock and i_data_sem
>>> in write mode.
>>
>> Yeah.
>>
>>>
>>> Now because 2) has exclusion vs 3) due to i_data_sem, the observation is
>>> that 2) should never see a real conflict - i.e., all intersecting entries
>>> in es tree have the same status, otherwise this is a bug.
>>
>> While I was debugging, I observed two exceptions here.
>>
>> A. The first exception concerns delayed extents. Since there is no actual
>> extent present in the extent tree on the disk, if a delayed extent
>> already exists in the extent status tree and someone calls
>> ext4_find_extent()->ext4_cache_extents() to cache an extent at the same
>> location, then a status mismatch will occur (attempting to replace
>> the delayed extent with a hole). This is not a bug.
>> B. I also observed that ext4_find_extent()->ext4_cache_extents() is called
>> during splitting and conversion between unwritten and written states (in
>> most scenarios, EXT4_EX_NOCACHE is not added). However, because the
>> process is in an intermediate state of handling extents, there can be
>> cases where the statuses do not match. I did not analyze this scenario in
>> detail, but since ext4_es_insert_extent() is called at the end of the
>> processing to ensure the final state is correct, I don't think this is a
>> practical issue either.
>
> Thanks for bringing this up. I didn't think about these two cases. As for
> case A that is easy to deal with as you write below. A hole insertion can
> be deemed compatible with existing delalloc extent.
>

Yeah.

> Case B is more difficult and I think I need to better understand the
> details there to decide what to do. Only extent splitting (as it happens
> e.g. with EXT4_GET_BLOCKS_PRE_IO) should keep extents in the extent tree and
> extent status tree compatible. So it has to be something like
> EXT4_GET_BLOCKS_CONVERT case. There indeed after we call
> ext4_ext_mark_initialized() we have initialized extent on disk but in
> extent status tree it is still as unwritten. But I just didn't find a place
> in the extent conversion path that would modify extent state on disk and
> then call ext4_find_extent(). Can you perhaps share a stacktrace where the
> extent incompatibility was hit from ext4_cache_extents()? Thanks!
>
> Honza
>

Sorry for the late reply. While debugging this case I found several real
issues. The situation is a bit complicated and will take some time to
sort out; I will address these in the next iteration.

Cheers,
Yi.
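
For readers following the thread, the compatibility rule discussed above (intersecting entries must have the same status, with a hole read from disk tolerated over an existing delayed extent, as in case A) can be sketched in plain C. This is only an illustration under stated assumptions: `enum es_status`, `es_cache_compatible()` and `es_compat_selftest()` are hypothetical names, not the kernel's definitions or the code from this patch series. Case B (a written/unwritten mismatch) is treated here as a conflict, a question the thread leaves open.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for extent status types (hypothetical, not kernel code). */
enum es_status {
	ES_WRITTEN,
	ES_UNWRITTEN,
	ES_DELAYED,
	ES_HOLE,
};

/*
 * Hypothetical helper: may an extent just read from disk with status
 * 'incoming' be cached over an existing entry with status 'existing'?
 */
static bool es_cache_compatible(enum es_status existing,
				enum es_status incoming)
{
	/* Same status on both sides: nothing conflicts. */
	if (existing == incoming)
		return true;

	/*
	 * Case A: a delayed extent has no on-disk counterpart, so the
	 * on-disk tree reports a hole there. This is not a real conflict.
	 */
	if (existing == ES_DELAYED && incoming == ES_HOLE)
		return true;

	/* Anything else (including case B) is reported as a mismatch. */
	return false;
}

/* Small self-check of the rule above. */
static void es_compat_selftest(void)
{
	assert(es_cache_compatible(ES_WRITTEN, ES_WRITTEN));
	assert(es_cache_compatible(ES_DELAYED, ES_HOLE));
	assert(!es_cache_compatible(ES_HOLE, ES_DELAYED));
	assert(!es_cache_compatible(ES_UNWRITTEN, ES_WRITTEN));
}
```

Note that the check is deliberately asymmetric: only an existing delayed entry tolerates an incoming hole, not the reverse.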
Thread overview: 9+ messages
2025-10-31 6:29 [PATCH 0/4] ext4: replace ext4_es_insert_extent() when caching on-disk extents Zhang Yi
2025-10-31 6:29 ` [PATCH 1/4] ext4: make ext4_es_cache_extent() support overwrite existing extents Zhang Yi
2025-11-06 9:15   ` Jan Kara
2025-11-06 13:02     ` Zhang Yi
2025-11-11 10:33       ` Jan Kara
2025-11-19 9:36         ` Zhang Yi [this message]
2025-10-31 6:29 ` [PATCH 2/4] ext4: check for conflicts when caching extents Zhang Yi
2025-10-31 6:29 ` [PATCH 3/4] ext4: adjust the debug info in ext4_es_cache_extent() Zhang Yi
2025-10-31 6:29 ` [PATCH 4/4] ext4: replace ext4_es_insert_extent() when caching on-disk extents Zhang Yi