linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Jan Kara <jack@suse.cz>
Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, yi.zhang@huawei.com,
	yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v2 00/13] ext4: replace ext4_es_insert_extent() when caching on-disk extents
Date: Fri, 28 Nov 2025 12:37:33 +0800	[thread overview]
Message-ID: <bb5b9323-d7ea-467b-a59f-ef6e415e766a@huaweicloud.com> (raw)
In-Reply-To: <yfekmxz7biiuvairgen2pw6laccs4qvblt56uxmqenyckt2pp6@rfagttgqpdfr>

On 11/27/2025 8:24 PM, Jan Kara wrote:
> Hi Yi!
> 
> On Mon 24-11-25 13:04:04, Zhang Yi wrote:
>> On 11/23/2025 6:55 PM, Ojaswin Mujoo wrote:
>>> On Fri, Nov 21, 2025 at 02:07:58PM +0800, Zhang Yi wrote:
>>>> Changes since v1:
>>>>  - Rebase the codes based on the latest linux-next 20251120.
>>>>  - Add patches 01-05, fix two stale data problems caused by
>>>
>>> Hi Zhang, thanks for the patches.
>>>
>>
>> Thank you for take time to look at this series.
>>
>>> I've always felt uncomfortable with the ZEROOUT code here because it
>>> seems to have many such bugs as you pointed out in the series. Its very
>>> fragile and the bugs are easy to miss behind all the data valid and
>>> split flags mess. 
>>>
>>
>> Yes, I agree with you. The implementation of EXT4_EXT_MAY_ZEROOUT has
>> significantly increased the complexity of split extents and the
>> potential for bugs.
> 
> Yep, that code is complex and prone to bugs.
> 
>>> As per my understanding, ZEROOUT logic seems to be a special best-effort
>>> try to make the split/convert operation "work" when dealing with
>>> transient errors like ENOSPC etc. I was just wondering if it makes sense
>>> to just get rid of the whole ZEROOUT logic completely and just reset the
>>> extent to orig state if there is any error. This allows us to get rid of
>>> DATA_VALID* flags as well and makes the whole ext4_split_convert_extents() 
>>> slightly less messy.
>>>
>>> Maybe we can have a retry loop at the top level caller if we want to try
>>> again for say ENOSPC or ENOMEM. 
>>>
>>> Would love to hear your thoughts on it.
>>
>> I think this is a direction worth exploring. However, what I am
>> currently considering is that we need to address this scenario of
>> splitting extent during the I/O completion. Although the ZEROOUT logic
>> is fragile and has many issues recently, it currently serves as a
>> fallback solution for handling ENOSPC errors that arise when splitting
>> extents during I/O completion. It ensures that I/O operations do not
>> fail due to insufficient extent blocks.
> 
> Also partial extent zeroout offers a good performance win when the
> portion needing zeroout is small (we can save extent splitting). And I
> agree it is a good safety net for ENOSPC issues - otherwise there's no
> guarantee page writeback can finish without hitting ENOSPC. We do have
> reserved blocks for these cases but the pool is limited so you can still
> run out of blocks if you try hard enough.

Yes, Indeed!

> 
>> Please see ext4_convert_unwritten_extents_endio(). Although we have made
>> our best effort to tried to split extents using
>> EXT4_GET_BLOCKS_IO_CREATE_EXT before issuing I/Os, we still have not
>> covered all scenarios. Moreover, after converting the buffered I/O path
>> to the iomap infrastructure in the future, we may need to split extents
>> during the I/O completion worker[1].
> 
> Yes, this might be worth exploring. The advantage of doing extent splitting
> in advance is that on IO submission you have the opportunity of restarting
> the transaction on ENOSPC to possibly release some blocks. This is not
> easily doable e.g. on writeback completion so the pressure on the pool of
> reserved blocks is going to be more common (previously you needed reserved
> blocks only when writeback was racing with fallocate or similar, now you
> may need them each time you write in the middle of unwritten extent). So I
> think the change will need some testing whether it isn't too easy to hit
> ENOSPC conditions on IO completion without EXT4_GET_BLOCKS_IO_CREATE_EXT.
> But otherwise it's always nice to remove code :)
>  
> 								Honza

Yes, we need to be very careful about this, I have written a POC patch and
am currently conducting related tests. I hope it works.  :-)

Thanks,
Yi.


  reply	other threads:[~2025-11-28  4:37 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-21  6:07 [PATCH v2 00/13] ext4: replace ext4_es_insert_extent() when caching on-disk extents Zhang Yi
2025-11-21  6:07 ` [PATCH v2 01/13] ext4: cleanup zeroout in ext4_split_extent_at() Zhang Yi
2025-11-26 11:26   ` Ojaswin Mujoo
2025-11-27 12:02   ` Jan Kara
2025-11-28  2:22     ` Zhang Yi
2025-11-28  1:54   ` Baokun Li
2025-11-21  6:08 ` [PATCH v2 02/13] ext4: subdivide EXT4_EXT_DATA_VALID1 Zhang Yi
2025-11-26 11:27   ` Ojaswin Mujoo
2025-11-26 11:55     ` Ojaswin Mujoo
2025-11-27  6:09       ` Zhang Yi
2025-11-21  6:08 ` [PATCH v2 03/13] ext4: don't zero the entire extent if EXT4_EXT_DATA_PARTIAL_VALID1 Zhang Yi
2025-11-26 11:29   ` Ojaswin Mujoo
2025-11-27  6:13     ` Zhang Yi
2025-11-27 13:41   ` Jan Kara
2025-11-28  3:45     ` Zhang Yi
2025-11-28 10:58       ` Jan Kara
2025-11-28  7:28     ` Ojaswin Mujoo
2025-11-28 11:14       ` Jan Kara
2025-11-28 14:20         ` Ojaswin Mujoo
2025-11-28 19:52         ` Andreas Dilger
2025-11-29 18:41           ` Ojaswin Mujoo
2025-11-21  6:08 ` [PATCH v2 04/13] ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O Zhang Yi
2025-11-26 11:50   ` Ojaswin Mujoo
2025-11-21  6:08 ` [PATCH v2 05/13] ext4: correct the mapping status if the extent has been zeroed Zhang Yi
2025-11-26 11:56   ` Ojaswin Mujoo
2025-11-21  6:08 ` [PATCH v2 06/13] ext4: don't cache extent during splitting extent Zhang Yi
2025-11-26 12:04   ` Ojaswin Mujoo
2025-11-27  7:01     ` Zhang Yi
2025-11-28  8:18       ` Ojaswin Mujoo
2025-11-21  6:08 ` [PATCH v2 07/13] ext4: drop extent cache before " Zhang Yi
2025-11-26 12:24   ` Ojaswin Mujoo
2025-11-27  7:27     ` Zhang Yi
2025-11-28  8:16       ` Ojaswin Mujoo
2025-11-29  1:36         ` Zhang Yi
2025-11-21  6:08 ` [PATCH v2 08/13] ext4: cleanup useless out tag in __es_remove_extent() Zhang Yi
2025-11-27 12:43   ` Jan Kara
2025-11-28  8:20   ` Ojaswin Mujoo
2025-11-21  6:08 ` [PATCH v2 09/13] ext4: make __es_remove_extent() check extent status Zhang Yi
2025-11-21  6:08 ` [PATCH v2 10/13] ext4: make ext4_es_cache_extent() support overwrite existing extents Zhang Yi
2025-11-21  6:08 ` [PATCH v2 11/13] ext4: adjust the debug info in ext4_es_cache_extent() Zhang Yi
2025-11-21  6:08 ` [PATCH v2 12/13] ext4: replace ext4_es_insert_extent() when caching on-disk extents Zhang Yi
2025-11-21  6:08 ` [PATCH v2 13/13] ext4: drop the TODO comment in ext4_es_insert_extent() Zhang Yi
2025-11-23 10:55 ` [PATCH v2 00/13] ext4: replace ext4_es_insert_extent() when caching on-disk extents Ojaswin Mujoo
2025-11-24  5:04   ` Zhang Yi
2025-11-24 12:50     ` Ojaswin Mujoo
2025-11-24 14:05       ` Zhang Yi
2025-11-27 12:24     ` Jan Kara
2025-11-28  4:37       ` Zhang Yi [this message]
2025-11-28 16:49 ` Theodore Tso
2025-11-29  1:32   ` Zhang Yi
2025-11-29  3:52     ` Theodore Tso
2025-11-29  4:44       ` Zhang Yi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bb5b9323-d7ea-467b-a59f-ef6e415e766a@huaweicloud.com \
    --to=yi.zhang@huaweicloud.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=jack@suse.cz \
    --cc=libaokun1@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    --cc=tytso@mit.edu \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yizhang089@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).