public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, ojaswin@linux.ibm.com,
	ritesh.list@gmail.com, libaokun@linux.alibaba.com,
	yi.zhang@huawei.com, yizhang089@gmail.com, yangerkun@huawei.com,
	yukuai@fnnas.com
Subject: Re: [PATCH 09/10] ext4: move zero partial block range functions out of active handle
Date: Tue, 24 Mar 2026 11:10:13 +0800	[thread overview]
Message-ID: <7fa8415e-199f-4e31-8b07-08f0f73d5b99@huaweicloud.com> (raw)
In-Reply-To: <irkaodowvqmpug3moetmd75vssj6czh67qzuxs4v5vxe56qxz7@opkvl3apdpuk>

On 3/24/2026 4:17 AM, Jan Kara wrote:
> On Tue 10-03-26 09:41:00, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Move ext4_block_zero_eof() and ext4_zero_partial_blocks() calls out of
>> the active handle context, making them independent operations. This is
>> safe because it still ensures data is updated before metadata for
>> data=ordered mode and data=journal mode because we still zero data and
>> ordering data before modifying the metadata.
>>
>> This change is required for iomap infrastructure conversion because the
>> iomap buffered I/O path does not use the same journal infrastructure for
>> partial block zeroing. The lock ordering of folio lock and starting
>> transactions is "folio lock -> transaction start", which is opposite of
>> the current path. Therefore, zeroing partial blocks cannot be performed
>> under the active handle.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> So in this patch you also move ext4_zero_partial_blocks() before the
> transaction start in ext4_zero_range() and ext4_punch_hole(). However
> cannot these operations in theory fail with error such as EDQUOT or ENOSPC
> when they need to grow the extent tree as a result of the operation? In
> that case we'd return such error but partial blocks are already zeroed out
> which is a problem... 

Thank you for review this patch set!

Let me see, I believe you are referring to the cases where
ext4_xxx_remove_space() in ext4_punch_hole() and
ext4_alloc_file_blocks() in ext4_zero_range() return normal error codes
such as EDQUOT or ENOSPC.

In ext4_punch_hole(), we first zero out the partial blocks at the
beginning and end of the punch range, and then release the aligned
blocks in the middle. If errors occur during the middle of the
hole-punching operation, there will left some data in the middle of the
punch range, this is weird. Conversely, if the zeroization is performed
in sequence, the result is zeroization at the front and the retention of
valid data at the rear, which is acceptable. right? Besides, this
problem seems not occur in ext4_zero_range() because
ext4_zero_partial_blocks() is still called after
ext4_alloc_file_blocks() after this patch.

Regarding this, I found that the result is the same whether
ext4_alloc_file_blocks() is called inside transaction start or not. The
partial blocks at the head and tail have already been zeroed out after
calling ext4_zero_partial_blocks(), am I missing some thing? Or do you
suggest that we should also move ext4_zero_partial_blocks() to execute
after ext4_xxx_remove_space() in ext4_punch_hole()?

> OTOH in these cases we don't need to order the data
> at all so in principle we shouldn't need to move the zeroing earlier
> here. Am I missing something?
> 

Moving the zero partial block before the start handle is not only
because they do not need to order the data, but rather to avoid
deadlock. iomap_zero_range() acquires the folio lock, and if it is
called after ext4_journal_start(), the locking order would be
transaction start -> folio lock, which is the opposite of the locking
order used in the writeback process. If these two operations are
performed concurrently (), it could potentially result in an ABBA
deadlock.

In the subsequent iomap patch [1], I will add a WARNING and a comment
in ext4_iomap_block_zero_range(I don't see anything can prevent this)
to ensure it is not called under an active handle.

[1] https://lore.kernel.org/linux-ext4/20260203062523.3869120-18-yi.zhang@huawei.com/

Thanks,
Yi.

> 								Honza
> 
>> @@ -4722,25 +4724,18 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>>  	if (IS_ALIGNED(offset | end, blocksize))
>>  		return ret;
>>  
>> -	/*
>> -	 * In worst case we have to writeout two nonadjacent unwritten
>> -	 * blocks and update the inode
>> -	 */
>> -	credits = (2 * ext4_ext_index_trans_blocks(inode, 2)) + 1;
>> -	if (ext4_should_journal_data(inode))
>> -		credits += 2;
>> -	handle = ext4_journal_start(inode, EXT4_HT_MISC, credits);
>> +	/* Zero out partial block at the edges of the range */
>> +	ret = ext4_zero_partial_blocks(inode, offset, len);
>> +	if (ret)
>> +		return ret;
>> +
>> +	handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
>>  	if (IS_ERR(handle)) {
>>  		ret = PTR_ERR(handle);
>>  		ext4_std_error(inode->i_sb, ret);
>>  		return ret;
>>  	}
>>  
>> -	/* Zero out partial block at the edges of the range */
>> -	ret = ext4_zero_partial_blocks(inode, offset, len);
>> -	if (ret)
>> -		goto out_handle;
>> -
>>  	if (new_size)
>>  		ext4_update_inode_size(inode, new_size);
>>  	ret = ext4_mark_inode_dirty(handle, inode);
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index d5b783a7c814..5288d36b0f09 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -4443,8 +4443,12 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>>  	if (ret)
>>  		return ret;
>>  
>> +	ret = ext4_zero_partial_blocks(inode, offset, length);
>> +	if (ret)
>> +		return ret;
>> +
>>  	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
>> -		credits = ext4_chunk_trans_extent(inode, 2);
>> +		credits = ext4_chunk_trans_extent(inode, 0);
>>  	else
>>  		credits = ext4_blocks_for_truncate(inode);
>>  	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
>> @@ -4454,10 +4458,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>>  		return ret;
>>  	}
>>  
>> -	ret = ext4_zero_partial_blocks(inode, offset, length);
>> -	if (ret)
>> -		goto out_handle;
>> -
>>  	/* If there are blocks to remove, do it */
>>  	start_lblk = EXT4_B_TO_LBLK(inode, offset);
>>  	end_lblk = end >> inode->i_blkbits;


  reply	other threads:[~2026-03-24  3:10 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-10  1:40 [PATCH 00/10] ext4: refactor partial block zero-out for iomap conversion Zhang Yi
2026-03-10  1:40 ` [PATCH 01/10] ext4: add did_zero output parameter to ext4_block_zero_page_range() Zhang Yi
2026-03-23 16:33   ` Jan Kara
2026-03-10  1:40 ` [PATCH 02/10] ext4: ext4_block_truncate_page() returns zeroed length on success Zhang Yi
2026-03-23 16:34   ` Jan Kara
2026-03-10  1:40 ` [PATCH 03/10] ext4: rename and extend ext4_block_truncate_page() Zhang Yi
2026-03-23 16:42   ` Jan Kara
2026-03-10  1:40 ` [PATCH 04/10] ext4: factor out journalled block zeroing range Zhang Yi
2026-03-23 16:48   ` Jan Kara
2026-03-24  3:16     ` Zhang Yi
2026-03-10  1:40 ` [PATCH 05/10] ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range() Zhang Yi
2026-03-23 16:49   ` Jan Kara
2026-03-10  1:40 ` [PATCH 06/10] ext4: move ordered data handling out of ext4_block_do_zero_range() Zhang Yi
2026-03-23 16:59   ` Jan Kara
2026-03-24  3:17     ` Zhang Yi
2026-03-10  1:40 ` [PATCH 07/10] ext4: remove handle parameters from zero partial block functions Zhang Yi
2026-03-23 17:27   ` Jan Kara
2026-03-10  1:40 ` [PATCH 08/10] ext4: pass allocate range as loff_t to ext4_alloc_file_blocks() Zhang Yi
2026-03-23 17:06   ` Jan Kara
2026-03-10  1:41 ` [PATCH 09/10] ext4: move zero partial block range functions out of active handle Zhang Yi
2026-03-23 20:17   ` Jan Kara
2026-03-24  3:10     ` Zhang Yi [this message]
2026-03-24  3:14       ` Zhang Yi
2026-03-24 13:56       ` Jan Kara
2026-03-10  1:41 ` [PATCH 10/10] ext4: zero post-EOF partial block before appending write Zhang Yi
2026-03-23 20:31   ` Jan Kara
2026-03-24  3:29     ` Zhang Yi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7fa8415e-199f-4e31-8b07-08f0f73d5b99@huaweicloud.com \
    --to=yi.zhang@huaweicloud.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=jack@suse.cz \
    --cc=libaokun@linux.alibaba.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yizhang089@gmail.com \
    --cc=yukuai@fnnas.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox