Re: [PATCH v2 11/13] ext4: switch to using the new extent movement method

linux-fsdevel.vger.kernel.org archive mirror

From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, yi.zhang@huawei.com,
	libaokun1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v2 11/13] ext4: switch to using the new extent movement method
Date: Thu, 9 Oct 2025 20:24:10 +0800
Message-ID: <25b45870-c0a8-4f4e-bd3b-2d962b7a31a3@huaweicloud.com>
In-Reply-To: <5g66nxbf3ay2bryv4legk46pudqonsbrdkxr5ljegbxaydkctk@2dyyoxguxyxu>

On 10/9/2025 5:14 PM, Jan Kara wrote:
> On Thu 09-10-25 15:20:59, Zhang Yi wrote:
>> On 10/8/2025 8:49 PM, Jan Kara wrote:
>>> On Thu 25-09-25 17:26:07, Zhang Yi wrote:
>>>> +			if (ret == -EBUSY &&
>>>> +			    sbi->s_journal && retries++ < 4 &&
>>>> +			    jbd2_journal_force_commit_nested(sbi->s_journal))
>>>> +				continue;
>>>> +			if (ret)
>>>>  				goto out;
>>>> -		} else { /* in_range(o_start, o_blk, o_len) */
>>>> -			cur_len += cur_blk - o_start;
>>>> +
>>>> +			*moved_len += m_len;
>>>> +			retries = 0;
>>>>  		}
>>>> -		unwritten = ext4_ext_is_unwritten(ex);
>>>> -		if (o_end - o_start < cur_len)
>>>> -			cur_len = o_end - o_start;
>>>> -
>>>> -		orig_page_index = o_start >> (PAGE_SHIFT -
>>>> -					       orig_inode->i_blkbits);
>>>> -		donor_page_index = d_start >> (PAGE_SHIFT -
>>>> -					       donor_inode->i_blkbits);
>>>> -		offset_in_page = o_start % blocks_per_page;
>>>> -		if (cur_len > blocks_per_page - offset_in_page)
>>>> -			cur_len = blocks_per_page - offset_in_page;
>>>> -		/*
>>>> -		 * Up semaphore to avoid following problems:
>>>> -		 * a. transaction deadlock among ext4_journal_start,
>>>> -		 *    ->write_begin via pagefault, and jbd2_journal_commit
>>>> -		 * b. racing with ->read_folio, ->write_begin, and
>>>> -		 *    ext4_get_block in move_extent_per_page
>>>> -		 */
>>>> -		ext4_double_up_write_data_sem(orig_inode, donor_inode);
>>>> -		/* Swap original branches with new branches */
>>>> -		*moved_len += move_extent_per_page(o_filp, donor_inode,
>>>> -				     orig_page_index, donor_page_index,
>>>> -				     offset_in_page, cur_len,
>>>> -				     unwritten, &ret);
>>>> -		ext4_double_down_write_data_sem(orig_inode, donor_inode);
>>>> -		if (ret < 0)
>>>> -			break;
>>>> -		o_start += cur_len;
>>>> -		d_start += cur_len;
>>>> +		orig_blk += mext.orig_map.m_len;
>>>> +		donor_blk += mext.orig_map.m_len;
>>>> +		len -= mext.orig_map.m_len;
>>>
>>> In case we've called mext_move_extent() we should update everything only by
>>> m_len, shouldn't we? Although I have somewhat hard time coming up with a
>>> realistic scenario where m_len != mext.orig_map.m_len for the parameters we
>>> call ext4_swap_extents() with... So maybe I'm missing something.
>>
>> In the case of MEXT_SKIP_EXTENT, the target move range of the donor file
>> is a hole. In this case, m_len is returned as zero after calling
>> mext_move_extent(), not equal to mext.orig_map.m_len, and we need to move
>> forward and skip this range in the next iteration in ext4_move_extents().
>> Otherwise, it will lead to an infinite loop.
> 
> Right, that would be a problem. I thought this shouldn't happen because we
> call mext_move_extent() only if we have mapped or unwritten extent but if
> donor inode has a hole in the same place, MEXT_SKIP_EXTENT can still
> happen.

Yes, we could check the extent status of both the original inode and the
donor inode before calling mext_move_extent(), and only call
mext_move_extent() when both extents are either mapped or unwritten.
However, the current iomap infrastructure (iomap_iter()) does not support
getting extents for two inodes simultaneously. To make a future conversion
to iomap smoother (I'm still thinking through the details of how to switch
this to iomap), I have only checked the original inode in
ext4_move_extents() and deferred the extent check for the donor inode. I
think this should not be a significant problem for now, as holes in the
donor file are uncommon.
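
To make the skip case concrete, here is a minimal userspace sketch of the
loop shape discussed above. It is not the patch's code: struct sim_range,
sim_lookup() and sim_move_extents() are hypothetical stand-ins, and the
only thing it tries to show is why the cursor has to advance by the length
of the original mapping rather than by m_len when the donor range turns
out to be a hole.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one logical range; not a kernel struct. */
struct sim_range {
	unsigned int start;	/* first logical block of the range */
	unsigned int len;	/* number of blocks in the range */
	int orig_mapped;	/* 1 if the original file has blocks here */
	int donor_mapped;	/* 1 if the donor file has blocks here */
};

/* Find the range containing blk, or NULL if none does. */
static const struct sim_range *sim_lookup(const struct sim_range *r,
					  size_t nr, unsigned int blk)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (blk >= r[i].start && blk - r[i].start < r[i].len)
			return &r[i];
	return NULL;
}

/*
 * Walk [orig_blk, orig_blk + len) and return how many blocks were "moved".
 * Only the original side is checked before the move attempt; a donor hole
 * is discovered late and reported as m_len == 0 (the skip case).
 */
static unsigned int sim_move_extents(const struct sim_range *r, size_t nr,
				     unsigned int orig_blk, unsigned int len)
{
	unsigned int moved_len = 0;

	while (len > 0) {
		const struct sim_range *cur = sim_lookup(r, nr, orig_blk);
		unsigned int map_len, m_len = 0;

		if (!cur)
			break;
		map_len = cur->start + cur->len - orig_blk;
		if (map_len > len)
			map_len = len;

		/* Assumed model of the move: all or nothing per mapping. */
		if (cur->orig_mapped)
			m_len = cur->donor_mapped ? map_len : 0;
		moved_len += m_len;

		/*
		 * Advance by the length of the original mapping, not by
		 * m_len: with m_len == 0 the cursor would never move.
		 */
		orig_blk += map_len;
		len -= map_len;
	}
	return moved_len;
}
```

With four 4-block ranges where the second has a donor hole and the third
has a hole in the original file, moving blocks 0..15 reports 8 moved blocks
and still terminates, because the two skipped ranges are stepped over.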

> 
>> In the other two cases, MEXT_MOVE_EXTENT and MEXT_COPY_DATA, m_len should
>> be equal to mext.orig_map.m_len after calling mext_move_extent().
> 
> So this is the bit which isn't 100% clear to me. Because what looks fishy
> to me is that ext4_swap_extents() can fail after swapping part of the
> passed range (e.g. due to extent split failure). In that case we'll return
> number smaller than mext.orig_map.m_len. Now that I'm looking again, we'll
> set *erp in all those cases (there are cases where ext4_swap_extents()
> returns smaller number even without setting *erp but I don't think those
> can happen given the locks we hold and what we've already verified - still

Yes, ext4_swap_extents() could return early with a smaller count if it
encounters a hole. However, we have already verified this case under the
locks, so this cannot happen.

> it would be good to add an assert for this in mext_move_extent()) so the

Sure.

> problem would rather be that we don't advance by m_len in case of error
> returned from mext_move_extent()?
> 

Yeah, you are right, this is a problem; I missed this case. As long as
m_len is not zero, we still need to increase *moved_len in
ext4_move_extents(), even if mext_move_extent() returns an error code.
Thank you for such a detailed review! :-)
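
The accounting agreed on above could look roughly like the following
userspace sketch. It is only an illustration under stated assumptions:
struct sim_step and sim_run() are hypothetical names, each step stands for
one call that reports an error code plus the number of blocks actually
moved, and the second assert is one possible reading of the check suggested
in the review (a short count should only appear together with an error).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical record of one move attempt; not a kernel struct. */
struct sim_step {
	unsigned int map_len;	/* length of the original mapping */
	unsigned int m_len;	/* blocks actually moved by this attempt */
	int ret;		/* 0 on success, negative errno on failure */
};

/*
 * Replay the attempts in order. Blocks moved before a failure are still
 * counted in *moved_len, then the error is propagated to the caller.
 */
static int sim_run(const struct sim_step *steps, size_t nr,
		   unsigned int *moved_len)
{
	size_t i;

	*moved_len = 0;
	for (i = 0; i < nr; i++) {
		const struct sim_step *s = &steps[i];

		/* Never more than the mapping that was passed in. */
		assert(s->m_len <= s->map_len);
		/* Without an error: either a skip (0) or the full mapping. */
		assert(s->ret != 0 || s->m_len == 0 ||
		       s->m_len == s->map_len);

		/* Count partial progress even when ret is an error. */
		*moved_len += s->m_len;
		if (s->ret)
			return s->ret;
	}
	return 0;
}
```

For example, a full 8-block move followed by an attempt that fails with
-ENOMEM after moving 3 blocks returns -ENOMEM with *moved_len equal to 11,
so the caller can still report how much was really exchanged.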

Best Regards,
Yi.

Thread overview: 32+ messages
2025-09-25  9:25 [PATCH v2 00/13] ext4: optimize online defragment Zhang Yi
2025-09-25  9:25 ` [PATCH v2 01/13] ext4: fix an off-by-one issue during moving extents Zhang Yi
2025-09-25  9:25 ` [PATCH v2 02/13] ext4: correct the checking of quota files before " Zhang Yi
2025-10-08 11:17   ` Jan Kara
2025-09-25  9:25 ` [PATCH v2 03/13] ext4: introduce seq counter for the extent status entry Zhang Yi
2025-10-08 11:26   ` Jan Kara
2025-10-08 11:44   ` Jan Kara
2025-10-09  6:52     ` Zhang Yi
2025-09-25  9:26 ` [PATCH v2 04/13] ext4: make ext4_es_lookup_extent() pass out the extent seq counter Zhang Yi
2025-10-08 11:28   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 05/13] ext4: pass out extent seq counter when mapping blocks Zhang Yi
2025-10-08 11:36   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 06/13] ext4: use EXT4_B_TO_LBLK() in mext_check_arguments() Zhang Yi
2025-10-08 11:45   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 07/13] ext4: add mext_check_validity() to do basic check Zhang Yi
2025-10-08 11:47   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 08/13] ext4: refactor mext_check_arguments() Zhang Yi
2025-10-08 11:51   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 09/13] ext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate() Zhang Yi
2025-10-08 11:51   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 10/13] ext4: introduce mext_move_extent() Zhang Yi
2025-10-08 12:16   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 11/13] ext4: switch to using the new extent movement method Zhang Yi
2025-10-08 12:49   ` Jan Kara
2025-10-09  7:20     ` Zhang Yi
2025-10-09  9:14       ` Jan Kara
2025-10-09 12:24         ` Zhang Yi [this message]
2025-09-25  9:26 ` [PATCH v2 12/13] ext4: add large folios support for moving extents Zhang Yi
2025-10-08 12:53   ` Jan Kara
2025-10-09  7:23     ` Zhang Yi
2025-09-25  9:26 ` [PATCH v2 13/13] ext4: add two trace points " Zhang Yi
2025-10-08 12:54   ` Jan Kara
