Re: [PATCH v3 09/12] ext4: introduce mext_move_extent()

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, yi.zhang@huawei.com,
	libaokun1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v3 09/12] ext4: introduce mext_move_extent()
Date: Sat, 11 Oct 2025 09:20:33 +0800	[thread overview]
Message-ID: <021b2564-4f1d-4dd7-b98c-569668c8359a@huaweicloud.com> (raw)
In-Reply-To: <pkhkxgsoa3e3svcwudqo5jckurdqnhkdd6ckbkvgp424lxfcvn@h4nazw5rrd77>

On 10/10/2025 9:38 PM, Jan Kara wrote:
> On Fri 10-10-25 18:33:23, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> When moving extents, the current move_extent_per_page() process can only
>> move extents of length PAGE_SIZE at a time, which is highly inefficient,
>> especially when the fragmentation of the file is not particularly
>> severe, this will result in a large number of unnecessary extent split
>> and merge operations. Moreover, since the ext4 file system now supports
>> large folios, using PAGE_SIZE as the processing unit is no longer
>> practical.
>>
>> Therefore, introduce a new move extents method, mext_move_extent(). It
>> moves one extent of the origin inode at a time, but not exceeding the
>> size of a folio. The parameters for the move are passed through the new
>> mext_data data structure, which includes the origin inode, donor inode,
>> the mapping extent of the origin inode to be moved, and the starting
>> offset of the donor inode.
>>
>> The move process is similar to move_extent_per_page() and can be
>> categorized into three types: MEXT_SKIP_EXTENT, MEXT_MOVE_EXTENT, and
>> MEXT_COPY_DATA. MEXT_SKIP_EXTENT indicates that the corresponding area
>> of the donor file is a hole, meaning no actual space is allocated, so
>> the move is skipped. MEXT_MOVE_EXTENT indicates that the corresponding
>> areas of both the origin and donor files are unwritten, so no data needs
>> to be copied; only the extents are swapped. MEXT_COPY_DATA indicates
>> that the corresponding areas of both the origin and donor files contain
>> data, so data must be copied. The data copying is performed in three
>> steps: first, the data from the original location is read into the page
>> cache; then, the extents are swapped, and the page cache is rebuilt to
>> reflect the index of the physical blocks; finally, the dirty page cache
>> is marked and written back to ensure that the data is written to disk
>> before the metadata is persisted.
>>
>> One important point to note is that the folio lock and i_data_sem are
>> held only during the moving process. Therefore, before moving an extent,
>> it is necessary to check whether the sequence cookie of the area to be
>> moved has changed while holding the folio lock. If a change is detected,
>> it indicates that concurrent write-back operations may have occurred
>> during this period, and the type of the extent to be moved can no longer
>> be considered reliable. For example, it may have changed from unwritten
>> to written. In such cases, return -ESTALE, and the calling function
>> should reacquire the move extent of the original file and retry the
>> movement.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> ...
> 
>> +static __used int mext_move_extent(struct mext_data *mext, u64 *m_len)
>> +{
>> +	struct inode *orig_inode = mext->orig_inode;
>> +	struct inode *donor_inode = mext->donor_inode;
>> +	struct ext4_map_blocks *orig_map = &mext->orig_map;
>> +	unsigned int blkbits = orig_inode->i_blkbits;
>> +	struct folio *folio[2] = {NULL, NULL};
>> +	loff_t from, length;
>> +	enum mext_move_type move_type = 0;
>> +	handle_t *handle;
>> +	u64 r_len = 0;
>> +	unsigned int credits;
>> +	int ret, ret2;
>> +
>> +	*m_len = 0;
>> +	credits = ext4_chunk_trans_extent(orig_inode, 0) * 2;
>> +	handle = ext4_journal_start(orig_inode, EXT4_HT_MOVE_EXTENTS, credits);
>> +	if (IS_ERR(handle))
>> +		return PTR_ERR(handle);
>> +
>> +	ret = mext_move_begin(mext, folio, &move_type);
>> +	if (ret)
>> +		goto stop_handle;
>> +
>> +	if (move_type == MEXT_SKIP_EXTENT)
>> +		goto unlock;
>> +
>> +	/*
>> +	 * Copy the data. First, read the original inode data into the page
>> +	 * cache. Then, release the existing mapping relationships and swap
>> +	 * the extent. Finally, re-establish the new mapping relationships
>> +	 * and dirty the page cache.
>> +	 */
>> +	if (move_type == MEXT_COPY_DATA) {
>> +		from = offset_in_folio(folio[0],
>> +				((loff_t)orig_map->m_lblk) << blkbits);
>> +		length = ((loff_t)orig_map->m_len) << blkbits;
>> +
>> +		ret = mext_folio_mkuptodate(folio[0], from, from + length);
>> +		if (ret)
>> +			goto unlock;
>> +	}
>> +
>> +	if (!filemap_release_folio(folio[0], 0) ||
>> +	    !filemap_release_folio(folio[1], 0)) {
>> +		ret = -EBUSY;
>> +		goto unlock;
>> +	}
>> +
>> +	/* Move extent */
>> +	ext4_double_down_write_data_sem(orig_inode, donor_inode);
>> +	*m_len = ext4_swap_extents(handle, orig_inode, donor_inode,
>> +				   orig_map->m_lblk, mext->donor_lblk,
>> +				   orig_map->m_len, 1, &ret);
>> +	ext4_double_up_write_data_sem(orig_inode, donor_inode);
>> +
>> +	/* A short-length swap cannot occur after a successful swap extent. */
>> +	if (WARN_ON_ONCE(!ret && (*m_len != orig_map->m_len)))
>> +		ret = -EIO;
>> +
>> +	if (!(*m_len) || (move_type == MEXT_MOVE_EXTENT))
>> +		goto unlock;
>> +
>> +	/* Copy data */
>> +	length = (*m_len) << blkbits;
>> +	ret = mext_folio_mkwrite(orig_inode, folio[0], from, from + length);
>> +	if (ret)
>> +		goto repair_branches;
> 
> I think you need to be careful here and below to not overwrite 'ret' if it
> is != 0. So something like:
> 
> 	ret2 = mext_folio_mkwrite(..)
> 	if (ret2) {
> 		if (!ret)
> 			ret = ret2;
> 		goto repair_branches;
> 	}
> 
> and something similar below. Otherwise the patch looks good to me.
> 
> 								Honza

OK, although overwrite 'ret' seems fine, it's better to keep it.

Thanks,
Yi.

next prev parent reply	other threads:[~2025-10-11  1:20 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-10 10:33 [PATCH v3 00/12] ext4: optimize online defragment Zhang Yi
2025-10-10 10:33 ` [PATCH v3 01/12] ext4: correct the checking of quota files before moving extents Zhang Yi
2025-10-10 10:33 ` [PATCH v3 02/12] ext4: introduce seq counter for the extent status entry Zhang Yi
2025-10-10 10:33 ` [PATCH v3 03/12] ext4: make ext4_es_lookup_extent() pass out the extent seq counter Zhang Yi
2025-10-10 10:33 ` [PATCH v3 04/12] ext4: pass out extent seq counter when mapping blocks Zhang Yi
2025-10-10 10:33 ` [PATCH v3 05/12] ext4: use EXT4_B_TO_LBLK() in mext_check_arguments() Zhang Yi
2025-10-10 10:33 ` [PATCH v3 06/12] ext4: add mext_check_validity() to do basic check Zhang Yi
2025-10-10 10:33 ` [PATCH v3 07/12] ext4: refactor mext_check_arguments() Zhang Yi
2025-10-10 10:33 ` [PATCH v3 08/12] ext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate() Zhang Yi
2025-10-10 10:33 ` [PATCH v3 09/12] ext4: introduce mext_move_extent() Zhang Yi
2025-10-10 13:38   ` Jan Kara
2025-10-11  1:20     ` Zhang Yi [this message]
2025-10-10 10:33 ` [PATCH v3 10/12] ext4: switch to using the new extent movement method Zhang Yi
2025-10-10 13:41   ` Jan Kara
2025-10-10 10:33 ` [PATCH v3 11/12] ext4: add large folios support for moving extents Zhang Yi
2025-10-10 10:33 ` [PATCH v3 12/12] ext4: add two trace points " Zhang Yi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=021b2564-4f1d-4dd7-b98c-569668c8359a@huaweicloud.com \
    --to=yi.zhang@huaweicloud.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=jack@suse.cz \
    --cc=libaokun1@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).