linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 00/13] ext4: optimize online defragment
@ 2025-09-25  9:25 Zhang Yi
From: Zhang Yi @ 2025-09-25  9:25 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	yi.zhang, yi.zhang, libaokun1, yukuai3, yangerkun

From: Zhang Yi <yi.zhang@huawei.com>

Changes since v1:
 - Fix the syzbot issues reported in v1 by adjusting the order of
   parameter checks in mext_check_validity() in patches 07 and 08.

v1: https://lore.kernel.org/linux-ext4/20250923012724.2378858-1-yi.zhang@huaweicloud.com/


Original Description:

Currently, online defragmentation in ext4 is primarily implemented
through the move extent operation in the kernel. This operation works
at PAGE_SIZE granularity, iteratively swapping extents and moving data
one page at a time, which is quite inefficient. Now that ext4 supports
large folios, iterating at PAGE_SIZE granularity is no longer practical
and fails to leverage the advantages of large folios. Additionally, the
current implementation is tightly coupled with buffer_head, so online
defragmentation cannot be supported once the buffered I/O path is
converted to the iomap infrastructure.

This patch set (based on 6.17-rc7) optimizes the extent-moving process,
deprecates the old move_extent_per_page() interface, and introduces a
new mext_move_extent() interface. The new interface iterates over and
copies data based on the extents of the original file rather than
PAGE_SIZE, and it supports large folios. The data processing logic
within each iteration remains largely consistent with the previous
implementation; no additional optimizations or changes are made there.
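
To illustrate why iterating by extent rather than by PAGE_SIZE reduces
the number of iterations, here is a toy userspace model in Python. It is
not kernel code: the 4 KiB page size, the 2 MiB maximum folio size, the
extent layout, and the name count_steps() are all assumptions made for
illustration only.

```python
PAGE_SIZE = 4096                    # assumed page size
MAX_FOLIO_SIZE = 2 * 1024 * 1024    # assumed upper bound for a large folio


def count_steps(extents, chunk_size):
    """Count loop iterations needed to walk all extents.

    Each iteration covers at most one chunk_size-aligned chunk and never
    crosses an extent boundary. `extents` is a list of
    (start_offset, length) pairs in bytes.
    """
    steps = 0
    for start, length in extents:
        pos, end = start, start + length
        while pos < end:
            # Advance to the next chunk boundary or the extent end,
            # whichever comes first.
            next_boundary = (pos // chunk_size + 1) * chunk_size
            pos = min(next_boundary, end)
            steps += 1
    return steps


# Hypothetical file layout: a 128 KiB extent followed by a 1 MiB extent.
extents = [(0, 128 * 1024), (128 * 1024, 1024 * 1024)]

per_page = count_steps(extents, PAGE_SIZE)
per_extent = count_steps(extents, MAX_FOLIO_SIZE)
print(per_page, per_extent)
```

In this model the per-page walk needs one iteration per 4 KiB page,
while the extent-based walk needs one iteration per extent as long as
each extent fits within a single folio-sized chunk.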

Additionally, the primary objective of this set of patches is to prepare
for converting the buffered I/O process for regular files to the iomap
infrastructure. These patches decouple the buffer_head from the main
extent-moving process, restricting its use to only the helpers
mext_folio_mkwrite() and mext_folio_mkuptodate(), which handle updating
and marking pages in the swapped page cache as dirty. The overall coding
style of the extent-moving process aligns with the iomap infrastructure,
laying the foundation for supporting online defragmentation once the
iomap infrastructure is adopted.

Patch overview:

Patch 1:    Fix an off-by-one issue.
Patch 2:    Fix a minor issue related to validity checking.
Patch 3-5:  Introduce a sequence counter for the mapping extent status
            tree; this also prepares for the iomap infrastructure.
Patch 6-8:  Refactor the mext_check_arguments() helper function and the
            validity checking to improve code readability.
Patch 9-13: Drop move_extent_per_page() and switch to using the new
            mext_move_extent(). Additionally, add support for large
            folios.
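
The general idea behind a sequence counter on a mapping cache (patches
3-5) can be sketched with a toy userspace model in Python. This is not
the ext4 implementation: the class, its methods, and the revalidation
helper are hypothetical names chosen for illustration. The assumption
is simply that every modification bumps a counter, so a caller that
sampled the counter together with a mapping can later detect whether
the mapping may have changed.

```python
class ToyExtentStatusTree:
    """Hypothetical stand-in for a per-inode mapping cache."""

    def __init__(self):
        self.seq = 0        # bumped on every modification
        self.mapping = {}   # logical block -> physical block

    def insert(self, lblk, pblk):
        self.mapping[lblk] = pblk
        self.seq += 1

    def remove(self, lblk):
        if lblk in self.mapping:
            del self.mapping[lblk]
            self.seq += 1

    def lookup(self, lblk):
        """Return (pblk, seq) so the caller can revalidate later."""
        return self.mapping.get(lblk), self.seq


def mapping_still_valid(tree, sampled_seq):
    """True if the tree has not been modified since seq was sampled."""
    return tree.seq == sampled_seq


tree = ToyExtentStatusTree()
tree.insert(0, 100)
pblk, seq = tree.lookup(0)

# ... the caller drops its lock and prepares folios here ...

tree.insert(1, 200)     # a concurrent change bumps the counter
if not mapping_still_valid(tree, seq):
    pblk, seq = tree.lookup(0)   # the caller must look up again
```

A single counter is conservative: an unrelated change also forces a
new lookup, but a stale mapping is never used silently.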

With this patch set, the efficiency of online defragmentation on the
ext4 file system also improves under general circumstances. Below is a
set of typical test results obtained using the fio e4defrag ioengine in
an environment with an Intel Xeon Gold 6240 CPU, 400G of memory, and an
NVMe SSD device.

  [defrag]
  directory=/mnt
  filesize=400G
  buffered=1
  fadvise_hint=0
  ioengine=e4defrag
  bs=4k         # 4k,32k,128k
  donorname=test.def
  filename=test
  inplace=0
  rw=write
  overwrite=0   # 0 for unwritten extent and 1 for written extent
  numjobs=1
  iodepth=1
  runtime=30s

  [w/o]                                  # without this patch set
   U 4k:    IOPS=225k,  BW=877MiB/s      # U: unwritten extent-moving
   U 32k:   IOPS=33.2k, BW=1037MiB/s
   U 128k:  IOPS=8510,  BW=1064MiB/s
   M 4k:    IOPS=19.8k, BW=77.2MiB/s     # M: written extent-moving
   M 32k:   IOPS=2502,  BW=78.2MiB/s
   M 128k:  IOPS=635,   BW=79.5MiB/s

  [w]                                    # with this patch set
   U 4k:    IOPS=246k,  BW=963MiB/s
   U 32k:   IOPS=209k,  BW=6529MiB/s
   U 128k:  IOPS=146k,  BW=17.8GiB/s
   M 4k:    IOPS=19.5k, BW=76.2MiB/s
   M 32k:   IOPS=4091,  BW=128MiB/s
   M 128k:  IOPS=2814,  BW=352MiB/s 
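
To make the comparison easier to read, the bandwidth ratios can be
computed from the figures above. The short Python sketch below only
does arithmetic on the reported numbers; it assumes 1 GiB = 1024 MiB
when converting the one result reported in GiB/s, and the names used
are illustrative.

```python
# Bandwidth in MiB/s, copied from the [w/o] and [w] results above.
BW_WITHOUT = {
    "U 4k": 877.0, "U 32k": 1037.0, "U 128k": 1064.0,
    "M 4k": 77.2, "M 32k": 78.2, "M 128k": 79.5,
}
BW_WITH = {
    "U 4k": 963.0, "U 32k": 6529.0,
    "U 128k": 17.8 * 1024,   # reported as 17.8GiB/s
    "M 4k": 76.2, "M 32k": 128.0, "M 128k": 352.0,
}


def speedup(case):
    """Ratio of bandwidth with the patch set to bandwidth without it."""
    return BW_WITH[case] / BW_WITHOUT[case]


for case in BW_WITHOUT:
    print(f"{case:7s} {speedup(case):6.2f}x")
```

By this calculation the 128k unwritten case improves by roughly 17x and
the 128k written case by roughly 4.4x, while the 4k written case is
essentially unchanged (about 0.99x).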


Best Regards,
Yi.


Zhang Yi (13):
  ext4: fix an off-by-one issue during moving extents
  ext4: correct the checking of quota files before moving extents
  ext4: introduce seq counter for the extent status entry
  ext4: make ext4_es_lookup_extent() pass out the extent seq counter
  ext4: pass out extent seq counter when mapping blocks
  ext4: use EXT4_B_TO_LBLK() in mext_check_arguments()
  ext4: add mext_check_validity() to do basic check
  ext4: refactor mext_check_arguments()
  ext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate()
  ext4: introduce mext_move_extent()
  ext4: switch to using the new extent movement method
  ext4: add large folios support for moving extents
  ext4: add two trace points for moving extents

 fs/ext4/ext4.h              |   3 +
 fs/ext4/extents.c           |   2 +-
 fs/ext4/extents_status.c    |  27 +-
 fs/ext4/extents_status.h    |   2 +-
 fs/ext4/inode.c             |  28 +-
 fs/ext4/ioctl.c             |  10 -
 fs/ext4/move_extent.c       | 773 ++++++++++++++++--------------------
 fs/ext4/super.c             |   1 +
 include/trace/events/ext4.h |  97 ++++-
 9 files changed, 486 insertions(+), 457 deletions(-)

-- 
2.46.1


Thread overview: 32+ messages
2025-09-25  9:25 [PATCH v2 00/13] ext4: optimize online defragment Zhang Yi
2025-09-25  9:25 ` [PATCH v2 01/13] ext4: fix an off-by-one issue during moving extents Zhang Yi
2025-09-25  9:25 ` [PATCH v2 02/13] ext4: correct the checking of quota files before " Zhang Yi
2025-10-08 11:17   ` Jan Kara
2025-09-25  9:25 ` [PATCH v2 03/13] ext4: introduce seq counter for the extent status entry Zhang Yi
2025-10-08 11:26   ` Jan Kara
2025-10-08 11:44   ` Jan Kara
2025-10-09  6:52     ` Zhang Yi
2025-09-25  9:26 ` [PATCH v2 04/13] ext4: make ext4_es_lookup_extent() pass out the extent seq counter Zhang Yi
2025-10-08 11:28   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 05/13] ext4: pass out extent seq counter when mapping blocks Zhang Yi
2025-10-08 11:36   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 06/13] ext4: use EXT4_B_TO_LBLK() in mext_check_arguments() Zhang Yi
2025-10-08 11:45   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 07/13] ext4: add mext_check_validity() to do basic check Zhang Yi
2025-10-08 11:47   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 08/13] ext4: refactor mext_check_arguments() Zhang Yi
2025-10-08 11:51   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 09/13] ext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate() Zhang Yi
2025-10-08 11:51   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 10/13] ext4: introduce mext_move_extent() Zhang Yi
2025-10-08 12:16   ` Jan Kara
2025-09-25  9:26 ` [PATCH v2 11/13] ext4: switch to using the new extent movement method Zhang Yi
2025-10-08 12:49   ` Jan Kara
2025-10-09  7:20     ` Zhang Yi
2025-10-09  9:14       ` Jan Kara
2025-10-09 12:24         ` Zhang Yi
2025-09-25  9:26 ` [PATCH v2 12/13] ext4: add large folios support for moving extents Zhang Yi
2025-10-08 12:53   ` Jan Kara
2025-10-09  7:23     ` Zhang Yi
2025-09-25  9:26 ` [PATCH v2 13/13] ext4: add two trace points " Zhang Yi
2025-10-08 12:54   ` Jan Kara
