linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v3 00/12] ext4: optimize online defragment
@ 2025-10-10 10:33 Zhang Yi
  2025-10-10 10:33 ` [PATCH v3 01/12] ext4: correct the checking of quota files before moving extents Zhang Yi
                   ` (11 more replies)
  0 siblings, 12 replies; 16+ messages in thread
From: Zhang Yi @ 2025-10-10 10:33 UTC
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	yi.zhang, yi.zhang, libaokun1, yukuai3, yangerkun

From: Zhang Yi <yi.zhang@huawei.com>

Changes since v2:
 - Rebase the patches onto 6.18-5472d60c129f.
 - Patch 02, add a TODO comment: we should optimize how the extent
   sequence counter is incremented in ext4_es_insert_extent() in the
   future, as Jan suggested.
 - Patch 09, add a WARN_ON_ONCE if ext4_swap_extents() returns
   successfully but the swapped length is shorter than required. Also,
   copy data if some extents have been swapped to prevent data loss.
   Finally, fix the comment as Jan suggested.
 - Patch 10, fix the increment of moved_len in ext4_move_extents() as
   Jan pointed out.
 - Patch 11, fix potential overflow issues on the left shift as Jan
   pointed out.
 - Add review tags from Jan to patches 01-08 and 11-12.
Changes since v1:
 - Fix the syzbot issues reported in v1 by adjusting the order of
   parameter checks in mext_check_validity() in patches 07 and 08.

v2: https://lore.kernel.org/linux-ext4/20250925092610.1936929-1-yi.zhang@huaweicloud.com/
v1: https://lore.kernel.org/linux-ext4/20250923012724.2378858-1-yi.zhang@huaweicloud.com/


Original Description:

Currently, online defragmentation in ext4 is primarily implemented
through the move extent operation in the kernel. This operation works
at PAGE_SIZE granularity, iteratively swapping extents and moving data,
which is quite inefficient. Now that ext4 supports large folios,
iterating at PAGE_SIZE granularity is no longer practical and fails to
leverage the advantages of large folios. Additionally, the current
implementation is tightly coupled with buffer_head, so it cannot be
supported once the buffered I/O paths are converted to the iomap
infrastructure.

This patch set (based on 6.17-rc7) optimizes the extent-moving process,
deprecates the old move_extent_per_page() interface, and introduces a
new mext_move_extent() interface. The new interface iterates over and
copies data based on the extents of the original file rather than in
PAGE_SIZE units, and supports large folios. The data processing logic
within each iteration remains largely consistent with the previous
implementation, with no additional optimizations or changes.
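
To illustrate why the iteration unit matters, here is a minimal Python
sketch (not kernel code) that counts loop iterations for the two
strategies. The 4 KiB page size, the helper names, and the extent layout
are assumptions made up for the example, and the sketch ignores locking,
folio boundaries, and the requested length.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def steps_per_page(extents, page_size=PAGE_SIZE):
    """Iterations needed when every extent is walked one page at a time."""
    return sum((length + page_size - 1) // page_size for _, length in extents)


def steps_per_extent(extents):
    """Iterations needed when each extent is handled in a single pass."""
    return len(extents)


# Hypothetical file layout: (byte offset, byte length) of each extent.
extents = [
    (0, 128 * 1024),
    (128 * 1024, 32 * 1024),
    (160 * 1024, 4 * 1024),
]

print("per-page iterations:  ", steps_per_page(extents))
print("per-extent iterations:", steps_per_extent(extents))
```

For this layout the per-page walk needs 32 + 8 + 1 = 41 iterations while
the per-extent walk needs 3, which is the kind of gap the larger block
sizes in the test below are meant to expose.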

Additionally, the primary objective of this set of patches is to prepare
for converting the buffered I/O process for regular files to the iomap
infrastructure. These patches decouple buffer_head from the main
extent-moving process, restricting its use to the helpers
mext_folio_mkwrite() and mext_folio_mkuptodate(), which bring folios in
the swapped page cache up to date and mark them dirty. The overall coding
style of the extent-moving process aligns with the iomap infrastructure,
laying the foundation for supporting online defragmentation once the
iomap infrastructure is adopted.

Patch overview:

Patch 1:    Fix a minor issue related to validity checking.
Patch 2-4:  Introduce a sequence counter for the mapping extent status
            tree; this also prepares for the iomap infrastructure.
Patch 5-7:  Refactor the mext_check_arguments() helper function and the
            validity checking to improve code readability.
Patch 8-12: Drop move_extent_per_page() and switch to using the new
            mext_move_extent(). Additionally, add support for large
            folios.
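
The cover letter does not spell out how the sequence counter from
patches 2-4 is consumed, so the following Python sketch only shows the
generic pattern such a counter enables: sample the counter together with
a mapping, and treat the mapping as stale if the counter has moved by the
time it is used. The class and function names are hypothetical and do not
correspond to the ext4 code.

```python
class ExtentStatusTree:
    """Toy stand-in for a per-inode extent cache with a change counter."""

    def __init__(self):
        self.seq = 0
        self.extents = {}  # logical block -> (physical block, length)

    def insert(self, lblk, pblk, length):
        self.extents[lblk] = (pblk, length)
        self.seq += 1  # every modification bumps the counter

    def lookup(self, lblk):
        """Return the cached mapping together with the counter value seen."""
        return self.extents.get(lblk), self.seq


def mapping_still_valid(tree, seen_seq):
    """A mapping sampled at seen_seq is stale once the counter has moved."""
    return tree.seq == seen_seq


tree = ExtentStatusTree()
tree.insert(0, 1000, 8)

mapping, seq = tree.lookup(0)
valid_before = mapping_still_valid(tree, seq)

tree.insert(8, 2000, 8)  # simulated concurrent change to the tree
valid_after = mapping_still_valid(tree, seq)

print(mapping, valid_before, valid_after)
```

A caller that finds the mapping stale would look it up again before
acting on it, rather than holding a lock across the whole operation.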

With this patch set, the efficiency of online defragmentation for the
ext4 file system also improves under general circumstances. Below is a
set of typical test results obtained using the fio e4defrag ioengine on
a machine with an Intel Xeon Gold 6240 CPU, 400G of memory, and an NVMe
SSD.

  [defrag]
  directory=/mnt
  filesize=400G
  buffered=1
  fadvise_hint=0
  ioengine=e4defrag
  bs=4k         # 4k,32k,128k
  donorname=test.def
  filename=test
  inplace=0
  rw=write
  overwrite=0   # 0 for unwritten extent and 1 for written extent
  numjobs=1
  iodepth=1
  runtime=30s

  [w/o]
   U 4k:    IOPS=225k,  BW=877MiB/s      # U: unwritten extent-moving
   U 32k:   IOPS=33.2k, BW=1037MiB/s
   U 128k:  IOPS=8510,  BW=1064MiB/s
   M 4k:    IOPS=19.8k, BW=77.2MiB/s     # M: written extent-moving
   M 32k:   IOPS=2502,  BW=78.2MiB/s
   M 128k:  IOPS=635,   BW=79.5MiB/s

  [w]
   U 4k:    IOPS=246k,  BW=963MiB/s
   U 32k:   IOPS=209k,  BW=6529MiB/s
   U 128k:  IOPS=146k,  BW=17.8GiB/s
   M 4k:    IOPS=19.5k, BW=76.2MiB/s
   M 32k:   IOPS=4091,  BW=128MiB/s
   M 128k:  IOPS=2814,  BW=352MiB/s 
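
As a quick sanity check on these numbers, the Python sketch below
recomputes bandwidth as IOPS times block size and derives the speedup
for each case. The IOPS values are copied from the results above; the
helper names are made up for the example, and the recomputed bandwidth
only matches approximately because the reported IOPS are rounded.

```python
KIB = 1024
MIB = 1024 * 1024


def bandwidth_mib_s(iops, block_size):
    """Bandwidth in MiB/s implied by an IOPS figure and a block size."""
    return iops * block_size / MIB


# (IOPS without the series, IOPS with the series), from the results above.
results = {
    ("U", "4k"): (225_000, 246_000),
    ("U", "32k"): (33_200, 209_000),
    ("U", "128k"): (8_510, 146_000),
    ("M", "4k"): (19_800, 19_500),
    ("M", "32k"): (2_502, 4_091),
    ("M", "128k"): (635, 2_814),
}


def speedup(kind, bs):
    """Ratio of IOPS with the series to IOPS without it."""
    before, after = results[(kind, bs)]
    return after / before


for (kind, bs) in results:
    print(f"{kind} {bs}: {speedup(kind, bs):.2f}x")

# Cross-check one reported bandwidth: U 128k without the series.
print(f"{bandwidth_mib_s(8_510, 128 * KIB):.2f} MiB/s")
```

The gain grows with block size: roughly 17x for 128k unwritten extents
and roughly 4.4x for 128k written extents, while the 4k written case is
essentially unchanged.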

Best Regards,
Yi.


Zhang Yi (12):
  ext4: correct the checking of quota files before moving extents
  ext4: introduce seq counter for the extent status entry
  ext4: make ext4_es_lookup_extent() pass out the extent seq counter
  ext4: pass out extent seq counter when mapping blocks
  ext4: use EXT4_B_TO_LBLK() in mext_check_arguments()
  ext4: add mext_check_validity() to do basic check
  ext4: refactor mext_check_arguments()
  ext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate()
  ext4: introduce mext_move_extent()
  ext4: switch to using the new extent movement method
  ext4: add large folios support for moving extents
  ext4: add two trace points for moving extents

 fs/ext4/ext4.h              |   3 +
 fs/ext4/extents.c           |   2 +-
 fs/ext4/extents_status.c    |  31 +-
 fs/ext4/extents_status.h    |   2 +-
 fs/ext4/inode.c             |  28 +-
 fs/ext4/ioctl.c             |  10 -
 fs/ext4/move_extent.c       | 780 +++++++++++++++++-------------------
 fs/ext4/super.c             |   1 +
 include/trace/events/ext4.h |  97 ++++-
 9 files changed, 499 insertions(+), 455 deletions(-)

-- 
2.46.1


Thread overview: 16+ messages
2025-10-10 10:33 [PATCH v3 00/12] ext4: optimize online defragment Zhang Yi
2025-10-10 10:33 ` [PATCH v3 01/12] ext4: correct the checking of quota files before moving extents Zhang Yi
2025-10-10 10:33 ` [PATCH v3 02/12] ext4: introduce seq counter for the extent status entry Zhang Yi
2025-10-10 10:33 ` [PATCH v3 03/12] ext4: make ext4_es_lookup_extent() pass out the extent seq counter Zhang Yi
2025-10-10 10:33 ` [PATCH v3 04/12] ext4: pass out extent seq counter when mapping blocks Zhang Yi
2025-10-10 10:33 ` [PATCH v3 05/12] ext4: use EXT4_B_TO_LBLK() in mext_check_arguments() Zhang Yi
2025-10-10 10:33 ` [PATCH v3 06/12] ext4: add mext_check_validity() to do basic check Zhang Yi
2025-10-10 10:33 ` [PATCH v3 07/12] ext4: refactor mext_check_arguments() Zhang Yi
2025-10-10 10:33 ` [PATCH v3 08/12] ext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate() Zhang Yi
2025-10-10 10:33 ` [PATCH v3 09/12] ext4: introduce mext_move_extent() Zhang Yi
2025-10-10 13:38   ` Jan Kara
2025-10-11  1:20     ` Zhang Yi
2025-10-10 10:33 ` [PATCH v3 10/12] ext4: switch to using the new extent movement method Zhang Yi
2025-10-10 13:41   ` Jan Kara
2025-10-10 10:33 ` [PATCH v3 11/12] ext4: add large folios support for moving extents Zhang Yi
2025-10-10 10:33 ` [PATCH v3 12/12] ext4: add two trace points for moving extents Zhang Yi
