From: Wengang Wang <wen.gang.wang@oracle.com>
To: linux-xfs@vger.kernel.org
Cc: wen.gang.wang@oracle.com
Subject: [PATCH 0/9] introduce defrag to xfs_spaceman
Date: Tue, 9 Jul 2024 12:10:19 -0700
Message-ID: <20240709191028.2329-1-wen.gang.wang@oracle.com>
This patch set introduces a defrag command to xfs_spaceman. It has the functionality and
features below (this text is also meant to be added to the man page, so please review):
defrag [-f free_space] [-i idle_time] [-s segment_size] [-n] [-a]
defrag defragments the specified XFS file online and non-exclusively. The target XFS
file system does not have to be unmounted (and must not be). While defragmentation is in
progress, file IOs are served 'in parallel'. The reflink feature must be enabled on the XFS
file system.
Defragmentation and file IOs
The target file is virtually divided into many small segments. Segments are the
smallest units of defragmentation. Segments are defragmented one by one in a
lock->defragment->unlock->idle manner. File IOs are blocked while the target file is
locked and are served during the defragmentation idle time (while the file is unlocked).
Though the file IOs can't really go in parallel, they are not blocked for long. The locking
time basically depends on the segment size. Smaller segments usually need less locking
time, so IOs are blocked for a shorter time; bigger segments usually need more locking
time, so IOs are blocked longer. Use the -s and -i options to balance defragmentation
against IO service.
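To get a feel for the -s / -i trade-off, here is a rough sketch of the arithmetic. It
assumes the file is cut into fixed-size segments and that none of them is skipped; the
tool itself may pick segments differently, and the function names are made up for this
illustration. Under those assumptions a 200GiB file with 16MiB segments gives 12800
segments, and a 250ms idle period after each one adds up to 3200 seconds of idle time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of segments if the file were cut into fixed-size pieces.
 * Assumption for illustration only: the real tool may build segments
 * from the extent map and skip some of them.
 */
uint64_t segment_count(uint64_t file_size, uint64_t segment_size)
{
	assert(segment_size > 0);
	/* Round up so that a trailing partial segment is counted. */
	return (file_size + segment_size - 1) / segment_size;
}

/* Total idle time in milliseconds: one idle period per segment. */
uint64_t total_idle_ms(uint64_t segments, uint64_t idle_ms)
{
	return segments * idle_ms;
}
```

Halving the segment size doubles the number of idle periods, so shorter lock times are
paid for with a longer overall run.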
Temporary file
A temporary file is used for the defragmentation. The temporary file is created in the
same directory as the target file and is named ".xfsdefrag_<pid>". It is a sparse
file and holds one defragmentation segment at a time. The temporary file is removed
automatically when defragmentation is done or is cancelled by ctrl-c. It remains if
the kernel crashes while defragmentation is going on. In that case, the temporary file
has to be removed manually.
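As an illustration of the naming rule, the temporary path can be derived from the target
path like this. This is only a sketch; the helper name defrag_tmp_path() is hypothetical
and is not taken from the patches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Build "<directory of target>/.xfsdefrag_<pid>" into buf.
 * Returns 0 on success, -1 if buf is too small.
 * Hypothetical helper, for illustration only.
 */
int defrag_tmp_path(const char *target, pid_t pid, char *buf, size_t len)
{
	const char *slash;
	int n;

	assert(target != NULL && buf != NULL);

	slash = strrchr(target, '/');
	if (slash)
		/* Copy the directory part, then append the temporary name. */
		n = snprintf(buf, len, "%.*s/.xfsdefrag_%ld",
			     (int)(slash - target), target, (long)pid);
	else
		/* No directory part: the target is in the current directory. */
		n = snprintf(buf, len, ".xfsdefrag_%ld", (long)pid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

After a kernel crash, a leftover file with this name pattern next to the target file is
the one to remove by hand.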
Free blocks consumption
Defragmentation works by (trying to) allocate new (contiguous) blocks, copying data and
then freeing the old (non-contiguous) blocks. Usually the number of old blocks freed equals
the number of newly allocated blocks, so as a final result defragmentation doesn't
consume free blocks. That holds only if the target file is not sharing blocks with
other files. If the target file contains shared blocks, those shared blocks won't
be freed back to the file system because they are still owned by other files, so
defragmentation allocates more blocks than it frees. On an existing XFS, free blocks might
already be over-committed from when reflink snapshots were created. To avoid pushing the
XFS into a low free blocks state, defragmentation excludes (partially) shared segments
when the file system free blocks drop below a threshold. See the -f option.
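The rule above boils down to a small decision per segment. The sketch below shows one way
to express it; the function names are hypothetical, and using statvfs(3) to read the free
space is an assumption made for this example, not necessarily how the patches query it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Convert a free-block count to whole MiB (rounds down). */
uint64_t free_mib(uint64_t free_blocks, uint64_t block_size)
{
	return free_blocks * block_size / (1024 * 1024);
}

/*
 * Read the free space of the file system holding 'path'.
 * Assumption: statvfs(3) is used here only for illustration.
 */
int fs_free_mib(const char *path, uint64_t *out)
{
	struct statvfs sv;

	assert(out != NULL);
	if (statvfs(path, &sv) != 0)
		return -1;
	*out = free_mib((uint64_t)sv.f_bavail, (uint64_t)sv.f_frsize);
	return 0;
}

/*
 * Segments without shared blocks are always eligible. Segments with
 * shared blocks are skipped once free space is below the threshold.
 */
bool should_defrag_segment(uint64_t free_space_mib, uint64_t threshold_mib,
			   bool has_shared)
{
	if (!has_shared)
		return true;
	return free_space_mib >= threshold_mib;
}
```

With the default -f value of 1024, a (partially) shared segment is skipped as soon as
less than 1GiB is free, while unshared segments keep being defragmented.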
Safety and consistency
Defragmentation is guaranteed to be safe: the file data stays consistent if the run is
interrupted by ctrl-c or by a kernel crash.
First extent share
The current kernel runs a routine for each segment defragmentation that detects whether
the file is sharing blocks. It takes a long time when the target file contains a huge
number of extents and the shared ones, if there are any, are at the end. The First extent
share feature works around this issue by making the first several blocks shared. Seeing
that the first blocks are shared, the kernel routine ends quickly. The side effect is that
the "share" flag remains set on the target file. This feature is enabled by default and
can be disabled with the -n option.
extsize and cowextsize
According to the kernel implementation, extsize and cowextsize can have the following
impacts on defragmentation: 1) A non-zero extsize causes a separate block allocation for
each extent in the segment, and those blocks are not contiguous. The segment keeps the same
number of extents after defragmentation (no effect). 2) When extsize and/or cowextsize
are too big, a lot of pre-allocated blocks remain in memory for a while. When new IO
comes to those pre-allocated blocks, Copy on Write happens and causes the file to become
fragmented.
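Both hints can be read from user space with the FS_IOC_FSGETXATTR ioctl, which fills a
struct fsxattr. The sketch below shows such a check; the helper names are hypothetical,
and whether the patches read the hints this way is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/fs.h>
#include <sys/ioctl.h>

/*
 * Fetch the extsize and cowextsize hints of an open file.
 * Returns 0 on success, -1 if the ioctl fails (errno is set).
 */
int get_extsize_hints(int fd, unsigned int *extsize, unsigned int *cowextsize)
{
	struct fsxattr fsx;

	assert(extsize != NULL && cowextsize != NULL);
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;
	*extsize = fsx.fsx_extsize;
	*cowextsize = fsx.fsx_cowextsize;
	return 0;
}

/* Per point 1) above, any non-zero extsize is worth a warning. */
bool extsize_may_hinder_defrag(unsigned int extsize)
{
	return extsize != 0;
}
```

A caller could print a warning and continue when extsize_may_hinder_defrag() returns
true, since the run is harmless but is not expected to reduce the extent count.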
Readahead
Readahead tries to fetch the data blocks of the next segment in the background during
idle time, with less locking. This feature is disabled by default; use -a to enable it.
The command takes the following options:
-f free_space
The threshold of XFS free blocks in MiB. When free blocks are less than this
number, (partially) shared segments are excluded from defragmentation. The default
is 1024.
-i idle_time
The idle time in milliseconds. Defragmentation stays idle for this long after
defragmenting a segment and before handling the next one. Default number is TOBEDONE.
-s segment_size
The size limit of segments, in bytes. The minimum is 4MiB and the default
is 16MiB.
-n Disable the First extent share feature. Enabled by default.
-a Enable readahead feature, disabled by default.
We tested with a real customer metadump using several different idle_time values and found
that 250ms is a good sleep time in practice. Here are some numbers from the test:
Test: running defrag on the image file that backs a block device in a
virtual machine. At the same time, fio is running inside the virtual machine
on that block device.
block device type: NVME
File size: 200GiB
parameters to defrag: free_space: 1024 idle_time: 250 First_extent_share: enabled readahead: disabled
Defrag run time: 223 minutes
Number of extents: 6745489(before) -> 203571(after)
Fio read latency: 15.72ms(without defrag) -> 14.53ms(during defrag)
Fio write latency: 32.21ms(without defrag) -> 20.03ms(during defrag)
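To put the extent numbers in perspective, the small sketch below derives the relative
reduction and the average extent size. It assumes the 200GiB file is fully allocated
(no holes), which the test description does not state; the function names are made up
for this example. With the figures above, the extent count drops by about 97%, and the
average extent grows from about 31KiB to about 1MiB.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage by which the extent count dropped. */
double extent_reduction_pct(uint64_t before, uint64_t after)
{
	assert(before > 0 && after <= before);
	return 100.0 * (double)(before - after) / (double)before;
}

/*
 * Average extent size in bytes (integer division, rounds down).
 * Assumption: the whole file size is backed by allocated extents.
 */
uint64_t avg_extent_bytes(uint64_t file_size, uint64_t extents)
{
	assert(extents > 0);
	return file_size / extents;
}
```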
Wengang Wang (9):
xfsprogs: introduce defrag command to spaceman
spaceman/defrag: pick up segments from target file
spaceman/defrag: defrag segments
spaceman/defrag: ctrl-c handler
spaceman/defrag: exclude shared segments on low free space
spaceman/defrag: workaround kernel xfs_reflink_try_clear_inode_flag()
spaceman/defrag: sleeps between segments
spaceman/defrag: readahead for better performance
spaceman/defrag: warn on extsize
spaceman/Makefile | 2 +-
spaceman/defrag.c | 788 ++++++++++++++++++++++++++++++++++++++++++++++
spaceman/init.c | 1 +
spaceman/space.h | 1 +
4 files changed, 791 insertions(+), 1 deletion(-)
create mode 100644 spaceman/defrag.c
--
2.39.3 (Apple Git-146)