From: Dave Chinner <david@fromorbit.com>
To: Wengang Wang <wen.gang.wang@oracle.com>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH 2/9] spaceman/defrag: pick up segments from target file
Date: Wed, 17 Jul 2024 14:11:16 +1000
Message-ID: <ZpdEZOWDbg5SKauo@dread.disaster.area>
In-Reply-To: <65CF7656-6B69-47A3-90E4-462E052D2543@oracle.com>
On Tue, Jul 16, 2024 at 08:23:35PM +0000, Wengang Wang wrote:
> > Ok, so this is a linear iteration of all extents in the file that
> > filters extents for the specific "segment" that is going to be
> > processed. I still have no idea why fixed length segments are
> > important, but "linear extent scan for filtering" seems somewhat
> > expensive.
>
> Hm… they aren't strictly fixed length segments; rather, the segment
> size can't exceed a limit, i.e. segment.ds_length <= LIMIT.
Which is effectively fixed length segments....
> Larger segments take longer (with the file locked) to defrag. The
> segment size limit is a way of balancing defrag against parallel
> IO latency.
Yes, I know why you've done it. These were the same arguments made a
while back for a new way of cloning files on XFS. We solved those
problems with a small change to the locking, without needing new
ioctls or lots of new code to solve the "clone blocks concurrent IO"
problem.
I'm looking at this from exactly the same POV. The code presented is
doing lots of complex, unusable stuff to work around the fact that
UNSHARE blocks concurrent IO. I don't see any difference between
CLONE and UNSHARE from the IO perspective - if anything UNSHARE can
have looser rules than CLONE, because a concurrent write will either
do the COW of a shared block itself, or hit the exclusive block that
has already been unshared.
So if we fix these locking issues in the kernel, then the whole need
for working around the IO concurrency problems with UNSHARE goes
away and the userspace code becomes much, much simpler.
> > Indeed, if you used FIEMAP, you can pass a minimum
> > segment length to filter out all the small extents. Iterating that
> > extent list means all the ranges you need to defrag are in the holes
> > of the returned mapping information. This would be much faster
> > than an entire linear mapping to find all the regions with small
> > extents that need defrag. The second step could then be doing a
> > fine grained mapping of each region that we now know either contains
> > fragmented data or holes....
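To make the two-pass idea concrete, here is a rough sketch (illustration
only, not xfs_spaceman code; the function name and the representation of
extents as sorted `(start, length)` block tuples are assumptions, and the
length filtering is done in userspace here regardless of whether the
kernel mapping call could do it): once only the extents at or above the
filter length are kept, the regions that need a fine-grained second
mapping pass are exactly the gaps between them.

```python
# Sketch: filter out the large extents first; the "holes" between them
# are the only regions that can contain fragmented data (or real holes)
# and therefore deserve a fine-grained mapping pass.
# Extents are (start_block, length_in_blocks) tuples, sorted by start.

def defrag_candidate_ranges(extents, file_len, min_len):
    """Return (start, length) ranges not covered by extents >= min_len."""
    big = [(s, l) for s, l in extents if l >= min_len]
    ranges = []
    pos = 0
    for start, length in big:
        if start > pos:
            ranges.append((pos, start - pos))  # gap before this big extent
        pos = start + length
    if pos < file_len:
        ranges.append((pos, file_len - pos))   # tail gap after last big extent
    return ranges
```

For the six-extent layout discussed below (lengths 1, 2049, 2, 2050, 1,
2051 laid out contiguously) and a 2048-block filter, this yields just the
three single-digit gaps around the large extents, rather than requiring a
linear walk of every extent in the file.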
>
> Hm… just a question here:
> With your approach, say you set the filter length to 2048: all extents of 2048 blocks or fewer are to be defragmented.
> What if the extent layout is like this (extent number: length in blocks):
>
> 1. 1
> 2. 2049
> 3. 2
> 4. 2050
> 5. 1
> 6. 2051
>
> In above case, do you do defrag or not?
The filtering presented in the patch above will not defrag any of
this with a 2048 block segment size, because the second extent in
each segment extends beyond the configured maximum segment length.
IOWs, it ends up with a single extent per "2048 block segment", and
that won't get defragged by the current algorithm.
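That behaviour can be sketched like this (a simplified illustration of
the segment packing described above, not the patch's actual code; the
greedy packing rule and the "2+ extents per segment" defrag criterion
are assumptions made for the example):

```python
# Hedged sketch: extents are accumulated into a segment until adding the
# next one would exceed the segment size limit; a segment is only worth
# defragmenting if it ends up holding two or more extents.

def pack_segments(extent_lens, limit):
    """Greedily pack extent lengths into segments of at most `limit` blocks.

    An extent longer than `limit` occupies a segment by itself, since an
    extent cannot be split. Returns a list of segments, each a list of
    extent lengths."""
    segments, cur, cur_len = [], [], 0
    for length in extent_lens:
        if cur and cur_len + length > limit:
            segments.append(cur)
            cur, cur_len = [], 0
        cur.append(length)
        cur_len += length
    if cur:
        segments.append(cur)
    return segments

def segments_to_defrag(extent_lens, limit):
    """Count segments containing more than one extent."""
    return sum(1 for seg in pack_segments(extent_lens, limit) if len(seg) > 1)
```

With the layout above (1, 2049, 2, 2050, 1, 2051) and a 2048-block
limit, every segment ends up holding a single extent, so nothing gets
defragged; with a 4096-block limit, the small extents pack together with
their large neighbours and those segments do qualify.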
As it is, this really isn't a common fragmentation pattern for a
file that does not contain shared extents, so I wouldn't expect to
ever need to decide if this needs to be defragged or not.
However, it is exactly the layout I would expect to see for cloned
and modified filesystem image files. That is, the common layout for
such a "cloned from golden image" VM image is this:
1. 1 written
2. 2049 shared
3. 2 written
4. 2050 shared
5. 1 written
6. 2051 shared
i.e. there are large chunks of contiguous shared extents between the
small individual COW block modifications that have been made to
customise the image for the deployed VM.
Either way, if the segment/filter length is 2048 blocks, then this
isn't a pattern that should be defragmented. If the segment/filter
length is 4096 or larger, then yes, this pattern should definitely
be defragmented.
> As I understand the situation, the performance of defrag itself is
> not a critical concern here.
Sure, but implementing a low-performing, CPU-hungry, entirely
single-threaded defragmentation model that requires specific tuning
for every environment it runs in doesn't seem like the best idea to
me.
I'm trying to work out if there is a faster, simpler way of
achieving the same goal....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com