From: Brian Foster <bfoster@redhat.com>
To: Chris Dunlop <chris@onthe.net.au>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Extreme fragmentation ho!
Date: Tue, 22 Dec 2020 08:03:59 -0500
Message-ID: <20201222130359.GA2805699@bfoster>
In-Reply-To: <20201221215453.GA1886598@onthe.net.au>

On Tue, Dec 22, 2020 at 08:54:53AM +1100, Chris Dunlop wrote:
> Hi,
>
> I have a 2T file fragmented into 841891 randomly placed extents. It takes
> 4-6 minutes (depending on what else the filesystem is doing) to delete the
> file. This is causing a timeout in the application doing the removal, and
> hilarity ensues.
>
> The fragmentation is the result of reflinking bits and bobs from other files
> into the subject file, so it's probably unavoidable.
>
> The file is sitting on XFS on LV on a raid6 comprising 6 x 5400 RPM HDD:
>
> # xfs_info /home
> meta-data=/dev/mapper/vg00-home  isize=512    agcount=32, agsize=244184192 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=1
>          =                       reflink=1
> data     =                       bsize=4096   blocks=7813893120, imaxpct=5
>          =                       sunit=128    swidth=512 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=521728, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> I'm guessing the time taken to remove is not unreasonable given the speed of
> the underlying storage and the amount of metadata involved. Does my guess
> seem correct?
>
> I'd like to do some experimentation with a facsimile of this file, e.g. try
> the remove on different storage subsystems, and/or with an external fast
> journal etc., to see how they compare.
>
> What is the easiest way to recreate a similarly (or even better,
> identically) fragmented file?
>
> One way would be to use xfs_metadump / xfs_mdrestore to create an entire
> copy of the original filesystem, but I'd really prefer not taking the
> original fs offline for the time required. I also don't have the space to
> restore the whole fs but perhaps using lvmthin can address the restore
> issue, at the cost of a slight(?) performance impact due to the extra layer.
>

Note that xfs_metadump captures only metadata, not file data, so it
might actually be the most time- and space-efficient way to replicate the
large file. You would need a similarly sized block device to restore to,
and you would not be able to change filesystem geometry and whatnot. The
size requirement is easy to work around by restoring the image to a file
on a smaller fs, though that may or may not interfere with whatever
performance testing you're doing.

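For reference, the metadump route can be sketched as below. This is only a
minimal sketch under assumptions: the device name is taken from the xfs_info
output above, the dump and image file names are made up, and the helper
functions are hypothetical. Per the xfs_metadump man page, the source should
be unmounted, mounted read-only, or frozen while the dump runs. The script
defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def metadump_commands(source_dev, dump_path, image_path):
    """Return argv lists that dump fs metadata and restore it to a file."""
    return [
        # Metadata-only dump of the source filesystem; -g shows progress.
        ["xfs_metadump", "-g", source_dev, dump_path],
        # Restore the dump into a regular file instead of a block device.
        ["xfs_mdrestore", dump_path, image_path],
    ]


def run(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # File names here are placeholders, not anything from the original setup.
    run(metadump_commands("/dev/mapper/vg00-home", "home.metadump", "home.img"))
```

The restored image can then be loop-mounted for the unlink test. The
filesystem holding the image still has to support a file as large as the
original fs, even though only the metadata blocks get written.
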
> Is it possible to use the output of xfs_bmap on the original file to drive
> ...something, maybe xfs_io, to recreate the fragmentation? A naive test
> using xfs_io pwrite didn't produce any fragmentation - unsurprisingly, given
> the effort XFS puts into reducing fragmentation.
>

fstests has a helper program (xfstests-dev/src/punch-alternating) that
helps create fragmented files. IIRC, you create a fully allocated file
in advance and it punches out alternating ranges based on the
offset/size parameters. You might have to wait a bit for it to complete,
but it's pretty easy to use (and you can always create a metadump image
from the result for quicker restoration).
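
To illustrate the idea (this is not the fstests source, just a rough sketch
of the same technique), the following punches a hole in every other block of
an already allocated file using fallocate(2) with FALLOC_FL_PUNCH_HOLE. The
function names and default parameters are my own assumptions, and the ctypes
call assumes 64-bit Linux.

```python
import ctypes
import ctypes.util
import os

# Values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def alternating_ranges(file_size, block_size=4096, interval=2,
                       hole_blocks=1, start_block=0):
    """Return (offset, length) byte ranges to punch, one per interval."""
    step = interval * block_size
    length = hole_blocks * block_size
    return [(off, min(length, file_size - off))
            for off in range(start_block * block_size, file_size, step)]


def punch_alternating(path, **kwargs):
    """Punch the ranges from alternating_ranges() out of an existing file."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        for off, length in alternating_ranges(size, **kwargs):
            if libc.fallocate(fd, mode, off, length) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```

With the defaults, each remaining allocated block is separated from the next
by a hole, so a fully allocated file ends up with roughly one extent per two
blocks.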

Yet another option might be to try a write workload that attempts to
defeat the allocator heuristics. For example, issue direct I/O or falloc
requests in reverse order and in small sizes across a file. xfs_io has a
couple of flags you can pass to pwrite (e.g., -B to write backwards, -R
to write at random offsets) to make that easier, but that's more manual
and you may have to play around with it to get the behavior you want.

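A minimal sketch of the reverse-order falloc variant, assuming small
fixed-size chunks issued from the end of the file toward the start. The
function names and the 4096-byte default are assumptions, and there is no
guarantee the allocator won't still place or merge the chunks contiguously,
so check the result with xfs_bmap.

```python
import os


def reverse_chunks(total_size, chunk_size):
    """Return (offset, length) pairs covering the file, last chunk first."""
    return [(off, min(chunk_size, total_size - off))
            for off in reversed(range(0, total_size, chunk_size))]


def fallocate_backwards(path, total_size, chunk_size=4096):
    """Preallocate a file in small chunks, walking from the end to offset 0."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for off, length in reverse_chunks(total_size, chunk_size):
            # One small allocation request per chunk, highest offset first.
            os.posix_fallocate(fd, off, length)
    finally:
        os.close(fd)
```
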
Brian

> Cheers,
>
> Chris
>

Thread overview: 5+ messages
2020-12-21 21:54 Extreme fragmentation ho! Chris Dunlop
2020-12-22 13:03 ` Brian Foster [this message]
2020-12-28 22:06 ` Dave Chinner
2020-12-30 6:28 ` Chris Dunlop
2020-12-30 22:03 ` Dave Chinner