From: Chris Dunlop <chris@onthe.net.au>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: fstrim and strace considered harmful?
Date: Thu, 19 May 2022 08:36:06 +1000 [thread overview]
Message-ID: <20220518223606.GA1343027@onthe.net.au> (raw)
In-Reply-To: <YoUXxBe1d7b29wif@magnolia>
On Wed, May 18, 2022 at 08:59:00AM -0700, Darrick J. Wong wrote:
> On Wed, May 18, 2022 at 05:07:13PM +1000, Chris Dunlop wrote:
>> Oh, sorry... on linux v5.15.34
>>
>> On Wed, May 18, 2022 at 04:59:49PM +1000, Chris Dunlop wrote:
>>> I have an fstrim that's been running for over 48 hours on a 256T
>>> thin-provisioned XFS fs containing around 55T of actual data on a slow
>>> subsystem (ceph 8,3 erasure-encoded rbd). I don't think there would be
>>> an enormous amount of data to trim, maybe a few T, but I've no idea how
>>> long it might be expected to take. In an attempt to see what the fstrim
>>> was doing, I ran an strace on it. The strace has been sitting there
>>> without output and unkillable since then, now 5+ hours ago. Since the
>>> strace, on that same filesystem I now have 123 df processes and 615 rm
>>> processes -- and growing -- that are blocked in xfs_inodegc_flush,
>>> e.g.:
...
> It looks like the storage device is stalled on the discard, and most
> everything else is stuck waiting for buffer locks? The statfs threads
> are the same symptom as last time.
Note: the box has been rebooted and it's back to normal after an anxious
30 minutes waiting for the mount recovery. (Not an entirely wasted 30
minutes - what a thrilling stage of the Giro d'Italia!)
I'm not sure if the fstrim was stalled, unless the strace had stalled it
somehow: it had been running for ~48 hours without apparent issues before
the strace was attached, and then it was another hour before the first
process stuck on xfs_inodegc_flush appeared.
The open question is: what caused the stuck processes? It's possible the
strace was involved: the stuck process with the earliest start time, a
"df", was started an hour after the strace, and it's entirely plausible
that it was the very first df or rm issued after the strace. However, it's
also plausible that was a coincidence and the strace had nothing to do
with it. Indeed, it's even plausible the fstrim had nothing to do with the
stuck processes and something else entirely is going on: I don't know if
there's a ticking time bomb somewhere in the system.
It's now no mystery to me why the fstrim was taking so long, nor why the
strace didn't produce any output: it turns out fstrim, without an explicit
--offset/--length range, issues a single FITRIM ioctl() covering the whole
filesystem, start to end, and without an explicit --minimum it uses
/sys/block/xxx/queue/discard_granularity as the minimum extent size to
discard, in this case 64kB. So it would have been issuing a metric
shit-ton of discard requests to the underlying storage, something close
to:
  (fs-size - fs-used) / discard-size
  = (256T - 26T) / 64k
  = 3,858,759,680 requests
It was after figuring out all that that I hit the reset.
Note: it turns out the actual used space per the filesystem is 26T, whilst
the underlying storage shows 55T used, i.e. there's 29T of real discards
to process. With this ceph rbd storage I don't know if a "real" discard
takes any more or less time than a discard to already-unoccupied storage.
Next time I'll issue the fstrim in much smaller increments, e.g. perhaps
128G at first, and use a --minimum that matches the underlying object size
(4MB), then experiment and monitor to work out what parameters work best
for this system.
Cheers,
Chris - older, wiser, a little more sleep deprived
Thread overview: 8+ messages
2022-05-18 6:59 fstrim and strace considered harmful? Chris Dunlop
2022-05-18 7:07 ` Chris Dunlop
2022-05-18 15:59 ` Darrick J. Wong
2022-05-18 22:36 ` Chris Dunlop [this message]
2022-05-19 0:50 ` Dave Chinner
2022-05-19 2:33 ` Chris Dunlop
2022-05-19 6:33 ` Dave Chinner
2022-05-19 15:25 ` Chris Murphy