public inbox for linux-xfs@vger.kernel.org
From: Chris Dunlop <chris@onthe.net.au>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: fstrim and strace considered harmful?
Date: Thu, 19 May 2022 08:36:06 +1000	[thread overview]
Message-ID: <20220518223606.GA1343027@onthe.net.au> (raw)
In-Reply-To: <YoUXxBe1d7b29wif@magnolia>

On Wed, May 18, 2022 at 08:59:00AM -0700, Darrick J. Wong wrote:
> On Wed, May 18, 2022 at 05:07:13PM +1000, Chris Dunlop wrote:
>> Oh, sorry... on linux v5.15.34
>>
>> On Wed, May 18, 2022 at 04:59:49PM +1000, Chris Dunlop wrote:
>>> I have an fstrim that's been running for over 48 hours on a 256T thin
>>> provisioned XFS fs containing around 55T of actual data on a slow
>>> subsystem (ceph 8,3 erasure-coded rbd). I don't think there would be
>>> an enormous amount of data to trim, maybe a few T, but I've no idea
>>> how long it might be expected to take. In an attempt to see what the
>>> fstrim was doing, I ran an strace on it. The strace has been sitting
>>> there without output and unkillable since then, now 5+ hours ago.
>>> Since the strace, on that same filesystem I now have 123 df processes
>>> and 615 rm processes -- and growing -- that are blocked in
>>> xfs_inodegc_flush, e.g.:
...
> It looks like the storage device is stalled on the discard, and most
> everything else is stuck waiting for buffer locks?  The statfs threads
> are the same symptom as last time.

Note: the box has been rebooted and it's back to normal after an anxious 
30 minutes waiting for the mount recovery. (Not an entirely wasted 30 
minutes - what a thrilling stage of the Giro d'Italia!)

I'm not sure if the fstrim was stalled, unless the strace had stalled it 
somehow: it had been running for ~48 hours without apparent issues before 
the strace was attached, and then it was another hour before the first 
process stuck on xfs_inodegc_flush appeared.

The open question is what caused the stuck processes. It's possible the 
strace was involved: the stuck process with the earliest start time, a 
"df", was started an hour after the strace, and it's entirely plausible 
that was the very first df or rm issued after the strace. However, it's 
also plausible that was a coincidence and the strace had nothing to do 
with it. Indeed, it's even plausible the fstrim had nothing to do with 
the stuck processes and something else entirely is going on: I don't 
know if there's a ticking time bomb somewhere in the system.

It's now no mystery to me why the fstrim was taking so long, nor why the 
strace didn't produce any output: it turns out fstrim, without an explicit 
--offset --length range, issues a single ioctl() to trim from the start of 
the device to the end, and without an explicit --minimum, uses 
/sys/block/xxx/queue/discard_granularity as the minimum block size to 
discard, in this case 64kB. So it would have been issuing a metric 
shit-ton of discard requests to the underlying storage, something close 
to:

   (fs-size - fs-used) / discard-size
   = (256T - 26T) / 64kB
   = 3,858,759,680 requests
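As a sanity check, the arithmetic above can be reproduced in the shell
(sizes as stated: 256T filesystem, 26T used per the fs, 64kB
discard_granularity):

```shell
# Back-of-envelope worst-case discard request count for a whole-device fstrim
TiB=$((1024 * 1024 * 1024 * 1024))   # bytes per T
fs_size=$((256 * TiB))               # filesystem size
fs_used=$((26 * TiB))                # used space per the filesystem
discard=$((64 * 1024))               # 64kB discard_granularity
echo $(( (fs_size - fs_used) / discard ))
# 3858759680
```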

It was after figuring out all that that I hit the reset.

Note: it turns out the actual used space per the filesystem is 26T, whilst 
the underlying storage shows 55T used, i.e. there's 29T of real discards 
to process. With this ceph rbd storage I don't know if a "real" discard 
takes any more or less time than a discard to already-unoccupied storage. 

Next time I'll issue the fstrim in much smaller increments, e.g. 128G at 
first, and use a --minimum that matches the underlying object size (4MB). 
Then play around and monitor it to work out what parameters work best for 
this system.
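That incremental approach might look something like the sketch below.
This is untested against this filesystem; the mount point, the 128G step,
the 1T total covered here, and the 4M minimum are all assumptions to be
tuned. It prints each command rather than running it; drop the "echo" to
actually trim.

```shell
#!/bin/sh
# Sketch: trim a large fs in fixed-size chunks rather than one huge ioctl,
# so each fstrim invocation returns (and can be interrupted) in bounded time.
GiB=$((1024 * 1024 * 1024))
MNT=/mnt/point           # hypothetical mount point
STEP=$((128 * GiB))      # 128G per fstrim invocation
TOTAL=$((1024 * GiB))    # only cover the first 1T in this sketch
MIN=4M                   # match the 4MB ceph rbd object size

offset=0
while [ "$offset" -lt "$TOTAL" ]; do
    echo fstrim --offset "$offset" --length "$STEP" --minimum "$MIN" "$MNT"
    offset=$((offset + STEP))
done
# prints 8 fstrim command lines (1T / 128G)
```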

Cheers,

Chris - older, wiser, a little more sleep deprived

Thread overview: 8+ messages
2022-05-18  6:59 fstrim and strace considered harmful? Chris Dunlop
2022-05-18  7:07 ` Chris Dunlop
2022-05-18 15:59   ` Darrick J. Wong
2022-05-18 22:36     ` Chris Dunlop [this message]
2022-05-19  0:50       ` Dave Chinner
2022-05-19  2:33         ` Chris Dunlop
2022-05-19  6:33           ` Dave Chinner
2022-05-19 15:25         ` Chris Murphy
