From: "Darrick J. Wong" <djwong@kernel.org>
To: Chris Dunlop <chris@onthe.net.au>
Cc: linux-xfs@vger.kernel.org
Subject: Re: fstrim and strace considered harmful?
Date: Wed, 18 May 2022 08:59:00 -0700
Message-ID: <YoUXxBe1d7b29wif@magnolia>
In-Reply-To: <20220518070713.GA1238882@onthe.net.au>
On Wed, May 18, 2022 at 05:07:13PM +1000, Chris Dunlop wrote:
> Oh, sorry... on linux v5.15.34
>
> On Wed, May 18, 2022 at 04:59:49PM +1000, Chris Dunlop wrote:
> > Hi,
> >
> > I have an fstrim that's been running for over 48 hours on a 256T
> > thin-provisioned XFS fs containing around 55T of actual data on a
> > slow storage subsystem (ceph 8,3 erasure-coded rbd). I don't think
> > there would be an enormous amount of data to trim, maybe a few T,
> > but I've no idea how long it might be expected to take. In an
> > attempt to see what the fstrim was doing, I ran an strace on it.
> > The strace has been sitting there without output, and unkillable,
> > since then, now 5+ hours ago. Since the strace, on that same
> > filesystem I now have 123 df processes and 615 rm processes -- and
> > growing -- that are blocked in xfs_inodegc_flush, e.g.:
> >
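For context: fstrim(8) drives all of this through a single FITRIM
ioctl, which XFS services in xfs_ioc_trim as one long-running call, so
there is no userspace-visible progress to trace. A minimal sketch of
what fstrim does, using this report's /vol mount point and with error
handling pared down:

    /* fitrim.c: ask the kernel to trim all free space on /vol */
    #include <fcntl.h>
    #include <limits.h>
    #include <linux/fs.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    int main(void)
    {
            struct fstrim_range range = {
                    .start  = 0,
                    .len    = ULLONG_MAX,  /* whole filesystem */
                    .minlen = 0,           /* discard every free extent */
            };
            int fd = open("/vol", O_RDONLY);

            if (fd < 0 || ioctl(fd, FITRIM, &range) < 0) {
                    perror("FITRIM");
                    return 1;
            }
            /* on success the kernel writes back the bytes trimmed */
            printf("%llu bytes trimmed\n", (unsigned long long)range.len);
            return 0;
    }

Since the whole 256T is covered by that one ioctl, strace attaches and
then has nothing to report until the syscall returns.
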
> > May 18 15:31:52 d5 kernel: task:df state:D stack: 0 pid:31741 ppid: 1 flags:0x00004004
> > May 18 15:31:52 d5 kernel: Call Trace:
> > May 18 15:31:52 d5 kernel: <TASK>
> > May 18 15:31:52 d5 kernel: __schedule+0x241/0x740
> > May 18 15:31:52 d5 kernel: ? lock_is_held_type+0x97/0x100
> > May 18 15:31:52 d5 kernel: schedule+0x3a/0xa0
> > May 18 15:31:52 d5 kernel: schedule_timeout+0x271/0x310
> > May 18 15:31:52 d5 kernel: ? find_held_lock+0x2d/0x90
> > May 18 15:31:52 d5 kernel: ? sched_clock_cpu+0x9/0xa0
> > May 18 15:31:52 d5 kernel: ? lock_release+0x214/0x350
> > May 18 15:31:52 d5 kernel: wait_for_completion+0x7b/0xc0
> > May 18 15:31:52 d5 kernel: __flush_work+0x217/0x350
> > May 18 15:31:52 d5 kernel: ? flush_workqueue_prep_pwqs+0x120/0x120
> > May 18 15:31:52 d5 kernel: ? wait_for_completion+0x1c/0xc0
> > May 18 15:31:52 d5 kernel: xfs_inodegc_flush.part.24+0x62/0xc0 [xfs]
> > May 18 15:31:52 d5 kernel: xfs_fs_statfs+0x37/0x1a0 [xfs]
> > May 18 15:31:52 d5 kernel: statfs_by_dentry+0x3c/0x60
> > May 18 15:31:52 d5 kernel: vfs_statfs+0x16/0xd0
> > May 18 15:31:52 d5 kernel: user_statfs+0x44/0x80
> > May 18 15:31:52 d5 kernel: __do_sys_statfs+0x10/0x30
> > May 18 15:31:52 d5 kernel: do_syscall_64+0x34/0x80
> > May 18 15:31:52 d5 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> > May 18 15:31:52 d5 kernel: RIP: 0033:0x7fe9e9db3c07
> > May 18 15:31:52 d5 kernel: RSP: 002b:00007ffe08f50178 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
> > May 18 15:31:52 d5 kernel: RAX: ffffffffffffffda RBX: 0000555963fcae40 RCX: 00007fe9e9db3c07
> > May 18 15:31:52 d5 kernel: RDX: 00007ffe08f50400 RSI: 00007ffe08f50180 RDI: 0000555963fcae40
> > May 18 15:31:52 d5 kernel: RBP: 00007ffe08f50180 R08: 0000555963fcae80 R09: 0000000000000000
> > May 18 15:31:52 d5 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe08f50220
> > May 18 15:31:52 d5 kernel: R13: 0000000000000000 R14: 0000555963fcae80 R15: 0000555963fcae40
> > May 18 15:31:52 d5 kernel: </TASK>
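
For readers following the trace above: on this kernel, xfs_fs_statfs
calls xfs_inodegc_flush so that the block and inode counts it reports
include any inode inactivations still queued in the background, which
means every df waits for the inodegc workers -- and here those workers
themselves appear to be stuck behind the fstrim. Each blocked df boils
down to a single statfs(2) call; a minimal sketch against this
report's /vol mount point:

    /* dfstat.c: the syscall each blocked df process is sleeping in */
    #include <stdio.h>
    #include <sys/vfs.h>

    int main(void)
    {
            struct statfs st;

            /* sits in D state until xfs_inodegc_flush() completes */
            if (statfs("/vol", &st) != 0) {
                    perror("statfs");
                    return 1;
            }
            printf("blocks %llu free %llu (bsize %ld)\n",
                   (unsigned long long)st.f_blocks,
                   (unsigned long long)st.f_bfree, (long)st.f_bsize);
            return 0;
    }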
> >
> > Full 1.5M sysrq output at: https://file.io/bWOL8F7mzKI6
From the sysrq output, here's the fstrim thread itself, stuck waiting
for a discard to complete:

task:fstrim state:D stack: 0 pid: 3552 ppid: 2091 flags:0x00004006
Call Trace:
<TASK>
__schedule+0x241/0x740
schedule+0x3a/0xa0
schedule_timeout+0x1c9/0x310
? del_timer_sync+0x90/0x90
io_schedule_timeout+0x19/0x40
wait_for_completion_io_timeout+0x75/0xd0
submit_bio_wait+0x63/0x90
? wait_for_completion_io_timeout+0x1f/0xd0
blkdev_issue_discard+0x6a/0xa0
? _raw_spin_unlock+0x1f/0x30
xfs_trim_extents+0x1a7/0x3d0 [xfs]
xfs_ioc_trim+0x161/0x1e0 [xfs]
xfs_file_ioctl+0x914/0xbf0 [xfs]
? __do_sys_newfstat+0x2d/0x40
__x64_sys_ioctl+0x71/0xb0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fa84e61ae57
RSP: 002b:00007ffe90fa1da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe90fa1f10 RCX: 00007fa84e61ae57
RDX: 00007ffe90fa1db0 RSI: 00000000c0185879 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe90fa3d10
R13: 00007ffe90fa3d10 R14: 0000000000000000 R15: 00007fa84e1fdff8
</TASK>
It looks like the storage device is stalled on the discard, and almost
everything else is stuck waiting for buffer locks? Most likely that's
because xfs_trim_extents holds the AGF buffer locked for the AG it's
trimming while it waits for each discard to complete, so anything that
needs that AG's free space headers piles up behind the slow discards.
The statfs threads are the same symptom as last time.
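
If you want to confirm the device is still chewing through discards
rather than hung outright, something like this should show it (the
iostat discard columns need a reasonably recent sysstat; substitute
whichever rbd device /dev/vg01/vol sits on):

# lsblk --discard /dev/vg01/vol
# iostat -x 5    (watch the d/s and d_await discard columns)

For future runs, fstrim -m <size> skips free extents smaller than
<size> and can cut the number of discards dramatically.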
--D
> >
> > That stack trace is uncomfortably familiar:
> >
> > Subject: Highly reflinked and fragmented considered harmful?
> > https://lore.kernel.org/linux-xfs/20220509024659.GA62606@onthe.net.au/
> >
> > FYI:
> >
> > # xfs_info /vol
> > meta-data=/dev/vg01/vol      isize=512     agcount=257, agsize=268434432 blks
> >          =                   sectsz=4096   attr=2, projid32bit=1
> >          =                   crc=1         finobt=1, sparse=1, rmapbt=1
> >          =                   reflink=1     bigtime=1 inobtcount=1
> > data     =                   bsize=4096    blocks=68719475712, imaxpct=1
> >          =                   sunit=1024    swidth=8192 blks
> > naming   =version 2          bsize=4096    ascii-ci=0, ftype=1
> > log      =internal log       bsize=4096    blocks=521728, version=2
> >          =                   sectsz=4096   sunit=1 blks, lazy-count=1
> > realtime =none               extsz=4096    blocks=0, rtextents=0
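
(For scale: data blocks=68719475712 x bsize=4096 is the ~256T quoted
above, and agsize=268434432 blocks x 4096 is ~1T, so this fstrim walks
the free-space btrees of 257 one-terabyte allocation groups, one AG at
a time.)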
> >
> > Is there something I can do to "unstick" things, or is it time to
> > hit the reset button and hope the recovery on mount isn't onerous?
> >
> > Aside from that immediate issue, what has gone wrong here?
> >
> > Cheers,
> >
> > Chris
Thread overview: 8+ messages
2022-05-18 6:59 fstrim and strace considered harmful? Chris Dunlop
2022-05-18 7:07 ` Chris Dunlop
2022-05-18 15:59 ` Darrick J. Wong [this message]
2022-05-18 22:36 ` Chris Dunlop
2022-05-19 0:50 ` Dave Chinner
2022-05-19 2:33 ` Chris Dunlop
2022-05-19 6:33 ` Dave Chinner
2022-05-19 15:25 ` Chris Murphy