From: "J. Bruce Fields" <bfields@fieldses.org>
To: Timothy Pearson <tpearson@raptorengineering.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
Date: Fri, 23 Jul 2021 17:01:24 -0400
Message-ID: <20210723210124.GD1278@fieldses.org>
In-Reply-To: <1288667080.5652.1625478421955.JavaMail.zimbra@raptorengineeringinc.com>
On Mon, Jul 05, 2021 at 04:47:01AM -0500, Timothy Pearson wrote:
> Forgot to add -- sometimes, right before the core stall and backtrace, we see messages similar to the following:
>
> [16825.408854] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000051f43ff7 xid 2e0c9b7a
> [16825.414070] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000051f43ff7 xid 2f0c9b7a
> [16825.414360] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000051f43ff7 xid 300c9b7a
That suggests either that the server has a bug or that it's receiving TCP
data that's corrupted somehow.
--b.
>
> We're not sure if they are related or not.
>
> ----- Original Message -----
> > From: "Timothy Pearson" <tpearson@raptorengineering.com>
> > To: "J. Bruce Fields" <bfields@fieldses.org>, "Chuck Lever" <chuck.lever@oracle.com>
> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> > Sent: Monday, July 5, 2021 4:44:29 AM
> > Subject: CPU stall, eventual host hang with BTRFS + NFS under heavy load
>
> > We've been dealing with a fairly nasty NFS-related problem off and on for the
> > past couple of years. The host is a large POWER server with several external
> > SAS arrays attached, using BTRFS for cold storage of large amounts of data.
> > The main symptom is that under heavy sustained NFS write traffic using certain
> > file types (see below) a core will suddenly lock up, continually spewing a
> > backtrace similar to the one I've pasted below. While this immediately halts
> > all NFS traffic to the affected client (which is never the same client as the
> > machine doing the large file transfer), the larger issue is that over the next
> > few minutes / hours the entire host will gradually degrade in responsiveness
> > until it grinds to a complete halt. Once the core stall occurs we have been
> > unable to find any way to restore the machine to full functionality or avoid
> > the degradation and eventual hang short of a hard power down and restart.
> >
> > Tens of GB of compressed data in a single file seems to be fairly good at
> > triggering the problem, whereas raw disk images or other regularly patterned
> > data tend not to be. The underlying hardware is functioning perfectly with no
> > problems noted, and moving the files without NFS avoids the bug.
> >
> > We've been using a workaround involving purposefully pausing (SIGSTOP) the file
> > transfer process on the client as soon as other clients start to show a
> > slowdown. This hack avoids the bug entirely provided the host is allowed to
> > catch back up prior to resuming (SIGCONT) the file transfer process. From
> > this, it seems something is going very wrong within the NFS stack under high
> > storage I/O pressure and high storage write latency (timeout?) -- it should
> > simply pause transfers while the storage subsystem catches up, not lock up a
> > core and force a host restart. Interesting, sometimes it does exactly what it
> > is supposed to and does pause and wait for the storage subsystem, but around
> > 20% of the time it just triggers this bug and stalls a core.
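[The SIGSTOP/SIGCONT workaround described above can be sketched roughly as
follows. This is a hypothetical illustration, not the reporter's actual
tooling: a `sleep` process stands in for the file-transfer process on the
NFS client, and the one-second pause is an assumption (the email suggests
waiting until the host has caught back up before resuming).]

```python
# Hypothetical sketch of the SIGSTOP/SIGCONT workaround described above.
# A real deployment would target the actual file-transfer process on the
# NFS client; here a sleep process stands in for it, and the pause
# duration is an assumption.
import os
import signal
import subprocess
import time

# Stand-in for the long-running file transfer on the client.
transfer = subprocess.Popen(["sleep", "300"])

os.kill(transfer.pid, signal.SIGSTOP)  # pause the writer (SIGSTOP)
time.sleep(1)                          # let the server/storage catch up
os.kill(transfer.pid, signal.SIGCONT)  # resume the transfer (SIGCONT)

transfer.terminate()
transfer.wait()
```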
> >
> > This bug has been present since at least 4.14 and is still present in the latest
> > 5.12.14 version.
> >
> > As the machine is in production, it is difficult to gather further information
> > or test patches, however we would be able to apply patches to the kernel that
> > would potentially restore stability with enough advance scheduling.
> >
> > Sample backtrace below:
> >
> > [16846.426141] rcu: INFO: rcu_sched self-detected stall on CPU
> > [16846.426202] rcu: 32-....: (5249 ticks this GP) idle=78a/1/0x4000000000000002 softirq=1663878/1663878 fqs=1986
> > [16846.426241] (t=5251 jiffies g=2720809 q=756724)
> > [16846.426273] NMI backtrace for cpu 32
> > [16846.426298] CPU: 32 PID: 10624 Comm: kworker/u130:25 Not tainted 5.12.14 #1
> > [16846.426342] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > [16846.426406] Call Trace:
> > [16846.426429] [c000200010823250] [c00000000074e630] dump_stack+0xc4/0x114 (unreliable)
> > [16846.426483] [c000200010823290] [c00000000075aebc] nmi_cpu_backtrace+0xfc/0x150
> > [16846.426506] [c000200010823310] [c00000000075b0a8] nmi_trigger_cpumask_backtrace+0x198/0x1f0
> > [16846.426577] [c0002000108233b0] [c000000000072818] arch_trigger_cpumask_backtrace+0x28/0x40
> > [16846.426621] [c0002000108233d0] [c000000000202db8] rcu_dump_cpu_stacks+0x158/0x1b8
> > [16846.426667] [c000200010823470] [c000000000201828] rcu_sched_clock_irq+0x908/0xb10
> > [16846.426708] [c000200010823560] [c0000000002141d0] update_process_times+0xc0/0x140
> > [16846.426768] [c0002000108235a0] [c00000000022dd34] tick_sched_handle.isra.18+0x34/0xd0
> > [16846.426808] [c0002000108235d0] [c00000000022e1e8] tick_sched_timer+0x68/0xe0
> > [16846.426856] [c000200010823610] [c00000000021577c] __hrtimer_run_queues+0x16c/0x370
> > [16846.426903] [c000200010823690] [c000000000216378] hrtimer_interrupt+0x128/0x2f0
> > [16846.426947] [c000200010823740] [c000000000029494] timer_interrupt+0x134/0x310
> > [16846.426989] [c0002000108237a0] [c000000000016c54] replay_soft_interrupts+0x124/0x2e0
> > [16846.427045] [c000200010823990] [c000000000016f14] arch_local_irq_restore+0x104/0x170
> > [16846.427103] [c0002000108239c0] [c00000000017247c] mod_delayed_work_on+0x8c/0xe0
> > [16846.427149] [c000200010823a20] [c00800000819fe04] rpc_set_queue_timer+0x5c/0x80 [sunrpc]
> > [16846.427234] [c000200010823a40] [c0080000081a096c] __rpc_sleep_on_priority_timeout+0x194/0x1b0 [sunrpc]
> > [16846.427324] [c000200010823a90] [c0080000081a3080] rpc_sleep_on_timeout+0x88/0x110 [sunrpc]
> > [16846.427388] [c000200010823ad0] [c0080000071f7220] nfsd4_cb_done+0x468/0x530 [nfsd]
> > [16846.427457] [c000200010823b60] [c0080000081a0a0c] rpc_exit_task+0x84/0x1d0 [sunrpc]
> > [16846.427520] [c000200010823ba0] [c0080000081a2448] __rpc_execute+0xd0/0x760 [sunrpc]
> > [16846.427598] [c000200010823c30] [c0080000081a2b18] rpc_async_schedule+0x40/0x70 [sunrpc]
> > [16846.427687] [c000200010823c60] [c000000000170bf0] process_one_work+0x290/0x580
> > [16846.427736] [c000200010823d00] [c000000000170f68] worker_thread+0x88/0x620
> > [16846.427813] [c000200010823da0] [c00000000017b860] kthread+0x1a0/0x1b0
> > [16846.427865] [c000200010823e10] [c00000000000d6ec] ret_from_kernel_thread+0x5c/0x70
> > [16873.869180] watchdog: BUG: soft lockup - CPU#32 stuck for 49s! [kworker/u130:25:10624]
> > [16873.869245] Modules linked in: rpcsec_gss_krb5 iscsi_target_mod target_core_user uio target_core_pscsi target_core_file target_core_iblock target_core_mod tun nft_counter nf_tables nfnetlink vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh i2c_dev bridg$
> > linear mlx4_ib ib_uverbs ib_core raid1 md_mod sd_mod t10_pi hid_generic usbhid hid ses enclosure crct10dif_vpmsum crc32c_vpmsum xhci_pci xhci_hcd ixgbe mlx4_core mpt3sas usbcore tg3 mdio_devres of_mdio fixed_phy xfrm_algo mdio libphy aacraid igb raid_cl$
> > [16873.869889] CPU: 32 PID: 10624 Comm: kworker/u130:25 Not tainted 5.12.14 #1
> > [16873.869966] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > [16873.870023] NIP: c000000000711300 LR: c0080000081a0708 CTR: c0000000007112a0
> > [16873.870073] REGS: c0002000108237d0 TRAP: 0900 Not tainted (5.12.14)
> > [16873.870109] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24004842 XER: 00000000
> > [16873.870146] CFAR: c0080000081d8054 IRQMASK: 0
> > GPR00: c0080000081a0748 c000200010823a70 c0000000015c0700 c0000000e2227a40
> > GPR04: c0000000e2227a40 c0000000e2227a40 c000200ffb6cc0a8 0000000000000018
> > GPR08: 0000000000000000 5deadbeef0000122 c0080000081ffd18 c0080000081d8040
> > GPR12: c0000000007112a0 c000200fff7fee00 c00000000017b6c8 c000000090d9ccc0
> > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000040
> > GPR24: 0000000000000000 0000000000000000 fffffffffffffe00 0000000000000001
> > GPR28: c00000001a62f000 c0080000081a0988 c0080000081ffd10 c0000000e2227a00
> > [16873.870452] NIP [c000000000711300] __list_del_entry_valid+0x60/0x100
> > [16873.870507] LR [c0080000081a0708] rpc_wake_up_task_on_wq_queue_action_locked+0x330/0x400 [sunrpc]
Thread overview: 42+ messages
2021-07-05 9:44 CPU stall, eventual host hang with BTRFS + NFS under heavy load Timothy Pearson
2021-07-05 9:47 ` Timothy Pearson
2021-07-23 21:01 ` J. Bruce Fields [this message]
[not found] ` <B4D8C4B7-EE8C-456C-A6C5-D25FF1F3608E@rutgers.edu>
[not found] ` <3A4DF3BB-955C-4301-BBED-4D5F02959F71@rutgers.edu>
2021-08-09 17:06 ` Timothy Pearson
2021-08-09 17:15 ` hedrick
2021-08-09 17:25 ` Timothy Pearson
2021-08-09 17:37 ` Chuck Lever III
[not found] ` <F5179A41-FB9A-4AB1-BE58-C2859DB7EC06@rutgers.edu>
2021-08-09 18:30 ` Timothy Pearson
2021-08-09 18:38 ` hedrick
2021-08-09 18:44 ` Timothy Pearson
2021-08-09 18:49 ` J. Bruce Fields
[not found] ` <15AD846A-4638-4ACF-B47C-8EF655AD6E85@rutgers.edu>
2021-08-09 18:56 ` Timothy Pearson
2021-08-09 20:54 ` Charles Hedrick
2021-08-09 21:49 ` Timothy Pearson
2021-08-09 22:01 ` Charles Hedrick
[not found] ` <1119B476-171F-4C5A-9DEF-184F211A6A98@rutgers.edu>
2021-08-10 16:22 ` Timothy Pearson
2021-08-16 14:43 ` hedrick
2021-08-09 18:30 ` J. Bruce Fields
2021-08-09 18:34 ` hedrick
[not found] ` <413163A6-8484-4170-9877-C0C2D50B13C0@rutgers.edu>
2021-08-10 14:58 ` J. Bruce Fields
2021-07-23 21:00 ` J. Bruce Fields
2021-07-23 21:22 ` Timothy Pearson
2021-07-28 19:51 ` Timothy Pearson
2021-08-02 19:28 ` J. Bruce Fields
2021-08-10 0:43 ` NeilBrown
2021-08-10 0:54 ` J. Bruce Fields
2021-08-12 14:44 ` J. Bruce Fields
2021-08-12 21:36 ` NeilBrown
2021-10-08 20:27 ` Scott Mayhew
2021-10-08 20:53 ` Timothy Pearson
2021-10-08 21:11 ` J. Bruce Fields
2021-10-09 17:33 ` Chuck Lever III
2021-10-11 14:30 ` Bruce Fields
2021-10-11 16:36 ` Chuck Lever III
2021-10-11 21:57 ` NeilBrown
2021-10-14 22:36 ` Trond Myklebust
2021-10-14 22:51 ` NeilBrown
2021-10-15 8:03 ` Trond Myklebust
2021-10-15 8:05 ` Trond Myklebust
2021-12-01 18:36 ` Scott Mayhew
2021-12-01 19:35 ` Trond Myklebust
2021-12-01 20:13 ` Scott Mayhew