public inbox for linux-nfs@vger.kernel.org
From: Leon Romanovsky <leon@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq
Date: Tue, 23 Aug 2022 11:09:31 +0300	[thread overview]
Message-ID: <YwSLOxyEtV4l2Frc@unreal> (raw)
In-Reply-To: <166118222093.3250511.11454048195824271658.stgit@morisot.1015granger.net>

On Mon, Aug 22, 2022 at 11:30:20AM -0400, Chuck Lever wrote:
> While setting up a new lab, I accidentally misconfigured the
> Ethernet port for a system that tried an NFS mount using RoCE.
> This made the NFS server unreachable. The following WARNING
> popped on the NFS client while waiting for the mount attempt to
> time out:
> 
> Aug 20 17:12:05 bazille kernel: workqueue: WQ_MEM_RECLAIM xprtiod:xprt_rdma_connect_worker [rpcrdma] is flushing !WQ_MEM_RECLAI>
> Aug 20 17:12:05 bazille kernel: WARNING: CPU: 0 PID: 100 at kernel/workqueue.c:2628 check_flush_dependency+0xbf/0xca
> Aug 20 17:12:05 bazille kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs 8021q garp stp mrp llc rfkill rpcrdma>
> Aug 20 17:12:05 bazille kernel: CPU: 0 PID: 100 Comm: kworker/u8:8 Not tainted 6.0.0-rc1-00002-g6229f8c054e5 #13
> Aug 20 17:12:05 bazille kernel: Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b 06/12/2017
> Aug 20 17:12:05 bazille kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
> Aug 20 17:12:05 bazille kernel: RIP: 0010:check_flush_dependency+0xbf/0xca
> Aug 20 17:12:05 bazille kernel: Code: 75 2a 48 8b 55 18 48 8d 8b b0 00 00 00 4d 89 e0 48 81 c6 b0 00 00 00 48 c7 c7 65 33 2e be>
> Aug 20 17:12:05 bazille kernel: RSP: 0018:ffffb562806cfcf8 EFLAGS: 00010092
> Aug 20 17:12:05 bazille kernel: RAX: 0000000000000082 RBX: ffff97894f8c3c00 RCX: 0000000000000027
> Aug 20 17:12:05 bazille kernel: RDX: 0000000000000002 RSI: ffffffffbe3447d1 RDI: 00000000ffffffff
> Aug 20 17:12:05 bazille kernel: RBP: ffff978941315840 R08: 0000000000000000 R09: 0000000000000000
> Aug 20 17:12:05 bazille kernel: R10: 00000000000008b0 R11: 0000000000000001 R12: ffffffffc0ce3731
> Aug 20 17:12:05 bazille kernel: R13: ffff978950c00500 R14: ffff97894341f0c0 R15: ffff978951112eb0
> Aug 20 17:12:05 bazille kernel: FS:  0000000000000000(0000) GS:ffff97987fc00000(0000) knlGS:0000000000000000
> Aug 20 17:12:05 bazille kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Aug 20 17:12:05 bazille kernel: CR2: 00007f807535eae8 CR3: 000000010b8e4002 CR4: 00000000003706f0
> Aug 20 17:12:05 bazille kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug 20 17:12:05 bazille kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Aug 20 17:12:05 bazille kernel: Call Trace:
> Aug 20 17:12:05 bazille kernel:  <TASK>
> Aug 20 17:12:05 bazille kernel:  __flush_work.isra.0+0xaf/0x188
> Aug 20 17:12:05 bazille kernel:  ? _raw_spin_lock_irqsave+0x2c/0x37
> Aug 20 17:12:05 bazille kernel:  ? lock_timer_base+0x38/0x5f
> Aug 20 17:12:05 bazille kernel:  __cancel_work_timer+0xea/0x13d
> Aug 20 17:12:05 bazille kernel:  ? preempt_latency_start+0x2b/0x46
> Aug 20 17:12:05 bazille kernel:  rdma_addr_cancel+0x70/0x81 [ib_core]
> Aug 20 17:12:05 bazille kernel:  _destroy_id+0x1a/0x246 [rdma_cm]
> Aug 20 17:12:05 bazille kernel:  rpcrdma_xprt_connect+0x115/0x5ae [rpcrdma]
> Aug 20 17:12:05 bazille kernel:  ? _raw_spin_unlock+0x14/0x29
> Aug 20 17:12:05 bazille kernel:  ? raw_spin_rq_unlock_irq+0x5/0x10
> Aug 20 17:12:05 bazille kernel:  ? finish_task_switch.isra.0+0x171/0x249
> Aug 20 17:12:05 bazille kernel:  xprt_rdma_connect_worker+0x3b/0xc7 [rpcrdma]
> Aug 20 17:12:05 bazille kernel:  process_one_work+0x1d8/0x2d4
> Aug 20 17:12:05 bazille kernel:  worker_thread+0x18b/0x24f
> Aug 20 17:12:05 bazille kernel:  ? rescuer_thread+0x280/0x280
> Aug 20 17:12:05 bazille kernel:  kthread+0xf4/0xfc
> Aug 20 17:12:05 bazille kernel:  ? kthread_complete_and_exit+0x1b/0x1b
> Aug 20 17:12:05 bazille kernel:  ret_from_fork+0x22/0x30
> Aug 20 17:12:05 bazille kernel:  </TASK>
> 
> The xprtiod work queue is WQ_MEM_RECLAIM, so any work queue that
> one of its work items tries to cancel has to be WQ_MEM_RECLAIM to
> prevent a priority inversion.

But why do you have WQ_MEM_RECLAIM in xprtiod?

  1270         wq = alloc_workqueue("xprtiod", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);

IMHO, it would be nicer to remove WQ_MEM_RECLAIM there instead of adding it here.
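
For context on why the splat fires at all: a work item running on a WQ_MEM_RECLAIM workqueue must never wait (flush or cancel synchronously) on a workqueue that lacks the flag, since the non-reclaim queue may be unable to make forward progress while memory is being reclaimed. A minimal kernel-only sketch of the offending dependency (module names are hypothetical, not a drop-in patch; this only compiles in-tree):

```c
/* Hypothetical sketch of the dependency check_flush_dependency() rejects. */
#include <linux/workqueue.h>

static struct workqueue_struct *reclaim_wq; /* WQ_MEM_RECLAIM, like xprtiod   */
static struct workqueue_struct *plain_wq;   /* no flag, like addr_wq pre-patch */

static struct delayed_work plain_work;      /* queued on plain_wq elsewhere    */

static void reclaim_fn(struct work_struct *w)
{
	/*
	 * This runs on reclaim_wq.  Cancelling synchronously may have to
	 * flush plain_wq; check_flush_dependency() then warns because
	 * plain_wq is !WQ_MEM_RECLAIM, so the wait can deadlock under
	 * memory pressure.
	 */
	cancel_delayed_work_sync(&plain_work);
}
```

Adding WQ_MEM_RECLAIM to addr_wq (as the patch below does) satisfies the check from one side; dropping the flag from xprtiod, as suggested above, would resolve it from the other.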

Thanks

> 
> Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  drivers/infiniband/core/addr.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index f253295795f0..5c36d01ebf0b 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -872,7 +872,7 @@ static struct notifier_block nb = {
>  
>  int addr_init(void)
>  {
> -	addr_wq = alloc_ordered_workqueue("ib_addr", 0);
> +	addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM);
>  	if (!addr_wq)
>  		return -ENOMEM;
>  
> 
> 

Thread overview: 15+ messages
2022-08-22 15:30 [PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq Chuck Lever
2022-08-23  8:09 ` Leon Romanovsky [this message]
2022-08-23 13:58   ` Chuck Lever III
2022-08-24  9:20     ` Leon Romanovsky
2022-08-24 14:09       ` Chuck Lever III
2022-08-26 13:29         ` Jason Gunthorpe
2022-08-26 14:02           ` Chuck Lever III
2022-08-26 14:08             ` Jason Gunthorpe
2022-08-26 19:57               ` Chuck Lever III
2022-08-29 16:45                 ` Jason Gunthorpe
2022-08-29 17:14                   ` Chuck Lever III
2022-08-29 17:22                     ` Jason Gunthorpe
2022-08-29 18:15                       ` Chuck Lever III
2022-08-29 18:26                         ` Jason Gunthorpe
2022-08-29 19:31                           ` Chuck Lever III
