Linux RDMA and InfiniBand development
From: Chuck Lever III <chuck.lever@oracle.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq
Date: Fri, 26 Aug 2022 19:57:04 +0000
Message-ID: <FF62F78D-95EE-4BA1-9FC6-4C6B1F355520@oracle.com>
In-Reply-To: <YwjT9yz8reC1HDR/@nvidia.com>



> On Aug 26, 2022, at 10:08 AM, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Fri, Aug 26, 2022 at 02:02:55PM +0000, Chuck Lever III wrote:
> 
>> I see recent commits that do exactly what I've done for the reason I've done it.
>> 
>> 4c4b1996b5db ("IB/hfi1: Fix WQ_MEM_RECLAIM warning")
> 
> No, this one says:
> 
>    The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become
>    entangled with memory reclaim, so this flag is appropriate.
> 
> So it is OK, it is not the same thing as adding WQ_MEM_RECLAIM to a WQ
> that allocates memory.
> 
>> I accept that this might be a long chain to pull, but we need a plan
>> to resolve this. 
> 
> It is not just a long chain, it is something that was never designed
> to even work or thought about. People put storage ULPs on top of this
> and just ignored the problem.
> 
> If someone wants to tackle this then we need a comprehensive patch
> series identifying what functions are safe to call under memory
> reclaim contexts and then fully auditing them that they are actually
> safe.
> 
> Right now I don't even know the basic information what functions the
> storage community need to be reclaim safe.

The connect APIs would be a place to start. In the meantime, though...

Two or three years ago I spent some effort to ensure that closing
an RDMA connection leaves a client-side RPC/RDMA transport with no
RDMA resources associated with it. It releases the CQs, QP, and all
the MRs. That makes initial connect and reconnect both behave exactly
the same, and guarantees that a reconnect does not get stuck with
an old CQ that is no longer working or a QP that is in TIMEWAIT.

However, that does mean that substantial resource allocation is
done on every reconnect.

One way to resolve the check_flush_dependency() splat would be
to have rpcrdma.ko allocate its own workqueue for handling
connections and MR allocation, and leave WQ_MEM_RECLAIM disabled
for it. Basically, replace the use of the xprtiod workqueue for
RPC/RDMA transports.
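
As a rough illustration of why that would silence the splat, here is a
small userspace model of the rule that check_flush_dependency() applies.
It is a sketch, not kernel code: the struct, the flag values, and the
"rpcrdma_conn" workqueue name are hypothetical, the flags assigned to
xprtiod and ib_addr are assumptions taken from this thread, and the
separate PF_MEMALLOC check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the real kernel constants have other values. */
enum {
	MODEL_WQ_MEM_RECLAIM = 1 << 0,
	MODEL_WQ_LEGACY      = 1 << 1,
};

/* Hypothetical stand-in for struct workqueue_struct. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/* Assumed flags: xprtiod is reclaim-safe, ib_addr (addr_wq) is not. */
static const struct model_wq xprtiod_model = { "xprtiod", MODEL_WQ_MEM_RECLAIM };
static const struct model_wq addr_wq_model = { "ib_addr", 0 };
/* Hypothetical dedicated connect workqueue, WQ_MEM_RECLAIM left off. */
static const struct model_wq rpcrdma_conn_model = { "rpcrdma_conn", 0 };

/*
 * Returns true when a work item running on @flusher would trigger the
 * warning by flushing @target. A NULL @flusher means the caller is not
 * a workqueue worker.
 */
static bool flush_would_warn(const struct model_wq *flusher,
			     const struct model_wq *target)
{
	assert(target != NULL);

	/* Flushing a reclaim-safe workqueue is always acceptable. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* Only workqueue workers are subject to this particular check. */
	if (flusher == NULL)
		return false;

	/* A reclaim-safe (non-legacy) worker must not wait on the target. */
	return (flusher->flags & (MODEL_WQ_MEM_RECLAIM | MODEL_WQ_LEGACY)) ==
	       MODEL_WQ_MEM_RECLAIM;
}
```

In this model, flush_would_warn(&xprtiod_model, &addr_wq_model) is true,
which corresponds to the splat, while
flush_would_warn(&rpcrdma_conn_model, &addr_wq_model) is false because
the flushing workqueue no longer claims to be reclaim-safe.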


--
Chuck Lever




Thread overview: 15+ messages
2022-08-22 15:30 [PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq Chuck Lever
2022-08-23  8:09 ` Leon Romanovsky
2022-08-23 13:58   ` Chuck Lever III
2022-08-24  9:20     ` Leon Romanovsky
2022-08-24 14:09       ` Chuck Lever III
2022-08-26 13:29         ` Jason Gunthorpe
2022-08-26 14:02           ` Chuck Lever III
2022-08-26 14:08             ` Jason Gunthorpe
2022-08-26 19:57               ` Chuck Lever III [this message]
2022-08-29 16:45                 ` Jason Gunthorpe
2022-08-29 17:14                   ` Chuck Lever III
2022-08-29 17:22                     ` Jason Gunthorpe
2022-08-29 18:15                       ` Chuck Lever III
2022-08-29 18:26                         ` Jason Gunthorpe
2022-08-29 19:31                           ` Chuck Lever III
