From: Chuck Lever III <chuck.lever@oracle.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq
Date: Fri, 26 Aug 2022 14:02:55 +0000
Message-ID: <08F23441-1532-4F40-9C2A-5DBD61B11483@oracle.com>
In-Reply-To: <YwjKpoVbd1WygWwF@nvidia.com>

> On Aug 26, 2022, at 9:29 AM, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Aug 24, 2022 at 02:09:52PM +0000, Chuck Lever III wrote:
>>
>>
>>> On Aug 24, 2022, at 5:20 AM, Leon Romanovsky <leon@kernel.org> wrote:
>>>
>>> On Tue, Aug 23, 2022 at 01:58:44PM +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On Aug 23, 2022, at 4:09 AM, Leon Romanovsky <leon@kernel.org> wrote:
>>>>>
>>>>> On Mon, Aug 22, 2022 at 11:30:20AM -0400, Chuck Lever wrote:
>>>
>>> <...>
>>>
>>>>>> The xprtiod work queue is WQ_MEM_RECLAIM, so any work queue that
>>>>>> one of its work items tries to cancel has to be WQ_MEM_RECLAIM to
>>>>>> prevent a priority inversion.
>>>>>
>>>>> But why do you have WQ_MEM_RECLAIM in xprtiod?
>>>>
>>>> Because RPC is under a filesystem (NFS). Therefore it has to handle
>>>> writeback demanded by direct reclaim. All of the storage ULPs have
>>>> this constraint, in fact.
>>>
>>> I don't know, this ib_addr workqueue is used when connection is created.
>>
>> Reconnection is exactly when we need to ensure that creating
>> a new connection won't trigger more memory allocation, because
>> that will immediately deadlock.
>>
>> Again, all network storage ULPs have this constraint.
>
> IMHO this whole concept is broken.
>
> The RDMA stack does not operate globally under RECLAIM, nor should it.
>
> If you attempt to do a reconnection/etc from within a RECLAIM context
> it will deadlock on one of the many allocations that are made to
> support opening the connection.
>
> The general idea of reclaim is that the entire task context working
> under the reclaim is marked with an override of the gfp flags to make
> all allocations under that call chain reclaim safe.
>
> But rdmacm does allocations outside this, eg in the WQs processing the
> CM packets. So this doesn't work and we will deadlock.
>
> Fixing it is a big deal and needs more than poking WQ_MEM_RECLAIM here
> and there.
>
> For instance, this patch is just incorrect, you can't use
> WQ_MEM_RECLAIM on a WQ that is doing allocations and expect anything
> useful to happen:
>
> addr_resolve()
>  addr_resolve_neigh()
>   fetch_ha()
>    ib_nl_fetch_ha()
>     ib_nl_ip_send_msg()
>      skb = nlmsg_new(len, GFP_KERNEL);
>
> So regardless of the MEM_RECLAIM the *actual work* being canceled can
> be stuck on a non-reclaim-safe allocation.

I see recent commits that do exactly what I've done for the reason I've done it.

4c4b1996b5db ("IB/hfi1: Fix WQ_MEM_RECLAIM warning")
533d2e8b4d5e ("nvmet-tcp: fix lockdep complaint on nvmet_tcp_wq flush during queue teardown")

I accept that this might be a long chain to pull, but we need a plan
to resolve this. Storage ULPs go to a lot of trouble to pre-allocate
resources to avoid deadlocking in reclaim -- handling reclaim properly
is supposed to be designed into this stack.

--
Chuck Lever
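
For readers following the thread, the rule behind the check_flush_dependency
splat can be sketched as a small userspace model. This is only an
illustration under stated assumptions, not the kernel's code: struct model_wq,
MODEL_WQ_MEM_RECLAIM, model_flush_would_warn() and model_selfcheck() are
hypothetical names invented here, and the model leaves out other conditions
the real check considers. The queue names are taken from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the kernel's actual value. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Minimal stand-in for a workqueue: just a name and its flags. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when flushing or cancelling work on 'target' from a work
 * item running on 'current_wq' would trip the dependency check.
 * 'current_wq' is NULL when the caller is not a workqueue worker.
 */
static bool model_flush_would_warn(const struct model_wq *current_wq,
				   const struct model_wq *target)
{
	/* A reclaim-capable target is assumed able to make progress. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* Ordinary task context: no reclaim dependency is implied. */
	if (current_wq == NULL)
		return false;

	/* A reclaim-safe queue waiting on a non-reclaim-safe one. */
	return (current_wq->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}

/* Usage check mirroring the queues named in this thread. */
static void model_selfcheck(void)
{
	struct model_wq xprtiod = { "xprtiod", MODEL_WQ_MEM_RECLAIM };
	struct model_wq addr_wq = { "ib_addr", 0 };

	/* xprtiod cancelling ib_addr work: the reported splat. */
	assert(model_flush_would_warn(&xprtiod, &addr_wq));

	/* With the flag added to ib_addr, the check no longer fires. */
	addr_wq.flags |= MODEL_WQ_MEM_RECLAIM;
	assert(!model_flush_would_warn(&xprtiod, &addr_wq));
}
```

Note that the model only captures when the warning fires. As the quoted reply
points out, setting the flag does not by itself make a GFP_KERNEL allocation
inside the work item reclaim-safe.
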
Thread overview: 15+ messages
2022-08-22 15:30 [PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq Chuck Lever
2022-08-23 8:09 ` Leon Romanovsky
2022-08-23 13:58 ` Chuck Lever III
2022-08-24 9:20 ` Leon Romanovsky
2022-08-24 14:09 ` Chuck Lever III
2022-08-26 13:29 ` Jason Gunthorpe
2022-08-26 14:02 ` Chuck Lever III [this message]
2022-08-26 14:08 ` Jason Gunthorpe
2022-08-26 19:57 ` Chuck Lever III
2022-08-29 16:45 ` Jason Gunthorpe
2022-08-29 17:14 ` Chuck Lever III
2022-08-29 17:22 ` Jason Gunthorpe
2022-08-29 18:15 ` Chuck Lever III
2022-08-29 18:26 ` Jason Gunthorpe
2022-08-29 19:31 ` Chuck Lever III