From: Chuck Lever <cel@kernel.org>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: Olga Kornievskaia <okorniev@redhat.com>,
NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [RFC PATCH v1] svcrdma: Release transport resources synchronously
Date: Wed, 10 Sep 2025 13:26:58 -0400 [thread overview]
Message-ID: <098f4f9d-2055-48cf-9299-a26221a3a8f5@kernel.org> (raw)
In-Reply-To: <CAN-5tyFXySFeOcDorhDcD+oAzBFq_G-48mmxFSQMEik8rEcd8w@mail.gmail.com>
On 9/10/25 1:14 PM, Olga Kornievskaia wrote:
> On Thu, Sep 4, 2025 at 11:48 AM Chuck Lever <cel@kernel.org> wrote:
>>
>> On 9/4/25 11:43 AM, Olga Kornievskaia wrote:
>>> On Wed, Sep 3, 2025 at 11:53 AM Chuck Lever <cel@kernel.org> wrote:
>>>>
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> NFSD has always supported added network listeners. The new netlink
>>>> protocol now enables the removal of listeners.
>>>>
>>>> Olga noticed that if an RDMA listener is removed and immediately
>>>> re-added, the deferred __svc_rdma_free() function might not have
>>>> run yet, so some or all of the old listener's RDMA resources
>>>> linger, which prevents a new listener on the same address from
>>>> being created.
>>>
>>> Does this mean that you prefer to go the way of rdma synchronous
>>> release vs the patch I posted?
>>
>> We could do both. IMO we need to make "remove listener" work while
>> there are still nfsd threads running, and this RFC patch does
>> nothing about that.
>>
>> But as noted below, it looks like the svc_xprt_free() code path assumes
>> that ->xpo_free releases all transport resources synchronously, and
>> there can be consequences if that does not happen. That needs to be
>> addressed somehow.
>>
>>
>>> I'm not against the approach as I have previously noted it as an
>>> alternative which I tested and it also solves the problem. But I still
>>> dont grasp the consequence of making svc_rdma_free() synchronous,
>>> especially for active transports (not listening sockets).
>>
>> I've tested the synchronous approach a little, there didn't seem to
>> be a problem with it. But I agree, the certainty level is not as
>> high as it ought to be.
>
> So what do you think about including this patch? I don't see it in
> your nfsd-testing branch.
As you mentioned earlier, it needs some aggressive testing to ensure
it does not introduce a deadlock.
> Either this patch or my patch fix an existing problem and I believe
> would be beneficial to include now (and backport). A solution for
> removal of listeners on an active server can be worked on top of that.
Agreed, both changes are worth including.
>>>> Also, svc_xprt_free() does a module_put() just after calling
>>>> ->xpo_free(). That means if there is deferred work going on, the
>>>> module could be unloaded before that work is even started,
>>>> resulting in a UAF.
>>>>
>>>> Neil asks:
>>>>> What particular part of __svc_rdma_free() needs to run in order for a
>>>>> subsequent registration to succeed?
>>>>> Can that bit be run directory from svc_rdma_free() rather than be
>>>>> delayed?
>>>>> (I know almost nothing about rdma so forgive me if the answers to these
>>>>> questions seems obvious)
>>>>
>>>> The reasons I can recall are:
>>>>
>>>> - Some of the transport tear-down work can sleep
>>>> - Releasing a cm_id is tricky and can deadlock
>>>>
>>>> We might be able to mitigate the second issue with judicious
>>>> application of transport reference counting.
>>>>
>>>> Reported-by: Olga Kornievskaia <okorniev@redhat.com>
>>>> Suggested-by: NeilBrown <neil@brown.name>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>> net/sunrpc/svc_xprt.c | 1 +
>>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 19 ++++++++-----------
>>>> 2 files changed, 9 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index 8b1837228799..8526bfc3ab20 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -168,6 +168,7 @@ static void svc_xprt_free(struct kref *kref)
>>>> struct svc_xprt *xprt =
>>>> container_of(kref, struct svc_xprt, xpt_ref);
>>>> struct module *owner = xprt->xpt_class->xcl_owner;
>>>> +
>>>> if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags))
>>>> svcauth_unix_info_release(xprt);
>>>> put_cred(xprt->xpt_cred);
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> index 3d7f1413df02..b7b318ad25c4 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> @@ -591,12 +591,18 @@ static void svc_rdma_detach(struct svc_xprt *xprt)
>>>> rdma_disconnect(rdma->sc_cm_id);
>>>> }
>>>>
>>>> -static void __svc_rdma_free(struct work_struct *work)
>>>> +/**
>>>> + * svc_rdma_free - Release class-specific transport resources
>>>> + * @xprt: Generic svc transport object
>>>> + */
>>>> +static void svc_rdma_free(struct svc_xprt *xprt)
>>>> {
>>>> struct svcxprt_rdma *rdma =
>>>> - container_of(work, struct svcxprt_rdma, sc_work);
>>>> + container_of(xprt, struct svcxprt_rdma, sc_xprt);
>>>> struct ib_device *device = rdma->sc_cm_id->device;
>>>>
>>>> + might_sleep();
>>>> +
>>>> /* This blocks until the Completion Queues are empty */
>>>> if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
>>>> ib_drain_qp(rdma->sc_qp);
>>>> @@ -629,15 +635,6 @@ static void __svc_rdma_free(struct work_struct *work)
>>>> kfree(rdma);
>>>> }
>>>>
>>>> -static void svc_rdma_free(struct svc_xprt *xprt)
>>>> -{
>>>> - struct svcxprt_rdma *rdma =
>>>> - container_of(xprt, struct svcxprt_rdma, sc_xprt);
>>>> -
>>>> - INIT_WORK(&rdma->sc_work, __svc_rdma_free);
>>>> - schedule_work(&rdma->sc_work);
>>>> -}
>>>> -
>>>> static int svc_rdma_has_wspace(struct svc_xprt *xprt)
>>>> {
>>>> struct svcxprt_rdma *rdma =
>>>> --
>>>> 2.50.0
>>>>
>>>
>>
>>
>> --
>> Chuck Lever
>>
--
Chuck Lever
prev parent reply other threads:[~2025-09-10 17:27 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-03 15:53 [RFC PATCH v1] svcrdma: Release transport resources synchronously Chuck Lever
2025-09-04 15:43 ` Olga Kornievskaia
2025-09-04 15:48 ` Chuck Lever
2025-09-10 17:14 ` Olga Kornievskaia
2025-09-10 17:26 ` Chuck Lever [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=098f4f9d-2055-48cf-9299-a26221a3a8f5@kernel.org \
--to=cel@kernel.org \
--cc=aglo@umich.edu \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox