Linux NFS development
 help / color / mirror / Atom feed
From: Chuck Lever <cel@kernel.org>
To: Olga Kornievskaia <okorniev@redhat.com>
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [RFC PATCH v1] svcrdma: Release transport resources synchronously
Date: Thu, 4 Sep 2025 11:48:49 -0400	[thread overview]
Message-ID: <5a1f9a16-2373-4e30-b356-42e3af047126@kernel.org> (raw)
In-Reply-To: <CACSpFtB7CSkakYL5FZj_6L4dgj2ybBMVzgqX8kWhZrGBW0GT7Q@mail.gmail.com>

On 9/4/25 11:43 AM, Olga Kornievskaia wrote:
> On Wed, Sep 3, 2025 at 11:53 AM Chuck Lever <cel@kernel.org> wrote:
>>
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> NFSD has always supported added network listeners. The new netlink
>> protocol now enables the removal of listeners.
>>
>> Olga noticed that if an RDMA listener is removed and immediately
>> re-added, the deferred __svc_rdma_free() function might not have
>> run yet, so some or all of the old listener's RDMA resources
>> linger, which prevents a new listener on the same address from
>> being created.
> 
> Does this mean that you prefer to go the way of rdma synchronous
> release vs the patch I posted?

We could do both. IMO we need to make "remove listener" work while
there are still nfsd threads running, and this RFC patch does
nothing about that.

But as noted below, it looks like the svc_xprt_free() code path assumes
that ->xpo_free releases all transport resources synchronously, and
there can be consequences if that does not happen. That needs to be
addressed somehow.


> I'm not against the approach as I have previously noted it as an
> alternative which I tested and it also solves the problem. But I still
> dont grasp the consequence of making svc_rdma_free() synchronous,
> especially for active transports (not listening sockets).

I've tested the synchronous approach a little, there didn't seem to
be a problem with it. But I agree, the certainty level is not as
high as it ought to be.


>> Also, svc_xprt_free() does a module_put() just after calling
>> ->xpo_free(). That means if there is deferred work going on, the
>> module could be unloaded before that work is even started,
>> resulting in a UAF.
>>
>> Neil asks:
>>> What particular part of __svc_rdma_free() needs to run in order for a
>>> subsequent registration to succeed?
>>> Can that bit be run directory from svc_rdma_free() rather than be
>>> delayed?
>>> (I know almost nothing about rdma so forgive me if the answers to these
>>> questions seems obvious)
>>
>> The reasons I can recall are:
>>
>>  - Some of the transport tear-down work can sleep
>>  - Releasing a cm_id is tricky and can deadlock
>>
>> We might be able to mitigate the second issue with judicious
>> application of transport reference counting.
>>
>> Reported-by: Olga Kornievskaia <okorniev@redhat.com>
>> Suggested-by: NeilBrown <neil@brown.name>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  net/sunrpc/svc_xprt.c                    |  1 +
>>  net/sunrpc/xprtrdma/svc_rdma_transport.c | 19 ++++++++-----------
>>  2 files changed, 9 insertions(+), 11 deletions(-)
>>
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index 8b1837228799..8526bfc3ab20 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -168,6 +168,7 @@ static void svc_xprt_free(struct kref *kref)
>>         struct svc_xprt *xprt =
>>                 container_of(kref, struct svc_xprt, xpt_ref);
>>         struct module *owner = xprt->xpt_class->xcl_owner;
>> +
>>         if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags))
>>                 svcauth_unix_info_release(xprt);
>>         put_cred(xprt->xpt_cred);
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 3d7f1413df02..b7b318ad25c4 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -591,12 +591,18 @@ static void svc_rdma_detach(struct svc_xprt *xprt)
>>         rdma_disconnect(rdma->sc_cm_id);
>>  }
>>
>> -static void __svc_rdma_free(struct work_struct *work)
>> +/**
>> + * svc_rdma_free - Release class-specific transport resources
>> + * @xprt: Generic svc transport object
>> + */
>> +static void svc_rdma_free(struct svc_xprt *xprt)
>>  {
>>         struct svcxprt_rdma *rdma =
>> -               container_of(work, struct svcxprt_rdma, sc_work);
>> +               container_of(xprt, struct svcxprt_rdma, sc_xprt);
>>         struct ib_device *device = rdma->sc_cm_id->device;
>>
>> +       might_sleep();
>> +
>>         /* This blocks until the Completion Queues are empty */
>>         if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
>>                 ib_drain_qp(rdma->sc_qp);
>> @@ -629,15 +635,6 @@ static void __svc_rdma_free(struct work_struct *work)
>>         kfree(rdma);
>>  }
>>
>> -static void svc_rdma_free(struct svc_xprt *xprt)
>> -{
>> -       struct svcxprt_rdma *rdma =
>> -               container_of(xprt, struct svcxprt_rdma, sc_xprt);
>> -
>> -       INIT_WORK(&rdma->sc_work, __svc_rdma_free);
>> -       schedule_work(&rdma->sc_work);
>> -}
>> -
>>  static int svc_rdma_has_wspace(struct svc_xprt *xprt)
>>  {
>>         struct svcxprt_rdma *rdma =
>> --
>> 2.50.0
>>
> 


-- 
Chuck Lever

  reply	other threads:[~2025-09-04 15:48 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-03 15:53 [RFC PATCH v1] svcrdma: Release transport resources synchronously Chuck Lever
2025-09-04 15:43 ` Olga Kornievskaia
2025-09-04 15:48   ` Chuck Lever [this message]
2025-09-10 17:14     ` Olga Kornievskaia
2025-09-10 17:26       ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a1f9a16-2373-4e30-b356-42e3af047126@kernel.org \
    --to=cel@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=dai.ngo@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=okorniev@redhat.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox