Linux NFS development
* [RFC PATCH v1] svcrdma: Release transport resources synchronously
@ 2025-09-03 15:53 Chuck Lever
  2025-09-04 15:43 ` Olga Kornievskaia
  0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2025-09-03 15:53 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

NFSD has always supported adding network listeners. The new netlink
protocol now enables the removal of listeners.

Olga noticed that if an RDMA listener is removed and immediately
re-added, the deferred __svc_rdma_free() function might not have
run yet, so some or all of the old listener's RDMA resources
linger, which prevents a new listener on the same address from
being created.

Also, svc_xprt_free() does a module_put() just after calling
->xpo_free(). That means if there is deferred work going on, the
module could be unloaded before that work is even started,
resulting in a UAF.

Neil asks:
> What particular part of __svc_rdma_free() needs to run in order for a
> subsequent registration to succeed?
> Can that bit be run directly from svc_rdma_free() rather than be
> delayed?
> (I know almost nothing about rdma so forgive me if the answers to these
> questions seem obvious)

The reasons I can recall are:

 - Some of the transport tear-down work can sleep
 - Releasing a cm_id is tricky and can deadlock

We might be able to mitigate the second issue with judicious
application of transport reference counting.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/svc_xprt.c                    |  1 +
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 19 ++++++++-----------
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 8b1837228799..8526bfc3ab20 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -168,6 +168,7 @@ static void svc_xprt_free(struct kref *kref)
 	struct svc_xprt *xprt =
 		container_of(kref, struct svc_xprt, xpt_ref);
 	struct module *owner = xprt->xpt_class->xcl_owner;
+
 	if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags))
 		svcauth_unix_info_release(xprt);
 	put_cred(xprt->xpt_cred);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 3d7f1413df02..b7b318ad25c4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -591,12 +591,18 @@ static void svc_rdma_detach(struct svc_xprt *xprt)
 	rdma_disconnect(rdma->sc_cm_id);
 }
 
-static void __svc_rdma_free(struct work_struct *work)
+/**
+ * svc_rdma_free - Release class-specific transport resources
+ * @xprt: Generic svc transport object
+ */
+static void svc_rdma_free(struct svc_xprt *xprt)
 {
 	struct svcxprt_rdma *rdma =
-		container_of(work, struct svcxprt_rdma, sc_work);
+		container_of(xprt, struct svcxprt_rdma, sc_xprt);
 	struct ib_device *device = rdma->sc_cm_id->device;
 
+	might_sleep();
+
 	/* This blocks until the Completion Queues are empty */
 	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
 		ib_drain_qp(rdma->sc_qp);
@@ -629,15 +635,6 @@ static void __svc_rdma_free(struct work_struct *work)
 	kfree(rdma);
 }
 
-static void svc_rdma_free(struct svc_xprt *xprt)
-{
-	struct svcxprt_rdma *rdma =
-		container_of(xprt, struct svcxprt_rdma, sc_xprt);
-
-	INIT_WORK(&rdma->sc_work, __svc_rdma_free);
-	schedule_work(&rdma->sc_work);
-}
-
 static int svc_rdma_has_wspace(struct svc_xprt *xprt)
 {
 	struct svcxprt_rdma *rdma =
-- 
2.50.0



* Re: [RFC PATCH v1] svcrdma: Release transport resources synchronously
  2025-09-03 15:53 [RFC PATCH v1] svcrdma: Release transport resources synchronously Chuck Lever
@ 2025-09-04 15:43 ` Olga Kornievskaia
  2025-09-04 15:48   ` Chuck Lever
  0 siblings, 1 reply; 5+ messages in thread
From: Olga Kornievskaia @ 2025-09-04 15:43 UTC (permalink / raw)
  To: Chuck Lever
  Cc: NeilBrown, Jeff Layton, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On Wed, Sep 3, 2025 at 11:53 AM Chuck Lever <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> NFSD has always supported added network listeners. The new netlink
> protocol now enables the removal of listeners.
>
> Olga noticed that if an RDMA listener is removed and immediately
> re-added, the deferred __svc_rdma_free() function might not have
> run yet, so some or all of the old listener's RDMA resources
> linger, which prevents a new listener on the same address from
> being created.

Does this mean that you prefer to go the way of synchronous RDMA
release vs the patch I posted?

I'm not against the approach; I previously noted it as an alternative,
which I tested, and it also solves the problem. But I still don't
grasp the consequences of making svc_rdma_free() synchronous,
especially for active transports (not listening sockets).

> [ ... remainder of quoted patch trimmed ... ]



* Re: [RFC PATCH v1] svcrdma: Release transport resources synchronously
  2025-09-04 15:43 ` Olga Kornievskaia
@ 2025-09-04 15:48   ` Chuck Lever
  2025-09-10 17:14     ` Olga Kornievskaia
  0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2025-09-04 15:48 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: NeilBrown, Jeff Layton, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On 9/4/25 11:43 AM, Olga Kornievskaia wrote:
> On Wed, Sep 3, 2025 at 11:53 AM Chuck Lever <cel@kernel.org> wrote:
>>
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> NFSD has always supported added network listeners. The new netlink
>> protocol now enables the removal of listeners.
>>
>> Olga noticed that if an RDMA listener is removed and immediately
>> re-added, the deferred __svc_rdma_free() function might not have
>> run yet, so some or all of the old listener's RDMA resources
>> linger, which prevents a new listener on the same address from
>> being created.
> 
> Does this mean that you prefer to go the way of rdma synchronous
> release vs the patch I posted?

We could do both. IMO we need to make "remove listener" work while
there are still nfsd threads running, and this RFC patch does
nothing about that.

But as noted below, it looks like the svc_xprt_free() code path assumes
that ->xpo_free releases all transport resources synchronously, and
there can be consequences if that does not happen. That needs to be
addressed somehow.


> I'm not against the approach as I have previously noted it as an
> alternative which I tested and it also solves the problem. But I still
> dont grasp the consequence of making svc_rdma_free() synchronous,
> especially for active transports (not listening sockets).

I've tested the synchronous approach a little; there didn't seem to
be a problem with it. But I agree, the certainty level is not as
high as it ought to be.


>> [ ... remainder of quoted patch trimmed ... ]
> 


-- 
Chuck Lever


* Re: [RFC PATCH v1] svcrdma: Release transport resources synchronously
  2025-09-04 15:48   ` Chuck Lever
@ 2025-09-10 17:14     ` Olga Kornievskaia
  2025-09-10 17:26       ` Chuck Lever
  0 siblings, 1 reply; 5+ messages in thread
From: Olga Kornievskaia @ 2025-09-10 17:14 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Olga Kornievskaia, NeilBrown, Jeff Layton, Dai Ngo, Tom Talpey,
	linux-nfs, Chuck Lever

On Thu, Sep 4, 2025 at 11:48 AM Chuck Lever <cel@kernel.org> wrote:
>
> On 9/4/25 11:43 AM, Olga Kornievskaia wrote:
> > On Wed, Sep 3, 2025 at 11:53 AM Chuck Lever <cel@kernel.org> wrote:
> >>
> >> From: Chuck Lever <chuck.lever@oracle.com>
> >>
> >> NFSD has always supported added network listeners. The new netlink
> >> protocol now enables the removal of listeners.
> >>
> >> Olga noticed that if an RDMA listener is removed and immediately
> >> re-added, the deferred __svc_rdma_free() function might not have
> >> run yet, so some or all of the old listener's RDMA resources
> >> linger, which prevents a new listener on the same address from
> >> being created.
> >
> > Does this mean that you prefer to go the way of rdma synchronous
> > release vs the patch I posted?
>
> We could do both. IMO we need to make "remove listener" work while
> there are still nfsd threads running, and this RFC patch does
> nothing about that.
>
> But as noted below, it looks like the svc_xprt_free() code path assumes
> that ->xpo_free releases all transport resources synchronously, and
> there can be consequences if that does not happen. That needs to be
> addressed somehow.
>
>
> > I'm not against the approach as I have previously noted it as an
> > alternative which I tested and it also solves the problem. But I still
> > dont grasp the consequence of making svc_rdma_free() synchronous,
> > especially for active transports (not listening sockets).
>
> I've tested the synchronous approach a little, there didn't seem to
> be a problem with it. But I agree, the certainty level is not as
> high as it ought to be.

So what do you think about including this patch? I don't see it in
your nfsd-testing branch.

Either this patch or my patch fixes an existing problem, and I believe
it would be beneficial to include one now (and backport it). A solution
for removing listeners on an active server can be built on top of that.



> >> [ ... remainder of quoted patch trimmed ... ]


* Re: [RFC PATCH v1] svcrdma: Release transport resources synchronously
  2025-09-10 17:14     ` Olga Kornievskaia
@ 2025-09-10 17:26       ` Chuck Lever
  0 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2025-09-10 17:26 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Olga Kornievskaia, NeilBrown, Jeff Layton, Dai Ngo, Tom Talpey,
	linux-nfs, Chuck Lever

On 9/10/25 1:14 PM, Olga Kornievskaia wrote:
> On Thu, Sep 4, 2025 at 11:48 AM Chuck Lever <cel@kernel.org> wrote:
>>
>> On 9/4/25 11:43 AM, Olga Kornievskaia wrote:
>>> On Wed, Sep 3, 2025 at 11:53 AM Chuck Lever <cel@kernel.org> wrote:
>>>>
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> NFSD has always supported added network listeners. The new netlink
>>>> protocol now enables the removal of listeners.
>>>>
>>>> Olga noticed that if an RDMA listener is removed and immediately
>>>> re-added, the deferred __svc_rdma_free() function might not have
>>>> run yet, so some or all of the old listener's RDMA resources
>>>> linger, which prevents a new listener on the same address from
>>>> being created.
>>>
>>> Does this mean that you prefer to go the way of rdma synchronous
>>> release vs the patch I posted?
>>
>> We could do both. IMO we need to make "remove listener" work while
>> there are still nfsd threads running, and this RFC patch does
>> nothing about that.
>>
>> But as noted below, it looks like the svc_xprt_free() code path assumes
>> that ->xpo_free releases all transport resources synchronously, and
>> there can be consequences if that does not happen. That needs to be
>> addressed somehow.
>>
>>
>>> I'm not against the approach as I have previously noted it as an
>>> alternative which I tested and it also solves the problem. But I still
>>> dont grasp the consequence of making svc_rdma_free() synchronous,
>>> especially for active transports (not listening sockets).
>>
>> I've tested the synchronous approach a little, there didn't seem to
>> be a problem with it. But I agree, the certainty level is not as
>> high as it ought to be.
> 
> So what do you think about including this patch? I don't see it in
> your nfsd-testing branch.

As you mentioned earlier, it needs some aggressive testing to ensure
it does not introduce a deadlock.


> Either this patch or my patch fix an existing problem and I believe
> would be beneficial to include now (and backport). A solution for
> removal of listeners on an active server can be worked on top of that.

Agreed, both changes are worth including.


>>>> [ ... remainder of quoted patch trimmed ... ]


-- 
Chuck Lever


end of thread, other threads:[~2025-09-10 17:27 UTC | newest]

Thread overview: 5+ messages
2025-09-03 15:53 [RFC PATCH v1] svcrdma: Release transport resources synchronously Chuck Lever
2025-09-04 15:43 ` Olga Kornievskaia
2025-09-04 15:48   ` Chuck Lever
2025-09-10 17:14     ` Olga Kornievskaia
2025-09-10 17:26       ` Chuck Lever
