* [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock
@ 2025-08-22 8:19 Philipp Reisner
2025-09-08 14:24 ` Leon Romanovsky
2025-09-24 13:21 ` Jason Gunthorpe
0 siblings, 2 replies; 8+ messages in thread
From: Philipp Reisner @ 2025-08-22 8:19 UTC (permalink / raw)
To: Zhu Yanjun
Cc: Jason Gunthorpe, Leon Romanovsky, linux-rdma, linux-kernel,
Philipp Reisner
Allow the comp_handler callback implementation to call ib_poll_cq().
A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
The Mellanox and Intel drivers allow a comp_handler callback
implementation to call ib_poll_cq().
Avoid the deadlock by calling the comp_handler callback without
holding cq->cq_lock.
Changelog:
v1: https://lore.kernel.org/all/20250806123921.633410-1-philipp.reisner@linbit.com/
v1 -> v2:
- Only reset cq->notify to 0 when invoking the comp_handler
====================
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
drivers/infiniband/sw/rxe/rxe_cq.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
index fffd144d509e..95652001665d 100644
--- a/drivers/infiniband/sw/rxe/rxe_cq.c
+++ b/drivers/infiniband/sw/rxe/rxe_cq.c
@@ -88,6 +88,7 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
int full;
void *addr;
unsigned long flags;
+ bool invoke_handler = false;
spin_lock_irqsave(&cq->cq_lock, flags);
@@ -113,11 +114,14 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
if ((cq->notify & IB_CQ_NEXT_COMP) ||
(cq->notify & IB_CQ_SOLICITED && solicited)) {
cq->notify = 0;
- cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
+ invoke_handler = true;
}
spin_unlock_irqrestore(&cq->cq_lock, flags);
+ if (invoke_handler)
+ cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
+
return 0;
}
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock
2025-08-22 8:19 [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock Philipp Reisner
@ 2025-09-08 14:24 ` Leon Romanovsky
2025-09-09 14:48 ` Philipp Reisner
2025-09-24 13:21 ` Jason Gunthorpe
1 sibling, 1 reply; 8+ messages in thread
From: Leon Romanovsky @ 2025-09-08 14:24 UTC (permalink / raw)
To: Philipp Reisner; +Cc: Zhu Yanjun, Jason Gunthorpe, linux-rdma, linux-kernel
On Fri, Aug 22, 2025 at 10:19:41AM +0200, Philipp Reisner wrote:
> Allow the comp_handler callback implementation to call ib_poll_cq().
> A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
> And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
Can you please be more specific about the deadlock?
Please write call stack to describe it.
>
> The Mellanox and Intel drivers allow a comp_handler callback
> implementation to call ib_poll_cq().
>
> Avoid the deadlock by calling the comp_handler callback without
> holding cq->cq_lock.
>
> Changelog:
> v1: https://lore.kernel.org/all/20250806123921.633410-1-philipp.reisner@linbit.com/
> v1 -> v2:
> - Only reset cq->notify to 0 when invoking the comp_handler
> ====================
>
> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
> drivers/infiniband/sw/rxe/rxe_cq.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
> index fffd144d509e..95652001665d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_cq.c
> +++ b/drivers/infiniband/sw/rxe/rxe_cq.c
> @@ -88,6 +88,7 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
> int full;
> void *addr;
> unsigned long flags;
> + bool invoke_handler = false;
>
> spin_lock_irqsave(&cq->cq_lock, flags);
>
> @@ -113,11 +114,14 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
> if ((cq->notify & IB_CQ_NEXT_COMP) ||
> (cq->notify & IB_CQ_SOLICITED && solicited)) {
> cq->notify = 0;
> - cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
> + invoke_handler = true;
> }
>
> spin_unlock_irqrestore(&cq->cq_lock, flags);
>
> + if (invoke_handler)
> + cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
> +
> return 0;
> }
>
> --
> 2.50.1
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock
2025-09-08 14:24 ` Leon Romanovsky
@ 2025-09-09 14:48 ` Philipp Reisner
2025-09-09 15:31 ` Jason Gunthorpe
0 siblings, 1 reply; 8+ messages in thread
From: Philipp Reisner @ 2025-09-09 14:48 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Zhu Yanjun, Jason Gunthorpe, linux-rdma, linux-kernel
On Mon, Sep 8, 2025 at 4:25 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Fri, Aug 22, 2025 at 10:19:41AM +0200, Philipp Reisner wrote:
> > Allow the comp_handler callback implementation to call ib_poll_cq().
> > A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
> > And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
>
> Can you please be more specific about the deadlock?
> Please write call stack to describe it.
>
Instead of a call stack, I write it from top to bottom:
The line numbers in the .c files are valid for Linux-6.16:
1 rxe_cq_post() [rxe_cq.c:85]
2 spin_lock_irqsave() [rxe_cq.c:93]
3 cq->ibcq.comp_handler() [rxe_cq.c:116]
4 some_comp_handler()
5 ib_poll_cq()
6 cq->device->ops.poll_cq() [ib_verbs.h:4037]
7 rxe_poll_cq() [rxe_verbs.c:1165]
8 spin_lock_irqsave() [rxe_verbs.c:1172]
In line 8 of this call graph, it deadlocks because the spinlock
was already acquired in line 2 of the call graph.
This patch changes that call graph to:
(Line numbers now valid with the patch in discussion applied)
1 rxe_cq_post() [rxe_cq.c:85]
2 spin_lock_irqsave() [rxe_cq.c:93]
3 spin_unlock_irqrestore() [rxe_cq.c:120]
4 cq->ibcq.comp_handler() [rxe_cq.c:123]
5 some_comp_handler()
6 ib_poll_cq()
7 cq->device->ops.poll_cq() [ib_verbs.h:4037]
8 rxe_poll_cq() [rxe_verbs.c:1165]
9 spin_lock_irqsave() [rxe_verbs.c:1172]
With the patch, there is no deadlock in line 9 of the call graph,
as the spinlock was released in line 3.
I hope that helps.
[...]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock
2025-09-09 14:48 ` Philipp Reisner
@ 2025-09-09 15:31 ` Jason Gunthorpe
2025-09-09 16:00 ` Philipp Reisner
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2025-09-09 15:31 UTC (permalink / raw)
To: Philipp Reisner; +Cc: Leon Romanovsky, Zhu Yanjun, linux-rdma, linux-kernel
On Tue, Sep 09, 2025 at 04:48:19PM +0200, Philipp Reisner wrote:
> On Mon, Sep 8, 2025 at 4:25 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Aug 22, 2025 at 10:19:41AM +0200, Philipp Reisner wrote:
> > > Allow the comp_handler callback implementation to call ib_poll_cq().
> > > A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
> > > And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
> >
> > Can you please be more specific about the deadlock?
> > Please write call stack to describe it.
> >
> Instead of a call stack, I write it from top to bottom:
>
> The line numbers in the .c files are valid for Linux-6.16:
>
> 1 rxe_cq_post() [rxe_cq.c:85]
> 2 spin_lock_irqsave() [rxe_cq.c:93]
> 3 cq->ibcq.comp_handler() [rxe_cq.c:116]
> 4 some_comp_handler()
> 5 ib_poll_cq()
> 6 cq->device->ops.poll_cq() [ib_verbs.h:4037]
> 7 rxe_poll_cq() [rxe_verbs.c:1165]
> 8 spin_lock_irqsave() [rxe_verbs.c:1172]
>
> In line 8 of this call graph, it deadlocks because the spinlock
> was already acquired in line 2 of the call graph.
Is this even legal in verbs? I'm not sure you can do pull cq from a
interrupt driven comp handler.. Is something already doing this intree?
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock
2025-09-09 15:31 ` Jason Gunthorpe
@ 2025-09-09 16:00 ` Philipp Reisner
2025-09-10 10:27 ` Leon Romanovsky
0 siblings, 1 reply; 8+ messages in thread
From: Philipp Reisner @ 2025-09-09 16:00 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Leon Romanovsky, Zhu Yanjun, linux-rdma, linux-kernel
On Tue, Sep 9, 2025 at 5:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 09, 2025 at 04:48:19PM +0200, Philipp Reisner wrote:
> > On Mon, Sep 8, 2025 at 4:25 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Fri, Aug 22, 2025 at 10:19:41AM +0200, Philipp Reisner wrote:
> > > > Allow the comp_handler callback implementation to call ib_poll_cq().
> > > > A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
> > > > And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
> > >
> > > Can you please be more specific about the deadlock?
> > > Please write call stack to describe it.
> > >
> > Instead of a call stack, I write it from top to bottom:
> >
> > The line numbers in the .c files are valid for Linux-6.16:
> >
> > 1 rxe_cq_post() [rxe_cq.c:85]
> > 2 spin_lock_irqsave() [rxe_cq.c:93]
> > 3 cq->ibcq.comp_handler() [rxe_cq.c:116]
> > 4 some_comp_handler()
> > 5 ib_poll_cq()
> > 6 cq->device->ops.poll_cq() [ib_verbs.h:4037]
> > 7 rxe_poll_cq() [rxe_verbs.c:1165]
> > 8 spin_lock_irqsave() [rxe_verbs.c:1172]
> >
> > In line 8 of this call graph, it deadlocks because the spinlock
> > was already acquired in line 2 of the call graph.
>
> Is this even legal in verbs? I'm not sure you can do pull cq from a
> interrupt driven comp handler.. Is something already doing this intree?
>
The file drivers/infiniband/sw/rdmavt/cq.c has this comment:
/*
* The completion handler will most likely rearm the notification
* and poll for all pending entries. If a new completion entry
* is added while we are in this routine, queue_work()
* won't call us again until we return so we check triggered to
* see if we need to call the handler again.
*/
Also, Intel and Mellanox cards and drivers allow calling ib_poll_cq()
from the completion handler.
The problem exists only with the RXE driver.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock
2025-09-09 16:00 ` Philipp Reisner
@ 2025-09-10 10:27 ` Leon Romanovsky
0 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2025-09-10 10:27 UTC (permalink / raw)
To: Philipp Reisner; +Cc: Jason Gunthorpe, Zhu Yanjun, linux-rdma, linux-kernel
On Tue, Sep 09, 2025 at 06:00:36PM +0200, Philipp Reisner wrote:
> On Tue, Sep 9, 2025 at 5:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Sep 09, 2025 at 04:48:19PM +0200, Philipp Reisner wrote:
> > > On Mon, Sep 8, 2025 at 4:25 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Fri, Aug 22, 2025 at 10:19:41AM +0200, Philipp Reisner wrote:
> > > > > Allow the comp_handler callback implementation to call ib_poll_cq().
> > > > > A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
> > > > > And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
> > > >
> > > > Can you please be more specific about the deadlock?
> > > > Please write call stack to describe it.
> > > >
> > > Instead of a call stack, I write it from top to bottom:
> > >
> > > The line numbers in the .c files are valid for Linux-6.16:
> > >
> > > 1 rxe_cq_post() [rxe_cq.c:85]
> > > 2 spin_lock_irqsave() [rxe_cq.c:93]
> > > 3 cq->ibcq.comp_handler() [rxe_cq.c:116]
> > > 4 some_comp_handler()
> > > 5 ib_poll_cq()
> > > 6 cq->device->ops.poll_cq() [ib_verbs.h:4037]
> > > 7 rxe_poll_cq() [rxe_verbs.c:1165]
> > > 8 spin_lock_irqsave() [rxe_verbs.c:1172]
> > >
> > > In line 8 of this call graph, it deadlocks because the spinlock
> > > was already acquired in line 2 of the call graph.
> >
> > Is this even legal in verbs? I'm not sure you can do pull cq from a
> > interrupt driven comp handler.. Is something already doing this intree?
> >
>
> The file drivers/infiniband/sw/rdmavt/cq.c has this comment:
> /*
> * The completion handler will most likely rearm the notification
> * and poll for all pending entries. If a new completion entry
> * is added while we are in this routine, queue_work()
> * won't call us again until we return so we check triggered to
> * see if we need to call the handler again.
> */
>
> Also, Intel and Mellanox cards and drivers allow calling ib_poll_cq()
> from the completion handler.
And do these drivers drop CQ lock like you are proposing here?
>
> The problem exists only with the RXE driver.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock
2025-08-22 8:19 [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock Philipp Reisner
2025-09-08 14:24 ` Leon Romanovsky
@ 2025-09-24 13:21 ` Jason Gunthorpe
2025-09-25 6:02 ` Philipp Reisner
1 sibling, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2025-09-24 13:21 UTC (permalink / raw)
To: Philipp Reisner; +Cc: Zhu Yanjun, Leon Romanovsky, linux-rdma, linux-kernel
On Fri, Aug 22, 2025 at 10:19:41AM +0200, Philipp Reisner wrote:
> Allow the comp_handler callback implementation to call ib_poll_cq().
> A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
> And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
>
> The Mellanox and Intel drivers allow a comp_handler callback
> implementation to call ib_poll_cq().
>
> Avoid the deadlock by calling the comp_handler callback without
> holding cq->cq_lock.
I spent some time looking at this, and I think the basic statement
above is right. The comp_handler should be able to call poll_cq/etc
rxe holding a lock it used to push a CQE is not correct.
However! The comp_handler is also supposed to be single threaded by
the driver, I don't think ULPs are prepared to handle concurrent calls
to comp_handler.
Other HW drivers run their comp_handlers from an EQ which is both
single threaded and does not exclude poll_cq/etc.
So while removing the cq lock here is correct from the perspective of
allowing poll_cq, I could not find any locking in rxe that made
do_complete() be single threaded.
Please send a v2, either explain how the do_complete is single
threaded in a comment above the comp_handler call, or make it be
single threaded.
Thanks,
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock
2025-09-24 13:21 ` Jason Gunthorpe
@ 2025-09-25 6:02 ` Philipp Reisner
0 siblings, 0 replies; 8+ messages in thread
From: Philipp Reisner @ 2025-09-25 6:02 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Zhu Yanjun, Leon Romanovsky, linux-rdma, linux-kernel
Hi Jason,
On Wed, Sep 24, 2025 at 3:21 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Fri, Aug 22, 2025 at 10:19:41AM +0200, Philipp Reisner wrote:
> > Allow the comp_handler callback implementation to call ib_poll_cq().
> > A call to ib_poll_cq() calls rxe_poll_cq() with the rdma_rxe driver.
> > And rxe_poll_cq() locks cq->cq_lock. That leads to a spinlock deadlock.
> >
> > The Mellanox and Intel drivers allow a comp_handler callback
> > implementation to call ib_poll_cq().
> >
> > Avoid the deadlock by calling the comp_handler callback without
> > holding cq->cq_lock.
>
> I spent some time looking at this, and I think the basic statement
> above is right. The comp_handler should be able to call poll_cq/etc
>
> rxe holding a lock it used to push a CQE is not correct.
>
> However! The comp_handler is also supposed to be single threaded by
> the driver, I don't think ULPs are prepared to handle concurrent calls
> to comp_handler.
>
> Other HW drivers run their comp_handlers from an EQ which is both
> single threaded and does not exclude poll_cq/etc.
>
> So while removing the cq lock here is correct from the perspective of
> allowing poll_cq, I could not find any locking in rxe that made
> do_complete() be single threaded.
>
> Please send a v2, either explain how the do_complete is single
> threaded in a comment above the comp_handler call, or make it be
> single threaded.
>
Thanks for following up. Sure, I will send a new version of it, it will
be v3, as this is already the discussion on v2.
Best regards,
Philipp
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-09-25 6:02 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22 8:19 [PATCH V2] rdma_rxe: call comp_handler without holding cq->cq_lock Philipp Reisner
2025-09-08 14:24 ` Leon Romanovsky
2025-09-09 14:48 ` Philipp Reisner
2025-09-09 15:31 ` Jason Gunthorpe
2025-09-09 16:00 ` Philipp Reisner
2025-09-10 10:27 ` Leon Romanovsky
2025-09-24 13:21 ` Jason Gunthorpe
2025-09-25 6:02 ` Philipp Reisner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).