Linux RDMA and InfiniBand development
* [PATCH v3 5.10/5.15 0/2] Backport RDMA/rxe task and timer cleanup fixes
@ 2026-06-05 17:14 Vladislav Nikolaev
  2026-06-05 17:14 ` [PATCH v3 5.10/5.15 1/2] RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" Vladislav Nikolaev
  2026-06-05 17:14 ` [PATCH v3 5.10/5.15 2/2] RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup" bug Vladislav Nikolaev
  0 siblings, 2 replies; 4+ messages in thread
From: Vladislav Nikolaev @ 2026-06-05 17:14 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Vladislav Nikolaev, Zhu Yanjun, Doug Ledford, Jason Gunthorpe,
	Haggai Eran, Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project

This series backports two upstream RDMA/rxe fixes to linux-5.10.y and
linux-5.15.y.

The first patch fixes cleanup of RXE tasks that may not have been
initialized on the rxe_create_qp() error path. The second patch fixes
the same class of lockdep issue for RC timers by checking that both
timers were initialized before deleting them.

In linux-5.10.y and linux-5.15.y the relevant task and timer cleanup
still lives in rxe_qp_destroy(), so the 1c7eec4d5f3b backport applies
the timer guard there and keeps del_timer_sync().

Zhu Yanjun (2):
  RDMA/rxe: Fix the error "trying to register non-static key in
    rxe_cleanup_task"
  RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup"
    bug

 drivers/infiniband/sw/rxe/rxe_qp.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

-- 
2.39.5


* [PATCH v3 5.10/5.15 1/2] RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
  2026-06-05 17:14 [PATCH v3 5.10/5.15 0/2] Backport RDMA/rxe task and timer cleanup fixes Vladislav Nikolaev
@ 2026-06-05 17:14 ` Vladislav Nikolaev
  2026-06-05 23:23   ` yanjun.zhu
  2026-06-05 17:14 ` [PATCH v3 5.10/5.15 2/2] RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup" bug Vladislav Nikolaev
  1 sibling, 1 reply; 4+ messages in thread
From: Vladislav Nikolaev @ 2026-06-05 17:14 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Vladislav Nikolaev, Zhu Yanjun, Doug Ledford, Jason Gunthorpe,
	Haggai Eran, Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project, syzbot+cfcc1a3c85be15a40cba, Zhu Yanjun

From: Zhu Yanjun <yanjun.zhu@linux.dev>

commit b2b1ddc457458fecd1c6f385baa9fbda5f0c63ad upstream.

rxe_create_qp() calls rxe_qp_from_init() to initialize the qp.
Internally, things like rxe_init_task() are not set up until
rxe_qp_init_req().

If an error occurs before this point, the unwind calls rxe_cleanup()
and eventually reaches rxe_qp_do_cleanup()/rxe_cleanup_task(), which
will oops when trying to access the uninitialized spinlock.

Fix this by not calling rxe_cleanup_task() for a task on which
rxe_init_task() was never executed.
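
The guard can be illustrated outside the kernel. What follows is a
minimal userspace sketch, not the rxe code: struct demo_task, struct
demo_qp and the demo_*() helpers are hypothetical stand-ins, and the
sketch assumes the qp memory is zeroed at allocation, so that a task
that was never initialized still has a NULL func pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task object; not the kernel's rxe_task. */
struct demo_task {
	int (*func)(void *arg);	/* NULL until demo_init_task() has run */
	bool lock_ready;	/* models the spinlock set up at init time */
	int cleanups;		/* counts cleanups that actually ran */
};

/* Hypothetical stand-in for a QP that owns three tasks. */
struct demo_qp {
	struct demo_task req;
	struct demo_task comp;
	struct demo_task resp;
};

int demo_noop(void *arg)
{
	(void)arg;
	return 0;
}

/* Models task init: sets the callback and readies the lock. */
void demo_init_task(struct demo_task *task)
{
	task->func = demo_noop;
	task->lock_ready = true;
}

/* Models task cleanup: it touches the lock, so the lock must be ready. */
void demo_cleanup_task(struct demo_task *task)
{
	assert(task->lock_ready);
	task->cleanups++;
}

/* Clean up only the tasks whose func pointer shows they were set up. */
void demo_qp_destroy(struct demo_qp *qp)
{
	if (qp->resp.func)
		demo_cleanup_task(&qp->resp);

	if (qp->req.func)
		demo_cleanup_task(&qp->req);

	if (qp->comp.func)
		demo_cleanup_task(&qp->comp);
}
```

With a zeroed struct demo_qp on which only demo_init_task(&qp.req) has
run, demo_qp_destroy() cleans up req and skips comp and resp. Without
the func checks, the assert in demo_cleanup_task() would fire, which
plays the role of the lockdep complaint in this model.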

Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Fixes: 2d4b21e0a291 ("IB/rxe: Prevent from completer to operate on non valid QP")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
[ Vladislav: add the missing resp.task.func check and keep the cleanup
order used by upstream after 960ebe97e523 ("RDMA/rxe: Remove
__rxe_do_task()"). Moving rxe_cleanup_task(&qp->resp.task) after the RC
timer cleanup is independent from that commit: timer deletion does not
depend on the responder task cleanup, and placing all task cleanup after
the timers matches the final upstream ordering while keeping this fix
minimal for 5.10/5.15. ]
Signed-off-by: Vladislav Nikolaev <vlad102nikolaev@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 4c938d841f76..0532c446760d 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -760,15 +760,20 @@ void rxe_qp_destroy(struct rxe_qp *qp)
 {
 	qp->valid = 0;
 	qp->qp_timeout_jiffies = 0;
-	rxe_cleanup_task(&qp->resp.task);
 
 	if (qp_type(qp) == IB_QPT_RC) {
 		del_timer_sync(&qp->retrans_timer);
 		del_timer_sync(&qp->rnr_nak_timer);
 	}
 
-	rxe_cleanup_task(&qp->req.task);
-	rxe_cleanup_task(&qp->comp.task);
+	if (qp->resp.task.func)
+		rxe_cleanup_task(&qp->resp.task);
+
+	if (qp->req.task.func)
+		rxe_cleanup_task(&qp->req.task);
+
+	if (qp->comp.task.func)
+		rxe_cleanup_task(&qp->comp.task);
 
 	/* flush out any receive wr's or pending requests */
 	if (qp->req.task.func)
-- 
2.39.5



* [PATCH v3 5.10/5.15 2/2] RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup" bug
  2026-06-05 17:14 [PATCH v3 5.10/5.15 0/2] Backport RDMA/rxe task and timer cleanup fixes Vladislav Nikolaev
  2026-06-05 17:14 ` [PATCH v3 5.10/5.15 1/2] RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" Vladislav Nikolaev
@ 2026-06-05 17:14 ` Vladislav Nikolaev
  1 sibling, 0 replies; 4+ messages in thread
From: Vladislav Nikolaev @ 2026-06-05 17:14 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Vladislav Nikolaev, Zhu Yanjun, Doug Ledford, Jason Gunthorpe,
	Haggai Eran, Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project, syzbot+4edb496c3cad6e953a31, Zhu Yanjun

From: Zhu Yanjun <yanjun.zhu@linux.dev>

commit 1c7eec4d5f3b39cdea2153abaebf1b7229a47072 upstream.

Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 assign_lock_key kernel/locking/lockdep.c:986 [inline]
 register_lock_class+0x4a3/0x4c0 kernel/locking/lockdep.c:1300
 __lock_acquire+0x99/0x1ba0 kernel/locking/lockdep.c:5110
 lock_acquire kernel/locking/lockdep.c:5866 [inline]
 lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5823
 __timer_delete_sync+0x152/0x1b0 kernel/time/timer.c:1644
 rxe_qp_do_cleanup+0x5c3/0x7e0 drivers/infiniband/sw/rxe/rxe_qp.c:815
 execute_in_process_context+0x3a/0x160 kernel/workqueue.c:4596
 __rxe_cleanup+0x267/0x3c0 drivers/infiniband/sw/rxe/rxe_pool.c:232
 rxe_create_qp+0x3f7/0x5f0 drivers/infiniband/sw/rxe/rxe_verbs.c:604
 create_qp+0x62d/0xa80 drivers/infiniband/core/verbs.c:1250
 ib_create_qp_kernel+0x9f/0x310 drivers/infiniband/core/verbs.c:1361
 ib_create_qp include/rdma/ib_verbs.h:3803 [inline]
 rdma_create_qp+0x10c/0x340 drivers/infiniband/core/cma.c:1144
 rds_ib_setup_qp+0xc86/0x19a0 net/rds/ib_cm.c:600
 rds_ib_cm_initiate_connect+0x1e8/0x3d0 net/rds/ib_cm.c:944
 rds_rdma_cm_event_handler_cmn+0x61f/0x8c0 net/rds/rdma_transport.c:109
 cma_cm_event_handler+0x94/0x300 drivers/infiniband/core/cma.c:2184
 cma_work_handler+0x15b/0x230 drivers/infiniband/core/cma.c:3042
 process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3238
 process_scheduled_works kernel/workqueue.c:3319 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
 kthread+0x3c2/0x780 kernel/kthread.c:464
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

The root cause is as follows:

In rxe_create_qp(), rxe_qp_from_init() is called to create the qp. If
rxe_qp_from_init() fails, rxe_cleanup() is called to release all the
allocated resources, including the timers retrans_timer and
rnr_nak_timer.

rxe_qp_from_init() calls rxe_qp_init_req() to initialize the timers
retrans_timer and rnr_nak_timer.

However, these timers are initialized at the end of rxe_qp_init_req().
If an error occurs before they are initialized, the cleanup path
deletes uninitialized timers and lockdep reports the problem above.

The solution is to check whether these timers are initialized. If they
are not, skip deleting them.
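
The check can be modelled in userspace. This is only a sketch under
stated assumptions, not the rxe code: struct demo_timer, struct
demo_rc_qp and the demo_*() helpers are hypothetical, the is_rc flag
stands in for the qp_type(qp) == IB_QPT_RC test, and the qp memory is
assumed to be zeroed at allocation so that a timer that was never set
up has a NULL function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer; not the kernel's timer_list. */
struct demo_timer {
	void (*function)(struct demo_timer *t);	/* NULL until setup */
	bool key_registered;	/* models the lock state set up at init */
	int deletes;		/* counts deletions that actually ran */
};

/* Hypothetical stand-in for an RC QP that owns two timers. */
struct demo_rc_qp {
	bool is_rc;
	struct demo_timer retrans_timer;
	struct demo_timer rnr_nak_timer;
};

void demo_timer_expired(struct demo_timer *t)
{
	(void)t;
}

/* Models timer setup: the callback pointer becomes non-NULL. */
void demo_timer_setup(struct demo_timer *t)
{
	t->function = demo_timer_expired;
	t->key_registered = true;
}

/* Models a synchronous delete, which needs an initialized timer. */
void demo_del_timer_sync(struct demo_timer *t)
{
	assert(t->key_registered);
	t->deletes++;
}

/* Delete the timers only if both were set up. */
void demo_rc_qp_destroy(struct demo_rc_qp *qp)
{
	if (qp->is_rc && qp->retrans_timer.function &&
	    qp->rnr_nak_timer.function) {
		demo_del_timer_sync(&qp->retrans_timer);
		demo_del_timer_sync(&qp->rnr_nak_timer);
	}
}
```

Calling demo_rc_qp_destroy() on a zeroed qp with is_rc set deletes
nothing. After demo_timer_setup() has run on both timers, the same
call deletes each timer once.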

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Reported-by: syzbot+4edb496c3cad6e953a31@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4edb496c3cad6e953a31
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20250419080741.1515231-1-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
[ Vladislav: apply the timer initialization guard in rxe_qp_destroy(),
where linux-5.10.y deletes retrans_timer and rnr_nak_timer. Keep
del_timer_sync() because this branch has not renamed it to
timer_delete_sync() yet. ]
Signed-off-by: Vladislav Nikolaev <vlad102nikolaev@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 0532c446760d..f7afb3da1f92 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -761,7 +761,12 @@ void rxe_qp_destroy(struct rxe_qp *qp)
 	qp->valid = 0;
 	qp->qp_timeout_jiffies = 0;
 
-	if (qp_type(qp) == IB_QPT_RC) {
+	/* In the function timer_setup, .function is initialized. If .function
+	 * is NULL, it indicates the function timer_setup is not called, the
+	 * timer is not initialized. Or else, the timer is initialized.
+	 */
+	if (qp_type(qp) == IB_QPT_RC && qp->retrans_timer.function &&
+		qp->rnr_nak_timer.function) {
 		del_timer_sync(&qp->retrans_timer);
 		del_timer_sync(&qp->rnr_nak_timer);
 	}
-- 
2.39.5



* Re: [PATCH v3 5.10/5.15 1/2] RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
  2026-06-05 17:14 ` [PATCH v3 5.10/5.15 1/2] RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" Vladislav Nikolaev
@ 2026-06-05 23:23   ` yanjun.zhu
  0 siblings, 0 replies; 4+ messages in thread
From: yanjun.zhu @ 2026-06-05 23:23 UTC (permalink / raw)
  To: Vladislav Nikolaev, stable, Greg Kroah-Hartman, Zhu Yanjun
  Cc: Zhu Yanjun, Doug Ledford, Jason Gunthorpe, Haggai Eran,
	Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project, syzbot+cfcc1a3c85be15a40cba

On 6/5/26 10:14 AM, Vladislav Nikolaev wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> 
> commit b2b1ddc457458fecd1c6f385baa9fbda5f0c63ad upstream.
> 
> rxe_create_qp() calls rxe_qp_from_init() to initialize the qp.
> Internally, things like rxe_init_task() are not set up until
> rxe_qp_init_req().
> 
> If an error occurs before this point, the unwind calls rxe_cleanup()
> and eventually reaches rxe_qp_do_cleanup()/rxe_cleanup_task(), which
> will oops when trying to access the uninitialized spinlock.
> 
> Fix this by not calling rxe_cleanup_task() for a task on which
> rxe_init_task() was never executed.
> 
> Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com
> Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Fixes: 2d4b21e0a291 ("IB/rxe: Prevent from completer to operate on non valid QP")
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com
> Signed-off-by: Leon Romanovsky <leon@kernel.org>
> [ Vladislav: add the missing resp.task.func check and keep the cleanup
> order used by upstream after 960ebe97e523 ("RDMA/rxe: Remove
> __rxe_do_task()"). Moving rxe_cleanup_task(&qp->resp.task) after the RC
> timer cleanup is independent from that commit: timer deletion does not
> depend on the responder task cleanup, and placing all task cleanup after
> the timers matches the final upstream ordering while keeping this fix
> minimal for 5.10/5.15. ]
> Signed-off-by: Vladislav Nikolaev <vlad102nikolaev@gmail.com>

Thanks a lot. I am fine with this.

Zhu Yanjun



