Linux RDMA and InfiniBand development
* [PATCH 6.1 0/3] RDMA/rxe: correct cleanup-task backport and timer cleanup
From: Vladislav Nikolaev @ 2026-06-05 17:03 UTC
  To: stable, Greg Kroah-Hartman
  Cc: Vladislav Nikolaev, Zhu Yanjun, Doug Ledford, Jason Gunthorpe,
	Haggai Eran, Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project

The linux-6.1.y tree contains commit 3236221bb8e4 ("RDMA/rxe: Fix the
error "trying to register non-static key in rxe_cleanup_task""), which is
an incomplete backport of upstream commit b2b1ddc45745 ("RDMA/rxe: Fix
the error "trying to register non-static key in rxe_cleanup_task"").

The stable backport added guards for req.task and comp.task, but missed
the resp.task guard and also left rxe_cleanup_task(&qp->resp.task) above
the RC timer cleanup.  The upstream fix checks all three tasks and keeps
resp.task cleanup after the timer cleanup.

This series first reverts the incomplete stable backport, then applies the
correct backport, and finally backports commit 1c7eec4d5f3b ("RDMA/rxe:
Fix "trying to register non-static key in rxe_qp_do_cleanup" bug") to
avoid deleting uninitialized RC timers during QP cleanup.  The last patch
keeps del_timer_sync(), because linux-6.1.y has not renamed it to
timer_delete_sync() yet.

Vladislav Nikolaev (1):
  Revert "RDMA/rxe: Fix the error "trying to register non-static key in
    rxe_cleanup_task""

Zhu Yanjun (2):
  RDMA/rxe: Fix the error "trying to register non-static key in
    rxe_cleanup_task"
  RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup"
    bug

 drivers/infiniband/sw/rxe/rxe_qp.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

-- 
2.43.0


* [PATCH 6.1 1/3] Revert "RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task""
From: Vladislav Nikolaev @ 2026-06-05 17:03 UTC
  To: stable, Greg Kroah-Hartman
  Cc: Vladislav Nikolaev, Zhu Yanjun, Doug Ledford, Jason Gunthorpe,
	Haggai Eran, Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project

This reverts commit 3236221bb8e4de8e3d0c8385f634064fb26b8e38.

The reverted commit is an incomplete backport of upstream
commit b2b1ddc45745. It added guards for req.task and comp.task
cleanup, but missed resp.task cleanup and left it before the RC timer
cleanup, unlike the upstream fix. Revert it first so the correct
backport can be applied cleanly in the following patch.

Signed-off-by: Vladislav Nikolaev <vlad102nikolaev@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 709c63e9773c..05e4a270084f 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -788,11 +788,8 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 		del_timer_sync(&qp->rnr_nak_timer);
 	}
 
-	if (qp->req.task.func)
-		rxe_cleanup_task(&qp->req.task);
-
-	if (qp->comp.task.func)
-		rxe_cleanup_task(&qp->comp.task);
+	rxe_cleanup_task(&qp->req.task);
+	rxe_cleanup_task(&qp->comp.task);
 
 	/* flush out any receive wr's or pending requests */
 	if (qp->req.task.func)
-- 
2.43.0



* [PATCH 6.1 2/3] RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
From: Vladislav Nikolaev @ 2026-06-05 17:03 UTC
  To: stable, Greg Kroah-Hartman
  Cc: Vladislav Nikolaev, Zhu Yanjun, Doug Ledford, Jason Gunthorpe,
	Haggai Eran, Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project, syzbot+cfcc1a3c85be15a40cba, Zhu Yanjun

From: Zhu Yanjun <yanjun.zhu@linux.dev>

commit b2b1ddc457458fecd1c6f385baa9fbda5f0c63ad upstream.

In rxe_create_qp(), rxe_qp_from_init() is called to initialize the qp.
Internally, initialization such as rxe_init_task() does not happen until
rxe_qp_init_req().

If an error occurs before that point, the unwind calls rxe_cleanup(),
which eventually reaches rxe_qp_do_cleanup() and rxe_cleanup_task(). That
oopses when it tries to access the uninitialized spinlock.

Fix this by calling rxe_cleanup_task() only for a task on which
rxe_init_task() has been executed.

Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Fixes: 2d4b21e0a291 ("IB/rxe: Prevent from completer to operate on non valid QP")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
[ Vladislav: match upstream cleanup order and add the missing
  resp.task.func check. ]
Signed-off-by: Vladislav Nikolaev <vlad102nikolaev@gmail.com>
---
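Note for readers: the guard described in the commit message can be
modelled in user space. This is only a sketch, not the driver code. The
model_* names and the lock_ready flag are hypothetical stand-ins, and the
sketch assumes the QP memory is zero-filled before any task is
initialized, so that a NULL .func means "rxe_init_task() never ran".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's task; only .func mirrors the patch. */
struct model_task {
	int (*func)(void *arg);	/* NULL until model_init_task() runs */
	bool lock_ready;	/* stands in for the initialized spinlock */
	int cleanups;		/* number of times cleanup ran on this task */
};

struct model_qp {
	struct model_task req, comp, resp;
};

static int model_noop(void *arg)
{
	(void)arg;
	return 0;
}

/* Models task initialization: sets .func and prepares the lock. */
static void model_init_task(struct model_task *t)
{
	t->func = model_noop;
	t->lock_ready = true;
}

/* Returns false where the real code would touch an uninitialized lock. */
static bool model_cleanup_task(struct model_task *t)
{
	if (!t->lock_ready)
		return false;
	t->cleanups++;
	return true;
}

/*
 * Mirrors the guarded calls in the diff: resp, then req, then comp, each
 * skipped when .func is NULL. Returns the number of tasks cleaned up, or
 * -1 if cleanup ran on a task that was never initialized.
 */
static int model_qp_cleanup(struct model_qp *qp)
{
	struct model_task *tasks[] = { &qp->resp, &qp->req, &qp->comp };
	int cleaned = 0;

	for (size_t i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++) {
		if (!tasks[i]->func)
			continue;
		if (!model_cleanup_task(tasks[i]))
			return -1;
		cleaned++;
	}
	return cleaned;
}
```

On a zeroed struct model_qp, model_qp_cleanup() skips every task and
returns 0. After model_init_task(&qp.req) it cleans up only req.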
 drivers/infiniband/sw/rxe/rxe_qp.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 05e4a270084f..171c0f4dcbec 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -781,15 +781,20 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 
 	qp->valid = 0;
 	qp->qp_timeout_jiffies = 0;
-	rxe_cleanup_task(&qp->resp.task);
 
 	if (qp_type(qp) == IB_QPT_RC) {
 		del_timer_sync(&qp->retrans_timer);
 		del_timer_sync(&qp->rnr_nak_timer);
 	}
 
-	rxe_cleanup_task(&qp->req.task);
-	rxe_cleanup_task(&qp->comp.task);
+	if (qp->resp.task.func)
+		rxe_cleanup_task(&qp->resp.task);
+
+	if (qp->req.task.func)
+		rxe_cleanup_task(&qp->req.task);
+
+	if (qp->comp.task.func)
+		rxe_cleanup_task(&qp->comp.task);
 
 	/* flush out any receive wr's or pending requests */
 	if (qp->req.task.func)
-- 
2.43.0



* [PATCH 6.1 3/3] RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup" bug
From: Vladislav Nikolaev @ 2026-06-05 17:03 UTC
  To: stable, Greg Kroah-Hartman
  Cc: Vladislav Nikolaev, Zhu Yanjun, Doug Ledford, Jason Gunthorpe,
	Haggai Eran, Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project, syzbot+4edb496c3cad6e953a31, Zhu Yanjun

From: Zhu Yanjun <yanjun.zhu@linux.dev>

commit 1c7eec4d5f3b39cdea2153abaebf1b7229a47072 upstream.

Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 assign_lock_key kernel/locking/lockdep.c:986 [inline]
 register_lock_class+0x4a3/0x4c0 kernel/locking/lockdep.c:1300
 __lock_acquire+0x99/0x1ba0 kernel/locking/lockdep.c:5110
 lock_acquire kernel/locking/lockdep.c:5866 [inline]
 lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5823
 __timer_delete_sync+0x152/0x1b0 kernel/time/timer.c:1644
 rxe_qp_do_cleanup+0x5c3/0x7e0 drivers/infiniband/sw/rxe/rxe_qp.c:815
 execute_in_process_context+0x3a/0x160 kernel/workqueue.c:4596
 __rxe_cleanup+0x267/0x3c0 drivers/infiniband/sw/rxe/rxe_pool.c:232
 rxe_create_qp+0x3f7/0x5f0 drivers/infiniband/sw/rxe/rxe_verbs.c:604
 create_qp+0x62d/0xa80 drivers/infiniband/core/verbs.c:1250
 ib_create_qp_kernel+0x9f/0x310 drivers/infiniband/core/verbs.c:1361
 ib_create_qp include/rdma/ib_verbs.h:3803 [inline]
 rdma_create_qp+0x10c/0x340 drivers/infiniband/core/cma.c:1144
 rds_ib_setup_qp+0xc86/0x19a0 net/rds/ib_cm.c:600
 rds_ib_cm_initiate_connect+0x1e8/0x3d0 net/rds/ib_cm.c:944
 rds_rdma_cm_event_handler_cmn+0x61f/0x8c0 net/rds/rdma_transport.c:109
 cma_cm_event_handler+0x94/0x300 drivers/infiniband/core/cma.c:2184
 cma_work_handler+0x15b/0x230 drivers/infiniband/core/cma.c:3042
 process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3238
 process_scheduled_works kernel/workqueue.c:3319 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
 kthread+0x3c2/0x780 kernel/kthread.c:464
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

The root cause is as follows:

In rxe_create_qp(), rxe_qp_from_init() is called to create the qp. If
rxe_qp_from_init() fails, rxe_cleanup() is called to release all the
allocated resources, including the timers retrans_timer and
rnr_nak_timer.

rxe_qp_from_init() calls rxe_qp_init_req() to initialize these timers,
but only at the end of rxe_qp_init_req(). If an error occurs before the
timers are initialized, the cleanup path deletes uninitialized timers and
triggers the trace above.

The solution is to check whether the timers are initialized and to skip
them if they are not.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Reported-by: syzbot+4edb496c3cad6e953a31@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4edb496c3cad6e953a31
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20250419080741.1515231-1-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
[ Vladislav: keep del_timer_sync() because linux-6.1.y has not renamed it
  to timer_delete_sync() yet. ]
Signed-off-by: Vladislav Nikolaev <vlad102nikolaev@gmail.com>
---
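Note for readers: the timer check described in the commit message can be
modelled in user space. This is only a sketch under stated assumptions,
not the driver code. The model_* types and functions are hypothetical;
only the field name .function and the timer names retrans_timer and
rnr_nak_timer come from the patch. It assumes the QP memory starts out
zero-filled, so a NULL .function means the timer was never set up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel timer; .function mirrors the patch. */
struct model_timer {
	void (*function)(struct model_timer *t);	/* NULL until setup */
	bool pending;
};

enum model_qp_type { MODEL_QPT_UD, MODEL_QPT_RC };

struct model_rc_qp {
	enum model_qp_type type;
	struct model_timer retrans_timer;
	struct model_timer rnr_nak_timer;
};

static void model_timer_fn(struct model_timer *t)
{
	t->pending = false;
}

/* Models timer setup: the callback pointer becomes non-NULL. */
static void model_timer_setup(struct model_timer *t)
{
	t->function = model_timer_fn;
}

/*
 * Mirrors the condition in the diff: delete the timers only for an RC QP
 * whose two timers both have a callback. Returns true if the timers were
 * deleted, false if the deletion was skipped.
 */
static bool model_rc_timers_cleanup(struct model_rc_qp *qp)
{
	if (qp->type != MODEL_QPT_RC ||
	    !qp->retrans_timer.function || !qp->rnr_nak_timer.function)
		return false;

	qp->retrans_timer.pending = false;
	qp->rnr_nak_timer.pending = false;
	return true;
}
```

As in the patch's condition, both deletions are skipped when only one of
the two timers has been set up.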
 drivers/infiniband/sw/rxe/rxe_qp.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 171c0f4dcbec..899fee5f145a 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -782,7 +782,12 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 	qp->valid = 0;
 	qp->qp_timeout_jiffies = 0;
 
-	if (qp_type(qp) == IB_QPT_RC) {
+	/* In the function timer_setup, .function is initialized. If .function
+	 * is NULL, it indicates the function timer_setup is not called, the
+	 * timer is not initialized. Or else, the timer is initialized.
+	 */
+	if (qp_type(qp) == IB_QPT_RC && qp->retrans_timer.function &&
+		qp->rnr_nak_timer.function) {
 		del_timer_sync(&qp->retrans_timer);
 		del_timer_sync(&qp->rnr_nak_timer);
 	}
-- 
2.43.0



* Re: [PATCH 6.1 1/3] Revert "RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task""
From: Greg Kroah-Hartman @ 2026-06-16 13:39 UTC
  To: Vladislav Nikolaev
  Cc: stable, Zhu Yanjun, Doug Ledford, Jason Gunthorpe, Haggai Eran,
	Kamal Heib, Amir Vadai, Moni Shoua, Yonatan Cohen,
	Leon Romanovsky, linux-rdma, linux-kernel, Zhu Yanjun,
	lvc-project

On Fri, Jun 05, 2026 at 08:03:27PM +0300, Vladislav Nikolaev wrote:
> This reverts commit 3236221bb8e4de8e3d0c8385f634064fb26b8e38.
> 
> The reverted commit is an incomplete backport of upstream
> commit b2b1ddc45745. It added guards for req.task and comp.task
> cleanup, but missed resp.task cleanup and left it before the RC timer
> cleanup, unlike the upstream fix. Revert it first so the correct
> backport can be applied cleanly in the following patch.
> 
> Signed-off-by: Vladislav Nikolaev <vlad102nikolaev@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_qp.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index 709c63e9773c..05e4a270084f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -788,11 +788,8 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>  		del_timer_sync(&qp->rnr_nak_timer);
>  	}
>  
> -	if (qp->req.task.func)
> -		rxe_cleanup_task(&qp->req.task);
> -
> -	if (qp->comp.task.func)
> -		rxe_cleanup_task(&qp->comp.task);
> +	rxe_cleanup_task(&qp->req.task);
> +	rxe_cleanup_task(&qp->comp.task);
>  
>  	/* flush out any receive wr's or pending requests */
>  	if (qp->req.task.func)
> -- 
> 2.43.0
> 

This series does not apply to the latest tree :(

Are you sure it is still needed?

thanks,

greg k-h
