* [PATCH 1/1] RDMA/rxe: Fix null deref on srq->rq.queue after resize failure
@ 2025-10-27 17:43 Zhu Yanjun
2025-10-27 18:08 ` yanjun.zhu
2025-10-27 20:04 ` Leon Romanovsky
0 siblings, 2 replies; 4+ messages in thread
From: Zhu Yanjun @ 2025-10-27 17:43 UTC (permalink / raw)
To: zyjzyj2000, jgg, leon, linux-rdma; +Cc: Zhu Yanjun, Liu Yi
A NULL pointer dereference can occur in rxe_srq_chk_attr() when
ibv_modify_srq() is invoked twice in succession under certain error
conditions. The first call may fail in rxe_queue_resize(), which leads
rxe_srq_from_attr() to set srq->rq.queue = NULL. The second call then
triggers a crash (null deref) when accessing
srq->rq.queue->buf->index_mask.
Call Trace:
<TASK>
rxe_modify_srq+0x170/0x480 [rdma_rxe]
? __pfx_rxe_modify_srq+0x10/0x10 [rdma_rxe]
? uverbs_try_lock_object+0x4f/0xa0 [ib_uverbs]
? rdma_lookup_get_uobject+0x1f0/0x380 [ib_uverbs]
ib_uverbs_modify_srq+0x204/0x290 [ib_uverbs]
? __pfx_ib_uverbs_modify_srq+0x10/0x10 [ib_uverbs]
? tryinc_node_nr_active+0xe6/0x150
? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2c0/0x470 [ib_uverbs]
? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
ib_uverbs_run_method+0x55a/0x6e0 [ib_uverbs]
? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
ib_uverbs_cmd_verbs+0x54d/0x800 [ib_uverbs]
? __pfx_ib_uverbs_cmd_verbs+0x10/0x10 [ib_uverbs]
? __pfx___raw_spin_lock_irqsave+0x10/0x10
? __pfx_do_vfs_ioctl+0x10/0x10
? ioctl_has_perm.constprop.0.isra.0+0x2c7/0x4c0
? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
ib_uverbs_ioctl+0x13e/0x220 [ib_uverbs]
? __pfx_ib_uverbs_ioctl+0x10/0x10 [ib_uverbs]
__x64_sys_ioctl+0x138/0x1c0
do_syscall_64+0x82/0x250
? fdget_pos+0x58/0x4c0
? ksys_write+0xf3/0x1c0
? __pfx_ksys_write+0x10/0x10
? do_syscall_64+0xc8/0x250
? __pfx_vm_mmap_pgoff+0x10/0x10
? fget+0x173/0x230
? fput+0x2a/0x80
? ksys_mmap_pgoff+0x224/0x4c0
? do_syscall_64+0xc8/0x250
? do_user_addr_fault+0x37b/0xfe0
? clear_bhb_loop+0x50/0xa0
? clear_bhb_loop+0x50/0xa0
? clear_bhb_loop+0x50/0xa0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Root cause:
The queue is released, and srq->rq.queue is set to NULL, when
rxe_queue_resize() fails during the first call. The next
ibv_modify_srq() call then dereferences the NULL pointer, which
produces the call trace above.
Fix:
Align the error handling path in rxe_srq_from_attr() with
rxe_cq_resize_queue(), which also uses rxe_queue_resize(): do not
free or nullify the queue when the resize fails.
Reported-by: Liu Yi <asatsuyu.liu@gmail.com>
Closes: https://paste.ubuntu.com/p/Zhj65q6gr9/
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Tested-by: Liu Yi <asatsuyu.liu@gmail.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
drivers/infiniband/sw/rxe/rxe_srq.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_srq.c b/drivers/infiniband/sw/rxe/rxe_srq.c
index 3661cb627d28..2a234f26ac10 100644
--- a/drivers/infiniband/sw/rxe/rxe_srq.c
+++ b/drivers/infiniband/sw/rxe/rxe_srq.c
@@ -171,7 +171,7 @@ int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
 				       udata, mi, &srq->rq.producer_lock,
 				       &srq->rq.consumer_lock);
 		if (err)
-			goto err_free;
+			return err;
 
 		srq->rq.max_wr = attr->max_wr;
 	}
@@ -180,11 +180,6 @@ int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
 		srq->limit = attr->srq_limit;
 
 	return 0;
-
-err_free:
-	rxe_queue_cleanup(q);
-	srq->rq.queue = NULL;
-	return err;
 }
 
 void rxe_srq_cleanup(struct rxe_pool_elem *elem)
--
2.51.0
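
For readers following the thread, the failure sequence described in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions: every name below (model_queue, model_srq, model_modify and so on) is hypothetical and is not the rxe driver's code, the forced -ENOMEM stands in for whatever makes rxe_queue_resize() fail, and the -EFAULT return marks the point where the kernel would dereference NULL instead of returning an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures, not kernel definitions. */
struct model_queue {
	unsigned int index_mask;
};

struct model_srq {
	struct model_queue *queue;
};

/* Simulated resize: on failure the existing queue is left untouched. */
static int model_queue_resize(struct model_queue *q, unsigned int new_size,
			      bool force_fail)
{
	if (force_fail)
		return -ENOMEM;
	q->index_mask = new_size - 1;
	return 0;
}

/*
 * Model of the attribute check that reads through the queue pointer.
 * Returns -EFAULT at the point where the kernel would crash.
 */
static int model_chk_attr(const struct model_srq *srq)
{
	if (!srq->queue)
		return -EFAULT;
	return 0;
}

/* One modify call: check attributes, then try to resize the queue. */
static int model_modify(struct model_srq *srq, unsigned int new_size,
			bool force_fail, bool nullify_on_error)
{
	int err = model_chk_attr(srq);

	if (err)
		return err;

	err = model_queue_resize(srq->queue, new_size, force_fail);
	if (err && nullify_on_error) {
		/* Old error path: free the queue and clear the pointer. */
		free(srq->queue);
		srq->queue = NULL;
	}
	return err;
}

/*
 * Two modify calls in a row; the first one is forced to fail.
 * Returns the result of the second call.
 */
int model_double_modify(bool nullify_on_error)
{
	struct model_srq srq;
	int first, second;

	srq.queue = malloc(sizeof(*srq.queue));
	assert(srq.queue != NULL);
	srq.queue->index_mask = 7;

	first = model_modify(&srq, 16, true, nullify_on_error);
	assert(first == -ENOMEM);

	second = model_modify(&srq, 16, false, nullify_on_error);
	free(srq.queue); /* free(NULL) is a no-op */
	return second;
}
```

In this model, model_double_modify(true) follows the old error path and the second call hits the NULL queue, while model_double_modify(false) follows the patched behaviour: the failed resize leaves the queue in place and the second call succeeds.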
* Re: [PATCH 1/1] RDMA/rxe: Fix null deref on srq->rq.queue after resize failure
2025-10-27 17:43 [PATCH 1/1] RDMA/rxe: Fix null deref on srq->rq.queue after resize failure Zhu Yanjun
@ 2025-10-27 18:08 ` yanjun.zhu
2025-10-27 20:04 ` Leon Romanovsky
1 sibling, 0 replies; 4+ messages in thread
From: yanjun.zhu @ 2025-10-27 18:08 UTC (permalink / raw)
To: zyjzyj2000, jgg, leon, linux-rdma; +Cc: Liu Yi
On 10/27/25 10:43 AM, Zhu Yanjun wrote:
> A NULL pointer dereference can occur in rxe_srq_chk_attr() when
> ibv_modify_srq() is invoked twice in succession under certain error
> conditions. The first call may fail in rxe_queue_resize(), which leads
> rxe_srq_from_attr() to set srq->rq.queue = NULL. The second call then
> triggers a crash (null deref) when accessing
> srq->rq.queue->buf->index_mask.
>
> Call Trace:
> <TASK>
> rxe_modify_srq+0x170/0x480 [rdma_rxe]
> ? __pfx_rxe_modify_srq+0x10/0x10 [rdma_rxe]
> ? uverbs_try_lock_object+0x4f/0xa0 [ib_uverbs]
> ? rdma_lookup_get_uobject+0x1f0/0x380 [ib_uverbs]
> ib_uverbs_modify_srq+0x204/0x290 [ib_uverbs]
> ? __pfx_ib_uverbs_modify_srq+0x10/0x10 [ib_uverbs]
> ? tryinc_node_nr_active+0xe6/0x150
> ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
> ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2c0/0x470 [ib_uverbs]
> ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
> ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
> ib_uverbs_run_method+0x55a/0x6e0 [ib_uverbs]
> ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
> ib_uverbs_cmd_verbs+0x54d/0x800 [ib_uverbs]
> ? __pfx_ib_uverbs_cmd_verbs+0x10/0x10 [ib_uverbs]
> ? __pfx___raw_spin_lock_irqsave+0x10/0x10
> ? __pfx_do_vfs_ioctl+0x10/0x10
> ? ioctl_has_perm.constprop.0.isra.0+0x2c7/0x4c0
> ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
> ib_uverbs_ioctl+0x13e/0x220 [ib_uverbs]
> ? __pfx_ib_uverbs_ioctl+0x10/0x10 [ib_uverbs]
> __x64_sys_ioctl+0x138/0x1c0
> do_syscall_64+0x82/0x250
> ? fdget_pos+0x58/0x4c0
> ? ksys_write+0xf3/0x1c0
> ? __pfx_ksys_write+0x10/0x10
> ? do_syscall_64+0xc8/0x250
> ? __pfx_vm_mmap_pgoff+0x10/0x10
> ? fget+0x173/0x230
> ? fput+0x2a/0x80
> ? ksys_mmap_pgoff+0x224/0x4c0
> ? do_syscall_64+0xc8/0x250
> ? do_user_addr_fault+0x37b/0xfe0
> ? clear_bhb_loop+0x50/0xa0
> ? clear_bhb_loop+0x50/0xa0
> ? clear_bhb_loop+0x50/0xa0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Root cause:
> The queue is released, and srq->rq.queue is set to NULL, when
> rxe_queue_resize() fails during the first call. The next
> ibv_modify_srq() call then dereferences the NULL pointer, which
> produces the call trace above.
>
> Fix:
> Align the error handling path in rxe_srq_from_attr() with
> rxe_cq_resize_queue(), which also uses rxe_queue_resize(): do not
> free or nullify the queue when the resize fails.
This commit is based on the repository
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next.
Yanjun.Zhu
>
> Reported-by: Liu Yi <asatsuyu.liu@gmail.com>
> Closes: https://paste.ubuntu.com/p/Zhj65q6gr9/
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Tested-by: Liu Yi <asatsuyu.liu@gmail.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
> drivers/infiniband/sw/rxe/rxe_srq.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_srq.c b/drivers/infiniband/sw/rxe/rxe_srq.c
> index 3661cb627d28..2a234f26ac10 100644
> --- a/drivers/infiniband/sw/rxe/rxe_srq.c
> +++ b/drivers/infiniband/sw/rxe/rxe_srq.c
> @@ -171,7 +171,7 @@ int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
> udata, mi, &srq->rq.producer_lock,
> &srq->rq.consumer_lock);
> if (err)
> - goto err_free;
> + return err;
>
> srq->rq.max_wr = attr->max_wr;
> }
> @@ -180,11 +180,6 @@ int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
> srq->limit = attr->srq_limit;
>
> return 0;
> -
> -err_free:
> - rxe_queue_cleanup(q);
> - srq->rq.queue = NULL;
> - return err;
> }
>
> void rxe_srq_cleanup(struct rxe_pool_elem *elem)
* Re: [PATCH 1/1] RDMA/rxe: Fix null deref on srq->rq.queue after resize failure
2025-10-27 17:43 [PATCH 1/1] RDMA/rxe: Fix null deref on srq->rq.queue after resize failure Zhu Yanjun
2025-10-27 18:08 ` yanjun.zhu
@ 2025-10-27 20:04 ` Leon Romanovsky
2025-10-27 21:29 ` yanjun.zhu
1 sibling, 1 reply; 4+ messages in thread
From: Leon Romanovsky @ 2025-10-27 20:04 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: zyjzyj2000, jgg, linux-rdma, Liu Yi
On Mon, Oct 27, 2025 at 10:43:06AM -0700, Zhu Yanjun wrote:
> A NULL pointer dereference can occur in rxe_srq_chk_attr() when
> ibv_modify_srq() is invoked twice in succession under certain error
> conditions. The first call may fail in rxe_queue_resize(), which leads
> rxe_srq_from_attr() to set srq->rq.queue = NULL. The second call then
> triggers a crash (null deref) when accessing
> srq->rq.queue->buf->index_mask.
>
> Call Trace:
> <TASK>
> rxe_modify_srq+0x170/0x480 [rdma_rxe]
> ? __pfx_rxe_modify_srq+0x10/0x10 [rdma_rxe]
> ? uverbs_try_lock_object+0x4f/0xa0 [ib_uverbs]
> ? rdma_lookup_get_uobject+0x1f0/0x380 [ib_uverbs]
> ib_uverbs_modify_srq+0x204/0x290 [ib_uverbs]
> ? __pfx_ib_uverbs_modify_srq+0x10/0x10 [ib_uverbs]
> ? tryinc_node_nr_active+0xe6/0x150
> ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
> ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2c0/0x470 [ib_uverbs]
> ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
> ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
> ib_uverbs_run_method+0x55a/0x6e0 [ib_uverbs]
> ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
> ib_uverbs_cmd_verbs+0x54d/0x800 [ib_uverbs]
> ? __pfx_ib_uverbs_cmd_verbs+0x10/0x10 [ib_uverbs]
> ? __pfx___raw_spin_lock_irqsave+0x10/0x10
> ? __pfx_do_vfs_ioctl+0x10/0x10
> ? ioctl_has_perm.constprop.0.isra.0+0x2c7/0x4c0
> ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
> ib_uverbs_ioctl+0x13e/0x220 [ib_uverbs]
> ? __pfx_ib_uverbs_ioctl+0x10/0x10 [ib_uverbs]
> __x64_sys_ioctl+0x138/0x1c0
> do_syscall_64+0x82/0x250
> ? fdget_pos+0x58/0x4c0
> ? ksys_write+0xf3/0x1c0
> ? __pfx_ksys_write+0x10/0x10
> ? do_syscall_64+0xc8/0x250
> ? __pfx_vm_mmap_pgoff+0x10/0x10
> ? fget+0x173/0x230
> ? fput+0x2a/0x80
> ? ksys_mmap_pgoff+0x224/0x4c0
> ? do_syscall_64+0xc8/0x250
> ? do_user_addr_fault+0x37b/0xfe0
> ? clear_bhb_loop+0x50/0xa0
> ? clear_bhb_loop+0x50/0xa0
> ? clear_bhb_loop+0x50/0xa0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Root cause:
> The queue is released, and srq->rq.queue is set to NULL, when
> rxe_queue_resize() fails during the first call. The next
> ibv_modify_srq() call then dereferences the NULL pointer, which
> produces the call trace above.
>
> Fix:
> Align the error handling path in rxe_srq_from_attr() with
> rxe_cq_resize_queue(), which also uses rxe_queue_resize(): do not
> free or nullify the queue when the resize fails.
Did these two paragraphs come from AI? They don't add any new
information, let's remove them.
>
> Reported-by: Liu Yi <asatsuyu.liu@gmail.com>
> Closes: https://paste.ubuntu.com/p/Zhj65q6gr9/
The link in the "Closes" tag should point to the report, not to some
random place.
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Tested-by: Liu Yi <asatsuyu.liu@gmail.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
> drivers/infiniband/sw/rxe/rxe_srq.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
This is the second version of a previously sent patch. Please add a changelog.
Thanks
* Re: [PATCH 1/1] RDMA/rxe: Fix null deref on srq->rq.queue after resize failure
2025-10-27 20:04 ` Leon Romanovsky
@ 2025-10-27 21:29 ` yanjun.zhu
0 siblings, 0 replies; 4+ messages in thread
From: yanjun.zhu @ 2025-10-27 21:29 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: zyjzyj2000, jgg, linux-rdma, Liu Yi
On 10/27/25 1:04 PM, Leon Romanovsky wrote:
> On Mon, Oct 27, 2025 at 10:43:06AM -0700, Zhu Yanjun wrote:
>> A NULL pointer dereference can occur in rxe_srq_chk_attr() when
>> ibv_modify_srq() is invoked twice in succession under certain error
>> conditions. The first call may fail in rxe_queue_resize(), which leads
>> rxe_srq_from_attr() to set srq->rq.queue = NULL. The second call then
>> triggers a crash (null deref) when accessing
>> srq->rq.queue->buf->index_mask.
>>
>> Call Trace:
>> <TASK>
>> rxe_modify_srq+0x170/0x480 [rdma_rxe]
>> ? __pfx_rxe_modify_srq+0x10/0x10 [rdma_rxe]
>> ? uverbs_try_lock_object+0x4f/0xa0 [ib_uverbs]
>> ? rdma_lookup_get_uobject+0x1f0/0x380 [ib_uverbs]
>> ib_uverbs_modify_srq+0x204/0x290 [ib_uverbs]
>> ? __pfx_ib_uverbs_modify_srq+0x10/0x10 [ib_uverbs]
>> ? tryinc_node_nr_active+0xe6/0x150
>> ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
>> ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2c0/0x470 [ib_uverbs]
>> ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
>> ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
>> ib_uverbs_run_method+0x55a/0x6e0 [ib_uverbs]
>> ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
>> ib_uverbs_cmd_verbs+0x54d/0x800 [ib_uverbs]
>> ? __pfx_ib_uverbs_cmd_verbs+0x10/0x10 [ib_uverbs]
>> ? __pfx___raw_spin_lock_irqsave+0x10/0x10
>> ? __pfx_do_vfs_ioctl+0x10/0x10
>> ? ioctl_has_perm.constprop.0.isra.0+0x2c7/0x4c0
>> ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
>> ib_uverbs_ioctl+0x13e/0x220 [ib_uverbs]
>> ? __pfx_ib_uverbs_ioctl+0x10/0x10 [ib_uverbs]
>> __x64_sys_ioctl+0x138/0x1c0
>> do_syscall_64+0x82/0x250
>> ? fdget_pos+0x58/0x4c0
>> ? ksys_write+0xf3/0x1c0
>> ? __pfx_ksys_write+0x10/0x10
>> ? do_syscall_64+0xc8/0x250
>> ? __pfx_vm_mmap_pgoff+0x10/0x10
>> ? fget+0x173/0x230
>> ? fput+0x2a/0x80
>> ? ksys_mmap_pgoff+0x224/0x4c0
>> ? do_syscall_64+0xc8/0x250
>> ? do_user_addr_fault+0x37b/0xfe0
>> ? clear_bhb_loop+0x50/0xa0
>> ? clear_bhb_loop+0x50/0xa0
>> ? clear_bhb_loop+0x50/0xa0
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Root cause:
>> The queue is released, and srq->rq.queue is set to NULL, when
>> rxe_queue_resize() fails during the first call. The next
>> ibv_modify_srq() call then dereferences the NULL pointer, which
>> produces the call trace above.
>>
>> Fix:
>> Align the error handling path in rxe_srq_from_attr() with
>> rxe_cq_resize_queue(), which also uses rxe_queue_resize(): do not
>> free or nullify the queue when the resize fails.
>
> Did these two paragraphs come from AI? They don't add any new
> information, let's remove them.
>
>>
>> Reported-by: Liu Yi <asatsuyu.liu@gmail.com>
>> Closes: https://paste.ubuntu.com/p/Zhj65q6gr9/
>
> The link in the "Closes" tag should point to the report, not to some
> random place.
>
>> Fixes: 8700e3e7c485 ("Soft RoCE driver")
>> Tested-by: Liu Yi <asatsuyu.liu@gmail.com>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>> ---
>> drivers/infiniband/sw/rxe/rxe_srq.c | 7 +------
>> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> This is the second version of a previously sent patch. Please add a changelog.
OK. I will send a third version of the patch.
Yanjun.Zhu
>
> Thanks