* [PATCH 1/1] RDMA/rxe: Fix the qp flush warnings in req
@ 2024-10-25 15:20 Zhu Yanjun
2024-10-26 1:58 ` Honggang LI
2024-10-30 12:22 ` Leon Romanovsky
0 siblings, 2 replies; 4+ messages in thread
From: Zhu Yanjun @ 2024-10-25 15:20 UTC
To: zyjzyj2000, jgg, leon, linux-rdma; +Cc: Zhu Yanjun
When the QP is in the error state, the status of the WQEs in the queue
should be set to error. Otherwise, the following warning appears:
[ 920.617269] WARNING: CPU: 1 PID: 21 at drivers/infiniband/sw/rxe/rxe_comp.c:756 rxe_completer+0x989/0xcc0 [rdma_rxe]
[ 920.617744] Modules linked in: rnbd_client(O) rtrs_client(O) rtrs_core(O) rdma_ucm rdma_cm iw_cm ib_cm crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel ib_uverbs ib_core loop brd null_blk ipv6
[ 920.618516] CPU: 1 PID: 21 Comm: ksoftirqd/1 Tainted: G O 6.1.113-storage+ #65
[ 920.618986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 920.619396] RIP: 0010:rxe_completer+0x989/0xcc0 [rdma_rxe]
[ 920.619658] Code: 0f b6 84 24 3a 02 00 00 41 89 84 24 44 04 00 00 e9 2a f7 ff ff 39 ca bb 03 00 00 00 b8 0e 00 00 00 48 0f 45 d8 e9 15 f7 ff ff <0f> 0b e9 cb f8 ff ff 41 bf f5 ff ff ff e9 08 f8 ff ff 49 8d bc 24
[ 920.620482] RSP: 0018:ffff97b7c00bbc38 EFLAGS: 00010246
[ 920.620817] RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000008
[ 920.621183] RDX: ffff960dc396ebc0 RSI: 0000000000005400 RDI: ffff960dc4e2fbac
[ 920.621548] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffffffac406450
[ 920.621884] R10: ffffffffac4060c0 R11: 0000000000000001 R12: ffff960dc4e2f800
[ 920.622254] R13: ffff960dc4e2f928 R14: ffff97b7c029c580 R15: 0000000000000000
[ 920.622609] FS: 0000000000000000(0000) GS:ffff960ef7d00000(0000) knlGS:0000000000000000
[ 920.622979] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 920.623245] CR2: 00007fa056965e90 CR3: 00000001107f1000 CR4: 00000000000006e0
[ 920.623680] Call Trace:
[ 920.623815] <TASK>
[ 920.623933] ? __warn+0x79/0xc0
[ 920.624116] ? rxe_completer+0x989/0xcc0 [rdma_rxe]
[ 920.624356] ? report_bug+0xfb/0x150
[ 920.624594] ? handle_bug+0x3c/0x60
[ 920.624796] ? exc_invalid_op+0x14/0x70
[ 920.624976] ? asm_exc_invalid_op+0x16/0x20
[ 920.625203] ? rxe_completer+0x989/0xcc0 [rdma_rxe]
[ 920.625474] ? rxe_completer+0x329/0xcc0 [rdma_rxe]
[ 920.625749] rxe_do_task+0x80/0x110 [rdma_rxe]
[ 920.626037] rxe_requester+0x625/0xde0 [rdma_rxe]
[ 920.626310] ? rxe_cq_post+0xe2/0x180 [rdma_rxe]
[ 920.626583] ? do_complete+0x18d/0x220 [rdma_rxe]
[ 920.626812] ? rxe_completer+0x1a3/0xcc0 [rdma_rxe]
[ 920.627050] rxe_do_task+0x80/0x110 [rdma_rxe]
[ 920.627285] tasklet_action_common.constprop.0+0xa4/0x120
[ 920.627522] handle_softirqs+0xc2/0x250
[ 920.627728] ? sort_range+0x20/0x20
[ 920.627942] run_ksoftirqd+0x1f/0x30
[ 920.628158] smpboot_thread_fn+0xc7/0x1b0
[ 920.628334] kthread+0xd6/0x100
[ 920.628504] ? kthread_complete_and_exit+0x20/0x20
[ 920.628709] ret_from_fork+0x1f/0x30
[ 920.628892] </TASK>
Fixes: ae720bdb703b ("RDMA/rxe: Generate error completion for error requester QP state")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 479c07e6e4ed..87a02f0deb00 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -663,10 +663,12 @@ int rxe_requester(struct rxe_qp *qp)
 	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
 		wqe = __req_next_wqe(qp);
 		spin_unlock_irqrestore(&qp->state_lock, flags);
-		if (wqe)
+		if (wqe) {
+			wqe->status = IB_WC_WR_FLUSH_ERR;
 			goto err;
-		else
+		} else {
 			goto exit;
+		}
 	}
 
 	if (unlikely(qp_state(qp) == IB_QPS_RESET)) {
--
2.34.1
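
The effect of the change can be sketched with a small user-space model. This is only an illustration under stated assumptions: all `model_*` names are hypothetical, and the thread does not show the check at rxe_comp.c:756, so the sketch assumes it fires when a WQE reaches the error path while its status is still "success".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum model_wc_status { MODEL_WC_SUCCESS, MODEL_WC_WR_FLUSH_ERR };

struct model_wqe {
	enum model_wc_status status;
};

/*
 * Models the IB_QPS_ERR branch of the requester.
 * Returns true when a WQE is handed to the error path ("goto err"),
 * false when the send queue is empty ("goto exit").
 */
bool model_requester_err_branch(struct model_wqe *wqe, bool apply_fix)
{
	if (!wqe)
		return false;

	if (apply_fix)
		wqe->status = MODEL_WC_WR_FLUSH_ERR; /* the line the patch adds */

	return true;
}

/*
 * Assumed shape of the completer-side sanity check: an errored WQE
 * must not still carry a success status.
 */
bool model_completer_would_warn(const struct model_wqe *wqe)
{
	assert(wqe != NULL);
	return wqe->status == MODEL_WC_SUCCESS;
}
```

In this model, a WQE that goes through `model_requester_err_branch()` with `apply_fix` set to false still trips `model_completer_would_warn()`, while the same WQE with `apply_fix` set to true does not.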
* Re: [PATCH 1/1] RDMA/rxe: Fix the qp flush warnings in req
2024-10-25 15:20 [PATCH 1/1] RDMA/rxe: Fix the qp flush warnings in req Zhu Yanjun
@ 2024-10-26 1:58 ` Honggang LI
2024-10-26 5:52 ` Zhu Yanjun
2024-10-30 12:22 ` Leon Romanovsky
1 sibling, 1 reply; 4+ messages in thread
From: Honggang LI @ 2024-10-26 1:58 UTC
To: Zhu Yanjun; +Cc: zyjzyj2000, jgg, leon, linux-rdma
On Fri, Oct 25, 2024 at 05:20:36PM +0200, Zhu Yanjun wrote:
> ---
> drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index 479c07e6e4ed..87a02f0deb00 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -663,10 +663,12 @@ int rxe_requester(struct rxe_qp *qp)
>  	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
>  		wqe = __req_next_wqe(qp);
>  		spin_unlock_irqrestore(&qp->state_lock, flags);
> -		if (wqe)
> +		if (wqe) {
> +			wqe->status = IB_WC_WR_FLUSH_ERR;
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Why not update wqe->status in the function `flush_send_wqe()`?
thanks
* Re: [PATCH 1/1] RDMA/rxe: Fix the qp flush warnings in req
2024-10-26 1:58 ` Honggang LI
@ 2024-10-26 5:52 ` Zhu Yanjun
0 siblings, 0 replies; 4+ messages in thread
From: Zhu Yanjun @ 2024-10-26 5:52 UTC
To: Honggang LI; +Cc: zyjzyj2000, jgg, leon, linux-rdma
On 2024/10/26 3:58, Honggang LI wrote:
> On Fri, Oct 25, 2024 at 05:20:36PM +0200, Zhu Yanjun wrote:
>> ---
>> drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index 479c07e6e4ed..87a02f0deb00 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -663,10 +663,12 @@ int rxe_requester(struct rxe_qp *qp)
>>  	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
>>  		wqe = __req_next_wqe(qp);
>>  		spin_unlock_irqrestore(&qp->state_lock, flags);
>> -		if (wqe)
>> +		if (wqe) {
>> +			wqe->status = IB_WC_WR_FLUSH_ERR;
>                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Why not update wqe->status in the function `flush_send_wqe()`?
flush_send_wqe() builds the CQE and posts it to the completion queue; it
only reads the WQE and does not change its status. See the source below:
static int flush_send_wqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
{
	struct rxe_cqe cqe = {};
	struct ib_wc *wc = &cqe.ibwc;
	struct ib_uverbs_wc *uwc = &cqe.uibwc;
	int err;

	if (qp->is_user) {
		uwc->wr_id = wqe->wr.wr_id;
		uwc->status = IB_WC_WR_FLUSH_ERR;
		uwc->qp_num = qp->ibqp.qp_num;
	} else {
		wc->wr_id = wqe->wr.wr_id;
		wc->status = IB_WC_WR_FLUSH_ERR;
		wc->qp = &qp->ibqp;
	}

	err = rxe_cq_post(qp->scq, &cqe, 0);
	if (err)
		rxe_dbg_cq(qp->scq, "post cq failed, err = %d\n", err);

	return err;
}
The warning, by contrast, concerns a WQE in the send queue, which is what
__req_next_wqe() returns. See the source below:
static struct rxe_send_wqe *__req_next_wqe(struct rxe_qp *qp)
{
	struct rxe_queue *q = qp->sq.queue;
	unsigned int index = qp->req.wqe_index;
	unsigned int prod;

	prod = queue_get_producer(q, QUEUE_TYPE_FROM_CLIENT);
	if (index == prod)
		return NULL;
	else
		return queue_addr_from_index(q, index);
}
This is why the error status should be set in the send-queue error path
of the requester.
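
The distinction between the two functions above can be sketched in a small user-space model. It is a simplification under stated assumptions: every `model_*` name is hypothetical, and the indices here are free-running counters masked on access, which may differ from how the real rxe_queue stores them.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SQ_SLOTS 4u /* power of two, so masking wraps the index */

enum model_status { MODEL_SUCCESS, MODEL_FLUSH_ERR };

struct model_send_wqe {
	unsigned long wr_id;
	enum model_status status;
};

struct model_cqe {
	unsigned long wr_id;
	enum model_status status;
};

struct model_sq {
	struct model_send_wqe slot[MODEL_SQ_SLOTS];
	unsigned int prod;  /* next slot the poster will fill */
	unsigned int index; /* next slot the requester will look at */
};

/* Post a work request; new WQEs start out with a success status. */
void model_post(struct model_sq *sq, unsigned long wr_id)
{
	struct model_send_wqe *wqe;

	assert(sq->prod - sq->index < MODEL_SQ_SLOTS); /* queue not full */
	wqe = &sq->slot[sq->prod & (MODEL_SQ_SLOTS - 1)];
	wqe->wr_id = wr_id;
	wqe->status = MODEL_SUCCESS;
	sq->prod++;
}

/* Peek at the next WQE; NULL once the requester has caught up. */
struct model_send_wqe *model_next_wqe(struct model_sq *sq)
{
	if (sq->index == sq->prod)
		return NULL;
	return &sq->slot[sq->index & (MODEL_SQ_SLOTS - 1)];
}

/* Build a flush completion; the WQE is only read, never modified. */
struct model_cqe model_flush_cqe(const struct model_send_wqe *wqe)
{
	struct model_cqe cqe = { .wr_id = wqe->wr_id, .status = MODEL_FLUSH_ERR };

	return cqe;
}
```

In this model, the completion entry carries the flush status while the WQE peeked from the send queue keeps whatever status it had, so the WQE has to be updated where it is picked up.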
Thanks,
Zhu Yanjun
>
> thanks
>
--
Best Regards,
Yanjun.Zhu
* Re: [PATCH 1/1] RDMA/rxe: Fix the qp flush warnings in req
2024-10-25 15:20 [PATCH 1/1] RDMA/rxe: Fix the qp flush warnings in req Zhu Yanjun
2024-10-26 1:58 ` Honggang LI
@ 2024-10-30 12:22 ` Leon Romanovsky
1 sibling, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2024-10-30 12:22 UTC
To: zyjzyj2000, jgg, linux-rdma, Zhu Yanjun
On Fri, 25 Oct 2024 17:20:36 +0200, Zhu Yanjun wrote:
> When the QP is in the error state, the status of the WQEs in the queue
> should be set to error. Otherwise, the following warning appears:
>
> [...]
Applied, thanks!
[1/1] RDMA/rxe: Fix the qp flush warnings in req
https://git.kernel.org/rdma/rdma/c/ea4c990fa9e19f
Best regards,
--
Leon Romanovsky <leon@kernel.org>