* [PATCH] RDMA/siw: Always report immediate post SQ errors
@ 2025-09-23 14:45 bernard.metzler
2025-09-23 15:44 ` Stefan Metzmacher
2025-09-26 16:15 ` Jason Gunthorpe
0 siblings, 2 replies; 4+ messages in thread
From: bernard.metzler @ 2025-09-23 14:45 UTC (permalink / raw)
To: jgg, leon, metze; +Cc: linux-rdma, Bernard Metzler
From: Bernard Metzler <bernard.metzler@linux.dev>
In siw_post_send(), any immediate error encountered during
processing of the work request list must be reported to the
caller, even if previous work requests in that list were just
accepted and added to the send queue.
Not reporting those errors confuses the caller, which would
wait indefinitely for the failing and potentially subsequently
aborted work requests completion.
This fixes a case where immediate errors were overwritten
by subsequent code in siw_post_send().
Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Bernard Metzler <bernard.metzler@linux.dev>
---
drivers/infiniband/sw/siw/siw_verbs.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index 35c3bde0d00a..efa2f097b582 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -769,7 +769,7 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
struct siw_wqe *wqe = tx_wqe(qp);
unsigned long flags;
- int rv = 0;
+ int rv = 0, imm_err = 0;
if (wr && !rdma_is_kernel_res(&qp->base_qp.res)) {
siw_dbg_qp(qp, "wr must be empty for user mapped sq\n");
@@ -955,9 +955,17 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
* Send directly if SQ processing is not in progress.
* Eventual immediate errors (rv < 0) do not affect the involved
* RI resources (Verbs, 8.3.1) and thus do not prevent from SQ
- * processing, if new work is already pending. But rv must be passed
- * to caller.
+ * processing, if new work is already pending. But rv and pointer
+ * to failed work request must be passed to caller.
*/
+ if (unlikely(rv < 0)) {
+ /*
+ * Immediate error
+ */
+ siw_dbg_qp(qp, "Immediate error %d\n", rv);
+ imm_err = rv;
+ *bad_wr = wr;
+ }
if (wqe->wr_status != SIW_WR_IDLE) {
spin_unlock_irqrestore(&qp->sq_lock, flags);
goto skip_direct_sending;
@@ -982,15 +990,10 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
up_read(&qp->state_lock);
- if (rv >= 0)
- return 0;
- /*
- * Immediate error
- */
- siw_dbg_qp(qp, "error %d\n", rv);
+ if (unlikely(imm_err))
+ return imm_err;
- *bad_wr = wr;
- return rv;
+ return (rv >= 0) ? 0 : rv;
}
/*
--
2.50.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] RDMA/siw: Always report immediate post SQ errors
2025-09-23 14:45 [PATCH] RDMA/siw: Always report immediate post SQ errors bernard.metzler
@ 2025-09-23 15:44 ` Stefan Metzmacher
2025-09-23 16:08 ` Bernard Metzler
2025-09-26 16:15 ` Jason Gunthorpe
1 sibling, 1 reply; 4+ messages in thread
From: Stefan Metzmacher @ 2025-09-23 15:44 UTC (permalink / raw)
To: bernard.metzler, jgg, leon; +Cc: linux-rdma
Am 23.09.25 um 16:45 schrieb bernard.metzler@linux.dev:
> From: Bernard Metzler <bernard.metzler@linux.dev>
>
> In siw_post_send(), any immediate error encountered during
> processing of the work request list must be reported to the
> caller, even if previous work requests in that list were just
> accepted and added to the send queue.
> Not reporting those errors confuses the caller, which would
> wait indefinitely for the failing and potentially subsequently
> aborted work requests completion.
> This fixes a case where immediate errors were overwritten
> by subsequent code in siw_post_send().
>
> Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
> Suggested-by: Stefan Metzmacher <metze@samba.org>
> Signed-off-by: Bernard Metzler <bernard.metzler@linux.dev>
> ---
> drivers/infiniband/sw/siw/siw_verbs.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
> index 35c3bde0d00a..efa2f097b582 100644
> --- a/drivers/infiniband/sw/siw/siw_verbs.c
> +++ b/drivers/infiniband/sw/siw/siw_verbs.c
> @@ -769,7 +769,7 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
> struct siw_wqe *wqe = tx_wqe(qp);
>
> unsigned long flags;
> - int rv = 0;
> + int rv = 0, imm_err = 0;
>
> if (wr && !rdma_is_kernel_res(&qp->base_qp.res)) {
> siw_dbg_qp(qp, "wr must be empty for user mapped sq\n");
> @@ -955,9 +955,17 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
> * Send directly if SQ processing is not in progress.
> * Eventual immediate errors (rv < 0) do not affect the involved
> * RI resources (Verbs, 8.3.1) and thus do not prevent from SQ
> - * processing, if new work is already pending. But rv must be passed
> - * to caller.
> + * processing, if new work is already pending. But rv and pointer
> + * to failed work request must be passed to caller.
> */
> + if (unlikely(rv < 0)) {
> + /*
> + * Immediate error
> + */
> + siw_dbg_qp(qp, "Immediate error %d\n", rv);
> + imm_err = rv;
> + *bad_wr = wr;
> + }
> if (wqe->wr_status != SIW_WR_IDLE) {
> spin_unlock_irqrestore(&qp->sq_lock, flags);
> goto skip_direct_sending;
> @@ -982,15 +990,10 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
>
> up_read(&qp->state_lock);
>
> - if (rv >= 0)
> - return 0;
> - /*
> - * Immediate error
> - */
> - siw_dbg_qp(qp, "error %d\n", rv);
> + if (unlikely(imm_err))
> + return imm_err;
>
> - *bad_wr = wr;
> - return rv;
> + return (rv >= 0) ? 0 : rv;
> }
So the caller needs to set *bad_rw = NULL itself
in order to detect if an immediate error happened or not...
I'm also fine with that, I just hope it's consistent across
all drivers...
Thanks!
metze
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] RDMA/siw: Always report immediate post SQ errors
2025-09-23 15:44 ` Stefan Metzmacher
@ 2025-09-23 16:08 ` Bernard Metzler
0 siblings, 0 replies; 4+ messages in thread
From: Bernard Metzler @ 2025-09-23 16:08 UTC (permalink / raw)
To: Stefan Metzmacher, jgg, leon; +Cc: linux-rdma
On 23.09.2025 17:44, Stefan Metzmacher wrote:
> Am 23.09.25 um 16:45 schrieb bernard.metzler@linux.dev:
>> From: Bernard Metzler <bernard.metzler@linux.dev>
>>
>> In siw_post_send(), any immediate error encountered during
>> processing of the work request list must be reported to the
>> caller, even if previous work requests in that list were just
>> accepted and added to the send queue.
>> Not reporting those errors confuses the caller, which would
>> wait indefinitely for the failing and potentially subsequently
>> aborted work requests completion.
>> This fixes a case where immediate errors were overwritten
>> by subsequent code in siw_post_send().
>>
>> Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
>> Suggested-by: Stefan Metzmacher <metze@samba.org>
>> Signed-off-by: Bernard Metzler <bernard.metzler@linux.dev>
>> ---
>> drivers/infiniband/sw/siw/siw_verbs.c | 25 ++++++++++++++-----------
>> 1 file changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
>> index 35c3bde0d00a..efa2f097b582 100644
>> --- a/drivers/infiniband/sw/siw/siw_verbs.c
>> +++ b/drivers/infiniband/sw/siw/siw_verbs.c
>> @@ -769,7 +769,7 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
>> struct siw_wqe *wqe = tx_wqe(qp);
>> unsigned long flags;
>> - int rv = 0;
>> + int rv = 0, imm_err = 0;
>> if (wr && !rdma_is_kernel_res(&qp->base_qp.res)) {
>> siw_dbg_qp(qp, "wr must be empty for user mapped sq\n");
>> @@ -955,9 +955,17 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
>> * Send directly if SQ processing is not in progress.
>> * Eventual immediate errors (rv < 0) do not affect the involved
>> * RI resources (Verbs, 8.3.1) and thus do not prevent from SQ
>> - * processing, if new work is already pending. But rv must be passed
>> - * to caller.
>> + * processing, if new work is already pending. But rv and pointer
>> + * to failed work request must be passed to caller.
>> */
>> + if (unlikely(rv < 0)) {
>> + /*
>> + * Immediate error
>> + */
>> + siw_dbg_qp(qp, "Immediate error %d\n", rv);
>> + imm_err = rv;
>> + *bad_wr = wr;
>> + }
>> if (wqe->wr_status != SIW_WR_IDLE) {
>> spin_unlock_irqrestore(&qp->sq_lock, flags);
>> goto skip_direct_sending;
>> @@ -982,15 +990,10 @@ int siw_post_send(struct ib_qp *base_qp, const struct ib_send_wr *wr,
>> up_read(&qp->state_lock);
>> - if (rv >= 0)
>> - return 0;
>> - /*
>> - * Immediate error
>> - */
>> - siw_dbg_qp(qp, "error %d\n", rv);
>> + if (unlikely(imm_err))
>> + return imm_err;
>> - *bad_wr = wr;
>> - return rv;
>> + return (rv >= 0) ? 0 : rv;
>> }
>
>
> So the caller needs to set *bad_rw = NULL itself
> in order to detect if an immediate error happened or not...
>
> I'm also fine with that, I just hope it's consistent across
> all drivers...
>
Yes, that's what all providers do, if populating the devices
post_send() method.
Looking at the in-kernel interface, ib_post_send() even
checks if there is a valid bad_wr ptr at all, providing
a dummy pointer otherwise.
That, since all (!) ULP's in infiniband/ulp provide a
NULL bad_wr. They just do not care too much about unwinding
errors to a certain WR, or just send single entry WR
lists.
Thank you!
Bernard.
> Thanks!
> metze
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] RDMA/siw: Always report immediate post SQ errors
2025-09-23 14:45 [PATCH] RDMA/siw: Always report immediate post SQ errors bernard.metzler
2025-09-23 15:44 ` Stefan Metzmacher
@ 2025-09-26 16:15 ` Jason Gunthorpe
1 sibling, 0 replies; 4+ messages in thread
From: Jason Gunthorpe @ 2025-09-26 16:15 UTC (permalink / raw)
To: bernard.metzler; +Cc: leon, metze, linux-rdma
On Tue, Sep 23, 2025 at 04:45:36PM +0200, bernard.metzler@linux.dev wrote:
> From: Bernard Metzler <bernard.metzler@linux.dev>
>
> In siw_post_send(), any immediate error encountered during
> processing of the work request list must be reported to the
> caller, even if previous work requests in that list were just
> accepted and added to the send queue.
> Not reporting those errors confuses the caller, which would
> wait indefinitely for the failing and potentially subsequently
> aborted work requests completion.
> This fixes a case where immediate errors were overwritten
> by subsequent code in siw_post_send().
>
> Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
> Suggested-by: Stefan Metzmacher <metze@samba.org>
> Signed-off-by: Bernard Metzler <bernard.metzler@linux.dev>
> ---
> drivers/infiniband/sw/siw/siw_verbs.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
Applied to for-next
Thanks,
Jason
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-09-26 16:15 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-23 14:45 [PATCH] RDMA/siw: Always report immediate post SQ errors bernard.metzler
2025-09-23 15:44 ` Stefan Metzmacher
2025-09-23 16:08 ` Bernard Metzler
2025-09-26 16:15 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox