Linux RDMA and InfiniBand development
* RE: [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
@ 2022-07-14 12:59 Bernard Metzler
  2022-07-14 13:20 ` Cheng Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Bernard Metzler @ 2022-07-14 12:59 UTC
  To: Cheng Xu, jgg@ziepe.ca, leon@kernel.org; +Cc: linux-rdma@vger.kernel.org

> -----Original Message-----
> From: Cheng Xu <chengyou@linux.alibaba.com>
> Sent: Thursday, 14 July 2022 03:31
> To: jgg@ziepe.ca; leon@kernel.org; Bernard Metzler <BMT@zurich.ibm.com>
> Cc: linux-rdma@vger.kernel.org; chengyou@linux.alibaba.com
> Subject: [EXTERNAL] [PATCH for-next] RDMA/siw: Fix duplicated reported
> IW_CM_EVENT_CONNECT_REPLY event
> 
> If siw_recv_mpa_rr() returns -EAGAIN, the MPA reply has not been received
> completely, and siw_proc_mpareply() should not report
> IW_CM_EVENT_CONNECT_REPLY in this case. Reporting it may trigger a call
> trace in iw_cm. A simple way to trigger this:

Great, thanks! Obviously I never hit an incomplete
MPA header. Please make one more change to fix it
correctly, as suggested below.

>  server: ib_send_lat
>  client: ib_send_lat -R <server_ip>
> 
> The call trace looks like this:
> 
>  kernel BUG at drivers/infiniband/core/iwcm.c:894!
>  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>  <...>
>  Workqueue: iw_cm_wq cm_work_handler [iw_cm]
>  Call Trace:
>   <TASK>
>   cm_work_handler+0x1dd/0x370 [iw_cm]
>   process_one_work+0x1e2/0x3b0
>   worker_thread+0x49/0x2e0
>   ? rescuer_thread+0x370/0x370
>   kthread+0xe5/0x110
>   ? kthread_complete_and_exit+0x20/0x20
>   ret_from_fork+0x1f/0x30
>   </TASK>
> 
> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
> ---
>  drivers/infiniband/sw/siw/siw_cm.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
> b/drivers/infiniband/sw/siw/siw_cm.c
> index 17f34d584cd9..f88d2971c2c6 100644
> --- a/drivers/infiniband/sw/siw/siw_cm.c
> +++ b/drivers/infiniband/sw/siw/siw_cm.c
> @@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep)
>  	enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR;
> 
>  	rv = siw_recv_mpa_rr(cep);
> -	if (rv != -EAGAIN)
> -		siw_cancel_mpatimer(cep);
>  	if (rv)
>  		goto out_err;
> 
> +	siw_cancel_mpatimer(cep);
> +

Cancel the MPA timer only on a real error.
-EAGAIN just means we keep waiting for the
rest of the reply. So it is best to add the
timer cancellation to the error bailout section.

>  	rep = &cep->mpa.hdr;
> 
>  	if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) {
> @@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep)
>  	}
> 
>  out_err:
> -	siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
> +	if (rv != -EAGAIN)
	{
Cancel the MPA timer here:
		siw_cancel_mpatimer(cep);
> +		siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
	}
> 
>  	return rv;
>  }
> --
> 2.37.0
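
To make the suggested bailout easier to follow, here is a minimal
userspace sketch of the control flow. It is not the siw driver code:
struct mock_cep, mock_proc_mpareply() and run_once() are hypothetical
stand-ins, and the sketch assumes one reading of the review comments,
namely that the timer is cancelled once a complete reply has arrived
and, in the error bailout, only for errors other than -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct siw_cep; holds only what the sketch needs. */
struct mock_cep {
	bool timer_armed;   /* models a pending MPA timer */
	int reply_events;   /* models CONNECT_REPLY upcalls delivered */
};

/*
 * Models the shape of siw_proc_mpareply() with the suggested bailout.
 * recv_rv plays the role of the siw_recv_mpa_rr() return value:
 * 0 (complete reply), -EAGAIN (incomplete), or another negative errno.
 */
static int mock_proc_mpareply(struct mock_cep *cep, int recv_rv)
{
	int rv = recv_rv;

	assert(recv_rv <= 0);

	if (rv)
		goto out_err;

	/* Complete reply: stop the timer and report success. */
	cep->timer_armed = false;
	cep->reply_events++;
	return 0;

out_err:
	if (rv != -EAGAIN) {
		/* Real error: stop the timer and report the failure. */
		cep->timer_armed = false;
		cep->reply_events++;
	}
	/* On -EAGAIN nothing is touched; the caller simply keeps waiting. */
	return rv;
}

/* Runs one call against a fresh endpoint with the timer armed. */
static struct mock_cep run_once(int recv_rv)
{
	struct mock_cep cep = { .timer_armed = true, .reply_events = 0 };

	mock_proc_mpareply(&cep, recv_rv);
	return cep;
}
```

In this model, run_once(-EAGAIN) leaves the timer armed and reports
nothing, while run_once(0) and run_once(-EINVAL) each stop the timer
and report exactly one event.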


* [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
@ 2022-07-14  1:30 Cheng Xu
  2022-07-18 11:21 ` Leon Romanovsky
  0 siblings, 1 reply; 5+ messages in thread
From: Cheng Xu @ 2022-07-14  1:30 UTC
  To: jgg, leon, BMT; +Cc: linux-rdma, chengyou

If siw_recv_mpa_rr() returns -EAGAIN, the MPA reply has not been received
completely, and siw_proc_mpareply() should not report
IW_CM_EVENT_CONNECT_REPLY in this case. Reporting it may trigger a call
trace in iw_cm. A simple way to trigger this:
 server: ib_send_lat
 client: ib_send_lat -R <server_ip>

The call trace looks like this:

 kernel BUG at drivers/infiniband/core/iwcm.c:894!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 <...>
 Workqueue: iw_cm_wq cm_work_handler [iw_cm]
 Call Trace:
  <TASK>
  cm_work_handler+0x1dd/0x370 [iw_cm]
  process_one_work+0x1e2/0x3b0
  worker_thread+0x49/0x2e0
  ? rescuer_thread+0x370/0x370
  kthread+0xe5/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
---
 drivers/infiniband/sw/siw/siw_cm.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index 17f34d584cd9..f88d2971c2c6 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep)
 	enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR;
 
 	rv = siw_recv_mpa_rr(cep);
-	if (rv != -EAGAIN)
-		siw_cancel_mpatimer(cep);
 	if (rv)
 		goto out_err;
 
+	siw_cancel_mpatimer(cep);
+
 	rep = &cep->mpa.hdr;
 
 	if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) {
@@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep)
 	}
 
 out_err:
-	siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
+	if (rv != -EAGAIN)
+		siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
 
 	return rv;
 }
-- 
2.37.0
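
For readers who want to see why the event is reported twice, the
following is a simplified userspace model, not the driver itself. All
names (mock_cep, mock_recv_mpa_rr, mock_proc_mpareply,
count_reply_events) and the byte counts are made up for illustration.
It assumes the MPA reply arrives in two pieces, so the first call sees
-EAGAIN and the second call sees the complete reply.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct siw_cep; holds only what the sketch needs. */
struct mock_cep {
	int bytes_rcvd;     /* reply bytes received so far */
	int bytes_needed;   /* length of the full reply (arbitrary here) */
	int reply_events;   /* CONNECT_REPLY upcalls delivered */
};

/* Models siw_recv_mpa_rr(): -EAGAIN until the whole reply is in. */
static int mock_recv_mpa_rr(struct mock_cep *cep, int chunk)
{
	assert(chunk > 0);
	cep->bytes_rcvd += chunk;
	return cep->bytes_rcvd < cep->bytes_needed ? -EAGAIN : 0;
}

/*
 * Models the shape of siw_proc_mpareply(). With patched == false the
 * bailout always reports an event, as before the patch; with
 * patched == true it skips the report on -EAGAIN.
 */
static int mock_proc_mpareply(struct mock_cep *cep, int chunk, bool patched)
{
	int rv = mock_recv_mpa_rr(cep, chunk);

	if (rv)
		goto out_err;

	cep->reply_events++;    /* complete reply: report success */
	return 0;

out_err:
	if (!patched || rv != -EAGAIN)
		cep->reply_events++;    /* report failure */
	return rv;
}

/* Delivers the reply in two chunks and returns how many events were reported. */
static int count_reply_events(bool patched)
{
	struct mock_cep cep = { .bytes_needed = 20 };

	mock_proc_mpareply(&cep, 8, patched);   /* incomplete: -EAGAIN */
	mock_proc_mpareply(&cep, 12, patched);  /* now complete */
	return cep.reply_events;
}
```

In this model the unpatched bailout reports two events for a single
reply, and the patched bailout reports one.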



end of thread, other threads: [~2022-07-18 11:21 UTC]

Thread overview: 5+ messages
2022-07-14 12:59 [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event Bernard Metzler
2022-07-14 13:20 ` Cheng Xu
2022-07-14 13:58   ` Bernard Metzler
  -- strict thread matches above, loose matches on Subject: below --
2022-07-14  1:30 Cheng Xu
2022-07-18 11:21 ` Leon Romanovsky
