linux-rdma.vger.kernel.org archive mirror
* fixing blocking rdma_connect call on failure
From: Animesh K Trivedi1 @ 2010-09-28 14:44 UTC
  To: Bernard Metzler; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


Hi,

Below is a short patch that eliminates an indefinite, uninterruptible wait
in the kernel when a cm_id is destroyed after iw_cm_connect(...) has failed.

It happens when creation of a protection domain fails but the user, without
checking the return value, goes on to attempt a connection to the server.
iw_cm_connect(...) then retrieves a NULL qp from the device and fails, but
it does not clear the IWCM_F_CONNECT_WAIT bit. destroy_cm_id(...) waits for
IWCM_F_CONNECT_WAIT to be cleared, which never happens.

The same applies to the accept path, iw_cm_accept(...).
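
To make the failure mode concrete, here is a minimal userspace sketch of
the pattern, not the kernel code itself. It assumes a pthread mutex and
condition variable as stand-ins for the cm_id lock and the connect_wait
queue, and a plain bool in place of the IWCM_F_CONNECT_WAIT bit. The
struct demo_cm_id and all demo_* functions are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the private cm_id state. */
struct demo_cm_id {
    pthread_mutex_t lock;
    pthread_cond_t connect_wait;   /* stands in for connect_wait */
    bool connect_pending;          /* stands in for IWCM_F_CONNECT_WAIT */
};

static void demo_init(struct demo_cm_id *id)
{
    assert(id != NULL);
    pthread_mutex_init(&id->lock, NULL);
    pthread_cond_init(&id->connect_wait, NULL);
    id->connect_pending = false;
}

/* Clear the pending flag and wake every waiter in demo_destroy(). */
static void demo_connect_done(struct demo_cm_id *id)
{
    pthread_mutex_lock(&id->lock);
    id->connect_pending = false;
    pthread_cond_broadcast(&id->connect_wait);
    pthread_mutex_unlock(&id->lock);
}

/* Start a connect; qp_found == false models get_qp() returning NULL. */
static int demo_connect(struct demo_cm_id *id, bool qp_found)
{
    pthread_mutex_lock(&id->lock);
    id->connect_pending = true;
    pthread_mutex_unlock(&id->lock);

    if (!qp_found) {
        /* Without this call the flag stays set and destroy hangs. */
        demo_connect_done(id);
        return -EINVAL;
    }
    return 0;   /* flag stays set until the connection completes */
}

/* Block until no connect is pending, then release the resources. */
static void demo_destroy(struct demo_cm_id *id)
{
    pthread_mutex_lock(&id->lock);
    while (id->connect_pending)
        pthread_cond_wait(&id->connect_wait, &id->lock);
    pthread_mutex_unlock(&id->lock);
    pthread_cond_destroy(&id->connect_wait);
    pthread_mutex_destroy(&id->lock);
}
```

If the demo_connect_done() call on the error path is removed, calling
demo_connect(&id, false) and then demo_destroy(&id) blocks forever, which
mirrors the hang described above.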

I am not subscribed to the list, so please Cc me on any comments or
requested changes.

Thanks,
--
Animesh



Signed-off-by: Animesh Trivedi <atr-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org>

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index bfead5b..2a1e9ae 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -506,6 +506,8 @@ int iw_cm_accept(struct iw_cm_id *cm_id,
      qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
      if (!qp) {
            spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+           clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+           wake_up_all(&cm_id_priv->connect_wait);
            return -EINVAL;
      }
      cm_id->device->iwcm->add_ref(qp);
@@ -565,6 +567,8 @@ int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param)
      qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
      if (!qp) {
            spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+           clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+           wake_up_all(&cm_id_priv->connect_wait);
            return -EINVAL;
      }
      cm_id->device->iwcm->add_ref(qp);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: fixing blocking rdma_connect call on failure
From: Steve Wise @ 2010-09-28 14:55 UTC
  To: Animesh K Trivedi1; +Cc: Bernard Metzler, linux-rdma-u79uwXL29TY76Z2rM5mHXA



Looks good to me.

Steve.


* Re: fixing blocking rdma_connect call on failure
From: Roland Dreier @ 2010-10-12  3:25 UTC
  To: Animesh K Trivedi1; +Cc: Bernard Metzler, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Thanks, applied (with Steve Wise's Acked-by).

This patch was actually whitespace mangled (tabs turned to spaces) and
so I had to apply it by hand.  In the future please try to send patches
with a mailer that does not mangle them, so that they can be handled
automatically.

 - R.