From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH 1/3] iw_cm: free cm_id resources on the last deref
Date: Thu, 21 Jul 2016 09:17:12 -0500
Message-ID: <045e01d1e35a$935a1050$ba0e30f0$@opengridcomputing.com>
In-Reply-To: <027401d1e28d$c15bcca0$441365e0$@opengridcomputing.com>

> > > Remove the complicated logic to free the cm_id resources in iw_cm event
> > > handlers vs when an application thread destroys the device.  I'm not sure
> > > why this code was written, but simply allowing the last deref to free
> > > the memory is cleaner.  It also prevents a deadlock when applications
> > > try to destroy cm_id's in their cm event handler function.
> >
> > The description here is misleading. we can never destroy the cm_id
> > inside the cm_id handler. Also, I don't think the deadlock was on cm_id
> > removal but rather on the qp referenced by the cm_id. I think the change
> > log can be improved.
> >
> 
> I'll reword it.

The nvme unplug handler does indeed destroy all the qps -and- cm_ids used for
the controllers on this device, with the exception of the cm_id handling the
event.  That is what causes this deadlock.  Once I fixed iw_cxgb4 (in patch 2)
to not block in c4iw_destroy_qp() until the refcnt reaches 0, I then hit the
block in iw_destroy_cm_id().  That deadlocks the process because the iw_cm
worker thread is already stuck trying to post an event to the rdma_cm for the
cm_id handling the event.
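
As a rough illustration of the "free on the last deref" idea, here is a minimal
user-space sketch.  It is not the iw_cm code: struct toy_cm_id, the toy_*
helpers and the freed_flag field are all hypothetical names made up for this
example, and C11 atomics stand in for the kernel's refcounting.  The point it
tries to show is that destroy only drops the caller's reference and never
waits for the count to reach zero, so whoever drops the last reference does
the free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a private cm_id; NOT the kernel structure. */
struct toy_cm_id {
    atomic_int refcount;
    bool *freed_flag;   /* illustration only: lets the caller observe the free */
};

/* Create an object holding one reference (the creator's). */
static struct toy_cm_id *toy_cm_id_create(bool *freed_flag)
{
    struct toy_cm_id *id = malloc(sizeof(*id));

    if (!id)
        return NULL;
    atomic_init(&id->refcount, 1);
    id->freed_flag = freed_flag;
    *freed_flag = false;
    return id;
}

/* Take an extra reference, e.g. while an event for this id is in flight. */
static void toy_cm_id_get(struct toy_cm_id *id)
{
    atomic_fetch_add(&id->refcount, 1);
}

/* Drop one reference; returns true if this call freed the object. */
static bool toy_cm_id_put(struct toy_cm_id *id)
{
    int old = atomic_fetch_sub(&id->refcount, 1);

    assert(old > 0);
    if (old == 1) {
        *id->freed_flag = true;
        free(id);
        return true;
    }
    return false;
}

/*
 * Destroy drops the caller's reference and returns immediately.
 * It does not block waiting for other holders to finish.
 */
static bool toy_cm_id_destroy(struct toy_cm_id *id)
{
    return toy_cm_id_put(id);
}
```

In this model, if an event path took a reference with toy_cm_id_get() before
the application called toy_cm_id_destroy(), the destroy returns false without
freeing, and the later toy_cm_id_put() from the event path frees the object.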

Perhaps I should describe the deadlock in detail like I did in the email threads
leading up to this series?

While I'm rambling, there is still a condition that probably needs to be
addressed:  if the application event handler function disconnects the cm_id that
is handling the event, the iw_cm workq thread gets stuck posting an
IW_CM_EVENT_CLOSE to rdma_cm.  The iw_cm workq thread is stuck in
cm_close_handler(), which calls cm_id_priv->id.cm_handler().  That is
cma_iw_handler(), and it is blocked in cma_disable_callback() because the
application is currently running its event handler for this cm_id.  This block
is released when the application returns from its event handler function.

But maybe cma_iw_handler() should queue the event if it cannot deliver it,
rather than blocking the iw_cm workq thread?
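
To sketch what "queue instead of block" could look like, here is a small
single-threaded user-space model.  Everything in it is an assumption made for
illustration: struct toy_id, toy_deliver(), toy_app_handler() and the TOY_*
constants are hypothetical, there is no locking, and it says nothing about how
rdma_cm would actually have to implement this.  If an event arrives while the
handler for that id is already running, the event is parked on a small pending
list and delivered once the handler returns, so the deliverer never waits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8
#define TOY_LOG_MAX     16

enum { TOY_EV_ESTABLISHED = 1, TOY_EV_CLOSE = 2 };   /* hypothetical events */

struct toy_id;
typedef void (*toy_handler_t)(struct toy_id *id, int event);

struct toy_id {
    toy_handler_t handler;
    bool in_handler;                 /* true while the handler is running */
    int pending[TOY_MAX_PENDING];    /* events parked during a callback */
    size_t npending;
    int log[TOY_LOG_MAX];            /* events the handler has seen, in order */
    size_t nlog;
};

/* Returns 0 if the event was delivered or queued, -1 if the queue is full. */
static int toy_deliver(struct toy_id *id, int event)
{
    assert(id->handler != NULL);

    if (id->in_handler) {
        /* Handler busy: park the event instead of blocking. */
        if (id->npending == TOY_MAX_PENDING)
            return -1;
        id->pending[id->npending++] = event;
        return 0;
    }

    id->in_handler = true;
    id->handler(id, event);

    /* Drain, in FIFO order, whatever was queued while the handler ran. */
    for (size_t i = 0; i < id->npending; i++)
        id->handler(id, id->pending[i]);
    id->npending = 0;
    id->in_handler = false;
    return 0;
}

/*
 * Example handler: records each event, and "disconnects" from inside the
 * ESTABLISHED callback, which in this model generates a CLOSE event.
 */
static void toy_app_handler(struct toy_id *id, int event)
{
    if (id->nlog < TOY_LOG_MAX)
        id->log[id->nlog++] = event;
    if (event == TOY_EV_ESTABLISHED)
        toy_deliver(id, TOY_EV_CLOSE);
}
```

With toy_app_handler installed, delivering TOY_EV_ESTABLISHED results in the
handler seeing ESTABLISHED and then CLOSE, with the CLOSE delivered only after
the first callback has returned.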

Steve.
