From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl
Date: Mon, 18 Jul 2016 11:34:32 -0500	[thread overview]
Message-ID: <020f01d1e112$43189340$c949b9c0$@opengridcomputing.com> (raw)
In-Reply-To: <01a401d1e10b$b51afe30$1f50fa90$@opengridcomputing.com>

> > > > Hey Sagi, here is some lite reading for you. :)
> > > >
> > > > Prelude: As part of disconnecting an iwarp connection, the iwarp
> > > > provider needs to post an IW_CM_EVENT_CLOSE event to iw_cm, which is
> > > > scheduled onto the singlethread workq thread for iw_cm.
> > > >
> > > > Here is what happens with Sagi's patch:
> > > >
> > > > nvme_rdma_device_unplug() calls nvme_rdma_stop_queue(), which calls
> > > > rdma_disconnect().  This triggers the disconnect.  iw_cxgb4 posts the
> > > > IW_CM_EVENT_CLOSE to iw_cm, which ends up calling cm_close_handler()
> > > > in the iw_cm workq thread context.  cm_close_handler() calls the
> > > > rdma_cm event handler for this cm_id, function cm_iw_handler(), which
> > > > blocks until any currently running event handler for this cm_id
> > > > finishes.  It does this by calling cm_disable_callback().  However,
> > > > since this whole unplug process is running in the event handler
> > > > function for this same cm_id, the iw_cm workq thread is now stuck in
> > > > a deadlock.  nvme_rdma_device_unplug(), however, continues on,
> > > > schedules the controller delete worker thread, and waits for it to
> > > > complete.  The delete controller worker thread tries to disconnect
> > > > and destroy all the remaining IO queues, but gets stuck in the
> > > > destroy() path on the first IO queue because the iw_cm workq thread
> > > > is already stuck, and processing the CLOSE event is required to
> > > > release a reference the iw_cm holds on the iwarp provider's qp.  So
> > > > everything comes to a grinding halt...
> > > >
> > > > Now: Ming's 2 patches avoid this deadlock because the cm_id that
> > > > received the device removal event is disconnected/destroyed _only
> > > > after_ all the controller queues are disconnected/destroyed.  So
> > > > nvme_rdma_device_unplug() doesn't get stuck waiting for the
> > > > controller to delete the io queues, and only after that completes
> > > > does it delete the cm_id/qp that got the device removal event.  It
> > > > then returns, causing the rdma_cm to release the cm_id's callback
> > > > mutex.  This lets the iw_cm workq thread unblock, and we continue
> > > > on.  (Can you say house of cards?)
> > > >
> > > > So the net is: the cm_id that received the device removal event
> > > > _must_ be disconnected/destroyed _last_.
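
To make the required ordering concrete, here's a rough sketch of the shape
of the device-removal path (the helper names delete_io_queues() and
destroy_admin_qp() are made up for illustration; this is not Ming's actual
patch):

	/* Sketch only: called from the DEVICE_REMOVAL event handler. */
	static int device_removal_sketch(struct nvme_rdma_ctrl *ctrl,
					 struct nvme_rdma_queue *admin_queue)
	{
		/*
		 * Tear down every queue *except* the one whose cm_id is
		 * delivering this event.  Their disconnect/destroy needs
		 * the iw_cm workq to make progress, and that workq is
		 * blocked until this handler returns.
		 */
		delete_io_queues(ctrl);

		/*
		 * Only now touch the cm_id/qp that received DEVICE_REMOVAL.
		 * Returning non-zero asks the rdma_cm to destroy this cm_id
		 * itself after the handler has returned.
		 */
		destroy_admin_qp(admin_queue);
		return 1;
	}
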
> > >
> > > Hey Steve, thanks for the detailed description.
> > >
> > > IMHO, this really looks like a buggy design in the iWARP connection
> > > management implementation.  The fact that rdma_disconnect's forward
> > > progress depends on another cm_id's handler execution looks wrong and
> > > backwards to me.
> > >
> >
> > Yea.  I'm not sure how to fix it at this point...
> >
> > > Given that the device is being removed altogether, I don't think
> > > that calling rdma_disconnect is important at all.  Given that
> > > ib_drain_qp makes sure the queue-pair is in error state, does this
> > > incremental patch resolve the iwarp deadlock you are seeing?
> > >
> >
> > I'll need to change c4iw_drain_sq/rq() to move the QP to ERROR
> > explicitly.  When I implemented them I assumed the QP would be
> > disconnected.  Let me try this and report back.
> 
> Just doing the drain has the same issue, because moving the QP to ERROR
> goes through the same process as a disconnect wrt the provider/IWCM
> interactions.
> 
> The reason this close provider/iwcm interaction is needed is the iWARP
> relationship between QPs and the TCP connection (and thus the cm_id).
> The iwarp driver stack manipulates the QP state based on the connection
> events directly.  For example, the transition from IDLE -> RTS is done
> _only_ by the iwarp driver stack.  Similarly, if the peer closes the TCP
> connection (or aborts), the local iwarp driver stack will move the QP
> out of RTS into CLOSING or ERROR.  I think this is different from IB and
> the IBCM.  This, combined with the IWCM holding a reference on the QP
> until the iwarp driver posts a CLOSED event for its iw_cm_id, is what
> causes the problem here.  I'm not justifying the existing logic, just
> trying to explain.
> 
> Perhaps we can explore changing the iw_cm and the iWARP parts of the
> rdma_cm to fix this problem...  I guess the main problem is that qp
> disconnect and destroy basically cannot be done in the event handler.
> Granted, it works if you only destroy the qp associated with the cm_id
> handling the event, but that still causes a deadlock; it's just that the
> event handler can return and thus unblock the deadlock.

I think I can fix this in iw_cxgb4.  Stay tuned. 
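
For illustration only, one generic way to avoid blocking a CM callback is
to defer the heavyweight teardown to a workqueue and let the handler
return right away.  Something along these lines (hypothetical names, not
the iw_cxgb4 or nvme-rdma code; the real driver also has to hold a
reference across the deferral, which is what patch 1/2 addresses):

	#include <linux/kernel.h>
	#include <linux/workqueue.h>
	#include <rdma/ib_verbs.h>
	#include <rdma/rdma_cm.h>

	struct unplug_ctx {		/* hypothetical per-queue context */
		struct rdma_cm_id	*cm_id;
		struct ib_qp		*qp;
		struct work_struct	teardown_work;	/* set up with INIT_WORK() at queue creation */
	};

	static void teardown_work_fn(struct work_struct *work)
	{
		struct unplug_ctx *ctx =
			container_of(work, struct unplug_ctx, teardown_work);

		/* Safe here: we are no longer inside the cm_id's event handler. */
		rdma_disconnect(ctx->cm_id);
		ib_drain_qp(ctx->qp);
		rdma_destroy_qp(ctx->cm_id);
	}

	static int cm_handler_sketch(struct rdma_cm_id *cm_id,
				     struct rdma_cm_event *ev)
	{
		struct unplug_ctx *ctx = cm_id->context;

		if (ev->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
			/* Don't disconnect/destroy inline; just schedule it. */
			schedule_work(&ctx->teardown_work);
			return 0;
		}
		return 0;
	}
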


Thread overview: 26+ messages
2016-07-13 21:26 [PATCH 0/2] nvme-rdma: device removal crash fixes Ming Lin
2016-07-13 21:26 ` [PATCH 1/2] nvme-rdma: grab reference for device removal event Ming Lin
2016-07-13 21:33   ` Steve Wise
2016-07-13 21:26 ` [PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl Ming Lin
2016-07-13 21:33   ` Steve Wise
2016-07-13 23:19   ` J Freyensee
2016-07-13 23:36     ` Ming Lin
2016-07-13 23:59       ` J Freyensee
2016-07-14  6:39         ` Ming Lin
2016-07-14 17:09           ` J Freyensee
2016-07-14 18:04             ` Ming Lin
2016-07-14  9:15   ` Sagi Grimberg
2016-07-14  9:17     ` Sagi Grimberg
2016-07-14 14:30       ` Steve Wise
2016-07-14 14:44         ` Sagi Grimberg
2016-07-14 14:59     ` Steve Wise
     [not found]     ` <011301d1dde0$4450e4e0$ccf2aea0$@opengridcomputing.com>
2016-07-14 15:02       ` Steve Wise
2016-07-14 15:26         ` Steve Wise
2016-07-14 21:27           ` Steve Wise
2016-07-15 15:52             ` Steve Wise
2016-07-17  6:01               ` Sagi Grimberg
2016-07-18 14:55                 ` Steve Wise
2016-07-18 15:47                   ` Steve Wise
2016-07-18 16:34                     ` Steve Wise [this message]
2016-07-18 18:04                       ` Steve Wise
2016-07-13 21:58 ` [PATCH 0/2] nvme-rdma: device removal crash fixes Steve Wise
