From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl
Date: Mon, 18 Jul 2016 10:47:36 -0500
Message-ID: <01a401d1e10b$b51afe30$1f50fa90$@opengridcomputing.com>
In-Reply-To: <012601d1e104$5cde3400$169a9c00$@opengridcomputing.com>
> > > Hey Sagi, here is some lite reading for you. :)
> > >
> > > Prelude: As part of disconnecting an iwarp connection, the iwarp
> > > provider needs to post an IW_CM_EVENT_CLOSE event to iw_cm, which
> > > is scheduled onto the singlethread workq thread for iw_cm.
> > >
> > > Here is what happens with Sagi's patch:
> > >
> > > nvme_rdma_device_unplug() calls nvme_rdma_stop_queue(), which calls
> > > rdma_disconnect(). This triggers the disconnect. iw_cxgb4 posts the
> > > IW_CM_EVENT_CLOSE to iw_cm, which ends up calling cm_close_handler()
> > > in the iw_cm workq thread context. cm_close_handler() calls the
> > > rdma_cm event handler for this cm_id, function cm_iw_handler(),
> > > which blocks until any currently running event handler for this
> > > cm_id finishes. It does this by calling cm_disable_callback().
> > > However, since this whole unplug process is running in the event
> > > handler function for this same cm_id, the iw_cm workq thread is now
> > > stuck in a deadlock. nvme_rdma_device_unplug(), however, continues
> > > on, schedules the controller delete worker thread, and waits for it
> > > to complete. The delete controller worker thread tries to disconnect
> > > and destroy all the remaining IO queues, but gets stuck in the
> > > destroy() path on the first IO queue because the iw_cm workq thread
> > > is already stuck, and processing the CLOSE event is required to
> > > release a reference the iw_cm holds on the iwarp provider's qp. So
> > > everything comes to a grinding halt...
> > >
> > > Now: Ming's 2 patches avoid this deadlock because the cm_id that
> > > received the device removal event is disconnected/destroyed _only
> > > after_ all the controller queues are disconnected/destroyed. So
> > > nvme_rdma_device_unplug() doesn't get stuck waiting for the
> > > controller to delete the io queues; only after that completes does
> > > it delete the cm_id/qp that got the device removal event. It then
> > > returns, causing the rdma_cm to release the cm_id's callback mutex.
> > > This lets the iw_cm workq thread unblock, and we continue on. (Can
> > > you say house of cards?)
> > >
> > > So the net is: the cm_id that received the device removal event
> > > _must_ be disconnected/destroyed _last_.
> >
> > Hey Steve, thanks for the detailed description.
> >
> > IMHO, this really looks like a buggy design in the iWARP connection
> > management implementation. The fact that rdma_disconnect's forward
> > progress depends on another cm_id's execution looks wrong and
> > backwards to me.
> >
>
> Yea. I'm not sure how to fix it at this point...
>
> > Given that the device is being removed altogether, I don't think
> > calling rdma_disconnect is important at all. Given that ib_drain_qp
> > makes sure the queue-pair is in the error state, does this
> > incremental patch resolve the iwarp deadlock you are seeing?
> >
>
> I'll need to change c4iw_drain_sq/rq() to move the QP to ERROR
> explicitly. When I implemented them I assumed the QP would be
> disconnected. Let me try this and report back.
Just doing the drain has the same issue, because moving the QP to ERROR
triggers the same provider/IWCM interactions as a disconnect does.
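
For reference, here's roughly what the generic drain does (a sketch
modeled loosely on the upstream __ib_drain_sq() in
drivers/infiniband/core/verbs.c; the helper names below are mine and
the error handling is simplified). The point is the first step: the QP
is moved to ERROR before the marker WR is posted, and on iwarp that
state change alone starts the provider/IWCM close machinery:

#include <linux/completion.h>
#include <rdma/ib_verbs.h>

struct drain_cqe {
        struct ib_cqe cqe;
        struct completion done;
};

static void drain_done(struct ib_cq *cq, struct ib_wc *wc)
{
        struct drain_cqe *d = container_of(wc->wr_cqe,
                                           struct drain_cqe, cqe);

        complete(&d->done);
}

static void drain_sq_sketch(struct ib_qp *qp)
{
        struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
        struct drain_cqe sdrain;
        struct ib_send_wr swr = {}, *bad_swr;

        /* this state change is what kicks off the provider's close
         * path on iwarp, with or without an rdma_disconnect() */
        if (ib_modify_qp(qp, &attr, IB_QP_STATE))
                return;

        sdrain.cqe.done = drain_done;
        init_completion(&sdrain.done);
        swr.wr_cqe = &sdrain.cqe;

        if (ib_post_send(qp, &swr, &bad_swr))
                return;

        /* every WR posted before the marker has flushed by now */
        wait_for_completion(&sdrain.done);
}
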
The reason this close provider/iwcm interaction is needed is the iWARP
relationship between QPs and the TCP connection (and thus the cm_id).
The iwarp driver stack manipulates the QP state based on the connection
events directly. For example, the transition from IDLE -> RTS is done
_only_ by the iwarp driver stack. Similarly, if the peer closes the TCP
connection (or aborts), then the local iwarp driver stack will move the
QP out of RTS into CLOSING or ERROR. I think this is different from IB
and the IBCM. This, combined with the IWCM holding a reference on the
QP until the iwarp driver posts a CLOSED event for its iw_cm_id, is
what causes the problem here. I'm not justifying the existing logic,
just trying to explain.

Perhaps we can explore changing the iw_cm and the iWARP parts of the
rdma_cm to fix this problem... I guess the main problem is that qp
disconnect and destroy basically cannot be done in the event handler.
Granted, it works if you only destroy the qp associated with the cm_id
handling the event, but that still creates the deadlock; it's just that
the event handler can return and thereby unblock it.
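
If the IWCM behavior stays as-is, the ULP-side answer is essentially
what Ming's patches do: defer the disconnect/destroy of the
event-owning queue to another context so the handler can return first.
A hypothetical sketch of that deferral pattern (made-up names, not the
actual nvme-rdma code):

#include <linux/slab.h>
#include <linux/workqueue.h>
#include <rdma/ib_verbs.h>
#include <rdma/rdma_cm.h>

/* Hypothetical example only. Defer the disconnect/destroy to a
 * workqueue so the cm event handler can return first and drop the
 * cm_id's callback mutex. */
struct teardown_work {
        struct work_struct work;
        struct rdma_cm_id *cm_id;
        struct ib_qp *qp;
};

static void do_teardown(struct work_struct *w)
{
        struct teardown_work *t =
                container_of(w, struct teardown_work, work);

        /* safe here: we're no longer inside the event handler, so the
         * iw_cm workq is free to process the CLOSE event */
        rdma_disconnect(t->cm_id);
        ib_drain_qp(t->qp);
        rdma_destroy_qp(t->cm_id);
        rdma_destroy_id(t->cm_id);
        kfree(t);
}

static int cm_handler(struct rdma_cm_id *cm_id,
                      struct rdma_cm_event *ev)
{
        if (ev->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                struct teardown_work *t =
                        kzalloc(sizeof(*t), GFP_ATOMIC);

                if (!t)
                        return 1;       /* nonzero: rdma_cm destroys the id */
                INIT_WORK(&t->work, do_teardown);
                t->cm_id = cm_id;
                t->qp = cm_id->qp;
                schedule_work(&t->work);
                return 0;       /* id stays alive; the work destroys it */
        }
        return 0;
}

The obvious wrinkle is flushing that work before module unload, but it
keeps the iw_cm workq free to process the CLOSE event while the
teardown runs.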