linux-rdma.vger.kernel.org archive mirror
From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Sagi Grimberg' <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	mlin-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Subject: RE: [PATCH RFC 0/3] iwarp device removal deadlock fix
Date: Wed, 20 Jul 2016 08:49:06 -0500
Message-ID: <027201d1e28d$7be227a0$73a676e0$@opengridcomputing.com>
In-Reply-To: <578F3A90.1000208-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

> > This RFC series attempts to address the deadlock issue discovered
> > while testing nvmf/rdma handling rdma device removal events from
> > the rdma_cm.
> 
> Thanks for doing this Steve!
> 
> > For a discussion of the deadlock that can happen, see
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2016-July/005440.html.
> >
> > For my description of the deadlock itself, see this post in the above
> > thread:
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2016-July/005465.html
> >
> > In a nutshell, iw_cxgb4 and the iw_cm block during qp/cm_id destruction
> > until all references are removed.  This combined with the iwarp CM passing
> > disconnect events up to the rdma_cm during disconnect and/or qp/cm_id
> > destruction leads to a deadlock.
> >
> > My proposed solution is to remove the need for iw_cxgb4 and iw_cm to
> > block during object destruction for the refcnts to reach 0, but rather to
> > let the freeing of the object memory be deferred when the last deref is
> > done. This allows all the qps/cm_ids to be destroyed without blocking, and
> > all the object memory freeing ends up happening when the application's
> > device_remove event handler function returns to the rdma_cm.
> 
> This sounds like a very good approach moving forward.
> 
> > Sean, I was hoping you could have a look at the iwcm.c patch particularly,
> > to tell me why it's broken. :)  I spent some time trying to figure out
> > why we really need the CALLBACK_DESTROY flag, but I concluded it really
> > isn't needed.  The one side effect I see with my change, is that the
> > application could possibly get a cm_id event after it has destroyed the
> > cm_id.  There probably is a way to discard events that have a reference
> > on the cm_id but get processed after the app has destroyed the cm_id by
> > having a new flag indicating "destroyed by app".
> 

By the way, I think Sean is on sabbatical until 9/12. 

> That sounds easy enough. Does this mean that iwcm relies on the driver
> to do this or is it inter-operable with the existing logic? If not this
> will need to take care of all the iWARP drivers.

This can be handled all in the iw_cm module.  In fact, I'm testing a new version
of the iw_cm patch now.
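
For anyone following along, here is a minimal userspace sketch of the idea
discussed above: destroy does not block, the memory is freed on the last
deref, and a "destroyed by app" flag lets late events be discarded.  This is
not the iw_cm patch.  Every name in it (struct demo_cm_id, demo_put, and so
on) is hypothetical, and it uses a plain int where kernel code would use an
atomic refcount under the proper locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cm_id; NOT the real struct iw_cm_id. */
struct demo_cm_id {
	int refcount;          /* single-threaded sketch; real code needs atomics */
	bool destroyed_by_app; /* hypothetical "destroyed by app" flag */
	int events_delivered;  /* events actually passed up to the app */
	bool *freed;           /* set to true when the memory is released */
};

/* Allocate an id holding one reference, owned by the application. */
struct demo_cm_id *demo_alloc(bool *freed_flag)
{
	struct demo_cm_id *id = calloc(1, sizeof(*id));

	if (!id)
		return NULL;
	id->refcount = 1;
	id->freed = freed_flag;
	*freed_flag = false;
	return id;
}

/* Take a reference, e.g. on behalf of a queued event. */
void demo_get(struct demo_cm_id *id)
{
	id->refcount++;
}

/* Drop a reference; the last deref frees the object. Returns true if freed. */
bool demo_put(struct demo_cm_id *id)
{
	assert(id->refcount > 0);
	if (--id->refcount > 0)
		return false;
	*id->freed = true;
	free(id);
	return true;
}

/*
 * Non-blocking destroy: mark the id as destroyed and drop the app's
 * reference. If events still hold references, freeing is deferred.
 */
bool demo_destroy(struct demo_cm_id *id)
{
	id->destroyed_by_app = true;
	return demo_put(id);
}

/*
 * Process one queued event. The caller's reference is consumed here.
 * Events arriving after the app destroyed the id are discarded.
 * Returns true if the event was delivered to the app.
 */
bool demo_deliver_event(struct demo_cm_id *id)
{
	bool delivered = false;

	if (!id->destroyed_by_app) {
		id->events_delivered++;
		delivered = true;
	}
	demo_put(id); /* id may be freed here; do not touch it afterwards */
	return delivered;
}
```

With one event still queued, demo_destroy() returns immediately without
freeing; the later demo_deliver_event() discards the event and its put
releases the memory.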

Steve.


