linux-nvme.lists.infradead.org archive mirror
From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
Date: Mon, 29 Aug 2016 14:42:19 -0500	[thread overview]
Message-ID: <01c101d2022d$740f7680$5c2e6380$@opengridcomputing.com> (raw)
In-Reply-To: <13f597d1-0dd3-9c9e-9658-209f6817600a@grimberg.me>

> >> Care to respin your client registration patch so we can judge which
> >> is better?
> >
> > FYI, I also really hate the idea of having to potentially allocate
> > resources on each device at driver load time which the client registration
> > forces us into.
> 
> The client registration doesn't force us to allocate anything.
> It's simply there to trigger cleanups when the device is unplugged...
> 
> static void nvme_rdma_add_one(struct ib_device *device)
> {
> 	/* Do nothing */
> }
> 
> static void nvme_rdma_remove_one(struct ib_device *device,
> 		void *cdata)
> {
> 	/*
> 	 * for each ctrl where (ctrl->dev->device == device)
> 	 * 	queue delete controller
> 	 *
> 	 * flush the workqueue
> 	 */
> }
> 
> static struct ib_client nvme_rdma_client = {
>          .name   = "nvme_rdma",
>          .add    = nvme_rdma_add_one,
>          .remove = nvme_rdma_remove_one
> };
> 
> 
> > I really think we need to take a step back and offer interfaces that don't
> > suck in the core instead of trying to work around RDMA/CM in the core.
> > Unfortunately I don't really know what it takes for that yet.  I'm pretty
> > busy this week, but I'd be happy to reserve a lot of time next week to
> > dig into it unless someone beats me to it.
> 
> I agree we have *plenty* of room to improve in the RDMA_CM interface.
> But this particular problem is that we might get a device removal at
> the very moment we have no cm_id's open, because we are in the middle
> of periodic reconnects. That is why we can't even see the event.
> 
> What sort of interface that would help here did you have in mind?
> 
> > I suspect a big part of that is having a queue state machine in the core,
> 
> We have a queue-pair state machine in the core, but currently it's not
> very useful to consumers, and the silly thing is that the state is not
> represented in the ib_qp struct, so it takes an ib_query_qp call to
> figure it out (one reason being that the QP states and their
> transitions are detailed in the different specs and not all of them
> are synchronous).
> 
> > and getting rid of that horrible RDMA/CM event multiplexer.
> 
> That would be a very nice improvement...
> 

So should I respin the ib_client patch to just do device removal, or am I
wasting my time?
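
If it helps the discussion, here is roughly what the respun patch would
look like -- a sketch only, untested, assuming the ctrl list/mutex and
workqueue names that drivers/nvme/host/rdma.c already uses:

```c
static void nvme_rdma_add_one(struct ib_device *device)
{
	/* Nothing to set up per device; resources are allocated on demand. */
}

static void nvme_rdma_remove_one(struct ib_device *device, void *client_data)
{
	struct nvme_rdma_ctrl *ctrl;

	/* Schedule deletion of every controller bound to this device... */
	mutex_lock(&nvme_rdma_ctrl_mutex);
	list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
		if (ctrl->device->dev == device)
			queue_work(nvme_rdma_wq, &ctrl->delete_work);
	}
	mutex_unlock(&nvme_rdma_ctrl_mutex);

	/* ...and wait for the deletions to finish before the device goes. */
	flush_workqueue(nvme_rdma_wq);
}

static struct ib_client nvme_rdma_ib_client = {
	.name	= "nvme_rdma",
	.add	= nvme_rdma_add_one,
	.remove	= nvme_rdma_remove_one,
};
```

ib_register_client(&nvme_rdma_ib_client) at module init would pair with
ib_unregister_client() at module exit; no per-device allocation anywhere.
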


Thread overview: 23+ messages
2016-08-26 13:53 [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes Steve Wise
2016-08-25 20:49 ` [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
2016-08-28 12:42   ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
2016-08-28 12:43   ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
2016-08-26 14:38   ` Christoph Hellwig
2016-08-26 14:41     ` Steve Wise
2016-08-28 12:45   ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
2016-08-26 14:39   ` Christoph Hellwig
2016-08-26 14:42     ` Steve Wise
2016-08-28 12:44   ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag Steve Wise
2016-08-26 14:14   ` Steve Wise
2016-08-28 12:48     ` Sagi Grimberg
2016-08-26 13:52 ` [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events Steve Wise
2016-08-26 14:41   ` Christoph Hellwig
2016-08-26 14:48     ` Steve Wise
2016-08-28 12:56   ` Sagi Grimberg
2016-08-29  7:30     ` Christoph Hellwig
2016-08-29 14:32       ` Sagi Grimberg
2016-08-29 19:42         ` Steve Wise [this message]
