From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes
Date: Fri, 26 Aug 2016 06:53:05 -0700
Message-ID: <cover.1472219585.git.swise@opengridcomputing.com>
This series is a Work In Progress (WIP) attempting to address several
problems when shutting down an nvme-rdma host while its controllers are
attempting to reconnect to a target that is no longer reachable.
I'm still testing, and there is at least one outstanding bug I'm still
chasing, but I welcome review, especially of the last patch, which
makes sure a device removal can always be detected.
To tickle these bugs:
1) attach over iw_cxgb4 to 10 devices on a target.
2) 'ifconfig down' the target's interface
3) wait for keep-alive to fire and begin reconnecting (~15-20 seconds)
4) do one of these on the host:
- rmmod iw_cxgb4
- reboot
- reboot -f
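For anyone scripting the steps above, here is a minimal dry-run sketch. It only builds the list of commands and executes nothing. The target address, interface name, subsystem NQNs, port 4420, and the 20 second wait are placeholders assumed for illustration; they are not taken from the actual test setup.

```python
def repro_plan(target_addr, target_iface, subsystems, host_action="rmmod iw_cxgb4"):
    """Return (machine, command) steps for the reproduction; nothing is executed."""
    allowed = ("rmmod iw_cxgb4", "reboot", "reboot -f")
    if host_action not in allowed:
        raise ValueError(f"host_action must be one of {allowed}")

    # Step 1: attach to each device on the target (port 4420 is an assumption).
    steps = [
        ("host", f"nvme connect -t rdma -a {target_addr} -s 4420 -n {nqn}")
        for nqn in subsystems
    ]
    # Step 2: take the target's interface down (run on the target).
    steps.append(("target", f"ifconfig {target_iface} down"))
    # Step 3: wait for keep-alive to fire and reconnects to begin (~15-20 seconds).
    steps.append(("host", "sleep 20"))
    # Step 4: trigger the device removal or shutdown on the host.
    steps.append(("host", host_action))
    return steps


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    nqns = [f"nqn.2016-08.example:dev{i}" for i in range(10)]
    for machine, command in repro_plan("10.0.0.2", "eth2", nqns):
        print(f"[{machine}] {command}")
```

Keeping the plan as data makes it easy to swap the final host action between the three variants listed in step 4.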
Note, the default configuration of the Chelsio RNIC is for a LAN/WAN
environment. This causes very long delays due to TCP retransmit
backoff algorithms and basically hangs the host during a shutdown for an
unreasonable amount of time. This is further complicated by the fact
that the RDMA_CM blocks cm_id destruction if the cm_id is attempting
connection setup. I will address this issue with another series.
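To give a rough feel for why retransmit backoff can stall a shutdown, the sketch below sums the waits of a doubling retransmit timer. The initial timeout, cap, and retry count for both profiles are made-up illustrative values, not the actual Chelsio RNIC settings, and the function name is hypothetical.

```python
def total_backoff_seconds(initial_rto, max_rto, retries):
    """Total wait when the retransmit timeout doubles after each attempt, capped at max_rto."""
    total = 0.0
    rto = initial_rto
    for _ in range(retries):
        total += rto
        rto = min(rto * 2, max_rto)
    return total


if __name__ == "__main__":
    # Assumed, illustrative parameters only.
    lan_wan = total_backoff_seconds(initial_rto=1.0, max_rto=64.0, retries=10)
    storage = total_backoff_seconds(initial_rto=0.2, max_rto=1.0, retries=5)
    print(f"LAN/WAN-style profile: {lan_wan:.1f} s per connection attempt")
    print(f"storage-style profile: {storage:.1f} s per connection attempt")
```

With these assumed numbers, a single connection attempt that cannot be torn down early takes minutes in the first profile and only seconds in the second.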
To work around the problem, the Chelsio RNIC can be configured for a
storage cluster environment, where the retransmit timeouts are
much shorter. If anyone is doing this sort of testing, I can provide
you with a config file for the storage/cluster configuration.
Sagi, I included your DELETING patch since it is needed to make forward
progress on my device removal testing.
Thanks,
Steve.
----
Sagi Grimberg (1):
nvme-rdma: add DELETING queue flag
Steve Wise (5):
iw_cxgb4: call dev_put() on l2t allocation failure
iw_cxgb4: block module unload until all ep resources are released
nvme_rdma: keep a ref on the ctrl during delete/flush
nvme-rdma: destroy nvme queue rdma resources on connect failure
nvme-rdma: keep a cm_id around during reconnect to get events
drivers/infiniband/hw/cxgb4/cm.c | 7 +-
drivers/infiniband/hw/cxgb4/device.c | 5 +
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 1 +
drivers/nvme/host/rdma.c | 170 ++++++++++++++++++++++++++-------
4 files changed, 149 insertions(+), 34 deletions(-)
--
2.7.0
Thread overview: 23+ messages
2016-08-26 13:53 Steve Wise [this message]
2016-08-25 20:49 ` [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
2016-08-28 12:42 ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
2016-08-28 12:43 ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
2016-08-26 14:38 ` Christoph Hellwig
2016-08-26 14:41 ` Steve Wise
2016-08-28 12:45 ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
2016-08-26 14:39 ` Christoph Hellwig
2016-08-26 14:42 ` Steve Wise
2016-08-28 12:44 ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag Steve Wise
2016-08-26 14:14 ` Steve Wise
2016-08-28 12:48 ` Sagi Grimberg
2016-08-26 13:52 ` [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events Steve Wise
2016-08-26 14:41 ` Christoph Hellwig
2016-08-26 14:48 ` Steve Wise
2016-08-28 12:56 ` Sagi Grimberg
2016-08-29 7:30 ` Christoph Hellwig
2016-08-29 14:32 ` Sagi Grimberg
2016-08-29 19:42 ` Steve Wise