From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH v5 0/6] nvme-rdma device removal fixes
Date: Fri, 2 Sep 2016 09:23:40 -0700
Message-ID: <cover.1472833420.git.swise@opengridcomputing.com>
This series addresses several problems that occur when shutting down an
nvme-rdma host while its controllers are attempting to reconnect to a
target that is no longer reachable.
To tickle these bugs:
1) attach over iw_cxgb4 to 10 devices on a target.
2) 'ifconfig down' the target's interface
3) wait for keep-alive to fire and begin reconnecting (~15-20 seconds)
4) do one of these on the host:
   - rmmod iw_cxgb4
   - reboot
   - reboot -f
Doug/Sagi, the first 2 iw_cxgb4 patches are included here because they're
needed for the testing. I've also submitted them to linux-rdma, but
perhaps Sagi can merge them in nvmf-4.8-rc, if that is acceptable to
everyone?
Patch series:
1/6 iw_cxgb4: call dev_put() on l2t allocation failure
2/6 iw_cxgb4: block module unload until all ep resources are released
3/6 nvme_rdma: keep a ref on the ctrl during delete/flush
4/6 nvme-rdma: destroy nvme queue rdma resources on connect failure
5/6 nvme-rdma: add DELETING queue flag
6/6 nvme-rdma: use ib_client API to detect device removal
Changes since v4:
- nvme_rdma_init_queue(): remove the (incorrect) code that checked the
  ALLOCATED flag. Instead just call nvme_rdma_destroy_queue_ib() and
  let it check the ALLOCATED flag.
- nvme_rdma_destroy_queue_ib(): only access queue->device if the
  queue was allocated.
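
The pattern described in the two items above can be sketched in userspace
C. This is only a model under stated assumptions: the model_* names, the
Q_ALLOCATED bit and the struct layouts are hypothetical stand-ins, and C11
atomics take the place of the kernel's bit operations. It is not the code
from the patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum { Q_ALLOCATED = 1u << 0 };

struct model_device {
    int refs;                     /* queues currently using this device */
};

struct model_queue {
    atomic_uint flags;
    struct model_device *device;  /* only valid while Q_ALLOCATED is set */
};

static void model_queue_init(struct model_queue *q)
{
    atomic_init(&q->flags, 0);
    q->device = NULL;
}

/* Allocate the (modelled) IB resources and mark the queue as allocated. */
static void model_create_queue_ib(struct model_queue *q,
                                  struct model_device *dev)
{
    dev->refs++;
    q->device = dev;
    atomic_fetch_or(&q->flags, Q_ALLOCATED);
}

/* Atomically clear a flag bit and report whether it was set before. */
static bool model_test_and_clear(atomic_uint *flags, unsigned int bit)
{
    return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/*
 * The destroy path owns the ALLOCATED check, so callers can invoke it
 * unconditionally. Returns true if teardown ran, false if there was
 * nothing to tear down.
 */
static bool model_destroy_queue_ib(struct model_queue *q)
{
    assert(q != NULL);
    if (!model_test_and_clear(&q->flags, Q_ALLOCATED))
        return false;             /* never allocated: leave q->device alone */
    q->device->refs--;
    q->device = NULL;
    return true;
}
```

In this model a caller on a failed-connect path can call
model_destroy_queue_ib() whether or not allocation got far enough to set
the flag, and a second call is a no-op rather than a double free.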
Changes since v3:
- removed WIP/RFC tag
- fixed a bug in patch 4 where an rdma reject from the target caused a
  double free of the ib queue and cm_id.
- remove noisy pr_info()s in patch 6
- kref_get -> kref_get_unless_zero in nvme_rdma_del_ctrl() of patch 3
- add reviewed-by tags
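
To illustrate the kref_get -> kref_get_unless_zero item above, here is a
rough userspace model of the difference. The model_* names and the -1
return value are assumptions made for this sketch; the kernel's kref is
implemented differently, and this is not the code from patch 3.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a reference count. */
struct model_kref {
    atomic_int refcount;
};

static void model_kref_init(struct model_kref *k)
{
    atomic_init(&k->refcount, 1);
}

/* Take a reference only if the object is still live (count > 0). */
static bool model_kref_get_unless_zero(struct model_kref *k)
{
    int old = atomic_load(&k->refcount);

    while (old != 0) {
        /* On failure the CAS reloads 'old', so the loop re-checks it. */
        if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when this call released the last one. */
static bool model_kref_put(struct model_kref *k)
{
    return atomic_fetch_sub(&k->refcount, 1) == 1;
}

/*
 * Modelled delete path: hold a reference across the delete and flush so
 * the object cannot be freed underneath us. If the count already hit
 * zero, someone else is freeing the object and we must not touch it.
 */
static int model_del_ctrl(struct model_kref *k)
{
    assert(k != NULL);
    if (!model_kref_get_unless_zero(k))
        return -1;                /* hypothetical error value */
    /* ... queue the delete work and flush it here ... */
    model_kref_put(k);
    return 0;
}
```

An unconditional get would bump the count from 0 back to 1 on an object
that is already being torn down; the unless-zero variant refuses instead.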
Changes since v2:
- refactor/simplify the remove_one function.
- nvme-rdma module remove function doesn't need to explicitly
remove the controllers; they will be removed as part of
ib_client unregister.
- removed forward declarations.
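
The reason the module remove function no longer needs to delete the
controllers itself can be shown with a toy client registry. Everything
below is a hypothetical userspace model (the model_* names, the fixed-size
tables, the ctrls counter); it only mimics the behaviour where a client's
remove callback runs for each device when either the device or the client
is unregistered.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX 4

struct model_ib_device {
    const char *name;
    int ctrls;                    /* controllers currently using this device */
};

struct model_ib_client {
    const char *name;
    void (*add)(struct model_ib_device *dev);
    void (*remove)(struct model_ib_device *dev);
};

static struct model_ib_device *model_devices[MODEL_MAX];
static struct model_ib_client *model_clients[MODEL_MAX];

/* Register a device and tell every client about it. */
static int model_register_device(struct model_ib_device *dev)
{
    assert(dev != NULL);
    for (size_t i = 0; i < MODEL_MAX; i++) {
        if (model_devices[i] != NULL)
            continue;
        model_devices[i] = dev;
        for (size_t j = 0; j < MODEL_MAX; j++)
            if (model_clients[j] != NULL && model_clients[j]->add != NULL)
                model_clients[j]->add(dev);
        return 0;
    }
    return -1;                    /* table full */
}

/* Device removal: every client's remove callback runs first. */
static void model_unregister_device(struct model_ib_device *dev)
{
    assert(dev != NULL);
    for (size_t i = 0; i < MODEL_MAX; i++) {
        if (model_devices[i] != dev)
            continue;
        for (size_t j = 0; j < MODEL_MAX; j++)
            if (model_clients[j] != NULL && model_clients[j]->remove != NULL)
                model_clients[j]->remove(dev);
        model_devices[i] = NULL;
        return;
    }
}

/* Register a client and call its add callback for each existing device. */
static int model_register_client(struct model_ib_client *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < MODEL_MAX; i++) {
        if (model_clients[i] != NULL)
            continue;
        model_clients[i] = c;
        for (size_t j = 0; j < MODEL_MAX; j++)
            if (model_devices[j] != NULL && c->add != NULL)
                c->add(model_devices[j]);
        return 0;
    }
    return -1;                    /* table full */
}

/* Client unregister: its remove callback runs for each existing device. */
static void model_unregister_client(struct model_ib_client *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < MODEL_MAX; i++) {
        if (model_clients[i] != c)
            continue;
        for (size_t j = 0; j < MODEL_MAX; j++)
            if (model_devices[j] != NULL && c->remove != NULL)
                c->remove(model_devices[j]);
        model_clients[i] = NULL;
        return;
    }
}

/* Hypothetical client standing in for the nvme-rdma host driver. */
static int model_remove_calls;

static void model_nvme_add_one(struct model_ib_device *dev)
{
    (void)dev;                    /* nothing to do on add in this model */
}

static void model_nvme_remove_one(struct model_ib_device *dev)
{
    dev->ctrls = 0;               /* stand-in for deleting each controller */
    model_remove_calls++;
}

static struct model_ib_client model_nvme_client = {
    .name = "model_nvme_rdma",
    .add = model_nvme_add_one,
    .remove = model_nvme_remove_one,
};
```

In this model both "the device driver goes away" and "the client module
unloads" funnel into the same remove callback, so the module exit path
has no separate controller cleanup to do.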
Changes since v1:
- the big change was the patch 6 rewrite: use the ib_client API to handle
  device removal instead of rdma_cm device removal events.
- tweaked patch 5 to avoid bisect issues
- small code rework on patch 3 based on Christoph's suggestion
- clear_bit() -> !test_and_clear_bit() in patch 4 (Christoph's comment)
- add reviewed-by tags.
---
Sagi Grimberg (1):
  nvme-rdma: add DELETING queue flag

Steve Wise (5):
  iw_cxgb4: call dev_put() on l2t allocation failure
  iw_cxgb4: block module unload until all ep resources are released
  nvme_rdma: keep a ref on the ctrl during delete/flush
  nvme-rdma: destroy nvme queue rdma resources on connect failure
  nvme-rdma: use ib_client API to detect device removal

 drivers/infiniband/hw/cxgb4/cm.c       |   6 +-
 drivers/infiniband/hw/cxgb4/device.c   |   5 ++
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |   1 +
 drivers/nvme/host/rdma.c               | 142 ++++++++++++++++-----------------
 4 files changed, 80 insertions(+), 74 deletions(-)
--
2.7.0
Thread overview:
2016-09-02 16:23 [PATCH v5 0/6] nvme-rdma device removal fixes Steve Wise [this message]
2016-09-01 13:43 ` [PATCH v5 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
2016-09-01 13:44 ` [PATCH v5 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
2016-09-01 16:12 ` [PATCH v5 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
2016-09-02 16:01 ` [PATCH v5 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
2016-09-02 16:01 ` [PATCH v5 5/6] nvme-rdma: add DELETING queue flag Steve Wise
2016-09-02 16:01 ` [PATCH v5 6/6] nvme-rdma: use ib_client API to detect device removal Steve Wise
2016-09-04 7:01 ` [PATCH v5 0/6] nvme-rdma device removal fixes Sagi Grimberg
2016-09-12 14:49 ` Steve Wise
2016-09-12 14:58 ` Sagi Grimberg
2016-09-12 15:00 ` Steve Wise