* [PATCH v2] nvme-rdma: fix possible hang when issuing commands during ctrl removal
@ 2017-10-23 13:04 Sagi Grimberg
2017-10-23 14:25 ` Christoph Hellwig
2017-10-23 14:42 ` Max Gurtovoy
0 siblings, 2 replies; 3+ messages in thread
From: Sagi Grimberg @ 2017-10-23 13:04 UTC (permalink / raw)
nvme_rdma_queue_is_ready() fails requests in case a queue is not
LIVE. If the controller is in RECONNECTING state, we might be in
this state for a long time (until we successfully reconnect) and
we are better off with failing the request fast. Otherwise, we
fail with BLK_STS_RESOURCE to have the block layer try again
soon.
In case we are removing the controller when the admin queue
is not LIVE, we will terminate the request with BLK_STS_RESOURCE
but it happens before we call blk_mq_start_request() so the
request timeout never expires, and the queue will never get
back to LIVE (because we are removing the controller). This
causes the removal operation to block infinitly [1].
Thus, if we are removing (state DELETING), and the queue is
not LIVE, we need to fail the request permanently as there is
no chance for it to ever complete successfully.
[1]
--
sysrq: SysRq : Show Blocked State
task PC stack pid father
kworker/u66:2 D 0 440 2 0x80000000
Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
Call Trace:
__schedule+0x3e9/0xb00
schedule+0x40/0x90
schedule_timeout+0x221/0x580
io_schedule_timeout+0x1e/0x50
wait_for_completion_io_timeout+0x118/0x180
blk_execute_rq+0x86/0xc0
__nvme_submit_sync_cmd+0x89/0xf0
nvmf_reg_write32+0x4b/0x90 [nvme_fabrics]
nvme_shutdown_ctrl+0x41/0xe0
nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma]
nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma]
nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma]
process_one_work+0x1fd/0x630
worker_thread+0x1db/0x3b0
kthread+0x11e/0x150
ret_from_fork+0x27/0x40
01 D 0 2868 2862 0x00000000
Call Trace:
__schedule+0x3e9/0xb00
schedule+0x40/0x90
schedule_timeout+0x260/0x580
wait_for_completion+0x108/0x170
flush_work+0x1e0/0x270
nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma]
nvme_sysfs_delete+0x2a/0x40
dev_attr_store+0x18/0x30
sysfs_kf_write+0x45/0x60
kernfs_fop_write+0x124/0x1c0
__vfs_write+0x28/0x150
vfs_write+0xc7/0x1b0
SyS_write+0x49/0xa0
entry_SYSCALL_64_fastpath+0x18/0xad
--
Reported-by: Bart Van Assche <bart.vanassche at wdc.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
Changes from v1:
- squashed to ctrl state conditions in nvme_rdma_queue_is_ready
drivers/nvme/host/rdma.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8a4a4a5836a8..fd26bcb9e6b7 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1386,12 +1386,15 @@ nvme_rdma_queue_is_ready(struct nvme_rdma_queue *queue, struct request *rq)
/*
* reconnecting state means transport disruption, which
* can take a long time and even might fail permanently,
- * so we can't let incoming I/O be requeued forever.
- * fail it fast to allow upper layers a chance to
- * failover.
+ * fail fast to give upper layers a chance to failover.
+ * deleting state means that the ctrl will never accept
+ * commands again, fail it permanently.
*/
- if (queue->ctrl->ctrl.state == NVME_CTRL_RECONNECTING)
+ if (queue->ctrl->ctrl.state == NVME_CTRL_RECONNECTING ||
+ queue->ctrl->ctrl.state == NVME_CTRL_DELETING) {
+ nvme_req(rq)->status = NVME_SC_ABORT_REQ;
return BLK_STS_IOERR;
+ }
return BLK_STS_RESOURCE; /* try again later */
}
}
--
2.7.4
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH v2] nvme-rdma: fix possible hang when issuing commands during ctrl removal
2017-10-23 13:04 [PATCH v2] nvme-rdma: fix possible hang when issuing commands during ctrl removal Sagi Grimberg
@ 2017-10-23 14:25 ` Christoph Hellwig
2017-10-23 14:42 ` Max Gurtovoy
1 sibling, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2017-10-23 14:25 UTC (permalink / raw)
Looks fine, I'll queue it up for 4.14.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v2] nvme-rdma: fix possible hang when issuing commands during ctrl removal
2017-10-23 13:04 [PATCH v2] nvme-rdma: fix possible hang when issuing commands during ctrl removal Sagi Grimberg
2017-10-23 14:25 ` Christoph Hellwig
@ 2017-10-23 14:42 ` Max Gurtovoy
1 sibling, 0 replies; 3+ messages in thread
From: Max Gurtovoy @ 2017-10-23 14:42 UTC (permalink / raw)
>
> Reported-by: Bart Van Assche <bart.vanassche at wdc.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> Changes from v1:
> - squashed to ctrl state conditions in nvme_rdma_queue_is_ready
>
> drivers/nvme/host/rdma.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
Looks good,
Reviewed-by: Max Gurtovoy <maxg at mellanox.com>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-10-23 14:42 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-23 13:04 [PATCH v2] nvme-rdma: fix possible hang when issuing commands during ctrl removal Sagi Grimberg
2017-10-23 14:25 ` Christoph Hellwig
2017-10-23 14:42 ` Max Gurtovoy
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.