[PATCH v2 0/2] nvme_fc: rework io timeouts and termination

linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] nvme_fc: rework io timeouts and termination
@ 2017-10-19 23:11 James Smart
  2017-10-19 23:11 ` [PATCH v2 1/2] nvme_fc: correct io termination handling James Smart
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: James Smart @ 2017-10-19 23:11 UTC (permalink / raw)


This is a rework of a prior patch that has now been split into two
patches. One patch addresses changes to the io timeout error handler.
The other patch addresses io completions for failing ios to ensure
io retries or to fail commands during association create.

Patches also address prior review comments

James Smart (2):
  nvme_fc: correct io termination handling
  nvme_fc: correct io timeout behavior

 drivers/nvme/host/fc.c | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

-- 
2.13.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] nvme_fc: correct io termination handling
  2017-10-19 23:11 [PATCH v2 0/2] nvme_fc: rework io timeouts and termination James Smart
@ 2017-10-19 23:11 ` James Smart
  2017-10-20  8:15   ` Johannes Thumshirn
  2017-10-19 23:11 ` [PATCH v2 2/2] nvme_fc: correct io timeout behavior James Smart
  2017-10-20 10:13 ` [PATCH v2 0/2] nvme_fc: rework io timeouts and termination Christoph Hellwig
  2 siblings, 1 reply; 6+ messages in thread
From: James Smart @ 2017-10-19 23:11 UTC (permalink / raw)


The io completion handling for i/o's that are failing due to
to a transport error or association termination had issues, causing
io failures (DNR set so retries didn't kick in) or long stalls.

Change the io completion handler for the following items:

When an io has been completed due to a transport abort (based on an
exchange error) or when marked as aborted as part of an association
termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
NVME_SC_ABORTED. By default, do not set DNR on the status so that a
retry can be attempted after association recreate.

In cases where an io is failed (non-successful nvme status including
aborted), if the controller is being deleted (blk_queue_dying) or
the io was part of the ios used for association creation (ctrl state
is NEW or RECONNECTING), then additionally set the DNR bit so the io
will not be retried. If the failed io was part of association creation,
the failure will tear down the partially completioned association and
typically restart a new reconnect attempt (another create association
later).

Rearranged code flow to remove a largely unneeded local variable.

Signed-off-by: James Smart <james.smart at broadcom.com>
---
 drivers/nvme/host/fc.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 18e036c6833b..ce8617188d3a 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1387,7 +1387,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 	struct nvme_command *sqe = &op->cmd_iu.sqe;
 	__le16 status = cpu_to_le16(NVME_SC_SUCCESS << 1);
 	union nvme_result result;
-	bool complete_rq, terminate_assoc = true;
+	bool terminate_assoc = true;
 
 	/*
 	 * WARNING:
@@ -1429,8 +1429,9 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 	fc_dma_sync_single_for_cpu(ctrl->lport->dev, op->fcp_req.rspdma,
 				sizeof(op->rsp_iu), DMA_FROM_DEVICE);
 
-	if (atomic_read(&op->state) == FCPOP_STATE_ABORTED)
-		status = cpu_to_le16((NVME_SC_ABORT_REQ | NVME_SC_DNR) << 1);
+	if (atomic_read(&op->state) == FCPOP_STATE_ABORTED ||
+			op->flags & FCOP_FLAGS_TERMIO)
+		status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
 	else if (freq->status)
 		status = cpu_to_le16(NVME_SC_INTERNAL << 1);
 
@@ -1494,23 +1495,27 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 done:
 	if (op->flags & FCOP_FLAGS_AEN) {
 		nvme_complete_async_event(&queue->ctrl->ctrl, status, &result);
-		complete_rq = __nvme_fc_fcpop_chk_teardowns(ctrl, op);
+		__nvme_fc_fcpop_chk_teardowns(ctrl, op);
 		atomic_set(&op->state, FCPOP_STATE_IDLE);
 		op->flags = FCOP_FLAGS_AEN;	/* clear other flags */
 		nvme_fc_ctrl_put(ctrl);
 		goto check_error;
 	}
 
-	complete_rq = __nvme_fc_fcpop_chk_teardowns(ctrl, op);
-	if (!complete_rq) {
-		if (unlikely(op->flags & FCOP_FLAGS_TERMIO)) {
-			status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
-			if (blk_queue_dying(rq->q))
-				status |= cpu_to_le16(NVME_SC_DNR << 1);
-		}
-		nvme_end_request(rq, status, result);
-	} else
+	/*
+	 * Force failures of commands if we're killing the controller
+	 * or have an error on a command used to create an new association
+	 */
+	if (status &&
+	    (blk_queue_dying(rq->q) ||
+	     ctrl->ctrl.state == NVME_CTRL_NEW ||
+	     ctrl->ctrl.state == NVME_CTRL_RECONNECTING))
+		status |= cpu_to_le16(NVME_SC_DNR << 1);
+
+	if (__nvme_fc_fcpop_chk_teardowns(ctrl, op))
 		__nvme_fc_final_op_cleanup(rq);
+	else
+		nvme_end_request(rq, status, result);
 
 check_error:
 	if (terminate_assoc)
-- 
2.13.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] nvme_fc: correct io timeout behavior
  2017-10-19 23:11 [PATCH v2 0/2] nvme_fc: rework io timeouts and termination James Smart
  2017-10-19 23:11 ` [PATCH v2 1/2] nvme_fc: correct io termination handling James Smart
@ 2017-10-19 23:11 ` James Smart
  2017-10-20  8:18   ` Johannes Thumshirn
  2017-10-20 10:13 ` [PATCH v2 0/2] nvme_fc: rework io timeouts and termination Christoph Hellwig
  2 siblings, 1 reply; 6+ messages in thread
From: James Smart @ 2017-10-19 23:11 UTC (permalink / raw)


The transport io timeout behavior wasn't quite correct. It ignored
that the io error handler is supposed to be synchronous so it possibly
allowed the blk request to be restarted while the io associated was
still aborting. Timeouts on reserved commands, those used for
association create, were never timing out thus they hung out forever.

To correct:
If an io is times out while a remoteport is not connected, just
restart the io timer. The lack of connectivity will simultaneously
be resetting the controller, so the reset path will abort and terminate
the io.

If an io is times out while it was marked for transport abort, just
reset the io timer. The abort process is underway and will complete
the io.

Otherwise, if an io times out, abort the io. If the abort was
unsuccessful (unlikely) give up and return not handled.

If the abort was successful, as the abort process is underway it will
terminate the io, so rather than synchronously waiting, just restart
the io timer.

Signed-off-by: James Smart <james.smart at broadcom.com>
---
 drivers/nvme/host/fc.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index ce8617188d3a..78091bc7af69 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1903,13 +1903,14 @@ nvme_fc_timeout(struct request *rq, bool reserved)
 	struct nvme_fc_ctrl *ctrl = op->ctrl;
 	int ret;
 
-	if (reserved)
+	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE ||
+			atomic_read(&op->state) == FCPOP_STATE_ABORTED)
 		return BLK_EH_RESET_TIMER;
 
 	ret = __nvme_fc_abort_op(ctrl, op);
 	if (ret)
-		/* io wasn't active to abort consider it done */
-		return BLK_EH_HANDLED;
+		/* io wasn't active to abort */
+		return BLK_EH_NOT_HANDLED;
 
 	/*
 	 * we can't individually ABTS an io without affecting the queue,
@@ -1920,7 +1921,12 @@ nvme_fc_timeout(struct request *rq, bool reserved)
 	 */
 	nvme_fc_error_recovery(ctrl, "io timeout error");
 
-	return BLK_EH_HANDLED;
+	/*
+	 * the io abort has been initiated. Have the reset timer
+	 * restarted and the abort completion will complete the io
+	 * shortly. Avoids a synchronous wait while the abort finishes.
+	 */
+	return BLK_EH_RESET_TIMER;
 }
 
 static int
-- 
2.13.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] nvme_fc: correct io termination handling
  2017-10-19 23:11 ` [PATCH v2 1/2] nvme_fc: correct io termination handling James Smart
@ 2017-10-20  8:15   ` Johannes Thumshirn
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2017-10-20  8:15 UTC (permalink / raw)


Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] nvme_fc: correct io timeout behavior
  2017-10-19 23:11 ` [PATCH v2 2/2] nvme_fc: correct io timeout behavior James Smart
@ 2017-10-20  8:18   ` Johannes Thumshirn
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Thumshirn @ 2017-10-20  8:18 UTC (permalink / raw)


Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 0/2] nvme_fc: rework io timeouts and termination
  2017-10-19 23:11 [PATCH v2 0/2] nvme_fc: rework io timeouts and termination James Smart
  2017-10-19 23:11 ` [PATCH v2 1/2] nvme_fc: correct io termination handling James Smart
  2017-10-19 23:11 ` [PATCH v2 2/2] nvme_fc: correct io timeout behavior James Smart
@ 2017-10-20 10:13 ` Christoph Hellwig
  2 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2017-10-20 10:13 UTC (permalink / raw)


Thanks,

applied the series to nvme-4.15.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-10-20 10:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-19 23:11 [PATCH v2 0/2] nvme_fc: rework io timeouts and termination James Smart
2017-10-19 23:11 ` [PATCH v2 1/2] nvme_fc: correct io termination handling James Smart
2017-10-20  8:15   ` Johannes Thumshirn
2017-10-19 23:11 ` [PATCH v2 2/2] nvme_fc: correct io timeout behavior James Smart
2017-10-20  8:18   ` Johannes Thumshirn
2017-10-20 10:13 ` [PATCH v2 0/2] nvme_fc: rework io timeouts and termination Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).