* [PATCH v2] nvme: expand nvmf_check_if_ready checks
@ 2018-03-27 23:07 James Smart
From: James Smart @ 2018-03-27 23:07 UTC
The nvmf_check_if_ready() checks that were added are very simplistic.
As such, the routine fails a lot of ios during windows of reset or
re-connection. In cases where no multipath option is present, the
error goes back to the caller - the filesystem or application. Not
good.
The common routine was rewritten and calling syntax slightly expanded
so that per-transport is_ready routines don't need to be present.
The transports can call the routine directly. The routine now looks
at controller state to decide the action to take. Some states mandate
io failure. Others define the condition where a command can be accepted.
When the decision is unclear, a generic queue-or-reject check is made
to look for failfast or multipath ios and only fails the io if it is
so marked. Otherwise, the io will be queued and wait for the controller
state to resolve.
Signed-off-by: James Smart <james.smart at broadcom.com>
---
v2:
needed to set nvme status when rejecting io
---
drivers/nvme/host/fabrics.h | 86 +++++++++++++++++++++++++++++++++------------
drivers/nvme/host/fc.c | 12 ++-----
drivers/nvme/host/rdma.c | 14 ++------
drivers/nvme/target/loop.c | 11 ++----
4 files changed, 71 insertions(+), 52 deletions(-)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a3145d90c1d2..e8dfc00b28b6 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -158,35 +158,77 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts);
int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
bool nvmf_should_reconnect(struct nvme_ctrl *ctrl);
-static inline blk_status_t nvmf_check_init_req(struct nvme_ctrl *ctrl,
- struct request *rq)
+static inline blk_status_t nvmf_check_if_ready(struct nvme_ctrl *ctrl,
+ struct request *rq, bool qlive, bool connectivity)
{
struct nvme_command *cmd = nvme_req(rq)->cmd;
- /*
- * We cannot accept any other command until the connect command has
- * completed, so only allow connect to pass.
- */
- if (!blk_rq_is_passthrough(rq) ||
- cmd->common.opcode != nvme_fabrics_command ||
- cmd->fabrics.fctype != nvme_fabrics_type_connect) {
+ if (ctrl->state == NVME_CTRL_LIVE && connectivity)
+ return BLK_STS_OK;
+
+ switch (ctrl->state) {
+ case NVME_CTRL_DELETING:
+ goto reject_io;
+
+ case NVME_CTRL_NEW:
+ case NVME_CTRL_CONNECTING:
+ if (!connectivity)
+ /*
+ * This is the case of starting a new
+ * association but connectivity was lost
+ * before it was fully created. We need to
+ * error the commands used to initialize the
+ * controller so the reconnect can go into a
+ * retry attempt. The commands should all be
+ * marked REQ_FAILFAST_DRIVER, which will hit
+ * the reject path below. Anything else will
+ * be queued while the state settles.
+ */
+ goto reject_or_queue_io;
+
+ if (qlive ||
+ (cmd->common.opcode == nvme_fabrics_command &&
+ cmd->fabrics.fctype == nvme_fabrics_type_connect))
+ /*
+ * let anything to a live queue through.
+ * Typically this will be commands to the admin
+ * queue which are either being used to initialize
+ * the controller or are commands being issued
+ * via the cli/ioctl path.
+ *
+ * if the q isn't live, allow only the connect
+ * command through.
+ */
+ return BLK_STS_OK;
+
/*
- * Connecting state means transport disruption or initial
- * establishment, which can take a long time and even might
- * fail permanently, fail fast to give upper layers a chance
- * to failover.
- * Deleting state means that the ctrl will never accept commands
- * again, fail it permanently.
+ * q isn't live to accept the command.
+ * fall-thru to the reject_or_queue_io clause
*/
- if (ctrl->state == NVME_CTRL_CONNECTING ||
- ctrl->state == NVME_CTRL_DELETING) {
- nvme_req(rq)->status = NVME_SC_ABORT_REQ;
- return BLK_STS_IOERR;
- }
- return BLK_STS_RESOURCE; /* try again later */
+ break;
+
+ /* these cases fall-thru
+ * case NVME_CTRL_LIVE:
+ * case NVME_CTRL_RESETTING:
+ */
+ default:
+ break;
}
- return BLK_STS_OK;
+reject_or_queue_io:
+ /*
+ * Any other new io is something we're not in a state to send
+ * to the device. Default action is to Busy it and retry it
+ * after the controller state is recovered. However, anything
+ * marked for failfast or nvme multipath is immediately failed.
+ * Note: nvme cli/ioctl commands are marked for failfast.
+ */
+ if (!blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
+ return BLK_STS_RESOURCE;
+
+reject_io:
+ nvme_req(rq)->status = NVME_SC_ABORT_REQ;
+ return BLK_STS_IOERR;
}
#endif /* _NVME_FABRICS_H */
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index c6e719b2f3ca..6cb26bcf6ec0 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2277,14 +2277,6 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue,
return BLK_STS_OK;
}
-static inline blk_status_t nvme_fc_is_ready(struct nvme_fc_queue *queue,
- struct request *rq)
-{
- if (unlikely(!test_bit(NVME_FC_Q_LIVE, &queue->flags)))
- return nvmf_check_init_req(&queue->ctrl->ctrl, rq);
- return BLK_STS_OK;
-}
-
static blk_status_t
nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
@@ -2300,7 +2292,9 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *hctx,
u32 data_len;
blk_status_t ret;
- ret = nvme_fc_is_ready(queue, rq);
+ ret = nvmf_check_if_ready(&queue->ctrl->ctrl, rq,
+ test_bit(NVME_FC_Q_LIVE, &queue->flags),
+ ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE);
if (unlikely(ret))
return ret;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f5f460b8045c..e65b762723a3 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1593,17 +1593,6 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
return BLK_EH_HANDLED;
}
-/*
- * We cannot accept any other command until the Connect command has completed.
- */
-static inline blk_status_t
-nvme_rdma_is_ready(struct nvme_rdma_queue *queue, struct request *rq)
-{
- if (unlikely(!test_bit(NVME_RDMA_Q_LIVE, &queue->flags)))
- return nvmf_check_init_req(&queue->ctrl->ctrl, rq);
- return BLK_STS_OK;
-}
-
static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
@@ -1619,7 +1608,8 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
WARN_ON_ONCE(rq->tag < 0);
- ret = nvme_rdma_is_ready(queue, rq);
+ ret = nvmf_check_if_ready(&queue->ctrl->ctrl, rq,
+ test_bit(NVME_RDMA_Q_LIVE, &queue->flags), true);
if (unlikely(ret))
return ret;
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 861d1509b22b..e10987f87603 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -149,14 +149,6 @@ nvme_loop_timeout(struct request *rq, bool reserved)
return BLK_EH_HANDLED;
}
-static inline blk_status_t nvme_loop_is_ready(struct nvme_loop_queue *queue,
- struct request *rq)
-{
- if (unlikely(!test_bit(NVME_LOOP_Q_LIVE, &queue->flags)))
- return nvmf_check_init_req(&queue->ctrl->ctrl, rq);
- return BLK_STS_OK;
-}
-
static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
@@ -166,7 +158,8 @@ static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
struct nvme_loop_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret;
- ret = nvme_loop_is_ready(queue, req);
+ ret = nvmf_check_if_ready(&queue->ctrl->ctrl, req,
+ test_bit(NVME_LOOP_Q_LIVE, &queue->flags), true);
if (unlikely(ret))
return ret;
--
2.13.1
* [PATCH v2] nvme: expand nvmf_check_if_ready checks
From: Christoph Hellwig @ 2018-03-28 8:31 UTC
> +static inline blk_status_t nvmf_check_if_ready(struct nvme_ctrl *ctrl,
> + struct request *rq, bool qlive, bool connectivity)
Please rename qlive to queue_live and explain what connectivity means.
Maybe this should be is_connected? How do we get a command on a not
connected queue?
Also I think the function is large enough now to move out of line.
* [PATCH v2] nvme: expand nvmf_check_if_ready checks
From: James Smart @ 2018-03-28 16:21 UTC
On 3/28/2018 1:31 AM, Christoph Hellwig wrote:
>> +static inline blk_status_t nvmf_check_if_ready(struct nvme_ctrl *ctrl,
>> + struct request *rq, bool qlive, bool connectivity)
>
> Please rename qlive to queue_live and explain what connectivity means.
> Maybe this should be is_connected? How do we get a command on a not
> connected queue?
>
> Also I think the function is large enough now to move out of line.
>
The change requests are fine and I'll repost shortly.
It's fairly easy to get a command on a not connected queue during a
reset or reconnect state.
Both rdma and fc unquiesce the admin queue blk-mq after the link side
association is terminated. rdma unquiesces the io queues blk-mq as well
at that time, while fc leaves the io queues blk-mq quiesced until
min(ctrl_reconnect_tmo,dev_loss_tmo).
The most common case is an ioctl from the cli hitting the admin queue
while there's no link side association (ignoring the connect cmd). On
rdma, new normal io to the io queues will also hit it while the link
side association isn't present.
The connectivity argument means: the transport knows there is no longer
connectivity to the nvme targetport, so ios should be stopped/requeued,
but the actions against the controllers for that targetport, usually
scheduled by workq items, have yet to kick in and tear down the
controller connections. So far, FC has this additional connectivity
check, which trumps the queue state, while rdma doesn't know and thus
hard-sets it to true (connected). Given the other recent changes in
rdma, it may do the same soon if it knows the qp is dead.
-- james