* [PATCH v3 1/4] nvme_fc: remove reinit_request routine
2018-06-12 23:50 [PATCH v3 0/4] nvme_fc: asynchronous controller create and simple discovery James Smart
@ 2018-06-12 23:50 ` James Smart
2018-06-13 11:28 ` Christoph Hellwig
2018-06-12 23:50 ` [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path James Smart
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: James Smart @ 2018-06-12 23:50 UTC (permalink / raw)
The reinit_request routine is not necessary. Remove support for the
op callback.
As all that nvme_reinit_tagset() does is iterate over the tagset and
call the reinit routine, it too has no purpose. Remove the call.
Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
A separate effort will look at other areas in nvme that may be
able to remove the reinit interface as well.
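For reference, the IU fields the removed routine re-wrote are already
populated once when the request pdu is first set up, which is why a
reinit pass adds nothing. A paraphrased sketch of that one-time setup
(as in __nvme_fc_init_request(); field names taken from the hunk below):

	struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
	struct nvme_fc_cmd_iu *cmdiu = &op->cmd_iu;

	memset(op, 0, sizeof(*op));		/* zeroes cmd_iu and rsp_iu once */
	cmdiu->scsi_id = NVME_CMD_SCSI_ID;	/* constant across reconnects */
	cmdiu->fc_id = NVME_CMD_FC_ID;
	cmdiu->iu_len = cpu_to_be16(sizeof(*cmdiu) / sizeof(u32));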
drivers/nvme/host/fc.c | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 0bad65803271..a4e508d2800b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1470,21 +1470,6 @@ nvme_fc_xmt_disconnect_assoc(struct nvme_fc_ctrl *ctrl)
static void nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg);
-static int
-nvme_fc_reinit_request(void *data, struct request *rq)
-{
- struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
- struct nvme_fc_cmd_iu *cmdiu = &op->cmd_iu;
-
- memset(cmdiu, 0, sizeof(*cmdiu));
- cmdiu->scsi_id = NVME_CMD_SCSI_ID;
- cmdiu->fc_id = NVME_CMD_FC_ID;
- cmdiu->iu_len = cpu_to_be16(sizeof(*cmdiu) / sizeof(u32));
- memset(&op->rsp_iu, 0, sizeof(op->rsp_iu));
-
- return 0;
-}
-
static void
__nvme_fc_exit_request(struct nvme_fc_ctrl *ctrl,
struct nvme_fc_fcp_op *op)
@@ -2502,10 +2487,6 @@ nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
nvme_fc_init_io_queues(ctrl);
- ret = nvme_reinit_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
- if (ret)
- goto out_free_io_queues;
-
ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
if (ret)
goto out_free_io_queues;
@@ -2917,7 +2898,6 @@ static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
.submit_async_event = nvme_fc_submit_async_event,
.delete_ctrl = nvme_fc_delete_ctrl,
.get_address = nvmf_get_address,
- .reinit_request = nvme_fc_reinit_request,
};
static void
--
2.13.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path
2018-06-12 23:50 [PATCH v3 0/4] nvme_fc: asynchronous controller create and simple discovery James Smart
2018-06-12 23:50 ` [PATCH v3 1/4] nvme_fc: remove reinit_request routine James Smart
@ 2018-06-12 23:50 ` James Smart
2018-06-13 11:31 ` Christoph Hellwig
2018-06-12 23:50 ` [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect James Smart
2018-06-12 23:50 ` [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device James Smart
3 siblings, 1 reply; 11+ messages in thread
From: James Smart @ 2018-06-12 23:50 UTC (permalink / raw)
Current code follows the framework that has been in the transports
from the beginning, where the initial link-side controller connect occurs
as part of "creating the controller". Thus that first connect fully
talks to the controller and obtains values that can then be used
for blk-mq setup, etc. It also means that everything about the
controller is fully known before the "create controller" call returns.
This has several weaknesses:
- The initial create_ctrl call made by the cli will block for a long
time as wire transactions are performed synchronously. This delay
becomes longer if errors occur or connectivity is lost and retries
need to be performed.
- Code-wise, it means there is a separate connect path for the initial
controller connect vs the (same) steps used in the reconnect path.
- And as there are separate paths, there is separate error
handling and retry logic. It also plays havoc with the NEW state
(which should be transitioned out of after a successful initial connect)
vs the RESETTING and CONNECTING (reconnect) states that want to be
transitioned to on error.
- As there are separate paths, recovering from errors and disruptions
requires separate recovery/retry paths as well, which can severely
convolute the controller state.
- Given the duration of the initial create_ctrl() call, automation
of dynamic discovery can't use simplistic udev rules. Instead,
more convoluted use of systemd is required.
This patch reworks the fc transport to use the same connect paths
for the initial connection as it uses for reconnect. This makes a
single path for error recovery and handling. Given that the initial
connect becomes asynchronous on a background work thread, the
create_ctrl() call now returns very quickly, allowing the simplified
auto-connect scripting.
This patch:
- Removes the driving of the initial connect and replaces it with
a state transition to CONNECTING and scheduling of the reconnect
work (see the sketch after this list). A dummy state transition
through RESETTING had to be traversed, as a direct transition of
NEW->CONNECTING is not allowed. Given that the controller is "new",
the RESETTING transition is a simple no-op. Once in the reconnect
work, the normal behaviors of ctrl_loss_tmo (max_retries *
connect_delay) and dev_loss_tmo will apply before the controller
is torn down.
- Only if the state transitions can't be traversed or the reconnect
work can't be scheduled will the controller be torn down while
still in create_ctrl.
- The prior code used the controller state of NEW to indicate
whether request queues had been initialized or not. For the admin
queue, the request queue is always created, so there's no need to
check a state. For IO queues, change to tracking whether a successful
io request queue create has occurred (i.e. the 1st successful connect).
- The initial controller id is initialized to the dynamic controller
id used in the initial connect message. It will be overwritten by
the real controller id once the controller is connected on the wire.
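A condensed sketch of the resulting create-time flow (mirroring the
hunk below; error handling collapsed to a single label):

	/* NEW -> RESETTING -> CONNECTING; the RESETTING hop is a no-op
	 * dummy, as NEW -> CONNECTING is not a legal direct transition */
	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING))
		goto fail_ctrl;

	nvme_get_ctrl(&ctrl->ctrl);

	/* the initial connect now runs from the reconnect work;
	 * create_ctrl() returns without waiting for the association */
	if (!queue_delayed_work(nvme_wq, &ctrl->connect_work, 0)) {
		nvme_put_ctrl(&ctrl->ctrl);
		goto fail_ctrl;
	}

	return &ctrl->ctrl;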
Note: this patch is dependent on these changes that have been
added to the nvme-cli utility:
- nvme-cli: Wait for device file if not present after successful add_ctrl
- nvme-cli: Add ioctl retry support for "connect-all"
With this patch, a simplistic udev rule, such as one of the options
below, may now be used to support auto-connectivity:
option A:
ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
RUN+="/usr/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR}"
option B:
SUBSYSTEM!="fc", GOTO="nvme_autoconnect_end"
ACTION!="change", GOTO="nvme_autoconnect_end"
ENV{FC_EVENT}!="nvmediscovery", GOTO="nvme_autoconnect_end"
ENV{NVMEFC_HOST_TRADDR}!="?*", GOTO="nvme_autoconnect_end"
ENV{NVMEFC_TRADDR}!="?*", GOTO="nvme_autoconnect_end"
RUN+="/usr/sbin/nvme connect-all -t fc --host-traddr=$env{NVMEFC_HOST_TRADDR} -a $env{NVMEFC_TRADDR}"
LABEL="nvme_autoconnect_end"
Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
drivers/nvme/host/fc.c | 102 ++++++++++++++++++++++---------------------------
1 file changed, 45 insertions(+), 57 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index a4e508d2800b..7407b464800f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -142,6 +142,7 @@ struct nvme_fc_ctrl {
struct nvme_fc_rport *rport;
u32 cnum;
+ bool ioq_live;
bool assoc_active;
u64 association_id;
@@ -2448,6 +2449,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
if (ret)
goto out_delete_hw_queues;
+ ctrl->ioq_live = true;
+
return 0;
out_delete_hw_queues:
@@ -2596,8 +2599,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
if (ret)
goto out_delete_hw_queue;
- if (ctrl->ctrl.state != NVME_CTRL_NEW)
- blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+ blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
ret = nvmf_connect_admin_queue(&ctrl->ctrl);
if (ret)
@@ -2670,7 +2672,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
*/
if (ctrl->ctrl.queue_count > 1) {
- if (ctrl->ctrl.state == NVME_CTRL_NEW)
+ if (!ctrl->ioq_live)
ret = nvme_fc_create_io_queues(ctrl);
else
ret = nvme_fc_reinit_io_queues(ctrl);
@@ -2757,8 +2759,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
* use blk_mq_tagset_busy_itr() and the transport routine to
* terminate the exchanges.
*/
- if (ctrl->ctrl.state != NVME_CTRL_NEW)
- blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_fc_terminate_exchange, &ctrl->ctrl);
@@ -2914,7 +2915,7 @@ nvme_fc_connect_ctrl_work(struct work_struct *work)
nvme_fc_reconnect_or_delete(ctrl, ret);
else
dev_info(ctrl->ctrl.device,
- "NVME-FC{%d}: controller reconnect complete\n",
+ "NVME-FC{%d}: controller connect complete\n",
ctrl->cnum);
}
@@ -2962,7 +2963,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
{
struct nvme_fc_ctrl *ctrl;
unsigned long flags;
- int ret, idx, retry;
+ int ret, idx;
if (!(rport->remoteport.port_role &
(FC_PORT_ROLE_NVME_DISCOVERY | FC_PORT_ROLE_NVME_TARGET))) {
@@ -2989,11 +2990,13 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
}
ctrl->ctrl.opts = opts;
+ ctrl->ctrl.nr_reconnects = 0;
INIT_LIST_HEAD(&ctrl->ctrl_list);
ctrl->lport = lport;
ctrl->rport = rport;
ctrl->dev = lport->dev;
ctrl->cnum = idx;
+ ctrl->ioq_live = false;
ctrl->assoc_active = false;
init_waitqueue_head(&ctrl->ioabort_wait);
@@ -3012,6 +3015,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
ctrl->ctrl.sqsize = opts->queue_size - 1;
ctrl->ctrl.kato = opts->kato;
+ ctrl->ctrl.cntlid = 0xffff;
ret = -ENOMEM;
ctrl->queues = kcalloc(ctrl->ctrl.queue_count,
@@ -3061,68 +3065,52 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- /*
- * It's possible that transactions used to create the association
- * may fail. Examples: CreateAssociation LS or CreateIOConnection
- * LS gets dropped/corrupted/fails; or a frame gets dropped or a
- * command times out for one of the actions to init the controller
- * (Connect, Get/Set_Property, Set_Features, etc). Many of these
- * transport errors (frame drop, LS failure) inherently must kill
- * the association. The transport is coded so that any command used
- * to create the association (prior to a LIVE state transition
- * while NEW or CONNECTING) will fail if it completes in error or
- * times out.
- *
- * As such: as the connect request was mostly likely due to a
- * udev event that discovered the remote port, meaning there is
- * not an admin or script there to restart if the connect
- * request fails, retry the initial connection creation up to
- * three times before giving up and declaring failure.
- */
- for (retry = 0; retry < 3; retry++) {
- ret = nvme_fc_create_association(ctrl);
- if (!ret)
- break;
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
+ !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ dev_err(ctrl->ctrl.device,
+ "NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
+ goto fail_ctrl;
}
- if (ret) {
- nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
- cancel_work_sync(&ctrl->ctrl.reset_work);
- cancel_delayed_work_sync(&ctrl->connect_work);
+ nvme_get_ctrl(&ctrl->ctrl);
- /* couldn't schedule retry - fail out */
+ if (!queue_delayed_work(nvme_wq, &ctrl->connect_work, 0)) {
+ nvme_put_ctrl(&ctrl->ctrl);
dev_err(ctrl->ctrl.device,
- "NVME-FC{%d}: Connect retry failed\n", ctrl->cnum);
+ "NVME-FC{%d}: failed to schedule initial connect\n",
+ ctrl->cnum);
+ goto fail_ctrl;
+ }
- ctrl->ctrl.opts = NULL;
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
+ ctrl->cnum, ctrl->ctrl.opts->subsysnqn);
- /* initiate nvme ctrl ref counting teardown */
- nvme_uninit_ctrl(&ctrl->ctrl);
+ return &ctrl->ctrl;
- /* Remove core ctrl ref. */
- nvme_put_ctrl(&ctrl->ctrl);
+fail_ctrl:
+ nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
+ cancel_work_sync(&ctrl->ctrl.reset_work);
+ cancel_delayed_work_sync(&ctrl->connect_work);
- /* as we're past the point where we transition to the ref
- * counting teardown path, if we return a bad pointer here,
- * the calling routine, thinking it's prior to the
- * transition, will do an rport put. Since the teardown
- * path also does a rport put, we do an extra get here to
- * so proper order/teardown happens.
- */
- nvme_fc_rport_get(rport);
+ ctrl->ctrl.opts = NULL;
- if (ret > 0)
- ret = -EIO;
- return ERR_PTR(ret);
- }
+ /* initiate nvme ctrl ref counting teardown */
+ nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_get_ctrl(&ctrl->ctrl);
+ /* Remove core ctrl ref. */
+ nvme_put_ctrl(&ctrl->ctrl);
- dev_info(ctrl->ctrl.device,
- "NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
- ctrl->cnum, ctrl->ctrl.opts->subsysnqn);
+ /* as we're past the point where we transition to the ref
+ * counting teardown path, if we return a bad pointer here,
+ * the calling routine, thinking it's prior to the
+ * transition, will do an rport put. Since the teardown
+ * path also does a rport put, we do an extra get here to
+ * so proper order/teardown happens.
+ */
+ nvme_fc_rport_get(rport);
- return &ctrl->ctrl;
+ return ERR_PTR(-EIO);
out_cleanup_admin_q:
blk_cleanup_queue(ctrl->ctrl.admin_q);
--
2.13.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path
2018-06-12 23:50 ` [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path James Smart
@ 2018-06-13 11:31 ` Christoph Hellwig
2018-06-13 15:15 ` James Smart
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2018-06-13 11:31 UTC (permalink / raw)
> Note: this patch is dependent on these changes that have been
> added to the nvme-cli utility:
> - nvme-cli: Wait for device file if not present after successful add_ctrl
> - nvme-cli: Add ioctl retry support for "connect-all"
Which means it breaks backwards compatibility. We can't do that.
IFF we actually end up wanting async connect it needs to be an
explicit opt-in with a new option.
In the meantime I really like this code in general, so please resubmit
it with a flush_delayed_work to make it synchronous.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path
2018-06-13 11:31 ` Christoph Hellwig
@ 2018-06-13 15:15 ` James Smart
0 siblings, 0 replies; 11+ messages in thread
From: James Smart @ 2018-06-13 15:15 UTC (permalink / raw)
On 6/13/2018 4:31 AM, Christoph Hellwig wrote:
>> Note: this patch is dependent on these changes that have been
>> added to the nvme-cli utility:
>> - nvme-cli: Wait for device file if not present after successful add_ctrl
>> - nvme-cli: Add ioctl retry support for "connect-all"
>
> Which means it breaks backwards compatibility. We can't do that.
I don't know what compatibility you are talking about. Both of those
issues could be hit even with the non-async implementation.
The cli has already been updated to address them.
>
> IFF we actually end up wanting async connect it needs to be an
> explicit opt-it with a new option.
>
> In the meantime I really like this code in general, so please resubmit
> it with a flush_delayed_work to make it synchronus.
I disagree. No user is aware of the async vs sync nature. It's the
non-delay of the create thread that allows the simplistic udev scripts.
-- james
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect
2018-06-12 23:50 [PATCH v3 0/4] nvme_fc: asynchronous controller create and simple discovery James Smart
2018-06-12 23:50 ` [PATCH v3 1/4] nvme_fc: remove reinit_request routine James Smart
2018-06-12 23:50 ` [PATCH v3 2/4] nvme_fc: change controllers first connect to use reconnect path James Smart
@ 2018-06-12 23:50 ` James Smart
2018-06-13 11:32 ` Christoph Hellwig
2018-06-12 23:50 ` [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device James Smart
3 siblings, 1 reply; 11+ messages in thread
From: James Smart @ 2018-06-12 23:50 UTC (permalink / raw)
The reconnect path is calling the init routines to clear a queue
structure. But the queue structure has state that needs to persist
as long as the controller is live.
Remove the nvme_fc_init_queue() calls on reconnect.
The nvme_fc_free_queue() calls will clear state bits and reset
any relevant queue state for a new connection.
Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
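The net effect is that per-queue state splits into two lifetimes,
roughly as follows (a sketch of the intent; the csn reset comes from
the hunk below):

	/* controller-lifetime state: set once in nvme_fc_init_queue(),
	 * now called only at controller init / first io queue create */
	queue->ctrl = ctrl;
	queue->qnum = idx;

	/* association-lifetime state: reset in nvme_fc_free_queue()
	 * so each new connection starts clean */
	queue->connection_id = 0;
	atomic_set(&queue->csn, 1);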
drivers/nvme/host/fc.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 7407b464800f..0b9cc767a157 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1879,6 +1879,7 @@ nvme_fc_free_queue(struct nvme_fc_queue *queue)
*/
queue->connection_id = 0;
+ atomic_set(&queue->csn, 1);
}
static void
@@ -2468,7 +2469,7 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
}
static int
-nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
+nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
{
struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
unsigned int nr_io_queues;
@@ -2488,8 +2489,6 @@ nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
if (ctrl->ctrl.queue_count == 1)
return 0;
- nvme_fc_init_io_queues(ctrl);
-
ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
if (ret)
goto out_free_io_queues;
@@ -2587,8 +2586,6 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
* Create the admin queue
*/
- nvme_fc_init_queue(ctrl, 0);
-
ret = __nvme_fc_create_hw_queue(ctrl, &ctrl->queues[0], 0,
NVME_AQ_DEPTH);
if (ret)
@@ -2675,7 +2672,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
if (!ctrl->ioq_live)
ret = nvme_fc_create_io_queues(ctrl);
else
- ret = nvme_fc_reinit_io_queues(ctrl);
+ ret = nvme_fc_recreate_io_queues(ctrl);
if (ret)
goto out_term_aen_ops;
}
@@ -3023,6 +3020,8 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
if (!ctrl->queues)
goto out_free_ida;
+ nvme_fc_init_queue(ctrl, 0);
+
memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
ctrl->admin_tag_set.ops = &nvme_fc_admin_mq_ops;
ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
--
2.13.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect
2018-06-12 23:50 ` [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect James Smart
@ 2018-06-13 11:32 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2018-06-13 11:32 UTC (permalink / raw)
On Tue, Jun 12, 2018 at 04:50:27PM -0700, James Smart wrote:
> The reconnect path is calling the init routines to clear a queue
> structure. But the queue structure has state that perhaps needs
> to persist as long as the controller is live.
>
> Remove the nvme_fc_init_queue() calls on reconnect.
> The nvme_fc_free_queue() calls will clear state bits and reset
> any relevant queue state for a new connection.
>
> Signed-off-by: James Smart <james.smart at broadcom.com>
> Reviewed-by: Hannes Reinecke <hare at suse.com>
Looks fine to me.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device
2018-06-12 23:50 [PATCH v3 0/4] nvme_fc: asynchronous controller create and simple discovery James Smart
` (2 preceding siblings ...)
2018-06-12 23:50 ` [PATCH v3 3/4] nvme_fc: fix nulling of queue data on reconnect James Smart
@ 2018-06-12 23:50 ` James Smart
2018-06-13 11:33 ` Christoph Hellwig
3 siblings, 1 reply; 11+ messages in thread
From: James Smart @ 2018-06-12 23:50 UTC (permalink / raw)
The fc transport device should allow for a rediscovery, as userspace
might have lost the events. An example is udev events not handled
during system startup.
This patch adds a sysfs entry, 'nvme_discovery', on the fc class
device to have it replay all udev discovery events for all local
port/remote port address pairs.
Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Cc: Ewan Milne <emilne at redhat.com>
Cc: Johannes Thumshirn <jthumshirn at suse.com>
---
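For reference, with the attribute attached to the fc class, a
rediscovery can be requested from userspace with a simple write such
as the following (path assumes the fc_udev_device node created by
this module; the written value is ignored, any write triggers the
replay):

	echo 1 > /sys/class/fc/fc_udev_device/nvme_discovery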
drivers/nvme/host/fc.c | 114 +++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 105 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 0b9cc767a157..6e8b4a0a130e 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -122,6 +122,7 @@ struct nvme_fc_rport {
struct list_head endp_list; /* for lport->endp_list */
struct list_head ctrl_list;
struct list_head ls_req_list;
+ struct list_head disc_list;
struct device *dev; /* physical device for dma */
struct nvme_fc_lport *lport;
spinlock_t lock;
@@ -210,7 +211,6 @@ static DEFINE_IDA(nvme_fc_ctrl_cnt);
* These items are short-term. They will eventually be moved into
* a generic FC class. See comments in module init.
*/
-static struct class *fc_class;
static struct device *fc_udev_device;
@@ -507,6 +507,7 @@ nvme_fc_free_rport(struct kref *ref)
list_del(&rport->endp_list);
spin_unlock_irqrestore(&nvme_fc_lock, flags);
+ WARN_ON(!list_empty(&rport->disc_list));
ida_simple_remove(&lport->endp_cnt, rport->remoteport.port_num);
kfree(rport);
@@ -694,6 +695,7 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
INIT_LIST_HEAD(&newrec->endp_list);
INIT_LIST_HEAD(&newrec->ctrl_list);
INIT_LIST_HEAD(&newrec->ls_req_list);
+ INIT_LIST_HEAD(&newrec->disc_list);
kref_init(&newrec->ref);
atomic_set(&newrec->act_ctrl_cnt, 0);
spin_lock_init(&newrec->lock);
@@ -3252,6 +3254,100 @@ static struct nvmf_transport_ops nvme_fc_transport = {
.create_ctrl = nvme_fc_create_ctrl,
};
+static void nvme_fc_discovery_unwind(struct list_head *lheadp)
+{
+ unsigned long flags;
+ struct nvme_fc_lport *lport;
+ struct nvme_fc_rport *rport, *tmp_rport;
+
+ list_for_each_entry_safe(rport, tmp_rport,
+ lheadp, disc_list) {
+ spin_lock_irqsave(&nvme_fc_lock, flags);
+ list_del_init(&rport->disc_list);
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+ lport = rport->lport;
+ /* signal discovery. Won't hurt if it repeats */
+ nvme_fc_signal_discovery_scan(lport, rport);
+ nvme_fc_rport_put(rport);
+ nvme_fc_lport_put(lport);
+ }
+}
+
+/* Arbitrary successive failures max. With lots of subsystems could be high */
+#define DISCOVERY_MAX_FAIL 20
+
+static ssize_t nvme_fc_nvme_discovery_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned long flags;
+ LIST_HEAD(nvme_fc_disc_list);
+ struct nvme_fc_lport *lport;
+ struct nvme_fc_rport *rport, *tmp_rport;
+ int failcnt = 0;
+
+restart:
+ spin_lock_irqsave(&nvme_fc_lock, flags);
+ list_for_each_entry(lport, &nvme_fc_lport_list, port_list) {
+ list_for_each_entry(rport, &lport->endp_list, endp_list) {
+ if (!nvme_fc_lport_get(lport))
+ continue;
+ if (!nvme_fc_rport_get(rport)) {
+ /*
+ * This is a temporary condition, so upon
+ * restart this node will be gone from the
+ * list.
+ */
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+ nvme_fc_lport_put(lport);
+ nvme_fc_discovery_unwind(&nvme_fc_disc_list);
+ if (failcnt++ < DISCOVERY_MAX_FAIL)
+ goto restart;
+ pr_err("nvme_discovery: too many reference "
+ "failures\n");
+ return 0;
+ }
+ if (list_empty(&rport->disc_list))
+ list_add_tail(&rport->disc_list,
+ &nvme_fc_disc_list);
+ }
+ }
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+ list_for_each_entry_safe(rport, tmp_rport,
+ &nvme_fc_disc_list, disc_list) {
+ spin_lock_irqsave(&nvme_fc_lock, flags);
+ list_del_init(&rport->disc_list);
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+ lport = rport->lport;
+ nvme_fc_signal_discovery_scan(lport, rport);
+ nvme_fc_rport_put(rport);
+ nvme_fc_lport_put(lport);
+ }
+ return count;
+}
+static DEVICE_ATTR(nvme_discovery, 0200, NULL, nvme_fc_nvme_discovery_store);
+
+static struct attribute *nvme_fc_attrs[] = {
+ &dev_attr_nvme_discovery.attr,
+ NULL
+};
+
+static struct attribute_group nvme_fc_attr_group = {
+ .attrs = nvme_fc_attrs,
+};
+
+static const struct attribute_group *nvme_fc_attr_groups[] = {
+ &nvme_fc_attr_group,
+ NULL
+};
+
+static struct class fc_class = {
+ .name = "fc",
+ .dev_groups = nvme_fc_attr_groups,
+ .owner = THIS_MODULE,
+};
+
static int __init nvme_fc_init_module(void)
{
int ret;
@@ -3270,16 +3366,16 @@ static int __init nvme_fc_init_module(void)
* put in place, this code will move to a more generic
* location for the class.
*/
- fc_class = class_create(THIS_MODULE, "fc");
- if (IS_ERR(fc_class)) {
+ ret = class_register(&fc_class);
+ if (ret) {
pr_err("couldn't register class fc\n");
- return PTR_ERR(fc_class);
+ return ret;
}
/*
* Create a device for the FC-centric udev events
*/
- fc_udev_device = device_create(fc_class, NULL, MKDEV(0, 0), NULL,
+ fc_udev_device = device_create(&fc_class, NULL, MKDEV(0, 0), NULL,
"fc_udev_device");
if (IS_ERR(fc_udev_device)) {
pr_err("couldn't create fc_udev device!\n");
@@ -3294,9 +3390,9 @@ static int __init nvme_fc_init_module(void)
return 0;
out_destroy_device:
- device_destroy(fc_class, MKDEV(0, 0));
+ device_destroy(&fc_class, MKDEV(0, 0));
out_destroy_class:
- class_destroy(fc_class);
+ class_unregister(&fc_class);
return ret;
}
@@ -3311,8 +3407,8 @@ static void __exit nvme_fc_exit_module(void)
ida_destroy(&nvme_fc_local_port_cnt);
ida_destroy(&nvme_fc_ctrl_cnt);
- device_destroy(fc_class, MKDEV(0, 0));
- class_destroy(fc_class);
+ device_destroy(&fc_class, MKDEV(0, 0));
+ class_unregister(&fc_class);
}
module_init(nvme_fc_init_module);
--
2.13.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device
2018-06-12 23:50 ` [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device James Smart
@ 2018-06-13 11:33 ` Christoph Hellwig
2018-06-13 15:29 ` James Smart
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2018-06-13 11:33 UTC (permalink / raw)
On Tue, Jun 12, 2018 at 04:50:28PM -0700, James Smart wrote:
> The fc transport device should allow for a rediscovery, as userspace
> might have lost the events. Example is udev events not handled during
> system startup.
Please stop adding hacks for your non-standard discovery methods.
> +static void nvme_fc_discovery_unwind(struct list_head *lheadp)
> +{
> + unsigned long flags;
> + struct nvme_fc_lport *lport;
> + struct nvme_fc_rport *rport, *tmp_rport;
> +
> + list_for_each_entry_safe(rport, tmp_rport,
> + lheadp, disc_list) {
> + spin_lock_irqsave(&nvme_fc_lock, flags);
> + list_del_init(&rport->disc_list);
> + spin_unlock_irqrestore(&nvme_fc_lock, flags);
Without the lock taken around the list iteration this is racy.
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH v3 4/4] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device
2018-06-13 11:33 ` Christoph Hellwig
@ 2018-06-13 15:29 ` James Smart
0 siblings, 0 replies; 11+ messages in thread
From: James Smart @ 2018-06-13 15:29 UTC (permalink / raw)
On 6/13/2018 4:33 AM, Christoph Hellwig wrote:
> On Tue, Jun 12, 2018 at 04:50:28PM -0700, James Smart wrote:
>> The fc transport device should allow for a rediscovery, as userspace
>> might have lost the events. Example is udev events not handled during
>> system startup.
>
> Please stop adding hacks for your non-standard discovery methods.
Please stop these boorish anti-fc comments. There is nothing
non-standard or hackish about fc's discovery methods.
This patch simply addresses an issue that arises because nvme discovery
isn't in-kernel and thus has a relationship to the boot process.
>
>> +static void nvme_fc_discovery_unwind(struct list_head *lheadp)
>> +{
>> + unsigned long flags;
>> + struct nvme_fc_lport *lport;
>> + struct nvme_fc_rport *rport, *tmp_rport;
>> +
>> + list_for_each_entry_safe(rport, tmp_rport,
>> + lheadp, disc_list) {
>> + spin_lock_irqsave(&nvme_fc_lock, flags);
>> + list_del_init(&rport->disc_list);
>> + spin_unlock_irqrestore(&nvme_fc_lock, flags);
>
> Without the lock taken around the list iteration this is racy.
>
How so?
This list was built up ahead of time solely for this routine, gets were
done on all objects as they were added to the list, the list isn't
changed except for this routine deleting elements from it, and
a port can't get on this list twice. I don't see the race.
-- james
^ permalink raw reply [flat|nested] 11+ messages in thread