* [PATCH v3 0/5] nvme_fc: add dev_loss_tmo support
  From: James Smart @ 2017-10-17 23:32 UTC

FC, on the SCSI side, has long had a device loss timeout, which governed
how long it would hide connectivity loss to remote target ports. The
timeout value is maintained in the SCSI FC transport, and admins are used
to going there to maintain it.

Eventually, the SCSI FC transport will be moved into something independent
from and above SCSI so that the SCSI and NVME protocols can be peers. In
the meantime, to add the functionality now and stay in sync with the SCSI
FC transport, the LLDD will be used as the conduit. The initial value for
the timeout can be set by the LLDD when it creates the remoteport via
nvme_fc_register_remoteport(). Later, if the value is updated via the SCSI
transport, the LLDD can call a new nvme_fc routine to update the
remoteport's dev_loss_tmo value.

The nvme fabrics implementation already has a similar timer, the
ctrl_loss_tmo, which is distilled into a max_reconnects count and a
reconnect_delay between attempts; the overall duration until the max is
hit is the ctrl_loss_tmo. This was primarily for transports that didn't
have the ability to track device connectivity and would retry per the
delay until finally giving up.

This patch set implements the FC dev_loss_tmo at the FC port level. The
timer is initiated when connectivity is lost. If connectivity is not
re-established and the timer expires, all controllers on the remote port
will be deleted.

When FC remoteport-level connectivity is lost, all controllers on the
remoteport are reset, which results in them transitioning to a
reconnecting state, at which point the ctrl_loss_tmo behavior kicks in.
Thus a controller may be deleted as soon as either the ctrl_loss_tmo or
the FC port-level dev_loss_tmo expires. If connectivity is re-established
before dev_loss_tmo expires, any controllers on the remoteport, which
would be in a reconnecting state, immediately have a reconnect attempted.

The patches were cut on the nvme-4.15 branch.

Patch 5, which adds the dev_loss_tmo timeout, is dependent on the
nvme_fc_signal_discovery_scan() routine added by this patch:
http://lists.infradead.org/pipermail/linux-nvme/2017-September/012781.html
That patch has been approved but not yet pulled into a tree.

V3:
  In v2, the implementation merged the dev_loss_tmo value into the
  ctlr_loss_tmo in the controller, so only a single timer on each
  controller was running. V3 changed to keep the dev_loss_tmo on the FC
  remoteport and to run it independently of the ctrl_loss_tmo timer,
  except that a loss of connectivity starts both simultaneously.

James Smart (5):
  nvme core: allow controller RESETTING to RECONNECTING transition
  nvme_fc: change ctlr state assignments during reset/reconnect
  nvme_fc: add a dev_loss_tmo field to the remoteport
  nvme_fc: check connectivity before initiating reconnects
  nvme_fc: add dev_loss_tmo timeout and remoteport resume support

 drivers/nvme/host/core.c       |   1 +
 drivers/nvme/host/fc.c         | 337 ++++++++++++++++++++++++++++++++-----
 include/linux/nvme-fc-driver.h |  11 +-
 3 files changed, 310 insertions(+), 39 deletions(-)

--
2.13.1
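As a concrete sketch of the registration-time flow described above: an
LLDD might seed the timeout when it creates the remoteport roughly as
follows. nvme_fc_register_remoteport(), struct nvme_fc_port_info, and the
new dev_loss_tmo field are from this series; the lldd_rport structure and
its field/helper names are hypothetical.

/* Hypothetical LLDD glue: seed dev_loss_tmo at remoteport creation.
 * A dev_loss_tmo of 0 here lets the transport apply its 60s default.
 */
static int lldd_register_nvme_rport(struct lldd_rport *rp)
{
	struct nvme_fc_port_info pinfo = {
		.node_name    = rp->wwnn,
		.port_name    = rp->wwpn,
		.port_role    = FC_PORT_ROLE_NVME_TARGET,
		.port_id      = rp->nport_id,
		.dev_loss_tmo = rp->scsi_dev_loss_tmo, /* sync w/ SCSI side */
	};

	return nvme_fc_register_remoteport(rp->local_nvme_port, &pinfo,
					   &rp->nvme_rport);
}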
* [PATCH v3 1/5] nvme core: allow controller RESETTING to RECONNECTING transition
  From: James Smart @ 2017-10-17 23:32 UTC

Allow the controller state transition: RESETTING to RECONNECTING.

A transport will typically transition from LIVE->RESETTING when initially
performing a reset or recovering from an error. Adding this transition
allows a transport to move to RECONNECTING when it checks/waits for
connectivity, then creates new transport connections and reinits the
controller.

-- james

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

---
v3: no change. incorporated Reviewed-by's
---
 drivers/nvme/host/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 26c8913435b2..aa9e2df27bf7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -205,6 +205,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 	case NVME_CTRL_RECONNECTING:
 		switch (old_state) {
 		case NVME_CTRL_LIVE:
+		case NVME_CTRL_RESETTING:
 			changed = true;
 			/* FALLTHRU */
 		default:
--
2.13.1
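For context, the sequence this newly allowed transition enables in a
transport's reset path looks roughly like the sketch below.
nvme_change_ctrl_state() and the state names are real; the handler body
and the xport_* helpers are illustrative, not the actual FC code:

static void xport_reset_ctrl_work(struct nvme_ctrl *ctrl)
{
	/* core already moved LIVE -> RESETTING when scheduling the reset */
	xport_teardown_association(ctrl);		/* hypothetical */

	/* the transition this patch permits: RESETTING -> RECONNECTING */
	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RECONNECTING))
		return;

	/* check/wait for connectivity, rebuild queues, reinit controller */
	xport_reconnect(ctrl);				/* hypothetical */
}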
* [PATCH v3 1/5] nvme core: allow controller RESETTING to RECONNECTING transition
  From: Johannes Thumshirn @ 2017-10-18 8:26 UTC

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH v3 1/5] nvme core: allow controller RESETTING to RECONNECTING transition
  From: Hannes Reinecke @ 2017-10-20 5:58 UTC

On 10/18/2017 01:32 AM, James Smart wrote:
> Allow the controller state transition: RESETTING to RECONNECTING.
>
> [snip - full patch quoted]

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare at suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH v3 2/5] nvme_fc: change ctlr state assignments during reset/reconnect
  From: James Smart @ 2017-10-17 23:32 UTC

Clean up some of the controller state checks and add the
RESETTING->RECONNECTING state transition.

Specifically:
- The movement of the RESETTING state change and schedule of reset_work
  to the core doesn't work with nvme_fc_error_recovery setting the state
  to RECONNECTING before attempting to reset. Remove the state change,
  as the reset request does it.
- In the rare cases where an error occurs right as we're transitioning
  to LIVE, defer the controller start actions.
- In error handling on teardown of associations while performing initial
  controller creation, avoid quiesce calls on the admin_q. They are
  unneeded.
- Add the RESETTING->RECONNECTING transition in the reset handler.

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>

---
v3: no change. incorporated Reviewed-by's
---
 drivers/nvme/host/fc.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 48c241620d2b..5afb518c39ad 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1881,13 +1881,6 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 	dev_warn(ctrl->ctrl.device,
 		"NVME-FC{%d}: resetting controller\n", ctrl->cnum);
 
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
-		dev_err(ctrl->ctrl.device,
-			"NVME-FC{%d}: error_recovery: Couldn't change state "
-			"to RECONNECTING\n", ctrl->cnum);
-		return;
-	}
-
 	nvme_reset_ctrl(&ctrl->ctrl);
 }
 
@@ -2521,11 +2514,11 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	}
 
 	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
-	WARN_ON_ONCE(!changed);
 
 	ctrl->ctrl.nr_reconnects = 0;
 
-	nvme_start_ctrl(&ctrl->ctrl);
+	if (changed)
+		nvme_start_ctrl(&ctrl->ctrl);
 
 	return 0;	/* Success */
 
@@ -2593,7 +2586,8 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
 	 * use blk_mq_tagset_busy_itr() and the transport routine to
 	 * terminate the exchanges.
 	 */
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	if (ctrl->ctrl.state != NVME_CTRL_NEW)
+		blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_fc_terminate_exchange, &ctrl->ctrl);
 
@@ -2697,12 +2691,8 @@ nvme_fc_del_nvme_ctrl(struct nvme_ctrl *nctrl)
 static void
 nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
 {
-	/* If we are resetting/deleting then do nothing */
-	if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING) {
-		WARN_ON_ONCE(ctrl->ctrl.state == NVME_CTRL_NEW ||
-			     ctrl->ctrl.state == NVME_CTRL_LIVE);
+	if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING)
 		return;
-	}
 
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n",
@@ -2731,9 +2721,17 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
 	int ret;
 
 	nvme_stop_ctrl(&ctrl->ctrl);
+
 	/* will block while waiting for io to terminate */
 	nvme_fc_delete_association(ctrl);
 
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
+		dev_err(ctrl->ctrl.device,
+			"NVME-FC{%d}: error_recovery: Couldn't change state "
+			"to RECONNECTING\n", ctrl->cnum);
+		return;
+	}
+
 	ret = nvme_fc_create_association(ctrl);
 	if (ret)
 		nvme_fc_reconnect_or_delete(ctrl, ret);
--
2.13.1
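Taken together with patch 1, the intended state flow for a reset/reconnect
cycle is roughly the following (a simplified view; DELETING is reached
when either loss timer gives up):

  LIVE --(error or reset request)-------------> RESETTING
       --(association torn down)--------------> RECONNECTING
       --(create_association succeeds)--------> LIVE
       --(ctrl_loss_tmo/dev_loss_tmo expires)-> DELETING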
* [PATCH v3 2/5] nvme_fc: change ctlr state assignments during reset/reconnect
  From: Johannes Thumshirn @ 2017-10-18 8:27 UTC

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH v3 2/5] nvme_fc: change ctlr state assignments during reset/reconnect
  From: Hannes Reinecke @ 2017-10-20 6:00 UTC

On 10/18/2017 01:32 AM, James Smart wrote:
> Clean up some of the controller state checks and add the
> RESETTING->RECONNECTING state transition.
>
> [snip - full patch quoted]
>
>  drivers/nvme/host/fc.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare at suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport
  From: James Smart @ 2017-10-17 23:32 UTC

Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
connectivity loss.

The transport initializes the value in the nvme_fc_register_remoteport()
call. If the value is not set, a default of 60s is used.

Add a new API routine, nvme_fc_set_remoteport_devloss(), which allows the
LLDD to dynamically update the value on an existing remoteport.

Signed-off-by: James Smart <james.smart at broadcom.com>

---
v3:
  Removed the expected reconnect time and min devloss defines. They
  were not that meaningful.
  Incorporated the new nvme_fc_set_remoteport_devloss() routine, which
  was in a different patch in v2. The new routine no longer touches
  controllers; it only modifies the remoteport.
---
 drivers/nvme/host/fc.c         | 31 +++++++++++++++++++++++++++++++
 include/linux/nvme-fc-driver.h | 11 +++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 5afb518c39ad..3ac49e670c38 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -45,6 +45,8 @@ enum nvme_fc_queue_flags {
 
 #define NVMEFC_QUEUE_DELAY	3		/* ms units */
 
+#define NVME_FC_DEFAULT_DEV_LOSS_TMO	60	/* seconds */
+
 struct nvme_fc_queue {
 	struct nvme_fc_ctrl	*ctrl;
 	struct device		*dev;
@@ -587,6 +589,11 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 	newrec->remoteport.port_id = pinfo->port_id;
 	newrec->remoteport.port_state = FC_OBJSTATE_ONLINE;
 	newrec->remoteport.port_num = idx;
+	/* a registration value of dev_loss_tmo=0 results in the default */
+	if (pinfo->dev_loss_tmo)
+		newrec->remoteport.dev_loss_tmo = pinfo->dev_loss_tmo;
+	else
+		newrec->remoteport.dev_loss_tmo = NVME_FC_DEFAULT_DEV_LOSS_TMO;
 
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	list_add_tail(&newrec->endp_list, &lport->endp_list);
@@ -690,6 +697,30 @@ nvme_fc_rescan_remoteport(struct nvme_fc_remote_port *remoteport)
 }
 EXPORT_SYMBOL_GPL(nvme_fc_rescan_remoteport);
 
+int
+nvme_fc_set_remoteport_devloss(struct nvme_fc_remote_port *portptr,
+			u32 dev_loss_tmo)
+{
+	struct nvme_fc_rport *rport = remoteport_to_rport(portptr);
+	struct nvme_fc_ctrl *ctrl;
+	unsigned long flags;
+
+	spin_lock_irqsave(&rport->lock, flags);
+
+	if (portptr->port_state != FC_OBJSTATE_ONLINE) {
+		spin_unlock_irqrestore(&rport->lock, flags);
+		return -EINVAL;
+	}
+
+	/* a dev_loss_tmo of 0 (immediate) is allowed to be set */
+	rport->remoteport.dev_loss_tmo = dev_loss_tmo;
+
+	spin_unlock_irqrestore(&rport->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_fc_set_remoteport_devloss);
+
 
 /* *********************** FC-NVME DMA Handling **************************** */
 
diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h
index 4ea03b9a5c8c..3de268bf86bf 100644
--- a/include/linux/nvme-fc-driver.h
+++ b/include/linux/nvme-fc-driver.h
@@ -40,6 +40,8 @@
  * @node_name: FC WWNN for the port
  * @port_name: FC WWPN for the port
  * @port_role: What NVME roles are supported (see FC_PORT_ROLE_xxx)
+ * @dev_loss_tmo: maximum delay for reconnects to an association on
+ *	this device. Used only on a remoteport.
  *
  * Initialization values for dynamic port fields:
  * @port_id: FC N_Port_ID currently assigned the port. Upper 8 bits must
@@ -50,6 +52,7 @@ struct nvme_fc_port_info {
 	u64	port_name;
 	u32	port_role;
 	u32	port_id;
+	u32	dev_loss_tmo;
 };
 
 
@@ -202,6 +205,9 @@ enum nvme_fc_obj_state {
 *	The length of the buffer corresponds to the local_priv_sz
 *	value specified in the nvme_fc_port_template supplied by
 *	the LLDD.
+ * @dev_loss_tmo: maximum delay for reconnects to an association on
+ *	this device. To modify, lldd must call
+ *	nvme_fc_set_remoteport_devloss().
 *
 * Fields with dynamic values. Values may change base on link state. LLDD
 * may reference fields directly to change them. Initialized by the
@@ -259,10 +265,9 @@ struct nvme_fc_remote_port {
 	u32 port_role;
 	u64 node_name;
 	u64 port_name;
-
 	struct nvme_fc_local_port *localport;
-
 	void *private;
+	u32 dev_loss_tmo;
 
 	/* dynamic fields */
 	u32 port_id;
@@ -448,6 +453,8 @@ int nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *remoteport);
 
 void nvme_fc_rescan_remoteport(struct nvme_fc_remote_port *remoteport);
 
+int nvme_fc_set_remoteport_devloss(struct nvme_fc_remote_port *remoteport,
+			u32 dev_loss_tmo);
 
 
 /*
--
2.13.1
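Note the asymmetry in how a value of 0 is interpreted, per the comments in
the diff. A short illustration (rport here stands for an already
registered remoteport):

	pinfo.dev_loss_tmo = 0;		/* at registration: transport
					 * applies the 60s default */
	...
	nvme_fc_set_remoteport_devloss(rport, 0);
					/* later: dev loss becomes
					 * immediate on connectivity loss */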
* [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport
  From: Johannes Thumshirn @ 2017-10-18 8:28 UTC

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport
  From: Hannes Reinecke @ 2017-10-20 6:04 UTC

On 10/18/2017 01:32 AM, James Smart wrote:
> Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
> connectivity loss.
>
> [snip - full patch quoted]

Is this exported to sysfs somewhere? If not, can we have it in sysfs?
It surely made life _so_ much easier when configuring the odd device...

Other than that:

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare at suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport
  From: James Smart @ 2017-10-20 14:50 UTC

On 10/19/2017 11:04 PM, Hannes Reinecke wrote:
> Is this exported to sysfs somewhere? If not, can we have it in sysfs?
> It surely made life _so_ much easier when configuring the odd device...
>
> Other than that:
>
> Reviewed-by: Hannes Reinecke <hare at suse.com>

Currently, the SCSI transport fc_rport sysfs entry has the dev_loss_tmo
attribute, and when it is set, the driver will call the
nvme_fc_set_remoteport_devloss() routine to set it on the nvme-fc
remoteport.

When we genericize the FC transport, the attribute will be on the generic
fc_rport, and the set will call both the SCSI transport/LLDD and the NVME
remoteport.

-- james
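A sketch of the LLDD path James describes: the set_rport_dev_loss_tmo
callback in the SCSI FC transport's fc_function_template is real, while
the fc_rport-to-nvme-remoteport lookup (lldd_rport_from_fc) is
hypothetical LLDD bookkeeping:

/* Invoked when dev_loss_tmo is written via the SCSI fc_rport sysfs
 * attribute; mirrors the value onto the nvme-fc remoteport.
 */
static void lldd_set_rport_dev_loss_tmo(struct fc_rport *rport, u32 timeout)
{
	struct lldd_rport *rp = lldd_rport_from_fc(rport); /* hypothetical */

	rport->dev_loss_tmo = timeout;			/* SCSI side */
	if (rp && rp->nvme_rport)			/* NVME side */
		nvme_fc_set_remoteport_devloss(rp->nvme_rport, timeout);
}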
* [PATCH v3 4/5] nvme_fc: check connectivity before initiating reconnects
  From: James Smart @ 2017-10-17 23:32 UTC

Check remoteport connectivity before initiating reconnects.

Signed-off-by: James Smart <james.smart at broadcom.com>

---
v3:
  Removed the inline helper that took the lock around a one-line read
  of the port state; replaced its callers with explicit reads of the
  state.
---
 drivers/nvme/host/fc.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3ac49e670c38..61da2f92f71a 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -810,7 +810,6 @@ fc_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 	dma_unmap_sg(dev, sg, nents, dir);
 }
 
-
 /* *********************** FC-NVME LS Handling **************************** */
 
 static void nvme_fc_ctrl_put(struct nvme_fc_ctrl *);
@@ -2454,6 +2453,9 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 
 	++ctrl->ctrl.nr_reconnects;
 
+	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
+		return -ENODEV;
+
 	/*
 	 * Create the admin queue
 	 */
@@ -2730,6 +2732,10 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
 		ctrl->cnum, status);
 
 	if (nvmf_should_reconnect(&ctrl->ctrl)) {
+		/* Only schedule the reconnect if the remote port is online */
+		if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
+			return;
+
 		dev_info(ctrl->ctrl.device,
 			"NVME-FC{%d}: Reconnect attempt in %d seconds.\n",
 			ctrl->cnum, ctrl->ctrl.opts->reconnect_delay);
@@ -2763,12 +2769,15 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
 		return;
 	}
 
-	ret = nvme_fc_create_association(ctrl);
-	if (ret)
-		nvme_fc_reconnect_or_delete(ctrl, ret);
-	else
-		dev_info(ctrl->ctrl.device,
-			"NVME-FC{%d}: controller reset complete\n", ctrl->cnum);
+	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE) {
+		ret = nvme_fc_create_association(ctrl);
+		if (ret)
+			nvme_fc_reconnect_or_delete(ctrl, ret);
+		else
+			dev_info(ctrl->ctrl.device,
+				"NVME-FC{%d}: controller reset complete\n",
+				ctrl->cnum);
+	}
 }
 
 static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
--
2.13.1
* [PATCH v3 4/5] nvme_fc: check connectivity before initiating reconnects
  From: Johannes Thumshirn @ 2017-10-18 8:29 UTC

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH v3 4/5] nvme_fc: check connectivity before initiating reconnects
  From: Hannes Reinecke @ 2017-10-20 6:05 UTC

On 10/18/2017 01:32 AM, James Smart wrote:
> Check remoteport connectivity before initiating reconnects.
>
> Signed-off-by: James Smart <james.smart at broadcom.com>
>
> [snip]
>
>  drivers/nvme/host/fc.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare at suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support
  From: James Smart @ 2017-10-17 23:32 UTC

This patch adds the dev_loss_tmo functionality to the transport.

When a remoteport is unregistered (connectivity lost), the following
actions are taken:
- the remoteport is marked DELETED
- a dev_loss_tmo timer is started for the remoteport
- all controllers on the remoteport are reset

After a controller resets, it will stall in a RECONNECTING state,
waiting for one of the following:
- The controller will continue to attempt reconnects per max_retries
  and reconnect_delay. As there is no remoteport connectivity, the
  reconnect attempts will immediately fail. If the max reconnect
  attempts are reached (e.g. ctrl_loss_tmo reached), the controller
  is deleted.
- The remoteport is re-registered prior to dev_loss_tmo expiring.
  The resume of the remoteport will immediately attempt to reconnect
  each of its suspended controllers.
- The remoteport's dev_loss_tmo expires, causing all of its
  controllers to be deleted.

Signed-off-by: James Smart <james.smart at broadcom.com>

---
v3:
  Reworked so dev_loss_tmo is specific to the remoteport.
  Revised so connectivity loss resets controllers, and connectivity
  gain schedules reconnects.
---
 drivers/nvme/host/fc.c | 289 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 257 insertions(+), 32 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 61da2f92f71a..3628790371d7 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -138,6 +138,7 @@ struct nvme_fc_rport {
 	struct nvme_fc_lport	*lport;
 	spinlock_t		lock;
 	struct kref		ref;
+	struct delayed_work	dev_loss_work;
 } __aligned(sizeof(u64));	/* alignment for other things alloc'd with */
 
 enum nvme_fcctrl_flags {
@@ -503,6 +504,8 @@ nvme_fc_free_rport(struct kref *ref)
 	WARN_ON(rport->remoteport.port_state != FC_OBJSTATE_DELETED);
 	WARN_ON(!list_empty(&rport->ctrl_list));
 
+	cancel_delayed_work_sync(&rport->dev_loss_work);
+
 	/* remove from lport list */
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	list_del(&rport->endp_list);
@@ -530,6 +533,124 @@ nvme_fc_rport_get(struct nvme_fc_rport *rport)
 	return kref_get_unless_zero(&rport->ref);
 }
 
+static void
+nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl)
+{
+	switch (ctrl->ctrl.state) {
+	case NVME_CTRL_NEW:
+	case NVME_CTRL_RECONNECTING:
+		/*
+		 * As all reconnects were suppressed, schedule a
+		 * connect.
+		 */
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: connectivity re-established. "
+			"Attempting reconnect\n", ctrl->cnum);
+
+		queue_delayed_work(nvme_wq, &ctrl->connect_work, 0);
+		break;
+
+	case NVME_CTRL_RESETTING:
+		/*
+		 * Controller is already in the process of terminating the
+		 * association. No need to do anything further. The reconnect
+		 * step will naturally occur after the reset completes.
+		 */
+		break;
+
+	default:
+		/* no action to take - let it delete */
+		break;
+	}
+}
+
+static struct nvme_fc_rport *
+nvme_fc_attach_to_suspended_rport(struct nvme_fc_lport *lport,
+				struct nvme_fc_port_info *pinfo)
+{
+	struct nvme_fc_rport *rport;
+	struct nvme_fc_ctrl *ctrl;
+	unsigned long flags;
+
+	spin_lock_irqsave(&nvme_fc_lock, flags);
+
+	list_for_each_entry(rport, &lport->endp_list, endp_list) {
+		if (rport->remoteport.node_name != pinfo->node_name ||
+		    rport->remoteport.port_name != pinfo->port_name)
+			continue;
+
+		if (!nvme_fc_rport_get(rport)) {
+			rport = ERR_PTR(-ENOLCK);
+			goto out_done;
+		}
+
+		spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+		cancel_delayed_work_sync(&rport->dev_loss_work);
+
+		spin_lock_irqsave(&rport->lock, flags);
+
+		/* has it been unregistered */
+		if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) {
+			/* means lldd called us twice */
+			spin_unlock_irqrestore(&rport->lock, flags);
+			nvme_fc_rport_put(rport);
+			return ERR_PTR(-ESTALE);
+		}
+
+		rport->remoteport.port_state = FC_OBJSTATE_ONLINE;
+
+		/*
+		 * kick off a reconnect attempt on all associations to the
+		 * remote port. A successful reconnect will resume i/o.
+		 */
+		list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list)
+			nvme_fc_resume_controller(ctrl);
+
+		spin_unlock_irqrestore(&rport->lock, flags);
+
+		return rport;
+	}
+
+	rport = NULL;
+
+out_done:
+	spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+	return rport;
+}
+
+static void
+nvme_fc_rport_dev_loss_work(struct work_struct *work)
+{
+	struct nvme_fc_rport *rport =
+		container_of(to_delayed_work(work),
+				struct nvme_fc_rport, dev_loss_work);
+	struct nvme_fc_ctrl *ctrl;
+	unsigned long flags;
+
+	spin_lock_irqsave(&rport->lock, flags);
+
+	/* If port state transitioned dev loss shouldn't kick in */
+	if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) {
+		spin_unlock_irqrestore(&rport->lock, flags);
+		return;
+	}
+
+	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
+		dev_warn(ctrl->ctrl.device,
+			"NVME-FC{%d}: Remote Port failed to reconnect within "
+			"dev_loss_tmo (%d seconds). Deleting controller\n",
+			ctrl->cnum, rport->remoteport.dev_loss_tmo);
+		if (__nvme_fc_del_ctrl(ctrl))
+			dev_warn(ctrl->ctrl.device,
+				"NVME-FC{%d}: delete request failed\n",
+				ctrl->cnum);
+	}
+
+	spin_unlock_irqrestore(&rport->lock, flags);
+}
+
 /**
  * nvme_fc_register_remoteport - transport entry point called by an
  *                              LLDD to register the existence of a NVME
@@ -556,22 +677,46 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 	unsigned long flags;
 	int ret, idx;
 
+	if (!nvme_fc_lport_get(lport)) {
+		ret = -ESHUTDOWN;
+		goto out_reghost_failed;
+	}
+
+	/*
+	 * look to see if there is already a remoteport that is waiting
+	 * for a reconnect (within dev_loss_tmo) with the same WWN's.
+	 * If so, transition to it and reconnect.
+	 */
+	newrec = nvme_fc_attach_to_suspended_rport(lport, pinfo);
+
+	/* found an rport, but something about its state is bad */
+	if (IS_ERR(newrec)) {
+		ret = PTR_ERR(newrec);
+		goto out_lport_put;
+
+	/* found existing rport, which was resumed */
+	} else if (newrec) {
+		/* Ignore pinfo->dev_loss_tmo. Leave rport and ctlr's as is */
+
+		nvme_fc_lport_put(lport);
+		nvme_fc_signal_discovery_scan(lport, newrec);
+		*portptr = &newrec->remoteport;
+		return 0;
+	}
+
+	/* nothing found - allocate a new remoteport struct */
+
 	newrec = kmalloc((sizeof(*newrec) + lport->ops->remote_priv_sz),
 			 GFP_KERNEL);
 	if (!newrec) {
 		ret = -ENOMEM;
-		goto out_reghost_failed;
-	}
-
-	if (!nvme_fc_lport_get(lport)) {
-		ret = -ESHUTDOWN;
-		goto out_kfree_rport;
+		goto out_lport_put;
 	}
 
 	idx = ida_simple_get(&lport->endp_cnt, 0, 0, GFP_KERNEL);
 	if (idx < 0) {
 		ret = -ENOSPC;
-		goto out_lport_put;
+		goto out_kfree_rport;
 	}
 
 	INIT_LIST_HEAD(&newrec->endp_list);
@@ -594,6 +739,7 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 		newrec->remoteport.dev_loss_tmo = pinfo->dev_loss_tmo;
 	else
 		newrec->remoteport.dev_loss_tmo = NVME_FC_DEFAULT_DEV_LOSS_TMO;
+	INIT_DELAYED_WORK(&newrec->dev_loss_work, nvme_fc_rport_dev_loss_work);
 
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	list_add_tail(&newrec->endp_list, &lport->endp_list);
@@ -604,10 +750,10 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 	*portptr = &newrec->remoteport;
 	return 0;
 
-out_lport_put:
-	nvme_fc_lport_put(lport);
 out_kfree_rport:
 	kfree(newrec);
+out_lport_put:
+	nvme_fc_lport_put(lport);
 out_reghost_failed:
 	*portptr = NULL;
 	return ret;
@@ -638,6 +784,61 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
 	return 0;
 }
 
+static void
+nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
+{
+	dev_info(ctrl->ctrl.device,
+		"NVME-FC{%d}: controller connectivity lost. Awaiting "
+		"Reconnect", ctrl->cnum);
+
+	switch (ctrl->ctrl.state) {
+	case NVME_CTRL_NEW:
+	case NVME_CTRL_LIVE:
+		/*
+		 * Schedule a controller reset. The reset will
+		 * terminate the association and schedule the
+		 * reconnect timer. Reconnects will be attempted
+		 * until either the ctlr_loss_tmo
+		 * (max_retries * connect_delay) expires or the
+		 * remoteport's dev_loss_tmo expires.
+		 */
+		if (nvme_reset_ctrl(&ctrl->ctrl)) {
+			dev_warn(ctrl->ctrl.device,
+				"NVME-FC{%d}: Couldn't schedule reset. "
+				"Deleting controller.\n",
+				ctrl->cnum);
+			__nvme_fc_del_ctrl(ctrl);
+		}
+		break;
+
+	case NVME_CTRL_RECONNECTING:
+		/*
+		 * The association has already been terminated
+		 * and the controller is attempting reconnects.
+		 * No need to do anything further. Reconnects will
+		 * be attempted until either the ctlr_loss_tmo
+		 * (max_retries * connect_delay) expires or the
+		 * remoteport's dev_loss_tmo expires.
+		 */
+		break;
+
+	case NVME_CTRL_RESETTING:
+		/*
+		 * Controller is already in the process of
+		 * terminating the association. No need to do
+		 * anything further. The reconnect step will
+		 * kick in naturally after the association is
+		 * terminated.
+		 */
+		break;
+
+	case NVME_CTRL_DELETING:
+	default:
+		/* no action to take - let it delete */
+		break;
+	}
+}
+
 /**
  * nvme_fc_unregister_remoteport - transport entry point called by an
  *                              LLDD to deregister/remove a previously
@@ -667,15 +868,32 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr)
 	}
 	portptr->port_state = FC_OBJSTATE_DELETED;
 
-	/* tear down all associations to the remote port */
-	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list)
-		__nvme_fc_del_ctrl(ctrl);
+	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
+		/* if dev_loss_tmo==0, dev loss is immediate */
+		if (!portptr->dev_loss_tmo) {
+			dev_warn(ctrl->ctrl.device,
+				"NVME-FC{%d}: controller connectivity lost. "
+				"Deleting controller.\n",
+				ctrl->cnum);
+			__nvme_fc_del_ctrl(ctrl);
+		} else
+			nvme_fc_ctrl_connectivity_loss(ctrl);
+	}
 
 	spin_unlock_irqrestore(&rport->lock, flags);
 
 	nvme_fc_abort_lsops(rport);
 
+	queue_delayed_work(nvme_wq, &rport->dev_loss_work,
+			portptr->dev_loss_tmo * HZ);
+
+	/*
+	 * release the reference, which allows the rport to be torn down
+	 * once all controllers go away (which should only occur after
+	 * dev_loss_tmo occurs).
+	 */
 	nvme_fc_rport_put(rport);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nvme_fc_unregister_remoteport);
@@ -702,7 +920,6 @@ nvme_fc_set_remoteport_devloss(struct nvme_fc_remote_port *portptr,
 			u32 dev_loss_tmo)
 {
 	struct nvme_fc_rport *rport = remoteport_to_rport(portptr);
-	struct nvme_fc_ctrl *ctrl;
 	unsigned long flags;
 
 	spin_lock_irqsave(&rport->lock, flags);
@@ -2727,25 +2944,31 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
 	if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING)
 		return;
 
-	dev_info(ctrl->ctrl.device,
-		"NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n",
-		ctrl->cnum, status);
+	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE)
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n",
+			ctrl->cnum, status);
 
 	if (nvmf_should_reconnect(&ctrl->ctrl)) {
-		/* Only schedule the reconnect if the remote port is online */
-		if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
-			return;
-
-		dev_info(ctrl->ctrl.device,
-			"NVME-FC{%d}: Reconnect attempt in %d seconds.\n",
-			ctrl->cnum, ctrl->ctrl.opts->reconnect_delay);
+		if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE)
+			dev_info(ctrl->ctrl.device,
+				"NVME-FC{%d}: Reconnect attempt in %d "
+				"seconds.\n",
+				ctrl->cnum, ctrl->ctrl.opts->reconnect_delay);
 		queue_delayed_work(nvme_wq, &ctrl->connect_work,
 				ctrl->ctrl.opts->reconnect_delay * HZ);
 	} else {
-		dev_warn(ctrl->ctrl.device,
+		if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE)
+			dev_warn(ctrl->ctrl.device,
 				"NVME-FC{%d}: Max reconnect attempts (%d) "
 				"reached. Removing controller\n",
 				ctrl->cnum, ctrl->ctrl.nr_reconnects);
+		else
+			dev_warn(ctrl->ctrl.device,
+				"NVME-FC{%d}: Max reconnect attempts (%d) "
+				"reached while waiting for remoteport "
+				"connectivity. Removing controller\n",
+				ctrl->cnum, ctrl->ctrl.nr_reconnects);
+
 		WARN_ON(__nvme_fc_schedule_delete_work(ctrl));
 	}
 }
@@ -2769,15 +2992,17 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
 		return;
 	}
 
-	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE) {
+	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE)
 		ret = nvme_fc_create_association(ctrl);
-		if (ret)
-			nvme_fc_reconnect_or_delete(ctrl, ret);
-		else
-			dev_info(ctrl->ctrl.device,
-				"NVME-FC{%d}: controller reset complete\n",
-				ctrl->cnum);
-	}
+	else
+		ret = -ENOTCONN;
+
+	if (ret)
+		nvme_fc_reconnect_or_delete(ctrl, ret);
+	else
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: controller reset complete\n",
+			ctrl->cnum);
 }
 
 static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
--
2.13.1
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support
  From: Trapp, Darren @ 2017-10-19 15:46 UTC

It seems like we need a callback to the LLD to let it know that the
dev_loss_tmo work has been canceled when the device comes back. The LLD
has called unregister_remoteport in the transport and done a
wait_for_completion. If the device re-appears before dev_loss_tmo has
been reached, the transport cancels the del_work, but the LLD still has
the completion event pending.

Something like this:

 		spin_unlock_irqrestore(&nvme_fc_lock, flags);
 
-		cancel_delayed_work_sync(&rport->dev_loss_work);
+		if (cancel_delayed_work_sync(&rport->dev_loss_work))
+			if (lport->ops->remote_delete_cancel)
+				lport->ops->remote_delete_cancel(
+							&rport->remoteport);
 
 		spin_lock_irqsave(&rport->lock, flags);

The LLD can complete its task and do whatever else it needs to do,
knowing that the unregister task is dead. Seems like a clean handoff
between the two.

On 10/17/17, 4:34 PM, "Linux-nvme on behalf of James Smart"
<linux-nvme-bounces@lists.infradead.org on behalf of
jsmart2021@gmail.com> wrote:

    This patch adds the dev_loss_tmo functionality to the transport.

    [snip - full patch quoted]
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support 2017-10-19 15:46 ` Trapp, Darren @ 2017-10-19 18:04 ` James Smart 2017-10-19 19:49 ` Trapp, Darren 0 siblings, 1 reply; 20+ messages in thread From: James Smart @ 2017-10-19 18:04 UTC (permalink / raw) On 10/19/2017 8:46 AM, Trapp, Darren wrote: > It seems like we need a call back to the LLD to let it know that the dev_loss_tmo has been canceled upon the device coming back. I don't think so. We already have the desired functionality in the remoteport_delete callback. In SCSI, we had devloss tied 1:1 with the remote_port: devloss started when the remote_port was unregistered, and when dev_loss_tmo fired, thus releasing the remote port, we called the dev_loss_tmo_callbk. So the dev_loss_tmo_callbk is actually the remoteport delete callback. There was also the notion that the SCSI target, which is 1:1 with the remote port, was visible on the OS until that callback occurred. In nvme-fc, the lldd calls the unregister remoteport, and when the remoteport is finished/done, the remoteport_delete will be called back. Currently that callback is based on the remoteport data structure being freed, which means it waits for all controllers to be freed as well. This turns out to be rather ugly, as controller freeing is based on namespace devices being freed, and some apps can hold on to the devnode a long time. So I've been wrapping up a patch that instead has the remoteport_delete called when all associations relative to the lldd have been terminated. The association terminations are done immediately upon the unregister call. That should have killed all hw queues, etc., leaving nothing for the lldd to be tracking. Although the transport<->lldd associations are killed, the nvme controller and its namespaces are still live relative to the OS, just stalled waiting for the ctrl_loss_tmo or dev_loss_tmo to fire. Those timers can be/are completely asynchronous to the remoteport_delete. If the lldd registers a new and matching remoteport, the transport will reuse the existing remoteport structures and related controllers, etc. - just updating the LLDD-related info (which I've found a bug or two in). Is there a specific reason you would want to hold off the remoteport delete for a full dev_loss_tmo vs. as soon as the transport terminates the references to the lldd relative to the remoteport (e.g. association terminations)? With the association teardowns only, it can be much faster than dev_loss_tmo. > > The LLD has called the unregister_remoteport in the transport and done a wait_for_completion. If the device re-appears before dev_loss_tmo has been reached, the transport cancels the del_work. But, the LLD still has the completion event pending. Ahh. I understand. The wait_for_completions are in the lldd, and I know older code had the sequence you describe. True, if we reuse the remoteport struct, as the references never expired, the delete call would never have occurred. So this is a bug with what's there. I think the cleanest fix is to have the delete callback occur as soon as the delete_associations are done - as that will always occur, and do so rather quickly. If needed, if we find we are reusing a remoteport struct and we hadn't called the delete, we call it prior to returning from the second register remoteport call, which should work as well. Note: as the point at which we call remoteport_delete (ref cnt and free vs. association terminate) is independent of dev_loss_tmo, it shouldn't affect this dev_loss patch set.
-- james ^ permalink raw reply [flat|nested] 20+ messages in thread
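The completion hang Darren describes is easiest to see from the LLDD side. A minimal sketch of the pattern in question (the my_* names are illustrative, not from any real LLDD; it assumes the LLDD stashes a completion in its remoteport private area): the LLDD blocks until the transport invokes its remoteport_delete handler, so if the transport resumes the existing rport and never invokes it, the wait never finishes.

#include <linux/completion.h>
#include <linux/nvme-fc-driver.h>

struct my_rport_priv {
	struct completion unreg_done;
};

/* transport calls this back when it is done with the remoteport */
static void
my_remoteport_delete(struct nvme_fc_remote_port *remoteport)
{
	struct my_rport_priv *priv = remoteport->private;

	complete(&priv->unreg_done);
}

static void
my_teardown_rport(struct nvme_fc_remote_port *remoteport)
{
	struct my_rport_priv *priv = remoteport->private;

	init_completion(&priv->unreg_done);
	if (!nvme_fc_unregister_remoteport(remoteport))
		/* stalls forever if remoteport_delete never fires */
		wait_for_completion(&priv->unreg_done);
}

Calling remoteport_delete as soon as the associations are torn down, as proposed above, guarantees the completion always fires, and without waiting out dev_loss_tmo.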
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support 2017-10-19 18:04 ` James Smart @ 2017-10-19 19:49 ` Trapp, Darren 0 siblings, 0 replies; 20+ messages in thread From: Trapp, Darren @ 2017-10-19 19:49 UTC (permalink / raw) Thanks for the response. I think if it tears down the rport immediately and always calls back to the LLD, it solves the problem. On 10/19/17, 11:04 AM, "James Smart" <jsmart2021@gmail.com> wrote: Ahh. I understand. The wait_for_completions are in the lldd, and I know older code had the sequence you describe. True, if we reuse the remoteport struct, as the references never expired, the delete call would never have occurred. So this is a bug with what's there. I think the cleanest fix is to have the delete callback occur as soon as the delete_associations are done - as that will always occur, and do so rather quickly. If needed, if we find we are reusing a remoteport struct and we hadn't called the delete, we call it prior to returning from the second register remoteport call, which should work as well. Note: as the point at which we call remoteport_delete (ref cnt and free vs. association terminate) is independent of dev_loss_tmo, it shouldn't affect this dev_loss patch set. -- james ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support 2017-10-17 23:32 ` [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support James Smart 2017-10-19 15:46 ` Trapp, Darren @ 2017-10-20 6:27 ` Hannes Reinecke 2017-10-20 18:53 ` James Smart 1 sibling, 1 reply; 20+ messages in thread From: Hannes Reinecke @ 2017-10-20 6:27 UTC (permalink / raw) On 10/18/2017 01:32 AM, James Smart wrote: > This patch adds the dev_loss_tmo functionality to the transport. > > When a remoteport is unregistered (connectivity lost), the following > actions are taken: > - the remoteport is marked DELETED > - a dev_loss_tmo timer is started for the remoteport > - all controllers on the remoteport are reset. > > After a controller resets, it will stall in a RECONNECTING state > waiting for one of the following: > - the controller will continue to attempt reconnect per max_retries > and reconnect_delay. As there is no remoteport connectivity, the reconnect > attempt will immediately fail. If max reconnect attempts are > reached (e.g. ctrl_loss_tmo reached), the controller is deleted. > - the remoteport is re-registered prior to dev_loss_tmo expiring. > The resume of the remoteport will immediately attempt to reconnect > each of its suspended controllers. > - the remoteport's dev_loss_tmo expires, causing all of its > controllers to be deleted. > > Signed-off-by: James Smart <james.smart at broadcom.com> > > v3: > Reworked so dev_loss_tmo is specific to the remoteport. Revised > so connectivity loss resets controllers, and connectivity gain > schedules reconnect. > --- > drivers/nvme/host/fc.c | 289 +++++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 257 insertions(+), 32 deletions(-) > > diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c > index 61da2f92f71a..3628790371d7 100644 > --- a/drivers/nvme/host/fc.c > +++ b/drivers/nvme/host/fc.c > @@ -138,6 +138,7 @@ struct nvme_fc_rport { > struct nvme_fc_lport *lport; > spinlock_t lock; > struct kref ref; > + struct delayed_work dev_loss_work; > } __aligned(sizeof(u64)); /* alignment for other things alloc'd with */ > > enum nvme_fcctrl_flags { > @@ -503,6 +504,8 @@ nvme_fc_free_rport(struct kref *ref) > WARN_ON(rport->remoteport.port_state != FC_OBJSTATE_DELETED); > WARN_ON(!list_empty(&rport->ctrl_list)); > > + cancel_delayed_work_sync(&rport->dev_loss_work); > + > /* remove from lport list */ > spin_lock_irqsave(&nvme_fc_lock, flags); > list_del(&rport->endp_list); > @@ -530,6 +533,124 @@ nvme_fc_rport_get(struct nvme_fc_rport *rport) > return kref_get_unless_zero(&rport->ref); > } > > +static void > +nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl) > +{ > + switch (ctrl->ctrl.state) { > + case NVME_CTRL_NEW: > + case NVME_CTRL_RECONNECTING: > + /* > + * As all reconnects were suppressed, schedule a > + * connect. > + */ > + dev_info(ctrl->ctrl.device, > + "NVME-FC{%d}: connectivity re-established. " > + "Attempting reconnect\n", ctrl->cnum); > + > + queue_delayed_work(nvme_wq, &ctrl->connect_work, 0); > + break; > + > + case NVME_CTRL_RESETTING: > + /* > + * Controller is already in the process of terminating the > + * association. No need to do anything further. The reconnect > + * step will naturally occur after the reset completes.
> + */ > + break; > + > + default: > + /* no action to take - let it delete */ > + break; > + } > +} > + > +static struct nvme_fc_rport * > +nvme_fc_attach_to_suspended_rport(struct nvme_fc_lport *lport, > + struct nvme_fc_port_info *pinfo) > +{ > + struct nvme_fc_rport *rport; > + struct nvme_fc_ctrl *ctrl; > + unsigned long flags; > + > + spin_lock_irqsave(&nvme_fc_lock, flags); > + > + list_for_each_entry(rport, &lport->endp_list, endp_list) { > + if (rport->remoteport.node_name != pinfo->node_name || > + rport->remoteport.port_name != pinfo->port_name) > + continue; > + > + if (!nvme_fc_rport_get(rport)) { > + rport = ERR_PTR(-ENOLCK); > + goto out_done; > + } > + > + spin_unlock_irqrestore(&nvme_fc_lock, flags); > + > + cancel_delayed_work_sync(&rport->dev_loss_work); > + > + spin_lock_irqsave(&rport->lock, flags); > + > + /* has it been unregistered */ > + if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) { > + /* means lldd called us twice */ > + spin_unlock_irqrestore(&rport->lock, flags); > + nvme_fc_rport_put(rport); > + return ERR_PTR(-ESTALE); > + } > + Why can't we re-use this rport? It's patently _not_ deleted, so we should be good to use it, right? Someone else might've messed up, but if _we_ get our reference counting right that shouldn't affect us, no? Plus: How would we recover from such a situation? The rport most likely will _never_ go away, meaning we'll always hit this scenario for each and every connection attempt. (Speaking from experience here; I've hit this scenario accidentally :-) So the only chance we have is a reboot. > + rport->remoteport.port_state = FC_OBJSTATE_ONLINE; > + > + /* > + * kick off a reconnect attempt on all associations to the > + * remote port. A successful reconnect will resume i/o. > + */ > + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) > + nvme_fc_resume_controller(ctrl); > + > + spin_unlock_irqrestore(&rport->lock, flags); > + > + return rport; > + } > + > + rport = NULL; > + > +out_done: > + spin_unlock_irqrestore(&nvme_fc_lock, flags); > + > + return rport; > +} > + > +static void > +nvme_fc_rport_dev_loss_work(struct work_struct *work) > +{ > + struct nvme_fc_rport *rport = > + container_of(to_delayed_work(work), > + struct nvme_fc_rport, dev_loss_work); > + struct nvme_fc_ctrl *ctrl; > + unsigned long flags; > + > + spin_lock_irqsave(&rport->lock, flags); > + > + /* If port state transitioned, dev loss shouldn't kick in */ > + if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) { > + spin_unlock_irqrestore(&rport->lock, flags); > + return; > + } > + > + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) { > + dev_warn(ctrl->ctrl.device, > + "NVME-FC{%d}: Remote Port failed to reconnect within " > + "dev_loss_tmo (%d seconds). Deleting controller\n", > + ctrl->cnum, rport->remoteport.dev_loss_tmo); > + if (__nvme_fc_del_ctrl(ctrl)) > + dev_warn(ctrl->ctrl.device, > + "NVME-FC{%d}: delete request failed\n", > + ctrl->cnum); > + } > + > + spin_unlock_irqrestore(&rport->lock, flags); > +} > + > /** > * nvme_fc_register_remoteport - transport entry point called by an > * LLDD to register the existence of a NVME > @@ -556,22 +677,46 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport, > unsigned long flags; > int ret, idx; > > + if (!nvme_fc_lport_get(lport)) { > + ret = -ESHUTDOWN; > + goto out_reghost_failed; > + } > + > + /* > + * look to see if there is already a remoteport that is waiting > + * for a reconnect (within dev_loss_tmo) with the same WWNs.
> + * If so, transition to it and reconnect. > + */ > + newrec = nvme_fc_attach_to_suspended_rport(lport, pinfo); > + > + /* found an rport, but something about its state is bad */ > + if (IS_ERR(newrec)) { > + ret = PTR_ERR(newrec); > + goto out_lport_put; > + > + /* found existing rport, which was resumed */ > + } else if (newrec) { > + /* Ignore pinfo->dev_loss_tmo. Leave rport and ctlr's as is */ > + > + nvme_fc_lport_put(lport); > + nvme_fc_signal_discovery_scan(lport, newrec); > + *portptr = &newrec->remoteport; > + return 0; > + } > + > + /* nothing found - allocate a new remoteport struct */ > + > newrec = kmalloc((sizeof(*newrec) + lport->ops->remote_priv_sz), > GFP_KERNEL); > if (!newrec) { > ret = -ENOMEM; > - goto out_reghost_failed; > - } > - > - if (!nvme_fc_lport_get(lport)) { > - ret = -ESHUTDOWN; > - goto out_kfree_rport; > + goto out_lport_put; > } > > idx = ida_simple_get(&lport->endp_cnt, 0, 0, GFP_KERNEL); > if (idx < 0) { > ret = -ENOSPC; > - goto out_lport_put; > + goto out_kfree_rport; > } > > INIT_LIST_HEAD(&newrec->endp_list); > @@ -594,6 +739,7 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport, > newrec->remoteport.dev_loss_tmo = pinfo->dev_loss_tmo; > else > newrec->remoteport.dev_loss_tmo = NVME_FC_DEFAULT_DEV_LOSS_TMO; > + INIT_DELAYED_WORK(&newrec->dev_loss_work, nvme_fc_rport_dev_loss_work); > > spin_lock_irqsave(&nvme_fc_lock, flags); > list_add_tail(&newrec->endp_list, &lport->endp_list); > @@ -604,10 +750,10 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport, > *portptr = &newrec->remoteport; > return 0; > > -out_lport_put: > - nvme_fc_lport_put(lport); > out_kfree_rport: > kfree(newrec); > +out_lport_put: > + nvme_fc_lport_put(lport); > out_reghost_failed: > *portptr = NULL; > return ret; > @@ -638,6 +784,61 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport) > return 0; > } > > +static void > +nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl) > +{ > + dev_info(ctrl->ctrl.device, > + "NVME-FC{%d}: controller connectivity lost. Awaiting " > + "Reconnect", ctrl->cnum); > + > + switch (ctrl->ctrl.state) { > + case NVME_CTRL_NEW: > + case NVME_CTRL_LIVE: > + /* > + * Schedule a controller reset. The reset will > + * terminate the association and schedule the > + * reconnect timer. Reconnects will be attempted > + * until either the ctlr_loss_tmo > + * (max_retries * connect_delay) expires or the > + * remoteport's dev_loss_tmo expires. > + */ > + if (nvme_reset_ctrl(&ctrl->ctrl)) { > + dev_warn(ctrl->ctrl.device, > + "NVME-FC{%d}: Couldn't schedule reset. " > + "Deleting controller.\n", > + ctrl->cnum); > + __nvme_fc_del_ctrl(ctrl); > + } > + break; > + > + case NVME_CTRL_RECONNECTING: > + /* > + * The association has already been terminated > + * and the controller is attempting reconnects. > + * No need to do anything further. Reconnects will > + * be attempted until either the ctlr_loss_tmo > + * (max_retries * connect_delay) expires or the > + * remoteport's dev_loss_tmo expires. > + */ > + break; > + > + case NVME_CTRL_RESETTING: > + /* > + * Controller is already in the process of > + * terminating the association. No need to do > + * anything further. The reconnect step will > + * kick in naturally after the association is > + * terminated.
> + */ > + break; > + > + case NVME_CTRL_DELETING: > + default: > + /* no action to take - let it delete */ > + break; > + } > +} > + > /** > * nvme_fc_unregister_remoteport - transport entry point called by an > * LLDD to deregister/remove a previously > @@ -667,15 +868,32 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr) > } > portptr->port_state = FC_OBJSTATE_DELETED; > > - /* tear down all associations to the remote port */ > - list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) > - __nvme_fc_del_ctrl(ctrl); > + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) { > + /* if dev_loss_tmo==0, dev loss is immediate */ > + if (!portptr->dev_loss_tmo) { > + dev_warn(ctrl->ctrl.device, > + "NVME-FC{%d}: controller connectivity lost. " > + "Deleting controller.\n", > + ctrl->cnum); > + __nvme_fc_del_ctrl(ctrl); > + } else > + nvme_fc_ctrl_connectivity_loss(ctrl); > + } > > spin_unlock_irqrestore(&rport->lock, flags); > > nvme_fc_abort_lsops(rport); > > + queue_delayed_work(nvme_wq, &rport->dev_loss_work, > + portptr->dev_loss_tmo * HZ); > + > + /* > + * release the reference; if all controllers go away (which > + * should only occur after dev_loss_tmo fires), this allows > + * the rport to be torn down. > + */ > nvme_fc_rport_put(rport); > + > return 0; > } > EXPORT_SYMBOL_GPL(nvme_fc_unregister_remoteport); Hmm. The dev_loss_tmo is pretty awkward; scheduling a per-rport work structure only to tear down all attached _controllers_ once an rport goes down. But seeing that __nvme_fc_del_ctrl() is actually just punting the deletion to a workqueue, I guess that's okay. Also we might end up having called __nvme_fc_del_ctrl() for each controller, making me wonder why we need to schedule a dev_loss_work if all controllers are being removed anyway. Also I'm not exactly thrilled with the reference counting; we put things on a workqueue but do not increase the reference for it. Which means that the rport structure might be freed even though it's still on the workqueue. Shouldn't we rather take a reference before putting it on the workqueue, and drop the reference once we're done with the workqueue function? Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare at suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 20+ messages in thread
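Hannes's refcounting point maps onto a standard kernel pattern: take a reference on the object before queueing work that dereferences it, and drop that reference when the work function finishes. A sketch of how that could look here (illustrative only; this helper is not in the posted patch, which queues the work without the extra reference):

/* pin the rport across the pending dev_loss work (sketch) */
static void
nvme_fc_schedule_dev_loss(struct nvme_fc_rport *rport, u32 dev_loss_tmo)
{
	if (!nvme_fc_rport_get(rport))
		return;		/* rport already being torn down */

	if (!queue_delayed_work(nvme_wq, &rport->dev_loss_work,
				dev_loss_tmo * HZ))
		nvme_fc_rport_put(rport);	/* work was already pending */
}

with a matching nvme_fc_rport_put(rport) as the last action of nvme_fc_rport_dev_loss_work(). As James's reply below shows, the eventual direction was to drop the timer altogether, which sidesteps the lifetime question entirely.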
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support 2017-10-20 6:27 ` Hannes Reinecke @ 2017-10-20 18:53 ` James Smart 0 siblings, 0 replies; 20+ messages in thread From: James Smart @ 2017-10-20 18:53 UTC (permalink / raw) On 10/19/2017 11:27 PM, Hannes Reinecke wrote: ... >> +static struct nvme_fc_rport * >> +nvme_fc_attach_to_suspended_rport(struct nvme_fc_lport *lport, >> + struct nvme_fc_port_info *pinfo) >> +{ ... >> + >> + /* has it been unregistered */ >> + if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) { >> + /* means lldd called us twice */ >> + spin_unlock_irqrestore(&rport->lock, flags); >> + nvme_fc_rport_put(rport); >> + return ERR_PTR(-ESTALE); >> + } >> + > Why can't we re-use this rport? > It's patently _not_ deleted, so we should be good to use it, right? > Someone else might've messed up, but if _we_ get our reference counting > right that shouldn't affect us, no? > Plus: How would we recover from such a situation? > The rport most likely will _never_ go away, meaning we'll always hit > this scenario for each and every connection attempt. > (Speaking from experience here; I've hit this scenario accidentally :-) > So the only chance we have is a reboot. Because if it's not in a deleted state, then it's one that is effectively live/present and there can be only 1 remoteport with the same WWPN/WWNN on the localport. This would be the case of an lldd calling register for the same remoteport twice, which we don't allow. I'm not looking to allow or survive such a driver error - I'll let the second registration fail as it is illegal. How did you hit this scenario? It would really surprise me. >> } >> >> +static void >> +nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl) >> +{ >> + dev_info(ctrl->ctrl.device, >> + "NVME-FC{%d}: controller connectivity lost. Awaiting " >> + "Reconnect", ctrl->cnum); >> + >> + switch (ctrl->ctrl.state) { >> + case NVME_CTRL_NEW: >> + case NVME_CTRL_LIVE: >> + /* >> + * Schedule a controller reset. The reset will >> + * terminate the association and schedule the >> + * reconnect timer. Reconnects will be attempted >> + * until either the ctlr_loss_tmo >> + * (max_retries * connect_delay) expires or the >> + * remoteport's dev_loss_tmo expires. >> + */ >> + if (nvme_reset_ctrl(&ctrl->ctrl)) { >> + dev_warn(ctrl->ctrl.device, >> + "NVME-FC{%d}: Couldn't schedule reset. " >> + "Deleting controller.\n", >> + ctrl->cnum); >> + __nvme_fc_del_ctrl(ctrl); >> + } >> + break; >> + >> + case NVME_CTRL_RECONNECTING: >> + /* >> + * The association has already been terminated >> + * and the controller is attempting reconnects. >> + * No need to do anything further. Reconnects will >> + * be attempted until either the ctlr_loss_tmo >> + * (max_retries * connect_delay) expires or the >> + * remoteport's dev_loss_tmo expires. >> + */ >> + break; >> + >> + case NVME_CTRL_RESETTING: >> + /* >> + * Controller is already in the process of >> + * terminating the association. No need to do >> + * anything further. The reconnect step will >> + * kick in naturally after the association is >> + * terminated.
>> + */ >> + break; >> + >> + case NVME_CTRL_DELETING: >> + default: >> + /* no action to take - let it delete */ >> + break; >> + } >> +} >> + >> /** >> * nvme_fc_unregister_remoteport - transport entry point called by an >> * LLDD to deregister/remove a previously >> @@ -667,15 +868,32 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr) >> } >> portptr->port_state = FC_OBJSTATE_DELETED; >> >> - /* tear down all associations to the remote port */ >> - list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) >> - __nvme_fc_del_ctrl(ctrl); >> + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) { >> + /* if dev_loss_tmo==0, dev loss is immediate */ >> + if (!portptr->dev_loss_tmo) { >> + dev_warn(ctrl->ctrl.device, >> + "NVME-FC{%d}: controller connectivity lost. " >> + "Deleting controller.\n", >> + ctrl->cnum); >> + __nvme_fc_del_ctrl(ctrl); >> + } else >> + nvme_fc_ctrl_connectivity_loss(ctrl); >> + } >> >> spin_unlock_irqrestore(&rport->lock, flags); >> >> nvme_fc_abort_lsops(rport); >> >> + queue_delayed_work(nvme_wq, &rport->dev_loss_work, >> + portptr->dev_loss_tmo * HZ); >> + >> + /* >> + * release the reference; if all controllers go away (which >> + * should only occur after dev_loss_tmo fires), this allows >> + * the rport to be torn down. >> + */ >> nvme_fc_rport_put(rport); >> + >> return 0; >> } >> EXPORT_SYMBOL_GPL(nvme_fc_unregister_remoteport); > > Hmm. > The dev_loss_tmo is pretty awkward; scheduling a per-rport work structure > only to tear down all attached _controllers_ once an rport goes down. > But seeing that __nvme_fc_del_ctrl() is actually just punting the > deletion to a workqueue, I guess that's okay. I guess so. Just reusing the existing delete code paths, which set the nvme controller state. > Also we might end up having called __nvme_fc_del_ctrl() for each > controller, making me wonder why we need to schedule a dev_loss_work if > all controllers are being removed anyway. I'm not sure I follow, but... We could defer to the ctlr_loss_tmo completely, but it may be longer than dev_loss_tmo, and I think that confuses people who may set the dev_loss_tmo value and expect things to terminate at dev_loss_tmo. Currently, the controller delete is min(dev_loss_tmo, ctlr_loss_tmo). > Also I'm not exactly thrilled with the reference counting; we put things > on a workqueue but do not increase the reference for it. > Which means that the rport structure might be freed even though it's > still on the workqueue. Shouldn't we rather take a reference before > putting it on the workqueue, and drop the reference once we're done with > the workqueue function? ok, I agree. I'll rework so that there is no dev_loss_tmo timer running. Instead, we'll record the rport connectivity loss time, and in the controller reconnect loops, if we try to reconnect but connectivity has been lost for dev_loss_tmo, we'll give up. -- james ^ permalink raw reply [flat|nested] 20+ messages in thread
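The rework James commits to at the end - no running dev_loss_tmo timer, just a recorded connectivity-loss deadline consulted by the reconnect path - could look roughly like this sketch (the dev_loss_end field and its placement are assumptions; no such patch is posted in this thread):

/* in struct nvme_fc_rport: when lost connectivity becomes fatal (sketch) */
	unsigned long dev_loss_end;

/* in nvme_fc_unregister_remoteport(), replacing the queue_delayed_work(): */
	rport->dev_loss_end = jiffies + (portptr->dev_loss_tmo * HZ);

/* in nvme_fc_reconnect_or_delete(), before scheduling another attempt: */
	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE &&
	    time_after(jiffies, ctrl->rport->dev_loss_end)) {
		dev_warn(ctrl->ctrl.device,
			"NVME-FC{%d}: dev_loss_tmo (%d) expired while waiting "
			"for remoteport connectivity. Removing controller\n",
			ctrl->cnum, ctrl->rport->remoteport.dev_loss_tmo);
		WARN_ON(__nvme_fc_schedule_delete_work(ctrl));
		return;
	}

This preserves the min(ctlr_loss_tmo, dev_loss_tmo) deletion behavior while leaving nothing on a workqueue whose reference lifetime has to be managed.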
end of thread, other threads:[~2017-10-20 18:53 UTC | newest] Thread overview: 20+ messages -- 2017-10-17 23:32 [PATCH v3 0/5] nvme_fc: add dev_loss_tmo support James Smart 2017-10-17 23:32 ` [PATCH v3 1/5] nvme core: allow controller RESETTING to RECONNECTING transition James Smart 2017-10-18 8:26 ` Johannes Thumshirn 2017-10-20 5:58 ` Hannes Reinecke 2017-10-17 23:32 ` [PATCH v3 2/5] nvme_fc: change ctlr state assignments during reset/reconnect James Smart 2017-10-18 8:27 ` Johannes Thumshirn 2017-10-20 6:00 ` Hannes Reinecke 2017-10-17 23:32 ` [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport James Smart 2017-10-18 8:28 ` Johannes Thumshirn 2017-10-20 6:04 ` Hannes Reinecke 2017-10-20 14:50 ` James Smart 2017-10-17 23:32 ` [PATCH v3 4/5] nvme_fc: check connectivity before initiating reconnects James Smart 2017-10-18 8:29 ` Johannes Thumshirn 2017-10-20 6:05 ` Hannes Reinecke 2017-10-17 23:32 ` [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support James Smart 2017-10-19 15:46 ` Trapp, Darren 2017-10-19 18:04 ` James Smart 2017-10-19 19:49 ` Trapp, Darren 2017-10-20 6:27 ` Hannes Reinecke 2017-10-20 18:53 ` James Smart