* [PATCH v2 1/7] nvme core: allow controller RESETTING to RECONNECTING transition
2017-09-27 4:50 [PATCH v2 0/7] nvme_fc: add dev_loss_tmo support James Smart
@ 2017-09-27 4:50 ` James Smart
2017-10-04 8:28 ` Christoph Hellwig
2017-09-27 4:50 ` [PATCH v2 2/7] nvme_fc: change ctlr state assignments during reset/reconnect James Smart
From: James Smart @ 2017-09-27 4:50 UTC (permalink / raw)
Allow controller state transition : RESETTING to RECONNECTING
Transport will typically transition from LIVE->RESETTING when
initially performing a reset or recovering from an error. Adding
this transition allows a transport to transition to RECONNECTING
when it checks/waits for connectivity then creates new transport
connections and reinits the controller.
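As an illustration (not part of this patch; the xxx_* helpers are
hypothetical), a transport reset handler would use the new transition
roughly like this:

    static void xxx_reset_ctrl_work(struct work_struct *work)
    {
            struct xxx_ctrl *ctrl =
                    container_of(work, struct xxx_ctrl, ctrl.reset_work);

            /* core already moved LIVE -> RESETTING via nvme_reset_ctrl() */
            xxx_teardown_association(ctrl);

            /* the transition this patch allows */
            if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING))
                    return;

            /* check/wait for connectivity, then rebuild the association;
             * a successful create moves RECONNECTING -> LIVE
             */
            if (xxx_create_association(ctrl))
                    xxx_reconnect_or_delete(ctrl);
    }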
-- james
Signed-off-by: James Smart <james.smart at broadcom.com>
---
drivers/nvme/host/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bb2aad078637..179ca6f79da3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -205,6 +205,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
case NVME_CTRL_RECONNECTING:
switch (old_state) {
case NVME_CTRL_LIVE:
+ case NVME_CTRL_RESETTING:
changed = true;
/* FALLTHRU */
default:
--
2.13.1
* [PATCH v2 1/7] nvme core: allow controller RESETTING to RECONNECTING transition
2017-09-27 4:50 ` [PATCH v2 1/7] nvme core: allow controller RESETTING to RECONNECTING transition James Smart
@ 2017-10-04 8:28 ` Christoph Hellwig
2017-10-11 10:02 ` Sagi Grimberg
From: Christoph Hellwig @ 2017-10-04 8:28 UTC (permalink / raw)
On Tue, Sep 26, 2017 at 09:50:40PM -0700, James Smart wrote:
> Allow controller state transition : RESETTING to RECONNECTING
>
> Transport will typically transition from LIVE->RESETTING when
> initially performing a reset or recovering from an error. Adding
> this transition allows a transport to transition to RECONNECTING
> when it checks/waits for connectivity then creates new transport
> connections and reinits the controller.
Looks fine for now, although we really need to move to a common
state machine..
Reviewed-by: Christoph Hellwig <hch at lst.de>
>
> -- james
>
> Signed-off-by: James Smart <james.smart at broadcom.com>
> ---
> drivers/nvme/host/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index bb2aad078637..179ca6f79da3 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -205,6 +205,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
> case NVME_CTRL_RECONNECTING:
> switch (old_state) {
> case NVME_CTRL_LIVE:
> + case NVME_CTRL_RESETTING:
> changed = true;
> /* FALLTHRU */
> default:
> --
> 2.13.1
>
>
---end quoted text---
* [PATCH v2 1/7] nvme core: allow controller RESETTING to RECONNECTING transition
2017-10-04 8:28 ` Christoph Hellwig
@ 2017-10-11 10:02 ` Sagi Grimberg
From: Sagi Grimberg @ 2017-10-11 10:02 UTC (permalink / raw)
>> Allow controller state transition : RESETTING to RECONNECTING
>>
>> Transport will typically transition from LIVE->RESETTING when
>> initially performing a reset or recovering from an error. Adding
>> this transition allows a transport to transition to RECONNECTING
>> when it checks/waits for connectivity then creates new transport
>> connections and reinits the controller.
>
> Looks fine for now, although we really need to move to a common
> state machine..
Agree, will work on it,
> Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
* [PATCH v2 2/7] nvme_fc: change ctlr state assignments during reset/reconnect
2017-09-27 4:50 [PATCH v2 0/7] nvme_fc: add dev_loss_tmo support James Smart
2017-09-27 4:50 ` [PATCH v2 1/7] nvme core: allow controller RESETTING to RECONNECTING transition James Smart
@ 2017-09-27 4:50 ` James Smart
2017-10-05 7:57 ` Christoph Hellwig
2017-09-27 4:50 ` [PATCH v2 3/7] nvme_fc: add a dev_loss_tmo field to the remoteport James Smart
From: James Smart @ 2017-09-27 4:50 UTC (permalink / raw)
Clean up some of the controller state checks and add the
RESETTING->RECONNECTING state transition.
Specifically:
- the movement of the RESETTING state change and scheduling of
reset_work to the core doesn't work with nvme_fc_error_recovery
setting the state to RECONNECTING before attempting to reset.
Remove the state change, as the reset request does it.
- In the rare cases where an error occurs right as we're
transitioning to LIVE, defer the controller start actions.
- In error handling on teardown of associations while
performing initial controller creation - avoid quiesce
calls on the admin_q. They are unneeded.
- Add the RESETTING->RECONNECTING transition in the reset
handler.
Signed-off-by: James Smart <james.smart at broadcom.com>
---
drivers/nvme/host/fc.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index af075e998944..ce5718b33d13 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1825,13 +1825,6 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
dev_warn(ctrl->ctrl.device,
"NVME-FC{%d}: resetting controller\n", ctrl->cnum);
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
- dev_err(ctrl->ctrl.device,
- "NVME-FC{%d}: error_recovery: Couldn't change state "
- "to RECONNECTING\n", ctrl->cnum);
- return;
- }
-
nvme_reset_ctrl(&ctrl->ctrl);
}
@@ -2465,11 +2458,11 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
}
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
- WARN_ON_ONCE(!changed);
ctrl->ctrl.nr_reconnects = 0;
- nvme_start_ctrl(&ctrl->ctrl);
+ if (changed)
+ nvme_start_ctrl(&ctrl->ctrl);
return 0; /* Success */
@@ -2537,7 +2530,8 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
* use blk_mq_tagset_busy_itr() and the transport routine to
* terminate the exchanges.
*/
- blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ if (ctrl->ctrl.state != NVME_CTRL_NEW)
+ blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_fc_terminate_exchange, &ctrl->ctrl);
@@ -2641,12 +2635,8 @@ nvme_fc_del_nvme_ctrl(struct nvme_ctrl *nctrl)
static void
nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
{
- /* If we are resetting/deleting then do nothing */
- if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING) {
- WARN_ON_ONCE(ctrl->ctrl.state == NVME_CTRL_NEW ||
- ctrl->ctrl.state == NVME_CTRL_LIVE);
+ if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING)
return;
- }
dev_info(ctrl->ctrl.device,
"NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n",
@@ -2675,9 +2665,17 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
int ret;
nvme_stop_ctrl(&ctrl->ctrl);
+
/* will block will waiting for io to terminate */
nvme_fc_delete_association(ctrl);
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
+ dev_err(ctrl->ctrl.device,
+ "NVME-FC{%d}: error_recovery: Couldn't change state "
+ "to RECONNECTING\n", ctrl->cnum);
+ return;
+ }
+
ret = nvme_fc_create_association(ctrl);
if (ret)
nvme_fc_reconnect_or_delete(ctrl, ret);
--
2.13.1
* [PATCH v2 2/7] nvme_fc: change ctlr state assignments during reset/reconnect
2017-09-27 4:50 ` [PATCH v2 2/7] nvme_fc: change ctlr state assignments during reset/reconnect James Smart
@ 2017-10-05 7:57 ` Christoph Hellwig
2017-10-07 15:53 ` James Smart
From: Christoph Hellwig @ 2017-10-05 7:57 UTC (permalink / raw)
> changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> - WARN_ON_ONCE(!changed);
>
> ctrl->ctrl.nr_reconnects = 0;
>
> - nvme_start_ctrl(&ctrl->ctrl);
> + if (changed)
> + nvme_start_ctrl(&ctrl->ctrl);
It's just cosmetic, but can you follow the RDMA pattern here:
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
if (!changed) {
/* state change failure is ok if we're in DELETING state */
WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
return;
}
else the patch looks fine:
Reviewed-by: Christoph Hellwig <hch at lst.de>
* [PATCH v2 2/7] nvme_fc: change ctlr state assignments during reset/reconnect
2017-10-05 7:57 ` Christoph Hellwig
@ 2017-10-07 15:53 ` James Smart
2017-10-11 10:07 ` Sagi Grimberg
From: James Smart @ 2017-10-07 15:53 UTC (permalink / raw)
On 10/5/2017 12:57 AM, Christoph Hellwig wrote:
>> changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
>> - WARN_ON_ONCE(!changed);
>>
>> ctrl->ctrl.nr_reconnects = 0;
>>
>> - nvme_start_ctrl(&ctrl->ctrl);
>> + if (changed)
>> + nvme_start_ctrl(&ctrl->ctrl);
>
> It's just cosmetic, but can you follow the RDMA pattern here:
>
> changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> if (!changed) {
> /* state change failure is ok if we're in DELETING state */
> WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
> return;
> }
>
> else the patch looks fine:
>
> Reviewed-by: Christoph Hellwig <hch at lst.de>
>
Actually, it's not just cosmetic. I don't want the warning to be
displayed. With FC's notification on device loss, in testing, we've seen
a device loss occur as this state transition occurs. I doubt that's
something rdma would ever see. With the device connectivity loss
notification, the state moves from NEW to RESETTING, then RECONNECTING.
I prefer to clear the reconnect counter (as we just went through a
connect) to ensure that the full reconnect attempts will occur.
As such, I'd prefer to leave it the same.
-- james
* [PATCH v2 2/7] nvme_fc: change ctlr state assignments during reset/reconnect
2017-10-07 15:53 ` James Smart
@ 2017-10-11 10:07 ` Sagi Grimberg
2017-10-11 14:29 ` James Smart
From: Sagi Grimberg @ 2017-10-11 10:07 UTC (permalink / raw)
>>>     changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
>>> -   WARN_ON_ONCE(!changed);
>>>     ctrl->ctrl.nr_reconnects = 0;
>>> -   nvme_start_ctrl(&ctrl->ctrl);
>>> +   if (changed)
>>> +           nvme_start_ctrl(&ctrl->ctrl);
>>
>> It's just cosmetic, but can you follow the RDMA pattern here:
>>
>>     changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
>>     if (!changed) {
>>             /* state change failure is ok if we're in DELETING state */
>>             WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
>>             return;
>>     }
>>
>> else the patch looks fine:
>>
>> Reviewed-by: Christoph Hellwig <hch at lst.de>
>>
>
> Actually, it's not just cosmetic. I don't want the warning to be
> displayed.
That's wrong AFAICT, you do want a warning notification if you
failed to move the controller state to LIVE here and you are
_NOT_ in DELETING. This indicates a state transition that is
undefined.
> With FC's notification on device loss, in testing, we've seen
> a device loss occur as this state transition occurs. I doubt that's
> something rdma would ever see.
It does, hence the patch.
> With the device connectivity loss
> notification, the state moves from NEW to RESETTING, then RECONNECTING.
you mean LIVE -> RESETTING -> RECONNECTING -> LIVE (on successful
reconnect).
That is exactly the point, the only way you fail RECONNECTING -> LIVE
is if you are in DELETING all of a sudden (triggered by the user
perhaps). Is there any way you will accept this state transition
to fail?
* [PATCH v2 2/7] nvme_fc: change ctlr state assignments during reset/reconnect
2017-10-11 10:07 ` Sagi Grimberg
@ 2017-10-11 14:29 ` James Smart
From: James Smart @ 2017-10-11 14:29 UTC (permalink / raw)
On 10/11/2017 3:07 AM, Sagi Grimberg wrote:
>
>>>>     changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
>>>> -   WARN_ON_ONCE(!changed);
>>>>     ctrl->ctrl.nr_reconnects = 0;
>>>> -   nvme_start_ctrl(&ctrl->ctrl);
>>>> +   if (changed)
>>>> +           nvme_start_ctrl(&ctrl->ctrl);
>>>
>>> It's just cosmetic, but can you follow the RDMA pattern here:
>>>
>>>     changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
>>>     if (!changed) {
>>>             /* state change failure is ok if we're in DELETING state */
>>>             WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
>>>             return;
>>>     }
>>>
>>> else the patch looks fine:
>>>
>>> Reviewed-by: Christoph Hellwig <hch at lst.de>
>>>
>>
>> Actually, it's not just cosmetic. I don't want the warning to be
>> displayed.
>
> That's wrong AFAICT, you do want a warning notification if you
> failed to move the controller state to LIVE here and you are
> _NOT_ in DELETING. This indicates a state transition that is
> undefined.
No. The WARN_ON is a once-only - and puts out a dastardly set of
messages with a stack trace. Users see this as a bad error the way it's
formatted.
Instead - the messages that are displayed when the other states are
transitioned to (already present) give you all the notification you need
and in a manner that is easy to understand.
"state transition that is not defined" - it is a defined transition,
otherwise the state machine wouldn't have allowed it.
>
>> With the device connectivity loss notification, the state moves from
>> NEW to RESETTING, then RECONNECTING.
>
> you mean LIVE -> RESETTING -> RECONNECTING -> LIVE (on successful
> reconnect).
yes
>
> That is exactly the point, the only way you fail RECONNECTING -> LIVE
> is if you are in DELETING all of a sudden (triggered by the user
> perhaps).
True - or triggered by an FC device connectivity loss for an extended period.
> Is there any way you will accept this state transition
> to fail?
Which transition? FC follows the state transitions per the state
machine.
-- james
* [PATCH v2 3/7] nvme_fc: add a dev_loss_tmo field to the remoteport
2017-09-27 4:50 [PATCH v2 0/7] nvme_fc: add dev_loss_tmo support James Smart
2017-09-27 4:50 ` [PATCH v2 1/7] nvme core: allow controller RESETTING to RECONNECTING transition James Smart
2017-09-27 4:50 ` [PATCH v2 2/7] nvme_fc: change ctlr state assignments during reset/reconnect James Smart
@ 2017-09-27 4:50 ` James Smart
2017-10-11 10:11 ` Sagi Grimberg
2017-09-27 4:50 ` [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller James Smart
From: James Smart @ 2017-09-27 4:50 UTC (permalink / raw)
Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
connectivity loss. The LLDD can supply the value in the
nvme_fc_register_remoteport() call; otherwise the transport falls back
on a default of 60s.
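A hedged sketch of the LLDD side (wwnn/wwpn/n_port_id and the localport
are assumed to already exist in the LLDD; values are illustrative only):

    struct nvme_fc_port_info pinfo = {
            .node_name    = wwnn,
            .port_name    = wwpn,
            .port_role    = FC_PORT_ROLE_NVME_TARGET,
            .port_id      = n_port_id,
            .dev_loss_tmo = 30,     /* 0 would select the 60s default */
    };
    struct nvme_fc_remote_port *remoteport;
    int ret;

    ret = nvme_fc_register_remoteport(localport, &pinfo, &remoteport);
    /* on success, remoteport->dev_loss_tmo reflects the requested value;
     * a nonzero value below NVME_FC_MIN_DEV_LOSS_TMO fails with -EINVAL
     */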
Signed-off-by: James Smart <james.smart at broadcom.com>
---
drivers/nvme/host/fc.c | 14 ++++++++++++++
include/linux/nvme-fc-driver.h | 9 +++++++--
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index ce5718b33d13..db6484bfe02b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -45,6 +45,10 @@ enum nvme_fc_queue_flags {
#define NVMEFC_QUEUE_DELAY 3 /* ms units */
+#define NVME_FC_DEFAULT_DEV_LOSS_TMO 60 /* seconds */
+#define NVME_FC_EXPECTED_RECONNECT_TM 2 /* seconds - E_D_TOV */
+#define NVME_FC_MIN_DEV_LOSS_TMO (2 * NVME_FC_EXPECTED_RECONNECT_TM)
+
struct nvme_fc_queue {
struct nvme_fc_ctrl *ctrl;
struct device *dev;
@@ -478,6 +482,12 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
unsigned long flags;
int ret, idx;
+ if (pinfo->dev_loss_tmo &&
+ pinfo->dev_loss_tmo < NVME_FC_MIN_DEV_LOSS_TMO) {
+ ret = -EINVAL;
+ goto out_reghost_failed;
+ }
+
newrec = kmalloc((sizeof(*newrec) + lport->ops->remote_priv_sz),
GFP_KERNEL);
if (!newrec) {
@@ -511,6 +521,10 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
newrec->remoteport.port_id = pinfo->port_id;
newrec->remoteport.port_state = FC_OBJSTATE_ONLINE;
newrec->remoteport.port_num = idx;
+ if (pinfo->dev_loss_tmo)
+ newrec->remoteport.dev_loss_tmo = pinfo->dev_loss_tmo;
+ else
+ newrec->remoteport.dev_loss_tmo = NVME_FC_DEFAULT_DEV_LOSS_TMO;
spin_lock_irqsave(&nvme_fc_lock, flags);
list_add_tail(&newrec->endp_list, &lport->endp_list);
diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h
index a726f96010d5..6e457e4f9543 100644
--- a/include/linux/nvme-fc-driver.h
+++ b/include/linux/nvme-fc-driver.h
@@ -40,6 +40,8 @@
* @node_name: FC WWNN for the port
* @port_name: FC WWPN for the port
* @port_role: What NVME roles are supported (see FC_PORT_ROLE_xxx)
+ * @dev_loss_tmo: maximum delay for reconnects to an association on
+ * this device. Used only on a remoteport.
*
* Initialization values for dynamic port fields:
* @port_id: FC N_Port_ID currently assigned the port. Upper 8 bits must
@@ -50,6 +52,7 @@ struct nvme_fc_port_info {
u64 port_name;
u32 port_role;
u32 port_id;
+ u32 dev_loss_tmo;
};
@@ -202,6 +205,9 @@ enum nvme_fc_obj_state {
* The length of the buffer corresponds to the local_priv_sz
* value specified in the nvme_fc_port_template supplied by
* the LLDD.
+ * @dev_loss_tmo: maximum delay for reconnects to an association on
+ * this device. To modify, lldd must call
+ * nvme_fc_set_remoteport_devloss().
*
* Fields with dynamic values. Values may change base on link state. LLDD
* may reference fields directly to change them. Initialized by the
@@ -259,10 +265,9 @@ struct nvme_fc_remote_port {
u32 port_role;
u64 node_name;
u64 port_name;
-
struct nvme_fc_local_port *localport;
-
void *private;
+ u32 dev_loss_tmo;
/* dynamic fields */
u32 port_id;
--
2.13.1
* [PATCH v2 3/7] nvme_fc: add a dev_loss_tmo field to the remoteport
2017-09-27 4:50 ` [PATCH v2 3/7] nvme_fc: add a dev_loss_tmo field to the remoteport James Smart
@ 2017-10-11 10:11 ` Sagi Grimberg
2017-10-11 14:35 ` James Smart
From: Sagi Grimberg @ 2017-10-11 10:11 UTC (permalink / raw)
> +#define NVME_FC_DEFAULT_DEV_LOSS_TMO 60 /* seconds */
Why not reuse NVMF_DEF_CTRL_LOSS_TMO? And isn't 60 seconds
a little too aggressive to give up?
What do you do with the ctrl_loss_tmo user parameter?
* [PATCH v2 3/7] nvme_fc: add a dev_loss_tmo field to the remoteport
2017-10-11 10:11 ` Sagi Grimberg
@ 2017-10-11 14:35 ` James Smart
From: James Smart @ 2017-10-11 14:35 UTC (permalink / raw)
On 10/11/2017 3:11 AM, Sagi Grimberg wrote:
>
>> +#define NVME_FC_DEFAULT_DEV_LOSS_TMO 60    /* seconds */
>
> Why not reuse NVMF_DEF_CTRL_LOSS_TMO? And isn't 60 seconds
> a little too aggressive to give up?
Because there is an existing FC port timeout which currently lives over on
the SCSI side of the house. The FC port will support both SCSI and
FC-NVME simultaneously - so the timeout spans/affects both. When
the "common fc class" is created and moved out from under scsi, it will
be the one with the value.
>
> What do you do with the ctrl_loss_tmo user parameter?
Well - this patch munged the dev_loss_tmo value into the
ctrl_loss_tmo-based values. That is an issue - so in a prior comment I
stated I'm reposting with the controller loss timeout kept separate from
the fc port dev_loss_tmo. Thus NVMF_DEF_CTRL_LOSS_TMO and ctrl_loss_tmo
will be used normally - as they are with RDMA. Dev_loss_tmo will be
tracked at the fc port level and, if it fires, will trump wherever the
controller is in its ctrl_loss_tmo window.
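(Illustrative numbers only: if ctrl_loss_tmo would allow reconnect
attempts for, say, 600 seconds but the FC port's dev_loss_tmo is 60
seconds, a controller whose remoteport never regains connectivity gets
deleted when the 60 second port timer fires, well before its own
ctrl_loss_tmo window would have expired.)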
-- james
* [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller
2017-09-27 4:50 [PATCH v2 0/7] nvme_fc: add dev_loss_tmo support James Smart
2017-09-27 4:50 ` [PATCH v2 3/7] nvme_fc: add a dev_loss_tmo field to the remoteport James Smart
@ 2017-09-27 4:50 ` James Smart
2017-10-05 8:00 ` Christoph Hellwig
2017-10-11 10:14 ` Sagi Grimberg
2017-09-27 4:50 ` [PATCH v2 5/7] nvme_fc: check connectivity before initiating reconnects James Smart
From: James Smart @ 2017-09-27 4:50 UTC (permalink / raw)
This patch adds a dev_loss_tmo value to the controller. The value
is initialized from the remoteport.
The patch also adds an lldd-callable routine,
nvme_fc_set_remoteport_devloss() to change the value on the remoteport
and apply the new value to the controllers on the remote port.
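A hedged usage sketch of the new routine from an LLDD (the surrounding
variable names are assumptions, e.g. from a sysfs store handler):

    int ret;

    /* tighten dev_loss_tmo to 30s on a known remoteport */
    ret = nvme_fc_set_remoteport_devloss(remoteport, 30);
    /* -ERANGE for a nonzero value below the 4s minimum,
     * -EINVAL if the remoteport is not online
     */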
The dev_loss_tmo value set on the controller will ultimately be the
maximum window allowed for reconnection, whether it was started due
to controller reset, transport error, or loss of connectivity to the
target.
The dev_loss_tmo value set on the controller will be the smaller of
the controller's (max_reconnects * reconnect_delay) window set at creation
and the remoteport's dev_loss_tmo value. After selecting the smallest
value, if the controller's reconnect window is superseded by the
remoteport's dev_loss_tmo value, the reconnect values will be
adjusted for the new dev_loss_tmo value.
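A worked example of the scaling (numbers illustrative only): with
reconnect_delay = 10s and max_reconnects = 12 the controller's reconnect
window is 120s; against a remoteport dev_loss_tmo of 60s the
controller's dev_loss_tmo becomes min(120, 60) = 60s, and the reconnect
parameters are resynced so that reconnect_delay stays at
min(10, 60 - 2) = 10s and max_reconnects becomes
DIV_ROUND_UP(60, 10) = 6, keeping at least one reconnect attempt inside
the 60s window.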
Signed-off-by: James Smart <james.smart at broadcom.com>
---
drivers/nvme/host/fc.c | 93 ++++++++++++++++++++++++++++++++++++++++++
include/linux/nvme-fc-driver.h | 2 +
2 files changed, 95 insertions(+)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index db6484bfe02b..63fb35bb6f1f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -163,6 +163,7 @@ struct nvme_fc_ctrl {
struct work_struct delete_work;
struct delayed_work connect_work;
+ u32 dev_loss_tmo;
struct kref ref;
u32 flags;
@@ -2740,6 +2741,96 @@ static const struct blk_mq_ops nvme_fc_admin_mq_ops = {
};
+static void
+nvme_fc_set_ctrl_devloss(struct nvme_fc_ctrl *ctrl,
+ struct nvmf_ctrl_options *opts)
+{
+ u32 dev_loss_tmo;
+
+ /*
+ * dev_loss_tmo will be the max amount of time after an association
+ * failure that will be allowed for a new association to be
+ * established. It doesn't matter why the original association
+ * failed (FC connectivity loss, transport error, admin-request).
+ * The new association must be established before dev_loss_tmo
+ * expires or the controller will be torn down.
+ *
+ * If the connect parameters are less than the FC port dev_loss_tmo
+ * parameter, scale dev_loss_tmo to the connect parameters.
+ *
+ * If the connect parameters are larger than the FC port
+ * dev_loss_tmo parameter, adjust the connect parameters so that
+ * there is at least 1 attempt at a reconnect attempt before failing.
+ * Note: reconnects will be attempted only if there is FC connectivity.
+ */
+
+ if (opts->max_reconnects < 1)
+ opts->max_reconnects = 1;
+ dev_loss_tmo = opts->reconnect_delay * opts->max_reconnects;
+
+ ctrl->dev_loss_tmo =
+ min_t(u32, ctrl->rport->remoteport.dev_loss_tmo, dev_loss_tmo);
+ if (ctrl->dev_loss_tmo < ctrl->rport->remoteport.dev_loss_tmo)
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: scaling dev_loss_tmo to reconnect "
+ "window (%d)\n",
+ ctrl->cnum, ctrl->dev_loss_tmo);
+
+ /* resync dev_loss_tmo with the reconnect window */
+ if (ctrl->dev_loss_tmo < opts->reconnect_delay * opts->max_reconnects) {
+ if (!ctrl->dev_loss_tmo)
+ opts->max_reconnects = 0;
+ else {
+ opts->reconnect_delay =
+ min_t(u32, opts->reconnect_delay,
+ ctrl->dev_loss_tmo -
+ NVME_FC_EXPECTED_RECONNECT_TM);
+ opts->max_reconnects = DIV_ROUND_UP(ctrl->dev_loss_tmo,
+ opts->reconnect_delay);
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: dev_loss_tmo %d: scaling "
+ "reconnect delay %d max reconnects %d\n",
+ ctrl->cnum, ctrl->dev_loss_tmo,
+ opts->reconnect_delay, opts->max_reconnects);
+ }
+ }
+}
+
+int
+nvme_fc_set_remoteport_devloss(struct nvme_fc_remote_port *portptr,
+ u32 dev_loss_tmo)
+{
+ struct nvme_fc_rport *rport = remoteport_to_rport(portptr);
+ struct nvme_fc_ctrl *ctrl;
+ unsigned long flags;
+
+ /*
+ * Allow dev_loss_tmo set to 0. This will allow
+ * nvme_fc_unregister_remoteport() to immediately delete
+ * controllers without waiting a dev_loss_tmo timeout.
+ */
+ if (dev_loss_tmo && dev_loss_tmo < NVME_FC_MIN_DEV_LOSS_TMO)
+ return -ERANGE;
+
+ spin_lock_irqsave(&rport->lock, flags);
+
+ if (portptr->port_state != FC_OBJSTATE_ONLINE) {
+ spin_unlock_irqrestore(&rport->lock, flags);
+ return -EINVAL;
+ }
+
+ rport->remoteport.dev_loss_tmo = dev_loss_tmo;
+
+ list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list)
+ /* Apply values for use in next reconnect cycle */
+ nvme_fc_set_ctrl_devloss(ctrl, ctrl->ctrl.opts);
+
+ spin_unlock_irqrestore(&rport->lock, flags);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_fc_set_remoteport_devloss);
+
static struct nvme_ctrl *
nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
struct nvme_fc_lport *lport, struct nvme_fc_rport *rport)
@@ -2837,6 +2928,8 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
+ nvme_fc_set_ctrl_devloss(ctrl, opts);
+
ret = nvme_fc_create_association(ctrl);
if (ret) {
ctrl->ctrl.opts = NULL;
diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h
index 6e457e4f9543..a2344456266c 100644
--- a/include/linux/nvme-fc-driver.h
+++ b/include/linux/nvme-fc-driver.h
@@ -452,6 +452,8 @@ int nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
int nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *remoteport);
+int nvme_fc_set_remoteport_devloss(struct nvme_fc_remote_port *remoteport,
+ u32 dev_loss_tmo);
/*
* *************** LLDD FC-NVME Target/Subsystem API ***************
--
2.13.1
* [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller
2017-09-27 4:50 ` [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller James Smart
@ 2017-10-05 8:00 ` Christoph Hellwig
2017-10-07 16:08 ` James Smart
2017-10-11 10:14 ` Sagi Grimberg
From: Christoph Hellwig @ 2017-10-05 8:00 UTC (permalink / raw)
Can you please add the dev_loss_tmo to the generic nvme_ctrl and build
the infrastructure in common code?
Then as a next step we can look into inferring it from the remote port
for FC.
* [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller
2017-10-05 8:00 ` Christoph Hellwig
@ 2017-10-07 16:08 ` James Smart
2017-10-11 10:17 ` Sagi Grimberg
From: James Smart @ 2017-10-07 16:08 UTC (permalink / raw)
On 10/5/2017 1:00 AM, Christoph Hellwig wrote:
> Can you please add the dev_loss_tmo to the generic nvme_ctrl and build
> the infrastructure in common code?
>
> Then as a next step we can look into inferring it from the remote port
> for FC.
>
Hmm - this raises flags in my mind. I'm not sure how a dev_loss_tmo
applies to any transport other than FC.
I am inclined to rework this a bit though, as I think the mashing of the
dev_loss_tmo value into the controller's ctrl_loss_tmo has caused
confusion. The dev_loss_tmo should stay in the transport and be specific
to the remoteport. Mashing them also caused us to lose the initial
creation ctrl_loss_tmo value if there had been one set and the dev_loss
value dynamically changed. I'll rework it so that: dev_loss_tmo is an
FC internal timer that acts on the remoteport. Upon device loss, the
remote port's dev_loss timer will be started, and any controllers on the
remote port will have their associations torn down and their
ctrl_loss_tmo (aka max_reconnects and reconnect_delay) started. If a
controller's max_reconnects is hit it will be deleted. If the rport's
dev_loss_tmo is hit, it will delete any suspended controller.
-- james
* [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller
2017-10-07 16:08 ` James Smart
@ 2017-10-11 10:17 ` Sagi Grimberg
2017-10-11 15:10 ` James Smart
From: Sagi Grimberg @ 2017-10-11 10:17 UTC (permalink / raw)
> Hmm - this raises flags in my mind. I'm not sure how a dev_loss_tmo
> applies to any transport other than FC.
Can you explain what exactly do you mean by dev_loss_tmo only applies
to FC?
* [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller
2017-10-11 10:17 ` Sagi Grimberg
@ 2017-10-11 15:10 ` James Smart
From: James Smart @ 2017-10-11 15:10 UTC (permalink / raw)
On 10/11/2017 3:17 AM, Sagi Grimberg wrote:
>
>> Hmm - this raises flags in my mind. I'm not sure how a dev_loss_tmo
>> applies to any transport other than FC.
>
> Can you explain what exactly do you mean by dev_loss_tmo only applies
> to FC?
Well, I'm at a loss as to how dev_loss_tmo could apply to another transport
like RDMA or TCP. I'd like someone to explain it to me.
afaik - no other transport has events from the fabric when connectivity
changes (perhaps on the other side of a switch), where the transport
actively tracks connectivity to the other endpoint. FC does. So we know
almost immediately when a target port goes away. Dev_loss_tmo is a
duration after initial connectivity loss where we wait and suspend
devices attached to the fc port - both scsi and fc-nvme - allowing the
port to come back and the os-device to resume operation. If it expires,
the device is deleted. It's existed on SCSI for many years. So it's
basically the same behavior as the ctrl_loss_tmo, but the timer is at
the port level and spans multiple ctrls. Given they were so similar, I
munged the ctrl_loss_tmo and dev_loss_tmo together and timed things at
the controller only. Now I'm regretting that choice and keeping things
in their natural place: ctrl_loss_tmo at the ctrl level (regardless of
connectivity, with immediate reconnect reschedule if no connectivity);
and dev_loss_tmo at the fc port level (connectivity loss starts
reset/reconnect on all, timeout deletes all).
-- james
* [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller
2017-09-27 4:50 ` [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller James Smart
2017-10-05 8:00 ` Christoph Hellwig
@ 2017-10-11 10:14 ` Sagi Grimberg
From: Sagi Grimberg @ 2017-10-11 10:14 UTC (permalink / raw)
> +static void
> +nvme_fc_set_ctrl_devloss(struct nvme_fc_ctrl *ctrl,
> + struct nvmf_ctrl_options *opts)
> +{
> + u32 dev_loss_tmo;
> +
> + /*
> + * dev_loss_tmo will be the max amount of time after an association
> + * failure that will be allowed for a new association to be
> + * established. It doesn't matter why the original association
> + * failed (FC connectivity loss, transport error, admin-request).
> + * The new association must be established before dev_loss_tmo
> + * expires or the controller will be torn down.
> + *
> + * If the connect parameters are less than the FC port dev_loss_tmo
> + * parameter, scale dev_loss_tmo to the connect parameters.
> + *
> + * If the connect parameters are larger than the FC port
> + * dev_loss_tmo parameter, adjust the connect parameters so that
> + * there is at least 1 attempt at a reconnect attempt before failing.
> + * Note: reconnects will be attempted only if there is FC connectivity.
> + */
OK, now I see where you are accounting for the fabrics ctrl_loss_tmo, but
given that we have a default, what is the use of your private FC port
default?
* [PATCH v2 5/7] nvme_fc: check connectivity before initiating reconnects
2017-09-27 4:50 [PATCH v2 0/7] nvme_fc: add dev_loss_tmo support James Smart
2017-09-27 4:50 ` [PATCH v2 4/7] nvme_fc: add dev_loss_tmo to controller James Smart
@ 2017-09-27 4:50 ` James Smart
2017-10-05 8:01 ` Christoph Hellwig
2017-09-27 4:50 ` [PATCH v2 6/7] nvme_fc: move remote port get/put/free location James Smart
2017-09-27 4:50 ` [PATCH v2 7/7] nvme_fc: add dev_loss_tmo timeout and remoteport resume support James Smart
From: James Smart @ 2017-09-27 4:50 UTC (permalink / raw)
check remoteport connectivity before initiating reconnects
Signed-off-by: James Smart <james.smart at broadcom.com>
---
drivers/nvme/host/fc.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 63fb35bb6f1f..740dd471ec7a 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -738,6 +738,19 @@ fc_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
dma_unmap_sg(dev, sg, nents, dir);
}
+static inline bool
+nvme_fc_rport_is_online(struct nvme_fc_rport *rport)
+{
+ unsigned long flags;
+ bool online;
+
+ spin_lock_irqsave(&rport->lock, flags);
+ online = (rport->remoteport.port_state == FC_OBJSTATE_ONLINE);
+ spin_unlock_irqrestore(&rport->lock, flags);
+
+ return online;
+}
+
/* *********************** FC-NVME LS Handling **************************** */
@@ -2382,6 +2395,9 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
++ctrl->ctrl.nr_reconnects;
+ if (!nvme_fc_rport_is_online(ctrl->rport))
+ return -ENODEV;
+
/*
* Create the admin queue
*/
@@ -2658,6 +2674,10 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
ctrl->cnum, status);
if (nvmf_should_reconnect(&ctrl->ctrl)) {
+ /* Only schedule the reconnect if the remote port is online */
+ if (!nvme_fc_rport_is_online(ctrl->rport))
+ return;
+
dev_info(ctrl->ctrl.device,
"NVME-FC{%d}: Reconnect attempt in %d seconds.\n",
ctrl->cnum, ctrl->ctrl.opts->reconnect_delay);
@@ -2691,12 +2711,15 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
return;
}
- ret = nvme_fc_create_association(ctrl);
- if (ret)
- nvme_fc_reconnect_or_delete(ctrl, ret);
- else
- dev_info(ctrl->ctrl.device,
- "NVME-FC{%d}: controller reset complete\n", ctrl->cnum);
+ if (nvme_fc_rport_is_online(ctrl->rport)) {
+ ret = nvme_fc_create_association(ctrl);
+ if (ret)
+ nvme_fc_reconnect_or_delete(ctrl, ret);
+ else
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: controller reset complete\n",
+ ctrl->cnum);
+ }
}
static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
--
2.13.1
* [PATCH v2 6/7] nvme_fc: move remote port get/put/free location
2017-09-27 4:50 [PATCH v2 0/7] nvme_fc: add dev_loss_tmo support James Smart
2017-09-27 4:50 ` [PATCH v2 5/7] nvme_fc: check connectivity before initiating reconnects James Smart
@ 2017-09-27 4:50 ` James Smart
2017-10-05 8:03 ` Christoph Hellwig
2017-10-11 10:18 ` Sagi Grimberg
2017-09-27 4:50 ` [PATCH v2 7/7] nvme_fc: add dev_loss_tmo timeout and remoteport resume support James Smart
From: James Smart @ 2017-09-27 4:50 UTC (permalink / raw)
move nvme_fc_rport_get/put and rport free to higher in the file to
avoid adding prototypes to resolve references in upcoming code additions
Signed-off-by: James Smart <james.smart at broadcom.com>
---
drivers/nvme/host/fc.c | 78 +++++++++++++++++++++++++-------------------------
1 file changed, 39 insertions(+), 39 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 740dd471ec7a..608d37d15495 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -457,6 +457,45 @@ nvme_fc_unregister_localport(struct nvme_fc_local_port *portptr)
}
EXPORT_SYMBOL_GPL(nvme_fc_unregister_localport);
+static void
+nvme_fc_free_rport(struct kref *ref)
+{
+ struct nvme_fc_rport *rport =
+ container_of(ref, struct nvme_fc_rport, ref);
+ struct nvme_fc_lport *lport =
+ localport_to_lport(rport->remoteport.localport);
+ unsigned long flags;
+
+ WARN_ON(rport->remoteport.port_state != FC_OBJSTATE_DELETED);
+ WARN_ON(!list_empty(&rport->ctrl_list));
+
+ /* remove from lport list */
+ spin_lock_irqsave(&nvme_fc_lock, flags);
+ list_del(&rport->endp_list);
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+ /* let the LLDD know we've finished tearing it down */
+ lport->ops->remoteport_delete(&rport->remoteport);
+
+ ida_simple_remove(&lport->endp_cnt, rport->remoteport.port_num);
+
+ kfree(rport);
+
+ nvme_fc_lport_put(lport);
+}
+
+static void
+nvme_fc_rport_put(struct nvme_fc_rport *rport)
+{
+ kref_put(&rport->ref, nvme_fc_free_rport);
+}
+
+static int
+nvme_fc_rport_get(struct nvme_fc_rport *rport)
+{
+ return kref_get_unless_zero(&rport->ref);
+}
+
/**
* nvme_fc_register_remoteport - transport entry point called by an
* LLDD to register the existence of a NVME
@@ -544,45 +583,6 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
}
EXPORT_SYMBOL_GPL(nvme_fc_register_remoteport);
-static void
-nvme_fc_free_rport(struct kref *ref)
-{
- struct nvme_fc_rport *rport =
- container_of(ref, struct nvme_fc_rport, ref);
- struct nvme_fc_lport *lport =
- localport_to_lport(rport->remoteport.localport);
- unsigned long flags;
-
- WARN_ON(rport->remoteport.port_state != FC_OBJSTATE_DELETED);
- WARN_ON(!list_empty(&rport->ctrl_list));
-
- /* remove from lport list */
- spin_lock_irqsave(&nvme_fc_lock, flags);
- list_del(&rport->endp_list);
- spin_unlock_irqrestore(&nvme_fc_lock, flags);
-
- /* let the LLDD know we've finished tearing it down */
- lport->ops->remoteport_delete(&rport->remoteport);
-
- ida_simple_remove(&lport->endp_cnt, rport->remoteport.port_num);
-
- kfree(rport);
-
- nvme_fc_lport_put(lport);
-}
-
-static void
-nvme_fc_rport_put(struct nvme_fc_rport *rport)
-{
- kref_put(&rport->ref, nvme_fc_free_rport);
-}
-
-static int
-nvme_fc_rport_get(struct nvme_fc_rport *rport)
-{
- return kref_get_unless_zero(&rport->ref);
-}
-
static int
nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
{
--
2.13.1
* [PATCH v2 7/7] nvme_fc: add dev_loss_tmo timeout and remoteport resume support
2017-09-27 4:50 [PATCH v2 0/7] nvme_fc: add dev_loss_tmo support James Smart
2017-09-27 4:50 ` [PATCH v2 6/7] nvme_fc: move remote port get/put/free location James Smart
@ 2017-09-27 4:50 ` James Smart
2017-10-11 10:29 ` Sagi Grimberg
From: James Smart @ 2017-09-27 4:50 UTC (permalink / raw)
This patch adds the dev_loss_tmo functionality to the transport.
When a remoteport is unregistered (connectivity lost), it is marked
DELETED and the following is performed on all the controllers on the
remoteport:
- the controller is reset to delete the current association.
- Once the association is terminated, the dev_loss_tmo timer is started.
A reconnect is not scheduled as there is no connectivity.
Note: the start of the dev_loss_tmo timer is in the generic
delete-association/create-new-association path. Thus it will be started
regardless of whether the reset was due to remote port connectivity
loss, a controller reset, or a transport run-time error.
When a remoteport is registered (connectivity established), the
transport searches the list of remoteport structures that have pending
deletions (controllers waiting to have dev_loss_tmo fire, thus
preventing remoteport deletion). The transport looks for a matching
wwnn/wwpn. If one is found, the remoteport is transitioned back to
ONLINE, and the following occurs on all controllers on the remoteport:
- any controllers in a RECONNECTING state have reconnection attempts
kicked off.
- If the controller was RESETTING, its natural RECONNECTING transition
will start a reconnect attempt.
Once a controller successfully reconnects to a new association, any
dev_loss_tmo timer for it is terminated.
If a dev_loss_tmo timer for a controller fires, the controller is
unconditionally deleted.
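An illustrative timeline (values assumed, not mandated by the patch):
with dev_loss_tmo = 60s and reconnect_delay = 10s, connectivity loss at
t=0 marks the remoteport DELETED, resets each controller, and starts
dev_loss_work to fire at t=60s with no reconnects scheduled. If the LLDD
re-registers a remoteport with matching wwnn/wwpn at t=25s, a reconnect
is kicked off immediately and, on success, dev_loss_work is cancelled.
If connectivity never returns, the controller is unconditionally deleted
at t=60s.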
Signed-off-by: James Smart <james.smart at broadcom.com>
---
drivers/nvme/host/fc.c | 237 ++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 225 insertions(+), 12 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 608d37d15495..ac898571b1e5 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -163,6 +163,7 @@ struct nvme_fc_ctrl {
struct work_struct delete_work;
struct delayed_work connect_work;
+ struct delayed_work dev_loss_work;
u32 dev_loss_tmo;
struct kref ref;
@@ -496,6 +497,87 @@ nvme_fc_rport_get(struct nvme_fc_rport *rport)
return kref_get_unless_zero(&rport->ref);
}
+static void
+nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl)
+{
+ switch (ctrl->ctrl.state) {
+ case NVME_CTRL_NEW:
+ case NVME_CTRL_RECONNECTING:
+ /*
+ * As all reconnects were suppressed, schedule a
+ * connect.
+ */
+ queue_delayed_work(nvme_wq, &ctrl->connect_work, 0);
+ break;
+
+ case NVME_CTRL_RESETTING:
+ /*
+ * Controller is already in the process of terminating the
+ * association. No need to do anything further. The reconnect
+ * step will naturally occur after the reset completes.
+ */
+ break;
+
+ default:
+ /* no action to take - let it delete */
+ break;
+ }
+}
+
+static struct nvme_fc_rport *
+nvme_fc_attach_to_suspended_rport(struct nvme_fc_lport *lport,
+ struct nvme_fc_port_info *pinfo)
+{
+ struct nvme_fc_rport *rport;
+ struct nvme_fc_ctrl *ctrl;
+ unsigned long flags;
+
+ spin_lock_irqsave(&nvme_fc_lock, flags);
+
+ list_for_each_entry(rport, &lport->endp_list, endp_list) {
+ if (rport->remoteport.node_name != pinfo->node_name ||
+ rport->remoteport.port_name != pinfo->port_name)
+ continue;
+
+ if (!nvme_fc_rport_get(rport)) {
+ rport = ERR_PTR(-ENOLCK);
+ goto out_done;
+ }
+
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+ spin_lock_irqsave(&rport->lock, flags);
+
+ /* has it been unregistered */
+ if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) {
+ /* means lldd called us twice */
+ spin_unlock_irqrestore(&rport->lock, flags);
+ nvme_fc_rport_put(rport);
+ return ERR_PTR(-ESTALE);
+ }
+
+ rport->remoteport.port_state = FC_OBJSTATE_ONLINE;
+
+ /*
+ * kick off a reconnect attempt on all associations to the
+ * remote port. A successful reconnects will resume i/o.
+ */
+ list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list)
+ nvme_fc_resume_controller(ctrl);
+
+ spin_unlock_irqrestore(&rport->lock, flags);
+
+ return rport;
+ }
+
+ rport = NULL;
+
+out_done:
+ spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+ return rport;
+}
+
/**
* nvme_fc_register_remoteport - transport entry point called by an
* LLDD to register the existence of a NVME
@@ -528,22 +610,46 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
goto out_reghost_failed;
}
+ if (!nvme_fc_lport_get(lport)) {
+ ret = -ESHUTDOWN;
+ goto out_reghost_failed;
+ }
+
+ /*
+ * look to see if there is already a remoteport that is waiting
+ * for a reconnect (within dev_loss_tmo) with the same WWN's.
+ * If so, transition to it and reconnect.
+ */
+ newrec = nvme_fc_attach_to_suspended_rport(lport, pinfo);
+
+ /* found an rport, but something about its state is bad */
+ if (IS_ERR(newrec)) {
+ ret = PTR_ERR(newrec);
+ goto out_lport_put;
+
+ /* found existing rport, which was resumed */
+ } else if (newrec) {
+ /* Ignore pinfo->dev_loss_tmo. Leave rport and ctlr's as is */
+
+ nvme_fc_lport_put(lport);
+ nvme_fc_signal_discovery_scan(lport, newrec);
+ *portptr = &newrec->remoteport;
+ return 0;
+ }
+
+ /* nothing found - allocate a new remoteport struct */
+
newrec = kmalloc((sizeof(*newrec) + lport->ops->remote_priv_sz),
GFP_KERNEL);
if (!newrec) {
ret = -ENOMEM;
- goto out_reghost_failed;
- }
-
- if (!nvme_fc_lport_get(lport)) {
- ret = -ESHUTDOWN;
- goto out_kfree_rport;
+ goto out_lport_put;
}
idx = ida_simple_get(&lport->endp_cnt, 0, 0, GFP_KERNEL);
if (idx < 0) {
ret = -ENOSPC;
- goto out_lport_put;
+ goto out_kfree_rport;
}
INIT_LIST_HEAD(&newrec->endp_list);
@@ -573,10 +679,10 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
*portptr = &newrec->remoteport;
return 0;
-out_lport_put:
- nvme_fc_lport_put(lport);
out_kfree_rport:
kfree(newrec);
+out_lport_put:
+ nvme_fc_lport_put(lport);
out_reghost_failed:
*portptr = NULL;
return ret;
@@ -607,6 +713,82 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
return 0;
}
+static void
+nvme_fc_start_dev_loss_tmo(struct nvme_fc_ctrl *ctrl, u32 dev_loss_tmo)
+{
+ /* if dev_loss_tmo==0, dev loss is immediate */
+ if (!dev_loss_tmo) {
+ dev_warn(ctrl->ctrl.device,
+ "NVME-FC{%d}: controller connectivity lost. "
+ "Deleting controller.\n",
+ ctrl->cnum);
+ __nvme_fc_del_ctrl(ctrl);
+ return;
+ }
+
+ dev_info(ctrl->ctrl.device,
+ "NVME-FC{%d}: controller connectivity lost. Awaiting reconnect",
+ ctrl->cnum);
+
+ switch (ctrl->ctrl.state) {
+ case NVME_CTRL_LIVE:
+ /*
+ * Schedule a controller reset. The reset will
+ * terminate the association and schedule the
+ * dev_loss_tmo timer. The reconnect after
+ * terminating the association will note the
+ * rport state and will not be scheduled.
+ * The controller will sit in that state, with
+ * io suspended at the block layer, until either
+ * dev_loss_tmo expires or the remoteport is
+ * re-registered. If re-registered, an immediate
+ * connect attempt will be made.
+ */
+ if (nvme_reset_ctrl(&ctrl->ctrl)) {
+ dev_warn(ctrl->ctrl.device,
+ "NVME-FC{%d}: Couldn't schedule reset. "
+ "Deleting controller.\n",
+ ctrl->cnum);
+ __nvme_fc_del_ctrl(ctrl);
+ }
+ break;
+
+ case NVME_CTRL_NEW:
+ case NVME_CTRL_RECONNECTING:
+ /*
+ * The association has already been terminated
+ * and dev_loss_tmo scheduled. The controller
+ * is either in the process of connecting or
+ * has scheduled a reconnect attempt.
+ * If in the process of connecting, it will fail
+ * due to loss of connectivity to the remoteport,
+ * and the reconnect will not be scheduled as
+ * there is no connectivity.
+ * If awaiting the reconnect, terminate it as
+ * it'll only fail.
+ */
+ cancel_delayed_work(&ctrl->connect_work);
+ break;
+
+ case NVME_CTRL_RESETTING:
+ /*
+ * Controller is already in the process of
+ * terminating the association. No need to do
+ * anything further. The reconnect step will
+ * kick in naturally after the association is
+ * terminated, detecting the lack of
+ * connectivity, and not attempt a reconnect
+ * or schedule one.
+ */
+ break;
+
+ case NVME_CTRL_DELETING:
+ default:
+ /* no action to take - let it delete */
+ break;
+ }
+}
+
/**
* nvme_fc_unregister_remoteport - transport entry point called by an
* LLDD to deregister/remove a previously
@@ -636,15 +818,20 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr)
}
portptr->port_state = FC_OBJSTATE_DELETED;
- /* tear down all associations to the remote port */
list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list)
- __nvme_fc_del_ctrl(ctrl);
+ nvme_fc_start_dev_loss_tmo(ctrl, portptr->dev_loss_tmo);
spin_unlock_irqrestore(&rport->lock, flags);
nvme_fc_abort_lsops(rport);
+ /*
+ * release the reference, which will allow, if all controllers
+ * go away, which should only occur after dev_loss_tmo occurs,
+ * for the rport to be torn down.
+ */
nvme_fc_rport_put(rport);
+
return 0;
}
EXPORT_SYMBOL_GPL(nvme_fc_unregister_remoteport);
@@ -2492,8 +2679,10 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
ctrl->ctrl.nr_reconnects = 0;
- if (changed)
+ if (changed) {
nvme_start_ctrl(&ctrl->ctrl);
+ cancel_delayed_work_sync(&ctrl->dev_loss_work);
+ }
return 0; /* Success */
@@ -2603,6 +2792,7 @@ nvme_fc_delete_ctrl_work(struct work_struct *work)
cancel_work_sync(&ctrl->ctrl.reset_work);
cancel_delayed_work_sync(&ctrl->connect_work);
+ cancel_delayed_work_sync(&ctrl->dev_loss_work);
nvme_stop_ctrl(&ctrl->ctrl);
nvme_remove_namespaces(&ctrl->ctrl);
/*
@@ -2711,6 +2901,9 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
return;
}
+ queue_delayed_work(nvme_wq, &ctrl->dev_loss_work,
+ ctrl->dev_loss_tmo * HZ);
+
if (nvme_fc_rport_is_online(ctrl->rport)) {
ret = nvme_fc_create_association(ctrl);
if (ret)
@@ -2753,6 +2946,25 @@ nvme_fc_connect_ctrl_work(struct work_struct *work)
ctrl->cnum);
}
+static void
+nvme_fc_dev_loss_ctrl_work(struct work_struct *work)
+{
+ struct nvme_fc_ctrl *ctrl =
+ container_of(to_delayed_work(work),
+ struct nvme_fc_ctrl, dev_loss_work);
+
+ if (ctrl->ctrl.state != NVME_CTRL_DELETING) {
+ dev_warn(ctrl->ctrl.device,
+ "NVME-FC{%d}: Device failed to reconnect within "
+ "dev_loss_tmo (%d seconds). Deleting controller\n",
+ ctrl->cnum, ctrl->dev_loss_tmo);
+ if (__nvme_fc_del_ctrl(ctrl))
+ dev_warn(ctrl->ctrl.device,
+ "NVME-FC{%d}: delete request failed\n",
+ ctrl->cnum);
+ }
+}
+
static const struct blk_mq_ops nvme_fc_admin_mq_ops = {
.queue_rq = nvme_fc_queue_rq,
@@ -2893,6 +3105,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
INIT_WORK(&ctrl->delete_work, nvme_fc_delete_ctrl_work);
INIT_WORK(&ctrl->ctrl.reset_work, nvme_fc_reset_ctrl_work);
INIT_DELAYED_WORK(&ctrl->connect_work, nvme_fc_connect_ctrl_work);
+ INIT_DELAYED_WORK(&ctrl->dev_loss_work, nvme_fc_dev_loss_ctrl_work);
spin_lock_init(&ctrl->lock);
/* io queue count */
--
2.13.1
* [PATCH v2 7/7] nvme_fc: add dev_loss_tmo timeout and remoteport resume support
2017-09-27 4:50 ` [PATCH v2 7/7] nvme_fc: add dev_loss_tmo timeout and remoteport resume support James Smart
@ 2017-10-11 10:29 ` Sagi Grimberg
2017-10-11 14:52 ` James Smart
From: Sagi Grimberg @ 2017-10-11 10:29 UTC (permalink / raw)
> This patch adds the dev_loss_tmo functionality to the transport.
>
> When a remoteport is unregistered (connectivity lost), it is marked
> DELETED and the following is performed on all the controllers on the
> remoteport:
> - the controller is reset to delete the current association.
> - Once the association is terminated, the dev_loss_tmo timer is started.
> A reconnect is not scheduled as there is no connectivity.
> Note: the start of the dev_loss_tmo timer is in the generic
> delete-association/create-new-association path. Thus it will be started
> regardless of whether the reset was due to remote port connectivity
> loss, a controller reset, or a transport run-time error.
>
> When a remoteport is registered (connectivity established), the
> transport searches the list of remoteport structures that have pending
> deletions (controllers waiting to have dev_loss_tmo fire, thus
> preventing remoteport deletion). The transport looks for a matching
> wwnn/wwpn. If one is found, the remoteport is transitioned back to
> ONLINE, and the following occurs on all controllers on the remoteport:
> - any controllers in a RECONNECTING state have reconnection attempts
> kicked off.
> - If the controller was RESETTING, its natural RECONNECTING transition
> will start a reconnect attempt.
OK, I think I finally understand the existence of the two tmos.
But may I ask, what happens if we only have ctrl_loss_tmo support?
If I understand correctly, it would behave exactly as expected: the
controller will periodically attempt to reconnect until ctrl_loss_tmo fires
and is then deleted. With dev_loss_tmo, once the lower of
ctrl_loss_tmo and dev_loss_tmo expires, the controller is deleted.
If my understanding is correct, this just adds a level of confusion
to the setup procedure, doesn't it?
* [PATCH v2 7/7] nvme_fc: add dev_loss_tmo timeout and remoteport resume support
2017-10-11 10:29 ` Sagi Grimberg
@ 2017-10-11 14:52 ` James Smart
From: James Smart @ 2017-10-11 14:52 UTC (permalink / raw)
On 10/11/2017 3:29 AM, Sagi Grimberg wrote:
> OK, I think I finally understand the existence of the two tmos.
> But may I ask, what happens if we only have ctrl_loss_tmo support?
>
> If I understand correctly, it would behave exactly as expected: the
> controller will periodically attempt to reconnect until ctrl_loss_tmo fires
> and is then deleted. With dev_loss_tmo, once the lower of
> ctrl_loss_tmo and dev_loss_tmo expires, the controller is deleted.
> If my understanding is correct, this just adds a level of confusion
> to the setup procedure, doesn't it?
agree with your statements. And no, it doesn't add confusion. You are
ignoring the whole history coming from the SCSI side that is expected
and well-known. If anything, adding the per-controller ctrl_loss_tmo
that has different values, making fc-nvme behave differently than scsi
on the same port (running scsi and fc concurrently perhaps), is what
adds confusion.
-- james