* [PATCH v3 0/5] nvme_fc: add dev_loss_tmo support
  From: James Smart @ 2017-10-17 23:32 UTC

FC, on the SCSI side, has long had a device loss timeout, which governed
how long it would hide connectivity loss to remote target ports. The
timeout value is maintained in the SCSI FC transport, and admins are used
to going there to maintain it.

Eventually, the SCSI FC transport will be moved into something independent
from and above SCSI so that the SCSI and NVME protocols can be peers. In
the meantime, to add the functionality now and stay in sync with the SCSI
FC transport, the LLDD will be used as the conduit. The initial value for
the timeout can be set by the LLDD when it creates the remoteport via
nvme_fc_register_remoteport(). Later, if the value is updated via the SCSI
transport, the LLDD can call a new nvme_fc routine to update the
remoteport's dev_loss_tmo value.

The nvme fabrics implementation already has a similar timer, the
ctrl_loss_tmo, which is distilled into a max_reconnects count and a
reconnect_delay between attempts; the overall duration until the max is
hit is the ctrl_loss_tmo. This was primarily for transports that didn't
have the ability to track device connectivity and would retry per the
delay until finally giving up.

This patch set implements the FC dev_loss_tmo at the FC port level. The
timer is initiated when connectivity is lost. If connectivity is not
re-established and the timer expires, all controllers on the remote port
will be deleted.

When FC remoteport-level connectivity is lost, all controllers on the
remoteport are reset, which results in them transitioning to a
reconnecting state, at which point the ctrl_loss_tmo behavior kicks in.
Thus a controller may be deleted as soon as either the ctrl_loss_tmo or
the FC port-level dev_loss_tmo expires. If connectivity is re-established
before dev_loss_tmo expires, any controllers on the remoteport, which
would be in a reconnecting state, immediately have a reconnect attempted.

The patches were cut on the nvme-4.15 branch.

Patch 5, which adds the dev_loss_tmo timeout, is dependent on the
nvme_fc_signal_discovery_scan() routine added by this patch:
http://lists.infradead.org/pipermail/linux-nvme/2017-September/012781.html
That patch has been approved but not yet pulled into a tree.

V3:
  In v2, the implementation merged the dev_loss_tmo value into the
  ctlr_loss_tmo in the controller, so only a single timer on each
  controller was running. V3 changed to keep the dev_loss_tmo on the FC
  remoteport and to run it independently of the ctrl_loss_tmo timer,
  except that a loss of connectivity starts both simultaneously.

James Smart (5):
  nvme core: allow controller RESETTING to RECONNECTING transition
  nvme_fc: change ctlr state assignments during reset/reconnect
  nvme_fc: add a dev_loss_tmo field to the remoteport
  nvme_fc: check connectivity before initiating reconnects
  nvme_fc: add dev_loss_tmo timeout and remoteport resume support

 drivers/nvme/host/core.c       |   1 +
 drivers/nvme/host/fc.c         | 337 ++++++++++++++++++++++++++++++++-----
 include/linux/nvme-fc-driver.h |  11 +-
 3 files changed, 310 insertions(+), 39 deletions(-)

--
2.13.1
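As a concrete sketch of the registration-time flow described above: an
LLDD might seed the timeout when it creates the remoteport roughly as
follows. nvme_fc_register_remoteport(), struct nvme_fc_port_info, and the
new dev_loss_tmo field are from this series; the lldd_rport structure and
its field/helper names are hypothetical.

/* Hypothetical LLDD glue: seed dev_loss_tmo at remoteport creation.
 * A dev_loss_tmo of 0 here lets the transport apply its 60s default.
 */
static int lldd_register_nvme_rport(struct lldd_rport *rp)
{
	struct nvme_fc_port_info pinfo = {
		.node_name    = rp->wwnn,
		.port_name    = rp->wwpn,
		.port_role    = FC_PORT_ROLE_NVME_TARGET,
		.port_id      = rp->nport_id,
		.dev_loss_tmo = rp->scsi_dev_loss_tmo, /* sync w/ SCSI side */
	};

	return nvme_fc_register_remoteport(rp->local_nvme_port, &pinfo,
					   &rp->nvme_rport);
}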
* [PATCH v3 1/5] nvme core: allow controller RESETTING to RECONNECTING transition
  From: James Smart @ 2017-10-17 23:32 UTC

Allow the controller state transition: RESETTING to RECONNECTING.

A transport will typically transition from LIVE->RESETTING when initially
performing a reset or recovering from an error. Adding this transition
allows a transport to move to RECONNECTING when it checks/waits for
connectivity, then creates new transport connections and reinits the
controller.

-- james

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

---
v3: no change. incorporated Reviewed-by's
---
 drivers/nvme/host/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 26c8913435b2..aa9e2df27bf7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -205,6 +205,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 	case NVME_CTRL_RECONNECTING:
 		switch (old_state) {
 		case NVME_CTRL_LIVE:
+		case NVME_CTRL_RESETTING:
 			changed = true;
 			/* FALLTHRU */
 		default:
--
2.13.1
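For context, the sequence this newly allowed transition enables in a
transport's reset path looks roughly like the sketch below.
nvme_change_ctrl_state() and the state names are real; the handler body
and the xport_* helpers are illustrative, not the actual FC code:

static void xport_reset_ctrl_work(struct nvme_ctrl *ctrl)
{
	/* core already moved LIVE -> RESETTING when scheduling the reset */
	xport_teardown_association(ctrl);		/* hypothetical */

	/* the transition this patch permits: RESETTING -> RECONNECTING */
	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RECONNECTING))
		return;

	/* check/wait for connectivity, rebuild queues, reinit controller */
	xport_reconnect(ctrl);				/* hypothetical */
}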
* [PATCH v3 1/5] nvme core: allow controller RESETTING to RECONNECTING transition
  From: Johannes Thumshirn @ 2017-10-18 8:26 UTC

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH v3 1/5] nvme core: allow controller RESETTING to RECONNECTING transition
  From: Hannes Reinecke @ 2017-10-20 5:58 UTC

On 10/18/2017 01:32 AM, James Smart wrote:
> Allow the controller state transition: RESETTING to RECONNECTING.
>
> [snip - full patch quoted]

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare at suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH v3 2/5] nvme_fc: change ctlr state assignments during reset/reconnect
  From: James Smart @ 2017-10-17 23:32 UTC

Clean up some of the controller state checks and add the
RESETTING->RECONNECTING state transition.

Specifically:
- The movement of the RESETTING state change and schedule of reset_work
  to the core doesn't work with nvme_fc_error_recovery setting the state
  to RECONNECTING before attempting to reset. Remove the state change,
  as the reset request does it.
- In the rare cases where an error occurs right as we're transitioning
  to LIVE, defer the controller start actions.
- In error handling on teardown of associations while performing initial
  controller creation, avoid quiesce calls on the admin_q. They are
  unneeded.
- Add the RESETTING->RECONNECTING transition in the reset handler.

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>

---
v3: no change. incorporated Reviewed-by's
---
 drivers/nvme/host/fc.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 48c241620d2b..5afb518c39ad 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1881,13 +1881,6 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 	dev_warn(ctrl->ctrl.device,
 		"NVME-FC{%d}: resetting controller\n", ctrl->cnum);
 
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
-		dev_err(ctrl->ctrl.device,
-			"NVME-FC{%d}: error_recovery: Couldn't change state "
-			"to RECONNECTING\n", ctrl->cnum);
-		return;
-	}
-
 	nvme_reset_ctrl(&ctrl->ctrl);
 }
 
@@ -2521,11 +2514,11 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	}
 
 	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
-	WARN_ON_ONCE(!changed);
 
 	ctrl->ctrl.nr_reconnects = 0;
 
-	nvme_start_ctrl(&ctrl->ctrl);
+	if (changed)
+		nvme_start_ctrl(&ctrl->ctrl);
 
 	return 0;	/* Success */
 
@@ -2593,7 +2586,8 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
 	 * use blk_mq_tagset_busy_itr() and the transport routine to
 	 * terminate the exchanges.
 	 */
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	if (ctrl->ctrl.state != NVME_CTRL_NEW)
+		blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_fc_terminate_exchange, &ctrl->ctrl);
 
@@ -2697,12 +2691,8 @@ nvme_fc_del_nvme_ctrl(struct nvme_ctrl *nctrl)
 static void
 nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
 {
-	/* If we are resetting/deleting then do nothing */
-	if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING) {
-		WARN_ON_ONCE(ctrl->ctrl.state == NVME_CTRL_NEW ||
-			     ctrl->ctrl.state == NVME_CTRL_LIVE);
+	if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING)
 		return;
-	}
 
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n",
@@ -2731,9 +2721,17 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
 	int ret;
 
 	nvme_stop_ctrl(&ctrl->ctrl);
+
 	/* will block while waiting for io to terminate */
 	nvme_fc_delete_association(ctrl);
 
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
+		dev_err(ctrl->ctrl.device,
+			"NVME-FC{%d}: error_recovery: Couldn't change state "
+			"to RECONNECTING\n", ctrl->cnum);
+		return;
+	}
+
 	ret = nvme_fc_create_association(ctrl);
 	if (ret)
 		nvme_fc_reconnect_or_delete(ctrl, ret);
--
2.13.1
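Taken together with patch 1, the intended state flow for a reset/reconnect
cycle is roughly the following (a simplified view; DELETING is reached
when either loss timer gives up):

  LIVE --(error or reset request)-------------> RESETTING
       --(association torn down)--------------> RECONNECTING
       --(create_association succeeds)--------> LIVE
       --(ctrl_loss_tmo/dev_loss_tmo expires)-> DELETING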
* [PATCH v3 2/5] nvme_fc: change ctlr state assignments during reset/reconnect
  From: Johannes Thumshirn @ 2017-10-18 8:27 UTC

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH v3 2/5] nvme_fc: change ctlr state assignments during reset/reconnect
  From: Hannes Reinecke @ 2017-10-20 6:00 UTC

On 10/18/2017 01:32 AM, James Smart wrote:
> Clean up some of the controller state checks and add the
> RESETTING->RECONNECTING state transition.
>
> [snip - full patch quoted]
>
>  drivers/nvme/host/fc.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare at suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport
  From: James Smart @ 2017-10-17 23:32 UTC

Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
connectivity loss.

The transport initializes the value in the nvme_fc_register_remoteport()
call. If the value is not set, a default of 60s is used.

Add a new API routine, nvme_fc_set_remoteport_devloss(), which allows the
LLDD to dynamically update the value on an existing remoteport.

Signed-off-by: James Smart <james.smart at broadcom.com>

---
v3:
  Removed the expected reconnect time and min devloss defines. They
  were not that meaningful.
  Incorporated the new nvme_fc_set_remoteport_devloss() routine, which
  was in a different patch in v2. The new routine no longer touches
  controllers; it only modifies the remoteport.
---
 drivers/nvme/host/fc.c         | 31 +++++++++++++++++++++++++++++++
 include/linux/nvme-fc-driver.h | 11 +++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 5afb518c39ad..3ac49e670c38 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -45,6 +45,8 @@ enum nvme_fc_queue_flags {
 
 #define NVMEFC_QUEUE_DELAY	3		/* ms units */
 
+#define NVME_FC_DEFAULT_DEV_LOSS_TMO	60	/* seconds */
+
 struct nvme_fc_queue {
 	struct nvme_fc_ctrl	*ctrl;
 	struct device		*dev;
@@ -587,6 +589,11 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 	newrec->remoteport.port_id = pinfo->port_id;
 	newrec->remoteport.port_state = FC_OBJSTATE_ONLINE;
 	newrec->remoteport.port_num = idx;
+	/* a registration value of dev_loss_tmo=0 results in the default */
+	if (pinfo->dev_loss_tmo)
+		newrec->remoteport.dev_loss_tmo = pinfo->dev_loss_tmo;
+	else
+		newrec->remoteport.dev_loss_tmo = NVME_FC_DEFAULT_DEV_LOSS_TMO;
 
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	list_add_tail(&newrec->endp_list, &lport->endp_list);
@@ -690,6 +697,30 @@ nvme_fc_rescan_remoteport(struct nvme_fc_remote_port *remoteport)
 }
 EXPORT_SYMBOL_GPL(nvme_fc_rescan_remoteport);
 
+int
+nvme_fc_set_remoteport_devloss(struct nvme_fc_remote_port *portptr,
+			u32 dev_loss_tmo)
+{
+	struct nvme_fc_rport *rport = remoteport_to_rport(portptr);
+	struct nvme_fc_ctrl *ctrl;
+	unsigned long flags;
+
+	spin_lock_irqsave(&rport->lock, flags);
+
+	if (portptr->port_state != FC_OBJSTATE_ONLINE) {
+		spin_unlock_irqrestore(&rport->lock, flags);
+		return -EINVAL;
+	}
+
+	/* a dev_loss_tmo of 0 (immediate) is allowed to be set */
+	rport->remoteport.dev_loss_tmo = dev_loss_tmo;
+
+	spin_unlock_irqrestore(&rport->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_fc_set_remoteport_devloss);
+
 
 /* *********************** FC-NVME DMA Handling **************************** */
 
diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h
index 4ea03b9a5c8c..3de268bf86bf 100644
--- a/include/linux/nvme-fc-driver.h
+++ b/include/linux/nvme-fc-driver.h
@@ -40,6 +40,8 @@
  * @node_name: FC WWNN for the port
  * @port_name: FC WWPN for the port
  * @port_role: What NVME roles are supported (see FC_PORT_ROLE_xxx)
+ * @dev_loss_tmo: maximum delay for reconnects to an association on
+ *	this device. Used only on a remoteport.
  *
  * Initialization values for dynamic port fields:
  * @port_id: FC N_Port_ID currently assigned the port. Upper 8 bits must
@@ -50,6 +52,7 @@ struct nvme_fc_port_info {
 	u64	port_name;
 	u32	port_role;
 	u32	port_id;
+	u32	dev_loss_tmo;
 };
 
 
@@ -202,6 +205,9 @@ enum nvme_fc_obj_state {
 *	The length of the buffer corresponds to the local_priv_sz
 *	value specified in the nvme_fc_port_template supplied by
 *	the LLDD.
+ * @dev_loss_tmo: maximum delay for reconnects to an association on
+ *	this device. To modify, lldd must call
+ *	nvme_fc_set_remoteport_devloss().
 *
 * Fields with dynamic values. Values may change base on link state. LLDD
 * may reference fields directly to change them. Initialized by the
@@ -259,10 +265,9 @@ struct nvme_fc_remote_port {
 	u32 port_role;
 	u64 node_name;
 	u64 port_name;
-
 	struct nvme_fc_local_port *localport;
-
 	void *private;
+	u32 dev_loss_tmo;
 
 	/* dynamic fields */
 	u32 port_id;
@@ -448,6 +453,8 @@ int nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *remoteport);
 
 void nvme_fc_rescan_remoteport(struct nvme_fc_remote_port *remoteport);
 
+int nvme_fc_set_remoteport_devloss(struct nvme_fc_remote_port *remoteport,
+			u32 dev_loss_tmo);
 
 
 /*
--
2.13.1
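Note the asymmetry in how a value of 0 is interpreted, per the comments in
the diff. A short illustration (rport here stands for an already
registered remoteport):

	pinfo.dev_loss_tmo = 0;		/* at registration: transport
					 * applies the 60s default */
	...
	nvme_fc_set_remoteport_devloss(rport, 0);
					/* later: dev loss becomes
					 * immediate on connectivity loss */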
* [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport
  From: Johannes Thumshirn @ 2017-10-18 8:28 UTC

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport
  From: Hannes Reinecke @ 2017-10-20 6:04 UTC

On 10/18/2017 01:32 AM, James Smart wrote:
> Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
> connectivity loss.
>
> [snip - full patch quoted]

Is this exported to sysfs somewhere? If not, can we have it in sysfs?
It surely made life _so_ much easier when configuring the odd device...

Other than that:

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare at suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport
  From: James Smart @ 2017-10-20 14:50 UTC

On 10/19/2017 11:04 PM, Hannes Reinecke wrote:
> Is this exported to sysfs somewhere? If not, can we have it in sysfs?
> It surely made life _so_ much easier when configuring the odd device...
>
> Other than that:
>
> Reviewed-by: Hannes Reinecke <hare at suse.com>

Currently, the SCSI transport fc_rport sysfs entry has the dev_loss_tmo
attribute, and when it is set, the driver will call the
nvme_fc_set_remoteport_devloss() routine to set it on the nvme-fc
remoteport.

When we genericize the FC transport, the attribute will be on the generic
fc_rport, and the set will call both the SCSI transport/LLDD and the NVME
remoteport.

-- james
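A sketch of the LLDD path James describes: the set_rport_dev_loss_tmo
callback in the SCSI FC transport's fc_function_template is real, while
the fc_rport-to-nvme-remoteport lookup (lldd_rport_from_fc) is
hypothetical LLDD bookkeeping:

/* Invoked when dev_loss_tmo is written via the SCSI fc_rport sysfs
 * attribute; mirrors the value onto the nvme-fc remoteport.
 */
static void lldd_set_rport_dev_loss_tmo(struct fc_rport *rport, u32 timeout)
{
	struct lldd_rport *rp = lldd_rport_from_fc(rport); /* hypothetical */

	rport->dev_loss_tmo = timeout;			/* SCSI side */
	if (rp && rp->nvme_rport)			/* NVME side */
		nvme_fc_set_remoteport_devloss(rp->nvme_rport, timeout);
}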
* [PATCH v3 4/5] nvme_fc: check connectivity before initiating reconnects
  From: James Smart @ 2017-10-17 23:32 UTC

Check remoteport connectivity before initiating reconnects.

Signed-off-by: James Smart <james.smart at broadcom.com>

---
v3:
  Removed the inline helper that took the lock around a one-line read
  of the port state; replaced its callers with explicit reads of the
  state.
---
 drivers/nvme/host/fc.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3ac49e670c38..61da2f92f71a 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -810,7 +810,6 @@ fc_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 	dma_unmap_sg(dev, sg, nents, dir);
 }
 
-
 /* *********************** FC-NVME LS Handling **************************** */
 
 static void nvme_fc_ctrl_put(struct nvme_fc_ctrl *);
@@ -2454,6 +2453,9 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 
 	++ctrl->ctrl.nr_reconnects;
 
+	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
+		return -ENODEV;
+
 	/*
 	 * Create the admin queue
 	 */
@@ -2730,6 +2732,10 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
 		ctrl->cnum, status);
 
 	if (nvmf_should_reconnect(&ctrl->ctrl)) {
+		/* Only schedule the reconnect if the remote port is online */
+		if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
+			return;
+
 		dev_info(ctrl->ctrl.device,
 			"NVME-FC{%d}: Reconnect attempt in %d seconds.\n",
 			ctrl->cnum, ctrl->ctrl.opts->reconnect_delay);
@@ -2763,12 +2769,15 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
 		return;
 	}
 
-	ret = nvme_fc_create_association(ctrl);
-	if (ret)
-		nvme_fc_reconnect_or_delete(ctrl, ret);
-	else
-		dev_info(ctrl->ctrl.device,
-			"NVME-FC{%d}: controller reset complete\n", ctrl->cnum);
+	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE) {
+		ret = nvme_fc_create_association(ctrl);
+		if (ret)
+			nvme_fc_reconnect_or_delete(ctrl, ret);
+		else
+			dev_info(ctrl->ctrl.device,
+				"NVME-FC{%d}: controller reset complete\n",
+				ctrl->cnum);
+	}
 }
 
 static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
--
2.13.1
* [PATCH v3 4/5] nvme_fc: check connectivity before initiating reconnects
  From: Johannes Thumshirn @ 2017-10-18 8:29 UTC

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH v3 4/5] nvme_fc: check connectivity before initiating reconnects
  From: Hannes Reinecke @ 2017-10-20 6:05 UTC

On 10/18/2017 01:32 AM, James Smart wrote:
> Check remoteport connectivity before initiating reconnects.
>
> Signed-off-by: James Smart <james.smart at broadcom.com>
>
> [snip]
>
>  drivers/nvme/host/fc.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare at suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support
  From: James Smart @ 2017-10-17 23:32 UTC

This patch adds the dev_loss_tmo functionality to the transport.

When a remoteport is unregistered (connectivity lost), the following
actions are taken:
- the remoteport is marked DELETED
- a dev_loss_tmo timer is started for the remoteport
- all controllers on the remoteport are reset

After a controller resets, it will stall in a RECONNECTING state,
waiting for one of the following:
- The controller will continue to attempt reconnects per max_retries
  and reconnect_delay. As there is no remoteport connectivity, the
  reconnect attempts will immediately fail. If the max reconnect
  attempts are reached (e.g. ctrl_loss_tmo reached), the controller
  is deleted.
- The remoteport is re-registered prior to dev_loss_tmo expiring.
  The resume of the remoteport will immediately attempt to reconnect
  each of its suspended controllers.
- The remoteport's dev_loss_tmo expires, causing all of its
  controllers to be deleted.

Signed-off-by: James Smart <james.smart at broadcom.com>

---
v3:
  Reworked so dev_loss_tmo is specific to the remoteport.
  Revised so connectivity loss resets controllers, and connectivity
  gain schedules reconnects.
---
 drivers/nvme/host/fc.c | 289 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 257 insertions(+), 32 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 61da2f92f71a..3628790371d7 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -138,6 +138,7 @@ struct nvme_fc_rport {
 	struct nvme_fc_lport	*lport;
 	spinlock_t		lock;
 	struct kref		ref;
+	struct delayed_work	dev_loss_work;
 } __aligned(sizeof(u64));	/* alignment for other things alloc'd with */
 
 enum nvme_fcctrl_flags {
@@ -503,6 +504,8 @@ nvme_fc_free_rport(struct kref *ref)
 	WARN_ON(rport->remoteport.port_state != FC_OBJSTATE_DELETED);
 	WARN_ON(!list_empty(&rport->ctrl_list));
 
+	cancel_delayed_work_sync(&rport->dev_loss_work);
+
 	/* remove from lport list */
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	list_del(&rport->endp_list);
@@ -530,6 +533,124 @@ nvme_fc_rport_get(struct nvme_fc_rport *rport)
 	return kref_get_unless_zero(&rport->ref);
 }
 
+static void
+nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl)
+{
+	switch (ctrl->ctrl.state) {
+	case NVME_CTRL_NEW:
+	case NVME_CTRL_RECONNECTING:
+		/*
+		 * As all reconnects were suppressed, schedule a
+		 * connect.
+		 */
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: connectivity re-established. "
+			"Attempting reconnect\n", ctrl->cnum);
+
+		queue_delayed_work(nvme_wq, &ctrl->connect_work, 0);
+		break;
+
+	case NVME_CTRL_RESETTING:
+		/*
+		 * Controller is already in the process of terminating the
+		 * association. No need to do anything further. The reconnect
+		 * step will naturally occur after the reset completes.
+		 */
+		break;
+
+	default:
+		/* no action to take - let it delete */
+		break;
+	}
+}
+
+static struct nvme_fc_rport *
+nvme_fc_attach_to_suspended_rport(struct nvme_fc_lport *lport,
+				struct nvme_fc_port_info *pinfo)
+{
+	struct nvme_fc_rport *rport;
+	struct nvme_fc_ctrl *ctrl;
+	unsigned long flags;
+
+	spin_lock_irqsave(&nvme_fc_lock, flags);
+
+	list_for_each_entry(rport, &lport->endp_list, endp_list) {
+		if (rport->remoteport.node_name != pinfo->node_name ||
+		    rport->remoteport.port_name != pinfo->port_name)
+			continue;
+
+		if (!nvme_fc_rport_get(rport)) {
+			rport = ERR_PTR(-ENOLCK);
+			goto out_done;
+		}
+
+		spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+		cancel_delayed_work_sync(&rport->dev_loss_work);
+
+		spin_lock_irqsave(&rport->lock, flags);
+
+		/* has it been unregistered */
+		if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) {
+			/* means lldd called us twice */
+			spin_unlock_irqrestore(&rport->lock, flags);
+			nvme_fc_rport_put(rport);
+			return ERR_PTR(-ESTALE);
+		}
+
+		rport->remoteport.port_state = FC_OBJSTATE_ONLINE;
+
+		/*
+		 * kick off a reconnect attempt on all associations to the
+		 * remote port. A successful reconnect will resume i/o.
+		 */
+		list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list)
+			nvme_fc_resume_controller(ctrl);
+
+		spin_unlock_irqrestore(&rport->lock, flags);
+
+		return rport;
+	}
+
+	rport = NULL;
+
+out_done:
+	spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+	return rport;
+}
+
+static void
+nvme_fc_rport_dev_loss_work(struct work_struct *work)
+{
+	struct nvme_fc_rport *rport =
+		container_of(to_delayed_work(work),
+				struct nvme_fc_rport, dev_loss_work);
+	struct nvme_fc_ctrl *ctrl;
+	unsigned long flags;
+
+	spin_lock_irqsave(&rport->lock, flags);
+
+	/* If port state transitioned dev loss shouldn't kick in */
+	if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) {
+		spin_unlock_irqrestore(&rport->lock, flags);
+		return;
+	}
+
+	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
+		dev_warn(ctrl->ctrl.device,
+			"NVME-FC{%d}: Remote Port failed to reconnect within "
+			"dev_loss_tmo (%d seconds). Deleting controller\n",
+			ctrl->cnum, rport->remoteport.dev_loss_tmo);
+		if (__nvme_fc_del_ctrl(ctrl))
+			dev_warn(ctrl->ctrl.device,
+				"NVME-FC{%d}: delete request failed\n",
+				ctrl->cnum);
+	}
+
+	spin_unlock_irqrestore(&rport->lock, flags);
+}
+
 /**
  * nvme_fc_register_remoteport - transport entry point called by an
  *                              LLDD to register the existence of a NVME
@@ -556,22 +677,46 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 	unsigned long flags;
 	int ret, idx;
 
+	if (!nvme_fc_lport_get(lport)) {
+		ret = -ESHUTDOWN;
+		goto out_reghost_failed;
+	}
+
+	/*
+	 * look to see if there is already a remoteport that is waiting
+	 * for a reconnect (within dev_loss_tmo) with the same WWN's.
+	 * If so, transition to it and reconnect.
+	 */
+	newrec = nvme_fc_attach_to_suspended_rport(lport, pinfo);
+
+	/* found an rport, but something about its state is bad */
+	if (IS_ERR(newrec)) {
+		ret = PTR_ERR(newrec);
+		goto out_lport_put;
+
+	/* found existing rport, which was resumed */
+	} else if (newrec) {
+		/* Ignore pinfo->dev_loss_tmo. Leave rport and ctlr's as is */
+
+		nvme_fc_lport_put(lport);
+		nvme_fc_signal_discovery_scan(lport, newrec);
+		*portptr = &newrec->remoteport;
+		return 0;
+	}
+
+	/* nothing found - allocate a new remoteport struct */
+
 	newrec = kmalloc((sizeof(*newrec) + lport->ops->remote_priv_sz),
 			 GFP_KERNEL);
 	if (!newrec) {
 		ret = -ENOMEM;
-		goto out_reghost_failed;
-	}
-
-	if (!nvme_fc_lport_get(lport)) {
-		ret = -ESHUTDOWN;
-		goto out_kfree_rport;
+		goto out_lport_put;
 	}
 
 	idx = ida_simple_get(&lport->endp_cnt, 0, 0, GFP_KERNEL);
 	if (idx < 0) {
 		ret = -ENOSPC;
-		goto out_lport_put;
+		goto out_kfree_rport;
 	}
 
 	INIT_LIST_HEAD(&newrec->endp_list);
@@ -594,6 +739,7 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 		newrec->remoteport.dev_loss_tmo = pinfo->dev_loss_tmo;
 	else
 		newrec->remoteport.dev_loss_tmo = NVME_FC_DEFAULT_DEV_LOSS_TMO;
+	INIT_DELAYED_WORK(&newrec->dev_loss_work, nvme_fc_rport_dev_loss_work);
 
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	list_add_tail(&newrec->endp_list, &lport->endp_list);
@@ -604,10 +750,10 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 	*portptr = &newrec->remoteport;
 	return 0;
 
-out_lport_put:
-	nvme_fc_lport_put(lport);
 out_kfree_rport:
 	kfree(newrec);
+out_lport_put:
+	nvme_fc_lport_put(lport);
 out_reghost_failed:
 	*portptr = NULL;
 	return ret;
@@ -638,6 +784,61 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
 	return 0;
 }
 
+static void
+nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
+{
+	dev_info(ctrl->ctrl.device,
+		"NVME-FC{%d}: controller connectivity lost. Awaiting "
+		"Reconnect", ctrl->cnum);
+
+	switch (ctrl->ctrl.state) {
+	case NVME_CTRL_NEW:
+	case NVME_CTRL_LIVE:
+		/*
+		 * Schedule a controller reset. The reset will
+		 * terminate the association and schedule the
+		 * reconnect timer. Reconnects will be attempted
+		 * until either the ctlr_loss_tmo
+		 * (max_retries * connect_delay) expires or the
+		 * remoteport's dev_loss_tmo expires.
+		 */
+		if (nvme_reset_ctrl(&ctrl->ctrl)) {
+			dev_warn(ctrl->ctrl.device,
+				"NVME-FC{%d}: Couldn't schedule reset. "
+				"Deleting controller.\n",
+				ctrl->cnum);
+			__nvme_fc_del_ctrl(ctrl);
+		}
+		break;
+
+	case NVME_CTRL_RECONNECTING:
+		/*
+		 * The association has already been terminated
+		 * and the controller is attempting reconnects.
+		 * No need to do anything further. Reconnects will
+		 * be attempted until either the ctlr_loss_tmo
+		 * (max_retries * connect_delay) expires or the
+		 * remoteport's dev_loss_tmo expires.
+		 */
+		break;
+
+	case NVME_CTRL_RESETTING:
+		/*
+		 * Controller is already in the process of
+		 * terminating the association. No need to do
+		 * anything further. The reconnect step will
+		 * kick in naturally after the association is
+		 * terminated.
+		 */
+		break;
+
+	case NVME_CTRL_DELETING:
+	default:
+		/* no action to take - let it delete */
+		break;
+	}
+}
+
 /**
  * nvme_fc_unregister_remoteport - transport entry point called by an
  *                              LLDD to deregister/remove a previously
@@ -667,15 +868,32 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr)
 	}
 	portptr->port_state = FC_OBJSTATE_DELETED;
 
-	/* tear down all associations to the remote port */
-	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list)
-		__nvme_fc_del_ctrl(ctrl);
+	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
+		/* if dev_loss_tmo==0, dev loss is immediate */
+		if (!portptr->dev_loss_tmo) {
+			dev_warn(ctrl->ctrl.device,
+				"NVME-FC{%d}: controller connectivity lost. "
+				"Deleting controller.\n",
+				ctrl->cnum);
+			__nvme_fc_del_ctrl(ctrl);
+		} else
+			nvme_fc_ctrl_connectivity_loss(ctrl);
+	}
 
 	spin_unlock_irqrestore(&rport->lock, flags);
 
 	nvme_fc_abort_lsops(rport);
 
+	queue_delayed_work(nvme_wq, &rport->dev_loss_work,
+			portptr->dev_loss_tmo * HZ);
+
+	/*
+	 * release the reference, which allows the rport to be torn down
+	 * once all controllers go away (which should only occur after
+	 * dev_loss_tmo occurs).
+	 */
 	nvme_fc_rport_put(rport);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nvme_fc_unregister_remoteport);
@@ -702,7 +920,6 @@ nvme_fc_set_remoteport_devloss(struct nvme_fc_remote_port *portptr,
 			u32 dev_loss_tmo)
 {
 	struct nvme_fc_rport *rport = remoteport_to_rport(portptr);
-	struct nvme_fc_ctrl *ctrl;
 	unsigned long flags;
 
 	spin_lock_irqsave(&rport->lock, flags);
@@ -2727,25 +2944,31 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
 	if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING)
 		return;
 
-	dev_info(ctrl->ctrl.device,
-		"NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n",
-		ctrl->cnum, status);
+	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE)
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n",
+			ctrl->cnum, status);
 
 	if (nvmf_should_reconnect(&ctrl->ctrl)) {
-		/* Only schedule the reconnect if the remote port is online */
-		if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
-			return;
-
-		dev_info(ctrl->ctrl.device,
-			"NVME-FC{%d}: Reconnect attempt in %d seconds.\n",
-			ctrl->cnum, ctrl->ctrl.opts->reconnect_delay);
+		if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE)
+			dev_info(ctrl->ctrl.device,
+				"NVME-FC{%d}: Reconnect attempt in %d "
+				"seconds.\n",
+				ctrl->cnum, ctrl->ctrl.opts->reconnect_delay);
 		queue_delayed_work(nvme_wq, &ctrl->connect_work,
 				ctrl->ctrl.opts->reconnect_delay * HZ);
 	} else {
-		dev_warn(ctrl->ctrl.device,
+		if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE)
+			dev_warn(ctrl->ctrl.device,
 				"NVME-FC{%d}: Max reconnect attempts (%d) "
 				"reached. Removing controller\n",
 				ctrl->cnum, ctrl->ctrl.nr_reconnects);
+		else
+			dev_warn(ctrl->ctrl.device,
+				"NVME-FC{%d}: Max reconnect attempts (%d) "
+				"reached while waiting for remoteport "
+				"connectivity. Removing controller\n",
+				ctrl->cnum, ctrl->ctrl.nr_reconnects);
+
 		WARN_ON(__nvme_fc_schedule_delete_work(ctrl));
 	}
 }
@@ -2769,15 +2992,17 @@ nvme_fc_reset_ctrl_work(struct work_struct *work)
 		return;
 	}
 
-	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE) {
+	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE)
 		ret = nvme_fc_create_association(ctrl);
-		if (ret)
-			nvme_fc_reconnect_or_delete(ctrl, ret);
-		else
-			dev_info(ctrl->ctrl.device,
-				"NVME-FC{%d}: controller reset complete\n",
-				ctrl->cnum);
-	}
+	else
+		ret = -ENOTCONN;
+
+	if (ret)
+		nvme_fc_reconnect_or_delete(ctrl, ret);
+	else
+		dev_info(ctrl->ctrl.device,
+			"NVME-FC{%d}: controller reset complete\n",
+			ctrl->cnum);
 }
 
 static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
--
2.13.1
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support
  From: Trapp, Darren @ 2017-10-19 15:46 UTC

It seems like we need a callback to the LLD to let it know that the
dev_loss_tmo work has been canceled when the device comes back. The LLD
has called unregister_remoteport in the transport and done a
wait_for_completion. If the device re-appears before dev_loss_tmo has
been reached, the transport cancels the del_work, but the LLD still has
the completion event pending.

Something like this:

 		spin_unlock_irqrestore(&nvme_fc_lock, flags);
 
-		cancel_delayed_work_sync(&rport->dev_loss_work);
+		if (cancel_delayed_work_sync(&rport->dev_loss_work))
+			if (lport->ops->remote_delete_cancel)
+				lport->ops->remote_delete_cancel(
+							&rport->remoteport);
 
 		spin_lock_irqsave(&rport->lock, flags);

The LLD can complete its task and do whatever else it needs to do,
knowing that the unregister task is dead. Seems like a clean handoff
between the two.

On 10/17/17, 4:34 PM, "Linux-nvme on behalf of James Smart"
<linux-nvme-bounces@lists.infradead.org on behalf of
jsmart2021@gmail.com> wrote:

    This patch adds the dev_loss_tmo functionality to the transport.

    [snip - full patch quoted]
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support 2017-10-19 15:46 ` Trapp, Darren @ 2017-10-19 18:04 ` James Smart 2017-10-19 19:49 ` Trapp, Darren 0 siblings, 1 reply; 20+ messages in thread From: James Smart @ 2017-10-19 18:04 UTC (permalink / raw) On 10/19/2017 8:46 AM, Trapp, Darren wrote: > It seems like we need a call back to the LLD to let it know that the dev_loss_tmo has been canceled upon the device coming back. I don't think so. We already have the desired functionality in the remoteport_delete callback. In SCSI, we had devloss tied 1:1 with the remote_port: devloss started when the remote_port was unregistered, and when dev_loss_tmo fired, thus releasing the remote port, we called the dev_loss_tmo_callbk. So the dev_loss_tmo_callbk is actually the remoteport delete callback. There was also the notion that the SCSI target, which is 1:1 with the remote port, was visible on the OS until that callback occurred. In nvme-fc, the lldd calls the unregister remoteport, and when the remoteport is finished/done, the remoteport_delete will be called back. Currently that callback is based on the remoteport data structure being freed, which means it waits for all controllers to be freed as well. This turns out to be rather ugly, as controller freeing is based on namespace devices being freed, and some apps can hold on to the devnode a long time. So I've been wrapping up a patch that instead has the remoteport_delete called when all associations relative to the lldd have been terminated. The association terminations are done immediately upon the unregister call. That should have killed all hw queues, etc., leaving nothing for the lldd to be tracking. Although the transport<->lldd associations are killed, the nvme controller and its namespaces are still live relative to the OS, just stalled waiting for the ctrl_loss_tmo or dev_loss_tmo to fire. Those timers can be/are completely asynchronous to the remoteport_delete. If the lldd registers a new and matching remoteport, the transport will reuse the existing remoteport structures and related controllers, etc. - just updating the LLDD-related info (which I've found a bug or two in). Is there a specific reason you would want to hold off the remoteport delete for a full dev_loss_tmo vs. as soon as the transport terminates the references to the lldd relative to the remoteport (e.g. association terminations)? With the association teardowns only, it can be much faster than dev_loss_tmo. > > The LLD has called the unregister_remoteport in the transport and done a wait_for_completion. If the device re-appears before dev_loss_tmo has been reached, the transport cancels the del_work. But, the LLD still has the completion event pending. Ahh. I understand. The wait_for_completions are in the lldd, and I know older code had the sequence you describe. True, if we reuse the remoteport struct, as the references never expired, the delete call would never have occurred. So this is a bug with what's there. I think the cleanest fix is to have the delete callback occur as soon as the delete_associations are done - as that will always occur, and do so rather quickly. If needed, if we find we are reusing a remoteport struct and we hadn't called the delete, we call it prior to returning from the second register remoteport call, which should work as well. Note: as the point at which we call remoteport_delete (ref cnt and free vs. association terminate) is independent of dev_loss_tmo, it shouldn't affect this dev_loss patch set.
-- james ^ permalink raw reply [flat|nested] 20+ messages in thread
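The completion hang Darren describes is easiest to see from the LLDD side. A minimal sketch of the pattern in question (the my_* names are illustrative, not from any real LLDD; it assumes the LLDD stashes a completion in its remoteport private area): the LLDD blocks until the transport invokes its remoteport_delete handler, so if the transport resumes the existing rport and never invokes it, the wait never finishes.

#include <linux/completion.h>
#include <linux/nvme-fc-driver.h>

struct my_rport_priv {
	struct completion unreg_done;
};

/* transport calls this back when it is done with the remoteport */
static void
my_remoteport_delete(struct nvme_fc_remote_port *remoteport)
{
	struct my_rport_priv *priv = remoteport->private;

	complete(&priv->unreg_done);
}

static void
my_teardown_rport(struct nvme_fc_remote_port *remoteport)
{
	struct my_rport_priv *priv = remoteport->private;

	init_completion(&priv->unreg_done);
	if (!nvme_fc_unregister_remoteport(remoteport))
		/* stalls forever if remoteport_delete never fires */
		wait_for_completion(&priv->unreg_done);
}

Calling remoteport_delete as soon as the associations are torn down, as proposed above, guarantees the completion always fires, and without waiting out dev_loss_tmo.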
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support 2017-10-19 18:04 ` James Smart @ 2017-10-19 19:49 ` Trapp, Darren 0 siblings, 0 replies; 20+ messages in thread From: Trapp, Darren @ 2017-10-19 19:49 UTC (permalink / raw) Thanks for the response. I think if it tears down the rport immediately and always calls back to the LLD, it solves the problem. On 10/19/17, 11:04 AM, "James Smart" <jsmart2021@gmail.com> wrote: Ahh. I understand. The wait_for_completions are in the lldd, and I know older code had the sequence you describe. True, if we reuse the remoteport struct, as the references never expired, the delete call would never have occurred. So this is a bug with what's there. I think the cleanest fix is to have the delete callback occur as soon as the delete_associations are done - as that will always occur, and do so rather quickly. If needed, if we find we are reusing a remoteport struct and we hadn't called the delete, we call it prior to returning from the second register remoteport call, which should work as well. Note: as the point at which we call remoteport_delete (ref cnt and free vs. association terminate) is independent of dev_loss_tmo, it shouldn't affect this dev_loss patch set. -- james ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support 2017-10-17 23:32 ` [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support James Smart 2017-10-19 15:46 ` Trapp, Darren @ 2017-10-20 6:27 ` Hannes Reinecke 2017-10-20 18:53 ` James Smart 1 sibling, 1 reply; 20+ messages in thread From: Hannes Reinecke @ 2017-10-20 6:27 UTC (permalink / raw) On 10/18/2017 01:32 AM, James Smart wrote: > This patch adds the dev_loss_tmo functionality to the transport. > > When a remoteport is unregistered (connectivity lost), the following > actions are taken: > - the remoteport is marked DELETED > - a dev_loss_tmo timer is started for the remoteport > - all controllers on the remoteport are reset. > > After a controller resets, it will stall in a RECONNECTING state > waiting for one of the following: > - the controller will continue to attempt reconnect per max_retries > and reconnect_delay. As there is no remoteport connectivity, the reconnect > attempt will immediately fail. If max reconnect attempts are > reached (e.g. ctrl_loss_tmo reached), the controller is deleted. > - the remoteport is re-registered prior to dev_loss_tmo expiring. > The resume of the remoteport will immediately attempt to reconnect > each of its suspended controllers. > - the remoteport's dev_loss_tmo expires, causing all of its > controllers to be deleted. > > Signed-off-by: James Smart <james.smart at broadcom.com> > > v3: > Reworked so dev_loss_tmo is specific to the remoteport. Revised > so connectivity loss resets controllers, and connectivity gain > schedules reconnect. > --- > drivers/nvme/host/fc.c | 289 +++++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 257 insertions(+), 32 deletions(-) > > diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c > index 61da2f92f71a..3628790371d7 100644 > --- a/drivers/nvme/host/fc.c > +++ b/drivers/nvme/host/fc.c > @@ -138,6 +138,7 @@ struct nvme_fc_rport { > struct nvme_fc_lport *lport; > spinlock_t lock; > struct kref ref; > + struct delayed_work dev_loss_work; > } __aligned(sizeof(u64)); /* alignment for other things alloc'd with */ > > enum nvme_fcctrl_flags { > @@ -503,6 +504,8 @@ nvme_fc_free_rport(struct kref *ref) > WARN_ON(rport->remoteport.port_state != FC_OBJSTATE_DELETED); > WARN_ON(!list_empty(&rport->ctrl_list)); > > + cancel_delayed_work_sync(&rport->dev_loss_work); > + > /* remove from lport list */ > spin_lock_irqsave(&nvme_fc_lock, flags); > list_del(&rport->endp_list); > @@ -530,6 +533,124 @@ nvme_fc_rport_get(struct nvme_fc_rport *rport) > return kref_get_unless_zero(&rport->ref); > } > > +static void > +nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl) > +{ > + switch (ctrl->ctrl.state) { > + case NVME_CTRL_NEW: > + case NVME_CTRL_RECONNECTING: > + /* > + * As all reconnects were suppressed, schedule a > + * connect. > + */ > + dev_info(ctrl->ctrl.device, > + "NVME-FC{%d}: connectivity re-established. " > + "Attempting reconnect\n", ctrl->cnum); > + > + queue_delayed_work(nvme_wq, &ctrl->connect_work, 0); > + break; > + > + case NVME_CTRL_RESETTING: > + /* > + * Controller is already in the process of terminating the > + * association. No need to do anything further. The reconnect > + * step will naturally occur after the reset completes.
> + */ > + break; > + > + default: > + /* no action to take - let it delete */ > + break; > + } > +} > + > +static struct nvme_fc_rport * > +nvme_fc_attach_to_suspended_rport(struct nvme_fc_lport *lport, > + struct nvme_fc_port_info *pinfo) > +{ > + struct nvme_fc_rport *rport; > + struct nvme_fc_ctrl *ctrl; > + unsigned long flags; > + > + spin_lock_irqsave(&nvme_fc_lock, flags); > + > + list_for_each_entry(rport, &lport->endp_list, endp_list) { > + if (rport->remoteport.node_name != pinfo->node_name || > + rport->remoteport.port_name != pinfo->port_name) > + continue; > + > + if (!nvme_fc_rport_get(rport)) { > + rport = ERR_PTR(-ENOLCK); > + goto out_done; > + } > + > + spin_unlock_irqrestore(&nvme_fc_lock, flags); > + > + cancel_delayed_work_sync(&rport->dev_loss_work); > + > + spin_lock_irqsave(&rport->lock, flags); > + > + /* has it been unregistered */ > + if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) { > + /* means lldd called us twice */ > + spin_unlock_irqrestore(&rport->lock, flags); > + nvme_fc_rport_put(rport); > + return ERR_PTR(-ESTALE); > + } > + Why can't we re-use this rport? It's patently _not_ deleted, so we should be good to use it, right? Someone else might've messed up, but if _we_ get our reference counting right that shouldn't affect us, no? Plus: How would we recover from such a situation? The rport most likely will _never_ go away, meaning we'll always hit this scenario for each and every connection attempt. (Speaking from experience here; I've hit this scenario accidentally :-) So the only chance we have is a reboot. > + rport->remoteport.port_state = FC_OBJSTATE_ONLINE; > + > + /* > + * kick off a reconnect attempt on all associations to the > + * remote port. A successful reconnect will resume i/o. > + */ > + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) > + nvme_fc_resume_controller(ctrl); > + > + spin_unlock_irqrestore(&rport->lock, flags); > + > + return rport; > + } > + > + rport = NULL; > + > +out_done: > + spin_unlock_irqrestore(&nvme_fc_lock, flags); > + > + return rport; > +} > + > +static void > +nvme_fc_rport_dev_loss_work(struct work_struct *work) > +{ > + struct nvme_fc_rport *rport = > + container_of(to_delayed_work(work), > + struct nvme_fc_rport, dev_loss_work); > + struct nvme_fc_ctrl *ctrl; > + unsigned long flags; > + > + spin_lock_irqsave(&rport->lock, flags); > + > + /* If port state transitioned, dev loss shouldn't kick in */ > + if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) { > + spin_unlock_irqrestore(&rport->lock, flags); > + return; > + } > + > + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) { > + dev_warn(ctrl->ctrl.device, > + "NVME-FC{%d}: Remote Port failed to reconnect within " > + "dev_loss_tmo (%d seconds). Deleting controller\n", > + ctrl->cnum, rport->remoteport.dev_loss_tmo); > + if (__nvme_fc_del_ctrl(ctrl)) > + dev_warn(ctrl->ctrl.device, > + "NVME-FC{%d}: delete request failed\n", > + ctrl->cnum); > + } > + > + spin_unlock_irqrestore(&rport->lock, flags); > +} > + > /** > * nvme_fc_register_remoteport - transport entry point called by an > * LLDD to register the existence of a NVME > @@ -556,22 +677,46 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport, > unsigned long flags; > int ret, idx; > > + if (!nvme_fc_lport_get(lport)) { > + ret = -ESHUTDOWN; > + goto out_reghost_failed; > + } > + > + /* > + * look to see if there is already a remoteport that is waiting > + * for a reconnect (within dev_loss_tmo) with the same WWNs.
> + * If so, transition to it and reconnect. > + */ > + newrec = nvme_fc_attach_to_suspended_rport(lport, pinfo); > + > + /* found an rport, but something about its state is bad */ > + if (IS_ERR(newrec)) { > + ret = PTR_ERR(newrec); > + goto out_lport_put; > + > + /* found existing rport, which was resumed */ > + } else if (newrec) { > + /* Ignore pinfo->dev_loss_tmo. Leave rport and ctlr's as is */ > + > + nvme_fc_lport_put(lport); > + nvme_fc_signal_discovery_scan(lport, newrec); > + *portptr = &newrec->remoteport; > + return 0; > + } > + > + /* nothing found - allocate a new remoteport struct */ > + > newrec = kmalloc((sizeof(*newrec) + lport->ops->remote_priv_sz), > GFP_KERNEL); > if (!newrec) { > ret = -ENOMEM; > - goto out_reghost_failed; > - } > - > - if (!nvme_fc_lport_get(lport)) { > - ret = -ESHUTDOWN; > - goto out_kfree_rport; > + goto out_lport_put; > } > > idx = ida_simple_get(&lport->endp_cnt, 0, 0, GFP_KERNEL); > if (idx < 0) { > ret = -ENOSPC; > - goto out_lport_put; > + goto out_kfree_rport; > } > > INIT_LIST_HEAD(&newrec->endp_list); > @@ -594,6 +739,7 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport, > newrec->remoteport.dev_loss_tmo = pinfo->dev_loss_tmo; > else > newrec->remoteport.dev_loss_tmo = NVME_FC_DEFAULT_DEV_LOSS_TMO; > + INIT_DELAYED_WORK(&newrec->dev_loss_work, nvme_fc_rport_dev_loss_work); > > spin_lock_irqsave(&nvme_fc_lock, flags); > list_add_tail(&newrec->endp_list, &lport->endp_list); > @@ -604,10 +750,10 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport, > *portptr = &newrec->remoteport; > return 0; > > -out_lport_put: > - nvme_fc_lport_put(lport); > out_kfree_rport: > kfree(newrec); > +out_lport_put: > + nvme_fc_lport_put(lport); > out_reghost_failed: > *portptr = NULL; > return ret; > @@ -638,6 +784,61 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport) > return 0; > } > > +static void > +nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl) > +{ > + dev_info(ctrl->ctrl.device, > + "NVME-FC{%d}: controller connectivity lost. Awaiting " > + "Reconnect", ctrl->cnum); > + > + switch (ctrl->ctrl.state) { > + case NVME_CTRL_NEW: > + case NVME_CTRL_LIVE: > + /* > + * Schedule a controller reset. The reset will > + * terminate the association and schedule the > + * reconnect timer. Reconnects will be attempted > + * until either the ctlr_loss_tmo > + * (max_retries * connect_delay) expires or the > + * remoteport's dev_loss_tmo expires. > + */ > + if (nvme_reset_ctrl(&ctrl->ctrl)) { > + dev_warn(ctrl->ctrl.device, > + "NVME-FC{%d}: Couldn't schedule reset. " > + "Deleting controller.\n", > + ctrl->cnum); > + __nvme_fc_del_ctrl(ctrl); > + } > + break; > + > + case NVME_CTRL_RECONNECTING: > + /* > + * The association has already been terminated > + * and the controller is attempting reconnects. > + * No need to do anything further. Reconnects will > + * be attempted until either the ctlr_loss_tmo > + * (max_retries * connect_delay) expires or the > + * remoteport's dev_loss_tmo expires. > + */ > + break; > + > + case NVME_CTRL_RESETTING: > + /* > + * Controller is already in the process of > + * terminating the association. No need to do > + * anything further. The reconnect step will > + * kick in naturally after the association is > + * terminated.
> + */ > + break; > + > + case NVME_CTRL_DELETING: > + default: > + /* no action to take - let it delete */ > + break; > + } > +} > + > /** > * nvme_fc_unregister_remoteport - transport entry point called by an > * LLDD to deregister/remove a previously > @@ -667,15 +868,32 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr) > } > portptr->port_state = FC_OBJSTATE_DELETED; > > - /* tear down all associations to the remote port */ > - list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) > - __nvme_fc_del_ctrl(ctrl); > + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) { > + /* if dev_loss_tmo==0, dev loss is immediate */ > + if (!portptr->dev_loss_tmo) { > + dev_warn(ctrl->ctrl.device, > + "NVME-FC{%d}: controller connectivity lost. " > + "Deleting controller.\n", > + ctrl->cnum); > + __nvme_fc_del_ctrl(ctrl); > + } else > + nvme_fc_ctrl_connectivity_loss(ctrl); > + } > > spin_unlock_irqrestore(&rport->lock, flags); > > nvme_fc_abort_lsops(rport); > > + queue_delayed_work(nvme_wq, &rport->dev_loss_work, > + portptr->dev_loss_tmo * HZ); > + > + /* > + * release the reference; if all controllers go away (which > + * should only occur after dev_loss_tmo fires), this allows > + * the rport to be torn down. > + */ > nvme_fc_rport_put(rport); > + > return 0; > } > EXPORT_SYMBOL_GPL(nvme_fc_unregister_remoteport); Hmm. The dev_loss_tmo is pretty awkward; scheduling a per-rport work structure only to tear down all attached _controllers_ once an rport goes down. But seeing that __nvme_fc_del_ctrl() is actually just punting the deletion to a workqueue, I guess that's okay. Also we might end up having called __nvme_fc_del_ctrl() for each controller, making me wonder why we need to schedule a dev_loss_work if all controllers are being removed anyway. Also I'm not exactly thrilled with the reference counting; we put things on a workqueue but do not increase the reference for it. Which means that the rport structure might be freed even though it's still on the workqueue. Shouldn't we rather take a reference before putting it on the workqueue, and drop the reference once we're done with the workqueue function? Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare at suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 20+ messages in thread
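Hannes's refcounting point maps onto a standard kernel pattern: take a reference on the object before queueing work that dereferences it, and drop that reference when the work function finishes. A sketch of how that could look here (illustrative only; this helper is not in the posted patch, which queues the work without the extra reference):

/* pin the rport across the pending dev_loss work (sketch) */
static void
nvme_fc_schedule_dev_loss(struct nvme_fc_rport *rport, u32 dev_loss_tmo)
{
	if (!nvme_fc_rport_get(rport))
		return;		/* rport already being torn down */

	if (!queue_delayed_work(nvme_wq, &rport->dev_loss_work,
				dev_loss_tmo * HZ))
		nvme_fc_rport_put(rport);	/* work was already pending */
}

with a matching nvme_fc_rport_put(rport) as the last action of nvme_fc_rport_dev_loss_work(). As James's reply below shows, the eventual direction was to drop the timer altogether, which sidesteps the lifetime question entirely.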
* [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support 2017-10-20 6:27 ` Hannes Reinecke @ 2017-10-20 18:53 ` James Smart 0 siblings, 0 replies; 20+ messages in thread From: James Smart @ 2017-10-20 18:53 UTC (permalink / raw) On 10/19/2017 11:27 PM, Hannes Reinecke wrote: ... >> +static struct nvme_fc_rport * >> +nvme_fc_attach_to_suspended_rport(struct nvme_fc_lport *lport, >> + struct nvme_fc_port_info *pinfo) >> +{ ... >> + >> + /* has it been unregistered */ >> + if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) { >> + /* means lldd called us twice */ >> + spin_unlock_irqrestore(&rport->lock, flags); >> + nvme_fc_rport_put(rport); >> + return ERR_PTR(-ESTALE); >> + } >> + > Why can't we re-use this rport? > It's patently _not_ deleted, so we should be good to use it, right? > Someone else might've messed up, but if _we_ get our reference counting > right that shouldn't affect us, no? > Plus: How would we recover from such a situation? > The rport most likely will _never_ go away, meaning we'll always hit > this scenario for each and every connection attempt. > (Speaking from experience here; I've hit this scenario accidentally :-) > So the only chance we have is a reboot. Because if it's not in a deleted state, then it's one that is effectively live/present and there can be only 1 remoteport with the same WWPN/WWNN on the localport. This would be the case of an lldd calling register for the same remoteport twice, which we don't allow. I'm not looking to allow or survive such a driver error - I'll let the second registration fail as it is illegal. How did you hit this scenario? It would really surprise me. >> } >> >> +static void >> +nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl) >> +{ >> + dev_info(ctrl->ctrl.device, >> + "NVME-FC{%d}: controller connectivity lost. Awaiting " >> + "Reconnect", ctrl->cnum); >> + >> + switch (ctrl->ctrl.state) { >> + case NVME_CTRL_NEW: >> + case NVME_CTRL_LIVE: >> + /* >> + * Schedule a controller reset. The reset will >> + * terminate the association and schedule the >> + * reconnect timer. Reconnects will be attempted >> + * until either the ctlr_loss_tmo >> + * (max_retries * connect_delay) expires or the >> + * remoteport's dev_loss_tmo expires. >> + */ >> + if (nvme_reset_ctrl(&ctrl->ctrl)) { >> + dev_warn(ctrl->ctrl.device, >> + "NVME-FC{%d}: Couldn't schedule reset. " >> + "Deleting controller.\n", >> + ctrl->cnum); >> + __nvme_fc_del_ctrl(ctrl); >> + } >> + break; >> + >> + case NVME_CTRL_RECONNECTING: >> + /* >> + * The association has already been terminated >> + * and the controller is attempting reconnects. >> + * No need to do anything further. Reconnects will >> + * be attempted until either the ctlr_loss_tmo >> + * (max_retries * connect_delay) expires or the >> + * remoteport's dev_loss_tmo expires. >> + */ >> + break; >> + >> + case NVME_CTRL_RESETTING: >> + /* >> + * Controller is already in the process of >> + * terminating the association. No need to do >> + * anything further. The reconnect step will >> + * kick in naturally after the association is >> + * terminated.
>> + */ >> + break; >> + >> + case NVME_CTRL_DELETING: >> + default: >> + /* no action to take - let it delete */ >> + break; >> + } >> +} >> + >> /** >> * nvme_fc_unregister_remoteport - transport entry point called by an >> * LLDD to deregister/remove a previously >> @@ -667,15 +868,32 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr) >> } >> portptr->port_state = FC_OBJSTATE_DELETED; >> >> - /* tear down all associations to the remote port */ >> - list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) >> - __nvme_fc_del_ctrl(ctrl); >> + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) { >> + /* if dev_loss_tmo==0, dev loss is immediate */ >> + if (!portptr->dev_loss_tmo) { >> + dev_warn(ctrl->ctrl.device, >> + "NVME-FC{%d}: controller connectivity lost. " >> + "Deleting controller.\n", >> + ctrl->cnum); >> + __nvme_fc_del_ctrl(ctrl); >> + } else >> + nvme_fc_ctrl_connectivity_loss(ctrl); >> + } >> >> spin_unlock_irqrestore(&rport->lock, flags); >> >> nvme_fc_abort_lsops(rport); >> >> + queue_delayed_work(nvme_wq, &rport->dev_loss_work, >> + portptr->dev_loss_tmo * HZ); >> + >> + /* >> + * release the reference; if all controllers go away (which >> + * should only occur after dev_loss_tmo fires), this allows >> + * the rport to be torn down. >> + */ >> nvme_fc_rport_put(rport); >> + >> return 0; >> } >> EXPORT_SYMBOL_GPL(nvme_fc_unregister_remoteport); > > Hmm. > The dev_loss_tmo is pretty awkward; scheduling a per-rport work structure > only to tear down all attached _controllers_ once an rport goes down. > But seeing that __nvme_fc_del_ctrl() is actually just punting the > deletion to a workqueue, I guess that's okay. I guess so. Just reusing the existing delete code paths, which set the nvme controller state. > Also we might end up having called __nvme_fc_del_ctrl() for each > controller, making me wonder why we need to schedule a dev_loss_work if > all controllers are being removed anyway. I'm not sure I follow, but... We could defer to the ctlr_loss_tmo completely, but it may be longer than dev_loss_tmo, and I think that confuses people who may set the dev_loss_tmo value and expect things to terminate at dev_loss_tmo. Currently, the controller delete is min(dev_loss_tmo, ctlr_loss_tmo). > Also I'm not exactly thrilled with the reference counting; we put things > on a workqueue but do not increase the reference for it. > Which means that the rport structure might be freed even though it's > still on the workqueue. Shouldn't we rather take a reference before > putting it on the workqueue, and drop the reference once we're done with > the workqueue function? ok, I agree. I'll rework so that there is no dev_loss_tmo timer running. Instead, we'll record the rport connectivity loss time, and in the controller reconnect loops, if we try to reconnect but connectivity has been lost for dev_loss_tmo, we'll give up. -- james ^ permalink raw reply [flat|nested] 20+ messages in thread
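The rework James commits to at the end - no running dev_loss_tmo timer, just a recorded connectivity-loss deadline consulted by the reconnect path - could look roughly like this sketch (the dev_loss_end field and its placement are assumptions; no such patch is posted in this thread):

/* in struct nvme_fc_rport: when lost connectivity becomes fatal (sketch) */
	unsigned long dev_loss_end;

/* in nvme_fc_unregister_remoteport(), replacing the queue_delayed_work(): */
	rport->dev_loss_end = jiffies + (portptr->dev_loss_tmo * HZ);

/* in nvme_fc_reconnect_or_delete(), before scheduling another attempt: */
	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE &&
	    time_after(jiffies, ctrl->rport->dev_loss_end)) {
		dev_warn(ctrl->ctrl.device,
			"NVME-FC{%d}: dev_loss_tmo (%d) expired while waiting "
			"for remoteport connectivity. Removing controller\n",
			ctrl->cnum, ctrl->rport->remoteport.dev_loss_tmo);
		WARN_ON(__nvme_fc_schedule_delete_work(ctrl));
		return;
	}

This preserves the min(ctlr_loss_tmo, dev_loss_tmo) deletion behavior while leaving nothing on a workqueue whose reference lifetime has to be managed.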
end of thread, other threads:[~2017-10-20 18:53 UTC | newest] Thread overview: 20+ messages -- 2017-10-17 23:32 [PATCH v3 0/5] nvme_fc: add dev_loss_tmo support James Smart 2017-10-17 23:32 ` [PATCH v3 1/5] nvme core: allow controller RESETTING to RECONNECTING transition James Smart 2017-10-18 8:26 ` Johannes Thumshirn 2017-10-20 5:58 ` Hannes Reinecke 2017-10-17 23:32 ` [PATCH v3 2/5] nvme_fc: change ctlr state assignments during reset/reconnect James Smart 2017-10-18 8:27 ` Johannes Thumshirn 2017-10-20 6:00 ` Hannes Reinecke 2017-10-17 23:32 ` [PATCH v3 3/5] nvme_fc: add a dev_loss_tmo field to the remoteport James Smart 2017-10-18 8:28 ` Johannes Thumshirn 2017-10-20 6:04 ` Hannes Reinecke 2017-10-20 14:50 ` James Smart 2017-10-17 23:32 ` [PATCH v3 4/5] nvme_fc: check connectivity before initiating reconnects James Smart 2017-10-18 8:29 ` Johannes Thumshirn 2017-10-20 6:05 ` Hannes Reinecke 2017-10-17 23:32 ` [PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support James Smart 2017-10-19 15:46 ` Trapp, Darren 2017-10-19 18:04 ` James Smart 2017-10-19 19:49 ` Trapp, Darren 2017-10-20 6:27 ` Hannes Reinecke 2017-10-20 18:53 ` James Smart