From: Daniel Wagner <dwagner@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Daniel Wagner <wagi@kernel.org>,
James Smart <james.smart@broadcom.com>,
Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
Hannes Reinecke <hare@suse.de>, Paul Ely <paul.ely@broadcom.com>,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/3] nvme: trigger reset when keep alive fails
Date: Tue, 7 Jan 2025 15:38:38 +0100
Message-ID: <693187ac-9fe2-4ba3-8fcf-e34204fe7247@flourine.local>
In-Reply-To: <a8c476ee-e639-4886-a2dd-6e7d08060fa2@grimberg.me>
On Tue, Dec 24, 2024 at 12:31:35PM +0200, Sagi Grimberg wrote:
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index bfd71511c85f8b1a9508c6ea062475ff51bf27fe..2a07c2c540b26c8cbe886711abaf6f0afbe6c4df 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1320,6 +1320,12 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
> >  		dev_err(ctrl->device,
> >  			"failed nvme_keep_alive_end_io error=%d\n",
> >  				status);
> > +		/*
> > +		 * The driver reports that we lost the connection,
> > +		 * trigger a recovery.
> > +		 */
> > +		if (status == BLK_STS_TRANSPORT)
> > +			nvme_reset_ctrl(ctrl);
> >  		return RQ_END_IO_NONE;
> >  	}
> >
>
> A lengthy explanation that results in nvme core behavior that assumes a very
> specific driver behavior.
I tried to explain exactly what's going on, so we can discuss possible
solutions without talking past each other.
In the meantime I started on a patch set for the TP4129-related changes
in the spec (KATO Corrections and Clarifications). These changes would
also depend on the KATO timeout handler triggering a reset.
I am fine with dropping this change for now and discussing it in light
of TP4129, if that is what you prefer?
> Isn't the root of the problem that FC is willing to live peacefully with a
> controller without any queues/connectivity to it without periodically
> reconnecting?
The root problem is that the connectivity loss event gets ignored in the
CONNECTING state during the first connection attempt. Everything works
fine in the RECONNECTING state.
Maybe something like this instead? (untested)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index c4cbe3ce81f7..1f1d1d62a978 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -148,6 +148,7 @@ struct nvme_fc_rport {
 #define ASSOC_ACTIVE		0
 #define ASSOC_FAILED		1
 #define FCCTRL_TERMIO		2
+#define CONNECTIVITY_LOST	3

 struct nvme_fc_ctrl {
 	spinlock_t		lock;
@@ -785,6 +786,8 @@ nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 		"NVME-FC{%d}: controller connectivity lost. Awaiting "
 		"Reconnect", ctrl->cnum);

+	set_bit(CONNECTIVITY_LOST, &ctrl->flags);
+
 	switch (nvme_ctrl_state(&ctrl->ctrl)) {
 	case NVME_CTRL_NEW:
 	case NVME_CTRL_LIVE:
@@ -3071,6 +3074,8 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	if (nvme_fc_ctlr_active_on_rport(ctrl))
 		return -ENOTUNIQ;

+	clear_bit(CONNECTIVITY_LOST, &ctrl->flags);
+
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: create association : host wwpn 0x%016llx "
 		" rport wwpn 0x%016llx: NQN \"%s\"\n",
@@ -3174,6 +3179,11 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)

 	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);

+	if (test_bit(CONNECTIVITY_LOST, &ctrl->flags)) {
+		ret = -EIO;
+		goto out_term_aeo_ops;
+	}
+
 	ctrl->ctrl.nr_reconnects = 0;

 	if (changed)
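
To make the intent of the flag easier to discuss, here is a minimal
user-space sketch of the pattern, not driver code. All names in it
(toy_ctrl, toy_connectivity_loss, toy_create_association and the
lose_link_midway parameter) are made up for illustration. It assumes a
single thread, so a plain bool stands in for the atomic
set_bit()/clear_bit()/test_bit() calls on ctrl->flags, and it models the
loss event arriving in the middle of the setup by calling the handler
directly.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for the driver state. */
enum toy_state { TOY_CONNECTING, TOY_LIVE };

struct toy_ctrl {
	enum toy_state state;
	bool connectivity_lost;	/* models the CONNECTIVITY_LOST flag bit */
	bool reset_scheduled;	/* models a reset having been triggered */
};

/*
 * Models the connectivity loss callback: the event is always recorded,
 * but a recovery is only triggered when the controller is live. While
 * connecting, the flag is the only trace the event leaves behind.
 */
static void toy_connectivity_loss(struct toy_ctrl *c)
{
	c->connectivity_lost = true;
	if (c->state == TOY_LIVE)
		c->reset_scheduled = true;
}

/*
 * Models one association attempt. lose_link_midway injects a loss
 * event while the setup is still in progress.
 */
static int toy_create_association(struct toy_ctrl *c, bool lose_link_midway)
{
	/* Forget events that belong to a previous attempt. */
	c->connectivity_lost = false;
	c->state = TOY_CONNECTING;

	if (lose_link_midway)
		toy_connectivity_loss(c);

	c->state = TOY_LIVE;

	/* An event raced with the setup: fail instead of staying live. */
	if (c->connectivity_lost) {
		c->state = TOY_CONNECTING;
		return -EIO;
	}
	return 0;
}
```

In this model an attempt without interference returns 0 and leaves the
controller live, while an attempt that sees the loss event during setup
returns -EIO without ever scheduling a reset, so it is up to the caller
to retry. That is the behavior the check after the transition to
NVME_CTRL_LIVE is meant to provide.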