Linux-NVME Archive on lore.kernel.org
From: Sagi Grimberg <sagi@grimberg.me>
To: Daniel Wagner <dwagner@suse.de>
Cc: Daniel Wagner <wagi@kernel.org>,
	James Smart <james.smart@broadcom.com>,
	Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
	Hannes Reinecke <hare@suse.de>, Paul Ely <paul.ely@broadcom.com>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/3] nvme: trigger reset when keep alive fails
Date: Wed, 8 Jan 2025 12:50:50 +0200
Message-ID: <b69d5f8c-9bf3-4f31-985c-902bb3bbe93c@grimberg.me>
In-Reply-To: <693187ac-9fe2-4ba3-8fcf-e34204fe7247@flourine.local>




On 07/01/2025 16:38, Daniel Wagner wrote:
> On Tue, Dec 24, 2024 at 12:31:35PM +0200, Sagi Grimberg wrote:
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index bfd71511c85f8b1a9508c6ea062475ff51bf27fe..2a07c2c540b26c8cbe886711abaf6f0afbe6c4df 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -1320,6 +1320,12 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
>>>    		dev_err(ctrl->device,
>>>    			"failed nvme_keep_alive_end_io error=%d\n",
>>>    				status);
>>> +		/*
>>> +		 * The driver reports that we lost the connection,
>>> +		 * trigger a recovery.
>>> +		 */
>>> +		if (status == BLK_STS_TRANSPORT)
>>> +			nvme_reset_ctrl(ctrl);
>>>    		return RQ_END_IO_NONE;
>>>    	}
>>>
>> A lengthy explanation that results in nvme core behavior that assumes a very
>> specific driver behavior.
> I tried to explain exactly what's going on, so we can discuss possible
> solutions without communicating past each other.
>
> In the meantime I started on a patch set for the TP4129 related changes
> in the spec (KATO Corrections and Clarifications). These changes would
> also depend on the kato timeout handler triggering a reset.
>
> I am fine with dropping this change for now and discussing it in light
> of TP4129, if that is what you prefer.
>
>> Isn't the root of the problem that FC is willing to live peacefully
>> with a controller without any queues/connectivity to it without
>> periodically reconnecting?
> The root problem is that the connectivity loss event gets ignored in
> the CONNECTING state for the first connection attempt. Everything works
> fine for the RECONNECTING state.
>
> Maybe something like this instead? (untested)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index c4cbe3ce81f7..1f1d1d62a978 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -148,6 +148,7 @@ struct nvme_fc_rport {
>   #define ASSOC_ACTIVE		0
>   #define ASSOC_FAILED		1
>   #define FCCTRL_TERMIO		2
> +#define CONNECTIVITY_LOST	3
>
>   struct nvme_fc_ctrl {
>   	spinlock_t		lock;
> @@ -785,6 +786,8 @@ nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
>   		"NVME-FC{%d}: controller connectivity lost. Awaiting "
>   		"Reconnect", ctrl->cnum);
>
> +	set_bit(CONNECTIVITY_LOST, &ctrl->flags);
> +
>   	switch (nvme_ctrl_state(&ctrl->ctrl)) {
>   	case NVME_CTRL_NEW:
>   	case NVME_CTRL_LIVE:
> @@ -3071,6 +3074,8 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>   	if (nvme_fc_ctlr_active_on_rport(ctrl))
>   		return -ENOTUNIQ;
>
> +	clear_bit(CONNECTIVITY_LOST, &ctrl->flags);
> +
>   	dev_info(ctrl->ctrl.device,
>   		"NVME-FC{%d}: create association : host wwpn 0x%016llx "
>   		" rport wwpn 0x%016llx: NQN \"%s\"\n",
> @@ -3174,6 +3179,11 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>
>   	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
>
> +	if (test_bit(CONNECTIVITY_LOST, &ctrl->flags)) {
> +		ret = -EIO;
> +		goto out_term_aeo_ops;
> +	}
> +
>   	ctrl->ctrl.nr_reconnects = 0;
>
>   	if (changed)

This looks a lot better to me.
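
For readers following the thread, the idea in the proposed fc.c change can
be sketched in userspace roughly as below. This is only an illustration
under stated assumptions, not the kernel code: struct sketch_ctrl, the
sketch_* functions and the during_setup hook are hypothetical stand-ins,
a C11 atomic_bool replaces the CONNECTIVITY_LOST bit in ctrl->flags, and
the error path is reduced to clearing a "live" marker instead of the real
teardown done at out_term_aeo_ops.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct nvme_fc_ctrl. */
struct sketch_ctrl {
	atomic_bool connectivity_lost;	/* models the CONNECTIVITY_LOST flag bit */
	bool live;			/* models reaching NVME_CTRL_LIVE */
};

/* Models nvme_fc_ctrl_connectivity_loss(): always record the event. */
static void sketch_connectivity_loss(struct sketch_ctrl *ctrl)
{
	atomic_store(&ctrl->connectivity_lost, true);
}

/*
 * Models nvme_fc_create_association(). The during_setup hook is an
 * assumption of this sketch: it stands in for a loss event that arrives
 * while the association is still being created.
 */
static int sketch_create_association(struct sketch_ctrl *ctrl,
				     void (*during_setup)(struct sketch_ctrl *))
{
	/* Forget any loss reported before this attempt started. */
	atomic_store(&ctrl->connectivity_lost, false);

	if (during_setup)
		during_setup(ctrl);

	ctrl->live = true;

	/* A loss seen since the clear above means the attempt has failed. */
	if (atomic_load(&ctrl->connectivity_lost)) {
		ctrl->live = false;	/* simplified teardown */
		return -EIO;
	}

	return 0;
}
```

Calling sketch_create_association(&ctrl, NULL) models an undisturbed
attempt, while passing sketch_connectivity_loss as the hook models the
race. In the second case the function fails with -EIO, so the caller
would retry rather than leave the controller live without connectivity.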

