From mboxrd@z Thu Jan  1 00:00:00 1970
From: hch@lst.de (Christoph Hellwig)
Date: Tue, 29 May 2018 14:47:29 +0200
Subject: [PATCH 02/10] nvme: ANA transition timeout handling
In-Reply-To: <20180529101431.62271-3-hare@suse.de>
References: <20180529101431.62271-1-hare@suse.de>
 <20180529101431.62271-3-hare@suse.de>
Message-ID: <20180529124729.GC7376@lst.de>

> +	if (ns->anagrpid != le32_to_cpu(id->anagrpid)) {
> +		dev_warn(ctrl->device, "nsid %d ANA group id changed\n",
> +			 ns->head->ns_id);
> +		queue_delayed_work(nvme_wq, &ctrl->ana_work, 0);
> +	}

No need to queue any work if an anagrpid changed.  We'll automatically
index into the right group once it has changed.

> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 1a8791340862..2fcaf50d84e2 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -69,6 +69,8 @@ void nvme_failover_req(struct request *req)
>  		 * entirely trivial..
>  		 */
>  		nvme_update_ana_state(ns, NVME_ANA_CHANGE);
> +		queue_delayed_work(nvme_wq, &ns->ctrl->ana_work,
> +				   ns->ctrl->anatt * HZ);

This doesn't make much sense.  Once we get the ana transitioning
status we should either retry the command up to ANATT or try another
path.  There is no point in scheduling a read of the log page after
ANATT, as we'll already get an AEN when that log page is ready.

> @@ -323,7 +325,7 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
>  			ctrl->ana_log_buf, ctrl->ana_log_size, 0);
>  	if (error) {
>  		dev_warn(ctrl->device, "Failed to get ANA log: %d\n", error);
> -		return error;
> +		return -EIO;
>  	}
>  
>  	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
> @@ -345,6 +347,8 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
>  		dev_info(ctrl->device, "ANA group %d: %s.\n",
>  				grpid, nvme_ana_state_names[desc->state]);
>  		WRITE_ONCE(ctrl->ana_state[grpid], desc->state);
> +		if (desc->state == NVME_ANA_CHANGE)
> +			error = -EAGAIN;

Huh?  Why would be stop processing our log when we see a change state?
This looks extremely dubious to me, and does not match the changelog
either.

> +	if (!ctrl->ana_log_buf)
> +		return;

How would the log buf disappear?  Even if it does please does this
in a separate, documented patch.

> +	if (ctrl->state != NVME_CTRL_LIVE)
> +		return;

This looks sensible, but it should probably also check for ADMIN_LIVE
for completeness, and be a seprate, properly documented patch.

> +		/*
> +		 * In case of an I/O error just add a small delay to not hit
> +		 * the target too hard
> +		 */
> +		if (ret == -EIO)
> +			log_delay = msecs_to_jiffies(NVME_ANA_LOG_DELAY);
> +		queue_delayed_work(nvme_wq, &ctrl->ana_work, log_delay);

What is the rationale for this I/O error handling?  In NVMe over
Fabrics transport errors tear down the association, so I really
don't see why we should handle errors here.