Date: Wed, 25 Mar 2026 19:37:11 -0700
From: Mohamed Khalfella
To: James Smart
Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	Hannes Reinecke, Aaron Dailey, Randy Jennings, Dhaval Giani,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 12/21] nvme-fc: Decouple error recovery from controller reset
Message-ID: <20260326023711.GA724739-mkhalfella@purestorage.com>
References: <20260214042753.4073668-1-mkhalfella@purestorage.com>
 <20260214042753.4073668-13-mkhalfella@purestorage.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 

On Fri 2026-02-27 16:12:05 -0800, James Smart wrote:
> On 2/13/2026 8:25 PM, Mohamed Khalfella wrote:
> > nvme_fc_error_recovery() called from nvme_fc_timeout() while controller
> > in CONNECTING state results in deadlock reported in link below. Update
> > nvme_fc_timeout() to schedule error recovery to avoid the deadlock.
> 
> This seems misleading on what is changing...
> 
> How about:
> Add new nvme_fc_start_ioerr_recovery() routine which effectively
> "resets" the controller.
> Refactor error points that invoked routines that reset the controller
> to now call nvme_fc_start_ioerr_recovery().
> Eliminated io abort on io error, as we will be resetting the controller.

nvme-fc: Refactor IO error recovery

Add new nvme_fc_start_ioerr_recovery() to trigger error recovery
instead of directly queueing ctrl->ioerr_work. nvme_fc_error_recovery(),
now called only from ctrl->ioerr_work, has been updated to no longer
depend on nvme_reset_ctrl() to handle error recovery.
nvme_fc_error_recovery() effectively resets the controller and attempts
reconnection if needed. This makes nvme-fc ioerr handling similar to
the other fabric transports.

Update nvme_fc_timeout() to not abort timed-out IOs. IOs aborted from
nvme_fc_timeout() are not accounted for in ctrl->iocnt, which causes
nvme_fc_delete_association() not to wait for them. Instead of aborting
IOs, nvme_fc_timeout() calls nvme_fc_start_ioerr_recovery() to start IO
error recovery. Since error recovery runs in ctrl->ioerr_work, this
change fixes the issue reported in the link below.

Above is the updated commit message. Let me know if there is any part
you want me to change before I submit v4.

> 
> > Previous to this change if controller was LIVE error recovery resets
> > the controller and this does not match nvme-tcp and nvme-rdma. Decouple
> > error recovery from controller reset to match other fabric transports.
> 
> Please delete. It's irrelevant to the patch.

Deleted.

> 
> > ...
> > @@ -1871,7 +1874,22 @@ nvme_fc_ctrl_ioerr_work(struct work_struct *work)
> >  	struct nvme_fc_ctrl *ctrl =
> >  		container_of(work, struct nvme_fc_ctrl, ioerr_work);
> >  
> > -	nvme_fc_error_recovery(ctrl, "transport detected io error");
> > +	/*
> > +	 * if an error (io timeout, etc) while (re)connecting, the remote
> > +	 * port requested terminating of the association (disconnect_ls)
> > +	 * or an error (timeout or abort) occurred on an io while creating
> > +	 * the controller. Abort any ios on the association and let the
> > +	 * create_association error path resolve things.
> > +	 */
> > +	if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_CONNECTING) {
> > +		__nvme_fc_abort_outstanding_ios(ctrl, true);
> > +		dev_warn(ctrl->ctrl.device,
> > +			"NVME-FC{%d}: transport error during (re)connect\n",
> > +			ctrl->cnum);
> > +		return;
> > +	}
> > +
> > +	nvme_fc_error_recovery(ctrl);
> >  }
> 
> ok - but see below...
> 
> > +static void nvme_fc_start_ioerr_recovery(struct nvme_fc_ctrl *ctrl,
> > +		char *errmsg)
> > +{
> > +	enum nvme_ctrl_state state;
> > +
> > +	if (nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING)) {
> > +		dev_warn(ctrl->ctrl.device, "NVME-FC{%d}: starting error recovery %s\n",
> > +			ctrl->cnum, errmsg);
> > +		queue_work(nvme_reset_wq, &ctrl->ioerr_work);
> > +		return;
> > +	}
> > +
> > +	state = nvme_ctrl_state(&ctrl->ctrl);
> > +	if (state == NVME_CTRL_CONNECTING || state == NVME_CTRL_DELETING ||
> > +	    state == NVME_CTRL_DELETING_NOIO) {
> > +		queue_work(nvme_reset_wq, &ctrl->ioerr_work);
> > +	}
> > +}
> 
> What bothers me about this (true of the tcp and rdma transports) is
> there is little difference between this and using the core
> nvme_reset_ctrl(), excepting that even when the state change fails, the
> code continues to schedule the work element that does the reset.

It does bother me too. The existence of controller reset and error
recovery as two separate and very similar codepaths has been pointed
out in emails in this very patchset. I think at some point the two
codepaths should be refactored. Until that happens, the change above
should be easy to understand.

> And the latter odd snippet to reset anyway is only to get the CONNECTING
> code snippet, which failed the RESETTING transition, to be performed.
> I'd prefer the connecting snippet be at the top of start_ioerr_recovery
> before any state change attempt so it's in the same place as prior.

Updated nvme_fc_start_ioerr_recovery() to handle the CONNECTING,
DELETING, and DELETING_NOIO cases first.

> 
> > ...
> >  static enum blk_eh_timer_return nvme_fc_timeout(struct request *rq)
> >  {
> >  	struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
> > @@ -2536,24 +2539,14 @@ static enum blk_eh_timer_return nvme_fc_timeout(struct request *rq)
> >  	struct nvme_fc_cmd_iu *cmdiu = &op->cmd_iu;
> >  	struct nvme_command *sqe = &cmdiu->sqe;
> >  
> > -	/*
> > -	 * Attempt to abort the offending command. Command completion
> > -	 * will detect the aborted io and will fail the connection.
> > -	 */
> >  	dev_info(ctrl->ctrl.device,
> >  		"NVME-FC{%d.%d}: io timeout: opcode %d fctype %d (%s) w10/11: "
> >  		"x%08x/x%08x\n",
> >  		ctrl->cnum, qnum, sqe->common.opcode, sqe->fabrics.fctype,
> >  		nvme_fabrics_opcode_str(qnum, sqe),
> >  		sqe->common.cdw10, sqe->common.cdw11);
> > -	if (__nvme_fc_abort_op(ctrl, op))
> > -		nvme_fc_error_recovery(ctrl, "io timeout abort failed");
> >  
> > -	/*
> > -	 * the io abort has been initiated. Have the reset timer
> > -	 * restarted and the abort completion will complete the io
> > -	 * shortly. Avoids a synchronous wait while the abort finishes.
> > -	 */
> > +	nvme_fc_start_ioerr_recovery(ctrl, "io timeout");
> >  	return BLK_EH_RESET_TIMER;
> >  }
> 
> I eventually gave in on not doing the abort of the io as the
> start_ioerr_recovery() will be resetting the controller.
> > > > > > @@ -3352,6 +3345,27 @@ nvme_fc_reset_ctrl_work(struct work_struct *work) > > } > > } > > > > +static void > > +nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl) > > +{ > > + nvme_stop_keep_alive(&ctrl->ctrl); > > + nvme_stop_ctrl(&ctrl->ctrl); > > + flush_work(&ctrl->ctrl.async_event_work); > > + > > + /* will block while waiting for io to terminate */ > > + nvme_fc_delete_association(ctrl); > > + > > + /* Do not reconnect if controller is being deleted */ > > + if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) > > + return; > > + > > + if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE) { > > + queue_delayed_work(nvme_wq, &ctrl->connect_work, 0); > > + return; > > + } > > + > > + nvme_fc_reconnect_or_delete(ctrl, -ENOTCONN); > > +} > > > > static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = { > > .name = "fc", > > There is no reason to duplicate the code that is already in ioerr_work. > I prototyped a simple service routine. The net/net is showed what little > reason there is to have an ioerr_work and a reset_work - as they are > effectively the same. So I then eliminated ioerr_work and use reset_work > and the service routine (kept the nvme_fc_error_recovery() name). > > > Here's a revised diff for this patch... I have compiled but not tested. 
> 
> 
> --- fc.c.START	2026-02-27 14:10:07.631705123 -0800
> +++ fc.c	2026-02-27 15:41:09.777836476 -0800
> @@ -166,7 +166,6 @@ struct nvme_fc_ctrl {
>  	struct blk_mq_tag_set	admin_tag_set;
>  	struct blk_mq_tag_set	tag_set;
>  
> -	struct work_struct	ioerr_work;
>  	struct delayed_work	connect_work;
>  
>  	struct kref		ref;
> @@ -227,6 +226,8 @@ static DEFINE_IDA(nvme_fc_ctrl_cnt);
>  static struct device *fc_udev_device;
>  
>  static void nvme_fc_complete_rq(struct request *rq);
> +static void nvme_fc_start_ioerr_recovery(struct nvme_fc_ctrl *ctrl,
> +		char *errmsg);
>  
>  /* *********************** FC-NVME Port Management ************************ */
>  
> @@ -788,7 +789,7 @@ nvme_fc_ctrl_connectivity_loss(struct nv
>  		"Reconnect", ctrl->cnum);
>  
>  	set_bit(ASSOC_FAILED, &ctrl->flags);
> -	nvme_reset_ctrl(&ctrl->ctrl);
> +	nvme_fc_start_ioerr_recovery(ctrl, "Connectivity Loss");
>  }
>  
>  /**
> @@ -985,8 +986,6 @@ fc_dma_unmap_sg(struct device *dev, stru
>  static void nvme_fc_ctrl_put(struct nvme_fc_ctrl *);
>  static int nvme_fc_ctrl_get(struct nvme_fc_ctrl *);
>  
> -static void nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg);
> -
>  static void
>  __nvme_fc_finish_ls_req(struct nvmefc_ls_req_op *lsop)
>  {
> @@ -1569,7 +1568,8 @@ nvme_fc_ls_disconnect_assoc(struct nvmef
>  	 */
>  
>  	/* fail the association */
> -	nvme_fc_error_recovery(ctrl, "Disconnect Association LS received");
> +	nvme_fc_start_ioerr_recovery(ctrl,
> +			"Disconnect Association LS received");
>  
>  	/* release the reference taken by nvme_fc_match_disconn_ls() */
>  	nvme_fc_ctrl_put(ctrl);
> @@ -1865,15 +1865,6 @@ __nvme_fc_fcpop_chk_teardowns(struct nvm
>  	}
>  }
>  
> -static void
> -nvme_fc_ctrl_ioerr_work(struct work_struct *work)
> -{
> -	struct nvme_fc_ctrl *ctrl =
> -		container_of(work, struct nvme_fc_ctrl, ioerr_work);
> -
> -	nvme_fc_error_recovery(ctrl, "transport detected io error");
> -}
> -
>  /*
>   * nvme_fc_io_getuuid - Routine called to get the appid field
>   * associated with request by the lldd
> @@ -2049,9 +2040,8 @@ done:
>  	nvme_fc_complete_rq(rq);
>  
>  check_error:
> -	if (terminate_assoc &&
> -	    nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_RESETTING)
> -		queue_work(nvme_reset_wq, &ctrl->ioerr_work);
> +	if (terminate_assoc)
> +		nvme_fc_start_ioerr_recovery(ctrl, "io error");
>  }
>  
>  static int
> @@ -2496,7 +2486,7 @@ __nvme_fc_abort_outstanding_ios(struct n
>  }
>  
>  static void
> -nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
> +nvme_fc_start_ioerr_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
>  {
>  	enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
>  
> @@ -2515,17 +2505,15 @@ nvme_fc_error_recovery(struct nvme_fc_ct
>  		return;
>  	}
>  
> -	/* Otherwise, only proceed if in LIVE state - e.g. on first error */
> -	if (state != NVME_CTRL_LIVE)
> +	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
>  		return;
>  
> -	dev_warn(ctrl->ctrl.device,
> -		"NVME-FC{%d}: transport association event: %s\n",
> -		ctrl->cnum, errmsg);
> -	dev_warn(ctrl->ctrl.device,
> -		"NVME-FC{%d}: resetting controller\n", ctrl->cnum);
> -
> -	nvme_reset_ctrl(&ctrl->ctrl);
> +	dev_warn(ctrl->ctrl.device, "NVME-FC{%d}: starting error recovery %s\n",
> +		ctrl->cnum, errmsg);
> +	queue_work(nvme_reset_wq, &ctrl->ctrl.reset_work);
>  }
>  
>  static enum blk_eh_timer_return nvme_fc_timeout(struct request *rq)
> @@ -2536,24 +2524,14 @@ static enum blk_eh_timer_return nvme_fc_
>  	struct nvme_fc_cmd_iu *cmdiu = &op->cmd_iu;
>  	struct nvme_command *sqe = &cmdiu->sqe;
>  
> -	/*
> -	 * Attempt to abort the offending command. Command completion
> -	 * will detect the aborted io and will fail the connection.
> -	 */
>  	dev_info(ctrl->ctrl.device,
>  		"NVME-FC{%d.%d}: io timeout: opcode %d fctype %d (%s) w10/11: "
>  		"x%08x/x%08x\n",
>  		ctrl->cnum, qnum, sqe->common.opcode, sqe->fabrics.fctype,
>  		nvme_fabrics_opcode_str(qnum, sqe),
>  		sqe->common.cdw10, sqe->common.cdw11);
> -	if (__nvme_fc_abort_op(ctrl, op))
> -		nvme_fc_error_recovery(ctrl, "io timeout abort failed");
>  
> -	/*
> -	 * the io abort has been initiated. Have the reset timer
> -	 * restarted and the abort completion will complete the io
> -	 * shortly. Avoids a synchronous wait while the abort finishes.
> -	 */
> +	nvme_fc_start_ioerr_recovery(ctrl, "io timeout");
>  	return BLK_EH_RESET_TIMER;
>  }
>  
> @@ -3264,7 +3242,7 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nc
>  	 * waiting for io to terminate
>  	 */
>  	nvme_fc_delete_association(ctrl);
> -	cancel_work_sync(&ctrl->ioerr_work);
> +	cancel_work_sync(&ctrl->ctrl.reset_work);
>  
>  	if (ctrl->ctrl.tagset)
>  		nvme_remove_io_tag_set(&ctrl->ctrl);
> @@ -3324,20 +3302,27 @@ nvme_fc_reconnect_or_delete(struct nvme_
>  }
>  
>  static void
> -nvme_fc_reset_ctrl_work(struct work_struct *work)
> +nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl)
>  {
> -	struct nvme_fc_ctrl *ctrl =
> -		container_of(work, struct nvme_fc_ctrl, ctrl.reset_work);
> -
> +	nvme_stop_keep_alive(&ctrl->ctrl);
> +	flush_work(&ctrl->ctrl.async_event_work);
>  	nvme_stop_ctrl(&ctrl->ctrl);
>  
>  	/* will block will waiting for io to terminate */
>  	nvme_fc_delete_association(ctrl);
>  
> -	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING))
> +	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
> +		enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
> +
> +		/* state change failure is ok if we started ctrl delete */
> +		if (state == NVME_CTRL_DELETING ||
> +		    state == NVME_CTRL_DELETING_NOIO)
> +			return;
> +
>  		dev_err(ctrl->ctrl.device,
> -			"NVME-FC{%d}: error_recovery: Couldn't change state "
> -			"to CONNECTING\n", ctrl->cnum);
> +			"NVME-FC{%d}: error_recovery: Couldn't change "
> +			"state to CONNECTING (%d)\n", ctrl->cnum, state);
> +	}
>  
>  	if (ctrl->rport->remoteport.port_state == FC_OBJSTATE_ONLINE) {
>  		if (!queue_delayed_work(nvme_wq, &ctrl->connect_work, 0)) {
> @@ -3352,6 +3337,15 @@ nvme_fc_reset_ctrl_work(struct work_stru
>  	}
>  }
>  
> +static void
> +nvme_fc_reset_ctrl_work(struct work_struct *work)
> +{
> +	struct nvme_fc_ctrl *ctrl =
> +		container_of(work, struct nvme_fc_ctrl, ctrl.reset_work);
> +
> +	nvme_fc_error_recovery(ctrl);
> +}
> +
>  
>  static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
>  	.name = "fc",
> @@ -3483,7 +3477,6 @@ nvme_fc_alloc_ctrl(struct device *dev, s
>  
>  	INIT_WORK(&ctrl->ctrl.reset_work, nvme_fc_reset_ctrl_work);
>  	INIT_DELAYED_WORK(&ctrl->connect_work, nvme_fc_connect_ctrl_work);
> -	INIT_WORK(&ctrl->ioerr_work, nvme_fc_ctrl_ioerr_work);
>  	spin_lock_init(&ctrl->lock);
>  
>  	/* io queue count */
> @@ -3581,7 +3574,6 @@ nvme_fc_init_ctrl(struct device *dev, st
>  
>  fail_ctrl:
>  	nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
> -	cancel_work_sync(&ctrl->ioerr_work);
>  	cancel_work_sync(&ctrl->ctrl.reset_work);
>  	cancel_delayed_work_sync(&ctrl->connect_work);

nvme_fc_timeout() -> nvme_fc_start_ioerr_recovery() ->
    __nvme_fc_abort_outstanding_ios() -> blk_sync_queue()

The codepath in the patch above will cause a deadlock.

nvme_fc_unregister_remoteport() -> nvme_fc_ctrl_connectivity_loss() ->
    nvme_fc_start_ioerr_recovery()

nvme_fc_fcpio_done() -> nvme_fc_start_ioerr_recovery()

The above codepaths use LLDD threads to do recovery. I thought we
should not be doing that.