* [PATCH v4 0/3] nvme-fc: fix race with connectivity loss and nvme_fc_create_association
@ 2025-01-09 13:30 Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 1/3] nvme-fc: go straight to connecting state when initializing Daniel Wagner
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Daniel Wagner @ 2025-01-09 13:30 UTC (permalink / raw)
To: James Smart, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Hannes Reinecke, Paul Ely
Cc: linux-nvme, linux-kernel, Daniel Wagner
As requested by Sagi, I've fixed the race window in nvme-fc by using the
already existing ASSOC_FAILED flag for tracking the connectivity loss.
Daniel
previous cover letter:
We got a bug report that a controller was stuck in the connected state
after an association dropped.
It turns out that nvme_fc_create_association can succeed even though
some operations fail. This is on purpose, to handle the degraded
controller case where the admin queue is up and running but the I/O
queues are not. In this case the controller will still reach the LIVE
state.

Unfortunately, this also ignores a full connectivity loss for fabric
controllers. Let's address this by not filtering out all errors in
nvme_set_queue_count.
---
Changes in v4:
- collected review tags
- dropped "nvme_ctrl_reset to keep alive end io handler" again
- added "nvme-fc: do not ignore connectivity loss during connecting"
- Link to v3: https://lore.kernel.org/r/20241129-nvme-fc-handle-com-lost-v3-0-d8967b3cae54@kernel.org
Changes in v3:
- collected reviewed tags
- added "nvme_ctrl_reset to keep alive end io handler"
- Link to v2: https://lore.kernel.org/r/20241029-nvme-fc-handle-com-lost-v2-0-5b0d137e2a0a@kernel.org
Changes in v2:
- handle connection lost in nvme_set_queue_count directly
- collected reviewed tags
- Link to v1: https://lore.kernel.org/r/20240611190647.11856-1-dwagner@suse.de
---
Daniel Wagner (3):
nvme-fc: go straight to connecting state when initializing
nvme: handle connectivity loss in nvme_set_queue_count
nvme-fc: do not ignore connectivity loss during connecting
drivers/nvme/host/core.c | 8 +++++++-
drivers/nvme/host/fc.c | 26 +++++++++++++++++++-------
2 files changed, 26 insertions(+), 8 deletions(-)
---
base-commit: b9973aa4d0507c4969ad87763b535edb77b7dceb
change-id: 20241029-nvme-fc-handle-com-lost-9b241936809a
Best regards,
--
Daniel Wagner <wagi@kernel.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4 1/3] nvme-fc: go straight to connecting state when initializing
2025-01-09 13:30 [PATCH v4 0/3] nvme-fc: fix race with connectivity loss and nvme_fc_create_association Daniel Wagner
@ 2025-01-09 13:30 ` Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 2/3] nvme: handle connectivity loss in nvme_set_queue_count Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting Daniel Wagner
2 siblings, 0 replies; 10+ messages in thread
From: Daniel Wagner @ 2025-01-09 13:30 UTC (permalink / raw)
To: James Smart, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Hannes Reinecke, Paul Ely
Cc: linux-nvme, linux-kernel, Daniel Wagner
The initial controller initialization mimics the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
The transition from NEW to CONNECTING is a valid transition, so there is
no point in entering the RESETTING state. TCP and RDMA also transition
directly to the CONNECTING state.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/host/fc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 094be164ffdc0fb79050cfb92c32dfaee8d15622..7409da42b9ee580cdd6fe78c0f93e78c4ad08675 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3578,8 +3578,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
- !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
dev_err(ctrl->ctrl.device,
"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
goto fail_ctrl;
--
2.47.1
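The state-machine argument above can be sketched in plain C. This is a reduced model for illustration only: the real transition table lives in nvme_change_ctrl_state() in drivers/nvme/host/core.c and covers more states than the ones named in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

enum ctrl_state { ST_NEW, ST_RESETTING, ST_CONNECTING, ST_LIVE };

/* Reduced model of nvme_change_ctrl_state(): is old -> new permitted? */
static bool transition_ok(enum ctrl_state old, enum ctrl_state new)
{
	switch (new) {
	case ST_CONNECTING:
		/* CONNECTING is reachable from NEW directly, so the
		 * NEW -> RESETTING hop before CONNECTING is unnecessary. */
		return old == ST_NEW || old == ST_RESETTING;
	case ST_RESETTING:
		return old == ST_NEW || old == ST_LIVE;
	case ST_LIVE:
		return old == ST_NEW || old == ST_RESETTING ||
		       old == ST_CONNECTING;
	default:
		return false;
	}
}
```

Since NEW -> CONNECTING is already valid, dropping the intermediate RESETTING transition changes no reachable behavior; it only removes a redundant hop.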
* [PATCH v4 2/3] nvme: handle connectivity loss in nvme_set_queue_count
2025-01-09 13:30 [PATCH v4 0/3] nvme-fc: fix race with connectivity loss and nvme_fc_create_association Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 1/3] nvme-fc: go straight to connecting state when initializing Daniel Wagner
@ 2025-01-09 13:30 ` Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting Daniel Wagner
2 siblings, 0 replies; 10+ messages in thread
From: Daniel Wagner @ 2025-01-09 13:30 UTC (permalink / raw)
To: James Smart, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Hannes Reinecke, Paul Ely
Cc: linux-nvme, linux-kernel, Daniel Wagner
When the Set Features attempt fails with any NVMe status code in
nvme_set_queue_count, the function still reports success, though the
number of queues is set to 0. This is done to support controllers in a
degraded state (the admin queue is still up and running but there are
no I/O queues).

There is one exception, though: when nvme_set_features reports a host
path error, nvme_set_queue_count should propagate this error, as
connectivity is lost, which means the admin queue is not working
anymore either.
Fixes: 9a0be7abb62f ("nvme: refactor set_queue_count")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/host/core.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b72b0e06801490f2c14e7fa09a45a870045a221d..a08ff8b362d0b7bd1a8a343e26d564b089c6bad3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1695,7 +1695,13 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
&result);
- if (status < 0)
+
+ /*
+ * It's either a kernel error or the host observed a connectivity
+ * loss. In either case it is not possible to communicate with the
+ * controller, so enter the error code path.
+ */
+ if (status < 0 || status == NVME_SC_HOST_PATH_ERROR)
return status;
/*
--
2.47.1
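The status filtering described in the commit message can be sketched in plain C. The function shape and the status constant's value are illustrative; the real logic lives in nvme_set_queue_count() in drivers/nvme/host/core.c, where the granted queue count comes from the completion's result dword.

```c
#include <assert.h>

#define NVME_SC_HOST_PATH_ERROR 0x370	/* illustrative value */

/*
 * Reduced model of nvme_set_queue_count() after the patch:
 *  - status < 0: kernel error, propagate
 *  - host path error: connectivity lost, propagate
 *  - any other NVMe status: degraded controller, report success
 *    with zero I/O queues
 *  - status == 0: success, grant the requested count
 */
static int set_queue_count(int status, int nr_requested, int *count)
{
	if (status < 0 || status == NVME_SC_HOST_PATH_ERROR)
		return status;

	if (status > 0) {
		*count = 0;	/* admin queue only */
		return 0;
	}

	*count = nr_requested;
	return 0;
}
```

The key change is the second half of the first condition: before the patch, any positive NVMe status (including a host path error) fell into the "degraded, zero queues, success" branch.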
* [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
2025-01-09 13:30 [PATCH v4 0/3] nvme-fc: fix race with connectivity loss and nvme_fc_create_association Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 1/3] nvme-fc: go straight to connecting state when initializing Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 2/3] nvme: handle connectivity loss in nvme_set_queue_count Daniel Wagner
@ 2025-01-09 13:30 ` Daniel Wagner
2025-01-10 22:50 ` Sagi Grimberg
` (2 more replies)
2 siblings, 3 replies; 10+ messages in thread
From: Daniel Wagner @ 2025-01-09 13:30 UTC (permalink / raw)
To: James Smart, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Hannes Reinecke, Paul Ely
Cc: linux-nvme, linux-kernel, Daniel Wagner
When a connectivity loss occurs while nvme_fc_create_association is
being executed, it's possible that the ctrl ends up stuck in the LIVE
state:
1) nvme nvme10: NVME-FC{10}: create association : ...
2) nvme nvme10: NVME-FC{10}: controller connectivity lost.
Awaiting Reconnect
nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
3) nvme nvme10: Could not set queue count (880)
nvme nvme10: Failed to configure AEN (cfg 900)
4) nvme nvme10: NVME-FC{10}: controller connect complete
5) nvme nvme10: failed nvme_keep_alive_end_io error=4
A connection attempt starts at 1) and the ctrl is in the CONNECTING
state. Shortly after, the LLDD detects a connectivity loss event and
calls nvme_fc_ctrl_connectivity_loss 2). Because we are still in the
CONNECTING state, this event is ignored.

nvme_fc_create_association continues to run in parallel and tries to
communicate with the controller, and these commands will fail. These
errors are filtered out, though: e.g. in 3) setting the I/O queue count
fails, which leads to an early exit in nvme_fc_create_io_queues. Because
the number of I/O queues is 0 at this point, there is nothing left in
nvme_fc_create_association that could detect the connection drop. Thus
the ctrl enters the LIVE state 4).

Eventually the keep-alive handler times out 5), but because nothing is
being done, the ctrl stays in the LIVE state.
There is already the ASSOC_FAILED flag to track connectivity loss
events, but this bit is set too late in the recovery code path. Move it
into the connectivity loss event handler and synchronize it with the
state change. This ensures that the ASSOC_FAILED flag is seen by
nvme_fc_create_io_queues and the ctrl does not enter the LIVE state
after a connectivity loss event. If the connectivity loss event happens
after we have entered the LIVE state, the normal error recovery path is
executed.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/host/fc.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 7409da42b9ee580cdd6fe78c0f93e78c4ad08675..55884d3df6f291cfddb4742e135b54a72f1cfa05 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -781,11 +781,19 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
static void
nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
{
+ enum nvme_ctrl_state state;
+ unsigned long flags;
+
dev_info(ctrl->ctrl.device,
"NVME-FC{%d}: controller connectivity lost. Awaiting "
"Reconnect", ctrl->cnum);
- switch (nvme_ctrl_state(&ctrl->ctrl)) {
+ spin_lock_irqsave(&ctrl->lock, flags);
+ set_bit(ASSOC_FAILED, &ctrl->flags);
+ state = nvme_ctrl_state(&ctrl->ctrl);
+ spin_unlock_irqrestore(&ctrl->lock, flags);
+
+ switch (state) {
case NVME_CTRL_NEW:
case NVME_CTRL_LIVE:
/*
@@ -2542,7 +2550,6 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
*/
if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
__nvme_fc_abort_outstanding_ios(ctrl, true);
- set_bit(ASSOC_FAILED, &ctrl->flags);
dev_warn(ctrl->ctrl.device,
"NVME-FC{%d}: transport error during (re)connect\n",
ctrl->cnum);
@@ -3167,12 +3174,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
else
ret = nvme_fc_recreate_io_queues(ctrl);
}
- if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
- ret = -EIO;
if (ret)
goto out_term_aen_ops;
- changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+ spin_lock_irqsave(&ctrl->lock, flags);
+ if (!test_bit(ASSOC_FAILED, &ctrl->flags))
+ changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+ else
+ ret = -EIO;
+ spin_unlock_irqrestore(&ctrl->lock, flags);
+
+ if (ret)
+ goto out_term_aen_ops;
ctrl->ctrl.nr_reconnects = 0;
--
2.47.1
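The flag/state synchronization the patch introduces can be modeled in userspace C: the loss side sets ASSOC_FAILED and samples the state under the lock, and the connect side tests the flag under the same lock before entering LIVE. A pthread mutex stands in for ctrl->lock and the names are simplified; this models only the locking logic, not the driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum ctrl_state { ST_CONNECTING, ST_LIVE, ST_RESETTING };

struct ctrl {
	pthread_mutex_t lock;		/* stands in for ctrl->lock */
	enum ctrl_state state;
	bool assoc_failed;		/* stands in for the ASSOC_FAILED bit */
};

/* loss side: mark the association failed and sample the state atomically */
static enum ctrl_state connectivity_loss(struct ctrl *c)
{
	enum ctrl_state state;

	pthread_mutex_lock(&c->lock);
	c->assoc_failed = true;
	state = c->state;
	pthread_mutex_unlock(&c->lock);
	return state;
}

/* connect side: enter LIVE only if no connectivity loss was recorded */
static int finish_association(struct ctrl *c)
{
	int ret = 0;

	pthread_mutex_lock(&c->lock);
	if (!c->assoc_failed)
		c->state = ST_LIVE;
	else
		ret = -1;	/* -EIO in the driver */
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

Because both sides take the same lock, a loss event can no longer slip in between the flag check and the transition to LIVE.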
* Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
2025-01-09 13:30 ` [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting Daniel Wagner
@ 2025-01-10 22:50 ` Sagi Grimberg
2025-01-20 13:45 ` Hannes Reinecke
2025-02-13 7:16 ` Shinichiro Kawasaki
2 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2025-01-10 22:50 UTC (permalink / raw)
To: Daniel Wagner, James Smart, Keith Busch, Christoph Hellwig,
Hannes Reinecke, Paul Ely
Cc: linux-nvme, linux-kernel
Looks better to me, but the FC folks should probably review this one.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
2025-01-09 13:30 ` [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting Daniel Wagner
2025-01-10 22:50 ` Sagi Grimberg
@ 2025-01-20 13:45 ` Hannes Reinecke
2025-02-13 7:16 ` Shinichiro Kawasaki
2 siblings, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2025-01-20 13:45 UTC (permalink / raw)
To: Daniel Wagner, James Smart, Keith Busch, Christoph Hellwig,
Sagi Grimberg, Paul Ely
Cc: linux-nvme, linux-kernel
On 1/9/25 14:30, Daniel Wagner wrote:
> When a connectivity loss occurs while nvme_fc_create_association is
> being executed, it's possible that the ctrl ends up stuck in the LIVE
> state:
>
> 1) nvme nvme10: NVME-FC{10}: create association : ...
> 2) nvme nvme10: NVME-FC{10}: controller connectivity lost.
> Awaiting Reconnect
> nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
> 3) nvme nvme10: Could not set queue count (880)
> nvme nvme10: Failed to configure AEN (cfg 900)
> 4) nvme nvme10: NVME-FC{10}: controller connect complete
> 5) nvme nvme10: failed nvme_keep_alive_end_io error=4
>
> A connection attempt starts at 1) and the ctrl is in the CONNECTING
> state. Shortly after, the LLDD detects a connectivity loss event and
> calls nvme_fc_ctrl_connectivity_loss 2). Because we are still in the
> CONNECTING state, this event is ignored.
>
> nvme_fc_create_association continues to run in parallel and tries to
> communicate with the controller, and these commands will fail. These
> errors are filtered out, though: e.g. in 3) setting the I/O queue count
> fails, which leads to an early exit in nvme_fc_create_io_queues. Because
> the number of I/O queues is 0 at this point, there is nothing left in
> nvme_fc_create_association that could detect the connection drop. Thus
> the ctrl enters the LIVE state 4).
>
> Eventually the keep-alive handler times out 5), but because nothing is
> being done, the ctrl stays in the LIVE state.
>
> There is already the ASSOC_FAILED flag to track connectivity loss
> events, but this bit is set too late in the recovery code path. Move it
> into the connectivity loss event handler and synchronize it with the
> state change. This ensures that the ASSOC_FAILED flag is seen by
> nvme_fc_create_io_queues and the ctrl does not enter the LIVE state
> after a connectivity loss event. If the connectivity loss event happens
> after we have entered the LIVE state, the normal error recovery path is
> executed.
>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
> drivers/nvme/host/fc.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 7409da42b9ee580cdd6fe78c0f93e78c4ad08675..55884d3df6f291cfddb4742e135b54a72f1cfa05 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -781,11 +781,19 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
> static void
> nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
> {
> + enum nvme_ctrl_state state;
> + unsigned long flags;
> +
> dev_info(ctrl->ctrl.device,
> "NVME-FC{%d}: controller connectivity lost. Awaiting "
> "Reconnect", ctrl->cnum);
>
> - switch (nvme_ctrl_state(&ctrl->ctrl)) {
> + spin_lock_irqsave(&ctrl->lock, flags);
> + set_bit(ASSOC_FAILED, &ctrl->flags);
> + state = nvme_ctrl_state(&ctrl->ctrl);
> + spin_unlock_irqrestore(&ctrl->lock, flags);
> +
> + switch (state) {
> case NVME_CTRL_NEW:
> case NVME_CTRL_LIVE:
> /*
> @@ -2542,7 +2550,6 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
> */
> if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
> __nvme_fc_abort_outstanding_ios(ctrl, true);
> - set_bit(ASSOC_FAILED, &ctrl->flags);
> dev_warn(ctrl->ctrl.device,
> "NVME-FC{%d}: transport error during (re)connect\n",
> ctrl->cnum);
> @@ -3167,12 +3174,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
> else
> ret = nvme_fc_recreate_io_queues(ctrl);
> }
> - if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
> - ret = -EIO;
> if (ret)
> goto out_term_aen_ops;
>
> - changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> + spin_lock_irqsave(&ctrl->lock, flags);
> + if (!test_bit(ASSOC_FAILED, &ctrl->flags))
> + changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> + else
> + ret = -EIO;
> + spin_unlock_irqrestore(&ctrl->lock, flags);
> +
> + if (ret)
> + goto out_term_aen_ops;
>
> ctrl->ctrl.nr_reconnects = 0;
>
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
2025-01-09 13:30 ` [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting Daniel Wagner
2025-01-10 22:50 ` Sagi Grimberg
2025-01-20 13:45 ` Hannes Reinecke
@ 2025-02-13 7:16 ` Shinichiro Kawasaki
2025-02-13 9:14 ` Daniel Wagner
2 siblings, 1 reply; 10+ messages in thread
From: Shinichiro Kawasaki @ 2025-02-13 7:16 UTC (permalink / raw)
To: Daniel Wagner
Cc: James Smart, Keith Busch, hch, Sagi Grimberg, Hannes Reinecke,
Paul Ely, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org
On Jan 09, 2025 / 14:30, Daniel Wagner wrote:
> When a connectivity loss occurs while nvme_fc_create_association is
> being executed, it's possible that the ctrl ends up stuck in the LIVE
> state:
>
[...]
>
> There is already the ASSOC_FAILED flag to track connectivity loss
> events, but this bit is set too late in the recovery code path. Move it
> into the connectivity loss event handler and synchronize it with the
> state change. This ensures that the ASSOC_FAILED flag is seen by
> nvme_fc_create_io_queues and the ctrl does not enter the LIVE state
> after a connectivity loss event. If the connectivity loss event happens
> after we have entered the LIVE state, the normal error recovery path is
> executed.
I found that many test cases in the blktests nvme group fail with the
v6.14-rc2 kernel and the fc transport. I bisected and found that this
patch, as commit ee59e3820ca9, is the trigger. When I revert the commit
from v6.14-rc2, the failure disappears.

Here I share the kernel message log observed with the test case
nvme/003. The kernel reported "BUG: sleeping function called from
invalid context".
Feb 12 12:03:08 testnode1 unknown: run blktests nvme/003 at 2025-02-12 12:03:08
Feb 12 12:03:08 testnode1 kernel: loop0: detected capacity change from 0 to 2097152
Feb 12 12:03:08 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Feb 12 12:03:08 testnode1 kernel: nvme nvme1: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Feb 12 12:03:08 testnode1 kernel: (NULL device *): {0:0} Association created
Feb 12 12:03:08 testnode1 kernel: nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Feb 12 12:03:08 testnode1 kernel: BUG: sleeping function called from invalid context at kernel/workqueue.c:4355
Feb 12 12:03:08 testnode1 kernel: in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 68705, name: kworker/u16:14
Feb 12 12:03:08 testnode1 kernel: preempt_count: 1, expected: 0
Feb 12 12:03:08 testnode1 kernel: RCU nest depth: 0, expected: 0
Feb 12 12:03:08 testnode1 kernel: 3 locks held by kworker/u16:14/68705:
Feb 12 12:03:08 testnode1 kernel: #0: ffff88813a90b148 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0xf20/0x1460
Feb 12 12:03:08 testnode1 kernel: #1: ffff88813b397d38 ((work_completion)(&(&ctrl->connect_work)->work)){+.+.}-{0:0}, at: process_one_work+0x7de/0x1460
Feb 12 12:03:08 testnode1 kernel: #2: ffff8881ab5bc018 (&ctrl->lock#2){....}-{3:3}, at: nvme_fc_connect_ctrl_work.cold+0x2200/0x2c9d [nvme_fc]
Feb 12 12:03:08 testnode1 kernel: irq event stamp: 140088
Feb 12 12:03:08 testnode1 kernel: hardirqs last enabled at (140087): [<ffffffff90e222ec>] _raw_spin_unlock_irqrestore+0x4c/0x60
Feb 12 12:03:08 testnode1 kernel: hardirqs last disabled at (140088): [<ffffffff90e21fef>] _raw_spin_lock_irqsave+0x5f/0x70
Feb 12 12:03:08 testnode1 kernel: softirqs last enabled at (139874): [<ffffffff8e5158b9>] __irq_exit_rcu+0x109/0x210
Feb 12 12:03:08 testnode1 kernel: softirqs last disabled at (139867): [<ffffffff8e5158b9>] __irq_exit_rcu+0x109/0x210
Feb 12 12:03:08 testnode1 kernel: Preemption disabled at:
Feb 12 12:03:08 testnode1 kernel: [<0000000000000000>] 0x0
Feb 12 12:03:08 testnode1 kernel: CPU: 0 UID: 0 PID: 68705 Comm: kworker/u16:14 Not tainted 6.14.0-rc2 #256
Feb 12 12:03:08 testnode1 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Feb 12 12:03:08 testnode1 kernel: Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
Feb 12 12:03:08 testnode1 kernel: Call Trace:
Feb 12 12:03:08 testnode1 kernel: <TASK>
Feb 12 12:03:08 testnode1 kernel: dump_stack_lvl+0x6a/0x90
Feb 12 12:03:08 testnode1 kernel: __might_resched.cold+0x1f7/0x23d
Feb 12 12:03:08 testnode1 kernel: ? __pfx___might_resched+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: cancel_delayed_work_sync+0x67/0xb0
Feb 12 12:03:08 testnode1 kernel: nvme_change_ctrl_state+0x104/0x2f0 [nvme_core]
Feb 12 12:03:08 testnode1 kernel: nvme_fc_connect_ctrl_work.cold+0x2252/0x2c9d [nvme_fc]
Feb 12 12:03:08 testnode1 kernel: ? __pfx_nvme_fc_connect_ctrl_work+0x10/0x10 [nvme_fc]
Feb 12 12:03:08 testnode1 kernel: process_one_work+0x85a/0x1460
Feb 12 12:03:08 testnode1 kernel: ? __pfx_lock_acquire+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: ? __pfx_process_one_work+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: ? assign_work+0x16c/0x240
Feb 12 12:03:08 testnode1 kernel: ? lock_is_held_type+0xd5/0x130
Feb 12 12:03:08 testnode1 kernel: worker_thread+0x5e2/0xfc0
Feb 12 12:03:08 testnode1 kernel: ? __kthread_parkme+0xb1/0x1d0
Feb 12 12:03:08 testnode1 kernel: ? __pfx_worker_thread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: ? __pfx_worker_thread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: kthread+0x39d/0x750
Feb 12 12:03:08 testnode1 kernel: ? __pfx_kthread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: ? __pfx_kthread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: ? __pfx_kthread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: ret_from_fork+0x30/0x70
Feb 12 12:03:08 testnode1 kernel: ? __pfx_kthread+0x10/0x10
Feb 12 12:03:08 testnode1 kernel: ret_from_fork_asm+0x1a/0x30
Feb 12 12:03:08 testnode1 kernel: </TASK>
Feb 12 12:03:08 testnode1 kernel: nvme nvme1: NVME-FC{0}: controller connect complete
Feb 12 12:03:08 testnode1 kernel: nvme nvme1: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
Feb 12 12:03:08 testnode1 kernel: nvme nvme2: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Feb 12 12:03:08 testnode1 kernel: (NULL device *): {0:1} Association created
Feb 12 12:03:08 testnode1 kernel: nvmet: Created discovery controller 2 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:f35c1212-8678-4a7f-99e5-05e61788783b.
Feb 12 12:03:08 testnode1 kernel: nvme nvme2: NVME-FC{1}: controller connect complete
Feb 12 12:03:08 testnode1 kernel: nvme nvme2: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:f35c1212-8678-4a7f-99e5-05e61788783b
Feb 12 12:03:08 testnode1 kernel: nvme nvme2: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Feb 12 12:03:08 testnode1 kernel: (NULL device *): {0:1} Association deleted
Feb 12 12:03:08 testnode1 kernel: (NULL device *): {0:1} Association freed
Feb 12 12:03:08 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Feb 12 12:03:18 testnode1 kernel: nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Feb 12 12:03:18 testnode1 kernel: (NULL device *): {0:0} Association deleted
Feb 12 12:03:18 testnode1 kernel: (NULL device *): {0:0} Association freed
Feb 12 12:03:18 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Feb 12 12:03:18 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
* Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
2025-02-13 7:16 ` Shinichiro Kawasaki
@ 2025-02-13 9:14 ` Daniel Wagner
2025-02-13 14:22 ` Daniel Wagner
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Wagner @ 2025-02-13 9:14 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: Daniel Wagner, James Smart, Keith Busch, hch, Sagi Grimberg,
Hannes Reinecke, Paul Ely, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org
On Thu, Feb 13, 2025 at 07:16:31AM +0000, Shinichiro Kawasaki wrote:
> On Jan 09, 2025 / 14:30, Daniel Wagner wrote:
> > When a connectivity loss occurs while nvme_fc_create_association is
> > being executed, it's possible that the ctrl ends up stuck in the LIVE
> > state:
> >
> [...]
> >
> > There is already the ASSOC_FAILED flag to track connectivity loss
> > events, but this bit is set too late in the recovery code path. Move it
> > into the connectivity loss event handler and synchronize it with the
> > state change. This ensures that the ASSOC_FAILED flag is seen by
> > nvme_fc_create_io_queues and the ctrl does not enter the LIVE state
> > after a connectivity loss event. If the connectivity loss event happens
> > after we have entered the LIVE state, the normal error recovery path is
> > executed.
>
> I found many test cases in blktests nvme group fail with v6.14-rc2 kernel with
> fc transport. I bisected and found this patch, as the commit ee59e3820ca9, is
> the trigger. When I revert the commit from v6.14-rc2, the failure disappears.
>
> Here I share the kernel message log observed at the test case nvme/003. The
> kernel reported "BUG: sleeping function called from invalid context".
Thanks for digging into it.
The problem is that the patch moves the nvme_change_ctrl_state call
under ctrl->lock, and nvme_change_ctrl_state also calls
cancel_delayed_work_sync, which can sleep and therefore must not run
under a spinlock:

spin_lock_irqsave
  nvme_change_ctrl_state
    nvme_stop_failfast_work
      cancel_delayed_work_sync

I've checked whether any other caller of nvme_change_ctrl_state takes a
lock; all of them look fine. If there were a way to move the failfast
bits out of the nvme_change_ctrl_state function, that would be a nice
solution.
That means the idea to synchronize the state change with the
ASSOC_FAILED bit under the lock is not going to work. I am trying to
figure out a solution.
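The invalid-context bug can be reproduced in miniature with a might_sleep()-style check. The counter and the bug flag here are invented for the illustration; the kernel's real check is driven by preempt_count and prints the "BUG: sleeping function called from invalid context" splat seen in the log above.

```c
#include <assert.h>
#include <stdbool.h>

static int preempt_depth;	/* > 0 models atomic context */
static bool bug_hit;		/* models the "BUG: sleeping function" splat */

static void spin_lock_irqsave_model(void)      { preempt_depth++; }
static void spin_unlock_irqrestore_model(void) { preempt_depth--; }

/* like might_sleep(): complain when called with a spinlock held */
static void might_sleep_model(void)
{
	if (preempt_depth > 0)
		bug_hit = true;
}

/* cancel_delayed_work_sync() may sleep while flushing the work item */
static void cancel_delayed_work_sync_model(void)
{
	might_sleep_model();
}

/*
 * nvme_change_ctrl_state() can call nvme_stop_failfast_work(), which
 * calls cancel_delayed_work_sync() -- so it must not run under a lock.
 */
static void nvme_change_ctrl_state_model(void)
{
	cancel_delayed_work_sync_model();
}
```

Calling the state change outside any lock is fine; calling it between lock and unlock, as the patch does, trips the check.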
* Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
2025-02-13 9:14 ` Daniel Wagner
@ 2025-02-13 14:22 ` Daniel Wagner
2025-02-14 3:54 ` Shinichiro Kawasaki
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Wagner @ 2025-02-13 14:22 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: Daniel Wagner, James Smart, Keith Busch, hch, Sagi Grimberg,
Hannes Reinecke, Paul Ely, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org
On Thu, Feb 13, 2025 at 10:14:46AM +0100, Daniel Wagner wrote:
> That means the idea to synchronize the state change with the
> ASSOC_FAILED bit under the lock is not going to work. I am trying to
> figure out a solution.
I found a possible solution. The only 'catch' is that it makes the state
machine a bit more restrictive: the only valid transition to LIVE would
be from CONNECTING. We can do this because even the PCI driver does all
the state transitions NEW -> CONNECTING -> LIVE. It's important that we
can't enter LIVE from RESETTING for the patch below to work.

We don't have to rely on another variable to figure out which state the
ctrl is in. The nvme_fc_ctrl_connectivity_loss callback always needs to
trigger a reset; if the ctrl is not in LIVE, it is a no-op. This makes
it possible to remove the lock around the ASSOC_FAILED and state read
operations.

In nvme_fc_create_association we only have to check whether we can
enter the LIVE state (meaning we were in CONNECTING); if this fails, we
have entered the RESETTING state and should return an error.
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 818d4e49aab5..f028913e2e62 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -564,8 +564,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
switch (new_state) {
case NVME_CTRL_LIVE:
switch (old_state) {
- case NVME_CTRL_NEW:
- case NVME_CTRL_RESETTING:
case NVME_CTRL_CONNECTING:
changed = true;
fallthrough;
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index f4f1866fbd5b..e740814fd1ea 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -781,61 +781,12 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
static void
nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
{
- enum nvme_ctrl_state state;
- unsigned long flags;
-
dev_info(ctrl->ctrl.device,
"NVME-FC{%d}: controller connectivity lost. Awaiting "
"Reconnect", ctrl->cnum);
- spin_lock_irqsave(&ctrl->lock, flags);
set_bit(ASSOC_FAILED, &ctrl->flags);
- state = nvme_ctrl_state(&ctrl->ctrl);
- spin_unlock_irqrestore(&ctrl->lock, flags);
-
- switch (state) {
- case NVME_CTRL_NEW:
- case NVME_CTRL_LIVE:
- /*
- * Schedule a controller reset. The reset will terminate the
- * association and schedule the reconnect timer. Reconnects
- * will be attempted until either the ctlr_loss_tmo
- * (max_retries * connect_delay) expires or the remoteport's
- * dev_loss_tmo expires.
- */
- if (nvme_reset_ctrl(&ctrl->ctrl)) {
- dev_warn(ctrl->ctrl.device,
- "NVME-FC{%d}: Couldn't schedule reset.\n",
- ctrl->cnum);
- nvme_delete_ctrl(&ctrl->ctrl);
- }
- break;
-
- case NVME_CTRL_CONNECTING:
- /*
- * The association has already been terminated and the
- * controller is attempting reconnects. No need to do anything
- * futher. Reconnects will be attempted until either the
- * ctlr_loss_tmo (max_retries * connect_delay) expires or the
- * remoteport's dev_loss_tmo expires.
- */
- break;
-
- case NVME_CTRL_RESETTING:
- /*
- * Controller is already in the process of terminating the
- * association. No need to do anything further. The reconnect
- * step will kick in naturally after the association is
- * terminated.
- */
- break;
-
- case NVME_CTRL_DELETING:
- case NVME_CTRL_DELETING_NOIO:
- default:
- /* no action to take - let it delete */
- break;
- }
+ nvme_reset_ctrl(&ctrl->ctrl);
}
/**
@@ -3177,23 +3128,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
else
ret = nvme_fc_recreate_io_queues(ctrl);
}
+ if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
+ ret = -EIO;
if (ret)
goto out_term_aen_ops;
- spin_lock_irqsave(&ctrl->lock, flags);
- if (!test_bit(ASSOC_FAILED, &ctrl->flags))
- changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
- else
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE)) {
ret = -EIO;
- spin_unlock_irqrestore(&ctrl->lock, flags);
-
- if (ret)
goto out_term_aen_ops;
+ }
ctrl->ctrl.nr_reconnects = 0;
-
- if (changed)
- nvme_start_ctrl(&ctrl->ctrl);
+ nvme_start_ctrl(&ctrl->ctrl);
return 0; /* Success */
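The reasoning above can be checked with a small userspace model. The state table and return codes are reduced for illustration; in the driver the reset is requested via nvme_reset_ctrl() and runs from a workqueue, and the error return is -EIO.

```c
#include <assert.h>
#include <stdbool.h>

enum ctrl_state { ST_NEW, ST_CONNECTING, ST_LIVE, ST_RESETTING };

struct ctrl {
	enum ctrl_state state;
	bool assoc_failed;
};

/* restricted state machine: LIVE is reachable only from CONNECTING */
static bool change_state(struct ctrl *c, enum ctrl_state new)
{
	bool ok = false;

	switch (new) {
	case ST_LIVE:
		ok = (c->state == ST_CONNECTING);
		break;
	case ST_RESETTING:
		ok = (c->state == ST_NEW || c->state == ST_LIVE);
		break;
	case ST_CONNECTING:
		ok = (c->state == ST_NEW || c->state == ST_RESETTING);
		break;
	default:
		break;
	}
	if (ok)
		c->state = new;
	return ok;
}

/* loss side: always set the flag and request a reset; the transition
 * is a no-op unless RESETTING is reachable from the current state */
static void connectivity_loss(struct ctrl *c)
{
	c->assoc_failed = true;
	change_state(c, ST_RESETTING);
}

/* connect side: the flag check plus the LIVE transition act as the gate */
static int finish_association(struct ctrl *c)
{
	if (c->assoc_failed)
		return -1;		/* -EIO in the driver */
	if (!change_state(c, ST_LIVE))
		return -1;
	return 0;
}
```

With LIVE reachable only from CONNECTING, a reset that sneaks in ahead of the final transition makes that transition fail, so no lock is needed around the flag and the state change.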
* Re: [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting
2025-02-13 14:22 ` Daniel Wagner
@ 2025-02-14 3:54 ` Shinichiro Kawasaki
0 siblings, 0 replies; 10+ messages in thread
From: Shinichiro Kawasaki @ 2025-02-14 3:54 UTC (permalink / raw)
To: Daniel Wagner
Cc: Daniel Wagner, James Smart, Keith Busch, hch, Sagi Grimberg,
Hannes Reinecke, Paul Ely, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org
On Feb 13, 2025 / 15:22, Daniel Wagner wrote:
> On Thu, Feb 13, 2025 at 10:14:46AM +0100, Daniel Wagner wrote:
> > That means the idea to synchronize the state change with the
> > ASSOC_FAILED bit under the lock is not going to work. I am trying to
> > figure out a solution.
>
> I found a possible solution. The only 'catch' is that it makes the
> state machine a bit more restrictive: the only valid transition to LIVE
> is from CONNECTING. We can do this because even the PCI driver does all
> the state transitions NEW -> CONNECTING -> LIVE. For the patch below to
> work, it's important that LIVE cannot be entered from RESETTING.
>
> We don't have to rely on another variable to figure out which state the
> ctrl is in. The nvme_fc_ctrl_connectivity_loss callback always needs to
> trigger a reset; if the ctrl is not in LIVE, the reset is a no-op. This
> makes it possible to remove the lock around the ASSOC_FAILED flag and
> the state read.
>
> In nvme_fc_create_association we only have to check whether we can
> enter the LIVE state (meaning we were in CONNECTING); if this fails, we
> have entered the RESETTING state and should return an error.
>
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 818d4e49aab5..f028913e2e62 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -564,8 +564,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
> switch (new_state) {
> case NVME_CTRL_LIVE:
> switch (old_state) {
> - case NVME_CTRL_NEW:
> - case NVME_CTRL_RESETTING:
> case NVME_CTRL_CONNECTING:
> changed = true;
> fallthrough;
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index f4f1866fbd5b..e740814fd1ea 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -781,61 +781,12 @@ nvme_fc_abort_lsops(struct nvme_fc_rport *rport)
> static void
> nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
> {
> - enum nvme_ctrl_state state;
> - unsigned long flags;
> -
> dev_info(ctrl->ctrl.device,
> "NVME-FC{%d}: controller connectivity lost. Awaiting "
> "Reconnect", ctrl->cnum);
>
> - spin_lock_irqsave(&ctrl->lock, flags);
> set_bit(ASSOC_FAILED, &ctrl->flags);
> - state = nvme_ctrl_state(&ctrl->ctrl);
> - spin_unlock_irqrestore(&ctrl->lock, flags);
> -
> - switch (state) {
> - case NVME_CTRL_NEW:
> - case NVME_CTRL_LIVE:
> - /*
> - * Schedule a controller reset. The reset will terminate the
> - * association and schedule the reconnect timer. Reconnects
> - * will be attempted until either the ctlr_loss_tmo
> - * (max_retries * connect_delay) expires or the remoteport's
> - * dev_loss_tmo expires.
> - */
> - if (nvme_reset_ctrl(&ctrl->ctrl)) {
> - dev_warn(ctrl->ctrl.device,
> - "NVME-FC{%d}: Couldn't schedule reset.\n",
> - ctrl->cnum);
> - nvme_delete_ctrl(&ctrl->ctrl);
> - }
> - break;
> -
> - case NVME_CTRL_CONNECTING:
> - /*
> - * The association has already been terminated and the
> - * controller is attempting reconnects. No need to do anything
> - * futher. Reconnects will be attempted until either the
> - * ctlr_loss_tmo (max_retries * connect_delay) expires or the
> - * remoteport's dev_loss_tmo expires.
> - */
> - break;
> -
> - case NVME_CTRL_RESETTING:
> - /*
> - * Controller is already in the process of terminating the
> - * association. No need to do anything further. The reconnect
> - * step will kick in naturally after the association is
> - * terminated.
> - */
> - break;
> -
> - case NVME_CTRL_DELETING:
> - case NVME_CTRL_DELETING_NOIO:
> - default:
> - /* no action to take - let it delete */
> - break;
> - }
> + nvme_reset_ctrl(&ctrl->ctrl);
> }
>
> /**
> @@ -3177,23 +3128,18 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
> else
> ret = nvme_fc_recreate_io_queues(ctrl);
> }
> + if (!ret && test_bit(ASSOC_FAILED, &ctrl->flags))
> + ret = -EIO;
> if (ret)
> goto out_term_aen_ops;
>
> - spin_lock_irqsave(&ctrl->lock, flags);
> - if (!test_bit(ASSOC_FAILED, &ctrl->flags))
> - changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> - else
> + if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE)) {
> ret = -EIO;
> - spin_unlock_irqrestore(&ctrl->lock, flags);
> -
> - if (ret)
> goto out_term_aen_ops;
> + }
>
> ctrl->ctrl.nr_reconnects = 0;
> -
> - if (changed)
> - nvme_start_ctrl(&ctrl->ctrl);
> + nvme_start_ctrl(&ctrl->ctrl);
>
> return 0; /* Success */
Thanks Daniel. I applied the patch on top of v6.14-rc2, and it avoided the
blktests failures. I ran the nvme test group with the other transports (loop,
rdma and tcp) and observed no regression. It looks good from a test-run point
of view.
Of note, I observed a compiler warning during the kernel build:
drivers/nvme/host/fc.c: In function ‘nvme_fc_create_association’:
drivers/nvme/host/fc.c:3025:14: warning: unused variable ‘changed’ [-Wunused-variable]
3025 | bool changed;
| ^~~~~~~
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-02-14 3:55 UTC | newest]
Thread overview: 10+ messages
2025-01-09 13:30 [PATCH v4 0/3] nvme-fc: fix race with connectivity loss and nvme_fc_create_association Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 1/3] nvme-fc: go straight to connecting state when initializing Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 2/3] nvme: handle connectivity loss in nvme_set_queue_count Daniel Wagner
2025-01-09 13:30 ` [PATCH v4 3/3] nvme-fc: do not ignore connectivity loss during connecting Daniel Wagner
2025-01-10 22:50 ` Sagi Grimberg
2025-01-20 13:45 ` Hannes Reinecke
2025-02-13 7:16 ` Shinichiro Kawasaki
2025-02-13 9:14 ` Daniel Wagner
2025-02-13 14:22 ` Daniel Wagner
2025-02-14 3:54 ` Shinichiro Kawasaki