Linux-NVME Archive on lore.kernel.org
* [PATCH v2 0/4] nvmet-fcloop: unblock module removal
@ 2023-04-11 12:07 Daniel Wagner
  2023-04-11 12:07 ` [PATCH v2 1/4] nvmet-fcloop: Remove remote port from list when unlinking Daniel Wagner
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Daniel Wagner @ 2023-04-11 12:07 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Daniel Wagner

blktests is not able to unload the FC-related modules. It is possible to unload
the modules, but they will still not work correctly. The host and the controller
seem to be caught in a kind of livelock:

 loop: module loaded
 run blktests nvme/003 at 2023-04-11 13:55:57
 nvmet: adding nsid 1 to subsystem blktests-subsystem-1
 nvme nvme0: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
 (NULL device *): {0:0} Association created
 [71] nvmet: ctrl 1 start keep-alive timer for 120 secs
 nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
 nvme nvme0: NVME-FC{0}: controller connect complete
 nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
 nvme nvme1: NVME-FC{1}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
 (NULL device *): {0:1} Association created
 [453] nvmet: ctrl 2 start keep-alive timer for 5 secs
 nvmet: creating nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
 [71] nvmet: adding queue 1 to ctrl 2.
 [45] nvmet: adding queue 2 to ctrl 2.
 [453] nvmet: adding queue 3 to ctrl 2.
 [105] nvmet: adding queue 4 to ctrl 2.
 nvme nvme1: NVME-FC{1}: controller connect complete
 nvme nvme1: NVME-FC{1}: new ctrl: NQN "blktests-subsystem-1"
 [453] nvmet: ctrl 2 reschedule traffic based keep-alive timer
 [105] nvmet: ctrl 2 update keep-alive timer for 5 secs
 [105] nvmet: ctrl 2 update keep-alive timer for 5 secs
 nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
 [45] nvmet: ctrl 1 stop keep-alive
 (NULL device *): {0:0} Association deleted
 (NULL device *): {0:0} Association freed
 (NULL device *): Disconnect LS failed: No Association
 nvme nvme1: rescanning namespaces.
 nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
 nvme nvme1: NVME-FC{1}: transport association event: transport detected io error
 nvme nvme1: NVME-FC{1}: resetting controller
 [105] nvmet: ctrl 2 stop keep-alive
 nvme nvme0: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
 (NULL device *): {0:1} Association deleted
 (NULL device *): {0:0} Association created
 (NULL device *): {0:1} Association freed
 nvmet: connect request for invalid subsystem nqn.2014-08.org.nvmexpress.discovery!
 nvme nvme0: Connect Invalid Data Parameter, subsysnqn "nqn.2014-08.org.nvmexpress.discovery"
 (NULL device *): Disconnect LS failed: No Association
 nvme nvme1: NVME-FC{1}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
 nvme nvme0: NVME-FC{0}: reset: Reconnect attempt failed (16770)
 (NULL device *): {0:1} Association created
 nvme nvme0: NVME-FC{0}: reconnect failure
 nvmet: connect request for invalid subsystem blktests-subsystem-1!
 nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
 nvme nvme1: Connect Invalid Data Parameter, subsysnqn "blktests-subsystem-1"
 nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
 nvme nvme1: NVME-FC{1}: reset: Reconnect attempt failed (16770)
 nvme nvme1: NVME-FC{1}: reconnect failure
 nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
 (NULL device *): {0:0} Association deleted
 (NULL device *): {0:0} Association freed
 (NULL device *): Disconnect LS failed: No Association
 (NULL device *): {0:1} Association deleted
 (NULL device *): {0:1} Association freed
 (NULL device *): Disconnect LS failed: No Association
 nvmet_fc: nvmet_fc_exit_module: targetport list not empty

I think these patches are not very controversial and should probably go in
even though we don't have a fix for the above scenario yet.

v2:
  - added additional fixes

v1:
  - initial version
  - https://lore.kernel.org/linux-nvme/20230411092209.12719-1-dwagner@suse.de/

Daniel Wagner (4):
  nvmet-fcloop: Remove remote port from list when unlinking
  nvmet-fcloop: Do not wait on completion when unregister fails
  nvmet-fc: Do not wait in vain when unloading module
  nvmet-fc: Release reference on target port

 drivers/nvme/host/fc.c       | 20 +++++++++++++-------
 drivers/nvme/target/fc.c     |  1 +
 drivers/nvme/target/fcloop.c |  5 ++++-
 3 files changed, 18 insertions(+), 8 deletions(-)

-- 
2.40.0




* [PATCH v2 1/4] nvmet-fcloop: Remove remote port from list when unlinking
  2023-04-11 12:07 [PATCH v2 0/4] nvmet-fcloop: unblock module removal Daniel Wagner
@ 2023-04-11 12:07 ` Daniel Wagner
  2023-04-11 12:07 ` [PATCH v2 2/4] nvmet-fcloop: Do not wait on completion when unregister fails Daniel Wagner
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2023-04-11 12:07 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Daniel Wagner

The remote port is never removed from the fcloop_nports list. During module
unloading it's possible to end up in a busy loop in fcloop_exit(), because
the remote port is still found in the list and thus we never make progress.

The kernel log will be spammed with

  nvme_fcloop: fcloop_exit: Failed deleting remote port
  nvme_fcloop: fcloop_exit: Failed deleting target port

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/fcloop.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index 5c16372f3b53..1e53c8fe4b95 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -1352,6 +1352,8 @@ __unlink_remote_port(struct fcloop_nport *nport)
 		nport->tport->remoteport = NULL;
 	nport->rport = NULL;
 
+	list_del(&nport->nport_list);
+
 	return rport;
 }
 
-- 
2.40.0




* [PATCH v2 2/4] nvmet-fcloop: Do not wait on completion when unregister fails
  2023-04-11 12:07 [PATCH v2 0/4] nvmet-fcloop: unblock module removal Daniel Wagner
  2023-04-11 12:07 ` [PATCH v2 1/4] nvmet-fcloop: Remove remote port from list when unlinking Daniel Wagner
@ 2023-04-11 12:07 ` Daniel Wagner
  2023-04-11 12:07 ` [PATCH v2 3/4] nvmet-fc: Do not wait in vain when unloading module Daniel Wagner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2023-04-11 12:07 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Daniel Wagner

nvme_fc_unregister_localport() returns an error code when the localport
pointer is NULL or the port has already been unregistered. The localport
is either in the ONLINE state (all resources allocated) or has already
been put into the DELETED state.

In the error case we will never receive a wakeup call, and thus any
caller, e.g. the module unload path, will hang.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/fcloop.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index 1e53c8fe4b95..6c3905498a94 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -1162,7 +1162,8 @@ __wait_localport_unreg(struct fcloop_lport *lport)
 
 	ret = nvme_fc_unregister_localport(lport->localport);
 
-	wait_for_completion(&lport->unreg_done);
+	if (!ret)
+		wait_for_completion(&lport->unreg_done);
 
 	kfree(lport);
 
-- 
2.40.0
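The guarded-wait pattern of this patch can be illustrated outside the kernel with a tiny synchronous stand-in (`fake_unregister()` and `teardown_port()` are made-up names, not the driver's API, and in the real code the completion is signalled asynchronously from a callback):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct lport {
	bool registered;
	bool unreg_done;	/* stands in for the completion */
};

/* Stand-in for nvme_fc_unregister_localport(): fails when the port is
 * not registered, in which case nothing will ever signal unreg_done. */
static int fake_unregister(struct lport *p)
{
	if (!p->registered)
		return -EINVAL;
	p->registered = false;
	p->unreg_done = true;	/* kernel: signalled later, asynchronously */
	return 0;
}

/* The fix: only wait for the completion when unregister succeeded;
 * otherwise we would block forever on a wakeup that never comes. */
static void teardown_port(struct lport *p)
{
	int ret = fake_unregister(p);

	if (!ret)
		assert(p->unreg_done);	/* models wait_for_completion() */
}
```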




* [PATCH v2 3/4] nvmet-fc: Do not wait in vain when unloading module
  2023-04-11 12:07 [PATCH v2 0/4] nvmet-fcloop: unblock module removal Daniel Wagner
  2023-04-11 12:07 ` [PATCH v2 1/4] nvmet-fcloop: Remove remote port from list when unlinking Daniel Wagner
  2023-04-11 12:07 ` [PATCH v2 2/4] nvmet-fcloop: Do not wait on completion when unregister fails Daniel Wagner
@ 2023-04-11 12:07 ` Daniel Wagner
  2023-04-11 12:07 ` [PATCH v2 4/4] nvmet-fc: Release reference on target port Daniel Wagner
  2023-04-11 15:00 ` [PATCH v2 0/4] nvmet-fcloop: unblock module removal Daniel Wagner
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2023-04-11 12:07 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Daniel Wagner

When there is no controller to be deleted, the module unload path will
still wait on the nvme_fc_unload_proceed completion. Because this
completion will never be triggered, the caller hangs forever.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/host/fc.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 456ee42a6133..df85cf93742b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3933,10 +3933,11 @@ static int __init nvme_fc_init_module(void)
 	return ret;
 }
 
-static void
+static bool
 nvme_fc_delete_controllers(struct nvme_fc_rport *rport)
 {
 	struct nvme_fc_ctrl *ctrl;
+	bool cleanup = false;
 
 	spin_lock(&rport->lock);
 	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
@@ -3944,21 +3945,28 @@ nvme_fc_delete_controllers(struct nvme_fc_rport *rport)
 			"NVME-FC{%d}: transport unloading: deleting ctrl\n",
 			ctrl->cnum);
 		nvme_delete_ctrl(&ctrl->ctrl);
+		cleanup = true;
 	}
 	spin_unlock(&rport->lock);
+
+	return cleanup;
 }
 
-static void
+static bool
 nvme_fc_cleanup_for_unload(void)
 {
 	struct nvme_fc_lport *lport;
 	struct nvme_fc_rport *rport;
+	bool cleanup = false;
 
 	list_for_each_entry(lport, &nvme_fc_lport_list, port_list) {
 		list_for_each_entry(rport, &lport->endp_list, endp_list) {
-			nvme_fc_delete_controllers(rport);
+			if (nvme_fc_delete_controllers(rport))
+				cleanup = true;
 		}
 	}
+
+	return cleanup;
 }
 
 static void __exit nvme_fc_exit_module(void)
@@ -3968,10 +3976,8 @@ static void __exit nvme_fc_exit_module(void)
 
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	nvme_fc_waiting_to_unload = true;
-	if (!list_empty(&nvme_fc_lport_list)) {
-		need_cleanup = true;
-		nvme_fc_cleanup_for_unload();
-	}
+	if (!list_empty(&nvme_fc_lport_list))
+		need_cleanup = nvme_fc_cleanup_for_unload();
 	spin_unlock_irqrestore(&nvme_fc_lock, flags);
 	if (need_cleanup) {
 		pr_info("%s: waiting for ctlr deletes\n", __func__);
-- 
2.40.0




* [PATCH v2 4/4] nvmet-fc: Release reference on target port
  2023-04-11 12:07 [PATCH v2 0/4] nvmet-fcloop: unblock module removal Daniel Wagner
                   ` (2 preceding siblings ...)
  2023-04-11 12:07 ` [PATCH v2 3/4] nvmet-fc: Do not wait in vain when unloading module Daniel Wagner
@ 2023-04-11 12:07 ` Daniel Wagner
  2023-04-11 15:00 ` [PATCH v2 0/4] nvmet-fcloop: unblock module removal Daniel Wagner
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2023-04-11 12:07 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Daniel Wagner

If we return early from __nvmet_fc_finish_ls_req(), we still have to
release the reference on the target port.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/fc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 1ab6601fdd5c..df7d84aff843 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -359,6 +359,7 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
 
 	if (!lsop->req_queued) {
 		spin_unlock_irqrestore(&tgtport->lock, flags);
+		nvmet_fc_tgtport_put(tgtport);
 		return;
 	}
 
-- 
2.40.0




* Re: [PATCH v2 0/4] nvmet-fcloop: unblock module removal
  2023-04-11 12:07 [PATCH v2 0/4] nvmet-fcloop: unblock module removal Daniel Wagner
                   ` (3 preceding siblings ...)
  2023-04-11 12:07 ` [PATCH v2 4/4] nvmet-fc: Release reference on target port Daniel Wagner
@ 2023-04-11 15:00 ` Daniel Wagner
  4 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2023-04-11 15:00 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Sagi Grimberg, James Smart, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki

On Tue, Apr 11, 2023 at 02:07:14PM +0200, Daniel Wagner wrote:
>  loop: module loaded
>  run blktests nvme/003 at 2023-04-11 13:55:57
>  nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>  nvme nvme0: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
>  (NULL device *): {0:0} Association created
>  [71] nvmet: ctrl 1 start keep-alive timer for 120 secs
>  nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
>  nvme nvme0: NVME-FC{0}: controller connect complete
>  nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
>  nvme nvme1: NVME-FC{1}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
>  (NULL device *): {0:1} Association created
>  [453] nvmet: ctrl 2 start keep-alive timer for 5 secs
>  nvmet: creating nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
>  [71] nvmet: adding queue 1 to ctrl 2.
>  [45] nvmet: adding queue 2 to ctrl 2.
>  [453] nvmet: adding queue 3 to ctrl 2.
>  [105] nvmet: adding queue 4 to ctrl 2.
>  nvme nvme1: NVME-FC{1}: controller connect complete
>  nvme nvme1: NVME-FC{1}: new ctrl: NQN "blktests-subsystem-1"
>  [453] nvmet: ctrl 2 reschedule traffic based keep-alive timer
>  [105] nvmet: ctrl 2 update keep-alive timer for 5 secs
>  [105] nvmet: ctrl 2 update keep-alive timer for 5 secs
>  nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
>  [45] nvmet: ctrl 1 stop keep-alive
>  (NULL device *): {0:0} Association deleted
>  (NULL device *): {0:0} Association freed
>  (NULL device *): Disconnect LS failed: No Association

It turns out this sequence is triggered by the udev auto-connect rules. As I
read the _setup_nvmet() code, the connect should happen via blktests and not
udev. Obviously, this messes with the test case. I'll try to figure out how we
can disable the udev rules during the test run.
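One way this could be done is to mask the nvme-cli auto-connect rule for the duration of the run. A rough sketch follows; the rule file name `70-nvmf-autoconnect.rules` and the override directory are assumptions that vary by distribution, and the helper names are hypothetical, not anything blktests provides:

```shell
# Hypothetical helpers to mask/unmask the nvme-cli auto-connect udev rule
# around a test run. Masking shadows the packaged rule file with a symlink
# to /dev/null in the rules override directory ($RULES_DIR, defaulting to
# /etc/udev/rules.d).
mask_autoconnect() {
	local dir=${RULES_DIR:-/etc/udev/rules.d}
	ln -sf /dev/null "$dir/70-nvmf-autoconnect.rules"
	udevadm control --reload-rules 2>/dev/null || true
}

unmask_autoconnect() {
	local dir=${RULES_DIR:-/etc/udev/rules.d}
	rm -f "$dir/70-nvmf-autoconnect.rules"
	udevadm control --reload-rules 2>/dev/null || true
}
```

On a real system this needs root; a test harness would call mask_autoconnect before the run and unmask_autoconnect afterwards.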



