* [PATCH v3 0/4] nvmet-fc: fixes for blktests tests
@ 2025-09-02 10:21 Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 1/4] nvmet-fc: move lsop put work to nvmet_fc_ls_req_op Daniel Wagner
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Daniel Wagner @ 2025-09-02 10:21 UTC (permalink / raw)
To: James Smart, Christoph Hellwig, Sagi Grimberg, Keith Busch
Cc: Yi Zhang, Shinichiro Kawasaki, Hannes Reinecke, linux-nvme,
linux-kernel, Daniel Wagner
I've added the other two patches I've recenetly send out to this series
for getting blktests with FC more stable. So this series containts all
the latest versions of the patches flying around.
What is missing are the changes for nvme/041. These patches need some
more work and more review.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
Changes in v3:
- collected tags, updated references
- added https://lore.kernel.org/all/20250901-nvme-fc-fix-leaks-v1-1-3ae0aa88d5e5@kernel.org
- added https://lore.kernel.org/all/ldr3xa4muogowu2xeh3uq3xcifljfftssydgnhjdarfre4kg4p@midqloza2ayz
- Link to v2: https://patch.msgid.link/20250829-fix-nvmet-fc-v2-0-26620e2742c7@kernel.org
Changes in v2:
- rebased
- added "nvmet-fc: avoid scheduling association deletion twice"
- Link to v1: https://patch.msgid.link/20250821-fix-nvmet-fc-v1-1-3349da4f416e@kernel.org
---
Daniel Wagner (4):
nvmet-fc: move lsop put work to nvmet_fc_ls_req_op
nvmet-fc: avoid scheduling association deletion twice
nvmet-fcloop: call done callback even when remote port is gone
nvme-fc: use lock accessing port_state and rport state
drivers/nvme/host/fc.c | 10 ++++++++--
drivers/nvme/target/fc.c | 35 ++++++++++++++++++-----------------
drivers/nvme/target/fcloop.c | 8 +++++---
3 files changed, 31 insertions(+), 22 deletions(-)
---
base-commit: c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9
change-id: 20250821-fix-nvmet-fc-1af98991428f
Best regards,
--
Daniel Wagner <wagi@kernel.org>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/4] nvmet-fc: move lsop put work to nvmet_fc_ls_req_op
2025-09-02 10:21 [PATCH v3 0/4] nvmet-fc: fixes for blktests tests Daniel Wagner
@ 2025-09-02 10:22 ` Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 2/4] nvmet-fc: avoid scheduling association deletion twice Daniel Wagner
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2025-09-02 10:22 UTC (permalink / raw)
To: James Smart, Christoph Hellwig, Sagi Grimberg, Keith Busch
Cc: Yi Zhang, Shinichiro Kawasaki, Hannes Reinecke, linux-nvme,
linux-kernel, Daniel Wagner
It’s possible for more than one async command to be in flight from
__nvmet_fc_send_ls_req. For each command, a tgtport reference is taken.
In the current code, only one put work item is queued at a time, which
results in a leaked reference.
To fix this, move the work item to the nvmet_fc_ls_req_op struct, which
already tracks all resources related to the command.
Fixes: 710c69dbaccd ("nvmet-fc: avoid deadlock on delete association path")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/target/fc.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index a9b18c051f5bd830a6ae45ff0f4892c3f28c8608..6725c34dd7c90ae38f8271368e609fd0ba267561 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -54,6 +54,8 @@ struct nvmet_fc_ls_req_op { /* for an LS RQST XMT */
int ls_error;
struct list_head lsreq_list; /* tgtport->ls_req_list */
bool req_queued;
+
+ struct work_struct put_work;
};
@@ -111,8 +113,6 @@ struct nvmet_fc_tgtport {
struct nvmet_fc_port_entry *pe;
struct kref ref;
u32 max_sg_cnt;
-
- struct work_struct put_work;
};
struct nvmet_fc_port_entry {
@@ -235,12 +235,13 @@ static int nvmet_fc_tgt_a_get(struct nvmet_fc_tgt_assoc *assoc);
static void nvmet_fc_tgt_q_put(struct nvmet_fc_tgt_queue *queue);
static int nvmet_fc_tgt_q_get(struct nvmet_fc_tgt_queue *queue);
static void nvmet_fc_tgtport_put(struct nvmet_fc_tgtport *tgtport);
-static void nvmet_fc_put_tgtport_work(struct work_struct *work)
+static void nvmet_fc_put_lsop_work(struct work_struct *work)
{
- struct nvmet_fc_tgtport *tgtport =
- container_of(work, struct nvmet_fc_tgtport, put_work);
+ struct nvmet_fc_ls_req_op *lsop =
+ container_of(work, struct nvmet_fc_ls_req_op, put_work);
- nvmet_fc_tgtport_put(tgtport);
+ nvmet_fc_tgtport_put(lsop->tgtport);
+ kfree(lsop);
}
static int nvmet_fc_tgtport_get(struct nvmet_fc_tgtport *tgtport);
static void nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
@@ -367,7 +368,7 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
DMA_BIDIRECTIONAL);
out_putwork:
- queue_work(nvmet_wq, &tgtport->put_work);
+ queue_work(nvmet_wq, &lsop->put_work);
}
static int
@@ -388,6 +389,7 @@ __nvmet_fc_send_ls_req(struct nvmet_fc_tgtport *tgtport,
lsreq->done = done;
lsop->req_queued = false;
INIT_LIST_HEAD(&lsop->lsreq_list);
+ INIT_WORK(&lsop->put_work, nvmet_fc_put_lsop_work);
lsreq->rqstdma = fc_dma_map_single(tgtport->dev, lsreq->rqstaddr,
lsreq->rqstlen + lsreq->rsplen,
@@ -447,8 +449,6 @@ nvmet_fc_disconnect_assoc_done(struct nvmefc_ls_req *lsreq, int status)
__nvmet_fc_finish_ls_req(lsop);
/* fc-nvme target doesn't care about success or failure of cmd */
-
- kfree(lsop);
}
/*
@@ -1410,7 +1410,6 @@ nvmet_fc_register_targetport(struct nvmet_fc_port_info *pinfo,
kref_init(&newrec->ref);
ida_init(&newrec->assoc_cnt);
newrec->max_sg_cnt = template->max_sgl_segments;
- INIT_WORK(&newrec->put_work, nvmet_fc_put_tgtport_work);
ret = nvmet_fc_alloc_ls_iodlist(newrec);
if (ret) {
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/4] nvmet-fc: avoid scheduling association deletion twice
2025-09-02 10:21 [PATCH v3 0/4] nvmet-fc: fixes for blktests tests Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 1/4] nvmet-fc: move lsop put work to nvmet_fc_ls_req_op Daniel Wagner
@ 2025-09-02 10:22 ` Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 3/4] nvmet-fcloop: call done callback even when remote port is gone Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 4/4] nvme-fc: use lock accessing port_state and rport state Daniel Wagner
3 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2025-09-02 10:22 UTC (permalink / raw)
To: James Smart, Christoph Hellwig, Sagi Grimberg, Keith Busch
Cc: Yi Zhang, Shinichiro Kawasaki, Hannes Reinecke, linux-nvme,
linux-kernel, Daniel Wagner
When forcefully shutting down a port via the configfs interface,
nvmet_port_subsys_drop_link() first calls nvmet_port_del_ctrls() and
then nvmet_disable_port(). Both functions will eventually schedule all
remaining associations for deletion.
The current implementation checks whether an association is about to be
removed, but only after the work item has already been scheduled. As a
result, it is possible for the first scheduled work item to free all
resources, and then for the same work item to be scheduled again for
deletion.
Because the association list is an RCU list, it is not possible to take
a lock and remove the list entry directly, so it cannot be looked up
again. Instead, a flag (terminating) must be used to determine whether
the association is already in the process of being deleted.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/all/rsdinhafrtlguauhesmrrzkybpnvwantwmyfq2ih5aregghax5@mhr7v3eryci3/
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/target/fc.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 6725c34dd7c90ae38f8271368e609fd0ba267561..7d84527d5a43efe1d43ccf5fb8010a4884f99e3e 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1075,6 +1075,14 @@ nvmet_fc_delete_assoc_work(struct work_struct *work)
static void
nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
{
+ int terminating;
+
+ terminating = atomic_xchg(&assoc->terminating, 1);
+
+ /* if already terminating, do nothing */
+ if (terminating)
+ return;
+
nvmet_fc_tgtport_get(assoc->tgtport);
if (!queue_work(nvmet_wq, &assoc->del_work))
nvmet_fc_tgtport_put(assoc->tgtport);
@@ -1202,13 +1210,7 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
{
struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
unsigned long flags;
- int i, terminating;
-
- terminating = atomic_xchg(&assoc->terminating, 1);
-
- /* if already terminating, do nothing */
- if (terminating)
- return;
+ int i;
spin_lock_irqsave(&tgtport->lock, flags);
list_del_rcu(&assoc->a_list);
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 3/4] nvmet-fcloop: call done callback even when remote port is gone
2025-09-02 10:21 [PATCH v3 0/4] nvmet-fc: fixes for blktests tests Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 1/4] nvmet-fc: move lsop put work to nvmet_fc_ls_req_op Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 2/4] nvmet-fc: avoid scheduling association deletion twice Daniel Wagner
@ 2025-09-02 10:22 ` Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 4/4] nvme-fc: use lock accessing port_state and rport state Daniel Wagner
3 siblings, 0 replies; 6+ messages in thread
From: Daniel Wagner @ 2025-09-02 10:22 UTC (permalink / raw)
To: James Smart, Christoph Hellwig, Sagi Grimberg, Keith Busch
Cc: Yi Zhang, Shinichiro Kawasaki, Hannes Reinecke, linux-nvme,
linux-kernel, Daniel Wagner
When the target port is gone, it's not possible to access any of the
request resources. The function should just silently drop the response.
The comment is misleading in this regard.
Though it's still necessary to call the driver via the ->done callback
so the driver is able to release all resources.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/all/CAHj4cs-OBA0WMt5f7R0dz+rR4HcEz19YLhnyGsj-MRV3jWDsPg@mail.gmail.com/
Fixes: 84eedced1c5b ("nvmet-fcloop: drop response if targetport is gone")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/target/fcloop.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index 257b497d515a892a39da82d2f96b3fa3c6e10cdd..5dffcc5becae86c79ef75a123647566b2dfc21f6 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -496,13 +496,15 @@ fcloop_t2h_xmt_ls_rsp(struct nvme_fc_local_port *localport,
if (!targetport) {
/*
* The target port is gone. The target doesn't expect any
- * response anymore and the ->done call is not valid
- * because the resources have been freed by
- * nvmet_fc_free_pending_reqs.
+ * response anymore and thus lsreq can't be accessed anymore.
*
* We end up here from delete association exchange:
* nvmet_fc_xmt_disconnect_assoc sends an async request.
+ *
+ * Return success because this is what LLDDs do; silently
+ * drop the response.
*/
+ lsrsp->done(lsrsp);
kmem_cache_free(lsreq_cache, tls_req);
return 0;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 4/4] nvme-fc: use lock accessing port_state and rport state
2025-09-02 10:21 [PATCH v3 0/4] nvmet-fc: fixes for blktests tests Daniel Wagner
` (2 preceding siblings ...)
2025-09-02 10:22 ` [PATCH v3 3/4] nvmet-fcloop: call done callback even when remote port is gone Daniel Wagner
@ 2025-09-02 10:22 ` Daniel Wagner
2025-09-02 10:27 ` Hannes Reinecke
3 siblings, 1 reply; 6+ messages in thread
From: Daniel Wagner @ 2025-09-02 10:22 UTC (permalink / raw)
To: James Smart, Christoph Hellwig, Sagi Grimberg, Keith Busch
Cc: Yi Zhang, Shinichiro Kawasaki, Hannes Reinecke, linux-nvme,
linux-kernel, Daniel Wagner
nvme_fc_unregister_remote removes the remote port on a lport object at
any point in time when there is no active association. This races with
with the reconnect logic, because nvme_fc_create_association is not
taking a lock to check the port_state and atomically increase the
active count on the rport.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/all/u4ttvhnn7lark5w3sgrbuy2rxupcvosp4qmvj46nwzgeo5ausc@uyrkdls2muwx
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/host/fc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3e12d4683ac72f9ef9c6dff64d22d5d97e6f3d60..03987f497a5b55533ee169c9a7cb9b479d0f2d92 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3032,11 +3032,17 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
++ctrl->ctrl.nr_reconnects;
- if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
+ spin_lock_irqsave(&ctrl->rport->lock, flags);
+ if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE) {
+ spin_unlock_irqrestore(&ctrl->rport->lock, flags);
return -ENODEV;
+ }
- if (nvme_fc_ctlr_active_on_rport(ctrl))
+ if (nvme_fc_ctlr_active_on_rport(ctrl)) {
+ spin_unlock_irqrestore(&ctrl->rport->lock, flags);
return -ENOTUNIQ;
+ }
+ spin_unlock_irqrestore(&ctrl->rport->lock, flags);
dev_info(ctrl->ctrl.device,
"NVME-FC{%d}: create association : host wwpn 0x%016llx "
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 4/4] nvme-fc: use lock accessing port_state and rport state
2025-09-02 10:22 ` [PATCH v3 4/4] nvme-fc: use lock accessing port_state and rport state Daniel Wagner
@ 2025-09-02 10:27 ` Hannes Reinecke
0 siblings, 0 replies; 6+ messages in thread
From: Hannes Reinecke @ 2025-09-02 10:27 UTC (permalink / raw)
To: Daniel Wagner, James Smart, Christoph Hellwig, Sagi Grimberg,
Keith Busch
Cc: Yi Zhang, Shinichiro Kawasaki, linux-nvme, linux-kernel
On 9/2/25 12:22, Daniel Wagner wrote:
> nvme_fc_unregister_remote removes the remote port on a lport object at
> any point in time when there is no active association. This races with
> with the reconnect logic, because nvme_fc_create_association is not
> taking a lock to check the port_state and atomically increase the
> active count on the rport.
>
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Closes: https://lore.kernel.org/all/u4ttvhnn7lark5w3sgrbuy2rxupcvosp4qmvj46nwzgeo5ausc@uyrkdls2muwx
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
> drivers/nvme/host/fc.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-09-02 10:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-02 10:21 [PATCH v3 0/4] nvmet-fc: fixes for blktests tests Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 1/4] nvmet-fc: move lsop put work to nvmet_fc_ls_req_op Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 2/4] nvmet-fc: avoid scheduling association deletion twice Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 3/4] nvmet-fcloop: call done callback even when remote port is gone Daniel Wagner
2025-09-02 10:22 ` [PATCH v3 4/4] nvme-fc: use lock accessing port_state and rport state Daniel Wagner
2025-09-02 10:27 ` Hannes Reinecke
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).