public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues
@ 2025-04-24  5:13 Wilfred Mallawa
  2025-04-24  5:13 ` [PATCH 1/5] nvmet: add a helper function for cqid checking Wilfred Mallawa
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Wilfred Mallawa @ 2025-04-24  5:13 UTC (permalink / raw)
  To: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni
  Cc: dlemoal, alistair.francis, cassel, Wilfred Mallawa

From: Wilfred Mallawa <wilfred.mallawa@wdc.com>

Hi all,

For the NVMe PCI transport, the NVMe specification allows different
submission queues (SQs) to share completion queues (CQs); however, this
is not supported by the current NVMe target implementation. Until now,
the nvmet target has enforced a 1:1 relationship between SQs and CQs,
which is not specification compliant for the NVMe PCI transport.

This patch series adds support for CQ sharing between multiple SQs in
the NVMe target driver, in line with the NVMe PCI transport
specification. To manage the lifecycle of a shared CQ, the series adds
reference counting for completion queues: a CQ is retained until every
SQ that references it has been deleted, which avoids premature CQ
deletion.

Sample callchain with CQ refcounting for the PCI endpoint target
(pci-epf)
=================================================================

i.   nvmet_execute_create_cq -> nvmet_pci_epf_create_cq -> nvmet_cq_create
     -> nvmet_cq_init			[cq refcount = 1]

ii.  nvmet_execute_create_sq -> nvmet_pci_epf_create_sq -> nvmet_sq_create
     -> nvmet_sq_init -> nvmet_cq_get	[cq refcount = 2]

iii. nvmet_execute_delete_sq -> nvmet_pci_epf_delete_sq ->
     nvmet_sq_destroy -> nvmet_cq_put	[cq refcount = 1]

iv.  nvmet_execute_delete_cq -> nvmet_pci_epf_delete_cq -> nvmet_cq_put
					[cq refcount = 0]

For NVMe over fabrics, CQ sharing is not supported per the
specification; however, the fabrics drivers are updated to integrate the
new API changes. No functional change is intended there.

Testing
=======

Core functionality changes were tested with a Rockchip-based Rock5B PCIe
endpoint setup using the pci-epf driver. The host kernel was modified to
support queue sharing. In the test setup, this resulted in IO SQs 1 & 2
using IO CQ 1 and IO SQs 3 & 4 using IO CQ 2.

Testing methodology includes:

For PCI:

1. Boot up host
2. Assert that the endpoint device is detected as an NVMe drive
   (IO CQs/SQs are created)
3. Run FIOs
4. Unbind NVMe driver (IO SQs then CQs are deleted)
5. Rebind NVMe driver (IO SQs then CQs are created)
6. Run FIOs

For NVMe over fabrics, using the NVMe loop driver:

Note that queue sharing is not supported for fabrics.

1. Connect command (IO queues are created)
2. Run FIOs
3. Disconnect command (IO queues are deleted)

Thanks!

Wilfred Mallawa (5):
  nvmet: add a helper function for cqid checking
  nvmet: cq: prepare for completion queue sharing
  nvmet: fabrics: add CQ init and destroy
  nvmet: support completion queue sharing
  nvmet: Simplify nvmet_req_init() interface

 drivers/nvme/target/admin-cmd.c   | 31 ++++------
 drivers/nvme/target/core.c        | 94 ++++++++++++++++++++++++-------
 drivers/nvme/target/fabrics-cmd.c |  8 +++
 drivers/nvme/target/fc.c          | 10 ++--
 drivers/nvme/target/loop.c        | 23 +++++---
 drivers/nvme/target/nvmet.h       | 24 +++++---
 drivers/nvme/target/pci-epf.c     | 11 ++--
 drivers/nvme/target/rdma.c        |  8 ++-
 drivers/nvme/target/tcp.c         |  8 ++-
 9 files changed, 147 insertions(+), 70 deletions(-)

-- 
2.49.0



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] nvmet: add a helper function for cqid checking
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
@ 2025-04-24  5:13 ` Wilfred Mallawa
  2025-05-07  7:20   ` Damien Le Moal
  2025-04-24  5:13 ` [PATCH 2/5] nvmet: cq: prepare for completion queue sharing Wilfred Mallawa
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Wilfred Mallawa @ 2025-04-24  5:13 UTC (permalink / raw)
  To: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni
  Cc: dlemoal, alistair.francis, cassel, Wilfred Mallawa

From: Wilfred Mallawa <wilfred.mallawa@wdc.com>

This patch adds a new helper function, nvmet_check_io_cqid(), to be used
when parsing host commands for IO CQ creation/deletion and IO SQ
creation, to ensure that the specified IO completion queue identifier
(CQID) is not 0 (the Admin queue ID). This check already occurs in the
nvmet_execute_x() functions prior to calling nvmet_check_cqid().

With the addition of this helper, the open-coded CQID checks in the
nvmet_execute_x() functions can be removed; these callers now simply
call nvmet_check_io_cqid() in place of nvmet_check_cqid().

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 14 ++------------
 drivers/nvme/target/core.c      |  7 +++++++
 drivers/nvme/target/nvmet.h     |  1 +
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index acc138bbf8f2..753166fbb133 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -96,12 +96,7 @@ static void nvmet_execute_delete_cq(struct nvmet_req *req)
 		goto complete;
 	}
 
-	if (!cqid) {
-		status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
-		goto complete;
-	}
-
-	status = nvmet_check_cqid(ctrl, cqid);
+	status = nvmet_check_io_cqid(ctrl, cqid);
 	if (status != NVME_SC_SUCCESS)
 		goto complete;
 
@@ -127,12 +122,7 @@ static void nvmet_execute_create_cq(struct nvmet_req *req)
 		goto complete;
 	}
 
-	if (!cqid) {
-		status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
-		goto complete;
-	}
-
-	status = nvmet_check_cqid(ctrl, cqid);
+	status = nvmet_check_io_cqid(ctrl, cqid);
 	if (status != NVME_SC_SUCCESS)
 		goto complete;
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 71f8d06998d6..4c0643a72f66 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -856,6 +856,13 @@ u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid)
 	return NVME_SC_SUCCESS;
 }
 
+u16 nvmet_check_io_cqid(struct nvmet_ctrl *ctrl, u16 cqid)
+{
+	if (!cqid)
+		return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
+	return nvmet_check_cqid(ctrl, cqid);
+}
+
 u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq,
 		    u16 qid, u16 size)
 {
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index b6db8b74dc4a..2f70db1284c9 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -572,6 +572,7 @@ void nvmet_execute_get_features(struct nvmet_req *req);
 void nvmet_execute_keep_alive(struct nvmet_req *req);
 
 u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid);
+u16 nvmet_check_io_cqid(struct nvmet_ctrl *ctrl, u16 cqid);
 void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
 		u16 size);
 u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/5] nvmet: cq: prepare for completion queue sharing
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
  2025-04-24  5:13 ` [PATCH 1/5] nvmet: add a helper function for cqid checking Wilfred Mallawa
@ 2025-04-24  5:13 ` Wilfred Mallawa
  2025-05-07  7:25   ` Damien Le Moal
  2025-04-24  5:13 ` [PATCH 3/5] nvmet: fabrics: add CQ init and destroy Wilfred Mallawa
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Wilfred Mallawa @ 2025-04-24  5:13 UTC (permalink / raw)
  To: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni
  Cc: dlemoal, alistair.francis, cassel, Wilfred Mallawa

From: Wilfred Mallawa <wilfred.mallawa@wdc.com>

For the PCI transport, the NVMe specification allows submission queues
to share completion queues; however, this is not supported by the
current NVMe target implementation. This is a preparatory patch to allow
completion queue (CQ) sharing between different submission queues (SQs).

To support queue sharing, completion queues must be reference counted.
This patch adds a refcount_t field, ref, to struct nvmet_cq, together
with the corresponding nvmet_cq_init(), nvmet_cq_get(), nvmet_cq_put(),
nvmet_cq_in_use() and nvmet_cq_destroy() functions.

A CQ's reference count is initialized with nvmet_cq_init() when the CQ
is created. Using nvmet_cq_get(), a reference to a CQ is taken when an
SQ that uses that CQ is created. Similarly, when an SQ is destroyed, the
reference it holds on its CQ is dropped with nvmet_cq_put(). The last
reference to a CQ is dropped on CQ deletion, again via nvmet_cq_put(),
which invokes nvmet_cq_destroy() to fully clean up the CQ. The helper
function nvmet_cq_in_use() is used to determine whether any SQs are
still using a CQ that is pending deletion, in which case the CQ must not
be deleted. This protects against scenarios where a misbehaving host
attempts to delete a CQ without first deleting the SQ(s) using that CQ.

Additionally, this patch adds an array of struct nvmet_cq pointers to
the nvmet_ctrl structure. This allows the controller to keep track of
CQs as they are created and destroyed, similar to the current tracking
done for SQs. The memory for this array is freed when the controller is
freed. A struct nvmet_ctrl reference is also added to the nvmet_cq
structure to allow CQs to be removed from the controller whilst keeping
the new API similar to the existing API for SQs.

Sample callchain with CQ refcounting for the PCI endpoint target
(pci-epf):

i.   nvmet_execute_create_cq -> nvmet_pci_epf_create_cq -> nvmet_cq_create
     -> nvmet_cq_init [cq refcount=1]

ii.  nvmet_execute_create_sq -> nvmet_pci_epf_create_sq -> nvmet_sq_create
     -> nvmet_sq_init -> nvmet_cq_get [cq refcount=2]

iii. nvmet_execute_delete_sq -> nvmet_pci_epf_delete_sq ->
     nvmet_sq_destroy -> nvmet_cq_put [cq refcount=1]

iv.  nvmet_execute_delete_cq -> nvmet_pci_epf_delete_cq -> nvmet_cq_put
     [cq refcount=0]

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
---
 drivers/nvme/target/admin-cmd.c |  4 +-
 drivers/nvme/target/core.c      | 68 +++++++++++++++++++++++++--------
 drivers/nvme/target/nvmet.h     | 12 +++++-
 drivers/nvme/target/pci-epf.c   |  1 +
 4 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 753166fbb133..5e3699973d56 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -96,7 +96,7 @@ static void nvmet_execute_delete_cq(struct nvmet_req *req)
 		goto complete;
 	}
 
-	status = nvmet_check_io_cqid(ctrl, cqid);
+	status = nvmet_check_io_cqid(ctrl, cqid, false);
 	if (status != NVME_SC_SUCCESS)
 		goto complete;
 
@@ -122,7 +122,7 @@ static void nvmet_execute_create_cq(struct nvmet_req *req)
 		goto complete;
 	}
 
-	status = nvmet_check_io_cqid(ctrl, cqid);
+	status = nvmet_check_io_cqid(ctrl, cqid, true);
 	if (status != NVME_SC_SUCCESS)
 		goto complete;
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 4c0643a72f66..a622a6b886cb 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -810,11 +810,43 @@ void nvmet_req_complete(struct nvmet_req *req, u16 status)
 }
 EXPORT_SYMBOL_GPL(nvmet_req_complete);
 
+void nvmet_cq_init(struct nvmet_cq *cq)
+{
+	refcount_set(&cq->ref, 1);
+}
+EXPORT_SYMBOL_GPL(nvmet_cq_init);
+
+bool nvmet_cq_get(struct nvmet_cq *cq)
+{
+	return refcount_inc_not_zero(&cq->ref);
+}
+EXPORT_SYMBOL_GPL(nvmet_cq_get);
+
+void nvmet_cq_put(struct nvmet_cq *cq)
+{
+	if (refcount_dec_and_test(&cq->ref))
+		nvmet_cq_destroy(cq);
+}
+EXPORT_SYMBOL_GPL(nvmet_cq_put);
+
 void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq,
 		u16 qid, u16 size)
 {
 	cq->qid = qid;
 	cq->size = size;
+
+	ctrl->cqs[qid] = cq;
+}
+
+void nvmet_cq_destroy(struct nvmet_cq *cq)
+{
+	struct nvmet_ctrl *ctrl = cq->ctrl;
+
+	if (ctrl) {
+		ctrl->cqs[cq->qid] = NULL;
+		nvmet_ctrl_put(cq->ctrl);
+		cq->ctrl = NULL;
+	}
 }
 
 void nvmet_sq_setup(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq,
@@ -834,41 +866,39 @@ static void nvmet_confirm_sq(struct percpu_ref *ref)
 	complete(&sq->confirm_done);
 }
 
-u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid)
+u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid, bool create)
 {
-	if (!ctrl->sqs)
+	if (!ctrl->cqs)
 		return NVME_SC_INTERNAL | NVME_STATUS_DNR;
 
 	if (cqid > ctrl->subsys->max_qid)
 		return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
 
-	/*
-	 * Note: For PCI controllers, the NVMe specifications allows multiple
-	 * SQs to share a single CQ. However, we do not support this yet, so
-	 * check that there is no SQ defined for a CQ. If one exist, then the
-	 * CQ ID is invalid for creation as well as when the CQ is being
-	 * deleted (as that would mean that the SQ was not deleted before the
-	 * CQ).
-	 */
-	if (ctrl->sqs[cqid])
+	if ((create && ctrl->cqs[cqid]) || (!create && !ctrl->cqs[cqid]))
 		return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
 
 	return NVME_SC_SUCCESS;
 }
 
-u16 nvmet_check_io_cqid(struct nvmet_ctrl *ctrl, u16 cqid)
+u16 nvmet_check_io_cqid(struct nvmet_ctrl *ctrl, u16 cqid, bool create)
 {
 	if (!cqid)
 		return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
-	return nvmet_check_cqid(ctrl, cqid);
+	return nvmet_check_cqid(ctrl, cqid, create);
 }
 
+bool nvmet_cq_in_use(struct nvmet_cq *cq)
+{
+	return refcount_read(&cq->ref) > 1;
+}
+EXPORT_SYMBOL_GPL(nvmet_cq_in_use);
+
 u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq,
 		    u16 qid, u16 size)
 {
 	u16 status;
 
-	status = nvmet_check_cqid(ctrl, qid);
+	status = nvmet_check_cqid(ctrl, qid, true);
 	if (status != NVME_SC_SUCCESS)
 		return status;
 
@@ -1616,12 +1646,17 @@ struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args)
 	if (!ctrl->sqs)
 		goto out_free_changed_ns_list;
 
+	ctrl->cqs = kcalloc(subsys->max_qid + 1, sizeof(struct nvmet_cq *),
+			   GFP_KERNEL);
+	if (!ctrl->cqs)
+		goto out_free_sqs;
+
 	ret = ida_alloc_range(&cntlid_ida,
 			     subsys->cntlid_min, subsys->cntlid_max,
 			     GFP_KERNEL);
 	if (ret < 0) {
 		args->status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR;
-		goto out_free_sqs;
+		goto out_free_cqs;
 	}
 	ctrl->cntlid = ret;
 
@@ -1680,6 +1715,8 @@ struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args)
 	mutex_unlock(&subsys->lock);
 	nvmet_stop_keep_alive_timer(ctrl);
 	ida_free(&cntlid_ida, ctrl->cntlid);
+out_free_cqs:
+	kfree(ctrl->cqs);
 out_free_sqs:
 	kfree(ctrl->sqs);
 out_free_changed_ns_list:
@@ -1716,6 +1753,7 @@ static void nvmet_ctrl_free(struct kref *ref)
 
 	nvmet_async_events_free(ctrl);
 	kfree(ctrl->sqs);
+	kfree(ctrl->cqs);
 	kfree(ctrl->changed_ns_list);
 	kfree(ctrl);
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 2f70db1284c9..c87f6fb458e8 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -141,8 +141,10 @@ static inline struct device *nvmet_ns_dev(struct nvmet_ns *ns)
 }
 
 struct nvmet_cq {
+	struct nvmet_ctrl	*ctrl;
 	u16			qid;
 	u16			size;
+	refcount_t		ref;
 };
 
 struct nvmet_sq {
@@ -247,6 +249,7 @@ struct nvmet_pr_log_mgr {
 struct nvmet_ctrl {
 	struct nvmet_subsys	*subsys;
 	struct nvmet_sq		**sqs;
+	struct nvmet_cq		**cqs;
 
 	void			*drvdata;
 
@@ -571,12 +574,17 @@ void nvmet_execute_set_features(struct nvmet_req *req);
 void nvmet_execute_get_features(struct nvmet_req *req);
 void nvmet_execute_keep_alive(struct nvmet_req *req);
 
-u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid);
-u16 nvmet_check_io_cqid(struct nvmet_ctrl *ctrl, u16 cqid);
+u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid, bool create);
+u16 nvmet_check_io_cqid(struct nvmet_ctrl *ctrl, u16 cqid, bool create);
+void nvmet_cq_init(struct nvmet_cq *cq);
 void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
 		u16 size);
 u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
 		u16 size);
+void nvmet_cq_destroy(struct nvmet_cq *cq);
+bool nvmet_cq_get(struct nvmet_cq *cq);
+void nvmet_cq_put(struct nvmet_cq *cq);
+bool nvmet_cq_in_use(struct nvmet_cq *cq);
 u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid, bool create);
 void nvmet_sq_setup(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
 		u16 size);
diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c
index 7fab7f3d79b7..7dda4156d86c 100644
--- a/drivers/nvme/target/pci-epf.c
+++ b/drivers/nvme/target/pci-epf.c
@@ -1346,6 +1346,7 @@ static u16 nvmet_pci_epf_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid)
 	nvmet_pci_epf_drain_queue(cq);
 	nvmet_pci_epf_remove_irq_vector(ctrl, cq->vector);
 	nvmet_pci_epf_mem_unmap(ctrl->nvme_epf, &cq->pci_map);
+	tctrl->cqs[cqid] = NULL;
 
 	return NVME_SC_SUCCESS;
 }
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/5] nvmet: fabrics: add CQ init and destroy
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
  2025-04-24  5:13 ` [PATCH 1/5] nvmet: add a helper function for cqid checking Wilfred Mallawa
  2025-04-24  5:13 ` [PATCH 2/5] nvmet: cq: prepare for completion queue sharing Wilfred Mallawa
@ 2025-04-24  5:13 ` Wilfred Mallawa
  2025-05-07  7:27   ` Damien Le Moal
  2025-04-24  5:13 ` [PATCH 4/5] nvmet: support completion queue sharing Wilfred Mallawa
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Wilfred Mallawa @ 2025-04-24  5:13 UTC (permalink / raw)
  To: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni
  Cc: dlemoal, alistair.francis, cassel, Wilfred Mallawa

From: Wilfred Mallawa <wilfred.mallawa@wdc.com>

With struct nvmet_cq now having a reference count, this patch
amends the target fabrics call chain to initialize and destroy/put a
completion queue.

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
---
 drivers/nvme/target/fabrics-cmd.c |  8 ++++++++
 drivers/nvme/target/fc.c          |  3 +++
 drivers/nvme/target/loop.c        | 13 +++++++++++--
 drivers/nvme/target/rdma.c        |  3 +++
 drivers/nvme/target/tcp.c         |  3 +++
 5 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index f012bdf89850..3fb4a7010d8e 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -208,6 +208,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
 		return NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR;
 	}
 
+	kref_get(&ctrl->ref);
+	old = cmpxchg(&req->cq->ctrl, NULL, ctrl);
+	if (old) {
+		pr_warn("queue already connected!\n");
+		req->error_loc = offsetof(struct nvmf_connect_command, opcode);
+		return NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR;
+	}
+
 	/* note: convert queue size from 0's-based value to 1's-based value */
 	nvmet_cq_setup(ctrl, req->cq, qid, sqsize + 1);
 	nvmet_sq_setup(ctrl, req->sq, qid, sqsize + 1);
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 7b50130f10f6..7c2a4e2eb315 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -816,6 +816,7 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
 
 	nvmet_fc_prep_fcp_iodlist(assoc->tgtport, queue);
 
+	nvmet_cq_init(&queue->nvme_cq);
 	ret = nvmet_sq_init(&queue->nvme_sq);
 	if (ret)
 		goto out_fail_iodlist;
@@ -826,6 +827,7 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
 	return queue;
 
 out_fail_iodlist:
+	nvmet_cq_put(&queue->nvme_cq);
 	nvmet_fc_destroy_fcp_iodlist(assoc->tgtport, queue);
 	destroy_workqueue(queue->work_q);
 out_free_queue:
@@ -934,6 +936,7 @@ nvmet_fc_delete_target_queue(struct nvmet_fc_tgt_queue *queue)
 	flush_workqueue(queue->work_q);
 
 	nvmet_sq_destroy(&queue->nvme_sq);
+	nvmet_cq_put(&queue->nvme_cq);
 
 	nvmet_fc_tgt_q_put(queue);
 }
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index a5c41144667c..85a97f843dd5 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -273,6 +273,7 @@ static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
 	nvme_unquiesce_admin_queue(&ctrl->ctrl);
 
 	nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
+	nvmet_cq_put(&ctrl->queues[0].nvme_cq);
 	nvme_remove_admin_tag_set(&ctrl->ctrl);
 }
 
@@ -302,6 +303,7 @@ static void nvme_loop_destroy_io_queues(struct nvme_loop_ctrl *ctrl)
 	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
 		clear_bit(NVME_LOOP_Q_LIVE, &ctrl->queues[i].flags);
 		nvmet_sq_destroy(&ctrl->queues[i].nvme_sq);
+		nvmet_cq_put(&ctrl->queues[i].nvme_cq);
 	}
 	ctrl->ctrl.queue_count = 1;
 	/*
@@ -327,9 +329,12 @@ static int nvme_loop_init_io_queues(struct nvme_loop_ctrl *ctrl)
 
 	for (i = 1; i <= nr_io_queues; i++) {
 		ctrl->queues[i].ctrl = ctrl;
+		nvmet_cq_init(&ctrl->queues[i].nvme_cq);
 		ret = nvmet_sq_init(&ctrl->queues[i].nvme_sq);
-		if (ret)
+		if (ret) {
+			nvmet_cq_put(&ctrl->queues[i].nvme_cq);
 			goto out_destroy_queues;
+		}
 
 		ctrl->ctrl.queue_count++;
 	}
@@ -360,9 +365,12 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	int error;
 
 	ctrl->queues[0].ctrl = ctrl;
+	nvmet_cq_init(&ctrl->queues[0].nvme_cq);
 	error = nvmet_sq_init(&ctrl->queues[0].nvme_sq);
-	if (error)
+	if (error) {
+		nvmet_cq_put(&ctrl->queues[0].nvme_cq);
 		return error;
+	}
 	ctrl->ctrl.queue_count = 1;
 
 	error = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
@@ -401,6 +409,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	nvme_remove_admin_tag_set(&ctrl->ctrl);
 out_free_sq:
 	nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
+	nvmet_cq_put(&ctrl->queues[0].nvme_cq);
 	return error;
 }
 
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 2a4536ef6184..3ad9b4d1fad2 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1353,6 +1353,7 @@ static void nvmet_rdma_free_queue(struct nvmet_rdma_queue *queue)
 	pr_debug("freeing queue %d\n", queue->idx);
 
 	nvmet_sq_destroy(&queue->nvme_sq);
+	nvmet_cq_put(&queue->nvme_cq);
 
 	nvmet_rdma_destroy_queue_ib(queue);
 	if (!queue->nsrq) {
@@ -1436,6 +1437,7 @@ nvmet_rdma_alloc_queue(struct nvmet_rdma_device *ndev,
 		goto out_reject;
 	}
 
+	nvmet_cq_init(&queue->nvme_cq);
 	ret = nvmet_sq_init(&queue->nvme_sq);
 	if (ret) {
 		ret = NVME_RDMA_CM_NO_RSC;
@@ -1517,6 +1519,7 @@ nvmet_rdma_alloc_queue(struct nvmet_rdma_device *ndev,
 out_destroy_sq:
 	nvmet_sq_destroy(&queue->nvme_sq);
 out_free_queue:
+	nvmet_cq_put(&queue->nvme_cq);
 	kfree(queue);
 out_reject:
 	nvmet_rdma_cm_reject(cm_id, ret);
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index f2d0c920269b..5045d1bc0412 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1612,6 +1612,7 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
 	nvmet_sq_put_tls_key(&queue->nvme_sq);
 	nvmet_tcp_uninit_data_in_cmds(queue);
 	nvmet_sq_destroy(&queue->nvme_sq);
+	nvmet_cq_put(&queue->nvme_cq);
 	cancel_work_sync(&queue->io_work);
 	nvmet_tcp_free_cmd_data_in_buffers(queue);
 	/* ->sock will be released by fput() */
@@ -1947,6 +1948,7 @@ static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 	if (ret)
 		goto out_ida_remove;
 
+	nvmet_cq_init(&queue->nvme_cq);
 	ret = nvmet_sq_init(&queue->nvme_sq);
 	if (ret)
 		goto out_free_connect;
@@ -1990,6 +1992,7 @@ static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 	mutex_unlock(&nvmet_tcp_queue_mutex);
 	nvmet_sq_destroy(&queue->nvme_sq);
 out_free_connect:
+	nvmet_cq_put(&queue->nvme_cq);
 	nvmet_tcp_free_cmd(&queue->connect);
 out_ida_remove:
 	ida_free(&nvmet_tcp_queue_ida, queue->idx);
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/5] nvmet: support completion queue sharing
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
                   ` (2 preceding siblings ...)
  2025-04-24  5:13 ` [PATCH 3/5] nvmet: fabrics: add CQ init and destroy Wilfred Mallawa
@ 2025-04-24  5:13 ` Wilfred Mallawa
  2025-05-07  7:29   ` Damien Le Moal
  2025-04-24  5:13 ` [PATCH 5/5] nvmet: Simplify nvmet_req_init() interface Wilfred Mallawa
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Wilfred Mallawa @ 2025-04-24  5:13 UTC (permalink / raw)
  To: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni
  Cc: dlemoal, alistair.francis, cassel, Wilfred Mallawa

From: Wilfred Mallawa <wilfred.mallawa@wdc.com>

The NVMe PCI transport specification allows for completion queues to be
shared by different submission queues.

This patch makes each submission queue keep track of the completion
queue it is using, by taking a reference on it. This ensures that a
completion queue is not deleted while a submission queue is still
actively using it.

This patch enables completion queue sharing in the pci-epf target
driver. For the fabrics drivers, completion queue sharing is not
enabled, as the fabrics specification does not allow it. However, this
patch updates the fabrics drivers to correctly integrate the new API
that supports completion queue sharing.

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 19 ++++++++++---------
 drivers/nvme/target/core.c      | 17 ++++++++++++++---
 drivers/nvme/target/fc.c        |  2 +-
 drivers/nvme/target/loop.c      |  4 ++--
 drivers/nvme/target/nvmet.h     |  9 +++++----
 drivers/nvme/target/pci-epf.c   |  9 +++++----
 drivers/nvme/target/rdma.c      |  2 +-
 drivers/nvme/target/tcp.c       |  2 +-
 8 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 5e3699973d56..c7317299078d 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -63,14 +63,9 @@ static void nvmet_execute_create_sq(struct nvmet_req *req)
 	if (status != NVME_SC_SUCCESS)
 		goto complete;
 
-	/*
-	 * Note: The NVMe specification allows multiple SQs to use the same CQ.
-	 * However, the target code does not really support that. So for now,
-	 * prevent this and fail the command if sqid and cqid are different.
-	 */
-	if (!cqid || cqid != sqid) {
-		pr_err("SQ %u: Unsupported CQID %u\n", sqid, cqid);
-		status = NVME_SC_CQ_INVALID | NVME_STATUS_DNR;
+	status = nvmet_check_io_cqid(ctrl, cqid, false);
+	if (status != NVME_SC_SUCCESS) {
+		pr_err("SQ %u: Invalid CQID %u\n", sqid, cqid);
 		goto complete;
 	}
 
@@ -79,7 +74,7 @@ static void nvmet_execute_create_sq(struct nvmet_req *req)
 		goto complete;
 	}
 
-	status = ctrl->ops->create_sq(ctrl, sqid, sq_flags, qsize, prp1);
+	status = ctrl->ops->create_sq(ctrl, sqid, cqid, sq_flags, qsize, prp1);
 
 complete:
 	nvmet_req_complete(req, status);
@@ -100,6 +95,12 @@ static void nvmet_execute_delete_cq(struct nvmet_req *req)
 	if (status != NVME_SC_SUCCESS)
 		goto complete;
 
+	if (!ctrl->cqs[cqid] || nvmet_cq_in_use(ctrl->cqs[cqid])) {
+		/* The CQ does not exist or some SQs are still using it */
+		status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
+		goto complete;
+	}
+
 	status = ctrl->ops->delete_cq(ctrl, cqid);
 
 complete:
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index a622a6b886cb..d989880bbafc 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -902,6 +902,11 @@ u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq,
 	if (status != NVME_SC_SUCCESS)
 		return status;
 
+	if (!kref_get_unless_zero(&ctrl->ref))
+		return NVME_SC_INTERNAL | NVME_STATUS_DNR;
+	cq->ctrl = ctrl;
+
+	nvmet_cq_init(cq);
 	nvmet_cq_setup(ctrl, cq, qid, size);
 
 	return NVME_SC_SUCCESS;
@@ -925,7 +930,7 @@ u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid,
 }
 
 u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq,
-		    u16 sqid, u16 size)
+		    struct nvmet_cq *cq, u16 sqid, u16 size)
 {
 	u16 status;
 	int ret;
@@ -937,7 +942,7 @@ u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq,
 	if (status != NVME_SC_SUCCESS)
 		return status;
 
-	ret = nvmet_sq_init(sq);
+	ret = nvmet_sq_init(sq, cq);
 	if (ret) {
 		status = NVME_SC_INTERNAL | NVME_STATUS_DNR;
 		goto ctrl_put;
@@ -969,6 +974,7 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
 	wait_for_completion(&sq->free_done);
 	percpu_ref_exit(&sq->ref);
 	nvmet_auth_sq_free(sq);
+	nvmet_cq_put(sq->cq);
 
 	/*
 	 * we must reference the ctrl again after waiting for inflight IO
@@ -1001,18 +1007,23 @@ static void nvmet_sq_free(struct percpu_ref *ref)
 	complete(&sq->free_done);
 }
 
-int nvmet_sq_init(struct nvmet_sq *sq)
+int nvmet_sq_init(struct nvmet_sq *sq, struct nvmet_cq *cq)
 {
 	int ret;
 
+	if (!nvmet_cq_get(cq))
+		return -EINVAL;
+
 	ret = percpu_ref_init(&sq->ref, nvmet_sq_free, 0, GFP_KERNEL);
 	if (ret) {
 		pr_err("percpu_ref init failed!\n");
+		nvmet_cq_put(cq);
 		return ret;
 	}
 	init_completion(&sq->free_done);
 	init_completion(&sq->confirm_done);
 	nvmet_auth_sq_init(sq);
+	sq->cq = cq;
 
 	return 0;
 }
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 7c2a4e2eb315..2e813e51549c 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -817,7 +817,7 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
 	nvmet_fc_prep_fcp_iodlist(assoc->tgtport, queue);
 
 	nvmet_cq_init(&queue->nvme_cq);
-	ret = nvmet_sq_init(&queue->nvme_sq);
+	ret = nvmet_sq_init(&queue->nvme_sq, &queue->nvme_cq);
 	if (ret)
 		goto out_fail_iodlist;
 
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 85a97f843dd5..9354a58456e0 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -330,7 +330,7 @@ static int nvme_loop_init_io_queues(struct nvme_loop_ctrl *ctrl)
 	for (i = 1; i <= nr_io_queues; i++) {
 		ctrl->queues[i].ctrl = ctrl;
 		nvmet_cq_init(&ctrl->queues[i].nvme_cq);
-		ret = nvmet_sq_init(&ctrl->queues[i].nvme_sq);
+		ret = nvmet_sq_init(&ctrl->queues[i].nvme_sq, &ctrl->queues[i].nvme_cq);
 		if (ret) {
 			nvmet_cq_put(&ctrl->queues[i].nvme_cq);
 			goto out_destroy_queues;
@@ -366,7 +366,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 
 	ctrl->queues[0].ctrl = ctrl;
 	nvmet_cq_init(&ctrl->queues[0].nvme_cq);
-	error = nvmet_sq_init(&ctrl->queues[0].nvme_sq);
+	error = nvmet_sq_init(&ctrl->queues[0].nvme_sq, &ctrl->queues[0].nvme_cq);
 	if (error) {
 		nvmet_cq_put(&ctrl->queues[0].nvme_cq);
 		return error;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index c87f6fb458e8..d3795b09fcc4 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -150,6 +150,7 @@ struct nvmet_cq {
 struct nvmet_sq {
 	struct nvmet_ctrl	*ctrl;
 	struct percpu_ref	ref;
+	struct nvmet_cq		*cq;
 	u16			qid;
 	u16			size;
 	u32			sqhd;
@@ -427,7 +428,7 @@ struct nvmet_fabrics_ops {
 	u16 (*get_max_queue_size)(const struct nvmet_ctrl *ctrl);
 
 	/* Operations mandatory for PCI target controllers */
-	u16 (*create_sq)(struct nvmet_ctrl *ctrl, u16 sqid, u16 flags,
+	u16 (*create_sq)(struct nvmet_ctrl *ctrl, u16 sqid, u16 cqid, u16 flags,
 			 u16 qsize, u64 prp1);
 	u16 (*delete_sq)(struct nvmet_ctrl *ctrl, u16 sqid);
 	u16 (*create_cq)(struct nvmet_ctrl *ctrl, u16 cqid, u16 flags,
@@ -588,10 +589,10 @@ bool nvmet_cq_in_use(struct nvmet_cq *cq);
 u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid, bool create);
 void nvmet_sq_setup(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
 		u16 size);
-u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
-		u16 size);
+u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq,
+	struct nvmet_cq *cq, u16 qid, u16 size);
 void nvmet_sq_destroy(struct nvmet_sq *sq);
-int nvmet_sq_init(struct nvmet_sq *sq);
+int nvmet_sq_init(struct nvmet_sq *sq, struct nvmet_cq *cq);
 
 void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl);
 
diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c
index 7dda4156d86c..e5030bca18ee 100644
--- a/drivers/nvme/target/pci-epf.c
+++ b/drivers/nvme/target/pci-epf.c
@@ -1346,16 +1346,17 @@ static u16 nvmet_pci_epf_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid)
 	nvmet_pci_epf_drain_queue(cq);
 	nvmet_pci_epf_remove_irq_vector(ctrl, cq->vector);
 	nvmet_pci_epf_mem_unmap(ctrl->nvme_epf, &cq->pci_map);
-	tctrl->cqs[cqid] = NULL;
+	nvmet_cq_put(&cq->nvme_cq);
 
 	return NVME_SC_SUCCESS;
 }
 
 static u16 nvmet_pci_epf_create_sq(struct nvmet_ctrl *tctrl,
-		u16 sqid, u16 flags, u16 qsize, u64 pci_addr)
+		u16 sqid, u16 cqid, u16 flags, u16 qsize, u64 pci_addr)
 {
 	struct nvmet_pci_epf_ctrl *ctrl = tctrl->drvdata;
 	struct nvmet_pci_epf_queue *sq = &ctrl->sq[sqid];
+	struct nvmet_pci_epf_queue *cq = &ctrl->cq[cqid];
 	u16 status;
 
 	if (test_bit(NVMET_PCI_EPF_Q_LIVE, &sq->flags))
@@ -1378,7 +1379,7 @@ static u16 nvmet_pci_epf_create_sq(struct nvmet_ctrl *tctrl,
 		sq->qes = ctrl->io_sqes;
 	sq->pci_size = sq->qes * sq->depth;
 
-	status = nvmet_sq_create(tctrl, &sq->nvme_sq, sqid, sq->depth);
+	status = nvmet_sq_create(tctrl, &sq->nvme_sq, &cq->nvme_cq, sqid, sq->depth);
 	if (status != NVME_SC_SUCCESS)
 		return status;
 
@@ -1873,7 +1874,7 @@ static int nvmet_pci_epf_enable_ctrl(struct nvmet_pci_epf_ctrl *ctrl)
 
 	qsize = aqa & 0x00000fff;
 	pci_addr = asq & GENMASK_ULL(63, 12);
-	status = nvmet_pci_epf_create_sq(ctrl->tctrl, 0, NVME_QUEUE_PHYS_CONTIG,
+	status = nvmet_pci_epf_create_sq(ctrl->tctrl, 0, 0, NVME_QUEUE_PHYS_CONTIG,
 					 qsize, pci_addr);
 	if (status != NVME_SC_SUCCESS) {
 		dev_err(ctrl->dev, "Failed to create admin submission queue\n");
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 3ad9b4d1fad2..2e5c32298818 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1438,7 +1438,7 @@ nvmet_rdma_alloc_queue(struct nvmet_rdma_device *ndev,
 	}
 
 	nvmet_cq_init(&queue->nvme_cq);
-	ret = nvmet_sq_init(&queue->nvme_sq);
+	ret = nvmet_sq_init(&queue->nvme_sq, &queue->nvme_cq);
 	if (ret) {
 		ret = NVME_RDMA_CM_NO_RSC;
 		goto out_free_queue;
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 5045d1bc0412..2223cfd00b58 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1949,7 +1949,7 @@ static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 		goto out_ida_remove;
 
 	nvmet_cq_init(&queue->nvme_cq);
-	ret = nvmet_sq_init(&queue->nvme_sq);
+	ret = nvmet_sq_init(&queue->nvme_sq, &queue->nvme_cq);
 	if (ret)
 		goto out_free_connect;
 
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/5] nvmet: Simplify nvmet_req_init() interface
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
                   ` (3 preceding siblings ...)
  2025-04-24  5:13 ` [PATCH 4/5] nvmet: support completion queue sharing Wilfred Mallawa
@ 2025-04-24  5:13 ` Wilfred Mallawa
  2025-05-07  7:29   ` Damien Le Moal
  2025-04-24 13:16 ` [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Niklas Cassel
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Wilfred Mallawa @ 2025-04-24  5:13 UTC (permalink / raw)
  To: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni
  Cc: dlemoal, alistair.francis, cassel, Wilfred Mallawa

From: Wilfred Mallawa <wilfred.mallawa@wdc.com>

Now that a submission queue holds a reference to its completion queue,
there is no need to pass the cq argument to nvmet_req_init(), so remove it.

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
---
 drivers/nvme/target/core.c    | 6 +++---
 drivers/nvme/target/fc.c      | 5 +----
 drivers/nvme/target/loop.c    | 6 ++----
 drivers/nvme/target/nvmet.h   | 4 ++--
 drivers/nvme/target/pci-epf.c | 3 +--
 drivers/nvme/target/rdma.c    | 3 +--
 drivers/nvme/target/tcp.c     | 3 +--
 7 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index d989880bbafc..042039011d7c 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -1153,13 +1153,13 @@ static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 	return ret;
 }
 
-bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
-		struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops)
+bool nvmet_req_init(struct nvmet_req *req, struct nvmet_sq *sq,
+		const struct nvmet_fabrics_ops *ops)
 {
 	u8 flags = req->cmd->common.flags;
 	u16 status;
 
-	req->cq = cq;
+	req->cq = sq->cq;
 	req->sq = sq;
 	req->ops = ops;
 	req->sg = NULL;
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 2e813e51549c..6d3344726f84 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -2534,10 +2534,7 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
 	fod->data_sg = NULL;
 	fod->data_sg_cnt = 0;
 
-	ret = nvmet_req_init(&fod->req,
-				&fod->queue->nvme_cq,
-				&fod->queue->nvme_sq,
-				&nvmet_fc_tgt_fcp_ops);
+	ret = nvmet_req_init(&fod->req, &fod->queue->nvme_sq, &nvmet_fc_tgt_fcp_ops);
 	if (!ret) {
 		/* bad SQE content or invalid ctrl state */
 		/* nvmet layer has already called op done to send rsp. */
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 9354a58456e0..80f8281a98c3 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -148,8 +148,7 @@ static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	nvme_start_request(req);
 	iod->cmd.common.flags |= NVME_CMD_SGL_METABUF;
 	iod->req.port = queue->ctrl->port;
-	if (!nvmet_req_init(&iod->req, &queue->nvme_cq,
-			&queue->nvme_sq, &nvme_loop_ops))
+	if (!nvmet_req_init(&iod->req, &queue->nvme_sq, &nvme_loop_ops))
 		return BLK_STS_OK;
 
 	if (blk_rq_nr_phys_segments(req)) {
@@ -181,8 +180,7 @@ static void nvme_loop_submit_async_event(struct nvme_ctrl *arg)
 	iod->cmd.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
 	iod->cmd.common.flags |= NVME_CMD_SGL_METABUF;
 
-	if (!nvmet_req_init(&iod->req, &queue->nvme_cq, &queue->nvme_sq,
-			&nvme_loop_ops)) {
+	if (!nvmet_req_init(&iod->req, &queue->nvme_sq, &nvme_loop_ops)) {
 		dev_err(ctrl->ctrl.device, "failed async event work\n");
 		return;
 	}
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index d3795b09fcc4..df69a9dee71c 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -561,8 +561,8 @@ u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req);
 u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req);
 u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req);
 
-bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
-		struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops);
+bool nvmet_req_init(struct nvmet_req *req, struct nvmet_sq *sq,
+		const struct nvmet_fabrics_ops *ops);
 void nvmet_req_uninit(struct nvmet_req *req);
 size_t nvmet_req_transfer_len(struct nvmet_req *req);
 bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len);
diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c
index e5030bca18ee..99a46b6bd231 100644
--- a/drivers/nvme/target/pci-epf.c
+++ b/drivers/nvme/target/pci-epf.c
@@ -1596,8 +1596,7 @@ static void nvmet_pci_epf_exec_iod_work(struct work_struct *work)
 		goto complete;
 	}
 
-	if (!nvmet_req_init(req, &iod->cq->nvme_cq, &iod->sq->nvme_sq,
-			    &nvmet_pci_epf_fabrics_ops))
+	if (!nvmet_req_init(req, &iod->sq->nvme_sq, &nvmet_pci_epf_fabrics_ops))
 		goto complete;
 
 	iod->data_len = nvmet_req_transfer_len(req);
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 2e5c32298818..432bdf7cd49e 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -976,8 +976,7 @@ static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
 		cmd->send_sge.addr, cmd->send_sge.length,
 		DMA_TO_DEVICE);
 
-	if (!nvmet_req_init(&cmd->req, &queue->nvme_cq,
-			&queue->nvme_sq, &nvmet_rdma_ops))
+	if (!nvmet_req_init(&cmd->req, &queue->nvme_sq, &nvmet_rdma_ops))
 		return;
 
 	status = nvmet_rdma_map_sgl(cmd);
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 2223cfd00b58..493e680ddcf7 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1077,8 +1077,7 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
 	req = &queue->cmd->req;
 	memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
 
-	if (unlikely(!nvmet_req_init(req, &queue->nvme_cq,
-			&queue->nvme_sq, &nvmet_tcp_ops))) {
+	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, &nvmet_tcp_ops))) {
 		pr_err("failed cmd %p id %d opcode %d, data_len: %d, status: %04x\n",
 			req->cmd, req->cmd->common.command_id,
 			req->cmd->common.opcode,
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
                   ` (4 preceding siblings ...)
  2025-04-24  5:13 ` [PATCH 5/5] nvmet: Simplify nvmet_req_init() interface Wilfred Mallawa
@ 2025-04-24 13:16 ` Niklas Cassel
  2025-04-25  1:22   ` Damien Le Moal
  2025-04-29 12:56   ` Christoph Hellwig
  2025-05-07  7:32 ` Damien Le Moal
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 20+ messages in thread
From: Niklas Cassel @ 2025-04-24 13:16 UTC (permalink / raw)
  To: Wilfred Mallawa
  Cc: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni, dlemoal, alistair.francis, Wilfred Mallawa

Hello Wilfred,

On Thu, Apr 24, 2025 at 03:13:48PM +1000, Wilfred Mallawa wrote:
> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
> 
> Hi all,
> 
> For the NVMe PCI transport, the NVMe specification allows different
> submission queues (SQs) to share completion queues (CQs), however,
> this is not supported in the current NVMe target implementation.
> Until now, the nvmet target implementation enforced a 1:1 relationship
> between SQs and CQs, which is not specification compliant for the NVMe
> PCI transport.

Perhaps it is a bit too harsh to say that we are non-spec compliant.
We don't implement every feature in the NVMe spec in the Linux drivers
on purpose, because there is a lot of feature creep in NVMe.

Perhaps rephrase this to something like:
"While the nvmet target implementation only supports a 1:1 relationship
between SQs and CQs, the NVMe over PCIe Transport Specification does not
have this restriction."


> 
> This patch series adds support for CQ sharing between multiple SQs in the
> NVMe target driver, in line with the NVMe PCI transport specification.
> This series implements reference counting for completion queues to ensure
> proper lifecycle management when shared across multiple submission queues.
> This ensures that we retain CQs until all referencing SQs are deleted
> first, thereby avoiding premature CQ deletions.

The patches themselves look nice, however, I do think this cover letter is
missing the answer to the question "why?".

If you need to modify the host-side Linux kernel even to test this feature,
why are we doing this?

Is it because we want to be nice to other host-side operating systems or NVMe
stacks, e.g. SPDK, which might actually use this feature?


Kind regards,
Niklas


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues
  2025-04-24 13:16 ` [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Niklas Cassel
@ 2025-04-25  1:22   ` Damien Le Moal
  2025-04-29 12:56   ` Christoph Hellwig
  1 sibling, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2025-04-25  1:22 UTC (permalink / raw)
  To: Niklas Cassel, Wilfred Mallawa
  Cc: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni, alistair.francis, Wilfred Mallawa

On 4/24/25 22:16, Niklas Cassel wrote:
> Hello Wilfred,
> 
> On Thu, Apr 24, 2025 at 03:13:48PM +1000, Wilfred Mallawa wrote:
>> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
>>
>> Hi all,
>>
>> For the NVMe PCI transport, the NVMe specification allows different
>> submission queues (SQs) to share completion queues (CQs), however,
>> this is not supported in the current NVMe target implementation.
>> Until now, the nvmet target implementation enforced a 1:1 relationship
>> between SQs and CQs, which is not specification compliant for the NVMe
>> PCI transport.
> 
> Perhaps it is a bit too harsh to say that we are non-spec compliant.
> We don't implement every feature in the NVMe spec in the Linux drivers
> on purpose, because there is a lot of feature creep in NVMe.
> 
> Perhaps rephrase this to something like:
> "While the nvmet target implementation only supports a 1:1 relationship
> between SQs and CQs, the NVMe over PCIe Transport Specification does not
> have this restriction."
> 
> 
>>
>> This patch series adds support for CQ sharing between multiple SQs in the
>> NVMe target driver, in line with the NVMe PCI transport specification.
>> This series implements reference counting for completion queues to ensure
>> proper lifecycle management when shared across multiple submission queues.
>> This ensures that we retain CQs until all referencing SQs are deleted
>> first, thereby avoiding premature CQ deletions.
> 
> The patches themselves look nice, however, I do think this cover letter is
> missing the answer to the question "why?".
> 
> If you need to modify the host-side Linux kernel even to test this feature,
> why are we doing this?
> 
> Is it because we want to be nice to other host-side operating systems or NVMe
> stacks, e.g. SPDK, which might actually use this feature?

Yes. The Linux nvme pci host driver does not use CQ sharing, but other OSes may.

There is no urgency with this as no-one complained about this mandatory feature
not being supported. But it does not hurt to have this as it also adds nice
checks on the fabrics side to make sure that IO queues are being created and
deleted correctly.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues
  2025-04-24 13:16 ` [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Niklas Cassel
  2025-04-25  1:22   ` Damien Le Moal
@ 2025-04-29 12:56   ` Christoph Hellwig
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2025-04-29 12:56 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Wilfred Mallawa, linux-nvme, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Chaitanya Kulkarni, dlemoal, alistair.francis,
	Wilfred Mallawa

On Thu, Apr 24, 2025 at 03:16:54PM +0200, Niklas Cassel wrote:
> Perhaps it is a bit too harsh to say that we are non-spec compliant.

It is not too harsh but simply a true statement.  Whether the spec
compliance matters or not might be a different question.

> We don't implement every feature in the NVMe spec in the Linux drivers
> on purpose, because there is a lot of feature creep in NVMe.

That is a valid point for optional features.  Sharing a CQ for multiple
SQs is a mandatory feature in PCIe.

This should also get us to the point of running a verification test
suite on the PCIe nvmet code, which I'm really looking forward to.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] nvmet: add a helper function for cqid checking
  2025-04-24  5:13 ` [PATCH 1/5] nvmet: add a helper function for cqid checking Wilfred Mallawa
@ 2025-05-07  7:20   ` Damien Le Moal
  0 siblings, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2025-05-07  7:20 UTC (permalink / raw)
  To: Wilfred Mallawa, linux-nvme, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Chaitanya Kulkarni
  Cc: alistair.francis, cassel, Wilfred Mallawa

On 4/24/25 2:13 PM, Wilfred Mallawa wrote:
> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
> 
> This patch adds a new helper function nvmet_check_io_cqid(). It is to be
> used when parsing host commands for IO CQ creation/deletion and IO SQ
> creation to ensure that the specified IO completion queue identifier
> (CQID) is not 0 (Admin queue ID). This is a check that already occurs in
> the nvmet_execute_x() functions prior to calling nvmet_check_cqid().
> 
> With the addition of this helper function, the CQ ID checks in the
> nvmet_execute_x() functions can be removed, simply calling
> nvmet_check_io_cqid() in place of nvmet_check_cqid().
> 
> Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] nvmet: cq: prepare for completion queue sharing
  2025-04-24  5:13 ` [PATCH 2/5] nvmet: cq: prepare for completion queue sharing Wilfred Mallawa
@ 2025-05-07  7:25   ` Damien Le Moal
  2025-05-07  7:34     ` Wilfred Mallawa
  0 siblings, 1 reply; 20+ messages in thread
From: Damien Le Moal @ 2025-05-07  7:25 UTC (permalink / raw)
  To: Wilfred Mallawa, linux-nvme, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Chaitanya Kulkarni
  Cc: alistair.francis, cassel, Wilfred Mallawa

On 4/24/25 2:13 PM, Wilfred Mallawa wrote:
> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
> 
> For the PCI transport, the NVMe specification allows submission queues to
> share completion queues, however, this is not supported in the current
> NVMe target implementation. This is a preparatory patch to allow for
> completion queue (CQ) sharing between different submission queues (SQ).
> 
> To support queue sharing, reference counting completion queues is
> required. This patch adds the refcount_t field ref to struct nvmet_cq
> coupled with respective nvmet_cq_init(), nvmet_cq_get(), nvmet_cq_put(),
> nvmet_cq_is_deletable() and nvmet_cq_destroy() functions.
> 
> A CQ reference count is initialized with nvmet_cq_init() when a CQ is
> created. Using nvmet_cq_get(), a reference to a CQ is taken when an SQ is
> created that uses the respective CQ. Similarly, when an SQ is destroyed,
> the reference count to the respective CQ from the SQ being destroyed is
> decremented with nvmet_cq_put(). The last reference to a CQ is dropped
> on a CQ deletion using nvmet_cq_put(), which invokes nvmet_cq_destroy()
> to fully clean up after the CQ. The helper function nvmet_cq_in_use() is
> used to determine if any SQs are still using the CQ pending deletion,
> in which case the CQ must not be deleted. This should protect scenarios
> where a bad host may attempt to delete a CQ without first having deleted
> SQ(s) using that CQ.
> 
> Additionally, this patch adds an array of struct nvmet_cq to the
> nvmet_ctrl structure. This allows for the controller to keep track of CQs
> as they are created and destroyed, similar to the current tracking done
> for SQs. The memory for this array is freed when the controller is freed.
> A struct nvmet_ctrl reference is also added to the nvmet_cq structure to
> allow for CQs to be removed from the controller whilst keeping the new API
> similar to the existing API for SQs.

Looks good to me, modulo one nit below.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c
> index 7fab7f3d79b7..7dda4156d86c 100644
> --- a/drivers/nvme/target/pci-epf.c
> +++ b/drivers/nvme/target/pci-epf.c
> @@ -1346,6 +1346,7 @@ static u16 nvmet_pci_epf_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid)
>  	nvmet_pci_epf_drain_queue(cq);
>  	nvmet_pci_epf_remove_irq_vector(ctrl, cq->vector);
>  	nvmet_pci_epf_mem_unmap(ctrl->nvme_epf, &cq->pci_map);
> +	tctrl->cqs[cqid] = NULL;

I do not think we need this hunk in this patch as we are not yet using the
cqs array, and patch 4 replaces this line with a call to
nvmet_cq_put().

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/5] nvmet: fabrics: add CQ init and destroy
  2025-04-24  5:13 ` [PATCH 3/5] nvmet: fabrics: add CQ init and destroy Wilfred Mallawa
@ 2025-05-07  7:27   ` Damien Le Moal
  0 siblings, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2025-05-07  7:27 UTC (permalink / raw)
  To: Wilfred Mallawa, linux-nvme, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Chaitanya Kulkarni
  Cc: alistair.francis, cassel, Wilfred Mallawa

On 4/24/25 2:13 PM, Wilfred Mallawa wrote:
> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
> 
> With struct nvmet_cq now having a reference count, this patch
> amends the target fabrics call chain to initialize and destroy/put a
> completion queue.
> 
> Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] nvmet: support completion queue sharing
  2025-04-24  5:13 ` [PATCH 4/5] nvmet: support completion queue sharing Wilfred Mallawa
@ 2025-05-07  7:29   ` Damien Le Moal
  0 siblings, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2025-05-07  7:29 UTC (permalink / raw)
  To: Wilfred Mallawa, linux-nvme, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Chaitanya Kulkarni
  Cc: alistair.francis, cassel, Wilfred Mallawa

On 4/24/25 2:13 PM, Wilfred Mallawa wrote:
> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
> 
> The NVMe PCI transport specification allows for completion queues to be
> shared by different submission queues.
> 
> This patch allows a submission queue to keep track of the completion queue
> it is using with reference counting. This ensures that a
> completion queue is not deleted while a submission queue is actively
> using it.
> 
> This patch enables completion queue sharing in the pci-epf target driver.
> For fabrics drivers, completion queue sharing is not enabled as it is
> not permitted by the fabrics specification. However, this patch
> modifies the fabrics drivers to correctly integrate the new API that
> supports completion queue sharing.
> 
> Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] nvmet: Simplify nvmet_req_init() interface
  2025-04-24  5:13 ` [PATCH 5/5] nvmet: Simplify nvmet_req_init() interface Wilfred Mallawa
@ 2025-05-07  7:29   ` Damien Le Moal
  0 siblings, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2025-05-07  7:29 UTC (permalink / raw)
  To: Wilfred Mallawa, linux-nvme, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Chaitanya Kulkarni
  Cc: alistair.francis, cassel, Wilfred Mallawa

On 4/24/25 2:13 PM, Wilfred Mallawa wrote:
> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
> 
> Now that a submission queue holds a reference to its completion queue,
> there is no need to pass the cq argument to nvmet_req_init(), so remove it.
> 
> Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
                   ` (5 preceding siblings ...)
  2025-04-24 13:16 ` [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Niklas Cassel
@ 2025-05-07  7:32 ` Damien Le Moal
  2025-05-09  5:05 ` Christoph Hellwig
  2025-05-11 23:05 ` Chaitanya Kulkarni
  8 siblings, 0 replies; 20+ messages in thread
From: Damien Le Moal @ 2025-05-07  7:32 UTC (permalink / raw)
  To: Wilfred Mallawa, linux-nvme, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Chaitanya Kulkarni
  Cc: alistair.francis, cassel, Wilfred Mallawa

On 4/24/25 2:13 PM, Wilfred Mallawa wrote:
> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
> 
> Hi all,
> 
> For the NVMe PCI transport, the NVMe specification allows different
> submission queues (SQs) to share completion queues (CQs), however,
> this is not supported in the current NVMe target implementation.
> Until now, the nvmet target implementation enforced a 1:1 relationship
> between SQs and CQs, which is not specification compliant for the NVMe
> PCI transport.
> 
> This patch series adds support for CQ sharing between multiple SQs in the
> NVMe target driver, in line with the NVMe PCI transport specification.
> This series implements reference counting for completion queues to ensure
> proper lifecycle management when shared across multiple submission queues.
> This ensures that we retain CQs until all referencing SQs are deleted
> first, thereby avoiding premature CQ deletions.

I reviewed this series but please note that I have been reviewing this code
off-list with Wilfred for a while. So given my familiarity with the code, I may
be missing things here and my review may not be as precise as it should be.

Other reviews would be nice!

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] nvmet: cq: prepare for completion queue sharing
  2025-05-07  7:25   ` Damien Le Moal
@ 2025-05-07  7:34     ` Wilfred Mallawa
  2025-05-07  7:38       ` Damien Le Moal
  0 siblings, 1 reply; 20+ messages in thread
From: Wilfred Mallawa @ 2025-05-07  7:34 UTC (permalink / raw)
  To: kbusch@kernel.org, hch, sagi@grimberg.me, dlemoal@kernel.org,
	linux-nvme@lists.infradead.org, kch@nvidia.com
  Cc: Alistair Francis, cassel@kernel.org

On Wed, 2025-05-07 at 16:25 +0900, Damien Le Moal wrote:
> On 4/24/25 2:13 PM, Wilfred Mallawa wrote:
> > From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
> > 
> > For the PCI transport, the NVMe specification allows submission queues
> > to share completion queues, however, this is not supported in the
> > current NVMe target implementation. This is a preparatory patch to
> > allow for completion queue (CQ) sharing between different submission
> > queues (SQ).
> > 
> > To support queue sharing, reference counting completion queues is
> > required. This patch adds the refcount_t field ref to struct nvmet_cq
> > coupled with respective nvmet_cq_init(), nvmet_cq_get(), nvmet_cq_put(),
> > nvmet_cq_is_deletable() and nvmet_cq_destroy() functions.
> > 
> > A CQ reference count is initialized with nvmet_cq_init() when a CQ is
> > created. Using nvmet_cq_get(), a reference to a CQ is taken when an SQ
> > is created that uses the respective CQ. Similarly, when an SQ is
> > destroyed, the reference count to the respective CQ from the SQ being
> > destroyed is decremented with nvmet_cq_put(). The last reference to a
> > CQ is dropped on a CQ deletion using nvmet_cq_put(), which invokes
> > nvmet_cq_destroy() to fully clean up after the CQ. The helper function
> > nvmet_cq_in_use() is used to determine if any SQs are still using the
> > CQ pending deletion, in which case the CQ must not be deleted. This
> > should protect scenarios where a bad host may attempt to delete a CQ
> > without first having deleted SQ(s) using that CQ.
> > 
> > Additionally, this patch adds an array of struct nvmet_cq to the
> > nvmet_ctrl structure. This allows for the controller to keep track of
> > CQs as they are created and destroyed, similar to the current tracking
> > done for SQs. The memory for this array is freed when the controller
> > is freed. A struct nvmet_ctrl reference is also added to the nvmet_cq
> > structure to allow for CQs to be removed from the controller whilst
> > keeping the new API similar to the existing API for SQs.
> 
> Looks good to me, modulo one nit below.
> 
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> 
> > diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c
> > index 7fab7f3d79b7..7dda4156d86c 100644
> > --- a/drivers/nvme/target/pci-epf.c
> > +++ b/drivers/nvme/target/pci-epf.c
> > @@ -1346,6 +1346,7 @@ static u16 nvmet_pci_epf_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid)
> >  	nvmet_pci_epf_drain_queue(cq);
> >  	nvmet_pci_epf_remove_irq_vector(ctrl, cq->vector);
> >  	nvmet_pci_epf_mem_unmap(ctrl->nvme_epf, &cq->pci_map);
> > +	tctrl->cqs[cqid] = NULL;
> 
> I do not think we need this hunk in this patch as we are not yet using
> the cqs array, and patch 4 replaces this line with a call to
> nvmet_cq_put().
> 
Hey Damien,

This is added to undo the effects of `nvmet_cq_setup()`. If we don't
clear it, then when a CQ is deleted and re-created with the same ID, it
would incorrectly fail the `create && ctrl->cqs[cqid]` check in
nvmet_check_cqid().

Wilfred


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] nvmet: cq: prepare for completion queue sharing
  2025-05-07  7:34     ` Wilfred Mallawa
@ 2025-05-07  7:38       ` Damien Le Moal
  2025-05-07  7:47         ` Wilfred Mallawa
  0 siblings, 1 reply; 20+ messages in thread
From: Damien Le Moal @ 2025-05-07  7:38 UTC (permalink / raw)
  To: Wilfred Mallawa, kbusch@kernel.org, hch, sagi@grimberg.me,
	linux-nvme@lists.infradead.org, kch@nvidia.com
  Cc: Alistair Francis, cassel@kernel.org

On 5/7/25 4:34 PM, Wilfred Mallawa wrote:
>>> diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c
>>> index 7fab7f3d79b7..7dda4156d86c 100644
>>> --- a/drivers/nvme/target/pci-epf.c
>>> +++ b/drivers/nvme/target/pci-epf.c
>>> @@ -1346,6 +1346,7 @@ static u16 nvmet_pci_epf_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid)
>>>  	nvmet_pci_epf_drain_queue(cq);
>>>  	nvmet_pci_epf_remove_irq_vector(ctrl, cq->vector);
>>>  	nvmet_pci_epf_mem_unmap(ctrl->nvme_epf, &cq->pci_map);
>>> +	tctrl->cqs[cqid] = NULL;
>>
>> I do not think we need this hunk in this patch as we are not yet using
>> the cqs array, and patch 4 replaces this line with a call to
>> nvmet_cq_put().
>>
> Hey Damien,
> 
> This is added to undo the effects of `nvmet_cq_setup()`. If we don't
> clear it, then when a CQ is deleted and re-created with the same ID, it
> would incorrectly fail the `create && ctrl->cqs[cqid]` check in
> nvmet_check_cqid().

Oops. Yes, absolutely. Forgot that we already are calling nvmet_cq_setup().
Need more coffee :)

Sorry for the noise.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] nvmet: cq: prepare for completion queue sharing
  2025-05-07  7:38       ` Damien Le Moal
@ 2025-05-07  7:47         ` Wilfred Mallawa
  0 siblings, 0 replies; 20+ messages in thread
From: Wilfred Mallawa @ 2025-05-07  7:47 UTC (permalink / raw)
  To: kbusch@kernel.org, hch, sagi@grimberg.me, dlemoal@kernel.org,
	linux-nvme@lists.infradead.org, kch@nvidia.com
  Cc: Alistair Francis, cassel@kernel.org

On Wed, 2025-05-07 at 16:38 +0900, Damien Le Moal wrote:
> On 5/7/25 4:34 PM, Wilfred Mallawa wrote:
> > > > diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c
> > > > index 7fab7f3d79b7..7dda4156d86c 100644
> > > > --- a/drivers/nvme/target/pci-epf.c
> > > > +++ b/drivers/nvme/target/pci-epf.c
> > > > @@ -1346,6 +1346,7 @@ static u16 nvmet_pci_epf_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid)
> > > >  	nvmet_pci_epf_drain_queue(cq);
> > > >  	nvmet_pci_epf_remove_irq_vector(ctrl, cq->vector);
> > > >  	nvmet_pci_epf_mem_unmap(ctrl->nvme_epf, &cq->pci_map);
> > > > +	tctrl->cqs[cqid] = NULL;
> > > 
> > > I do not think we need this hunk in this patch, as we are not
> > > yet using the cqs array, and patch 4 replaces this line with a
> > > call to nvmet_cq_put().
> > > 
> > Hey Damien,
> > 
> > This is added to undo the effects of `nvmet_cq_setup()`. If we
> > don't clear it, deleting and then re-creating a CQ with the same
> > ID would incorrectly fail the `create && ctrl->cqs[cqid]` check
> > in nvmet_check_cqid().
> 
> Oops. Yes, absolutely. Forgot that we already are calling
> nvmet_cq_setup().
> Need more coffee :)
ha! yep me too :P
> 
> Sorry for the noise.
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
                   ` (6 preceding siblings ...)
  2025-05-07  7:32 ` Damien Le Moal
@ 2025-05-09  5:05 ` Christoph Hellwig
  2025-05-11 23:05 ` Chaitanya Kulkarni
  8 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2025-05-09  5:05 UTC (permalink / raw)
  To: Wilfred Mallawa
  Cc: linux-nvme, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni, dlemoal, alistair.francis, cassel,
	Wilfred Mallawa

Thanks,

applied to nvme-6.16.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues
  2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
                   ` (7 preceding siblings ...)
  2025-05-09  5:05 ` Christoph Hellwig
@ 2025-05-11 23:05 ` Chaitanya Kulkarni
  8 siblings, 0 replies; 20+ messages in thread
From: Chaitanya Kulkarni @ 2025-05-11 23:05 UTC (permalink / raw)
  To: Wilfred Mallawa, linux-nvme@lists.infradead.org, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni
  Cc: dlemoal@kernel.org, alistair.francis@wdc.com, cassel@kernel.org,
	Wilfred Mallawa

On 4/23/25 22:13, Wilfred Mallawa wrote:
> From: Wilfred Mallawa<wilfred.mallawa@wdc.com>
>
> Hi all,
>
> For the NVMe PCI transport, the NVMe specification allows different
> submission queues (SQs) to share completion queues (CQs), however,
> this is not supported in the current NVMe target implementation.
> Until now, the nvmet target implementation enforced a 1:1 relationship
> between SQs and CQs, which is not specification compliant for the NVMe
> PCI transport.
>
> This patch series adds support for CQ sharing between multiple SQs in the
> NVMe target driver, in line with the NVMe PCI transport specification.
> This series implements reference counting for completion queues to ensure
> proper lifecycle management when shared across multiple submission queues.
> This ensures that we retain CQs until all referencing SQs are deleted
> first, thereby avoiding premature CQ deletions.
>
> Sample callchain with CQ refcounting for the PCI endpoint target
> (pci-epf)
> =================================================================
>
> i.   nvmet_execute_create_cq -> nvmet_pci_epf_create_cq -> nvmet_cq_create
>       -> nvmet_cq_init			[cq refcount = 1]
>
> ii.  nvmet_execute_create_sq -> nvmet_pci_epf_create_sq -> nvmet_sq_create
>       -> nvmet_sq_init -> nvmet_cq_get	[cq refcount = 2]
>
> iii. nvmet_execute_delete_sq -> nvmet_pci_epf_delete_sq ->
>       nvmet_sq_destroy -> nvmet_cq_put	[cq refcount = 1]
>
> iv.  nvmet_execute_delete_cq -> nvmet_pci_epf_delete_cq -> nvmet_cq_put
> 					[cq refcount = 0]
>
> For NVMe over fabrics, CQ sharing is not supported per specification,
> however, the fabrics drivers are updated to integrate the new
> API changes. No functional change is intended here.
>
> Testing
> =======
>
> Core functionality changes were tested with a Rockchip-based Rock5B PCIe
> endpoint setup using the pci-epf driver. The host kernel was modified to
> support queue sharing. In the test setup, this resulted in IO SQs 1 & 2
> using IO CQ 1 and IO SQ 3 & 4 using IO CQ 2.
>
> Testing methodology includes:
>
> For PCI:
>
> 1. Boot up host
> 2. Assert that the endpoint device is detected as an NVMe drive
>     (IO CQs/SQs are created)
> 3. Run FIOs
> 4. Unbind NVMe driver (IO SQs then CQs are deleted)
> 5. Rebind NVMe driver (IO SQs then CQs are created)
> 6. Run FIOs
>
> For NVMe over fabrics: Using NVMe loop driver:
>
> Note that there is no queue sharing supported for fabrics.
>
> 1. Connect command (IO queues are created)
> 2. Run FIOs
> 3. Disconnect command (IO queues are deleted)
>
> Thanks!

FWIW,

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-05-11 23:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-24  5:13 [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Wilfred Mallawa
2025-04-24  5:13 ` [PATCH 1/5] nvmet: add a helper function for cqid checking Wilfred Mallawa
2025-05-07  7:20   ` Damien Le Moal
2025-04-24  5:13 ` [PATCH 2/5] nvmet: cq: prepare for completion queue sharing Wilfred Mallawa
2025-05-07  7:25   ` Damien Le Moal
2025-05-07  7:34     ` Wilfred Mallawa
2025-05-07  7:38       ` Damien Le Moal
2025-05-07  7:47         ` Wilfred Mallawa
2025-04-24  5:13 ` [PATCH 3/5] nvmet: fabrics: add CQ init and destroy Wilfred Mallawa
2025-05-07  7:27   ` Damien Le Moal
2025-04-24  5:13 ` [PATCH 4/5] nvmet: support completion queue sharing Wilfred Mallawa
2025-05-07  7:29   ` Damien Le Moal
2025-04-24  5:13 ` [PATCH 5/5] nvmet: Simplify nvmet_req_init() interface Wilfred Mallawa
2025-05-07  7:29   ` Damien Le Moal
2025-04-24 13:16 ` [PATCH 0/5] pci: nvmet: support completion queue sharing by multiple submission queues Niklas Cassel
2025-04-25  1:22   ` Damien Le Moal
2025-04-29 12:56   ` Christoph Hellwig
2025-05-07  7:32 ` Damien Le Moal
2025-05-09  5:05 ` Christoph Hellwig
2025-05-11 23:05 ` Chaitanya Kulkarni
