* [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue
2025-05-13 7:00 misc cleanups for nvme-pci Christoph Hellwig
@ 2025-05-13 7:00 ` Christoph Hellwig
2025-05-13 7:28 ` Leon Romanovsky
2025-05-13 7:00 ` [PATCH 2/7] nvme-pci: store aborted state in flags variable Christoph Hellwig
` (6 subsequent siblings)
7 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:00 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
No admin command defined in an NVMe specification supports metadata,
but to protect against vendor specific commands using metadata, ensure
that we don't try to use SGLs for metadata on the admin queue, as NVMe
does not support SGLs on the admin queue for the PCI transport. Do
this by checking whether the data transfer has been set up using SGLs,
as that is a prerequisite for using SGLs for metadata.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e6b6b6ee0878..24f5292a349b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -971,7 +971,10 @@ static blk_status_t nvme_pci_setup_meta_mptr(struct nvme_dev *dev,
static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req)
{
- if (nvme_pci_metadata_use_sgls(dev, req))
+ struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+ if ((iod->cmd.common.flags & NVME_CMD_SGL_METABUF) &&
+ nvme_pci_metadata_use_sgls(dev, req))
return nvme_pci_setup_meta_sgls(dev, req);
return nvme_pci_setup_meta_mptr(dev, req);
}
--
2.47.2
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue
2025-05-13 7:00 ` [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue Christoph Hellwig
@ 2025-05-13 7:28 ` Leon Romanovsky
2025-05-13 7:39 ` Christoph Hellwig
0 siblings, 1 reply; 38+ messages in thread
From: Leon Romanovsky @ 2025-05-13 7:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 09:00:17AM +0200, Christoph Hellwig wrote:
> No admin command defined in an NVMe specification supports metadata,
> but to protect against vendor specific commands using metadata ensure
> that we don't try to use SGLs for metadata on the admin queue, as NVMe
> does not support SGLs on the admin queue for the PCI transport. Do
> this by checking if the data transfer has been setup using SGLs as
> that is required for using SGLs for metadata.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/pci.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index e6b6b6ee0878..24f5292a349b 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -971,7 +971,10 @@ static blk_status_t nvme_pci_setup_meta_mptr(struct nvme_dev *dev,
>
> static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req)
> {
> - if (nvme_pci_metadata_use_sgls(dev, req))
> + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> + if ((iod->cmd.common.flags & NVME_CMD_SGL_METABUF) &&
Won't it be more future-proof to check for NVME_CMD_SGL_ALL?
Anyway, the current proposal is also fine.
Thanks,
Reviewed-by: Leon Romanovsky <leon@kernel.org>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue
2025-05-13 7:28 ` Leon Romanovsky
@ 2025-05-13 7:39 ` Christoph Hellwig
2025-05-13 7:45 ` Leon Romanovsky
0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:39 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg,
Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 10:28:42AM +0300, Leon Romanovsky wrote:
> > - if (nvme_pci_metadata_use_sgls(dev, req))
> > + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> > +
> > + if ((iod->cmd.common.flags & NVME_CMD_SGL_METABUF) &&
>
> Won't it be more future-proof to check for NVME_CMD_SGL_ALL?
Not really, as this is used to decide whether to call into the routine
that sets NVME_CMD_SGL_METASEG.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue
2025-05-13 7:39 ` Christoph Hellwig
@ 2025-05-13 7:45 ` Leon Romanovsky
2025-05-13 7:47 ` Christoph Hellwig
0 siblings, 1 reply; 38+ messages in thread
From: Leon Romanovsky @ 2025-05-13 7:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 09:39:08AM +0200, Christoph Hellwig wrote:
> On Tue, May 13, 2025 at 10:28:42AM +0300, Leon Romanovsky wrote:
> > > - if (nvme_pci_metadata_use_sgls(dev, req))
> > > + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> > > +
> > > + if ((iod->cmd.common.flags & NVME_CMD_SGL_METABUF) &&
> >
> > Won't it be more future-proof to check for NVME_CMD_SGL_ALL?
>
> Not really, as this is used to decide whether to call into the routine
> that sets NVME_CMD_SGL_METASEG.
I see the same NVME_CMD_SGL_ALL check in the target code:
1642 u16 nvmet_parse_admin_cmd(struct nvmet_req *req)
1643 {
<...>
1656 /* For PCI controllers, admin commands shall not use SGL. */
1657 if (nvmet_is_pci_ctrl(req->sq->ctrl) && !req->sq->qid &&
1658 cmd->common.flags & NVME_CMD_SGL_ALL)
1659 return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
Thanks
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue
2025-05-13 7:45 ` Leon Romanovsky
@ 2025-05-13 7:47 ` Christoph Hellwig
2025-05-13 7:51 ` Leon Romanovsky
0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:47 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg,
Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 10:45:15AM +0300, Leon Romanovsky wrote:
> On Tue, May 13, 2025 at 09:39:08AM +0200, Christoph Hellwig wrote:
> > On Tue, May 13, 2025 at 10:28:42AM +0300, Leon Romanovsky wrote:
> > > > - if (nvme_pci_metadata_use_sgls(dev, req))
> > > > + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> > > > +
> > > > + if ((iod->cmd.common.flags & NVME_CMD_SGL_METABUF) &&
> > >
> > > Won't it be more future-proof to check for NVME_CMD_SGL_ALL?
> >
> > Not really, as this is used to decide whether to call into the routine
> > that sets NVME_CMD_SGL_METASEG.
>
> I see the same NVME_CMD_SGL_ALL check in the target code.
Yeah. But the above check guards the call to nvme_pci_setup_meta_sgls,
which takes a command and updates the sgls field in flags from
NVME_CMD_SGL_METABUF to NVME_CMD_SGL_METASEG, and no other place sets
NVME_CMD_SGL_METASEG. So NVME_CMD_SGL_METASEG can't be set at this
point by definition.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue
2025-05-13 7:47 ` Christoph Hellwig
@ 2025-05-13 7:51 ` Leon Romanovsky
0 siblings, 0 replies; 38+ messages in thread
From: Leon Romanovsky @ 2025-05-13 7:51 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 09:47:11AM +0200, Christoph Hellwig wrote:
> On Tue, May 13, 2025 at 10:45:15AM +0300, Leon Romanovsky wrote:
> > On Tue, May 13, 2025 at 09:39:08AM +0200, Christoph Hellwig wrote:
> > > On Tue, May 13, 2025 at 10:28:42AM +0300, Leon Romanovsky wrote:
> > > > > - if (nvme_pci_metadata_use_sgls(dev, req))
> > > > > + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> > > > > +
> > > > > + if ((iod->cmd.common.flags & NVME_CMD_SGL_METABUF) &&
> > > >
> > > > Won't it be more future-proof to check for NVME_CMD_SGL_ALL?
> > >
> > > Not really, as this is used to decide whether to call into the routine
> > > that sets NVME_CMD_SGL_METASEG.
> >
> > I see the same NVME_CMD_SGL_ALL check in the target code.
>
> Yeah. But the above check guards the call to nvme_pci_setup_meta_sgls,
> which takes a command and updates the sgls field in flags from
> NVME_CMD_SGL_METABUF to NVME_CMD_SGL_METASEG, and no other place sets
> NVME_CMD_SGL_METASEG. So NVME_CMD_SGL_METASEG can't be set at this
> point by definition.
Thanks
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 2/7] nvme-pci: store aborted state in flags variable
2025-05-13 7:00 misc cleanups for nvme-pci Christoph Hellwig
2025-05-13 7:00 ` [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue Christoph Hellwig
@ 2025-05-13 7:00 ` Christoph Hellwig
2025-05-13 14:13 ` Kanchan Joshi
2025-05-13 7:00 ` [PATCH 3/7] nvme-pci: remove struct nvme_descriptor Christoph Hellwig
` (5 subsequent siblings)
7 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:00 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme, Leon Romanovsky
From: Leon Romanovsky <leon@kernel.org>
Instead of keeping a dedicated "bool aborted" variable, let's reuse the
newly introduced flags variable and save space.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 24f5292a349b..51430f5e6a66 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -230,6 +230,12 @@ union nvme_descriptor {
__le64 *prp_list;
};
+/* bits for iod->flags */
+enum nvme_iod_flags {
+ /* this command has been aborted by the timeout handler */
+ IOD_ABORTED = 1U << 0,
+};
+
/*
* The nvme_iod describes the data in an I/O.
*
@@ -239,7 +245,7 @@ union nvme_descriptor {
struct nvme_iod {
struct nvme_request req;
struct nvme_command cmd;
- bool aborted;
+ u8 flags;
s8 nr_allocations; /* PRP list pool allocations. 0 means small
pool in use */
unsigned int dma_len; /* length of single DMA segment mapping */
@@ -984,7 +990,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret;
- iod->aborted = false;
+ iod->flags = 0;
iod->nr_allocations = -1;
iod->sgt.nents = 0;
iod->meta_sgt.nents = 0;
@@ -1551,7 +1557,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
* returned to the driver, or if this is the admin queue.
*/
opcode = nvme_req(req)->cmd->common.opcode;
- if (!nvmeq->qid || iod->aborted) {
+ if (!nvmeq->qid || (iod->flags & IOD_ABORTED)) {
dev_warn(dev->ctrl.device,
"I/O tag %d (%04x) opcode %#x (%s) QID %d timeout, reset controller\n",
req->tag, nvme_cid(req), opcode,
@@ -1564,7 +1570,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
atomic_inc(&dev->ctrl.abort_limit);
return BLK_EH_RESET_TIMER;
}
- iod->aborted = true;
+ iod->flags |= IOD_ABORTED;
cmd.abort.opcode = nvme_admin_abort_cmd;
cmd.abort.cid = nvme_cid(req);
--
2.47.2
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 2/7] nvme-pci: store aborted state in flags variable
2025-05-13 7:00 ` [PATCH 2/7] nvme-pci: store aborted state in flags variable Christoph Hellwig
2025-05-13 14:13 ` Kanchan Joshi
2025-05-14 0:44 ` Caleb Sander Mateos
2025-05-14 5:06 ` Christoph Hellwig
0 siblings, 2 replies; 38+ messages in thread
From: Kanchan Joshi @ 2025-05-13 14:13 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme, Leon Romanovsky
On 5/13/2025 12:30 PM, Christoph Hellwig wrote:
> From: Leon Romanovsky <leon@kernel.org>
>
> Instead of keeping dedicated "bool aborted" variable, let's reuse
> newly introduced flags variable and save space.
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Nit: I would not call it 'reuse' as flags (within iod) did not exist
before this patch.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/7] nvme-pci: store aborted state in flags variable
2025-05-13 14:13 ` Kanchan Joshi
@ 2025-05-14 0:44 ` Caleb Sander Mateos
2025-05-14 5:06 ` Christoph Hellwig
1 sibling, 0 replies; 38+ messages in thread
From: Caleb Sander Mateos @ 2025-05-14 0:44 UTC (permalink / raw)
To: Kanchan Joshi
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, Leon Romanovsky,
linux-nvme, Leon Romanovsky
On Tue, May 13, 2025 at 7:13 AM Kanchan Joshi <joshi.k@samsung.com> wrote:
>
> On 5/13/2025 12:30 PM, Christoph Hellwig wrote:
> > From: Leon Romanovsky <leon@kernel.org>
> >
> > Instead of keeping dedicated "bool aborted" variable, let's reuse
> > newly introduced flags variable and save space.
>
> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
>
> Nit: I would not call it 'reuse' as flags (within iod) did not exist
> before this patch.
Same comment about the commit message. Other than that,
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/7] nvme-pci: store aborted state in flags variable
2025-05-13 14:13 ` Kanchan Joshi
2025-05-14 0:44 ` Caleb Sander Mateos
@ 2025-05-14 5:06 ` Christoph Hellwig
1 sibling, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-14 5:06 UTC (permalink / raw)
To: Kanchan Joshi
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg,
Caleb Sander Mateos, Leon Romanovsky, linux-nvme, Leon Romanovsky
On Tue, May 13, 2025 at 07:43:43PM +0530, Kanchan Joshi wrote:
> On 5/13/2025 12:30 PM, Christoph Hellwig wrote:
> > From: Leon Romanovsky <leon@kernel.org>
> >
> > Instead of keeping dedicated "bool aborted" variable, let's reuse
> > newly introduced flags variable and save space.
>
> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
>
> Nit: I would not call it 'reuse' as flags (within iod) did not exist
> before this patch.
It did exist when Leon wrote the patch, but my patch reshuffling changed
that. I'll fix the wording up to match reality.
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 3/7] nvme-pci: remove struct nvme_descriptor
2025-05-13 7:00 misc cleanups for nvme-pci Christoph Hellwig
2025-05-13 7:00 ` [PATCH 1/7] nvme-pci: don't try to use SGLs for metadata on the admin queue Christoph Hellwig
2025-05-13 7:00 ` [PATCH 2/7] nvme-pci: store aborted state in flags variable Christoph Hellwig
@ 2025-05-13 7:00 ` Christoph Hellwig
2025-05-13 14:15 ` Kanchan Joshi
2025-05-14 1:07 ` Caleb Sander Mateos
2025-05-13 7:00 ` [PATCH 4/7] nvme-pci: rename the descriptor pools Christoph Hellwig
` (4 subsequent siblings)
7 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:00 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme, Jens Axboe
There is no real point in having a union of two pointer types here;
just use a void pointer, as we already mix and match types between the
arms of the union on the allocation and freeing sides.
Also rename the nr_allocations field to nr_descriptors to better describe
what it does.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[leon: ported forward to include metadata SGL support]
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
---
drivers/nvme/host/pci.c | 57 +++++++++++++++++------------------------
1 file changed, 24 insertions(+), 33 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 51430f5e6a66..fe859c31e765 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -44,7 +44,7 @@
#define NVME_MAX_KB_SZ 8192
#define NVME_MAX_SEGS 128
#define NVME_MAX_META_SEGS 15
-#define NVME_MAX_NR_ALLOCATIONS 5
+#define NVME_MAX_NR_DESCRIPTORS 5
static int use_threaded_interrupts;
module_param(use_threaded_interrupts, int, 0444);
@@ -225,11 +225,6 @@ struct nvme_queue {
struct completion delete_done;
};
-union nvme_descriptor {
- struct nvme_sgl_desc *sg_list;
- __le64 *prp_list;
-};
-
/* bits for iod->flags */
enum nvme_iod_flags {
/* this command has been aborted by the timeout handler */
@@ -238,23 +233,19 @@ enum nvme_iod_flags {
/*
* The nvme_iod describes the data in an I/O.
- *
- * The sg pointer contains the list of PRP/SGL chunk allocations in addition
- * to the actual struct scatterlist.
*/
struct nvme_iod {
struct nvme_request req;
struct nvme_command cmd;
u8 flags;
- s8 nr_allocations; /* PRP list pool allocations. 0 means small
- pool in use */
+ s8 nr_descriptors;
unsigned int dma_len; /* length of single DMA segment mapping */
dma_addr_t first_dma;
dma_addr_t meta_dma;
struct sg_table sgt;
struct sg_table meta_sgt;
- union nvme_descriptor meta_list;
- union nvme_descriptor list[NVME_MAX_NR_ALLOCATIONS];
+ void *meta_descriptor;
+ void *descriptors[NVME_MAX_NR_DESCRIPTORS];
};
static inline unsigned int nvme_dbbuf_size(struct nvme_dev *dev)
@@ -607,8 +598,8 @@ static void nvme_free_prps(struct nvme_queue *nvmeq, struct request *req)
dma_addr_t dma_addr = iod->first_dma;
int i;
- for (i = 0; i < iod->nr_allocations; i++) {
- __le64 *prp_list = iod->list[i].prp_list;
+ for (i = 0; i < iod->nr_descriptors; i++) {
+ __le64 *prp_list = iod->descriptors[i];
dma_addr_t next_dma_addr = le64_to_cpu(prp_list[last_prp]);
dma_pool_free(nvmeq->prp_pools.large, prp_list, dma_addr);
@@ -631,11 +622,11 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_queue *nvmeq,
dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
- if (iod->nr_allocations == 0)
- dma_pool_free(nvmeq->prp_pools.small, iod->list[0].sg_list,
+ if (iod->nr_descriptors == 0)
+ dma_pool_free(nvmeq->prp_pools.small, iod->descriptors[0],
iod->first_dma);
- else if (iod->nr_allocations == 1)
- dma_pool_free(nvmeq->prp_pools.large, iod->list[0].sg_list,
+ else if (iod->nr_descriptors == 1)
+ dma_pool_free(nvmeq->prp_pools.large, iod->descriptors[0],
iod->first_dma);
else
nvme_free_prps(nvmeq, req);
@@ -693,18 +684,18 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
if (nprps <= (256 / 8)) {
pool = nvmeq->prp_pools.small;
- iod->nr_allocations = 0;
+ iod->nr_descriptors = 0;
} else {
pool = nvmeq->prp_pools.large;
- iod->nr_allocations = 1;
+ iod->nr_descriptors = 1;
}
prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
if (!prp_list) {
- iod->nr_allocations = -1;
+ iod->nr_descriptors = -1;
return BLK_STS_RESOURCE;
}
- iod->list[0].prp_list = prp_list;
+ iod->descriptors[0] = prp_list;
iod->first_dma = prp_dma;
i = 0;
for (;;) {
@@ -713,7 +704,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
if (!prp_list)
goto free_prps;
- iod->list[iod->nr_allocations++].prp_list = prp_list;
+ iod->descriptors[iod->nr_descriptors++] = prp_list;
prp_list[0] = old_prp_list[i - 1];
old_prp_list[i - 1] = cpu_to_le64(prp_dma);
i = 1;
@@ -783,19 +774,19 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
pool = nvmeq->prp_pools.small;
- iod->nr_allocations = 0;
+ iod->nr_descriptors = 0;
} else {
pool = nvmeq->prp_pools.large;
- iod->nr_allocations = 1;
+ iod->nr_descriptors = 1;
}
sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
if (!sg_list) {
- iod->nr_allocations = -1;
+ iod->nr_descriptors = -1;
return BLK_STS_RESOURCE;
}
- iod->list[0].sg_list = sg_list;
+ iod->descriptors[0] = sg_list;
iod->first_dma = sgl_dma;
nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, entries);
@@ -935,7 +926,7 @@ static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev,
goto out_unmap_sg;
entries = iod->meta_sgt.nents;
- iod->meta_list.sg_list = sg_list;
+ iod->meta_descriptor = sg_list;
iod->meta_dma = sgl_dma;
cmnd->flags = NVME_CMD_SGL_METASEG;
@@ -991,7 +982,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
blk_status_t ret;
iod->flags = 0;
- iod->nr_allocations = -1;
+ iod->nr_descriptors = -1;
iod->sgt.nents = 0;
iod->meta_sgt.nents = 0;
@@ -1117,8 +1108,8 @@ static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev,
return;
}
- dma_pool_free(nvmeq->prp_pools.small, iod->meta_list.sg_list,
- iod->meta_dma);
+ dma_pool_free(nvmeq->prp_pools.small, iod->meta_descriptor,
+ iod->meta_dma);
dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
}
@@ -3842,7 +3833,7 @@ static int __init nvme_init(void)
BUILD_BUG_ON(IRQ_AFFINITY_MAX_SETS < 2);
BUILD_BUG_ON(NVME_MAX_SEGS > SGES_PER_PAGE);
BUILD_BUG_ON(sizeof(struct scatterlist) * NVME_MAX_SEGS > PAGE_SIZE);
- BUILD_BUG_ON(nvme_pci_npages_prp() > NVME_MAX_NR_ALLOCATIONS);
+ BUILD_BUG_ON(nvme_pci_npages_prp() > NVME_MAX_NR_DESCRIPTORS);
return pci_register_driver(&nvme_driver);
}
--
2.47.2
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 3/7] nvme-pci: remove struct nvme_descriptor
2025-05-13 7:00 ` [PATCH 3/7] nvme-pci: remove struct nvme_descriptor Christoph Hellwig
2025-05-13 14:15 ` Kanchan Joshi
2025-05-14 1:07 ` Caleb Sander Mateos
1 sibling, 0 replies; 38+ messages in thread
From: Kanchan Joshi @ 2025-05-13 14:15 UTC (permalink / raw)
To: linux-nvme
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 3/7] nvme-pci: remove struct nvme_descriptor
2025-05-13 7:00 ` [PATCH 3/7] nvme-pci: remove struct nvme_descriptor Christoph Hellwig
2025-05-13 14:15 ` Kanchan Joshi
@ 2025-05-14 1:07 ` Caleb Sander Mateos
2025-05-14 5:15 ` Christoph Hellwig
1 sibling, 1 reply; 38+ messages in thread
From: Caleb Sander Mateos @ 2025-05-14 1:07 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Leon Romanovsky, linux-nvme,
Jens Axboe
On Tue, May 13, 2025 at 12:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> There is no real point in having a union of two pointer types here, just
> use a void pointer as we mix and match types between the arms of the
> union between the allocation and freeing side already.
>
> Also rename the nr_allocations field to nr_descriptors to better describe
> what it does.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [leon: ported forward to include metadata SGL support]
> Signed-off-by: Leon Romanovsky <leon@kernel.org>
> Tested-by: Jens Axboe <axboe@kernel.dk>
> ---
> drivers/nvme/host/pci.c | 57 +++++++++++++++++------------------------
> 1 file changed, 24 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 51430f5e6a66..fe859c31e765 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -44,7 +44,7 @@
> #define NVME_MAX_KB_SZ 8192
> #define NVME_MAX_SEGS 128
> #define NVME_MAX_META_SEGS 15
> -#define NVME_MAX_NR_ALLOCATIONS 5
> +#define NVME_MAX_NR_DESCRIPTORS 5
>
> static int use_threaded_interrupts;
> module_param(use_threaded_interrupts, int, 0444);
> @@ -225,11 +225,6 @@ struct nvme_queue {
> struct completion delete_done;
> };
>
> -union nvme_descriptor {
> - struct nvme_sgl_desc *sg_list;
> - __le64 *prp_list;
> -};
> -
> /* bits for iod->flags */
> enum nvme_iod_flags {
> /* this command has been aborted by the timeout handler */
> @@ -238,23 +233,19 @@ enum nvme_iod_flags {
>
> /*
> * The nvme_iod describes the data in an I/O.
> - *
> - * The sg pointer contains the list of PRP/SGL chunk allocations in addition
> - * to the actual struct scatterlist.
> */
> struct nvme_iod {
> struct nvme_request req;
> struct nvme_command cmd;
> u8 flags;
> - s8 nr_allocations; /* PRP list pool allocations. 0 means small
> - pool in use */
> + s8 nr_descriptors;
> unsigned int dma_len; /* length of single DMA segment mapping */
> dma_addr_t first_dma;
> dma_addr_t meta_dma;
> struct sg_table sgt;
> struct sg_table meta_sgt;
> - union nvme_descriptor meta_list;
> - union nvme_descriptor list[NVME_MAX_NR_ALLOCATIONS];
> + void *meta_descriptor;
It looks like only the sg_list variant of meta_list is currently used.
Should this just be a struct nvme_sgl_desc *?
Best,
Caleb
> + void *descriptors[NVME_MAX_NR_DESCRIPTORS];
> };
>
> static inline unsigned int nvme_dbbuf_size(struct nvme_dev *dev)
> @@ -607,8 +598,8 @@ static void nvme_free_prps(struct nvme_queue *nvmeq, struct request *req)
> dma_addr_t dma_addr = iod->first_dma;
> int i;
>
> - for (i = 0; i < iod->nr_allocations; i++) {
> - __le64 *prp_list = iod->list[i].prp_list;
> + for (i = 0; i < iod->nr_descriptors; i++) {
> + __le64 *prp_list = iod->descriptors[i];
> dma_addr_t next_dma_addr = le64_to_cpu(prp_list[last_prp]);
>
> dma_pool_free(nvmeq->prp_pools.large, prp_list, dma_addr);
> @@ -631,11 +622,11 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_queue *nvmeq,
>
> dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
>
> - if (iod->nr_allocations == 0)
> - dma_pool_free(nvmeq->prp_pools.small, iod->list[0].sg_list,
> + if (iod->nr_descriptors == 0)
> + dma_pool_free(nvmeq->prp_pools.small, iod->descriptors[0],
> iod->first_dma);
> - else if (iod->nr_allocations == 1)
> - dma_pool_free(nvmeq->prp_pools.large, iod->list[0].sg_list,
> + else if (iod->nr_descriptors == 1)
> + dma_pool_free(nvmeq->prp_pools.large, iod->descriptors[0],
> iod->first_dma);
> else
> nvme_free_prps(nvmeq, req);
> @@ -693,18 +684,18 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
> nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
> if (nprps <= (256 / 8)) {
> pool = nvmeq->prp_pools.small;
> - iod->nr_allocations = 0;
> + iod->nr_descriptors = 0;
> } else {
> pool = nvmeq->prp_pools.large;
> - iod->nr_allocations = 1;
> + iod->nr_descriptors = 1;
> }
>
> prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
> if (!prp_list) {
> - iod->nr_allocations = -1;
> + iod->nr_descriptors = -1;
> return BLK_STS_RESOURCE;
> }
> - iod->list[0].prp_list = prp_list;
> + iod->descriptors[0] = prp_list;
> iod->first_dma = prp_dma;
> i = 0;
> for (;;) {
> @@ -713,7 +704,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
> prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
> if (!prp_list)
> goto free_prps;
> - iod->list[iod->nr_allocations++].prp_list = prp_list;
> + iod->descriptors[iod->nr_descriptors++] = prp_list;
> prp_list[0] = old_prp_list[i - 1];
> old_prp_list[i - 1] = cpu_to_le64(prp_dma);
> i = 1;
> @@ -783,19 +774,19 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
>
> if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
> pool = nvmeq->prp_pools.small;
> - iod->nr_allocations = 0;
> + iod->nr_descriptors = 0;
> } else {
> pool = nvmeq->prp_pools.large;
> - iod->nr_allocations = 1;
> + iod->nr_descriptors = 1;
> }
>
> sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
> if (!sg_list) {
> - iod->nr_allocations = -1;
> + iod->nr_descriptors = -1;
> return BLK_STS_RESOURCE;
> }
>
> - iod->list[0].sg_list = sg_list;
> + iod->descriptors[0] = sg_list;
> iod->first_dma = sgl_dma;
>
> nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, entries);
> @@ -935,7 +926,7 @@ static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev,
> goto out_unmap_sg;
>
> entries = iod->meta_sgt.nents;
> - iod->meta_list.sg_list = sg_list;
> + iod->meta_descriptor = sg_list;
> iod->meta_dma = sgl_dma;
>
> cmnd->flags = NVME_CMD_SGL_METASEG;
> @@ -991,7 +982,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
> blk_status_t ret;
>
> iod->flags = 0;
> - iod->nr_allocations = -1;
> + iod->nr_descriptors = -1;
> iod->sgt.nents = 0;
> iod->meta_sgt.nents = 0;
>
> @@ -1117,8 +1108,8 @@ static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev,
> return;
> }
>
> - dma_pool_free(nvmeq->prp_pools.small, iod->meta_list.sg_list,
> - iod->meta_dma);
> + dma_pool_free(nvmeq->prp_pools.small, iod->meta_descriptor,
> + iod->meta_dma);
> dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
> mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
> }
> @@ -3842,7 +3833,7 @@ static int __init nvme_init(void)
> BUILD_BUG_ON(IRQ_AFFINITY_MAX_SETS < 2);
> BUILD_BUG_ON(NVME_MAX_SEGS > SGES_PER_PAGE);
> BUILD_BUG_ON(sizeof(struct scatterlist) * NVME_MAX_SEGS > PAGE_SIZE);
> - BUILD_BUG_ON(nvme_pci_npages_prp() > NVME_MAX_NR_ALLOCATIONS);
> + BUILD_BUG_ON(nvme_pci_npages_prp() > NVME_MAX_NR_DESCRIPTORS);
>
> return pci_register_driver(&nvme_driver);
> }
> --
> 2.47.2
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 3/7] nvme-pci: remove struct nvme_descriptor
2025-05-14 1:07 ` Caleb Sander Mateos
@ 2025-05-14 5:15 ` Christoph Hellwig
0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-14 5:15 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, Leon Romanovsky,
linux-nvme, Jens Axboe
On Tue, May 13, 2025 at 06:07:11PM -0700, Caleb Sander Mateos wrote:
> > - union nvme_descriptor list[NVME_MAX_NR_ALLOCATIONS];
> > + void *meta_descriptor;
>
> It looks like only the sg_list variant of meta_list is currently used.
> Should this just be a struct nvme_sgl_desc *?
Good question. Type safety is always good, and given how the metadata
pointer works, adding anything else is highly unlikely. OTOH it would
make things a little less symmetric, but I guess we can live with that.
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 4/7] nvme-pci: rename the descriptor pools
2025-05-13 7:00 misc cleanups for nvme-pci Christoph Hellwig
` (2 preceding siblings ...)
2025-05-13 7:00 ` [PATCH 3/7] nvme-pci: remove struct nvme_descriptor Christoph Hellwig
@ 2025-05-13 7:00 ` Christoph Hellwig
2025-05-13 7:35 ` Leon Romanovsky
` (2 more replies)
2025-05-13 7:00 ` [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations Christoph Hellwig
` (3 subsequent siblings)
7 siblings, 3 replies; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:00 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
They are used for both PRPs and SGLs, and we use "descriptor" elsewhere
when referring to their allocations, so use that name here as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 84 ++++++++++++++++++++---------------------
1 file changed, 42 insertions(+), 42 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fe859c31e765..773a2e625657 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -113,7 +113,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
static void nvme_delete_io_queues(struct nvme_dev *dev);
static void nvme_update_attrs(struct nvme_dev *dev);
-struct nvme_prp_dma_pools {
+struct nvme_descriptor_pools {
struct dma_pool *large;
struct dma_pool *small;
};
@@ -166,7 +166,7 @@ struct nvme_dev {
unsigned int nr_allocated_queues;
unsigned int nr_write_queues;
unsigned int nr_poll_queues;
- struct nvme_prp_dma_pools prp_pools[];
+ struct nvme_descriptor_pools descriptor_pools[];
};
static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -196,7 +196,7 @@ static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
*/
struct nvme_queue {
struct nvme_dev *dev;
- struct nvme_prp_dma_pools prp_pools;
+ struct nvme_descriptor_pools descriptor_pools;
spinlock_t sq_lock;
void *sq_cmds;
/* only used for poll queues: */
@@ -400,46 +400,44 @@ static int nvme_pci_npages_prp(void)
return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8);
}
-static struct nvme_prp_dma_pools *
-nvme_setup_prp_pools(struct nvme_dev *dev, unsigned numa_node)
+static struct nvme_descriptor_pools *
+nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
{
- struct nvme_prp_dma_pools *prp_pools = &dev->prp_pools[numa_node];
+ struct nvme_descriptor_pools *pools = &dev->descriptor_pools[numa_node];
size_t small_align = 256;
- if (prp_pools->small)
- return prp_pools; /* already initialized */
+ if (pools->small)
+ return pools; /* already initialized */
- prp_pools->large = dma_pool_create_node("prp list page", dev->dev,
- NVME_CTRL_PAGE_SIZE,
- NVME_CTRL_PAGE_SIZE, 0,
- numa_node);
- if (!prp_pools->large)
+ pools->large = dma_pool_create_node("nvme descriptor page", dev->dev,
+ NVME_CTRL_PAGE_SIZE, NVME_CTRL_PAGE_SIZE, 0, numa_node);
+ if (!pools->large)
return ERR_PTR(-ENOMEM);
if (dev->ctrl.quirks & NVME_QUIRK_DMAPOOL_ALIGN_512)
small_align = 512;
/* Optimisation for I/Os between 4k and 128k */
- prp_pools->small = dma_pool_create_node("prp list 256", dev->dev,
- 256, small_align, 0, numa_node);
- if (!prp_pools->small) {
- dma_pool_destroy(prp_pools->large);
- prp_pools->large = NULL;
+ pools->small = dma_pool_create_node("nvme descriptor 256", dev->dev,
+ 256, small_align, 0, numa_node);
+ if (!pools->small) {
+ dma_pool_destroy(pools->large);
+ pools->large = NULL;
return ERR_PTR(-ENOMEM);
}
- return prp_pools;
+ return pools;
}
-static void nvme_release_prp_pools(struct nvme_dev *dev)
+static void nvme_release_descriptor_pools(struct nvme_dev *dev)
{
unsigned i;
for (i = 0; i < nr_node_ids; i++) {
- struct nvme_prp_dma_pools *prp_pools = &dev->prp_pools[i];
+ struct nvme_descriptor_pools *pools = &dev->descriptor_pools[i];
- dma_pool_destroy(prp_pools->large);
- dma_pool_destroy(prp_pools->small);
+ dma_pool_destroy(pools->large);
+ dma_pool_destroy(pools->small);
}
}
@@ -448,16 +446,16 @@ static int nvme_init_hctx_common(struct blk_mq_hw_ctx *hctx, void *data,
{
struct nvme_dev *dev = to_nvme_dev(data);
struct nvme_queue *nvmeq = &dev->queues[qid];
- struct nvme_prp_dma_pools *prp_pools;
+ struct nvme_descriptor_pools *pools;
struct blk_mq_tags *tags;
tags = qid ? dev->tagset.tags[qid - 1] : dev->admin_tagset.tags[0];
WARN_ON(tags != hctx->tags);
- prp_pools = nvme_setup_prp_pools(dev, hctx->numa_node);
- if (IS_ERR(prp_pools))
- return PTR_ERR(prp_pools);
+ pools = nvme_setup_descriptor_pools(dev, hctx->numa_node);
+ if (IS_ERR(pools))
+ return PTR_ERR(pools);
- nvmeq->prp_pools = *prp_pools;
+ nvmeq->descriptor_pools = *pools;
hctx->driver_data = nvmeq;
return 0;
}
@@ -602,7 +600,8 @@ static void nvme_free_prps(struct nvme_queue *nvmeq, struct request *req)
__le64 *prp_list = iod->descriptors[i];
dma_addr_t next_dma_addr = le64_to_cpu(prp_list[last_prp]);
- dma_pool_free(nvmeq->prp_pools.large, prp_list, dma_addr);
+ dma_pool_free(nvmeq->descriptor_pools.large, prp_list,
+ dma_addr);
dma_addr = next_dma_addr;
}
}
@@ -623,11 +622,11 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_queue *nvmeq,
dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
if (iod->nr_descriptors == 0)
- dma_pool_free(nvmeq->prp_pools.small, iod->descriptors[0],
- iod->first_dma);
+ dma_pool_free(nvmeq->descriptor_pools.small,
+ iod->descriptors[0], iod->first_dma);
else if (iod->nr_descriptors == 1)
- dma_pool_free(nvmeq->prp_pools.large, iod->descriptors[0],
- iod->first_dma);
+ dma_pool_free(nvmeq->descriptor_pools.large,
+ iod->descriptors[0], iod->first_dma);
else
nvme_free_prps(nvmeq, req);
mempool_free(iod->sgt.sgl, dev->iod_mempool);
@@ -683,10 +682,10 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
if (nprps <= (256 / 8)) {
- pool = nvmeq->prp_pools.small;
+ pool = nvmeq->descriptor_pools.small;
iod->nr_descriptors = 0;
} else {
- pool = nvmeq->prp_pools.large;
+ pool = nvmeq->descriptor_pools.large;
iod->nr_descriptors = 1;
}
@@ -773,10 +772,10 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
}
if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
- pool = nvmeq->prp_pools.small;
+ pool = nvmeq->descriptor_pools.small;
iod->nr_descriptors = 0;
} else {
- pool = nvmeq->prp_pools.large;
+ pool = nvmeq->descriptor_pools.large;
iod->nr_descriptors = 1;
}
@@ -921,7 +920,8 @@ static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev,
if (rc)
goto out_free_sg;
- sg_list = dma_pool_alloc(nvmeq->prp_pools.small, GFP_ATOMIC, &sgl_dma);
+ sg_list = dma_pool_alloc(nvmeq->descriptor_pools.small, GFP_ATOMIC,
+ &sgl_dma);
if (!sg_list)
goto out_unmap_sg;
@@ -1108,7 +1108,7 @@ static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev,
return;
}
- dma_pool_free(nvmeq->prp_pools.small, iod->meta_descriptor,
+ dma_pool_free(nvmeq->descriptor_pools.small, iod->meta_descriptor,
iod->meta_dma);
dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
@@ -3214,8 +3214,8 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
struct nvme_dev *dev;
int ret = -ENOMEM;
- dev = kzalloc_node(sizeof(*dev) + nr_node_ids * sizeof(*dev->prp_pools),
- GFP_KERNEL, node);
+ dev = kzalloc_node(sizeof(*dev) + nr_node_ids *
+ sizeof(*dev->descriptor_pools), GFP_KERNEL, node);
if (!dev)
return ERR_PTR(-ENOMEM);
INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
@@ -3432,7 +3432,7 @@ static void nvme_remove(struct pci_dev *pdev)
nvme_free_queues(dev, 0);
mempool_destroy(dev->iod_mempool);
mempool_destroy(dev->iod_meta_mempool);
- nvme_release_prp_pools(dev);
+ nvme_release_descriptor_pools(dev);
nvme_dev_unmap(dev);
nvme_uninit_ctrl(&dev->ctrl);
}
--
2.47.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* Re: [PATCH 4/7] nvme-pci: rename the descriptor pools
2025-05-13 7:00 ` [PATCH 4/7] nvme-pci: rename the descriptor pools Christoph Hellwig
@ 2025-05-13 7:35 ` Leon Romanovsky
2025-05-13 14:16 ` Kanchan Joshi
2025-05-14 1:03 ` Caleb Sander Mateos
2 siblings, 0 replies; 38+ messages in thread
From: Leon Romanovsky @ 2025-05-13 7:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 09:00:20AM +0200, Christoph Hellwig wrote:
> They are used for both PRPs and SGLs, and we use descriptor elsewhere
> when referring to their allocations, so use that name here as well.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/pci.c | 84 ++++++++++++++++++++---------------------
> 1 file changed, 42 insertions(+), 42 deletions(-)
>
Thanks,
Reviewed-by: Leon Romanovsky <leon@kernel.org>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 4/7] nvme-pci: rename the descriptor pools
2025-05-13 7:00 ` [PATCH 4/7] nvme-pci: rename the descriptor pools Christoph Hellwig
2025-05-13 7:35 ` Leon Romanovsky
2025-05-13 14:16 ` Kanchan Joshi
2025-05-14 1:03 ` Caleb Sander Mateos
2 siblings, 0 replies; 38+ messages in thread
From: Kanchan Joshi @ 2025-05-13 14:16 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 4/7] nvme-pci: rename the descriptor pools
2025-05-13 7:00 ` [PATCH 4/7] nvme-pci: rename the descriptor pools Christoph Hellwig
2025-05-13 7:35 ` Leon Romanovsky
2025-05-13 14:16 ` Kanchan Joshi/Kanchan Joshi
@ 2025-05-14 1:03 ` Caleb Sander Mateos
2 siblings, 0 replies; 38+ messages in thread
From: Caleb Sander Mateos @ 2025-05-14 1:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, Leon Romanovsky, linux-nvme
On Tue, May 13, 2025 at 12:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> They are used for both PRPs and SGLs, and we use descriptor elsewhere
> when referring to their allocations, so use that name here as well.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
> ---
> drivers/nvme/host/pci.c | 84 ++++++++++++++++++++---------------------
> 1 file changed, 42 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index fe859c31e765..773a2e625657 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -113,7 +113,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
> static void nvme_delete_io_queues(struct nvme_dev *dev);
> static void nvme_update_attrs(struct nvme_dev *dev);
>
> -struct nvme_prp_dma_pools {
> +struct nvme_descriptor_pools {
> struct dma_pool *large;
> struct dma_pool *small;
> };
> @@ -166,7 +166,7 @@ struct nvme_dev {
> unsigned int nr_allocated_queues;
> unsigned int nr_write_queues;
> unsigned int nr_poll_queues;
> - struct nvme_prp_dma_pools prp_pools[];
> + struct nvme_descriptor_pools descriptor_pools[];
> };
>
> static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
> @@ -196,7 +196,7 @@ static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
> */
> struct nvme_queue {
> struct nvme_dev *dev;
> - struct nvme_prp_dma_pools prp_pools;
> + struct nvme_descriptor_pools descriptor_pools;
> spinlock_t sq_lock;
> void *sq_cmds;
> /* only used for poll queues: */
> @@ -400,46 +400,44 @@ static int nvme_pci_npages_prp(void)
> return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8);
> }
>
> -static struct nvme_prp_dma_pools *
> -nvme_setup_prp_pools(struct nvme_dev *dev, unsigned numa_node)
> +static struct nvme_descriptor_pools *
> +nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
> {
> - struct nvme_prp_dma_pools *prp_pools = &dev->prp_pools[numa_node];
> + struct nvme_descriptor_pools *pools = &dev->descriptor_pools[numa_node];
> size_t small_align = 256;
>
> - if (prp_pools->small)
> - return prp_pools; /* already initialized */
> + if (pools->small)
> + return pools; /* already initialized */
>
> - prp_pools->large = dma_pool_create_node("prp list page", dev->dev,
> - NVME_CTRL_PAGE_SIZE,
> - NVME_CTRL_PAGE_SIZE, 0,
> - numa_node);
> - if (!prp_pools->large)
> + pools->large = dma_pool_create_node("nvme descriptor page", dev->dev,
> + NVME_CTRL_PAGE_SIZE, NVME_CTRL_PAGE_SIZE, 0, numa_node);
> + if (!pools->large)
> return ERR_PTR(-ENOMEM);
>
> if (dev->ctrl.quirks & NVME_QUIRK_DMAPOOL_ALIGN_512)
> small_align = 512;
>
> /* Optimisation for I/Os between 4k and 128k */
> - prp_pools->small = dma_pool_create_node("prp list 256", dev->dev,
> - 256, small_align, 0, numa_node);
> - if (!prp_pools->small) {
> - dma_pool_destroy(prp_pools->large);
> - prp_pools->large = NULL;
> + pools->small = dma_pool_create_node("nvme descriptor 256", dev->dev,
> + 256, small_align, 0, numa_node);
> + if (!pools->small) {
> + dma_pool_destroy(pools->large);
> + pools->large = NULL;
> return ERR_PTR(-ENOMEM);
> }
>
> - return prp_pools;
> + return pools;
> }
>
> -static void nvme_release_prp_pools(struct nvme_dev *dev)
> +static void nvme_release_descriptor_pools(struct nvme_dev *dev)
> {
> unsigned i;
>
> for (i = 0; i < nr_node_ids; i++) {
> - struct nvme_prp_dma_pools *prp_pools = &dev->prp_pools[i];
> + struct nvme_descriptor_pools *pools = &dev->descriptor_pools[i];
>
> - dma_pool_destroy(prp_pools->large);
> - dma_pool_destroy(prp_pools->small);
> + dma_pool_destroy(pools->large);
> + dma_pool_destroy(pools->small);
> }
> }
>
> @@ -448,16 +446,16 @@ static int nvme_init_hctx_common(struct blk_mq_hw_ctx *hctx, void *data,
> {
> struct nvme_dev *dev = to_nvme_dev(data);
> struct nvme_queue *nvmeq = &dev->queues[qid];
> - struct nvme_prp_dma_pools *prp_pools;
> + struct nvme_descriptor_pools *pools;
> struct blk_mq_tags *tags;
>
> tags = qid ? dev->tagset.tags[qid - 1] : dev->admin_tagset.tags[0];
> WARN_ON(tags != hctx->tags);
> - prp_pools = nvme_setup_prp_pools(dev, hctx->numa_node);
> - if (IS_ERR(prp_pools))
> - return PTR_ERR(prp_pools);
> + pools = nvme_setup_descriptor_pools(dev, hctx->numa_node);
> + if (IS_ERR(pools))
> + return PTR_ERR(pools);
>
> - nvmeq->prp_pools = *prp_pools;
> + nvmeq->descriptor_pools = *pools;
> hctx->driver_data = nvmeq;
> return 0;
> }
> @@ -602,7 +600,8 @@ static void nvme_free_prps(struct nvme_queue *nvmeq, struct request *req)
> __le64 *prp_list = iod->descriptors[i];
> dma_addr_t next_dma_addr = le64_to_cpu(prp_list[last_prp]);
>
> - dma_pool_free(nvmeq->prp_pools.large, prp_list, dma_addr);
> + dma_pool_free(nvmeq->descriptor_pools.large, prp_list,
> + dma_addr);
> dma_addr = next_dma_addr;
> }
> }
> @@ -623,11 +622,11 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_queue *nvmeq,
> dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
>
> if (iod->nr_descriptors == 0)
> - dma_pool_free(nvmeq->prp_pools.small, iod->descriptors[0],
> - iod->first_dma);
> + dma_pool_free(nvmeq->descriptor_pools.small,
> + iod->descriptors[0], iod->first_dma);
> else if (iod->nr_descriptors == 1)
> - dma_pool_free(nvmeq->prp_pools.large, iod->descriptors[0],
> - iod->first_dma);
> + dma_pool_free(nvmeq->descriptor_pools.large,
> + iod->descriptors[0], iod->first_dma);
> else
> nvme_free_prps(nvmeq, req);
> mempool_free(iod->sgt.sgl, dev->iod_mempool);
> @@ -683,10 +682,10 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
>
> nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
> if (nprps <= (256 / 8)) {
> - pool = nvmeq->prp_pools.small;
> + pool = nvmeq->descriptor_pools.small;
> iod->nr_descriptors = 0;
> } else {
> - pool = nvmeq->prp_pools.large;
> + pool = nvmeq->descriptor_pools.large;
> iod->nr_descriptors = 1;
> }
>
> @@ -773,10 +772,10 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
> }
>
> if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
> - pool = nvmeq->prp_pools.small;
> + pool = nvmeq->descriptor_pools.small;
> iod->nr_descriptors = 0;
> } else {
> - pool = nvmeq->prp_pools.large;
> + pool = nvmeq->descriptor_pools.large;
> iod->nr_descriptors = 1;
> }
>
> @@ -921,7 +920,8 @@ static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev,
> if (rc)
> goto out_free_sg;
>
> - sg_list = dma_pool_alloc(nvmeq->prp_pools.small, GFP_ATOMIC, &sgl_dma);
> + sg_list = dma_pool_alloc(nvmeq->descriptor_pools.small, GFP_ATOMIC,
> + &sgl_dma);
> if (!sg_list)
> goto out_unmap_sg;
>
> @@ -1108,7 +1108,7 @@ static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev,
> return;
> }
>
> - dma_pool_free(nvmeq->prp_pools.small, iod->meta_descriptor,
> + dma_pool_free(nvmeq->descriptor_pools.small, iod->meta_descriptor,
> iod->meta_dma);
> dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
> mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
> @@ -3214,8 +3214,8 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
> struct nvme_dev *dev;
> int ret = -ENOMEM;
>
> - dev = kzalloc_node(sizeof(*dev) + nr_node_ids * sizeof(*dev->prp_pools),
> - GFP_KERNEL, node);
> + dev = kzalloc_node(sizeof(*dev) + nr_node_ids *
> + sizeof(*dev->descriptor_pools), GFP_KERNEL, node);
> if (!dev)
> return ERR_PTR(-ENOMEM);
> INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
> @@ -3432,7 +3432,7 @@ static void nvme_remove(struct pci_dev *pdev)
> nvme_free_queues(dev, 0);
> mempool_destroy(dev->iod_mempool);
> mempool_destroy(dev->iod_meta_mempool);
> - nvme_release_prp_pools(dev);
> + nvme_release_descriptor_pools(dev);
> nvme_dev_unmap(dev);
> nvme_uninit_ctrl(&dev->ctrl);
> }
> --
> 2.47.2
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 7:00 misc cleanups for nvme-pci Christoph Hellwig
` (3 preceding siblings ...)
2025-05-13 7:00 ` [PATCH 4/7] nvme-pci: rename the descriptor pools Christoph Hellwig
@ 2025-05-13 7:00 ` Christoph Hellwig
2025-05-13 7:39 ` Leon Romanovsky
` (3 more replies)
2025-05-13 7:00 ` [PATCH 6/7] nvme-pci: add a symbolic name for the small pool size Christoph Hellwig
` (2 subsequent siblings)
7 siblings, 4 replies; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:00 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
Add a separate flag to encode that the transfer is using the small
page sized pool, and use a normal 0..n count for the number of
descriptors.
Contains improvements and suggestions from Kanchan Joshi
<joshi.k@samsung.com> and Leon Romanovsky <leonro@nvidia.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 82 ++++++++++++++++++++---------------------
1 file changed, 39 insertions(+), 43 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 773a2e625657..16e2ce25da83 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -229,6 +229,9 @@ struct nvme_queue {
enum nvme_iod_flags {
/* this command has been aborted by the timeout handler */
IOD_ABORTED = 1U << 0,
+
+ /* uses the small descriptor pool */
+ IOD_SMALL_DESCRIPTOR = 1U << 1,
};
/*
@@ -238,7 +241,7 @@ struct nvme_iod {
struct nvme_request req;
struct nvme_command cmd;
u8 flags;
- s8 nr_descriptors;
+ u8 nr_descriptors;
unsigned int dma_len; /* length of single DMA segment mapping */
dma_addr_t first_dma;
dma_addr_t meta_dma;
@@ -589,13 +592,27 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
return true;
}
-static void nvme_free_prps(struct nvme_queue *nvmeq, struct request *req)
+static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
+ struct nvme_iod *iod)
+{
+ if (iod->flags & IOD_SMALL_DESCRIPTOR)
+ return nvmeq->descriptor_pools.small;
+ return nvmeq->descriptor_pools.large;
+}
+
+static void nvme_free_descriptors(struct nvme_queue *nvmeq, struct request *req)
{
const int last_prp = NVME_CTRL_PAGE_SIZE / sizeof(__le64) - 1;
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
dma_addr_t dma_addr = iod->first_dma;
int i;
+ if (iod->nr_descriptors == 1) {
+ dma_pool_free(nvme_dma_pool(nvmeq, iod), iod->descriptors[0],
+ dma_addr);
+ return;
+ }
+
for (i = 0; i < iod->nr_descriptors; i++) {
__le64 *prp_list = iod->descriptors[i];
dma_addr_t next_dma_addr = le64_to_cpu(prp_list[last_prp]);
@@ -620,15 +637,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_queue *nvmeq,
WARN_ON_ONCE(!iod->sgt.nents);
dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
-
- if (iod->nr_descriptors == 0)
- dma_pool_free(nvmeq->descriptor_pools.small,
- iod->descriptors[0], iod->first_dma);
- else if (iod->nr_descriptors == 1)
- dma_pool_free(nvmeq->descriptor_pools.large,
- iod->descriptors[0], iod->first_dma);
- else
- nvme_free_prps(nvmeq, req);
+ nvme_free_descriptors(nvmeq, req);
mempool_free(iod->sgt.sgl, dev->iod_mempool);
}
@@ -650,7 +659,6 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
struct request *req, struct nvme_rw_command *cmnd)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
- struct dma_pool *pool;
int length = blk_rq_payload_bytes(req);
struct scatterlist *sg = iod->sgt.sgl;
int dma_len = sg_dma_len(sg);
@@ -658,7 +666,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);
__le64 *prp_list;
dma_addr_t prp_dma;
- int nprps, i;
+ int i;
length -= (NVME_CTRL_PAGE_SIZE - offset);
if (length <= 0) {
@@ -680,27 +688,23 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
goto done;
}
- nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
- if (nprps <= (256 / 8)) {
- pool = nvmeq->descriptor_pools.small;
- iod->nr_descriptors = 0;
- } else {
- pool = nvmeq->descriptor_pools.large;
- iod->nr_descriptors = 1;
- }
+ if (DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE) <=
+ 256 / sizeof(__le64))
+ iod->flags |= IOD_SMALL_DESCRIPTOR;
- prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
- if (!prp_list) {
- iod->nr_descriptors = -1;
+ prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
+ &prp_dma);
+ if (!prp_list)
return BLK_STS_RESOURCE;
- }
- iod->descriptors[0] = prp_list;
+ iod->descriptors[iod->nr_descriptors++] = prp_list;
iod->first_dma = prp_dma;
i = 0;
for (;;) {
if (i == NVME_CTRL_PAGE_SIZE >> 3) {
__le64 *old_prp_list = prp_list;
- prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
+
+ prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod),
+ GFP_ATOMIC, &prp_dma);
if (!prp_list)
goto free_prps;
iod->descriptors[iod->nr_descriptors++] = prp_list;
@@ -727,7 +731,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
return BLK_STS_OK;
free_prps:
- nvme_free_prps(nvmeq, req);
+ nvme_free_descriptors(nvmeq, req);
return BLK_STS_RESOURCE;
bad_sgl:
WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents),
@@ -756,7 +760,6 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
struct request *req, struct nvme_rw_command *cmd)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
- struct dma_pool *pool;
struct nvme_sgl_desc *sg_list;
struct scatterlist *sg = iod->sgt.sgl;
unsigned int entries = iod->sgt.nents;
@@ -771,21 +774,14 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
return BLK_STS_OK;
}
- if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
- pool = nvmeq->descriptor_pools.small;
- iod->nr_descriptors = 0;
- } else {
- pool = nvmeq->descriptor_pools.large;
- iod->nr_descriptors = 1;
- }
+ if (entries <= 256 / sizeof(*sg_list))
+ iod->flags |= IOD_SMALL_DESCRIPTOR;
- sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
- if (!sg_list) {
- iod->nr_descriptors = -1;
+ sg_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
+ &sgl_dma);
+ if (!sg_list)
return BLK_STS_RESOURCE;
- }
-
- iod->descriptors[0] = sg_list;
+ iod->descriptors[iod->nr_descriptors++] = sg_list;
iod->first_dma = sgl_dma;
nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, entries);
@@ -982,7 +978,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
blk_status_t ret;
iod->flags = 0;
- iod->nr_descriptors = -1;
+ iod->nr_descriptors = 0;
iod->sgt.nents = 0;
iod->meta_sgt.nents = 0;
--
2.47.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 7:00 ` [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations Christoph Hellwig
@ 2025-05-13 7:39 ` Leon Romanovsky
2025-05-13 7:40 ` Christoph Hellwig
2025-05-13 14:49 ` Kanchan Joshi
` (2 subsequent siblings)
3 siblings, 1 reply; 38+ messages in thread
From: Leon Romanovsky @ 2025-05-13 7:39 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 09:00:21AM +0200, Christoph Hellwig wrote:
> Add a separate flag to encode that the transfer is using the small
> page sized pool, and use a normal 0..n count for the number of
> descriptors.
>
> Contains improvements and suggestions from Kanchan Joshi
> <joshi.k@samsung.com> and Leon Romanovsky <leonro@nvidia.com>.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/pci.c | 82 ++++++++++++++++++++---------------------
> 1 file changed, 39 insertions(+), 43 deletions(-)
<...>
> -static void nvme_free_prps(struct nvme_queue *nvmeq, struct request *req)
> +static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
> + struct nvme_iod *iod)
> +{
You probably want __always_inline and not inline here.
> + if (iod->flags & IOD_SMALL_DESCRIPTOR)
> + return nvmeq->descriptor_pools.small;
> + return nvmeq->descriptor_pools.large;
> +}
Thanks,
Reviewed-by: Leon Romanovsky <leon@kernel.org>
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 7:39 ` Leon Romanovsky
@ 2025-05-13 7:40 ` Christoph Hellwig
2025-05-13 7:49 ` Leon Romanovsky
0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:40 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg,
Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 10:39:50AM +0300, Leon Romanovsky wrote:
> > +static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
> > + struct nvme_iod *iod)
> > +{
>
> You probably want __always_inline and not inline here.
I avoid that if I can. People trying to use non-optimizing compiler
flag deserve what they asked for :)
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 7:40 ` Christoph Hellwig
@ 2025-05-13 7:49 ` Leon Romanovsky
0 siblings, 0 replies; 38+ messages in thread
From: Leon Romanovsky @ 2025-05-13 7:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 09:40:41AM +0200, Christoph Hellwig wrote:
> On Tue, May 13, 2025 at 10:39:50AM +0300, Leon Romanovsky wrote:
> > > +static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
> > > + struct nvme_iod *iod)
> > > +{
> >
> > You probably want __always_inline and not inline here.
>
> I avoid that if I can. People trying to use non-optimizing compiler
> flag deserve what they asked for :)
Right, but you are trying to sit on two seats here.
For users who use a normal compiler, this two-line function will be
inlined anyway without needing the extra "inline" keyword.
For users with a "non-optimizing compiler", the "inline" keyword will do
nothing.
Thanks
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 7:00 ` [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations Christoph Hellwig
2025-05-13 7:39 ` Leon Romanovsky
2025-05-13 14:49 ` Kanchan Joshi
2025-05-13 15:06 ` Keith Busch
2025-05-14 17:47 ` Caleb Sander Mateos
3 siblings, 0 replies; 38+ messages in thread
From: Kanchan Joshi @ 2025-05-13 14:49 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 7:00 ` [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations Christoph Hellwig
2025-05-13 7:39 ` Leon Romanovsky
2025-05-13 14:49 ` Kanchan Joshi
@ 2025-05-13 15:06 ` Keith Busch
2025-05-14 5:12 ` Christoph Hellwig
2025-05-14 17:48 ` Caleb Sander Mateos
2025-05-14 17:47 ` Caleb Sander Mateos
3 siblings, 2 replies; 38+ messages in thread
From: Keith Busch @ 2025-05-13 15:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Sagi Grimberg, Caleb Sander Mateos, Leon Romanovsky, linux-nvme
On Tue, May 13, 2025 at 09:00:21AM +0200, Christoph Hellwig wrote:
> +static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
> + struct nvme_iod *iod)
> +{
> + if (iod->flags & IOD_SMALL_DESCRIPTOR)
> + return nvmeq->descriptor_pools.small;
> + return nvmeq->descriptor_pools.large;
> +}
...
> if (i == NVME_CTRL_PAGE_SIZE >> 3) {
> __le64 *old_prp_list = prp_list;
> - prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
> +
> + prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod),
> + GFP_ATOMIC, &prp_dma);
You could assume nvmeq->descriptor_pools.large here. We'd never use the
small pool if we've enough elements to chain lists, and it would skip
the extra condition branch on each loop.
The same optimization is done in the free side already, too.
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 15:06 ` Keith Busch
@ 2025-05-14 5:12 ` Christoph Hellwig
2025-05-14 13:06 ` Keith Busch
2025-05-14 17:48 ` Caleb Sander Mateos
1 sibling, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-14 5:12 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Sagi Grimberg, Caleb Sander Mateos,
Leon Romanovsky, linux-nvme
On Tue, May 13, 2025 at 09:06:38AM -0600, Keith Busch wrote:
> > if (i == NVME_CTRL_PAGE_SIZE >> 3) {
> > __le64 *old_prp_list = prp_list;
> > - prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
> > +
> > + prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod),
> > + GFP_ATOMIC, &prp_dma);
>
> You could assume nvmeq->descriptor_pools.large here. We'd never use the
> small pool if we've enough elements to chain lists, and it would skip
> the extra condition branch on each loop.
Yeah, I'll fix it up.
Something only vaguely related: do you remember why the metadata SGL
> descriptors are limited to just the small pool? It would be nice
> to support as many metadata entries as data entries, which we'd almost get
by using the larger one (we'd still be one off).
^ permalink raw reply [flat|nested] 38+ messages in thread* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-14 5:12 ` Christoph Hellwig
@ 2025-05-14 13:06 ` Keith Busch
2025-05-14 14:00 ` Christoph Hellwig
0 siblings, 1 reply; 38+ messages in thread
From: Keith Busch @ 2025-05-14 13:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Sagi Grimberg, Caleb Sander Mateos, Leon Romanovsky, linux-nvme
On Wed, May 14, 2025 at 07:12:21AM +0200, Christoph Hellwig wrote:
>
> Something only vaguely related: do you remember why the metadata SGL
> descriptors are limited to just the small pool? It would be nice
> to support as many metadata as data entries, which we'd almost get
> by using the larger one (we'd still be one off).
The only way I'd expect to exceed what the small pool provides is
through merging, but that feature was more of an afterthought.
My intended use case was to support zero-copy for user passthrough. The
metadata there is a virtually contiguous buffer. You'd need very large
IO in order for metadata to require more than a few segments, so the small
pool's 15 segments were sufficient. If you are doing large passthrough
IO (>2MB), then you should be using huge pages, in which case we'd still
only see one or two segments for the metadata.
But if you are trying to merge dozens of requests with metadata into
one, then yeah, the small pool isn't large enough to accommodate. If you
want to support that use case, then we can certainly change the driver
to use the large pool when more than 15 integrity segments are required.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-14 13:06 ` Keith Busch
@ 2025-05-14 14:00 ` Christoph Hellwig
0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-14 14:00 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Sagi Grimberg, Caleb Sander Mateos,
Leon Romanovsky, linux-nvme
On Wed, May 14, 2025 at 07:06:17AM -0600, Keith Busch wrote:
> But if you are trying to merge dozens of requests with metadata into
> one, then yeah, the small pool isn't large enough to accomodate. If you
> want to support that use case, then we can certainly change the driver
> to use the large pool when more than 15 integrity segments are required.
Merge is one use case. The other is to use non-contiguous buffers for file
system PI support, as a single BIO with large folios can get quite huge,
and being able to either use vmalloc or multiple pages would be very
helpful there.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 15:06 ` Keith Busch
2025-05-14 5:12 ` Christoph Hellwig
@ 2025-05-14 17:48 ` Caleb Sander Mateos
1 sibling, 0 replies; 38+ messages in thread
From: Caleb Sander Mateos @ 2025-05-14 17:48 UTC (permalink / raw)
To: Keith Busch; +Cc: Christoph Hellwig, Sagi Grimberg, Leon Romanovsky, linux-nvme
On Tue, May 13, 2025 at 8:06 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Tue, May 13, 2025 at 09:00:21AM +0200, Christoph Hellwig wrote:
> > +static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
> > + struct nvme_iod *iod)
> > +{
> > + if (iod->flags & IOD_SMALL_DESCRIPTOR)
> > + return nvmeq->descriptor_pools.small;
> > + return nvmeq->descriptor_pools.large;
> > +}
>
> ...
>
> > if (i == NVME_CTRL_PAGE_SIZE >> 3) {
> > __le64 *old_prp_list = prp_list;
> > - prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
> > +
> > + prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod),
> > + GFP_ATOMIC, &prp_dma);
>
> You could assume nvmeq->descriptor_pools.large here. We'd never use the
> small pool if we've enough elements to chain lists, and it would skip
> the extra condition branch on each loop.
Right, and the existing code already assumes this branch is taken only
in the large case: iod->descriptors[0] is always set before the loop,
but nr_descriptors is only initialized to 1 in the large case.
The existing approach of storing the struct dma_pool * in a local
variable seems fine to me and would avoid another pointer dereference
here.
Best,
Caleb
>
> The same optimization is done in the free side already, too.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations
2025-05-13 7:00 ` [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations Christoph Hellwig
` (2 preceding siblings ...)
2025-05-13 15:06 ` Keith Busch
@ 2025-05-14 17:47 ` Caleb Sander Mateos
3 siblings, 0 replies; 38+ messages in thread
From: Caleb Sander Mateos @ 2025-05-14 17:47 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, Leon Romanovsky, linux-nvme
On Tue, May 13, 2025 at 12:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Add a separate flag to encode that the transfer is using the small
> page sized pool, and use a normal 0..n count for the number of
> descriptors.
>
> Contains improvements and suggestions from Kanchan Joshi
> <joshi.k@samsung.com> and Leon Romanovsky <leonro@nvidia.com>.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/pci.c | 82 ++++++++++++++++++++---------------------
> 1 file changed, 39 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 773a2e625657..16e2ce25da83 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -229,6 +229,9 @@ struct nvme_queue {
> enum nvme_iod_flags {
> /* this command has been aborted by the timeout handler */
> IOD_ABORTED = 1U << 0,
> +
> + /* uses the small descriptor pool */
> + IOD_SMALL_DESCRIPTOR = 1U << 1,
> };
>
> /*
> @@ -238,7 +241,7 @@ struct nvme_iod {
> struct nvme_request req;
> struct nvme_command cmd;
> u8 flags;
> - s8 nr_descriptors;
> + u8 nr_descriptors;
> unsigned int dma_len; /* length of single DMA segment mapping */
> dma_addr_t first_dma;
> dma_addr_t meta_dma;
> @@ -589,13 +592,27 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
> return true;
> }
>
> -static void nvme_free_prps(struct nvme_queue *nvmeq, struct request *req)
> +static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
> + struct nvme_iod *iod)
> +{
> + if (iod->flags & IOD_SMALL_DESCRIPTOR)
> + return nvmeq->descriptor_pools.small;
> + return nvmeq->descriptor_pools.large;
> +}
> +
> +static void nvme_free_descriptors(struct nvme_queue *nvmeq, struct request *req)
Feels like this name change from "prps" to "descriptors" belongs
better in the prior commit "nvme-pci: rename the descriptor pools"?
Best,
Caleb
> {
> const int last_prp = NVME_CTRL_PAGE_SIZE / sizeof(__le64) - 1;
> struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> dma_addr_t dma_addr = iod->first_dma;
> int i;
>
> + if (iod->nr_descriptors == 1) {
> + dma_pool_free(nvme_dma_pool(nvmeq, iod), iod->descriptors[0],
> + dma_addr);
> + return;
> + }
> +
> for (i = 0; i < iod->nr_descriptors; i++) {
> __le64 *prp_list = iod->descriptors[i];
> dma_addr_t next_dma_addr = le64_to_cpu(prp_list[last_prp]);
> @@ -620,15 +637,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_queue *nvmeq,
> WARN_ON_ONCE(!iod->sgt.nents);
>
> dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
> -
> - if (iod->nr_descriptors == 0)
> - dma_pool_free(nvmeq->descriptor_pools.small,
> - iod->descriptors[0], iod->first_dma);
> - else if (iod->nr_descriptors == 1)
> - dma_pool_free(nvmeq->descriptor_pools.large,
> - iod->descriptors[0], iod->first_dma);
> - else
> - nvme_free_prps(nvmeq, req);
> + nvme_free_descriptors(nvmeq, req);
> mempool_free(iod->sgt.sgl, dev->iod_mempool);
> }
>
> @@ -650,7 +659,6 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
> struct request *req, struct nvme_rw_command *cmnd)
> {
> struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> - struct dma_pool *pool;
> int length = blk_rq_payload_bytes(req);
> struct scatterlist *sg = iod->sgt.sgl;
> int dma_len = sg_dma_len(sg);
> @@ -658,7 +666,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
> int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);
> __le64 *prp_list;
> dma_addr_t prp_dma;
> - int nprps, i;
> + int i;
>
> length -= (NVME_CTRL_PAGE_SIZE - offset);
> if (length <= 0) {
> @@ -680,27 +688,23 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
> goto done;
> }
>
> - nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
> - if (nprps <= (256 / 8)) {
> - pool = nvmeq->descriptor_pools.small;
> - iod->nr_descriptors = 0;
> - } else {
> - pool = nvmeq->descriptor_pools.large;
> - iod->nr_descriptors = 1;
> - }
> + if (DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE) <=
> + 256 / sizeof(__le64))
> + iod->flags |= IOD_SMALL_DESCRIPTOR;
>
> - prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
> - if (!prp_list) {
> - iod->nr_descriptors = -1;
> + prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
> + &prp_dma);
> + if (!prp_list)
> return BLK_STS_RESOURCE;
> - }
> - iod->descriptors[0] = prp_list;
> + iod->descriptors[iod->nr_descriptors++] = prp_list;
This is slightly more expensive than the existing code because the
compiler can't tell that iod->nr_descriptors is initially 0. How about
keeping this as iod->descriptors[0] = prp_list and setting
iod->nr_descriptors = 1 on the next line?
> iod->first_dma = prp_dma;
> i = 0;
> for (;;) {
> if (i == NVME_CTRL_PAGE_SIZE >> 3) {
> __le64 *old_prp_list = prp_list;
> - prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
> +
> + prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod),
> + GFP_ATOMIC, &prp_dma);
> if (!prp_list)
> goto free_prps;
> iod->descriptors[iod->nr_descriptors++] = prp_list;
> @@ -727,7 +731,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
> cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
> return BLK_STS_OK;
> free_prps:
> - nvme_free_prps(nvmeq, req);
> + nvme_free_descriptors(nvmeq, req);
> return BLK_STS_RESOURCE;
> bad_sgl:
> WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents),
> @@ -756,7 +760,6 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
> struct request *req, struct nvme_rw_command *cmd)
> {
> struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> - struct dma_pool *pool;
> struct nvme_sgl_desc *sg_list;
> struct scatterlist *sg = iod->sgt.sgl;
> unsigned int entries = iod->sgt.nents;
> @@ -771,21 +774,14 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
> return BLK_STS_OK;
> }
>
> - if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
> - pool = nvmeq->descriptor_pools.small;
> - iod->nr_descriptors = 0;
> - } else {
> - pool = nvmeq->descriptor_pools.large;
> - iod->nr_descriptors = 1;
> - }
> + if (entries <= 256 / sizeof(*sg_list))
> + iod->flags |= IOD_SMALL_DESCRIPTOR;
>
> - sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
> - if (!sg_list) {
> - iod->nr_descriptors = -1;
> + sg_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
> + &sgl_dma);
> + if (!sg_list)
> return BLK_STS_RESOURCE;
> - }
> -
> - iod->descriptors[0] = sg_list;
> + iod->descriptors[iod->nr_descriptors++] = sg_list;
> iod->first_dma = sgl_dma;
>
> nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, entries);
> @@ -982,7 +978,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
> blk_status_t ret;
>
> iod->flags = 0;
> - iod->nr_descriptors = -1;
> + iod->nr_descriptors = 0;
> iod->sgt.nents = 0;
> iod->meta_sgt.nents = 0;
>
> --
> 2.47.2
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 6/7] nvme-pci: add a symbolic name for the small pool size
2025-05-13 7:00 misc cleanups for nvme-pci Christoph Hellwig
` (4 preceding siblings ...)
2025-05-13 7:00 ` [PATCH 5/7] nvme-pci: use a better encoding for small prp pool allocations Christoph Hellwig
@ 2025-05-13 7:00 ` Christoph Hellwig
2025-05-13 14:51 ` Kanchan Joshi
2025-05-14 18:24 ` Caleb Sander Mateos
2025-05-13 7:00 ` [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev Christoph Hellwig
2025-05-13 15:34 ` misc cleanups for nvme-pci Keith Busch
7 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:00 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
From: Leon Romanovsky <leon@kernel.org>
Open coding magic numbers in multiple places is never a good idea.
Signed-off-by: Leon Romanovsky <leon@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 16e2ce25da83..5396fe30eb94 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -37,6 +37,8 @@
#define SGES_PER_PAGE (NVME_CTRL_PAGE_SIZE / sizeof(struct nvme_sgl_desc))
+#define NVME_SMALL_POOL_SIZE 256
+
/*
* These can be higher, but we need to ensure that any command doesn't
* require an sg allocation that needs more than a page of data.
@@ -407,7 +409,7 @@ static struct nvme_descriptor_pools *
nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
{
struct nvme_descriptor_pools *pools = &dev->descriptor_pools[numa_node];
- size_t small_align = 256;
+ size_t small_align = NVME_SMALL_POOL_SIZE;
if (pools->small)
return pools; /* already initialized */
@@ -422,7 +424,7 @@ nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
/* Optimisation for I/Os between 4k and 128k */
pools->small = dma_pool_create_node("nvme descriptor 256", dev->dev,
- 256, small_align, 0, numa_node);
+ NVME_SMALL_POOL_SIZE, small_align, 0, numa_node);
if (!pools->small) {
dma_pool_destroy(pools->large);
pools->large = NULL;
@@ -689,7 +691,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
}
if (DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE) <=
- 256 / sizeof(__le64))
+ NVME_SMALL_POOL_SIZE / sizeof(__le64))
iod->flags |= IOD_SMALL_DESCRIPTOR;
prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
@@ -774,7 +776,7 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
return BLK_STS_OK;
}
- if (entries <= 256 / sizeof(*sg_list))
+ if (entries <= NVME_SMALL_POOL_SIZE / sizeof(*sg_list))
iod->flags |= IOD_SMALL_DESCRIPTOR;
sg_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
--
2.47.2
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 6/7] nvme-pci: add a symbolic name for the small pool size
2025-05-13 7:00 ` [PATCH 6/7] nvme-pci: add a symbolic name for the small pool size Christoph Hellwig
@ 2025-05-13 14:51 ` Kanchan Joshi
2025-05-14 18:24 ` Caleb Sander Mateos
1 sibling, 0 replies; 38+ messages in thread
From: Kanchan Joshi @ 2025-05-13 14:51 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 6/7] nvme-pci: add a symbolic name for the small pool size
2025-05-13 7:00 ` [PATCH 6/7] nvme-pci: add a symbolic name for the small pool size Christoph Hellwig
2025-05-13 14:51 ` Kanchan Joshi
@ 2025-05-14 18:24 ` Caleb Sander Mateos
1 sibling, 0 replies; 38+ messages in thread
From: Caleb Sander Mateos @ 2025-05-14 18:24 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, Leon Romanovsky, linux-nvme
On Tue, May 13, 2025 at 12:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> From: Leon Romanovsky <leon@kernel.org>
>
> Open coding magic numbers in multiple places is never a good idea.
>
> Signed-off-by: Leon Romanovsky <leon@kernel.org>
> [hch: split from a larger patch]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/pci.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 16e2ce25da83..5396fe30eb94 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -37,6 +37,8 @@
>
> #define SGES_PER_PAGE (NVME_CTRL_PAGE_SIZE / sizeof(struct nvme_sgl_desc))
>
> +#define NVME_SMALL_POOL_SIZE 256
> +
> /*
> * These can be higher, but we need to ensure that any command doesn't
> * require an sg allocation that needs more than a page of data.
> @@ -407,7 +409,7 @@ static struct nvme_descriptor_pools *
> nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
> {
> struct nvme_descriptor_pools *pools = &dev->descriptor_pools[numa_node];
> - size_t small_align = 256;
> + size_t small_align = NVME_SMALL_POOL_SIZE;
>
> if (pools->small)
> return pools; /* already initialized */
> @@ -422,7 +424,7 @@ nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
>
> /* Optimisation for I/Os between 4k and 128k */
This comment explains the choice of 256. Maybe it would make sense to
move it to the definition of NVME_SMALL_POOL_SIZE?
> pools->small = dma_pool_create_node("nvme descriptor 256", dev->dev,
The magic number is used in this string too. Maybe change "256" to "small"?
Other than that,
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
> - 256, small_align, 0, numa_node);
> + NVME_SMALL_POOL_SIZE, small_align, 0, numa_node);
> if (!pools->small) {
> dma_pool_destroy(pools->large);
> pools->large = NULL;
> @@ -689,7 +691,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
> }
>
> if (DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE) <=
> - 256 / sizeof(__le64))
> + NVME_SMALL_POOL_SIZE / sizeof(__le64))
> iod->flags |= IOD_SMALL_DESCRIPTOR;
>
> prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
> @@ -774,7 +776,7 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
> return BLK_STS_OK;
> }
>
> - if (entries <= 256 / sizeof(*sg_list))
> + if (entries <= NVME_SMALL_POOL_SIZE / sizeof(*sg_list))
> iod->flags |= IOD_SMALL_DESCRIPTOR;
>
> sg_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
> --
> 2.47.2
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev
2025-05-13 7:00 misc cleanups for nvme-pci Christoph Hellwig
` (5 preceding siblings ...)
2025-05-13 7:00 ` [PATCH 6/7] nvme-pci: add a symbolic name for the small pool size Christoph Hellwig
@ 2025-05-13 7:00 ` Christoph Hellwig
2025-05-13 7:20 ` Leon Romanovsky
` (2 more replies)
2025-05-13 15:34 ` misc cleanups for nvme-pci Keith Busch
7 siblings, 3 replies; 38+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:00 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
This avoid open coding the variable size array arithmetics.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5396fe30eb94..0efbc9329291 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3212,8 +3212,8 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
struct nvme_dev *dev;
int ret = -ENOMEM;
- dev = kzalloc_node(sizeof(*dev) + nr_node_ids *
- sizeof(*dev->descriptor_pools), GFP_KERNEL, node);
+ dev = kzalloc_node(struct_size(dev, descriptor_pools, nr_node_ids),
+ GFP_KERNEL, node);
if (!dev)
return ERR_PTR(-ENOMEM);
INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
--
2.47.2
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev
2025-05-13 7:00 ` [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev Christoph Hellwig
@ 2025-05-13 7:20 ` Leon Romanovsky
2025-05-13 14:54 ` Kanchan Joshi
2025-05-14 18:28 ` Caleb Sander Mateos
2 siblings, 0 replies; 38+ messages in thread
From: Leon Romanovsky @ 2025-05-13 7:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Caleb Sander Mateos, linux-nvme
On Tue, May 13, 2025 at 09:00:23AM +0200, Christoph Hellwig wrote:
> This avoid open coding the variable size array arithmetics.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/pci.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
Thanks,
Reviewed-by: Leon Romanovsky <leon@kernel.org>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev
2025-05-13 7:00 ` [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev Christoph Hellwig
2025-05-13 7:20 ` Leon Romanovsky
@ 2025-05-13 14:54 ` Kanchan Joshi/Kanchan Joshi
2025-05-14 18:28 ` Caleb Sander Mateos
2 siblings, 0 replies; 38+ messages in thread
From: Kanchan Joshi @ 2025-05-13 14:54 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch, Sagi Grimberg
Cc: Caleb Sander Mateos, Leon Romanovsky, linux-nvme
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev
2025-05-13 7:00 ` [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev Christoph Hellwig
2025-05-13 7:20 ` Leon Romanovsky
2025-05-13 14:54 ` Kanchan Joshi
@ 2025-05-14 18:28 ` Caleb Sander Mateos
2 siblings, 0 replies; 38+ messages in thread
From: Caleb Sander Mateos @ 2025-05-14 18:28 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, Leon Romanovsky, linux-nvme
On Tue, May 13, 2025 at 12:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> This avoid open coding the variable size array arithmetics.
typo: "avoid" -> "avoids"?
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
> ---
> drivers/nvme/host/pci.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 5396fe30eb94..0efbc9329291 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3212,8 +3212,8 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
> struct nvme_dev *dev;
> int ret = -ENOMEM;
>
> - dev = kzalloc_node(sizeof(*dev) + nr_node_ids *
> - sizeof(*dev->descriptor_pools), GFP_KERNEL, node);
> + dev = kzalloc_node(struct_size(dev, descriptor_pools, nr_node_ids),
> + GFP_KERNEL, node);
> if (!dev)
> return ERR_PTR(-ENOMEM);
> INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
> --
> 2.47.2
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: misc cleanups for nvme-pci
2025-05-13 7:00 misc cleanups for nvme-pci Christoph Hellwig
` (6 preceding siblings ...)
2025-05-13 7:00 ` [PATCH 7/7] nvme-pci: use struct_size for allocation struct nvme_dev Christoph Hellwig
@ 2025-05-13 15:34 ` Keith Busch
7 siblings, 0 replies; 38+ messages in thread
From: Keith Busch @ 2025-05-13 15:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Sagi Grimberg, Caleb Sander Mateos, Leon Romanovsky, linux-nvme
On Tue, May 13, 2025 at 09:00:16AM +0200, Christoph Hellwig wrote:
> this has misc cleanups for the nvme-pci I/O code. Most of this was done
> as part of the new DMA API work. Which then clashed with the per-node
> dmapools, so I'll try to get it out now instead of having to do another
> rebase. As part of that this also includes minor tidyups for the new code
> from the per-node dmapools.
Series looks good. Just a minor optimization suggestion from me for
patch 5, and commit log issue Kanchan mentioned for patch 2.
Reviewed-by: Keith Busch <kbusch@kernel.org>
^ permalink raw reply [flat|nested] 38+ messages in thread