Linux-NVME Archive on lore.kernel.org
* [PATCH] nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools
@ 2026-05-22 15:06 Mateusz Nowicki
  2026-05-22 15:27 ` Caleb Sander Mateos
  2026-05-23  8:28 ` [PATCH v2] " Mateusz Nowicki
  0 siblings, 2 replies; 6+ messages in thread
From: Mateusz Nowicki @ 2026-05-22 15:06 UTC
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: linux-nvme, linux-kernel

nvme_setup_descriptor_pools() indexes dev->descriptor_pools[] using the
numa_node forwarded from hctx->numa_node by its single caller,
nvme_init_hctx_common().  On a non-NUMA kernel hctx->numa_node is
NUMA_NO_NODE (-1).  Because the parameter was declared 'unsigned', the
value becomes UINT_MAX and the index walks off the array (sized to
nr_node_ids), faulting during nvme_alloc_ns() and leaving the namespace
without a /dev node.
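
The conversion described above can be reproduced in user space. This is
only a sketch: NUMA_NO_NODE is redefined locally with the value quoted in
this message, and as_unsigned_index() is a hypothetical helper that stands
in for passing the node through an 'unsigned' parameter. It is not code
from the driver.

```c
#include <limits.h>

/* Local stand-in; the commit message gives NUMA_NO_NODE as -1. */
#define NUMA_NO_NODE (-1)

/*
 * Hypothetical helper: models a signed node id arriving in an
 * 'unsigned' parameter. C converts -1 modulo UINT_MAX + 1, so the
 * result is UINT_MAX, far past the end of any per-node array.
 */
static unsigned int as_unsigned_index(int node)
{
	return (unsigned int)node;
}
```

Under these assumptions as_unsigned_index(NUMA_NO_NODE) equals UINT_MAX,
while a valid node id such as 0 passes through unchanged.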

Reproduces on any NVMe controller probed by a CONFIG_NUMA=n kernel:

  BUG: unable to handle page fault for address: ffff889101603d38
  RIP: 0010:nvme_init_hctx_common+0x5a/0x190 [nvme]
  Call Trace:
   nvme_init_hctx+0x10/0x20 [nvme]
   nvme_alloc_ns+0x9e/0xa10 [nvme_core]
   nvme_scan_ns+0x301/0x3b0 [nvme_core]
   nvme_scan_ns_async+0x23/0x30 [nvme_core]

Switch the parameter to int and fall back to node 0 for negative or
out-of-range values; node 0 is always present.

Signed-off-by: Mateusz Nowicki <mateusz.nowicki@posteo.net>
---
 drivers/nvme/host/pci.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9fd04cd7c5cb..ecec0f9cff98 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -587,11 +587,17 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
 }
 
 static struct nvme_descriptor_pools *
-nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
+nvme_setup_descriptor_pools(struct nvme_dev *dev, int numa_node)
 {
-	struct nvme_descriptor_pools *pools = &dev->descriptor_pools[numa_node];
+	struct nvme_descriptor_pools *pools;
 	size_t small_align = NVME_SMALL_POOL_SIZE;
 
+	/* hctx->numa_node may be NUMA_NO_NODE; fall back to node 0. */
+	if (numa_node < 0 || numa_node >= nr_node_ids)
+		numa_node = 0;
+
+	pools = &dev->descriptor_pools[numa_node];
+
 	if (pools->small)
 		return pools; /* already initialized */
 
-- 
2.53.0




* Re: [PATCH] nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools
  2026-05-22 15:06 [PATCH] nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools Mateusz Nowicki
@ 2026-05-22 15:27 ` Caleb Sander Mateos
  2026-05-23  8:17   ` mateusz.nowicki
  2026-05-23  8:28 ` [PATCH v2] " Mateusz Nowicki
  1 sibling, 1 reply; 6+ messages in thread
From: Caleb Sander Mateos @ 2026-05-22 15:27 UTC
  To: Mateusz Nowicki
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, linux-kernel

On Fri, May 22, 2026 at 8:07 AM Mateusz Nowicki
<mateusz.nowicki@posteo.net> wrote:
>
> nvme_setup_descriptor_pools() indexes dev->descriptor_pools[] using the
> numa_node forwarded from hctx->numa_node by its single caller,
> nvme_init_hctx_common().  On a non-NUMA kernel hctx->numa_node is
> NUMA_NO_NODE (-1).  Because the parameter was declared 'unsigned', the
> value becomes UINT_MAX and the index walks off the array (sized to
> nr_node_ids), faulting during nvme_alloc_ns() and leaving the namespace
> without a /dev node.

FYI there was a previous report and patch for this issue:
https://lore.kernel.org/linux-nvme/20260309062840.2937858-2-iam@sung-woo.kim/T/#u
. Looks like a v2 was promised but never arrived. Some attribution
(Reported-by, Link?) for the original patch might be good.

I did like that the other patch switched the type of struct
blk_mq_hw_ctx's numa_node field and the argument to struct
blk_mq_ops's init_request function pointer from unsigned int to int to
clarify that it was optional. But probably makes sense to do that as a
follow-on commit separate from the bug fix.

>
> Reproduces on any NVMe controller probed by a CONFIG_NUMA=n kernel
>
>   BUG: unable to handle page fault for address: ffff889101603d38
>   RIP: 0010:nvme_init_hctx_common+0x5a/0x190 [nvme]
>   Call Trace:
>    nvme_init_hctx+0x10/0x20 [nvme]
>    nvme_alloc_ns+0x9e/0xa10 [nvme_core]
>    nvme_scan_ns+0x301/0x3b0 [nvme_core]
>    nvme_scan_ns_async+0x23/0x30 [nvme_core]
>
> Switch the parameter to int and fall back to node 0 for negative or
> out-of-range values; node 0 is always present.
>
> Signed-off-by: Mateusz Nowicki <mateusz.nowicki@posteo.net>
> ---
>  drivers/nvme/host/pci.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 9fd04cd7c5cb..ecec0f9cff98 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -587,11 +587,17 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
>  }
>
>  static struct nvme_descriptor_pools *
> -nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
> +nvme_setup_descriptor_pools(struct nvme_dev *dev, int numa_node)
>  {
> -       struct nvme_descriptor_pools *pools = &dev->descriptor_pools[numa_node];
> +       struct nvme_descriptor_pools *pools;
>         size_t small_align = NVME_SMALL_POOL_SIZE;
>
> +       /* hctx->numa_node may be NUMA_NO_NODE; fall back to node 0. */
> +       if (numa_node < 0 || numa_node >= nr_node_ids)

Is numa_node >= nr_node_ids possible? I think just numa_node < 0
should be fine, and would avoid a compiler warning about comparing int
to unsigned int.
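
A user-space sketch of why the mixed comparison draws a warning: when an
int is compared with an unsigned int, C first converts the int to
unsigned. The helper name exceeds_node_count() and the count of 4 used
below are made up for illustration; the explicit cast only spells out the
conversion the compiler would otherwise apply implicitly.

```c
/*
 * Hypothetical helper: compares a signed node id against an unsigned
 * node count the way "numa_node >= nr_node_ids" would be evaluated.
 * The cast makes the usual arithmetic conversion visible.
 */
static int exceeds_node_count(int node, unsigned int nr_ids)
{
	return (unsigned int)node >= nr_ids;
}
```

With a count of 4, exceeds_node_count(-1, 4) is true because -1 converts
to a very large unsigned value, exceeds_node_count(2, 4) is false, and
exceeds_node_count(4, 4) is true.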

Best,
Caleb

> +               numa_node = 0;
> +
> +       pools = &dev->descriptor_pools[numa_node];
> +
>         if (pools->small)
>                 return pools; /* already initialized */
>
> --
> 2.53.0
>
>



* Re: [PATCH] nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools
  2026-05-22 15:27 ` Caleb Sander Mateos
@ 2026-05-23  8:17   ` mateusz.nowicki
  0 siblings, 0 replies; 6+ messages in thread
From: mateusz.nowicki @ 2026-05-23  8:17 UTC
  To: Caleb Sander Mateos
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, linux-kernel

On 22.05.2026 17:27, Caleb Sander Mateos wrote:

> FYI there was a previous report and patch for this issue:
> https://lore.kernel.org/linux-nvme/20260309062840.2937858-2-iam@sung-woo.kim/T/#u
> . Looks like a v2 was promised but never arrived. Some attribution
> (Reported-by, Link?) for the original patch might be good.
> 
Thanks, missed that thread. Added in v2:
    Reported-by: Sung-woo Kim <iam@sung-woo.kim>
    Link: https://lore.kernel.org/r/20260309062840.2937858-2-iam@sung-woo.kim

Also added a Fixes: tag for d977506f8863.

> I did like that the other patch switched the type of struct
> blk_mq_hw_ctx's numa_node field and the argument to struct
> blk_mq_ops's init_request function pointer from unsigned int to int to
> clarify that it was optional. But probably makes sense to do that as a
> follow-on commit separate from the bug fix.
> 
I can take care of it and will send it as a separate follow-up.

> Is numa_node >= nr_node_ids possible? I think just numa_node < 0
> should be fine, and would avoid a compiler warning about comparing int
> to unsigned int.
> 

Right, dropped the nr_node_ids check. I went with == NUMA_NO_NODE
rather than < 0 to match the style in block/blk-mq.c.
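
For reference, the fallback can be modelled in user space roughly as
follows. This is a sketch under stated assumptions, not the driver code:
struct pools, NR_NODES, effective_node() and lookup_pools() are
hypothetical names, and NUMA_NO_NODE is redefined locally as -1.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)	/* local stand-in for the kernel constant */
#define NR_NODES 2		/* hypothetical node count for this sketch */

struct pools {
	int initialized;
};

static struct pools descriptor_pools[NR_NODES];

/* Map "no node" to node 0; every other value passes through unchanged. */
static int effective_node(int numa_node)
{
	if (numa_node == NUMA_NO_NODE)
		numa_node = 0;
	return numa_node;
}

/* Index the per-node array only after the fallback has been applied. */
static struct pools *lookup_pools(int numa_node)
{
	int node = effective_node(numa_node);

	assert(node >= 0 && node < NR_NODES);
	return &descriptor_pools[node];
}
```

Here lookup_pools(NUMA_NO_NODE) returns the entry for node 0, and a valid
node id selects its own entry.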

v2 incoming.

Thanks,
Mateusz



* [PATCH v2] nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools
  2026-05-22 15:06 [PATCH] nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools Mateusz Nowicki
  2026-05-22 15:27 ` Caleb Sander Mateos
@ 2026-05-23  8:28 ` Mateusz Nowicki
  2026-05-25  5:56   ` Christoph Hellwig
  2026-05-27 16:34   ` Keith Busch
  1 sibling, 2 replies; 6+ messages in thread
From: Mateusz Nowicki @ 2026-05-23  8:28 UTC
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Caleb Sander Mateos, Sung-woo Kim, linux-nvme, linux-kernel

nvme_setup_descriptor_pools() indexes dev->descriptor_pools[] using the
numa_node forwarded from hctx->numa_node by its single caller,
nvme_init_hctx_common().  On a non-NUMA kernel hctx->numa_node is
NUMA_NO_NODE (-1).  Because the parameter was declared 'unsigned', the
value becomes UINT_MAX and the index walks off the array (sized to
nr_node_ids), faulting during nvme_alloc_ns() and leaving the namespace
without a /dev node.

Reproduces on any NVMe controller probed by a CONFIG_NUMA=n kernel:

  BUG: unable to handle page fault for address: ffff889101603d38
  RIP: 0010:nvme_init_hctx_common+0x5a/0x190 [nvme]
  Call Trace:
   nvme_init_hctx+0x10/0x20 [nvme]
   nvme_alloc_ns+0x9e/0xa10 [nvme_core]
   nvme_scan_ns+0x301/0x3b0 [nvme_core]
   nvme_scan_ns_async+0x23/0x30 [nvme_core]

Switch the parameter to int and fall back to node 0 when it is
NUMA_NO_NODE; node 0 is always present.

Fixes: d977506f8863 ("nvme-pci: make PRP list DMA pools per-NUMA-node")
Reported-by: Sung-woo Kim <iam@sung-woo.kim>
Link: https://lore.kernel.org/r/20260309062840.2937858-2-iam@sung-woo.kim
Signed-off-by: Mateusz Nowicki <mateusz.nowicki@posteo.net>
---
v2:
 - drop the (numa_node >= nr_node_ids) check: cpu_to_node() never returns
   that in practice, so NUMA_NO_NODE is the only out-of-range value worth
   guarding against. (Caleb Sander Mateos)
 - test against NUMA_NO_NODE explicitly instead of (numa_node < 0)
 - add Fixes: tag, Reported-by/Link to Sung-woo Kim's earlier report.

 drivers/nvme/host/pci.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9fd04cd7c5cb..9815823c974e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -587,11 +587,16 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
 }
 
 static struct nvme_descriptor_pools *
-nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
+nvme_setup_descriptor_pools(struct nvme_dev *dev, int numa_node)
 {
-	struct nvme_descriptor_pools *pools = &dev->descriptor_pools[numa_node];
+	struct nvme_descriptor_pools *pools;
 	size_t small_align = NVME_SMALL_POOL_SIZE;
 
+	if (numa_node == NUMA_NO_NODE)
+		numa_node = 0;
+
+	pools = &dev->descriptor_pools[numa_node];
+
 	if (pools->small)
 		return pools; /* already initialized */
 
-- 
2.53.0




* Re: [PATCH v2] nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools
  2026-05-23  8:28 ` [PATCH v2] " Mateusz Nowicki
@ 2026-05-25  5:56   ` Christoph Hellwig
  2026-05-27 16:34   ` Keith Busch
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2026-05-25  5:56 UTC
  To: Mateusz Nowicki
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Caleb Sander Mateos, Sung-woo Kim, linux-nvme, linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH v2] nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools
  2026-05-23  8:28 ` [PATCH v2] " Mateusz Nowicki
  2026-05-25  5:56   ` Christoph Hellwig
@ 2026-05-27 16:34   ` Keith Busch
  1 sibling, 0 replies; 6+ messages in thread
From: Keith Busch @ 2026-05-27 16:34 UTC
  To: Mateusz Nowicki
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Caleb Sander Mateos,
	Sung-woo Kim, linux-nvme, linux-kernel

On Sat, May 23, 2026 at 08:28:16AM +0000, Mateusz Nowicki wrote:
> nvme_setup_descriptor_pools() indexes dev->descriptor_pools[] using the
> numa_node forwarded from hctx->numa_node by its single caller,
> nvme_init_hctx_common().  On a non-NUMA kernel hctx->numa_node is
> NUMA_NO_NODE (-1).  Because the parameter was declared 'unsigned', the
> value becomes UINT_MAX and the index walks off the array (sized to
> nr_node_ids), faulting during nvme_alloc_ns() and leaving the namespace
> without a /dev node.

Thanks, applied to nvme-7.2.


