[REPORT] nvmet-rdma: integer overflow in inline-data SGL bounds check -> pre-auth kernel-memory read + remote crash (candidate patch inline)

Linux block layer

* [REPORT] nvmet-rdma: integer overflow in inline-data SGL bounds check -> pre-auth kernel-memory read + remote crash (candidate patch inline)
@ 2026-05-29  6:52 hexlabsecurity
  2026-05-29 16:09 ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: hexlabsecurity @ 2026-05-29  6:52 UTC
  To: security@kernel.org
  Cc: hch@lst.de, sagi@grimberg.me, kbusch@kernel.org, kch@nvidia.com,
	linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	linux-block@vger.kernel.org

Hello,

I would like to report an integer-overflow vulnerability in the NVMe-oF
RDMA target (drivers/nvme/target/rdma.c).  The inline-data SGL bounds
check in nvmet_rdma_map_sgl_inline() is computed in u64 over two
host-controlled values and wraps, which a remote fabric peer can use
both to read kernel memory back over the fabric and to crash the target.

== Affected ==

  drivers/nvme/target/rdma.c, nvmet_rdma_map_sgl_inline()

  Verified present on the current mainline tree (commit 27fa82620cba,
  ~v7.1-rc5), at the bounds check:

    static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
    {
        struct nvme_sgl_desc *sgl = &rsp->req.cmd->common.dptr.sgl;
        u64 off = le64_to_cpu(sgl->addr);     /* host-controlled, 64-bit */
        u32 len = le32_to_cpu(sgl->length);   /* host-controlled, 32-bit */
        ...
        if (off + len > rsp->queue->dev->inline_data_size) {   /* u64 wrap */
            pr_err("invalid inline data offset!\n");
            return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
        }
        ...
        nvmet_rdma_use_inline_sg(rsp, len, off);
    }

  "off + len" is evaluated in u64 and wraps modulo 2^64.  For example
  addr = 0xfffffffffffffe00, length = 0x1000 makes the sum wrap to
  0xe00, which is <= inline_data_size (default PAGE_SIZE), so the check
  passes.  The current form of the check (against the per-port
  inline_data_size), the fixed-size inline_sg[NVMET_RDMA_MAX_INLINE_SGE]
  array and the num_pages(len) loop were all introduced by commit
  0d5ee2b2ab4f ("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data"),
  which is the commit named in the Fixes: tag.  Note: the single-page
  inline path that predates that commit may have an analogous u64-overflow
  read in a different form; I would appreciate the maintainers' judgement
  on whether the stable backport scope should reach before that commit.
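  A minimal userspace sketch of the pre-fix comparison makes the wrap
  concrete.  It is a model, not kernel code: wrapped_sum() and
  old_check_passes() are hypothetical helpers, uint64_t/uint32_t stand
  in for u64/u32, and 4096 assumes the default inline_data_size on a
  4 KiB-page system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the pre-fix test in
 * nvmet_rdma_map_sgl_inline():
 *     if (off + len > inline_data_size) reject;
 * uint64_t/uint32_t stand in for the kernel's u64/u32.
 */
uint64_t wrapped_sum(uint64_t off, uint32_t len)
{
	/* len is converted to uint64_t and the sum wraps modulo 2^64. */
	return off + len;
}

bool old_check_passes(uint64_t off, uint32_t len, uint64_t inline_data_size)
{
	return !(wrapped_sum(off, len) > inline_data_size);
}

/* Re-checks the two capsules quoted in the report (4 KiB port). */
void check_report_capsules(void)
{
	/* 0xfffffffffffffe00 + 0x1000 wraps to 0xe00 <= 4096. */
	assert(wrapped_sum(0xfffffffffffffe00ULL, 0x1000) == 0xe00);
	assert(old_check_passes(0xfffffffffffffe00ULL, 0x1000, 4096));

	/* 0xffffffffffff0500 + 0x10000 wraps to 0x500 <= 4096. */
	assert(wrapped_sum(0xffffffffffff0500ULL, 0x10000) == 0x500);
	assert(old_check_passes(0xffffffffffff0500ULL, 0x10000, 4096));

	/* A non-wrapping oversize request is still rejected. */
	assert(!old_check_passes(0, 0x1001, 4096));
}
```

  Calling check_report_capsules() from a test main() re-checks both
  wrapped sums quoted in this report.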

== Two consequences of the bypass ==

  1. Kernel-memory read (information disclosure).
     nvmet_rdma_use_inline_sg() does "sg->offset = off", truncating the
     64-bit offset to scatterlist::offset (unsigned int).  The block
     layer then accesses page_to_phys(inline_page) + (off & 0xffffffff),
     so the target reads up to inline_data_size bytes of kernel memory
     per write command and returns them to the host on read-back, or
     faults the in-kernel copy if the offset lands on unmapped memory.

  2. Kernel-memory corruption -> remote crash (denial of service).
     A large length makes "sg_count = num_pages(len)" in
     nvmet_rdma_use_inline_sg() exceed NVMET_RDMA_MAX_INLINE_SGE (4), so
     the loop writes scatterlist entries past the fixed-size inline_sg[]
     array, corrupting the surrounding command object.
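  The arithmetic behind both consequences can be modelled in a short
  userspace sketch.  The helpers are hypothetical, 4 KiB pages are
  assumed, and model_num_pages() is a plain ceiling division standing in
  for the driver's num_pages(); the constant 4 is the
  NVMET_RDMA_MAX_INLINE_SGE value quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */
#define MODEL_MAX_INLINE_SGE	4u	/* NVMET_RDMA_MAX_INLINE_SGE, per the report */

/* Consequence 1: the value that lands in scatterlist::offset (unsigned int). */
uint32_t truncated_sg_offset(uint64_t off)
{
	return (uint32_t)off;	/* keeps off & 0xffffffff */
}

/* Ceiling division; models num_pages(len) for len >= 1. */
uint64_t model_num_pages(uint32_t len)
{
	return ((uint64_t)len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Consequence 2: entries the pre-fix loop writes past the end of inline_sg[]. */
uint64_t entries_past_inline_sg(uint32_t len)
{
	uint64_t n = model_num_pages(len);

	return n > MODEL_MAX_INLINE_SGE ? n - MODEL_MAX_INLINE_SGE : 0;
}

void check_report_numbers(void)
{
	/* Read capsule: offset truncates to 0xfffffe00, one entry, no overrun. */
	assert(truncated_sg_offset(0xfffffffffffffe00ULL) == 0xfffffe00u);
	assert(model_num_pages(0x1000) == 1);
	assert(entries_past_inline_sg(0x1000) == 0);

	/* Crash capsule: 16 entries into a 4-entry array, 12 past its end. */
	assert(model_num_pages(0x10000) == 16);
	assert(entries_past_inline_sg(0x10000) == 12);
}
```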

== Reachability ==

  The path is reached by any write command carrying an inline SGL, which
  a host can issue as soon as a Fabrics Connect has succeeded.  On a
  subsystem configured with attr_allow_any_host=1 it is reachable WITHOUT
  authentication by any RDMA peer (RoCE/iWARP/IB) that can reach the
  target's listener.  With attr_allow_any_host=0, or with DH-CHAP
  configured, the peer must first connect with a valid, allowed host NQN
  (and, for DH-CHAP, complete authentication).

== Empirical reproduction ==

  Reproduced against a stock nvmet-rdma target over a soft-iWARP (siw)
  fabric on a Linux 6.12.90 build with KASAN (KASAN_INLINE):

  - Read: a single write command with addr = 0xfffffffffffffe00,
    length = 0x1000 produced a KASAN out-of-bounds read and returned
    ~4 KiB of kernel memory (including kernel .text) into the
    attacker-readable namespace.

  - Crash: a write command with addr = 0xffffffffffff0500,
    length = 0x10000 (sum wraps to 0x500 <= inline_data_size, but
    num_pages(0x10000) = 16 writes 16 scatterlist entries into the
    4-entry inline_sg[], 12 past its end) deterministically corrupted
    the command object and oopsed the target:

      Oops: general protection fault [...] KASAN: null-ptr-deref
      RIP: nvmet_rdma_post_recv+0x... [nvmet_rdma]
        nvmet_rdma_post_recv <- nvmet_rdma_queue_response
        <- __nvmet_req_complete <- nvmet_check_transfer_len
        <- nvmet_rdma_handle_command <- ib_cq_poll_work

    Every reconnect re-triggers it (persistent remote DoS).  The
    nvmet_rdma_cmd objects are carved from one contiguous kcalloc'd
    array, so the out-of-range entry writes stay within that allocation
    and KASAN flags the downstream dereference of the corrupted command
    in nvmet_rdma_post_recv rather than the stores themselves.  The
    out-of-bounds content is not attacker-controlled, so this is a
    crash/corruption primitive, not a controlled write; I do not see a
    path to remote code execution from this bug.

  Severity estimate.  The two consequences arise from different inline-SGL
  capsules (small vs large length) and are scored as separate single-capsule
  outcomes, not one combined vector:

    OOB read  (info-disclosure):  CVSS 7.5 HIGH
        CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
    OOB write (corruption/DoS):   CVSS 8.2 HIGH
        CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:H

  Headline 8.2 HIGH (both reachable pre-auth with attr_allow_any_host=1).
  With attr_allow_any_host=0 a valid host NQN is required first (PR:L),
  lowering these to 6.5 and 7.1.
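  The four scores can be re-derived from the CVSS v3.1 base equations
  for Scope:Unchanged.  This is a sketch: cvss31_base_tenths() is a
  hypothetical helper, the metric weights are those published in the
  FIRST CVSS v3.1 specification, and Roundup uses the integer
  formulation the specification recommends.  Scores are returned in
  tenths, so 75 means 7.5.

```c
#include <assert.h>

/* CVSS v3.1 metric weights used by the report's vectors. */
#define AV_N	0.85
#define AC_L	0.77
#define PR_N	0.85	/* Privileges Required, Scope Unchanged */
#define PR_L	0.62	/* Privileges Required, Scope Unchanged */
#define UI_N	0.85
#define CIA_H	0.56
#define CIA_L	0.22
#define CIA_N	0.0

/*
 * Base score in tenths for Scope:Unchanged vectors.  Roundup follows
 * the v3.1 integer formulation to avoid floating-point artefacts.
 */
int cvss31_base_tenths(double av, double ac, double pr, double ui,
		       double c, double i, double a)
{
	double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
	double impact = 6.42 * iss;
	double exploitability = 8.22 * av * ac * pr * ui;
	double raw;
	long scaled;

	if (impact <= 0.0)
		return 0;

	raw = impact + exploitability;
	if (raw > 10.0)
		raw = 10.0;

	/* Roundup: smallest one-decimal value >= raw. */
	scaled = (long)(raw * 100000.0 + 0.5);
	if (scaled % 10000 == 0)
		return (int)(scaled / 10000);
	return (int)(scaled / 10000 + 1);
}

void check_report_scores(void)
{
	/* C:H/I:N/A:N and C:N/I:L/A:H, first pre-auth (PR:N), then PR:L. */
	assert(cvss31_base_tenths(AV_N, AC_L, PR_N, UI_N, CIA_H, CIA_N, CIA_N) == 75);
	assert(cvss31_base_tenths(AV_N, AC_L, PR_N, UI_N, CIA_N, CIA_L, CIA_H) == 82);
	assert(cvss31_base_tenths(AV_N, AC_L, PR_L, UI_N, CIA_H, CIA_N, CIA_N) == 65);
	assert(cvss31_base_tenths(AV_N, AC_L, PR_L, UI_N, CIA_N, CIA_L, CIA_H) == 71);
}
```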

== Suggested fix ==

  Validate the offset with check_add_overflow() before comparing against
  inline_data_size.  A passing check then guarantees
  off + len <= inline_data_size <= NVMET_RDMA_MAX_INLINE_DATA_SIZE, which
  bounds both the truncated scatterlist::offset and
  num_pages(len) <= NVMET_RDMA_MAX_INLINE_SGE, closing the read and the
  inline_sg[] overflow together.  Candidate patch inline below (applies
  to current mainline).
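  A userspace sketch of the proposed check, assuming GCC or Clang:
  __builtin_add_overflow() is the compiler builtin that check_add_overflow()
  in include/linux/overflow.h is built on in recent kernels, and
  fixed_check_passes() is a hypothetical name.  It models only the
  addition; as the later replies in this thread show, passing this bound
  does not by itself keep the offset inside the first inline page when a
  port's inline_data_size exceeds PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the candidate fix (GCC/Clang builtin). */
bool fixed_check_passes(uint64_t off, uint32_t len, uint64_t inline_data_size)
{
	uint64_t bound;

	/* Like check_add_overflow(off, (u64)len, &bound): true if the sum wrapped. */
	if (__builtin_add_overflow(off, (uint64_t)len, &bound))
		return false;
	return bound <= inline_data_size;
}

void check_fix_rejects_capsules(void)
{
	/* Both report capsules now fail the check on a 4 KiB port. */
	assert(!fixed_check_passes(0xfffffffffffffe00ULL, 0x1000, 4096));
	assert(!fixed_check_passes(0xffffffffffff0500ULL, 0x10000, 4096));

	/* Benign inline writes are unaffected. */
	assert(fixed_check_passes(0, 0x1000, 4096));
	assert(fixed_check_passes(0x200, 0xe00, 4096));
	assert(!fixed_check_passes(0, 0x1001, 4096));
}
```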

== Embargo ==

  I am happy to follow the standard process.  Proposing a 7-day embargo;
  the fix is small and I can adjust as the maintainers prefer.  I have
  not notified linux-distros and will hold that until a public patch
  lands, per the usual guidance.

I am an independent security researcher; please credit
"Bryam Vargas <hexlabsecurity@proton.me>" (Reported-by already in the
patch).  Affiliation: HEXLAB SAS (registration pending) -- Cali,
Colombia.  Happy to provide the full reproduction harness on request.

Thank you,
Bryam Vargas

----- candidate patch (inline, plain text) -----

From 448c122c744430c1c2926d635855a3894370ee33 Mon Sep 17 00:00:00 2001
From: Bryam Vargas <hexlabsecurity@proton.me>
Date: Thu, 28 May 2026 21:23:52 -0500
Subject: [PATCH] nvmet-rdma: fix integer overflow in inline data SGL bounds
 check

nvmet_rdma_map_sgl_inline() bounds-checks the inline data descriptor
with both operands host-controlled and the sum evaluated in u64:

	u64 off = le64_to_cpu(sgl->addr);
	u32 len = le32_to_cpu(sgl->length);
	...
	if (off + len > rsp->queue->dev->inline_data_size)
		return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;

"off + len" therefore wraps modulo 2^64.  A descriptor with, for
example, addr = 0xfffffffffffffe00 and length = 0x1000 makes the sum
wrap to 0xe00, which passes the inline_data_size check.  An inline-SGL
write command reaches this path after a Fabrics Connect; on a subsystem
with attr_allow_any_host set it is reachable without authentication by
any peer that can reach the target.

Two distinct out-of-bounds accesses follow from the bypass:

 - nvmet_rdma_use_inline_sg() stores the 64-bit offset into
   scatterlist::offset, which is unsigned int, committing the truncated
   attacker offset to the inline page.  The block layer then accesses
   page_to_phys(inline_page) + (off & 0xffffffff), reading up to
   inline_data_size bytes of kernel memory per command back to the host
   (or faulting the target if the offset lands on unmapped memory).

 - A large len makes sg_count = num_pages(len) in
   nvmet_rdma_use_inline_sg() exceed NVMET_RDMA_MAX_INLINE_SGE, so the
   loop writes scatterlist entries past the fixed-size inline_sg[]
   array, corrupting the surrounding command object and oopsing the
   target on the next use of that command.

Validate the offset with check_add_overflow() before comparing against
inline_data_size.  A passing check then guarantees
off + len <= inline_data_size <= NVMET_RDMA_MAX_INLINE_DATA_SIZE, which
bounds both the truncated scatterlist::offset and
num_pages(len) <= NVMET_RDMA_MAX_INLINE_SGE, closing the out-of-bounds
read and the inline_sg[] overflow together.

Reported-by: Bryam Vargas <hexlabsecurity@proton.me>
Fixes: 0d5ee2b2ab4f ("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
Review context (not for the commit log):

Reproducer -- unprivileged remote RDMA peer against a target with
attr_allow_any_host=1, a single inline-SGL WRITE capsule:
  * OOB read:  sgl->addr=0xfffffffffffffe00, sgl->length=0x1000
               (off+len wraps to 0xe00 <= inline_data_size; sg->offset
               truncates to 0xfffffe00) -> ~4 KiB of kernel memory is
               read back from the namespace.
  * OOB write: sgl->addr=0xffffffffffff0500, sgl->length=0x10000
               (num_pages(0x10000)=16 overruns the 4-entry inline_sg[])
               -> target memory corruption / crash.

A/B-tested on a 6.12.90 KASAN lab kernel (same .config, only this hunk
differs): pre-fix the OOB-read capsule trips "KASAN: use-after-free in
copy_page_from_iter_atomic" via nvmet_file_execute_io; post-fix both
capsules are rejected with "invalid inline data offset!"
(NVME_SC_SGL_INVALID_OFFSET), benign inline writes still succeed, and no
KASAN/oops fires. The fix decides identically in 32- and 64-bit builds
(check_add_overflow operates on u64).

 drivers/nvme/target/rdma.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e6e2c3f9afdf..a5bbf9d41c3b 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -12,6 +12,7 @@
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/nvme.h>
+#include <linux/overflow.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/wait.h>
@@ -847,6 +848,7 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
 	struct nvme_sgl_desc *sgl = &rsp->req.cmd->common.dptr.sgl;
 	u64 off = le64_to_cpu(sgl->addr);
 	u32 len = le32_to_cpu(sgl->length);
+	u64 bound;

 	if (!nvme_is_write(rsp->req.cmd)) {
 		rsp->req.error_loc =
@@ -854,7 +856,8 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
 		return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
 	}

-	if (off + len > rsp->queue->dev->inline_data_size) {
+	if (check_add_overflow(off, (u64)len, &bound) ||
+	    bound > rsp->queue->dev->inline_data_size) {
 		pr_err("invalid inline data offset!\n");
 		return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
 	}
-- 
2.43.0


* Re: [REPORT] nvmet-rdma: integer overflow in inline-data SGL bounds check -> pre-auth kernel-memory read + remote crash (candidate patch inline)
  2026-05-29  6:52 [REPORT] nvmet-rdma: integer overflow in inline-data SGL bounds check -> pre-auth kernel-memory read + remote crash (candidate patch inline) hexlabsecurity
@ 2026-05-29 16:09 ` Keith Busch
  2026-06-04  8:46   ` [PATCH] nvmet-rdma: reject inline data with a nonzero offset Bryam Vargas
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2026-05-29 16:09 UTC
  To: hexlabsecurity
  Cc: security@kernel.org, hch@lst.de, sagi@grimberg.me, kch@nvidia.com,
	linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	linux-block@vger.kernel.org

On Fri, May 29, 2026 at 06:52:13AM +0000, hexlabsecurity@proton.me wrote:
> @@ -847,6 +848,7 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
>  	struct nvme_sgl_desc *sgl = &rsp->req.cmd->common.dptr.sgl;
>  	u64 off = le64_to_cpu(sgl->addr);
>  	u32 len = le32_to_cpu(sgl->length);
> +	u64 bound;
> 
>  	if (!nvme_is_write(rsp->req.cmd)) {
>  		rsp->req.error_loc =
> @@ -854,7 +856,8 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
>  		return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
>  	}
> 
> -	if (off + len > rsp->queue->dev->inline_data_size) {
> +	if (check_add_overflow(off, (u64)len, &bound) ||
> +	    bound > rsp->queue->dev->inline_data_size) {

Since you don't use "bound" for anything other than the final check, I
think we make this simpler without it:

	if (off > rsp->queue->dev->inline_data_size ||
	    len > rsp->queue->dev->inline_data_size - off) {

Thanks for the report.


* [PATCH] nvmet-rdma: reject inline data with a nonzero offset
  2026-05-29 16:09 ` Keith Busch
@ 2026-06-04  8:46   ` Bryam Vargas
  2026-06-04  9:32     ` Keith Busch
  2026-06-04 10:22     ` Keith Busch
  0 siblings, 2 replies; 7+ messages in thread
From: Bryam Vargas @ 2026-06-04  8:46 UTC
  To: Christoph Hellwig, Sagi Grimberg, Keith Busch, Chaitanya Kulkarni
  Cc: linux-nvme, linux-rdma, linux-block

nvmet_rdma_map_sgl_inline() takes a host-controlled offset and length
from the inline SGL descriptor and bounds-checks them against the
per-port inline_data_size:

	u64 off = le64_to_cpu(sgl->addr);
	u32 len = le32_to_cpu(sgl->length);
	...
	if (off + len > rsp->queue->dev->inline_data_size)
		return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;

This is unsound whenever the offset is nonzero:

 - "off + len" is evaluated in u64 and wraps modulo 2^64.  A descriptor
   with addr = 0xfffffffffffffe00 and length = 0x1000 wraps the sum to
   0xe00 and passes the check.  nvmet_rdma_use_inline_sg() then stores
   the offset into scatterlist::offset (unsigned int) and the block
   layer reads out of bounds of the inline page; a large len also makes
   num_pages(len) exceed NVMET_RDMA_MAX_INLINE_SGE and overruns the
   fixed-size inline_sg[] array.

 - Even computed without wrapping, inline_data_size is configurable up
   to max(SZ_16K, PAGE_SIZE).  An offset in (PAGE_SIZE, inline_data_size]
   passes the bound and then "PAGE_SIZE - off" in
   nvmet_rdma_use_inline_sg() underflows, leaving scatterlist::length at
   ~4 GiB and the offset pointing past the first inline page.

A nonzero inline offset is never legitimate here.  nvmet advertises
icdoff = 0, nvme_rdma_setup_ctrl() refuses to use a controller that
reports a nonzero icdoff ("icdoff is not supported!"), and
nvme_rdma_map_sg_inline() sets the inline descriptor addr to icdoff, so
a compliant initiator always sends offset 0.  nvmet_rdma_use_inline_sg()
likewise assumes the inline data begins at the start of the first inline
page (the RNIC DMAs it to page offset 0); any nonzero offset also
mis-describes the scatterlist even when it is in bounds.

Reject a nonzero offset directly.  This closes the u64 overflow, the
inline_sg[] overrun and the PAGE_SIZE - off underflow together, and is
simpler than bounding the offset.

Fixes: 0d5ee2b2ab4f ("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data")
Cc: stable@vger.kernel.org
Reported-by: Bryam Vargas <hexlabsecurity@proton.me>
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
Keith, thanks for the suggested form:

	if (off > rsp->queue->dev->inline_data_size ||
	    len > rsp->queue->dev->inline_data_size - off)

It does stop the u64 overflow, but while testing it I found it is still
incomplete when a port is configured with inline_data_size > PAGE_SIZE
(it is settable up to max(SZ_16K, PAGE_SIZE)): an offset in
(PAGE_SIZE, inline_data_size] passes that bound and then "PAGE_SIZE - off"
in nvmet_rdma_use_inline_sg() underflows, leaving scatterlist::length at
~4 GiB pointing past the first inline page. The block backend then
executes the out-of-bounds read (KASAN trace below). Since a compliant
initiator never sends a nonzero inline offset (nvmet advertises
icdoff = 0 and nvme_rdma_setup_ctrl() refuses a nonzero icdoff),
rejecting off != 0 closes that case too and is even simpler, so this
formal patch uses that instead of bounding the offset.
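The underflow can be reproduced arithmetically in userspace.  This is a
sketch, assuming 4 KiB pages and that min_t(int, ...) truncates both
operands to 32-bit two's complement; subtraction_check_passes(),
as_int32() and prefix_first_sg_length() are hypothetical helpers, not
kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* The subtraction-form bound suggested in review: nothing can wrap. */
bool subtraction_check_passes(uint64_t off, uint32_t len, uint64_t size)
{
	return !(off > size || len > size - off);
}

/* Truncate to 32 bits and reinterpret as signed, as min_t(int, ...) does. */
int64_t as_int32(uint64_t v)
{
	uint32_t lo = (uint32_t)v;

	return lo <= INT32_MAX ? (int64_t)lo : (int64_t)lo - 4294967296LL;
}

/* Pre-fix first entry: sg->length = min_t(int, len, PAGE_SIZE - off). */
uint32_t prefix_first_sg_length(uint64_t off, uint32_t len)
{
	int64_t a = as_int32(len);
	int64_t b = as_int32(MODEL_PAGE_SIZE - off);	/* u64 subtraction */

	/* A negative minimum becomes a huge unsigned int length. */
	return (uint32_t)(a < b ? a : b);
}

void check_offset_8192(void)
{
	/* inline_data_size = 16384: offset 8192, len 4096 passes the bound... */
	assert(subtraction_check_passes(8192, 4096, 16384));
	/* ...but PAGE_SIZE - off is -4096, so sg->length becomes ~4 GiB. */
	assert(prefix_first_sg_length(8192, 4096) == 0xfffff000u);

	/* The wrapping capsule from the original report is now rejected. */
	assert(!subtraction_check_passes(0xfffffffffffffe00ULL, 0x1000, 4096));
}
```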

Verified on a KASAN build (inline_data_size = 16384) over an rdma_rxe
soft-RoCE loopback nvmet-rdma target with a block backend:
  - offset 0, 4 KiB inline write: succeeds, clean (control).
  - offset 8192, len 4096: without this patch the bounds check passes
    and the block backend executes the out-of-bounds read
      BUG: KASAN: slab-out-of-bounds in copy_folio_from_iter_atomic
      Read of size 4096 ...
    with this patch it is rejected ("invalid inline data offset!").
  - offset 4095 (< PAGE_SIZE): without this patch it is in bounds but
    mis-describes the SGL (NVME_SC_SGL_INVALID_DATA, no OOB); with this
    patch it is rejected up front.
  - offset 0 keeps working (no regression for compliant initiators).

 drivers/nvme/target/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -854,7 +854,7 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
 		return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
 	}

-	if (off + len > rsp->queue->dev->inline_data_size) {
+	if (off || len > rsp->queue->dev->inline_data_size) {
 		pr_err("invalid inline data offset!\n");
 		return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
 	}
--
2.43.0


* Re: [PATCH] nvmet-rdma: reject inline data with a nonzero offset
  2026-06-04  8:46   ` [PATCH] nvmet-rdma: reject inline data with a nonzero offset Bryam Vargas
@ 2026-06-04  9:32     ` Keith Busch
  2026-06-04 10:22     ` Keith Busch
  1 sibling, 0 replies; 7+ messages in thread
From: Keith Busch @ 2026-06-04  9:32 UTC
  To: Bryam Vargas
  Cc: Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni, linux-nvme,
	linux-rdma, linux-block

On Thu, Jun 04, 2026 at 08:46:33AM +0000, Bryam Vargas wrote:
> A nonzero inline offset is never legitimate here.  nvmet advertises
> icdoff = 0, nvme_rdma_setup_ctrl() refuses to use a controller that
> reports a nonzero icdoff ("icdoff is not supported!"), and
> nvme_rdma_map_sg_inline() sets the inline descriptor addr to icdoff, so
> a compliant initiator always sends offset 0.  nvmet_rdma_use_inline_sg()
> likewise assumes the inline data begins at the start of the first inline
> page (the RNIC DMAs it to page offset 0); any nonzero offset also
> mis-describes the scatterlist even when it is in bounds.

Wait, is this accurate? I'm pretty sure icdoff == 0 just means the host
can start the inline data immediately after the SQE, not that it
necessarily must do that. My understanding is offsets are still allowed
as long as the total length fits.


* Re: [PATCH] nvmet-rdma: reject inline data with a nonzero offset
  2026-06-04  8:46   ` [PATCH] nvmet-rdma: reject inline data with a nonzero offset Bryam Vargas
  2026-06-04  9:32     ` Keith Busch
@ 2026-06-04 10:22     ` Keith Busch
  2026-06-04 19:36       ` [PATCH v2] nvmet-rdma: handle " Bryam Vargas
  1 sibling, 1 reply; 7+ messages in thread
From: Keith Busch @ 2026-06-04 10:22 UTC
  To: Bryam Vargas
  Cc: Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni, linux-nvme,
	linux-rdma, linux-block

On Thu, Jun 04, 2026 at 08:46:33AM +0000, Bryam Vargas wrote:
> It does stop the u64 overflow, but while testing it I found it is still
> incomplete when a port is configured with inline_data_size > PAGE_SIZE
> (it is settable up to max(SZ_16K, PAGE_SIZE)): an offset in
> (PAGE_SIZE, inline_data_size] passes that bound and then "PAGE_SIZE - off"
> in nvmet_rdma_use_inline_sg() underflows, leaving scatterlist::length at
> ~4 GiB pointing past the first inline page.

Then the use_inline_sg() should find the appropriate index and offset
accordingly.

---
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 1234567..abcdefg 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -821,22 +821,29 @@ static void nvmet_rdma_use_inline_sg(struct nvmet_rdma_rsp *rsp, u32 len,
 		u64 off)
 {
-	int sg_count = num_pages(len);
+	u64 page_off = off % PAGE_SIZE;
+	u64 page_idx = off / PAGE_SIZE;
+	int sg_count = num_pages(page_off + len);
 	struct scatterlist *sg;
 	int i;
 
-	sg = rsp->cmd->inline_sg;
+	sg = &rsp->cmd->inline_sg[page_idx];
 	for (i = 0; i < sg_count; i++, sg++) {
 		if (i < sg_count - 1)
 			sg_unmark_end(sg);
 		else
 			sg_mark_end(sg);
-		sg->offset = off;
-		sg->length = min_t(int, len, PAGE_SIZE - off);
+		sg->offset = page_off;
+		sg->length = min_t(u64, len, PAGE_SIZE - page_off);
 		len -= sg->length;
-		if (!i)
-			off = 0;
+		page_off = 0;
 	}
 
-	rsp->req.sg = rsp->cmd->inline_sg;
+	rsp->req.sg = &rsp->cmd->inline_sg[page_idx];
 	rsp->req.sg_cnt = sg_count;
 }
 
@@ -857,7 +864,8 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
 			return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
 	}
 
-	if (off + len > rsp->queue->dev->inline_data_size) {
+	if (off > rsp->queue->dev->inline_data_size ||
+	    len > rsp->queue->dev->inline_data_size - off) {
 		pr_err("invalid inline data offset!\n");
 		return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
 	}
--


* [PATCH v2] nvmet-rdma: handle inline data with a nonzero offset
  2026-06-04 10:22     ` Keith Busch
@ 2026-06-04 19:36       ` Bryam Vargas
  2026-06-09 22:00         ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: Bryam Vargas @ 2026-06-04 19:36 UTC
  To: Christoph Hellwig, Sagi Grimberg, Keith Busch, Chaitanya Kulkarni
  Cc: linux-nvme, linux-rdma, linux-block

nvmet_rdma_use_inline_sg() maps the host-controlled inline data offset
into the per-command inline scatterlist.  The bounds check admits any
offset with off + len <= inline_data_size, but the mapping still assumes
the data begins in the first inline page:

	sg->offset = off;
	sg->length = min_t(int, len, PAGE_SIZE - off);

When a port is configured with inline_data_size > PAGE_SIZE (settable up
to max(SZ_16K, PAGE_SIZE)), an offset in (PAGE_SIZE, inline_data_size]
makes "PAGE_SIZE - off" underflow, so sg->length is set to ~4 GiB and
the block backend reads far past the first inline page.  num_pages(len)
also ignores the offset, so an in-bounds offset whose [off, off+len)
span crosses a page boundary under-counts the scatterlist.

Map the offset properly: split it into a page index and an in-page
offset, start the scatterlist at that page, and size the page count from
page_off + len.  Because the request scatterlist may now start at
inline_sg[page_idx] rather than inline_sg[0], generalize the inline-SGL
identity test in nvmet_rdma_release_rsp() to a range test; otherwise the
persistent inline scatterlist is mistaken for an allocated one and
nvmet_req_free_sgls() frees an inline page (and warns in
free_large_kmalloc()).

Fixes: 0d5ee2b2ab4f ("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data")
Cc: stable@vger.kernel.org
Suggested-by: Keith Busch <kbusch@kernel.org>
Reported-by: Bryam Vargas <hexlabsecurity@proton.me>
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
v1 rejected a nonzero offset; per Keith's note a nonzero in-capsule SGL
offset is legitimate (it is the per-command SGL Offset field, distinct
from the controller ICDOFF attribute that nvme_rdma_setup_ctrl() refuses
when nonzero), so v2 handles it instead, using Keith's suggested
page_idx/page_off form for nvmet_rdma_use_inline_sg().

Review context (not for the commit log):

Bound safety: with off + len <= inline_data_size the highest inline_sg[]
index touched is page_idx + sg_count - 1 = floor((off + len - 1) /
PAGE_SIZE) <= num_pages(inline_data_size) - 1 = inline_page_count - 1
(<= NVMET_RDMA_MAX_INLINE_SGE - 1), and page_off < PAGE_SIZE so
PAGE_SIZE - page_off cannot underflow.  The release_rsp range test is a
strict generalization of the old "!= inline_sg" test: inline_sg[0] is in
range (unchanged: not freed), allocated/keyed SGLs are outside it (still
freed), and only the new inline_sg[1..] starts are additionally treated
as inline.
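The bound-safety argument can also be checked by brute force.  This
sketch mirrors the v2 nvmet_rdma_use_inline_sg() arithmetic in userspace,
assuming 4 KiB pages and a 16 KiB port (inline_page_count = 4);
v2_mapping_ok() and v2_all_ok() are hypothetical helpers, and the entry
count is a ceiling division standing in for num_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Mirrors the v2 mapping: split off into page_idx/page_off, size the
 * entry count from page_off + len, then walk the entries.  Returns false
 * if an entry index reaches inline_page_count or the lengths do not sum
 * to len.
 */
bool v2_mapping_ok(uint64_t off, uint32_t len, uint64_t inline_page_count)
{
	uint64_t page_off = off % MODEL_PAGE_SIZE;
	uint64_t page_idx = off / MODEL_PAGE_SIZE;
	uint64_t sg_count = (page_off + len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	uint64_t remaining = len;

	for (uint64_t i = 0; i < sg_count; i++) {
		/* page_off < PAGE_SIZE, so this cannot underflow. */
		uint64_t room = MODEL_PAGE_SIZE - page_off;
		uint64_t seg = remaining < room ? remaining : room;

		if (page_idx + i >= inline_page_count)
			return false;
		remaining -= seg;
		page_off = 0;
	}
	return remaining == 0;
}

/* Checks every (off, len) on a step-sized grid with len >= 1 and off + len <= size. */
bool v2_all_ok(uint64_t size, uint64_t step)
{
	uint64_t pages = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

	for (uint64_t off = 0; off < size; off += step)
		for (uint64_t len = 1; off + len <= size; len += step)
			if (!v2_mapping_ok(off, (uint32_t)len, pages))
				return false;
	return true;
}
```

Calling v2_all_ok(16384, 1) covers every (off, len) pair, at a cost of
roughly 1.3e8 mapping calls; a larger step such as 61 gives a quick spot
check.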

Decides identically on 32- and 64-bit builds: off is u64, so the offset
arithmetic and PAGE_SIZE - page_off are evaluated in 64-bit on both ABIs;
num_pages() sees page_off + len <= 16384 (positive, int-safe on both);
the release_rsp comparison is a pointer comparison, identical semantics
on ILP32 and LP64.  (-m32/-m64 model output identical.)

A/B on a KASAN build (inline_data_size = 16384) over an rdma_rxe
loopback nvmet-rdma target with a block backend, inline write:
  - offset 0: succeeds, clean (control + no regression).
  - offset 8192: before this patch the block backend reads out of bounds
      BUG: KASAN: slab-out-of-bounds in copy_folio_from_iter_atomic
      (sg->length = 0xfffff000); with this patch it is served from the
      correct inline page, in bounds, no KASAN and no free_large_kmalloc
      warning.
  - the use_inline_sg() rework alone (without the release_rsp change)
      trips on offset 8192:
      WARNING: ... free_large_kmalloc ... Not a kmalloc allocation
        nvmet_req_free_sgls <- nvmet_rdma_release_rsp <- nvmet_rdma_send_done

 drivers/nvme/target/rdma.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 565183a20007..eb975fbd74a1 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -666,7 +666,8 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 	if (rsp->n_rdma)
 		nvmet_rdma_rw_ctx_destroy(rsp);

-	if (rsp->req.sg != rsp->cmd->inline_sg)
+	if (rsp->req.sg < rsp->cmd->inline_sg ||
+	    rsp->req.sg >= rsp->cmd->inline_sg + queue->dev->inline_page_count)
 		nvmet_req_free_sgls(&rsp->req);

 	if (unlikely(!list_empty_careful(&queue->rsp_wr_wait_list)))
@@ -821,24 +822,25 @@ static void nvmet_rdma_write_data_done(struct ib_cq *cq, struct ib_wc *wc)
 static void nvmet_rdma_use_inline_sg(struct nvmet_rdma_rsp *rsp, u32 len,
 		u64 off)
 {
-	int sg_count = num_pages(len);
+	u64 page_off = off % PAGE_SIZE;
+	u64 page_idx = off / PAGE_SIZE;
+	int sg_count = num_pages(page_off + len);
 	struct scatterlist *sg;
 	int i;

-	sg = rsp->cmd->inline_sg;
+	sg = &rsp->cmd->inline_sg[page_idx];
 	for (i = 0; i < sg_count; i++, sg++) {
 		if (i < sg_count - 1)
 			sg_unmark_end(sg);
 		else
 			sg_mark_end(sg);
-		sg->offset = off;
-		sg->length = min_t(int, len, PAGE_SIZE - off);
+		sg->offset = page_off;
+		sg->length = min_t(u64, len, PAGE_SIZE - page_off);
 		len -= sg->length;
-		if (!i)
-			off = 0;
+		page_off = 0;
 	}

-	rsp->req.sg = rsp->cmd->inline_sg;
+	rsp->req.sg = &rsp->cmd->inline_sg[page_idx];
 	rsp->req.sg_cnt = sg_count;
 }

--
2.43.0


* Re: [PATCH v2] nvmet-rdma: handle inline data with a nonzero offset
  2026-06-04 19:36       ` [PATCH v2] nvmet-rdma: handle " Bryam Vargas
@ 2026-06-09 22:00         ` Keith Busch
  0 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2026-06-09 22:00 UTC
  To: Bryam Vargas
  Cc: Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni, linux-nvme,
	linux-rdma, linux-block

On Thu, Jun 04, 2026 at 07:36:54PM +0000, Bryam Vargas wrote:
> nvmet_rdma_use_inline_sg() maps the host-controlled inline data offset
> into the per-command inline scatterlist.  The bounds check admits any
> offset with off + len <= inline_data_size, but the mapping still assumes
> the data begins in the first inline page:

Thanks applied to nvme-7.2.

And not necessarily directed at you since apparently many people do
this, but it would help me a great deal if subsequent versions were
posted as a new thread rather than appending to the previous. The
interleaving of the intermediate just makes this harder to sift through.

I'm actually not even sure how so many people converged on this
anti-pattern, as 'git send-email' would have naturally created a new
thread for each new version. What exactly are people doing here?



Thread overview: 7+ messages
2026-05-29  6:52 [REPORT] nvmet-rdma: integer overflow in inline-data SGL bounds check -> pre-auth kernel-memory read + remote crash (candidate patch inline) hexlabsecurity
2026-05-29 16:09 ` Keith Busch
2026-06-04  8:46   ` [PATCH] nvmet-rdma: reject inline data with a nonzero offset Bryam Vargas
2026-06-04  9:32     ` Keith Busch
2026-06-04 10:22     ` Keith Busch
2026-06-04 19:36       ` [PATCH v2] nvmet-rdma: handle " Bryam Vargas
2026-06-09 22:00         ` Keith Busch
