Linux-NVME Archive on lore.kernel.org
* [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
       [not found] <2026031805-stretch-skid-ae5b@gregkh>
@ 2026-03-19  1:26 ` Shivam Kumar
  2026-03-19  7:59   ` Maurizio Lombardi
  0 siblings, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-03-19  1:26 UTC (permalink / raw)
  To: gregkh; +Cc: security, hch, sagi, kch, linux-nvme, kumar.shivam43666

nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
and passes it to sgl_alloc() for scatter-gather allocation. The only
existing size check (len > inline_data_size) is gated behind
sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
other SGL type bypasses the validation entirely. A remote attacker can
send a CapsuleCmd with a non-offset SGL type and a large sgl->length
(e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
page allocator WARNING and kernel panic when panic_on_warn is set.

Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
sgl_alloc(). This value is already advertised to the host in the ICResp
maxdata field as the maximum data transfer size.

Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
 drivers/nvme/target/tcp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index a456dd2fd8bd..c7253221c342 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
 			return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
 		cmd->pdu_len = len;
 	}
+	if (len > NVMET_TCP_MAXH2CDATA)
+		return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
 	cmd->req.transfer_len += len;
 
 	cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-03-19  1:26 ` [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data() Shivam Kumar
@ 2026-03-19  7:59   ` Maurizio Lombardi
  2026-03-19 18:00     ` Shivam Kumar
  0 siblings, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2026-03-19  7:59 UTC (permalink / raw)
  To: Shivam Kumar, gregkh; +Cc: security, hch, sagi, kch, linux-nvme

On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> and passes it to sgl_alloc() for scatter-gather allocation. The only
> existing size check (len > inline_data_size) is gated behind
> sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
> other SGL type bypasses the validation entirely. A remote attacker can
> send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> page allocator WARNING and kernel panic when panic_on_warn is set.
>
> Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> sgl_alloc(). This value is already advertised to the host in the ICResp
> maxdata field as the maximum data transfer size.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
>  drivers/nvme/target/tcp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index a456dd2fd8bd..c7253221c342 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
>  			return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
>  		cmd->pdu_len = len;
>  	}
> +	if (len > NVMET_TCP_MAXH2CDATA)
> +		return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
>  	cmd->req.transfer_len += len;
>  
>  	cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);


I am not sure this patch is correct; the intent is valid, but you are conflating
the total transfer size with the maximum H2CData PDU size.

In other words, if I understood the code correctly:

MAXH2CDATA is the limit of a single H2CData packet's size.
sgl->length is the total amount of data to transfer.

Also, will the request be completed with the "NVME_SC_SGL_INVALID_DATA |
NVME_STATUS_DNR" status code? If so, this would violate the spec,
which states that an H2CData PDU that is too large should trigger a
fatal transport error.

Maybe the correct fix is to set a sane default for MDTS
(Maximum Data Transfer Size); at the moment it's left at zero (no limit).

Maurizio



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-03-19  7:59   ` Maurizio Lombardi
@ 2026-03-19 18:00     ` Shivam Kumar
  2026-03-20  7:48       ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-03-19 18:00 UTC (permalink / raw)
  To: Maurizio Lombardi; +Cc: gregkh, security, hch, sagi, kch, linux-nvme

Hi Maurizio

Thank you for the review. You're right, I was conflating the per-PDU
H2CData limit with the total transfer size for the command.
And the error handling should trigger a fatal transport error rather
than completing the request with an NVMe status code.

Would setting a sane default for MDTS be the preferred approach here?
If so, I'm happy to send a v2 implementing that instead.

Thanks,
Shivam

On Thu, Mar 19, 2026 at 3:59 AM Maurizio Lombardi <mlombard@arkamax.eu> wrote:
>
> On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> > nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> > and passes it to sgl_alloc() for scatter-gather allocation. The only
> > existing size check (len > inline_data_size) is gated behind
> > sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
> > other SGL type bypasses the validation entirely. A remote attacker can
> > send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> > (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> > page allocator WARNING and kernel panic when panic_on_warn is set.
> >
> > Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> > sgl_alloc(). This value is already advertised to the host in the ICResp
> > maxdata field as the maximum data transfer size.
> >
> > Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> > ---
> >  drivers/nvme/target/tcp.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> > index a456dd2fd8bd..c7253221c342 100644
> > --- a/drivers/nvme/target/tcp.c
> > +++ b/drivers/nvme/target/tcp.c
> > @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
> >                       return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
> >               cmd->pdu_len = len;
> >       }
> > +     if (len > NVMET_TCP_MAXH2CDATA)
> > +             return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
> >       cmd->req.transfer_len += len;
> >
> >       cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
>
>
> I am not sure this patch is correct; the intent is valid, but you are conflating
> the total transfer size with the maximum H2CData PDU size.
>
> In other words, if I understood the code correctly:
>
> MAXH2CDATA is the limit of a single H2CData packet's size.
> sgl->length is the total amount of data to transfer.
>
> Also, will the request be completed with the "NVME_SC_SGL_INVALID_DATA |
> NVME_STATUS_DNR" status code? If so, this would violate the spec,
> which states that an H2CData PDU that is too large should trigger a
> fatal transport error.
>
> Maybe the correct fix is to set a sane default for MDTS
> (Maximum Data Transfer Size); at the moment it's left at zero (no limit).
>
> Maurizio
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-03-19 18:00     ` Shivam Kumar
@ 2026-03-20  7:48       ` Christoph Hellwig
  2026-04-05 19:46         ` Shivam Kumar
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2026-03-20  7:48 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Maurizio Lombardi, gregkh, security, hch, sagi, kch, linux-nvme

On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> Hi Maurizio
> 
> Thank you for the review. You're right, I was conflating the per-PDU
> H2CData limit with the total transfer size for the command.
> And the error handling should trigger a fatal transport error rather
> than completing the request with an NVMe status code.
> 
> Would setting a sane default for MDTS be the preferred approach here?
> If so, I'm happy to send a v2 implementing that instead.

Yes, if we want to limit the I/O size we have to limit MDTS.  Which all
other transports do anyway, so I'm kinda surprised TCP doesn't.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-03-20  7:48       ` Christoph Hellwig
@ 2026-04-05 19:46         ` Shivam Kumar
  2026-04-07  6:30           ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-04-05 19:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme

Hi Christoph, Maurizio

Just following up on this thread.
Would you like me to send a v2 patch that sets a sane default MDTS for
the TCP transport? If so, is there a preferred value I should use, or
should I follow what the other transports do?

Thanks,
Shivam Kumar

On Fri, Mar 20, 2026 at 3:48 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> > Hi Maurizio
> >
> > Thank you for the review. You're right, I was conflating the per-PDU
> > H2CData limit with the total transfer size for the command.
> > And the error handling should trigger a fatal transport error rather
> > than completing the request with an NVMe status code.
> >
> > Would setting a sane default for MDTS be the preferred approach here?
> > If so, I'm happy to send a v2 implementing that instead.
>
> Yes, if we want to limit the I/O size we have to limit MDTS.  Which all
> other transports do anyway, so I'm kinda surprised TCP doesn't.
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-04-05 19:46         ` Shivam Kumar
@ 2026-04-07  6:30           ` Christoph Hellwig
  2026-04-08  6:30             ` Maurizio Lombardi
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2026-04-07  6:30 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Christoph Hellwig, Maurizio Lombardi, gregkh, security, sagi, kch,
	linux-nvme

On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
> Hi Christoph, Maurizio
> 
> Just following up on this thread.
> Would you like me to send a v2 patch that sets a sane default MDTS for
> the TCP transport?

Yes, that would be great!

> If so, is there a preferred value I should use, or
> should I follow what the other transports do?

I'm honestly not sure.  Maurizio/Sagi: any good idea what a TCP default
max MDTS might be?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-04-07  6:30           ` Christoph Hellwig
@ 2026-04-08  6:30             ` Maurizio Lombardi
  2026-04-09  6:08               ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2026-04-08  6:30 UTC (permalink / raw)
  To: Christoph Hellwig, Shivam Kumar
  Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme

On Tue Apr 7, 2026 at 8:30 AM CEST, Christoph Hellwig wrote:
> On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
>> Hi Christoph, Maurizio
>> 
>> Just following up on this thread.
>> Would you like me to send a v2 patch that sets a sane default MDTS for
>> the TCP transport?
>
> Yes, that would be great!
>
>> If so, is there a preferred value I should use, or
>> should I follow what the other transports do?
>
> I'm honestly not sure.  Maurizio/Sagi: any good idea what a TCP default
> max MDTS might be?

I'm not sure there is a "correct" value here. It seems to mostly come
down to balancing performance vs memory usage, rather than anything
TCP-specific.

For reference, pci-epf allows a max MDTS of 2 MiB[1], while RDMA uses 1 MiB.

Personally, I would pick a default of 2 MiB; it should be large
enough to avoid excessive splitting while still being reasonable from a
memory perspective.

Sagi? Any thought on that?

I have seen a patch on this mailing list that makes mdts configurable
via configfs, so a user could still change it to their preferred value
if needed.

Maurizio

[1] https://docs.kernel.org/nvme/nvme-pci-endpoint-target.html


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-04-08  6:30             ` Maurizio Lombardi
@ 2026-04-09  6:08               ` Christoph Hellwig
  2026-04-26 23:57                 ` Shivam Kumar
  2026-04-27  0:44                 ` [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport Shivam Kumar
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2026-04-09  6:08 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Christoph Hellwig, Shivam Kumar, gregkh, security, sagi, kch,
	linux-nvme

On Wed, Apr 08, 2026 at 08:30:12AM +0200, Maurizio Lombardi wrote:
> Personally, I would pick a default of 2 MiB; it should be large
> enough to avoid excessive splitting while still being reasonable from a
> memory perspective.
> 
> Sagi? Any thought on that?
> 
> I have seen a patch on this mailing list that makes mdts configurable
> via configfs, so a user could still change it to their preferred value
> if needed.

Right now that just allows adjusting down, so picking the larger
value sounds sane.  We can still increase it later and/or allow
adjusting above the default to an upper cap if needed.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-04-09  6:08               ` Christoph Hellwig
@ 2026-04-26 23:57                 ` Shivam Kumar
  2026-04-27  0:44                 ` [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport Shivam Kumar
  1 sibling, 0 replies; 17+ messages in thread
From: Shivam Kumar @ 2026-04-26 23:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme

Hi all,

Gentle ping on this. Is there anything else needed from my side?
Or is a consensus forming around the 2 MiB default?

Happy to send an updated patch if that helps move things along.

Thanks,
Shivam

On Thu, Apr 9, 2026 at 2:08 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Apr 08, 2026 at 08:30:12AM +0200, Maurizio Lombardi wrote:
> > Personally, I would pick a default of 2 MiB; it should be large
> > enough to avoid excessive splitting while still being reasonable from a
> > memory perspective.
> >
> > Sagi? Any thought on that?
> >
> > I have seen a patch on this mailing list that makes mdts configurable
> > via configfs, so a user could still change it to their preferred value
> > if needed.
>
> Right now that just allows adjusting down, so picking the larger
> value sounds sane.  We can still increase it later and/or allow
> adjusting above the default to an upper cap if needed.
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport
  2026-04-09  6:08               ` Christoph Hellwig
  2026-04-26 23:57                 ` Shivam Kumar
@ 2026-04-27  0:44                 ` Shivam Kumar
  2026-04-28  6:07                   ` Maurizio Lombardi
  1 sibling, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-04-27  0:44 UTC (permalink / raw)
  To: hch; +Cc: mlombard, sagi, kch, linux-nvme, kbusch, gregkh, security,
	Shivam Kumar

Unlike other fabrics transports, the TCP target does not set a default
Maximum Data Transfer Size. With the configfs MDTS entry defaulting to 0
(no limit), a remote attacker can send a CapsuleCmd with an arbitrarily
large SGL length, causing sgl_alloc() in nvmet_tcp_map_data() to attempt
an excessive kernel allocation that triggers the OOM killer.

Set a default MDTS of 9 (2 MiB) for TCP. Admins can still adjust via
the configfs mdts attribute if needed.

Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
 drivers/nvme/target/tcp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index a456dd2fd8bd..d09c81d07a1d 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -25,6 +25,7 @@
 #define NVMET_TCP_DEF_INLINE_DATA_SIZE	(4 * PAGE_SIZE)
 #define NVMET_TCP_MAXH2CDATA		0x400000 /* 16M arbitrary limit */
 #define NVMET_TCP_BACKLOG 128
+#define NVMET_TCP_DEF_MDTS		9 /* 2 MiB (2^(12+9)) */
 
 static int param_store_val(const char *str, int *val, int min, int max)
 {
@@ -2067,6 +2068,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 	INIT_WORK(&port->accept_work, nvmet_tcp_accept_work);
 	if (port->nport->inline_data_size < 0)
 		port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
+	if (nport->mdts < 0)
+		nport->mdts = NVMET_TCP_DEF_MDTS;
 
 	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
 				IPPROTO_TCP, &port->sock);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport
  2026-04-27  0:44                 ` [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport Shivam Kumar
@ 2026-04-28  6:07                   ` Maurizio Lombardi
  2026-04-29  1:05                     ` Shivam Kumar
  2026-05-08 20:39                     ` [PATCH v3] nvmet-tcp: set and enforce a default MDTS " Shivam Kumar
  0 siblings, 2 replies; 17+ messages in thread
From: Maurizio Lombardi @ 2026-04-28  6:07 UTC (permalink / raw)
  To: Shivam Kumar, hch
  Cc: mlombard, sagi, kch, linux-nvme, kbusch, gregkh, security

On Mon Apr 27, 2026 at 2:44 AM CEST, Shivam Kumar wrote:
> Unlike other fabrics transports, the TCP target does not set a default
> Maximum Data Transfer Size. With the configfs MDTS entry defaulting to 0
> (no limit), a remote attacker can send a CapsuleCmd with an arbitrarily
> large SGL length, causing sgl_alloc() in nvmet_tcp_map_data() to attempt
> an excessive kernel allocation that triggers the OOM killer.
>
> Set a default MDTS of 9 (2 MiB) for TCP. Admins can still adjust via
> the configfs mdts attribute if needed.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
>  drivers/nvme/target/tcp.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index a456dd2fd8bd..d09c81d07a1d 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -25,6 +25,7 @@
>  #define NVMET_TCP_DEF_INLINE_DATA_SIZE	(4 * PAGE_SIZE)
>  #define NVMET_TCP_MAXH2CDATA		0x400000 /* 16M arbitrary limit */
>  #define NVMET_TCP_BACKLOG 128
> +#define NVMET_TCP_DEF_MDTS		9 /* 2 MiB (2^(12+9)) */
>  
>  static int param_store_val(const char *str, int *val, int min, int max)
>  {
> @@ -2067,6 +2068,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
>  	INIT_WORK(&port->accept_work, nvmet_tcp_accept_work);
>  	if (port->nport->inline_data_size < 0)
>  		port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
> +	if (nport->mdts < 0)
> +		nport->mdts = NVMET_TCP_DEF_MDTS;
>  
>  	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
>  				IPPROTO_TCP, &port->sock);

Ok, but what happens if a host doesn't respect the limit advertised
by the target?

The spec explicitly says that "If a command is submitted that exceeds this transfer
size, then the command is aborted with a status code of Invalid Field in
Command."

Maurizio


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport
  2026-04-28  6:07                   ` Maurizio Lombardi
@ 2026-04-29  1:05                     ` Shivam Kumar
  2026-05-08 20:39                     ` [PATCH v3] nvmet-tcp: set and enforce a default MDTS " Shivam Kumar
  1 sibling, 0 replies; 17+ messages in thread
From: Shivam Kumar @ 2026-04-29  1:05 UTC (permalink / raw)
  To: Maurizio Lombardi; +Cc: hch, sagi, kch, linux-nvme, kbusch, gregkh, security

Hi Maurizio,

I'll add enforcement to reject commands exceeding the MDTS limit and send a v3.

Thanks,
Shivam Kumar


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
  2026-04-28  6:07                   ` Maurizio Lombardi
  2026-04-29  1:05                     ` Shivam Kumar
@ 2026-05-08 20:39                     ` Shivam Kumar
  2026-05-10 20:42                       ` Sagi Grimberg
  1 sibling, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-05-08 20:39 UTC (permalink / raw)
  To: hch; +Cc: mlombard, sagi, kch, linux-nvme, kbusch, gregkh, security,
	Shivam Kumar

Unlike other fabrics transports, the TCP target does not set a default
Maximum Data Transfer Size. With the configfs MDTS entry defaulting to 0
(no limit), a remote attacker can send a CapsuleCmd with an arbitrarily
large SGL length, causing sgl_alloc() in nvmet_tcp_map_data() to attempt
an excessive kernel allocation that triggers the OOM killer.

Set a default MDTS of 9 (2 MiB) for TCP. Enforce the limit server-side
in nvmet_tcp_map_data() by rejecting commands whose SGL length exceeds
the configured MDTS, returning NVME_SC_INVALID_FIELD as required by the
NVMe specification. Admins can still adjust via the configfs mdts
attribute if needed.

Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
 drivers/nvme/target/tcp.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 164a564ba3b4..098cf877c358 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -25,6 +25,7 @@
 #define NVMET_TCP_DEF_INLINE_DATA_SIZE	(4 * PAGE_SIZE)
 #define NVMET_TCP_MAXH2CDATA		0x400000 /* 16M arbitrary limit */
 #define NVMET_TCP_BACKLOG 128
+#define NVMET_TCP_DEF_MDTS		9 /* 2 MiB (2^(12+9)) */
 
 static int param_store_val(const char *str, int *val, int min, int max)
 {
@@ -422,6 +423,13 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
 	if (!len)
 		return 0;
 
+	/* Enforce MDTS: abort commands exceeding the advertised limit */
+	if (cmd->req.port->mdts) {
+		u8 mdts = cmd->req.port->mdts;
+		if (mdts < 20 && len > (1U << (12 + mdts)))
+			return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
+	}
+
 	if (sgl->type == ((NVME_SGL_FMT_DATA_DESC << 4) |
 			  NVME_SGL_FMT_OFFSET)) {
 		if (!nvme_is_write(cmd->req.cmd))
@@ -2077,6 +2085,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 	INIT_WORK(&port->accept_work, nvmet_tcp_accept_work);
 	if (port->nport->inline_data_size < 0)
 		port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
+	if (nport->mdts < 0)
+		nport->mdts = NVMET_TCP_DEF_MDTS;
 
 	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
 				IPPROTO_TCP, &port->sock);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
  2026-05-08 20:39                     ` [PATCH v3] nvmet-tcp: set and enforce a default MDTS " Shivam Kumar
@ 2026-05-10 20:42                       ` Sagi Grimberg
  2026-05-11  6:36                         ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Sagi Grimberg @ 2026-05-10 20:42 UTC (permalink / raw)
  To: Shivam Kumar, hch; +Cc: mlombard, kch, linux-nvme, kbusch, gregkh, security



On 08/05/2026 23:39, Shivam Kumar wrote:
> Unlike other fabrics transports, the TCP target does not set a default
> Maximum Data Transfer Size. With the configfs MDTS entry defaulting to 0
> (no limit), a remote attacker can send a CapsuleCmd with an arbitrarily
> large SGL length, causing sgl_alloc() in nvmet_tcp_map_data() to attempt
> an excessive kernel allocation that triggers the OOM killer.
>
> Set a default MDTS of 9 (2 MiB) for TCP. Enforce the limit server-side
> in nvmet_tcp_map_data() by rejecting commands whose SGL length exceeds
> the configured MDTS, returning NVME_SC_INVALID_FIELD as required by the
> NVMe specification. Admins can still adjust via the configfs mdts
> attribute if needed.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
>   drivers/nvme/target/tcp.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 164a564ba3b4..098cf877c358 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -25,6 +25,7 @@
>   #define NVMET_TCP_DEF_INLINE_DATA_SIZE	(4 * PAGE_SIZE)
>   #define NVMET_TCP_MAXH2CDATA		0x400000 /* 16M arbitrary limit */
>   #define NVMET_TCP_BACKLOG 128
> +#define NVMET_TCP_DEF_MDTS		9 /* 2 MiB (2^(12+9)) */
>   
>   static int param_store_val(const char *str, int *val, int min, int max)
>   {
> @@ -422,6 +423,13 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
>   	if (!len)
>   		return 0;
>   
> +	/* Enforce MDTS: abort commands exceeding the advertised limit */
> +	if (cmd->req.port->mdts) {
> +		u8 mdts = cmd->req.port->mdts;
> +		if (mdts < 20 && len > (1U << (12 + mdts)))
> +			return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
> +	}
> +
>   	if (sgl->type == ((NVME_SGL_FMT_DATA_DESC << 4) |
>   			  NVME_SGL_FMT_OFFSET)) {
>   		if (!nvme_is_write(cmd->req.cmd))
> @@ -2077,6 +2085,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
>   	INIT_WORK(&port->accept_work, nvmet_tcp_accept_work);
>   	if (port->nport->inline_data_size < 0)
>   		port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
> +	if (nport->mdts < 0)
> +		nport->mdts = NVMET_TCP_DEF_MDTS;
>   
>   	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
>   				IPPROTO_TCP, &port->sock);

Shivam,

I think what we want is to cap tcp at a sane limit, similar to the
nvmet-rdma driver.
Also, we probably want it to be at least consistent with the (rather
arbitrary) MAXH2CDATA...

e.g. something like:
--
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 20f150d17a96..32f33e5dbfdb 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -24,6 +24,7 @@

  #define NVMET_TCP_DEF_INLINE_DATA_SIZE (4 * PAGE_SIZE)
  #define NVMET_TCP_MAXH2CDATA           0x400000 /* 16M arbitrary limit */
+#define NVMET_TCP_MAX_MDTS             12
  #define NVMET_TCP_BACKLOG 128

  static int param_store_val(const char *str, int *val, int min, int max)
@@ -2220,6 +2221,11 @@ static ssize_t nvmet_tcp_host_port_addr(struct 
nvmet_ctrl *ctrl,
                         (struct sockaddr *)&queue->sockaddr_peer);
  }

+static u8 nvmet_tcp_get_mdts(const struct nvmet_ctrl *ctrl)
+{
+       return NVMET_TCP_MAX_MDTS;
+}
+
  static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
         .owner                  = THIS_MODULE,
         .type                   = NVMF_TRTYPE_TCP,
@@ -2231,6 +2237,7 @@ static const struct nvmet_fabrics_ops 
nvmet_tcp_ops = {
         .install_queue          = nvmet_tcp_install_queue,
         .disc_traddr            = nvmet_tcp_disc_port_addr,
         .host_traddr            = nvmet_tcp_host_port_addr,
+       .get_mdts               = nvmet_tcp_get_mdts,
  };

  static int __init nvmet_tcp_init(void)
-- 


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
  2026-05-10 20:42                       ` Sagi Grimberg
@ 2026-05-11  6:36                         ` Christoph Hellwig
  2026-05-11  8:01                           ` Sagi Grimberg
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2026-05-11  6:36 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Shivam Kumar, hch, mlombard, kch, linux-nvme, kbusch, gregkh,
	security

On Sun, May 10, 2026 at 11:42:48PM +0300, Sagi Grimberg wrote:
> I think what we want is to cap tcp at a sane limit, similar to the
> nvmet-rdma driver.

Agreed.

We might still need the early enforcement in Shivam's patch, though.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
  2026-05-11  6:36                         ` Christoph Hellwig
@ 2026-05-11  8:01                           ` Sagi Grimberg
  2026-05-11  8:04                             ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Sagi Grimberg @ 2026-05-11  8:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Shivam Kumar, mlombard, kch, linux-nvme, kbusch, gregkh, security



On 11/05/2026 9:36, Christoph Hellwig wrote:
> On Sun, May 10, 2026 at 11:42:48PM +0300, Sagi Grimberg wrote:
>> I think what we want is to cap tcp at a sane limit, similar to the
>> nvmet-rdma driver.
> Agreed.
>
> We might still need the early enforcement in Shivam's patch, though.
>

Agreed; I think rdma is susceptible here as well (perhaps also fc?).

Perhaps we can have a helper for nvmet_check_mdts() that would accept
the request after req->transfer_len was initialized.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
  2026-05-11  8:01                           ` Sagi Grimberg
@ 2026-05-11  8:04                             ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2026-05-11  8:04 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Shivam Kumar, mlombard, kch, linux-nvme,
	kbusch, gregkh, security

On Mon, May 11, 2026 at 11:01:18AM +0300, Sagi Grimberg wrote:
> Agreed, I think that rdma is susceptible here as well (perhaps also fc?).
>
> Perhaps we can have a helper for nvmet_check_mdts() that would accept
> the request after req->transfer_len was initialized.

Sounds good.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-05-11  8:04 UTC | newest]

Thread overview: 17+ messages
     [not found] <2026031805-stretch-skid-ae5b@gregkh>
2026-03-19  1:26 ` [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data() Shivam Kumar
2026-03-19  7:59   ` Maurizio Lombardi
2026-03-19 18:00     ` Shivam Kumar
2026-03-20  7:48       ` Christoph Hellwig
2026-04-05 19:46         ` Shivam Kumar
2026-04-07  6:30           ` Christoph Hellwig
2026-04-08  6:30             ` Maurizio Lombardi
2026-04-09  6:08               ` Christoph Hellwig
2026-04-26 23:57                 ` Shivam Kumar
2026-04-27  0:44                 ` [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport Shivam Kumar
2026-04-28  6:07                   ` Maurizio Lombardi
2026-04-29  1:05                     ` Shivam Kumar
2026-05-08 20:39                     ` [PATCH v3] nvmet-tcp: set and enforce a default MDTS " Shivam Kumar
2026-05-10 20:42                       ` Sagi Grimberg
2026-05-11  6:36                         ` Christoph Hellwig
2026-05-11  8:01                           ` Sagi Grimberg
2026-05-11  8:04                             ` Christoph Hellwig
