public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
       [not found] <2026031805-stretch-skid-ae5b@gregkh>
@ 2026-03-19  1:26 ` Shivam Kumar
  2026-03-19  7:59   ` Maurizio Lombardi
  0 siblings, 1 reply; 7+ messages in thread
From: Shivam Kumar @ 2026-03-19  1:26 UTC (permalink / raw)
  To: gregkh; +Cc: security, hch, sagi, kch, linux-nvme, kumar.shivam43666

nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
and passes it to sgl_alloc() for scatter-gather allocation. The only
existing size check (len > inline_data_size) is gated behind
sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
other SGL type bypasses the validation entirely. A remote attacker can
send a CapsuleCmd with a non-offset SGL type and a large sgl->length
(e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
page allocator WARNING and kernel panic when panic_on_warn is set.

Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
sgl_alloc(). This value is already advertised to the host in the ICResp
maxdata field as the maximum data transfer size.

Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
 drivers/nvme/target/tcp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index a456dd2fd8bd..c7253221c342 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
 			return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
 		cmd->pdu_len = len;
 	}
+	if (len > NVMET_TCP_MAXH2CDATA)
+		return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
 	cmd->req.transfer_len += len;
 
 	cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-03-19  1:26 ` [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data() Shivam Kumar
@ 2026-03-19  7:59   ` Maurizio Lombardi
  2026-03-19 18:00     ` Shivam Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Maurizio Lombardi @ 2026-03-19  7:59 UTC (permalink / raw)
  To: Shivam Kumar, gregkh; +Cc: security, hch, sagi, kch, linux-nvme

On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> and passes it to sgl_alloc() for scatter-gather allocation. The only
> existing size check (len > inline_data_size) is gated behind
> sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
> other SGL type bypasses the validation entirely. A remote attacker can
> send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> page allocator WARNING and kernel panic when panic_on_warn is set.
>
> Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> sgl_alloc(). This value is already advertised to the host in the ICResp
> maxdata field as the maximum data transfer size.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
>  drivers/nvme/target/tcp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index a456dd2fd8bd..c7253221c342 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
>  			return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
>  		cmd->pdu_len = len;
>  	}
> +	if (len > NVMET_TCP_MAXH2CDATA)
> +		return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
>  	cmd->req.transfer_len += len;
>  
>  	cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);


I am not sure this patch is correct: the intent is valid, but you are conflating
the total transfer size with the H2CData maximum PDU size.

In other words, if I understood the code correctly:

MAXH2CDATA is the limit of a single H2CData packet's size.
sgl->length is the total amount of data to transfer.

Also, will the request be completed with the "NVME_SC_SGL_INVALID_DATA |
NVME_STATUS_DNR" status code? If so, this would violate the spec, which
states that an H2CData PDU with too large a size should trigger a fatal
transport error.

Maybe the correct fix is to set a sane default for MDTS
(Maximum Data Transfer Size); at the moment it is left at zero (no limit).

Maurizio



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-03-19  7:59   ` Maurizio Lombardi
@ 2026-03-19 18:00     ` Shivam Kumar
  2026-03-20  7:48       ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Shivam Kumar @ 2026-03-19 18:00 UTC (permalink / raw)
  To: Maurizio Lombardi; +Cc: gregkh, security, hch, sagi, kch, linux-nvme

Hi Maurizio

Thank you for the review. You're right, I was conflating the per-PDU
H2CData limit with the total transfer size for the command.
And the error handling should trigger a fatal transport error rather
than completing the request with an NVMe status code.

Would setting a sane default for MDTS be the preferred approach here?
If so, I'm happy to send a v2 implementing that instead.

Thanks,
Shivam

On Thu, Mar 19, 2026 at 3:59 AM Maurizio Lombardi <mlombard@arkamax.eu> wrote:
>
> On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> > nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> > and passes it to sgl_alloc() for scatter-gather allocation. The only
> > existing size check (len > inline_data_size) is gated behind
> > sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
> > other SGL type bypasses the validation entirely. A remote attacker can
> > send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> > (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> > page allocator WARNING and kernel panic when panic_on_warn is set.
> >
> > Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> > sgl_alloc(). This value is already advertised to the host in the ICResp
> > maxdata field as the maximum data transfer size.
> >
> > Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> > ---
> >  drivers/nvme/target/tcp.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> > index a456dd2fd8bd..c7253221c342 100644
> > --- a/drivers/nvme/target/tcp.c
> > +++ b/drivers/nvme/target/tcp.c
> > @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
> >                       return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
> >               cmd->pdu_len = len;
> >       }
> > +     if (len > NVMET_TCP_MAXH2CDATA)
> > +             return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
> >       cmd->req.transfer_len += len;
> >
> >       cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
>
>
> I am not sure this patch is correct: the intent is valid, but you are conflating
> the total transfer size with the H2CData maximum PDU size.
>
> In other words, if I understood the code correctly:
>
> MAXH2CDATA is the limit of a single H2CData packet's size.
> sgl->length is the total amount of data to transfer.
>
> Also, will the request be completed with the "NVME_SC_SGL_INVALID_DATA |
> NVME_STATUS_DNR" status code? If so, this would violate the spec, which
> states that an H2CData PDU with too large a size should trigger a fatal
> transport error.
>
> Maybe the correct fix is to set a sane default for MDTS
> (Maximum Data Transfer Size); at the moment it is left at zero (no limit).
>
> Maurizio
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-03-19 18:00     ` Shivam Kumar
@ 2026-03-20  7:48       ` Christoph Hellwig
  2026-04-05 19:46         ` Shivam Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2026-03-20  7:48 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Maurizio Lombardi, gregkh, security, hch, sagi, kch, linux-nvme

On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> Hi Maurizio
> 
> Thank you for the review. You're right, I was conflating the per-PDU
> H2CData limit with the total transfer size for the command.
> And the error handling should trigger a fatal transport error rather
> than completing the request with an NVMe status code.
> 
> Would setting a sane default for MDTS be the preferred approach here?
> If so, I'm happy to send a v2 implementing that instead.

Yes, if we want to limit the I/O size we have to limit MDTS, which all
the other transports do anyway, so I'm kinda surprised TCP doesn't.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-03-20  7:48       ` Christoph Hellwig
@ 2026-04-05 19:46         ` Shivam Kumar
  2026-04-07  6:30           ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Shivam Kumar @ 2026-04-05 19:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme

Hi Christoph, Maurizio

Just following up on this thread.
Would you like me to send a v2 patch that sets a sane default MDTS for
the TCP transport? If so, is there a preferred value I should use, or
should I follow what the other transports do?

Thanks,
Shivam Kumar

On Fri, Mar 20, 2026 at 3:48 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> > Hi Maurizio
> >
> > Thank you for the review. You're right, I was conflating the per-PDU
> > H2CData limit with the total transfer size for the command.
> > And the error handling should trigger a fatal transport error rather
> > than completing the request with an NVMe status code.
> >
> > Would setting a sane default for MDTS be the preferred approach here?
> > If so, I'm happy to send a v2 implementing that instead.
>
> Yes, if we want to limit the I/O size we have to limit MDTS, which all
> the other transports do anyway, so I'm kinda surprised TCP doesn't.
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-04-05 19:46         ` Shivam Kumar
@ 2026-04-07  6:30           ` Christoph Hellwig
  2026-04-08  6:30             ` Maurizio Lombardi
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2026-04-07  6:30 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Christoph Hellwig, Maurizio Lombardi, gregkh, security, sagi, kch,
	linux-nvme

On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
> Hi Christoph, Maurizio
> 
> Just following up on this thread.
> Would you like me to send a v2 patch that sets a sane default MDTS for
> the TCP transport?

Yes, that would be great!

> If so, is there a preferred value I should use, or
> should I follow what the other transports do?

I'm honestly not sure.  Maurizio/Sagi: any good idea what a TCP default
max MDTS might be?



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  2026-04-07  6:30           ` Christoph Hellwig
@ 2026-04-08  6:30             ` Maurizio Lombardi
  0 siblings, 0 replies; 7+ messages in thread
From: Maurizio Lombardi @ 2026-04-08  6:30 UTC (permalink / raw)
  To: Christoph Hellwig, Shivam Kumar
  Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme

On Tue Apr 7, 2026 at 8:30 AM CEST, Christoph Hellwig wrote:
> On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
>> Hi Christoph, Maurizio
>> 
>> Just following up on this thread.
>> Would you like me to send a v2 patch that sets a sane default MDTS for
>> the TCP transport?
>
> Yes, that would be great!
>
>> If so, is there a preferred value I should use, or
>> should I follow what the other transports do?
>
> I'm honestly not sure.  Maurizio/Sagi: any good idea what a TCP default
> max MDTS might be?

I'm not sure there is a "correct" value here. It seems to mostly come
down to balancing performance vs memory usage, rather than anything
TCP-specific.

For reference, pci-epf allows a max MDTS of 2 MiB[1], while RDMA uses 1 MiB.

Personally, I would pick a default of 2 MiB; it should be large
enough to avoid excessive splitting while still being reasonable from a
memory perspective.

Sagi, any thoughts on that?

I have seen a patch on this mailing list that makes mdts configurable
via configfs, so a user could still change it to their preferred value
if needed.

Maurizio

[1] https://docs.kernel.org/nvme/nvme-pci-endpoint-target.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-04-08  6:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <2026031805-stretch-skid-ae5b@gregkh>
2026-03-19  1:26 ` [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data() Shivam Kumar
2026-03-19  7:59   ` Maurizio Lombardi
2026-03-19 18:00     ` Shivam Kumar
2026-03-20  7:48       ` Christoph Hellwig
2026-04-05 19:46         ` Shivam Kumar
2026-04-07  6:30           ` Christoph Hellwig
2026-04-08  6:30             ` Maurizio Lombardi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox