* [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
[not found] <2026031805-stretch-skid-ae5b@gregkh>
@ 2026-03-19 1:26 ` Shivam Kumar
2026-03-19 7:59 ` Maurizio Lombardi
0 siblings, 1 reply; 7+ messages in thread
From: Shivam Kumar @ 2026-03-19 1:26 UTC (permalink / raw)
To: gregkh; +Cc: security, hch, sagi, kch, linux-nvme, kumar.shivam43666
nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
and passes it to sgl_alloc() for scatter-gather allocation. The only
existing size check (len > inline_data_size) is gated behind
sgl->type == (NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET), meaning any
other SGL type bypasses the validation entirely. A remote attacker can
send a CapsuleCmd with a non-offset SGL type and a large sgl->length
(e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
page allocator WARNING and kernel panic when panic_on_warn is set.
Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
sgl_alloc(). This value is already advertised to the host in the ICResp
maxdata field as the maximum data transfer size.
Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
drivers/nvme/target/tcp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index a456dd2fd8bd..c7253221c342 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
cmd->pdu_len = len;
}
+ if (len > NVMET_TCP_MAXH2CDATA)
+ return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
cmd->req.transfer_len += len;
cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-03-19 1:26 ` [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data() Shivam Kumar
@ 2026-03-19 7:59 ` Maurizio Lombardi
2026-03-19 18:00 ` Shivam Kumar
0 siblings, 1 reply; 7+ messages in thread
From: Maurizio Lombardi @ 2026-03-19 7:59 UTC (permalink / raw)
To: Shivam Kumar, gregkh; +Cc: security, hch, sagi, kch, linux-nvme
On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> and passes it to sgl_alloc() for scatter-gather allocation. The only
> existing size check (len > inline_data_size) is gated behind
> sgl->type == (NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET), meaning any
> other SGL type bypasses the validation entirely. A remote attacker can
> send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> page allocator WARNING and kernel panic when panic_on_warn is set.
>
> Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> sgl_alloc(). This value is already advertised to the host in the ICResp
> maxdata field as the maximum data transfer size.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
> drivers/nvme/target/tcp.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index a456dd2fd8bd..c7253221c342 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
> return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
> cmd->pdu_len = len;
> }
> + if (len > NVMET_TCP_MAXH2CDATA)
> + return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
> cmd->req.transfer_len += len;
>
> cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
I am not sure this patch is correct; the intent is valid, but you are
conflating the total transfer size with the maximum H2CData PDU size.
In other words, if I understood the code correctly:
MAXH2CDATA is the limit of a single H2CData packet's size.
sgl->length is the total amount of data to transfer.
Also, will the request be completed with the "NVME_SC_SGL_INVALID_DATA |
NVME_STATUS_DNR" status code? If yes, this would violate the spec, which
states that an H2CData PDU that is too large should trigger a fatal
transport error.
Maybe the correct fix is to set a sane default for MDTS
(Maximum Data Transfer Size), at the moment it's left to zero (no limit).
Maurizio
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-03-19 7:59 ` Maurizio Lombardi
@ 2026-03-19 18:00 ` Shivam Kumar
2026-03-20 7:48 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Shivam Kumar @ 2026-03-19 18:00 UTC (permalink / raw)
To: Maurizio Lombardi; +Cc: gregkh, security, hch, sagi, kch, linux-nvme
Hi Maurizio
Thank you for the review. You're right, I was conflating the per-PDU
H2CData limit with the total transfer size for the command.
And the error handling should trigger a fatal transport error rather
than completing the request with an NVMe status code.
Would setting a sane default for MDTS be the preferred approach here?
If so, I'm happy to send a v2 implementing that instead.
Thanks,
Shivam
On Thu, Mar 19, 2026 at 3:59 AM Maurizio Lombardi <mlombard@arkamax.eu> wrote:
>
> On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> > nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> > and passes it to sgl_alloc() for scatter-gather allocation. The only
> > existing size check (len > inline_data_size) is gated behind
> > sgl->type == (NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET), meaning any
> > other SGL type bypasses the validation entirely. A remote attacker can
> > send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> > (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> > page allocator WARNING and kernel panic when panic_on_warn is set.
> >
> > Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> > sgl_alloc(). This value is already advertised to the host in the ICResp
> > maxdata field as the maximum data transfer size.
> >
> > Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> > ---
> > drivers/nvme/target/tcp.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> > index a456dd2fd8bd..c7253221c342 100644
> > --- a/drivers/nvme/target/tcp.c
> > +++ b/drivers/nvme/target/tcp.c
> > @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
> > return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
> > cmd->pdu_len = len;
> > }
> > + if (len > NVMET_TCP_MAXH2CDATA)
> > + return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
> > cmd->req.transfer_len += len;
> >
> > cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
>
>
> I am not sure this patch is correct; the intent is valid, but you are
> conflating the total transfer size with the maximum H2CData PDU size.
>
> In other words, if I understood the code correctly:
>
> MAXH2CDATA is the limit of a single H2CData packet's size.
> sgl->length is the total amount of data to transfer.
>
> Also, will the request be completed with the "NVME_SC_SGL_INVALID_DATA |
> NVME_STATUS_DNR" status code? If yes, this would violate the spec, which
> states that an H2CData PDU that is too large should trigger a fatal
> transport error.
>
> Maybe the correct fix is to set a sane default for MDTS
> (Maximum Data Transfer Size), at the moment it's left to zero (no limit).
>
> Maurizio
>
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-03-19 18:00 ` Shivam Kumar
@ 2026-03-20 7:48 ` Christoph Hellwig
2026-04-05 19:46 ` Shivam Kumar
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2026-03-20 7:48 UTC (permalink / raw)
To: Shivam Kumar
Cc: Maurizio Lombardi, gregkh, security, hch, sagi, kch, linux-nvme
On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> Hi Maurizio
>
> Thank you for the review. You're right, I was conflating the per-PDU
> H2CData limit with the total transfer size for the command.
> And the error handling should trigger a fatal transport error rather
> than completing the request with an NVMe status code.
>
> Would setting a sane default for MDTS be the preferred approach here?
> If so, I'm happy to send a v2 implementing that instead.
Yes, if we want to limit the I/O size we have to limit MDTS. Which all
other transports do anyway, so I'm kinda surprised TCP doesn't.
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-03-20 7:48 ` Christoph Hellwig
@ 2026-04-05 19:46 ` Shivam Kumar
2026-04-07 6:30 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Shivam Kumar @ 2026-04-05 19:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme
Hi Christoph, Maurizio
Just following up on this thread.
Would you like me to send a v2 patch that sets a sane default MDTS for
the TCP transport? If so, is there a preferred value I should use, or
should I follow what the other transports do?
Thanks,
Shivam Kumar
On Fri, Mar 20, 2026 at 3:48 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> > Hi Maurizio
> >
> > Thank you for the review. You're right, I was conflating the per-PDU
> > H2CData limit with the total transfer size for the command.
> > And the error handling should trigger a fatal transport error rather
> > than completing the request with an NVMe status code.
> >
> > Would setting a sane default for MDTS be the preferred approach here?
> > If so, I'm happy to send a v2 implementing that instead.
>
> Yes, if we want to limit the I/O size we have to limit MDTS. Which all
> other transports do anyway, so I'm kinda surprised TCP doesn't.
>
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-04-05 19:46 ` Shivam Kumar
@ 2026-04-07 6:30 ` Christoph Hellwig
2026-04-08 6:30 ` Maurizio Lombardi
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2026-04-07 6:30 UTC (permalink / raw)
To: Shivam Kumar
Cc: Christoph Hellwig, Maurizio Lombardi, gregkh, security, sagi, kch,
linux-nvme
On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
> Hi Christoph, Maurizio
>
> Just following up on this thread.
> Would you like me to send a v2 patch that sets a sane default MDTS for
> the TCP transport?
Yes, that would be great!
> If so, is there a preferred value I should use, or
> should I follow what the other transports do?
I'm honestly not sure. Maurizio/Sagi: any good idea what a TCP default
max MDTS might be?
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-04-07 6:30 ` Christoph Hellwig
@ 2026-04-08 6:30 ` Maurizio Lombardi
0 siblings, 0 replies; 7+ messages in thread
From: Maurizio Lombardi @ 2026-04-08 6:30 UTC (permalink / raw)
To: Christoph Hellwig, Shivam Kumar
Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme
On Tue Apr 7, 2026 at 8:30 AM CEST, Christoph Hellwig wrote:
> On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
>> Hi Christoph, Maurizio
>>
>> Just following up on this thread.
>> Would you like me to send a v2 patch that sets a sane default MDTS for
>> the TCP transport?
>
> Yes, that would be great!
>
>> If so, is there a preferred value I should use, or
>> should I follow what the other transports do?
>
> I'm honestly not sure. Maurizio/Sagi: any good idea what a TCP default
> max MDTS might be?
I'm not sure there is a "correct" value here. It seems to mostly come
down to balancing performance vs memory usage, rather than anything
TCP-specific.
For reference, pci-epf allows a max MDTS of 2 MiB [1], while RDMA uses 1 MiB.
Personally, I would pick a default of 2 MiB; it should be large enough
to avoid excessive splitting while still being reasonable from a memory
perspective.
Sagi, any thoughts on that?
I have seen a patch on this mailing list that makes mdts configurable
via configfs, so a user could still change it to a preferred value if
needed.
Maurizio
[1] https://docs.kernel.org/nvme/nvme-pci-endpoint-target.html