* [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
[not found] <2026031805-stretch-skid-ae5b@gregkh>
@ 2026-03-19 1:26 ` Shivam Kumar
2026-03-19 7:59 ` Maurizio Lombardi
0 siblings, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-03-19 1:26 UTC (permalink / raw)
To: gregkh; +Cc: security, hch, sagi, kch, linux-nvme, kumar.shivam43666
nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
and passes it to sgl_alloc() for scatter-gather allocation. The only
existing size check (len > inline_data_size) is gated behind
sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
other SGL type bypasses the validation entirely. A remote attacker can
send a CapsuleCmd with a non-offset SGL type and a large sgl->length
(e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
page allocator WARNING and kernel panic when panic_on_warn is set.
Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
sgl_alloc(). This value is already advertised to the host in the ICResp
maxdata field as the maximum data transfer size.
Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
drivers/nvme/target/tcp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index a456dd2fd8bd..c7253221c342 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
cmd->pdu_len = len;
}
+ if (len > NVMET_TCP_MAXH2CDATA)
+ return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
cmd->req.transfer_len += len;
cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-03-19 1:26 ` [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data() Shivam Kumar
@ 2026-03-19 7:59 ` Maurizio Lombardi
2026-03-19 18:00 ` Shivam Kumar
0 siblings, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2026-03-19 7:59 UTC (permalink / raw)
To: Shivam Kumar, gregkh; +Cc: security, hch, sagi, kch, linux-nvme
On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> and passes it to sgl_alloc() for scatter-gather allocation. The only
> existing size check (len > inline_data_size) is gated behind
> sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
> other SGL type bypasses the validation entirely. A remote attacker can
> send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> page allocator WARNING and kernel panic when panic_on_warn is set.
>
> Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> sgl_alloc(). This value is already advertised to the host in the ICResp
> maxdata field as the maximum data transfer size.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
> drivers/nvme/target/tcp.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index a456dd2fd8bd..c7253221c342 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
> return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
> cmd->pdu_len = len;
> }
> + if (len > NVMET_TCP_MAXH2CDATA)
> + return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
> cmd->req.transfer_len += len;
>
> cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
I am not sure this patch is correct; the intent is valid, but you are conflating
the total transfer size with the maximum size of a single H2CData PDU.
In other words, if I understood the code correctly:
MAXH2CDATA is the limit on the size of a single H2CData PDU.
sgl->length is the total amount of data to transfer.
Also, will the request be completed with the "NVME_SC_SGL_INVALID_DATA |
NVME_STATUS_DNR" status? If so, this would violate the spec, which
states that an oversized H2CData PDU should trigger a fatal transport
error.
Maybe the correct fix is to set a sane default for MDTS
(Maximum Data Transfer Size), at the moment it's left to zero (no limit).
Maurizio
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-03-19 7:59 ` Maurizio Lombardi
@ 2026-03-19 18:00 ` Shivam Kumar
2026-03-20 7:48 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-03-19 18:00 UTC (permalink / raw)
To: Maurizio Lombardi; +Cc: gregkh, security, hch, sagi, kch, linux-nvme
Hi Maurizio
Thank you for the review. You're right, I was conflating the per-PDU
H2CData limit with the total transfer size for the command.
And the error handling should trigger a fatal transport error rather
than completing the request with an NVMe status code.
Would setting a sane default for MDTS be the preferred approach here?
If so, I'm happy to send a v2 implementing that instead.
Thanks,
Shivam
On Thu, Mar 19, 2026 at 3:59 AM Maurizio Lombardi <mlombard@arkamax.eu> wrote:
>
> On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> > nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> > and passes it to sgl_alloc() for scatter-gather allocation. The only
> > existing size check (len > inline_data_size) is gated behind
> > sgl->type == NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET, meaning any
> > other SGL type bypasses the validation entirely. A remote attacker can
> > send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> > (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> > page allocator WARNING and kernel panic when panic_on_warn is set.
> >
> > Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> > sgl_alloc(). This value is already advertised to the host in the ICResp
> > maxdata field as the maximum data transfer size.
> >
> > Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> > ---
> > drivers/nvme/target/tcp.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> > index a456dd2fd8bd..c7253221c342 100644
> > --- a/drivers/nvme/target/tcp.c
> > +++ b/drivers/nvme/target/tcp.c
> > @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
> > return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
> > cmd->pdu_len = len;
> > }
> > + if (len > NVMET_TCP_MAXH2CDATA)
> > + return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
> > cmd->req.transfer_len += len;
> >
> > cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
>
>
> I am not sure this patch is correct; the intent is valid, but you are conflating
> the total transfer size with the maximum size of a single H2CData PDU.
>
> In other words, if I understood the code correctly:
>
> MAXH2CDATA is the limit on the size of a single H2CData PDU.
> sgl->length is the total amount of data to transfer.
>
> Also, will the request be completed with the "NVME_SC_SGL_INVALID_DATA |
> NVME_STATUS_DNR" status? If so, this would violate the spec, which
> states that an oversized H2CData PDU should trigger a fatal transport
> error.
>
> Maybe the correct fix is to set a sane default for MDTS
> (Maximum Data Transfer Size), at the moment it's left to zero (no limit).
>
> Maurizio
>
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-03-19 18:00 ` Shivam Kumar
@ 2026-03-20 7:48 ` Christoph Hellwig
2026-04-05 19:46 ` Shivam Kumar
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2026-03-20 7:48 UTC (permalink / raw)
To: Shivam Kumar
Cc: Maurizio Lombardi, gregkh, security, hch, sagi, kch, linux-nvme
On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> Hi Maurizio
>
> Thank you for the review. You're right, I was conflating the per-PDU
> H2CData limit with the total transfer size for the command.
> And the error handling should trigger a fatal transport error rather
> than completing the request with an NVMe status code.
>
> Would setting a sane default for MDTS be the preferred approach here?
> If so, I'm happy to send a v2 implementing that instead.
Yes, if we want to limit the I/O size we have to limit MDTS. Which all
other transports do anyway, so I'm kinda surprised TCP doesn't.
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-03-20 7:48 ` Christoph Hellwig
@ 2026-04-05 19:46 ` Shivam Kumar
2026-04-07 6:30 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-04-05 19:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme
Hi Christoph, Maurizio
Just following up on this thread.
Would you like me to send a v2 patch that sets a sane default MDTS for
the TCP transport? If so, is there a preferred value I should use, or
should I follow what the other transports do?
Thanks,
Shivam Kumar
On Fri, Mar 20, 2026 at 3:48 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> > Hi Maurizio
> >
> > Thank you for the review. You're right, I was conflating the per-PDU
> > H2CData limit with the total transfer size for the command.
> > And the error handling should trigger a fatal transport error rather
> > than completing the request with an NVMe status code.
> >
> > Would setting a sane default for MDTS be the preferred approach here?
> > If so, I'm happy to send a v2 implementing that instead.
>
> Yes, if we want to limit the I/O size we have to limit MDTS. Which all
> other transports do anyway, so I'm kinda surprised TCP doesn't.
>
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-04-05 19:46 ` Shivam Kumar
@ 2026-04-07 6:30 ` Christoph Hellwig
2026-04-08 6:30 ` Maurizio Lombardi
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2026-04-07 6:30 UTC (permalink / raw)
To: Shivam Kumar
Cc: Christoph Hellwig, Maurizio Lombardi, gregkh, security, sagi, kch,
linux-nvme
On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
> Hi Christoph, Maurizio
>
> Just following up on this thread.
> Would you like me to send a v2 patch that sets a sane default MDTS for
> the TCP transport?
Yes, that would be great!
> If so, is there a preferred value I should use, or
> should I follow what the other transports do?
I'm honestly not sure. Maurizio/Sagi: any good idea what a TCP default
max MDTS might be?
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-04-07 6:30 ` Christoph Hellwig
@ 2026-04-08 6:30 ` Maurizio Lombardi
2026-04-09 6:08 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2026-04-08 6:30 UTC (permalink / raw)
To: Christoph Hellwig, Shivam Kumar
Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme
On Tue Apr 7, 2026 at 8:30 AM CEST, Christoph Hellwig wrote:
> On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
>> Hi Christoph, Maurizio
>>
>> Just following up on this thread.
>> Would you like me to send a v2 patch that sets a sane default MDTS for
>> the TCP transport?
>
> Yes, that would be great!
>
>> If so, is there a preferred value I should use, or
>> should I follow what the other transports do?
>
> I'm honestly not sure. Maurizio/Sagi: any good idea what a TCP default
> max MDTS might be?
I'm not sure there is a "correct" value here. It seems to mostly come
down to balancing performance vs memory usage, rather than anything
TCP-specific.
For reference, pci-epf allows a max MDTS of 2 MiB[1], while RDMA uses 1 MiB.
Personally, I would pick a default of 2 MiB, it should be large
enough to avoid excessive splitting while still being reasonable from a
memory perspective.
Sagi? Any thought on that?
I have seen a patch on this mailing list that makes mdts configurable
via configfs, so in case of need a user could still change it
to his preferred value.
Maurizio
[1] https://docs.kernel.org/nvme/nvme-pci-endpoint-target.html
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-04-08 6:30 ` Maurizio Lombardi
@ 2026-04-09 6:08 ` Christoph Hellwig
2026-04-26 23:57 ` Shivam Kumar
2026-04-27 0:44 ` [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport Shivam Kumar
0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2026-04-09 6:08 UTC (permalink / raw)
To: Maurizio Lombardi
Cc: Christoph Hellwig, Shivam Kumar, gregkh, security, sagi, kch,
linux-nvme
On Wed, Apr 08, 2026 at 08:30:12AM +0200, Maurizio Lombardi wrote:
> Personally, I would pick a default of 2 MiB, it should be large
> enough to avoid excessive splitting while still being reasonable from a
> memory perspective.
>
> Sagi? Any thought on that?
>
> I have seen a patch on this mailing list that makes mdts configurable
> via configfs, so in case of need a user could still change it
> to his preferred value.
Right now that just allows adjusting down, so picking the larger
value sounds sane. We can still increase it later and/or allow
adjusting above the default to an upper cap if needed.
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
2026-04-09 6:08 ` Christoph Hellwig
@ 2026-04-26 23:57 ` Shivam Kumar
2026-04-27 0:44 ` [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport Shivam Kumar
1 sibling, 0 replies; 17+ messages in thread
From: Shivam Kumar @ 2026-04-26 23:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme
Hi all,
Gentle ping on this. Is there anything else needed from my side?
Or is a consensus forming around the 2 MiB default?
Happy to send an updated patch if that helps move things along.
Thanks,
Shivam
On Thu, Apr 9, 2026 at 2:08 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Apr 08, 2026 at 08:30:12AM +0200, Maurizio Lombardi wrote:
> > Personally, I would pick a default of 2 MiB, it should be large
> > enough to avoid excessive splitting while still being reasonable from a
> > memory perspective.
> >
> > Sagi? Any thought on that?
> >
> > I have seen a patch on this mailing list that makes mdts configurable
> > via configfs, so in case of need a user could still change it
> > to his preferred value.
>
> Right now that just allows adjusting down, so picking the larger
> value sounds sane. We can still increase it later and/or allow
> adjusting above the default to an upper cap if needed.
>
* [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport
2026-04-09 6:08 ` Christoph Hellwig
2026-04-26 23:57 ` Shivam Kumar
@ 2026-04-27 0:44 ` Shivam Kumar
2026-04-28 6:07 ` Maurizio Lombardi
1 sibling, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-04-27 0:44 UTC (permalink / raw)
To: hch; +Cc: mlombard, sagi, kch, linux-nvme, kbusch, gregkh, security,
Shivam Kumar
Unlike other fabrics transports, the TCP target does not set a default
Maximum Data Transfer Size. With the configfs MDTS entry defaulting to 0
(no limit), a remote attacker can send a CapsuleCmd with an arbitrarily
large SGL length, causing sgl_alloc() in nvmet_tcp_map_data() to attempt
an excessive kernel allocation that triggers the OOM killer.
Set a default MDTS of 9 (2 MiB) for TCP. Admins can still adjust via
the configfs mdts attribute if needed.
Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
drivers/nvme/target/tcp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index a456dd2fd8bd..d09c81d07a1d 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -25,6 +25,7 @@
#define NVMET_TCP_DEF_INLINE_DATA_SIZE (4 * PAGE_SIZE)
#define NVMET_TCP_MAXH2CDATA 0x400000 /* 16M arbitrary limit */
#define NVMET_TCP_BACKLOG 128
+#define NVMET_TCP_DEF_MDTS 9 /* 2 MiB (2^(12+9)) */
static int param_store_val(const char *str, int *val, int min, int max)
{
@@ -2067,6 +2068,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
INIT_WORK(&port->accept_work, nvmet_tcp_accept_work);
if (port->nport->inline_data_size < 0)
port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
+ if (nport->mdts < 0)
+ nport->mdts = NVMET_TCP_DEF_MDTS;
ret = sock_create(port->addr.ss_family, SOCK_STREAM,
IPPROTO_TCP, &port->sock);
--
2.34.1
* Re: [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport
2026-04-27 0:44 ` [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport Shivam Kumar
@ 2026-04-28 6:07 ` Maurizio Lombardi
2026-04-29 1:05 ` Shivam Kumar
2026-05-08 20:39 ` [PATCH v3] nvmet-tcp: set and enforce a default MDTS " Shivam Kumar
0 siblings, 2 replies; 17+ messages in thread
From: Maurizio Lombardi @ 2026-04-28 6:07 UTC (permalink / raw)
To: Shivam Kumar, hch
Cc: mlombard, sagi, kch, linux-nvme, kbusch, gregkh, security
On Mon Apr 27, 2026 at 2:44 AM CEST, Shivam Kumar wrote:
> Unlike other fabrics transports, the TCP target does not set a default
> Maximum Data Transfer Size. With the configfs MDTS entry defaulting to 0
> (no limit), a remote attacker can send a CapsuleCmd with an arbitrarily
> large SGL length, causing sgl_alloc() in nvmet_tcp_map_data() to attempt
> an excessive kernel allocation that triggers the OOM killer.
>
> Set a default MDTS of 9 (2 MiB) for TCP. Admins can still adjust via
> the configfs mdts attribute if needed.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
> drivers/nvme/target/tcp.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index a456dd2fd8bd..d09c81d07a1d 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -25,6 +25,7 @@
> #define NVMET_TCP_DEF_INLINE_DATA_SIZE (4 * PAGE_SIZE)
> #define NVMET_TCP_MAXH2CDATA 0x400000 /* 16M arbitrary limit */
> #define NVMET_TCP_BACKLOG 128
> +#define NVMET_TCP_DEF_MDTS 9 /* 2 MiB (2^(12+9)) */
>
> static int param_store_val(const char *str, int *val, int min, int max)
> {
> @@ -2067,6 +2068,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
> INIT_WORK(&port->accept_work, nvmet_tcp_accept_work);
> if (port->nport->inline_data_size < 0)
> port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
> + if (nport->mdts < 0)
> + nport->mdts = NVMET_TCP_DEF_MDTS;
>
> ret = sock_create(port->addr.ss_family, SOCK_STREAM,
> IPPROTO_TCP, &port->sock);
Ok, but what happens if a host doesn't respect the limitation advertised
by the target?
The spec explicitly says that "If a command is submitted that exceeds this transfer
size, then the command is aborted with a status code of Invalid Field in
Command."
Maurizio
* Re: [PATCH v2] nvmet-tcp: set a default MDTS of 2 MiB for TCP transport
2026-04-28 6:07 ` Maurizio Lombardi
@ 2026-04-29 1:05 ` Shivam Kumar
2026-05-08 20:39 ` [PATCH v3] nvmet-tcp: set and enforce a default MDTS " Shivam Kumar
1 sibling, 0 replies; 17+ messages in thread
From: Shivam Kumar @ 2026-04-29 1:05 UTC (permalink / raw)
To: Maurizio Lombardi; +Cc: hch, sagi, kch, linux-nvme, kbusch, gregkh, security
Hi Maurizio,
I'll add enforcement to reject commands exceeding the MDTS limit and send a v3.
Thanks,
Shivam Kumar
* [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
2026-04-28 6:07 ` Maurizio Lombardi
2026-04-29 1:05 ` Shivam Kumar
@ 2026-05-08 20:39 ` Shivam Kumar
2026-05-10 20:42 ` Sagi Grimberg
1 sibling, 1 reply; 17+ messages in thread
From: Shivam Kumar @ 2026-05-08 20:39 UTC (permalink / raw)
To: hch; +Cc: mlombard, sagi, kch, linux-nvme, kbusch, gregkh, security,
Shivam Kumar
Unlike other fabrics transports, the TCP target does not set a default
Maximum Data Transfer Size. With the configfs MDTS entry defaulting to 0
(no limit), a remote attacker can send a CapsuleCmd with an arbitrarily
large SGL length, causing sgl_alloc() in nvmet_tcp_map_data() to attempt
an excessive kernel allocation that triggers the OOM killer.
Set a default MDTS of 9 (2 MiB) for TCP. Enforce the limit server-side
in nvmet_tcp_map_data() by rejecting commands whose SGL length exceeds
the configured MDTS, returning NVME_SC_INVALID_FIELD as required by the
NVMe specification. Admins can still adjust via the configfs mdts
attribute if needed.
Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
drivers/nvme/target/tcp.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 164a564ba3b4..098cf877c358 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -25,6 +25,7 @@
#define NVMET_TCP_DEF_INLINE_DATA_SIZE (4 * PAGE_SIZE)
#define NVMET_TCP_MAXH2CDATA 0x400000 /* 16M arbitrary limit */
#define NVMET_TCP_BACKLOG 128
+#define NVMET_TCP_DEF_MDTS 9 /* 2 MiB (2^(12+9)) */
static int param_store_val(const char *str, int *val, int min, int max)
{
@@ -422,6 +423,13 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
if (!len)
return 0;
+ /* Enforce MDTS: abort commands exceeding the advertised limit */
+ if (cmd->req.port->mdts) {
+ u8 mdts = cmd->req.port->mdts;
+ if (mdts < 20 && len > (1U << (12 + mdts)))
+ return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
+ }
+
if (sgl->type == ((NVME_SGL_FMT_DATA_DESC << 4) |
NVME_SGL_FMT_OFFSET)) {
if (!nvme_is_write(cmd->req.cmd))
@@ -2077,6 +2085,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
INIT_WORK(&port->accept_work, nvmet_tcp_accept_work);
if (port->nport->inline_data_size < 0)
port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
+ if (nport->mdts < 0)
+ nport->mdts = NVMET_TCP_DEF_MDTS;
ret = sock_create(port->addr.ss_family, SOCK_STREAM,
IPPROTO_TCP, &port->sock);
--
2.34.1
* Re: [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
2026-05-08 20:39 ` [PATCH v3] nvmet-tcp: set and enforce a default MDTS " Shivam Kumar
@ 2026-05-10 20:42 ` Sagi Grimberg
2026-05-11 6:36 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Sagi Grimberg @ 2026-05-10 20:42 UTC (permalink / raw)
To: Shivam Kumar, hch; +Cc: mlombard, kch, linux-nvme, kbusch, gregkh, security
On 08/05/2026 23:39, Shivam Kumar wrote:
> Unlike other fabrics transports, the TCP target does not set a default
> Maximum Data Transfer Size. With the configfs MDTS entry defaulting to 0
> (no limit), a remote attacker can send a CapsuleCmd with an arbitrarily
> large SGL length, causing sgl_alloc() in nvmet_tcp_map_data() to attempt
> an excessive kernel allocation that triggers the OOM killer.
>
> Set a default MDTS of 9 (2 MiB) for TCP. Enforce the limit server-side
> in nvmet_tcp_map_data() by rejecting commands whose SGL length exceeds
> the configured MDTS, returning NVME_SC_INVALID_FIELD as required by the
> NVMe specification. Admins can still adjust via the configfs mdts
> attribute if needed.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
> drivers/nvme/target/tcp.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 164a564ba3b4..098cf877c358 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -25,6 +25,7 @@
> #define NVMET_TCP_DEF_INLINE_DATA_SIZE (4 * PAGE_SIZE)
> #define NVMET_TCP_MAXH2CDATA 0x400000 /* 16M arbitrary limit */
> #define NVMET_TCP_BACKLOG 128
> +#define NVMET_TCP_DEF_MDTS 9 /* 2 MiB (2^(12+9)) */
>
> static int param_store_val(const char *str, int *val, int min, int max)
> {
> @@ -422,6 +423,13 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
> if (!len)
> return 0;
>
> + /* Enforce MDTS: abort commands exceeding the advertised limit */
> + if (cmd->req.port->mdts) {
> + u8 mdts = cmd->req.port->mdts;
> + if (mdts < 20 && len > (1U << (12 + mdts)))
> + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
> + }
> +
> if (sgl->type == ((NVME_SGL_FMT_DATA_DESC << 4) |
> NVME_SGL_FMT_OFFSET)) {
> if (!nvme_is_write(cmd->req.cmd))
> @@ -2077,6 +2085,8 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
> INIT_WORK(&port->accept_work, nvmet_tcp_accept_work);
> if (port->nport->inline_data_size < 0)
> port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
> + if (nport->mdts < 0)
> + nport->mdts = NVMET_TCP_DEF_MDTS;
>
> ret = sock_create(port->addr.ss_family, SOCK_STREAM,
> IPPROTO_TCP, &port->sock);
Shivam,
I think what we want is to cap the TCP transport at a sane limit,
similar to the nvmet-rdma driver.
Also, we probably want it at least consistent with the (rather
arbitrary) MAXH2CDATA...
e.g. something like:
--
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 20f150d17a96..32f33e5dbfdb 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -24,6 +24,7 @@
#define NVMET_TCP_DEF_INLINE_DATA_SIZE (4 * PAGE_SIZE)
#define NVMET_TCP_MAXH2CDATA 0x400000 /* 16M arbitrary limit */
+#define NVMET_TCP_MAX_MDTS 12
#define NVMET_TCP_BACKLOG 128
static int param_store_val(const char *str, int *val, int min, int max)
@@ -2220,6 +2221,11 @@ static ssize_t nvmet_tcp_host_port_addr(struct nvmet_ctrl *ctrl,
(struct sockaddr *)&queue->sockaddr_peer);
}
+static u8 nvmet_tcp_get_mdts(const struct nvmet_ctrl *ctrl)
+{
+ return NVMET_TCP_MAX_MDTS;
+}
+
static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
.owner = THIS_MODULE,
.type = NVMF_TRTYPE_TCP,
@@ -2231,6 +2237,7 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
.install_queue = nvmet_tcp_install_queue,
.disc_traddr = nvmet_tcp_disc_port_addr,
.host_traddr = nvmet_tcp_host_port_addr,
+ .get_mdts = nvmet_tcp_get_mdts,
};
static int __init nvmet_tcp_init(void)
--
* Re: [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
2026-05-10 20:42 ` Sagi Grimberg
@ 2026-05-11 6:36 ` Christoph Hellwig
2026-05-11 8:01 ` Sagi Grimberg
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2026-05-11 6:36 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Shivam Kumar, hch, mlombard, kch, linux-nvme, kbusch, gregkh,
security
On Sun, May 10, 2026 at 11:42:48PM +0300, Sagi Grimberg wrote:
> I think what we want is to cap the TCP transport at a sane limit,
> similar to the nvmet-rdma driver.
Agreed.
We might still need the early enforcement in Shivam's patch, though.
* Re: [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
2026-05-11 6:36 ` Christoph Hellwig
@ 2026-05-11 8:01 ` Sagi Grimberg
2026-05-11 8:04 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Sagi Grimberg @ 2026-05-11 8:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Shivam Kumar, mlombard, kch, linux-nvme, kbusch, gregkh, security
On 11/05/2026 9:36, Christoph Hellwig wrote:
> On Sun, May 10, 2026 at 11:42:48PM +0300, Sagi Grimberg wrote:
>> I think what we want is to cap the TCP transport at a sane limit,
>> similar to the nvmet-rdma driver.
> Agreed.
>
> We might still need the early enforcement in Shivam's patch, though.
>
Agreed, I think that rdma is susceptible here as well (perhaps also fc?).
Perhaps we can have a helper for nvmet_check_mdts() that would accept
the request after req->transfer_len was initialized.
* Re: [PATCH v3] nvmet-tcp: set and enforce a default MDTS for TCP transport
2026-05-11 8:01 ` Sagi Grimberg
@ 2026-05-11 8:04 ` Christoph Hellwig
0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2026-05-11 8:04 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Shivam Kumar, mlombard, kch, linux-nvme,
kbusch, gregkh, security
On Mon, May 11, 2026 at 11:01:18AM +0300, Sagi Grimberg wrote:
> Agreed, I think that rdma is susceptible here as well (perhaps also fc?).
>
> Perhaps we can have a helper for nvmet_check_mdts() that would accept
> the request after req->transfer_len was initialized.
Sounds good.