* [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  From: Shivam Kumar
  Date: 2026-03-19  1:26 UTC
  To: gregkh
  Cc: security, hch, sagi, kch, linux-nvme, kumar.shivam43666

nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
and passes it to sgl_alloc() for scatter-gather allocation. The only
existing size check (len > inline_data_size) is gated behind
sgl->type == (NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET), meaning any
other SGL type bypasses the validation entirely. A remote attacker can
send a CapsuleCmd with a non-offset SGL type and a large sgl->length
(e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
page allocator WARNING and a kernel panic when panic_on_warn is set.

Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
sgl_alloc(). This value is already advertised to the host in the ICResp
maxdata field as the maximum data transfer size.

Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
---
 drivers/nvme/target/tcp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index a456dd2fd8bd..c7253221c342 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
 			return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
 		cmd->pdu_len = len;
 	}
+	if (len > NVMET_TCP_MAXH2CDATA)
+		return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
 	cmd->req.transfer_len += len;

 	cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
--
2.34.1
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  From: Maurizio Lombardi
  Date: 2026-03-19  7:59 UTC
  To: Shivam Kumar, gregkh
  Cc: security, hch, sagi, kch, linux-nvme

On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> and passes it to sgl_alloc() for scatter-gather allocation. The only
> existing size check (len > inline_data_size) is gated behind
> sgl->type == (NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET), meaning any
> other SGL type bypasses the validation entirely. A remote attacker can
> send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> page allocator WARNING and a kernel panic when panic_on_warn is set.
>
> Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> sgl_alloc(). This value is already advertised to the host in the ICResp
> maxdata field as the maximum data transfer size.
>
> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> ---
>  drivers/nvme/target/tcp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index a456dd2fd8bd..c7253221c342 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
>  			return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
>  		cmd->pdu_len = len;
>  	}
> +	if (len > NVMET_TCP_MAXH2CDATA)
> +		return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
>  	cmd->req.transfer_len += len;
>
>  	cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);

I am not sure this patch is correct. The intent is valid, but you are
conflating the total transfer size with the maximum H2CData PDU size.

In other words, if I understood the code correctly:

MAXH2CDATA is the size limit of a single H2CData PDU.
sgl->length is the total amount of data to transfer.

Also, will the request be completed with "NVME_SC_SGL_INVALID_DATA |
NVME_STATUS_DNR"? If so, that would be a violation of the spec, which
states that an H2CData PDU with a too-large size should trigger a
fatal transport error.

Maybe the correct fix is to set a sane default for MDTS (Maximum Data
Transfer Size); at the moment it is left at zero (no limit).

Maurizio
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  From: Shivam Kumar
  Date: 2026-03-19 18:00 UTC
  To: Maurizio Lombardi
  Cc: gregkh, security, hch, sagi, kch, linux-nvme

Hi Maurizio,

Thank you for the review. You're right, I was conflating the per-PDU
H2CData limit with the total transfer size for the command, and the
error handling should trigger a fatal transport error rather than
completing the request with an NVMe status code.

Would setting a sane default for MDTS be the preferred approach here?
If so, I'm happy to send a v2 implementing that instead.

Thanks,
Shivam

On Thu, Mar 19, 2026 at 3:59 AM Maurizio Lombardi <mlombard@arkamax.eu> wrote:
>
> On Thu Mar 19, 2026 at 2:26 AM CET, Shivam Kumar wrote:
> > nvmet_tcp_map_data() reads the SGL length from the incoming NVMe command
> > and passes it to sgl_alloc() for scatter-gather allocation. The only
> > existing size check (len > inline_data_size) is gated behind
> > sgl->type == (NVME_SGL_FMT_DATA_DESC | NVME_SGL_FMT_OFFSET), meaning any
> > other SGL type bypasses the validation entirely. A remote attacker can
> > send a CapsuleCmd with a non-offset SGL type and a large sgl->length
> > (e.g. ~3 GB) to trigger an excessive kernel allocation, resulting in a
> > page allocator WARNING and a kernel panic when panic_on_warn is set.
> >
> > Add an upper-bound check against NVMET_TCP_MAXH2CDATA before calling
> > sgl_alloc(). This value is already advertised to the host in the ICResp
> > maxdata field as the maximum data transfer size.
> >
> > Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
> > ---
> >  drivers/nvme/target/tcp.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> > index a456dd2fd8bd..c7253221c342 100644
> > --- a/drivers/nvme/target/tcp.c
> > +++ b/drivers/nvme/target/tcp.c
> > @@ -431,6 +431,8 @@ static int nvmet_tcp_map_data(struct nvmet_tcp_cmd *cmd)
> >  			return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
> >  		cmd->pdu_len = len;
> >  	}
> > +	if (len > NVMET_TCP_MAXH2CDATA)
> > +		return NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR;
> >  	cmd->req.transfer_len += len;
> >
> >  	cmd->req.sg = sgl_alloc(len, GFP_KERNEL, &cmd->req.sg_cnt);
>
> I am not sure this patch is correct. The intent is valid, but you are
> conflating the total transfer size with the maximum H2CData PDU size.
>
> In other words, if I understood the code correctly:
>
> MAXH2CDATA is the size limit of a single H2CData PDU.
> sgl->length is the total amount of data to transfer.
>
> Also, will the request be completed with "NVME_SC_SGL_INVALID_DATA |
> NVME_STATUS_DNR"? If so, that would be a violation of the spec, which
> states that an H2CData PDU with a too-large size should trigger a
> fatal transport error.
>
> Maybe the correct fix is to set a sane default for MDTS (Maximum Data
> Transfer Size); at the moment it is left at zero (no limit).
>
> Maurizio
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  From: Christoph Hellwig
  Date: 2026-03-20  7:48 UTC
  To: Shivam Kumar
  Cc: Maurizio Lombardi, gregkh, security, hch, sagi, kch, linux-nvme

On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> Hi Maurizio,
>
> Thank you for the review. You're right, I was conflating the per-PDU
> H2CData limit with the total transfer size for the command, and the
> error handling should trigger a fatal transport error rather than
> completing the request with an NVMe status code.
>
> Would setting a sane default for MDTS be the preferred approach here?
> If so, I'm happy to send a v2 implementing that instead.

Yes, if we want to limit the I/O size we have to limit MDTS, which all
the other transports do anyway, so I'm kinda surprised TCP doesn't.
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  From: Shivam Kumar
  Date: 2026-04-05 19:46 UTC
  To: Christoph Hellwig
  Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme

Hi Christoph, Maurizio,

Just following up on this thread. Would you like me to send a v2 patch
that sets a sane default MDTS for the TCP transport? If so, is there a
preferred value I should use, or should I follow what the other
transports do?

Thanks,
Shivam Kumar

On Fri, Mar 20, 2026 at 3:48 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Mar 19, 2026 at 02:00:27PM -0400, Shivam Kumar wrote:
> > Hi Maurizio,
> >
> > Thank you for the review. You're right, I was conflating the per-PDU
> > H2CData limit with the total transfer size for the command, and the
> > error handling should trigger a fatal transport error rather than
> > completing the request with an NVMe status code.
> >
> > Would setting a sane default for MDTS be the preferred approach here?
> > If so, I'm happy to send a v2 implementing that instead.
>
> Yes, if we want to limit the I/O size we have to limit MDTS, which all
> the other transports do anyway, so I'm kinda surprised TCP doesn't.
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  From: Christoph Hellwig
  Date: 2026-04-07  6:30 UTC
  To: Shivam Kumar
  Cc: Christoph Hellwig, Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme

On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
> Hi Christoph, Maurizio,
>
> Just following up on this thread. Would you like me to send a v2 patch
> that sets a sane default MDTS for the TCP transport?

Yes, that would be great!

> If so, is there a preferred value I should use, or should I follow
> what the other transports do?

I'm honestly not sure. Maurizio/Sagi: any good idea what a TCP default
max MDTS might be?
* Re: [PATCH] nvmet-tcp: bound sgl->length check in nvmet_tcp_map_data()
  From: Maurizio Lombardi
  Date: 2026-04-08  6:30 UTC
  To: Christoph Hellwig, Shivam Kumar
  Cc: Maurizio Lombardi, gregkh, security, sagi, kch, linux-nvme

On Tue Apr 7, 2026 at 8:30 AM CEST, Christoph Hellwig wrote:
> On Sun, Apr 05, 2026 at 03:46:12PM -0400, Shivam Kumar wrote:
> > Hi Christoph, Maurizio,
> >
> > Just following up on this thread. Would you like me to send a v2 patch
> > that sets a sane default MDTS for the TCP transport?
>
> Yes, that would be great!
>
> > If so, is there a preferred value I should use, or should I follow
> > what the other transports do?
>
> I'm honestly not sure. Maurizio/Sagi: any good idea what a TCP default
> max MDTS might be?

I'm not sure there is a "correct" value here. It seems to mostly come
down to balancing performance vs. memory usage, rather than anything
TCP-specific.

For reference, pci-epf allows a max MDTS of 2 MiB [1], while RDMA uses
1 MiB. Personally, I would pick a default of 2 MiB; it should be large
enough to avoid excessive splitting while still being reasonable from a
memory perspective. Sagi, any thoughts on that?

I have seen a patch on this mailing list that makes MDTS configurable
via configfs, so if needed a user could still change it to a preferred
value.

Maurizio

[1] https://docs.kernel.org/nvme/nvme-pci-endpoint-target.html