* [PATCH] RDMA/siw: bound Read Response placement to the RREAD length
@ 2026-06-02 19:47 Michael Bommarito
2026-06-03 9:44 ` Bernard Metzler
2026-06-05 17:07 ` Jason Gunthorpe
0 siblings, 2 replies; 4+ messages in thread
From: Michael Bommarito @ 2026-06-02 19:47 UTC (permalink / raw)
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky
Cc: linux-rdma, linux-kernel
In drivers/infiniband/sw/siw/siw_qp_rx.c, siw_proc_rresp() places each
inbound Read Response DDP segment at sge->laddr + wqe->processed and then
accumulates wqe->processed, but it never checks the running total against
the sink buffer length on continuation segments. siw_check_sge() resolves
and validates the sink memory only on the first fragment (the if (!*mem)
branch), and siw_rresp_check_ntoh() compares the cumulative length against
wqe->bytes only on the final segment (the !frx->more_ddp_segs guard).
A connected siw peer that answers an outstanding RREAD with Read Response
segments that keep the DDP Last flag clear, carrying more total payload
than the RREAD requested, drives wqe->processed past the validated sink
buffer; the next siw_rx_data() call writes out of bounds at
sge->laddr + wqe->processed. siw runs iWARP over ordinary routable TCP,
so the peer is the remote end of an established RDMA connection and needs
no local privilege.
Bound every segment before placement, exactly as siw_proc_send() and
siw_proc_write() already do for their tagged and untagged paths, and
terminate the connection with a base-or-bounds DDP error when the
Read Response would overrun the sink buffer.
This is the second receive-path length fix for this file. A separate
change rejects an MPA FPDU length that underflows the per-fragment
remainder in the header decode; that guard does not cover this case,
because here each individual segment length is self-consistent and only
the accumulated placement offset overruns the buffer.
Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Impact: the remote peer of an established Soft-iWARP connection can write
out of bounds past the RREAD sink (global DMA MR sink) or force a
connection-terminating fault (default FRWR sink) by streaming Read
Response segments whose cumulative length exceeds the requested length
with the DDP Last flag clear.
Relationship to the in-flight siw receive-path fix
This is the second receive-path length fix I have for this file. The
first, currently under review on this list, rejects an MPA FPDU length
that underflows the per-fragment remainder during header decode in
siw_get_hdr(). That guard sits in the header-length path and does not
cover the case here: in this report every individual segment length is
self-consistent and non-negative, and only the accumulated placement
offset (wqe->processed across continuation segments) overruns the sink
buffer. The two fixes touch different functions and are independent;
either can be applied without the other. I am sending this as a fresh
top-level thread rather than threading it under the earlier series to
keep the two changes separable for review and for stable selection.
Analysis
Verified by reading siw_qp_rx.c at v7.1-rc4 (a1f173eb51db):
- siw_proc_rresp() resolves and range-checks the sink memory region
once, on the first fragment only (the if (!*mem) branch calling
siw_check_sge() with the full requested length).
- The cumulative-length consistency check in siw_rresp_check_ntoh()
is gated on !more_ddp_segs, so it fires only on the segment that
carries the DDP Last flag, and siw_rresp_check_ntoh() itself runs
only on the first DDP segment.
- Each segment is then placed at sge->laddr + wqe->processed and
wqe->processed is accumulated, with no per-segment bound against
wqe->bytes.
A peer that keeps the DDP Last flag clear across continuation segments
and streams more total payload than the outstanding RREAD requested
therefore drives wqe->processed past the validated sink region; the
next placement writes peer-supplied bytes out of bounds. For a sink
backed by a global DMA / kernel-virtual MR (mem_obj NULL, the
siw_rx_kva() path) this is a direct kernel-heap out-of-bounds write.
For a user-memory or fast-registration/PBL MR (siw_rx_umem() /
siw_rx_pbl(), the mode kernel ULPs use by default) the over-walk is
caught when the page or PBL index runs out and is converted to a
connection-terminating fault. The missing per-segment bound should be
added regardless of sink type.
siw_proc_send() and siw_proc_write() already bound every segment with
the equivalent wqe->bytes < wqe->processed + srx->fpdu_part_rem check;
this patch brings siw_proc_rresp() in line with them.
Conditions: CONFIG_RDMA_SIW=m or =y, siw loaded and a link configured
(modprobe siw; rdma link add ... type siw), and a local ULP with an
outstanding RDMA READ to the peer. The out-of-bounds write requires the
READ sink to be a global DMA / kernel-virtual MR; ULPs using the default
fast-registration (FRWR) sink instead see a connection-terminating
fault. No special sysctls or local privilege on the victim are required
beyond the configured fabric.
Mitigations: until the patch is applied, do not configure Soft-iWARP
toward untrusted peers, or restrict the RDMA-CM listener to trusted
peers at the network layer. Removing the module (rmmod siw) is
sufficient when no application is using it.
A function-level reproducer that drives siw_proc_rresp() over the real
kernel TCP receive path with a primed ORQ and a two-segment malformed
Read Response reproduces a KASAN slab-out-of-bounds write on a global
DMA MR sink; it can be shared off-list on request.
Fixes: 8b6a361b8c48 ("rdma/siw: receive path"), reachable from v5.10
and all later releases; all stable branches carrying siw are affected.
drivers/infiniband/sw/siw/siw_qp_rx.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
index e8a88b378d51d..ec4aadef3dfe2 100644
--- a/drivers/infiniband/sw/siw/siw_qp_rx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
@@ -844,6 +844,25 @@ int siw_proc_rresp(struct siw_qp *qp)
}
mem_p = *mem;
+ /*
+ * siw_rresp_check_ntoh() validates the cumulative length only on
+ * the last DDP segment (!more_ddp_segs), and siw_check_sge() above
+ * resolves the sink memory only on the first fragment. A peer that
+ * keeps DDP_FLAG_LAST clear and streams more payload than the
+ * outstanding RREAD requested therefore drives wqe->processed past
+ * the validated sink buffer, writing out of bounds. Bound every
+ * segment as siw_proc_send()/siw_proc_write() already do.
+ */
+ if (unlikely(wqe->processed + srx->fpdu_part_rem > wqe->bytes)) {
+ siw_dbg_qp(qp, "rresp len: %d + %d > %d\n",
+ wqe->processed, srx->fpdu_part_rem, wqe->bytes);
+ wqe->wc_status = SIW_WC_LOC_LEN_ERR;
+ siw_init_terminate(qp, TERM_ERROR_LAYER_DDP,
+ DDP_ETYPE_TAGGED_BUF,
+ DDP_ECODE_T_BASE_BOUNDS, 0);
+ return -EINVAL;
+ }
+
bytes = min(srx->fpdu_part_rem, srx->skb_new);
rv = siw_rx_data(mem_p, srx, &frx->pbl_idx,
sge->laddr + wqe->processed, bytes);
--
2.53.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] RDMA/siw: bound Read Response placement to the RREAD length
2026-06-02 19:47 [PATCH] RDMA/siw: bound Read Response placement to the RREAD length Michael Bommarito
@ 2026-06-03 9:44 ` Bernard Metzler
2026-06-05 17:07 ` Jason Gunthorpe
1 sibling, 0 replies; 4+ messages in thread
From: Bernard Metzler @ 2026-06-03 9:44 UTC (permalink / raw)
To: Michael Bommarito, Jason Gunthorpe, Leon Romanovsky
Cc: linux-rdma, linux-kernel
On 02.06.2026 21:47, Michael Bommarito wrote:
> In drivers/infiniband/sw/siw/siw_qp_rx.c, siw_proc_rresp() places each
> inbound Read Response DDP segment at sge->laddr + wqe->processed and then
> accumulates wqe->processed, but it never checks the running total against
> the sink buffer length on continuation segments. siw_check_sge() resolves
> and validates the sink memory only on the first fragment (the if (!*mem)
> branch), and siw_rresp_check_ntoh() compares the cumulative length against
> wqe->bytes only on the final segment (the !frx->more_ddp_segs guard).
>
> A connected siw peer that answers an outstanding RREAD with Read Response
> segments that keep the DDP Last flag clear, carrying more total payload
> than the RREAD requested, drives wqe->processed past the validated sink
> buffer; the next siw_rx_data() call writes out of bounds at
> sge->laddr + wqe->processed. siw runs iWARP over ordinary routable TCP,
> so the peer is the remote end of an established RDMA connection and needs
> no local privilege.
>
> Bound every segment before placement, exactly as siw_proc_send() and
> siw_proc_write() already do for their tagged and untagged paths, and
> terminate the connection with a base-or-bounds DDP error when the
> Read Response would overrun the sink buffer.
>
> This is the second receive-path length fix for this file. A separate
> change rejects an MPA FPDU length that underflows the per-fragment
> remainder in the header decode; that guard does not cover this case,
> because here each individual segment length is self-consistent and only
> the accumulated placement offset overruns the buffer.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Cc: stable@vger.kernel.org
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
>
> Impact: the remote peer of an established Soft-iWARP connection can write
> out of bounds past the RREAD sink (global DMA MR sink) or force a
> connection-terminating fault (default FRWR sink) by streaming Read
> Response segments whose cumulative length exceeds the requested length
> with the DDP Last flag clear.
>
> Relationship to the in-flight siw receive-path fix
>
> This is the second receive-path length fix I have for this file. The
> first, currently under review on this list, rejects an MPA FPDU length
> that underflows the per-fragment remainder during header decode in
> siw_get_hdr(). That guard sits in the header-length path and does not
> cover the case here: in this report every individual segment length is
> self-consistent and non-negative, and only the accumulated placement
> offset (wqe->processed across continuation segments) overruns the sink
> buffer. The two fixes touch different functions and are independent;
> either can be applied without the other. I am sending this as a fresh
> top-level thread rather than threading it under the earlier series to
> keep the two changes separable for review and for stable selection.
>
> Analysis
>
> Verified by reading siw_qp_rx.c at v7.1-rc4 (a1f173eb51db):
>
> - siw_proc_rresp() resolves and range-checks the sink memory region
> once, on the first fragment only (the if (!*mem) branch calling
> siw_check_sge() with the full requested length).
> - The cumulative-length consistency check in siw_rresp_check_ntoh()
> is gated on !more_ddp_segs, so it fires only on the segment that
> carries the DDP Last flag, and siw_rresp_check_ntoh() itself runs
> only on the first DDP segment.
> - Each segment is then placed at sge->laddr + wqe->processed and
> wqe->processed is accumulated, with no per-segment bound against
> wqe->bytes.
>
> A peer that keeps the DDP Last flag clear across continuation segments
> and streams more total payload than the outstanding RREAD requested
> therefore drives wqe->processed past the validated sink region; the
> next placement writes peer-supplied bytes out of bounds. For a sink
> backed by a global DMA / kernel-virtual MR (mem_obj NULL, the
> siw_rx_kva() path) this is a direct kernel-heap out-of-bounds write.
> For a user-memory or fast-registration/PBL MR (siw_rx_umem() /
> siw_rx_pbl(), the mode kernel ULPs use by default) the over-walk is
> caught when the page or PBL index runs out and is converted to a
> connection-terminating fault. The missing per-segment bound should be
> added regardless of sink type.
>
> siw_proc_send() and siw_proc_write() already bound every segment with
> the equivalent wqe->bytes < wqe->processed + srx->fpdu_part_rem check;
> this patch brings siw_proc_rresp() in line with them.
>
> Conditions: CONFIG_RDMA_SIW=m or =y, siw loaded and a link configured
> (modprobe siw; rdma link add ... type siw), and a local ULP with an
> outstanding RDMA READ to the peer. The out-of-bounds write requires the
> READ sink to be a global DMA / kernel-virtual MR; ULPs using the default
> fast-registration (FRWR) sink instead see a connection-terminating
> fault. No special sysctls or local privilege on the victim are required
> beyond the configured fabric.
>
> Mitigations: until the patch is applied, do not configure Soft-iWARP
> toward untrusted peers, or restrict the RDMA-CM listener to trusted
> peers at the network layer. Removing the module (rmmod siw) is
> sufficient when no application is using it.
>
> A function-level reproducer that drives siw_proc_rresp() over the real
> kernel TCP receive path with a primed ORQ and a two-segment malformed
> Read Response reproduces a KASAN slab-out-of-bounds write on a global
> DMA MR sink; it can be shared off-list on request.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path"), reachable from v5.10
> and all later releases; all stable branches carrying siw are affected.
>
> drivers/infiniband/sw/siw/siw_qp_rx.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
> index e8a88b378d51d..ec4aadef3dfe2 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_rx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
> @@ -844,6 +844,25 @@ int siw_proc_rresp(struct siw_qp *qp)
> }
> mem_p = *mem;
>
> + /*
> + * siw_rresp_check_ntoh() validates the cumulative length only on
> + * the last DDP segment (!more_ddp_segs), and siw_check_sge() above
> + * resolves the sink memory only on the first fragment. A peer that
> + * keeps DDP_FLAG_LAST clear and streams more payload than the
> + * outstanding RREAD requested therefore drives wqe->processed past
> + * the validated sink buffer, writing out of bounds. Bound every
> + * segment as siw_proc_send()/siw_proc_write() already do.
> + */
Excellent finding!
What we do for Send's and WRITE's, we have to do for RRESP's as
well.
Please make the commentary above much shorter, if at all.
As you are pointing out, we have that bounds check in place
for Send and WRITE already and it is clear why we need it
here as well. We are indeed excited we found a big and
ugly hole in the driver, but that should get reflected
in a (brief and concise) commit message and not the
code path.
> + if (unlikely(wqe->processed + srx->fpdu_part_rem > wqe->bytes)) {
> + siw_dbg_qp(qp, "rresp len: %d + %d > %d\n",
> + wqe->processed, srx->fpdu_part_rem, wqe->bytes);
> + wqe->wc_status = SIW_WC_LOC_LEN_ERR;
> + siw_init_terminate(qp, TERM_ERROR_LAYER_DDP,
> + DDP_ETYPE_TAGGED_BUF,
> + DDP_ECODE_T_BASE_BOUNDS, 0);
> + return -EINVAL;
> + }
> +
No extra newline here
> bytes = min(srx->fpdu_part_rem, srx->skb_new);
> rv = siw_rx_data(mem_p, srx, &frx->pbl_idx,
> sge->laddr + wqe->processed, bytes);
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] RDMA/siw: bound Read Response placement to the RREAD length
2026-06-02 19:47 [PATCH] RDMA/siw: bound Read Response placement to the RREAD length Michael Bommarito
2026-06-03 9:44 ` Bernard Metzler
@ 2026-06-05 17:07 ` Jason Gunthorpe
2026-06-06 18:42 ` Michael Bommarito
1 sibling, 1 reply; 4+ messages in thread
From: Jason Gunthorpe @ 2026-06-05 17:07 UTC (permalink / raw)
To: Michael Bommarito
Cc: Bernard Metzler, Leon Romanovsky, linux-rdma, linux-kernel
On Tue, Jun 02, 2026 at 03:47:00PM -0400, Michael Bommarito wrote:
> In drivers/infiniband/sw/siw/siw_qp_rx.c, siw_proc_rresp() places each
> inbound Read Response DDP segment at sge->laddr + wqe->processed and then
> accumulates wqe->processed, but it never checks the running total against
> the sink buffer length on continuation segments. siw_check_sge() resolves
> and validates the sink memory only on the first fragment (the if (!*mem)
> branch), and siw_rresp_check_ntoh() compares the cumulative length against
> wqe->bytes only on the final segment (the !frx->more_ddp_segs guard).
>
> A connected siw peer that answers an outstanding RREAD with Read Response
> segments that keep the DDP Last flag clear, carrying more total payload
> than the RREAD requested, drives wqe->processed past the validated sink
> buffer; the next siw_rx_data() call writes out of bounds at
> sge->laddr + wqe->processed. siw runs iWARP over ordinary routable TCP,
> so the peer is the remote end of an established RDMA connection and needs
> no local privilege.
>
> Bound every segment before placement, exactly as siw_proc_send() and
> siw_proc_write() already do for their tagged and untagged paths, and
> terminate the connection with a base-or-bounds DDP error when the
> Read Response would overrun the sink buffer.
>
> This is the second receive-path length fix for this file. A separate
> change rejects an MPA FPDU length that underflows the per-fragment
> remainder in the header decode; that guard does not cover this case,
> because here each individual segment length is self-consistent and only
> the accumulated placement offset overruns the buffer.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Cc: stable@vger.kernel.org
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
I made the changes Bernard asked for and applied it to for-next
Thanks,
Jason
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] RDMA/siw: bound Read Response placement to the RREAD length
2026-06-05 17:07 ` Jason Gunthorpe
@ 2026-06-06 18:42 ` Michael Bommarito
0 siblings, 0 replies; 4+ messages in thread
From: Michael Bommarito @ 2026-06-06 18:42 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Bernard Metzler, Leon Romanovsky, linux-rdma, linux-kernel
On Fri, Jun 5, 2026 at 1:07 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> I made the changes Bernard asked for and applied it to for-next
Thanks for finishing this and SRP_RSP. I know you're still waiting on
my v2 for IB/mad, which I'll be sending soon too.
Thanks,
Mike
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-06-06 18:42 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-02 19:47 [PATCH] RDMA/siw: bound Read Response placement to the RREAD length Michael Bommarito
2026-06-03 9:44 ` Bernard Metzler
2026-06-05 17:07 ` Jason Gunthorpe
2026-06-06 18:42 ` Michael Bommarito
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.