* [PATCH 0/2] RDMA/siw: fix MPA FPDU length underflow + add KUnit coverage
@ 2026-05-13 17:53 Michael Bommarito
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
2026-05-13 17:53 ` [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing Michael Bommarito
0 siblings, 2 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-13 17:53 UTC (permalink / raw)
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel
[1/2] fixes a peer-controlled signed-int underflow in the Soft-iWARP
receive path: c_hdr->mpa_len (16-bit, on-wire, peer-chosen) is never
compared against iwarp_pktinfo[opcode].hdr_len, so a malformed FPDU
makes siw_tcp_rx_data() derive a negative srx->fpdu_part_rem that
flows through siw_proc_write() / siw_proc_rresp() into siw_check_mem()
(which accepts a negative interval against a valid base) and on into
skb_copy_bits() as a signed int copy length. Under KASAN this fires
as a multi-gigabyte OOB read in the header-copy branch. Full root
cause and the KASAN call trace are in [1/2]'s commit message.
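The arithmetic behind the underflow can be sketched in userspace C. This is an illustration only, not the driver code: the constants mirror the values quoted in [1/2] (tagged WRITE hdr_len 16, MPA_HDR_SIZE 2), and the helper names fpdu_part_rem(), mpa_len_ok() and underflow_selfcheck() are hypothetical, introduced just for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values quoted in [1/2]; hard-coded here for illustration only. */
#define MPA_HDR_SIZE	2
#define WRITE_HDR_LEN	16	/* hdr_len of a tagged WRITE */

/* Hypothetical helper: FPDU bytes remaining once the header is in. */
static int fpdu_part_rem(uint16_t mpa_len, int hdr_len)
{
	return (int)mpa_len - hdr_len + MPA_HDR_SIZE;
}

/* Hypothetical helper: same predicate as the check added in [1/2]. */
static int mpa_len_ok(uint16_t mpa_len, int hdr_len)
{
	return (int)mpa_len + MPA_HDR_SIZE >= hdr_len;
}

/* Self-check of the boundary values discussed in the cover letter. */
static void underflow_selfcheck(void)
{
	/* mpa_len 0 for a tagged WRITE: 0 - 16 + 2 = -14. */
	assert(fpdu_part_rem(0, WRITE_HDR_LEN) == -14);
	/* A negative int converted to size_t becomes a huge length. */
	assert((size_t)fpdu_part_rem(0, WRITE_HDR_LEN) > ((size_t)1 << 30));
	/* One below the floor gives -1, i.e. 4294967295 as a 32-bit value. */
	assert((uint32_t)fpdu_part_rem(13, WRITE_HDR_LEN) == UINT32_MAX);
	/* The minimum valid mpa_len (14) leaves a zero-length payload. */
	assert(fpdu_part_rem(14, WRITE_HDR_LEN) == 0);
	assert(mpa_len_ok(14, WRITE_HDR_LEN));
	assert(!mpa_len_ok(13, WRITE_HDR_LEN));
}
```

Calling underflow_selfcheck() from a small main() is enough to run it. The value 4294967295 is the same read size that shows up in the KASAN report for the one-below-the-floor case.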
[2/2] adds the KUnit regression harness used to validate [1/2]. It
is split into its own patch because the test brings new Kconfig
plumbing and a new file in drivers/infiniband/sw/siw/, and so that
maintainers can take [1/2] on its own if they want to defer the test
or treat it differently for stable backport. The fix in [1/2] is
tagged for stable; [2/2] is not.
The harness has three cases. Two use a constructed sk_buff: one
asserts the new check rejects an underflowed mpa_len; one is a
regression control with the minimum-valid mpa_len (zero-length
WRITE). The third opens a loopback AF_INET socketpair via
sock_create_kern() and drives the malformed FPDU through the real
kernel TCP receive path (sk_data_ready in softirq -> tcp_read_sock
-> siw_tcp_rx_data), so the same chain a remote peer would exercise
is covered.
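The malformed FPDU that all three cases feed in can be pictured as raw bytes. The sketch below is userspace C under stated assumptions: the field order (mpa_len, control word, sink STag, sink offset) follows the struct iwarp_rdma_write accesses in [2/2], the offsets assume a packed 16-byte header, the control word is left zero rather than reproducing the DDP/RDMAP flag bits, and every demo_ or DEMO_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes as quoted in [1/2]; the DEMO_ names are hypothetical. */
#define DEMO_MPA_HDR_SIZE	2
#define DEMO_WRITE_HDR_LEN	16

/* Store an integer big-endian, independent of host byte order. */
static void demo_put_be(uint8_t *p, uint64_t v, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		p[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/*
 * Hypothetical builder: writes a 16-byte tagged WRITE header plus one
 * payload byte into buf (at least 17 bytes) and returns the byte count.
 */
static size_t demo_build_write_fpdu(uint8_t *buf, uint16_t mpa_len,
				    uint32_t stag, uint64_t sink_to)
{
	memset(buf, 0, DEMO_WRITE_HDR_LEN + 1);
	demo_put_be(buf + 0, mpa_len, 2);	/* peer-chosen MPA length */
	/* bytes 2..3: DDP/RDMAP control word, left zero in this sketch */
	demo_put_be(buf + 4, stag, 4);		/* sink STag */
	demo_put_be(buf + 8, sink_to, 8);	/* sink tagged offset */
	buf[DEMO_WRITE_HDR_LEN] = 0x41;		/* one trailing payload byte */
	return DEMO_WRITE_HDR_LEN + 1;
}

/* Self-check: the underflow case uses mpa_len = 16 - 2 - 1 = 13. */
static void demo_fpdu_selfcheck(void)
{
	uint8_t buf[DEMO_WRITE_HDR_LEN + 1];
	size_t n = demo_build_write_fpdu(buf,
			DEMO_WRITE_HDR_LEN - DEMO_MPA_HDR_SIZE - 1,
			0x00000100, 0);

	assert(n == 17);
	assert(buf[0] == 0x00 && buf[1] == 0x0d);	/* mpa_len = 13 */
	assert(buf[6] == 0x01 && buf[7] == 0x00);	/* STag 0x100 */
	assert(buf[16] == 0x41);
}
```

Passing 14 instead of 13 as mpa_len gives the regression-control input, the zero-length WRITE that must still be accepted.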
Tested:
- UML + KASAN (inline) defconfig + KUNIT + RDMA_SIW: all three
KUnit cases pass with the series applied; the stock tree splats
in skb_copy_bits with "Read of size 4294967295".
- x86_64 modular W=1 build clean on drivers/infiniband/sw/siw/.
- checkpatch.pl --strict clean on both patches (one false-positive
MAINTAINERS warning on [2/2] because the existing siw entry
covers drivers/infiniband/sw/siw/ as a directory).
- git am of the series to a fresh base produces a diff identical
to the validation worktree.
Bug exists since commit 8b6a361b8c48 ("rdma/siw: receive path") in
2019 (5.3-rc1), so all LTS branches with siw are affected; [1/2]
carries Cc: stable.
Michael Bommarito (2):
RDMA/siw: reject MPA FPDU length underflow before signed receive math
RDMA/siw: add KUnit tests for MPA receive parsing
drivers/infiniband/sw/siw/Kconfig | 18 +
drivers/infiniband/sw/siw/Makefile | 2 +
drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c | 349 +++++++++++++++++++
drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +
4 files changed, 384 insertions(+)
create mode 100644 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
--
2.53.0
* [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
2026-05-13 17:53 [PATCH 0/2] RDMA/siw: fix MPA FPDU length underflow + add KUnit coverage Michael Bommarito
@ 2026-05-13 17:53 ` Michael Bommarito
2026-05-14 17:10 ` Bernard Metzler
2026-05-14 21:24 ` Jason Gunthorpe
2026-05-13 17:53 ` [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing Michael Bommarito
1 sibling, 2 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-13 17:53 UTC (permalink / raw)
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel

A malicious connected siw peer can send an iWARP FPDU whose MPA length
field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
.hdr_len, but never compares mpa_len against that header length.

siw_tcp_rx_data() then derives

    srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
                         + MPA_HDR_SIZE;

where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
hdr_len - MPA_HDR_SIZE underflows to a negative int.

The signed value then flows into siw_proc_write()/siw_proc_rresp() as

    bytes = min(srx->fpdu_part_rem, srx->skb_new);

is handed to siw_check_mem() as an int len (whose interval check
addr + len > mem->va + mem->len is satisfied for a valid base when
len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
copy branch in skb_copy_bits() promotes that to size_t, producing a
multi-gigabyte read.

KASAN under a KUnit harness that drives the real kernel TCP receive
path -- a loopback AF_INET socketpair, the malformed FPDU written via
kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
dispatching to siw_tcp_rx_data -- reports:

  BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
  Read of size 4294967295 at addr ffff888...
  Call Trace:
   skb_copy_bits
   siw_rx_kva
   siw_rx_data
   siw_check_mem
   siw_proc_write
   siw_tcp_rx_data
   __tcp_read_sock
   siw_qp_llp_data_ready
   tcp_data_ready
   tcp_data_queue

Add the missing invariant at the earliest point where the peer header
is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
the value the siw transmitter uses as the minimum mpa_len for each
opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
protocol contract. Out-of-range FPDUs terminate the connection with
TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
do not agree on the start of an FPDU"), the correct framing-error
class for this inconsistency.

Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
See cover letter for full root cause, series rationale, and test
summary. [2/2] adds the KUnit regression harness used to validate
this fix.

drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
index e8a88b378d51..34d03584160c 100644
--- a/drivers/infiniband/sw/siw/siw_qp_rx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
@@ -1081,6 +1081,21 @@ static int siw_get_hdr(struct siw_rx_stream *srx)
 		return -EAGAIN;
 	}
 
+	/*
+	 * Peer-controlled mpa_len must not underflow srx->fpdu_part_rem
+	 * in siw_tcp_rx_data(); a negative value flows as a signed copy
+	 * length into siw_check_mem() and skb_copy_bits().
+	 */
+	if (unlikely(be16_to_cpu(c_hdr->mpa_len) + MPA_HDR_SIZE <
+		     iwarp_pktinfo[opcode].hdr_len)) {
+		pr_warn_ratelimited("siw: short mpa_len %u for opcode %u (hdr_len %u)\n",
+				    be16_to_cpu(c_hdr->mpa_len), opcode,
+				    iwarp_pktinfo[opcode].hdr_len);
+		siw_init_terminate(rx_qp(srx), TERM_ERROR_LAYER_LLP,
+				   LLP_ETYPE_MPA, LLP_ECODE_FPDU_START, 0);
+		return -EINVAL;
+	}
+
 	/*
 	 * DDP/RDMAP header receive completed. Check if the current
 	 * DDP segment starts a new RDMAP message or continues a previously
--
2.53.0
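The remark in the commit message that the interval check is satisfied for a negative len can be reproduced in userspace C. This is a simplified model, not the siw_check_mem() source: only the upper-bound expression addr + len > mem->va + mem->len is taken from the commit message, the lower-bound test is an assumption of this sketch, and struct demo_mem, demo_interval_ok() and demo_interval_selfcheck() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a registered memory object. */
struct demo_mem {
	uint64_t va;	/* base address of the region */
	uint64_t len;	/* region length in bytes */
};

/*
 * Simplified bounds test modelled on the expression quoted in the
 * commit message. Returns 1 when the access is accepted.
 */
static int demo_interval_ok(const struct demo_mem *mem, uint64_t addr, int len)
{
	/* Lower bound: an assumption of this sketch. */
	if (addr < mem->va)
		return 0;
	/* A negative int len is converted to uint64_t and wraps. */
	if (addr + len > mem->va + mem->len)
		return 0;
	return 1;
}

static void demo_interval_selfcheck(void)
{
	struct demo_mem mem = { .va = 0x1000, .len = 64 };

	assert(demo_interval_ok(&mem, 0x1000, 64));	/* exact fit */
	assert(!demo_interval_ok(&mem, 0x1000, 65));	/* one byte over */
	assert(demo_interval_ok(&mem, 0x1000, -14));	/* negative len passes */
}
```

With a valid base address, adding a negative length moves the computed end point below the base, so the upper-bound comparison never trips. That is why rejecting the short mpa_len in siw_get_hdr(), before any length is derived, closes the path.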
* Re: [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
@ 2026-05-14 17:10 ` Bernard Metzler
2026-05-14 21:24 ` Jason Gunthorpe
1 sibling, 0 replies; 5+ messages in thread
From: Bernard Metzler @ 2026-05-14 17:10 UTC (permalink / raw)
To: Michael Bommarito, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel

On 13.05.2026 19:53, Michael Bommarito wrote:
> A malicious connected siw peer can send an iWARP FPDU whose MPA length
> field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
> than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
> parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
> .hdr_len, but never compares mpa_len against that header length.
>
> siw_tcp_rx_data() then derives
>
>     srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
>                          + MPA_HDR_SIZE;
>
> where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
> point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
> on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
> hdr_len - MPA_HDR_SIZE underflows to a negative int.
>
> The signed value then flows into siw_proc_write()/siw_proc_rresp() as
>
>     bytes = min(srx->fpdu_part_rem, srx->skb_new);
>
> is handed to siw_check_mem() as an int len (whose interval check
> addr + len > mem->va + mem->len is satisfied for a valid base when
> len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
> siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
> copy branch in skb_copy_bits() promotes that to size_t, producing a
> multi-gigabyte read.
>
> KASAN under a KUnit harness that drives the real kernel TCP receive
> path -- a loopback AF_INET socketpair, the malformed FPDU written via
> kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
> dispatching to siw_tcp_rx_data -- reports:
>
>   BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
>   Read of size 4294967295 at addr ffff888...
>   Call Trace:
>    skb_copy_bits
>    siw_rx_kva
>    siw_rx_data
>    siw_check_mem
>    siw_proc_write
>    siw_tcp_rx_data
>    __tcp_read_sock
>    siw_qp_llp_data_ready
>    tcp_data_ready
>    tcp_data_queue
>
> Add the missing invariant at the earliest point where the peer header
> is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
> the value the siw transmitter uses as the minimum mpa_len for each
> opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
> protocol contract. Out-of-range FPDUs terminate the connection with
> TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
> is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
> do not agree on the start of an FPDU"), the correct framing-error
> class for this inconsistency.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
> ---
> See cover letter for full root cause, series rationale, and test
> summary. [2/2] adds the KUnit regression harness used to validate
> this fix.
>
> drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
> index e8a88b378d51..34d03584160c 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_rx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
> @@ -1081,6 +1081,21 @@ static int siw_get_hdr(struct siw_rx_stream *srx)
>  		return -EAGAIN;
>  	}
>
> +	/*
> +	 * Peer-controlled mpa_len must not underflow srx->fpdu_part_rem
> +	 * in siw_tcp_rx_data(); a negative value flows as a signed copy
> +	 * length into siw_check_mem() and skb_copy_bits().
> +	 */

Excellent finding. This was an open gateway for all evil.

> +	if (unlikely(be16_to_cpu(c_hdr->mpa_len) + MPA_HDR_SIZE <
> +		     iwarp_pktinfo[opcode].hdr_len)) {
> +		pr_warn_ratelimited("siw: short mpa_len %u for opcode %u (hdr_len %u)\n",

I think we shall stay with 80 chars per line. So let's wrap
the above line.

Otherwise

Acked-by: Bernard Metzler <bernard.metzler@linux.dev>

> +				    be16_to_cpu(c_hdr->mpa_len), opcode,
> +				    iwarp_pktinfo[opcode].hdr_len);
> +		siw_init_terminate(rx_qp(srx), TERM_ERROR_LAYER_LLP,
> +				   LLP_ETYPE_MPA, LLP_ECODE_FPDU_START, 0);
> +		return -EINVAL;
> +	}
> +
>  	/*
>  	 * DDP/RDMAP header receive completed. Check if the current
>  	 * DDP segment starts a new RDMAP message or continues a previously
* Re: [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
2026-05-14 17:10 ` Bernard Metzler
@ 2026-05-14 21:24 ` Jason Gunthorpe
1 sibling, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2026-05-14 21:24 UTC (permalink / raw)
To: Michael Bommarito
Cc: Bernard Metzler, Leon Romanovsky, linux-rdma, linux-kernel

On Wed, May 13, 2026 at 01:53:24PM -0400, Michael Bommarito wrote:
> A malicious connected siw peer can send an iWARP FPDU whose MPA length
> field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
> than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
> parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
> .hdr_len, but never compares mpa_len against that header length.
>
> siw_tcp_rx_data() then derives
>
>     srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
>                          + MPA_HDR_SIZE;
>
> where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
> point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
> on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
> hdr_len - MPA_HDR_SIZE underflows to a negative int.
>
> The signed value then flows into siw_proc_write()/siw_proc_rresp() as
>
>     bytes = min(srx->fpdu_part_rem, srx->skb_new);
>
> is handed to siw_check_mem() as an int len (whose interval check
> addr + len > mem->va + mem->len is satisfied for a valid base when
> len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
> siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
> copy branch in skb_copy_bits() promotes that to size_t, producing a
> multi-gigabyte read.
>
> KASAN under a KUnit harness that drives the real kernel TCP receive
> path -- a loopback AF_INET socketpair, the malformed FPDU written via
> kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
> dispatching to siw_tcp_rx_data -- reports:
>
>   BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
>   Read of size 4294967295 at addr ffff888...
>   Call Trace:
>    skb_copy_bits
>    siw_rx_kva
>    siw_rx_data
>    siw_check_mem
>    siw_proc_write
>    siw_tcp_rx_data
>    __tcp_read_sock
>    siw_qp_llp_data_ready
>    tcp_data_ready
>    tcp_data_queue
>
> Add the missing invariant at the earliest point where the peer header
> is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
> the value the siw transmitter uses as the minimum mpa_len for each
> opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
> protocol contract. Out-of-range FPDUs terminate the connection with
> TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
> is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
> do not agree on the start of an FPDU"), the correct framing-error
> class for this inconsistency.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
> Acked-by: Bernard Metzler <bernard.metzler@linux.dev>
> ---
> See cover letter for full root cause, series rationale, and test
> summary. [2/2] adds the KUnit regression harness used to validate
> this fix.
>
> drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)

Applied to for-rc

Thanks,
Jason
* [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing
2026-05-13 17:53 [PATCH 0/2] RDMA/siw: fix MPA FPDU length underflow + add KUnit coverage Michael Bommarito
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
@ 2026-05-13 17:53 ` Michael Bommarito
1 sibling, 0 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-13 17:53 UTC (permalink / raw)
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel

Add a KUnit suite (CONFIG_SIW_MPA_RX_KUNIT_TEST) that exercises the
real siw_tcp_rx_data() path with three cases covering the MPA length
validation added in the previous patch:

  - siw_mpa_write_underflow_rejected
    Constructs an sk_buff carrying a tagged RDMA WRITE FPDU whose
    mpa_len is one below iwarp_pktinfo[opcode].hdr_len - MPA_HDR_SIZE.
    Registers a REMOTE_WRITE MR in mem_xa so the WRITE path would
    otherwise reach siw_proc_write(), and calls siw_tcp_rx_data()
    directly. Asserts the FPDU is rejected with
    TERM(LLP/MPA/FPDU_START) and rx_suspend = 1.

  - siw_mpa_write_minimum_valid_accepted
    Regression control with mpa_len = hdr_len - MPA_HDR_SIZE (the
    smallest legal value, i.e. a zero-length WRITE). Asserts the new
    check does not fire: no terminate, rx_stream not suspended.

  - siw_mpa_write_underflow_rejected_live_socket
    Opens a loopback AF_INET socketpair via sock_create_kern(),
    attaches a struct siw_cep as sk_user_data so sk_to_qp() resolves
    to the test QP, and installs siw_qp_llp_data_ready as
    sk_data_ready on the victim socket. Writes the malformed FPDU via
    kernel_sendmsg from the attacker side. The kernel TCP stack
    delivers, sk_data_ready fires in softirq, and tcp_read_sock
    dispatches to siw_tcp_rx_data the same way a remote peer would.
    Asserts the same terminate state as the first case.

The third case is the design driver: it confirms the bug-fix codepath
fires from a real softirq RX entry, not just a synthetic direct call.
On a stock siw tree the same harness reproduces the KASAN
slab-out-of-bounds / use-after-free in skb_copy_bits.

Bringing siw's loopback netdev up (case 3 binds 127.0.0.1) is done
inline via dev_change_flags() under rtnl_lock since the KUnit
environment does not run init scripts.

Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
drivers/infiniband/sw/siw/Kconfig | 18 +
drivers/infiniband/sw/siw/Makefile | 2 +
drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c | 349 +++++++++++++++++++
3 files changed, 369 insertions(+)
create mode 100644 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c

diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index 186f182b80e7..b137f5920271 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -18,3 +18,21 @@ config RDMA_SIW
 	  space verbs API, libibverbs. To implement RDMA over
 	  TCP/IP, the driver further interfaces with the Linux
 	  in-kernel TCP socket layer.
+
+config SIW_MPA_RX_KUNIT_TEST
+	bool "KUnit tests for Soft-iWARP MPA receive parsing" if !KUNIT_ALL_TESTS
+	depends on KUNIT && RDMA_SIW
+	default KUNIT_ALL_TESTS
+	help
+	  Build KUnit regression tests for the Soft-iWARP MPA receive
+	  state machine. The tests cover the MPA length consistency
+	  check in siw_get_hdr(): malformed FPDUs whose mpa_len is
+	  below the opcode's fixed DDP/RDMAP header must be rejected
+	  with TERM(LLP/MPA/FPDU_START); the minimum-valid mpa_len
+	  (zero-length WRITE) must still be accepted. One case drives
+	  the real kernel TCP receive path via a loopback socketpair
+	  so the same softirq sk_data_ready -> tcp_read_sock ->
+	  siw_tcp_rx_data chain a remote peer would exercise is
+	  covered.
+
+	  If unsure, say N.
diff --git a/drivers/infiniband/sw/siw/Makefile b/drivers/infiniband/sw/siw/Makefile
index f5f7e3867889..09d4c90d8758 100644
--- a/drivers/infiniband/sw/siw/Makefile
+++ b/drivers/infiniband/sw/siw/Makefile
@@ -9,3 +9,5 @@ siw-y := \
 	siw_qp_tx.o \
 	siw_qp_rx.o \
 	siw_verbs.o
+
+siw-$(CONFIG_SIW_MPA_RX_KUNIT_TEST) += siw_mpa_rx_kunit.o
diff --git a/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c b/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
new file mode 100644
index 000000000000..204b3213b840
--- /dev/null
+++ b/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
@@ -0,0 +1,349 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+/*
+ * KUnit harness for siw MPA receive-state length validation.
+ *
+ * Internal to the SIW_MPA_LEN_UNDERFLOW_RX_COPY disclosure validation tree.
+ * Not part of the upstream patch.
+ *
+ * case 1: short mpa_len triggers the new siw_get_hdr() check via direct
+ *         siw_tcp_rx_data() call with a constructed sk_buff
+ *   - expects: TERM(LLP/MPA/FPDU_START), rx_suspend=1
+ *   - under stock siw: KASAN slab-out-of-bounds in skb_copy_bits()
+ *   - under patched siw: no splat, terminate state set
+ *
+ * case 2: minimum-valid mpa_len control (constructed sk_buff)
+ *   - mpa_len = hdr_len - MPA_HDR_SIZE -> fpdu_part_rem = 0
+ *     so siw_proc_write() takes the zero-length WRITE short path
+ *     and returns 0 without calling skb_copy_bits().
+ *   - expects: no TERM, state machine progressed normally
+ *
+ * case 3: real loopback TCP socketpair (the "live two-node" analog)
+ *   - opens AF_INET TCP sockets in-kernel via sock_create_kern()
+ *   - binds/listens on 127.0.0.1:0, connects, accepts
+ *   - installs siw_qp_llp_data_ready on the victim socket and
+ *     attaches a struct siw_cep so sk_to_qp() resolves to our qp
+ *   - writes the malformed FPDU bytes via kernel_sendmsg on the
+ *     attacker socket
+ *   - the kernel TCP stack delivers, sk_data_ready fires, and
+ *     siw_qp_llp_data_ready -> tcp_read_sock -> siw_tcp_rx_data
+ *     runs in the normal kernel receive path
+ *   - expects: TERM(LLP/MPA/FPDU_START) on the qp
+ */
+
+#include <kunit/test.h>
+#include <linux/inet.h>
+#include <linux/in.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <linux/tcp.h>
+#include <linux/wait.h>
+#include <linux/xarray.h>
+#include <net/sock.h>
+#include <net/tcp.h>
+#include <rdma/ib_verbs.h>
+
+#include "siw.h"
+#include "siw_cm.h"
+#include "siw_mem.h"
+
+static void siw_kunit_kfree_skb(void *skb)
+{
+	kfree_skb(skb);
+}
+
+struct siw_mpa_rx_ctx {
+	struct siw_device *sdev;
+	struct siw_qp *qp;
+	struct siw_mem *mem;
+	void *target;
+	u32 stag;
+};
+
+static void siw_mpa_rx_setup(struct kunit *test, struct siw_mpa_rx_ctx *c)
+{
+	void *xa_ret;
+
+	c->stag = 0x00000100;
+
+	c->sdev = kunit_kzalloc(test, sizeof(*c->sdev), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->sdev);
+	xa_init_flags(&c->sdev->mem_xa, XA_FLAGS_ALLOC1);
+
+	c->qp = kunit_kzalloc(test, sizeof(*c->qp), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->qp);
+	c->qp->sdev = c->sdev;
+	c->qp->pd = kunit_kzalloc(test, sizeof(*c->qp->pd), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->qp->pd);
+	c->qp->rx_stream.state = SIW_GET_HDR;
+
+	c->target = kunit_kzalloc(test, 64, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->target);
+
+	c->mem = kunit_kzalloc(test, sizeof(*c->mem), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->mem);
+	kref_init(&c->mem->ref);
+	c->mem->sdev = c->sdev;
+	c->mem->stag = c->stag;
+	c->mem->stag_valid = 1;
+	c->mem->va = (u64)(uintptr_t)c->target;
+	c->mem->len = 64;
+	c->mem->pd = c->qp->pd;
+	c->mem->perms = IB_ACCESS_REMOTE_WRITE;
+
+	xa_ret = xa_store(&c->sdev->mem_xa, c->stag >> 8, c->mem, GFP_KERNEL);
+	KUNIT_ASSERT_FALSE(test, xa_is_err(xa_ret));
+}
+
+static void siw_mpa_rx_run(struct kunit *test, struct siw_mpa_rx_ctx *c,
+			   u16 mpa_len_val)
+{
+	struct iwarp_rdma_write write = { };
+	struct sk_buff *skb;
+	read_descriptor_t rd_desc = { };
+	u8 payload[sizeof(write) + 1];
+
+	write.ctrl.mpa_len = cpu_to_be16(mpa_len_val);
+	write.ctrl.ddp_rdmap_ctrl = DDP_FLAG_TAGGED | DDP_FLAG_LAST |
+				    cpu_to_be16(DDP_VERSION << 8) |
+				    cpu_to_be16(RDMAP_VERSION << 6) |
+				    cpu_to_be16(RDMAP_RDMA_WRITE);
+	write.sink_stag = cpu_to_be32(c->stag);
+	write.sink_to = cpu_to_be64((u64)(uintptr_t)c->target);
+
+	memcpy(payload, &write, sizeof(write));
+	payload[sizeof(write)] = 0x41;
+
+	skb = alloc_skb(sizeof(payload), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, skb);
+	skb_put_data(skb, payload, sizeof(payload));
+	kunit_add_action_or_reset(test, siw_kunit_kfree_skb, skb);
+
+	rd_desc.arg.data = c->qp;
+	rd_desc.count = sizeof(payload);
+
+	siw_tcp_rx_data(&rd_desc, skb, 0, sizeof(payload));
+}
+
+static void siw_mpa_write_underflow_rejected(struct kunit *test)
+{
+	struct siw_mpa_rx_ctx c;
+
+	siw_mpa_rx_setup(test, &c);
+
+	/* mpa_len one byte short of the WRITE hdr_len - MPA_HDR_SIZE floor. */
+	siw_mpa_rx_run(test, &c,
+		       sizeof(struct iwarp_rdma_write) - MPA_HDR_SIZE - 1);
+
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.valid, 1);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.layer,
+			(int)TERM_ERROR_LAYER_LLP);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.etype,
+			(int)LLP_ETYPE_MPA);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.ecode,
+			(int)LLP_ECODE_FPDU_START);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->rx_stream.rx_suspend, 1);
+}
+
+static void siw_mpa_write_minimum_valid_accepted(struct kunit *test)
+{
+	struct siw_mpa_rx_ctx c;
+
+	siw_mpa_rx_setup(test, &c);
+
+	/*
+	 * mpa_len == hdr_len - MPA_HDR_SIZE is the smallest legal value;
+	 * it yields fpdu_part_rem = 0, i.e. a zero-length WRITE. The new
+	 * length check in siw_get_hdr() must accept this.
+	 */
+	siw_mpa_rx_run(test, &c,
+		       sizeof(struct iwarp_rdma_write) - MPA_HDR_SIZE);
+
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.valid, 0);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->rx_stream.rx_suspend, 0);
+}
+
+static int siw_mpa_rx_bring_up_lo(struct kunit *test)
+{
+	struct net_device *lo;
+	int rv;
+
+	rtnl_lock();
+	lo = __dev_get_by_name(&init_net, "lo");
+	KUNIT_ASSERT_NOT_NULL(test, lo);
+	if (!(lo->flags & IFF_UP))
+		rv = dev_change_flags(lo, lo->flags | IFF_UP, NULL);
+	else
+		rv = 0;
+	rtnl_unlock();
+	KUNIT_ASSERT_EQ(test, rv, 0);
+	return 0;
+}
+
+static int siw_mpa_rx_make_pair(struct kunit *test, struct socket **listen,
+				struct socket **server, struct socket **client)
+{
+	struct sockaddr_in addr = { .sin_family = AF_INET, };
+	struct sockaddr_in bound = { };
+	struct socket *l = NULL, *s = NULL, *c = NULL;
+	int rv;
+
+	siw_mpa_rx_bring_up_lo(test);
+
+	rv = sock_create_kern(&init_net, AF_INET, SOCK_STREAM, IPPROTO_TCP, &l);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+	addr.sin_port = 0;
+	rv = kernel_bind(l, (struct sockaddr_unsized *)&addr, sizeof(addr));
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	rv = l->ops->getname(l, (struct sockaddr *)&bound, 0);
+	KUNIT_ASSERT_GT(test, rv, 0);
+
+	rv = kernel_listen(l, 1);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	rv = sock_create_kern(&init_net, AF_INET, SOCK_STREAM, IPPROTO_TCP, &c);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	rv = kernel_connect(c, (struct sockaddr_unsized *)&bound,
+			    sizeof(bound), 0);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	rv = kernel_accept(l, &s, 0);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	*listen = l;
+	*server = s;
+	*client = c;
+	return 0;
+}
+
+static void siw_mpa_write_underflow_rejected_live_socket(struct kunit *test)
+{
+	struct siw_device *sdev;
+	struct siw_qp *qp;
+	struct siw_cep *cep;
+	struct siw_mem *mem;
+	struct socket *listen_sock = NULL, *server_sock = NULL, *client_sock = NULL;
+	struct iwarp_rdma_write write = { };
+	struct kvec iov;
+	struct msghdr msg = { };
+	void *xa_ret, *target;
+	u8 payload[sizeof(write) + 1];
+	u32 stag = 0x00000100;
+	int rv, i;
+
+	sdev = kunit_kzalloc(test, sizeof(*sdev), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, sdev);
+	xa_init_flags(&sdev->mem_xa, XA_FLAGS_ALLOC1);
+
+	qp = kunit_kzalloc(test, sizeof(*qp), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, qp);
+	qp->sdev = sdev;
+	qp->pd = kunit_kzalloc(test, sizeof(*qp->pd), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, qp->pd);
+	qp->rx_stream.state = SIW_GET_HDR;
+	init_rwsem(&qp->state_lock);
+	qp->attrs.state = SIW_QP_STATE_RTS;
+	qp->cep = NULL;
+
+	/* Register a valid REMOTE_WRITE memory object. On unpatched siw
+	 * this is what lets the negative-length copy reach skb_copy_bits;
+	 * without an MR the STag lookup in siw_proc_write() returns NULL
+	 * and the WRITE is terminated before the underflow primitive fires.
+	 * With this patch in place, the new siw_get_hdr() check rejects
+	 * the FPDU BEFORE STag lookup, so the MR is unused. We keep it in
+	 * the test so unpatched-kernel reruns also exercise the full path.
+	 */
+	target = kunit_kzalloc(test, 64, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, target);
+	mem = kunit_kzalloc(test, sizeof(*mem), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, mem);
+	kref_init(&mem->ref);
+	mem->sdev = sdev;
+	mem->stag = stag;
+	mem->stag_valid = 1;
+	mem->va = (u64)(uintptr_t)target;
+	mem->len = 64;
+	mem->pd = qp->pd;
+	mem->perms = IB_ACCESS_REMOTE_WRITE;
+	xa_ret = xa_store(&sdev->mem_xa, stag >> 8, mem, GFP_KERNEL);
+	KUNIT_ASSERT_FALSE(test, xa_is_err(xa_ret));
+
+	/* siw_qp_llp_data_ready dereferences sk_user_data as siw_cep. */
+	cep = kunit_kzalloc(test, sizeof(*cep), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, cep);
+	cep->qp = qp;
+	spin_lock_init(&cep->lock);
+	kref_init(&cep->ref);
+
+	rv = siw_mpa_rx_make_pair(test, &listen_sock, &server_sock, &client_sock);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	write_lock_bh(&server_sock->sk->sk_callback_lock);
+	server_sock->sk->sk_user_data = cep;
+	server_sock->sk->sk_data_ready = siw_qp_llp_data_ready;
+	qp->attrs.sk = server_sock;
+	write_unlock_bh(&server_sock->sk->sk_callback_lock);
+
+	write.ctrl.mpa_len =
+		cpu_to_be16(sizeof(write) - MPA_HDR_SIZE - 1);
+	write.ctrl.ddp_rdmap_ctrl = DDP_FLAG_TAGGED | DDP_FLAG_LAST |
+				    cpu_to_be16(DDP_VERSION << 8) |
+				    cpu_to_be16(RDMAP_VERSION << 6) |
+				    cpu_to_be16(RDMAP_RDMA_WRITE);
+	write.sink_stag = cpu_to_be32(stag);
+	write.sink_to = cpu_to_be64((u64)(uintptr_t)target);
+
+	memcpy(payload, &write, sizeof(write));
+	payload[sizeof(write)] = 0x41;
+
+	iov.iov_base = payload;
+	iov.iov_len = sizeof(payload);
+	rv = kernel_sendmsg(client_sock, &msg, &iov, 1, sizeof(payload));
+	KUNIT_ASSERT_EQ(test, rv, (int)sizeof(payload));
+
+	/* Wait for TCP to deliver bytes and sk_data_ready to fire. */
+	for (i = 0; i < 200; i++) {
+		if (qp->term_info.valid)
+			break;
+		msleep(20);
+	}
+
+	KUNIT_EXPECT_EQ(test, (int)qp->term_info.valid, 1);
+	KUNIT_EXPECT_EQ(test, (int)qp->term_info.layer,
+			(int)TERM_ERROR_LAYER_LLP);
+	KUNIT_EXPECT_EQ(test, (int)qp->term_info.etype,
+			(int)LLP_ETYPE_MPA);
+	KUNIT_EXPECT_EQ(test, (int)qp->term_info.ecode,
+			(int)LLP_ECODE_FPDU_START);
+	KUNIT_EXPECT_EQ(test, (int)qp->rx_stream.rx_suspend, 1);
+
+	/* Detach our handler before tearing down sockets so the TCP stack
+	 * cannot call into freed kunit memory after the test.
+	 */
+	write_lock_bh(&server_sock->sk->sk_callback_lock);
+	server_sock->sk->sk_user_data = NULL;
+	server_sock->sk->sk_data_ready = sock_def_readable;
+	write_unlock_bh(&server_sock->sk->sk_callback_lock);
+
+	sock_release(client_sock);
+	sock_release(server_sock);
+	sock_release(listen_sock);
+}
+
+static struct kunit_case siw_mpa_rx_cases[] = {
+	KUNIT_CASE(siw_mpa_write_underflow_rejected),
+	KUNIT_CASE(siw_mpa_write_minimum_valid_accepted),
+	KUNIT_CASE(siw_mpa_write_underflow_rejected_live_socket),
+	{ }
+};
+
+static struct kunit_suite siw_mpa_rx_suite = {
+	.name = "siw_mpa_rx",
+	.test_cases = siw_mpa_rx_cases,
+};
+
+kunit_test_suite(siw_mpa_rx_suite);
--
2.53.0