* [PATCH 0/2] RDMA/siw: fix MPA FPDU length underflow + add KUnit coverage
@ 2026-05-13 17:53 Michael Bommarito
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
2026-05-13 17:53 ` [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing Michael Bommarito
From: Michael Bommarito @ 2026-05-13 17:53 UTC
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel
[1/2] fixes a peer-controlled signed-int underflow in the Soft-iWARP
receive path: c_hdr->mpa_len (16-bit, on-wire, peer-chosen) is never
compared against iwarp_pktinfo[opcode].hdr_len, so a malformed FPDU
makes siw_tcp_rx_data() derive a negative srx->fpdu_part_rem that
flows through siw_proc_write() / siw_proc_rresp() into siw_check_mem()
(which accepts a negative interval against a valid base) and on into
skb_copy_bits() as a signed int copy length. Under KASAN this fires
as a multi-gigabyte OOB read in the header-copy branch. Full root
cause and the KASAN call trace are in [1/2]'s commit message.
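The arithmetic described above can be reproduced outside the kernel. The following is a minimal userspace sketch, not siw code: the function names (fpdu_part_rem, interval_rejects, as_unsigned_len, check_examples) are hypothetical, the constants are the tagged-WRITE values quoted in [1/2] (hdr_len 16, MPA_HDR_SIZE 2), and the interval test is a simplified stand-in for the comparison that [1/2] attributes to siw_check_mem(). The 32-bit unsigned conversion is an assumption chosen to match the "Read of size 4294967295" report; it is not taken from the skb_copy_bits() source.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in [1/2] for a tagged RDMA WRITE. */
#define MPA_HDR_SIZE  2
#define WRITE_HDR_LEN 16

/* Mirrors the remainder arithmetic described for siw_tcp_rx_data(). */
int fpdu_part_rem(uint16_t mpa_len, int fpdu_part_rcvd)
{
	return (int)mpa_len - fpdu_part_rcvd + MPA_HDR_SIZE;
}

/*
 * Simplified stand-in for the interval test: nonzero means "reject".
 * A negative len converts to a huge unsigned value, wraps addr + len
 * to just below addr, and so passes the upper-bound comparison.
 */
int interval_rejects(uint64_t va, uint64_t mem_len, uint64_t addr, int len)
{
	return addr < va || addr + (uint64_t)len > va + mem_len;
}

/* Assumed 32-bit unsigned view of a signed copy length. */
uint32_t as_unsigned_len(int len)
{
	return (uint32_t)len;
}

/* Hypothetical self-check tying the three steps together. */
void check_examples(void)
{
	assert(fpdu_part_rem(0, WRITE_HDR_LEN) == -14);  /* smallest mpa_len */
	assert(fpdu_part_rem(13, WRITE_HDR_LEN) == -1);  /* one below floor  */
	assert(fpdu_part_rem(14, WRITE_HDR_LEN) == 0);   /* zero-length WRITE */
	assert(!interval_rejects(0x1000, 64, 0x1000, -14));
	assert(as_unsigned_len(-1) == 4294967295u);
}
```

Calling check_examples() from a test main walks the chain end to end: a short mpa_len yields a negative remainder, the negative length is not caught by the bounds comparison, and the same value viewed as unsigned becomes a multi-gigabyte copy length.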
[2/2] adds the KUnit regression harness used to validate [1/2]. It
is split into its own patch because the test brings new Kconfig
plumbing and a new file in drivers/infiniband/sw/siw/, and so that
maintainers can take [1/2] on its own if they want to defer the test
or treat it differently for stable backport. The fix in [1/2] is
tagged for stable; [2/2] is not.
The harness has three cases. Two use a constructed sk_buff: one
asserts the new check rejects an underflowed mpa_len; one is a
regression control with the minimum-valid mpa_len (zero-length
WRITE). The third opens a loopback AF_INET socketpair via
sock_create_kern() and drives the malformed FPDU through the real
kernel TCP receive path (sk_data_ready in softirq -> tcp_read_sock
-> siw_tcp_rx_data), so the same chain a remote peer would exercise
is covered.
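The two boundary values the harness uses can be illustrated with a standalone predicate. This is a hedged userspace sketch, not the kernel code: mpa_len_too_short and check_boundaries are hypothetical names, the comparison is modelled on the check in the [1/2] diff, and hdr_len 16 is the tagged-WRITE header length quoted in [1/2].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPA_HDR_SIZE  2
#define WRITE_HDR_LEN 16

/*
 * True when the FPDU must be rejected: the peer-announced length plus
 * the MPA header is smaller than the fixed header for the opcode.
 * Widening to unsigned int before the addition avoids any 16-bit wrap.
 */
bool mpa_len_too_short(uint16_t mpa_len, unsigned int hdr_len)
{
	return (unsigned int)mpa_len + MPA_HDR_SIZE < hdr_len;
}

/* Hypothetical self-check using the harness's two boundary values. */
void check_boundaries(void)
{
	/* One below the floor: the "underflow rejected" cases. */
	assert(mpa_len_too_short(WRITE_HDR_LEN - MPA_HDR_SIZE - 1, WRITE_HDR_LEN));
	/* Exactly the floor: the zero-length WRITE control case. */
	assert(!mpa_len_too_short(WRITE_HDR_LEN - MPA_HDR_SIZE, WRITE_HDR_LEN));
	/* Smallest on-wire value is also rejected. */
	assert(mpa_len_too_short(0, WRITE_HDR_LEN));
}
```

The predicate only enforces a lower bound, which is why the minimum-valid control case matters: it shows the check does not reject a legal zero-length WRITE.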

Tested:

  - UML + KASAN (inline) defconfig + KUNIT + RDMA_SIW: all three
    KUnit cases pass with the series applied; the stock tree splats
    in skb_copy_bits with "Read of size 4294967295".
  - x86_64 modular W=1 build clean on drivers/infiniband/sw/siw/.
  - checkpatch.pl --strict clean on both patches (one false-positive
    MAINTAINERS warning on [2/2] because the existing siw entry
    covers drivers/infiniband/sw/siw/ as a directory).
  - git am of the series to a fresh base produces a diff identical
    to the validation worktree.

Bug exists since commit 8b6a361b8c48 ("rdma/siw: receive path") in
2019 (5.3-rc1), so all LTS branches with siw are affected; [1/2]
carries Cc: stable.

Michael Bommarito (2):
  RDMA/siw: reject MPA FPDU length underflow before signed receive math
  RDMA/siw: add KUnit tests for MPA receive parsing

 drivers/infiniband/sw/siw/Kconfig            |  18 +
 drivers/infiniband/sw/siw/Makefile           |   2 +
 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c | 349 +++++++++++++++++++
 drivers/infiniband/sw/siw/siw_qp_rx.c        |  15 +
 4 files changed, 384 insertions(+)
 create mode 100644 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c

--
2.53.0

* [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
From: Michael Bommarito @ 2026-05-13 17:53 UTC
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel

A malicious connected siw peer can send an iWARP FPDU whose MPA length
field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
.hdr_len, but never compares mpa_len against that header length.

siw_tcp_rx_data() then derives

    srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
                         + MPA_HDR_SIZE;

where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
hdr_len - MPA_HDR_SIZE underflows to a negative int.

The signed value then flows into siw_proc_write()/siw_proc_rresp() as

    bytes = min(srx->fpdu_part_rem, srx->skb_new);

is handed to siw_check_mem() as an int len (whose interval check
addr + len > mem->va + mem->len is satisfied for a valid base when
len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
copy branch in skb_copy_bits() promotes that to size_t, producing a
multi-gigabyte read.

KASAN under a KUnit harness that drives the real kernel TCP receive
path -- a loopback AF_INET socketpair, the malformed FPDU written via
kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
dispatching to siw_tcp_rx_data -- reports:

  BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
  Read of size 4294967295 at addr ffff888...
  Call Trace:
   skb_copy_bits
   siw_rx_kva
   siw_rx_data
   siw_check_mem
   siw_proc_write
   siw_tcp_rx_data
   __tcp_read_sock
   siw_qp_llp_data_ready
   tcp_data_ready
   tcp_data_queue

Add the missing invariant at the earliest point where the peer header
is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
the value the siw transmitter uses as the minimum mpa_len for each
opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
protocol contract. Out-of-range FPDUs terminate the connection with
TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
do not agree on the start of an FPDU"), the correct framing-error
class for this inconsistency.

Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
See cover letter for full root cause, series rationale, and test
summary. [2/2] adds the KUnit regression harness used to validate
this fix.

 drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
index e8a88b378d51..34d03584160c 100644
--- a/drivers/infiniband/sw/siw/siw_qp_rx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
@@ -1081,6 +1081,21 @@ static int siw_get_hdr(struct siw_rx_stream *srx)
 		return -EAGAIN;
 	}
 
+	/*
+	 * Peer-controlled mpa_len must not underflow srx->fpdu_part_rem
+	 * in siw_tcp_rx_data(); a negative value flows as a signed copy
+	 * length into siw_check_mem() and skb_copy_bits().
+	 */
+	if (unlikely(be16_to_cpu(c_hdr->mpa_len) + MPA_HDR_SIZE <
+		     iwarp_pktinfo[opcode].hdr_len)) {
+		pr_warn_ratelimited("siw: short mpa_len %u for opcode %u (hdr_len %u)\n",
+				    be16_to_cpu(c_hdr->mpa_len), opcode,
+				    iwarp_pktinfo[opcode].hdr_len);
+		siw_init_terminate(rx_qp(srx), TERM_ERROR_LAYER_LLP,
+				   LLP_ETYPE_MPA, LLP_ECODE_FPDU_START, 0);
+		return -EINVAL;
+	}
+
 	/*
 	 * DDP/RDMAP header receive completed. Check if the current
 	 * DDP segment starts a new RDMAP message or continues a previously
--
2.53.0

* Re: [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
From: Bernard Metzler @ 2026-05-14 17:10 UTC
To: Michael Bommarito, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel

On 13.05.2026 19:53, Michael Bommarito wrote:
> A malicious connected siw peer can send an iWARP FPDU whose MPA length
> field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
> than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
> parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
> .hdr_len, but never compares mpa_len against that header length.
>
> siw_tcp_rx_data() then derives
>
>     srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
>                          + MPA_HDR_SIZE;
>
> where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
> point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
> on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
> hdr_len - MPA_HDR_SIZE underflows to a negative int.
>
> The signed value then flows into siw_proc_write()/siw_proc_rresp() as
>
>     bytes = min(srx->fpdu_part_rem, srx->skb_new);
>
> is handed to siw_check_mem() as an int len (whose interval check
> addr + len > mem->va + mem->len is satisfied for a valid base when
> len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
> siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
> copy branch in skb_copy_bits() promotes that to size_t, producing a
> multi-gigabyte read.
>
> KASAN under a KUnit harness that drives the real kernel TCP receive
> path -- a loopback AF_INET socketpair, the malformed FPDU written via
> kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
> dispatching to siw_tcp_rx_data -- reports:
>
>   BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
>   Read of size 4294967295 at addr ffff888...
>   Call Trace:
>    skb_copy_bits
>    siw_rx_kva
>    siw_rx_data
>    siw_check_mem
>    siw_proc_write
>    siw_tcp_rx_data
>    __tcp_read_sock
>    siw_qp_llp_data_ready
>    tcp_data_ready
>    tcp_data_queue
>
> Add the missing invariant at the earliest point where the peer header
> is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
> the value the siw transmitter uses as the minimum mpa_len for each
> opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
> protocol contract. Out-of-range FPDUs terminate the connection with
> TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
> is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
> do not agree on the start of an FPDU"), the correct framing-error
> class for this inconsistency.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
> ---
> See cover letter for full root cause, series rationale, and test
> summary. [2/2] adds the KUnit regression harness used to validate
> this fix.
>
>  drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
> index e8a88b378d51..34d03584160c 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_rx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
> @@ -1081,6 +1081,21 @@ static int siw_get_hdr(struct siw_rx_stream *srx)
>  		return -EAGAIN;
>  	}
>
> +	/*
> +	 * Peer-controlled mpa_len must not underflow srx->fpdu_part_rem
> +	 * in siw_tcp_rx_data(); a negative value flows as a signed copy
> +	 * length into siw_check_mem() and skb_copy_bits().
> +	 */

Excellent finding. This was an open gateway for all evil.

> +	if (unlikely(be16_to_cpu(c_hdr->mpa_len) + MPA_HDR_SIZE <
> +		     iwarp_pktinfo[opcode].hdr_len)) {
> +		pr_warn_ratelimited("siw: short mpa_len %u for opcode %u (hdr_len %u)\n",

I think we shall stay with 80 chars per line. So let's
wrap the above line.

Otherwise

Acked-by: Bernard Metzler <bernard.metzler@linux.dev>

> +				    be16_to_cpu(c_hdr->mpa_len), opcode,
> +				    iwarp_pktinfo[opcode].hdr_len);
> +		siw_init_terminate(rx_qp(srx), TERM_ERROR_LAYER_LLP,
> +				   LLP_ETYPE_MPA, LLP_ECODE_FPDU_START, 0);
> +		return -EINVAL;
> +	}
> +
>  	/*
>  	 * DDP/RDMAP header receive completed. Check if the current
>  	 * DDP segment starts a new RDMAP message or continues a previously

* Re: [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
From: Jason Gunthorpe @ 2026-05-14 21:24 UTC
To: Michael Bommarito
Cc: Bernard Metzler, Leon Romanovsky, linux-rdma, linux-kernel

On Wed, May 13, 2026 at 01:53:24PM -0400, Michael Bommarito wrote:
> A malicious connected siw peer can send an iWARP FPDU whose MPA length
> field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
> than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
> parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
> .hdr_len, but never compares mpa_len against that header length.
>
> siw_tcp_rx_data() then derives
>
>     srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
>                          + MPA_HDR_SIZE;
>
> where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
> point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
> on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
> hdr_len - MPA_HDR_SIZE underflows to a negative int.
>
> The signed value then flows into siw_proc_write()/siw_proc_rresp() as
>
>     bytes = min(srx->fpdu_part_rem, srx->skb_new);
>
> is handed to siw_check_mem() as an int len (whose interval check
> addr + len > mem->va + mem->len is satisfied for a valid base when
> len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
> siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
> copy branch in skb_copy_bits() promotes that to size_t, producing a
> multi-gigabyte read.
>
> KASAN under a KUnit harness that drives the real kernel TCP receive
> path -- a loopback AF_INET socketpair, the malformed FPDU written via
> kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
> dispatching to siw_tcp_rx_data -- reports:
>
>   BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
>   Read of size 4294967295 at addr ffff888...
>   Call Trace:
>    skb_copy_bits
>    siw_rx_kva
>    siw_rx_data
>    siw_check_mem
>    siw_proc_write
>    siw_tcp_rx_data
>    __tcp_read_sock
>    siw_qp_llp_data_ready
>    tcp_data_ready
>    tcp_data_queue
>
> Add the missing invariant at the earliest point where the peer header
> is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
> the value the siw transmitter uses as the minimum mpa_len for each
> opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
> protocol contract. Out-of-range FPDUs terminate the connection with
> TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
> is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
> do not agree on the start of an FPDU"), the correct framing-error
> class for this inconsistency.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
> Acked-by: Bernard Metzler <bernard.metzler@linux.dev>
> ---
> See cover letter for full root cause, series rationale, and test
> summary. [2/2] adds the KUnit regression harness used to validate
> this fix.
>
>  drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)

Applied to for-rc

Thanks,
Jason

* [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing
From: Michael Bommarito @ 2026-05-13 17:53 UTC
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel

Add a KUnit suite (CONFIG_SIW_MPA_RX_KUNIT_TEST) that exercises the
real siw_tcp_rx_data() path with three cases covering the MPA length
validation added in the previous patch:

  - siw_mpa_write_underflow_rejected
      Constructs an sk_buff carrying a tagged RDMA WRITE FPDU whose
      mpa_len is one below iwarp_pktinfo[opcode].hdr_len -
      MPA_HDR_SIZE. Registers a REMOTE_WRITE MR in mem_xa so the
      WRITE path would otherwise reach siw_proc_write(), and calls
      siw_tcp_rx_data() directly. Asserts the FPDU is rejected with
      TERM(LLP/MPA/FPDU_START) and rx_suspend = 1.

  - siw_mpa_write_minimum_valid_accepted
      Regression control with mpa_len = hdr_len - MPA_HDR_SIZE (the
      smallest legal value, i.e. a zero-length WRITE). Asserts the
      new check does not fire: no terminate, rx_stream not suspended.

  - siw_mpa_write_underflow_rejected_live_socket
      Opens a loopback AF_INET socketpair via sock_create_kern(),
      attaches a struct siw_cep as sk_user_data so sk_to_qp() resolves
      to the test QP, and installs siw_qp_llp_data_ready as
      sk_data_ready on the victim socket. Writes the malformed FPDU
      via kernel_sendmsg from the attacker side. The kernel TCP stack
      delivers, sk_data_ready fires in softirq, and tcp_read_sock
      dispatches to siw_tcp_rx_data the same way a remote peer would.
      Asserts the same terminate state as the first case.

The third case is the design driver: it confirms the bug-fix codepath
fires from a real softirq RX entry, not just a synthetic direct call.
On a stock siw tree the same harness reproduces the KASAN
slab-out-of-bounds / use-after-free in skb_copy_bits.

Bringing siw's loopback netdev up (case 3 binds 127.0.0.1) is done
inline via dev_change_flags() under rtnl_lock since the KUnit
environment does not run init scripts.

Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
 drivers/infiniband/sw/siw/Kconfig            |  18 +
 drivers/infiniband/sw/siw/Makefile           |   2 +
 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c | 349 +++++++++++++++++++
 3 files changed, 369 insertions(+)
 create mode 100644 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c

diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index 186f182b80e7..b137f5920271 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -18,3 +18,21 @@ config RDMA_SIW
 	  space verbs API, libibverbs. To implement RDMA over
 	  TCP/IP, the driver further interfaces with the Linux
 	  in-kernel TCP socket layer.
+
+config SIW_MPA_RX_KUNIT_TEST
+	bool "KUnit tests for Soft-iWARP MPA receive parsing" if !KUNIT_ALL_TESTS
+	depends on KUNIT && RDMA_SIW
+	default KUNIT_ALL_TESTS
+	help
+	  Build KUnit regression tests for the Soft-iWARP MPA receive
+	  state machine. The tests cover the MPA length consistency
+	  check in siw_get_hdr(): malformed FPDUs whose mpa_len is
+	  below the opcode's fixed DDP/RDMAP header must be rejected
+	  with TERM(LLP/MPA/FPDU_START); the minimum-valid mpa_len
+	  (zero-length WRITE) must still be accepted. One case drives
+	  the real kernel TCP receive path via a loopback socketpair
+	  so the same softirq sk_data_ready -> tcp_read_sock ->
+	  siw_tcp_rx_data chain a remote peer would exercise is
+	  covered.
+
+	  If unsure, say N.
diff --git a/drivers/infiniband/sw/siw/Makefile b/drivers/infiniband/sw/siw/Makefile
index f5f7e3867889..09d4c90d8758 100644
--- a/drivers/infiniband/sw/siw/Makefile
+++ b/drivers/infiniband/sw/siw/Makefile
@@ -9,3 +9,5 @@ siw-y := \
 	siw_qp_tx.o \
 	siw_qp_rx.o \
 	siw_verbs.o
+
+siw-$(CONFIG_SIW_MPA_RX_KUNIT_TEST) += siw_mpa_rx_kunit.o
diff --git a/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c b/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
new file mode 100644
index 000000000000..204b3213b840
--- /dev/null
+++ b/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
@@ -0,0 +1,349 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+/*
+ * KUnit harness for siw MPA receive-state length validation.
+ *
+ * Internal to the SIW_MPA_LEN_UNDERFLOW_RX_COPY disclosure validation tree.
+ * Not part of the upstream patch.
+ *
+ * case 1: short mpa_len triggers the new siw_get_hdr() check via direct
+ *         siw_tcp_rx_data() call with a constructed sk_buff
+ *         - expects: TERM(LLP/MPA/FPDU_START), rx_suspend=1
+ *         - under stock siw: KASAN slab-out-of-bounds in skb_copy_bits()
+ *         - under patched siw: no splat, terminate state set
+ *
+ * case 2: minimum-valid mpa_len control (constructed sk_buff)
+ *         - mpa_len = hdr_len - MPA_HDR_SIZE -> fpdu_part_rem = 0
+ *           so siw_proc_write() takes the zero-length WRITE short path
+ *           and returns 0 without calling skb_copy_bits().
+ *         - expects: no TERM, state machine progressed normally
+ *
+ * case 3: real loopback TCP socketpair (the "live two-node" analog)
+ *         - opens AF_INET TCP sockets in-kernel via sock_create_kern()
+ *         - binds/listens on 127.0.0.1:0, connects, accepts
+ *         - installs siw_qp_llp_data_ready on the victim socket and
+ *           attaches a struct siw_cep so sk_to_qp() resolves to our qp
+ *         - writes the malformed FPDU bytes via kernel_sendmsg on the
+ *           attacker socket
+ *         - the kernel TCP stack delivers, sk_data_ready fires, and
+ *           siw_qp_llp_data_ready -> tcp_read_sock -> siw_tcp_rx_data
+ *           runs in the normal kernel receive path
+ *         - expects: TERM(LLP/MPA/FPDU_START) on the qp
+ */
+
+#include <kunit/test.h>
+#include <linux/inet.h>
+#include <linux/in.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <linux/tcp.h>
+#include <linux/wait.h>
+#include <linux/xarray.h>
+#include <net/sock.h>
+#include <net/tcp.h>
+#include <rdma/ib_verbs.h>
+
+#include "siw.h"
+#include "siw_cm.h"
+#include "siw_mem.h"
+
+static void siw_kunit_kfree_skb(void *skb)
+{
+	kfree_skb(skb);
+}
+
+struct siw_mpa_rx_ctx {
+	struct siw_device *sdev;
+	struct siw_qp *qp;
+	struct siw_mem *mem;
+	void *target;
+	u32 stag;
+};
+
+static void siw_mpa_rx_setup(struct kunit *test, struct siw_mpa_rx_ctx *c)
+{
+	void *xa_ret;
+
+	c->stag = 0x00000100;
+
+	c->sdev = kunit_kzalloc(test, sizeof(*c->sdev), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->sdev);
+	xa_init_flags(&c->sdev->mem_xa, XA_FLAGS_ALLOC1);
+
+	c->qp = kunit_kzalloc(test, sizeof(*c->qp), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->qp);
+	c->qp->sdev = c->sdev;
+	c->qp->pd = kunit_kzalloc(test, sizeof(*c->qp->pd), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->qp->pd);
+	c->qp->rx_stream.state = SIW_GET_HDR;
+
+	c->target = kunit_kzalloc(test, 64, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->target);
+
+	c->mem = kunit_kzalloc(test, sizeof(*c->mem), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, c->mem);
+	kref_init(&c->mem->ref);
+	c->mem->sdev = c->sdev;
+	c->mem->stag = c->stag;
+	c->mem->stag_valid = 1;
+	c->mem->va = (u64)(uintptr_t)c->target;
+	c->mem->len = 64;
+	c->mem->pd = c->qp->pd;
+	c->mem->perms = IB_ACCESS_REMOTE_WRITE;
+
+	xa_ret = xa_store(&c->sdev->mem_xa, c->stag >> 8, c->mem, GFP_KERNEL);
+	KUNIT_ASSERT_FALSE(test, xa_is_err(xa_ret));
+}
+
+static void siw_mpa_rx_run(struct kunit *test, struct siw_mpa_rx_ctx *c,
+			   u16 mpa_len_val)
+{
+	struct iwarp_rdma_write write = { };
+	struct sk_buff *skb;
+	read_descriptor_t rd_desc = { };
+	u8 payload[sizeof(write) + 1];
+
+	write.ctrl.mpa_len = cpu_to_be16(mpa_len_val);
+	write.ctrl.ddp_rdmap_ctrl = DDP_FLAG_TAGGED | DDP_FLAG_LAST |
+				    cpu_to_be16(DDP_VERSION << 8) |
+				    cpu_to_be16(RDMAP_VERSION << 6) |
+				    cpu_to_be16(RDMAP_RDMA_WRITE);
+	write.sink_stag = cpu_to_be32(c->stag);
+	write.sink_to = cpu_to_be64((u64)(uintptr_t)c->target);
+
+	memcpy(payload, &write, sizeof(write));
+	payload[sizeof(write)] = 0x41;
+
+	skb = alloc_skb(sizeof(payload), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, skb);
+	skb_put_data(skb, payload, sizeof(payload));
+	kunit_add_action_or_reset(test, siw_kunit_kfree_skb, skb);
+
+	rd_desc.arg.data = c->qp;
+	rd_desc.count = sizeof(payload);
+
+	siw_tcp_rx_data(&rd_desc, skb, 0, sizeof(payload));
+}
+
+static void siw_mpa_write_underflow_rejected(struct kunit *test)
+{
+	struct siw_mpa_rx_ctx c;
+
+	siw_mpa_rx_setup(test, &c);
+
+	/* mpa_len one byte short of the WRITE hdr_len - MPA_HDR_SIZE floor. */
+	siw_mpa_rx_run(test, &c,
+		       sizeof(struct iwarp_rdma_write) - MPA_HDR_SIZE - 1);
+
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.valid, 1);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.layer,
+			(int)TERM_ERROR_LAYER_LLP);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.etype,
+			(int)LLP_ETYPE_MPA);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.ecode,
+			(int)LLP_ECODE_FPDU_START);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->rx_stream.rx_suspend, 1);
+}
+
+static void siw_mpa_write_minimum_valid_accepted(struct kunit *test)
+{
+	struct siw_mpa_rx_ctx c;
+
+	siw_mpa_rx_setup(test, &c);
+
+	/*
+	 * mpa_len == hdr_len - MPA_HDR_SIZE is the smallest legal value;
+	 * it yields fpdu_part_rem = 0, i.e. a zero-length WRITE. The new
+	 * length check in siw_get_hdr() must accept this.
+	 */
+	siw_mpa_rx_run(test, &c,
+		       sizeof(struct iwarp_rdma_write) - MPA_HDR_SIZE);
+
+	KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.valid, 0);
+	KUNIT_EXPECT_EQ(test, (int)c.qp->rx_stream.rx_suspend, 0);
+}
+
+static int siw_mpa_rx_bring_up_lo(struct kunit *test)
+{
+	struct net_device *lo;
+	int rv;
+
+	rtnl_lock();
+	lo = __dev_get_by_name(&init_net, "lo");
+	KUNIT_ASSERT_NOT_NULL(test, lo);
+	if (!(lo->flags & IFF_UP))
+		rv = dev_change_flags(lo, lo->flags | IFF_UP, NULL);
+	else
+		rv = 0;
+	rtnl_unlock();
+	KUNIT_ASSERT_EQ(test, rv, 0);
+	return 0;
+}
+
+static int siw_mpa_rx_make_pair(struct kunit *test, struct socket **listen,
+				struct socket **server, struct socket **client)
+{
+	struct sockaddr_in addr = { .sin_family = AF_INET, };
+	struct sockaddr_in bound = { };
+	struct socket *l = NULL, *s = NULL, *c = NULL;
+	int rv;
+
+	siw_mpa_rx_bring_up_lo(test);
+
+	rv = sock_create_kern(&init_net, AF_INET, SOCK_STREAM, IPPROTO_TCP, &l);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+	addr.sin_port = 0;
+	rv = kernel_bind(l, (struct sockaddr_unsized *)&addr, sizeof(addr));
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	rv = l->ops->getname(l, (struct sockaddr *)&bound, 0);
+	KUNIT_ASSERT_GT(test, rv, 0);
+
+	rv = kernel_listen(l, 1);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	rv = sock_create_kern(&init_net, AF_INET, SOCK_STREAM, IPPROTO_TCP, &c);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	rv = kernel_connect(c, (struct sockaddr_unsized *)&bound,
+			    sizeof(bound), 0);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	rv = kernel_accept(l, &s, 0);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	*listen = l;
+	*server = s;
+	*client = c;
+	return 0;
+}
+
+static void siw_mpa_write_underflow_rejected_live_socket(struct kunit *test)
+{
+	struct siw_device *sdev;
+	struct siw_qp *qp;
+	struct siw_cep *cep;
+	struct siw_mem *mem;
+	struct socket *listen_sock = NULL, *server_sock = NULL, *client_sock = NULL;
+	struct iwarp_rdma_write write = { };
+	struct kvec iov;
+	struct msghdr msg = { };
+	void *xa_ret, *target;
+	u8 payload[sizeof(write) + 1];
+	u32 stag = 0x00000100;
+	int rv, i;
+
+	sdev = kunit_kzalloc(test, sizeof(*sdev), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, sdev);
+	xa_init_flags(&sdev->mem_xa, XA_FLAGS_ALLOC1);
+
+	qp = kunit_kzalloc(test, sizeof(*qp), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, qp);
+	qp->sdev = sdev;
+	qp->pd = kunit_kzalloc(test, sizeof(*qp->pd), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, qp->pd);
+	qp->rx_stream.state = SIW_GET_HDR;
+	init_rwsem(&qp->state_lock);
+	qp->attrs.state = SIW_QP_STATE_RTS;
+	qp->cep = NULL;
+
+	/* Register a valid REMOTE_WRITE memory object. On unpatched siw
+	 * this is what lets the negative-length copy reach skb_copy_bits;
+	 * without an MR the STag lookup in siw_proc_write() returns NULL
+	 * and the WRITE is terminated before the underflow primitive fires.
+	 * With this patch in place, the new siw_get_hdr() check rejects
+	 * the FPDU BEFORE STag lookup, so the MR is unused. We keep it in
+	 * the test so unpatched-kernel reruns also exercise the full path.
+	 */
+	target = kunit_kzalloc(test, 64, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, target);
+	mem = kunit_kzalloc(test, sizeof(*mem), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, mem);
+	kref_init(&mem->ref);
+	mem->sdev = sdev;
+	mem->stag = stag;
+	mem->stag_valid = 1;
+	mem->va = (u64)(uintptr_t)target;
+	mem->len = 64;
+	mem->pd = qp->pd;
+	mem->perms = IB_ACCESS_REMOTE_WRITE;
+	xa_ret = xa_store(&sdev->mem_xa, stag >> 8, mem, GFP_KERNEL);
+	KUNIT_ASSERT_FALSE(test, xa_is_err(xa_ret));
+
+	/* siw_qp_llp_data_ready dereferences sk_user_data as siw_cep. */
+	cep = kunit_kzalloc(test, sizeof(*cep), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, cep);
+	cep->qp = qp;
+	spin_lock_init(&cep->lock);
+	kref_init(&cep->ref);
+
+	rv = siw_mpa_rx_make_pair(test, &listen_sock, &server_sock, &client_sock);
+	KUNIT_ASSERT_EQ(test, rv, 0);
+
+	write_lock_bh(&server_sock->sk->sk_callback_lock);
+	server_sock->sk->sk_user_data = cep;
+	server_sock->sk->sk_data_ready = siw_qp_llp_data_ready;
+	qp->attrs.sk = server_sock;
+	write_unlock_bh(&server_sock->sk->sk_callback_lock);
+
+	write.ctrl.mpa_len =
+		cpu_to_be16(sizeof(write) - MPA_HDR_SIZE - 1);
+	write.ctrl.ddp_rdmap_ctrl = DDP_FLAG_TAGGED | DDP_FLAG_LAST |
+				    cpu_to_be16(DDP_VERSION << 8) |
+				    cpu_to_be16(RDMAP_VERSION << 6) |
+				    cpu_to_be16(RDMAP_RDMA_WRITE);
+	write.sink_stag = cpu_to_be32(stag);
+	write.sink_to = cpu_to_be64((u64)(uintptr_t)target);
+
+	memcpy(payload, &write, sizeof(write));
+	payload[sizeof(write)] = 0x41;
+
+	iov.iov_base = payload;
+	iov.iov_len = sizeof(payload);
+	rv = kernel_sendmsg(client_sock, &msg, &iov, 1, sizeof(payload));
+	KUNIT_ASSERT_EQ(test, rv, (int)sizeof(payload));
+
+	/* Wait for TCP to deliver bytes and sk_data_ready to fire. */
+	for (i = 0; i < 200; i++) {
+		if (qp->term_info.valid)
+			break;
+		msleep(20);
+	}
+
+	KUNIT_EXPECT_EQ(test, (int)qp->term_info.valid, 1);
+	KUNIT_EXPECT_EQ(test, (int)qp->term_info.layer,
+			(int)TERM_ERROR_LAYER_LLP);
+	KUNIT_EXPECT_EQ(test, (int)qp->term_info.etype,
+			(int)LLP_ETYPE_MPA);
+	KUNIT_EXPECT_EQ(test, (int)qp->term_info.ecode,
+			(int)LLP_ECODE_FPDU_START);
+	KUNIT_EXPECT_EQ(test, (int)qp->rx_stream.rx_suspend, 1);
+
+	/* Detach our handler before tearing down sockets so the TCP stack
+	 * cannot call into freed kunit memory after the test.
+	 */
+	write_lock_bh(&server_sock->sk->sk_callback_lock);
+	server_sock->sk->sk_user_data = NULL;
+	server_sock->sk->sk_data_ready = sock_def_readable;
+	write_unlock_bh(&server_sock->sk->sk_callback_lock);
+
+	sock_release(client_sock);
+	sock_release(server_sock);
+	sock_release(listen_sock);
+}
+
+static struct kunit_case siw_mpa_rx_cases[] = {
+	KUNIT_CASE(siw_mpa_write_underflow_rejected),
+	KUNIT_CASE(siw_mpa_write_minimum_valid_accepted),
+	KUNIT_CASE(siw_mpa_write_underflow_rejected_live_socket),
+	{ }
+};
+
+static struct kunit_suite siw_mpa_rx_suite = {
+	.name = "siw_mpa_rx",
+	.test_cases = siw_mpa_rx_cases,
+};
+
+kunit_test_suite(siw_mpa_rx_suite);
--
2.53.0