* [PATCH 0/2] RDMA/siw: fix MPA FPDU length underflow + add KUnit coverage
@ 2026-05-13 17:53 Michael Bommarito
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
2026-05-13 17:53 ` [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing Michael Bommarito
0 siblings, 2 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-13 17:53 UTC (permalink / raw)
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel
[1/2] fixes a peer-controlled signed-int underflow in the Soft-iWARP
receive path. c_hdr->mpa_len (16-bit, on-wire, peer-chosen) is never
compared against iwarp_pktinfo[opcode].hdr_len, so a malformed FPDU
makes siw_tcp_rx_data() derive a negative srx->fpdu_part_rem. That
value flows through siw_proc_write() / siw_proc_rresp() into
siw_check_mem(), whose bounds test does not reject a negative length
against a valid base, and on into skb_copy_bits() as a signed int copy
length. Under KASAN this shows up as a multi-gigabyte OOB read in the
header-copy branch. Full root cause and the KASAN call trace are in
the commit message of [1/2].
[2/2] adds the KUnit regression harness used to validate [1/2]. It
is split into its own patch because the test brings new Kconfig
plumbing and a new file in drivers/infiniband/sw/siw/, and so that
maintainers can take [1/2] on its own if they want to defer the test
or treat it differently for stable backport. The fix in [1/2] is
tagged for stable; [2/2] is not.
The harness has three cases. Two use a constructed sk_buff: one
asserts the new check rejects an underflowed mpa_len; one is a
regression control with the minimum-valid mpa_len (zero-length
WRITE). The third opens a loopback AF_INET socketpair via
sock_create_kern() and drives the malformed FPDU through the real
kernel TCP receive path (sk_data_ready in softirq -> tcp_read_sock
-> siw_tcp_rx_data), so the same chain a remote peer would exercise
is covered.
Tested:
- UML + KASAN (inline) defconfig + KUNIT + RDMA_SIW: all three
KUnit cases pass with the series applied; the stock tree splats
in skb_copy_bits with "Read of size 4294967295".
- x86_64 modular W=1 build clean on drivers/infiniband/sw/siw/.
- checkpatch.pl --strict clean on both patches (one false-positive
MAINTAINERS warning on [2/2] because the existing siw entry
covers drivers/infiniband/sw/siw/ as a directory).
- git am of the series to a fresh base produces a diff identical
to the validation worktree.
Bug exists since commit 8b6a361b8c48 ("rdma/siw: receive path") in
2019 (5.3-rc1), so all LTS branches with siw are affected; [1/2]
carries Cc: stable.
Michael Bommarito (2):
RDMA/siw: reject MPA FPDU length underflow before signed receive math
RDMA/siw: add KUnit tests for MPA receive parsing
drivers/infiniband/sw/siw/Kconfig | 18 +
drivers/infiniband/sw/siw/Makefile | 2 +
drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c | 349 +++++++++++++++++++
drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +
4 files changed, 384 insertions(+)
create mode 100644 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
--
2.53.0
* [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
2026-05-13 17:53 [PATCH 0/2] RDMA/siw: fix MPA FPDU length underflow + add KUnit coverage Michael Bommarito
@ 2026-05-13 17:53 ` Michael Bommarito
2026-05-14 17:10 ` Bernard Metzler
2026-05-14 21:24 ` Jason Gunthorpe
2026-05-13 17:53 ` [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing Michael Bommarito
1 sibling, 2 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-13 17:53 UTC (permalink / raw)
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel
A malicious connected siw peer can send an iWARP FPDU whose MPA length
field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
.hdr_len, but never compares mpa_len against that header length.
siw_tcp_rx_data() then derives
srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
+ MPA_HDR_SIZE;
where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
hdr_len - MPA_HDR_SIZE underflows to a negative int.
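The arithmetic can be checked outside the kernel with a small
user-space sketch. fpdu_part_rem_model() and WRITE_HDR_LEN are
illustrative names, not siw symbols, and the constants simply assume
the tagged WRITE values quoted above (hdr_len 16, MPA_HDR_SIZE 2).

```c
#include <stdint.h>

#define MPA_HDR_SIZE  2   /* size of the MPA length field, as quoted above */
#define WRITE_HDR_LEN 16  /* tagged WRITE hdr_len, as quoted above */

/*
 * Hypothetical model of the remaining-length computation. mpa_len is
 * the host-order value of the 16-bit wire field and hdr_len is the
 * number of header bytes already received. The result is a signed
 * int, which is what allows it to go negative.
 */
static int fpdu_part_rem_model(uint16_t mpa_len, int hdr_len)
{
	return (int)mpa_len - hdr_len + MPA_HDR_SIZE;
}
```

With these assumptions, mpa_len 0 gives -14, mpa_len 13 gives -1, and
mpa_len 14 (hdr_len - MPA_HDR_SIZE) is the first value that gives a
non-negative remainder.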
The signed value then flows into siw_proc_write()/siw_proc_rresp() as
bytes = min(srx->fpdu_part_rem, srx->skb_new);
is handed to siw_check_mem() as an int len (whose interval check
addr + len > mem->va + mem->len does not trip for a valid base when
len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
copy branch in skb_copy_bits() promotes that to size_t, producing a
multi-gigabyte read.
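Why a negative length gets past the interval check can be seen in a
simplified user-space model. struct mem_model and interval_ok_model()
are hypothetical names, and the test is reduced to the one comparison
quoted above with an unsigned 64-bit address and a signed int length;
the real siw_check_mem() checks more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the region fields used by the check. */
struct mem_model {
	uint64_t va;   /* start of the registered region */
	uint64_t len;  /* length of the registered region */
};

/*
 * Returns true when the access starting at addr with length len is
 * accepted. A negative len becomes a huge unsigned value, so the sum
 * wraps to an address below addr and the upper-bound comparison does
 * not reject it.
 */
static bool interval_ok_model(const struct mem_model *mem, uint64_t addr,
			      int len)
{
	uint64_t end = addr + (uint64_t)(int64_t)len;

	if (addr < mem->va || end > mem->va + mem->len)
		return false;
	return true;
}
```

In this model a 64-byte region accepts len 64, rejects len 65, and
accepts len -14, which is the case the patch closes off earlier in
the receive path.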
KASAN under a KUnit harness that drives the real kernel TCP receive
path -- a loopback AF_INET socketpair, the malformed FPDU written via
kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
dispatching to siw_tcp_rx_data -- reports:
BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
Read of size 4294967295 at addr ffff888...
Call Trace:
skb_copy_bits
siw_rx_kva
siw_rx_data
siw_check_mem
siw_proc_write
siw_tcp_rx_data
__tcp_read_sock
siw_qp_llp_data_ready
tcp_data_ready
tcp_data_queue
Add the missing invariant at the earliest point where the peer header
is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
the value the siw transmitter uses as the minimum mpa_len for each
opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
protocol contract. Out-of-range FPDUs terminate the connection with
TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
do not agree on the start of an FPDU"), the correct framing-error
class for this inconsistency.
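The invariant is small enough to model in user space. The sketch
below assumes the comparison has the same shape as the one in the
diff (mpa_len + MPA_HDR_SIZE against hdr_len); mpa_len_too_short()
is an illustrative name, not a siw function.

```c
#include <stdbool.h>
#include <stdint.h>

#define MPA_HDR_SIZE 2  /* size of the MPA length field, as quoted above */

/*
 * Hypothetical model of the new check: true means the FPDU would be
 * rejected because mpa_len cannot cover the fixed header for the
 * opcode. The sum is done in unsigned int, so it cannot wrap for a
 * 16-bit mpa_len.
 */
static bool mpa_len_too_short(uint16_t mpa_len, unsigned int hdr_len)
{
	return (unsigned int)mpa_len + MPA_HDR_SIZE < hdr_len;
}
```

For hdr_len 16 this rejects mpa_len 0 through 13 and accepts 14 and
above, so a zero-length WRITE still passes.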
Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
See cover letter for full root cause, series rationale, and test
summary. [2/2] adds the KUnit regression harness used to validate
this fix.
drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
index e8a88b378d51..34d03584160c 100644
--- a/drivers/infiniband/sw/siw/siw_qp_rx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
@@ -1081,6 +1081,21 @@ static int siw_get_hdr(struct siw_rx_stream *srx)
return -EAGAIN;
}
+ /*
+ * Peer-controlled mpa_len must not underflow srx->fpdu_part_rem
+ * in siw_tcp_rx_data(); a negative value flows as a signed copy
+ * length into siw_check_mem() and skb_copy_bits().
+ */
+ if (unlikely(be16_to_cpu(c_hdr->mpa_len) + MPA_HDR_SIZE <
+ iwarp_pktinfo[opcode].hdr_len)) {
+ pr_warn_ratelimited("siw: short mpa_len %u for opcode %u (hdr_len %u)\n",
+ be16_to_cpu(c_hdr->mpa_len), opcode,
+ iwarp_pktinfo[opcode].hdr_len);
+ siw_init_terminate(rx_qp(srx), TERM_ERROR_LAYER_LLP,
+ LLP_ETYPE_MPA, LLP_ECODE_FPDU_START, 0);
+ return -EINVAL;
+ }
+
/*
* DDP/RDMAP header receive completed. Check if the current
* DDP segment starts a new RDMAP message or continues a previously
--
2.53.0
* [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing
2026-05-13 17:53 [PATCH 0/2] RDMA/siw: fix MPA FPDU length underflow + add KUnit coverage Michael Bommarito
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
@ 2026-05-13 17:53 ` Michael Bommarito
1 sibling, 0 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-13 17:53 UTC (permalink / raw)
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel
Add a KUnit suite (CONFIG_SIW_MPA_RX_KUNIT_TEST) that exercises the
real siw_tcp_rx_data() path with three cases covering the MPA length
validation added in the previous patch:
- siw_mpa_write_underflow_rejected
Constructs an sk_buff carrying a tagged RDMA WRITE FPDU whose
mpa_len is one below iwarp_pktinfo[opcode].hdr_len -
MPA_HDR_SIZE. Registers a REMOTE_WRITE MR in mem_xa so the
WRITE path would otherwise reach siw_proc_write(), and calls
siw_tcp_rx_data() directly. Asserts the FPDU is rejected with
TERM(LLP/MPA/FPDU_START) and rx_suspend = 1.
- siw_mpa_write_minimum_valid_accepted
Regression control with mpa_len = hdr_len - MPA_HDR_SIZE (the
smallest legal value, i.e. a zero-length WRITE). Asserts the
new check does not fire: no terminate, rx_stream not
suspended.
- siw_mpa_write_underflow_rejected_live_socket
Opens a loopback AF_INET socketpair via sock_create_kern(),
attaches a struct siw_cep as sk_user_data so sk_to_qp()
resolves to the test QP, and installs siw_qp_llp_data_ready as
sk_data_ready on the victim socket. Writes the malformed FPDU
via kernel_sendmsg from the attacker side. The kernel TCP
stack delivers, sk_data_ready fires in softirq, and
tcp_read_sock dispatches to siw_tcp_rx_data the same way a
remote peer would. Asserts the same terminate state as the
first case.
The third case is the main reason for the suite: it confirms the
bug-fix codepath fires from a real softirq RX entry, not just a
synthetic direct call. On a stock siw tree the same harness reproduces
the KASAN slab-out-of-bounds / use-after-free in skb_copy_bits.
Bringing the loopback netdev (lo) up (case 3 binds 127.0.0.1) is done
inline via dev_change_flags() under rtnl_lock since the KUnit
environment does not run init scripts.
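For readers who want to see the bytes the cases put on the wire, here
is a user-space sketch of packing a tagged WRITE header. It assumes
the 16-byte layout implied by the test code (2-byte mpa_len, 2-byte
DDP/RDMAP control word, 4-byte sink STag, 8-byte sink offset, all
big-endian). put_be() and pack_write_hdr() are hypothetical helpers,
and the control word is passed in rather than derived from the siw
flag definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define WRITE_HDR_LEN 16  /* assumed tagged WRITE header size */

/* Store the low nbytes of v into buf in big-endian (network) order. */
static void put_be(uint8_t *buf, uint64_t v, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		buf[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/* Pack the assumed header layout; the caller chooses mpa_len freely. */
static void pack_write_hdr(uint8_t out[WRITE_HDR_LEN], uint16_t mpa_len,
			   uint16_t ctrl, uint32_t stag, uint64_t to)
{
	put_be(out + 0, mpa_len, 2);
	put_be(out + 2, ctrl, 2);
	put_be(out + 4, stag, 4);
	put_be(out + 8, to, 8);
}
```

Passing mpa_len 13 reproduces the "one below the floor" value used
by the first and third cases, and 14 the minimum-valid control.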
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
drivers/infiniband/sw/siw/Kconfig | 18 +
drivers/infiniband/sw/siw/Makefile | 2 +
drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c | 349 +++++++++++++++++++
3 files changed, 369 insertions(+)
create mode 100644 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index 186f182b80e7..b137f5920271 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -18,3 +18,21 @@ config RDMA_SIW
space verbs API, libibverbs. To implement RDMA over
TCP/IP, the driver further interfaces with the Linux
in-kernel TCP socket layer.
+
+config SIW_MPA_RX_KUNIT_TEST
+ bool "KUnit tests for Soft-iWARP MPA receive parsing" if !KUNIT_ALL_TESTS
+ depends on KUNIT && RDMA_SIW
+ default KUNIT_ALL_TESTS
+ help
+ Build KUnit regression tests for the Soft-iWARP MPA receive
+ state machine. The tests cover the MPA length consistency
+ check in siw_get_hdr(): malformed FPDUs whose mpa_len is
+ below the opcode's fixed DDP/RDMAP header must be rejected
+ with TERM(LLP/MPA/FPDU_START); the minimum-valid mpa_len
+ (zero-length WRITE) must still be accepted. One case drives
+ the real kernel TCP receive path via a loopback socketpair
+ so the same softirq sk_data_ready -> tcp_read_sock ->
+ siw_tcp_rx_data chain a remote peer would exercise is
+ covered.
+
+ If unsure, say N.
diff --git a/drivers/infiniband/sw/siw/Makefile b/drivers/infiniband/sw/siw/Makefile
index f5f7e3867889..09d4c90d8758 100644
--- a/drivers/infiniband/sw/siw/Makefile
+++ b/drivers/infiniband/sw/siw/Makefile
@@ -9,3 +9,5 @@ siw-y := \
siw_qp_tx.o \
siw_qp_rx.o \
siw_verbs.o
+
+siw-$(CONFIG_SIW_MPA_RX_KUNIT_TEST) += siw_mpa_rx_kunit.o
diff --git a/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c b/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
new file mode 100644
index 000000000000..204b3213b840
--- /dev/null
+++ b/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
@@ -0,0 +1,349 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+/*
+ * KUnit harness for siw MPA receive-state length validation.
+ *
+ * Covers the mpa_len consistency check in siw_get_hdr() using tagged
+ * RDMA WRITE FPDUs; see the Kconfig help text for an overview.
+ *
+ * case 1: short mpa_len triggers the new siw_get_hdr() check via direct
+ * siw_tcp_rx_data() call with a constructed sk_buff
+ * - expects: TERM(LLP/MPA/FPDU_START), rx_suspend=1
+ * - under stock siw: KASAN slab-out-of-bounds in skb_copy_bits()
+ * - under patched siw: no splat, terminate state set
+ *
+ * case 2: minimum-valid mpa_len control (constructed sk_buff)
+ * - mpa_len = hdr_len - MPA_HDR_SIZE -> fpdu_part_rem = 0
+ * so siw_proc_write() takes the zero-length WRITE short path
+ * and returns 0 without calling skb_copy_bits().
+ * - expects: no TERM, state machine progressed normally
+ *
+ * case 3: real loopback TCP socketpair (the "live two-node" analog)
+ * - opens AF_INET TCP sockets in-kernel via sock_create_kern()
+ * - binds/listens on 127.0.0.1:0, connects, accepts
+ * - installs siw_qp_llp_data_ready on the victim socket and
+ * attaches a struct siw_cep so sk_to_qp() resolves to our qp
+ * - writes the malformed FPDU bytes via kernel_sendmsg on the
+ * attacker socket
+ * - the kernel TCP stack delivers, sk_data_ready fires, and
+ * siw_qp_llp_data_ready -> tcp_read_sock -> siw_tcp_rx_data
+ * runs in the normal kernel receive path
+ * - expects: TERM(LLP/MPA/FPDU_START) on the qp
+ */
+
+#include <kunit/test.h>
+#include <linux/delay.h>
+#include <linux/inet.h>
+#include <linux/in.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <linux/tcp.h>
+#include <linux/xarray.h>
+#include <net/sock.h>
+#include <net/tcp.h>
+#include <rdma/ib_verbs.h>
+
+#include "siw.h"
+#include "siw_cm.h"
+#include "siw_mem.h"
+
+static void siw_kunit_kfree_skb(void *skb)
+{
+ kfree_skb(skb);
+}
+
+struct siw_mpa_rx_ctx {
+ struct siw_device *sdev;
+ struct siw_qp *qp;
+ struct siw_mem *mem;
+ void *target;
+ u32 stag;
+};
+
+static void siw_mpa_rx_setup(struct kunit *test, struct siw_mpa_rx_ctx *c)
+{
+ void *xa_ret;
+
+ c->stag = 0x00000100;
+
+ c->sdev = kunit_kzalloc(test, sizeof(*c->sdev), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, c->sdev);
+ xa_init_flags(&c->sdev->mem_xa, XA_FLAGS_ALLOC1);
+
+ c->qp = kunit_kzalloc(test, sizeof(*c->qp), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, c->qp);
+ c->qp->sdev = c->sdev;
+ c->qp->pd = kunit_kzalloc(test, sizeof(*c->qp->pd), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, c->qp->pd);
+ c->qp->rx_stream.state = SIW_GET_HDR;
+
+ c->target = kunit_kzalloc(test, 64, GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, c->target);
+
+ c->mem = kunit_kzalloc(test, sizeof(*c->mem), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, c->mem);
+ kref_init(&c->mem->ref);
+ c->mem->sdev = c->sdev;
+ c->mem->stag = c->stag;
+ c->mem->stag_valid = 1;
+ c->mem->va = (u64)(uintptr_t)c->target;
+ c->mem->len = 64;
+ c->mem->pd = c->qp->pd;
+ c->mem->perms = IB_ACCESS_REMOTE_WRITE;
+
+ xa_ret = xa_store(&c->sdev->mem_xa, c->stag >> 8, c->mem, GFP_KERNEL);
+ KUNIT_ASSERT_FALSE(test, xa_is_err(xa_ret));
+}
+
+static void siw_mpa_rx_run(struct kunit *test, struct siw_mpa_rx_ctx *c,
+ u16 mpa_len_val)
+{
+ struct iwarp_rdma_write write = { };
+ struct sk_buff *skb;
+ read_descriptor_t rd_desc = { };
+ u8 payload[sizeof(write) + 1];
+
+ write.ctrl.mpa_len = cpu_to_be16(mpa_len_val);
+ write.ctrl.ddp_rdmap_ctrl = DDP_FLAG_TAGGED | DDP_FLAG_LAST |
+ cpu_to_be16(DDP_VERSION << 8) |
+ cpu_to_be16(RDMAP_VERSION << 6) |
+ cpu_to_be16(RDMAP_RDMA_WRITE);
+ write.sink_stag = cpu_to_be32(c->stag);
+ write.sink_to = cpu_to_be64((u64)(uintptr_t)c->target);
+
+ memcpy(payload, &write, sizeof(write));
+ payload[sizeof(write)] = 0x41;
+
+ skb = alloc_skb(sizeof(payload), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, skb);
+ skb_put_data(skb, payload, sizeof(payload));
+ kunit_add_action_or_reset(test, siw_kunit_kfree_skb, skb);
+
+ rd_desc.arg.data = c->qp;
+ rd_desc.count = sizeof(payload);
+
+ siw_tcp_rx_data(&rd_desc, skb, 0, sizeof(payload));
+}
+
+static void siw_mpa_write_underflow_rejected(struct kunit *test)
+{
+ struct siw_mpa_rx_ctx c;
+
+ siw_mpa_rx_setup(test, &c);
+
+ /* mpa_len one byte short of the WRITE hdr_len - MPA_HDR_SIZE floor. */
+ siw_mpa_rx_run(test, &c,
+ sizeof(struct iwarp_rdma_write) - MPA_HDR_SIZE - 1);
+
+ KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.valid, 1);
+ KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.layer,
+ (int)TERM_ERROR_LAYER_LLP);
+ KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.etype,
+ (int)LLP_ETYPE_MPA);
+ KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.ecode,
+ (int)LLP_ECODE_FPDU_START);
+ KUNIT_EXPECT_EQ(test, (int)c.qp->rx_stream.rx_suspend, 1);
+}
+
+static void siw_mpa_write_minimum_valid_accepted(struct kunit *test)
+{
+ struct siw_mpa_rx_ctx c;
+
+ siw_mpa_rx_setup(test, &c);
+
+ /*
+ * mpa_len == hdr_len - MPA_HDR_SIZE is the smallest legal value;
+ * it yields fpdu_part_rem = 0, i.e. a zero-length WRITE. The new
+ * length check in siw_get_hdr() must accept this.
+ */
+ siw_mpa_rx_run(test, &c,
+ sizeof(struct iwarp_rdma_write) - MPA_HDR_SIZE);
+
+ KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.valid, 0);
+ KUNIT_EXPECT_EQ(test, (int)c.qp->rx_stream.rx_suspend, 0);
+}
+
+static int siw_mpa_rx_bring_up_lo(struct kunit *test)
+{
+ struct net_device *lo;
+ int rv = -ENODEV;
+
+ rtnl_lock();
+ lo = __dev_get_by_name(&init_net, "lo");
+ /* No KUNIT_ASSERT under rtnl: a failed assert would abort with it held. */
+ if (lo && !(lo->flags & IFF_UP))
+ rv = dev_change_flags(lo, lo->flags | IFF_UP, NULL);
+ else if (lo)
+ rv = 0;
+ rtnl_unlock();
+ KUNIT_ASSERT_EQ(test, rv, 0);
+ return 0;
+}
+
+static int siw_mpa_rx_make_pair(struct kunit *test, struct socket **listen,
+ struct socket **server, struct socket **client)
+{
+ struct sockaddr_in addr = { .sin_family = AF_INET, };
+ struct sockaddr_in bound = { };
+ struct socket *l = NULL, *s = NULL, *c = NULL;
+ int rv;
+
+ siw_mpa_rx_bring_up_lo(test);
+
+ rv = sock_create_kern(&init_net, AF_INET, SOCK_STREAM, IPPROTO_TCP, &l);
+ KUNIT_ASSERT_EQ(test, rv, 0);
+
+ addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ addr.sin_port = 0;
+ rv = kernel_bind(l, (struct sockaddr_unsized *)&addr, sizeof(addr));
+ KUNIT_ASSERT_EQ(test, rv, 0);
+
+ rv = l->ops->getname(l, (struct sockaddr *)&bound, 0);
+ KUNIT_ASSERT_GT(test, rv, 0);
+
+ rv = kernel_listen(l, 1);
+ KUNIT_ASSERT_EQ(test, rv, 0);
+
+ rv = sock_create_kern(&init_net, AF_INET, SOCK_STREAM, IPPROTO_TCP, &c);
+ KUNIT_ASSERT_EQ(test, rv, 0);
+
+ rv = kernel_connect(c, (struct sockaddr_unsized *)&bound,
+ sizeof(bound), 0);
+ KUNIT_ASSERT_EQ(test, rv, 0);
+
+ rv = kernel_accept(l, &s, 0);
+ KUNIT_ASSERT_EQ(test, rv, 0);
+
+ *listen = l;
+ *server = s;
+ *client = c;
+ return 0;
+}
+
+static void siw_mpa_write_underflow_rejected_live_socket(struct kunit *test)
+{
+ struct siw_device *sdev;
+ struct siw_qp *qp;
+ struct siw_cep *cep;
+ struct siw_mem *mem;
+ struct socket *listen_sock = NULL, *server_sock = NULL, *client_sock = NULL;
+ struct iwarp_rdma_write write = { };
+ struct kvec iov;
+ struct msghdr msg = { };
+ void *xa_ret, *target;
+ u8 payload[sizeof(write) + 1];
+ u32 stag = 0x00000100;
+ int rv, i;
+
+ sdev = kunit_kzalloc(test, sizeof(*sdev), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, sdev);
+ xa_init_flags(&sdev->mem_xa, XA_FLAGS_ALLOC1);
+
+ qp = kunit_kzalloc(test, sizeof(*qp), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, qp);
+ qp->sdev = sdev;
+ qp->pd = kunit_kzalloc(test, sizeof(*qp->pd), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, qp->pd);
+ qp->rx_stream.state = SIW_GET_HDR;
+ init_rwsem(&qp->state_lock);
+ qp->attrs.state = SIW_QP_STATE_RTS;
+ qp->cep = NULL;
+
+ /* Register a valid REMOTE_WRITE memory object. On unpatched siw
+ * this is what lets the negative-length copy reach skb_copy_bits;
+ * without an MR the STag lookup in siw_proc_write() returns NULL
+ * and the WRITE is terminated before the underflow primitive fires.
+ * With this patch in place, the new siw_get_hdr() check rejects
+ * the FPDU BEFORE STag lookup, so the MR is unused. We keep it in
+ * the test so unpatched-kernel reruns also exercise the full path.
+ */
+ target = kunit_kzalloc(test, 64, GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, target);
+ mem = kunit_kzalloc(test, sizeof(*mem), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, mem);
+ kref_init(&mem->ref);
+ mem->sdev = sdev;
+ mem->stag = stag;
+ mem->stag_valid = 1;
+ mem->va = (u64)(uintptr_t)target;
+ mem->len = 64;
+ mem->pd = qp->pd;
+ mem->perms = IB_ACCESS_REMOTE_WRITE;
+ xa_ret = xa_store(&sdev->mem_xa, stag >> 8, mem, GFP_KERNEL);
+ KUNIT_ASSERT_FALSE(test, xa_is_err(xa_ret));
+
+ /* siw_qp_llp_data_ready dereferences sk_user_data as siw_cep. */
+ cep = kunit_kzalloc(test, sizeof(*cep), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, cep);
+ cep->qp = qp;
+ spin_lock_init(&cep->lock);
+ kref_init(&cep->ref);
+
+ rv = siw_mpa_rx_make_pair(test, &listen_sock, &server_sock, &client_sock);
+ KUNIT_ASSERT_EQ(test, rv, 0);
+
+ write_lock_bh(&server_sock->sk->sk_callback_lock);
+ server_sock->sk->sk_user_data = cep;
+ server_sock->sk->sk_data_ready = siw_qp_llp_data_ready;
+ qp->attrs.sk = server_sock;
+ write_unlock_bh(&server_sock->sk->sk_callback_lock);
+
+ write.ctrl.mpa_len =
+ cpu_to_be16(sizeof(write) - MPA_HDR_SIZE - 1);
+ write.ctrl.ddp_rdmap_ctrl = DDP_FLAG_TAGGED | DDP_FLAG_LAST |
+ cpu_to_be16(DDP_VERSION << 8) |
+ cpu_to_be16(RDMAP_VERSION << 6) |
+ cpu_to_be16(RDMAP_RDMA_WRITE);
+ write.sink_stag = cpu_to_be32(stag);
+ write.sink_to = cpu_to_be64((u64)(uintptr_t)target);
+
+ memcpy(payload, &write, sizeof(write));
+ payload[sizeof(write)] = 0x41;
+
+ iov.iov_base = payload;
+ iov.iov_len = sizeof(payload);
+ rv = kernel_sendmsg(client_sock, &msg, &iov, 1, sizeof(payload));
+ KUNIT_ASSERT_EQ(test, rv, (int)sizeof(payload));
+
+ /* Wait for TCP to deliver bytes and sk_data_ready to fire. */
+ for (i = 0; i < 200; i++) {
+ if (qp->term_info.valid)
+ break;
+ msleep(20);
+ }
+
+ KUNIT_EXPECT_EQ(test, (int)qp->term_info.valid, 1);
+ KUNIT_EXPECT_EQ(test, (int)qp->term_info.layer,
+ (int)TERM_ERROR_LAYER_LLP);
+ KUNIT_EXPECT_EQ(test, (int)qp->term_info.etype,
+ (int)LLP_ETYPE_MPA);
+ KUNIT_EXPECT_EQ(test, (int)qp->term_info.ecode,
+ (int)LLP_ECODE_FPDU_START);
+ KUNIT_EXPECT_EQ(test, (int)qp->rx_stream.rx_suspend, 1);
+
+ /* Detach our handler before tearing down sockets so the TCP stack
+ * cannot call into freed kunit memory after the test.
+ */
+ write_lock_bh(&server_sock->sk->sk_callback_lock);
+ server_sock->sk->sk_user_data = NULL;
+ server_sock->sk->sk_data_ready = sock_def_readable;
+ write_unlock_bh(&server_sock->sk->sk_callback_lock);
+
+ sock_release(client_sock);
+ sock_release(server_sock);
+ sock_release(listen_sock);
+}
+
+static struct kunit_case siw_mpa_rx_cases[] = {
+ KUNIT_CASE(siw_mpa_write_underflow_rejected),
+ KUNIT_CASE(siw_mpa_write_minimum_valid_accepted),
+ KUNIT_CASE(siw_mpa_write_underflow_rejected_live_socket),
+ { }
+};
+
+static struct kunit_suite siw_mpa_rx_suite = {
+ .name = "siw_mpa_rx",
+ .test_cases = siw_mpa_rx_cases,
+};
+
+kunit_test_suite(siw_mpa_rx_suite);
--
2.53.0
* Re: [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
@ 2026-05-14 17:10 ` Bernard Metzler
2026-05-14 21:24 ` Jason Gunthorpe
1 sibling, 0 replies; 5+ messages in thread
From: Bernard Metzler @ 2026-05-14 17:10 UTC (permalink / raw)
To: Michael Bommarito, Jason Gunthorpe, Leon Romanovsky, linux-rdma
Cc: linux-kernel
On 13.05.2026 19:53, Michael Bommarito wrote:
> A malicious connected siw peer can send an iWARP FPDU whose MPA length
> field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
> than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
> parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
> .hdr_len, but never compares mpa_len against that header length.
>
> siw_tcp_rx_data() then derives
>
> srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
> + MPA_HDR_SIZE;
>
> where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
> point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
> on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
> hdr_len - MPA_HDR_SIZE underflows to a negative int.
>
> The signed value then flows into siw_proc_write()/siw_proc_rresp() as
>
> bytes = min(srx->fpdu_part_rem, srx->skb_new);
>
> is handed to siw_check_mem() as an int len (whose interval check
> addr + len > mem->va + mem->len is satisfied for a valid base when
> len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
> siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
> copy branch in skb_copy_bits() promotes that to size_t, producing a
> multi-gigabyte read.
>
> KASAN under a KUnit harness that drives the real kernel TCP receive
> path -- a loopback AF_INET socketpair, the malformed FPDU written via
> kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
> dispatching to siw_tcp_rx_data -- reports:
>
> BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
> Read of size 4294967295 at addr ffff888...
> Call Trace:
> skb_copy_bits
> siw_rx_kva
> siw_rx_data
> siw_check_mem
> siw_proc_write
> siw_tcp_rx_data
> __tcp_read_sock
> siw_qp_llp_data_ready
> tcp_data_ready
> tcp_data_queue
>
> Add the missing invariant at the earliest point where the peer header
> is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
> the value the siw transmitter uses as the minimum mpa_len for each
> opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
> protocol contract. Out-of-range FPDUs terminate the connection with
> TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
> is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
> do not agree on the start of an FPDU"), the correct framing-error
> class for this inconsistency.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
> ---
> See cover letter for full root cause, series rationale, and test
> summary. [2/2] adds the KUnit regression harness used to validate
> this fix.
>
> drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
> index e8a88b378d51..34d03584160c 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_rx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
> @@ -1081,6 +1081,21 @@ static int siw_get_hdr(struct siw_rx_stream *srx)
> return -EAGAIN;
> }
>
> + /*
> + * Peer-controlled mpa_len must not underflow srx->fpdu_part_rem
> + * in siw_tcp_rx_data(); a negative value flows as a signed copy
> + * length into siw_check_mem() and skb_copy_bits().
> + */
Excellent finding. This was an open gateway for all evil.
> + if (unlikely(be16_to_cpu(c_hdr->mpa_len) + MPA_HDR_SIZE <
> + iwarp_pktinfo[opcode].hdr_len)) {
> + pr_warn_ratelimited("siw: short mpa_len %u for opcode %u (hdr_len %u)\n",
I think we shall stay with 80 chars per line. So let's
wrap the above line.
Otherwise
Acked-by: Bernard Metzler <bernard.metzler@linux.dev>
> + be16_to_cpu(c_hdr->mpa_len), opcode,
> + iwarp_pktinfo[opcode].hdr_len);
> + siw_init_terminate(rx_qp(srx), TERM_ERROR_LAYER_LLP,
> + LLP_ETYPE_MPA, LLP_ECODE_FPDU_START, 0);
> + return -EINVAL;
> + }
> +
> /*
> * DDP/RDMAP header receive completed. Check if the current
> * DDP segment starts a new RDMAP message or continues a previously
* Re: [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
2026-05-14 17:10 ` Bernard Metzler
@ 2026-05-14 21:24 ` Jason Gunthorpe
1 sibling, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2026-05-14 21:24 UTC (permalink / raw)
To: Michael Bommarito
Cc: Bernard Metzler, Leon Romanovsky, linux-rdma, linux-kernel
On Wed, May 13, 2026 at 01:53:24PM -0400, Michael Bommarito wrote:
> A malicious connected siw peer can send an iWARP FPDU whose MPA length
> field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller
> than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP
> parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode]
> .hdr_len, but never compares mpa_len against that header length.
>
> siw_tcp_rx_data() then derives
>
> srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd
> + MPA_HDR_SIZE;
>
> where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this
> point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest
> on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below
> hdr_len - MPA_HDR_SIZE underflows to a negative int.
>
> The signed value then flows into siw_proc_write()/siw_proc_rresp() as
>
> bytes = min(srx->fpdu_part_rem, srx->skb_new);
>
> is handed to siw_check_mem() as an int len (whose interval check
> addr + len > mem->va + mem->len is satisfied for a valid base when
> len is negative), and reaches siw_rx_data() -> siw_rx_kva() /
> siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header
> copy branch in skb_copy_bits() promotes that to size_t, producing a
> multi-gigabyte read.
>
> KASAN under a KUnit harness that drives the real kernel TCP receive
> path -- a loopback AF_INET socketpair, the malformed FPDU written via
> kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock
> dispatching to siw_tcp_rx_data -- reports:
>
> BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480
> Read of size 4294967295 at addr ffff888...
> Call Trace:
> skb_copy_bits
> siw_rx_kva
> siw_rx_data
> siw_check_mem
> siw_proc_write
> siw_tcp_rx_data
> __tcp_read_sock
> siw_qp_llp_data_ready
> tcp_data_ready
> tcp_data_queue
>
> Add the missing invariant at the earliest point where the peer header
> is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly
> the value the siw transmitter uses as the minimum mpa_len for each
> opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the
> protocol contract. Out-of-range FPDUs terminate the connection with
> TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which
> is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields
> do not agree on the start of an FPDU"), the correct framing-error
> class for this inconsistency.
>
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
> Acked-by: Bernard Metzler <bernard.metzler@linux.dev>
> ---
> See cover letter for full root cause, series rationale, and test
> summary. [2/2] adds the KUnit regression harness used to validate
> this fix.
>
> drivers/infiniband/sw/siw/siw_qp_rx.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
Applied to for-rc
Thanks,
Jason