From: Bernard Metzler <bernard.metzler@linux.dev>
To: Michael Bommarito <michael.bommarito@gmail.com>,
Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing
Date: Fri, 15 May 2026 12:39:17 +0200 [thread overview]
Message-ID: <aa21dc7c-1c68-4795-8de4-44e9021c5d04@linux.dev> (raw)
In-Reply-To: <20260513175325.2042630-3-michael.bommarito@gmail.com>
On 13.05.2026 19:53, Michael Bommarito wrote:
This is great code to create broken protocol headers and I
regard it very useful for testing during code development.
I am not sure though if it should become part of the siw
driver. In fact, do we want to distribute code which assists
in crashing remote systems which are running a non-fixed driver?
I don't think we should.
Furthermore, iWarp is a protocol running on top of TCP. So it is
rather simple to feed a user land TCP socket with any broken or
correct iWarp header without writing any extra kernel code.
Thanks,
Bernard.
> Add a KUnit suite (CONFIG_SIW_MPA_RX_KUNIT_TEST) that exercises the
> real siw_tcp_rx_data() path with three cases covering the MPA length
> validation added in the previous patch:
>
> - siw_mpa_write_underflow_rejected
> Constructs an sk_buff carrying a tagged RDMA WRITE FPDU whose
> mpa_len is one below iwarp_pktinfo[opcode].hdr_len -
> MPA_HDR_SIZE. Registers a REMOTE_WRITE MR in mem_xa so the
> WRITE path would otherwise reach siw_proc_write(), and calls
> siw_tcp_rx_data() directly. Asserts the FPDU is rejected with
> TERM(LLP/MPA/FPDU_START) and rx_suspend = 1.
>
> - siw_mpa_write_minimum_valid_accepted
> Regression control with mpa_len = hdr_len - MPA_HDR_SIZE (the
> smallest legal value, i.e. a zero-length WRITE). Asserts the
> new check does not fire: no terminate, rx_stream not
> suspended.
>
> - siw_mpa_write_underflow_rejected_live_socket
> Opens a loopback AF_INET socketpair via sock_create_kern(),
> attaches a struct siw_cep as sk_user_data so sk_to_qp()
> resolves to the test QP, and installs siw_qp_llp_data_ready as
> sk_data_ready on the victim socket. Writes the malformed FPDU
> via kernel_sendmsg from the attacker side. The kernel TCP
> stack delivers, sk_data_ready fires in softirq, and
> tcp_read_sock dispatches to siw_tcp_rx_data the same way a
> remote peer would. Asserts the same terminate state as the
> first case.
>
> The third case is the design driver: it confirms the bug-fix
> codepath fires from a real softirq RX entry, not just a synthetic
> direct call. On a stock siw tree the same harness reproduces the
> KASAN slab-out-of-bounds / use-after-free in skb_copy_bits.
>
> Bringing siw's loopback netdev up (case 3 binds 127.0.0.1) is done
> inline via dev_change_flags() under rtnl_lock since the KUnit
> environment does not run init scripts.
>
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> Assisted-by: Claude:claude-opus-4-7
> ---
> drivers/infiniband/sw/siw/Kconfig | 18 +
> drivers/infiniband/sw/siw/Makefile | 2 +
> drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c | 349 +++++++++++++++++++
> 3 files changed, 369 insertions(+)
> create mode 100644 drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
>
> diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
> index 186f182b80e7..b137f5920271 100644
> --- a/drivers/infiniband/sw/siw/Kconfig
> +++ b/drivers/infiniband/sw/siw/Kconfig
> @@ -18,3 +18,21 @@ config RDMA_SIW
> space verbs API, libibverbs. To implement RDMA over
> TCP/IP, the driver further interfaces with the Linux
> in-kernel TCP socket layer.
> +
> +config SIW_MPA_RX_KUNIT_TEST
> + bool "KUnit tests for Soft-iWARP MPA receive parsing" if !KUNIT_ALL_TESTS
> + depends on KUNIT && RDMA_SIW
> + default KUNIT_ALL_TESTS
> + help
> + Build KUnit regression tests for the Soft-iWARP MPA receive
> + state machine. The tests cover the MPA length consistency
> + check in siw_get_hdr(): malformed FPDUs whose mpa_len is
> + below the opcode's fixed DDP/RDMAP header must be rejected
> + with TERM(LLP/MPA/FPDU_START); the minimum-valid mpa_len
> + (zero-length WRITE) must still be accepted. One case drives
> + the real kernel TCP receive path via a loopback socketpair
> + so the same softirq sk_data_ready -> tcp_read_sock ->
> + siw_tcp_rx_data chain a remote peer would exercise is
> + covered.
> +
> + If unsure, say N.
> diff --git a/drivers/infiniband/sw/siw/Makefile b/drivers/infiniband/sw/siw/Makefile
> index f5f7e3867889..09d4c90d8758 100644
> --- a/drivers/infiniband/sw/siw/Makefile
> +++ b/drivers/infiniband/sw/siw/Makefile
> @@ -9,3 +9,5 @@ siw-y := \
> siw_qp_tx.o \
> siw_qp_rx.o \
> siw_verbs.o
> +
> +siw-$(CONFIG_SIW_MPA_RX_KUNIT_TEST) += siw_mpa_rx_kunit.o
> diff --git a/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c b/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
> new file mode 100644
> index 000000000000..204b3213b840
> --- /dev/null
> +++ b/drivers/infiniband/sw/siw/siw_mpa_rx_kunit.c
> @@ -0,0 +1,349 @@
> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> +/*
> + * KUnit harness for siw MPA receive-state length validation.
> + *
> + * Internal to the SIW_MPA_LEN_UNDERFLOW_RX_COPY disclosure validation tree.
> + * Not part of the upstream patch.
> + *
> + * case 1: short mpa_len triggers the new siw_get_hdr() check via direct
> + * siw_tcp_rx_data() call with a constructed sk_buff
> + * - expects: TERM(LLP/MPA/FPDU_START), rx_suspend=1
> + * - under stock siw: KASAN slab-out-of-bounds in skb_copy_bits()
> + * - under patched siw: no splat, terminate state set
> + *
> + * case 2: minimum-valid mpa_len control (constructed sk_buff)
> + * - mpa_len = hdr_len - MPA_HDR_SIZE -> fpdu_part_rem = 0
> + * so siw_proc_write() takes the zero-length WRITE short path
> + * and returns 0 without calling skb_copy_bits().
> + * - expects: no TERM, state machine progressed normally
> + *
> + * case 3: real loopback TCP socketpair (the "live two-node" analog)
> + * - opens AF_INET TCP sockets in-kernel via sock_create_kern()
> + * - binds/listens on 127.0.0.1:0, connects, accepts
> + * - installs siw_qp_llp_data_ready on the victim socket and
> + * attaches a struct siw_cep so sk_to_qp() resolves to our qp
> + * - writes the malformed FPDU bytes via kernel_sendmsg on the
> + * attacker socket
> + * - the kernel TCP stack delivers, sk_data_ready fires, and
> + * siw_qp_llp_data_ready -> tcp_read_sock -> siw_tcp_rx_data
> + * runs in the normal kernel receive path
> + * - expects: TERM(LLP/MPA/FPDU_START) on the qp
> + */
> +
> +#include <kunit/test.h>
> +#include <linux/inet.h>
> +#include <linux/in.h>
> +#include <linux/netdevice.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/skbuff.h>
> +#include <linux/tcp.h>
> +#include <linux/wait.h>
> +#include <linux/xarray.h>
> +#include <net/sock.h>
> +#include <net/tcp.h>
> +#include <rdma/ib_verbs.h>
> +
> +#include "siw.h"
> +#include "siw_cm.h"
> +#include "siw_mem.h"
> +
> +static void siw_kunit_kfree_skb(void *skb)
> +{
> + kfree_skb(skb);
> +}
> +
> +struct siw_mpa_rx_ctx {
> + struct siw_device *sdev;
> + struct siw_qp *qp;
> + struct siw_mem *mem;
> + void *target;
> + u32 stag;
> +};
> +
> +static void siw_mpa_rx_setup(struct kunit *test, struct siw_mpa_rx_ctx *c)
> +{
> + void *xa_ret;
> +
> + c->stag = 0x00000100;
> +
> + c->sdev = kunit_kzalloc(test, sizeof(*c->sdev), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, c->sdev);
> + xa_init_flags(&c->sdev->mem_xa, XA_FLAGS_ALLOC1);
> +
> + c->qp = kunit_kzalloc(test, sizeof(*c->qp), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, c->qp);
> + c->qp->sdev = c->sdev;
> + c->qp->pd = kunit_kzalloc(test, sizeof(*c->qp->pd), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, c->qp->pd);
> + c->qp->rx_stream.state = SIW_GET_HDR;
> +
> + c->target = kunit_kzalloc(test, 64, GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, c->target);
> +
> + c->mem = kunit_kzalloc(test, sizeof(*c->mem), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, c->mem);
> + kref_init(&c->mem->ref);
> + c->mem->sdev = c->sdev;
> + c->mem->stag = c->stag;
> + c->mem->stag_valid = 1;
> + c->mem->va = (u64)(uintptr_t)c->target;
> + c->mem->len = 64;
> + c->mem->pd = c->qp->pd;
> + c->mem->perms = IB_ACCESS_REMOTE_WRITE;
> +
> + xa_ret = xa_store(&c->sdev->mem_xa, c->stag >> 8, c->mem, GFP_KERNEL);
> + KUNIT_ASSERT_FALSE(test, xa_is_err(xa_ret));
> +}
> +
> +static void siw_mpa_rx_run(struct kunit *test, struct siw_mpa_rx_ctx *c,
> + u16 mpa_len_val)
> +{
> + struct iwarp_rdma_write write = { };
> + struct sk_buff *skb;
> + read_descriptor_t rd_desc = { };
> + u8 payload[sizeof(write) + 1];
> +
> + write.ctrl.mpa_len = cpu_to_be16(mpa_len_val);
> + write.ctrl.ddp_rdmap_ctrl = DDP_FLAG_TAGGED | DDP_FLAG_LAST |
> + cpu_to_be16(DDP_VERSION << 8) |
> + cpu_to_be16(RDMAP_VERSION << 6) |
> + cpu_to_be16(RDMAP_RDMA_WRITE);
> + write.sink_stag = cpu_to_be32(c->stag);
> + write.sink_to = cpu_to_be64((u64)(uintptr_t)c->target);
> +
> + memcpy(payload, &write, sizeof(write));
> + payload[sizeof(write)] = 0x41;
> +
> + skb = alloc_skb(sizeof(payload), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, skb);
> + skb_put_data(skb, payload, sizeof(payload));
> + kunit_add_action_or_reset(test, siw_kunit_kfree_skb, skb);
> +
> + rd_desc.arg.data = c->qp;
> + rd_desc.count = sizeof(payload);
> +
> + siw_tcp_rx_data(&rd_desc, skb, 0, sizeof(payload));
> +}
> +
> +static void siw_mpa_write_underflow_rejected(struct kunit *test)
> +{
> + struct siw_mpa_rx_ctx c;
> +
> + siw_mpa_rx_setup(test, &c);
> +
> + /* mpa_len one byte short of the WRITE hdr_len - MPA_HDR_SIZE floor. */
> + siw_mpa_rx_run(test, &c,
> + sizeof(struct iwarp_rdma_write) - MPA_HDR_SIZE - 1);
> +
> + KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.valid, 1);
> + KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.layer,
> + (int)TERM_ERROR_LAYER_LLP);
> + KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.etype,
> + (int)LLP_ETYPE_MPA);
> + KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.ecode,
> + (int)LLP_ECODE_FPDU_START);
> + KUNIT_EXPECT_EQ(test, (int)c.qp->rx_stream.rx_suspend, 1);
> +}
> +
> +static void siw_mpa_write_minimum_valid_accepted(struct kunit *test)
> +{
> + struct siw_mpa_rx_ctx c;
> +
> + siw_mpa_rx_setup(test, &c);
> +
> + /*
> + * mpa_len == hdr_len - MPA_HDR_SIZE is the smallest legal value;
> + * it yields fpdu_part_rem = 0, i.e. a zero-length WRITE. The new
> + * length check in siw_get_hdr() must accept this.
> + */
> + siw_mpa_rx_run(test, &c,
> + sizeof(struct iwarp_rdma_write) - MPA_HDR_SIZE);
> +
> + KUNIT_EXPECT_EQ(test, (int)c.qp->term_info.valid, 0);
> + KUNIT_EXPECT_EQ(test, (int)c.qp->rx_stream.rx_suspend, 0);
> +}
> +
> +static int siw_mpa_rx_bring_up_lo(struct kunit *test)
> +{
> + struct net_device *lo;
> + int rv;
> +
> + rtnl_lock();
> + lo = __dev_get_by_name(&init_net, "lo");
> + KUNIT_ASSERT_NOT_NULL(test, lo);
> + if (!(lo->flags & IFF_UP))
> + rv = dev_change_flags(lo, lo->flags | IFF_UP, NULL);
> + else
> + rv = 0;
> + rtnl_unlock();
> + KUNIT_ASSERT_EQ(test, rv, 0);
> + return 0;
> +}
> +
> +static int siw_mpa_rx_make_pair(struct kunit *test, struct socket **listen,
> + struct socket **server, struct socket **client)
> +{
> + struct sockaddr_in addr = { .sin_family = AF_INET, };
> + struct sockaddr_in bound = { };
> + struct socket *l = NULL, *s = NULL, *c = NULL;
> + int rv;
> +
> + siw_mpa_rx_bring_up_lo(test);
> +
> + rv = sock_create_kern(&init_net, AF_INET, SOCK_STREAM, IPPROTO_TCP, &l);
> + KUNIT_ASSERT_EQ(test, rv, 0);
> +
> + addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
> + addr.sin_port = 0;
> + rv = kernel_bind(l, (struct sockaddr_unsized *)&addr, sizeof(addr));
> + KUNIT_ASSERT_EQ(test, rv, 0);
> +
> + rv = l->ops->getname(l, (struct sockaddr *)&bound, 0);
> + KUNIT_ASSERT_GT(test, rv, 0);
> +
> + rv = kernel_listen(l, 1);
> + KUNIT_ASSERT_EQ(test, rv, 0);
> +
> + rv = sock_create_kern(&init_net, AF_INET, SOCK_STREAM, IPPROTO_TCP, &c);
> + KUNIT_ASSERT_EQ(test, rv, 0);
> +
> + rv = kernel_connect(c, (struct sockaddr_unsized *)&bound,
> + sizeof(bound), 0);
> + KUNIT_ASSERT_EQ(test, rv, 0);
> +
> + rv = kernel_accept(l, &s, 0);
> + KUNIT_ASSERT_EQ(test, rv, 0);
> +
> + *listen = l;
> + *server = s;
> + *client = c;
> + return 0;
> +}
> +
> +static void siw_mpa_write_underflow_rejected_live_socket(struct kunit *test)
> +{
> + struct siw_device *sdev;
> + struct siw_qp *qp;
> + struct siw_cep *cep;
> + struct siw_mem *mem;
> + struct socket *listen_sock = NULL, *server_sock = NULL, *client_sock = NULL;
> + struct iwarp_rdma_write write = { };
> + struct kvec iov;
> + struct msghdr msg = { };
> + void *xa_ret, *target;
> + u8 payload[sizeof(write) + 1];
> + u32 stag = 0x00000100;
> + int rv, i;
> +
> + sdev = kunit_kzalloc(test, sizeof(*sdev), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, sdev);
> + xa_init_flags(&sdev->mem_xa, XA_FLAGS_ALLOC1);
> +
> + qp = kunit_kzalloc(test, sizeof(*qp), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, qp);
> + qp->sdev = sdev;
> + qp->pd = kunit_kzalloc(test, sizeof(*qp->pd), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, qp->pd);
> + qp->rx_stream.state = SIW_GET_HDR;
> + init_rwsem(&qp->state_lock);
> + qp->attrs.state = SIW_QP_STATE_RTS;
> + qp->cep = NULL;
> +
> + /* Register a valid REMOTE_WRITE memory object. On unpatched siw
> + * this is what lets the negative-length copy reach skb_copy_bits;
> + * without an MR the STag lookup in siw_proc_write() returns NULL
> + * and the WRITE is terminated before the underflow primitive fires.
> + * With this patch in place, the new siw_get_hdr() check rejects
> + * the FPDU BEFORE STag lookup, so the MR is unused. We keep it in
> + * the test so unpatched-kernel reruns also exercise the full path.
> + */
> + target = kunit_kzalloc(test, 64, GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, target);
> + mem = kunit_kzalloc(test, sizeof(*mem), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, mem);
> + kref_init(&mem->ref);
> + mem->sdev = sdev;
> + mem->stag = stag;
> + mem->stag_valid = 1;
> + mem->va = (u64)(uintptr_t)target;
> + mem->len = 64;
> + mem->pd = qp->pd;
> + mem->perms = IB_ACCESS_REMOTE_WRITE;
> + xa_ret = xa_store(&sdev->mem_xa, stag >> 8, mem, GFP_KERNEL);
> + KUNIT_ASSERT_FALSE(test, xa_is_err(xa_ret));
> +
> + /* siw_qp_llp_data_ready dereferences sk_user_data as siw_cep. */
> + cep = kunit_kzalloc(test, sizeof(*cep), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, cep);
> + cep->qp = qp;
> + spin_lock_init(&cep->lock);
> + kref_init(&cep->ref);
> +
> + rv = siw_mpa_rx_make_pair(test, &listen_sock, &server_sock, &client_sock);
> + KUNIT_ASSERT_EQ(test, rv, 0);
> +
> + write_lock_bh(&server_sock->sk->sk_callback_lock);
> + server_sock->sk->sk_user_data = cep;
> + server_sock->sk->sk_data_ready = siw_qp_llp_data_ready;
> + qp->attrs.sk = server_sock;
> + write_unlock_bh(&server_sock->sk->sk_callback_lock);
> +
> + write.ctrl.mpa_len =
> + cpu_to_be16(sizeof(write) - MPA_HDR_SIZE - 1);
> + write.ctrl.ddp_rdmap_ctrl = DDP_FLAG_TAGGED | DDP_FLAG_LAST |
> + cpu_to_be16(DDP_VERSION << 8) |
> + cpu_to_be16(RDMAP_VERSION << 6) |
> + cpu_to_be16(RDMAP_RDMA_WRITE);
> + write.sink_stag = cpu_to_be32(stag);
> + write.sink_to = cpu_to_be64((u64)(uintptr_t)target);
> +
> + memcpy(payload, &write, sizeof(write));
> + payload[sizeof(write)] = 0x41;
> +
> + iov.iov_base = payload;
> + iov.iov_len = sizeof(payload);
> + rv = kernel_sendmsg(client_sock, &msg, &iov, 1, sizeof(payload));
> + KUNIT_ASSERT_EQ(test, rv, (int)sizeof(payload));
> +
> + /* Wait for TCP to deliver bytes and sk_data_ready to fire. */
> + for (i = 0; i < 200; i++) {
> + if (qp->term_info.valid)
> + break;
> + msleep(20);
> + }
> +
> + KUNIT_EXPECT_EQ(test, (int)qp->term_info.valid, 1);
> + KUNIT_EXPECT_EQ(test, (int)qp->term_info.layer,
> + (int)TERM_ERROR_LAYER_LLP);
> + KUNIT_EXPECT_EQ(test, (int)qp->term_info.etype,
> + (int)LLP_ETYPE_MPA);
> + KUNIT_EXPECT_EQ(test, (int)qp->term_info.ecode,
> + (int)LLP_ECODE_FPDU_START);
> + KUNIT_EXPECT_EQ(test, (int)qp->rx_stream.rx_suspend, 1);
> +
> + /* Detach our handler before tearing down sockets so the TCP stack
> + * cannot call into freed kunit memory after the test.
> + */
> + write_lock_bh(&server_sock->sk->sk_callback_lock);
> + server_sock->sk->sk_user_data = NULL;
> + server_sock->sk->sk_data_ready = sock_def_readable;
> + write_unlock_bh(&server_sock->sk->sk_callback_lock);
> +
> + sock_release(client_sock);
> + sock_release(server_sock);
> + sock_release(listen_sock);
> +}
> +
> +static struct kunit_case siw_mpa_rx_cases[] = {
> + KUNIT_CASE(siw_mpa_write_underflow_rejected),
> + KUNIT_CASE(siw_mpa_write_minimum_valid_accepted),
> + KUNIT_CASE(siw_mpa_write_underflow_rejected_live_socket),
> + { }
> +};
> +
> +static struct kunit_suite siw_mpa_rx_suite = {
> + .name = "siw_mpa_rx",
> + .test_cases = siw_mpa_rx_cases,
> +};
> +
> +kunit_test_suite(siw_mpa_rx_suite);
prev parent reply other threads:[~2026-05-15 10:39 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-13 17:53 [PATCH 0/2] RDMA/siw: fix MPA FPDU length underflow + add KUnit coverage Michael Bommarito
2026-05-13 17:53 ` [PATCH 1/2] RDMA/siw: reject MPA FPDU length underflow before signed receive math Michael Bommarito
2026-05-14 17:10 ` Bernard Metzler
2026-05-14 21:24 ` Jason Gunthorpe
2026-05-13 17:53 ` [PATCH 2/2] RDMA/siw: add KUnit tests for MPA receive parsing Michael Bommarito
2026-05-15 10:39 ` Bernard Metzler [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aa21dc7c-1c68-4795-8de4-44e9021c5d04@linux.dev \
--to=bernard.metzler@linux.dev \
--cc=jgg@ziepe.ca \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=michael.bommarito@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox