Linux RDMA and InfiniBand development
From: Leon Romanovsky <leonro@nvidia.com>
To: Michael Bommarito <michael.bommarito@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>, <linux-rdma@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <stable@vger.kernel.org>,
	Vlad Dumitrescu <vdumitrescu@nvidia.com>,
	Or Har-Toov <ohartoov@nvidia.com>,
	Bob Pearson <rpearsonhpe@gmail.com>,
	Sean Hefty <shefty@nvidia.com>, Kees Cook <kees@kernel.org>
Subject: Re: [PATCH] IB/mad: cap RMPP reassembly window size to bound find_seg_location walk
Date: Tue, 19 May 2026 17:46:53 +0300
Message-ID: <20260519144653.GZ33515@unreal>
In-Reply-To: <20260518212336.337104-1-michael.bommarito@gmail.com>

On Mon, May 18, 2026 at 05:23:36PM -0400, Michael Bommarito wrote:
> A peer on the same InfiniBand subnet or RoCEv2 L2 (or any UDP/4791-
> reachable peer for internet-exposed RoCEv2 ports) can pin a target
> port's IB MAD kworker for milliseconds per low-bandwidth RMPP burst
> by sending an RMPP management transaction with descending segment
> numbers. QP1 GMP traffic is unauthenticated by IBTA spec, so no
> credentials are required. The bug sits on the IB management path
> (QP1 GMP RMPP reassembly), not the RDMA data plane, so RDMA verbs
> throughput is unaffected; deployments that raise recv_queue_size to
> tune management-plane throughput are quadratically more exposed,
> because per-burst cost grows O(F^2) with the configured window.
> 
> drivers/infiniband/core/mad_rmpp.c::find_seg_location() walks
> rmpp_recv->rmpp_wc->rmpp_list in reverse on every inbound RMPP DATA
> segment to locate the insertion point keyed by segment number. The
> walk is O(N) per insert under spin_lock_irqsave(&rmpp_recv->lock) in
> kworker context, so F adversarially-reordered segments aggregate to
> O(F^2). window_size() returns max(recv_queue.max_active >> 3, 1):
> the IB MAD core default recv_queue_size of 512 yields window=64
> (per-burst cost in the microsecond range), but tuned production
> configs with recv_queue_size=8192 push window to 1024 and let a
> single low-bandwidth burst pin the per-port MAD kworker for several
> milliseconds.
> 
> Cap the effective window at IB_MAD_RMPP_MAX_WINDOW = 64 in
> window_size() so admins tuning recv_queue_size for higher RX throughput
> do not enlarge the walker attack surface. Real RMPP transactions in
> the wild (SA queries, perf-counter reads) are well served by a window
> of 64, which is also the IB MAD core default. A structural follow-up
> would convert rmpp_recv->rmpp_wc->rmpp_list to an rb_tree keyed by
> seg_num and lift the cap; that mirrors tcp_data_queue_ofo post-
> CVE-2018-5390. For now the cap suffices.
> 
> Fixes: fa619a77046b ("[PATCH] IB: Add RMPP implementation")
> Cc: stable@vger.kernel.org
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> I reproduced this under x86_64 QEMU/KVM (4 vCPUs) on v7.1-rc2 with
> CONFIG_RDMA_RXE + CONFIG_INFINIBAND_USER_MAD, a veth pair carrying
> two rdma_rxe links, and raw RoCEv2/UDP/4791 packet injection with
> descending seg_num while holding seg #1. Without the cap, F=1024
> burst produces 1022 paired continue_rmpp invocations whose per-call
> walker duration grows from ~1 us (early, near-empty list) to ~5 us
> (late, ~1000-deep list), a 4x per-call amplification as the queue
> deepens, with aggregate walker time per burst >= 1.5 ms (lower bound,
> ftrace 1 us granularity). With the cap, the same F=1024 burst drops
> to ~0.28 ms aggregate (5.4x reduction); F=32 in-window legitimate
> RMPP still completes normally (30 walker calls, avg 1.5 us, max 3 us).
> tools/testing/selftests/drivers/net/rdma/ carries no RMPP-specific
> selftest in v7.1-rc2 (rdma_rxe self-tests do not exercise QP1 GMP
> RMPP reassembly), so no in-tree selftest delta to report.
> 
>  drivers/infiniband/core/mad_rmpp.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)

Please rewrite this patch in human language without AI slop.
While working on this, please ensure that your commit message clearly
explains why the change is needed and what issue it actually resolves.

Thanks
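
The quoted commit message makes two checkable claims: the receive window is max(recv_queue_size >> 3, 1), and segments arriving in descending order force a full reverse walk on every insert. The sketch below is a simplified userspace model of both, not code from mad_rmpp.c. The names window_uncapped(), window_capped(), descending_walk_steps() and struct seg are hypothetical; RMPP_MAX_WINDOW only mirrors the IB_MAD_RMPP_MAX_WINDOW = 64 value named in the patch description.

```c
#include <stdlib.h>

/* Assumption: cap value taken from the patch description. */
#define RMPP_MAX_WINDOW 64

/* Window as described in the commit message: max(depth >> 3, 1). */
static int window_uncapped(int recv_queue_size)
{
	int w = recv_queue_size >> 3;

	return w > 1 ? w : 1;
}

/* Same computation with the proposed upper bound applied. */
static int window_capped(int recv_queue_size)
{
	int w = window_uncapped(recv_queue_size);

	return w < RMPP_MAX_WINDOW ? w : RMPP_MAX_WINDOW;
}

/* Hypothetical stand-in for one buffered RMPP segment. */
struct seg {
	int seg_num;
	struct seg *prev, *next;
};

/*
 * Insert nsegs segments, arriving in descending seg_num order, into a
 * list kept sorted ascending, searching from the tail on each insert.
 * Returns the total number of nodes visited, or -1 if allocation fails.
 */
static long descending_walk_steps(int nsegs)
{
	struct seg head;	/* sentinel of a circular list */
	struct seg *nodes, *pos;
	long steps = 0;
	int i;

	if (nsegs <= 0)
		return 0;
	nodes = calloc((size_t)nsegs, sizeof(*nodes));
	if (!nodes)
		return -1;

	head.seg_num = 0;
	head.prev = head.next = &head;

	for (i = 0; i < nsegs; i++) {
		struct seg *n = &nodes[i];

		n->seg_num = nsegs + 1 - i;	/* nsegs+1, nsegs, ..., 2 */

		/* Reverse walk: stop at the first node with a smaller number. */
		for (pos = head.prev; pos != &head; pos = pos->prev) {
			steps++;
			if (pos->seg_num < n->seg_num)
				break;
		}

		/* Link n directly after pos. */
		n->prev = pos;
		n->next = pos->next;
		pos->next->prev = n;
		pos->next = n;
	}

	free(nodes);
	return steps;
}
```

In this model window_uncapped(512) is 64 and window_uncapped(8192) is 1024, while window_capped(8192) stays at 64. A descending burst of 64 segments visits 2016 nodes in total and one of 1024 segments visits 523776, about 260 times more, which is the quadratic growth the commit message refers to. The model counts list nodes only and says nothing about wall-clock time or lock hold time.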

Thread overview: 2+ messages
2026-05-18 21:23 [PATCH] IB/mad: cap RMPP reassembly window size to bound find_seg_location walk Michael Bommarito
2026-05-19 14:46 ` Leon Romanovsky
