[PATCH] net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive

* [PATCH] net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive
@ 2026-06-10  8:48 Bryam Vargas
  2026-06-11  8:48 ` sashiko-bot
  2026-06-14  0:31 ` Jakub Kicinski
  0 siblings, 2 replies; 4+ messages in thread
From: Bryam Vargas @ 2026-06-10  8:48 UTC
  To: D . Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang
  Cc: Mahanta Jambigi, Tony Lu, Wen Gu, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stefan Raspl,
	Ursula Braun, linux-rdma, linux-s390, netdev, linux-kernel

The SMC CDC receive path copies the peer's producer cursor -- a
wire-controlled value -- into the local connection cursor with no upper
bound against the receive buffer (RMB). A malicious peer can advertise a
producer cursor past rmb_desc->len, which is then used out of bounds:

 - the urgent path uses the cursor count as a raw index:
   smc_cdc_handle_urg_data_arrival() dereferences
   *(rmb_desc->cpu_addr + rx_off + urg_curs.count - 1);

 - the receive path turns the cursor into a length:
   smc_cdc_msg_recv_action() feeds it to smc_curs_diff() and
   atomic_add()s the result into conn->bytes_to_rcv. The differing-wrap
   branch returns (len - old.count) + new.count, which exceeds len for a
   forged cursor and accumulates across CDCs, so bytes_to_rcv grows past
   rmb_desc->len even when the cursor count itself is bounded;
   smc_rx_recvmsg() then copies the wrap-around second chunk past the RMB.
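
As a rough illustration of the differing-wrap arithmetic in the second item,
here is a simplified user-space model. struct cursor_model and
curs_diff_model() are hypothetical stand-ins written for this note, not the
kernel's union smc_host_cursor or smc_curs_diff(), and the floor at zero is
an assumption of the model.

```c
#include <assert.h>

/* Simplified user-space stand-in for a host cursor (hypothetical type). */
struct cursor_model {
	unsigned short wrap;	/* wrap counter */
	unsigned int count;	/* byte offset into the buffer */
};

/* Model of the cursor distance described above: with differing wrap
 * counters the result is (size - old.count) + cur.count, otherwise
 * cur.count - old.count. Negative results are floored at 0 (assumption).
 */
static int curs_diff_model(unsigned int size,
			   const struct cursor_model *old,
			   const struct cursor_model *cur)
{
	long long diff;

	assert(size > 0);
	if (old->wrap != cur->wrap)
		diff = ((long long)size - old->count) + cur->count;
	else
		diff = (long long)cur->count - old->count;
	return diff > 0 ? (int)diff : 0;
}
```

With a 16-byte buffer, an old cursor at count 4 and a peer cursor with a
different wrap and count 16 (already clamped to the buffer length), the
model returns 28, which is larger than the buffer. That is why bounding the
count alone does not bound the accumulated length.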

The RMB is a kernel allocation, so the reads disclose adjacent kernel
memory, and a cursor pointing at an unmapped offset faults in the receive
tasklet (softirq). Both transports are affected: SMC-D converts the
cursor in smcd_cdc_msg_to_host() and SMC-R in smc_cdc_cursor_to_host(),
and neither bounds the count.

Bound the producer cursor count to the RMB at both conversion boundaries
(this closes the urgent index on SMC-D and SMC-R) and enforce the
documented "0 <= bytes_to_rcv <= rmb_desc->len" invariant in
smc_cdc_msg_recv_action() (this closes the receive length, which the
cursor-count clamp alone cannot because of the differing-wrap diff and
the cumulative atomic_add).

Fixes: de8474eb9d50 ("net/smc: urgent data support")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
Reproduced under KASAN on 6.12.y. A forged producer cursor with
prod.count = rmb_desc->len + 1 and prod_flags.urg_data_present set produces
a 1-byte out-of-bounds read in smc_cdc_handle_urg_data_arrival()
(slab-out-of-bounds Read of size 1); the cursor-count clamp makes the same
input in-bounds. A wrap-flipped cursor drives bytes_to_rcv past
rmb_desc->len across several CDCs so smc_rx_recvmsg() over-reads; the
bytes_to_rcv invariant keeps it bounded. SMC-D was exercised over the
in-kernel loopback-ism and the SMC-R converter (smc_cdc_cursor_to_host)
over an emulated RDMA loopback. Clamping prod.count alone does not bound
the recv length, hence the separate bytes_to_rcv hunk.

 net/smc/smc_cdc.c |  2 ++
 net/smc/smc_cdc.h | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 619b3bab3824..738c45fd5cd0 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -382,6 +382,8 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
 		smp_mb__before_atomic();
 		atomic_add(diff_prod, &conn->bytes_to_rcv);
 		/* guarantee 0 <= bytes_to_rcv <= rmb_desc->len */
+		if (atomic_read(&conn->bytes_to_rcv) > conn->rmb_desc->len)
+			atomic_set(&conn->bytes_to_rcv, conn->rmb_desc->len);
 		smp_mb__after_atomic();
 		smc->sk.sk_data_ready(&smc->sk);
 	} else {
diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h
index 696cc11f2303..7fa6e0d3817f 100644
--- a/net/smc/smc_cdc.h
+++ b/net/smc/smc_cdc.h
@@ -230,6 +230,12 @@ static inline void smc_cdc_cursor_to_host(union smc_host_cursor *local,
 	smc_curs_copy_net(&net, peer, conn);
 	temp.count = ntohl(net.count);
 	temp.wrap = ntohs(net.wrap);
+	/* the peer producer cursor is wire-controlled; bound the SMC-R count to
+	 * our RMB before it is used as a raw index by the urgent path, mirroring
+	 * the SMC-D conversion in smcd_cdc_msg_to_host().
+	 */
+	if (temp.count > conn->rmb_desc->len)
+		temp.count = conn->rmb_desc->len;
 	if ((old.wrap > temp.wrap) && temp.wrap)
 		return;
 	if ((old.wrap == temp.wrap) &&
@@ -260,6 +266,12 @@ static inline void smcd_cdc_msg_to_host(struct smc_host_cdc_msg *local,
 
 	temp.wrap = peer->prod.wrap;
 	temp.count = peer->prod.count;
+	/* the peer producer cursor is wire-controlled; a count past our RMB is
+	 * used as a raw index by the urgent path (smc_cdc_handle_urg_data_arrival)
+	 * and as a length by the recv path.  Bound it to the RMB.
+	 */
+	if (temp.count > conn->rmb_desc->len)
+		temp.count = conn->rmb_desc->len;
 	smc_curs_copy(&local->prod, &temp, conn);
 
 	temp.wrap = peer->cons.wrap;
-- 
2.43.0




* Re: [PATCH] net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive
  2026-06-10  8:48 [PATCH] net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive Bryam Vargas
@ 2026-06-11  8:48 ` sashiko-bot
  2026-06-14  0:31 ` Jakub Kicinski
  1 sibling, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-06-11  8:48 UTC
  To: Bryam Vargas; +Cc: Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-s390

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Incorrect clamping of SMC-R consumer cursor permanently stalls connections with asymmetric buffer sizes.
- [High] A Time-of-Check to Time-of-Use (TOCTOU) race condition in clamping `conn->bytes_to_rcv` allows out-of-bounds reads and data corruption.

Pre-existing issues:
- [Critical] Unbounded consumer cursor in SMC-D leaves a kernel heap buffer overflow vulnerability unpatched in the send path.
--

commit c64247d450ada1697a0e227038dd52be750039e2
Author: Bryam Vargas <hexlabsecurity@proton.me>

net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive

This commit adds bounds checking to the peer producer cursor in the SMC-D
and SMC-R receive paths to prevent out-of-bounds reads. It also clamps the
bytes_to_rcv counter to the receive buffer length.

> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
> index 619b3bab38248..738c45fd5cd08 100644
> --- a/net/smc/smc_cdc.c
> +++ b/net/smc/smc_cdc.c
> @@ -382,6 +382,8 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
>  		smp_mb__before_atomic();
>  		atomic_add(diff_prod, &conn->bytes_to_rcv);
>  		/* guarantee 0 <= bytes_to_rcv <= rmb_desc->len */
> +		if (atomic_read(&conn->bytes_to_rcv) > conn->rmb_desc->len)
> +			atomic_set(&conn->bytes_to_rcv, conn->rmb_desc->len);

[Severity: High]
Does this non-atomic clamping sequence introduce a race condition?

If a peer sends a forged producer cursor to inflate diff_prod, bytes_to_rcv
will temporarily exceed rmb_desc->len during the window between the
atomic_add() and the atomic_set().

If smc_rx_recvmsg() is running concurrently in process context on another
CPU, could it read this transiently inflated value?

net/smc/smc_rx.c:smc_rx_recvmsg() {
    ...
    copylen = min_t(size_t, read_remaining, readable);
    ...
    chunk_len = copylen - chunk_len; /* remainder */
    ...
    rc = memcpy_to_msg(msg, rcvbuf_base + chunk_off, chunk_len);
}

If smc_rx_recvmsg() calculates a copylen larger than the buffer length based
on the oversized bytes_to_rcv, would this lead to an out-of-bounds read on
the second chunk?

Additionally, if smc_rx_recvmsg() executes an atomic_sub() during this exact
window, wouldn't the subsequent atomic_set() overwrite the subtracted value,
artificially inflating bytes_to_rcv again?

>  		smp_mb__after_atomic();
>  		smc->sk.sk_data_ready(&smc->sk);

[ ... ]

> diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h
> index 696cc11f2303b..7fa6e0d3817f8 100644
> --- a/net/smc/smc_cdc.h
> +++ b/net/smc/smc_cdc.h
> @@ -230,6 +230,12 @@ static inline void smc_cdc_cursor_to_host(union smc_host_cursor *local,
>  	smc_curs_copy_net(&net, peer, conn);
>  	temp.count = ntohl(net.count);
>  	temp.wrap = ntohs(net.wrap);
> +	/* the peer producer cursor is wire-controlled; bound the SMC-R count to
> +	 * our RMB before it is used as a raw index by the urgent path, mirroring
> +	 * the SMC-D conversion in smcd_cdc_msg_to_host().
> +	 */
> +	if (temp.count > conn->rmb_desc->len)
> +		temp.count = conn->rmb_desc->len;

[Severity: High]
Does this hardcoded clamp permanently stall connections that use asymmetric
buffer sizes?

This function is used by smcr_cdc_msg_to_host() to parse both the producer
and consumer cursors:

net/smc/smc_cdc.h:smcr_cdc_msg_to_host() {
    smc_cdc_cursor_to_host(&local->prod, &peer->prod, conn);
    smc_cdc_cursor_to_host(&local->cons, &peer->cons, conn);
}

The consumer cursor represents the peer's read position within its own
receive buffer, so its natural limit is conn->peer_rmbe_size. If a
connection has a peer_rmbe_size larger than the local rmb_desc->len,
legitimate consumer cursor updates will be truncated by this new clamp.

When smc_curs_diff() computes the consumed space using this truncated
cursor, it yields zero. Does this prevent peer_rmbe_space from being
replenished, causing the sender to falsely believe the peer's buffer is
full and stalling the connection?

>  	if ((old.wrap > temp.wrap) && temp.wrap)
>  		return;

[ ... ]

> @@ -260,6 +266,12 @@ static inline void smcd_cdc_msg_to_host(struct smc_host_cdc_msg *local,
>  
>  	temp.wrap = peer->prod.wrap;
>  	temp.count = peer->prod.count;
> +	/* the peer producer cursor is wire-controlled; a count past our RMB is
> +	 * used as a raw index by the urgent path (smc_cdc_handle_urg_data_arrival)
> +	 * and as a length by the recv path.  Bound it to the RMB.
> +	 */
> +	if (temp.count > conn->rmb_desc->len)
> +		temp.count = conn->rmb_desc->len;
>  	smc_curs_copy(&local->prod, &temp, conn);
>  
>  	temp.wrap = peer->cons.wrap;

[Severity: Critical]
This is a pre-existing issue, but it appears the consumer cursor in
smcd_cdc_msg_to_host() is still left unbounded.

While the producer cursor is clamped, the consumer cursor is copied
directly:

net/smc/smc_cdc.h:smcd_cdc_msg_to_host() {
    temp.wrap = peer->cons.wrap;
    temp.count = peer->cons.count;
    smc_curs_copy(&local->cons, &temp, conn);
}

If a peer sends an SMC-D CDC message with an unexpectedly large consumer
count, this unbounded cursor is passed to smc_curs_diff() in
smc_cdc_msg_recv_action(), which generates an oversized diff_tx and inflates
conn->sndbuf_space past conn->sndbuf_desc->len.

When the local sender transmits data, can this inflated sndbuf_space allow
copylen to exceed the send buffer size in smc_tx_sendmsg()?

net/smc/smc_tx.c:smc_tx_sendmsg() {
    ...
    chunk_len = copylen - chunk_len; /* remainder */
    ...
    rc = memcpy_from_msg(sndbuf_base + chunk_off, msg, chunk_len);
}

During wrap-around, wouldn't the second chunk's length exceed the allocated
buffer bounds, resulting in an out-of-bounds memcpy_from_msg() into
adjacent memory?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260610084803.186516-1-hexlabsecurity@proton.me?part=1


* Re: [PATCH] net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive
  2026-06-10  8:48 [PATCH] net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive Bryam Vargas
  2026-06-11  8:48 ` sashiko-bot
@ 2026-06-14  0:31 ` Jakub Kicinski
  2026-06-14  8:32   ` Bryam Vargas
  1 sibling, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2026-06-14  0:31 UTC
  To: hexlabsecurity
  Cc: Jakub Kicinski, alibuda, dust.li, sidraya, wenjia, mjambigi,
	tonylu, guwen, davem, edumazet, pabeni, horms, raspl, ubraun,
	linux-rdma, linux-s390, netdev, linux-kernel

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive

This patch attempts to close an out-of-bounds read in the SMC CDC
receive path by clamping the peer producer cursor and the
bytes_to_rcv accumulator against rmb_desc->len.  A couple of
questions on the approach below.

> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
> index 619b3bab3824..738c45fd5cd0 100644
> --- a/net/smc/smc_cdc.c
> +++ b/net/smc/smc_cdc.c
> @@ -382,6 +382,8 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
>  		smp_mb__before_atomic();
>  		atomic_add(diff_prod, &conn->bytes_to_rcv);
>  		/* guarantee 0 <= bytes_to_rcv <= rmb_desc->len */
> +		if (atomic_read(&conn->bytes_to_rcv) > conn->rmb_desc->len)
> +			atomic_set(&conn->bytes_to_rcv, conn->rmb_desc->len);

[Critical]
Is this clamp safe against a concurrent smc_rx_recvmsg() on another
CPU?

smc_cdc_msg_recv() runs the BH path under bh_lock_sock() (slock with
BH disabled), while smc_rx_recvmsg() runs under lock_sock(), which
sets sk_lock.owned but releases the slock for the body of the
syscall.  The body contains:

    /* update cursors */
    if (!(flags & MSG_PEEK)) {
        smp_mb__before_atomic();
        atomic_sub(copylen, &conn->bytes_to_rcv);
        smp_mb__after_atomic();
        ...
    }

so the BH on CPU0 and the recvmsg body on CPU1 can run concurrently
on the same connection.

The new clamp here is a non-atomic read-then-set:

    atomic_add(diff_prod, &conn->bytes_to_rcv);
    if (atomic_read(&conn->bytes_to_rcv) > conn->rmb_desc->len)
        atomic_set(&conn->bytes_to_rcv, conn->rmb_desc->len);

With L = rmb_desc->len, the commit message notes that the
differing-wrap branch of smc_curs_diff() can return up to
(L - old.count) + new.count, i.e. close to 2L, even after the count
itself is clamped.  Can the following interleaving happen?

  CPU0 (BH)                                CPU1 (recvmsg)
  atomic_add(diff_prod) -> X (~2L)
                                           readable =
                                             atomic_read(&bytes_to_rcv) = X
                                           copylen = min(remaining, X)
                                           chunk_len = min(copylen,
                                                           L - cons.count)
                                           second chunk = copylen - chunk_len
                                           memcpy_to_msg(rcvbuf_base + 0,
                                                         second_chunk_len)
  atomic_read() -> X
  atomic_set(&bytes_to_rcv, L)

With cons.count near L/2 and copylen near 2L, the second chunk reads
roughly 3L/2 bytes starting at rcvbuf_base + 0, walking past the end
of the RMB allocation.  Is that not the same OOB read this patch is
meant to close?
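
The chunk arithmetic in that interleaving can be checked with a small
user-space sketch. second_chunk_len() and second_chunk_overruns() are
hypothetical helpers that only mirror the min()/remainder steps quoted
above; they are not code from smc_rx.c.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the wrap-around second chunk would copy, given the buffer length,
 * the consumer offset and the requested copy length.
 */
static size_t second_chunk_len(size_t buf_len, size_t cons_count,
			       size_t copylen)
{
	size_t first;

	assert(cons_count <= buf_len);
	first = buf_len - cons_count;	/* bytes up to the end of the buffer */
	if (first > copylen)
		first = copylen;
	return copylen - first;		/* remainder, read from offset 0 */
}

/* Non-zero when the second chunk, which starts at offset 0, would run
 * past the end of the buffer.
 */
static int second_chunk_overruns(size_t buf_len, size_t cons_count,
				 size_t copylen)
{
	return second_chunk_len(buf_len, cons_count, copylen) > buf_len;
}
```

For L = 4096, cons.count = 2048 and copylen = 8192 the second chunk is
6144 bytes (3L/2), so it overruns a 4096-byte buffer. With copylen held to
4096 the second chunk is 2048 bytes and stays in bounds.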

A second variant of the same race seems to silently corrupt the
counter:

  CPU0 (BH)                                CPU1 (recvmsg)
  atomic_add(diff_prod)        -> X > L
                                           atomic_sub(Y) -> X - Y
  atomic_read()                -> X - Y
  (X - Y still > L?)
  atomic_set(L)

If X - Y is still above L, CPU0 overwrites the consumer's decrement
and bytes_to_rcv ends up at L even though Y bytes were already
consumed, leaving a persistent over-report that future recvmsg calls
use as readable.
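
The second interleaving can be replayed deterministically in user space.
The sketch below uses C11 atomics in place of the kernel's atomic_t, and
replay_lost_decrement() is a hypothetical name; it simply executes the three
steps in the order shown in the diagram, on one thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* Replays: CPU0 add, CPU1 sub, CPU0 read-then-set clamp.
 * Returns the final counter value.
 */
static int replay_lost_decrement(int start, int diff, int consumed,
				 int limit)
{
	atomic_int v;

	assert(limit >= 0);
	atomic_init(&v, start);
	atomic_fetch_add(&v, diff);		/* CPU0: atomic_add(diff_prod) */
	atomic_fetch_sub(&v, consumed);		/* CPU1: atomic_sub(copylen) */
	if (atomic_load(&v) > limit)		/* CPU0: non-atomic clamp ... */
		atomic_store(&v, limit);	/* ... overwrites the decrement */
	return atomic_load(&v);
}
```

Starting at 8 with diff 28, 4 bytes consumed and a limit of 16, the replay
ends at 16. Had the clamp been applied as part of the add, the counter
would be 16 - 4 = 12, so the consumer's decrement is lost.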

smp_mb__after_atomic() does not turn this read-then-set into an RMW;
atomic_set() is a relaxed store.  Would clamping diff_prod against
(L - atomic_read(&bytes_to_rcv)) before the atomic_add (or using an
atomic_cmpxchg loop) close both windows?
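
For reference, a compare-exchange loop of the kind suggested here could look
roughly like the following user-space sketch. It uses C11 <stdatomic.h>
rather than the kernel atomic API, and add_clamped() is a hypothetical
helper, not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Adds diff to *v as one atomic read-modify-write, never storing a value
 * outside [0, limit]. Returns the value that was stored.
 */
static int add_clamped(atomic_int *v, int diff, int limit)
{
	int old = atomic_load(v);
	long long next;

	assert(limit >= 0);
	do {
		/* widen so old + diff cannot overflow int */
		next = (long long)old + diff;
		if (next > limit)
			next = limit;
		if (next < 0)
			next = 0;
		/* on failure, old is refreshed with the current value */
	} while (!atomic_compare_exchange_weak(v, &old, (int)next));

	return (int)next;
}
```

Because the clamp is computed from the same value the exchange compares
against, a concurrent subtraction either lands before the exchange (and the
loop retries with the new value) or after it, so no reader observes a value
above limit and no decrement is overwritten.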

>  		smp_mb__after_atomic();
>  		smc->sk.sk_data_ready(&smc->sk);
>  	} else {
> diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h
> index 696cc11f2303..7fa6e0d3817f 100644
> --- a/net/smc/smc_cdc.h
> +++ b/net/smc/smc_cdc.h
> @@ -230,6 +230,12 @@ static inline void smc_cdc_cursor_to_host(union smc_host_cursor *local,
>  	smc_curs_copy_net(&net, peer, conn);
>  	temp.count = ntohl(net.count);
>  	temp.wrap = ntohs(net.wrap);
> +	/* the peer producer cursor is wire-controlled; bound the SMC-R count to
> +	 * our RMB before it is used as a raw index by the urgent path, mirroring
> +	 * the SMC-D conversion in smcd_cdc_msg_to_host().
> +	 */
> +	if (temp.count > conn->rmb_desc->len)
> +		temp.count = conn->rmb_desc->len;

[High]
Should this clamp live in the shared helper, or only on the prod
call site?

smc_cdc_cursor_to_host() is invoked twice from smcr_cdc_msg_to_host():

    smc_cdc_cursor_to_host(&local->prod, &peer->prod, conn);
    smc_cdc_cursor_to_host(&local->cons, &peer->cons, conn);

The producer cursor is bounded by our local RMB (conn->rmb_desc->len),
but the consumer cursor lives in the peer's RMB space and is bounded
by conn->peer_rmbe_size, which is set independently in
smcr_conn_save_peer_info() from the peer's CLC bufsize.

smc_cdc_msg_recv_action() confirms the asymmetry:

    diff_cons = smc_curs_diff(conn->peer_rmbe_size, &cons_old,
                              &conn->local_rx_ctrl.cons);
    ...
    atomic_add(diff_cons, &conn->peer_rmbe_space);

so peer_rmbe_size (not rmb_desc->len) is the right bound for the cons
cursor.

Whenever peer_rmbe_size > rmb_desc->len, legitimate peer cons.count
values in (rmb_desc->len, peer_rmbe_size] are silently truncated to
rmb_desc->len here, diff_cons is under-credited and
peer_rmbe_space stays artificially low, which can stall TX flow
control on perfectly well-behaved peers.
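
A small user-space model makes the under-credit concrete. clamp_count() and
cons_credit() are hypothetical helpers covering only the same-wrap case;
the buffer sizes in the note below are made-up example values.

```c
#include <assert.h>

/* Clamp a cursor count to a buffer length. */
static unsigned int clamp_count(unsigned int count, unsigned int limit)
{
	assert(limit > 0);
	return count > limit ? limit : count;
}

/* Same-wrap credit: bytes the peer consumed between two cursor updates. */
static unsigned int cons_credit(unsigned int old_count,
				unsigned int new_count)
{
	return new_count > old_count ? new_count - old_count : 0;
}
```

With a local RMB of 16 bytes and a peer RMBE of 32 bytes, a legitimate
consumer cursor moving from 20 to 28 should credit 8 bytes. If both counts
are first clamped to 16, the credit is 0 and the sender never sees the
freed space.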

The SMC-D companion in this same patch is structured the way the
prod/cons asymmetry suggests -- it clamps prod and copies cons
unmodified:

> @@ -260,6 +266,12 @@ static inline void smcd_cdc_msg_to_host(struct smc_host_cdc_msg *local,
>  
>  	temp.wrap = peer->prod.wrap;
>  	temp.count = peer->prod.count;
> +	/* the peer producer cursor is wire-controlled; a count past our RMB is
> +	 * used as a raw index by the urgent path (smc_cdc_handle_urg_data_arrival)
> +	 * and as a length by the recv path.  Bound it to the RMB.
> +	 */
> +	if (temp.count > conn->rmb_desc->len)
> +		temp.count = conn->rmb_desc->len;
>  	smc_curs_copy(&local->prod, &temp, conn);
>  
>  	temp.wrap = peer->cons.wrap;

Should the SMC-R hunk be moved out of the shared smc_cdc_cursor_to_host()
helper and onto the prod call site in smcr_cdc_msg_to_host(), to match
the SMC-D shape?
-- 
pw-bot: cr


* Re: [PATCH] net/smc: bound the peer producer cursor on SMC-D and SMC-R CDC receive
  2026-06-14  0:31 ` Jakub Kicinski
@ 2026-06-14  8:32   ` Bryam Vargas
  0 siblings, 0 replies; 4+ messages in thread
From: Bryam Vargas @ 2026-06-14  8:32 UTC
  To: Jakub Kicinski
  Cc: D . Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	Mahanta Jambigi, Tony Lu, Wen Gu, David S . Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Stefan Raspl, Ursula Braun, linux-rdma,
	linux-s390, netdev, linux-kernel

On Sat, 13 Jun 2026 17:31:11 -0700, Jakub Kicinski wrote:
> Is this clamp safe against a concurrent smc_rx_recvmsg() on another CPU?

Confirmed -- the tasklet read-then-set is racy: a recvmsg()/sendmsg() on
another CPU reads the inflated value in the window between the
atomic_add() and the clamp (recvmsg() runs under lock_sock(), which
leaves the slock free, so it is not serialized against the
bh_lock_sock() CDC tasklet). Reworked as a v3 series:

  https://lore.kernel.org/netdev/20260614-b4-disp-edd64be9-v3-0-551fa514257e@proton.me/

The bound now lives at the consumer (smc_rx_recvmsg() / smc_tx_sendmsg()),
where it is race-free; it also rejects a sign-overflowed (negative)
accumulator (per the sashiko-bot review on the sndbuf_space patch); and
the producer-cursor clamp is applied to the producer cursor only, so the
consumer cursor stays bounded by peer_rmbe_size, not rmb_desc->len. The
sndbuf_space fix is folded in as patch 3/3.
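
To illustrate the idea of a consumer-side bound (this is not the code from
the v3 series, which is not reproduced in this thread), a user-space sketch
with C11 atomics might look as follows. bounded_readable() and its -1 error
return are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Takes one snapshot of the accumulator and accepts it only if it lies in
 * [0, buf_len]. Returns 0 and stores the snapshot in *out on success,
 * -1 when the value is negative or larger than the buffer.
 */
static int bounded_readable(atomic_int *bytes_to_rcv, int buf_len, int *out)
{
	int readable;

	assert(buf_len >= 0 && out);
	readable = atomic_load(bytes_to_rcv);	/* single read, reused below */
	if (readable < 0 || readable > buf_len)
		return -1;
	*out = readable;
	return 0;
}
```

The check and the later copy length both use the same snapshot, so a
concurrent producer update cannot change the value between validation and
use.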

Thanks for the review.

Bryam

