* [PATCH net] tcp: do not shrink window clamp when SO_RCVBUF is locked
From: Ankit Jain @ 2026-04-27 15:27 UTC (permalink / raw)
  To: netdev, davem, dsahern, edumazet, ncardwell, kuniyu, kuba, pabeni,
	horms, quic_stranche, quic_subashab
  Cc: linux-kernel, karen.badiryan, ajay.kaher, alexey.makhalov,
	vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu, Ankit Jain,
	stable

When an application explicitly sets SO_RCVBUF, the window clamp should
not be dynamically recalculated based on the memory scaling_ratio.

Currently, tcp_measure_rcv_mss() aggressively crushes the window clamp
down when it sees a poor skb->len to skb->truesize ratio. If the
application explicitly locked the buffer via SO_RCVBUF, this
recalculation causes the advertised window to drop severely.

If the window drops below the sender's MSS, it triggers Silly Window
Syndrome (SWS) avoidance on the sender. The sender defers transmission
and the connection falls into a perpetual 200ms PROBE0 timer loop,
drastically reducing throughput.

This is highly reproducible on loopback interfaces (MTU 65536) using
Java-based workloads (like Tomcat/GemFire) where the JVM sets SO_RCVBUF
to 32K or 64K. The bloated loopback truesize forces the scaling ratio
to drop, crushing the window clamp to ~26K, instantly triggering SWS
stalls and causing gigabyte transfers to take minutes instead of
milliseconds.

Since the application locked the buffer, the kernel should respect the
clamp boundary and not dynamically crush it based on runtime ratios.

Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
Cc: stable@vger.kernel.org
Reported-by: Karen Badiryan <karen.badiryan@broadcom.com>
Signed-off-by: Ankit Jain <ankit-aj.jain@broadcom.com>
---
Note to reviewers:

Testing Context:
- The SWS deadlock was successfully reproduced on the latest netdev/net
  tree (v7.1-rc1) using the actual enterprise Java workload.
- Applying this patch completely resolves the 504 timeouts and restores
  loopback throughput.
- Baseline iperf3 auto-tuning remains unaffected by this patch.
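
For reviewers who want to set up the locked-buffer precondition without
the Java workload, here is a minimal userspace sketch in C. It assumes
the Linux behaviour documented in socket(7): an explicit SO_RCVBUF
request is doubled by the kernel (subject to net.core.rmem_max), and
making that request is what sets SOCK_RCVBUF_LOCK on the socket. The
helper name lock_rcvbuf() is hypothetical and is not part of this patch.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the patch): request a fixed receive
 * buffer on fd and return the size the kernel reports back, or -1 on
 * error. On Linux, a successful SO_RCVBUF setsockopt() marks the
 * receive buffer as user-locked.
 */
int lock_rcvbuf(int fd, int bytes)
{
	int effective = 0;
	socklen_t len = sizeof(effective);

	assert(bytes > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) < 0)
		return -1;

	return effective;
}
```

Calling lock_rcvbuf(fd, 32768) on the listening socket before listen()
mirrors step 1 of the packetdrill-style flow in this note. On a typical
Linux configuration the returned value would be expected to be twice
the request.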

For context, here is the exact sequence of events that triggers the
recalculation flaw, illustrated in a packetdrill-style flow.
Unpatched kernels aggressively crush the window at step 3, triggering SWS.

// 1. Tomcat creates socket and hardcodes the buffer to 32K
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [32768]) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// 2. GemFire connects over loopback (simulating Jumbo MSS of 65496)
+0 < S 0:0(0) win 65535 <mss 65496, sackOK, TS val 100 ecr 0, nop, wscale 8>
+0 > S. 0:0(0) ack 1 <...>
+0 < . 1:1(0) ack 1 win 65535 <TS val 200 ecr 1>
+0 accept(3, ..., ...) = 4

// 3. GemFire sends a 20000-byte segment, dropping the scaling_ratio.
// Without the patch, tcp_measure_rcv_mss() crushes the window_clamp here.
+0.1 < . 1:20001(20000) ack 1 win 65535 <TS val 300 ecr 1>
+0.1 read(4, ..., 20000) = 20000

// 4. Assert that the window was not crushed.
// WITH the patch, the kernel respects SOCK_RCVBUF_LOCK.
+0 > . 1:1(0) ack 20001 win 65535
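
The clamp arithmetic behind step 3 can be sketched in userspace C. This
is an illustrative model, not the kernel code: it assumes the 8-bit
fixed-point scale (TCP_RMEM_TO_WIN_SCALE) used by tcp_win_from_space(),
the function names are hypothetical, and the truesize of 49152 used in
the note after the code is an assumed figure chosen only for
illustration, not a measurement from the reported workload.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's TCP_RMEM_TO_WIN_SCALE. */
#define RMEM_TO_WIN_SCALE 8

/*
 * Model of the ratio update in tcp_measure_rcv_mss(): payload bytes per
 * truesize byte, as a fixed-point value scaled by 256, never below 1.
 */
unsigned int model_scaling_ratio(uint32_t len, uint32_t truesize)
{
	uint64_t val;

	assert(truesize != 0);

	val = ((uint64_t)len << RMEM_TO_WIN_SCALE) / truesize;
	return val ? (unsigned int)val : 1;
}

/*
 * Model of tcp_win_from_space(): the share of the receive buffer that
 * can be advertised as window for a given scaling ratio.
 */
int64_t model_win_from_space(int64_t space, unsigned int ratio)
{
	return (space * (int64_t)ratio) >> RMEM_TO_WIN_SCALE;
}
```

With a 20000-byte payload and the assumed truesize of 49152,
model_scaling_ratio() yields 104 (about 0.41). Applied to a 65536-byte
sk_rcvbuf (a 32K SO_RCVBUF request doubled by the kernel),
model_win_from_space() gives 26624 bytes, in line with the ~26K clamp
described in the commit message and well under the 65496-byte MSS.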
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d5c9e65d9..c1cb9d3ed 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -248,7 +248,8 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 			do_div(val, skb->truesize);
 			tcp_sk(sk)->scaling_ratio = val ? val : 1;
 
-			if (old_ratio != tcp_sk(sk)->scaling_ratio) {
+			if (old_ratio != tcp_sk(sk)->scaling_ratio &&
+			    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
 				struct tcp_sock *tp = tcp_sk(sk);
 
 				val = tcp_win_from_space(sk, sk->sk_rcvbuf);
-- 
2.53.0

