public inbox for netdev@vger.kernel.org
* [PATCH net-next 0/3] tcp: fix scaled no-shrink rwnd quantization slack
@ 2026-03-17  6:51 Wesley Atwell
  2026-03-17  6:51 ` [PATCH net-next 1/3] selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss Wesley Atwell
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Wesley Atwell @ 2026-03-17  6:51 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, linux-kselftest, davem, dsahern, edumazet, horms,
	kuba, kuniyu, ncardwell, pabeni, shuah, Wesley Atwell

Hi,

this is a new, narrow series for one specific receive-window problem in
the scaled no-shrink path.

The earlier larger receive-window accounting series tried to address
multiple cases at once: quantization slack, live scaling-ratio drift,
retracted windows, repair state, MPTCP state, and extra test plumbing.
Review on that version was useful, but it also made clear that the
series was trying to carry more mechanism than the currently proven
problem justified.

This repost keeps only the part with a clear fail-before/pass-after
story today, and it keeps the stored receive-window state representable
so later no-shrink decisions continue to reason from a right edge the
peer could actually have seen on the wire.

Problem
=======

In the scaled no-shrink path, __tcp_select_window() rounds free_space up
to the receive-window scale quantum:

  window = ALIGN(free_space, 1 << tp->rx_opt.rcv_wscale);

When free_space sits just below the next quantum, that can expose fresh
sender-visible credit that is not actually backed by the current receive
memory state.

That is not just a presentation artifact. Later hard admission can still
reject data which is within the sender-visible offer.
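
As a rough illustration of the rounding, here is a userspace sketch. The
helper names are hypothetical stand-ins for the kernel's ALIGN() and
round_down() macros, and the value 86286 with rcv_wscale 10 is only an
assumed example of free_space sitting a little above 84 scaled units:

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for ALIGN(): round x up to a multiple
 * of quantum. Only valid when quantum is a power of two.
 */
static inline uint32_t align_up_pow2(uint32_t x, uint32_t quantum)
{
	return (x + quantum - 1) & ~(quantum - 1);
}

/* Hypothetical userspace stand-in for round_down(): round x down to a
 * multiple of quantum. Only valid when quantum is a power of two.
 */
static inline uint32_t round_down_pow2(uint32_t x, uint32_t quantum)
{
	return x & ~(quantum - 1);
}

/* Value that would be carried in the TCP header window field, assuming
 * it is produced by a plain right shift by the receive window scale.
 */
static inline uint32_t wire_units(uint32_t window, unsigned int wscale)
{
	return window >> wscale;
}
```

With free_space = 86286 and a quantum of 1024, rounding up gives 87040
(85 units on the wire), while rounding down gives 86016 (84 units on
the wire).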

A narrower first rework that stored raw free_space would not have been
safe either. Later no-shrink preservation derives the current offer from

  tp->rcv_wup + tp->rcv_wnd - tp->rcv_nxt

so the stored window still needs to remain representable in scaled
units.
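
To show why a raw stored value would be a problem, here is a minimal
userspace sketch of that derivation. The function names are
hypothetical, the sketch ignores the 16-bit cap on the header field,
and it assumes the wire value is the stored window shifted right by
rcv_wscale:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sender-visible credit derived from the stored state, clamped at zero:
 * rcv_wup + rcv_wnd - rcv_nxt.
 */
static inline uint32_t current_offer(uint32_t rcv_wup, uint32_t rcv_wnd,
				     uint32_t rcv_nxt)
{
	int32_t win = (int32_t)(rcv_wup + rcv_wnd - rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/* True when the stored window is a whole number of scale quanta, so a
 * right shift by wscale loses no bytes.
 */
static inline bool is_representable(uint32_t rcv_wnd, unsigned int wscale)
{
	return (rcv_wnd & ((1U << wscale) - 1)) == 0;
}
```

For example, a raw stored window of 86286 with rcv_wscale 10 would go
out as 84 units (86016 bytes), so the stored state would claim 270 bytes
more than the peer was shown. A stored window of 86016 has no such gap.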

Approach
========

Instead of rounding larger raw windows up in the scaled no-shrink path,
keep only the cases we actually need:

  - relax one unrelated packetdrill test which was pinning an
    incidental advertised window
  - keep tp->rcv_wnd representable in scaled units by rounding larger
    windows down to the scale quantum
  - preserve only the small non-zero case that would otherwise scale
    away to zero
  - rely on the existing tcp_select_window() no-shrink preservation
    logic for already-exposed credit

That removes the larger-window quantization slack from rounding
free_space up, while preserving the small non-zero case needed to avoid
scaling away to zero.
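
The selection rule described above can be sketched in userspace as
follows. This is an illustration rather than the kernel code itself:
the function name is hypothetical, and free_space is assumed to be
non-negative by the time it reaches this point:

```c
#include <stdint.h>

/* Pick a window that is always a whole number of scale quanta. */
static inline uint32_t select_scaled_window(uint32_t free_space,
					    unsigned int wscale)
{
	uint32_t granularity = 1U << wscale;

	/* Larger windows: round down so no unbacked credit is exposed. */
	if (free_space >= granularity)
		return free_space & ~(granularity - 1);

	/* Small non-zero offer: keep one quantum so that it does not
	 * scale away to zero.
	 */
	if (free_space > 0)
		return granularity;

	return 0;
}
```

With rcv_wscale 10, an input of 86286 yields 86016, an input of 500
yields 1024, and an input of 0 stays 0.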

Tests
=====

The packetdrill reproducer exercises both the immediate quantization
case and a follow-on ACK after a small OOO receive-memory change.
Before the TCP change, it fails on the first outbound packet with the
advertised window one scaled unit too large:

  expected: win 84
    actual: win 85

After the TCP change, the reproducer passes and the follow-on ACK also
stays at 84.

Series layout
=============

  1/3 selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss
  2/3 tcp: keep scaled no-shrink window representable
  3/3 selftests: packetdrill: cover scaled rwnd quantization slack

Thanks,
Wesley Atwell

---
 net/ipv4/tcp_output.c                              | 16 +++++---
 .../selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt |  8 ++--
 .../packetdrill/tcp_rcv_quantization_credit.pkt   | 45 ++++++++++++++++++++++
 3 files changed, 61 insertions(+), 8 deletions(-)

-- 
2.43.0


* [PATCH net-next 1/3] selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss
  2026-03-17  6:51 [PATCH net-next 0/3] tcp: fix scaled no-shrink rwnd quantization slack Wesley Atwell
@ 2026-03-17  6:51 ` Wesley Atwell
  2026-03-19 10:22   ` Paolo Abeni
  2026-03-17  6:51 ` [PATCH net-next 2/3] tcp: keep scaled no-shrink window representable Wesley Atwell
  2026-03-17  6:51 ` [PATCH net-next 3/3] selftests: packetdrill: cover scaled rwnd quantization slack Wesley Atwell
  2 siblings, 1 reply; 11+ messages in thread
From: Wesley Atwell @ 2026-03-17  6:51 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, linux-kselftest, davem, dsahern, edumazet, horms,
	kuba, kuniyu, ncardwell, pabeni, shuah, Wesley Atwell

tcp_ooo_rcv_mss.pkt cares about the OOO SACK state and the resulting
tcpi_rcv_mss update.

Its exact advertised receive-window value is incidental to that test and
can legitimately move when unrelated rwnd accounting changes adjust the
ACK window.

Drop the hard-coded win 81 checks and keep only the ACK/SACK shape and
the tcpi_rcv_mss assertion.

Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
---
 tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt b/tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt
index 7e6bc5fb0c8d78f36dc3d18842ff11d938c4e41b..0b19de9f9307b3d0ee579bc9e3a2b9219a88cd8a 100644
--- a/tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt
+++ b/tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt
@@ -17,11 +17,13 @@ sysctl -q net.ipv4.tcp_rmem="4096 131072 $((32*1024*1024))"`
    +0 accept(3, ..., ...) = 4
 
    +0 < . 2001:11001(9000) ack 1 win 257
-   +0 > . 1:1(0) ack 1 win 81 <nop,nop,sack 2001:11001>
+// This test cares about the OOO SACK state and the resulting tcpi_rcv_mss.
+// Keep the ACK/SACK shape exact, but do not pin the precise advertised
+// receive window here because unrelated rwnd accounting changes can adjust it.
+   +0 > . 1:1(0) ack 1 <nop,nop,sack 2001:11001>
 
 // check that ooo packet properly updates tcpi_rcv_mss
    +0 %{ assert tcpi_rcv_mss == 1000, tcpi_rcv_mss }%
 
    +0 < . 11001:21001(10000) ack 1 win 257
-   +0 > . 1:1(0) ack 1 win 81 <nop,nop,sack 2001:21001>
-
+   +0 > . 1:1(0) ack 1 <nop,nop,sack 2001:21001>
-- 
2.43.0


* [PATCH net-next 2/3] tcp: keep scaled no-shrink window representable
  2026-03-17  6:51 [PATCH net-next 0/3] tcp: fix scaled no-shrink rwnd quantization slack Wesley Atwell
  2026-03-17  6:51 ` [PATCH net-next 1/3] selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss Wesley Atwell
@ 2026-03-17  6:51 ` Wesley Atwell
  2026-03-19 10:50   ` Paolo Abeni
  2026-03-17  6:51 ` [PATCH net-next 3/3] selftests: packetdrill: cover scaled rwnd quantization slack Wesley Atwell
  2 siblings, 1 reply; 11+ messages in thread
From: Wesley Atwell @ 2026-03-17  6:51 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, linux-kselftest, davem, dsahern, edumazet, horms,
	kuba, kuniyu, ncardwell, pabeni, shuah, Wesley Atwell

In the scaled no-shrink path, __tcp_select_window() currently rounds the
raw free-space value up to the receive-window scale quantum.

That can expose fresh sender-visible credit beyond the currently backed
free space.

Fix this without changing the meaning of the stored receive-window
state. Keep tp->rcv_wnd representable in scaled units by rounding larger
windows down to the scale quantum and preserving only the small
non-zero case that would otherwise scale away to zero.

tcp_select_window() already preserves the no-shrink guarantee from the
currently offered window, so later no-shrink decisions continue to
reason from a right edge the peer actually saw on the wire.

This removes the larger-window quantization slack from rounding
free_space up, while preserving the small non-zero case needed to avoid
scaling away to zero.

Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
---
 net/ipv4/tcp_output.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 35c3b0ab5a0cb714155d5720fe56888f71aecced..bd3a43148a87e891bc632a47ffb5b82c475e8f6f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3375,13 +3375,19 @@ u32 __tcp_select_window(struct sock *sk)
 	 * scaled window will not line up with the MSS boundary anyway.
 	 */
 	if (tp->rx_opt.rcv_wscale) {
-		window = free_space;
+		u32 gran = 1U << tp->rx_opt.rcv_wscale;
 
-		/* Advertise enough space so that it won't get scaled away.
-		 * Import case: prevent zero window announcement if
-		 * 1<<rcv_wscale > mss.
+		/* Keep tp->rcv_wnd representable in scaled units so later
+		 * no-shrink decisions reason about the same right edge we
+		 * can advertise on the wire. Preserve only a small non-zero
+		 * offer that would otherwise get scaled away to zero.
 		 */
-		window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale));
+		if (free_space >= gran)
+			window = round_down(free_space, gran);
+		else if (free_space > 0)
+			window = gran;
+		else
+			window = 0;
 	} else {
 		window = tp->rcv_wnd;
 		/* Get the largest window that is a nice multiple of mss.
-- 
2.43.0


* [PATCH net-next 3/3] selftests: packetdrill: cover scaled rwnd quantization slack
  2026-03-17  6:51 [PATCH net-next 0/3] tcp: fix scaled no-shrink rwnd quantization slack Wesley Atwell
  2026-03-17  6:51 ` [PATCH net-next 1/3] selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss Wesley Atwell
  2026-03-17  6:51 ` [PATCH net-next 2/3] tcp: keep scaled no-shrink window representable Wesley Atwell
@ 2026-03-17  6:51 ` Wesley Atwell
  2026-03-22 16:32   ` Simon Baatz
  2 siblings, 1 reply; 11+ messages in thread
From: Wesley Atwell @ 2026-03-17  6:51 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, linux-kselftest, davem, dsahern, edumazet, horms,
	kuba, kuniyu, ncardwell, pabeni, shuah, Wesley Atwell

Add a packetdrill reproducer for the scaled no-shrink quantization case.

The sequence leaves slightly more than 84 scaled units of backed credit
after one skb is drained. The buggy ALIGN() path rounds that up and
exposes a fresh extra unit, so the wire-visible window becomes 85.

Then queue a tiny OOO skb so the next ACK re-runs the no-shrink path
after a small receive-memory change without advancing rcv_nxt. With the
fix in place, both ACKs keep the sender-visible window at 84.

This provides fail-before/pass-after coverage for both the immediate
quantization bug and the follow-on ACK transition that reuses the stored
window state.

Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
---
 .../packetdrill/tcp_rcv_quantization_credit.pkt    | 45 ++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt

diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
new file mode 100644
index 0000000000000000000000000000000000000000..8ea96281b601f2d161cfd84967cad91cedb03151
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--ip_version=ipv4
+--mss=1000
+
+`./defaults.sh
+sysctl -q net.ipv4.tcp_moderate_rcvbuf=0
+sysctl -q net.ipv4.tcp_shrink_window=0
+sysctl -q net.ipv4.tcp_rmem="4096 131072 $((32*1024*1024))"`
+
+// Exercise the scaled no-shrink path in __tcp_select_window().
+// The sequence below leaves slightly more than 84 scaled units of backed
+// credit after one skb is drained. The buggy ALIGN() path rounds that up and
+// exposes a fresh extra unit; the fixed path keeps the sender-visible window
+// at 84. Then queue a tiny OOO skb so the next ACK re-runs the no-shrink
+// path after a small receive-memory change without advancing rcv_nxt.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
+   +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 10>
+   +0 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+   +0 < P. 1:10001(10000) ack 1 win 257
+   * > .  1:1(0) ack 10001
+
+   +0 < P. 10001:11024(1023) ack 1 win 257
+   * > .  1:1(0) ack 11024
+
+// Free one skb, then force an outbound packet so the current advertised
+// window is observable both on the wire and via TCP_INFO.
+   +0 read(4, ..., 10000) = 10000
+   +0 write(4, ..., 1) = 1
+   * > P. 1:2(1) ack 11024 win 84
+   +0 %{ assert (tcpi_rcv_wnd >> 10) == 84, tcpi_rcv_wnd }%
+
+// Queue a tiny OOO skb. This should not create fresh sender-visible credit
+// on the next ACK after the first post-drain window update.
+   +0 < P. 12024:12025(1) ack 2 win 257
+   * > .  2:2(0) ack 11024 win 84
+   +0 %{ assert (tcpi_rcv_wnd >> 10) == 84, tcpi_rcv_wnd }%
-- 
2.43.0


* Re: [PATCH net-next 1/3] selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss
  2026-03-17  6:51 ` [PATCH net-next 1/3] selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss Wesley Atwell
@ 2026-03-19 10:22   ` Paolo Abeni
  2026-03-19 14:51     ` Neal Cardwell
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2026-03-19 10:22 UTC (permalink / raw)
  To: Wesley Atwell, netdev, ncardwell
  Cc: linux-kernel, linux-kselftest, davem, dsahern, edumazet, horms,
	kuba, kuniyu, shuah

On 3/17/26 7:51 AM, Wesley Atwell wrote:
> tcp_ooo_rcv_mss.pkt cares about the OOO SACK state and the resulting
> tcpi_rcv_mss update.
> 
> Its exact advertised receive-window value is incidental to that test and
> can legitimately move when unrelated rwnd accounting changes adjust the
> ACK window.
> 
> Drop the hard-coded win 81 checks and keep only the ACK/SACK shape and
> the tcpi_rcv_mss assertion.

I think it would be better to keep the test updated with the kernel
behavior. Having the pktdrill tests bundled together with the kernel
allows for tight coupling.

@Neal: WDYT?

/P



* Re: [PATCH net-next 2/3] tcp: keep scaled no-shrink window representable
  2026-03-17  6:51 ` [PATCH net-next 2/3] tcp: keep scaled no-shrink window representable Wesley Atwell
@ 2026-03-19 10:50   ` Paolo Abeni
  2026-03-19 21:21     ` Wesley Atwell
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2026-03-19 10:50 UTC (permalink / raw)
  To: Wesley Atwell, netdev
  Cc: linux-kernel, linux-kselftest, davem, dsahern, edumazet, horms,
	kuba, kuniyu, ncardwell, shuah

On 3/17/26 7:51 AM, Wesley Atwell wrote:
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 35c3b0ab5a0cb714155d5720fe56888f71aecced..bd3a43148a87e891bc632a47ffb5b82c475e8f6f 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3375,13 +3375,19 @@ u32 __tcp_select_window(struct sock *sk)
>  	 * scaled window will not line up with the MSS boundary anyway.
>  	 */
>  	if (tp->rx_opt.rcv_wscale) {
> -		window = free_space;
> +		u32 gran = 1U << tp->rx_opt.rcv_wscale;
>  
> -		/* Advertise enough space so that it won't get scaled away.
> -		 * Import case: prevent zero window announcement if
> -		 * 1<<rcv_wscale > mss.
> +		/* Keep tp->rcv_wnd representable in scaled units so later
> +		 * no-shrink decisions reason about the same right edge we
> +		 * can advertise on the wire. Preserve only a small non-zero
> +		 * offer that would otherwise get scaled away to zero.
>  		 */
> -		window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale));
> +		if (free_space >= gran)
> +			window = round_down(free_space, gran);

The receive window already has a similar rounding in the `free_space <
(full_space >> 1)` case. This is basically excluding only:

	gran > free_space >= (full_space >> 1)

I don't know whether that is a realistic situation; perhaps just do the
scale down unconditionally?

Also minor nit, prefer 'granularity' over 'gran'

/P



* Re: [PATCH net-next 1/3] selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss
  2026-03-19 10:22   ` Paolo Abeni
@ 2026-03-19 14:51     ` Neal Cardwell
  2026-03-19 20:53       ` Wesley Atwell
  0 siblings, 1 reply; 11+ messages in thread
From: Neal Cardwell @ 2026-03-19 14:51 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Wesley Atwell, netdev, linux-kernel, linux-kselftest, davem,
	dsahern, edumazet, horms, kuba, kuniyu, shuah

On Thu, Mar 19, 2026 at 6:22 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 3/17/26 7:51 AM, Wesley Atwell wrote:
> > tcp_ooo_rcv_mss.pkt cares about the OOO SACK state and the resulting
> > tcpi_rcv_mss update.
> >
> > Its exact advertised receive-window value is incidental to that test and
> > can legitimately move when unrelated rwnd accounting changes adjust the
> > ACK window.
> >
> > Drop the hard-coded win 81 checks and keep only the ACK/SACK shape and
> > the tcpi_rcv_mss assertion.
>
> I think it would be better to keep the test updated with the kernel
> behavior. Having the pktdrill tests bundled together with the kernel
> allows for tight coupling.
>
> @Neal: WDYT?

IMHO our experience internally with packetdrill tests suggests that
assertions about outgoing receive window values should only be made in
a small set of tests specifically focused on receive window behavior.

The motivation is mainly toil and velocity. We have over 1,000
packetdrill test scripts internally, and hopefully most of these will
eventually have upstream versions.  If we allow assertions about
receive window behavior to be sprinkled among all tests, then over
time we may end up requiring edits to hundreds of packetdrill tests
any time the receive window behavior changes. And the receive window
values can change frequently. We have generally tried to remove
receive window assertions when we've discovered they were included in
a test not focused on receive window behavior.

neal


* Re: [PATCH net-next 1/3] selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss
  2026-03-19 14:51     ` Neal Cardwell
@ 2026-03-19 20:53       ` Wesley Atwell
  0 siblings, 0 replies; 11+ messages in thread
From: Wesley Atwell @ 2026-03-19 20:53 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Paolo Abeni, netdev, linux-kernel, linux-kselftest, davem,
	dsahern, edumazet, horms, kuba, kuniyu, shuah

Thanks, Paolo and Neal.

My intent here was just to keep the test focused on OOO SACK state and
`tcpi_rcv_mss`. I appreciate the feedback.

V/R,
Wesley Atwell


* Re: [PATCH net-next 2/3] tcp: keep scaled no-shrink window representable
  2026-03-19 10:50   ` Paolo Abeni
@ 2026-03-19 21:21     ` Wesley Atwell
  0 siblings, 0 replies; 11+ messages in thread
From: Wesley Atwell @ 2026-03-19 21:21 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, linux-kernel, linux-kselftest, davem, dsahern, edumazet,
	horms, kuba, kuniyu, ncardwell, shuah

Paolo,

I re-checked that corner more carefully. I do not want to overstate it as a
common path, but I also do not think the current code rules it out.

rcv_wscale is fixed at connection setup, while full_space/free_space are
recomputed later from the current receive-buffer state. The tree explicitly
allows later SO_RCVBUF reduction, and the window clamp can also change later,
so I do not see an invariant that keeps full_space above one scale quantum
once the scale has been negotiated.
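
To make that corner concrete, the condition from your reply can be
written as a small userspace predicate. The function name and the
numbers used below are hypothetical, and the extra free_space > 0 check
is my addition, since a zero offer stays zero either way:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when free_space is non-zero, below one scale quantum, and not
 * already covered by the free_space < (full_space >> 1) rounding, that
 * is: granularity > free_space >= (full_space >> 1).
 */
static inline bool in_small_nonzero_corner(uint32_t free_space,
					   uint32_t full_space,
					   unsigned int wscale)
{
	uint32_t granularity = 1U << wscale;

	return free_space > 0 &&
	       free_space < granularity &&
	       free_space >= (full_space >> 1);
}
```

For instance, with rcv_wscale 10 and full_space reduced to 1500,
a free_space of 800 lands in the corner, whereas the same free_space
with a full_space of 131072 does not.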

That said, my reason for keeping the small non-zero case is not that
unconditional scale-down would be less safe. It is that it would also change
the long-standing behavior that avoids scaling a non-zero offer away to zero.
My intent here was to remove the larger-window round-up slack without changing
that smaller legacy case in the same patch.

If you would prefer to also change that small non-zero case, I can do that in
v2 instead.

I will also rename gran to granularity in v2, if so.

Thanks,
Wesley


* Re: [PATCH net-next 3/3] selftests: packetdrill: cover scaled rwnd quantization slack
  2026-03-17  6:51 ` [PATCH net-next 3/3] selftests: packetdrill: cover scaled rwnd quantization slack Wesley Atwell
@ 2026-03-22 16:32   ` Simon Baatz
  2026-03-24  5:27     ` Wesley Atwell
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Baatz @ 2026-03-22 16:32 UTC (permalink / raw)
  To: Wesley Atwell
  Cc: netdev, linux-kernel, linux-kselftest, davem, dsahern, edumazet,
	horms, kuba, kuniyu, ncardwell, pabeni, shuah

Hi Wesley,

On Tue, Mar 17, 2026 at 12:51:37AM -0600, Wesley Atwell wrote:
> Add a packetdrill reproducer for the scaled no-shrink quantization case.
> 
> The sequence leaves slightly more than 84 scaled units of backed credit
> after one skb is drained. The buggy ALIGN() path rounds that up and
> exposes a fresh extra unit, so the wire-visible window becomes 85.

I ran this test on current net-next with assertions on all advertised
windows.  The following passes:

...
   +0 < P. 10001:11024(1023) ack 1 win 257
   * > .  1:1(0) ack 11024 win 85

// Free one skb, then force an outbound packet so the current advertised
// window is observable both on the wire and via TCP_INFO.
   +0 read(4, ..., 10000) = 10000
   +0 write(4, ..., 1) = 1
   * > P. 1:2(1) ack 11024 win 85
   +0 %{ assert (tcpi_rcv_wnd >> 10) == 85, tcpi_rcv_wnd }%

// Queue a tiny OOO skb. This should not create fresh sender-visible credit
// on the next ACK after the first post-drain window update.
   +0 < P. 12024:12025(1) ack 2 win 257
   * > .  2:2(0) ack 11024 win 85
   +0 %{ assert (tcpi_rcv_wnd >> 10) == 85, tcpi_rcv_wnd }%


I do not see an “extra unit” after draining; the sender-visible
window stays at 85 throughout.

In this flow the receive window is limited by rcv_ssthresh (86286)
and not by memory (free_space > 92000).  The effect of the change
looks to be that the previous code settles at rcv_ssthresh rounded
up, while the patched code settles at rcv_ssthresh rounded down.

The choice in the current code seems to be explicit. In the
shrink_window_allowed branch we have:

	if (free_space > tp->rcv_ssthresh) {
		free_space = tp->rcv_ssthresh;
		/* new window should always be an exact multiple of scaling factor
		 *
		 * For this case, we ALIGN "up" (increase free_space) because
		 * we know free_space is not zero here, it has been reduced from
		 * the memory-based limit, and rcv_ssthresh is not a hard limit
		 * (unlike sk_rcvbuf).
		 */
		free_space = ALIGN(free_space, (1 << tp->rx_opt.rcv_wscale));
	}

Thus, we explicitly round up when we are constrained by rcv_ssthresh.
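
As a userspace sketch of the arithmetic in the snippet above (the
function name is hypothetical, and 93000 is only an assumed free_space
consistent with "free_space > 92000" in this flow):

```c
#include <stdint.h>

/* Clamp free_space to rcv_ssthresh and, when the clamp applies, round
 * the result up to the scale quantum, as in the quoted branch.
 */
static inline uint32_t clamp_to_ssthresh_round_up(uint32_t free_space,
						  uint32_t rcv_ssthresh,
						  unsigned int wscale)
{
	uint32_t quantum = 1U << wscale;

	if (free_space > rcv_ssthresh) {
		free_space = rcv_ssthresh;
		/* Power-of-two round up, equivalent to ALIGN(). */
		free_space = (free_space + quantum - 1) & ~(quantum - 1);
	}
	return free_space;
}
```

With free_space = 93000, rcv_ssthresh = 86286 and wscale 10 this gives
87040, i.e. 85 scaled units, which matches the window seen on the wire.
Rounding 86286 down instead gives 86016, i.e. 84 units.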

As Paolo noted, once free_space gets smaller we already round down,
so the current behavior appears to be: round up while there is
headroom (possibly constrained by rcv_ssthresh), and round down once
memory starts to get tight, which makes sense IMHO.

Given that, this test case demonstrates that the new code now expects
rounding down even when there is ample memory, but it does not show a
concrete improvement in the situation the patch is meant to fix
(i.e. a case where the current rounding-up behavior actually hurts).

> Then queue a tiny OOO skb so the next ACK re-runs the no-shrink path
> after a small receive-memory change without advancing rcv_nxt. With the
> fix in place, both ACKs keep the sender-visible window at 84.

And without it, both ACKs keep the sender-visible window at 85.
 
> This provides fail-before/pass-after coverage for both the immediate
> quantization bug and the follow-on ACK transition that reuses the stored
> window state.
> 
> Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
> ---
>  .../packetdrill/tcp_rcv_quantization_credit.pkt    | 45 ++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
> 
> diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
> new file mode 100644
> index 0000000000000000000000000000000000000000..8ea96281b601f2d161cfd84967cad91cedb03151
> --- /dev/null
> +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
> @@ -0,0 +1,45 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +--ip_version=ipv4

Is there a particular reason to restrict this to IPv4 only?

> +--mss=1000
> +
> +`./defaults.sh
> +sysctl -q net.ipv4.tcp_moderate_rcvbuf=0
> +sysctl -q net.ipv4.tcp_shrink_window=0
> +sysctl -q net.ipv4.tcp_rmem="4096 131072 $((32*1024*1024))"`
> +
> +// Exercise the scaled no-shrink path in __tcp_select_window().
> +// The sequence below leaves slightly more than 84 scaled units of backed
> +// credit after one skb is drained. The buggy ALIGN() path rounds that up and

I'd drop the "buggy ALIGN() path" wording in the comment.  If the
change lands, there is no longer such a path, and the comment should
just describe the behavior the test is checking.

> +// exposes a fresh extra unit; the fixed path keeps the sender-visible window
> +// at 84. Then queue a tiny OOO skb so the next ACK re-runs the no-shrink
> +// path after a small receive-memory change without advancing rcv_nxt.

- Simon

-- 
Simon Baatz <gmbnomis@gmail.com>


* Re: [PATCH net-next 3/3] selftests: packetdrill: cover scaled rwnd quantization slack
  2026-03-22 16:32   ` Simon Baatz
@ 2026-03-24  5:27     ` Wesley Atwell
  0 siblings, 0 replies; 11+ messages in thread
From: Wesley Atwell @ 2026-03-24  5:27 UTC (permalink / raw)
  To: Simon Baatz
  Cc: netdev, linux-kernel, linux-kselftest, davem, dsahern, edumazet,
	horms, kuba, kuniyu, ncardwell, pabeni, shuah

Hi,

Thanks for pointing that out. I have a v3 coming that should address all
of the nits.

Thanks,
Wesley Atwell
