public inbox for netdev@vger.kernel.org
* [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling
@ 2026-02-19 23:55 Simon Baatz via B4 Relay
  2026-02-19 23:55 ` [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-02-19 23:55 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Stefano Brivio, Jon Maloy, Jason Xing,
	mfreemon, Shuah Khan
  Cc: Christian Ebner, netdev, linux-doc, linux-kernel, linux-kselftest,
	Simon Baatz

Hi,

this series implements the receiver-side requirements for TCP window
retraction as specified in RFC 7323 and adds packetdrill tests to
cover the new behavior.

It addresses a regression with somewhat complex causes; see my message
"Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
(https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).

Please see the first patch for background and implementation details.

This is an RFC because a few open questions remain:

- Placement of the new rcv_mwnd_seq field in tcp_sock:

  rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
  tcp_select_window. However, rcv_wup is documented as RX read_write
  only (even though it is updated in tcp_select_window), and rcv_wnd
  is TX read_write / RX read_mostly.

  rcv_mwnd_seq is only updated in tcp_select_window and, as far as I
  can tell, is not used on the RX fast path.

  If I understand the placement rules correctly, this means that
  rcv_mwnd_seq, rcv_wup, and rcv_wnd end up in different cacheline
  groups, which feels odd. Guidance on where rcv_mwnd_seq should live
  would be appreciated.

- In tcp_minisocks.c, it is not clear to me whether we should change
  "tcptw->tw_rcv_wnd = tcp_receive_window(tp)" to
  "tcptw->tw_rcv_wnd = tcp_max_receive_window(tp)". I could not find a
  case where this makes a practical difference and have left the
  existing behavior unchanged.

- Packetdrill tests: Some of these seem rather brittle to me; I
  included them mostly to document what I have tested. Suggestions
  for making them more robust are welcome.

- MPTCP seems to modify tp->rcv_wnd of subflows. I haven't looked at
  this, since I wanted to get feedback on the overall approach first.

- Although this series addresses a regression triggered by commit
  1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks"), the underlying
  problem is shrinking the window. Thus I added "Fixes" headers for
  the commits that introduced window shrinking.

I would appreciate feedback on the overall approach and on these
questions.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
Simon Baatz (4):
      tcp: implement RFC 7323 window retraction receiver requirements
      selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
      selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
      selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt

 .../networking/net_cachelines/tcp_sock.rst         |   1 +
 include/linux/tcp.h                                |   1 +
 include/net/tcp.h                                  |  14 +++
 net/ipv4/tcp_fastopen.c                            |   1 +
 net/ipv4/tcp_input.c                               |   6 +-
 net/ipv4/tcp_minisocks.c                           |   1 +
 net/ipv4/tcp_output.c                              |  12 +++
 .../net/packetdrill/tcp_rcv_big_endseq.pkt         |   2 +-
 .../packetdrill/tcp_rcv_toobig_back_to_back.pkt    |  27 +++++
 .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt |  35 +++++++
 .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt   | 109 +++++++++++++++++++++
 11 files changed, 206 insertions(+), 3 deletions(-)
---
base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
change-id: 20260220-tcp_rfc7323_retract_wnd_rfc-c8a2d2baebde

Best regards,
-- 
Simon Baatz <gmbnomis@gmail.com>




* [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements
  2026-02-19 23:55 [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
@ 2026-02-19 23:55 ` Simon Baatz via B4 Relay
  2026-02-23 22:26   ` Stefano Brivio
  2026-02-19 23:55 ` [PATCH RFC net-next 2/4] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt Simon Baatz via B4 Relay
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-02-19 23:55 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Stefano Brivio, Jon Maloy, Jason Xing,
	mfreemon, Shuah Khan
  Cc: Christian Ebner, netdev, linux-doc, linux-kernel, linux-kselftest,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

By default, the Linux TCP implementation does not shrink the
advertised window (RFC 7323 calls this "window retraction") with the
following exceptions:

- When an incoming segment cannot be added due to the receive buffer
  running out of memory. Since commit 8c670bdfa58e ("tcp: correct
  handling of extreme memory squeeze") a zero window will be
  advertised in this case. It turns out that reaching the required
  "memory pressure" is very easy when window scaling is in use. In the
  simplest case, sending a sufficient number of segments smaller than
  the scale factor to a receiver that does not read data is enough.

  Since commit 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") this
  happens much earlier than before, leading to regressions (the test
  suite of the Valkey project does not pass because of a TCP
  connection that is no longer bi-directional).

- Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
  allowing the tcp window to shrink") addressed the "eating memory"
  problem by introducing a sysctl knob that allows shrinking the
  window before running out of memory.

However, RFC 7323 does not only state that shrinking the window is
necessary in some cases, it also formulates requirements for TCP
implementations when doing so (Section 2.4).

This commit addresses the receiver-side requirements: After retracting
the window, the peer may have a snd_nxt that lies within a previously
advertised window but is now beyond the retracted window. This means
that all incoming segments (including pure ACKs) will be rejected
until the application happens to read enough data to let the peer's
snd_nxt be in window again (which may be never).

To comply with RFC 7323, the receiver MUST honor any segment that
would have been in window for any ACK sent by the receiver and, when
window scaling is in effect, SHOULD track the maximum window sequence
number it has advertised. This patch tracks that maximum window
sequence number throughout the connection and uses it in
tcp_sequence() when deciding whether a segment is acceptable.
Acceptability of data is not changed.

Fixes: 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze")
Fixes: b650d953cd39 ("tcp: enforce receive buffer memory limits by allowing the tcp window to shrink")
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
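Note for readers (below the "---", so not part of the commit message):
a minimal userspace sketch of the receiver-side check described above,
restricted to pure ACKs. struct rx_state, advertise(),
pure_ack_acceptable() and nomem_example() are hypothetical names used
only for illustration; the field names mirror tcp_sock and the numbers
are taken from tcp_rcv_wnd_shrink_nomem.pkt. It models the idea under
these assumptions and is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b", modelled on the kernel's after(). */
static inline bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical, reduced receiver state; field names mirror tcp_sock. */
struct rx_state {
	uint32_t rcv_nxt;      /* next sequence number expected */
	uint32_t rcv_wup;      /* rcv_nxt when the window was last advertised */
	uint32_t rcv_wnd;      /* currently advertised window in bytes */
	uint32_t rcv_mwnd_seq; /* highest right edge ever advertised */
};

/* Advertise wnd bytes; the maximum right edge only ever moves forward. */
static inline void advertise(struct rx_state *rx, uint32_t wnd)
{
	uint32_t edge;

	rx->rcv_wup = rx->rcv_nxt;
	rx->rcv_wnd = wnd;
	edge = rx->rcv_wup + rx->rcv_wnd;
	if (seq_after(edge, rx->rcv_mwnd_seq))
		rx->rcv_mwnd_seq = edge;
}

/* Right edge for the check, clamped to rcv_nxt if rcv_nxt is already past it. */
static inline uint32_t right_edge(const struct rx_state *rx, bool use_max)
{
	uint32_t edge = use_max ? rx->rcv_mwnd_seq : rx->rcv_wup + rx->rcv_wnd;
	int32_t win = (int32_t)(edge - rx->rcv_nxt);

	return rx->rcv_nxt + (win < 0 ? 0 : (uint32_t)win);
}

/* Zero-length segment: reject if older than rcv_wup or beyond the right edge. */
static inline bool pure_ack_acceptable(const struct rx_state *rx, uint32_t seq,
				       bool use_max)
{
	if (seq_after(rx->rcv_wup, seq))
		return false;
	return !seq_after(seq, right_edge(rx, use_max));
}

/* Scenario from the nomem test: 16 KiB offered at 33530, then a zero window. */
static inline void nomem_example(void)
{
	struct rx_state rx = { 33530, 33530, 0, 33530 };

	advertise(&rx, 16 * 1024);   /* right edge 49914 */
	advertise(&rx, 0);           /* retraction: current edge back at 33530 */
	assert(rx.rcv_mwnd_seq == 49914);
	assert(!pure_ack_acceptable(&rx, 49914, false)); /* current window only */
	assert(pure_ack_acceptable(&rx, 49914, true));   /* maximum window */
	assert(!pure_ack_acceptable(&rx, 49915, true));  /* beyond any offer */
}
```

Calling nomem_example() from a test program exercises the case the
patch is about: the peer's ACK at 49914 is rejected when only the
current (zero) window is considered, and accepted once the maximum
advertised right edge is used.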
 Documentation/networking/net_cachelines/tcp_sock.rst       |  1 +
 include/linux/tcp.h                                        |  1 +
 include/net/tcp.h                                          | 14 ++++++++++++++
 net/ipv4/tcp_fastopen.c                                    |  1 +
 net/ipv4/tcp_input.c                                       |  6 ++++--
 net/ipv4/tcp_minisocks.c                                   |  1 +
 net/ipv4/tcp_output.c                                      | 12 ++++++++++++
 .../selftests/net/packetdrill/tcp_rcv_big_endseq.pkt       |  2 +-
 8 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 563daea10d6c5c074f004cb1b8574f5392157abb..fecf61166a54ee2f64bcef5312c81dcc4aa9a124 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -121,6 +121,7 @@ u64                           delivered_mstamp        read_write
 u32                           rate_delivered                              read_mostly         tcp_rate_gen
 u32                           rate_interval_us                            read_mostly         rate_delivered,rate_app_limited
 u32                           rcv_wnd                 read_write          read_mostly         tcp_select_window,tcp_receive_window,tcp_fast_path_check
+u32                           rcv_mwnd_seq            read_write                              tcp_select_window
 u32                           write_seq               read_write                              tcp_rate_check_app_limited,tcp_write_queue_empty,tcp_skb_entail,forced_push,tcp_mark_push
 u32                           notsent_lowat           read_mostly                             tcp_stream_memory_free
 u32                           pushed_seq              read_write                              tcp_mark_push,forced_push
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f72eef31fa23cc584f2f0cefacdc35cae43aa52d..5a943b12d4c050a980b4cf81635b9fa2f0036283 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -271,6 +271,7 @@ struct tcp_sock {
 	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
 	u32	mdev_us;	/* medium deviation			*/
 	u32	rtt_seq;	/* sequence number to update rttvar	*/
+	u32	rcv_mwnd_seq; /* Maximum window sequence number (RFC 7323, section 2.4) */
 	u64	tcp_wstamp_ns;	/* departure time for next sent data packet */
 	u64	accecn_opt_tstamp;	/* Last AccECN option sent timestamp */
 	struct list_head tsorted_sent_queue; /* time-sorted sent but un-SACKed skbs */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 40e72b9cb85f08714d3f458c0bd1402a5fb1eb4e..e1944d504823d5f8754d85bfbbf3c9630d2190ac 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -912,6 +912,20 @@ static inline u32 tcp_receive_window(const struct tcp_sock *tp)
 	return (u32) win;
 }
 
+/* Compute the maximum receive window we ever advertised.
+ * Rcv_nxt can be after the window if our peer pushes more data
+ * than the offered window.
+ */
+static inline u32 tcp_max_receive_window(const struct tcp_sock *tp)
+{
+	s32 win = tp->rcv_mwnd_seq - tp->rcv_nxt;
+
+	if (win < 0)
+		win = 0;
+	return (u32) win;
+}
+
+
 /* Choose a new window, without checks for shrinking, and without
  * scaling applied to the result.  The caller does these things
  * if necessary.  This is a "raw" window selection.
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index b30090cff3cf7d925dc46694860abd3ca5516d70..f034ef6e3e7b54bf73c77fd2bf1d3090c75dbfc6 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -377,6 +377,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
 
 	tcp_rsk(req)->rcv_nxt = tp->rcv_nxt;
 	tp->rcv_wup = tp->rcv_nxt;
+	tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
 	/* tcp_conn_request() is sending the SYNACK,
 	 * and queues the child into listener accept queue.
 	 */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e7b41abb82aad33d8cab4fcfa989cc4771149b41..af9dd51256b01fd31d9e390d69dcb1d1700daf1b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4865,8 +4865,8 @@ static enum skb_drop_reason tcp_sequence(const struct sock *sk,
 	if (before(end_seq, tp->rcv_wup))
 		return SKB_DROP_REASON_TCP_OLD_SEQUENCE;
 
-	if (after(end_seq, tp->rcv_nxt + tcp_receive_window(tp))) {
-		if (after(seq, tp->rcv_nxt + tcp_receive_window(tp)))
+	if (after(end_seq, tp->rcv_nxt + tcp_max_receive_window(tp))) {
+		if (after(seq, tp->rcv_nxt + tcp_max_receive_window(tp)))
 			return SKB_DROP_REASON_TCP_INVALID_SEQUENCE;
 
 		/* Only accept this packet if receive queue is empty. */
@@ -6959,6 +6959,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		 */
 		WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1);
 		tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
+		tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
 
 		/* RFC1323: The window in SYN & SYN/ACK segments is
 		 * never scaled.
@@ -7071,6 +7072,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1);
 		WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
 		tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
+		tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
 
 		/* RFC1323: The window in SYN & SYN/ACK segments is
 		 * never scaled.
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index ec128865f5c029c971eb00cb9ee058b742efafd1..df95d8b6dce5c746e5e34545aa75a96080cc752d 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -604,6 +604,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	newtp->window_clamp = req->rsk_window_clamp;
 	newtp->rcv_ssthresh = req->rsk_rcv_wnd;
 	newtp->rcv_wnd = req->rsk_rcv_wnd;
+	newtp->rcv_mwnd_seq = newtp->rcv_wup + req->rsk_rcv_wnd;
 	newtp->rx_opt.wscale_ok = ireq->wscale_ok;
 	if (newtp->rx_opt.wscale_ok) {
 		newtp->rx_opt.snd_wscale = ireq->snd_wscale;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 326b58ff1118d02fc396753d56f210f9d3007c7f..50774443f6ae0ca83f360c7fc3239184a1523e1b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -274,6 +274,15 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
 }
 EXPORT_IPV6_MOD(tcp_select_initial_window);
 
+/* Check if we need to update the maximum window sequence number */
+static inline void tcp_update_max_wnd_seq(struct tcp_sock *tp)
+{
+	u32 wre = tp->rcv_wup + tp->rcv_wnd;
+
+	if (after(wre, tp->rcv_mwnd_seq))
+		tp->rcv_mwnd_seq = wre;
+}
+
 /* Chose a new window to advertise, update state in tcp_sock for the
  * socket, and return result with RFC1323 scaling applied.  The return
  * value can be stuffed directly into th->window for an outgoing
@@ -293,6 +302,7 @@ static u16 tcp_select_window(struct sock *sk)
 		tp->pred_flags = 0;
 		tp->rcv_wnd = 0;
 		tp->rcv_wup = tp->rcv_nxt;
+		tcp_update_max_wnd_seq(tp);
 		return 0;
 	}
 
@@ -316,6 +326,7 @@ static u16 tcp_select_window(struct sock *sk)
 
 	tp->rcv_wnd = new_win;
 	tp->rcv_wup = tp->rcv_nxt;
+	tcp_update_max_wnd_seq(tp);
 
 	/* Make sure we do not exceed the maximum possible
 	 * scaled window.
@@ -4169,6 +4180,7 @@ static void tcp_connect_init(struct sock *sk)
 	else
 		tp->rcv_tstamp = tcp_jiffies32;
 	tp->rcv_wup = tp->rcv_nxt;
+	tp->rcv_mwnd_seq = tp->rcv_nxt + tp->rcv_wnd;
 	WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
 
 	inet_csk(sk)->icsk_rto = tcp_timeout_init(sk);
diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
index 3848b419e68c3fc895ad736d06373fc32f3691c1..1a86ee5093696deb316c532ca8f7de2bbf5cd8ea 100644
--- a/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
@@ -36,7 +36,7 @@
 
   +0 read(4, ..., 100000) = 4000
 
-// If queue is empty, accept a packet even if its end_seq is above wup + rcv_wnd
+// If queue is empty, accept a packet even if its end_seq is above rcv_mwnd_seq
   +0 < P. 4001:54001(50000) ack 1 win 257
   +0 > .  1:1(0) ack 54001 win 0
 

-- 
2.52.0




* [PATCH RFC net-next 2/4] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
  2026-02-19 23:55 [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
  2026-02-19 23:55 ` [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
@ 2026-02-19 23:55 ` Simon Baatz via B4 Relay
  2026-02-19 23:55 ` [PATCH RFC net-next 3/4] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt Simon Baatz via B4 Relay
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-02-19 23:55 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Stefano Brivio, Jon Maloy, Jason Xing,
	mfreemon, Shuah Khan
  Cc: Christian Ebner, netdev, linux-doc, linux-kernel, linux-kselftest,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

This test verifies the sequence number checks using the maximum
advertised window sequence number. Cases:

1. The window is reduced to zero because of memory

2. The window grows again but still does not reach the originally
   advertised window

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
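Note for readers (not part of the commit message): a small sketch of
why the 23 segments of 1023 bytes in this test keep "win 16" while the
right edge creeps from 26385 to 49914. It assumes that, on the
no-shrink path, the remaining window up to the previously advertised
right edge is rounded up to a multiple of the scale factor, and that
free buffer space never proposes a larger window. The helper names
no_shrink_win_field() and right_edge_after() are hypothetical.

```c
#include <stdint.h>

/* Window field of a no-shrink ACK: bytes left up to the previous right
 * edge, rounded up to a multiple of the scale factor (1 << wscale).
 * Assumes rcv_nxt has not moved past prev_edge.
 */
static inline uint16_t no_shrink_win_field(uint32_t rcv_nxt, uint32_t prev_edge,
					   unsigned int wscale)
{
	uint32_t unit = 1u << wscale;
	uint32_t left = prev_edge - rcv_nxt;

	return (uint16_t)((left + unit - 1) >> wscale);
}

/* Receive `count` in-order segments of `len` bytes, sending one ACK per
 * segment; returns the right edge advertised by the last ACK.
 */
static inline uint32_t right_edge_after(uint32_t rcv_nxt, uint32_t edge,
					unsigned int wscale, uint32_t len,
					unsigned int count)
{
	for (unsigned int i = 0; i < count; i++) {
		uint32_t win;

		rcv_nxt += len;
		win = (uint32_t)no_shrink_win_field(rcv_nxt, edge, wscale) << wscale;
		edge = rcv_nxt + win;
	}
	return edge;
}
```

With the values from the script, right_edge_after(10001, 10001 + 16 * 1024,
10, 1023, 23) gives 49914, i.e. the edge grows by 23 * 1023 bytes although
the application never reads. With 1024-byte segments the same model keeps
the edge at 26385, because the window field can drop by exactly one unit
per segment.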
 .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt   | 109 +++++++++++++++++++++
 1 file changed, 109 insertions(+)

diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt
new file mode 100644
index 0000000000000000000000000000000000000000..cd761300d02df449ff68cd6ff6f3b8ac62d5f27b
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh
+sysctl -q net.ipv4.tcp_rmem="4096 32768 $((32*1024*1024))"`
+
+    0 `nstat -n`
+
+// Establish a connection.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
+   +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 10>
+   +0 < . 1:1(0) ack 1 win 257
+
+  +0 accept(3, ..., ...) = 4
+
+  +0 < P. 1:10001(10000) ack 1 win 257
+   * > .  1:1(0) ack 10001 win 16
+
+// Segments smaller than the window scale factor do not allow the advertised window to be reduced
+  +0 < P. 10001:11024(1023) ack 1 win 257
+  * > .  1:1(0) ack 11024 win 16
+  +0 < P. 11024:12047(1023) ack 1 win 257
+  * > .  1:1(0) ack 12047 win 16
+  +0 < P. 12047:13070(1023) ack 1 win 257
+  * > .  1:1(0) ack 13070 win 16
+  +0 < P. 13070:14093(1023) ack 1 win 257
+  * > .  1:1(0) ack 14093 win 16
+  +0 < P. 14093:15116(1023) ack 1 win 257
+  * > .  1:1(0) ack 15116 win 16
+  +0 < P. 15116:16139(1023) ack 1 win 257
+  * > .  1:1(0) ack 16139 win 16
+  +0 < P. 16139:17162(1023) ack 1 win 257
+  * > .  1:1(0) ack 17162 win 16
+  +0 < P. 17162:18185(1023) ack 1 win 257
+  * > .  1:1(0) ack 18185 win 16
+  +0 < P. 18185:19208(1023) ack 1 win 257
+  * > .  1:1(0) ack 19208 win 16
+  +0 < P. 19208:20231(1023) ack 1 win 257
+  * > .  1:1(0) ack 20231 win 16
+  +0 < P. 20231:21254(1023) ack 1 win 257
+  * > .  1:1(0) ack 21254 win 16
+  +0 < P. 21254:22277(1023) ack 1 win 257
+  * > .  1:1(0) ack 22277 win 16
+  +0 < P. 22277:23300(1023) ack 1 win 257
+  * > .  1:1(0) ack 23300 win 16
+  +0 < P. 23300:24323(1023) ack 1 win 257
+  * > .  1:1(0) ack 24323 win 16
+  +0 < P. 24323:25346(1023) ack 1 win 257
+  * > .  1:1(0) ack 25346 win 16
+  +0 < P. 25346:26369(1023) ack 1 win 257
+  * > .  1:1(0) ack 26369 win 16
+  +0 < P. 26369:27392(1023) ack 1 win 257
+  * > .  1:1(0) ack 27392 win 16
+  +0 < P. 27392:28415(1023) ack 1 win 257
+  * > .  1:1(0) ack 28415 win 16
+  +0 < P. 28415:29438(1023) ack 1 win 257
+  * > .  1:1(0) ack 29438 win 16
+  +0 < P. 29438:30461(1023) ack 1 win 257
+  * > .  1:1(0) ack 30461 win 16
+  +0 < P. 30461:31484(1023) ack 1 win 257
+  * > .  1:1(0) ack 31484 win 16
+  +0 < P. 31484:32507(1023) ack 1 win 257
+  * > .  1:1(0) ack 32507 win 16
+  +0 < P. 32507:33530(1023) ack 1 win 257
+  * > .  1:1(0) ack 33530 win 16
+
+// rcv buffer out of memory
+  +0 < P. 33530:49914(16384) ack 1 win 257
+  +0 > .  1:1(0) ack 33530 win 0
+
+// max window seq advertised 33530 + 16*1024 = 49914
+
+  +0 write(4, ..., 1000) = 1000
+  +0 > P.  1:1001(1000) ack 33530 win 0
+
+// LINUX_MIB_BEYOND_WINDOW: segment is beyond the max window sequence
+  +0 < . 49915:49915(0) ack 1001 win 257
+  +0 > . 1001:1001(0) ack 33530 win 0
+
+  +0 < . 49914:49914(0) ack 1001 win 257
+
+  +0 %{
+assert tcpi_bytes_acked == 1000, tcpi_bytes_acked
+}%
+
+  +0 read(4, ..., 10000) = 10000
+  +0 > .  1001:1001(0) ack 33530 win 9
+
+  +0 write(4, ..., 1000) = 1000
+  +0 > P.  1001:2001(1000) ack 33530 win 9
+
+  // advertised right edge is 33530 + 9*1024 = 42746, but we still need to regard our maximum offer 49914 as in window
+  +0 < . 49914:49914(0) ack 2001 win 257
+
+  +0 %{
+assert tcpi_bytes_acked == 2000, tcpi_bytes_acked
+}%
+
+  +0 < P. 33530:42746(9216) ack 2001 win 257
+   * > .  2001:2001(0) ack 42746 win 0
+
+// Check LINUX_MIB_BEYOND_WINDOW has been incremented once
+  +0 `nstat | grep TcpExtBeyondWindow | grep -q " 1 "`

-- 
2.52.0




* [PATCH RFC net-next 3/4] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
  2026-02-19 23:55 [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
  2026-02-19 23:55 ` [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
  2026-02-19 23:55 ` [PATCH RFC net-next 2/4] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt Simon Baatz via B4 Relay
@ 2026-02-19 23:55 ` Simon Baatz via B4 Relay
  2026-02-19 23:55 ` [PATCH RFC net-next 4/4] selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt Simon Baatz via B4 Relay
  2026-02-20  8:58 ` [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Eric Dumazet
  4 siblings, 0 replies; 13+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-02-19 23:55 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Stefano Brivio, Jon Maloy, Jason Xing,
	mfreemon, Shuah Khan
  Cc: Christian Ebner, netdev, linux-doc, linux-kernel, linux-kselftest,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

This test verifies the sequence number checks using the maximum
advertised window sequence number when net.ipv4.tcp_shrink_window
is enabled.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
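Note for readers (not part of the commit message): the right-edge
arithmetic used in the comments of this test, as a small sketch. It
derives each edge from the ACK number and the scaled window field and
keeps the wrap-safe maximum. struct adv_ack, ack_right_edge() and
max_right_edge() are hypothetical names, and the unscaled SYN-ACK
window is deliberately ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/* One ACK as seen on the wire: ACK number and raw 16-bit window field. */
struct adv_ack {
	uint32_t ack_seq;
	uint16_t win;
};

/* Right edge offered by a single ACK: ack + (win << wscale). */
static inline uint32_t ack_right_edge(struct adv_ack a, unsigned int wscale)
{
	return a.ack_seq + ((uint32_t)a.win << wscale);
}

/* Highest right edge over n >= 1 ACKs, compared in sequence space so that
 * a wrap of the 32-bit sequence number does not confuse the maximum.
 */
static inline uint32_t max_right_edge(const struct adv_ack *acks, size_t n,
				      unsigned int wscale)
{
	uint32_t max = ack_right_edge(acks[0], wscale);

	for (size_t i = 1; i < n; i++) {
		uint32_t edge = ack_right_edge(acks[i], wscale);

		if ((int32_t)(edge - max) > 0)
			max = edge;
	}
	return max;
}
```

For the two ACKs in the script, {10001, win 15} and {11024, win 13} with
wscale 10, the edges are 25361 and 24336, so the maximum stays at 25361
and the 14337-byte segment starting at 11024 ends exactly on it.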
 .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt | 35 ++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt
new file mode 100644
index 0000000000000000000000000000000000000000..8f9b689d3f6d49c3bbc26cca8e408b905af129cb
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh
+sysctl -q net.ipv4.tcp_shrink_window=1
+sysctl -q net.ipv4.tcp_rmem="4096 32768 $((32*1024*1024))"`
+
+   0 `nstat -n`
+
+// Establish a connection.
+  +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+  +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+  +0 bind(3, ..., ...) = 0
+  +0 listen(3, 1) = 0
+
+  +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
+  +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 10>
+  +0 < . 1:1(0) ack 1 win 257
+
+  +0 accept(3, ..., ...) = 4
+
+  +0 < P. 1:10001(10000) ack 1 win 257
+  * > .  1:1(0) ack 10001 win 15
+
+  +0 < P. 10001:11024(1023) ack 1 win 257
+  * > .  1:1(0) ack 11024 win 13
+
+// Max window seq advertised 10001 + 15*1024 = 25361, last advertised: 11024 + 13*1024 = 24336
+// Segment using the max window is accepted
+  +0 < P. 11024:25361(14337) ack 1 win 257
+  * > .  1:1(0) ack 25361 win 0
+
+// Check LINUX_MIB_BEYOND_WINDOW has not been incremented
+  +0 `nstat -z | grep TcpExtBeyondWindow | grep -q " 0 "`

-- 
2.52.0




* [PATCH RFC net-next 4/4] selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt
  2026-02-19 23:55 [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
                   ` (2 preceding siblings ...)
  2026-02-19 23:55 ` [PATCH RFC net-next 3/4] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt Simon Baatz via B4 Relay
@ 2026-02-19 23:55 ` Simon Baatz via B4 Relay
  2026-02-20  8:58 ` [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Eric Dumazet
  4 siblings, 0 replies; 13+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-02-19 23:55 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Stefano Brivio, Jon Maloy, Jason Xing,
	mfreemon, Shuah Khan
  Cc: Christian Ebner, netdev, linux-doc, linux-kernel, linux-kselftest,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

This test verifies the sequence number checks using the maximum
advertised window sequence number when we accept a packet going beyond
any window that was ever advertised (i.e. rcv_nxt advances beyond
rcv_mwnd_seq).

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
 .../packetdrill/tcp_rcv_toobig_back_to_back.pkt    | 27 ++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_back_to_back.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_back_to_back.pkt
new file mode 100644
index 0000000000000000000000000000000000000000..4d4c33d248948d3dfaf9b0c5b243ed27321e9b10
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_back_to_back.pkt
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh`
+
+// Establish a connection.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
+   +0 > S. 0:0(0) ack 1 win 18980 <mss 1460,nop,wscale 0>
+  +.1 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+// A too big packet is accepted if the receive queue is empty
+   +0 < P. 1:20001(20000) ack 1 win 257
+// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
+   +0 < R. 20001:20001(0) ack 1 win 257
+//    * > .  1:1(0) ack 20001 win 18000
+
+  +.1 %{ assert tcpi_state == TCP_CLOSE, tcpi_state }%
+

-- 
2.52.0




* Re: [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling
  2026-02-19 23:55 [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
                   ` (3 preceding siblings ...)
  2026-02-19 23:55 ` [PATCH RFC net-next 4/4] selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt Simon Baatz via B4 Relay
@ 2026-02-20  8:58 ` Eric Dumazet
  2026-02-23  0:07   ` Simon Baatz
  4 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2026-02-20  8:58 UTC
  To: gmbnomis
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Stefano Brivio, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Christian Ebner, netdev, linux-doc, linux-kernel,
	linux-kselftest

On Fri, Feb 20, 2026 at 12:56 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> Hi,
>
> this series implements the receiver-side requirements for TCP window
> retraction as specified in RFC 7323 and adds packetdrill tests to
> cover the new behavior.
>
> It addresses a regression with somewhat complex causes; see my message
> "Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
> (https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).
>
> Please see the first patch for background and implementation details.
>
> This is an RFC because a few open questions remain:
>
> - Placement of the new rcv_mwnd_seq field in tcp_sock:
>
>   rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
>   tcp_select_window. However, rcv_wup is documented as RX read_write
>   only (even though it is updated in tcp_select_window), and rcv_wnd
>   is TX read_write / RX read_mostly.
>
>   rcv_mwnd_seq is only updated in tcp_select_window and, as far as I
>   can tell, is not used on the RX fast path.
>
>   If I understand the placement rules correctly, this means that
>   rcv_mwnd_seq, rcv_wup, and rcv_wnd end up in different cacheline
>   groups, which feels odd. Guidance on where rcv_mwnd_seq should live
>   would be appreciated.
>
> - In tcp_minisocks.c, it is not clear to me whether we should change
>   "tcptw->tw_rcv_wnd = tcp_receive_window(tp)" to
>   "tcptw->tw_rcv_wnd = tcp_max_receive_window(tp)". I could not find a
>   case where this makes a practical difference and have left the
>   existing behavior unchanged.
>
> - Packetdrill tests: Some of these seem rather brittle to me; I
>   included them mostly to document what I have tested. Suggestions
>   for making them more robust are welcome.
>
> - MPTCP seems to modify tp->rcv_wnd of subflows. I haven't looked at
>   this, since I wanted to get feedback on the overall approach first.
>
> - Although this series addresses a regression triggered by commit
>   1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks"), the underlying
>   problem is shrinking the window. Thus I added "Fixes" headers for
>   the commits that introduced window shrinking.
>
> I would appreciate feedback on the overall approach and on these
> questions.
>

Hi Simon, thanks for the clean series.

I would guess you use some AI? This is fine, just curious.

Can you add more tests, in memory stress situations?

Like:

A receiver grew the RWIN over time up to 8 MB.

Then the application (or the kernel under stress) set SO_RCVBUF to 16K.

I want to make sure the socket won't accept packets to fill the prior
window and consume 8MB.

8MB seems fine, unless the host has 100,000 sockets in the same situation.

Thanks

> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---
> Simon Baatz (4):
>       tcp: implement RFC 7323 window retraction receiver requirements
>       selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
>       selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
>       selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt
>
>  .../networking/net_cachelines/tcp_sock.rst         |   1 +
>  include/linux/tcp.h                                |   1 +
>  include/net/tcp.h                                  |  14 +++
>  net/ipv4/tcp_fastopen.c                            |   1 +
>  net/ipv4/tcp_input.c                               |   6 +-
>  net/ipv4/tcp_minisocks.c                           |   1 +
>  net/ipv4/tcp_output.c                              |  12 +++
>  .../net/packetdrill/tcp_rcv_big_endseq.pkt         |   2 +-
>  .../packetdrill/tcp_rcv_toobig_back_to_back.pkt    |  27 +++++
>  .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt |  35 +++++++
>  .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt   | 109 +++++++++++++++++++++
>  11 files changed, 206 insertions(+), 3 deletions(-)
> ---
> base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
> change-id: 20260220-tcp_rfc7323_retract_wnd_rfc-c8a2d2baebde
>
> Best regards,
> --
> Simon Baatz <gmbnomis@gmail.com>
>
>


* Re: [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling
  2026-02-20  8:58 ` [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Eric Dumazet
@ 2026-02-23  0:07   ` Simon Baatz
  2026-02-24  9:22     ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Simon Baatz @ 2026-02-23  0:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Stefano Brivio, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Christian Ebner, netdev, linux-doc, linux-kernel,
	linux-kselftest

Hi Eric,

On Fri, Feb 20, 2026 at 09:58:00AM +0100, Eric Dumazet wrote:
> Hi Simon, thanks for the clean series.
> 
> I would guess you use some AI ? This is fine, just curious.

Thank you!  Yes, I’ve found AI helpful for getting familiar with a
new code base.  I also use it to refine or clean up the wording of
bigger commit messages.  Code generation works quite well for quick,
throw‑away code (like reproducers).
 
> Can you add more tests, in memory stress situations ?
> 
> Like :
> 
> A receiver grew the RWIN over time up to 8 MB.
> 
> Then the application (or the kernel under stress) set SO_RCVBUF to 16K.
> 
> I want to make sure the socket won't accept packets to fill the prior
> window and consume 8MB

I suspect generating 8 MB worth of RX data in packetdrill won't be
fun (unless there’s a trick I’m missing).  And using regular TCP
sockets on both ends would probably be rather uninteresting (no
packets sent once RWIN = 0)

It might be more practical to extend one of the tests to create two
situations in packetdrill:

1. Zero window:  0 == RWIN < 2 * squeezed SO_RCVBUF < tracked max. RWIN < 2 * original SO_RCVBUF
2. Small window: 0  < RWIN < 2 * squeezed SO_RCVBUF < tracked max. RWIN < 2 * original SO_RCVBUF

If these limits are sufficiently distinct, we could probe tcp_sequence() and
tcp_data_queue() paths in detail using:
  
* pure ACK or data packet
* in-order or out-of-order
* within, partially within, or beyond (max) window

If we can show that we can't use more memory than expected for the
squeezed buffer, then the original max window size shouldn’t really
matter.

wdyt?

- Simon

-- 
Simon Baatz <gmbnomis@gmail.com>


* Re: [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements
  2026-02-19 23:55 ` [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
@ 2026-02-23 22:26   ` Stefano Brivio
  2026-02-24 18:07     ` Simon Baatz
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2026-02-23 22:26 UTC (permalink / raw)
  To: Simon Baatz via B4 Relay
  Cc: gmbnomis, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Shuah Khan, David Ahern, Jon Maloy, Jason Xing,
	mfreemon, Shuah Khan, Christian Ebner, netdev, linux-doc,
	linux-kernel, linux-kselftest

Hi Simon,

It all makes sense to me at a quick look, I have just some nits and one
more substantial worry, below:

On Fri, 20 Feb 2026 00:55:14 +0100
Simon Baatz via B4 Relay <devnull+gmbnomis.gmail.com@kernel.org> wrote:

> From: Simon Baatz <gmbnomis@gmail.com>
> 
> By default, the Linux TCP implementation does not shrink the
> advertised window (RFC 7323 calls this "window retraction") with the
> following exceptions:
> 
> - When an incoming segment cannot be added due to the receive buffer
>   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
>   handling of extreme memory squeeze") a zero window will be
>   advertised in this case. It turns out that reaching the required
>   "memory pressure" is very easy when window scaling is in use. In the
>   simplest case, sending a sufficient number of segments smaller than
>   the scale factor to a receiver that does not read data is enough.
> 
>   Since commit 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") this
>   happens much earlier than before, leading to regressions (the test
>   suite of the Valkey project does not pass because of a TCP
>   connection that is no longer bi-directional).

Ouch. By the way, that same commit helped us unveil an issue (at least
in the sense of RFC 9293, 3.8.6) we fixed in passt:

  https://passt.top/passt/commit/?id=8d2f8c4d0fb58d6b2011e614bc7d7ff9dab406b3

> - Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
>   allowing the tcp window to shrink") addressed the "eating memory"
>   problem by introducing a sysctl knob that allows shrinking the
>   window before running out of memory.
> 
> However, RFC 7323 does not only state that shrinking the window is
> necessary in some cases, it also formulates requirements for TCP
> implementations when doing so (Section 2.4).
> 
> This commit addresses the receiver-side requirements: After retracting
> the window, the peer may have a snd_nxt that lies within a previously
> advertised window but is now beyond the retracted window. This means
> that all incoming segments (including pure ACKs) will be rejected
> until the application happens to read enough data to let the peer's
> snd_nxt be in window again (which may be never).
> 
> To comply with RFC 7323, the receiver MUST honor any segment that
> would have been in window for any ACK sent by the receiver and, when
> window scaling is in effect, SHOULD track the maximum window sequence
> number it has advertised. This patch tracks that maximum window
> sequence number throughout the connection and uses it in
> tcp_sequence() when deciding whether a segment is acceptable.
> Acceptability of data is not changed.
> 
> Fixes: 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze")
> Fixes: b650d953cd39 ("tcp: enforce receive buffer memory limits by allowing the tcp window to shrink")
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---
>  Documentation/networking/net_cachelines/tcp_sock.rst       |  1 +
>  include/linux/tcp.h                                        |  1 +
>  include/net/tcp.h                                          | 14 ++++++++++++++
>  net/ipv4/tcp_fastopen.c                                    |  1 +
>  net/ipv4/tcp_input.c                                       |  6 ++++--
>  net/ipv4/tcp_minisocks.c                                   |  1 +
>  net/ipv4/tcp_output.c                                      | 12 ++++++++++++
>  .../selftests/net/packetdrill/tcp_rcv_big_endseq.pkt       |  2 +-
>  8 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
> index 563daea10d6c5c074f004cb1b8574f5392157abb..fecf61166a54ee2f64bcef5312c81dcc4aa9a124 100644
> --- a/Documentation/networking/net_cachelines/tcp_sock.rst
> +++ b/Documentation/networking/net_cachelines/tcp_sock.rst
> @@ -121,6 +121,7 @@ u64                           delivered_mstamp        read_write
>  u32                           rate_delivered                              read_mostly         tcp_rate_gen
>  u32                           rate_interval_us                            read_mostly         rate_delivered,rate_app_limited
>  u32                           rcv_wnd                 read_write          read_mostly         tcp_select_window,tcp_receive_window,tcp_fast_path_check
> +u32                           rcv_mwnd_seq            read_write                              tcp_select_window
>  u32                           write_seq               read_write                              tcp_rate_check_app_limited,tcp_write_queue_empty,tcp_skb_entail,forced_push,tcp_mark_push
>  u32                           notsent_lowat           read_mostly                             tcp_stream_memory_free
>  u32                           pushed_seq              read_write                              tcp_mark_push,forced_push
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index f72eef31fa23cc584f2f0cefacdc35cae43aa52d..5a943b12d4c050a980b4cf81635b9fa2f0036283 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -271,6 +271,7 @@ struct tcp_sock {
>  	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
>  	u32	mdev_us;	/* medium deviation			*/
>  	u32	rtt_seq;	/* sequence number to update rttvar	*/
> +	u32	rcv_mwnd_seq; /* Maximum window sequence number (RFC 7323, section 2.4) */

Nit: tab between ; and /* for consistency (I would personally prefer
the comment style as you see on 'highest_sack' but I don't think it's
enforced anymore).

Second nit: mentioning RFC 7323, section 2.4 could be a bit misleading
here because the relevant paragraph there covers a very specific case of
window retraction, caused by quantisation error from window scaling,
which is not the most common case here. I couldn't quickly find a better
reference though.

More importantly: do we need to restore this on a connection that's
being dumped and recreated using TCP_REPAIR, or will things still work
(even though sub-optimally) if we lose this value?

Other window values that *need* to be dumped and restored are currently
available via TCP_REPAIR_WINDOW socket option, and they are listed in
do_tcp_getsockopt(), net/ipv4/tcp.c:

		opt.snd_wl1	= tp->snd_wl1;
		opt.snd_wnd	= tp->snd_wnd;
		opt.max_window	= tp->max_window;
		opt.rcv_wnd	= tp->rcv_wnd;
		opt.rcv_wup	= tp->rcv_wup;

CRIU uses it to checkpoint and restore established connections, and
passt uses it to migrate them to a different host:

  https://criu.org/TCP_connection

  https://passt.top/passt/tree/tcp.c?id=02af38d4177550c086bae54246fc3aaa33ddc018#n3063

If it's strictly needed to preserve functionality, we would need to add
it to struct tcp_repair_window, notify CRIU maintainers (or send them a
patch), and add this in passt as well (I can take care of it).

Strictly speaking, in that case, this could be considered a breaking change
for userspace, but I don't see how to avoid it, so I'd just make sure
it doesn't impact users as TCP_REPAIR has just a couple of (known!)
projects relying on it.

An alternative would be to have a special, initial value representing
the fact that this value was lost, but it looks really annoying to not
be able to use a u32 for it.

Disregard all this if the correct value is not strictly needed for
functionality, of course. I haven't tested things (not yet, at least).

>  	u64	tcp_wstamp_ns;	/* departure time for next sent data packet */
>  	u64	accecn_opt_tstamp;	/* Last AccECN option sent timestamp */
>  	struct list_head tsorted_sent_queue; /* time-sorted sent but un-SACKed skbs */
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 40e72b9cb85f08714d3f458c0bd1402a5fb1eb4e..e1944d504823d5f8754d85bfbbf3c9630d2190ac 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -912,6 +912,20 @@ static inline u32 tcp_receive_window(const struct tcp_sock *tp)
>  	return (u32) win;
>  }
>  
> +/* Compute the maximum receive window we ever advertised.
> + * Rcv_nxt can be after the window if our peer push more data

s/push/pushes/

s/Rcv_nxt/rcv_nxt/ (useful for grepping)

> + * than the offered window.
> + */
> +static inline u32 tcp_max_receive_window(const struct tcp_sock *tp)
> +{
> +	s32 win = tp->rcv_mwnd_seq - tp->rcv_nxt;
> +
> +	if (win < 0)
> +		win = 0;

I must be missing something but... if the sequence is about to wrap,
we'll return 0 here. Is that intended?

Doing the subtraction unsigned would have looked more natural to me,
but I didn't really think it through.

> +	return (u32) win;

Kernel coding style doesn't usually include a space between cast and
identifier.

> +}
> +
> +
>  /* Choose a new window, without checks for shrinking, and without
>   * scaling applied to the result.  The caller does these things
>   * if necessary.  This is a "raw" window selection.
> diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> index b30090cff3cf7d925dc46694860abd3ca5516d70..f034ef6e3e7b54bf73c77fd2bf1d3090c75dbfc6 100644
> --- a/net/ipv4/tcp_fastopen.c
> +++ b/net/ipv4/tcp_fastopen.c
> @@ -377,6 +377,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
>  
>  	tcp_rsk(req)->rcv_nxt = tp->rcv_nxt;
>  	tp->rcv_wup = tp->rcv_nxt;
> +	tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
>  	/* tcp_conn_request() is sending the SYNACK,
>  	 * and queues the child into listener accept queue.
>  	 */
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index e7b41abb82aad33d8cab4fcfa989cc4771149b41..af9dd51256b01fd31d9e390d69dcb1d1700daf1b 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4865,8 +4865,8 @@ static enum skb_drop_reason tcp_sequence(const struct sock *sk,
>  	if (before(end_seq, tp->rcv_wup))
>  		return SKB_DROP_REASON_TCP_OLD_SEQUENCE;
>  
> -	if (after(end_seq, tp->rcv_nxt + tcp_receive_window(tp))) {
> -		if (after(seq, tp->rcv_nxt + tcp_receive_window(tp)))
> +	if (after(end_seq, tp->rcv_nxt + tcp_max_receive_window(tp))) {
> +		if (after(seq, tp->rcv_nxt + tcp_max_receive_window(tp)))
>  			return SKB_DROP_REASON_TCP_INVALID_SEQUENCE;
>  
>  		/* Only accept this packet if receive queue is empty. */
> @@ -6959,6 +6959,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
>  		 */
>  		WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1);
>  		tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
> +		tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
>  
>  		/* RFC1323: The window in SYN & SYN/ACK segments is
>  		 * never scaled.
> @@ -7071,6 +7072,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
>  		WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1);
>  		WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
>  		tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
> +		tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
>  
>  		/* RFC1323: The window in SYN & SYN/ACK segments is
>  		 * never scaled.
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index ec128865f5c029c971eb00cb9ee058b742efafd1..df95d8b6dce5c746e5e34545aa75a96080cc752d 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -604,6 +604,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
>  	newtp->window_clamp = req->rsk_window_clamp;
>  	newtp->rcv_ssthresh = req->rsk_rcv_wnd;
>  	newtp->rcv_wnd = req->rsk_rcv_wnd;
> +	newtp->rcv_mwnd_seq = newtp->rcv_wup + req->rsk_rcv_wnd;
>  	newtp->rx_opt.wscale_ok = ireq->wscale_ok;
>  	if (newtp->rx_opt.wscale_ok) {
>  		newtp->rx_opt.snd_wscale = ireq->snd_wscale;
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 326b58ff1118d02fc396753d56f210f9d3007c7f..50774443f6ae0ca83f360c7fc3239184a1523e1b 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -274,6 +274,15 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
>  }
>  EXPORT_IPV6_MOD(tcp_select_initial_window);
>  
> +/* Check if we need to update the maximum window sequence number */
> +static inline void tcp_update_max_wnd_seq(struct tcp_sock *tp)
> +{
> +	u32 wre = tp->rcv_wup + tp->rcv_wnd;
> +
> +	if (after(wre, tp->rcv_mwnd_seq))
> +		tp->rcv_mwnd_seq = wre;
> +}
> +
>  /* Chose a new window to advertise, update state in tcp_sock for the
>   * socket, and return result with RFC1323 scaling applied.  The return
>   * value can be stuffed directly into th->window for an outgoing
> @@ -293,6 +302,7 @@ static u16 tcp_select_window(struct sock *sk)
>  		tp->pred_flags = 0;
>  		tp->rcv_wnd = 0;
>  		tp->rcv_wup = tp->rcv_nxt;
> +		tcp_update_max_wnd_seq(tp);
>  		return 0;
>  	}
>  
> @@ -316,6 +326,7 @@ static u16 tcp_select_window(struct sock *sk)
>  
>  	tp->rcv_wnd = new_win;
>  	tp->rcv_wup = tp->rcv_nxt;
> +	tcp_update_max_wnd_seq(tp);
>  
>  	/* Make sure we do not exceed the maximum possible
>  	 * scaled window.
> @@ -4169,6 +4180,7 @@ static void tcp_connect_init(struct sock *sk)
>  	else
>  		tp->rcv_tstamp = tcp_jiffies32;
>  	tp->rcv_wup = tp->rcv_nxt;
> +	tp->rcv_mwnd_seq = tp->rcv_nxt + tp->rcv_wnd;
>  	WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
>  
>  	inet_csk(sk)->icsk_rto = tcp_timeout_init(sk);
> diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
> index 3848b419e68c3fc895ad736d06373fc32f3691c1..1a86ee5093696deb316c532ca8f7de2bbf5cd8ea 100644
> --- a/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
> +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
> @@ -36,7 +36,7 @@
>  
>    +0 read(4, ..., 100000) = 4000
>  
> -// If queue is empty, accept a packet even if its end_seq is above wup + rcv_wnd
> +// If queue is empty, accept a packet even if its end_seq is above rcv_mwnd_seq
>    +0 < P. 4001:54001(50000) ack 1 win 257
>    +0 > .  1:1(0) ack 54001 win 0
>  
> 

-- 
Stefano



* Re: [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling
  2026-02-23  0:07   ` Simon Baatz
@ 2026-02-24  9:22     ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2026-02-24  9:22 UTC (permalink / raw)
  To: Simon Baatz
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Stefano Brivio, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Christian Ebner, netdev, linux-doc, linux-kernel,
	linux-kselftest

On Mon, Feb 23, 2026 at 1:07 AM Simon Baatz <gmbnomis@gmail.com> wrote:
>
> Hi Eric,
>
> On Fri, Feb 20, 2026 at 09:58:00AM +0100, Eric Dumazet wrote:
> > Hi Simon, thanks for the clean series.
> >
> > I would guess you use some AI ? This is fine, just curious.
>
> Thank you!  Yes, I’ve found AI helpful for getting familiar with a
> new code base.  I also use it to refine or clean up the wording of
> bigger commit messages.  Code generation works quite well for quick,
> throw‑away code (like reproducers).
>
> > Can you add more tests, in memory stress situations ?
> >
> > Like :
> >
> > A receiver grew the RWIN over time up to 8 MB.
> >
> > Then the application (or the kernel under stress) set SO_RCVBUF to 16K.
> >
> > I want to make sure the socket won't accept packets to fill the prior
> > window and consume 8MB
>
> I suspect generating 8 MB worth of RX data in packetdrill won't be
> fun (unless there’s a trick I’m missing).  And using regular TCP
> sockets on both ends would probably be rather uninteresting (no
> packets sent once RWIN = 0)
>

8MB was only to show my point.

A packetdrill test reaching 1MB should be doable.


* Re: [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements
  2026-02-23 22:26   ` Stefano Brivio
@ 2026-02-24 18:07     ` Simon Baatz
  2026-02-25 21:33       ` Stefano Brivio
  0 siblings, 1 reply; 13+ messages in thread
From: Simon Baatz @ 2026-02-24 18:07 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Simon Baatz via B4 Relay, Eric Dumazet, Neal Cardwell,
	Kuniyuki Iwashima, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, David Ahern, Jon Maloy,
	Jason Xing, mfreemon, Shuah Khan, Christian Ebner, netdev,
	linux-doc, linux-kernel, linux-kselftest

Hi Stefano,

On Mon, Feb 23, 2026 at 11:26:40PM +0100, Stefano Brivio wrote:
> Hi Simon,
> 
> It all makes sense to me at a quick look, I have just some nits and one
> more substantial worry, below:
> 
> On Fri, 20 Feb 2026 00:55:14 +0100
> Simon Baatz via B4 Relay <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> 
> > From: Simon Baatz <gmbnomis@gmail.com>
> > 
> > By default, the Linux TCP implementation does not shrink the
> > advertised window (RFC 7323 calls this "window retraction") with the
> > following exceptions:
> > 
> > - When an incoming segment cannot be added due to the receive buffer
> >   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
> >   handling of extreme memory squeeze") a zero window will be
> >   advertised in this case. It turns out that reaching the required
> >   "memory pressure" is very easy when window scaling is in use. In the
> >   simplest case, sending a sufficient number of segments smaller than
> >   the scale factor to a receiver that does not read data is enough.
> > 
> >   Since commit 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") this
> >   happens much earlier than before, leading to regressions (the test
> >   suite of the Valkey project does not pass because of a TCP
> >   connection that is no longer bi-directional).
> 
> Ouch. By the way, that same commit helped us unveil an issue (at least
> in the sense of RFC 9293, 3.8.6) we fixed in passt:
> 
>   https://passt.top/passt/commit/?id=8d2f8c4d0fb58d6b2011e614bc7d7ff9dab406b3

This looks concerning: It seems as if just filling the advertised
window triggered the out of memory condition(?).  Am I right in
assuming that this happened with the original 1d2fbaad7cd8, not the
relaxed version of tcp_can_ingest() from f017c1f768b?

> 
> > - Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
> >   allowing the tcp window to shrink") addressed the "eating memory"
> >   problem by introducing a sysctl knob that allows shrinking the
> >   window before running out of memory.
> > 
> > However, RFC 7323 does not only state that shrinking the window is
> > necessary in some cases, it also formulates requirements for TCP
> > implementations when doing so (Section 2.4).
> > 
> > This commit addresses the receiver-side requirements: After retracting
> > the window, the peer may have a snd_nxt that lies within a previously
> > advertised window but is now beyond the retracted window. This means
> > that all incoming segments (including pure ACKs) will be rejected
> > until the application happens to read enough data to let the peer's
> > snd_nxt be in window again (which may be never).
> > 
> > To comply with RFC 7323, the receiver MUST honor any segment that
> > would have been in window for any ACK sent by the receiver and, when
> > window scaling is in effect, SHOULD track the maximum window sequence
> > number it has advertised. This patch tracks that maximum window
> > sequence number throughout the connection and uses it in
> > tcp_sequence() when deciding whether a segment is acceptable.
> > Acceptability of data is not changed.
> > 
> > Fixes: 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze")
> > Fixes: b650d953cd39 ("tcp: enforce receive buffer memory limits by allowing the tcp window to shrink")
> > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > ---
> >  Documentation/networking/net_cachelines/tcp_sock.rst       |  1 +
> >  include/linux/tcp.h                                        |  1 +
> >  include/net/tcp.h                                          | 14 ++++++++++++++
> >  net/ipv4/tcp_fastopen.c                                    |  1 +
> >  net/ipv4/tcp_input.c                                       |  6 ++++--
> >  net/ipv4/tcp_minisocks.c                                   |  1 +
> >  net/ipv4/tcp_output.c                                      | 12 ++++++++++++
> >  .../selftests/net/packetdrill/tcp_rcv_big_endseq.pkt       |  2 +-
> >  8 files changed, 35 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
> > index 563daea10d6c5c074f004cb1b8574f5392157abb..fecf61166a54ee2f64bcef5312c81dcc4aa9a124 100644
> > --- a/Documentation/networking/net_cachelines/tcp_sock.rst
> > +++ b/Documentation/networking/net_cachelines/tcp_sock.rst
> > @@ -121,6 +121,7 @@ u64                           delivered_mstamp        read_write
> >  u32                           rate_delivered                              read_mostly         tcp_rate_gen
> >  u32                           rate_interval_us                            read_mostly         rate_delivered,rate_app_limited
> >  u32                           rcv_wnd                 read_write          read_mostly         tcp_select_window,tcp_receive_window,tcp_fast_path_check
> > +u32                           rcv_mwnd_seq            read_write                              tcp_select_window
> >  u32                           write_seq               read_write                              tcp_rate_check_app_limited,tcp_write_queue_empty,tcp_skb_entail,forced_push,tcp_mark_push
> >  u32                           notsent_lowat           read_mostly                             tcp_stream_memory_free
> >  u32                           pushed_seq              read_write                              tcp_mark_push,forced_push
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index f72eef31fa23cc584f2f0cefacdc35cae43aa52d..5a943b12d4c050a980b4cf81635b9fa2f0036283 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -271,6 +271,7 @@ struct tcp_sock {
> >  	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
> >  	u32	mdev_us;	/* medium deviation			*/
> >  	u32	rtt_seq;	/* sequence number to update rttvar	*/
> > +	u32	rcv_mwnd_seq; /* Maximum window sequence number (RFC 7323, section 2.4) */
> 
> Nit: tab between ; and /* for consistency (I would personally prefer
> the comment style as you see on 'highest_sack' but I don't think it's
> enforced anymore).

Thanks, I missed that.
 
> Second nit: mentioning RFC 7323, section 2.4 could be a bit misleading
> here because the relevant paragraph there covers a very specific case of
> window retraction, caused by quantisation error from window scaling,
> which is not the most common case here. I couldn't quickly find a better
> reference though.

I agree, but there is a part that, I think, is more generally
applicable:

2.4.  Addressing Window Retraction

   [ specific window retraction case introduction removed ]
   ... Implementations MUST ensure that they handle a shrinking
   window, as specified in Section 4.2.2.16 of [RFC1122].

   For the receiver, this implies that:

   1)  The receiver MUST honor, as in window, any segment that would
       have been in window for any <ACK> sent by the receiver.

   2)  When window scaling is in effect, the receiver SHOULD track the
       actual maximum window sequence number (which is likely to be
       greater than the window announced by the most recent <ACK>, if
       more than one segment has arrived since the application consumed
       any data in the receive buffer).

There is no "When window scaling is in effect," on the first
requirement. And it "happens" to be implementable by the second
requirement (with or without window scaling).

I think an improvement could be to refer to the receiver requirements
specifically here.
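
To illustrate how tracking the maximum window sequence number
(requirement 2) also satisfies requirement 1, here is a minimal
userspace sketch. It is a model under stated assumptions, not the
kernel code: struct rcv_state, advertise() and tracking_demo() are
hypothetical names, and seq_after() only mirrors the semantics of the
kernel's after() helper.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe "a is after b", modeled on the kernel's after(). */
static int seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical subset of the receive-side state in struct tcp_sock. */
struct rcv_state {
	uint32_t rcv_wup;	/* rcv_nxt at the last window update sent */
	uint32_t rcv_wnd;	/* currently advertised window */
	uint32_t rcv_mwnd_seq;	/* highest right window edge ever advertised */
};

/* Model of advertising a window: the right edge may move left
 * (retraction), but the tracked maximum only ever moves right.
 */
static void advertise(struct rcv_state *s, uint32_t rcv_nxt, uint32_t new_win)
{
	uint32_t wre;

	s->rcv_wnd = new_win;
	s->rcv_wup = rcv_nxt;
	wre = s->rcv_wup + s->rcv_wnd;
	if (seq_after(wre, s->rcv_mwnd_seq))
		s->rcv_mwnd_seq = wre;
}

void tracking_demo(void)
{
	struct rcv_state s = { .rcv_wup = 1000, .rcv_wnd = 5000,
			       .rcv_mwnd_seq = 6000 };

	/* Retract to a zero window at rcv_nxt = 3000. */
	advertise(&s, 3000, 0);
	assert(s.rcv_wup + s.rcv_wnd == 3000);
	assert(s.rcv_mwnd_seq == 6000);	/* maximum edge is remembered */

	/* Reopen the window past the old maximum. */
	advertise(&s, 3000, 8000);
	assert(s.rcv_mwnd_seq == 11000);
}
```

Because the tracked edge never moves left, any segment that was in
window for some earlier ACK stays at or below it, with or without
window scaling.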

> More importantly: do we need to restore this on a connection that's
> being dumped and recreated using TCP_REPAIR, or will things still work
> (even though sub-optimally) if we lose this value?
> 
> Other window values that *need* to be dumped and restored are currently
> available via TCP_REPAIR_WINDOW socket option, and they are listed in
> do_tcp_getsockopt(), net/ipv4/tcp.c:
> 
> 		opt.snd_wl1	= tp->snd_wl1;
> 		opt.snd_wnd	= tp->snd_wnd;
> 		opt.max_window	= tp->max_window;
> 		opt.rcv_wnd	= tp->rcv_wnd;
> 		opt.rcv_wup	= tp->rcv_wup;
> 
> CRIU uses it to checkpoint and restore established connections, and
> passt uses it to migrate them to a different host:
> 
>   https://criu.org/TCP_connection
> 
>   https://passt.top/passt/tree/tcp.c?id=02af38d4177550c086bae54246fc3aaa33ddc018#n3063
> 
> If it's strictly needed to preserve functionality, we would need to add
> it to struct tcp_repair_window, notify CRIU maintainers (or send them a
> patch), and add this in passt as well (I can take care of it).

Thanks for the pointer, I missed that tp->rcv_wnd update.  Could the
following happen when checkpointing/restoring?

1. A client app opens a connection and writes (blocking) a specific amount
   of data before doing any reads.  (Not very clever, but this is
   supposed to work; this is what caused the problem in the Valkey
   tests.)
2. The traffic pattern causes an out-of-memory condition for the
   receive buffer; we see the RWIN 0 segments that do not ack the
   last data segment(s).
3. TCP connection is checkpointed and restored (on the client side) without
   restoring rcv_mwnd_seq.
4. If the receive buffer is still full at the new location, the
   acceptable sequence numbers in the receive window will not change
   (restored client is still blocked on write) and we no longer have
   the larger max receive window -> the client's kernel will reject
   all incoming packets and the connection is stuck.

If this scenario is possible, I'd argue that rcv_mwnd_seq is
necessary.
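
The stuck state in step 4 can be sketched with a simplified userspace
model of the sequence check. This is an illustration under
assumptions, not the real tcp_sequence(): it only covers a pure ACK
(seq == end_seq) and leaves out the "receive queue is empty" special
case. pure_ack_acceptable() and stuck_connection_demo() are
hypothetical names, and the sequence numbers are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe comparisons, modeled on the kernel's before()/after(). */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static int seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/* Returns 1 if a pure ACK with sequence number seq would pass the
 * window check for the given right window edge, 0 otherwise.
 */
static int pure_ack_acceptable(uint32_t seq, uint32_t rcv_wup,
			       uint32_t right_edge)
{
	if (seq_before(seq, rcv_wup))
		return 0;	/* old sequence */
	if (seq_after(seq, right_edge))
		return 0;	/* beyond the window */
	return 1;
}

void stuck_connection_demo(void)
{
	/* Client after the memory squeeze: zero window at 3000, although
	 * a right edge of 6000 had been advertised earlier.
	 */
	uint32_t rcv_nxt = 3000, rcv_wup = 3000, rcv_wnd = 0;
	uint32_t rcv_mwnd_seq = 6000;
	/* The peer already sent up to 5000, so its ACKs carry seq 5000. */
	uint32_t peer_snd_nxt = 5000;

	/* Restored without rcv_mwnd_seq: only the retracted edge is known. */
	assert(!pure_ack_acceptable(peer_snd_nxt, rcv_wup, rcv_nxt + rcv_wnd));

	/* With the tracked maximum edge the same ACK is accepted. */
	assert(pure_ack_acceptable(peer_snd_nxt, rcv_wup, rcv_mwnd_seq));
}
```

In this model, every ACK from the peer is rejected until rcv_nxt
moves, which does not happen while the restored client is blocked in
write().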

> Strictly speaking, in case, this could be considered a breaking change
> for userspace, but I don't see how to avoid it, so I'd just make sure
> it doesn't impact users as TCP_REPAIR has just a couple of (known!)
> projects relying on it.
> 
> An alternative would be to have a special, initial value representing
> the fact that this value was lost, but it looks really annoying to not
> be able to use a u32 for it.

Do we need a dedicated value indicating that rcv_mwnd_seq is not
present, or is it enough to choose an initial rcv_mwnd_seq based on
the size of the struct passed?  Both seem doable to me:

Missing: Initialize rcv_mwnd_seq = rcv_wup + rcv_wnd (possibly
leading to the problem described above, of course)

Default value 0: Store how much we retracted the window, i.e. 
rcv_mwnd_seq - (rcv_wup + rcv_wnd). 0 means the window was not
retracted and could double as the "we don't know" value.

For the time being, I will just initialize rcv_mwnd_seq to rcv_wup +
rcv_wnd in tcp_repair_set_window() to keep the status quo. Of course,
I am happy to discuss enhancements.
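
A rough sketch of the "default value 0" encoding, to make the idea
concrete. Everything here is hypothetical: struct repair_window_sketch
is not the layout of struct tcp_repair_window, and the field
rcv_wnd_retracted as well as the helpers dump_retraction(),
restore_mwnd_seq() and repair_demo() are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dump format: NOT the real struct tcp_repair_window. */
struct repair_window_sketch {
	uint32_t rcv_wnd;
	uint32_t rcv_wup;
	uint32_t rcv_wnd_retracted;	/* hypothetical: 0 = none/unknown */
};

/* Dump side: store how far the window was retracted instead of the
 * absolute maximum window sequence number.
 */
static void dump_retraction(struct repair_window_sketch *out,
			    uint32_t rcv_wup, uint32_t rcv_wnd,
			    uint32_t rcv_mwnd_seq)
{
	out->rcv_wup = rcv_wup;
	out->rcv_wnd = rcv_wnd;
	out->rcv_wnd_retracted = rcv_mwnd_seq - (rcv_wup + rcv_wnd);
}

/* Restore side: a zero delta falls back to rcv_wup + rcv_wnd. */
static uint32_t restore_mwnd_seq(const struct repair_window_sketch *in)
{
	return in->rcv_wup + in->rcv_wnd + in->rcv_wnd_retracted;
}

void repair_demo(void)
{
	struct repair_window_sketch w;
	struct repair_window_sketch legacy = { .rcv_wnd = 0, .rcv_wup = 3000 };

	/* Retracted window: the round trip preserves the maximum edge. */
	dump_retraction(&w, 3000, 0, 6000);
	assert(w.rcv_wnd_retracted == 3000);
	assert(restore_mwnd_seq(&w) == 6000);

	/* A dump that leaves the new field zeroed restores to
	 * rcv_wup + rcv_wnd.
	 */
	assert(restore_mwnd_seq(&legacy) == 3000);
}
```

All arithmetic is modulo 2^32, so the delta round-trips across a
sequence number wrap as well.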
 
> Disregard all this if the correct value is not strictly needed for
> functionality, of course. I haven't tested things (not yet, at least).
> 
> >  	u64	tcp_wstamp_ns;	/* departure time for next sent data packet */
> >  	u64	accecn_opt_tstamp;	/* Last AccECN option sent timestamp */
> >  	struct list_head tsorted_sent_queue; /* time-sorted sent but un-SACKed skbs */
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 40e72b9cb85f08714d3f458c0bd1402a5fb1eb4e..e1944d504823d5f8754d85bfbbf3c9630d2190ac 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -912,6 +912,20 @@ static inline u32 tcp_receive_window(const struct tcp_sock *tp)
> >  	return (u32) win;
> >  }
> >  
> > +/* Compute the maximum receive window we ever advertised.
> > + * Rcv_nxt can be after the window if our peer push more data
> 
> s/push/pushes/
> 
> s/Rcv_nxt/rcv_nxt/ (useful for grepping)

tcp_max_receive_window() is an adapted copy of
tcp_receive_window() above. But it makes sense to improve it.

> 
> > + * than the offered window.
> > + */
> > +static inline u32 tcp_max_receive_window(const struct tcp_sock *tp)
> > +{
> > +	s32 win = tp->rcv_mwnd_seq - tp->rcv_nxt;
> > +
> > +	if (win < 0)
> > +		win = 0;
> 
> I must be missing something but... if the sequence is about to wrap,
> we'll return 0 here. Is that intended?
> 
> Doing the subtraction unsigned would have looked more natural to me,
> but I didn't really think it through.

The subtraction is unsigned and the outcome is interpreted as
signed. And as mentioned, it is copied with pride ;-)
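
A small userspace sketch of why this pattern is safe across a sequence
number wrap. It models the computation with plain stdint types;
max_receive_window() and wrap_demo() are hypothetical names, and the
values are made up. The sketch assumes the usual two's complement
behaviour for the u32 to s32 conversion, as the kernel does.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the helper: subtract as u32, interpret the result as s32.
 * A negative result means rcv_nxt is already past the tracked edge.
 */
static uint32_t max_receive_window(uint32_t rcv_mwnd_seq, uint32_t rcv_nxt)
{
	int32_t win = (int32_t)(rcv_mwnd_seq - rcv_nxt);

	if (win < 0)
		win = 0;
	return (uint32_t)win;
}

void wrap_demo(void)
{
	/* No wrap: the edge is 4000 bytes ahead of rcv_nxt. */
	assert(max_receive_window(5000, 1000) == 4000);

	/* The edge has wrapped past zero, rcv_nxt has not: the unsigned
	 * difference is still the true distance (0x200), not 0.
	 */
	assert(max_receive_window(0x00000100u, 0xFFFFFF00u) == 0x200u);

	/* rcv_nxt beyond the edge: the difference is negative, clamp to 0. */
	assert(max_receive_window(1000, 2000) == 0);
}
```

The result is only clamped to 0 when rcv_nxt is actually ahead of the
tracked edge, which holds as long as the two values are less than 2^31
apart.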
 
> > +	return (u32) win;
> 
> Kernel coding style doesn't usually include a space between cast and
> identifier.

Yes, same reason as above and I will change it.


-- 
Simon Baatz <gmbnomis@gmail.com>


* Re: [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements
  2026-02-24 18:07     ` Simon Baatz
@ 2026-02-25 21:33       ` Stefano Brivio
  2026-02-26  1:10         ` Simon Baatz
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2026-02-25 21:33 UTC (permalink / raw)
  To: Simon Baatz
  Cc: Simon Baatz via B4 Relay, Eric Dumazet, Neal Cardwell,
	Kuniyuki Iwashima, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, David Ahern, Jon Maloy,
	Jason Xing, mfreemon, Shuah Khan, Christian Ebner, netdev,
	linux-doc, linux-kernel, linux-kselftest

On Tue, 24 Feb 2026 19:07:45 +0100
Simon Baatz <gmbnomis@gmail.com> wrote:

> Hi Stefano,
> 
> On Mon, Feb 23, 2026 at 11:26:40PM +0100, Stefano Brivio wrote:
> > Hi Simon,
> > 
> > It all makes sense to me at a quick look, I have just some nits and one
> > more substantial worry, below:
> > 
> > On Fri, 20 Feb 2026 00:55:14 +0100
> > Simon Baatz via B4 Relay <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> >   
> > > From: Simon Baatz <gmbnomis@gmail.com>
> > > 
> > > By default, the Linux TCP implementation does not shrink the
> > > advertised window (RFC 7323 calls this "window retraction") with the
> > > following exceptions:
> > > 
> > > - When an incoming segment cannot be added due to the receive buffer
> > >   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
> > >   handling of extreme memory squeeze") a zero window will be
> > >   advertised in this case. It turns out that reaching the required
> > >   "memory pressure" is very easy when window scaling is in use. In the
> > >   simplest case, sending a sufficient number of segments smaller than
> > >   the scale factor to a receiver that does not read data is enough.
> > > 
> > >   Since commit 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") this
> > >   happens much earlier than before, leading to regressions (the test
> > >   suite of the Valkey project does not pass because of a TCP
> > >   connection that is no longer bi-directional).  
> > 
> > Ouch. By the way, that same commit helped us unveil an issue (at least
> > in the sense of RFC 9293, 3.8.6) we fixed in passt:
> > 
> >   https://passt.top/passt/commit/?id=8d2f8c4d0fb58d6b2011e614bc7d7ff9dab406b3  
> 
> This looks concerning: It seems as if just filling the advertised
> window triggered the out of memory condition(?).

Right, even if it's not so much a general "out of memory" condition:
it's just that the socket might simply refuse to queue more data at
that point (we run out of window space, rather than memory).

Together with commit e2142825c120 ("net: tcp: send zero-window ACK when
no memory"), we will even get zero-window updates in that case. Jon
raised the issue here:

  https://lore.kernel.org/r/20240406182107.261472-3-jmaloy@redhat.com/

but it was not really fixed. Anyway:

> Am I right in
> assuming that this happened with the original 1d2fbaad7cd8, not the
> relaxed version of tcp_can_ingest() from f017c1f768b?

...you're right. I wasn't even aware of f017c1f768b, thanks for
pointing that out. That seems to make things saner, and I don't expect
further issues at this point.

Speaking of which, passt struggled to talk to applications entirely
written in the 21st century. That's socat, I think started in 2001,
being used in Podman tests, and its only SO_RCVBUF-related fault is
that it uses the default 208 KiB value (from rmem_default) as a
starting value by... not doing anything.

Applications can set SO_RCVBUF and SO_SNDBUF to bigger values
(depending on rmem_max and wmem_max), but if they do, automatic tuning
of TCP buffer sizes (which allows exceeding rmem_max and wmem_max!) is
disabled. We used to do that in passt itself, and I eventually dropped
it here:

  https://passt.top/passt/commit/?id=71249ef3f9bcf1dbb2d6c13cdbc41ba88c794f06

because we might really need automatic tuning and the resulting big
buffers for high latency, high throughput connections.
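
For reference, this is roughly what opting out of automatic tuning looks
like from an application. It is only a sketch: set_rcvbuf() and
get_rcvbuf() are hypothetical helper names, not code from passt or socat;
the socket calls themselves are the standard ones.

```c
#include <sys/socket.h>

/* Request an explicit receive buffer size for 'fd'. On Linux this also
 * locks the size, so receive buffer autotuning no longer applies to the
 * socket. Returns 0 on success, -1 on error.
 */
static int set_rcvbuf(int fd, int bytes)
{
	return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}

/* Read back the size the kernel is using, or -1 on error. Linux reports
 * twice the requested value because it accounts for bookkeeping overhead.
 */
static int get_rcvbuf(int fd)
{
	int bytes = 0;
	socklen_t len = sizeof(bytes);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, &len) < 0)
		return -1;
	return bytes;
}
```

An application that never calls something like set_rcvbuf() starts from
the default value and keeps automatic tuning, which is the socat case.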

> > > - Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
> > >   allowing the tcp window to shrink") addressed the "eating memory"
> > >   problem by introducing a sysctl knob that allows shrinking the
> > >   window before running out of memory.
> > > 
> > > However, RFC 7323 does not only state that shrinking the window is
> > > necessary in some cases, it also formulates requirements for TCP
> > > implementations when doing so (Section 2.4).
> > > 
> > > This commit addresses the receiver-side requirements: After retracting
> > > the window, the peer may have a snd_nxt that lies within a previously
> > > advertised window but is now beyond the retracted window. This means
> > > that all incoming segments (including pure ACKs) will be rejected
> > > until the application happens to read enough data to let the peer's
> > > snd_nxt be in window again (which may be never).
> > > 
> > > To comply with RFC 7323, the receiver MUST honor any segment that
> > > would have been in window for any ACK sent by the receiver and, when
> > > window scaling is in effect, SHOULD track the maximum window sequence
> > > number it has advertised. This patch tracks that maximum window
> > > sequence number throughout the connection and uses it in
> > > tcp_sequence() when deciding whether a segment is acceptable.
> > > Acceptability of data is not changed.
> > > 
> > > Fixes: 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze")
> > > Fixes: b650d953cd39 ("tcp: enforce receive buffer memory limits by allowing the tcp window to shrink")
> > > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > > ---
> > >  Documentation/networking/net_cachelines/tcp_sock.rst       |  1 +
> > >  include/linux/tcp.h                                        |  1 +
> > >  include/net/tcp.h                                          | 14 ++++++++++++++
> > >  net/ipv4/tcp_fastopen.c                                    |  1 +
> > >  net/ipv4/tcp_input.c                                       |  6 ++++--
> > >  net/ipv4/tcp_minisocks.c                                   |  1 +
> > >  net/ipv4/tcp_output.c                                      | 12 ++++++++++++
> > >  .../selftests/net/packetdrill/tcp_rcv_big_endseq.pkt       |  2 +-
> > >  8 files changed, 35 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
> > > index 563daea10d6c5c074f004cb1b8574f5392157abb..fecf61166a54ee2f64bcef5312c81dcc4aa9a124 100644
> > > --- a/Documentation/networking/net_cachelines/tcp_sock.rst
> > > +++ b/Documentation/networking/net_cachelines/tcp_sock.rst
> > > @@ -121,6 +121,7 @@ u64                           delivered_mstamp        read_write
> > >  u32                           rate_delivered                              read_mostly         tcp_rate_gen
> > >  u32                           rate_interval_us                            read_mostly         rate_delivered,rate_app_limited
> > >  u32                           rcv_wnd                 read_write          read_mostly         tcp_select_window,tcp_receive_window,tcp_fast_path_check
> > > +u32                           rcv_mwnd_seq            read_write                              tcp_select_window
> > >  u32                           write_seq               read_write                              tcp_rate_check_app_limited,tcp_write_queue_empty,tcp_skb_entail,forced_push,tcp_mark_push
> > >  u32                           notsent_lowat           read_mostly                             tcp_stream_memory_free
> > >  u32                           pushed_seq              read_write                              tcp_mark_push,forced_push
> > > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > > index f72eef31fa23cc584f2f0cefacdc35cae43aa52d..5a943b12d4c050a980b4cf81635b9fa2f0036283 100644
> > > --- a/include/linux/tcp.h
> > > +++ b/include/linux/tcp.h
> > > @@ -271,6 +271,7 @@ struct tcp_sock {
> > >  	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
> > >  	u32	mdev_us;	/* medium deviation			*/
> > >  	u32	rtt_seq;	/* sequence number to update rttvar	*/
> > > +	u32	rcv_mwnd_seq; /* Maximum window sequence number (RFC 7323, section 2.4) */  
> > 
> > Nit: tab between ; and /* for consistency (I would personally prefer
> > the comment style as you see on 'highest_sack' but I don't think it's
> > enforced anymore).  
> 
> Thanks, I missed that.
>  
> > Second nit: mentioning RFC 7323, section 2.4 could be a bit misleading
> > here because the relevant paragraph there covers a very specific case of
> > window retraction, caused by quantisation error from window scaling,
> > which is not the most common case here. I couldn't quickly find a better
> > reference though.  
> 
> I agree, but there is a part that, I think, is more generally
> applicable:
> 
> 2.4.  Addressing Window Retraction
> 
>    [ specific window retraction case introduction removed ]
>    ... Implementations MUST ensure that they handle a shrinking
>    window, as specified in Section 4.2.2.16 of [RFC1122].
> 
>    For the receiver, this implies that:
> 
>    1)  The receiver MUST honor, as in window, any segment that would
>        have been in window for any <ACK> sent by the receiver.
> 
>    2)  When window scaling is in effect, the receiver SHOULD track the
>        actual maximum window sequence number (which is likely to be
>        greater than the window announced by the most recent <ACK>, if
>        more than one segment has arrived since the application consumed
>        any data in the receive buffer).
> 
> There is no "When window scaling is in effect," on the first
> requirement. And it "happens" to be implementable by the second
> requirement (with or without window scaling).

Right, I saw that, but the first requirement doesn't mention the
"actual maximum sequence number" which this new field represents.
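
To spell out how the second requirement can implement the first, here is a
simplified user-space sketch. It is not the tcp_sequence() change from the
patch: struct rcv_state and all function names are hypothetical, and only
the right window edge is checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver state, loosely named after the tcp_sock fields. */
struct rcv_state {
	uint32_t rcv_nxt;	/* next sequence number expected */
	uint32_t rcv_wup;	/* rcv_nxt at the last window update */
	uint32_t rcv_wnd;	/* window advertised in that update */
	uint32_t rcv_mwnd_seq;	/* highest right edge ever advertised */
};

/* True if a comes strictly before b in sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Record a newly advertised window. The current right edge may move
 * back (retraction), the tracked maximum never does.
 */
static void advertise_window(struct rcv_state *s, uint32_t new_wnd)
{
	s->rcv_wup = s->rcv_nxt;
	s->rcv_wnd = new_wnd;
	if (seq_before(s->rcv_mwnd_seq, s->rcv_wup + s->rcv_wnd))
		s->rcv_mwnd_seq = s->rcv_wup + s->rcv_wnd;
}

/* Check against the most recently advertised window only. */
static bool seq_in_current_window(const struct rcv_state *s, uint32_t seq)
{
	return !seq_before(s->rcv_wup + s->rcv_wnd, seq);
}

/* Check against the maximum window: accepts any segment start that
 * would have been in window for some ACK sent earlier.
 */
static bool seq_in_max_window(const struct rcv_state *s, uint32_t seq)
{
	return !seq_before(s->rcv_mwnd_seq, seq);
}
```

After advertising a window up to 5000 and later a zero window at 3000, a
segment starting at 4000 fails the first check but passes the second one.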

> I think an improvement could be to refer to the receiver requirements
> specifically here.

Ah, yes, that sounds like a good idea.

> > More importantly: do we need to restore this on a connection that's
> > being dumped and recreated using TCP_REPAIR, or will things still work
> > (even though sub-optimally) if we lose this value?
> > 
> > Other window values that *need* to be dumped and restored are currently
> > available via TCP_REPAIR_WINDOW socket option, and they are listed in
> > do_tcp_getsockopt(), net/ipv4/tcp.c:
> > 
> > 		opt.snd_wl1	= tp->snd_wl1;
> > 		opt.snd_wnd	= tp->snd_wnd;
> > 		opt.max_window	= tp->max_window;
> > 		opt.rcv_wnd	= tp->rcv_wnd;
> > 		opt.rcv_wup	= tp->rcv_wup;
> > 
> > CRIU uses it to checkpoint and restore established connections, and
> > passt uses it to migrate them to a different host:
> > 
> >   https://criu.org/TCP_connection
> > 
> >   https://passt.top/passt/tree/tcp.c?id=02af38d4177550c086bae54246fc3aaa33ddc018#n3063
> > 
> > If it's strictly needed to preserve functionality, we would need to add
> > it to struct tcp_repair_window, notify CRIU maintainers (or send them a
> > patch), and add this in passt as well (I can take care of it).  
> 
> Thanks for the pointer, I missed that tp->rcv_wnd update.  Could the
> following happen when checkpointing/restoring?
> 
> 1. A client app opens a connection and writes (blocking) a specific amount
>    of data before doing any reads.  (Not very clever, but this is
>    supposed to work; this is what caused the problem in the Valkey
>    tests.)
> 2. The traffic pattern causes an out-of-memory condition for the
>    receive buffer; we see the RWIN 0 segments that do not ack the
>    last data segment(s).
> 3. TCP connection is checkpointed and restored (on the client side) without
>    restoring rcv_mwnd_seq.
> 4. If the receive buffer is still full at the new location, the
>    acceptable sequence numbers in the receive window will not change
>    (restored client is still blocked on write) and we no longer have
>    the larger max receive window -> the client's kernel will reject
>    all incoming packets and the connection is stuck.
> 
> If this scenario is possible, I'd argue that rcv_mwnd_seq is
> necessary.

It really sounds like a corner case, especially 1. in combination with
2., but the outcome would be pretty bad, and I think it's possible.

Typically, once the connection is restored (with TCP_REPAIR_OFF, not
with TCP_REPAIR_OFF_NO_WP), the kernel sends out an empty segment as a
window probe / keepalive, but as far as I understand that wouldn't be
enough to fix the situation. And even if it did, we still have the
TCP_REPAIR_OFF_NO_WP case, even though I'm not aware of any usage.

> > Strictly speaking, in case, this could be considered a breaking change
> > for userspace, but I don't see how to avoid it, so I'd just make sure
> > it doesn't impact users as TCP_REPAIR has just a couple of (known!)
> > projects relying on it.
> > 
> > An alternative would be to have a special, initial value representing
> > the fact that this value was lost, but it looks really annoying to not
> > be able to use a u32 for it.  
> 
> Do we need a dedicated value indicating that rcv_mwnd_seq is not
> present, or is it enough to choose an initial rcv_mwnd_seq based on
> the size of the struct passed?  Both seem doable to me:
> 
> Missing: Initialize rcv_mwnd_seq = rcv_wup + rcv_wnd (possibly
> leading to the problem described above, of course)

Well, but if we might run into the problem described above, we need to
dump / restore rcv_mwnd_seq in any case, so we wouldn't have an issue
at all.

Except for a compatibility issue, but what you describe looks like a
reasonable fallback.
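
A rough user-space sketch of that size-based fallback, to check that we
mean the same thing. Everything here is hypothetical: struct
repair_window_v2 and parse_repair_window() are made-up names, and nothing
like this exists in the series or in the current uAPI. Only the five
existing fields mirror struct tcp_repair_window.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The five fields userspace can dump and restore today. */
struct repair_window_v1 {
	uint32_t snd_wl1;
	uint32_t snd_wnd;
	uint32_t max_window;
	uint32_t rcv_wnd;
	uint32_t rcv_wup;
};

/* Hypothetical extension carrying the maximum window sequence number. */
struct repair_window_v2 {
	struct repair_window_v1 base;
	uint32_t rcv_mwnd_seq;
};

/* Accept either layout, selected by the length passed by the caller.
 * Returns 0 and fills *out, or -1 if len matches neither layout.
 */
static int parse_repair_window(const void *buf, size_t len,
			       struct repair_window_v2 *out)
{
	if (len == sizeof(struct repair_window_v2)) {
		memcpy(out, buf, sizeof(*out));
		return 0;
	}
	if (len == sizeof(struct repair_window_v1)) {
		memcpy(&out->base, buf, sizeof(out->base));
		/* Value missing: assume the window was never retracted. */
		out->rcv_mwnd_seq = out->base.rcv_wup + out->base.rcv_wnd;
		return 0;
	}
	return -1;
}
```

Existing users that pass the short structure keep working and get the
status quo, while updated users can restore the real value.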

> Default value 0: Store how much we retracted the window, i.e. 
> rcv_mwnd_seq - (rcv_wup + rcv_wnd). 0 means the window was not
> retracted and could double as the "we don't know" value.
> 
> For the time being, I will just initialize rcv_mwnd_seq to rcv_wup +
> rcv_wnd in tcp_repair_set_window() to keep status quo. Of course,
> I am happy to discuss enhancements.

That makes sense to me at a glance, but I should still review / test it
as a whole.

> > Disregard all this if the correct value is not strictly needed for
> > functionality, of course. I haven't tested things (not yet, at least).
> >   
> > >  	u64	tcp_wstamp_ns;	/* departure time for next sent data packet */
> > >  	u64	accecn_opt_tstamp;	/* Last AccECN option sent timestamp */
> > >  	struct list_head tsorted_sent_queue; /* time-sorted sent but un-SACKed skbs */
> > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > index 40e72b9cb85f08714d3f458c0bd1402a5fb1eb4e..e1944d504823d5f8754d85bfbbf3c9630d2190ac 100644
> > > --- a/include/net/tcp.h
> > > +++ b/include/net/tcp.h
> > > @@ -912,6 +912,20 @@ static inline u32 tcp_receive_window(const struct tcp_sock *tp)
> > >  	return (u32) win;
> > >  }
> > >  
> > > +/* Compute the maximum receive window we ever advertised.
> > > + * Rcv_nxt can be after the window if our peer push more data  
> > 
> > s/push/pushes/
> > 
> > s/Rcv_nxt/rcv_nxt/ (useful for grepping)  
> 
> tcp_max_receive_window() is an adapted copy of
> tcp_receive_window() above. But it makes sense to improve it.

Ah, sorry, I didn't notice.

> > > + * than the offered window.
> > > + */
> > > +static inline u32 tcp_max_receive_window(const struct tcp_sock *tp)
> > > +{
> > > +	s32 win = tp->rcv_mwnd_seq - tp->rcv_nxt;
> > > +
> > > +	if (win < 0)
> > > +		win = 0;  
> > 
> > I must be missing something but... if the sequence is about to wrap,
> > we'll return 0 here. Is that intended?
> > 
> > Doing the subtraction unsigned would have looked more natural to me,
> > but I didn't really think it through.  
> 
> The subtraction is unsigned and the outcome is interpreted as
> signed. And as mentioned, it is copied with pride ;-)

Oh, wow, I mean, "of course"! How could anybody ever miss that! Pride,
you say. :) ...but sure, if it's taken from there, it makes sense to
keep it like that I guess.

> > > +	return (u32) win;  
> > 
> > Kernel coding style doesn't usually include a space between cast and
> > identifier.  
> 
> Yes, same reason as above and I will change it.

-- 
Stefano


* Re: [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements
  2026-02-25 21:33       ` Stefano Brivio
@ 2026-02-26  1:10         ` Simon Baatz
  2026-02-26  4:49           ` Stefano Brivio
  0 siblings, 1 reply; 13+ messages in thread
From: Simon Baatz @ 2026-02-26  1:10 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Simon Baatz via B4 Relay, Eric Dumazet, Neal Cardwell,
	Kuniyuki Iwashima, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, David Ahern, Jon Maloy,
	Jason Xing, mfreemon, Shuah Khan, Christian Ebner, netdev,
	linux-doc, linux-kernel, linux-kselftest

On Wed, Feb 25, 2026 at 10:33:34PM +0100, Stefano Brivio wrote:
> On Tue, 24 Feb 2026 19:07:45 +0100
> Simon Baatz <gmbnomis@gmail.com> wrote:
> 
> > Hi Stefano,
> > 
> > On Mon, Feb 23, 2026 at 11:26:40PM +0100, Stefano Brivio wrote:
> > > Hi Simon,
> > > 
> > > It all makes sense to me at a quick look, I have just some nits and one
> > > more substantial worry, below:
> > > 
> > > On Fri, 20 Feb 2026 00:55:14 +0100
> > > Simon Baatz via B4 Relay <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> > >   
> > > > From: Simon Baatz <gmbnomis@gmail.com>
> > > > 
> > > > By default, the Linux TCP implementation does not shrink the
> > > > advertised window (RFC 7323 calls this "window retraction") with the
> > > > following exceptions:
> > > > 
> > > > - When an incoming segment cannot be added due to the receive buffer
> > > >   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
> > > >   handling of extreme memory squeeze") a zero window will be
> > > >   advertised in this case. It turns out that reaching the required
> > > >   "memory pressure" is very easy when window scaling is in use. In the
> > > >   simplest case, sending a sufficient number of segments smaller than
> > > >   the scale factor to a receiver that does not read data is enough.
> > > > 
> > > >   Since commit 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") this
> > > >   happens much earlier than before, leading to regressions (the test
> > > >   suite of the Valkey project does not pass because of a TCP
> > > >   connection that is no longer bi-directional).  
> > > 
> > > Ouch. By the way, that same commit helped us unveil an issue (at least
> > > in the sense of RFC 9293, 3.8.6) we fixed in passt:
> > > 
> > >   https://passt.top/passt/commit/?id=8d2f8c4d0fb58d6b2011e614bc7d7ff9dab406b3  
> > 
> > This looks concerning: It seems as if just filling the advertised
> > window triggered the out of memory condition(?).
> 
> Right, even if it's not so much a general "out of memory" condition:
> it's just that the socket might simply refuse to queue more data at
> that point (we run out of window space, rather than memory).
> 
> Together with commit e2142825c120 ("net: tcp: send zero-window ACK when
> no memory"), we will even get zero-window updates in that case. Jon
> raised the issue here:
> 
>   https://lore.kernel.org/r/20240406182107.261472-3-jmaloy@redhat.com/
> 
> but it was not really fixed. Anyway:

Didn't that result in 8c670bdfa58e ("tcp: correct handling of extreme
memory squeeze")?

> [...] 

-- 
Simon Baatz <gmbnomis@gmail.com>

* Re: [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements
  2026-02-26  1:10         ` Simon Baatz
@ 2026-02-26  4:49           ` Stefano Brivio
  0 siblings, 0 replies; 13+ messages in thread
From: Stefano Brivio @ 2026-02-26  4:49 UTC (permalink / raw)
  To: Simon Baatz
  Cc: Simon Baatz via B4 Relay, Eric Dumazet, Neal Cardwell,
	Kuniyuki Iwashima, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, David Ahern, Jon Maloy,
	Jason Xing, mfreemon, Shuah Khan, Christian Ebner, netdev,
	linux-doc, linux-kernel, linux-kselftest

On Thu, 26 Feb 2026 02:10:25 +0100
Simon Baatz <gmbnomis@gmail.com> wrote:

> On Wed, Feb 25, 2026 at 10:33:34PM +0100, Stefano Brivio wrote:
> > On Tue, 24 Feb 2026 19:07:45 +0100
> > Simon Baatz <gmbnomis@gmail.com> wrote:
> >   
> > > Hi Stefano,
> > > 
> > > On Mon, Feb 23, 2026 at 11:26:40PM +0100, Stefano Brivio wrote:  
> > > > Hi Simon,
> > > > 
> > > > It all makes sense to me at a quick look, I have just some nits and one
> > > > more substantial worry, below:
> > > > 
> > > > On Fri, 20 Feb 2026 00:55:14 +0100
> > > > Simon Baatz via B4 Relay <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> > > >     
> > > > > From: Simon Baatz <gmbnomis@gmail.com>
> > > > > 
> > > > > By default, the Linux TCP implementation does not shrink the
> > > > > advertised window (RFC 7323 calls this "window retraction") with the
> > > > > following exceptions:
> > > > > 
> > > > > - When an incoming segment cannot be added due to the receive buffer
> > > > >   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
> > > > >   handling of extreme memory squeeze") a zero window will be
> > > > >   advertised in this case. It turns out that reaching the required
> > > > >   "memory pressure" is very easy when window scaling is in use. In the
> > > > >   simplest case, sending a sufficient number of segments smaller than
> > > > >   the scale factor to a receiver that does not read data is enough.
> > > > > 
> > > > >   Since commit 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") this
> > > > >   happens much earlier than before, leading to regressions (the test
> > > > >   suite of the Valkey project does not pass because of a TCP
> > > > >   connection that is no longer bi-directional).    
> > > > 
> > > > Ouch. By the way, that same commit helped us unveil an issue (at least
> > > > in the sense of RFC 9293, 3.8.6) we fixed in passt:
> > > > 
> > > >   https://passt.top/passt/commit/?id=8d2f8c4d0fb58d6b2011e614bc7d7ff9dab406b3    
> > > 
> > > This looks concerning: It seems as if just filling the advertised
> > > window triggered the out of memory condition(?).  
> > 
> > Right, even if it's not so much a general "out of memory" condition:
> > it's just that the socket might simply refuse to queue more data at
> > that point (we run out of window space, rather than memory).
> > 
> > Together with commit e2142825c120 ("net: tcp: send zero-window ACK when
> > no memory"), we will even get zero-window updates in that case. Jon
> > raised the issue here:
> > 
> >   https://lore.kernel.org/r/20240406182107.261472-3-jmaloy@redhat.com/
> > 
> > but it was not really fixed. Anyway:  
> 
> Didn't that result in 8c670bdfa58e ("tcp: correct handling of extreme
> memory squeeze")?

Yes, but with that (the v3 of it) we still send zero-window updates
more frequently (because of the 'return 0' instead of 'goto out') and
together with e2142825c120 I was seeing in the captures one zero-window
update almost every time the sender filled up the window completely.

Perhaps it was even desired, I'm not sure, I can't say it's entirely
wrong (that's why I didn't propose a further patch), and strictly
speaking the issue was on passt side (we didn't send window probes in
that case, and we didn't retransmit FINs).

I guess with f017c1f768b things should be sane again. I didn't check.

-- 
Stefano


Thread overview: 13+ messages
2026-02-19 23:55 [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
2026-02-19 23:55 ` [PATCH RFC net-next 1/4] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
2026-02-23 22:26   ` Stefano Brivio
2026-02-24 18:07     ` Simon Baatz
2026-02-25 21:33       ` Stefano Brivio
2026-02-26  1:10         ` Simon Baatz
2026-02-26  4:49           ` Stefano Brivio
2026-02-19 23:55 ` [PATCH RFC net-next 2/4] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt Simon Baatz via B4 Relay
2026-02-19 23:55 ` [PATCH RFC net-next 3/4] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt Simon Baatz via B4 Relay
2026-02-19 23:55 ` [PATCH RFC net-next 4/4] selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt Simon Baatz via B4 Relay
2026-02-20  8:58 ` [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling Eric Dumazet
2026-02-23  0:07   ` Simon Baatz
2026-02-24  9:22     ` Eric Dumazet