public inbox for netdev@vger.kernel.org
* [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling
@ 2026-03-09  8:02 Simon Baatz via B4 Relay
  2026-03-09  8:02 ` [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-03-09  8:02 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Matthieu Baerts, Mat Martineau,
	Geliang Tang
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Simon Baatz

Hi,

This series implements the receiver-side requirements for TCP window
retraction as specified in RFC 7323 and adds packetdrill tests to
cover the new behavior.

Please see the first patch for background and implementation
details. Since MPTCP adjusts the TCP receive window on subflows, the
relevant MPTCP code paths are updated accordingly.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
Changes in v3:

- Address MPTCP subflow-level rcv_wnd adjustments
- Removed RFC status
- Adapt tcp_rcv_wnd_shrink_nomem.pkt to reflect 026dfef287c0 ("tcp:
  give up on stronger sk_rcvbuf checks (for now)")
- Link to v2: https://lore.kernel.org/r/20260226-tcp_rfc7323_retract_wnd_rfc-v2-0-aa3f8f9cc639@gmail.com

Changes in v2:

- tcp_rcv_wnd_shrink_nomem.pkt tests more RX code paths using various
  segment types. It also uses a more drastic receive buffer reduction
  (1MB to 16KB).
- Setting the TCP_REPAIR_WINDOW socket option initializes rcv_mwnd_seq.
- SKB_DROP_REASON_TCP_OVERWINDOW increases LINUX_MIB_BEYOND_WINDOW now.
- Moved rcv_mwnd_seq into rcv_wnd's cacheline group.
- Small editorial changes
- Link to v1: https://lore.kernel.org/r/20260220-tcp_rfc7323_retract_wnd_rfc-v1-0-904942561479@gmail.com

---
Simon Baatz (6):
      tcp: implement RFC 7323 window retraction receiver requirements
      mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd
      tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW
      selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
      selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
      selftests/net: packetdrill: add tcp_rcv_neg_window.pkt

 .../networking/net_cachelines/tcp_sock.rst         |   1 +
 include/linux/tcp.h                                |   3 +
 include/net/tcp.h                                  |  22 ++++
 net/ipv4/tcp.c                                     |   2 +
 net/ipv4/tcp_fastopen.c                            |   1 +
 net/ipv4/tcp_input.c                               |  11 +-
 net/ipv4/tcp_minisocks.c                           |   1 +
 net/ipv4/tcp_output.c                              |   3 +
 net/mptcp/options.c                                |   6 +-
 .../net/packetdrill/tcp_rcv_big_endseq.pkt         |   2 +-
 .../net/packetdrill/tcp_rcv_neg_window.pkt         |  26 ++++
 .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt |  40 +++++++
 .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt   | 132 +++++++++++++++++++++
 13 files changed, 242 insertions(+), 8 deletions(-)
---
base-commit: 0bcac7b11262557c990da1ac564d45777eb6b005
change-id: 20260220-tcp_rfc7323_retract_wnd_rfc-c8a2d2baebde

Best regards,
-- 
Simon Baatz <gmbnomis@gmail.com>




* [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements
  2026-03-09  8:02 [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
@ 2026-03-09  8:02 ` Simon Baatz via B4 Relay
  2026-03-09  9:22   ` Eric Dumazet
  2026-03-10  8:58   ` Stefano Brivio
  2026-03-09  8:02 ` [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd Simon Baatz via B4 Relay
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-03-09  8:02 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Matthieu Baerts, Mat Martineau,
	Geliang Tang
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

By default, the Linux TCP implementation does not shrink the
advertised window (RFC 7323 calls this "window retraction") with the
following exceptions:

- When an incoming segment cannot be added due to the receive buffer
  running out of memory. Since commit 8c670bdfa58e ("tcp: correct
  handling of extreme memory squeeze") a zero window will be
  advertised in this case. It turns out that reaching the required
  memory pressure is easy when window scaling is in use. In the
  simplest case, sending a sufficient number of segments smaller than
  the scale factor to a receiver that does not read data is enough.

- Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
  allowing the tcp window to shrink") addressed the "eating memory"
  problem by introducing a sysctl knob that allows shrinking the
  window before running out of memory.

However, RFC 7323 not only states that shrinking the window is
necessary in some cases; it also formulates requirements for TCP
implementations when doing so (Section 2.4).

This commit addresses the receiver-side requirements: After the
receiver retracts the window, the peer may have a snd_nxt that lies
within a previously advertised window but is now beyond the retracted
window. Without this change, all incoming segments (including pure
ACKs) are rejected until the application happens to read enough data
to bring the peer's snd_nxt back in window (which may be never).

To comply with RFC 7323, the receiver MUST honor any segment that
would have been in window for any ACK sent by the receiver and, when
window scaling is in effect, SHOULD track the maximum window sequence
number it has advertised. This patch tracks that maximum window
sequence number rcv_mwnd_seq throughout the connection and uses it in
tcp_sequence() when deciding whether a segment is acceptable.
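
As an illustration only, the tracking described above can be modelled
in userspace C roughly as follows. struct rcv_model and the model_*
and seq_after helpers are hypothetical names, not kernel code. The
sketch assumes rcv_nxt has not advanced past rcv_mwnd_seq and leaves
out the old-sequence check as well as the FIN and empty-queue
exceptions that tcp_sequence() also handles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; field names follow the patch, the struct does not. */
struct rcv_model {
	uint32_t rcv_nxt;      /* next sequence number expected */
	uint32_t rcv_wup;      /* rcv_nxt at the time of the last window update */
	uint32_t rcv_wnd;      /* currently advertised window */
	uint32_t rcv_mwnd_seq; /* highest right window edge advertised so far */
};

/* Wraparound-safe "a is after b" in 32-bit sequence space. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Record a newly advertised window; the maximum right edge never moves left. */
static void model_advertise(struct rcv_model *m, uint32_t new_wnd)
{
	uint32_t right_edge;

	m->rcv_wup = m->rcv_nxt;
	m->rcv_wnd = new_wnd;
	right_edge = m->rcv_wup + m->rcv_wnd;
	if (seq_after(right_edge, m->rcv_mwnd_seq))
		m->rcv_mwnd_seq = right_edge;
}

/* Simplified acceptance test: the segment must end at or before the maximum edge. */
static bool model_in_max_window(const struct rcv_model *m, uint32_t end_seq)
{
	return !seq_after(end_seq, m->rcv_mwnd_seq);
}
```

In this model, advertising a zero window after a larger one leaves
rcv_mwnd_seq untouched, so a segment ending at the old right edge
still passes model_in_max_window().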

rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
tcp_select_window(). If we count tcp_sequence() as fast path, it is
read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
cacheline group.

The logic for handling received data in tcp_data_queue() is already
sufficient and does not need to be updated.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
 .../networking/net_cachelines/tcp_sock.rst         |  1 +
 include/linux/tcp.h                                |  3 +++
 include/net/tcp.h                                  | 22 ++++++++++++++++++++++
 net/ipv4/tcp.c                                     |  2 ++
 net/ipv4/tcp_fastopen.c                            |  1 +
 net/ipv4/tcp_input.c                               | 10 +++++-----
 net/ipv4/tcp_minisocks.c                           |  1 +
 net/ipv4/tcp_output.c                              |  3 +++
 .../net/packetdrill/tcp_rcv_big_endseq.pkt         |  2 +-
 9 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 563daea10d6c5c074f004cb1b8574f5392157abb..fecf61166a54ee2f64bcef5312c81dcc4aa9a124 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -121,6 +121,7 @@ u64                           delivered_mstamp        read_write
 u32                           rate_delivered                              read_mostly         tcp_rate_gen
 u32                           rate_interval_us                            read_mostly         rate_delivered,rate_app_limited
 u32                           rcv_wnd                 read_write          read_mostly         tcp_select_window,tcp_receive_window,tcp_fast_path_check
+u32                           rcv_mwnd_seq            read_write                              tcp_select_window
 u32                           write_seq               read_write                              tcp_rate_check_app_limited,tcp_write_queue_empty,tcp_skb_entail,forced_push,tcp_mark_push
 u32                           notsent_lowat           read_mostly                             tcp_stream_memory_free
 u32                           pushed_seq              read_write                              tcp_mark_push,forced_push
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f72eef31fa23cc584f2f0cefacdc35cae43aa52d..73aa2e0ccd1d7a6314a00c27950b019b62a3851c 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -316,6 +316,9 @@ struct tcp_sock {
 					*/
 	u32	app_limited;	/* limited until "delivered" reaches this val */
 	u32	rcv_wnd;	/* Current receiver window		*/
+	u32	rcv_mwnd_seq;	/* Maximum window sequence number (RFC 7323,
+				 * section 2.4, receiver requirements)
+				 */
 	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
 /*
  *      Options received (usually on last packet, some only on SYN packets).
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a6464142380696e4948a836145ac7aca4ca3ec15..5fa8455ee9bc52d1434feaf82dda80be067a36e6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -921,6 +921,28 @@ static inline u32 tcp_receive_window(const struct tcp_sock *tp)
 	return (u32) win;
 }
 
+/* Compute the maximum receive window we ever advertised.
+ * Rcv_nxt can be after the window if our peer pushes more data
+ * than the offered window.
+ */
+static inline u32 tcp_max_receive_window(const struct tcp_sock *tp)
+{
+	s32 win = tp->rcv_mwnd_seq - tp->rcv_nxt;
+
+	if (win < 0)
+		win = 0;
+	return (u32) win;
+}
+
+/* Check if we need to update the maximum receive window sequence number */
+static inline void tcp_update_max_rcv_wnd_seq(struct tcp_sock *tp)
+{
+	u32 wre = tp->rcv_wup + tp->rcv_wnd;
+
+	if (after(wre, tp->rcv_mwnd_seq))
+		tp->rcv_mwnd_seq = wre;
+}
+
 /* Choose a new window, without checks for shrinking, and without
  * scaling applied to the result.  The caller does these things
  * if necessary.  This is a "raw" window selection.
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ed6f6712f06076dc33af61947782bde436dde15e..516087c622ade78883ca41e4f883740e305035a0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3561,6 +3561,7 @@ static int tcp_repair_set_window(struct tcp_sock *tp, sockptr_t optbuf, int len)
 
 	tp->rcv_wnd	= opt.rcv_wnd;
 	tp->rcv_wup	= opt.rcv_wup;
+	tp->rcv_mwnd_seq = opt.rcv_wup + opt.rcv_wnd;
 
 	return 0;
 }
@@ -5275,6 +5276,7 @@ static void __init tcp_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, received_ecn_bytes);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, app_limited);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_wnd);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_mwnd_seq);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_tstamp);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rx_opt);
 
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 9fdc19accafd23c6ab74bd82f7a7d82de1d60b90..4e389d609f919c17435509c5007bc3b2a13eac6c 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -377,6 +377,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
 
 	tcp_rsk(req)->rcv_nxt = tp->rcv_nxt;
 	tp->rcv_wup = tp->rcv_nxt;
+	tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
 	/* tcp_conn_request() is sending the SYNACK,
 	 * and queues the child into listener accept queue.
 	 */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 71ac69b7b75e4919f69631a4894421fa4e417c95..2e1b237608150c2e9c9baf73cf047ed0823ca555 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4808,20 +4808,18 @@ static enum skb_drop_reason tcp_sequence(const struct sock *sk,
 					 const struct tcphdr *th)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
-	u32 seq_limit;
 
 	if (before(end_seq, tp->rcv_wup))
 		return SKB_DROP_REASON_TCP_OLD_SEQUENCE;
 
-	seq_limit = tp->rcv_nxt + tcp_receive_window(tp);
-	if (unlikely(after(end_seq, seq_limit))) {
+	if (unlikely(after(end_seq, tp->rcv_nxt + tcp_max_receive_window(tp)))) {
 		/* Some stacks are known to handle FIN incorrectly; allow the
 		 * FIN to extend beyond the window and check it in detail later.
 		 */
-		if (!after(end_seq - th->fin, seq_limit))
+		if (!after(end_seq - th->fin, tp->rcv_nxt + tcp_receive_window(tp)))
 			return SKB_NOT_DROPPED_YET;
 
-		if (after(seq, seq_limit))
+		if (after(seq, tp->rcv_nxt + tcp_max_receive_window(tp)))
 			return SKB_DROP_REASON_TCP_INVALID_SEQUENCE;
 
 		/* Only accept this packet if receive queue is empty. */
@@ -6903,6 +6901,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		 */
 		WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1);
 		tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
+		tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
 
 		/* RFC1323: The window in SYN & SYN/ACK segments is
 		 * never scaled.
@@ -7015,6 +7014,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1);
 		WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
 		tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
+		tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
 
 		/* RFC1323: The window in SYN & SYN/ACK segments is
 		 * never scaled.
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index dafb63b923d0d08cb1a0e9a37d8ec025386a960a..d350d794a959720853ffd8937cfdc34c03e2ce30 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -604,6 +604,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	newtp->window_clamp = req->rsk_window_clamp;
 	newtp->rcv_ssthresh = req->rsk_rcv_wnd;
 	newtp->rcv_wnd = req->rsk_rcv_wnd;
+	newtp->rcv_mwnd_seq = newtp->rcv_wup + req->rsk_rcv_wnd;
 	newtp->rx_opt.wscale_ok = ireq->wscale_ok;
 	if (newtp->rx_opt.wscale_ok) {
 		newtp->rx_opt.snd_wscale = ireq->snd_wscale;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f0ebcc7e287173be6198fd100130e7ba1a1dbf03..c86910d147f2394bf414d7691d8f90ed41c1b0e3 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -293,6 +293,7 @@ static u16 tcp_select_window(struct sock *sk)
 		tp->pred_flags = 0;
 		tp->rcv_wnd = 0;
 		tp->rcv_wup = tp->rcv_nxt;
+		tcp_update_max_rcv_wnd_seq(tp);
 		return 0;
 	}
 
@@ -316,6 +317,7 @@ static u16 tcp_select_window(struct sock *sk)
 
 	tp->rcv_wnd = new_win;
 	tp->rcv_wup = tp->rcv_nxt;
+	tcp_update_max_rcv_wnd_seq(tp);
 
 	/* Make sure we do not exceed the maximum possible
 	 * scaled window.
@@ -4195,6 +4197,7 @@ static void tcp_connect_init(struct sock *sk)
 	else
 		tp->rcv_tstamp = tcp_jiffies32;
 	tp->rcv_wup = tp->rcv_nxt;
+	tp->rcv_mwnd_seq = tp->rcv_nxt + tp->rcv_wnd;
 	WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
 
 	inet_csk(sk)->icsk_rto = tcp_timeout_init(sk);
diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
index 6c0f32c40f19be2a750fc9d69bbf64250cd7b525..12882be10f2e0cf19e6bc7bd2479b27c11ce8ac0 100644
--- a/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_big_endseq.pkt
@@ -36,7 +36,7 @@
 
   +0 read(4, ..., 100000) = 4000
 
-// If queue is empty, accept a packet even if its end_seq is above wup + rcv_wnd
+// If queue is empty, accept a packet even if its end_seq is above rcv_mwnd_seq
   +0 < P. 4001:54001(50000) ack 1 win 257
    * > .  1:1(0) ack 54001 win 0
 

-- 
2.53.0




* [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd
  2026-03-09  8:02 [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
  2026-03-09  8:02 ` [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
@ 2026-03-09  8:02 ` Simon Baatz via B4 Relay
  2026-03-10  8:46   ` Eric Dumazet
  2026-03-11 18:27   ` Matthieu Baerts
  2026-03-09  8:02 ` [PATCH net-next v3 3/6] tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW Simon Baatz via B4 Relay
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-03-09  8:02 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Matthieu Baerts, Mat Martineau,
	Geliang Tang
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

MPTCP shares a receive window across subflows and applies it at the
subflow level by adjusting each subflow's rcv_wnd when needed.  With
the new TCP tracking of the maximum advertised window sequence,
rcv_mwnd_seq must stay consistent with these subflow-level rcv_wnd
adjustments.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
 net/mptcp/options.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 43df4293f58bfbd8a8df6bf24b9f15e0f9e238f6..8a1c5698983cff3082d68290626dd8f1e044527f 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1076,6 +1076,7 @@ static void rwin_update(struct mptcp_sock *msk, struct sock *ssk,
 	 * resync.
 	 */
 	tp->rcv_wnd += mptcp_rcv_wnd - subflow->rcv_wnd_sent;
+	tcp_update_max_rcv_wnd_seq(tp);
 	subflow->rcv_wnd_sent = mptcp_rcv_wnd;
 }
 
@@ -1338,8 +1339,9 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 		 */
 		rcv_wnd_new = rcv_wnd_old;
 		win = rcv_wnd_old - ack_seq;
-		tp->rcv_wnd = min_t(u64, win, U32_MAX);
-		new_win = tp->rcv_wnd;
+		new_win = min_t(u64, win, U32_MAX);
+		tp->rcv_wnd = new_win;
+		tcp_update_max_rcv_wnd_seq(tp);
 
 		/* Make sure we do not exceed the maximum possible
 		 * scaled window.

-- 
2.53.0




* [PATCH net-next v3 3/6] tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW
  2026-03-09  8:02 [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
  2026-03-09  8:02 ` [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
  2026-03-09  8:02 ` [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd Simon Baatz via B4 Relay
@ 2026-03-09  8:02 ` Simon Baatz via B4 Relay
  2026-03-09  9:27   ` Eric Dumazet
  2026-03-09  8:02 ` [PATCH net-next v3 4/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt Simon Baatz via B4 Relay
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-03-09  8:02 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Matthieu Baerts, Mat Martineau,
	Geliang Tang
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

Since commit 9ca48d616ed7 ("tcp: do not accept packets beyond
window"), the path leading to SKB_DROP_REASON_TCP_OVERWINDOW in
tcp_data_queue() has probably been dead. However, it can now be
reached when tcp_max_receive_window() is larger than
tcp_receive_window(). In that case, increment LINUX_MIB_BEYOND_WINDOW
as done in tcp_sequence().

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
 net/ipv4/tcp_input.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2e1b237608150c2e9c9baf73cf047ed0823ca555..e6b2f4be7723db14acf2ae528df17b6d106b9da9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5678,6 +5678,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	if (!before(TCP_SKB_CB(skb)->seq,
 		    tp->rcv_nxt + tcp_receive_window(tp))) {
 		reason = SKB_DROP_REASON_TCP_OVERWINDOW;
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_BEYOND_WINDOW);
 		goto out_of_window;
 	}
 

-- 
2.53.0




* [PATCH net-next v3 4/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
  2026-03-09  8:02 [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
                   ` (2 preceding siblings ...)
  2026-03-09  8:02 ` [PATCH net-next v3 3/6] tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW Simon Baatz via B4 Relay
@ 2026-03-09  8:02 ` Simon Baatz via B4 Relay
  2026-03-10  8:46   ` Eric Dumazet
  2026-03-09  8:02 ` [PATCH net-next v3 5/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt Simon Baatz via B4 Relay
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-03-09  8:02 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Matthieu Baerts, Mat Martineau,
	Geliang Tang
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

This test verifies
- the sequence number checks using the maximum advertised window
  sequence number and
- the logic for handling received data in tcp_data_queue()

for the cases:

1. The window is reduced to zero because of memory pressure

2. The window grows again but still does not reach the originally
   advertised window

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
 .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt   | 132 +++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt
new file mode 100644
index 0000000000000000000000000000000000000000..69b060c548eac50f5dc5c034c90f0b8eae4b4fa6
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+// When tcp_receive_window() < tcp_max_receive_window(), tcp_sequence() accepts
+// packets that would be dropped under normal conditions (i.e. tcp_receive_window()
+// equal to tcp_max_receive_window()).
+// Test that such packets are handled as expected for RWIN == 0 and for RWIN > 0.
+
+--mss=1000
+
+`./defaults.sh`
+
+    0 `nstat -n`
+
+// Establish a connection.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1000000], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792 <mss 1000,nop,nop,sackOK,nop,wscale 7>
+   +0 > S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK,nop,wscale 4>
+   +0 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+// Put 1040000 bytes into the receive buffer
+   +0 < P. 1:65001(65000) ack 1 win 257
+    * > .  1:1(0) ack 65001
+   +0 < P. 65001:130001(65000) ack 1 win 257
+    * > .  1:1(0) ack 130001
+   +0 < P. 130001:195001(65000) ack 1 win 257
+    * > .  1:1(0) ack 195001
+   +0 < P. 195001:260001(65000) ack 1 win 257
+    * > .  1:1(0) ack 260001
+   +0 < P. 260001:325001(65000) ack 1 win 257
+    * > .  1:1(0) ack 325001
+   +0 < P. 325001:390001(65000) ack 1 win 257
+    * > .  1:1(0) ack 390001
+   +0 < P. 390001:455001(65000) ack 1 win 257
+    * > .  1:1(0) ack 455001
+   +0 < P. 455001:520001(65000) ack 1 win 257
+    * > .  1:1(0) ack 520001
+   +0 < P. 520001:585001(65000) ack 1 win 257
+    * > .  1:1(0) ack 585001
+   +0 < P. 585001:650001(65000) ack 1 win 257
+    * > .  1:1(0) ack 650001
+   +0 < P. 650001:715001(65000) ack 1 win 257
+    * > .  1:1(0) ack 715001
+   +0 < P. 715001:780001(65000) ack 1 win 257
+    * > .  1:1(0) ack 780001
+   +0 < P. 780001:845001(65000) ack 1 win 257
+    * > .  1:1(0) ack 845001
+   +0 < P. 845001:910001(65000) ack 1 win 257
+    * > .  1:1(0) ack 910001
+   +0 < P. 910001:975001(65000) ack 1 win 257
+    * > .  1:1(0) ack 975001
+   +0 < P. 975001:1040001(65000) ack 1 win 257
+    * > .  1:1(0) ack 1040001
+
+// Trigger an extreme memory squeeze by shrinking SO_RCVBUF
+   +0 setsockopt(4, SOL_SOCKET, SO_RCVBUF, [16000], 4) = 0
+
+   +0 < P. 1040001:1105001(65000) ack 1 win 257
+    * > .  1:1(0) ack 1040001 win 0
+// Check LINUX_MIB_TCPRCVQDROP has been incremented
+   +0 `nstat -s | grep TcpExtTCPRcvQDrop| grep -q " 1 "`
+
+// RWIN == 0: rcv_wup = 1040001, rcv_wnd = 0, rcv_mwnd_seq > 1105001 (significantly larger, typically ~1970000)
+
+// Accept pure ack with seq in max adv. window
+   +0 write(4, ..., 1000) = 1000
+   +0 > P. 1:1001(1000) ack 1040001 win 0
+   +0 < .  1105001:1105001(0) ack 1001 win 257
+
+// In order segment, in max adv. window -> drop (SKB_DROP_REASON_TCP_ZEROWINDOW)
+   +0 < P. 1040001:1041001(1000) ack 1001 win 257
+   +0 > .  1001:1001(0) ack 1040001 win 0
+// Ooo partial segment, in max adv. window -> drop (SKB_DROP_REASON_TCP_ZEROWINDOW)
+   +0 < P. 1039001:1041001(2000) ack 1001 win 257
+   +0 > .  1001:1001(0) ack 1040001 win 0 <nop,nop,sack 1039001:1040001>
+// Check LINUX_MIB_TCPZEROWINDOWDROP has been incremented twice
+   +0 `nstat -s | grep TcpExtTCPZeroWindowDrop| grep -q " 2 "`
+
+// Ooo segment, in max adv. window -> drop (SKB_DROP_REASON_TCP_OVERWINDOW)
+   +0 < P. 1105001:1106001(1000) ack 1001 win 257
+   +0 > .  1001:1001(0) ack 1040001 win 0
+// Ooo segment, beyond max adv. window -> drop (SKB_DROP_REASON_TCP_INVALID_SEQUENCE)
+   +0 < P. 2000001:2001001(1000) ack 1001 win 257
+   +0 > .  1001:1001(0) ack 1040001 win 0
+// Check LINUX_MIB_BEYOND_WINDOW has been incremented twice
+   +0 `nstat -s | grep TcpExtBeyondWindow | grep -q " 2 "`
+
+// Read all data
+   +0 read(4, ..., 2000000) = 1040000
+    * > .  1001:1001(0) ack 1040001
+
+// RWIN > 0: rcv_wup = 1040001, 0 < rcv_wnd < 32000, rcv_mwnd_seq > 1105001 (significantly larger, typically ~1970000)
+
+// Accept pure ack with seq in max adv. window, beyond adv. window
+   +0 write(4, ..., 1000) = 1000
+   +0 > P.  1001:2001(1000) ack 1040001
+   +0 < . 1105001:1105001(0) ack 2001 win 257
+
+// In order segment, in max adv. window, in adv. window -> accept
+// Note: This also ensures that we cannot hit the empty queue exception in tcp_sequence() in the following tests
+   +0 < P. 1040001:1041001(1000) ack 2001 win 257
+    * > .  2001:2001(0) ack 1041001
+
+// Ooo partial segment, in adv. window -> accept
+   +0 < P. 1040001:1042001(2000) ack 2001 win 257
+   +0 > .  2001:2001(0) ack 1042001 <nop,nop,sack 1040001:1041001>
+
+// Ooo segment, in max adv. window, beyond adv. window -> drop (SKB_DROP_REASON_TCP_OVERWINDOW)
+   +0 < P. 1105001:1106001(1000) ack 2001 win 257
+   +0 > .  2001:2001(0) ack 1042001
+// Ooo segment, beyond max adv. window, beyond adv. window -> drop (SKB_DROP_REASON_TCP_INVALID_SEQUENCE)
+   +0 < P. 2000001:2001001(1000) ack 2001 win 257
+   +0 > .  2001:2001(0) ack 1042001
+// Check LINUX_MIB_BEYOND_WINDOW has been incremented twice more (4 in total)
+   +0 `nstat -s | grep TcpExtBeyondWindow | grep -q " 4 "`
+
+// We are allowed to go beyond the window and buffer with one packet
+   +0 < P. 1042001:1062001(20000) ack 2001 win 257
+    * > .  2001:2001(0) ack 1062001
+   +0 < P. 1062001:1082001(20000) ack 2001 win 257
+    * > .  2001:2001(0) ack 1082001 win 0
+
+// But not more: In order segment, in max adv. window -> drop (SKB_DROP_REASON_TCP_ZEROWINDOW)
+   +0 < P. 1082001:1083001(1000) ack 2001 win 257
+    * > .  2001:2001(0) ack 1082001
+// Check LINUX_MIB_TCPZEROWINDOWDROP has been incremented again
+   +0 `nstat -s | grep TcpExtTCPZeroWindowDrop| grep -q " 3 "`

-- 
2.53.0




* [PATCH net-next v3 5/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
  2026-03-09  8:02 [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
                   ` (3 preceding siblings ...)
  2026-03-09  8:02 ` [PATCH net-next v3 4/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt Simon Baatz via B4 Relay
@ 2026-03-09  8:02 ` Simon Baatz via B4 Relay
  2026-03-10  8:52   ` Eric Dumazet
  2026-03-09  8:02 ` [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt Simon Baatz via B4 Relay
  2026-03-14 15:40 ` [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling patchwork-bot+netdevbpf
  6 siblings, 1 reply; 27+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-03-09  8:02 UTC
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Matthieu Baerts, Mat Martineau,
	Geliang Tang
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

This test verifies the sequence number checks using the maximum
advertised window sequence number when net.ipv4.tcp_shrink_window
is enabled.
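
The window edges the test script reasons about follow from the ACK
number and the scaled window field. A minimal sketch of that
arithmetic, with advertised_right_edge() as a hypothetical helper
name and the values (wscale 10, win 15 and win 13) taken from the
script's own comment:

```c
#include <stdint.h>

/* Right window edge announced by an ACK: ACK number plus the scaled window field. */
static uint32_t advertised_right_edge(uint32_t ack_seq, uint16_t win_field,
				      unsigned int wscale)
{
	return ack_seq + ((uint32_t)win_field << wscale);
}
```

With these inputs, advertised_right_edge(10001, 15, 10) gives 25361
(the maximum edge) and advertised_right_edge(11024, 13, 10) gives
24336 (the last advertised edge), so the retracted window ends 1025
bytes before the maximum one.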

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
 .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt | 40 ++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt
new file mode 100644
index 0000000000000000000000000000000000000000..6af0e0eb183a0d2fa474c304d969ce6ddeb2a1e1
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh
+sysctl -q net.ipv4.tcp_shrink_window=1
+sysctl -q net.ipv4.tcp_rmem="4096 32768 $((32*1024*1024))"`
+
+   0 `nstat -n`
+
+// Establish a connection.
+  +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+  +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+  +0 bind(3, ..., ...) = 0
+  +0 listen(3, 1) = 0
+
+  +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
+  +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 10>
+  +0 < . 1:1(0) ack 1 win 257
+
+  +0 accept(3, ..., ...) = 4
+
+  +0 < P. 1:10001(10000) ack 1 win 257
+   * > .  1:1(0) ack 10001 win 15
+
+  +0 < P. 10001:11024(1023) ack 1 win 257
+   * > .  1:1(0) ack 11024 win 13
+
+// Max window seq advertised 10001 + 15*1024 = 25361, last advertised: 11024 + 13*1024 = 24336
+
+// Segment beyond the max window is dropped
+  +0 < P. 11024:25362(14338) ack 1 win 257
+   * > .  1:1(0) ack 11024 win 13
+
+// Segment using the max window is accepted
+  +0 < P. 11024:25361(14337) ack 1 win 257
+   * > .  1:1(0) ack 25361 win 0
+
+// Check LINUX_MIB_BEYOND_WINDOW has been incremented once
+  +0 `nstat | grep TcpExtBeyondWindow | grep -q " 1 "`

-- 
2.53.0




* [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
  2026-03-09  8:02 [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
                   ` (4 preceding siblings ...)
  2026-03-09  8:02 ` [PATCH net-next v3 5/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt Simon Baatz via B4 Relay
@ 2026-03-09  8:02 ` Simon Baatz via B4 Relay
  2026-03-10  8:54   ` Eric Dumazet
  2026-03-14 15:40 ` [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling patchwork-bot+netdevbpf
  6 siblings, 1 reply; 27+ messages in thread
From: Simon Baatz via B4 Relay @ 2026-03-09  8:02 UTC (permalink / raw)
  To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Matthieu Baerts, Mat Martineau,
	Geliang Tang
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Simon Baatz

From: Simon Baatz <gmbnomis@gmail.com>

The test ensures we correctly apply the maximum advertised window limit
when rcv_nxt advances past rcv_mwnd_seq, so that the "usable window"
is properly clamped to zero rather than becoming negative.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
 .../net/packetdrill/tcp_rcv_neg_window.pkt         | 26 ++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
new file mode 100644
index 0000000000000000000000000000000000000000..15a9b4938f16d175ac54f3fd192ed2b59b0a4399
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh`
+
+// Establish a connection.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
+   +0 > S. 0:0(0) ack 1 win 18980 <mss 1460,nop,wscale 0>
+  +.1 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+// A too big packet is accepted if the receive queue is empty
+   +0 < P. 1:20001(20000) ack 1 win 257
+// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
+   +0 < R. 20001:20001(0) ack 1 win 257
+
+  +.1 %{ assert tcpi_state == TCP_CLOSE, tcpi_state }%
+

-- 
2.53.0



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements
  2026-03-09  8:02 ` [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
@ 2026-03-09  9:22   ` Eric Dumazet
  2026-03-09 18:35     ` Simon Baatz
  2026-03-10  8:58   ` Stefano Brivio
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2026-03-09  9:22 UTC (permalink / raw)
  To: gmbnomis
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> From: Simon Baatz <gmbnomis@gmail.com>
>
> By default, the Linux TCP implementation does not shrink the
> advertised window (RFC 7323 calls this "window retraction") with the
> following exceptions:
>
> - When an incoming segment cannot be added due to the receive buffer
>   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
>   handling of extreme memory squeeze") a zero window will be
>   advertised in this case. It turns out that reaching the required
>   memory pressure is easy when window scaling is in use. In the
>   simplest case, sending a sufficient number of segments smaller than
>   the scale factor to a receiver that does not read data is enough.
>
> - Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
>   allowing the tcp window to shrink") addressed the "eating memory"
>   problem by introducing a sysctl knob that allows shrinking the
>   window before running out of memory.
>
> However, RFC 7323 does not only state that shrinking the window is
> necessary in some cases, it also formulates requirements for TCP
> implementations when doing so (Section 2.4).
>
> This commit addresses the receiver-side requirements: After retracting
> the window, the peer may have a snd_nxt that lies within a previously
> advertised window but is now beyond the retracted window. This means
> that all incoming segments (including pure ACKs) will be rejected
> until the application happens to read enough data to let the peer's
> snd_nxt be in window again (which may be never).
>
> To comply with RFC 7323, the receiver MUST honor any segment that
> would have been in window for any ACK sent by the receiver and, when
> window scaling is in effect, SHOULD track the maximum window sequence
> number it has advertised. This patch tracks that maximum window
> sequence number rcv_mwnd_seq throughout the connection and uses it in
> tcp_sequence() when deciding whether a segment is acceptable.
>
> rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
> tcp_select_window(). If we count tcp_sequence() as fast path, it is
> read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
> cacheline group.
>
> The logic for handling received data in tcp_data_queue() is already
> sufficient and does not need to be updated.
>
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>

...

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f0ebcc7e287173be6198fd100130e7ba1a1dbf03..c86910d147f2394bf414d7691d8f90ed41c1b0e3 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -293,6 +293,7 @@ static u16 tcp_select_window(struct sock *sk)
>                 tp->pred_flags = 0;
>                 tp->rcv_wnd = 0;
>                 tp->rcv_wup = tp->rcv_nxt;
> +               tcp_update_max_rcv_wnd_seq(tp);

Presumably we do not need  tcp_update_max_rcv_wnd_seq() here ?

Otherwise patch looks good, thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 3/6] tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW
  2026-03-09  8:02 ` [PATCH net-next v3 3/6] tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW Simon Baatz via B4 Relay
@ 2026-03-09  9:27   ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2026-03-09  9:27 UTC (permalink / raw)
  To: gmbnomis
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> From: Simon Baatz <gmbnomis@gmail.com>
>
> Since commit 9ca48d616ed7 ("tcp: do not accept packets beyond
> window"), the path leading to SKB_DROP_REASON_TCP_OVERWINDOW in
> tcp_data_queue() is probably dead. However, it can be reached now when
> tcp_max_receive_window() is larger than tcp_receive_window(). In that
> case, increment LINUX_MIB_BEYOND_WINDOW as done in tcp_sequence().
>
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---
>  net/ipv4/tcp_input.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 2e1b237608150c2e9c9baf73cf047ed0823ca555..e6b2f4be7723db14acf2ae528df17b6d106b9da9 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5678,6 +5678,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
>         if (!before(TCP_SKB_CB(skb)->seq,
>                     tp->rcv_nxt + tcp_receive_window(tp))) {
>                 reason = SKB_DROP_REASON_TCP_OVERWINDOW;
> +               NET_INC_STATS(sock_net(sk), LINUX_MIB_BEYOND_WINDOW);
>                 goto out_of_window;
>         }

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements
  2026-03-09  9:22   ` Eric Dumazet
@ 2026-03-09 18:35     ` Simon Baatz
  2026-03-10  7:40       ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Simon Baatz @ 2026-03-09 18:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

Hi Eric,

thank you for the quick review.

On Mon, Mar 09, 2026 at 10:22:39AM +0100, Eric Dumazet wrote:
> On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
> <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> >
> > From: Simon Baatz <gmbnomis@gmail.com>
> >
> > By default, the Linux TCP implementation does not shrink the
> > advertised window (RFC 7323 calls this "window retraction") with the
> > following exceptions:
> >
> > - When an incoming segment cannot be added due to the receive buffer
> >   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
> >   handling of extreme memory squeeze") a zero window will be
> >   advertised in this case. It turns out that reaching the required
> >   memory pressure is easy when window scaling is in use. In the
> >   simplest case, sending a sufficient number of segments smaller than
> >   the scale factor to a receiver that does not read data is enough.
> >
> > - Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
> >   allowing the tcp window to shrink") addressed the "eating memory"
> >   problem by introducing a sysctl knob that allows shrinking the
> >   window before running out of memory.
> >
> > However, RFC 7323 does not only state that shrinking the window is
> > necessary in some cases, it also formulates requirements for TCP
> > implementations when doing so (Section 2.4).
> >
> > This commit addresses the receiver-side requirements: After retracting
> > the window, the peer may have a snd_nxt that lies within a previously
> > advertised window but is now beyond the retracted window. This means
> > that all incoming segments (including pure ACKs) will be rejected
> > until the application happens to read enough data to let the peer's
> > snd_nxt be in window again (which may be never).
> >
> > To comply with RFC 7323, the receiver MUST honor any segment that
> > would have been in window for any ACK sent by the receiver and, when
> > window scaling is in effect, SHOULD track the maximum window sequence
> > number it has advertised. This patch tracks that maximum window
> > sequence number rcv_mwnd_seq throughout the connection and uses it in
> > tcp_sequence() when deciding whether a segment is acceptable.
> >
> > rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
> > tcp_select_window(). If we count tcp_sequence() as fast path, it is
> > read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
> > cacheline group.
> >
> > The logic for handling received data in tcp_data_queue() is already
> > sufficient and does not need to be updated.
> >
> > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> 
> ...
> 
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index f0ebcc7e287173be6198fd100130e7ba1a1dbf03..c86910d147f2394bf414d7691d8f90ed41c1b0e3 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -293,6 +293,7 @@ static u16 tcp_select_window(struct sock *sk)
> >                 tp->pred_flags = 0;
> >                 tp->rcv_wnd = 0;
> >                 tp->rcv_wup = tp->rcv_nxt;
> > +               tcp_update_max_rcv_wnd_seq(tp);
> 
> Presumably we do not need  tcp_update_max_rcv_wnd_seq() here ?

When we don't update here and are forced to accept a beyond-window
packet because the receive queue is empty, we can reach a state where

 rcv_mwnd_seq < rcv_wup + rcv_wnd == rcv_nxt 

I noticed this case when instrumenting the kernel and got violations
of the invariant rcv_wup + rcv_wnd <= rcv_mwnd_seq.

So, while not strictly needed (tcp_max_receive_window() would still
be 0 as rcv_nxt > rcv_mwnd_seq), I opted to include the call here to
keep rcv_mwnd_seq the actual maximum sequence number at all times.
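
To make the state above concrete, here is a minimal userspace sketch of
it. It is a simplified model, not the kernel code: struct rcv_state and
all helper names are made up for illustration, and the update helper
only assumes that tcp_update_max_rcv_wnd_seq() raises rcv_mwnd_seq to
rcv_wup + rcv_wnd when that value is larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the receive-side state discussed above. */
struct rcv_state {
	uint32_t rcv_nxt;
	uint32_t rcv_wup;
	uint32_t rcv_wnd;
	uint32_t rcv_mwnd_seq;
};

/* Wrap-safe "a is after b", same idea as the kernel's after(). */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Assumed behavior: only ever move the maximum forward. */
static void update_max_rcv_wnd_seq(struct rcv_state *s)
{
	uint32_t end = s->rcv_wup + s->rcv_wnd;

	if (seq_after(end, s->rcv_mwnd_seq))
		s->rcv_mwnd_seq = end;
}

/* Usable window against the maximum, clamped to zero. */
static uint32_t max_receive_window(const struct rcv_state *s)
{
	int32_t win = (int32_t)(s->rcv_mwnd_seq - s->rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/* The invariant rcv_wup + rcv_wnd <= rcv_mwnd_seq. */
static bool invariant_holds(const struct rcv_state *s)
{
	return !seq_after(s->rcv_wup + s->rcv_wnd, s->rcv_mwnd_seq);
}

/* Models the zero-window branch, with or without the extra update. */
static void advertise_zero_window(struct rcv_state *s, bool update_max)
{
	s->rcv_wnd = 0;
	s->rcv_wup = s->rcv_nxt;
	if (update_max)
		update_max_rcv_wnd_seq(s);
}
```

Starting from rcv_nxt == rcv_wup == 1, rcv_wnd == 18980 and
rcv_mwnd_seq == 18981, then moving rcv_nxt to 20001 (one oversized
segment accepted), advertise_zero_window(&s, false) leaves the invariant
broken while max_receive_window() is still 0; with update_max == true
the invariant holds again.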

> 
> Otherwise patch looks good, thanks.

-- 
Simon Baatz <gmbnomis@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements
  2026-03-09 18:35     ` Simon Baatz
@ 2026-03-10  7:40       ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2026-03-10  7:40 UTC (permalink / raw)
  To: Simon Baatz
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Mon, Mar 9, 2026 at 7:35 PM Simon Baatz <gmbnomis@gmail.com> wrote:
>
> Hi Eric,
>
> thank you for the quick review.
>
> On Mon, Mar 09, 2026 at 10:22:39AM +0100, Eric Dumazet wrote:
> > On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
> > <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> > >
> > > From: Simon Baatz <gmbnomis@gmail.com>
> > >
> > > By default, the Linux TCP implementation does not shrink the
> > > advertised window (RFC 7323 calls this "window retraction") with the
> > > following exceptions:
> > >
> > > - When an incoming segment cannot be added due to the receive buffer
> > >   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
> > >   handling of extreme memory squeeze") a zero window will be
> > >   advertised in this case. It turns out that reaching the required
> > >   memory pressure is easy when window scaling is in use. In the
> > >   simplest case, sending a sufficient number of segments smaller than
> > >   the scale factor to a receiver that does not read data is enough.
> > >
> > > - Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
> > >   allowing the tcp window to shrink") addressed the "eating memory"
> > >   problem by introducing a sysctl knob that allows shrinking the
> > >   window before running out of memory.
> > >
> > > However, RFC 7323 does not only state that shrinking the window is
> > > necessary in some cases, it also formulates requirements for TCP
> > > implementations when doing so (Section 2.4).
> > >
> > > This commit addresses the receiver-side requirements: After retracting
> > > the window, the peer may have a snd_nxt that lies within a previously
> > > advertised window but is now beyond the retracted window. This means
> > > that all incoming segments (including pure ACKs) will be rejected
> > > until the application happens to read enough data to let the peer's
> > > snd_nxt be in window again (which may be never).
> > >
> > > To comply with RFC 7323, the receiver MUST honor any segment that
> > > would have been in window for any ACK sent by the receiver and, when
> > > window scaling is in effect, SHOULD track the maximum window sequence
> > > number it has advertised. This patch tracks that maximum window
> > > sequence number rcv_mwnd_seq throughout the connection and uses it in
> > > tcp_sequence() when deciding whether a segment is acceptable.
> > >
> > > rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
> > > tcp_select_window(). If we count tcp_sequence() as fast path, it is
> > > read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
> > > cacheline group.
> > >
> > > The logic for handling received data in tcp_data_queue() is already
> > > sufficient and does not need to be updated.
> > >
> > > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> >
> > ...
> >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index f0ebcc7e287173be6198fd100130e7ba1a1dbf03..c86910d147f2394bf414d7691d8f90ed41c1b0e3 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -293,6 +293,7 @@ static u16 tcp_select_window(struct sock *sk)
> > >                 tp->pred_flags = 0;
> > >                 tp->rcv_wnd = 0;
> > >                 tp->rcv_wup = tp->rcv_nxt;
> > > +               tcp_update_max_rcv_wnd_seq(tp);
> >
> > Presumably we do not need  tcp_update_max_rcv_wnd_seq() here ?
>
> When we don't update here and are forced to accept a beyond-window
> packet because the receive queue is empty, we can reach a state where
>
>  rcv_mwnd_seq < rcv_wup + rcv_wnd == rcv_nxt
>
> I noticed this case when instrumenting the kernel and got violations
> of the invariant rcv_wup + rcv_wnd <= rcv_mwnd_seq.
>
> So, while not strictly needed (tcp_max_receive_window() would still
> be 0 as rcv_nxt > rcv_mwnd_seq), I opted to include the call here to
> keep rcv_mwnd_seq the actual maximum sequence number at all times.

Fair enough, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd
  2026-03-09  8:02 ` [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd Simon Baatz via B4 Relay
@ 2026-03-10  8:46   ` Eric Dumazet
  2026-03-11 18:27   ` Matthieu Baerts
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2026-03-10  8:46 UTC (permalink / raw)
  To: gmbnomis
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> From: Simon Baatz <gmbnomis@gmail.com>
>
> MPTCP shares a receive window across subflows and applies it at the
> subflow level by adjusting each subflow's rcv_wnd when needed.  With
> the new TCP tracking of the maximum advertised window sequence,
> rcv_mwnd_seq must stay consistent with these subflow-level rcv_wnd
> adjustments.
>
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 4/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
  2026-03-09  8:02 ` [PATCH net-next v3 4/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt Simon Baatz via B4 Relay
@ 2026-03-10  8:46   ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2026-03-10  8:46 UTC (permalink / raw)
  To: gmbnomis
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> From: Simon Baatz <gmbnomis@gmail.com>
>
> This test verifies
> - the sequence number checks using the maximum advertised window
>   sequence number and
> - the logic for handling received data in tcp_data_queue()
>
> for the cases:
>
> 1. The window is reduced to zero because of memory
>
> 2. The window grows again but still does not reach the originally
>    advertised window
>
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !

> +// Ooo partial segment, in adv. window -> accept
> +   +0 < P. 1040001:1042001(2000) ack 2001 win 257
> +   +0 > .  2001:2001(0) ack 1042001 <nop,nop,sack 1040001:1041001>
> +
> +// Ooo segment, in max adv. window, beyond adv. window -> drop (SKB_DROP_REASON_TCP_OVERWINDOW)
> +   +0 < P. 1105001:1106001(1000) ack 2001 win 257
> +   +0 > .  2001:2001(0) ack 1042001
> +// Ooo segment, beyond max adv. window, beyond adv. window -> drop (SKB_DROP_REASON_TCP_INVALID_SEQUENCE)
> +   +0 < P. 2000001:2001001(1000) ack 2001 win 257
> +   +0 > .  2001:2001(0) ack 1042001
> +// Check LINUX_MIB_BEYOND_WINDOW has been incremented twice
> +   +0 `nstat -s | grep TcpExtBeyondWindow | grep -q " 4 "`
> +
> +// We are allowed to go beyond the window and buffer with one packet
> +   +0 < P. 1042001:1062001(20000) ack 2001 win 257
> +    * > .  2001:2001(0) ack 1062001
> +   +0 < P. 1062001:1082001(20000) ack 2001 win 257
> +    * > .  2001:2001(0) ack 1082001 win 0
> +
> +// But not more: In order segment, in max adv. window -> drop (SKB_DROP_REASON_TCP_ZEROWINDOW)
> +   +0 < P. 1082001:1083001(1000) ack 2001 win 257
> +    * > .  2001:2001(0) ack 1082001
> +// Check LINUX_MIB_TCPZEROWINDOWDROP has been incremented again
> +   +0 `nstat -s | grep TcpExtTCPZeroWindowDrop| grep -q " 3 "`
>
> --
> 2.53.0
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 5/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
  2026-03-09  8:02 ` [PATCH net-next v3 5/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt Simon Baatz via B4 Relay
@ 2026-03-10  8:52   ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2026-03-10  8:52 UTC (permalink / raw)
  To: gmbnomis
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> From: Simon Baatz <gmbnomis@gmail.com>
>
> This test verifies the sequence number checks using the maximum
> advertised window sequence number when net.ipv4.tcp_shrink_window
> is enabled.
>
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
  2026-03-09  8:02 ` [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt Simon Baatz via B4 Relay
@ 2026-03-10  8:54   ` Eric Dumazet
  2026-03-10 23:09     ` Simon Baatz
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2026-03-10  8:54 UTC (permalink / raw)
  To: gmbnomis
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> From: Simon Baatz <gmbnomis@gmail.com>
>
> The test ensures we correctly apply the maximum advertised window limit
> when rcv_nxt advances past rcv_mwnd_seq, so that the "usable window"
> is properly clamped to zero rather than becoming negative.
>
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---
>  .../net/packetdrill/tcp_rcv_neg_window.pkt         | 26 ++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> new file mode 100644
> index 0000000000000000000000000000000000000000..15a9b4938f16d175ac54f3fd192ed2b59b0a4399
> --- /dev/null
> +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +--mss=1000
> +
> +`./defaults.sh`
> +
> +// Establish a connection.
> +   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0
> +   +0 bind(3, ..., ...) = 0
> +   +0 listen(3, 1) = 0
> +
> +   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
> +   +0 > S. 0:0(0) ack 1 win 18980 <mss 1460,nop,wscale 0>
> +  +.1 < . 1:1(0) ack 1 win 257
> +
> +   +0 accept(3, ..., ...) = 4
> +
> +// A too big packet is accepted if the receive queue is empty
> +   +0 < P. 1:20001(20000) ack 1 win 257

We do not see the answer, it seems this test is not complete ?

> +// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
> +   +0 < R. 20001:20001(0) ack 1 win 257
> +
> +  +.1 %{ assert tcpi_state == TCP_CLOSE, tcpi_state }%
> +
>
> --
> 2.53.0
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements
  2026-03-09  8:02 ` [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
  2026-03-09  9:22   ` Eric Dumazet
@ 2026-03-10  8:58   ` Stefano Brivio
  2026-03-10 22:34     ` Simon Baatz
  1 sibling, 1 reply; 27+ messages in thread
From: Stefano Brivio @ 2026-03-10  8:58 UTC (permalink / raw)
  To: Simon Baatz via B4 Relay
  Cc: gmbnomis, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Shuah Khan, David Ahern, Jon Maloy, Jason Xing,
	mfreemon, Shuah Khan, Matthieu Baerts, Mat Martineau,
	Geliang Tang, netdev, linux-doc, linux-kernel, linux-kselftest,
	mptcp

Simon,

On Mon, 09 Mar 2026 09:02:26 +0100
Simon Baatz via B4 Relay <devnull+gmbnomis.gmail.com@kernel.org> wrote:

> [...]
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index f72eef31fa23cc584f2f0cefacdc35cae43aa52d..73aa2e0ccd1d7a6314a00c27950b019b62a3851c 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -316,6 +316,9 @@ struct tcp_sock {
>  					*/
>  	u32	app_limited;	/* limited until "delivered" reaches this val */
>  	u32	rcv_wnd;	/* Current receiver window		*/
> +	u32	rcv_mwnd_seq;	/* Maximum window sequence number (RFC 7323,
> +				 * section 2.4, receiver requirements)
> +				 */

I didn't follow the rest of the discussion but, at this point, what
does this mean for applications (CRIU, passt) dumping/restoring socket
data? Do they have to adapt? I couldn't find this bit of information
anywhere in v3.

-- 
Stefano


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements
  2026-03-10  8:58   ` Stefano Brivio
@ 2026-03-10 22:34     ` Simon Baatz
  0 siblings, 0 replies; 27+ messages in thread
From: Simon Baatz @ 2026-03-10 22:34 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Simon Baatz via B4 Relay, Eric Dumazet, Neal Cardwell,
	Kuniyuki Iwashima, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, David Ahern, Jon Maloy,
	Jason Xing, mfreemon, Shuah Khan, Matthieu Baerts, Mat Martineau,
	Geliang Tang, netdev, linux-doc, linux-kernel, linux-kselftest,
	mptcp

Hi Stefano,

On Tue, Mar 10, 2026 at 09:58:07AM +0100, Stefano Brivio wrote:
> Simon,
> 
> On Mon, 09 Mar 2026 09:02:26 +0100
> Simon Baatz via B4 Relay <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> 
> > [...]
> >
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index f72eef31fa23cc584f2f0cefacdc35cae43aa52d..73aa2e0ccd1d7a6314a00c27950b019b62a3851c 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -316,6 +316,9 @@ struct tcp_sock {
> >  					*/
> >  	u32	app_limited;	/* limited until "delivered" reaches this val */
> >  	u32	rcv_wnd;	/* Current receiver window		*/
> > +	u32	rcv_mwnd_seq;	/* Maximum window sequence number (RFC 7323,
> > +				 * section 2.4, receiver requirements)
> > +				 */
> 
> I didn't follow the rest of the discussion but, at this point, what
> does this mean for applications (CRIU, passt) dumping/restoring socket
> data? Do they have to adapt? I couldn't find this bit of information
> anywhere in v3.

Based on our discussion, the "Setting the TCP_REPAIR_WINDOW socket
option initializes rcv_mwnd_seq" v2 change addresses TCP window
restoration.  As we said that information about window retraction is
not crucial, the window will be restored as "non-retracted", matching
prior behavior.  Therefore, there is no change to the information
that applications need to dump or restore.
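
As a rough illustration of what "restored as non-retracted" means, the
sketch below models a restore step in userspace. The struct and function
names are hypothetical and only mirror the rcv_wnd/rcv_wup values a
restorer already supplies; the assumption is that rcv_mwnd_seq is simply
set to the right edge of the restored window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the values supplied on restore. */
struct restored_window {
	uint32_t rcv_wnd;
	uint32_t rcv_wup;
};

/* Hypothetical receive-side state of the restored socket. */
struct rcv_state {
	uint32_t rcv_wup;
	uint32_t rcv_wnd;
	uint32_t rcv_mwnd_seq;
};

/*
 * Assumed behavior: the maximum advertised window sequence equals the
 * right edge of the restored window, i.e. no retraction is remembered.
 */
static void restore_rcv_window(struct rcv_state *s,
			       const struct restored_window *w)
{
	s->rcv_wnd = w->rcv_wnd;
	s->rcv_wup = w->rcv_wup;
	s->rcv_mwnd_seq = w->rcv_wup + w->rcv_wnd;
}
```

Under this model the dumping side needs no additional field, since
rcv_mwnd_seq is derived from values that are dumped already.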

-- 
Simon Baatz <gmbnomis@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
  2026-03-10  8:54   ` Eric Dumazet
@ 2026-03-10 23:09     ` Simon Baatz
  2026-03-14  3:58       ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Simon Baatz @ 2026-03-10 23:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

Hi Eric,

On Tue, Mar 10, 2026 at 09:54:58AM +0100, Eric Dumazet wrote:
> On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
> <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> >
> > From: Simon Baatz <gmbnomis@gmail.com>
> >
> > The test ensures we correctly apply the maximum advertised window limit
> > when rcv_nxt advances past rcv_mwnd_seq, so that the "usable window"
> > is properly clamped to zero rather than becoming negative.
> >
> > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > ---
> >  .../net/packetdrill/tcp_rcv_neg_window.pkt         | 26 ++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..15a9b4938f16d175ac54f3fd192ed2b59b0a4399
> > --- /dev/null
> > +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> > @@ -0,0 +1,26 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +--mss=1000
> > +
> > +`./defaults.sh`
> > +
> > +// Establish a connection.
> > +   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> > +   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > +   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0
> > +   +0 bind(3, ..., ...) = 0
> > +   +0 listen(3, 1) = 0
> > +
> > +   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
> > +   +0 > S. 0:0(0) ack 1 win 18980 <mss 1460,nop,wscale 0>
> > +  +.1 < . 1:1(0) ack 1 win 257
> > +
> > +   +0 accept(3, ..., ...) = 4
> > +
> > +// A too big packet is accepted if the receive queue is empty
> > +   +0 < P. 1:20001(20000) ack 1 win 257
> 
> We do not see the answer, it seems this test is not complete ?

Actually we do not want to see an answer.  The packet won't trigger
an immediate ACK (it is larger than the advertised window, but does
not cause immediate memory pressure).

When we then send a RST before the delayed ACK would be generated:
 
> > +// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
> > +   +0 < R. 20001:20001(0) ack 1 win 257

We are in a state where rcv_wup, rcv_wnd, and rcv_mwnd_seq have not
been updated yet, but we must still accept the RST
(rcv_nxt == 20001 > rcv_mwnd_seq, tcp_max_receive_window() == 0).
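
A small userspace model of this check, using the numbers from the test
(rcv_mwnd_seq == 18981 from the SYN-ACK window, rcv_nxt == 20001), may
help. The helper names are made up, and seq_acceptable() is a
simplification of the right-edge test in tcp_sequence(), not a copy of
it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b", same idea as the kernel's after(). */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Window between rcv_nxt and the maximum advertised sequence.
 * With clamp == false a negative distance wraps to a huge u32,
 * which is the failure mode the test guards against.
 */
static uint32_t usable_window(uint32_t rcv_mwnd_seq, uint32_t rcv_nxt,
			      bool clamp)
{
	int32_t win = (int32_t)(rcv_mwnd_seq - rcv_nxt);

	if (clamp && win < 0)
		win = 0;
	return (uint32_t)win;
}

/* Simplified right-edge check: seq must not lie past rcv_nxt + window. */
static bool seq_acceptable(uint32_t seq, uint32_t rcv_nxt, uint32_t window)
{
	return !seq_after(seq, rcv_nxt + window);
}
```

With the clamp, the window is 0 and a RST at seq 20001 equals rcv_nxt,
so it passes. Without the clamp, rcv_nxt + window wraps back to 18981
and the same RST would be treated as out of window.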

> > +
> > +  +.1 %{ assert tcpi_state == TCP_CLOSE, tcpi_state }%

And we verify that we accepted the RST here.

Given how subtle this sequence is, and considering the limited value
of this test, I am also fine with dropping it if it is too fragile or
confusing.
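
For readers following the arithmetic, the clamping described above can
be sketched in plain C.  This is a minimal userspace model, not the
kernel code: struct rcv_state and max_receive_window_sketch() are
hypothetical names, and the assumption is that the usable window is the
signed distance from rcv_nxt to rcv_mwnd_seq, clamped at zero.

```c
#include <stdint.h>

/* Hypothetical stand-in for the receiver state discussed in this thread.
 * The field names mirror the discussion; this is not struct tcp_sock. */
struct rcv_state {
	uint32_t rcv_nxt;      /* next sequence number expected */
	uint32_t rcv_mwnd_seq; /* highest right window edge advertised so far */
};

/* Assumed behaviour: distance from rcv_nxt to the maximum advertised
 * right edge, computed modulo 2^32 and clamped to zero once rcv_nxt has
 * moved past that edge.  The cast to int32_t relies on the usual
 * two's-complement conversion to stay correct across sequence wraparound. */
static uint32_t max_receive_window_sketch(const struct rcv_state *s)
{
	int32_t win = (int32_t)(s->rcv_mwnd_seq - s->rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}
```

With the numbers from the test, and assuming rcv_mwnd_seq is 1 + 18980 =
18981 after the handshake, the 20000-byte segment moves rcv_nxt to 20001.
The signed distance is then -1020, which the sketch reports as 0 instead
of a huge unsigned value.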


* Re: [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd
  2026-03-09  8:02 ` [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd Simon Baatz via B4 Relay
  2026-03-10  8:46   ` Eric Dumazet
@ 2026-03-11 18:27   ` Matthieu Baerts
  2026-03-11 22:08     ` Simon Baatz
  1 sibling, 1 reply; 27+ messages in thread
From: Matthieu Baerts @ 2026-03-11 18:27 UTC (permalink / raw)
  To: Simon Baatz
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Mat Martineau, Geliang Tang

Hi Simon,

On 09/03/2026 09:02, Simon Baatz via B4 Relay wrote:
> From: Simon Baatz <gmbnomis@gmail.com>
> 
> MPTCP shares a receive window across subflows and applies it at the
> subflow level by adjusting each subflow's rcv_wnd when needed.  With
> the new TCP tracking of the maximum advertised window sequence,
> rcv_mwnd_seq must stay consistent with these subflow-level rcv_wnd
> adjustments.

Thank you for these modifications!

> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---
>  net/mptcp/options.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 43df4293f58bfbd8a8df6bf24b9f15e0f9e238f6..8a1c5698983cff3082d68290626dd8f1e044527f 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c

(...)

> @@ -1338,8 +1339,9 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
>  		 */
>  		rcv_wnd_new = rcv_wnd_old;
>  		win = rcv_wnd_old - ack_seq;
> -		tp->rcv_wnd = min_t(u64, win, U32_MAX);
> -		new_win = tp->rcv_wnd;
> +		new_win = min_t(u64, win, U32_MAX);
> +		tp->rcv_wnd = new_win;

Out of curiosity, why did you change the two lines above?
(even if it makes sense, the diff is a bit confusing, and the commit
message doesn't mention this :) )

> +		tcp_update_max_rcv_wnd_seq(tp);


This patch adding this new helper each time rcv_wnd is modified looks
good to me:

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>


Note: just in case a new version is needed, checkpatch reported an error
in patch 4/6 because of a trailing whitespace (+ No space is necessary
after a cast in patch 1/6), see:

  https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22844479818

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd
  2026-03-11 18:27   ` Matthieu Baerts
@ 2026-03-11 22:08     ` Simon Baatz
  2026-03-12 11:01       ` Matthieu Baerts
  0 siblings, 1 reply; 27+ messages in thread
From: Simon Baatz @ 2026-03-11 22:08 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Mat Martineau, Geliang Tang

Hi Matt,

On Wed, Mar 11, 2026 at 07:27:34PM +0100, Matthieu Baerts wrote:
> Hi Simon,
> 
> On 09/03/2026 09:02, Simon Baatz via B4 Relay wrote:
> > From: Simon Baatz <gmbnomis@gmail.com>
> > 
> > MPTCP shares a receive window across subflows and applies it at the
> > subflow level by adjusting each subflow's rcv_wnd when needed.  With
> > the new TCP tracking of the maximum advertised window sequence,
> > rcv_mwnd_seq must stay consistent with these subflow-level rcv_wnd
> > adjustments.
> 
> Thank you for these modifications!
> 
> > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > ---
> >  net/mptcp/options.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > index 43df4293f58bfbd8a8df6bf24b9f15e0f9e238f6..8a1c5698983cff3082d68290626dd8f1e044527f 100644
> > --- a/net/mptcp/options.c
> > +++ b/net/mptcp/options.c
> 
> (...)
> 
> > @@ -1338,8 +1339,9 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
> >  		 */
> >  		rcv_wnd_new = rcv_wnd_old;
> >  		win = rcv_wnd_old - ack_seq;
> > -		tp->rcv_wnd = min_t(u64, win, U32_MAX);
> > -		new_win = tp->rcv_wnd;
> > +		new_win = min_t(u64, win, U32_MAX);
> > +		tp->rcv_wnd = new_win;
> 
> Out of curiosity, why did you change the two lines above?
> (even if it makes sense, the diff is a bit confusing, and the commit
> message doesn't mention this :) )

I wanted to keep tcp_update_max_rcv_wnd_seq() calls close to the
respective update sites (same pattern everywhere).  In the original
form

tp->rcv_wnd = min_t(u64, win, U32_MAX);
tcp_update_max_rcv_wnd_seq(tp);
new_win = tp->rcv_wnd;

the ordering suggests that tcp_update_max_rcv_wnd_seq() might modify
tp->rcv_wnd.

So, I changed it for legibility.  Now, I realize it made the
diff harder to read.  I might have optimized the wrong metric here ;-)

> 
> > +		tcp_update_max_rcv_wnd_seq(tp);
> 
> 
> This patch adding this new helper each time rcv_wnd is modified looks
> good to me:
> 
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> 
> 
> Note: just in case a new version is needed, checkpatch reported an error
> in patch 4/6 because of a trailing whitespace (+ No space is necessary
> after a cast in patch 1/6), see:
> 
>   https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22844479818

Thanks. I will change that if there is a v4.


-- 
Simon Baatz <gmbnomis@gmail.com>


* Re: [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd
  2026-03-11 22:08     ` Simon Baatz
@ 2026-03-12 11:01       ` Matthieu Baerts
  0 siblings, 0 replies; 27+ messages in thread
From: Matthieu Baerts @ 2026-03-12 11:01 UTC (permalink / raw)
  To: Simon Baatz
  Cc: netdev, linux-doc, linux-kernel, linux-kselftest, mptcp,
	Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, David Ahern, Jon Maloy, Jason Xing, mfreemon,
	Shuah Khan, Stefano Brivio, Mat Martineau, Geliang Tang

Hi Simon,

On 11/03/2026 23:08, Simon Baatz wrote:
> On Wed, Mar 11, 2026 at 07:27:34PM +0100, Matthieu Baerts wrote:
>> On 09/03/2026 09:02, Simon Baatz via B4 Relay wrote:
>>> From: Simon Baatz <gmbnomis@gmail.com>
>>>
>>> MPTCP shares a receive window across subflows and applies it at the
>>> subflow level by adjusting each subflow's rcv_wnd when needed.  With
>>> the new TCP tracking of the maximum advertised window sequence,
>>> rcv_mwnd_seq must stay consistent with these subflow-level rcv_wnd
>>> adjustments.
>>
>> Thank you for these modifications!
>>
>>> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
>>> ---
>>>  net/mptcp/options.c | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
>>> index 43df4293f58bfbd8a8df6bf24b9f15e0f9e238f6..8a1c5698983cff3082d68290626dd8f1e044527f 100644
>>> --- a/net/mptcp/options.c
>>> +++ b/net/mptcp/options.c
>>
>> (...)
>>
>>> @@ -1338,8 +1339,9 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
>>>  		 */
>>>  		rcv_wnd_new = rcv_wnd_old;
>>>  		win = rcv_wnd_old - ack_seq;
>>> -		tp->rcv_wnd = min_t(u64, win, U32_MAX);
>>> -		new_win = tp->rcv_wnd;
>>> +		new_win = min_t(u64, win, U32_MAX);
>>> +		tp->rcv_wnd = new_win;
>>
>> Out of curiosity, why did you change the two lines above?
>> (even if it makes sense, the diff is a bit confusing, and the commit
>> message doesn't mention this :) )
> 
> I wanted to keep tcp_update_max_rcv_wnd_seq() calls close to the
> respective update sites (same pattern everywhere).

Thanks, I now understand the reason.

> In the original form
> 
> tp->rcv_wnd = min_t(u64, win, U32_MAX);
> tcp_update_max_rcv_wnd_seq(tp);
> new_win = tp->rcv_wnd;
> 
> the ordering suggests that tcp_update_max_rcv_wnd_seq() might modify
> tp->rcv_wnd.

Note that if tp->rcv_mwnd_seq always needs to be modified when
tp->rcv_wnd and/or tp->rcv_wup are modified, maybe a single helper could
be called to modify all of them, so it might be less likely to forget
about modifying tp->rcv_mwnd_seq as well in the future.

But new places where tp->rcv_wnd and/or tp->rcv_wup need to be modified,
like here with MPTCP, are probably unlikely. So it is probably fine as
it is.
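
The single-helper idea above could look roughly like the following.
This is a userspace sketch under assumptions: struct win_state and both
function names are hypothetical, and the body of the update step
(advance rcv_mwnd_seq only when rcv_wup + rcv_wnd moves the right edge
forward) is inferred from the description in this thread, not copied
from the patch.

```c
#include <stdint.h>

/* Hypothetical model of the three fields discussed in this thread. */
struct win_state {
	uint32_t rcv_wup;      /* rcv_nxt at the last window update sent */
	uint32_t rcv_wnd;      /* currently advertised receive window */
	uint32_t rcv_mwnd_seq; /* highest right window edge advertised so far */
};

/* Assumed behaviour of the tracking step: only ever move the maximum
 * right edge forward, using a wraparound-safe signed comparison. */
static void update_max_rcv_wnd_seq_sketch(struct win_state *s)
{
	uint32_t edge = s->rcv_wup + s->rcv_wnd;

	if ((int32_t)(edge - s->rcv_mwnd_seq) > 0)
		s->rcv_mwnd_seq = edge;
}

/* Hypothetical combined setter: callers cannot update rcv_wup/rcv_wnd
 * without also refreshing rcv_mwnd_seq. */
static void set_rcv_window_sketch(struct win_state *s, uint32_t wup,
				  uint32_t wnd)
{
	s->rcv_wup = wup;
	s->rcv_wnd = wnd;
	update_max_rcv_wnd_seq_sketch(s);
}
```

In this model, shrinking the window from 18980 to 10000 at the same
rcv_wup leaves rcv_mwnd_seq untouched, which is the property the
receiver needs in order to keep accepting data inside a window it
advertised earlier.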

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
  2026-03-10 23:09     ` Simon Baatz
@ 2026-03-14  3:58       ` Eric Dumazet
  2026-03-14 14:55         ` Eric Dumazet
  2026-03-14 17:07         ` Simon Baatz
  0 siblings, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2026-03-14  3:58 UTC (permalink / raw)
  To: Simon Baatz
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Wed, Mar 11, 2026 at 12:09 AM Simon Baatz <gmbnomis@gmail.com> wrote:
>
> Hi Eric,
>
> On Tue, Mar 10, 2026 at 09:54:58AM +0100, Eric Dumazet wrote:
> > On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
> > <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> > >
> > > From: Simon Baatz <gmbnomis@gmail.com>
> > >
> > > The test ensures we correctly apply the maximum advertised window limit
> > > when rcv_nxt advances past rcv_mwnd_seq, so that the "usable window"
> > > is properly clamped to zero rather than becoming negative.
> > >
> > > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > > ---
> > >  .../net/packetdrill/tcp_rcv_neg_window.pkt         | 26 ++++++++++++++++++++++
> > >  1 file changed, 26 insertions(+)
> > >
> > > diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..15a9b4938f16d175ac54f3fd192ed2b59b0a4399
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> > > @@ -0,0 +1,26 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +
> > > +--mss=1000
> > > +
> > > +`./defaults.sh`
> > > +
> > > +// Establish a connection.
> > > +   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> > > +   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > +   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0
> > > +   +0 bind(3, ..., ...) = 0
> > > +   +0 listen(3, 1) = 0
> > > +
> > > +   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
> > > +   +0 > S. 0:0(0) ack 1 win 18980 <mss 1460,nop,wscale 0>
> > > +  +.1 < . 1:1(0) ack 1 win 257
> > > +
> > > +   +0 accept(3, ..., ...) = 4
> > > +
> > > +// A too big packet is accepted if the receive queue is empty
> > > +   +0 < P. 1:20001(20000) ack 1 win 257
> >
> > We do not see the answer, it seems this test is not complete ?
>
> Actually we do not want to see an answer.  The packet won't trigger
> an immediate ACK (it is larger than the advertised window, but does
> not cause immediate memory pressure).
>
> When we then send a RST before the delayed ACK would be generated:
>
> > > +// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
> > > +   +0 < R. 20001:20001(0) ack 1 win 257
>
> We are in a state where rcv_wup, rcv_wnd, and rcv_mwnd_seq have not
> been updated yet, but we must still accept the RST
> (rcv_nxt == 20001 > rcv_mwnd_seq, tcp_max_receive_window() == 0)
>
> > > +
> > > +  +.1 %{ assert tcpi_state == TCP_CLOSE, tcpi_state }%
>
> And we verify that we accepted the RST here.
>
> Given how subtle this sequence is, and considering the limited value
> of this test, I am also fine with dropping it if it is too fragile or
> confusing.

Sorry I missed your answer.

Ok then please use :

// A too big packet is accepted if the receive queue is empty
   +0 < P. 1:20001(20000) ack 1 win 257
   +0 %{ assert tcpi_bytes_received == 20000, tcpi_bytes_received;
assert tcpi_bytes_acked == 0, tcpi_bytes_acked }%

// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
   +0 < R. 20001:20001(0) ack 1 win 257

  +.1 %{ assert tcpi_state == TCP_CLOSE, tcpi_state }%



Then add my
Reviewed-by: Eric Dumazet <edumazet@google.com>


* Re: [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
  2026-03-14  3:58       ` Eric Dumazet
@ 2026-03-14 14:55         ` Eric Dumazet
  2026-03-14 15:01           ` Jakub Kicinski
  2026-03-14 17:07         ` Simon Baatz
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2026-03-14 14:55 UTC (permalink / raw)
  To: Simon Baatz
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Sat, Mar 14, 2026 at 4:58 AM Eric Dumazet <edumazet@google.com> wrote:
>

>
> Then add my
> Reviewed-by: Eric Dumazet <edumazet@google.com>

BTW, this can be done in a followup.

Jakub/Paolo feel free to apply v3 series if this is still possible.

Thanks !


* Re: [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
  2026-03-14 14:55         ` Eric Dumazet
@ 2026-03-14 15:01           ` Jakub Kicinski
  0 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2026-03-14 15:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Baatz, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Sat, 14 Mar 2026 15:55:35 +0100 Eric Dumazet wrote:
> On Sat, Mar 14, 2026 at 4:58 AM Eric Dumazet <edumazet@google.com> wrote:
> > Then add my
> > Reviewed-by: Eric Dumazet <edumazet@google.com>  
> 
> BTW, this can be done in a followup.
> 
> Jakub/Paolo feel free to apply v3 series if this is still possible.

Roger that!

For the record I'll fix the trailing white space on patch 4 when
applying.


* Re: [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling
  2026-03-09  8:02 [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
                   ` (5 preceding siblings ...)
  2026-03-09  8:02 ` [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt Simon Baatz via B4 Relay
@ 2026-03-14 15:40 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 27+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-03-14 15:40 UTC (permalink / raw)
  To: Simon Baatz
  Cc: edumazet, ncardwell, kuniyu, davem, kuba, pabeni, horms, corbet,
	skhan, dsahern, jmaloy, kerneljasonxing, mfreemon, shuah, sbrivio,
	matttbe, martineau, geliang, netdev, linux-doc, linux-kernel,
	linux-kselftest, mptcp

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 09 Mar 2026 09:02:25 +0100 you wrote:
> Hi,
> 
> this series implements the receiver-side requirements for TCP window
> retraction as specified in RFC 7323 and adds packetdrill tests to
> cover the new behavior.
> 
> Please see the first patch for background and implementation
> details. Since MPTCP adjusts the TCP receive window on subflows, the
> relevant MPTCP code paths are updated accordingly.
> 
> [...]

Here is the summary with links:
  - [net-next,v3,1/6] tcp: implement RFC 7323 window retraction receiver requirements
    https://git.kernel.org/netdev/net-next/c/0e24d17bd966
  - [net-next,v3,2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd
    https://git.kernel.org/netdev/net-next/c/81714374a29c
  - [net-next,v3,3/6] tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW
    https://git.kernel.org/netdev/net-next/c/e2b9c52a2b00
  - [net-next,v3,4/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
    https://git.kernel.org/netdev/net-next/c/ec1adf8ecf95
  - [net-next,v3,5/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
    https://git.kernel.org/netdev/net-next/c/ba58b3e70b86
  - [net-next,v3,6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
    https://git.kernel.org/netdev/net-next/c/3eb371eddad0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
  2026-03-14  3:58       ` Eric Dumazet
  2026-03-14 14:55         ` Eric Dumazet
@ 2026-03-14 17:07         ` Simon Baatz
  2026-03-16 21:51           ` Simon Baatz
  1 sibling, 1 reply; 27+ messages in thread
From: Simon Baatz @ 2026-03-14 17:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

Hi Eric,

On Sat, Mar 14, 2026 at 04:58:28AM +0100, Eric Dumazet wrote:
> On Wed, Mar 11, 2026 at 12:09 AM Simon Baatz <gmbnomis@gmail.com> wrote:
> >
> > Hi Eric,
> >
> > On Tue, Mar 10, 2026 at 09:54:58AM +0100, Eric Dumazet wrote:
> > > On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
> > > <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> > > >
> > > > From: Simon Baatz <gmbnomis@gmail.com>
> > > >
> > > > The test ensures we correctly apply the maximum advertised window limit
> > > > when rcv_nxt advances past rcv_mwnd_seq, so that the "usable window"
> > > > is properly clamped to zero rather than becoming negative.
> > > >
> > > > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > > > ---
> > > >  .../net/packetdrill/tcp_rcv_neg_window.pkt         | 26 ++++++++++++++++++++++
> > > >  1 file changed, 26 insertions(+)
> > > >
> > > > diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> > > > new file mode 100644
> > > > index 0000000000000000000000000000000000000000..15a9b4938f16d175ac54f3fd192ed2b59b0a4399
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> > > > @@ -0,0 +1,26 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +--mss=1000
> > > > +
> > > > +`./defaults.sh`
> > > > +
> > > > +// Establish a connection.
> > > > +   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> > > > +   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > > +   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0
> > > > +   +0 bind(3, ..., ...) = 0
> > > > +   +0 listen(3, 1) = 0
> > > > +
> > > > +   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
> > > > +   +0 > S. 0:0(0) ack 1 win 18980 <mss 1460,nop,wscale 0>
> > > > +  +.1 < . 1:1(0) ack 1 win 257
> > > > +
> > > > +   +0 accept(3, ..., ...) = 4
> > > > +
> > > > +// A too big packet is accepted if the receive queue is empty
> > > > +   +0 < P. 1:20001(20000) ack 1 win 257
> > >
> > > We do not see the answer, it seems this test is not complete ?
> >
> > Actually we do not want to see an answer.  The packet won't trigger
> > an immediate ACK (it is larger than the advertised window, but does
> > not cause immediate memory pressure).
> >
> > When we then send a RST before the delayed ACK would be generated:
> >
> > > > +// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
> > > > +   +0 < R. 20001:20001(0) ack 1 win 257
> >
> > We are in a state where rcv_wup, rcv_wnd, and rcv_mwnd_seq have not
> > been updated yet, but we must still accept the RST
> > (rcv_nxt == 20001 > rcv_mwnd_seq, tcp_max_receive_window() == 0)
> >
> > > > +
> > > > +  +.1 %{ assert tcpi_state == TCP_CLOSE, tcpi_state }%
> >
> > And we verify that we accepted the RST here.
> >
> > Given how subtle this sequence is, and considering the limited value
> > of this test, I am also fine with dropping it if it is too fragile or
> > confusing.
> 
> Sorry I missed your answer.
> 
> Ok then please use :
> 
> // A too big packet is accepted if the receive queue is empty
>    +0 < P. 1:20001(20000) ack 1 win 257
>    +0 %{ assert tcpi_bytes_received == 20000, tcpi_bytes_received;
> assert tcpi_bytes_acked == 0, tcpi_bytes_acked }%

Unfortunately, tcpi_bytes_acked is the TX direction, it will always
be 0 here.

Instead, we can still test that the oversized packet is accepted and
indirectly verify that no immediate ACK is sent by eliciting and
checking a RST:

// A too big packet is accepted if the receive queue is empty, but does not trigger
// an immediate ACK.
   +0 < P. 1:20001(20000) ack 1 win 257
   +0 %{ assert tcpi_bytes_received == 20000, tcpi_bytes_received; }%

// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
   +0 < R. 20001:20001(0) ack 1 win 257

// Verify that the RST was accepted. Indirectly this also verifies that no immediate
// ACK was sent for the data packet above.
   +0 < . 20001:20001(0) ack 1 win 257
    * > R 1:1(0)

As the series is merged now (thank you!), I will send this
separately, as suggested.

- Simon

-- 
Simon Baatz <gmbnomis@gmail.com>


* Re: [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt
  2026-03-14 17:07         ` Simon Baatz
@ 2026-03-16 21:51           ` Simon Baatz
  0 siblings, 0 replies; 27+ messages in thread
From: Simon Baatz @ 2026-03-16 21:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	David Ahern, Jon Maloy, Jason Xing, mfreemon, Shuah Khan,
	Stefano Brivio, Matthieu Baerts, Mat Martineau, Geliang Tang,
	netdev, linux-doc, linux-kernel, linux-kselftest, mptcp

On Sat, Mar 14, 2026 at 06:07:05PM +0100, Simon Baatz wrote:
> Hi Eric,
> 
> On Sat, Mar 14, 2026 at 04:58:28AM +0100, Eric Dumazet wrote:
> > On Wed, Mar 11, 2026 at 12:09 AM Simon Baatz <gmbnomis@gmail.com> wrote:
> > >
> > > Hi Eric,
> > >
> > > On Tue, Mar 10, 2026 at 09:54:58AM +0100, Eric Dumazet wrote:
> > > > On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
> > > > <devnull+gmbnomis.gmail.com@kernel.org> wrote:
> > > > >
> > > > > From: Simon Baatz <gmbnomis@gmail.com>
> > > > >
> > > > > The test ensures we correctly apply the maximum advertised window limit
> > > > > when rcv_nxt advances past rcv_mwnd_seq, so that the "usable window"
> > > > > is properly clamped to zero rather than becoming negative.
> > > > >
> > > > > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > > > > ---
> > > > >  .../net/packetdrill/tcp_rcv_neg_window.pkt         | 26 ++++++++++++++++++++++
> > > > >  1 file changed, 26 insertions(+)
> > > > >
> > > > > diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> > > > > new file mode 100644
> > > > > index 0000000000000000000000000000000000000000..15a9b4938f16d175ac54f3fd192ed2b59b0a4399
> > > > > --- /dev/null
> > > > > +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window.pkt
> > > > > @@ -0,0 +1,26 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0
> > > > > +
> > > > > +--mss=1000
> > > > > +
> > > > > +`./defaults.sh`
> > > > > +
> > > > > +// Establish a connection.
> > > > > +   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> > > > > +   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > > > +   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0
> > > > > +   +0 bind(3, ..., ...) = 0
> > > > > +   +0 listen(3, 1) = 0
> > > > > +
> > > > > +   +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
> > > > > +   +0 > S. 0:0(0) ack 1 win 18980 <mss 1460,nop,wscale 0>
> > > > > +  +.1 < . 1:1(0) ack 1 win 257
> > > > > +
> > > > > +   +0 accept(3, ..., ...) = 4
> > > > > +
> > > > > +// A too big packet is accepted if the receive queue is empty
> > > > > +   +0 < P. 1:20001(20000) ack 1 win 257
> > > >
> > > > We do not see the answer, it seems this test is not complete ?
> > >
> > > Actually we do not want to see an answer.  The packet won't trigger
> > > an immediate ACK (it is larger than the advertised window, but does
> > > not cause immediate memory pressure).
> > >
> > > When we then send a RST before the delayed ACK would be generated:
> > >
> > > > > +// Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
> > > > > +   +0 < R. 20001:20001(0) ack 1 win 257
> > >
> > > We are in a state where rcv_wup, rcv_wnd, and rcv_mwnd_seq have not
> > > been updated yet, but we must still accept the RST
> > > (rcv_nxt == 20001 > rcv_mwnd_seq, tcp_max_receive_window() == 0)
> > >
> > > > > +
> > > > > +  +.1 %{ assert tcpi_state == TCP_CLOSE, tcpi_state }%
> > >
> > > And we verify that we accepted the RST here.
> > >
> > > Given how subtle this sequence is, and considering the limited value
> > > of this test, I am also fine with dropping it if it is too fragile or
> > > confusing.
> > 
> > Sorry I missed your answer.
> > 
> > Ok then please use :
> > 
> > // A too big packet is accepted if the receive queue is empty
> >    +0 < P. 1:20001(20000) ack 1 win 257
> >    +0 %{ assert tcpi_bytes_received == 20000, tcpi_bytes_received;
> > assert tcpi_bytes_acked == 0, tcpi_bytes_acked }%
> 
> Unfortunately, tcpi_bytes_acked is the TX direction, it will always
> be 0 here.
> 
> Instead, we can still test that the oversized packet is accepted and
> indirectly verify that no immediate ACK is sent by eliciting and
> checking a RST:
> 
> // A too big packet is accepted if the receive queue is empty, but does not trigger
> // an immediate ACK.
>    +0 < P. 1:20001(20000) ack 1 win 257
>    +0 %{ assert tcpi_bytes_received == 20000, tcpi_bytes_received; }%
> 
> // Send a RST immediately so that there is no rcv_wup/rcv_mwnd_seq update yet
>    +0 < R. 20001:20001(0) ack 1 win 257
> 
> // Verify that the RST was accepted. Indirectly this also verifies that no immediate
> // ACK was sent for the data packet above.
>    +0 < . 20001:20001(0) ack 1 win 257
>     * > R 1:1(0)
> 
> As the series is merged now (thank you!), I will send this
> separately, as suggested.

Patch is at: https://lore.kernel.org/netdev/20260316-improve_tcp_neg_usable_wnd_test-v1-1-f16d5e365107@gmail.com/

-- 
Simon Baatz <gmbnomis@gmail.com>


Thread overview: 27+ messages
2026-03-09  8:02 [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling Simon Baatz via B4 Relay
2026-03-09  8:02 ` [PATCH net-next v3 1/6] tcp: implement RFC 7323 window retraction receiver requirements Simon Baatz via B4 Relay
2026-03-09  9:22   ` Eric Dumazet
2026-03-09 18:35     ` Simon Baatz
2026-03-10  7:40       ` Eric Dumazet
2026-03-10  8:58   ` Stefano Brivio
2026-03-10 22:34     ` Simon Baatz
2026-03-09  8:02 ` [PATCH net-next v3 2/6] mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd Simon Baatz via B4 Relay
2026-03-10  8:46   ` Eric Dumazet
2026-03-11 18:27   ` Matthieu Baerts
2026-03-11 22:08     ` Simon Baatz
2026-03-12 11:01       ` Matthieu Baerts
2026-03-09  8:02 ` [PATCH net-next v3 3/6] tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW Simon Baatz via B4 Relay
2026-03-09  9:27   ` Eric Dumazet
2026-03-09  8:02 ` [PATCH net-next v3 4/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt Simon Baatz via B4 Relay
2026-03-10  8:46   ` Eric Dumazet
2026-03-09  8:02 ` [PATCH net-next v3 5/6] selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt Simon Baatz via B4 Relay
2026-03-10  8:52   ` Eric Dumazet
2026-03-09  8:02 ` [PATCH net-next v3 6/6] selftests/net: packetdrill: add tcp_rcv_neg_window.pkt Simon Baatz via B4 Relay
2026-03-10  8:54   ` Eric Dumazet
2026-03-10 23:09     ` Simon Baatz
2026-03-14  3:58       ` Eric Dumazet
2026-03-14 14:55         ` Eric Dumazet
2026-03-14 15:01           ` Jakub Kicinski
2026-03-14 17:07         ` Simon Baatz
2026-03-16 21:51           ` Simon Baatz
2026-03-14 15:40 ` [PATCH net-next v3 0/6] tcp: RFC 7323-compliant window retraction handling patchwork-bot+netdevbpf