linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 00/14] AccECN protocol case handling series
@ 2025-09-18 16:21 chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
                   ` (13 more replies)
  0 siblings, 14 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Hello,

Plesae find the v2 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.

This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

Best regards,
Chia-Yu

---
Chia-Yu Chang (11):
  tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  tcp: disable RFC3168 fallback identifier for CC modules
  tcp: accecn: handle unexpected AccECN negotiation feedback
  tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  tcp: move increment of num_retrans
  tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
    SYN/ACK
  tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  tcp: accecn: fallback outgoing half link to non-AccECN
  tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
  tcp: accecn: stop sending AccECN opt when loss ACK w/ option
  tcp: accecn: enable AccECN

Ilpo Järvinen (3):
  tcp: try to avoid safer when ACKs are thinned
  gro: flushing when CWR is set negatively affects AccECN
  tcp: accecn: Add ece_delta to rate_sample

 .../networking/net_cachelines/tcp_sock.rst    |  1 +
 include/linux/tcp.h                           |  4 +-
 include/net/inet_ecn.h                        | 20 +++-
 include/net/tcp.h                             | 30 +++++-
 include/net/tcp_ecn.h                         | 85 ++++++++++++-----
 net/ipv4/sysctl_net_ipv4.c                    |  2 +-
 net/ipv4/tcp.c                                |  2 +
 net/ipv4/tcp_cong.c                           |  9 +-
 net/ipv4/tcp_input.c                          | 91 +++++++++++++------
 net/ipv4/tcp_minisocks.c                      | 40 +++++---
 net/ipv4/tcp_offload.c                        |  3 +-
 net/ipv4/tcp_output.c                         | 38 +++++---
 12 files changed, 239 insertions(+), 86 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 01/14] tcp: try to avoid safer when ACKs are thinned
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-23  9:32   ` Paolo Abeni
  2025-09-18 16:21 ` [PATCH v2 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Ilpo Järvinen <ij@kernel.org>

Add newly acked pkts EWMA. When ACK thinning occurs, select
between safer and unsafe cep delta in AccECN processing based
on it. If the packets ACKed per ACK tends to be large, don't
conservatively assume ACE field overflow.

This patch uses the existing 2-byte holes in the rx group for new
u16 variables withtout creating more holes. Below are the pahole
outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    /* XXX 4 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 177 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    u16                        pkts_acked_ewma;        /*  2756     2 */
    /* XXX 2 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 178 */
}

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 .../networking/net_cachelines/tcp_sock.rst    |  1 +
 include/linux/tcp.h                           |  1 +
 net/ipv4/tcp.c                                |  2 ++
 net/ipv4/tcp_input.c                          | 20 ++++++++++++++++++-
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index d4dc01800945..6fef633ce6b2 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -105,6 +105,7 @@ u32                           received_ce             read_mostly         read_w
 u32[3]                        received_ecn_bytes      read_mostly         read_write
 u8:4                          received_ce_pending     read_mostly         read_write
 u32[3]                        delivered_ecn_bytes                         read_write
+u16                           pkts_acked_ewma                             read_write
 u8:2                          syn_ect_snt             write_mostly        read_write
 u8:2                          syn_ect_rcv             read_mostly         read_write
 u8:2                          accecn_minlen           write_mostly        read_write
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 3ca5ed02de6d..53a4a0da35cc 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -345,6 +345,7 @@ struct tcp_sock {
 	u32	rate_interval_us;  /* saved rate sample: time elapsed */
 	u32	rcv_rtt_last_tsecr;
 	u32	delivered_ecn_bytes[3];
+	u16	pkts_acked_ewma;/* Pkts acked EWMA for AccECN cep heuristic */
 	u64	first_tx_mstamp;  /* start of window send phase */
 	u64	delivered_mstamp; /* time we reached "delivered" */
 	u64	bytes_acked;	/* RFC4898 tcpEStatsAppHCThruOctetsAcked
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d6d0d970e014..92c91e35d1b5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3415,6 +3415,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tcp_accecn_init_counters(tp);
 	tp->prev_ecnfield = 0;
 	tp->accecn_opt_tstamp = 0;
+	tp->pkts_acked_ewma = 0;
 	if (icsk->icsk_ca_initialized && icsk->icsk_ca_ops->release)
 		icsk->icsk_ca_ops->release(sk);
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
@@ -5183,6 +5184,7 @@ static void __init tcp_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, rate_interval_us);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, rcv_rtt_last_tsecr);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, delivered_ecn_bytes);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, pkts_acked_ewma);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, first_tx_mstamp);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, delivered_mstamp);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, bytes_acked);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9fdc6ce25eb1..0b25bf03ae6a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -488,6 +488,10 @@ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
 		tcp_count_delivered_ce(tp, delivered);
 }
 
+#define PKTS_ACKED_WEIGHT	6
+#define PKTS_ACKED_PREC		6
+#define ACK_COMP_THRESH		4
+
 /* Returns the ECN CE delta */
 static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 				u32 delivered_pkts, u32 delivered_bytes,
@@ -507,6 +511,19 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 	opt_deltas_valid = tcp_accecn_process_option(tp, skb,
 						     delivered_bytes, flag);
 
+	if (delivered_pkts) {
+		if (!tp->pkts_acked_ewma) {
+			tp->pkts_acked_ewma = delivered_pkts << PKTS_ACKED_PREC;
+		} else {
+			u32 ewma = tp->pkts_acked_ewma;
+
+			ewma = (((ewma << PKTS_ACKED_WEIGHT) - ewma) +
+				(delivered_pkts << PKTS_ACKED_PREC)) >>
+				PKTS_ACKED_WEIGHT;
+			tp->pkts_acked_ewma = min_t(u32, ewma, 0xFFFFU);
+		}
+	}
+
 	if (!(flag & FLAG_SLOWPATH)) {
 		/* AccECN counter might overflow on large ACKs */
 		if (delivered_pkts <= TCP_ACCECN_CEP_ACE_MASK)
@@ -555,7 +572,8 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 		if (d_ceb <
 		    safe_delta * tp->mss_cache >> TCP_ACCECN_SAFETY_SHIFT)
 			return delta;
-	}
+	} else if (tp->pkts_acked_ewma > (ACK_COMP_THRESH << PKTS_ACKED_PREC))
+		return delta;
 
 	return safe_delta;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample chia-yu.chang
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Ilpo Järvinen <ij@kernel.org>

As AccECN may keep CWR bit asserted due to different
interpretation of the bit, flushing with GRO because of
CWR may effectively disable GRO until AccECN counter
field changes such that CWR-bit becomes 0.

There is no harm done from not immediately forwarding the
CWR'ed segment with RFC3168 ECN.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/tcp_offload.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index e6612bd84d09..533195e00e4b 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -330,8 +330,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
 		goto out_check_final;
 
 	th2 = tcp_hdr(p);
-	flush = (__force int)(flags & TCP_FLAG_CWR);
-	flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
+	flush = (__force int)((flags ^ tcp_flag_word(th2)) &
 		  ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
 	flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
 	for (i = sizeof(*th); i < thlen; i += 4)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-23  9:47   ` Paolo Abeni
  2025-09-18 16:21 ` [PATCH v2 net-next 04/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Olivier Tilmans, Chia-Yu Chang

From: Ilpo Järvinen <ij@kernel.org>

Include echoed CE count into rate_sample. Replace local ecn_count
variable with it.

Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/tcp.h    |  1 +
 net/ipv4/tcp_input.c | 37 ++++++++++++++++---------------------
 2 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7c51a0a5ace8..23f0355d1991 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1220,6 +1220,7 @@ struct rate_sample {
 	int  losses;		/* number of packets marked lost upon ACK */
 	u32  acked_sacked;	/* number of packets newly (S)ACKed upon ACK */
 	u32  prior_in_flight;	/* in flight before this ACK */
+	u32  ece_delta;		/* is this ACK echoing some received CE? */
 	u32  last_end_seq;	/* end_seq of most recently ACKed packet */
 	bool is_app_limited;	/* is sample from packet with bubble in pipe? */
 	bool is_retrans;	/* is sample from retransmission? */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0b25bf03ae6a..6f7c2f0fc0f6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -578,9 +578,9 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 	return safe_delta;
 }
 
-static u32 tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
-			      u32 delivered_pkts, u32 delivered_bytes,
-			      int *flag)
+static void tcp_accecn_process(struct sock *sk, struct rate_sample *rs,
+			       const struct sk_buff *skb, u32 delivered_pkts,
+			       u32 delivered_bytes, int *flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 delta;
@@ -590,11 +590,11 @@ static u32 tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 	if (delta > 0) {
 		tcp_count_delivered_ce(tp, delta);
 		*flag |= FLAG_ECE;
+		rs->ece_delta = delta;
 		/* Recalculate header predictor */
 		if (tp->pred_flags)
 			tcp_fast_path_on(tp);
 	}
-	return delta;
 }
 
 /* Buffer size and advertised window tuning.
@@ -3971,8 +3971,8 @@ static void tcp_xmit_recovery(struct sock *sk, int rexmit)
 }
 
 /* Returns the number of packets newly acked or sacked by the current ACK */
-static u32 tcp_newly_delivered(struct sock *sk, u32 prior_delivered,
-			       u32 ecn_count, int flag)
+static u32 tcp_newly_delivered(struct sock *sk, struct rate_sample *rs,
+			       u32 prior_delivered, int flag)
 {
 	const struct net *net = sock_net(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -3983,8 +3983,8 @@ static u32 tcp_newly_delivered(struct sock *sk, u32 prior_delivered,
 
 	if (flag & FLAG_ECE) {
 		if (tcp_ecn_mode_rfc3168(tp))
-			ecn_count = delivered;
-		NET_ADD_STATS(net, LINUX_MIB_TCPDELIVEREDCE, ecn_count);
+			rs->ece_delta = delivered;
+		NET_ADD_STATS(net, LINUX_MIB_TCPDELIVEREDCE, rs->ece_delta);
 	}
 
 	return delivered;
@@ -3996,7 +3996,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_sacktag_state sack_state;
-	struct rate_sample rs = { .prior_delivered = 0 };
+	struct rate_sample rs = { .prior_delivered = 0, .ece_delta = 0 };
 	u32 prior_snd_una = tp->snd_una;
 	bool is_sack_reneg = tp->is_sack_reneg;
 	u32 ack_seq = TCP_SKB_CB(skb)->seq;
@@ -4006,7 +4006,6 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	u32 delivered = tp->delivered;
 	u32 lost = tp->lost;
 	int rexmit = REXMIT_NONE; /* Flag to (re)transmit to recover losses */
-	u32 ecn_count = 0;	  /* Did we receive ECE/an AccECN ACE update? */
 	u32 prior_fack;
 
 	sack_state.first_sackt = 0;
@@ -4116,10 +4115,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	tcp_rack_update_reo_wnd(sk, &rs);
 
 	if (tcp_ecn_mode_accecn(tp))
-		ecn_count = tcp_accecn_process(sk, skb,
-					       tp->delivered - delivered,
-					       sack_state.delivered_bytes,
-					       &flag);
+		tcp_accecn_process(sk, &rs, skb, tp->delivered - delivered,
+				   sack_state.delivered_bytes, &flag);
 
 	tcp_in_ack_event(sk, flag);
 
@@ -4145,7 +4142,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
 		sk_dst_confirm(sk);
 
-	delivered = tcp_newly_delivered(sk, delivered, ecn_count, flag);
+	delivered = tcp_newly_delivered(sk, &rs, delivered, flag);
 
 	lost = tp->lost - lost;			/* freshly marked lost */
 	rs.is_ack_delayed = !!(flag & FLAG_ACK_MAYBE_DELAYED);
@@ -4156,16 +4153,14 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 
 no_queue:
 	if (tcp_ecn_mode_accecn(tp))
-		ecn_count = tcp_accecn_process(sk, skb,
-					       tp->delivered - delivered,
-					       sack_state.delivered_bytes,
-					       &flag);
+		tcp_accecn_process(sk, &rs, skb, tp->delivered - delivered,
+				   sack_state.delivered_bytes, &flag);
 	tcp_in_ack_event(sk, flag);
 	/* If data was DSACKed, see if we can undo a cwnd reduction. */
 	if (flag & FLAG_DSACKING_ACK) {
 		tcp_fastretrans_alert(sk, prior_snd_una, num_dupack, &flag,
 				      &rexmit);
-		tcp_newly_delivered(sk, delivered, ecn_count, flag);
+		tcp_newly_delivered(sk, &rs, delivered, flag);
 	}
 	/* If this ack opens up a zero window, clear backoff.  It was
 	 * being used to time the probes, and is probably far higher than
@@ -4186,7 +4181,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 						&sack_state);
 		tcp_fastretrans_alert(sk, prior_snd_una, num_dupack, &flag,
 				      &rexmit);
-		tcp_newly_delivered(sk, delivered, ecn_count, flag);
+		tcp_newly_delivered(sk, &rs, delivered, flag);
 		tcp_xmit_recovery(sk, rexmit);
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 04/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (2 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-23 10:04   ` Paolo Abeni
  2025-09-18 16:21 ` [PATCH v2 net-next 05/14] tcp: disable RFC3168 fallback identifier " chia-yu.chang
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang, Olivier Tilmans

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Two CA module flags are added in this patch. First, a new CA module
flag (TCP_CONG_NEEDS_ACCECN) defines that the CA expects to negotiate
AccECN functionality using the ECE, CWR and AE flags in the TCP header.
The detailed AccECN negotiaotn during the 3WHS can be found in the
AccECN spec:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt

Second, when ECN is negociated for a TCP flow, it defaults to use
ECT(0) in the IP header. L4S service, however, needs to se ECT(1).
This patch enables CA to control whether ECT(0) or ECT(1) should
be used on a per-segment basis. A new flag (TCP_CONG_WANTS_ECT_1)
defines the behavior expected by the CA when not-yet initialized for
the connection. As such, it implicitely assumes that the CA also has
the TCP_CONG_NEEDS_ECN set.

Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/inet_ecn.h | 20 +++++++++++++++++---
 include/net/tcp.h      | 20 +++++++++++++++++++-
 include/net/tcp_ecn.h  |  7 ++++---
 net/ipv4/tcp_cong.c    |  9 ++++++---
 net/ipv4/tcp_input.c   |  3 ++-
 net/ipv4/tcp_output.c  | 10 ++++++++--
 6 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index ea32393464a2..827b87a95dab 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -51,11 +51,25 @@ static inline __u8 INET_ECN_encapsulate(__u8 outer, __u8 inner)
 	return outer;
 }
 
+/* Apply either ECT(0) or ECT(1) */
+static inline void __INET_ECN_xmit(struct sock *sk, bool use_ect_1)
+{
+	__u8 ect = use_ect_1 ? INET_ECN_ECT_1 : INET_ECN_ECT_0;
+
+	/* Mask the complete byte in case the connection alternates between
+	 * ECT(0) and ECT(1).
+	 */
+	inet_sk(sk)->tos &= ~INET_ECN_MASK;
+	inet_sk(sk)->tos |= ect;
+	if (inet6_sk(sk)) {
+		inet6_sk(sk)->tclass &= ~INET_ECN_MASK;
+		inet6_sk(sk)->tclass |= ect;
+	}
+}
+
 static inline void INET_ECN_xmit(struct sock *sk)
 {
-	inet_sk(sk)->tos |= INET_ECN_ECT_0;
-	if (inet6_sk(sk) != NULL)
-		inet6_sk(sk)->tclass |= INET_ECN_ECT_0;
+	__INET_ECN_xmit(sk, false);
 }
 
 static inline void INET_ECN_dontxmit(struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 23f0355d1991..4b255b92cb42 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -402,6 +402,7 @@ static inline void tcp_dec_quickack_mode(struct sock *sk)
 #define	TCP_ECN_DEMAND_CWR	BIT(2)
 #define	TCP_ECN_SEEN		BIT(3)
 #define	TCP_ECN_MODE_ACCECN	BIT(4)
+#define	TCP_ECN_ECT_1		BIT(5)
 
 #define	TCP_ECN_DISABLED	0
 #define	TCP_ECN_MODE_PENDING	(TCP_ECN_MODE_RFC3168 | TCP_ECN_MODE_ACCECN)
@@ -1189,7 +1190,12 @@ enum tcp_ca_ack_event_flags {
 #define TCP_CONG_NON_RESTRICTED		BIT(0)
 /* Requires ECN/ECT set on all packets */
 #define TCP_CONG_NEEDS_ECN		BIT(1)
-#define TCP_CONG_MASK	(TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN)
+/* Require successfully negotiated AccECN capability */
+#define TCP_CONG_NEEDS_ACCECN		BIT(2)
+/* Use ECT(1) instead of ECT(0) while the CA is uninitialized */
+#define TCP_CONG_WANTS_ECT_1 (TCP_CONG_NEEDS_ECN | TCP_CONG_NEEDS_ACCECN)
+#define TCP_CONG_MASK  (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN | \
+			TCP_CONG_NEEDS_ACCECN)
 
 union tcp_cc_info;
 
@@ -1322,6 +1328,18 @@ static inline bool tcp_ca_needs_ecn(const struct sock *sk)
 	return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ECN;
 }
 
+static inline bool tcp_ca_needs_accecn(const struct sock *sk)
+{
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+
+	return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ACCECN;
+}
+
+static inline bool tcp_ca_wants_ect_1(const struct sock *sk)
+{
+	return inet_csk(sk)->icsk_ca_ops->flags & TCP_CONG_WANTS_ECT_1;
+}
+
 static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index f13e5cd2b1ac..bde5835a30f0 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -561,7 +561,7 @@ static inline void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb)
 		TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ECE;
 	else if (tcp_ca_needs_ecn(sk) ||
 		 tcp_bpf_ca_needs_ecn(sk))
-		INET_ECN_xmit(sk);
+		__INET_ECN_xmit(sk, tcp_ca_wants_ect_1(sk));
 
 	if (tp->ecn_flags & TCP_ECN_MODE_ACCECN) {
 		TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ACE;
@@ -579,7 +579,8 @@ static inline void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 	bool use_ecn, use_accecn;
 	u8 tcp_ecn = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn);
 
-	use_accecn = tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ACCECN;
+	use_accecn = tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ACCECN ||
+		     tcp_ca_needs_accecn(sk);
 	use_ecn = tcp_ecn == TCP_ECN_IN_ECN_OUT_ECN ||
 		  tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ECN ||
 		  tcp_ca_needs_ecn(sk) || bpf_needs_ecn || use_accecn;
@@ -595,7 +596,7 @@ static inline void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 
 	if (use_ecn) {
 		if (tcp_ca_needs_ecn(sk) || bpf_needs_ecn)
-			INET_ECN_xmit(sk);
+			__INET_ECN_xmit(sk, tcp_ca_wants_ect_1(sk));
 
 		TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR;
 		if (use_accecn) {
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index df758adbb445..f9efbcf1d856 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -227,7 +227,7 @@ void tcp_assign_congestion_control(struct sock *sk)
 
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
 	if (ca->flags & TCP_CONG_NEEDS_ECN)
-		INET_ECN_xmit(sk);
+		__INET_ECN_xmit(sk, tcp_ca_wants_ect_1(sk));
 	else
 		INET_ECN_dontxmit(sk);
 }
@@ -240,7 +240,10 @@ void tcp_init_congestion_control(struct sock *sk)
 	if (icsk->icsk_ca_ops->init)
 		icsk->icsk_ca_ops->init(sk);
 	if (tcp_ca_needs_ecn(sk))
-		INET_ECN_xmit(sk);
+		/* The CA is already initialized, expect it to set the
+		 * appropriate flag to select ECT(1).
+		 */
+		__INET_ECN_xmit(sk, tcp_sk(sk)->ecn_flags & TCP_ECN_ECT_1);
 	else
 		INET_ECN_dontxmit(sk);
 	icsk->icsk_ca_initialized = 1;
@@ -257,7 +260,7 @@ static void tcp_reinit_congestion_control(struct sock *sk,
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
 
 	if (ca->flags & TCP_CONG_NEEDS_ECN)
-		INET_ECN_xmit(sk);
+		__INET_ECN_xmit(sk, tcp_ca_wants_ect_1(sk));
 	else
 		INET_ECN_dontxmit(sk);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6f7c2f0fc0f6..b0557f72f3be 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7216,7 +7216,8 @@ static void tcp_ecn_create_request(struct request_sock *req,
 	u32 ecn_ok_dst;
 
 	if (tcp_accecn_syn_requested(th) &&
-	    READ_ONCE(net->ipv4.sysctl_tcp_ecn) >= 3) {
+	    (READ_ONCE(net->ipv4.sysctl_tcp_ecn) >= 3 ||
+	     tcp_ca_needs_accecn(listen_sk))) {
 		inet_rsk(req)->ecn_ok = 1;
 		tcp_rsk(req)->accecn_ok = 1;
 		tcp_rsk(req)->syn_ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 223d7feeb19d..18bbb30381f0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -328,12 +328,17 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
 			 struct tcphdr *th, int tcp_header_len)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	bool ecn_ect_1;
 
 	if (!tcp_ecn_mode_any(tp))
 		return;
 
+	ecn_ect_1 = tp->ecn_flags & TCP_ECN_ECT_1;
+	if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+		__INET_ECN_xmit(sk, true);
+
 	if (tcp_ecn_mode_accecn(tp)) {
-		if (!tcp_accecn_ace_fail_recv(tp))
+		if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
 			INET_ECN_xmit(sk);
 		tcp_accecn_set_ace(tp, skb, th);
 		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
@@ -341,7 +346,8 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
 		/* Not-retransmitted data segment: set ECT and inject CWR. */
 		if (skb->len != tcp_header_len &&
 		    !before(TCP_SKB_CB(skb)->seq, tp->snd_nxt)) {
-			INET_ECN_xmit(sk);
+			if (!ecn_ect_1)
+				INET_ECN_xmit(sk);
 			if (tp->ecn_flags & TCP_ECN_QUEUE_CWR) {
 				tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
 				th->cwr = 1;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 05/14] tcp: disable RFC3168 fallback identifier for CC modules
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (3 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 04/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-23 10:08   ` Paolo Abeni
  2025-09-18 16:21 ` [PATCH v2 net-next 06/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

When AccECN is not successfully negociated for a TCP flow, it defaults
fallback to classic ECN (RFC3168). However, L4S service will fallback
to non-ECN.

This patch enables congestion control module to control whether it
should not fallback to classic ECN after unsuccessful AccECN negotiation.
A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) identifies this
behavior expected by the CA.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/tcp.h        | 11 ++++++++++-
 include/net/tcp_ecn.h    | 10 +++++++---
 net/ipv4/tcp_input.c     |  2 +-
 net/ipv4/tcp_minisocks.c |  7 ++++---
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4b255b92cb42..8458aa79f509 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1192,10 +1192,12 @@ enum tcp_ca_ack_event_flags {
 #define TCP_CONG_NEEDS_ECN		BIT(1)
 /* Require successfully negotiated AccECN capability */
 #define TCP_CONG_NEEDS_ACCECN		BIT(2)
+/* Cannot fallback to RFC3168 during AccECN negotiation */
+#define TCP_CONG_NO_FALLBACK_RFC3168	BIT(3)
 /* Use ECT(1) instead of ECT(0) while the CA is uninitialized */
 #define TCP_CONG_WANTS_ECT_1 (TCP_CONG_NEEDS_ECN | TCP_CONG_NEEDS_ACCECN)
 #define TCP_CONG_MASK  (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN | \
-			TCP_CONG_NEEDS_ACCECN)
+			TCP_CONG_NEEDS_ACCECN | TCP_CONG_NO_FALLBACK_RFC3168)
 
 union tcp_cc_info;
 
@@ -1335,6 +1337,13 @@ static inline bool tcp_ca_needs_accecn(const struct sock *sk)
 	return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ACCECN;
 }
 
+static inline bool tcp_ca_no_fallback_rfc3168(const struct sock *sk)
+{
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+
+	return icsk->icsk_ca_ops->flags & TCP_CONG_NO_FALLBACK_RFC3168;
+}
+
 static inline bool tcp_ca_wants_ect_1(const struct sock *sk)
 {
 	return inet_csk(sk)->icsk_ca_ops->flags & TCP_CONG_WANTS_ECT_1;
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index bde5835a30f0..fb4fcb641d24 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -501,7 +501,9 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
 		 * | ECN    | AccECN | 0   0   1  | Classic ECN |
 		 * +========+========+============+=============+
 		 */
-		if (tcp_ecn_mode_pending(tp))
+		if (tcp_ca_no_fallback_rfc3168(sk))
+			tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
+		else if (tcp_ecn_mode_pending(tp))
 			/* Downgrade from AccECN, or requested initially */
 			tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
 		break;
@@ -525,9 +527,10 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
 	}
 }
 
-static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
+static inline void tcp_ecn_rcv_syn(struct sock *sk, const struct tcphdr *th,
 				   const struct sk_buff *skb)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
 	if (tcp_ecn_mode_pending(tp)) {
 		if (!tcp_accecn_syn_requested(th)) {
 			/* Downgrade to classic ECN feedback */
@@ -539,7 +542,8 @@ static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
 			tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
 		}
 	}
-	if (tcp_ecn_mode_rfc3168(tp) && (!th->ece || !th->cwr))
+	if (tcp_ecn_mode_rfc3168(tp) &&
+	    (!th->ece || !th->cwr || tcp_ca_no_fallback_rfc3168(sk)))
 		tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b0557f72f3be..beb23dd34c5a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6811,7 +6811,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		tp->snd_wl1    = TCP_SKB_CB(skb)->seq;
 		tp->max_window = tp->snd_wnd;
 
-		tcp_ecn_rcv_syn(tp, th, skb);
+		tcp_ecn_rcv_syn(sk, th, skb);
 
 		tcp_mtup_init(sk);
 		tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 2ec8c6f1cdcc..1fade94813c6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -488,9 +488,10 @@ static void tcp_ecn_openreq_child(struct sock *sk,
 		tp->accecn_opt_demand = 1;
 		tcp_ecn_received_counters_payload(sk, skb);
 	} else {
-		tcp_ecn_mode_set(tp, inet_rsk(req)->ecn_ok ?
-				     TCP_ECN_MODE_RFC3168 :
-				     TCP_ECN_DISABLED);
+		if (inet_rsk(req)->ecn_ok && !tcp_ca_no_fallback_rfc3168(sk))
+			tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
+		else
+			tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
 	}
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 06/14] tcp: accecn: handle unexpected AccECN negotiation feedback
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (4 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 05/14] tcp: disable RFC3168 fallback identifier " chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-23 10:14   ` Paolo Abeni
  2025-09-18 16:21 ` [PATCH v2 net-next 07/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

According to Section 3.1.2 of AccECN spec (RFC9768), if a TCP Client
has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1)
then receives a SYN/ACK with the currently reserved combination
(AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a
combination, the Client MUST enable AccECN mode as if the SYN/ACK
confirmed that the Server supported AccECN and as if it fed back that
the IP-ECN field on the SYN had arrived unchanged.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/tcp_ecn.h | 44 ++++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index fb4fcb641d24..7ceb191893c6 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -467,6 +467,26 @@ static inline u8 tcp_accecn_option_init(const struct sk_buff *skb,
 	return TCP_ACCECN_OPT_COUNTER_SEEN;
 }
 
+static inline void tcp_ecn_rcv_synack_accecn(struct sock *sk,
+					     const struct sk_buff *skb, u8 dsf)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
+	tp->syn_ect_rcv = dsf & INET_ECN_MASK;
+	/* Demand Accurate ECN option in response to the SYN on the SYN/ACK
+	 * and the TCP server will try to send one more packet with an AccECN
+	 * Option at a later point during the connection.
+	 */
+	if (tp->rx_opt.accecn &&
+	    tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
+		u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
+
+		tcp_accecn_saw_opt_fail_recv(tp, saw_opt);
+		tp->accecn_opt_demand = 2;
+	}
+}
+
 /* See Table 2 of the AccECN draft */
 static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb,
 				      const struct tcphdr *th, u8 ip_dsfield)
@@ -489,13 +509,11 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
 		tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
 		break;
 	case 0x1:
-	case 0x5:
 		/* +========+========+============+=============+
 		 * | A      | B      |  SYN/ACK   |  Feedback   |
 		 * |        |        |    B->A    |  Mode of A  |
 		 * |        |        | AE CWR ECE |             |
 		 * +========+========+============+=============+
-		 * | AccECN | Nonce  | 1   0   1  | (Reserved)  |
 		 * | AccECN | ECN    | 0   0   1  | Classic ECN |
 		 * | Nonce  | AccECN | 0   0   1  | Classic ECN |
 		 * | ECN    | AccECN | 0   0   1  | Classic ECN |
@@ -503,20 +521,20 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
 		 */
 		if (tcp_ca_no_fallback_rfc3168(sk))
 			tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
-		else if (tcp_ecn_mode_pending(tp))
-			/* Downgrade from AccECN, or requested initially */
+		else
 			tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
 		break;
-	default:
-		tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
-		tp->syn_ect_rcv = ip_dsfield & INET_ECN_MASK;
-		if (tp->rx_opt.accecn &&
-		    tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
-			u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
-
-			tcp_accecn_saw_opt_fail_recv(tp, saw_opt);
-			tp->accecn_opt_demand = 2;
+	case 0x5:
+		if (tcp_ecn_mode_pending(tp)) {
+			tcp_ecn_rcv_synack_accecn(sk, skb, ip_dsfield);
+			if (INET_ECN_is_ce(ip_dsfield)) {
+				tp->received_ce++;
+				tp->received_ce_pending++;
+			}
 		}
+		break;
+	default:
+		tcp_ecn_rcv_synack_accecn(sk, skb, ip_dsfield);
 		if (INET_ECN_is_ce(ip_dsfield) &&
 		    tcp_accecn_validate_syn_feedback(sk, ace,
 						     tp->syn_ect_snt)) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 07/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (5 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 06/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 08/14] tcp: move increment of num_retrans chia-yu.chang
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Based on AccECN spec (RFC9768), if the sender of an AccECN SYN
(the TCP Client) times out before receiving the SYN/ACK, it SHOULD
attempt to negotiate the use of AccECN at least one more time by
continuing to set all three TCP ECN flags (AE,CWR,ECE) = (1,1,1) on
the first retransmitted SYN (using the usual retransmission time-outs).

If this first retransmission also fails to be acknowledged, in
deployment scenarios where AccECN path traversal might be problematic,
the TCP Client SHOULD send subsequent retransmissions of the SYN with
the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0).

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/tcp_output.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 18bbb30381f0..1c372fe0e57f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3563,12 +3563,15 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 			tcp_retrans_try_collapse(sk, skb, avail_wnd);
 	}
 
-	/* RFC3168, section 6.1.1.1. ECN fallback
-	 * As AccECN uses the same SYN flags (+ AE), this check covers both
-	 * cases.
-	 */
-	if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) == TCPHDR_SYN_ECN)
-		tcp_ecn_clear_syn(sk, skb);
+	if (!tcp_ecn_mode_pending(tp) || icsk->icsk_retransmits > 1) {
+		/* RFC3168, section 6.1.1.1. ECN fallback
+		 * As AccECN uses the same SYN flags (+ AE), this check
+		 * covers both cases.
+		 */
+		if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) ==
+		    TCPHDR_SYN_ECN)
+			tcp_ecn_clear_syn(sk, skb);
+	}
 
 	/* Update global and local TCP statistics. */
 	segs = tcp_skb_pcount(skb);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 08/14] tcp: move increment of num_retrans
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (6 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 07/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 09/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Before this patch, num_retrans = 0 for the first SYN/ACK and the first
retransmitted SYN/ACK; however, an upcoming change will need to
differentiate between those two conditions. This patch moves the
increment of num_tranns before rtx_syn_ack() so we can distinguish
between these two cases when making SYN/ACK.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/tcp_output.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1c372fe0e57f..bf90c2b33af3 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4593,6 +4593,7 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
 	/* Paired with WRITE_ONCE() in sock_setsockopt() */
 	if (READ_ONCE(sk->sk_txrehash) == SOCK_TXREHASH_ENABLED)
 		WRITE_ONCE(tcp_rsk(req)->txhash, net_tx_rndhash());
+	WRITE_ONCE(req->num_retrans, req->num_retrans + 1);
 	res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_NORMAL,
 				  NULL);
 	if (!res) {
@@ -4606,7 +4607,8 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
 			tcp_sk_rw(sk)->total_retrans++;
 		}
 		trace_tcp_retransmit_synack(sk, req);
-		WRITE_ONCE(req->num_retrans, req->num_retrans + 1);
+	} else {
+		WRITE_ONCE(req->num_retrans, req->num_retrans - 1);
 	}
 	return res;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 09/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (7 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 08/14] tcp: move increment of num_retrans chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 10/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion chia-yu.chang
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

If the TCP Server has not received an ACK to acknowledge its SYN/ACK
after the normal TCP timeout or it receives a second SYN with a
request for AccECN support, then either the SYN/ACK might just have
been lost, e.g. due to congestion, or a middlebox might be blocking
AccECN Options. To expedite connection setup in deployment scenarios
where AccECN path traversal might be problematic, the TCP Server SHOULD
retransmit the SYN/ACK, but with no AccECN Option.

If this retransmission times out, to expedite connection setup, the TCP
Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0)
and no AccECN Option, but it remains in AccECN feedback mode.

This follows Section 3.2.3.2.2 of AccECN spec (RFC9768).

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/tcp_ecn.h | 14 ++++++++++----
 net/ipv4/tcp_output.c |  2 +-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 7ceb191893c6..f01cac9d39c9 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -644,10 +644,16 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
 static inline void
 tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
 {
-	if (tcp_rsk(req)->accecn_ok)
-		tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
-	else if (inet_rsk(req)->ecn_ok)
-		th->ece = 1;
+	if (req->num_retrans < 1 || req->num_timeout < 1) {
+		if (tcp_rsk(req)->accecn_ok)
+			tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
+		else if (inet_rsk(req)->ecn_ok)
+			th->ece = 1;
+	} else if (tcp_rsk(req)->accecn_ok) {
+		th->ae  = 0;
+		th->cwr = 0;
+		th->ece = 0;
+	}
 }
 
 static inline bool tcp_accecn_option_beacon_check(const struct sock *sk)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bf90c2b33af3..d4aae5dc2201 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1109,7 +1109,7 @@ static unsigned int tcp_synack_options(const struct sock *sk,
 
 	if (treq->accecn_ok &&
 	    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn_option) &&
-	    req->num_timeout < 1 && remaining >= TCPOLEN_ACCECN_BASE) {
+	    req->num_retrans < 1 && remaining >= TCPOLEN_ACCECN_BASE) {
 		opts->use_synack_ecn_bytes = 1;
 		remaining -= tcp_options_fit_accecn(opts, 0, remaining);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 10/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (8 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 09/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 11/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Based on specification:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt

Based on Section 3.1.5 of AccECN spec (RFC9768), a TCP Server in
AccECN mode MUST NOT set ECT on any packet for the rest of the connection,
if it has received or sent at least one valid SYN or Acceptable SYN/ACK
with (AE,CWR,ECE) = (0,0,0) during the handshake.

In addition, a host in AccECN mode that is feeding back the IP-ECN
field on a SYN or SYN/ACK MUST feed back the IP-ECN field on the
latest valid SYN or acceptable SYN/ACK to arrive.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/tcp_ecn.h    |  4 +++-
 net/ipv4/tcp_input.c     |  2 ++
 net/ipv4/tcp_minisocks.c | 33 +++++++++++++++++++++++----------
 net/ipv4/tcp_output.c    |  8 +++++---
 4 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index f01cac9d39c9..2256d2efa5ec 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -642,7 +642,8 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
 }
 
 static inline void
-tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
+tcp_ecn_make_synack(struct sock *sk, const struct request_sock *req,
+		    struct tcphdr *th)
 {
 	if (req->num_retrans < 1 || req->num_timeout < 1) {
 		if (tcp_rsk(req)->accecn_ok)
@@ -653,6 +654,7 @@ tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
 		th->ae  = 0;
 		th->cwr = 0;
 		th->ece = 0;
+		tcp_accecn_fail_mode_set(tcp_sk(sk), TCP_ACCECN_ACE_FAIL_SEND);
 	}
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index beb23dd34c5a..752251268cbe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6190,6 +6190,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 	if (th->syn) {
 		if (tcp_ecn_mode_accecn(tp)) {
 			accecn_reflector = true;
+			tp->syn_ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
+					  INET_ECN_MASK;
 			if (tp->rx_opt.accecn &&
 			    tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
 				u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 1fade94813c6..ecead2b771fd 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -751,16 +751,29 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 		 */
 		if (!tcp_oow_rate_limited(sock_net(sk), skb,
 					  LINUX_MIB_TCPACKSKIPPEDSYNRECV,
-					  &tcp_rsk(req)->last_oow_ack_time) &&
-
-		    !tcp_rtx_synack(sk, req)) {
-			unsigned long expires = jiffies;
-
-			expires += reqsk_timeout(req, TCP_RTO_MAX);
-			if (!fastopen)
-				mod_timer_pending(&req->rsk_timer, expires);
-			else
-				req->rsk_timer.expires = expires;
+					  &tcp_rsk(req)->last_oow_ack_time)) {
+			if (tcp_rsk(req)->accecn_ok) {
+				u8 ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
+					     INET_ECN_MASK;
+
+				tcp_rsk(req)->syn_ect_rcv = ect_rcv;
+				if (tcp_accecn_ace(tcp_hdr(skb)) == 0x0) {
+					u8 fail_mode = TCP_ACCECN_ACE_FAIL_RECV;
+
+					tcp_accecn_fail_mode_set(tcp_sk(sk),
+								 fail_mode);
+				}
+			}
+			if (!tcp_rtx_synack(sk, req)) {
+				unsigned long expires = jiffies;
+
+				expires += reqsk_timeout(req, TCP_RTO_MAX);
+				if (!fastopen)
+					mod_timer_pending(&req->rsk_timer,
+							  expires);
+				else
+					req->rsk_timer.expires = expires;
+			}
 		}
 		return NULL;
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d4aae5dc2201..df07b76ba84f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -334,11 +334,13 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
 		return;
 
 	ecn_ect_1 = tp->ecn_flags & TCP_ECN_ECT_1;
-	if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+	if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp) &&
+	    !tcp_accecn_ace_fail_send(tp))
 		__INET_ECN_xmit(sk, true);
 
 	if (tcp_ecn_mode_accecn(tp)) {
-		if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+		if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp) &&
+		    !tcp_accecn_ace_fail_send(tp))
 			INET_ECN_xmit(sk);
 		tcp_accecn_set_ace(tp, skb, th);
 		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
@@ -3990,7 +3992,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	memset(th, 0, sizeof(struct tcphdr));
 	th->syn = 1;
 	th->ack = 1;
-	tcp_ecn_make_synack(req, th);
+	tcp_ecn_make_synack((struct sock *)sk, req, th);
 	th->source = htons(ireq->ir_num);
 	th->dest = ireq->ir_rmt_port;
 	skb->mark = ireq->ir_mark;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 11/14] tcp: accecn: fallback outgoing half link to non-AccECN
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (9 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 10/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-23 10:28   ` Paolo Abeni
  2025-09-18 16:21 ` [PATCH v2 net-next 12/14] tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation chia-yu.chang
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server
is in AccECN mode and in SYN-RCVD state, and if it receives a value of
zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the
connection the Server MUST NOT set ECT on outgoing packets and MUST
NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it
MUST NOT disable AccECN feedback.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/tcp_ecn.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 2256d2efa5ec..8317c3f279c9 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -169,7 +169,10 @@ static inline void tcp_accecn_third_ack(struct sock *sk,
 	switch (ace) {
 	case 0x0:
 		/* Invalid value */
-		tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV);
+		if (!TCP_SKB_CB(skb)->sacked) {
+			tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV |
+						     TCP_ACCECN_OPT_FAIL_RECV);
+		}
 		break;
 	case 0x7:
 	case 0x5:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 12/14] tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (10 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 11/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option chia-yu.chang
  2025-09-18 16:21 ` [PATCH v2 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang
  13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

After successfully negotiating AccECN mode in the handshake, check
the ACE field of the first data ACK. If zero (0b000), non-ECT packets
are sent and any response to CE marking feedback is disabled. This
follows Section 3.2.2.4 of AccECN spec (RFC9768).

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/tcp_input.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 752251268cbe..f250b4912a0b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -495,7 +495,7 @@ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
 /* Returns the ECN CE delta */
 static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 				u32 delivered_pkts, u32 delivered_bytes,
-				int flag)
+				u64 prior_bytes_acked, int flag)
 {
 	u32 old_ceb = tcp_sk(sk)->delivered_ecn_bytes[INET_ECN_CE - 1];
 	const struct tcphdr *th = tcp_hdr(skb);
@@ -534,6 +534,16 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 	if (flag & FLAG_SYN_ACKED)
 		return 0;
 
+	/* Verify ACE!=0 in the 1st data ACK after AccECN negotiation */
+	if ((flag & FLAG_DATA_ACKED) && prior_bytes_acked <= tp->mss_cache) {
+		if (tcp_accecn_ace(tcp_hdr(skb)) == 0x0) {
+			INET_ECN_dontxmit(sk);
+			tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV |
+						     TCP_ACCECN_OPT_FAIL_RECV);
+			return 0;
+		}
+	}
+
 	if (tp->received_ce_pending >= TCP_ACCECN_ACE_MAX_DELTA)
 		inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
 
@@ -580,13 +590,14 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 
 static void tcp_accecn_process(struct sock *sk, struct rate_sample *rs,
 			       const struct sk_buff *skb, u32 delivered_pkts,
-			       u32 delivered_bytes, int *flag)
+			       u32 delivered_bytes, u64 prior_bytes_acked,
+			       int *flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 delta;
 
 	delta = __tcp_accecn_process(sk, skb, delivered_pkts,
-				     delivered_bytes, *flag);
+				     delivered_bytes, prior_bytes_acked, *flag);
 	if (delta > 0) {
 		tcp_count_delivered_ce(tp, delta);
 		*flag |= FLAG_ECE;
@@ -3997,6 +4008,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_sacktag_state sack_state;
 	struct rate_sample rs = { .prior_delivered = 0, .ece_delta = 0 };
+	u64 prior_bytes_acked = tp->bytes_acked;
 	u32 prior_snd_una = tp->snd_una;
 	bool is_sack_reneg = tp->is_sack_reneg;
 	u32 ack_seq = TCP_SKB_CB(skb)->seq;
@@ -4116,7 +4128,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 
 	if (tcp_ecn_mode_accecn(tp))
 		tcp_accecn_process(sk, &rs, skb, tp->delivered - delivered,
-				   sack_state.delivered_bytes, &flag);
+				   sack_state.delivered_bytes,
+				   prior_bytes_acked, &flag);
 
 	tcp_in_ack_event(sk, flag);
 
@@ -4154,7 +4167,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 no_queue:
 	if (tcp_ecn_mode_accecn(tp))
 		tcp_accecn_process(sk, &rs, skb, tp->delivered - delivered,
-				   sack_state.delivered_bytes, &flag);
+				   sack_state.delivered_bytes,
+				   prior_bytes_acked, &flag);
 	tcp_in_ack_event(sk, flag);
 	/* If data was DSACKed, see if we can undo a cwnd reduction. */
 	if (flag & FLAG_DSACKING_ACK) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (11 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 12/14] tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  2025-09-23 10:52   ` Paolo Abeni
  2025-09-18 16:21 ` [PATCH v2 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang
  13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Detect spurious retransmission of a previously sent ACK carrying the
AccECN option after the second retransmission. Since this might be caused
by the middlebox dropping ACK with options it does not recognize, disable
the sending of the AccECN option in all subsequent ACKs. This patch
follows Section 3.2.3.2.2 of AccECN spec (RFC9768).

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/linux/tcp.h   | 3 ++-
 include/net/tcp_ecn.h | 1 +
 net/ipv4/tcp_input.c  | 9 +++++++++
 net/ipv4/tcp_output.c | 3 +++
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 53a4a0da35cc..f3b72fb6fa76 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -295,7 +295,8 @@ struct tcp_sock {
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		rate_app_limited:1;  /* rate_{delivered,interval_us} limited? */
 	u8	received_ce_pending:4, /* Not yet transmit cnt of received_ce */
-		unused2:4;
+		accecn_opt_sent:1,/* Sent AccECN option in previous ACK */
+		unused2:3;
 	u8	accecn_minlen:2,/* Minimum length of AccECN option sent */
 		est_ecnfield:2,/* ECN field for AccECN delivered estimates */
 		accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 8317c3f279c9..3eb450b4b648 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -401,6 +401,7 @@ static inline void tcp_accecn_init_counters(struct tcp_sock *tp)
 	tp->received_ce_pending = 0;
 	__tcp_accecn_init_bytes_counters(tp->received_ecn_bytes);
 	__tcp_accecn_init_bytes_counters(tp->delivered_ecn_bytes);
+	tp->accecn_opt_sent = 0;
 	tp->accecn_minlen = 0;
 	tp->accecn_opt_demand = 0;
 	tp->est_ecnfield = 0;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f250b4912a0b..6083260133e9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4804,6 +4804,7 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
 
 static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
 	/* When the ACK path fails or drops most ACKs, the sender would
 	 * timeout and spuriously retransmit the same segment repeatedly.
 	 * If it seems our ACKs are not reaching the other side,
@@ -4823,6 +4824,14 @@ static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
 	/* Save last flowlabel after a spurious retrans. */
 	tcp_save_lrcv_flowlabel(sk, skb);
 #endif
+	/* Check DSACK info to detect that the previous ACK carrying the
+	 * AccECN option was lost after the second retransmision, and then
+	 * stop sending AccECN option in all subsequent ACKs.
+	 */
+	if (tcp_ecn_mode_accecn(tp) &&
+	    TCP_SKB_CB(skb)->seq == tp->duplicate_sack[0].start_seq &&
+	    tp->accecn_opt_sent)
+		tcp_accecn_fail_mode_set(tp, TCP_ACCECN_OPT_FAIL_SEND);
 }
 
 static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index df07b76ba84f..a2f897b044b9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -719,9 +719,12 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
 		if (tp) {
 			tp->accecn_minlen = 0;
 			tp->accecn_opt_tstamp = tp->tcp_mstamp;
+			tp->accecn_opt_sent = 1;
 			if (tp->accecn_opt_demand)
 				tp->accecn_opt_demand--;
 		}
+	} else if (tp) {
+		tp->accecn_opt_sent = 0;
 	}
 
 	if (unlikely(OPTION_SACK_ADVERTISE & options)) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 net-next 14/14] tcp: accecn: enable AccECN
  2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (12 preceding siblings ...)
  2025-09-18 16:21 ` [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option chia-yu.chang
@ 2025-09-18 16:21 ` chia-yu.chang
  13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-09-18 16:21 UTC (permalink / raw)
  To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
	netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
	davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Enable Accurate ECN negotiation and request for incoming and
outgoing connection by setting sysctl_tcp_ecn:

+==============+===========================================+
|              |  Highest ECN variant (Accurate ECN, ECN,  |
|   tcp_ecn    |  or no ECN) to be negotiated & requested  |
|              +---------------------+---------------------+
|              | Incoming connection | Outgoing connection |
+==============+=====================+=====================+
|      0       |        No ECN       |        No ECN       |
|      1       |         ECN         |         ECN         |
|      2       |         ECN         |        No ECN       |
+--------------+---------------------+---------------------+
|      3       |     Accurate ECN    |     Accurate ECN    |
|      4       |     Accurate ECN    |         ECN         |
|      5       |     Accurate ECN    |        No ECN       |
+==============+=====================+=====================+

Refer Documentation/networking/ip-sysctl.rst for more details.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/sysctl_net_ipv4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 24dbc603cc44..169a393374b3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -47,7 +47,7 @@ static unsigned int udp_child_hash_entries_max = UDP_HTABLE_SIZE_MAX;
 static int tcp_plb_max_rounds = 31;
 static int tcp_plb_max_cong_thresh = 256;
 static unsigned int tcp_tw_reuse_delay_max = TCP_PAWS_MSL * MSEC_PER_SEC;
-static int tcp_ecn_mode_max = 2;
+static int tcp_ecn_mode_max = 5;
 
 /* obsolete */
 static int sysctl_tcp_low_latency __read_mostly;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 01/14] tcp: try to avoid safer when ACKs are thinned
  2025-09-18 16:21 ` [PATCH v2 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
@ 2025-09-23  9:32   ` Paolo Abeni
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-09-23  9:32 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9fdc6ce25eb1..0b25bf03ae6a 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -488,6 +488,10 @@ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
>  		tcp_count_delivered_ce(tp, delivered);
>  }
>  
> +#define PKTS_ACKED_WEIGHT	6
> +#define PKTS_ACKED_PREC		6
> +#define ACK_COMP_THRESH		4
> +
>  /* Returns the ECN CE delta */
>  static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
>  				u32 delivered_pkts, u32 delivered_bytes,
> @@ -507,6 +511,19 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
>  	opt_deltas_valid = tcp_accecn_process_option(tp, skb,
>  						     delivered_bytes, flag);
>  
> +	if (delivered_pkts) {
> +		if (!tp->pkts_acked_ewma) {
> +			tp->pkts_acked_ewma = delivered_pkts << PKTS_ACKED_PREC;

Can the above statement overflow pkts_acked_ewma in exceptional cases?
Do that need and additional min_t(u32, <init val>, 0xFFFFU)?

/P


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample
  2025-09-18 16:21 ` [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample chia-yu.chang
@ 2025-09-23  9:47   ` Paolo Abeni
  2025-09-25  7:40     ` Chia-Yu Chang (Nokia)
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Abeni @ 2025-09-23  9:47 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Olivier Tilmans

On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Ilpo Järvinen <ij@kernel.org>
> 
> Include echoed CE count into rate_sample. Replace local ecn_count
> variable with it.

Why? skimming over the next few patches it's not clear to me which is
the goal here.

Expanding the commit message would help, thanks!

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 04/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  2025-09-18 16:21 ` [PATCH v2 net-next 04/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
@ 2025-09-23 10:04   ` Paolo Abeni
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-09-23 10:04 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Olivier Tilmans

On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> Two CA module flags are added in this patch. First, a new CA module
> flag (TCP_CONG_NEEDS_ACCECN) defines that the CA expects to negotiate
> AccECN functionality using the ECE, CWR and AE flags in the TCP header.
> The detailed AccECN negotiaotn during the 3WHS can be found in the
> AccECN spec:
>   https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
> 
> Second, when ECN is negociated for a TCP flow, it defaults to use
> ECT(0) in the IP header. L4S service, however, needs to se ECT(1).
> This patch enables CA to control whether ECT(0) or ECT(1) should
> be used on a per-segment basis. A new flag (TCP_CONG_WANTS_ECT_1)

I find this description confusing/contradictory with the implementation
where TCP_CONG_WANTS_ECT_1 is actually a mask.


> @@ -1322,6 +1328,18 @@ static inline bool tcp_ca_needs_ecn(const struct sock *sk)
>  	return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ECN;
>  }
>  
> +static inline bool tcp_ca_needs_accecn(const struct sock *sk)
> +{
> +	const struct inet_connection_sock *icsk = inet_csk(sk);
> +
> +	return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ACCECN;
> +}
> +
> +static inline bool tcp_ca_wants_ect_1(const struct sock *sk)
> +{
> +	return inet_csk(sk)->icsk_ca_ops->flags & TCP_CONG_WANTS_ECT_1;

Should the above tests be:

	(inet_csk(sk)->icsk_ca_ops->flags & TCP_CONG_WANTS_ECT_1) ==
TCP_CONG_WANTS_ECT_1

?

Otherwise existing CC with TCP_CONG_NEEDS_ECN will unexpectedly switch
to ECT_1 usage.

[...]
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index df758adbb445..f9efbcf1d856 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -227,7 +227,7 @@ void tcp_assign_congestion_control(struct sock *sk)
>  
>  	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
>  	if (ca->flags & TCP_CONG_NEEDS_ECN)
> -		INET_ECN_xmit(sk);
> +		__INET_ECN_xmit(sk, tcp_ca_wants_ect_1(sk));

Possibly a new helper for the above statement could be useful

/P


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 05/14] tcp: disable RFC3168 fallback identifier for CC modules
  2025-09-18 16:21 ` [PATCH v2 net-next 05/14] tcp: disable RFC3168 fallback identifier " chia-yu.chang
@ 2025-09-23 10:08   ` Paolo Abeni
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-09-23 10:08 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> @@ -525,9 +527,10 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
>  	}
>  }
>  
> -static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
> +static inline void tcp_ecn_rcv_syn(struct sock *sk, const struct tcphdr *th,
>  				   const struct sk_buff *skb)
>  {
> +	struct tcp_sock *tp = tcp_sk(sk);

Minor nit: please leave an empty line between variable declarations and
code.

>  	if (tcp_ecn_mode_pending(tp)) {
>  		if (!tcp_accecn_syn_requested(th)) {
>  			/* Downgrade to classic ECN feedback */

/P


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 06/14] tcp: accecn: handle unexpected AccECN negotiation feedback
  2025-09-18 16:21 ` [PATCH v2 net-next 06/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
@ 2025-09-23 10:14   ` Paolo Abeni
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-09-23 10:14 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> According to Section 3.1.2 of AccECN spec (RFC9768), if a TCP Client
> has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1)
> then receives a SYN/ACK with the currently reserved combination
> (AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a
> combination, the Client MUST enable AccECN mode as if the SYN/ACK
> confirmed that the Server supported AccECN and as if it fed back that
> the IP-ECN field on the SYN had arrived unchanged.
> 
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

This looks like a fix for an incorrect behavior introduced by a previous
AccECN patch. If so, please add a suitable fixes tag. We accept such
tags even for net-next material, and it's better to explicitly call out
needed fixes.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 11/14] tcp: accecn: fallback outgoing half link to non-AccECN
  2025-09-18 16:21 ` [PATCH v2 net-next 11/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
@ 2025-09-23 10:28   ` Paolo Abeni
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-09-23 10:28 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
> index 2256d2efa5ec..8317c3f279c9 100644
> --- a/include/net/tcp_ecn.h
> +++ b/include/net/tcp_ecn.h
> @@ -169,7 +169,10 @@ static inline void tcp_accecn_third_ack(struct sock *sk,
>  	switch (ace) {
>  	case 0x0:
>  		/* Invalid value */
> -		tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV);
> +		if (!TCP_SKB_CB(skb)->sacked) {
> +			tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV |
> +						     TCP_ACCECN_OPT_FAIL_RECV);
> +		}

Minor nit: brackets are not needed above.

/P


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option
  2025-09-18 16:21 ` [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option chia-yu.chang
@ 2025-09-23 10:52   ` Paolo Abeni
  2025-09-25 14:46     ` Chia-Yu Chang (Nokia)
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Abeni @ 2025-09-23 10:52 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> Detect spurious retransmission of a previously sent ACK carrying the
> AccECN option after the second retransmission. Since this might be caused
> by the middlebox dropping ACK with options it does not recognize, disable
> the sending of the AccECN option in all subsequent ACKs. This patch
> follows Section 3.2.3.2.2 of AccECN spec (RFC9768).

Is this really useful/triggers in practice?

AFAICS it will take effect only it the retransmission happens just after
an egress AccECN packet, i.e. will not trigger if the there are more
later non AccECN packets pending.
> 
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> ---
>  include/linux/tcp.h   | 3 ++-
>  include/net/tcp_ecn.h | 1 +
>  net/ipv4/tcp_input.c  | 9 +++++++++
>  net/ipv4/tcp_output.c | 3 +++
>  4 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index 53a4a0da35cc..f3b72fb6fa76 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -295,7 +295,8 @@ struct tcp_sock {
>  	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
>  		rate_app_limited:1;  /* rate_{delivered,interval_us} limited? */
>  	u8	received_ce_pending:4, /* Not yet transmit cnt of received_ce */
> -		unused2:4;
> +		accecn_opt_sent:1,/* Sent AccECN option in previous ACK */
> +		unused2:3;
>  	u8	accecn_minlen:2,/* Minimum length of AccECN option sent */
>  		est_ecnfield:2,/* ECN field for AccECN delivered estimates */
>  		accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */
> diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
> index 8317c3f279c9..3eb450b4b648 100644
> --- a/include/net/tcp_ecn.h
> +++ b/include/net/tcp_ecn.h
> @@ -401,6 +401,7 @@ static inline void tcp_accecn_init_counters(struct tcp_sock *tp)
>  	tp->received_ce_pending = 0;
>  	__tcp_accecn_init_bytes_counters(tp->received_ecn_bytes);
>  	__tcp_accecn_init_bytes_counters(tp->delivered_ecn_bytes);
> +	tp->accecn_opt_sent = 0;
>  	tp->accecn_minlen = 0;
>  	tp->accecn_opt_demand = 0;
>  	tp->est_ecnfield = 0;
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index f250b4912a0b..6083260133e9 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4804,6 +4804,7 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
>  
>  static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
>  {
> +	struct tcp_sock *tp = tcp_sk(sk);

Minor nit: please leave an empty line after variable declaration.

/P


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample
  2025-09-23  9:47   ` Paolo Abeni
@ 2025-09-25  7:40     ` Chia-Yu Chang (Nokia)
  2025-09-25  7:44       ` Paolo Abeni
  0 siblings, 1 reply; 26+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-09-25  7:40 UTC (permalink / raw)
  To: Paolo Abeni, edumazet@google.com, linux-doc@vger.kernel.org,
	corbet@lwn.net, horms@kernel.org, dsahern@kernel.org,
	kuniyu@amazon.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	dave.taht@gmail.com, jhs@mojatatu.com, kuba@kernel.org,
	stephen@networkplumber.org, xiyou.wangcong@gmail.com,
	jiri@resnulli.us, davem@davemloft.net, andrew+netdev@lunn.ch,
	donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
	shuah@kernel.org, linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire@apple.com, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, vidhi_goel@apple.com
  Cc: Olivier Tilmans (Nokia)

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com> 
> Sent: Tuesday, September 23, 2025 11:48 AM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
> Cc: Olivier Tilmans (Nokia) <olivier.tilmans@nokia.com>
> Subject: Re: [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample
> 
> 
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> 
> 
> 
> On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > From: Ilpo Järvinen <ij@kernel.org>
> >
> > Include echoed CE count into rate_sample. Replace local ecn_count 
> > variable with it.
> 
> Why? skimming over the next few patches it's not clear to me which is the goal here.
> 
> Expanding the commit message would help, thanks!
> 
> Paolo

Hi Paolo,

Sorry for my late reply.
Here, ece_delata will be used in the next patch series by tcp_prague (or other congestion control algorithm) that support Prague requirement to adjust the congestion window accordingly.
I will elaborate more on the patch notes, or you think this shall stay in the tcp_prague patch series?

Best regards,
Chia-Yu

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample
  2025-09-25  7:40     ` Chia-Yu Chang (Nokia)
@ 2025-09-25  7:44       ` Paolo Abeni
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-09-25  7:44 UTC (permalink / raw)
  To: Chia-Yu Chang (Nokia), edumazet@google.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@amazon.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	kuba@kernel.org, stephen@networkplumber.org,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
	liuhangbin@gmail.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire@apple.com, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, vidhi_goel@apple.com
  Cc: Olivier Tilmans (Nokia)

On 9/25/25 9:40 AM, Chia-Yu Chang (Nokia) wrote:
>> -----Original Message-----
>> From: Paolo Abeni <pabeni@redhat.com> 
>> Sent: Tuesday, September 23, 2025 11:48 AM
>> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
>> Cc: Olivier Tilmans (Nokia) <olivier.tilmans@nokia.com>
>> Subject: Re: [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample
>>
>>
>> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>>
>>
>>
>> On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
>>> From: Ilpo Järvinen <ij@kernel.org>
>>>
>>> Include echoed CE count into rate_sample. Replace local ecn_count 
>>> variable with it.
>>
>> Why? skimming over the next few patches it's not clear to me which is the goal here.
>>
>> Expanding the commit message would help, thanks!
>>
>> Paolo
> 
> Hi Paolo,
> 
> Sorry for my late reply.
> Here, ece_delata will be used in the next patch series by tcp_prague (or other congestion control algorithm) that support Prague requirement to adjust the congestion window accordingly.
> I will elaborate more on the patch notes, or you think this shall stay in the tcp_prague patch series?

I think the latter would be the better option.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option
  2025-09-23 10:52   ` Paolo Abeni
@ 2025-09-25 14:46     ` Chia-Yu Chang (Nokia)
  2025-09-25 15:46       ` Paolo Abeni
  0 siblings, 1 reply; 26+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-09-25 14:46 UTC (permalink / raw)
  To: Paolo Abeni, edumazet@google.com, linux-doc@vger.kernel.org,
	corbet@lwn.net, horms@kernel.org, dsahern@kernel.org,
	kuniyu@amazon.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	dave.taht@gmail.com, jhs@mojatatu.com, kuba@kernel.org,
	stephen@networkplumber.org, xiyou.wangcong@gmail.com,
	jiri@resnulli.us, davem@davemloft.net, andrew+netdev@lunn.ch,
	donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
	shuah@kernel.org, linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire@apple.com, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, vidhi_goel@apple.com

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com> 
> Sent: Tuesday, September 23, 2025 12:52 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
> Subject: Re: [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option
> 
> 
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> 
> 
> 
> On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > Detect spurious retransmission of a previously sent ACK carrying the 
> > AccECN option after the second retransmission. Since this might be 
> > caused by the middlebox dropping ACK with options it does not 
> > recognize, disable the sending of the AccECN option in all subsequent 
> > ACKs. This patch follows Section 3.2.3.2.2 of AccECN spec (RFC9768).
> 
> Is this really useful/triggers in practice?
> 
> AFAICS it will take effect only it the retransmission happens just after an egress AccECN packet, i.e. will not trigger if the there are more later non AccECN packets pending.

Hi Paolo,

This is a simplied implementation than what is mentieond in the RFC9768: 
"Such a host detect loss of ACKs carrying the AccECN Option by detecting whether the acknowledged data alwaysreappears as a retransmission. In such cases, the host disable the sending of the AccECN Option for this half-connection."


However, to implement the case that not that just after egressing the ACK with AccECN, I was thinking to modify struct tcp_sack_block but that maybe an over engineering.

And IMO this could be useful with proper configuration on syscel tcp_accecn_option and tcp_ecn_option_beacon to see whether the middlebox is dropping all ACKs with AccECN option.
An example packetdrill is at:
https://github.com/minuscat/packetdrill_accecn/blob/main/gtests/net/tcp/accecn/fallback/client_accecn_options_drop.pkt

What do you think? And maybe @mirja.kuehlewind@ericsson.com could help to comment from RFC perspective.

Chia-Yu

> >
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > ---
> >  include/linux/tcp.h   | 3 ++-
> >  include/net/tcp_ecn.h | 1 +
> >  net/ipv4/tcp_input.c  | 9 +++++++++
> >  net/ipv4/tcp_output.c | 3 +++
> >  4 files changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 
> > 53a4a0da35cc..f3b72fb6fa76 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -295,7 +295,8 @@ struct tcp_sock {
> >       u8      nonagle     : 4,/* Disable Nagle algorithm?             */
> >               rate_app_limited:1;  /* rate_{delivered,interval_us} limited? */
> >       u8      received_ce_pending:4, /* Not yet transmit cnt of received_ce */
> > -             unused2:4;
> > +             accecn_opt_sent:1,/* Sent AccECN option in previous ACK */
> > +             unused2:3;
> >       u8      accecn_minlen:2,/* Minimum length of AccECN option sent */
> >               est_ecnfield:2,/* ECN field for AccECN delivered estimates */
> >               accecn_opt_demand:2,/* Demand AccECN option for n next 
> > ACKs */ diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h 
> > index 8317c3f279c9..3eb450b4b648 100644
> > --- a/include/net/tcp_ecn.h
> > +++ b/include/net/tcp_ecn.h
> > @@ -401,6 +401,7 @@ static inline void tcp_accecn_init_counters(struct tcp_sock *tp)
> >       tp->received_ce_pending = 0;
> >       __tcp_accecn_init_bytes_counters(tp->received_ecn_bytes);
> >       __tcp_accecn_init_bytes_counters(tp->delivered_ecn_bytes);
> > +     tp->accecn_opt_sent = 0;
> >       tp->accecn_minlen = 0;
> >       tp->accecn_opt_demand = 0;
> >       tp->est_ecnfield = 0;
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 
> > f250b4912a0b..6083260133e9 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -4804,6 +4804,7 @@ static void tcp_dsack_extend(struct sock *sk, 
> > u32 seq, u32 end_seq)
> >
> >  static void tcp_rcv_spurious_retrans(struct sock *sk, const struct 
> > sk_buff *skb)  {
> > +     struct tcp_sock *tp = tcp_sk(sk);
> 
> Minor nit: please leave an empty line after variable declaration.
> 
> /P


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option
  2025-09-25 14:46     ` Chia-Yu Chang (Nokia)
@ 2025-09-25 15:46       ` Paolo Abeni
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-09-25 15:46 UTC (permalink / raw)
  To: Chia-Yu Chang (Nokia), edumazet@google.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@amazon.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	kuba@kernel.org, stephen@networkplumber.org,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
	liuhangbin@gmail.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire@apple.com, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, vidhi_goel@apple.com

On 9/25/25 4:46 PM, Chia-Yu Chang (Nokia) wrote:
>From: Paolo Abeni <pabeni@redhat.com> Sent: Tuesday, September 23, 2025 12:52 PM
>> On 9/18/25 6:21 PM, chia-yu.chang@nokia-bell-labs.com wrote:
>>> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>>>
>>> Detect spurious retransmission of a previously sent ACK carrying the 
>>> AccECN option after the second retransmission. Since this might be 
>>> caused by the middlebox dropping ACK with options it does not 
>>> recognize, disable the sending of the AccECN option in all subsequent 
>>> ACKs. This patch follows Section 3.2.3.2.2 of AccECN spec (RFC9768).
>>
>> Is this really useful/triggers in practice?
>>
>> AFAICS it will take effect only it the retransmission happens just after an egress AccECN packet, i.e. will not trigger if the there are more later non AccECN packets pending.
> 
> Hi Paolo,
> 
> This is a simplied implementation than what is mentieond in the RFC9768: 
> "Such a host detect loss of ACKs carrying the AccECN Option by detecting whether the acknowledged data alwaysreappears as a retransmission. In such cases, the host disable the sending of the AccECN Option for this half-connection."
> 
> However, to implement the case that not that just after egressing the ACK with AccECN, I was thinking to modify struct tcp_sack_block but that maybe an over engineering.

I agree touching tcp_sack_block looks overkill. I think that the
simplified implementation is a bit too far from the RFC specification
and too simplistic to be effective. I suggest dropping this change.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2025-09-25 15:46 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-18 16:21 [PATCH v2 net-next 00/14] AccECN protocol case handling series chia-yu.chang
2025-09-18 16:21 ` [PATCH v2 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
2025-09-23  9:32   ` Paolo Abeni
2025-09-18 16:21 ` [PATCH v2 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
2025-09-18 16:21 ` [PATCH v2 net-next 03/14] tcp: accecn: Add ece_delta to rate_sample chia-yu.chang
2025-09-23  9:47   ` Paolo Abeni
2025-09-25  7:40     ` Chia-Yu Chang (Nokia)
2025-09-25  7:44       ` Paolo Abeni
2025-09-18 16:21 ` [PATCH v2 net-next 04/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
2025-09-23 10:04   ` Paolo Abeni
2025-09-18 16:21 ` [PATCH v2 net-next 05/14] tcp: disable RFC3168 fallback identifier " chia-yu.chang
2025-09-23 10:08   ` Paolo Abeni
2025-09-18 16:21 ` [PATCH v2 net-next 06/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
2025-09-23 10:14   ` Paolo Abeni
2025-09-18 16:21 ` [PATCH v2 net-next 07/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
2025-09-18 16:21 ` [PATCH v2 net-next 08/14] tcp: move increment of num_retrans chia-yu.chang
2025-09-18 16:21 ` [PATCH v2 net-next 09/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
2025-09-18 16:21 ` [PATCH v2 net-next 10/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion chia-yu.chang
2025-09-18 16:21 ` [PATCH v2 net-next 11/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
2025-09-23 10:28   ` Paolo Abeni
2025-09-18 16:21 ` [PATCH v2 net-next 12/14] tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation chia-yu.chang
2025-09-18 16:21 ` [PATCH v2 net-next 13/14] tcp: accecn: stop sending AccECN opt when loss ACK w/ option chia-yu.chang
2025-09-23 10:52   ` Paolo Abeni
2025-09-25 14:46     ` Chia-Yu Chang (Nokia)
2025-09-25 15:46       ` Paolo Abeni
2025-09-18 16:21 ` [PATCH v2 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).