linux-doc.vger.kernel.org archive mirror
* [PATCH v5 net-next 00/14] AccECN protocol case handling series
@ 2025-10-30 14:34 chia-yu.chang
  2025-10-30 14:34 ` [PATCH v5 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
                   ` (14 more replies)
  0 siblings, 15 replies; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Hello,

Please find the v5 AccECN case handling patch series, which handles
several exceptional cases of the Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.

This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

Best regards,
Chia-Yu

---
v5:
- Move previous #11 in v4 to a later patch after discussion with the RFC author.
- Add #3 to update the comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. (Parav Pandit <parav@nvidia.com>)
- Add gro self-test for TCP CWR flag in #4. (Eric Dumazet <edumazet@google.com>)
- Add Fixes: tag to #7. (Paolo Abeni <pabeni@redhat.com>)
- Update the commit message and the if-condition check of #8. (Paolo Abeni <pabeni@redhat.com>)
- Add empty line between variable declarations and code in #13. (Paolo Abeni <pabeni@redhat.com>)

v4:
- Add previous #13 in v2 back after discussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.

v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1. (Paolo Abeni <pabeni@redhat.com>)
- Change TCP_CONG_WANTS_ECT_1 into an individual flag and add helper function INET_ECN_xmit_wants_ect_1() in #3. (Paolo Abeni <pabeni@redhat.com>)
- Add empty line between variable declarations and code in #4. (Paolo Abeni <pabeni@redhat.com>)
- Update commit message to fix old AccECN commits in #5. (Paolo Abeni <pabeni@redhat.com>)
- Remove unnecessary brackets in #10. (Paolo Abeni <pabeni@redhat.com>)
- Move patch #3 in v2 to a later Prague patch series and remove patch #13 in v2. (Paolo Abeni <pabeni@redhat.com>)

---
Chia-Yu Chang (12):
  net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  selftests/net: gro: add self-test for TCP CWR flag
  tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  tcp: disable RFC3168 fallback identifier for CC modules
  tcp: accecn: handle unexpected AccECN negotiation feedback
  tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  tcp: move increment of num_retrans
  tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
    SYN/ACK
  tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  tcp: accecn: fallback outgoing half link to non-AccECN
  tcp: accecn: detect loss ACK w/ AccECN option and add
    TCP_ACCECN_OPTION_PERSIST
  tcp: accecn: enable AccECN

Ilpo Järvinen (2):
  tcp: try to avoid safer when ACKs are thinned
  gro: flushing when CWR is set negatively affects AccECN

 Documentation/networking/ip-sysctl.rst        |  4 +-
 .../networking/net_cachelines/tcp_sock.rst    |  1 +
 include/linux/skbuff.h                        | 13 ++-
 include/linux/tcp.h                           |  4 +-
 include/net/inet_ecn.h                        | 20 +++-
 include/net/tcp.h                             | 32 ++++++-
 include/net/tcp_ecn.h                         | 92 ++++++++++++++-----
 net/ipv4/sysctl_net_ipv4.c                    |  4 +-
 net/ipv4/tcp.c                                |  2 +
 net/ipv4/tcp_cong.c                           | 10 +-
 net/ipv4/tcp_input.c                          | 37 +++++++-
 net/ipv4/tcp_minisocks.c                      | 40 +++++---
 net/ipv4/tcp_offload.c                        |  3 +-
 net/ipv4/tcp_output.c                         | 42 ++++++---
 tools/testing/selftests/net/gro.c             | 80 +++++++++++-----
 15 files changed, 294 insertions(+), 90 deletions(-)

-- 
2.34.1



* [PATCH v5 net-next 01/14] tcp: try to avoid safer when ACKs are thinned
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 10:57   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Ilpo Järvinen <ij@kernel.org>

Add an EWMA of newly ACKed packets. When ACK thinning occurs, use
it to select between the safe and the unsafe cep delta in AccECN
processing. If the number of packets ACKed per ACK tends to be
large, don't conservatively assume an ACE field overflow.
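
For readers who want to play with the numbers, the following is a
user-space sketch of the averaging step, not the kernel code itself.
The three constants are copied from the diff below; the function names
pkts_acked_ewma_update() and ack_thinning_likely() are made up for
illustration. Each update keeps 63/64 of the old value and adds the new
sample, so the average settles at (packets per ACK) << PKTS_ACKED_PREC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define PKTS_ACKED_WEIGHT	6
#define PKTS_ACKED_PREC		6
#define ACK_COMP_THRESH		4

static_assert((ACK_COMP_THRESH << PKTS_ACKED_PREC) <= 0xFFFF,
	      "threshold must fit in the 16-bit average");

/* Hypothetical helper: fold one ACK into the fixed-point average.
 * An ACK that delivers no packets leaves the average unchanged.
 */
static uint16_t pkts_acked_ewma_update(uint16_t old, uint32_t delivered_pkts)
{
	uint32_t ewma;

	if (!delivered_pkts)
		return old;

	if (!old) {
		/* The first sample seeds the average directly. */
		ewma = delivered_pkts << PKTS_ACKED_PREC;
	} else {
		ewma = old;
		ewma = (((ewma << PKTS_ACKED_WEIGHT) - ewma) +
			(delivered_pkts << PKTS_ACKED_PREC)) >>
			PKTS_ACKED_WEIGHT;
	}

	/* Clamp to what a u16 field can hold. */
	return ewma > 0xFFFFU ? 0xFFFFU : (uint16_t)ewma;
}

/* Hypothetical helper: true when more than ACK_COMP_THRESH packets are
 * ACKed per ACK on average, i.e. the unsafe delta is preferred.
 */
static bool ack_thinning_likely(uint16_t ewma)
{
	return ewma > (ACK_COMP_THRESH << PKTS_ACKED_PREC);
}
```

With a steady 10 packets per ACK the average sits at 640, above the
threshold of 256; with 1 packet per ACK it sits at 64, below it.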

This patch places the new u16 variable in the existing 4-byte hole
in the rx group without creating more holes, leaving a 2-byte hole.
Below are the pahole outcomes before and after this patch:

[BEFORE THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    /* XXX 4 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 177 */
}

[AFTER THIS PATCH]
struct tcp_sock {
    [...]
    u32                        delivered_ecn_bytes[3]; /*  2744    12 */
    u16                        pkts_acked_ewma;        /*  2756     2 */
    /* XXX 2 bytes hole, try to pack */

    [...]
    __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */

    [...]
    /* size: 3264, cachelines: 51, members: 178 */
}
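
The same effect can be reproduced outside the kernel with a cut-down
struct. This is only an illustration: the two struct names are
hypothetical, only three members are kept, and the explicit 8-byte
alignment on the u64 is an assumption that mirrors the 64-bit layout
shown by pahole above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slice of the layout before the patch. */
struct rx_slice_before {
	uint32_t delivered_ecn_bytes[3];	/* offset 0, 12 bytes */
	/* 4-byte hole: the next member is 8-byte aligned */
	_Alignas(8) uint64_t first_tx_mstamp;	/* offset 16 */
};

/* Hypothetical slice of the layout after the patch. */
struct rx_slice_after {
	uint32_t delivered_ecn_bytes[3];	/* offset 0, 12 bytes */
	uint16_t pkts_acked_ewma;		/* offset 12, 2 bytes */
	/* 2-byte hole remains */
	_Alignas(8) uint64_t first_tx_mstamp;	/* still offset 16 */
};

/* The new member consumes padding only, so the size does not grow. */
static_assert(sizeof(struct rx_slice_before) ==
	      sizeof(struct rx_slice_after),
	      "u16 member must fit in the existing hole");
```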

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

---
v3:
- Add additional min() check if pkts_acked_ewma is not initialized.
---
 .../networking/net_cachelines/tcp_sock.rst    |  1 +
 include/linux/tcp.h                           |  1 +
 net/ipv4/tcp.c                                |  2 ++
 net/ipv4/tcp_input.c                          | 20 ++++++++++++++++++-
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 26f32dbcf6ec..563daea10d6c 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -105,6 +105,7 @@ u32                           received_ce             read_mostly         read_w
 u32[3]                        received_ecn_bytes      read_mostly         read_write
 u8:4                          received_ce_pending     read_mostly         read_write
 u32[3]                        delivered_ecn_bytes                         read_write
+u16                           pkts_acked_ewma                             read_write
 u8:2                          syn_ect_snt             write_mostly        read_write
 u8:2                          syn_ect_rcv             read_mostly         read_write
 u8:2                          accecn_minlen           write_mostly        read_write
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 20b8c6e21fef..683f38362977 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -345,6 +345,7 @@ struct tcp_sock {
 	u32	rate_interval_us;  /* saved rate sample: time elapsed */
 	u32	rcv_rtt_last_tsecr;
 	u32	delivered_ecn_bytes[3];
+	u16	pkts_acked_ewma;/* Pkts acked EWMA for AccECN cep heuristic */
 	u64	first_tx_mstamp;  /* start of window send phase */
 	u64	delivered_mstamp; /* time we reached "delivered" */
 	u64	bytes_acked;	/* RFC4898 tcpEStatsAppHCThruOctetsAcked
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a9345aa5a2e5..d92223954cc7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3420,6 +3420,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tcp_accecn_init_counters(tp);
 	tp->prev_ecnfield = 0;
 	tp->accecn_opt_tstamp = 0;
+	tp->pkts_acked_ewma = 0;
 	if (icsk->icsk_ca_initialized && icsk->icsk_ca_ops->release)
 		icsk->icsk_ca_ops->release(sk);
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
@@ -5193,6 +5194,7 @@ static void __init tcp_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, rate_interval_us);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, rcv_rtt_last_tsecr);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, delivered_ecn_bytes);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, pkts_acked_ewma);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, first_tx_mstamp);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, delivered_mstamp);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, bytes_acked);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ff19f6e54d55..f6e6f30c3d79 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -488,6 +488,10 @@ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
 		tcp_count_delivered_ce(tp, delivered);
 }
 
+#define PKTS_ACKED_WEIGHT	6
+#define PKTS_ACKED_PREC		6
+#define ACK_COMP_THRESH		4
+
 /* Returns the ECN CE delta */
 static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 				u32 delivered_pkts, u32 delivered_bytes,
@@ -499,6 +503,7 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 	u32 delta, safe_delta, d_ceb;
 	bool opt_deltas_valid;
 	u32 corrected_ace;
+	u32 ewma;
 
 	/* Reordered ACK or uncertain due to lack of data to send and ts */
 	if (!(flag & (FLAG_FORWARD_PROGRESS | FLAG_TS_PROGRESS)))
@@ -507,6 +512,18 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 	opt_deltas_valid = tcp_accecn_process_option(tp, skb,
 						     delivered_bytes, flag);
 
+	if (delivered_pkts) {
+		if (!tp->pkts_acked_ewma) {
+			ewma = delivered_pkts << PKTS_ACKED_PREC;
+		} else {
+			ewma = tp->pkts_acked_ewma;
+			ewma = (((ewma << PKTS_ACKED_WEIGHT) - ewma) +
+				(delivered_pkts << PKTS_ACKED_PREC)) >>
+				PKTS_ACKED_WEIGHT;
+		}
+		tp->pkts_acked_ewma = min_t(u32, ewma, 0xFFFFU);
+	}
+
 	if (!(flag & FLAG_SLOWPATH)) {
 		/* AccECN counter might overflow on large ACKs */
 		if (delivered_pkts <= TCP_ACCECN_CEP_ACE_MASK)
@@ -555,7 +572,8 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
 		if (d_ceb <
 		    safe_delta * tp->mss_cache >> TCP_ACCECN_SAFETY_SHIFT)
 			return delta;
-	}
+	} else if (tp->pkts_acked_ewma > (ACK_COMP_THRESH << PKTS_ACKED_PREC))
+		return delta;
 
 	return safe_delta;
 }
-- 
2.34.1



* [PATCH v5 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
  2025-10-30 14:34 ` [PATCH v5 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 11:01   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN chia-yu.chang
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Ilpo Järvinen <ij@kernel.org>

As AccECN may keep the CWR bit asserted due to its different
interpretation of the bit, flushing in GRO because of CWR may
effectively disable GRO until the AccECN counter field changes
such that the CWR bit becomes 0.

There is no harm done from not immediately forwarding the
CWR'ed segment with RFC3168 ECN.
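
A simplified user-space model of the flag part of the flush decision,
before and after this change, may help. It is a sketch only: the flag
values are the host-order bits of the TCP flags byte rather than the
big-endian flag word the kernel compares, the function names are made
up, and the ack_seq and option comparisons are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the TCP flags byte, host order (illustration only). */
#define F_FIN	0x01
#define F_PSH	0x08
#define F_ACK	0x10
#define F_CWR	0x80

static_assert((F_CWR & (F_FIN | F_PSH)) == 0,
	      "CWR is not one of the ignored flags");

/* Old behaviour: any segment carrying CWR forces a flush. */
static int gro_flag_flush_old(uint8_t flags, uint8_t held_flags)
{
	int flush = flags & F_CWR;

	flush |= (flags ^ held_flags) & ~(F_FIN | F_PSH);
	return flush != 0;
}

/* New behaviour: flush only when the flags differ from the held
 * segment, ignoring FIN and PSH.
 */
static int gro_flag_flush_new(uint8_t flags, uint8_t held_flags)
{
	return ((flags ^ held_flags) & ~(F_FIN | F_PSH)) != 0;
}
```

Two consecutive segments that both carry CWR flush under the old rule
and coalesce under the new one; a segment whose CWR bit differs from
the held segment still flushes in both cases.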

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/tcp_offload.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 2cb93da93abc..fcbf4148919c 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -330,8 +330,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
 		goto out_check_final;
 
 	th2 = tcp_hdr(p);
-	flush = (__force int)(flags & TCP_FLAG_CWR);
-	flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
+	flush = (__force int)((flags ^ tcp_flag_word(th2)) &
 		  ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
 	flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
 	for (i = sizeof(*th); i < thlen; i += 4)
-- 
2.34.1



* [PATCH v5 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
  2025-10-30 14:34 ` [PATCH v5 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
  2025-10-30 14:34 ` [PATCH v5 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 11:06   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 04/14] selftests/net: gro: add self-test for TCP CWR flag chia-yu.chang
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

No functional changes.

Co-developed-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/linux/skbuff.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a7cc3d1f4fd1..74d6a209e203 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -671,7 +671,12 @@ enum {
 	/* This indicates the skb is from an untrusted source. */
 	SKB_GSO_DODGY = 1 << 1,
 
-	/* This indicates the tcp segment has CWR set. */
+	/* For Tx, this indicates the first TCP segment has CWR set, and any
+	 * subsequent segment in the same skb has CWR cleared. This cannot be
+	 * used on Rx, because the connection to which the segment belongs is
+	 * not tracked to use RFC3168 or Accurate ECN, and using RFC3168 ECN
+	 * offload may corrupt AccECN signal of AccECN segments.
+	 */
 	SKB_GSO_TCP_ECN = 1 << 2,
 
 	__SKB_GSO_TCP_FIXEDID = 1 << 3,
@@ -706,6 +711,12 @@ enum {
 
 	SKB_GSO_FRAGLIST = 1 << 18,
 
+	/* For TX, this indicates the TCP segment uses the CWR flag as part of
+	 * AccECN signal, and the CWR flag is not modified in the skb. For RX,
+	 * any CWR flagged segment must use SKB_GSO_TCP_ACCECN. This is to
+	 * ensure the CWR flag is not cleared by any RFC3168 ECN offload, and
+	 * thus keeping AccECN signal of AccECN segments.
+	 */
 	SKB_GSO_TCP_ACCECN = 1 << 19,
 
 	/* These indirectly map onto the same netdev feature.
-- 
2.34.1



* [PATCH v5 net-next 04/14] selftests/net: gro: add self-test for TCP CWR flag
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (2 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-10-30 14:34 ` [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Currently, GRO does not flush packets when the CWR bit is set.
A corresponding self-test is being added, in which the CWR flag
is set for two consecutive packets, but the first packet with the
CWR flag set will not be flushed immediately.

+===================+==========+===============+===========+
|     Packet id     | CWR flag |    Payload    | Flushing? |
+===================+==========+===============+===========+
|         0         |     0    |  PAYLOAD_LEN  |     0     |
|        ...        |     0    |  PAYLOAD_LEN  |     1     |
+-------------------+----------+---------------+-----------+
| NUM_PACKETS/2 - 1 |     1    |  payload_len  |     0     |
|   NUM_PACKETS/2   |     1    |  payload_len  |     1     |
+-------------------+----------+---------------+-----------+
|        ...        |     0    |  PAYLOAD_LEN  |     0     |
|   NUM_PACKETS     |     0    |  PAYLOAD_LEN  |     1     |
+===================+==========+===============+===========+
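
The expected receive sizes in the table can be derived with a small
stand-alone model. It assumes NUM_PACKETS is 4 and PAYLOAD_LEN is 100
(values not shown in this patch), reduces GRO to "merge consecutive
packets whose CWR bit matches", and uses made-up function names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PACKETS	4	/* assumed value */
#define PAYLOAD_LEN	100	/* assumed value */

static_assert(NUM_PACKETS >= 2,
	      "need two flagged packets away from the first one");

/* CWR is set on packets NUM_PACKETS/2 - 1 and NUM_PACKETS/2. */
static int pkt_has_cwr(int i)
{
	return i == NUM_PACKETS / 2 - 1 || i == NUM_PACKETS / 2;
}

/* Write the coalesced payload sizes into out[] and return how many
 * entries were produced. A packet is merged into the previous entry
 * when its CWR bit matches the previous packet; otherwise it starts
 * a new entry.
 */
static size_t model_cwr_coalescing(int out[], size_t out_cap)
{
	size_t n = 0;
	int i;

	for (i = 0; i < NUM_PACKETS + 1; i++) {
		if (i > 0 && pkt_has_cwr(i) == pkt_has_cwr(i - 1)) {
			out[n - 1] += PAYLOAD_LEN;
		} else {
			if (n == out_cap)
				return n;
			out[n++] = PAYLOAD_LEN;
		}
	}
	return n;
}
```

Under these assumptions the model yields three entries of PAYLOAD_LEN,
PAYLOAD_LEN * 2 and PAYLOAD_LEN * 2, which is what the receiver side
of the test checks for.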

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 tools/testing/selftests/net/gro.c | 80 ++++++++++++++++++++++---------
 1 file changed, 57 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index 2b1d9f2b3e9e..763331bb4bbd 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -11,8 +11,8 @@
  * 2.ack
  *  Pure ACK does not coalesce.
  * 3.flags
- *  Specific test cases: no packets with PSH, SYN, URG, RST set will
- *  be coalesced.
+ *  Specific test cases: no packets with PSH, SYN, URG, RST, CWR set
+ *  will be coalesced.
  * 4.tcp
  *  Packets with incorrect checksum, non-consecutive seqno and
  *  different TCP header options shouldn't coalesce. Nit: given that
@@ -332,32 +332,57 @@ static void create_packet(void *buf, int seq_offset, int ack_offset,
 	fill_datalinklayer(buf);
 }
 
-/* send one extra flag, not first and not last pkt */
-static void send_flags(int fd, struct sockaddr_ll *daddr, int psh, int syn,
-		       int rst, int urg)
+#ifndef TH_CWR
+#define TH_CWR 0x80
+#endif
+static void set_flags(struct tcphdr *tcph, int payload_len, int psh, int syn,
+		      int rst, int urg, int cwr)
 {
-	static char flag_buf[MAX_HDR_LEN + PAYLOAD_LEN];
-	static char buf[MAX_HDR_LEN + PAYLOAD_LEN];
-	int payload_len, pkt_size, flag, i;
-	struct tcphdr *tcph;
-
-	payload_len = PAYLOAD_LEN * psh;
-	pkt_size = total_hdr_len + payload_len;
-	flag = NUM_PACKETS / 2;
-
-	create_packet(flag_buf, flag * payload_len, 0, payload_len, 0);
-
-	tcph = (struct tcphdr *)(flag_buf + tcp_offset);
 	tcph->psh = psh;
 	tcph->syn = syn;
 	tcph->rst = rst;
 	tcph->urg = urg;
+	if (cwr)
+		tcph->th_flags |= TH_CWR;
+	else
+		tcph->th_flags &= ~TH_CWR;
 	tcph->check = 0;
 	tcph->check = tcp_checksum(tcph, payload_len);
+}
+
+/* send extra flags of the (NUM_PACKETS / 2) and (NUM_PACKETS / 2 - 1)
+ * pkts, not first and not last pkt
+ */
+static void send_flags(int fd, struct sockaddr_ll *daddr, int psh, int syn,
+		       int rst, int urg, int cwr)
+{
+	static char flag_buf[2][MAX_HDR_LEN + PAYLOAD_LEN];
+	static char buf[MAX_HDR_LEN + PAYLOAD_LEN];
+	int payload_len, pkt_size, i;
+	struct tcphdr *tcph;
+	int flag[2];
+
+	payload_len = PAYLOAD_LEN * (psh || cwr);
+	pkt_size = total_hdr_len + payload_len;
+	flag[0] = NUM_PACKETS / 2;
+	flag[1] = NUM_PACKETS / 2 - 1;
+
+	// Create and configure packets with flags
+	for (i = 0; i < 2; i++) {
+		if (flag[i] > 0) {
+			create_packet(flag_buf[i], flag[i] * payload_len, 0,
+				      payload_len, 0);
+			tcph = (struct tcphdr *)(flag_buf[i] + tcp_offset);
+			set_flags(tcph, payload_len, psh, syn, rst, urg, cwr);
+		}
+	}
 
 	for (i = 0; i < NUM_PACKETS + 1; i++) {
-		if (i == flag) {
-			write_packet(fd, flag_buf, pkt_size, daddr);
+		if (i == flag[0]) {
+			write_packet(fd, flag_buf[0], pkt_size, daddr);
+			continue;
+		} else if (i == flag[1] && cwr) {
+			write_packet(fd, flag_buf[1], pkt_size, daddr);
 			continue;
 		}
 		create_packet(buf, i * PAYLOAD_LEN, 0, PAYLOAD_LEN, 0);
@@ -1019,16 +1044,19 @@ static void gro_sender(void)
 		send_ack(txfd, &daddr);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 	} else if (strcmp(testname, "flags") == 0) {
-		send_flags(txfd, &daddr, 1, 0, 0, 0);
+		send_flags(txfd, &daddr, 1, 0, 0, 0, 0);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 
-		send_flags(txfd, &daddr, 0, 1, 0, 0);
+		send_flags(txfd, &daddr, 0, 1, 0, 0, 0);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 
-		send_flags(txfd, &daddr, 0, 0, 1, 0);
+		send_flags(txfd, &daddr, 0, 0, 1, 0, 0);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 
-		send_flags(txfd, &daddr, 0, 0, 0, 1);
+		send_flags(txfd, &daddr, 0, 0, 0, 1, 0);
+		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
+
+		send_flags(txfd, &daddr, 0, 0, 0, 0, 1);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 	} else if (strcmp(testname, "tcp") == 0) {
 		send_changed_checksum(txfd, &daddr);
@@ -1155,6 +1183,12 @@ static void gro_receiver(void)
 
 		printf("urg flag ends coalescing: ");
 		check_recv_pkts(rxfd, correct_payload, 3);
+
+		correct_payload[0] = PAYLOAD_LEN;
+		correct_payload[1] = PAYLOAD_LEN * 2;
+		correct_payload[2] = PAYLOAD_LEN * 2;
+		printf("cwr flag ends coalescing: ");
+		check_recv_pkts(rxfd, correct_payload, 3);
 	} else if (strcmp(testname, "tcp") == 0) {
 		correct_payload[0] = PAYLOAD_LEN;
 		correct_payload[1] = PAYLOAD_LEN;
-- 
2.34.1



* [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (3 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 04/14] selftests/net: gro: add self-test for TCP CWR flag chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 11:38   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 06/14] tcp: disable RFC3168 fallback identifier " chia-yu.chang
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang, Olivier Tilmans

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Two CA module flags are added in this patch. First, a new CA module
flag (TCP_CONG_NEEDS_ACCECN) defines that the CA expects to negotiate
AccECN functionality using the ECE, CWR and AE flags in the TCP header.
The detailed AccECN negotiation during the 3WHS can be found in the
AccECN spec:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt

Second, when ECN is negotiated for a TCP flow, it defaults to using
ECT(0) in the IP header. L4S service, however, needs to use ECT(1).
This patch enables the CA to control whether ECT(0) or ECT(1) should
be used on a per-segment basis. A new flag (TCP_CONG_WANTS_ECT_1)
defines the behavior expected by the CA when not yet initialized for
the connection.
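
To show why the ECN bits have to be cleared before a codepoint is
applied once a connection can alternate between ECT(0) and ECT(1),
here is a stand-alone sketch. The codepoint values follow RFC 3168;
the macro and function names are made up and operate on a plain byte
instead of a socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ECN codepoints in the two low bits of the TOS/traffic-class byte. */
#define ECN_NOT_ECT	0x0
#define ECN_ECT_1	0x1
#define ECN_ECT_0	0x2
#define ECN_CE		0x3
#define ECN_MASK	0x3

static_assert((ECN_ECT_0 | ECN_ECT_1) == ECN_CE,
	      "OR-ing both ECT codepoints yields CE");

/* OR-only update: fine while only ECT(0) is ever applied. */
static uint8_t ecn_set_or_only(uint8_t tos, bool use_ect_1)
{
	return tos | (use_ect_1 ? ECN_ECT_1 : ECN_ECT_0);
}

/* Mask-then-set update: safe when the codepoint alternates. */
static uint8_t ecn_set_masked(uint8_t tos, bool use_ect_1)
{
	tos = (uint8_t)(tos & ~ECN_MASK);
	return tos | (use_ect_1 ? ECN_ECT_1 : ECN_ECT_0);
}
```

Starting from a byte of 0xb8, applying ECT(0) and then ECT(1) with the
OR-only variant ends at 0xbb, which a receiver would read as CE; the
masked variant ends at 0xb9, i.e. ECT(1) with the upper bits untouched.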

Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

---
v3:
- Change TCP_CONG_WANTS_ECT_1 into individual flag.
- Add helper function INET_ECN_xmit_wants_ect_1().
---
 include/net/inet_ecn.h | 20 +++++++++++++++++---
 include/net/tcp.h      | 22 +++++++++++++++++++++-
 include/net/tcp_ecn.h  | 13 ++++++++++---
 net/ipv4/tcp_cong.c    | 10 +++++++---
 net/ipv4/tcp_input.c   |  3 ++-
 net/ipv4/tcp_output.c  | 10 ++++++++--
 6 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index ea32393464a2..827b87a95dab 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -51,11 +51,25 @@ static inline __u8 INET_ECN_encapsulate(__u8 outer, __u8 inner)
 	return outer;
 }
 
+/* Apply either ECT(0) or ECT(1) */
+static inline void __INET_ECN_xmit(struct sock *sk, bool use_ect_1)
+{
+	__u8 ect = use_ect_1 ? INET_ECN_ECT_1 : INET_ECN_ECT_0;
+
+	/* Mask the complete byte in case the connection alternates between
+	 * ECT(0) and ECT(1).
+	 */
+	inet_sk(sk)->tos &= ~INET_ECN_MASK;
+	inet_sk(sk)->tos |= ect;
+	if (inet6_sk(sk)) {
+		inet6_sk(sk)->tclass &= ~INET_ECN_MASK;
+		inet6_sk(sk)->tclass |= ect;
+	}
+}
+
 static inline void INET_ECN_xmit(struct sock *sk)
 {
-	inet_sk(sk)->tos |= INET_ECN_ECT_0;
-	if (inet6_sk(sk) != NULL)
-		inet6_sk(sk)->tclass |= INET_ECN_ECT_0;
+	__INET_ECN_xmit(sk, false);
 }
 
 static inline void INET_ECN_dontxmit(struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 190b3714e93b..76a67e77900d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -406,6 +406,7 @@ static inline void tcp_dec_quickack_mode(struct sock *sk)
 #define	TCP_ECN_DEMAND_CWR	BIT(2)
 #define	TCP_ECN_SEEN		BIT(3)
 #define	TCP_ECN_MODE_ACCECN	BIT(4)
+#define	TCP_ECN_ECT_1		BIT(5)
 
 #define	TCP_ECN_DISABLED	0
 #define	TCP_ECN_MODE_PENDING	(TCP_ECN_MODE_RFC3168 | TCP_ECN_MODE_ACCECN)
@@ -1195,7 +1196,12 @@ enum tcp_ca_ack_event_flags {
 #define TCP_CONG_NON_RESTRICTED		BIT(0)
 /* Requires ECN/ECT set on all packets */
 #define TCP_CONG_NEEDS_ECN		BIT(1)
-#define TCP_CONG_MASK	(TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN)
+/* Require successfully negotiated AccECN capability */
+#define TCP_CONG_NEEDS_ACCECN		BIT(2)
+/* Use ECT(1) instead of ECT(0) while the CA is uninitialized */
+#define TCP_CONG_WANTS_ECT_1		BIT(3)
+#define TCP_CONG_MASK  (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN | \
+			TCP_CONG_NEEDS_ACCECN | TCP_CONG_WANTS_ECT_1)
 
 union tcp_cc_info;
 
@@ -1327,6 +1333,20 @@ static inline bool tcp_ca_needs_ecn(const struct sock *sk)
 	return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ECN;
 }
 
+static inline bool tcp_ca_needs_accecn(const struct sock *sk)
+{
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+
+	return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ACCECN;
+}
+
+static inline bool tcp_ca_wants_ect_1(const struct sock *sk)
+{
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+
+	return icsk->icsk_ca_ops->flags & TCP_CONG_WANTS_ECT_1;
+}
+
 static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index f13e5cd2b1ac..0cc698a8438c 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,6 +31,12 @@ enum tcp_accecn_option {
 	TCP_ACCECN_OPTION_FULL = 2,
 };
 
+/* Apply either ECT(0) or ECT(1) based on TCP_CONG_WANTS_ECT_1 flag */
+static inline void INET_ECN_xmit_wants_ect_1(struct sock *sk)
+{
+	__INET_ECN_xmit(sk, tcp_ca_wants_ect_1(sk));
+}
+
 static inline void tcp_ecn_queue_cwr(struct tcp_sock *tp)
 {
 	/* Do not set CWR if in AccECN mode! */
@@ -561,7 +567,7 @@ static inline void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb)
 		TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ECE;
 	else if (tcp_ca_needs_ecn(sk) ||
 		 tcp_bpf_ca_needs_ecn(sk))
-		INET_ECN_xmit(sk);
+		INET_ECN_xmit_wants_ect_1(sk);
 
 	if (tp->ecn_flags & TCP_ECN_MODE_ACCECN) {
 		TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ACE;
@@ -579,7 +585,8 @@ static inline void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 	bool use_ecn, use_accecn;
 	u8 tcp_ecn = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn);
 
-	use_accecn = tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ACCECN;
+	use_accecn = tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ACCECN ||
+		     tcp_ca_needs_accecn(sk);
 	use_ecn = tcp_ecn == TCP_ECN_IN_ECN_OUT_ECN ||
 		  tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ECN ||
 		  tcp_ca_needs_ecn(sk) || bpf_needs_ecn || use_accecn;
@@ -595,7 +602,7 @@ static inline void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 
 	if (use_ecn) {
 		if (tcp_ca_needs_ecn(sk) || bpf_needs_ecn)
-			INET_ECN_xmit(sk);
+			INET_ECN_xmit_wants_ect_1(sk);
 
 		TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR;
 		if (use_accecn) {
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index df758adbb445..1a8ed6983ac3 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -16,6 +16,7 @@
 #include <linux/gfp.h>
 #include <linux/jhash.h>
 #include <net/tcp.h>
+#include <net/tcp_ecn.h>
 #include <trace/events/tcp.h>
 
 static DEFINE_SPINLOCK(tcp_cong_list_lock);
@@ -227,7 +228,7 @@ void tcp_assign_congestion_control(struct sock *sk)
 
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
 	if (ca->flags & TCP_CONG_NEEDS_ECN)
-		INET_ECN_xmit(sk);
+		INET_ECN_xmit_wants_ect_1(sk);
 	else
 		INET_ECN_dontxmit(sk);
 }
@@ -240,7 +241,10 @@ void tcp_init_congestion_control(struct sock *sk)
 	if (icsk->icsk_ca_ops->init)
 		icsk->icsk_ca_ops->init(sk);
 	if (tcp_ca_needs_ecn(sk))
-		INET_ECN_xmit(sk);
+		/* The CA is already initialized, expect it to set the
+		 * appropriate flag to select ECT(1).
+		 */
+		__INET_ECN_xmit(sk, tcp_sk(sk)->ecn_flags & TCP_ECN_ECT_1);
 	else
 		INET_ECN_dontxmit(sk);
 	icsk->icsk_ca_initialized = 1;
@@ -257,7 +261,7 @@ static void tcp_reinit_congestion_control(struct sock *sk,
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
 
 	if (ca->flags & TCP_CONG_NEEDS_ECN)
-		INET_ECN_xmit(sk);
+		INET_ECN_xmit_wants_ect_1(sk);
 	else
 		INET_ECN_dontxmit(sk);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f6e6f30c3d79..b4098d5cce48 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7239,7 +7239,8 @@ static void tcp_ecn_create_request(struct request_sock *req,
 	u32 ecn_ok_dst;
 
 	if (tcp_accecn_syn_requested(th) &&
-	    READ_ONCE(net->ipv4.sysctl_tcp_ecn) >= 3) {
+	    (READ_ONCE(net->ipv4.sysctl_tcp_ecn) >= 3 ||
+	     tcp_ca_needs_accecn(listen_sk))) {
 		inet_rsk(req)->ecn_ok = 1;
 		tcp_rsk(req)->accecn_ok = 1;
 		tcp_rsk(req)->syn_ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7f5df7a71f62..d475f80b2248 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -328,12 +328,17 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
 			 struct tcphdr *th, int tcp_header_len)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	bool ecn_ect_1;
 
 	if (!tcp_ecn_mode_any(tp))
 		return;
 
+	ecn_ect_1 = tp->ecn_flags & TCP_ECN_ECT_1;
+	if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+		__INET_ECN_xmit(sk, true);
+
 	if (tcp_ecn_mode_accecn(tp)) {
-		if (!tcp_accecn_ace_fail_recv(tp))
+		if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
 			INET_ECN_xmit(sk);
 		tcp_accecn_set_ace(tp, skb, th);
 		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
@@ -341,7 +346,8 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
 		/* Not-retransmitted data segment: set ECT and inject CWR. */
 		if (skb->len != tcp_header_len &&
 		    !before(TCP_SKB_CB(skb)->seq, tp->snd_nxt)) {
-			INET_ECN_xmit(sk);
+			if (!ecn_ect_1)
+				INET_ECN_xmit(sk);
 			if (tp->ecn_flags & TCP_ECN_QUEUE_CWR) {
 				tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
 				th->cwr = 1;
-- 
2.34.1



* [PATCH v5 net-next 06/14] tcp: disable RFC3168 fallback identifier for CC modules
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (4 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 11:42   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

When AccECN is not successfully negotiated for a TCP flow, the flow
falls back to classic ECN (RFC3168) by default. An L4S service, however,
should fall back to non-ECN.

This patch lets a congestion control module decide whether the flow
may fall back to classic ECN after an unsuccessful AccECN negotiation.
A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) marks a CA that
expects no such fallback.
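
The decision this flag controls can be sketched in userspace as below.
This is a simplified model under stated assumptions, not the kernel code:
the enum ecn_mode and the function ecn_mode_after_failed_accecn() are
hypothetical names chosen for illustration. Only the flag name and its
bit position come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value copied from the patch; everything else is illustrative. */
#define TCP_CONG_NO_FALLBACK_RFC3168 (1u << 4)

/* Hypothetical simplified ECN modes (not the kernel's ecn_flags encoding). */
enum ecn_mode { ECN_DISABLED, ECN_RFC3168 };

/* Mode chosen when the peer answered with classic ECN instead of AccECN.
 * peer_ecn_ok: the peer's handshake segment signalled RFC3168 support.
 */
static enum ecn_mode ecn_mode_after_failed_accecn(uint32_t ca_flags,
						  bool peer_ecn_ok)
{
	/* A CA that forbids the fallback ends up with ECN disabled. */
	if (!peer_ecn_ok || (ca_flags & TCP_CONG_NO_FALLBACK_RFC3168))
		return ECN_DISABLED;
	return ECN_RFC3168;
}
```

With ca_flags == 0 the model keeps today's behavior (classic ECN when the
peer supports it); setting the flag always yields ECN_DISABLED.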

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

---
v3:
- Add empty line between variable declarations and code.
---
 include/net/tcp.h        | 12 +++++++++++-
 include/net/tcp_ecn.h    | 11 ++++++++---
 net/ipv4/tcp_input.c     |  2 +-
 net/ipv4/tcp_minisocks.c |  7 ++++---
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 76a67e77900d..68f835e70f44 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1200,8 +1200,11 @@ enum tcp_ca_ack_event_flags {
 #define TCP_CONG_NEEDS_ACCECN		BIT(2)
 /* Use ECT(1) instead of ECT(0) while the CA is uninitialized */
 #define TCP_CONG_WANTS_ECT_1		BIT(3)
+/* Cannot fallback to RFC3168 during AccECN negotiation */
+#define TCP_CONG_NO_FALLBACK_RFC3168	BIT(4)
 #define TCP_CONG_MASK  (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN | \
-			TCP_CONG_NEEDS_ACCECN | TCP_CONG_WANTS_ECT_1)
+			TCP_CONG_NEEDS_ACCECN | TCP_CONG_WANTS_ECT_1 | \
+			TCP_CONG_NO_FALLBACK_RFC3168)
 
 union tcp_cc_info;
 
@@ -1340,6 +1343,13 @@ static inline bool tcp_ca_needs_accecn(const struct sock *sk)
 	return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ACCECN;
 }
 
+static inline bool tcp_ca_no_fallback_rfc3168(const struct sock *sk)
+{
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+
+	return icsk->icsk_ca_ops->flags & TCP_CONG_NO_FALLBACK_RFC3168;
+}
+
 static inline bool tcp_ca_wants_ect_1(const struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 0cc698a8438c..a7ba21d298ff 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -507,7 +507,9 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
 		 * | ECN    | AccECN | 0   0   1  | Classic ECN |
 		 * +========+========+============+=============+
 		 */
-		if (tcp_ecn_mode_pending(tp))
+		if (tcp_ca_no_fallback_rfc3168(sk))
+			tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
+		else if (tcp_ecn_mode_pending(tp))
 			/* Downgrade from AccECN, or requested initially */
 			tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
 		break;
@@ -531,9 +533,11 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
 	}
 }
 
-static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
+static inline void tcp_ecn_rcv_syn(struct sock *sk, const struct tcphdr *th,
 				   const struct sk_buff *skb)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
+
 	if (tcp_ecn_mode_pending(tp)) {
 		if (!tcp_accecn_syn_requested(th)) {
 			/* Downgrade to classic ECN feedback */
@@ -545,7 +549,8 @@ static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
 			tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
 		}
 	}
-	if (tcp_ecn_mode_rfc3168(tp) && (!th->ece || !th->cwr))
+	if (tcp_ecn_mode_rfc3168(tp) &&
+	    (!th->ece || !th->cwr || tcp_ca_no_fallback_rfc3168(sk)))
 		tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b4098d5cce48..6b10333fedd1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6834,7 +6834,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		tp->snd_wl1    = TCP_SKB_CB(skb)->seq;
 		tp->max_window = tp->snd_wnd;
 
-		tcp_ecn_rcv_syn(tp, th, skb);
+		tcp_ecn_rcv_syn(sk, th, skb);
 
 		tcp_mtup_init(sk);
 		tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index ded2cf1f6006..512920b23968 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -486,9 +486,10 @@ static void tcp_ecn_openreq_child(struct sock *sk,
 		tp->accecn_opt_demand = 1;
 		tcp_ecn_received_counters_payload(sk, skb);
 	} else {
-		tcp_ecn_mode_set(tp, inet_rsk(req)->ecn_ok ?
-				     TCP_ECN_MODE_RFC3168 :
-				     TCP_ECN_DISABLED);
+		if (inet_rsk(req)->ecn_ok && !tcp_ca_no_fallback_rfc3168(sk))
+			tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
+		else
+			tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
 	}
 }
 
-- 
2.34.1



* [PATCH v5 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (5 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 06/14] tcp: disable RFC3168 fallback identifier " chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 11:45   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

According to Section 3.1.2 of AccECN spec (RFC9768), if a TCP Client
has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1)
then receives a SYN/ACK with the currently reserved combination
(AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a
combination, the Client MUST enable AccECN mode as if the SYN/ACK
confirmed that the Server supported AccECN and as if it fed back that
the IP-ECN field on the SYN had arrived unchanged.
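
As a rough illustration of the rule above, the following userspace sketch
classifies a SYN/ACK by its ACE value. The names ace_field(),
classify_synack() and enum synack_result are hypothetical; the packing
AE<<2 | CWR<<1 | ECE is assumed from the case labels in the diff, where
(1,0,1) is handled as 0x5 and (0,0,1) as 0x1. Only those two values are
modeled, and the TCP_CONG_NO_FALLBACK_RFC3168 override is left out.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for the client after reading a SYN/ACK. */
enum synack_result {
	SYNACK_UNMODELED,	/* value not covered by this sketch */
	SYNACK_UNCHANGED,	/* mode left as it was */
	SYNACK_CLASSIC_ECN,	/* RFC3168 feedback */
	SYNACK_ACCECN,		/* AccECN feedback */
};

/* Pack the three TCP header flags into the 3-bit ACE value. */
static unsigned int ace_field(bool ae, bool cwr, bool ece)
{
	return ((unsigned int)ae << 2) | ((unsigned int)cwr << 1) |
	       (unsigned int)ece;
}

/* accecn_requested: the client sent (AE,CWR,ECE) = (1,1,1) on its SYN. */
static enum synack_result classify_synack(unsigned int ace,
					  bool accecn_requested)
{
	switch (ace) {
	case 0x1:
		return SYNACK_CLASSIC_ECN;
	case 0x5:
		/* Reserved combination: treated as AccECN confirmation. */
		return accecn_requested ? SYNACK_ACCECN : SYNACK_UNCHANGED;
	default:
		return SYNACK_UNMODELED;
	}
}
```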

Fixes: 3cae34274c79 ("tcp: accecn: AccECN negotiation")
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

---
v5:
- Add "Fixes" tag.

v3:
- Update commit message to fix old AccECN commits.
---
 include/net/tcp_ecn.h | 44 ++++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index a7ba21d298ff..c66f0d944e1c 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -473,6 +473,26 @@ static inline u8 tcp_accecn_option_init(const struct sk_buff *skb,
 	return TCP_ACCECN_OPT_COUNTER_SEEN;
 }
 
+static inline void tcp_ecn_rcv_synack_accecn(struct sock *sk,
+					     const struct sk_buff *skb, u8 dsf)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
+	tp->syn_ect_rcv = dsf & INET_ECN_MASK;
+	/* Demand Accurate ECN option in response to the SYN on the SYN/ACK
+	 * and the TCP server will try to send one more packet with an AccECN
+	 * Option at a later point during the connection.
+	 */
+	if (tp->rx_opt.accecn &&
+	    tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
+		u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
+
+		tcp_accecn_saw_opt_fail_recv(tp, saw_opt);
+		tp->accecn_opt_demand = 2;
+	}
+}
+
 /* See Table 2 of the AccECN draft */
 static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb,
 				      const struct tcphdr *th, u8 ip_dsfield)
@@ -495,13 +515,11 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
 		tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
 		break;
 	case 0x1:
-	case 0x5:
 		/* +========+========+============+=============+
 		 * | A      | B      |  SYN/ACK   |  Feedback   |
 		 * |        |        |    B->A    |  Mode of A  |
 		 * |        |        | AE CWR ECE |             |
 		 * +========+========+============+=============+
-		 * | AccECN | Nonce  | 1   0   1  | (Reserved)  |
 		 * | AccECN | ECN    | 0   0   1  | Classic ECN |
 		 * | Nonce  | AccECN | 0   0   1  | Classic ECN |
 		 * | ECN    | AccECN | 0   0   1  | Classic ECN |
@@ -509,20 +527,20 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
 		 */
 		if (tcp_ca_no_fallback_rfc3168(sk))
 			tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
-		else if (tcp_ecn_mode_pending(tp))
-			/* Downgrade from AccECN, or requested initially */
+		else
 			tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
 		break;
-	default:
-		tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
-		tp->syn_ect_rcv = ip_dsfield & INET_ECN_MASK;
-		if (tp->rx_opt.accecn &&
-		    tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
-			u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
-
-			tcp_accecn_saw_opt_fail_recv(tp, saw_opt);
-			tp->accecn_opt_demand = 2;
+	case 0x5:
+		if (tcp_ecn_mode_pending(tp)) {
+			tcp_ecn_rcv_synack_accecn(sk, skb, ip_dsfield);
+			if (INET_ECN_is_ce(ip_dsfield)) {
+				tp->received_ce++;
+				tp->received_ce_pending++;
+			}
 		}
+		break;
+	default:
+		tcp_ecn_rcv_synack_accecn(sk, skb, ip_dsfield);
 		if (INET_ECN_is_ce(ip_dsfield) &&
 		    tcp_accecn_validate_syn_feedback(sk, ace,
 						     tp->syn_ect_snt)) {
-- 
2.34.1



* [PATCH v5 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (6 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 11:47   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 09/14] tcp: move increment of num_retrans chia-yu.chang
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Based on AccECN spec (RFC9768), if the sender of an AccECN SYN
(the TCP Client) times out before receiving the SYN/ACK, it SHOULD
attempt to negotiate the use of AccECN at least one more time by
continuing to set all three TCP ECN flags (AE,CWR,ECE) = (1,1,1) on
the first retransmitted SYN (using the usual retransmission time-outs).

If this first retransmission also fails to be acknowledged, in
deployment scenarios where AccECN path traversal might be problematic,
the TCP Client SHOULD send subsequent retransmissions of the SYN with
the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0).
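
The condition added by this patch can be mirrored in a small userspace
predicate. This is only a sketch: syn_retrans_clears_ecn() is a
hypothetical name, and its parameters stand in for
tcp_ecn_mode_pending(tp) and icsk->icsk_retransmits. How the counter value
maps to the n-th retransmitted SYN depends on where the kernel increments
it, which this sketch does not model.

```c
#include <stdbool.h>

/* Returns true when a retransmitted SYN should have its ECN flags cleared.
 * accecn_pending: AccECN negotiation is still pending on this socket.
 * retransmits:    retransmission counter as seen at the time of the check.
 */
static bool syn_retrans_clears_ecn(bool accecn_pending,
				   unsigned int retransmits)
{
	/* Outside a pending AccECN negotiation, always fall back as before.
	 * While AccECN is pending, keep the flags until the counter
	 * exceeds 1.
	 */
	return !accecn_pending || retransmits > 1;
}
```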

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

---
v5:
- Update commit message and the if condition statement.
---
 net/ipv4/tcp_output.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d475f80b2248..71f65bb26675 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3574,12 +3574,15 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 			tcp_retrans_try_collapse(sk, skb, avail_wnd);
 	}
 
-	/* RFC3168, section 6.1.1.1. ECN fallback
-	 * As AccECN uses the same SYN flags (+ AE), this check covers both
-	 * cases.
-	 */
-	if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) == TCPHDR_SYN_ECN)
-		tcp_ecn_clear_syn(sk, skb);
+	if (!tcp_ecn_mode_pending(tp) || icsk->icsk_retransmits > 1) {
+		/* RFC3168, section 6.1.1.1. ECN fallback
+		 * As AccECN uses the same SYN flags (+ AE), this check
+		 * covers both cases.
+		 */
+		if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) ==
+		    TCPHDR_SYN_ECN)
+			tcp_ecn_clear_syn(sk, skb);
+	}
 
 	/* Update global and local TCP statistics. */
 	segs = tcp_skb_pcount(skb);
-- 
2.34.1



* [PATCH v5 net-next 09/14] tcp: move increment of num_retrans
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (7 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 11:56   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Before this patch, num_retrans = 0 for both the first SYN/ACK and the
first retransmitted SYN/ACK; however, an upcoming change needs to
differentiate between those two conditions. This patch moves the
increment of num_retrans before the send_synack() call in
tcp_rtx_synack(), so the two cases can be distinguished when building
the SYN/ACK, and undoes the increment if the transmission fails.
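
The increment-then-roll-back pattern can be sketched in userspace as
follows. All names here (struct synack_req, rtx_synack_model(), send_ok(),
send_fail(), seen_num_retrans) are hypothetical stand-ins for
illustration; the real code uses WRITE_ONCE() on req->num_retrans, which
is omitted in this single-threaded model.

```c
/* Minimal stand-in for the request sock; only the counter is modeled. */
struct synack_req {
	unsigned int num_retrans;
};

typedef int (*send_synack_fn)(const struct synack_req *req);

/* Records the counter value the send path observed. */
static unsigned int seen_num_retrans;

static int send_ok(const struct synack_req *req)
{
	seen_num_retrans = req->num_retrans;
	return 0;
}

static int send_fail(const struct synack_req *req)
{
	seen_num_retrans = req->num_retrans;
	return -1;
}

static int rtx_synack_model(struct synack_req *req, send_synack_fn send)
{
	int res;

	/* Bump first, so the send path already sees a non-zero count. */
	req->num_retrans++;
	res = send(req);
	if (res)
		req->num_retrans--;	/* transmission failed: undo */
	return res;
}
```

In this model the send callback sees num_retrans == 1 on the first
retransmission, while a failed send leaves the stored counter unchanged.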

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/tcp_output.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 71f65bb26675..90a71556b93c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4609,6 +4609,7 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
 	/* Paired with WRITE_ONCE() in sock_setsockopt() */
 	if (READ_ONCE(sk->sk_txrehash) == SOCK_TXREHASH_ENABLED)
 		WRITE_ONCE(tcp_rsk(req)->txhash, net_tx_rndhash());
+	WRITE_ONCE(req->num_retrans, req->num_retrans + 1);
 	res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_NORMAL,
 				  NULL);
 	if (!res) {
@@ -4622,7 +4623,8 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
 			tcp_sk_rw(sk)->total_retrans++;
 		}
 		trace_tcp_retransmit_synack(sk, req);
-		WRITE_ONCE(req->num_retrans, req->num_retrans + 1);
+	} else {
+		WRITE_ONCE(req->num_retrans, req->num_retrans - 1);
 	}
 	return res;
 }
-- 
2.34.1



* [PATCH v5 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (8 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 09/14] tcp: move increment of num_retrans chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 12:07   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation chia-yu.chang
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

For Accurate ECN, the first SYN/ACK sent by the TCP server shall set the
ACE flag (see Table 1 of RFC9768) and the AccECN option to complete the
capability negotiation. However, if the TCP server needs to retransmit such
a SYN/ACK (for example, because it did not receive an ACK acknowledging its
SYN/ACK, or received a second SYN requesting AccECN support), the TCP server
retransmits the SYN/ACK without the AccECN option. This is because the
SYN/ACK may be lost due to congestion, or a middlebox may block the AccECN
option. Furthermore, if this retransmission also times out, to expedite
connection establishment, the TCP server should retransmit the SYN/ACK with
(AE,CWR,ECE) = (0,0,0) and without the AccECN option, while maintaining
AccECN feedback mode.

This complies with Section 3.2.3.2.2 of the AccECN specification (RFC9768).
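
The two conditions touched by the diff can be summarized in a userspace
sketch. It assumes a request that negotiated AccECN (accecn_ok set) and
ignores the tcp_ecn_option sysctl and the remaining option space; struct
synack_plan and plan_accecn_synack() are hypothetical names.

```c
#include <stdbool.h>

/* What a SYN/ACK for an AccECN request would carry in this model. */
struct synack_plan {
	bool accecn_flags;	/* echo the AccECN handshake flags */
	bool accecn_option;	/* include the AccECN option */
};

static struct synack_plan plan_accecn_synack(unsigned int num_retrans,
					     unsigned int num_timeout)
{
	struct synack_plan plan;

	/* Mirrors the check in tcp_ecn_make_synack(): the flags are kept
	 * unless both counters are non-zero.
	 */
	plan.accecn_flags = !num_retrans || !num_timeout;
	/* Mirrors tcp_synack_options(): option only on the first SYN/ACK. */
	plan.accecn_option = num_retrans < 1;
	return plan;
}
```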

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/tcp_ecn.h | 14 ++++++++++----
 net/ipv4/tcp_output.c |  2 +-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c66f0d944e1c..99d095ed01b3 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -651,10 +651,16 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
 static inline void
 tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
 {
-	if (tcp_rsk(req)->accecn_ok)
-		tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
-	else if (inet_rsk(req)->ecn_ok)
-		th->ece = 1;
+	if (!req->num_retrans || !req->num_timeout) {
+		if (tcp_rsk(req)->accecn_ok)
+			tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
+		else if (inet_rsk(req)->ecn_ok)
+			th->ece = 1;
+	} else if (tcp_rsk(req)->accecn_ok) {
+		th->ae  = 0;
+		th->cwr = 0;
+		th->ece = 0;
+	}
 }
 
 static inline bool tcp_accecn_option_beacon_check(const struct sock *sk)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 90a71556b93c..37c04da4cfb1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1109,7 +1109,7 @@ static unsigned int tcp_synack_options(const struct sock *sk,
 
 	if (treq->accecn_ok &&
 	    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn_option) &&
-	    req->num_timeout < 1 && remaining >= TCPOLEN_ACCECN_BASE) {
+	    req->num_retrans < 1 && remaining >= TCPOLEN_ACCECN_BASE) {
 		opts->use_synack_ecn_bytes = 1;
 		remaining -= tcp_options_fit_accecn(opts, 0, remaining);
 	}
-- 
2.34.1



* [PATCH v5 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (9 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 12:17   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Based on specification:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt

Based on Section 3.1.5 of AccECN spec (RFC9768), a TCP Server in
AccECN mode MUST NOT set ECT on any packet for the rest of the connection,
if it has received or sent at least one valid SYN or Acceptable SYN/ACK
with (AE,CWR,ECE) = (0,0,0) during the handshake.

In addition, a host in AccECN mode that is feeding back the IP-ECN
field on a SYN or SYN/ACK MUST feed back the IP-ECN field on the
latest valid SYN or acceptable SYN/ACK to arrive.
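
For a socket in AccECN mode, the resulting ECT selection in
tcp_ecn_send() can be modeled as below. This is a sketch with
hypothetical names (enum ect_choice, accecn_ect_choice()); the boolean
parameters stand in for the TCP_ECN_ECT_1 flag and for
tcp_accecn_ace_fail_recv() / tcp_accecn_ace_fail_send().

```c
#include <stdbool.h>

/* Which ECT codepoint, if any, the send path applies in this model. */
enum ect_choice { ECT_NOT_SET, ECT_0_SET, ECT_1_SET };

static enum ect_choice accecn_ect_choice(bool wants_ect_1,
					 bool ace_fail_recv,
					 bool ace_fail_send)
{
	/* ACE=0 was received or sent during the handshake: never set ECT. */
	if (ace_fail_recv || ace_fail_send)
		return ECT_NOT_SET;
	return wants_ect_1 ? ECT_1_SET : ECT_0_SET;
}
```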

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/net/tcp_ecn.h    |  4 +++-
 net/ipv4/tcp_input.c     |  2 ++
 net/ipv4/tcp_minisocks.c | 33 +++++++++++++++++++++++----------
 net/ipv4/tcp_output.c    |  8 +++++---
 4 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 99d095ed01b3..88a328e7bcde 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -649,7 +649,8 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
 }
 
 static inline void
-tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
+tcp_ecn_make_synack(struct sock *sk, const struct request_sock *req,
+		    struct tcphdr *th)
 {
 	if (!req->num_retrans || !req->num_timeout) {
 		if (tcp_rsk(req)->accecn_ok)
@@ -660,6 +661,7 @@ tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
 		th->ae  = 0;
 		th->cwr = 0;
 		th->ece = 0;
+		tcp_accecn_fail_mode_set(tcp_sk(sk), TCP_ACCECN_ACE_FAIL_SEND);
 	}
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6b10333fedd1..cc39056d446f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6213,6 +6213,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 	if (th->syn) {
 		if (tcp_ecn_mode_accecn(tp)) {
 			accecn_reflector = true;
+			tp->syn_ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
+					  INET_ECN_MASK;
 			if (tp->rx_opt.accecn &&
 			    tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
 				u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 512920b23968..4a9190df0668 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -749,16 +749,29 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 		 */
 		if (!tcp_oow_rate_limited(sock_net(sk), skb,
 					  LINUX_MIB_TCPACKSKIPPEDSYNRECV,
-					  &tcp_rsk(req)->last_oow_ack_time) &&
-
-		    !tcp_rtx_synack(sk, req)) {
-			unsigned long expires = jiffies;
-
-			expires += reqsk_timeout(req, TCP_RTO_MAX);
-			if (!fastopen)
-				mod_timer_pending(&req->rsk_timer, expires);
-			else
-				req->rsk_timer.expires = expires;
+					  &tcp_rsk(req)->last_oow_ack_time)) {
+			if (tcp_rsk(req)->accecn_ok) {
+				u8 ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
+					     INET_ECN_MASK;
+
+				tcp_rsk(req)->syn_ect_rcv = ect_rcv;
+				if (tcp_accecn_ace(tcp_hdr(skb)) == 0x0) {
+					u8 fail_mode = TCP_ACCECN_ACE_FAIL_RECV;
+
+					tcp_accecn_fail_mode_set(tcp_sk(sk),
+								 fail_mode);
+				}
+			}
+			if (!tcp_rtx_synack(sk, req)) {
+				unsigned long expires = jiffies;
+
+				expires += reqsk_timeout(req, TCP_RTO_MAX);
+				if (!fastopen)
+					mod_timer_pending(&req->rsk_timer,
+							  expires);
+				else
+					req->rsk_timer.expires = expires;
+			}
 		}
 		return NULL;
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 37c04da4cfb1..d52229d32b4d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -334,11 +334,13 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
 		return;
 
 	ecn_ect_1 = tp->ecn_flags & TCP_ECN_ECT_1;
-	if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+	if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp) &&
+	    !tcp_accecn_ace_fail_send(tp))
 		__INET_ECN_xmit(sk, true);
 
 	if (tcp_ecn_mode_accecn(tp)) {
-		if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+		if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp) &&
+		    !tcp_accecn_ace_fail_send(tp))
 			INET_ECN_xmit(sk);
 		tcp_accecn_set_ace(tp, skb, th);
 		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
@@ -4006,7 +4008,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	memset(th, 0, sizeof(struct tcphdr));
 	th->syn = 1;
 	th->ack = 1;
-	tcp_ecn_make_synack(req, th);
+	tcp_ecn_make_synack((struct sock *)sk, req, th);
 	th->source = htons(ireq->ir_num);
 	th->dest = ireq->ir_rmt_port;
 	skb->mark = ireq->ir_mark;
-- 
2.34.1



* [PATCH v5 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (10 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 12:18   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server
is in AccECN mode and in SYN-RCVD state, and if it receives a value of
zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the
connection the Server MUST NOT set ECT on outgoing packets and MUST
NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it
MUST NOT disable AccECN feedback.
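
A minimal user-space sketch of the rule above, for illustration only and not
kernel code. The flag names echo the fail-mode identifiers used in the diff,
but their bit values, `struct conn`, and both helper functions are assumptions
introduced here. The ECT gate mirrors the checks visible in the
tcp_ecn_send() hunk of the preceding patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the kernel's real definitions may differ. */
enum {
	ACE_FAIL_SEND = 1 << 0,
	ACE_FAIL_RECV = 1 << 1,
	OPT_FAIL_SEND = 1 << 2,
	OPT_FAIL_RECV = 1 << 3,
};

/* Hypothetical stand-in for the per-connection AccECN fail-mode state. */
struct conn {
	uint8_t fail_mode;
};

/* Server side, SYN-RCVD: handle the ACE field of the third ACK. */
static void third_ack(struct conn *c, unsigned int ace, bool has_sack)
{
	assert(ace <= 0x7);	/* ACE is a 3-bit field */

	/* ACE == 0 on an ACK without SACK blocks: stop trusting the
	 * incoming ACE field and the incoming AccECN option.
	 */
	if (ace == 0x0 && !has_sack)
		c->fail_mode |= ACE_FAIL_RECV | OPT_FAIL_RECV;
}

/* ECT may only be set while neither ACE failure bit is recorded. */
static bool may_set_ect(const struct conn *c)
{
	return !(c->fail_mode & (ACE_FAIL_RECV | ACE_FAIL_SEND));
}
```

Note that nothing in this sketch stops the receiver side from continuing to
feed back AccECN counters, which matches the last sentence above.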

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

---
v3:
- Remove unnecessary brackets.
---
 include/net/tcp_ecn.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 88a328e7bcde..c82b5a35db28 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -175,7 +175,9 @@ static inline void tcp_accecn_third_ack(struct sock *sk,
 	switch (ace) {
 	case 0x0:
 		/* Invalid value */
-		tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV);
+		if (!TCP_SKB_CB(skb)->sacked)
+			tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV |
+						     TCP_ACCECN_OPT_FAIL_RECV);
 		break;
 	case 0x7:
 	case 0x5:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (11 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 12:21   ` Paolo Abeni
  2025-10-30 14:34 ` [PATCH v5 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang
  2025-10-31  0:56 ` [PATCH v5 net-next 00/14] AccECN protocol case handling series Jakub Kicinski
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Use spurious retransmissions to detect, after the second retransmission,
that a previously sent ACK carrying the AccECN option was lost. Since this
might be caused by a middlebox dropping ACKs with options it does not
recognize, stop sending the AccECN option in all subsequent ACKs. This
patch follows Section 3.2.3.2.2 of the AccECN spec (RFC9768).

Also, add a new AccECN option sending mode to the tcp_ecn_option sysctl,
TCP_ACCECN_OPTION_PERSIST (3), which ignores the AccECN fallback policy and
keeps sending the AccECN option whenever it fits into the TCP option space.
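
The resulting send decision can be sketched in user space as follows. This
is only an illustration under stated assumptions: the enum values are copied
from the patch, while `struct opt_state` and
`accecn_should_send_option()` are hypothetical names, and the check for
remaining TCP option space is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values as added to enum tcp_accecn_option by this patch. */
enum tcp_accecn_option {
	TCP_ACCECN_OPTION_DISABLED = 0,
	TCP_ACCECN_OPTION_MINIMUM = 1,
	TCP_ACCECN_OPTION_FULL = 2,
	TCP_ACCECN_OPTION_PERSIST = 3,
};

/* Hypothetical snapshot of the per-socket state the check relies on. */
struct opt_state {
	bool saw_accecn_opt;	/* peer has sent an AccECN option */
	bool opt_fail_send;	/* fallback: our option seems to be dropped */
	unsigned int opt_demand;/* option demanded for the next n ACKs */
	bool beacon_due;	/* periodic beacon timer has expired */
};

static bool accecn_should_send_option(int ecn_opt, const struct opt_state *s)
{
	assert(s != NULL);

	if (!ecn_opt || !s->saw_accecn_opt)
		return false;

	/* Only PERSIST overrides a recorded send-side fallback. */
	if (ecn_opt < TCP_ACCECN_OPTION_PERSIST && s->opt_fail_send)
		return false;

	/* FULL and PERSIST send on every ACK; MINIMUM only on demand
	 * or when a beacon is due.
	 */
	return ecn_opt >= TCP_ACCECN_OPTION_FULL || s->opt_demand ||
	       s->beacon_due;
}
```

With opt_fail_send set, modes 1 and 2 go quiet while mode 3 keeps sending,
which is the only behavioral difference between FULL and PERSIST.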

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

---
v5:
- Add empty line between variable declarations and code
---
 Documentation/networking/ip-sysctl.rst |  4 +++-
 include/linux/tcp.h                    |  3 ++-
 include/net/tcp_ecn.h                  |  2 ++
 net/ipv4/sysctl_net_ipv4.c             |  2 +-
 net/ipv4/tcp_input.c                   | 10 ++++++++++
 net/ipv4/tcp_output.c                  |  7 ++++++-
 6 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 7cd35bfd39e6..f5d9e596ee61 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -482,7 +482,9 @@ tcp_ecn_option - INTEGER
 	1 Send AccECN option sparingly according to the minimum option
 	  rules outlined in draft-ietf-tcpm-accurate-ecn.
 	2 Send AccECN option on every packet whenever it fits into TCP
-	  option space.
+	  option space except when AccECN fallback is triggered.
+	3 Send AccECN option on every packet whenever it fits into TCP
+	  option space even when AccECN fallback is triggered.
 	= ============================================================
 
 	Default: 2
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 683f38362977..32b031d09294 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -294,7 +294,8 @@ struct tcp_sock {
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		rate_app_limited:1;  /* rate_{delivered,interval_us} limited? */
 	u8	received_ce_pending:4, /* Not yet transmit cnt of received_ce */
-		unused2:4;
+		accecn_opt_sent:1,/* Sent AccECN option in previous ACK */
+		unused2:3;
 	u8	accecn_minlen:2,/* Minimum length of AccECN option sent */
 		est_ecnfield:2,/* ECN field for AccECN delivered estimates */
 		accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c82b5a35db28..d6ffa8492365 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -29,6 +29,7 @@ enum tcp_accecn_option {
 	TCP_ACCECN_OPTION_DISABLED = 0,
 	TCP_ACCECN_OPTION_MINIMUM = 1,
 	TCP_ACCECN_OPTION_FULL = 2,
+	TCP_ACCECN_OPTION_PERSIST = 3,
 };
 
 /* Apply either ECT(0) or ECT(1) based on TCP_CONG_WANTS_ECT_1 flag */
@@ -406,6 +407,7 @@ static inline void tcp_accecn_init_counters(struct tcp_sock *tp)
 	tp->received_ce_pending = 0;
 	__tcp_accecn_init_bytes_counters(tp->received_ecn_bytes);
 	__tcp_accecn_init_bytes_counters(tp->delivered_ecn_bytes);
+	tp->accecn_opt_sent = 0;
 	tp->accecn_minlen = 0;
 	tp->accecn_opt_demand = 0;
 	tp->est_ecnfield = 0;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0c7c8f9041cb..6695a6022539 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -749,7 +749,7 @@ static struct ctl_table ipv4_net_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dou8vec_minmax,
 		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_TWO,
+		.extra2		= SYSCTL_THREE,
 	},
 	{
 		.procname	= "tcp_ecn_option_beacon",
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cc39056d446f..ebedcf0ea0d0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4802,6 +4802,8 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
 
 static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
+
 	/* When the ACK path fails or drops most ACKs, the sender would
 	 * timeout and spuriously retransmit the same segment repeatedly.
 	 * If it seems our ACKs are not reaching the other side,
@@ -4821,6 +4823,14 @@ static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
 	/* Save last flowlabel after a spurious retrans. */
 	tcp_save_lrcv_flowlabel(sk, skb);
 #endif
+	/* Check DSACK info to detect that the previous ACK carrying the
+	 * AccECN option was lost after the second retransmission, and then
+	 * stop sending AccECN option in all subsequent ACKs.
+	 */
+	if (tcp_ecn_mode_accecn(tp) &&
+	    TCP_SKB_CB(skb)->seq == tp->duplicate_sack[0].start_seq &&
+	    tp->accecn_opt_sent)
+		tcp_accecn_fail_mode_set(tp, TCP_ACCECN_OPT_FAIL_SEND);
 }
 
 static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d52229d32b4d..41e9a7a50538 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -719,9 +719,12 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
 		if (tp) {
 			tp->accecn_minlen = 0;
 			tp->accecn_opt_tstamp = tp->tcp_mstamp;
+			tp->accecn_opt_sent = 1;
 			if (tp->accecn_opt_demand)
 				tp->accecn_opt_demand--;
 		}
+	} else if (tp) {
+		tp->accecn_opt_sent = 0;
 	}
 
 	if (unlikely(OPTION_SACK_ADVERTISE & options)) {
@@ -1191,7 +1194,9 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
 	if (tcp_ecn_mode_accecn(tp)) {
 		int ecn_opt = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn_option);
 
-		if (ecn_opt && tp->saw_accecn_opt && !tcp_accecn_opt_fail_send(tp) &&
+		if (ecn_opt && tp->saw_accecn_opt &&
+		    (ecn_opt >= TCP_ACCECN_OPTION_PERSIST ||
+		     !tcp_accecn_opt_fail_send(tp)) &&
 		    (ecn_opt >= TCP_ACCECN_OPTION_FULL || tp->accecn_opt_demand ||
 		     tcp_accecn_option_beacon_check(sk))) {
 			opts->use_synack_ecn_bytes = 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 net-next 14/14] tcp: accecn: enable AccECN
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (12 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
@ 2025-10-30 14:34 ` chia-yu.chang
  2025-11-06 12:24   ` Paolo Abeni
  2025-10-31  0:56 ` [PATCH v5 net-next 00/14] AccECN protocol case handling series Jakub Kicinski
  14 siblings, 1 reply; 40+ messages in thread
From: chia-yu.chang @ 2025-10-30 14:34 UTC (permalink / raw)
  To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Enable negotiating and requesting Accurate ECN for incoming and
outgoing connections by setting sysctl_tcp_ecn:

+==============+===========================================+
|              |  Highest ECN variant (Accurate ECN, ECN,  |
|   tcp_ecn    |  or no ECN) to be negotiated & requested  |
|              +---------------------+---------------------+
|              | Incoming connection | Outgoing connection |
+==============+=====================+=====================+
|      0       |        No ECN       |        No ECN       |
|      1       |         ECN         |         ECN         |
|      2       |         ECN         |        No ECN       |
+--------------+---------------------+---------------------+
|      3       |     Accurate ECN    |     Accurate ECN    |
|      4       |     Accurate ECN    |         ECN         |
|      5       |     Accurate ECN    |        No ECN       |
+==============+=====================+=====================+

Refer to Documentation/networking/ip-sysctl.rst for more details.
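
As a reading aid, the table can be expressed as a lookup. This is a sketch
for illustration, not code from the patch: `enum ecn_variant`,
`struct ecn_policy`, and `tcp_ecn_policy()` are hypothetical names.

```c
#include <assert.h>

/* Highest ECN variant that may be negotiated or requested. */
enum ecn_variant {
	ECN_NONE,
	ECN_RFC3168,
	ECN_ACCURATE,
};

struct ecn_policy {
	enum ecn_variant incoming;	/* accepted on incoming connections */
	enum ecn_variant outgoing;	/* requested on outgoing connections */
};

/* Map a tcp_ecn sysctl value (0..5) to the policy rows of the table. */
static struct ecn_policy tcp_ecn_policy(int tcp_ecn)
{
	static const struct ecn_policy table[] = {
		{ ECN_NONE,     ECN_NONE     },	/* 0 */
		{ ECN_RFC3168,  ECN_RFC3168  },	/* 1 */
		{ ECN_RFC3168,  ECN_NONE     },	/* 2 */
		{ ECN_ACCURATE, ECN_ACCURATE },	/* 3 */
		{ ECN_ACCURATE, ECN_RFC3168  },	/* 4 */
		{ ECN_ACCURATE, ECN_NONE     },	/* 5 */
	};

	assert(tcp_ecn >= 0 && tcp_ecn <= 5);
	return table[tcp_ecn];
}
```

Values 3 to 5 repeat the pattern of 0 to 2 on the outgoing side while always
accepting Accurate ECN on the incoming side.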

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/sysctl_net_ipv4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 6695a6022539..7edba98a91cb 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -47,7 +47,7 @@ static unsigned int udp_child_hash_entries_max = UDP_HTABLE_SIZE_MAX;
 static int tcp_plb_max_rounds = 31;
 static int tcp_plb_max_cong_thresh = 256;
 static unsigned int tcp_tw_reuse_delay_max = TCP_PAWS_MSL * MSEC_PER_SEC;
-static int tcp_ecn_mode_max = 2;
+static int tcp_ecn_mode_max = 5;
 static u32 icmp_errors_extension_mask_all =
 	GENMASK_U8(ICMP_ERR_EXT_COUNT - 1, 0);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 00/14] AccECN protocol case handling series
  2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
                   ` (13 preceding siblings ...)
  2025-10-30 14:34 ` [PATCH v5 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang
@ 2025-10-31  0:56 ` Jakub Kicinski
  2025-10-31  7:32   ` Chia-Yu Chang (Nokia)
  14 siblings, 1 reply; 40+ messages in thread
From: Jakub Kicinski @ 2025-10-31  0:56 UTC (permalink / raw)
  To: chia-yu.chang
  Cc: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, stephen, xiyou.wangcong,
	jiri, davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
	linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
	ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
	Jason_Livingood, vidhi_goel

On Thu, 30 Oct 2025 15:34:21 +0100 chia-yu.chang@nokia-bell-labs.com
wrote:
> Please find the v5 AccECN case handling patch series, which handles
> several exceptional cases of the Accurate ECN spec (RFC9768),
> adds new identifiers to be used by CC modules, adds ecn_delta into
> rate_sample, and keeps the ACE counter for computation, etc.

Is this a pure repost or you changed something?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 net-next 00/14] AccECN protocol case handling series
  2025-10-31  0:56 ` [PATCH v5 net-next 00/14] AccECN protocol case handling series Jakub Kicinski
@ 2025-10-31  7:32   ` Chia-Yu Chang (Nokia)
  2025-10-31 14:06     ` Jakub Kicinski
  0 siblings, 1 reply; 40+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-10-31  7:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: pabeni@redhat.com, edumazet@google.com, parav@nvidia.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	stephen@networkplumber.org, xiyou.wangcong@gmail.com,
	jiri@resnulli.us, davem@davemloft.net, andrew+netdev@lunn.ch,
	donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
	shuah@kernel.org, linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, Vidhi Goel

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org> 
> Sent: Friday, October 31, 2025 1:57 AM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>
> Cc: pabeni@redhat.com; edumazet@google.com; parav@nvidia.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@google.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v5 net-next 00/14] AccECN protocol case handling series
> 
> 
> On Thu, 30 Oct 2025 15:34:21 +0100 chia-yu.chang@nokia-bell-labs.com
> wrote:
> > Please find the v5 AccECN case handling patch series, which handles
> > several exceptional cases of the Accurate ECN spec (RFC9768), adds
> > new identifiers to be used by CC modules, adds ecn_delta into 
> > rate_sample, and keeps the ACE counter for computation, etc.
> 
> Is this a pure repost or you changed something?

It only removes one empty line between "Fixes:" and "Signed-off-by:". checkpatch.pl reported no error, but the patchwork pipeline flagged one.

Shall I resubmit with v6 tag? Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 00/14] AccECN protocol case handling series
  2025-10-31  7:32   ` Chia-Yu Chang (Nokia)
@ 2025-10-31 14:06     ` Jakub Kicinski
  0 siblings, 0 replies; 40+ messages in thread
From: Jakub Kicinski @ 2025-10-31 14:06 UTC (permalink / raw)
  To: Chia-Yu Chang (Nokia)
  Cc: pabeni@redhat.com, edumazet@google.com, parav@nvidia.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	stephen@networkplumber.org, xiyou.wangcong@gmail.com,
	jiri@resnulli.us, davem@davemloft.net, andrew+netdev@lunn.ch,
	donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
	shuah@kernel.org, linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, Vidhi Goel

On Fri, 31 Oct 2025 07:32:27 +0000 Chia-Yu Chang (Nokia) wrote:
> > Is this a pure repost or you changed something?  
> 
> It only removes one empty line between "Fixes:" and "Signed-off-by:".
> checkpatch.pl reported no error, but the patchwork pipeline flagged
> one.
> 
> Shall I resubmit with v6 tag? Thanks.

No need. Just make sure to add the changelog each time going forward.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 01/14] tcp: try to avoid safer when ACKs are thinned
  2025-10-30 14:34 ` [PATCH v5 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
@ 2025-11-06 10:57   ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 10:57 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Ilpo Järvinen <ij@kernel.org>
> 
> Add newly acked pkts EWMA. When ACK thinning occurs, select
> between safer and unsafe cep delta in AccECN processing based
> on it. If the packets ACKed per ACK tend to be large, don't
> conservatively assume ACE field overflow.
> 
> This patch uses the existing 2-byte holes in the rx group for new
> u16 variables without creating more holes. Below are the pahole
> outcomes before and after this patch:
> 
> [BEFORE THIS PATCH]
> struct tcp_sock {
>     [...]
>     u32                        delivered_ecn_bytes[3]; /*  2744    12 */
>     /* XXX 4 bytes hole, try to pack */
> 
>     [...]
>     __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */
> 
>     [...]
>     /* size: 3264, cachelines: 51, members: 177 */
> }
> 
> [AFTER THIS PATCH]
> struct tcp_sock {
>     [...]
>     u32                        delivered_ecn_bytes[3]; /*  2744    12 */
>     u16                        pkts_acked_ewma;        /*  2756     2 */
>     /* XXX 2 bytes hole, try to pack */
> 
>     [...]
>     __cacheline_group_end__tcp_sock_write_rx[0];       /*  2816     0 */
> 
>     [...]
>     /* size: 3264, cachelines: 51, members: 178 */
> }
> 
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Acked-by: Paolo Abeni <pabeni@redhat.com>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN
  2025-10-30 14:34 ` [PATCH v5 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
@ 2025-11-06 11:01   ` Paolo Abeni
  2025-11-06 11:08     ` Paolo Abeni
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:01 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Ilpo Järvinen <ij@kernel.org>
> 
> As AccECN may keep CWR bit asserted due to different
> interpretation of the bit, flushing with GRO because of
> CWR may effectively disable GRO until AccECN counter
> field changes such that CWR-bit becomes 0.
> 
> There is no harm done from not immediately forwarding the
> CWR'ed segment with RFC3168 ECN.
> 
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Please provide a test/update the existing one to cover this case or move
to a later series. Possibly both :)

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  2025-10-30 14:34 ` [PATCH v5 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN chia-yu.chang
@ 2025-11-06 11:06   ` Paolo Abeni
  2025-11-06 11:26     ` Paolo Abeni
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:06 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> No functional changes.
> 
> Co-developed-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> ---
>  include/linux/skbuff.h | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index a7cc3d1f4fd1..74d6a209e203 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -671,7 +671,12 @@ enum {
>  	/* This indicates the skb is from an untrusted source. */
>  	SKB_GSO_DODGY = 1 << 1,
>  
> -	/* This indicates the tcp segment has CWR set. */
> +	/* For Tx, this indicates the first TCP segment has CWR set, and any
> +	 * subsequent segment in the same skb has CWR cleared. This cannot be
> +	 * used on Rx, because the connection to which the segment belongs is
> +	 * not tracked to use RFC3168 or Accurate ECN, and using RFC3168 ECN
> +	 * offload may corrupt AccECN signal of AccECN segments.
> +	 */

The intended difference between RX and TX sounds bad to me; I think it
conflicts with the basic GRO design goal of making aggregated and
re-segmented traffic indistinguishable from the original stream. Also
what about forwarded packet?

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN
  2025-11-06 11:01   ` Paolo Abeni
@ 2025-11-06 11:08     ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:08 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 11/6/25 12:01 PM, Paolo Abeni wrote:
> On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
>> From: Ilpo Järvinen <ij@kernel.org>
>>
>> As AccECN may keep CWR bit asserted due to different
>> interpretation of the bit, flushing with GRO because of
>> CWR may effectively disable GRO until AccECN counter
>> field changes such that CWR-bit becomes 0.
>>
>> There is no harm done from not immediately forwarding the
>> CWR'ed segment with RFC3168 ECN.
>>
>> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
>> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> Please provide a test/update the existing one to cover this case or move
> to a later series. Possibly both :)

Whoops, sorry. I'm going through the patches in order, and when I wrote
the above I had not yet seen patch 4/14. Please ignore.

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  2025-11-06 11:06   ` Paolo Abeni
@ 2025-11-06 11:26     ` Paolo Abeni
  2025-11-11 11:02       ` Chia-Yu Chang (Nokia)
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:26 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 11/6/25 12:06 PM, Paolo Abeni wrote:
> On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
>> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>>
>> No functional changes.
>>
>> Co-developed-by: Ilpo Järvinen <ij@kernel.org>
>> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
>> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>> ---
>>  include/linux/skbuff.h | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index a7cc3d1f4fd1..74d6a209e203 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -671,7 +671,12 @@ enum {
>>  	/* This indicates the skb is from an untrusted source. */
>>  	SKB_GSO_DODGY = 1 << 1,
>>  
>> -	/* This indicates the tcp segment has CWR set. */
>> +	/* For Tx, this indicates the first TCP segment has CWR set, and any
>> +	 * subsequent segment in the same skb has CWR cleared. This cannot be
>> +	 * used on Rx, because the connection to which the segment belongs is
>> +	 * not tracked to use RFC3168 or Accurate ECN, and using RFC3168 ECN
>> +	 * offload may corrupt AccECN signal of AccECN segments.
>> +	 */
> 
> The intended difference between RX and TX sounds bad to me; I think it
> conflicts with the basic GRO design goal of making aggregated and
> re-segmented traffic indistinguishable from the original stream. Also
> what about forwarded packet?

Uhm... I completely missed the point that SKB_GSO_TCP_ECN is TX path
only, i.e. GRO never produces aggregated SKB_GSO_TCP_ECN packets. Except
that virtio_net uses it in the RX path (virtio_net_hdr_to_skb). Please
clarify the statement accordingly.

/P



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  2025-10-30 14:34 ` [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
@ 2025-11-06 11:38   ` Paolo Abeni
  2025-11-11 11:02     ` Chia-Yu Chang (Nokia)
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:38 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Olivier Tilmans

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 7f5df7a71f62..d475f80b2248 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -328,12 +328,17 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
>  			 struct tcphdr *th, int tcp_header_len)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
> +	bool ecn_ect_1;
>  
>  	if (!tcp_ecn_mode_any(tp))
>  		return;
>  
> +	ecn_ect_1 = tp->ecn_flags & TCP_ECN_ECT_1;
> +	if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
> +		__INET_ECN_xmit(sk, true);

I'm possibly lost, but I can't find ecn_flags TCP_ECN_ECT_1 bit being
set here or elsewhere in this series.

Also why isn't this chunk under `if (tcp_ecn_mode_accecn(tp))` ?

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 06/14] tcp: disable RFC3168 fallback identifier for CC modules
  2025-10-30 14:34 ` [PATCH v5 net-next 06/14] tcp: disable RFC3168 fallback identifier " chia-yu.chang
@ 2025-11-06 11:42   ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:42 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> When AccECN is not successfully negotiated for a TCP flow, it falls
> back to classic ECN (RFC3168) by default. However, an L4S service will
> fall back to non-ECN.
> 
> This patch lets a congestion control module control whether it falls
> back to classic ECN after an unsuccessful AccECN negotiation. A new CA
> module flag (TCP_CONG_NO_FALLBACK_RFC3168) indicates that the CA
> expects no fallback to classic ECN.
> 
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Acked-by: Paolo Abeni <pabeni@redhat.com>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback
  2025-10-30 14:34 ` [PATCH v5 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
@ 2025-11-06 11:45   ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:45 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> According to Section 3.1.2 of AccECN spec (RFC9768), if a TCP Client
> has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1)
> then receives a SYN/ACK with the currently reserved combination
> (AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a
> combination, the Client MUST enable AccECN mode as if the SYN/ACK
> confirmed that the Server supported AccECN and as if it fed back that
> the IP-ECN field on the SYN had arrived unchanged.
> 
> Fixes: 3cae34274c79 ("tcp: accecn: AccECN negotiation")
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Acked-by: Paolo Abeni <pabeni@redhat.com>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  2025-10-30 14:34 ` [PATCH v5 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
@ 2025-11-06 11:47   ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:47 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> Based on AccECN spec (RFC9768), if the sender of an AccECN SYN
> (the TCP Client) times out before receiving the SYN/ACK, it SHOULD
> attempt to negotiate the use of AccECN at least one more time by
> continuing to set all three TCP ECN flags (AE,CWR,ECE) = (1,1,1) on
> the first retransmitted SYN (using the usual retransmission time-outs).
> 
> If this first retransmission also fails to be acknowledged, in
> deployment scenarios where AccECN path traversal might be problematic,
> the TCP Client SHOULD send subsequent retransmissions of the SYN with
> the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0).
> 
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Acked-by: Paolo Abeni <pabeni@redhat.com>
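
The SYN retransmission policy described in the commit message can be sketched as a small standalone C function. This is only an illustration under stated assumptions: the names are hypothetical, and the sketch always applies the fallback, whereas the text limits it to deployment scenarios where AccECN path traversal might be problematic.

```c
#include <stdbool.h>

/* Hypothetical view of the three TCP ECN flags on a SYN. */
struct syn_ecn_flags {
	bool ae;
	bool cwr;
	bool ece;
};

/*
 * syn_attempt counts transmissions of the SYN:
 *   0 = original SYN, 1 = first retransmission, 2+ = later retransmissions.
 * The original SYN and the first retransmission request AccECN with
 * (AE,CWR,ECE) = (1,1,1); later retransmissions clear all three flags.
 */
static struct syn_ecn_flags accecn_syn_flags(unsigned int syn_attempt)
{
	struct syn_ecn_flags f = { false, false, false };

	if (syn_attempt <= 1) {
		f.ae = true;
		f.cwr = true;
		f.ece = true;
	}
	return f;
}
```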


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 09/14] tcp: move increment of num_retrans
  2025-10-30 14:34 ` [PATCH v5 net-next 09/14] tcp: move increment of num_retrans chia-yu.chang
@ 2025-11-06 11:56   ` Paolo Abeni
  2025-11-06 12:03     ` Paolo Abeni
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 11:56 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> Before this patch, num_retrans = 0 for the first SYN/ACK and the first
> retransmitted SYN/ACK; however, an upcoming change will need to
> differentiate between those two conditions. 

AFAICS, send_synack is invoked with a NULL dst only on retransmissions.
Perhaps you could use that info instead? Moving a counter forward and
backward is not so nice.

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 09/14] tcp: move increment of num_retrans
  2025-11-06 11:56   ` Paolo Abeni
@ 2025-11-06 12:03     ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 12:03 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 11/6/25 12:56 PM, Paolo Abeni wrote:
> On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
>> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>>
>> Before this patch, num_retrans = 0 for the first SYN/ACK and the first
>> retransmitted SYN/ACK; however, an upcoming change will need to
>> differentiate between those two conditions. 
> 
> AFAICS, send_synack is invoked with a NULL dst only on retransmissions.
> Perhaps you could use that info instead? Moving a counter forward and
> backward is not so nice.

... except you need to propagate the information to the nested call.
Possibly adding a TCP_SYNACK_RETRANS synack_type would fit?

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
  2025-10-30 14:34 ` [PATCH v5 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
@ 2025-11-06 12:07   ` Paolo Abeni
  2025-11-11 11:02     ` Chia-Yu Chang (Nokia)
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 12:07 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel



On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> For Accurate ECN, the first SYN/ACK sent by the TCP server shall set the
> ACE flag (see Table 1 of RFC9768) and the AccECN option to complete the
> capability negotiation. However, if the TCP server needs to retransmit such
> a SYN/ACK (for example, because it did not receive an ACK acknowledging its
> SYN/ACK, or received a second SYN requesting AccECN support), the TCP server
> retransmits the SYN/ACK without the AccECN option. This is because the
> SYN/ACK may be lost due to congestion, or a middlebox may block the AccECN
> option. Furthermore, if this retransmission also times out, to expedite
> connection establishment, the TCP server should retransmit the SYN/ACK with
> (AE,CWR,ECE) = (0,0,0) and without the AccECN option, while maintaining
> AccECN feedback mode.
> 
> This complies with Section 3.2.3.2.2 of the AccECN specification (RFC9768).
> 
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> ---
>  include/net/tcp_ecn.h | 14 ++++++++++----
>  net/ipv4/tcp_output.c |  2 +-
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
> index c66f0d944e1c..99d095ed01b3 100644
> --- a/include/net/tcp_ecn.h
> +++ b/include/net/tcp_ecn.h
> @@ -651,10 +651,16 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
>  static inline void
>  tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
>  {
> -	if (tcp_rsk(req)->accecn_ok)
> -		tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
> -	else if (inet_rsk(req)->ecn_ok)
> -		th->ece = 1;
> +	if (!req->num_retrans || !req->num_timeout) {

Why is `if (!req->num_timeout)` not a sufficient condition here?

Simplifying the above condition will make the TCP_SYNACK_RETRANS
alternative simpler, I think.

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  2025-10-30 14:34 ` [PATCH v5 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion chia-yu.chang
@ 2025-11-06 12:17   ` Paolo Abeni
  2025-11-06 17:30     ` Chia-Yu Chang (Nokia)
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 12:17 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> @@ -4006,7 +4008,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
>  	memset(th, 0, sizeof(struct tcphdr));
>  	th->syn = 1;
>  	th->ack = 1;
> -	tcp_ecn_make_synack(req, th);
> +	tcp_ecn_make_synack((struct sock *)sk, req, th);
>  	th->source = htons(ireq->ir_num);
>  	th->dest = ireq->ir_rmt_port;
>  	skb->mark = ireq->ir_mark;

Whoops, I missed the const cast in the previous revisions. This could
make the code generated by the compiler for the caller incorrect -
assuming the changed field is actually constant.

I don't have a good idea on how to address this. Changing the argument
type for the whole call chain looks like a no go.

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN
  2025-10-30 14:34 ` [PATCH v5 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
@ 2025-11-06 12:18   ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 12:18 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server
> is in AccECN mode and in SYN-RCVD state, and if it receives a value of
> zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the
> connection the Server MUST NOT set ECT on outgoing packets and MUST
> NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it
> MUST NOT disable AccECN feedback.
> 
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Acked-by: Paolo Abeni <pabeni@redhat.com>
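
As an illustration of the asymmetry in the rule above (the outgoing half falls back, the incoming half does not), here is a minimal standalone C sketch. It models only the commit message; all struct, field, and function names are hypothetical and do not correspond to kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical per-connection state on the Server side. */
struct server_accecn_model {
	bool send_ect;           /* set ECT on outgoing packets */
	bool react_to_feedback;  /* respond to AccECN feedback as Data Sender */
	bool feed_back_accecn;   /* keep providing AccECN feedback as Data Receiver */
};

/* Hypothetical summary of an ACK received in SYN-RCVD state. */
struct ack_model {
	unsigned int ace;         /* value of the 3-bit ACE field */
	bool syn;                 /* SYN flag */
	bool has_sack;            /* carries SACK blocks */
	unsigned int payload_len; /* 0 for a pure ACK */
};

static void model_synrcvd_ack(struct server_accecn_model *srv,
			      const struct ack_model *ack)
{
	bool pure_ack = !ack->syn && ack->payload_len == 0;

	if (pure_ack && !ack->has_sack && ack->ace == 0) {
		/* Outgoing half link falls back to non-AccECN behaviour. */
		srv->send_ect = false;
		srv->react_to_feedback = false;
		/* feed_back_accecn is deliberately left untouched. */
	}
}
```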


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
  2025-10-30 14:34 ` [PATCH v5 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
@ 2025-11-06 12:21   ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 12:21 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> Detect spurious retransmission of a previously sent ACK carrying the
> AccECN option after the second retransmission. Since this might be caused
> by the middlebox dropping ACK with options it does not recognize, disable
> the sending of the AccECN option in all subsequent ACKs. This patch
> follows Section 3.2.3.2.2 of AccECN spec (RFC9768).
> 
> Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl:
> (TCP_ACCECN_OPTION_PERSIST), which ignores the AccECN fallback policy and
> persistently sends AccECN option once it fits into TCP option space.
> 
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Acked-by: Paolo Abeni <pabeni@redhat.com>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 14/14] tcp: accecn: enable AccECN
  2025-10-30 14:34 ` [PATCH v5 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang
@ 2025-11-06 12:24   ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-06 12:24 UTC (permalink / raw)
  To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
	kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
	xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
	liuhangbin, shuah, linux-kselftest, ij, ncardwell,
	koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
	cheshire, rs.ietf, Jason_Livingood, vidhi_goel

On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> 
> Enable Accurate ECN negotiation and request for incoming and
> outgoing connection by setting sysctl_tcp_ecn:
> 
> +==============+===========================================+
> |              |  Highest ECN variant (Accurate ECN, ECN,  |
> |   tcp_ecn    |  or no ECN) to be negotiated & requested  |
> |              +---------------------+---------------------+
> |              | Incoming connection | Outgoing connection |
> +==============+=====================+=====================+
> |      0       |        No ECN       |        No ECN       |
> |      1       |         ECN         |         ECN         |
> |      2       |         ECN         |        No ECN       |
> +--------------+---------------------+---------------------+
> |      3       |     Accurate ECN    |     Accurate ECN    |
> |      4       |     Accurate ECN    |         ECN         |
> |      5       |     Accurate ECN    |        No ECN       |
> +==============+=====================+=====================+
> 
> Refer to Documentation/networking/ip-sysctl.rst for more details.
> 
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Acked-by: Paolo Abeni <pabeni@redhat.com>
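
The tcp_ecn table quoted above can be read as a simple lookup. The following standalone C sketch encodes it directly; the enum, struct, and function names are hypothetical and are not the kernel's own definitions.

```c
/* Highest ECN variant to negotiate or request (names are illustrative). */
enum ecn_variant_model {
	ECN_MODEL_NONE,
	ECN_MODEL_RFC3168,
	ECN_MODEL_ACCURATE,
};

struct ecn_policy_model {
	enum ecn_variant_model incoming; /* negotiated on incoming connections */
	enum ecn_variant_model outgoing; /* requested on outgoing connections */
};

/*
 * Map a tcp_ecn sysctl value (0..5) to the policy from the table.
 * Returns 0 on success, -1 if the value is outside the table.
 */
static int tcp_ecn_policy_model(int tcp_ecn, struct ecn_policy_model *out)
{
	static const struct ecn_policy_model table[] = {
		{ ECN_MODEL_NONE,     ECN_MODEL_NONE     }, /* 0 */
		{ ECN_MODEL_RFC3168,  ECN_MODEL_RFC3168  }, /* 1 */
		{ ECN_MODEL_RFC3168,  ECN_MODEL_NONE     }, /* 2 */
		{ ECN_MODEL_ACCURATE, ECN_MODEL_ACCURATE }, /* 3 */
		{ ECN_MODEL_ACCURATE, ECN_MODEL_RFC3168  }, /* 4 */
		{ ECN_MODEL_ACCURATE, ECN_MODEL_NONE     }, /* 5 */
	};

	if (tcp_ecn < 0 || tcp_ecn > 5)
		return -1;
	*out = table[tcp_ecn];
	return 0;
}
```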


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  2025-11-06 12:17   ` Paolo Abeni
@ 2025-11-06 17:30     ` Chia-Yu Chang (Nokia)
  0 siblings, 0 replies; 40+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-11-06 17:30 UTC (permalink / raw)
  To: Paolo Abeni, edumazet@google.com, parav@nvidia.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	kuba@kernel.org, stephen@networkplumber.org,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
	liuhangbin@gmail.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, Vidhi Goel

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com> 
> Sent: Thursday, November 6, 2025 1:18 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; parav@nvidia.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@google.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v5 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
> 
> On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > @@ -4006,7 +4008,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
> >       memset(th, 0, sizeof(struct tcphdr));
> >       th->syn = 1;
> >       th->ack = 1;
> > -     tcp_ecn_make_synack(req, th);
> > +     tcp_ecn_make_synack((struct sock *)sk, req, th);
> >       th->source = htons(ireq->ir_num);
> >       th->dest = ireq->ir_rmt_port;
> >       skb->mark = ireq->ir_mark;
> 
> Whoops, I missed the const cast in the previous revisions. This could make the code generated by the compiler for the caller incorrect - assuming the changed field is actually constant.
> 
> I don't have a good idea on how to address this. Changing the argument type for the whole call chain looks like a no go.
> 
> /P

One thought I have now is to add one extra flag to request_sock.

By checking this new flag in request_sock after calling tcp_rtx_synack(), ACCECN_FAIL_MODE can be set in sk.

Would that make sense to you?

Chia-Yu
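
The idea in this reply can be sketched in standalone C as a "record in the request, apply in the caller" pattern, which avoids casting away const on the socket. This is a hypothetical model only: the types, the accecn_fail_pending field, and both functions are invented for illustration, and the actual patch may look quite different.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct sock and struct request_sock. */
struct sock_model {
	bool accecn_fail_mode;
};

struct request_sock_model {
	bool accecn_fail_pending; /* the proposed extra flag (name invented) */
};

/*
 * Callee on the SYN/ACK build path: it only sees a const socket, so it
 * records the decision in the request instead of writing to the socket.
 */
static void model_make_synack(const struct sock_model *sk,
			      struct request_sock_model *req,
			      bool sending_ace_zero)
{
	(void)sk; /* read-only here; no const cast needed */
	if (sending_ace_zero)
		req->accecn_fail_pending = true;
}

/*
 * Caller that owns a mutable socket: after the retransmit call returns,
 * it transfers the pending flag into the socket state.
 */
static void model_rtx_synack(struct sock_model *sk,
			     struct request_sock_model *req,
			     bool sending_ace_zero)
{
	model_make_synack(sk, req, sending_ace_zero);
	if (req->accecn_fail_pending) {
		sk->accecn_fail_mode = true;
		req->accecn_fail_pending = false;
	}
}
```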

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
  2025-11-06 12:07   ` Paolo Abeni
@ 2025-11-11 11:02     ` Chia-Yu Chang (Nokia)
  0 siblings, 0 replies; 40+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-11-11 11:02 UTC (permalink / raw)
  To: Paolo Abeni, edumazet@google.com, parav@nvidia.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	kuba@kernel.org, stephen@networkplumber.org,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
	liuhangbin@gmail.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, Vidhi Goel

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com> 
> Sent: Thursday, November 6, 2025 1:07 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; parav@nvidia.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@google.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v5 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
> 
> On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > For Accurate ECN, the first SYN/ACK sent by the TCP server shall set 
> > the ACE flag (see Table 1 of RFC9768) and the AccECN option to 
> > complete the capability negotiation. However, if the TCP server needs 
> > to retransmit such a SYN/ACK (for example, because it did not receive 
> > an ACK acknowledging its SYN/ACK, or received a second SYN requesting 
> > AccECN support), the TCP server retransmits the SYN/ACK without the 
> > AccECN option. This is because the SYN/ACK may be lost due to 
> > congestion, or a middlebox may block the AccECN option. Furthermore, 
> > if this retransmission also times out, to expedite connection 
> > establishment, the TCP server should retransmit the SYN/ACK with
> > (AE,CWR,ECE) = (0,0,0) and without the AccECN option, while 
> > maintaining AccECN feedback mode.
> >
> > This complies with Section 3.2.3.2.2 of the AccECN specification (RFC9768).
> >
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > ---
> >  include/net/tcp_ecn.h | 14 ++++++++++----
> >  net/ipv4/tcp_output.c |  2 +-
> >  2 files changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
> > index c66f0d944e1c..99d095ed01b3 100644
> > --- a/include/net/tcp_ecn.h
> > +++ b/include/net/tcp_ecn.h
> > @@ -651,10 +651,16 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
> >  static inline void
> >  tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
> >  {
> > -     if (tcp_rsk(req)->accecn_ok)
> > -             tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
> > -     else if (inet_rsk(req)->ecn_ok)
> > -             th->ece = 1;
> > +     if (!req->num_retrans || !req->num_timeout) {
> 
> Why is `if (!req->num_timeout)` not a sufficient condition here?
> 
> Simplifying the above condition will make the TCP_SYNACK_RETRANS alternative simpler, I think.
> 
> /P

Hi Paolo,

AFAIK, req->num_timeout is incremented only after tcp_rtx_synack() completes due to a timeout, but it does not cover the case where a 2nd SYN is received.

So far, the AccECN spec requests the same fallback in both cases (i.e., either a timeout or receiving a 2nd SYN).

So I would still prefer to use num_retrans here (or, as you suggested, a different synack_type?)

Chia-Yu
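
To make the distinction between the two counters concrete, here is a minimal standalone C model of the behaviour described in this reply and in the quoted commit message. It is a sketch under assumptions, not the kernel logic: the types and functions are hypothetical, and the counter is assumed to be incremented before the SYN/ACK flags are chosen.

```c
#include <stdbool.h>

/* Hypothetical model of the two request counters discussed above. */
struct req_counters_model {
	unsigned int num_retrans; /* SYN/ACK retransmissions for any reason */
	unsigned int num_timeout; /* retransmissions caused by a timeout only */
};

enum rtx_reason_model {
	RTX_MODEL_TIMEOUT, /* SYN/ACK timer expired */
	RTX_MODEL_DUP_SYN, /* a 2nd SYN was received */
};

static void model_count_rtx(struct req_counters_model *req,
			    enum rtx_reason_model why)
{
	req->num_retrans++;
	if (why == RTX_MODEL_TIMEOUT)
		req->num_timeout++;
}

struct synack_ecn_model {
	bool ace_flags;     /* SYN/ACK carries the AccECN handshake flags */
	bool accecn_option; /* SYN/ACK carries the AccECN option */
};

/*
 * Fallback stages from the commit message, keyed on prior retransmissions:
 *   0: AccECN flags and AccECN option
 *   1: AccECN flags, no option
 *  2+: (AE,CWR,ECE) = (0,0,0), no option
 */
static struct synack_ecn_model model_synack_ecn(unsigned int prior_retrans)
{
	struct synack_ecn_model s;

	s.accecn_option = prior_retrans == 0;
	s.ace_flags = prior_retrans <= 1;
	return s;
}
```

In this model, a retransmission triggered by a 2nd SYN leaves num_timeout at 0, so a check on num_timeout alone would still select the first-transmission stage.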

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  2025-11-06 11:38   ` Paolo Abeni
@ 2025-11-11 11:02     ` Chia-Yu Chang (Nokia)
  2025-11-12 14:50       ` Paolo Abeni
  0 siblings, 1 reply; 40+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-11-11 11:02 UTC (permalink / raw)
  To: Paolo Abeni, edumazet@google.com, parav@nvidia.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	kuba@kernel.org, stephen@networkplumber.org,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
	liuhangbin@gmail.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, Vidhi Goel
  Cc: Olivier Tilmans (Nokia)

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com> 
> Sent: Thursday, November 6, 2025 12:39 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; parav@nvidia.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@google.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Cc: Olivier Tilmans (Nokia) <olivier.tilmans@nokia.com>
> Subject: Re: [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
> 
> On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 7f5df7a71f62..d475f80b2248 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -328,12 +328,17 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
> >                        struct tcphdr *th, int tcp_header_len)
> >  {
> >       struct tcp_sock *tp = tcp_sk(sk);
> > +     bool ecn_ect_1;
> >
> >       if (!tcp_ecn_mode_any(tp))
> >               return;
> >
> > +     ecn_ect_1 = tp->ecn_flags & TCP_ECN_ECT_1;
> > +     if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
> > +             __INET_ECN_xmit(sk, true);
> 
> I'm possibly lost, but I can't find ecn_flags TCP_ECN_ECT_1 bit being
> set here or elsewhere in this series.
> 
> Also why isn't this chunk under `if (tcp_ecn_mode_accecn(tp))` ?
> 
> /P

Hi Paolo,

This bit will be set by the congestion control module (TCP Prague, which will be submitted after the AccECN patch series).

It indicates the intent to use ECT(1) rather than ECT(0), and we were thinking this flag could be independent of AccECN negotiation.

Shall I move this chunk into the Prague patch series?

Chia-Yu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  2025-11-06 11:26     ` Paolo Abeni
@ 2025-11-11 11:02       ` Chia-Yu Chang (Nokia)
  0 siblings, 0 replies; 40+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-11-11 11:02 UTC (permalink / raw)
  To: Paolo Abeni, edumazet@google.com, parav@nvidia.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	kuba@kernel.org, stephen@networkplumber.org,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
	liuhangbin@gmail.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, Vidhi Goel

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com> 
> Sent: Thursday, November 6, 2025 12:26 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; parav@nvidia.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@google.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v5 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
> 
> On 11/6/25 12:06 PM, Paolo Abeni wrote:
> > On 10/30/25 3:34 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> >> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >>
> >> No functional changes.
> >>
> >> Co-developed-by: Ilpo Järvinen <ij@kernel.org>
> >> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> >> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >> ---
> >>  include/linux/skbuff.h | 13 ++++++++++++-
> >>  1 file changed, 12 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index a7cc3d1f4fd1..74d6a209e203 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -671,7 +671,12 @@ enum {
> >>      /* This indicates the skb is from an untrusted source. */
> >>      SKB_GSO_DODGY = 1 << 1,
> >>
> >> -    /* This indicates the tcp segment has CWR set. */
> >> +    /* For Tx, this indicates the first TCP segment has CWR set, and any
> >> +     * subsequent segment in the same skb has CWR cleared. This cannot be
> >> +     * used on Rx, because the connection to which the segment belongs is
> >> +     * not tracked to use RFC3168 or Accurate ECN, and using RFC3168 ECN
> >> +     * offload may corrupt AccECN signal of AccECN segments.
> >> +     */
> >
> > The intended difference between RX and TX sounds bad to me; I think it 
> > conflicts with the basic GRO design goal of making aggregated and 
> > re-segmented traffic indistinguishable from the original stream. Also 
> > what about forwarded packet?
> 
> Uhm... I missed completely the point that SKB_GSO_TCP_ECN is TX path only, i.e. GRO never produces aggregated SKB_GSO_TCP_ECN packets. Except virtio_net uses it in the RX path ( virtio_net_hdr_to_skb ). Please clarify the statement accordingly.
> 
> /P

Hi Paolo,

Yes, SKB_GSO_TCP_ECN has been set in the RX path by tcp_gro_complete() since commit bf296b125b21b8d558ceb6ec30bb4eba2730cd6b.
In commit 4e4f7cefb130af6aba6a393b2d13930b49390df9 (part of our first AccECN preparation patch series), it was changed to SKB_GSO_TCP_ACCECN to avoid corrupting the CWR flag elsewhere.

And you are right that SKB_GSO_TCP_ECN is still being used by virtio and some drivers (drivers/net/ethernet/mellanox/mlx5/core/en_rx.c, drivers/net/ethernet/hisilicon/hns3/hns3_enet.c).
Therefore, we plan to replace SKB_GSO_TCP_ECN with SKB_GSO_TCP_ACCECN in the two upcoming patches following this patch series.
Following discussions with our virtio-spec colleague (Parav Pandit in cc), it is suggested to fix this text here first, before changing the virtio spec.

To clarify this, I would propose the following texts in the next version for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN:

/* For Tx, this indicates the first TCP segment has CWR set, and any
 * subsequent segment in the same skb has CWR cleared. This is not
 * used on Rx except for virtio_net. Because the connection to which
 * the segment belongs is not tracked as using RFC3168 or Accurate
 * ECN, RFC3168 ECN offload may corrupt the AccECN signal of AccECN
 * segments. Therefore, this cannot be used on Rx.
 */

/* For Tx, this indicates the TCP segment uses the CWR flag as part of
 * the AccECN signal, and the CWR flag is not modified in the skb. This
 * is not used on Rx except for virtio_net. For Rx, any CWR-flagged
 * segment must use SKB_GSO_TCP_ACCECN to ensure the CWR flag is not
 * cleared by any RFC3168 ECN offload, thus preserving the AccECN
 * signal of TCP segments.
 */


Chia-Yu
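
The Tx-side difference between the two GSO types described in the proposed comments can be sketched in standalone C. This models only how the CWR flag is treated when one large skb is split into segments; the enum and function names are hypothetical and this is not the kernel's segmentation code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. */
enum gso_type_model {
	GSO_MODEL_TCP_ECN,    /* RFC3168 semantics for CWR */
	GSO_MODEL_TCP_ACCECN, /* CWR is part of the AccECN signal */
};

/*
 * Fill seg_cwr[0..nsegs-1] with the CWR flag each resulting segment
 * would carry, given the CWR flag of the aggregated skb.
 */
static void model_segment_cwr(enum gso_type_model type, bool skb_cwr,
			      bool *seg_cwr, size_t nsegs)
{
	for (size_t i = 0; i < nsegs; i++) {
		if (type == GSO_MODEL_TCP_ECN)
			/* Only the first segment keeps CWR; the rest are cleared. */
			seg_cwr[i] = skb_cwr && i == 0;
		else
			/* CWR is left unmodified on every segment. */
			seg_cwr[i] = skb_cwr;
	}
}
```

Applying the RFC3168 variant to an AccECN flow would clear CWR on all but the first segment, which is the signal corruption the comments warn about.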


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  2025-11-11 11:02     ` Chia-Yu Chang (Nokia)
@ 2025-11-12 14:50       ` Paolo Abeni
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Abeni @ 2025-11-12 14:50 UTC (permalink / raw)
  To: Chia-Yu Chang (Nokia), edumazet@google.com, parav@nvidia.com,
	linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
	dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
	netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
	kuba@kernel.org, stephen@networkplumber.org,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
	liuhangbin@gmail.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, ij@kernel.org,
	ncardwell@google.com, Koen De Schepper (Nokia),
	g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
	Jason_Livingood@comcast.com, Vidhi Goel
  Cc: Olivier Tilmans (Nokia)

On 11/11/25 12:02 PM, Chia-Yu Chang (Nokia) wrote:
> This bit will be set by the congestion control module (TCP Prague, which will be submitted after the AccECN patch series).
> 
> It indicates the intent to use ECT(1) rather than ECT(0), and we were thinking this flag could be independent of AccECN negotiation.
> 
> Shall I move this chunk into the Prague patch series?

Yes, please!

/P


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2025-11-12 14:50 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-30 14:34 [PATCH v5 net-next 00/14] AccECN protocol case handling series chia-yu.chang
2025-10-30 14:34 ` [PATCH v5 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
2025-11-06 10:57   ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
2025-11-06 11:01   ` Paolo Abeni
2025-11-06 11:08     ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 03/14] net: update comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN chia-yu.chang
2025-11-06 11:06   ` Paolo Abeni
2025-11-06 11:26     ` Paolo Abeni
2025-11-11 11:02       ` Chia-Yu Chang (Nokia)
2025-10-30 14:34 ` [PATCH v5 net-next 04/14] selftests/net: gro: add self-test for TCP CWR flag chia-yu.chang
2025-10-30 14:34 ` [PATCH v5 net-next 05/14] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
2025-11-06 11:38   ` Paolo Abeni
2025-11-11 11:02     ` Chia-Yu Chang (Nokia)
2025-11-12 14:50       ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 06/14] tcp: disable RFC3168 fallback identifier " chia-yu.chang
2025-11-06 11:42   ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
2025-11-06 11:45   ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
2025-11-06 11:47   ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 09/14] tcp: move increment of num_retrans chia-yu.chang
2025-11-06 11:56   ` Paolo Abeni
2025-11-06 12:03     ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
2025-11-06 12:07   ` Paolo Abeni
2025-11-11 11:02     ` Chia-Yu Chang (Nokia)
2025-10-30 14:34 ` [PATCH v5 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation chia-yu.chang
2025-11-06 12:17   ` Paolo Abeni
2025-11-06 17:30     ` Chia-Yu Chang (Nokia)
2025-10-30 14:34 ` [PATCH v5 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
2025-11-06 12:18   ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
2025-11-06 12:21   ` Paolo Abeni
2025-10-30 14:34 ` [PATCH v5 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang
2025-11-06 12:24   ` Paolo Abeni
2025-10-31  0:56 ` [PATCH v5 net-next 00/14] AccECN protocol case handling series Jakub Kicinski
2025-10-31  7:32   ` Chia-Yu Chang (Nokia)
2025-10-31 14:06     ` Jakub Kicinski
  -- strict thread matches above, loose matches on Subject: below --
2025-10-29  8:05 chia-yu.chang
