* [PATCH v6 net-next 00/14] AccECN protocol case handling series
@ 2025-11-14 7:13 chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
` (13 more replies)
0 siblings, 14 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Hello,
Please find the v6 AccECN case handling patch series, which handles
several exceptional cases of the Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
v6:
- Update comment in #3 to highlight RX path is only used for virtio-net (Paolo Abeni <pabeni@redhat.com>)
- Rename TCP_CONG_WANTS_ECT_1 to TCP_CONG_ECT_1_NEGOTIATION to distinguish it from TCP_CONG_ECT_1_ESTABLISH (Paolo Abeni <pabeni@redhat.com>)
- Move TCP_CONG_ECT_1_ESTABLISH in #6 to a later patch series (Paolo Abeni <pabeni@redhat.com>)
- Add new synack_type instead of moving the increment of num_retrans in #9 (Paolo Abeni <pabeni@redhat.com>)
- Use new synack_type TCP_SYNACK_RETRANS and num_retrans for SYN/ACK retx fallback for AccECN in #10 (Paolo Abeni <pabeni@redhat.com>)
- Do not cast const struct into non-const in #11, and set AccECN fail mode after tcp_rtx_synack() (Paolo Abeni <pabeni@redhat.com>)
v5:
- Move previous #11 in v4 to a later patch after discussion with the RFC author.
- Add #3 to update the comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. (Parav Pandit <parav@nvidia.com>)
- Add gro self-test for TCP CWR flag in #4. (Eric Dumazet <edumazet@google.com>)
- Add fixes: tag into #7 (Paolo Abeni <pabeni@redhat.com>)
- Update commit message of #8 and if condition check (Paolo Abeni <pabeni@redhat.com>)
- Add empty line between variable declarations and code in #13 (Paolo Abeni <pabeni@redhat.com>)
v4:
- Add previous #13 in v2 back after discussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.
v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1. (Paolo Abeni <pabeni@redhat.com>)
- Change TCP_CONG_WANTS_ECT_1 into an individual flag and add helper function INET_ECN_xmit_wants_ect_1() in #3. (Paolo Abeni <pabeni@redhat.com>)
- Add empty line between variable declarations and code in #4. (Paolo Abeni <pabeni@redhat.com>)
- Update commit message to fix old AccECN commits in #5. (Paolo Abeni <pabeni@redhat.com>)
- Remove unnecessary brackets in #10. (Paolo Abeni <pabeni@redhat.com>)
- Move patch #3 in v2 to a later Prague patch series and remove patch #13 in v2. (Paolo Abeni <pabeni@redhat.com>)
---
Chia-Yu Chang (12):
net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
selftests/net: gro: add self-test for TCP CWR flag
tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: add TCP_SYNACK_RETRANS synack_type
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: detect loss ACK w/ AccECN option and add
TCP_ACCECN_OPTION_PERSIST
tcp: accecn: enable AccECN
Ilpo Järvinen (2):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
Documentation/networking/ip-sysctl.rst | 4 +-
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/skbuff.h | 14 ++-
include/linux/tcp.h | 4 +-
include/net/inet_ecn.h | 20 +++-
include/net/tcp.h | 32 ++++++-
include/net/tcp_ecn.h | 92 ++++++++++++++-----
net/ipv4/inet_connection_sock.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 4 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 5 +-
net/ipv4/tcp_input.c | 37 +++++++-
net/ipv4/tcp_minisocks.c | 46 +++++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 32 ++++---
net/ipv4/tcp_timer.c | 3 +
tools/testing/selftests/net/gro.c | 80 +++++++++++-----
17 files changed, 295 insertions(+), 88 deletions(-)
--
2.34.1
* [PATCH v6 net-next 01/14] tcp: try to avoid safer when ACKs are thinned
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
Cc: Chia-Yu Chang
From: Ilpo Järvinen <ij@kernel.org>
Add an EWMA of newly acked packets. When ACK thinning occurs, use it
to select between the safer and the unsafe cep delta in AccECN
processing. If the number of packets ACKed per ACK tends to be large,
don't conservatively assume ACE field overflow.
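The averaging step can be illustrated outside the kernel. The following is a minimal userspace sketch that mirrors the fixed-point arithmetic in the tcp_input.c hunk of this patch; the constants are copied from the patch, while the function names pkts_acked_ewma_update() and acks_look_thinned() are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define PKTS_ACKED_WEIGHT 6 /* each new sample is weighted 1/2^6 */
#define PKTS_ACKED_PREC   6 /* fixed-point fractional bits */
#define ACK_COMP_THRESH   4 /* packets per ACK */

/* Hypothetical helper: fold one ACK's newly delivered packet count
 * into the running average, using the same formula as the patch.
 */
static uint16_t pkts_acked_ewma_update(uint16_t prev, uint32_t delivered_pkts)
{
	uint32_t ewma;

	if (!delivered_pkts)
		return prev; /* nothing newly acked: keep the old value */

	if (!prev) {
		/* The first sample seeds the average. */
		ewma = delivered_pkts << PKTS_ACKED_PREC;
	} else {
		/* new = (63 * old + scaled sample) / 64 */
		ewma = prev;
		ewma = (((ewma << PKTS_ACKED_WEIGHT) - ewma) +
			(delivered_pkts << PKTS_ACKED_PREC)) >>
		       PKTS_ACKED_WEIGHT;
	}
	/* Clamp so the result fits the u16 field. */
	return ewma > 0xFFFFU ? 0xFFFFU : (uint16_t)ewma;
}

/* Hypothetical helper: true when the average exceeds
 * ACK_COMP_THRESH packets per ACK, i.e. when the patch would return
 * the plain delta instead of the safe delta.
 */
static bool acks_look_thinned(uint16_t ewma)
{
	return ewma > (ACK_COMP_THRESH << PKTS_ACKED_PREC);
}
```

With these constants, a steady stream of 2-packet ACKs keeps the stored value at 128 (2.0 in fixed point), and the stored value has to exceed 256 before the heuristic treats the flow as ACK-thinned.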
This patch uses the existing 2-byte holes in the rx group for new
u16 variables without creating more holes. Below are the pahole
outcomes before and after this patch:
[BEFORE THIS PATCH]
struct tcp_sock {
[...]
u32 delivered_ecn_bytes[3]; /* 2744 12 */
/* XXX 4 bytes hole, try to pack */
[...]
__cacheline_group_end__tcp_sock_write_rx[0]; /* 2816 0 */
[...]
/* size: 3264, cachelines: 51, members: 177 */
}
[AFTER THIS PATCH]
struct tcp_sock {
[...]
u32 delivered_ecn_bytes[3]; /* 2744 12 */
u16 pkts_acked_ewma; /* 2756 2 */
/* XXX 2 bytes hole, try to pack */
[...]
__cacheline_group_end__tcp_sock_write_rx[0]; /* 2816 0 */
[...]
/* size: 3264, cachelines: 51, members: 178 */
}
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
v3:
- Add additional min() check if pkts_acked_ewma is not initialized.
---
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 1 +
net/ipv4/tcp.c | 2 ++
net/ipv4/tcp_input.c | 20 ++++++++++++++++++-
4 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 26f32dbcf6ec..563daea10d6c 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -105,6 +105,7 @@ u32 received_ce read_mostly read_w
u32[3] received_ecn_bytes read_mostly read_write
u8:4 received_ce_pending read_mostly read_write
u32[3] delivered_ecn_bytes read_write
+u16 pkts_acked_ewma read_write
u8:2 syn_ect_snt write_mostly read_write
u8:2 syn_ect_rcv read_mostly read_write
u8:2 accecn_minlen write_mostly read_write
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 20b8c6e21fef..683f38362977 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -345,6 +345,7 @@ struct tcp_sock {
u32 rate_interval_us; /* saved rate sample: time elapsed */
u32 rcv_rtt_last_tsecr;
u32 delivered_ecn_bytes[3];
+ u16 pkts_acked_ewma;/* Pkts acked EWMA for AccECN cep heuristic */
u64 first_tx_mstamp; /* start of window send phase */
u64 delivered_mstamp; /* time we reached "delivered" */
u64 bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index dee578aad690..ea28e6a73ec8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3420,6 +3420,7 @@ int tcp_disconnect(struct sock *sk, int flags)
tcp_accecn_init_counters(tp);
tp->prev_ecnfield = 0;
tp->accecn_opt_tstamp = 0;
+ tp->pkts_acked_ewma = 0;
if (icsk->icsk_ca_initialized && icsk->icsk_ca_ops->release)
icsk->icsk_ca_ops->release(sk);
memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
@@ -5193,6 +5194,7 @@ static void __init tcp_struct_check(void)
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, rate_interval_us);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, rcv_rtt_last_tsecr);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, delivered_ecn_bytes);
+ CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, pkts_acked_ewma);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, first_tx_mstamp);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, delivered_mstamp);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, bytes_acked);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9df5d7515605..eddd2e54d119 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -488,6 +488,10 @@ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
tcp_count_delivered_ce(tp, delivered);
}
+#define PKTS_ACKED_WEIGHT 6
+#define PKTS_ACKED_PREC 6
+#define ACK_COMP_THRESH 4
+
/* Returns the ECN CE delta */
static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
u32 delivered_pkts, u32 delivered_bytes,
@@ -499,6 +503,7 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
u32 delta, safe_delta, d_ceb;
bool opt_deltas_valid;
u32 corrected_ace;
+ u32 ewma;
/* Reordered ACK or uncertain due to lack of data to send and ts */
if (!(flag & (FLAG_FORWARD_PROGRESS | FLAG_TS_PROGRESS)))
@@ -507,6 +512,18 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
opt_deltas_valid = tcp_accecn_process_option(tp, skb,
delivered_bytes, flag);
+ if (delivered_pkts) {
+ if (!tp->pkts_acked_ewma) {
+ ewma = delivered_pkts << PKTS_ACKED_PREC;
+ } else {
+ ewma = tp->pkts_acked_ewma;
+ ewma = (((ewma << PKTS_ACKED_WEIGHT) - ewma) +
+ (delivered_pkts << PKTS_ACKED_PREC)) >>
+ PKTS_ACKED_WEIGHT;
+ }
+ tp->pkts_acked_ewma = min_t(u32, ewma, 0xFFFFU);
+ }
+
if (!(flag & FLAG_SLOWPATH)) {
/* AccECN counter might overflow on large ACKs */
if (delivered_pkts <= TCP_ACCECN_CEP_ACE_MASK)
@@ -555,7 +572,8 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
if (d_ceb <
safe_delta * tp->mss_cache >> TCP_ACCECN_SAFETY_SHIFT)
return delta;
- }
+ } else if (tp->pkts_acked_ewma > (ACK_COMP_THRESH << PKTS_ACKED_PREC))
+ return delta;
return safe_delta;
}
--
2.34.1
* [PATCH v6 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
Cc: Chia-Yu Chang
From: Ilpo Järvinen <ij@kernel.org>
As AccECN may keep CWR bit asserted due to different
interpretation of the bit, flushing with GRO because of
CWR may effectively disable GRO until AccECN counter
field changes such that CWR-bit becomes 0.
There is no harm done from not immediately forwarding the
CWR'ed segment with RFC3168 ECN.
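The effect of dropping the CWR check can be sketched in userspace. This is a simplified model, not the kernel code: it works on the 8-bit TCP flags byte instead of the 32-bit flag word used by tcp_gro_receive(), it ignores the ack_seq and option comparisons, and the names gro_flush_old() and gro_flush_new() are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* TCP header flag bits as laid out in the flags byte. */
#define FLAG_FIN 0x01
#define FLAG_PSH 0x08
#define FLAG_ACK 0x10
#define FLAG_CWR 0x80

/* Old rule: any segment carrying CWR forces a flush, on top of any
 * flag difference other than FIN/PSH versus the held segment.
 */
static bool gro_flush_old(uint8_t flags, uint8_t held_flags)
{
	unsigned int flush = flags & FLAG_CWR;

	flush |= (flags ^ held_flags) & ~(unsigned int)(FLAG_FIN | FLAG_PSH);
	return flush != 0;
}

/* New rule: CWR only matters when it differs from the held segment. */
static bool gro_flush_new(uint8_t flags, uint8_t held_flags)
{
	return ((flags ^ held_flags) &
		~(unsigned int)(FLAG_FIN | FLAG_PSH)) != 0;
}
```

In this model, two consecutive segments that both carry ACK|CWR are flushed under the old rule but can be coalesced under the new one, while a segment whose CWR bit differs from the held segment still triggers a flush.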
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
net/ipv4/tcp_offload.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 2cb93da93abc..fcbf4148919c 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -330,8 +330,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
goto out_check_final;
th2 = tcp_hdr(p);
- flush = (__force int)(flags & TCP_FLAG_CWR);
- flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
+ flush = (__force int)((flags ^ tcp_flag_word(th2)) &
~(TCP_FLAG_FIN | TCP_FLAG_PSH));
flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
for (i = sizeof(*th); i < thlen; i += 4)
--
2.34.1
* [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
No functional changes.
Co-developed-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v6:
- Update comments.
---
include/linux/skbuff.h | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ff90281ddf90..e09455cee8e3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -671,7 +671,13 @@ enum {
/* This indicates the skb is from an untrusted source. */
SKB_GSO_DODGY = 1 << 1,
- /* This indicates the tcp segment has CWR set. */
+ /* For Tx, this indicates the first TCP segment has CWR set, and any
+ * subsequent segment in the same skb has CWR cleared. This is not
+ * used on Rx except for virtio_net. However, because the connection
+ * to which the segment belongs is not tracked to use RFC3168 or
+ * Accurate ECN, and using RFC3168 ECN offload may corrupt AccECN
+ * signal of AccECN segments. Therefore, this cannot be used on Rx.
+ */
SKB_GSO_TCP_ECN = 1 << 2,
__SKB_GSO_TCP_FIXEDID = 1 << 3,
@@ -706,6 +712,12 @@ enum {
SKB_GSO_FRAGLIST = 1 << 18,
+ /* For TX, this indicates the TCP segment uses the CWR flag as part of
+ * AccECN signal, and the CWR flag is not modified in the skb. This is
+ * not used on Rx except for virtio_net. For RX, any CWR flagged segment
+ * must use SKB_GSO_TCP_ACCECN to ensure CWR flag is not cleared by any
+ * RFC3168 ECN offload, and thus keeping AccECN signal of TCP segments.
+ */
SKB_GSO_TCP_ACCECN = 1 << 19,
/* These indirectly map onto the same netdev feature.
--
2.34.1
* [PATCH v6 net-next 04/14] selftests/net: gro: add self-test for TCP CWR flag
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Currently, GRO does not flush packets when the CWR bit is set.
A corresponding self-test is being added, in which the CWR flag
is set for two consecutive packets, but the first packet with the
CWR flag set will not be flushed immediately.
+===================+==========+===============+===========+
| Packet id | CWR flag | Payload | Flushing? |
+===================+==========+===============+===========+
| 0 | 0 | PAYLOAD_LEN | 0 |
| ... | 0 | PAYLOAD_LEN | 1 |
+-------------------+----------+---------------+-----------+
| NUM_PACKETS/2 - 1 | 1 | payload_len | 0 |
| NUM_PACKETS/2 | 1 | payload_len | 1 |
+-------------------+----------+---------------+-----------+
| ... | 0 | PAYLOAD_LEN | 0 |
| NUM_PACKETS | 0 | PAYLOAD_LEN | 1 |
+===================+==========+===============+===========+
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
tools/testing/selftests/net/gro.c | 80 ++++++++++++++++++++++---------
1 file changed, 57 insertions(+), 23 deletions(-)
diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index cfc39f70635d..50bf1b96ea9d 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -11,8 +11,8 @@
* 2.ack
* Pure ACK does not coalesce.
* 3.flags
- * Specific test cases: no packets with PSH, SYN, URG, RST set will
- * be coalesced.
+ * Specific test cases: no packets with PSH, SYN, URG, RST, CWR set
+ * will be coalesced.
* 4.tcp
* Packets with incorrect checksum, non-consecutive seqno and
* different TCP header options shouldn't coalesce. Nit: given that
@@ -332,32 +332,57 @@ static void create_packet(void *buf, int seq_offset, int ack_offset,
fill_datalinklayer(buf);
}
-/* send one extra flag, not first and not last pkt */
-static void send_flags(int fd, struct sockaddr_ll *daddr, int psh, int syn,
- int rst, int urg)
+#ifndef TH_CWR
+#define TH_CWR 0x80
+#endif
+static void set_flags(struct tcphdr *tcph, int payload_len, int psh, int syn,
+ int rst, int urg, int cwr)
{
- static char flag_buf[MAX_HDR_LEN + PAYLOAD_LEN];
- static char buf[MAX_HDR_LEN + PAYLOAD_LEN];
- int payload_len, pkt_size, flag, i;
- struct tcphdr *tcph;
-
- payload_len = PAYLOAD_LEN * psh;
- pkt_size = total_hdr_len + payload_len;
- flag = NUM_PACKETS / 2;
-
- create_packet(flag_buf, flag * payload_len, 0, payload_len, 0);
-
- tcph = (struct tcphdr *)(flag_buf + tcp_offset);
tcph->psh = psh;
tcph->syn = syn;
tcph->rst = rst;
tcph->urg = urg;
+ if (cwr)
+ tcph->th_flags |= TH_CWR;
+ else
+ tcph->th_flags &= ~TH_CWR;
tcph->check = 0;
tcph->check = tcp_checksum(tcph, payload_len);
+}
+
+/* send extra flags of the (NUM_PACKETS / 2) and (NUM_PACKETS / 2 - 1)
+ * pkts, not first and not last pkt
+ */
+static void send_flags(int fd, struct sockaddr_ll *daddr, int psh, int syn,
+ int rst, int urg, int cwr)
+{
+ static char flag_buf[2][MAX_HDR_LEN + PAYLOAD_LEN];
+ static char buf[MAX_HDR_LEN + PAYLOAD_LEN];
+ int payload_len, pkt_size, i;
+ struct tcphdr *tcph;
+ int flag[2];
+
+ payload_len = PAYLOAD_LEN * (psh || cwr);
+ pkt_size = total_hdr_len + payload_len;
+ flag[0] = NUM_PACKETS / 2;
+ flag[1] = NUM_PACKETS / 2 - 1;
+
+ // Create and configure packets with flags
+ for (i = 0; i < 2; i++) {
+ if (flag[i] > 0) {
+ create_packet(flag_buf[i], flag[i] * payload_len, 0,
+ payload_len, 0);
+ tcph = (struct tcphdr *)(flag_buf[i] + tcp_offset);
+ set_flags(tcph, payload_len, psh, syn, rst, urg, cwr);
+ }
+ }
for (i = 0; i < NUM_PACKETS + 1; i++) {
- if (i == flag) {
- write_packet(fd, flag_buf, pkt_size, daddr);
+ if (i == flag[0]) {
+ write_packet(fd, flag_buf[0], pkt_size, daddr);
+ continue;
+ } else if (i == flag[1] && cwr) {
+ write_packet(fd, flag_buf[1], pkt_size, daddr);
continue;
}
create_packet(buf, i * PAYLOAD_LEN, 0, PAYLOAD_LEN, 0);
@@ -1020,16 +1045,19 @@ static void gro_sender(void)
send_ack(txfd, &daddr);
write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
} else if (strcmp(testname, "flags") == 0) {
- send_flags(txfd, &daddr, 1, 0, 0, 0);
+ send_flags(txfd, &daddr, 1, 0, 0, 0, 0);
write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
- send_flags(txfd, &daddr, 0, 1, 0, 0);
+ send_flags(txfd, &daddr, 0, 1, 0, 0, 0);
write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
- send_flags(txfd, &daddr, 0, 0, 1, 0);
+ send_flags(txfd, &daddr, 0, 0, 1, 0, 0);
write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
- send_flags(txfd, &daddr, 0, 0, 0, 1);
+ send_flags(txfd, &daddr, 0, 0, 0, 1, 0);
+ write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
+
+ send_flags(txfd, &daddr, 0, 0, 0, 0, 1);
write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
} else if (strcmp(testname, "tcp") == 0) {
send_changed_checksum(txfd, &daddr);
@@ -1163,6 +1191,12 @@ static void gro_receiver(void)
printf("urg flag ends coalescing: ");
check_recv_pkts(rxfd, correct_payload, 3);
+
+ correct_payload[0] = PAYLOAD_LEN;
+ correct_payload[1] = PAYLOAD_LEN * 2;
+ correct_payload[2] = PAYLOAD_LEN * 2;
+ printf("cwr flag ends coalescing: ");
+ check_recv_pkts(rxfd, correct_payload, 3);
} else if (strcmp(testname, "tcp") == 0) {
correct_payload[0] = PAYLOAD_LEN;
correct_payload[1] = PAYLOAD_LEN;
--
2.34.1
* [PATCH v6 net-next 05/14] tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
Cc: Chia-Yu Chang, Olivier Tilmans
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Two CA module flags are added in this patch related to AccECN negotiation.
First, a new CA module flag (TCP_CONG_NEEDS_ACCECN) defines that the CA
expects to negotiate AccECN functionality using the ECE, CWR and AE flags
in the TCP header.
Second, during ECN negotiation, ECT(0) in the IP header is used. This patch
enables CA to control whether ECT(0) or ECT(1) should be used on a per-segment
basis. A new flag (TCP_CONG_ECT_1_NEGOTIATION) defines the expected ECT value
in the IP header by the CA when not-yet initialized for the connection.
The detailed AccECN negotiation during the 3WHS can be found in the AccECN spec:
https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
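Why the new helper masks the ECN bits before setting them can be shown with a small userspace sketch. The codepoint values below are the standard IP ECN field values; the function names tos_set_ect_or_only() and tos_set_ect() are hypothetical stand-ins for the old INET_ECN_xmit() body and the new __INET_ECN_xmit() logic, applied to a plain byte instead of a socket.

```c
#include <stdint.h>
#include <stdbool.h>

/* Two-bit ECN field in the IP TOS / traffic class byte. */
#define ECN_NOT_ECT 0x0
#define ECN_ECT_1   0x1
#define ECN_ECT_0   0x2
#define ECN_CE      0x3
#define ECN_MASK    0x3

/* Old behaviour: OR in ECT(0) without clearing the field first. */
static uint8_t tos_set_ect_or_only(uint8_t tos)
{
	return tos | ECN_ECT_0;
}

/* New behaviour: clear the ECN field, then apply ECT(0) or ECT(1). */
static uint8_t tos_set_ect(uint8_t tos, bool use_ect_1)
{
	uint8_t ect = use_ect_1 ? ECN_ECT_1 : ECN_ECT_0;

	return (uint8_t)((tos & ~ECN_MASK) | ect);
}
```

If a socket already carries ECT(1), OR-ing in ECT(0) yields binary 11, which is the CE codepoint; clearing the field first avoids that when a connection alternates between ECT(0) and ECT(1). The DSCP bits in the upper six bits are left untouched in both variants.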
Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v6:
- Rename TCP_CONG_WANTS_ECT_1 to TCP_CONG_ECT_1_NEGOTIATION to distinguish
it from TCP_CONG_ECT_1_ESTABLISH.
- Move TCP_CONG_ECT_1_ESTABLISH to a later TCP Prague patch series.
v3:
- Change TCP_CONG_WANTS_ECT_1 into individual flag.
- Add helper function INET_ECN_xmit_wants_ect_1().
---
include/net/inet_ecn.h | 20 +++++++++++++++++---
include/net/tcp.h | 21 ++++++++++++++++++++-
include/net/tcp_ecn.h | 13 ++++++++++---
net/ipv4/tcp_cong.c | 5 +++--
net/ipv4/tcp_input.c | 3 ++-
5 files changed, 52 insertions(+), 10 deletions(-)
diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index ea32393464a2..827b87a95dab 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -51,11 +51,25 @@ static inline __u8 INET_ECN_encapsulate(__u8 outer, __u8 inner)
return outer;
}
+/* Apply either ECT(0) or ECT(1) */
+static inline void __INET_ECN_xmit(struct sock *sk, bool use_ect_1)
+{
+ __u8 ect = use_ect_1 ? INET_ECN_ECT_1 : INET_ECN_ECT_0;
+
+ /* Mask the complete byte in case the connection alternates between
+ * ECT(0) and ECT(1).
+ */
+ inet_sk(sk)->tos &= ~INET_ECN_MASK;
+ inet_sk(sk)->tos |= ect;
+ if (inet6_sk(sk)) {
+ inet6_sk(sk)->tclass &= ~INET_ECN_MASK;
+ inet6_sk(sk)->tclass |= ect;
+ }
+}
+
static inline void INET_ECN_xmit(struct sock *sk)
{
- inet_sk(sk)->tos |= INET_ECN_ECT_0;
- if (inet6_sk(sk) != NULL)
- inet6_sk(sk)->tclass |= INET_ECN_ECT_0;
+ __INET_ECN_xmit(sk, false);
}
static inline void INET_ECN_dontxmit(struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4833ec7903ec..2e1a5b3d1c5c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1203,7 +1203,12 @@ enum tcp_ca_ack_event_flags {
#define TCP_CONG_NON_RESTRICTED BIT(0)
/* Requires ECN/ECT set on all packets */
#define TCP_CONG_NEEDS_ECN BIT(1)
-#define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN)
+/* Require successfully negotiated AccECN capability */
+#define TCP_CONG_NEEDS_ACCECN BIT(2)
+/* Use ECT(1) instead of ECT(0) while the CA is uninitialized */
+#define TCP_CONG_ECT_1_NEGOTIATION BIT(3)
+#define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN | \
+ TCP_CONG_NEEDS_ACCECN | TCP_CONG_ECT_1_NEGOTIATION)
union tcp_cc_info;
@@ -1335,6 +1340,20 @@ static inline bool tcp_ca_needs_ecn(const struct sock *sk)
return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ECN;
}
+static inline bool tcp_ca_needs_accecn(const struct sock *sk)
+{
+ const struct inet_connection_sock *icsk = inet_csk(sk);
+
+ return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ACCECN;
+}
+
+static inline bool tcp_ca_ect_1_negotiation(const struct sock *sk)
+{
+ const struct inet_connection_sock *icsk = inet_csk(sk);
+
+ return icsk->icsk_ca_ops->flags & TCP_CONG_ECT_1_NEGOTIATION;
+}
+
static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index f13e5cd2b1ac..fdde1c342b35 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,6 +31,12 @@ enum tcp_accecn_option {
TCP_ACCECN_OPTION_FULL = 2,
};
+/* Apply either ECT(0) or ECT(1) based on TCP_CONG_ECT_1_NEGOTIATION flag */
+static inline void INET_ECN_xmit_ect_1_negotiation(struct sock *sk)
+{
+ __INET_ECN_xmit(sk, tcp_ca_ect_1_negotiation(sk));
+}
+
static inline void tcp_ecn_queue_cwr(struct tcp_sock *tp)
{
/* Do not set CWR if in AccECN mode! */
@@ -561,7 +567,7 @@ static inline void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb)
TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ECE;
else if (tcp_ca_needs_ecn(sk) ||
tcp_bpf_ca_needs_ecn(sk))
- INET_ECN_xmit(sk);
+ INET_ECN_xmit_ect_1_negotiation(sk);
if (tp->ecn_flags & TCP_ECN_MODE_ACCECN) {
TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ACE;
@@ -579,7 +585,8 @@ static inline void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
bool use_ecn, use_accecn;
u8 tcp_ecn = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn);
- use_accecn = tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ACCECN;
+ use_accecn = tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ACCECN ||
+ tcp_ca_needs_accecn(sk);
use_ecn = tcp_ecn == TCP_ECN_IN_ECN_OUT_ECN ||
tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ECN ||
tcp_ca_needs_ecn(sk) || bpf_needs_ecn || use_accecn;
@@ -595,7 +602,7 @@ static inline void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
if (use_ecn) {
if (tcp_ca_needs_ecn(sk) || bpf_needs_ecn)
- INET_ECN_xmit(sk);
+ INET_ECN_xmit_ect_1_negotiation(sk);
TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR;
if (use_accecn) {
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index df758adbb445..e9f6c77e0631 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -16,6 +16,7 @@
#include <linux/gfp.h>
#include <linux/jhash.h>
#include <net/tcp.h>
+#include <net/tcp_ecn.h>
#include <trace/events/tcp.h>
static DEFINE_SPINLOCK(tcp_cong_list_lock);
@@ -227,7 +228,7 @@ void tcp_assign_congestion_control(struct sock *sk)
memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
if (ca->flags & TCP_CONG_NEEDS_ECN)
- INET_ECN_xmit(sk);
+ INET_ECN_xmit_ect_1_negotiation(sk);
else
INET_ECN_dontxmit(sk);
}
@@ -257,7 +258,7 @@ static void tcp_reinit_congestion_control(struct sock *sk,
memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
if (ca->flags & TCP_CONG_NEEDS_ECN)
- INET_ECN_xmit(sk);
+ INET_ECN_xmit_ect_1_negotiation(sk);
else
INET_ECN_dontxmit(sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index eddd2e54d119..3fa4a70d429f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7256,7 +7256,8 @@ static void tcp_ecn_create_request(struct request_sock *req,
u32 ecn_ok_dst;
if (tcp_accecn_syn_requested(th) &&
- READ_ONCE(net->ipv4.sysctl_tcp_ecn) >= 3) {
+ (READ_ONCE(net->ipv4.sysctl_tcp_ecn) >= 3 ||
+ tcp_ca_needs_accecn(listen_sk))) {
inet_rsk(req)->ecn_ok = 1;
tcp_rsk(req)->accecn_ok = 1;
tcp_rsk(req)->syn_ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
--
2.34.1
* [PATCH v6 net-next 06/14] tcp: disable RFC3168 fallback identifier for CC modules
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
When AccECN is not successfully negotiated for a TCP flow, the flow
falls back to classic ECN (RFC3168) by default. However, an L4S
service will fall back to non-ECN.
This patch enables a congestion control module to control whether it
should not fall back to classic ECN after unsuccessful AccECN negotiation.
A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) identifies this
behavior expected by the CA.
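The resulting decision on the client side can be sketched as a small userspace function. It models only the classic-ECN SYN/ACK branch touched by the tcp_ecn_rcv_synack() hunk in this patch; the bit position is taken from the patch, while the enum, the CONG_NO_FALLBACK_RFC3168 macro name, and ecn_mode_on_classic_synack() are hypothetical simplifications.

```c
#include <stdbool.h>

#define BIT(n) (1U << (n))
/* Same bit position as TCP_CONG_NO_FALLBACK_RFC3168 in the patch. */
#define CONG_NO_FALLBACK_RFC3168 BIT(4)

/* Simplified stand-in for the socket's ECN mode. */
enum ecn_mode { ECN_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN };

/* Hypothetical helper: pick the ECN mode after the peer answered an
 * AccECN SYN with a SYN/ACK that only signals classic ECN support.
 */
static enum ecn_mode ecn_mode_on_classic_synack(unsigned int ca_flags,
						bool ecn_pending,
						enum ecn_mode current)
{
	if (ca_flags & CONG_NO_FALLBACK_RFC3168)
		return ECN_DISABLED;      /* CA refuses classic ECN */
	if (ecn_pending)
		return ECN_MODE_RFC3168;  /* default downgrade */
	return current;                   /* nothing pending: unchanged */
}
```

A CA that sets the flag ends up with ECN disabled rather than in RFC3168 mode, which matches the non-ECN fallback the commit message describes for L4S.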
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
v3:
- Add empty line between variable declarations and code.
---
include/net/tcp.h | 12 +++++++++++-
include/net/tcp_ecn.h | 11 ++++++++---
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_minisocks.c | 7 ++++---
4 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2e1a5b3d1c5c..a8eb67ff1568 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1207,8 +1207,11 @@ enum tcp_ca_ack_event_flags {
#define TCP_CONG_NEEDS_ACCECN BIT(2)
/* Use ECT(1) instead of ECT(0) while the CA is uninitialized */
#define TCP_CONG_ECT_1_NEGOTIATION BIT(3)
+/* Cannot fallback to RFC3168 during AccECN negotiation */
+#define TCP_CONG_NO_FALLBACK_RFC3168 BIT(4)
#define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN | \
- TCP_CONG_NEEDS_ACCECN | TCP_CONG_ECT_1_NEGOTIATION)
+ TCP_CONG_NEEDS_ACCECN | TCP_CONG_ECT_1_NEGOTIATION | \
+ TCP_CONG_NO_FALLBACK_RFC3168)
union tcp_cc_info;
@@ -1354,6 +1357,13 @@ static inline bool tcp_ca_ect_1_negotiation(const struct sock *sk)
return icsk->icsk_ca_ops->flags & TCP_CONG_ECT_1_NEGOTIATION;
}
+static inline bool tcp_ca_no_fallback_rfc3168(const struct sock *sk)
+{
+ const struct inet_connection_sock *icsk = inet_csk(sk);
+
+ return icsk->icsk_ca_ops->flags & TCP_CONG_NO_FALLBACK_RFC3168;
+}
+
static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index fdde1c342b35..2e1637edf1d3 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -507,7 +507,9 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
* | ECN | AccECN | 0 0 1 | Classic ECN |
* +========+========+============+=============+
*/
- if (tcp_ecn_mode_pending(tp))
+ if (tcp_ca_no_fallback_rfc3168(sk))
+ tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
+ else if (tcp_ecn_mode_pending(tp))
/* Downgrade from AccECN, or requested initially */
tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
break;
@@ -531,9 +533,11 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
}
}
-static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
+static inline void tcp_ecn_rcv_syn(struct sock *sk, const struct tcphdr *th,
const struct sk_buff *skb)
{
+ struct tcp_sock *tp = tcp_sk(sk);
+
if (tcp_ecn_mode_pending(tp)) {
if (!tcp_accecn_syn_requested(th)) {
/* Downgrade to classic ECN feedback */
@@ -545,7 +549,8 @@ static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
}
}
- if (tcp_ecn_mode_rfc3168(tp) && (!th->ece || !th->cwr))
+ if (tcp_ecn_mode_rfc3168(tp) &&
+ (!th->ece || !th->cwr || tcp_ca_no_fallback_rfc3168(sk)))
tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3fa4a70d429f..1f354d3cf26a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6851,7 +6851,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
tp->snd_wl1 = TCP_SKB_CB(skb)->seq;
tp->max_window = tp->snd_wnd;
- tcp_ecn_rcv_syn(tp, th, skb);
+ tcp_ecn_rcv_syn(sk, th, skb);
tcp_mtup_init(sk);
tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index d8f4d813e8dd..545d3ba0adcf 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -486,9 +486,10 @@ static void tcp_ecn_openreq_child(struct sock *sk,
tp->accecn_opt_demand = 1;
tcp_ecn_received_counters_payload(sk, skb);
} else {
- tcp_ecn_mode_set(tp, inet_rsk(req)->ecn_ok ?
- TCP_ECN_MODE_RFC3168 :
- TCP_ECN_DISABLED);
+ if (inet_rsk(req)->ecn_ok && !tcp_ca_no_fallback_rfc3168(sk))
+ tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
+ else
+ tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
}
}
--
2.34.1
* [PATCH v6 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
` (5 preceding siblings ...)
2025-11-14 7:13 ` [PATCH v6 net-next 06/14] tcp: disable RFC3168 fallback identifier for CC modules chia-yu.chang
@ 2025-11-14 7:13 ` chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
` (6 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
According to Section 3.1.2 of AccECN spec (RFC9768), if a TCP Client
has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1)
then receives a SYN/ACK with the currently reserved combination
(AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a
combination, the Client MUST enable AccECN mode as if the SYN/ACK
confirmed that the Server supported AccECN and as if it fed back that
the IP-ECN field on the SYN had arrived unchanged.
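For readers less familiar with the flag encoding, here is a minimal standalone sketch of how a Client that requested AccECN could map the SYN/ACK flags to a feedback mode. The names are hypothetical, the (1,1,1) and (0,0,0) rows are assumed from Table 2 of RFC9768, and the TCP_CONG_NO_FALLBACK_RFC3168 case from the previous patch is deliberately left out.

```c
#include <assert.h>

/* Hypothetical feedback modes; illustration only. */
enum fb_mode { FB_NOT_ECN, FB_CLASSIC_ECN, FB_ACCECN };

/* Pack (AE,CWR,ECE) into the 3-bit value that the switch statement uses. */
static unsigned int ace_field(unsigned int ae, unsigned int cwr,
			      unsigned int ece)
{
	return ((ae & 1u) << 2) | ((cwr & 1u) << 1) | (ece & 1u);
}

/* Feedback mode of a Client that sent an AccECN SYN with (1,1,1). */
static enum fb_mode client_mode_from_synack(unsigned int ace)
{
	assert(ace <= 0x7);
	switch (ace) {
	case 0x0: /* (0,0,0): the Server does not support ECN */
	case 0x7: /* (1,1,1): assumed to be a broken reflection of the SYN */
		return FB_NOT_ECN;
	case 0x1: /* (0,0,1): the Server only supports classic ECN */
		return FB_CLASSIC_ECN;
	case 0x5: /* (1,0,1): currently reserved, handled as AccECN */
	default:  /* remaining combinations confirm AccECN */
		return FB_ACCECN;
	}
}
```

The point of the change is the 0x5 row: the reserved combination no longer shares the classic ECN branch.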
Fixes: 3cae34274c79 ("tcp: accecn: AccECN negotiation")
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
v5:
- Add "Fixes" tag.
v3:
- Update commit message to fix old AccECN commits.
---
include/net/tcp_ecn.h | 44 ++++++++++++++++++++++++++++++-------------
1 file changed, 31 insertions(+), 13 deletions(-)
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 2e1637edf1d3..a709fb1756eb 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -473,6 +473,26 @@ static inline u8 tcp_accecn_option_init(const struct sk_buff *skb,
return TCP_ACCECN_OPT_COUNTER_SEEN;
}
+static inline void tcp_ecn_rcv_synack_accecn(struct sock *sk,
+ const struct sk_buff *skb, u8 dsf)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
+ tp->syn_ect_rcv = dsf & INET_ECN_MASK;
+ /* Demand Accurate ECN option in response to the SYN on the SYN/ACK
+ * and the TCP server will try to send one more packet with an AccECN
+ * Option at a later point during the connection.
+ */
+ if (tp->rx_opt.accecn &&
+ tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
+ u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
+
+ tcp_accecn_saw_opt_fail_recv(tp, saw_opt);
+ tp->accecn_opt_demand = 2;
+ }
+}
+
/* See Table 2 of the AccECN draft */
static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb,
const struct tcphdr *th, u8 ip_dsfield)
@@ -495,13 +515,11 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
break;
case 0x1:
- case 0x5:
/* +========+========+============+=============+
* | A | B | SYN/ACK | Feedback |
* | | | B->A | Mode of A |
* | | | AE CWR ECE | |
* +========+========+============+=============+
- * | AccECN | Nonce | 1 0 1 | (Reserved) |
* | AccECN | ECN | 0 0 1 | Classic ECN |
* | Nonce | AccECN | 0 0 1 | Classic ECN |
* | ECN | AccECN | 0 0 1 | Classic ECN |
@@ -509,20 +527,20 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
*/
if (tcp_ca_no_fallback_rfc3168(sk))
tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
- else if (tcp_ecn_mode_pending(tp))
- /* Downgrade from AccECN, or requested initially */
+ else
tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
break;
- default:
- tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
- tp->syn_ect_rcv = ip_dsfield & INET_ECN_MASK;
- if (tp->rx_opt.accecn &&
- tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
- u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
-
- tcp_accecn_saw_opt_fail_recv(tp, saw_opt);
- tp->accecn_opt_demand = 2;
+ case 0x5:
+ if (tcp_ecn_mode_pending(tp)) {
+ tcp_ecn_rcv_synack_accecn(sk, skb, ip_dsfield);
+ if (INET_ECN_is_ce(ip_dsfield)) {
+ tp->received_ce++;
+ tp->received_ce_pending++;
+ }
}
+ break;
+ default:
+ tcp_ecn_rcv_synack_accecn(sk, skb, ip_dsfield);
if (INET_ECN_is_ce(ip_dsfield) &&
tcp_accecn_validate_syn_feedback(sk, ace,
tp->syn_ect_snt)) {
--
2.34.1
* [PATCH v6 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
` (6 preceding siblings ...)
2025-11-14 7:13 ` [PATCH v6 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
@ 2025-11-14 7:13 ` chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 09/14] tcp: add TCP_SYNACK_RETRANS synack_type chia-yu.chang
` (5 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Based on AccECN spec (RFC9768), if the sender of an AccECN SYN
(the TCP Client) times out before receiving the SYN/ACK, it SHOULD
attempt to negotiate the use of AccECN at least one more time by
continuing to set all three TCP ECN flags (AE,CWR,ECE) = (1,1,1) on
the first retransmitted SYN (using the usual retransmission time-outs).
If this first retransmission also fails to be acknowledged, in
deployment scenarios where AccECN path traversal might be problematic,
the TCP Client SHOULD send subsequent retransmissions of the SYN with
the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0).
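The retransmission policy quoted above can be sketched as follows. This models the text of the spec rather than the kernel counters: the struct, the function name and the meaning of tx_attempt are assumptions made for this example and do not claim to match icsk_retransmits exactly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the three TCP ECN flags of a SYN. */
struct syn_ecn_flags {
	bool ae;
	bool cwr;
	bool ece;
};

/*
 * tx_attempt counts SYN transmissions: 0 is the original SYN,
 * 1 is the first retransmission, 2 the second, and so on.
 */
static struct syn_ecn_flags accecn_syn_flags(int tx_attempt)
{
	struct syn_ecn_flags f = { false, false, false };

	assert(tx_attempt >= 0);
	/* The original SYN and the first retransmission still ask for AccECN. */
	if (tx_attempt <= 1)
		f.ae = f.cwr = f.ece = true;
	return f;
}
```

From the second retransmission onwards the SYN looks like a plain non-ECN SYN, which gives the connection a chance on paths that drop AccECN SYNs.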
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
v5:
- Update commit message and the if condition statement.
---
net/ipv4/tcp_output.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 479afb714bdf..8039c726d235 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3571,12 +3571,15 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
tcp_retrans_try_collapse(sk, skb, avail_wnd);
}
- /* RFC3168, section 6.1.1.1. ECN fallback
- * As AccECN uses the same SYN flags (+ AE), this check covers both
- * cases.
- */
- if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) == TCPHDR_SYN_ECN)
- tcp_ecn_clear_syn(sk, skb);
+ if (!tcp_ecn_mode_pending(tp) || icsk->icsk_retransmits > 1) {
+ /* RFC3168, section 6.1.1.1. ECN fallback
+ * As AccECN uses the same SYN flags (+ AE), this check
+ * covers both cases.
+ */
+ if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) ==
+ TCPHDR_SYN_ECN)
+ tcp_ecn_clear_syn(sk, skb);
+ }
/* Update global and local TCP statistics. */
segs = tcp_skb_pcount(skb);
--
2.34.1
* [PATCH v6 net-next 09/14] tcp: add TCP_SYNACK_RETRANS synack_type
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
` (7 preceding siblings ...)
2025-11-14 7:13 ` [PATCH v6 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
@ 2025-11-14 7:13 ` chia-yu.chang
2025-11-18 12:32 ` Paolo Abeni
2025-11-14 7:13 ` [PATCH v6 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
` (4 subsequent siblings)
13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Before this patch, a retransmitted SYN/ACK did not have its own synack_type.
An upcoming patch needs to distinguish retransmitted from non-retransmitted
SYN/ACKs in order to send the fallback SYN/ACK during AccECN negotiation.
Therefore, this patch introduces a new synack_type (TCP_SYNACK_RETRANS).
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v6:
- Add new synack_type instead of moving the increment of num_retrans.
---
include/net/tcp.h | 1 +
net/ipv4/tcp_output.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a8eb67ff1568..510d2e595b08 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -541,6 +541,7 @@ enum tcp_synack_type {
TCP_SYNACK_NORMAL,
TCP_SYNACK_FASTOPEN,
TCP_SYNACK_COOKIE,
+ TCP_SYNACK_RETRANS,
};
struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
struct request_sock *req,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8039c726d235..5fa14a73d03f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3921,6 +3921,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
switch (synack_type) {
case TCP_SYNACK_NORMAL:
+ case TCP_SYNACK_RETRANS:
skb_set_owner_edemux(skb, req_to_sk(req));
break;
case TCP_SYNACK_COOKIE:
@@ -4606,7 +4607,7 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
/* Paired with WRITE_ONCE() in sock_setsockopt() */
if (READ_ONCE(sk->sk_txrehash) == SOCK_TXREHASH_ENABLED)
WRITE_ONCE(tcp_rsk(req)->txhash, net_tx_rndhash());
- res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_NORMAL,
+ res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_RETRANS,
NULL);
if (!res) {
TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS);
--
2.34.1
* [PATCH v6 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
` (8 preceding siblings ...)
2025-11-14 7:13 ` [PATCH v6 net-next 09/14] tcp: add TCP_SYNACK_RETRANS synack_type chia-yu.chang
@ 2025-11-14 7:13 ` chia-yu.chang
2025-11-18 13:58 ` Paolo Abeni
2025-11-14 7:13 ` [PATCH v6 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation chia-yu.chang
` (3 subsequent siblings)
13 siblings, 1 reply; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
For Accurate ECN, the first SYN/ACK sent by the TCP server shall set the
ACE flag (see Table 1 of RFC9768) and the AccECN option to complete the
capability negotiation. However, if the TCP server needs to retransmit such
a SYN/ACK (for example, because it did not receive an ACK acknowledging its
SYN/ACK, or received a second SYN requesting AccECN support), the TCP server
retransmits the SYN/ACK without the AccECN option. This is because the
SYN/ACK may be lost due to congestion, or a middlebox may block the AccECN
option. Furthermore, if this retransmission also times out, to expedite
connection establishment, the TCP server should retransmit the SYN/ACK with
(AE,CWR,ECE) = (0,0,0) and without the AccECN option, while maintaining
AccECN feedback mode.
This complies with Section 3.2.3.2.2 of the AccECN specification (RFC9768).
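The three stages described above can be summarized in a small standalone sketch. The struct and function names are hypothetical, and tx_attempt is an illustrative counter rather than the kernel's num_retrans field.

```c
#include <assert.h>
#include <stdbool.h>

/* What the Server puts on a given SYN/ACK transmission; illustration only. */
struct synack_plan {
	bool accecn_flags;  /* echo the AccECN handshake encoding in the ACE flags */
	bool accecn_option; /* carry the AccECN option */
};

/*
 * tx_attempt: 0 is the first SYN/ACK, 1 the first retransmission,
 * 2 the second retransmission, and so on.
 */
static struct synack_plan accecn_synack_plan(int tx_attempt)
{
	struct synack_plan p;

	assert(tx_attempt >= 0);
	/* Only the very first SYN/ACK carries the AccECN option. */
	p.accecn_option = (tx_attempt == 0);
	/* The flags survive one retransmission, then drop to (0,0,0). */
	p.accecn_flags = (tx_attempt <= 1);
	return p;
}
```

Note that in the last stage only the wire format changes; as stated above, the Server stays in AccECN feedback mode.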
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v6:
- Use new synack_type TCP_SYNACK_RETRANS and num_retrans.
---
include/net/tcp_ecn.h | 20 ++++++++++++++------
net/ipv4/tcp_output.c | 4 ++--
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index a709fb1756eb..57841dfa6705 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -649,12 +649,20 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
}
static inline void
-tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
-{
- if (tcp_rsk(req)->accecn_ok)
- tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
- else if (inet_rsk(req)->ecn_ok)
- th->ece = 1;
+tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th,
+ enum tcp_synack_type synack_type)
+{
+ /* num_retrans will be increased after tcp_ecn_make_synack() */
+ if (!req->num_retrans) {
+ if (tcp_rsk(req)->accecn_ok)
+ tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
+ else if (inet_rsk(req)->ecn_ok)
+ th->ece = 1;
+ } else if (tcp_rsk(req)->accecn_ok) {
+ th->ae = 0;
+ th->cwr = 0;
+ th->ece = 0;
+ }
}
static inline bool tcp_accecn_option_beacon_check(const struct sock *sk)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5fa14a73d03f..c6754854ad09 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1106,7 +1106,7 @@ static unsigned int tcp_synack_options(const struct sock *sk,
if (treq->accecn_ok &&
READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn_option) &&
- req->num_timeout < 1 && remaining >= TCPOLEN_ACCECN_BASE) {
+ synack_type != TCP_SYNACK_RETRANS && remaining >= TCPOLEN_ACCECN_BASE) {
opts->use_synack_ecn_bytes = 1;
remaining -= tcp_options_fit_accecn(opts, 0, remaining);
}
@@ -4004,7 +4004,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
memset(th, 0, sizeof(struct tcphdr));
th->syn = 1;
th->ack = 1;
- tcp_ecn_make_synack(req, th);
+ tcp_ecn_make_synack(req, th, synack_type);
th->source = htons(ireq->ir_num);
th->dest = ireq->ir_rmt_port;
skb->mark = ireq->ir_mark;
--
2.34.1
* [PATCH v6 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
` (9 preceding siblings ...)
2025-11-14 7:13 ` [PATCH v6 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
@ 2025-11-14 7:13 ` chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation chia-yu.chang
` (2 subsequent siblings)
13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Based on specification:
https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
Based on Section 3.1.5 of AccECN spec (RFC9768), a TCP Server in
AccECN mode MUST NOT set ECT on any packet for the rest of the connection,
if it has received or sent at least one valid SYN or Acceptable SYN/ACK
with (AE,CWR,ECE) = (0,0,0) during the handshake.
In addition, a host in AccECN mode that is feeding back the IP-ECN
field on a SYN or SYN/ACK MUST feed back the IP-ECN field on the
latest valid SYN or acceptable SYN/ACK to arrive.
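A minimal sketch of the first rule, under the assumption that the two handshake failure cases are tracked as bits in a per-connection mask. The macro names, bit values and helper names are hypothetical and only loosely modeled on TCP_ACCECN_ACE_FAIL_SEND and TCP_ACCECN_ACE_FAIL_RECV.

```c
#include <stdbool.h>

/* Hypothetical failure bits; the values are chosen for this example only. */
#define FAIL_ACE_SEND (1u << 0) /* we sent a handshake segment with ACE=0 */
#define FAIL_ACE_RECV (1u << 1) /* we received a handshake segment with ACE=0 */

/* Record a handshake segment carrying (AE,CWR,ECE) = ace. */
static unsigned int accecn_note_handshake_ace(unsigned int fail_mode,
					      unsigned int ace, bool sent)
{
	if (ace == 0)
		fail_mode |= sent ? FAIL_ACE_SEND : FAIL_ACE_RECV;
	return fail_mode;
}

/* ECT may only be set while neither failure bit has been recorded. */
static bool accecn_may_set_ect(unsigned int fail_mode)
{
	return !(fail_mode & (FAIL_ACE_SEND | FAIL_ACE_RECV));
}
```

Once either bit is set it is never cleared, which matches the "for the rest of the connection" wording.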
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v6:
- Do not cast const struct request_sock into struct request_sock
- Set tcp_accecn_fail_mode after calling tcp_rtx_synack().
---
net/ipv4/inet_connection_sock.c | 4 ++++
net/ipv4/tcp_input.c | 2 ++
net/ipv4/tcp_minisocks.c | 39 ++++++++++++++++++++++++---------
net/ipv4/tcp_output.c | 3 ++-
net/ipv4/tcp_timer.c | 3 +++
5 files changed, 40 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index b4eae731c9ba..ea5fdbf05b05 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -20,6 +20,7 @@
#include <net/tcp_states.h>
#include <net/xfrm.h>
#include <net/tcp.h>
+#include <net/tcp_ecn.h>
#include <net/sock_reuseport.h>
#include <net/addrconf.h>
@@ -1103,6 +1104,9 @@ static void reqsk_timer_handler(struct timer_list *t)
(!resend ||
!tcp_rtx_synack(sk_listener, req) ||
inet_rsk(req)->acked)) {
+ if (req->num_retrans > 1 && tcp_rsk(req)->accecn_ok)
+ tcp_accecn_fail_mode_set(tcp_sk(sk_listener),
+ TCP_ACCECN_ACE_FAIL_SEND);
if (req->num_timeout++ == 0)
atomic_dec(&queue->young);
mod_timer(&req->rsk_timer, jiffies + tcp_reqsk_timeout(req));
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1f354d3cf26a..7638aaa8befb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6230,6 +6230,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
if (th->syn) {
if (tcp_ecn_mode_accecn(tp)) {
accecn_reflector = true;
+ tp->syn_ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
+ INET_ECN_MASK;
if (tp->rx_opt.accecn &&
tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 545d3ba0adcf..e6cbd4317fd5 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -750,16 +750,35 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
*/
if (!tcp_oow_rate_limited(sock_net(sk), skb,
LINUX_MIB_TCPACKSKIPPEDSYNRECV,
- &tcp_rsk(req)->last_oow_ack_time) &&
-
- !tcp_rtx_synack(sk, req)) {
- unsigned long expires = jiffies;
-
- expires += tcp_reqsk_timeout(req);
- if (!fastopen)
- mod_timer_pending(&req->rsk_timer, expires);
- else
- req->rsk_timer.expires = expires;
+ &tcp_rsk(req)->last_oow_ack_time)) {
+ if (tcp_rsk(req)->accecn_ok) {
+ u8 ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
+ INET_ECN_MASK;
+
+ tcp_rsk(req)->syn_ect_rcv = ect_rcv;
+ if (tcp_accecn_ace(tcp_hdr(skb)) == 0x0) {
+ u8 fail_mode = TCP_ACCECN_ACE_FAIL_RECV;
+
+ tcp_accecn_fail_mode_set(tcp_sk(sk),
+ fail_mode);
+ }
+ }
+ if (!tcp_rtx_synack(sk, req)) {
+ u8 fail_mode = TCP_ACCECN_ACE_FAIL_SEND;
+ unsigned long expires = jiffies;
+
+ if (req->num_retrans > 1 &&
+ tcp_rsk(req)->accecn_ok)
+ tcp_accecn_fail_mode_set(tcp_sk(sk),
+ fail_mode);
+
+ expires += tcp_reqsk_timeout(req);
+ if (!fastopen)
+ mod_timer_pending(&req->rsk_timer,
+ expires);
+ else
+ req->rsk_timer.expires = expires;
+ }
}
return NULL;
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c6754854ad09..9489cda7322e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -334,7 +334,8 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
return;
if (tcp_ecn_mode_accecn(tp)) {
- if (!tcp_accecn_ace_fail_recv(tp))
+ if (!tcp_accecn_ace_fail_recv(tp) &&
+ !tcp_accecn_ace_fail_send(tp))
INET_ECN_xmit(sk);
tcp_accecn_set_ace(tp, skb, th);
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 0672c3d8f4f1..58b9d8af4321 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -22,6 +22,7 @@
#include <linux/module.h>
#include <linux/gfp.h>
#include <net/tcp.h>
+#include <net/tcp_ecn.h>
#include <net/rstreason.h>
static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk)
@@ -479,6 +480,8 @@ static void tcp_fastopen_synack_timer(struct sock *sk, struct request_sock *req)
* it's not good to give up too easily.
*/
tcp_rtx_synack(sk, req);
+ if (req->num_retrans > 1 && tcp_rsk(req)->accecn_ok)
+ tcp_accecn_fail_mode_set(tcp_sk(sk), TCP_ACCECN_ACE_FAIL_SEND);
req->num_timeout++;
tcp_update_rto_stats(sk);
if (!tp->retrans_stamp)
--
2.34.1
* [PATCH v6 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
` (10 preceding siblings ...)
2025-11-14 7:13 ` [PATCH v6 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiation chia-yu.chang
@ 2025-11-14 7:13 ` chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang
13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server
is in AccECN mode and in SYN-RCVD state, and if it receives a value of
zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the
connection the Server MUST NOT set ECT on outgoing packets and MUST
NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it
MUST NOT disable AccECN feedback.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
v3:
- Remove unnecessary brackets.
---
include/net/tcp_ecn.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 57841dfa6705..42fd28d818aa 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -175,7 +175,9 @@ static inline void tcp_accecn_third_ack(struct sock *sk,
switch (ace) {
case 0x0:
/* Invalid value */
- tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV);
+ if (!TCP_SKB_CB(skb)->sacked)
+ tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV |
+ TCP_ACCECN_OPT_FAIL_RECV);
break;
case 0x7:
case 0x5:
--
2.34.1
* [PATCH v6 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
` (11 preceding siblings ...)
2025-11-14 7:13 ` [PATCH v6 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
@ 2025-11-14 7:13 ` chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang
13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Detect spurious retransmission of a previously sent ACK carrying the
AccECN option after the second retransmission. Since this might be caused
by a middlebox dropping ACKs with options it does not recognize, disable
the sending of the AccECN option in all subsequent ACKs. This patch
follows Section 3.2.3.2.2 of AccECN spec (RFC9768).
Also, a new AccECN option sending mode is added to the tcp_ecn_option sysctl
(TCP_ACCECN_OPTION_PERSIST), which ignores the AccECN fallback policy and
persistently sends the AccECN option whenever it fits into TCP option space.
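To make the interaction between the sysctl modes and the fallback easier to follow, the sending decision can be sketched as a standalone predicate. The enum and function names are hypothetical, the demand and beacon conditions are folded into one flag, and the check for remaining TCP option space is omitted.

```c
#include <stdbool.h>

/* Hypothetical mirror of the tcp_ecn_option sysctl values 0..3. */
enum opt_mode {
	OPT_DISABLED = 0,
	OPT_MINIMUM = 1,
	OPT_FULL = 2,
	OPT_PERSIST = 3,
};

/*
 * saw_peer_opt:     an AccECN option was seen from the peer
 * opt_fail_send:    the fallback was triggered (our options seem to be dropped)
 * demand_or_beacon: an option is demanded or the beacon timer has fired
 */
static bool want_accecn_option(enum opt_mode mode, bool saw_peer_opt,
			       bool opt_fail_send, bool demand_or_beacon)
{
	if (mode == OPT_DISABLED || !saw_peer_opt)
		return false;
	/* Only the persist mode ignores the fallback. */
	if (opt_fail_send && mode < OPT_PERSIST)
		return false;
	return mode >= OPT_FULL || demand_or_beacon;
}
```

With mode 2 the option stops once the fallback is triggered, while mode 3 keeps sending it, which is mainly useful for testing and debugging paths that are known to be clean.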
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
v5:
- Add empty line between variable declarations and code
---
Documentation/networking/ip-sysctl.rst | 4 +++-
include/linux/tcp.h | 3 ++-
include/net/tcp_ecn.h | 2 ++
net/ipv4/sysctl_net_ipv4.c | 2 +-
net/ipv4/tcp_input.c | 10 ++++++++++
net/ipv4/tcp_output.c | 7 ++++++-
6 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 2bae61be1859..db2b45b34f17 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -482,7 +482,9 @@ tcp_ecn_option - INTEGER
1 Send AccECN option sparingly according to the minimum option
rules outlined in draft-ietf-tcpm-accurate-ecn.
2 Send AccECN option on every packet whenever it fits into TCP
- option space.
+ option space except when AccECN fallback is triggered.
+ 3 Send AccECN option on every packet whenever it fits into TCP
+ option space even when AccECN fallback is triggered.
= ============================================================
Default: 2
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 683f38362977..32b031d09294 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -294,7 +294,8 @@ struct tcp_sock {
u8 nonagle : 4,/* Disable Nagle algorithm? */
rate_app_limited:1; /* rate_{delivered,interval_us} limited? */
u8 received_ce_pending:4, /* Not yet transmit cnt of received_ce */
- unused2:4;
+ accecn_opt_sent:1,/* Sent AccECN option in previous ACK */
+ unused2:3;
u8 accecn_minlen:2,/* Minimum length of AccECN option sent */
est_ecnfield:2,/* ECN field for AccECN delivered estimates */
accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 42fd28d818aa..c15891711f77 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -29,6 +29,7 @@ enum tcp_accecn_option {
TCP_ACCECN_OPTION_DISABLED = 0,
TCP_ACCECN_OPTION_MINIMUM = 1,
TCP_ACCECN_OPTION_FULL = 2,
+ TCP_ACCECN_OPTION_PERSIST = 3,
};
/* Apply either ECT(0) or ECT(1) based on TCP_CONG_ECT_1_NEGOTIATION flag */
@@ -406,6 +407,7 @@ static inline void tcp_accecn_init_counters(struct tcp_sock *tp)
tp->received_ce_pending = 0;
__tcp_accecn_init_bytes_counters(tp->received_ecn_bytes);
__tcp_accecn_init_bytes_counters(tp->delivered_ecn_bytes);
+ tp->accecn_opt_sent = 0;
tp->accecn_minlen = 0;
tp->accecn_opt_demand = 0;
tp->est_ecnfield = 0;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 35367f8e2da3..ef21a85e021c 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -749,7 +749,7 @@ static struct ctl_table ipv4_net_table[] = {
.mode = 0644,
.proc_handler = proc_dou8vec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_TWO,
+ .extra2 = SYSCTL_THREE,
},
{
.procname = "tcp_ecn_option_beacon",
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7638aaa8befb..87ca6e3515b7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4809,6 +4809,8 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
{
+ struct tcp_sock *tp = tcp_sk(sk);
+
/* When the ACK path fails or drops most ACKs, the sender would
* timeout and spuriously retransmit the same segment repeatedly.
* If it seems our ACKs are not reaching the other side,
@@ -4828,6 +4830,14 @@ static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
/* Save last flowlabel after a spurious retrans. */
tcp_save_lrcv_flowlabel(sk, skb);
#endif
+ /* Check DSACK info to detect that the previous ACK carrying the
+ * AccECN option was lost after the second retransmission, and then
+ * stop sending AccECN option in all subsequent ACKs.
+ */
+ if (tcp_ecn_mode_accecn(tp) &&
+ TCP_SKB_CB(skb)->seq == tp->duplicate_sack[0].start_seq &&
+ tp->accecn_opt_sent)
+ tcp_accecn_fail_mode_set(tp, TCP_ACCECN_OPT_FAIL_SEND);
}
static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9489cda7322e..63c7f448037a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -713,9 +713,12 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
if (tp) {
tp->accecn_minlen = 0;
tp->accecn_opt_tstamp = tp->tcp_mstamp;
+ tp->accecn_opt_sent = 1;
if (tp->accecn_opt_demand)
tp->accecn_opt_demand--;
}
+ } else if (tp) {
+ tp->accecn_opt_sent = 0;
}
if (unlikely(OPTION_SACK_ADVERTISE & options)) {
@@ -1187,7 +1190,9 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
if (tcp_ecn_mode_accecn(tp)) {
int ecn_opt = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn_option);
- if (ecn_opt && tp->saw_accecn_opt && !tcp_accecn_opt_fail_send(tp) &&
+ if (ecn_opt && tp->saw_accecn_opt &&
+ (ecn_opt >= TCP_ACCECN_OPTION_PERSIST ||
+ !tcp_accecn_opt_fail_send(tp)) &&
(ecn_opt >= TCP_ACCECN_OPTION_FULL || tp->accecn_opt_demand ||
tcp_accecn_option_beacon_check(sk))) {
opts->use_synack_ecn_bytes = 0;
--
2.34.1
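The send condition in tcp_established_options() after this patch can be modelled as a small standalone predicate. This is only an illustrative sketch, not kernel code: the enum values are assumed from the sysctl range (0..3), and `struct accecn_state` and `accecn_should_send_option()` are hypothetical names that flatten the per-socket fields the diff reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed numeric values for the tcp_ecn_option sysctl (range 0..3 after
 * this patch). Only TCP_ACCECN_OPTION_FULL and TCP_ACCECN_OPTION_PERSIST
 * are named in the diff; this enum is a hypothetical stand-in.
 */
enum accecn_option_mode {
	OPT_DISABLED = 0,
	OPT_MINIMUM  = 1,
	OPT_FULL     = 2,
	OPT_PERSIST  = 3,
};

/* Hypothetical flattened view of the per-socket state the check reads. */
struct accecn_state {
	bool saw_accecn_opt;	/* peer has sent an AccECN option */
	bool opt_fail_send;	/* TCP_ACCECN_OPT_FAIL_SEND has been set */
	bool opt_demand;	/* accecn_opt_demand is non-zero */
	bool beacon_due;	/* the beacon check asks for an option */
};

static bool accecn_should_send_option(int ecn_opt, const struct accecn_state *s)
{
	assert(s != NULL);

	/* Option disabled, or the peer never sent one. */
	if (!ecn_opt || !s->saw_accecn_opt)
		return false;

	/* A recorded send failure suppresses the option unless the
	 * sysctl asks for persistent sending.
	 */
	if (ecn_opt < OPT_PERSIST && s->opt_fail_send)
		return false;

	/* Full (or persistent) mode always sends; otherwise only on
	 * demand or when a beacon is due.
	 */
	return ecn_opt >= OPT_FULL || s->opt_demand || s->beacon_due;
}
```

With a send failure recorded, the sketch returns false for OPT_FULL and true for OPT_PERSIST, which is the behavioural difference this patch introduces.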
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v6 net-next 14/14] tcp: accecn: enable AccECN
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
` (12 preceding siblings ...)
2025-11-14 7:13 ` [PATCH v6 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
@ 2025-11-14 7:13 ` chia-yu.chang
13 siblings, 0 replies; 26+ messages in thread
From: chia-yu.chang @ 2025-11-14 7:13 UTC (permalink / raw)
To: pabeni, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Enable Accurate ECN negotiation and request for incoming and
outgoing connections by setting sysctl_tcp_ecn:
+==============+===========================================+
|              | Highest ECN variant (Accurate ECN, ECN,   |
|   tcp_ecn    | or no ECN) to be negotiated & requested   |
|              +---------------------+---------------------+
|              | Incoming connection | Outgoing connection |
+==============+=====================+=====================+
| 0            | No ECN              | No ECN              |
| 1            | ECN                 | ECN                 |
| 2            | ECN                 | No ECN              |
+--------------+---------------------+---------------------+
| 3            | Accurate ECN        | Accurate ECN        |
| 4            | Accurate ECN        | ECN                 |
| 5            | Accurate ECN        | No ECN              |
+==============+=====================+=====================+
Refer to Documentation/networking/ip-sysctl.rst for more details.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---
net/ipv4/sysctl_net_ipv4.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ef21a85e021c..bde49331dc07 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -47,7 +47,7 @@ static unsigned int udp_child_hash_entries_max = UDP_HTABLE_SIZE_MAX;
static int tcp_plb_max_rounds = 31;
static int tcp_plb_max_cong_thresh = 256;
static unsigned int tcp_tw_reuse_delay_max = TCP_PAWS_MSL * MSEC_PER_SEC;
-static int tcp_ecn_mode_max = 2;
+static int tcp_ecn_mode_max = 5;
static u32 icmp_errors_extension_mask_all =
GENMASK_U8(ICMP_ERR_EXT_COUNT - 1, 0);
--
2.34.1
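The tcp_ecn table in the commit message above can be read as two lookups, one per connection direction. The sketch below is illustrative only: `enum ecn_variant`, `tcp_ecn_incoming()` and `tcp_ecn_outgoing()` are hypothetical names and do not exist in the kernel; the mapping itself is taken directly from the table.

```c
/* Highest ECN variant that may be negotiated or requested. */
enum ecn_variant {
	ECN_NONE,	/* no ECN */
	ECN_RFC3168,	/* classic ECN */
	ECN_ACCURATE,	/* Accurate ECN */
};

/* Variant accepted on incoming connections for a given tcp_ecn value. */
static enum ecn_variant tcp_ecn_incoming(int tcp_ecn)
{
	switch (tcp_ecn) {
	case 1:
	case 2:
		return ECN_RFC3168;
	case 3:
	case 4:
	case 5:
		return ECN_ACCURATE;
	default:
		return ECN_NONE;
	}
}

/* Variant requested on outgoing connections for a given tcp_ecn value. */
static enum ecn_variant tcp_ecn_outgoing(int tcp_ecn)
{
	switch (tcp_ecn) {
	case 1:
	case 4:
		return ECN_RFC3168;
	case 3:
		return ECN_ACCURATE;
	default:
		return ECN_NONE;
	}
}
```

Values 3 to 5 mirror values 1, 2 and 0 on the outgoing side only in part: 4 still requests classic ECN and 5 requests nothing, while all three accept Accurate ECN on incoming connections.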
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
2025-11-14 7:13 ` [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN chia-yu.chang
@ 2025-11-18 12:02 ` Paolo Abeni
2025-11-19 10:24 ` Chia-Yu Chang (Nokia)
0 siblings, 1 reply; 26+ messages in thread
From: Paolo Abeni @ 2025-11-18 12:02 UTC (permalink / raw)
To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Note: typo in the subj
On 11/14/25 8:13 AM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> No functional changes.
Some real commit message is needed.
>
> Co-developed-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> ---
> v6:
> - Update comments.
> ---
> include/linux/skbuff.h | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index ff90281ddf90..e09455cee8e3 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -671,7 +671,13 @@ enum {
> /* This indicates the skb is from an untrusted source. */
> SKB_GSO_DODGY = 1 << 1,
>
> - /* This indicates the tcp segment has CWR set. */
> + /* For Tx, this indicates the first TCP segment has CWR set, and any
> + * subsequent segment in the same skb has CWR cleared. This is not
> + * used on Rx except for virtio_net. However, because the connection
> + * to which the segment belongs is not tracked to use RFC3168 or
> + * Accurate ECN, and using RFC3168 ECN offload may corrupt AccECN
> + * signal of AccECN segments. Therefore, this cannot be used on Rx.
Stating both that it is used by virtio_net and that it cannot be used in
the RX path is a bit confusing. A random contributor may be tempted to
remove ECN support from virtio_net.
Please state explicitly:
- why it makes sense to use this in virtio_net
- this must not be used in the RX path _outside_ the virtio net driver
something like:
/* For Tx, this indicates the first TCP segment has CWR set, and any
* subsequent segment in the same skb has CWR cleared. However, because
* the connection to which the segment belongs is not tracked to use
* RFC3168 or Accurate ECN, and using RFC3168 ECN offload may corrupt
* AccECN signal of AccECN segments. Therefore, this cannot be used on
* Rx outside the virtio_net driver. Such an exception exists due to
* <reason>
*/
/P
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v6 net-next 04/14] selftests/net: gro: add self-test for TCP CWR flag
2025-11-14 7:13 ` [PATCH v6 net-next 04/14] selftests/net: gro: add self-test for TCP CWR flag chia-yu.chang
@ 2025-11-18 12:14 ` Paolo Abeni
0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-11-18 12:14 UTC (permalink / raw)
To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
On 11/14/25 8:13 AM, chia-yu.chang@nokia-bell-labs.com wrote:
> +/* send extra flags of the (NUM_PACKETS / 2) and (NUM_PACKETS / 2 - 1)
> + * pkts, not first and not last pkt
> + */
> +static void send_flags(int fd, struct sockaddr_ll *daddr, int psh, int syn,
> + int rst, int urg, int cwr)
> +{
> + static char flag_buf[2][MAX_HDR_LEN + PAYLOAD_LEN];
> + static char buf[MAX_HDR_LEN + PAYLOAD_LEN];
> + int payload_len, pkt_size, i;
> + struct tcphdr *tcph;
> + int flag[2];
> +
> + payload_len = PAYLOAD_LEN * (psh || cwr);
> + pkt_size = total_hdr_len + payload_len;
> + flag[0] = NUM_PACKETS / 2;
> + flag[1] = NUM_PACKETS / 2 - 1;
> +
> + // Create and configure packets with flags
Please use /* */ for comments.
Other than that:
Acked-by: Paolo Abeni <pabeni@redhat.com>
> + for (i = 0; i < 2; i++) {
> + if (flag[i] > 0) {
> + create_packet(flag_buf[i], flag[i] * payload_len, 0,
> + payload_len, 0);
> + tcph = (struct tcphdr *)(flag_buf[i] + tcp_offset);
> + set_flags(tcph, payload_len, psh, syn, rst, urg, cwr);
> + }
> + }
>
> for (i = 0; i < NUM_PACKETS + 1; i++) {
> - if (i == flag) {
> - write_packet(fd, flag_buf, pkt_size, daddr);
> + if (i == flag[0]) {
> + write_packet(fd, flag_buf[0], pkt_size, daddr);
> + continue;
> + } else if (i == flag[1] && cwr) {
> + write_packet(fd, flag_buf[1], pkt_size, daddr);
> continue;
> }
> create_packet(buf, i * PAYLOAD_LEN, 0, PAYLOAD_LEN, 0);
> @@ -1020,16 +1045,19 @@ static void gro_sender(void)
> send_ack(txfd, &daddr);
> write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
> } else if (strcmp(testname, "flags") == 0) {
> - send_flags(txfd, &daddr, 1, 0, 0, 0);
> + send_flags(txfd, &daddr, 1, 0, 0, 0, 0);
> write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
>
> - send_flags(txfd, &daddr, 0, 1, 0, 0);
> + send_flags(txfd, &daddr, 0, 1, 0, 0, 0);
> write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
>
> - send_flags(txfd, &daddr, 0, 0, 1, 0);
> + send_flags(txfd, &daddr, 0, 0, 1, 0, 0);
> write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
>
> - send_flags(txfd, &daddr, 0, 0, 0, 1);
> + send_flags(txfd, &daddr, 0, 0, 0, 1, 0);
> + write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
> +
> + send_flags(txfd, &daddr, 0, 0, 0, 0, 1);
> write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
> } else if (strcmp(testname, "tcp") == 0) {
> send_changed_checksum(txfd, &daddr);
> @@ -1163,6 +1191,12 @@ static void gro_receiver(void)
>
> printf("urg flag ends coalescing: ");
> check_recv_pkts(rxfd, correct_payload, 3);
> +
> + correct_payload[0] = PAYLOAD_LEN;
> + correct_payload[1] = PAYLOAD_LEN * 2;
> + correct_payload[2] = PAYLOAD_LEN * 2;
> + printf("cwr flag ends coalescing: ");
> + check_recv_pkts(rxfd, correct_payload, 3);
> } else if (strcmp(testname, "tcp") == 0) {
> correct_payload[0] = PAYLOAD_LEN;
> correct_payload[1] = PAYLOAD_LEN;
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v6 net-next 05/14] tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
2025-11-14 7:13 ` [PATCH v6 net-next 05/14] tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers chia-yu.chang
@ 2025-11-18 12:30 ` Paolo Abeni
0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-11-18 12:30 UTC (permalink / raw)
To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
Cc: Olivier Tilmans
On 11/14/25 8:13 AM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> Two CA module flags are added in this patch related to AccECN negotiation.
> First, a new CA module flag (TCP_CONG_NEEDS_ACCECN) defines that the CA
> expects to negotiate AccECN functionality using the ECE, CWR and AE flags
> in the TCP header.
>
> Second, during ECN negotiation, ECT(0) in the IP header is used. This patch
> enables CA to control whether ECT(0) or ECT(1) should be used on a per-segment
> basis. A new flag (TCP_CONG_ECT_1_NEGOTIATION) defines the expected ECT value
> in the IP header by the CA when not-yet initialized for the connection.
>
> The detailed AccECN negotiation during the 3WHS can be found in the AccECN spec:
> https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
>
> Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v6 net-next 09/14] tcp: add TCP_SYNACK_RETRANS synack_type
2025-11-14 7:13 ` [PATCH v6 net-next 09/14] tcp: add TCP_SYNACK_RETRANS synack_type chia-yu.chang
@ 2025-11-18 12:32 ` Paolo Abeni
0 siblings, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-11-18 12:32 UTC (permalink / raw)
To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
On 11/14/25 8:13 AM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> Before this patch, retransmitted SYN/ACK did not have a specific synack_type;
> however, the upcoming patch needs to distinguish between retransmitted and
> non-retransmitted SYN/ACK for AccECN negotiation to transmit the fallback
> SYN/ACK during AccECN negotiation. Therefore, this patch introduces a new
> synack_type (TCP_SYNACK_RETRANS).
>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v6 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
2025-11-14 7:13 ` [PATCH v6 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
@ 2025-11-18 13:58 ` Paolo Abeni
2025-11-19 10:32 ` Chia-Yu Chang (Nokia)
0 siblings, 1 reply; 26+ messages in thread
From: Paolo Abeni @ 2025-11-18 13:58 UTC (permalink / raw)
To: chia-yu.chang, edumazet, parav, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
On 11/14/25 8:13 AM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> For Accurate ECN, the first SYN/ACK sent by the TCP server shall set the
> ACE flag (see Table 1 of RFC9768) and the AccECN option to complete the
> capability negotiation. However, if the TCP server needs to retransmit such
> a SYN/ACK (for example, because it did not receive an ACK acknowledging its
> SYN/ACK, or received a second SYN requesting AccECN support), the TCP server
> retransmits the SYN/ACK without the AccECN option. This is because the
> SYN/ACK may be lost due to congestion, or a middlebox may block the AccECN
> option. Furthermore, if this retransmission also times out, to expedite
> connection establishment, the TCP server should retransmit the SYN/ACK with
> (AE,CWR,ECE) = (0,0,0) and without the AccECN option, while maintaining
> AccECN feedback mode.
>
> This complies with Section 3.2.3.2.2 of the AccECN specification (RFC9768).
>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> ---
> v6:
> - Use new synack_type TCP_SYNACK_RETRANS and num_retrans.
> ---
> include/net/tcp_ecn.h | 20 ++++++++++++++------
> net/ipv4/tcp_output.c | 4 ++--
> 2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
> index a709fb1756eb..57841dfa6705 100644
> --- a/include/net/tcp_ecn.h
> +++ b/include/net/tcp_ecn.h
> @@ -649,12 +649,20 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
> }
>
> static inline void
> -tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
> -{
> - if (tcp_rsk(req)->accecn_ok)
> - tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
> - else if (inet_rsk(req)->ecn_ok)
> - th->ece = 1;
> +tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th,
> + enum tcp_synack_type synack_type)
> +{
> + // num_retrans will be incresaed after tcp_ecn_make_synack()
Please use /* */ for comments
> + if (!req->num_retrans) {
It's unclear why this function receives a `synack_type` argument and
doesn't use it. Should the above be
if (synack_type != TCP_SYNACK_RETRANS) {
?
Or just remove such argument.
/P
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
2025-11-18 12:02 ` Paolo Abeni
@ 2025-11-19 10:24 ` Chia-Yu Chang (Nokia)
2025-11-19 10:40 ` Paolo Abeni
0 siblings, 1 reply; 26+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-11-19 10:24 UTC (permalink / raw)
To: Paolo Abeni, edumazet@google.com, parav@nvidia.com,
linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
kuba@kernel.org, stephen@networkplumber.org,
xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
liuhangbin@gmail.com, shuah@kernel.org,
linux-kselftest@vger.kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Tuesday, November 18, 2025 1:02 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; parav@nvidia.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@google.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
>
> Note: typo in the subj
>
> On 11/14/25 8:13 AM, chia-yu.chang@nokia-bell-labs.com wrote:
> > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > No functional changes.
>
> Some real commit message is needed.
>
> >
> > Co-developed-by: Ilpo Järvinen <ij@kernel.org>
> > Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > ---
> > v6:
> > - Update comments.
> > ---
> > include/linux/skbuff.h | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index
> > ff90281ddf90..e09455cee8e3 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -671,7 +671,13 @@ enum {
> > /* This indicates the skb is from an untrusted source. */
> > SKB_GSO_DODGY = 1 << 1,
> >
> > - /* This indicates the tcp segment has CWR set. */
> > + /* For Tx, this indicates the first TCP segment has CWR set, and any
> > + * subsequent segment in the same skb has CWR cleared. This is not
> > + * used on Rx except for virtio_net. However, because the connection
> > + * to which the segment belongs is not tracked to use RFC3168 or
> > + * Accurate ECN, and using RFC3168 ECN offload may corrupt AccECN
> > + * signal of AccECN segments. Therefore, this cannot be used on Rx.
>
> Stating both that it is used by virtio_net and that it cannot be used in the RX path is a bit confusing. A random contributor may be tempted to remove ECN support from virtio_net.
>
> Please state explicitly:
> - why it makes sense to use this in virtio_net
> - this must not be used in the RX path _outside_ the virtio net driver
>
> something like:
>
> /* For Tx, this indicates the first TCP segment has CWR set, and any
> * subsequent segment in the same skb has CWR cleared. However, because
> * the connection to which the segment belongs is not tracked to use
> * RFC3168 or Accurate ECN, and using RFC3168 ECN offload may corrupt
> * AccECN signal of AccECN segments. Therefore, this cannot be used on
> * Rx outside the virtio_net driver. Such an exception exists due to
> * <reason>
> */
>
> /P
Hi Paolo and Ilpo,
I was thinking to totally remove ECN from Rx path, and add the comments only in AccECN, like below:
This is because we could use SKB_GSO_TCP_ACCECN on Rx to explicitly tell the later GSO Tx in a forwarding case NOT to clear the CWR flag.
What do you think?
/* For Tx, this indicates the first TCP segment has CWR set, and any
* subsequent segment in the same skb has CWR cleared. However, because
* the connection to which the segment belongs is not tracked to use
* RFC3168 or Accurate ECN, and using RFC3168 ECN offload may corrupt
* ACE signal of AccECN segments. Therefore, this cannot be used on Rx.
*/
SKB_GSO_TCP_ECN = 1 << 2,
[...]
/* For TX, this indicates the TCP segment uses the CWR flag as part of
* ACE signal, and the CWR flag is not modified in the skb. For RX, any
* CWR flagged segment must use SKB_GSO_TCP_ACCECN to ensure CWR flag
* is not cleared by any RFC3168 ECN offload, and thus keeping ACE
* signal of AccECN segments. This is particularly used for Rx of
* virtio_net driver in order to tell the later GSO Tx in a forwarding
* scenario that it is NOT ok to clear the CWR flag from the 2nd segment.
*/
SKB_GSO_TCP_ACCECN = 1 << 19,
Chia-Yu
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v6 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
2025-11-18 13:58 ` Paolo Abeni
@ 2025-11-19 10:32 ` Chia-Yu Chang (Nokia)
0 siblings, 0 replies; 26+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-11-19 10:32 UTC (permalink / raw)
To: Paolo Abeni, edumazet@google.com, parav@nvidia.com,
linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
kuba@kernel.org, stephen@networkplumber.org,
xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
liuhangbin@gmail.com, shuah@kernel.org,
linux-kselftest@vger.kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Tuesday, November 18, 2025 2:59 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; parav@nvidia.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@google.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v6 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
>
> On 11/14/25 8:13 AM, chia-yu.chang@nokia-bell-labs.com wrote:
> > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > For Accurate ECN, the first SYN/ACK sent by the TCP server shall set
> > the ACE flag (see Table 1 of RFC9768) and the AccECN option to
> > complete the capability negotiation. However, if the TCP server needs
> > to retransmit such a SYN/ACK (for example, because it did not receive
> > an ACK acknowledging its SYN/ACK, or received a second SYN requesting
> > AccECN support), the TCP server retransmits the SYN/ACK without the
> > AccECN option. This is because the SYN/ACK may be lost due to
> > congestion, or a middlebox may block the AccECN option. Furthermore,
> > if this retransmission also times out, to expedite connection
> > establishment, the TCP server should retransmit the SYN/ACK with
> > (AE,CWR,ECE) = (0,0,0) and without the AccECN option, while
> > maintaining AccECN feedback mode.
> >
> > This complies with Section 3.2.3.2.2 of the AccECN specification (RFC9768).
> >
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > ---
> > v6:
> > - Use new synack_type TCP_SYNACK_RETRANS and num_retrans.
> > ---
> > include/net/tcp_ecn.h | 20 ++++++++++++++------
> > net/ipv4/tcp_output.c | 4 ++--
> > 2 files changed, 16 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h index
> > a709fb1756eb..57841dfa6705 100644
> > --- a/include/net/tcp_ecn.h
> > +++ b/include/net/tcp_ecn.h
> > @@ -649,12 +649,20 @@ static inline void tcp_ecn_clear_syn(struct sock
> > *sk, struct sk_buff *skb) }
> >
> > static inline void
> > -tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr
> > *th) -{
> > - if (tcp_rsk(req)->accecn_ok)
> > - tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
> > - else if (inet_rsk(req)->ecn_ok)
> > - th->ece = 1;
> > +tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th,
> > + enum tcp_synack_type synack_type) {
> > + // num_retrans will be incresaed after tcp_ecn_make_synack()
>
> Please use /* */ for comments
>
> > + if (!req->num_retrans) {
>
> It's unclear why this function receives a `synack_type` argument and doesn't use it. Should the above be
>
> if (synack_type != TCP_SYNACK_RETRANS) {
>
> ?
>
> Or just remove such argument.
>
> /P
Hi Paolo,
You are right, and I will use "synack_type != TCP_SYNACK_RETRANS || !req->num_retrans",
because this ACE field fallback will only happen from the 2nd retransmitted SYN/ACK.
Chia-Yu
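The fallback sequence discussed in this sub-thread can be sketched as a standalone decision function. This is an illustrative model based on the commit message and the condition proposed above, not the actual patch: `enum synack_type`, `struct synack_plan` and `plan_synack()` are hypothetical names, and num_retrans is assumed to count the retransmissions already sent, since it is incremented after the SYN/ACK is built.

```c
#include <stdbool.h>

/* Hypothetical stand-in for enum tcp_synack_type; only the distinction
 * between a retransmitted and a non-retransmitted SYN/ACK is modelled.
 */
enum synack_type {
	SYNACK_NORMAL,
	SYNACK_RETRANS,
};

/* What the server puts into the SYN/ACK it is about to send. */
struct synack_plan {
	bool ace_flags;		/* echo AccECN in (AE,CWR,ECE) */
	bool accecn_option;	/* attach the AccECN option */
};

static struct synack_plan plan_synack(enum synack_type type,
				      unsigned int num_retrans)
{
	struct synack_plan plan;

	/* The ACE flags are kept on the first SYN/ACK and on the first
	 * retransmission; from the 2nd retransmission onward they fall
	 * back to (0,0,0).
	 */
	plan.ace_flags = type != SYNACK_RETRANS || !num_retrans;

	/* The AccECN option is only carried by the original SYN/ACK. */
	plan.accecn_option = type != SYNACK_RETRANS;

	return plan;
}
```

The resulting sequence is: original SYN/ACK with flags and option, first retransmission with flags but no option, later retransmissions with neither.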
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
2025-11-19 10:24 ` Chia-Yu Chang (Nokia)
@ 2025-11-19 10:40 ` Paolo Abeni
2025-11-19 10:43 ` Paolo Abeni
2025-11-19 11:22 ` Chia-Yu Chang (Nokia)
0 siblings, 2 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-11-19 10:40 UTC (permalink / raw)
To: Chia-Yu Chang (Nokia), edumazet@google.com, parav@nvidia.com,
linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
kuba@kernel.org, stephen@networkplumber.org,
xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
liuhangbin@gmail.com, shuah@kernel.org,
linux-kselftest@vger.kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
On 11/19/25 11:24 AM, Chia-Yu Chang (Nokia) wrote:
> I was thinking to totally remove ECN from Rx path,
??? do you mean you intend to remove the existing virtio_net ECN
support? I guess/hope I misread the above.
Note that removing features from virtio_net is an extreme pain at best,
and more probably simply impossible - see the UFO removal history.
Please clarify, thanks!
Paolo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
2025-11-19 10:40 ` Paolo Abeni
@ 2025-11-19 10:43 ` Paolo Abeni
2025-11-19 11:22 ` Chia-Yu Chang (Nokia)
1 sibling, 0 replies; 26+ messages in thread
From: Paolo Abeni @ 2025-11-19 10:43 UTC (permalink / raw)
To: Chia-Yu Chang (Nokia), edumazet@google.com, parav@nvidia.com,
linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
kuba@kernel.org, stephen@networkplumber.org,
xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
liuhangbin@gmail.com, shuah@kernel.org,
linux-kselftest@vger.kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
On 11/19/25 11:40 AM, Paolo Abeni wrote:
> On 11/19/25 11:24 AM, Chia-Yu Chang (Nokia) wrote:
>> I was thinking to totally remove ECN from Rx path,
>
> ??? do you mean you intend to remove the existing virtio_net ECN
> support? I guess/hope I misread the above.
>
> Note that removing features from virtio_net is an extreme pain at best,
> and more probably simply impossible - see the UFO removal history.
>
> Please clarify, thanks!
Note that my comment on this patch is focusing only on clarity: you are
updating a comment for that goal, so the new comment needs to be clear and
consistent. The proposed text was not; a better/more consistent one will
be ok for me.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
2025-11-19 10:40 ` Paolo Abeni
2025-11-19 10:43 ` Paolo Abeni
@ 2025-11-19 11:22 ` Chia-Yu Chang (Nokia)
2025-11-26 8:48 ` Chia-Yu Chang (Nokia)
1 sibling, 1 reply; 26+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-11-19 11:22 UTC (permalink / raw)
To: Paolo Abeni, edumazet@google.com, parav@nvidia.com,
linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
kuba@kernel.org, stephen@networkplumber.org,
xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
liuhangbin@gmail.com, shuah@kernel.org,
linux-kselftest@vger.kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Wednesday, November 19, 2025 11:40 AM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; parav@nvidia.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@google.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
>
> On 11/19/25 11:24 AM, Chia-Yu Chang (Nokia) wrote:
> > I was thinking to totally remove ECN from Rx path,
>
> ??? do you mean you intend to remove the existing virtio_net ECN support? I guess/hope I misread the above.
>
> Note that removing features from virtio_net is an extreme pain at best, and more probably simply impossible - see the UFO removal history.
>
> Please clarify, thanks!
>
> Paolo
The ECN flag (SKB_GSO_TCP_ECN) shall not be used on the Rx path in a forwarding scenario, but it can still be used on the Tx path in virtio_net.
On the Rx path, the new ACCECN flag (SKB_GSO_TCP_ACCECN) shall be used instead, so that a later GSO Tx does not break the CWR flag in a forwarding scenario.

Let me borrow an example from Ilpo:

SKB_GSO_TCP_ECN does not reproduce the original TCP header flags in a forwarding scenario. Suppose these segments arrive:
  Segment 1 CWR set
  Segment 2 CWR set

After GRO Rx and GSO Tx with SKB_GSO_TCP_ECN, the forwarder outputs these segments:
  Segment 1 CWR set
  Segment 2 CWR cleared

Thus, the ACE field in Segment 2 no longer contains the value it was sent with.

So, maybe the table below represents this better?
+===============+======================+===========================+
|               | SKB_GSO_TCP_ECN      | SKB_GSO_TCP_ACCECN        |
+===============+======================+===========================+
|               | The 1st TCP segment  | The TCP segment uses      |
| Tx path       | has CWR set and      | the CWR flag as part of   |
|               | subsequent segments  | ACE signal, and the CWR   |
|               | have CWR cleared.    | flag is not modified.     |
+---------------+----------------------+---------------------------+
| Rx path       | Shall not be used to | Used to tell a later      |
| of forwarding | avoid potential ACE  | GSO Tx NOT to clear CWR   |
| scenario      | signal corruption.   | flag from 2nd segment on. |
+===============+======================+===========================+
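
To make the Tx row concrete, here is a minimal userspace sketch (not kernel code) of the per-segment CWR handling described in the table. The DEMO_* names, their values, and demo_segment_cwr() are hypothetical stand-ins for illustration only; they are not the kernel's gso_type bits or its segmentation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two gso_type bits; values are placeholders. */
#define DEMO_GSO_TCP_ECN    0x1u
#define DEMO_GSO_TCP_ACCECN 0x2u

/*
 * Fill cwr_out[i] with the CWR flag that output segment i would carry,
 * given the CWR flag of the aggregated (GRO'ed) TCP header.
 * Assumption: CWR is cleared from the 2nd segment on only when the ECN
 * bit is set and the ACCECN bit is not.
 */
static void demo_segment_cwr(unsigned int gso_type, bool cwr_in,
                             bool *cwr_out, size_t nsegs)
{
        bool clear_after_first = (gso_type & DEMO_GSO_TCP_ECN) &&
                                 !(gso_type & DEMO_GSO_TCP_ACCECN);

        assert(cwr_out != NULL);

        for (size_t i = 0; i < nsegs; i++) {
                if (i > 0 && clear_after_first)
                        cwr_out[i] = false;  /* RFC 3168 style: CWR on 1st only */
                else
                        cwr_out[i] = cwr_in; /* CWR copied unmodified */
        }
}
```

With two segments and CWR set on input, the ECN variant yields (set, cleared), which is the corruption shown in Ilpo's example, while the ACCECN variant yields (set, set).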
Chia-Yu
* RE: [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
2025-11-19 11:22 ` Chia-Yu Chang (Nokia)
@ 2025-11-26 8:48 ` Chia-Yu Chang (Nokia)
0 siblings, 0 replies; 26+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-11-26 8:48 UTC (permalink / raw)
To: Paolo Abeni, edumazet@google.com, parav@nvidia.com,
linux-doc@vger.kernel.org, corbet@lwn.net, horms@kernel.org,
dsahern@kernel.org, kuniyu@google.com, bpf@vger.kernel.org,
netdev@vger.kernel.org, dave.taht@gmail.com, jhs@mojatatu.com,
kuba@kernel.org, stephen@networkplumber.org,
xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
liuhangbin@gmail.com, shuah@kernel.org,
linux-kselftest@vger.kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
Hi Paolo,

I am thinking of moving this patch to a later series, in which we would like to add the ACCECN flag for virtio_net and replace some existing uses of SKB_GSO_TCP_ECN on the Rx path.
Hope this is OK for you.

For instance, in mlx5e_shampo_update_ipv4_tcp_hdr() of drivers/net/ethernet/mellanox/mlx5/core/en_rx.c, SKB_GSO_TCP_ECN is still used on the Rx path today.
This needs to be either changed to SKB_GSO_TCP_ACCECN or removed entirely to avoid potential CWR corruption.

For the virtio_net Rx path, our planned change is to first check whether ACCECN is set and, if so, translate VIRTIO_NET_HDR_GSO_ACCECN to SKB_GSO_TCP_ACCECN.
If VIRTIO_NET_HDR_GSO_ACCECN is not set but VIRTIO_NET_HDR_GSO_ECN is set, then VIRTIO_NET_HDR_GSO_ECN will be translated to SKB_GSO_TCP_ECN.
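
The planned precedence can be sketched roughly as below. This is only an illustration of the ordering, not the actual virtio_net change: the DEMO_* constants, their values, and demo_virtio_rx_to_skb_gso() are hypothetical placeholders, and VIRTIO_NET_HDR_GSO_ACCECN itself is still only a planned flag.

```c
#include <assert.h>

/* Hypothetical placeholder values, not the real virtio or skb flag bits. */
#define DEMO_VIRTIO_GSO_ECN     0x080u
#define DEMO_VIRTIO_GSO_ACCECN  0x100u
#define DEMO_SKB_GSO_TCP_ECN    0x004u
#define DEMO_SKB_GSO_TCP_ACCECN 0x008u

/*
 * Translate the ECN-related bits of a virtio header gso_type into the
 * matching skb gso_type bit. ACCECN is checked first, so it wins when
 * both bits are present; ECN is only used as the fallback.
 */
static unsigned int demo_virtio_rx_to_skb_gso(unsigned int hdr_gso_type)
{
        if (hdr_gso_type & DEMO_VIRTIO_GSO_ACCECN)
                return DEMO_SKB_GSO_TCP_ACCECN;
        if (hdr_gso_type & DEMO_VIRTIO_GSO_ECN)
                return DEMO_SKB_GSO_TCP_ECN;
        return 0; /* neither flag set: no ECN-related gso_type bit */
}
```

Checking ACCECN before ECN means a header carrying both bits never ends up on the CWR-clearing path.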
Chia-Yu
Thread overview: 26+ messages
2025-11-14 7:13 [PATCH v6 net-next 00/14] AccECN protocol case handling series chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 01/14] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 02/14] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 03/14] net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN chia-yu.chang
2025-11-18 12:02 ` Paolo Abeni
2025-11-19 10:24 ` Chia-Yu Chang (Nokia)
2025-11-19 10:40 ` Paolo Abeni
2025-11-19 10:43 ` Paolo Abeni
2025-11-19 11:22 ` Chia-Yu Chang (Nokia)
2025-11-26 8:48 ` Chia-Yu Chang (Nokia)
2025-11-14 7:13 ` [PATCH v6 net-next 04/14] selftests/net: gro: add self-test for TCP CWR flag chia-yu.chang
2025-11-18 12:14 ` Paolo Abeni
2025-11-14 7:13 ` [PATCH v6 net-next 05/14] tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers chia-yu.chang
2025-11-18 12:30 ` Paolo Abeni
2025-11-14 7:13 ` [PATCH v6 net-next 06/14] tcp: disable RFC3168 fallback identifier for CC modules chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 07/14] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 08/14] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 09/14] tcp: add TCP_SYNACK_RETRANS synack_type chia-yu.chang
2025-11-18 12:32 ` Paolo Abeni
2025-11-14 7:13 ` [PATCH v6 net-next 10/14] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
2025-11-18 13:58 ` Paolo Abeni
2025-11-19 10:32 ` Chia-Yu Chang (Nokia)
2025-11-14 7:13 ` [PATCH v6 net-next 11/14] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 12/14] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 13/14] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
2025-11-14 7:13 ` [PATCH v6 net-next 14/14] tcp: accecn: enable AccECN chia-yu.chang