* [PATCH v4 net-next 00/13] AccECN protocol case handling series
@ 2025-10-10 13:17 chia-yu.chang
2025-10-10 14:53 ` Paolo Abeni
0 siblings, 1 reply; 27+ messages in thread
From: chia-yu.chang @ 2025-10-10 13:17 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Hello,
Plesae find the v4 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
v4:
- Add previous #13 in v2 back after dicussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.
v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1.
- Change TCP_CONG_WANTS_ECT_1 into individual flag add helper function INET_ECN_xmit_wants_ect_1() in #3.
- Add empty line between variable declarations and code in #4.
- Update commit message to fix old AccECN commits in #5.
- Remove unnecessary brackets in #10.
- Move patch #3 in v2 to a later Prague patch serise and remove patch #13 in v2.
---
Chia-Yu Chang (11):
tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: move increment of num_retrans
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
tcp: accecn: detect loss ACK w/ AccECN option and add
TCP_ACCECN_OPTION_PERSIST
tcp: accecn: enable AccECN
Ilpo Järvinen (2):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
Documentation/networking/ip-sysctl.rst | 2 +
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 4 +-
include/net/inet_ecn.h | 20 +++-
include/net/tcp.h | 32 ++++++-
include/net/tcp_ecn.h | 92 ++++++++++++++-----
net/ipv4/sysctl_net_ipv4.c | 4 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 10 +-
net/ipv4/tcp_input.c | 58 ++++++++++--
net/ipv4/tcp_minisocks.c | 40 +++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 42 ++++++---
13 files changed, 240 insertions(+), 70 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 net-next 00/13] AccECN protocol case handling series
2025-10-10 13:17 chia-yu.chang
@ 2025-10-10 14:53 ` Paolo Abeni
2025-10-10 14:55 ` Chia-Yu Chang (Nokia)
0 siblings, 1 reply; 27+ messages in thread
From: Paolo Abeni @ 2025-10-10 14:53 UTC (permalink / raw)
To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
On 10/10/25 3:17 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> Hello,
>
> Plesae find the v4 AccECN case handling patch series, which covers
> several excpetional case handling of Accurate ECN spec (RFC9768),
> adds new identifiers to be used by CC modules, adds ecn_delta into
> rate_sample, and keeps the ACE counter for computation, etc.
>
> This patch series is part of the full AccECN patch series, which is available at
> https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
>
> Best regards,
> Chia-Yu
## Form letter - net-next-closed
The merge window for v6.18 has begun and therefore net-next is closed
for new drivers, features, code refactoring and optimizations. We are
currently accepting bug fixes only.
Please repost when net-next reopens after October 12th.
RFC patches sent for review only are obviously welcome at any time.
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH v4 net-next 00/13] AccECN protocol case handling series
2025-10-10 14:53 ` Paolo Abeni
@ 2025-10-10 14:55 ` Chia-Yu Chang (Nokia)
0 siblings, 0 replies; 27+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-10-10 14:55 UTC (permalink / raw)
To: Paolo Abeni, edumazet@google.com, linux-doc@vger.kernel.org,
corbet@lwn.net, horms@kernel.org, dsahern@kernel.org,
kuniyu@amazon.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
dave.taht@gmail.com, jhs@mojatatu.com, kuba@kernel.org,
stephen@networkplumber.org, xiyou.wangcong@gmail.com,
jiri@resnulli.us, davem@davemloft.net, andrew+netdev@lunn.ch,
donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
shuah@kernel.org, linux-kselftest@vger.kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire@apple.com, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, vidhi_goel@apple.com
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Friday, October 10, 2025 4:54 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
> Subject: Re: [PATCH v4 net-next 00/13] AccECN protocol case handling series
>
>
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
> On 10/10/25 3:17 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > Hello,
> >
> > Plesae find the v4 AccECN case handling patch series, which covers
> > several excpetional case handling of Accurate ECN spec (RFC9768), adds
> > new identifiers to be used by CC modules, adds ecn_delta into
> > rate_sample, and keeps the ACE counter for computation, etc.
> >
> > This patch series is part of the full AccECN patch series, which is
> > available at
> > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> > ub.com%2FL4STeam%2Flinux-net-next%2Fcommits%2Fupstream_l4steam%2F&data
> > =05%7C02%7Cchia-yu.chang%40nokia-bell-labs.com%7Cb901188c311e446253840
> > 8de080ccfff%7C5d4717519675428d917b70f44f9630b0%7C0%7C0%7C6389570482857
> > 32929%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAw
> > MCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdat
> > a=43fGh93FKg1yBBBhGTNzy8tfJvm5Y%2FYmdABUmTuD%2FLU%3D&reserved=0
> >
> > Best regards,
> > Chia-Yu
>
> ## Form letter - net-next-closed
>
> The merge window for v6.18 has begun and therefore net-next is closed for new drivers, features, code refactoring and optimizations. We are currently accepting bug fixes only.
>
> Please repost when net-next reopens after October 12th.
>
> RFC patches sent for review only are obviously welcome at any time.
Sorry for spamming, I will repost after October 12th.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 00/13] AccECN protocol case handling series
@ 2025-10-13 17:03 chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 01/13] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
` (13 more replies)
0 siblings, 14 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Hello,
Plesae find the v4 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
v4:
- Add previous #13 in v2 back after dicussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.
v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1.
- Change TCP_CONG_WANTS_ECT_1 into individual flag add helper function INET_ECN_xmit_wants_ect_1() in #3.
- Add empty line between variable declarations and code in #4.
- Update commit message to fix old AccECN commits in #5.
- Remove unnecessary brackets in #10.
- Move patch #3 in v2 to a later Prague patch serise and remove patch #13 in v2.
---
Chia-Yu Chang (11):
tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: move increment of num_retrans
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
tcp: accecn: detect loss ACK w/ AccECN option and add
TCP_ACCECN_OPTION_PERSIST
tcp: accecn: enable AccECN
Ilpo Järvinen (2):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
Documentation/networking/ip-sysctl.rst | 4 +-
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 4 +-
include/net/inet_ecn.h | 20 +++-
include/net/tcp.h | 32 ++++++-
include/net/tcp_ecn.h | 92 ++++++++++++++-----
net/ipv4/sysctl_net_ipv4.c | 4 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 10 +-
net/ipv4/tcp_input.c | 58 ++++++++++--
net/ipv4/tcp_minisocks.c | 40 +++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 42 ++++++---
13 files changed, 241 insertions(+), 71 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 01/13] tcp: try to avoid safer when ACKs are thinned
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
` (12 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Ilpo Järvinen <ij@kernel.org>
Add newly acked pkts EWMA. When ACK thinning occurs, select
between safer and unsafe cep delta in AccECN processing based
on it. If the packets ACKed per ACK tends to be large, don't
conservatively assume ACE field overflow.
This patch uses the existing 2-byte holes in the rx group for new
u16 variables withtout creating more holes. Below are the pahole
outcomes before and after this patch:
[BEFORE THIS PATCH]
struct tcp_sock {
[...]
u32 delivered_ecn_bytes[3]; /* 2744 12 */
/* XXX 4 bytes hole, try to pack */
[...]
__cacheline_group_end__tcp_sock_write_rx[0]; /* 2816 0 */
[...]
/* size: 3264, cachelines: 51, members: 177 */
}
[AFTER THIS PATCH]
struct tcp_sock {
[...]
u32 delivered_ecn_bytes[3]; /* 2744 12 */
u16 pkts_acked_ewma; /* 2756 2 */
/* XXX 2 bytes hole, try to pack */
[...]
__cacheline_group_end__tcp_sock_write_rx[0]; /* 2816 0 */
[...]
/* size: 3264, cachelines: 51, members: 178 */
}
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v3:
- Add additional min() check if pkts_acked_ewma is not initialized.
---
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 1 +
net/ipv4/tcp.c | 2 ++
net/ipv4/tcp_input.c | 20 ++++++++++++++++++-
4 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 26f32dbcf6ec..563daea10d6c 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -105,6 +105,7 @@ u32 received_ce read_mostly read_w
u32[3] received_ecn_bytes read_mostly read_write
u8:4 received_ce_pending read_mostly read_write
u32[3] delivered_ecn_bytes read_write
+u16 pkts_acked_ewma read_write
u8:2 syn_ect_snt write_mostly read_write
u8:2 syn_ect_rcv read_mostly read_write
u8:2 accecn_minlen write_mostly read_write
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 20b8c6e21fef..683f38362977 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -345,6 +345,7 @@ struct tcp_sock {
u32 rate_interval_us; /* saved rate sample: time elapsed */
u32 rcv_rtt_last_tsecr;
u32 delivered_ecn_bytes[3];
+ u16 pkts_acked_ewma;/* Pkts acked EWMA for AccECN cep heuristic */
u64 first_tx_mstamp; /* start of window send phase */
u64 delivered_mstamp; /* time we reached "delivered" */
u64 bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a18aeca7ab0..f68d2ba619d9 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3419,6 +3419,7 @@ int tcp_disconnect(struct sock *sk, int flags)
tcp_accecn_init_counters(tp);
tp->prev_ecnfield = 0;
tp->accecn_opt_tstamp = 0;
+ tp->pkts_acked_ewma = 0;
if (icsk->icsk_ca_initialized && icsk->icsk_ca_ops->release)
icsk->icsk_ca_ops->release(sk);
memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
@@ -5180,6 +5181,7 @@ static void __init tcp_struct_check(void)
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, rate_interval_us);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, rcv_rtt_last_tsecr);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, delivered_ecn_bytes);
+ CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, pkts_acked_ewma);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, first_tx_mstamp);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, delivered_mstamp);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_rx, bytes_acked);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 31ea5af49f2d..8af527f4e2b5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -488,6 +488,10 @@ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
tcp_count_delivered_ce(tp, delivered);
}
+#define PKTS_ACKED_WEIGHT 6
+#define PKTS_ACKED_PREC 6
+#define ACK_COMP_THRESH 4
+
/* Returns the ECN CE delta */
static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
u32 delivered_pkts, u32 delivered_bytes,
@@ -499,6 +503,7 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
u32 delta, safe_delta, d_ceb;
bool opt_deltas_valid;
u32 corrected_ace;
+ u32 ewma;
/* Reordered ACK or uncertain due to lack of data to send and ts */
if (!(flag & (FLAG_FORWARD_PROGRESS | FLAG_TS_PROGRESS)))
@@ -507,6 +512,18 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
opt_deltas_valid = tcp_accecn_process_option(tp, skb,
delivered_bytes, flag);
+ if (delivered_pkts) {
+ if (!tp->pkts_acked_ewma) {
+ ewma = delivered_pkts << PKTS_ACKED_PREC;
+ } else {
+ ewma = tp->pkts_acked_ewma;
+ ewma = (((ewma << PKTS_ACKED_WEIGHT) - ewma) +
+ (delivered_pkts << PKTS_ACKED_PREC)) >>
+ PKTS_ACKED_WEIGHT;
+ }
+ tp->pkts_acked_ewma = min_t(u32, ewma, 0xFFFFU);
+ }
+
if (!(flag & FLAG_SLOWPATH)) {
/* AccECN counter might overflow on large ACKs */
if (delivered_pkts <= TCP_ACCECN_CEP_ACE_MASK)
@@ -555,7 +572,8 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
if (d_ceb <
safe_delta * tp->mss_cache >> TCP_ACCECN_SAFETY_SHIFT)
return delta;
- }
+ } else if (tp->pkts_acked_ewma > (ACK_COMP_THRESH << PKTS_ACKED_PREC))
+ return delta;
return safe_delta;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 01/13] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-16 9:17 ` Paolo Abeni
2025-10-13 17:03 ` [PATCH v4 net-next 03/13] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
` (11 subsequent siblings)
13 siblings, 1 reply; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Ilpo Järvinen <ij@kernel.org>
As AccECN may keep CWR bit asserted due to different
interpretation of the bit, flushing with GRO because of
CWR may effectively disable GRO until AccECN counter
field changes such that CWR-bit becomes 0.
There is no harm done from not immediately forwarding the
CWR'ed segment with RFC3168 ECN.
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
net/ipv4/tcp_offload.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 2cb93da93abc..fcbf4148919c 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -330,8 +330,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
goto out_check_final;
th2 = tcp_hdr(p);
- flush = (__force int)(flags & TCP_FLAG_CWR);
- flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
+ flush = (__force int)((flags ^ tcp_flag_word(th2)) &
~(TCP_FLAG_FIN | TCP_FLAG_PSH));
flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
for (i = sizeof(*th); i < thlen; i += 4)
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 03/13] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 01/13] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 04/13] tcp: disable RFC3168 fallback identifier " chia-yu.chang
` (10 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang, Olivier Tilmans
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Two CA module flags are added in this patch. First, a new CA module
flag (TCP_CONG_NEEDS_ACCECN) defines that the CA expects to negotiate
AccECN functionality using the ECE, CWR and AE flags in the TCP header.
The detailed AccECN negotiaotn during the 3WHS can be found in the
AccECN spec:
https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
Second, when ECN is negociated for a TCP flow, it defaults to use
ECT(0) in the IP header. L4S service, however, needs to se ECT(1).
This patch enables CA to control whether ECT(0) or ECT(1) should
be used on a per-segment basis. A new flag (TCP_CONG_WANTS_ECT_1)
defines the behavior expected by the CA when not-yet initialized for
the connection.
Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v3:
- Change TCP_CONG_WANTS_ECT_1 into individual flag.
- Add helper function INET_ECN_xmit_wants_ect_1().
---
include/net/inet_ecn.h | 20 +++++++++++++++++---
include/net/tcp.h | 22 +++++++++++++++++++++-
include/net/tcp_ecn.h | 13 ++++++++++---
net/ipv4/tcp_cong.c | 10 +++++++---
net/ipv4/tcp_input.c | 3 ++-
net/ipv4/tcp_output.c | 10 ++++++++--
6 files changed, 65 insertions(+), 13 deletions(-)
diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index ea32393464a2..827b87a95dab 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -51,11 +51,25 @@ static inline __u8 INET_ECN_encapsulate(__u8 outer, __u8 inner)
return outer;
}
+/* Apply either ECT(0) or ECT(1) */
+static inline void __INET_ECN_xmit(struct sock *sk, bool use_ect_1)
+{
+ __u8 ect = use_ect_1 ? INET_ECN_ECT_1 : INET_ECN_ECT_0;
+
+ /* Mask the complete byte in case the connection alternates between
+ * ECT(0) and ECT(1).
+ */
+ inet_sk(sk)->tos &= ~INET_ECN_MASK;
+ inet_sk(sk)->tos |= ect;
+ if (inet6_sk(sk)) {
+ inet6_sk(sk)->tclass &= ~INET_ECN_MASK;
+ inet6_sk(sk)->tclass |= ect;
+ }
+}
+
static inline void INET_ECN_xmit(struct sock *sk)
{
- inet_sk(sk)->tos |= INET_ECN_ECT_0;
- if (inet6_sk(sk) != NULL)
- inet6_sk(sk)->tclass |= INET_ECN_ECT_0;
+ __INET_ECN_xmit(sk, false);
}
static inline void INET_ECN_dontxmit(struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5ca230ed526a..8467c83e4aee 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -403,6 +403,7 @@ static inline void tcp_dec_quickack_mode(struct sock *sk)
#define TCP_ECN_DEMAND_CWR BIT(2)
#define TCP_ECN_SEEN BIT(3)
#define TCP_ECN_MODE_ACCECN BIT(4)
+#define TCP_ECN_ECT_1 BIT(5)
#define TCP_ECN_DISABLED 0
#define TCP_ECN_MODE_PENDING (TCP_ECN_MODE_RFC3168 | TCP_ECN_MODE_ACCECN)
@@ -1190,7 +1191,12 @@ enum tcp_ca_ack_event_flags {
#define TCP_CONG_NON_RESTRICTED BIT(0)
/* Requires ECN/ECT set on all packets */
#define TCP_CONG_NEEDS_ECN BIT(1)
-#define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN)
+/* Require successfully negotiated AccECN capability */
+#define TCP_CONG_NEEDS_ACCECN BIT(2)
+/* Use ECT(1) instead of ECT(0) while the CA is uninitialized */
+#define TCP_CONG_WANTS_ECT_1 BIT(3)
+#define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN | \
+ TCP_CONG_NEEDS_ACCECN | TCP_CONG_WANTS_ECT_1)
union tcp_cc_info;
@@ -1322,6 +1328,20 @@ static inline bool tcp_ca_needs_ecn(const struct sock *sk)
return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ECN;
}
+static inline bool tcp_ca_needs_accecn(const struct sock *sk)
+{
+ const struct inet_connection_sock *icsk = inet_csk(sk);
+
+ return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ACCECN;
+}
+
+static inline bool tcp_ca_wants_ect_1(const struct sock *sk)
+{
+ const struct inet_connection_sock *icsk = inet_csk(sk);
+
+ return icsk->icsk_ca_ops->flags & TCP_CONG_WANTS_ECT_1;
+}
+
static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index f13e5cd2b1ac..0cc698a8438c 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,6 +31,12 @@ enum tcp_accecn_option {
TCP_ACCECN_OPTION_FULL = 2,
};
+/* Apply either ECT(0) or ECT(1) based on TCP_CONG_WANTS_ECT_1 flag */
+static inline void INET_ECN_xmit_wants_ect_1(struct sock *sk)
+{
+ __INET_ECN_xmit(sk, tcp_ca_wants_ect_1(sk));
+}
+
static inline void tcp_ecn_queue_cwr(struct tcp_sock *tp)
{
/* Do not set CWR if in AccECN mode! */
@@ -561,7 +567,7 @@ static inline void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb)
TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ECE;
else if (tcp_ca_needs_ecn(sk) ||
tcp_bpf_ca_needs_ecn(sk))
- INET_ECN_xmit(sk);
+ INET_ECN_xmit_wants_ect_1(sk);
if (tp->ecn_flags & TCP_ECN_MODE_ACCECN) {
TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ACE;
@@ -579,7 +585,8 @@ static inline void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
bool use_ecn, use_accecn;
u8 tcp_ecn = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn);
- use_accecn = tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ACCECN;
+ use_accecn = tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ACCECN ||
+ tcp_ca_needs_accecn(sk);
use_ecn = tcp_ecn == TCP_ECN_IN_ECN_OUT_ECN ||
tcp_ecn == TCP_ECN_IN_ACCECN_OUT_ECN ||
tcp_ca_needs_ecn(sk) || bpf_needs_ecn || use_accecn;
@@ -595,7 +602,7 @@ static inline void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
if (use_ecn) {
if (tcp_ca_needs_ecn(sk) || bpf_needs_ecn)
- INET_ECN_xmit(sk);
+ INET_ECN_xmit_wants_ect_1(sk);
TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR;
if (use_accecn) {
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index df758adbb445..1a8ed6983ac3 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -16,6 +16,7 @@
#include <linux/gfp.h>
#include <linux/jhash.h>
#include <net/tcp.h>
+#include <net/tcp_ecn.h>
#include <trace/events/tcp.h>
static DEFINE_SPINLOCK(tcp_cong_list_lock);
@@ -227,7 +228,7 @@ void tcp_assign_congestion_control(struct sock *sk)
memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
if (ca->flags & TCP_CONG_NEEDS_ECN)
- INET_ECN_xmit(sk);
+ INET_ECN_xmit_wants_ect_1(sk);
else
INET_ECN_dontxmit(sk);
}
@@ -240,7 +241,10 @@ void tcp_init_congestion_control(struct sock *sk)
if (icsk->icsk_ca_ops->init)
icsk->icsk_ca_ops->init(sk);
if (tcp_ca_needs_ecn(sk))
- INET_ECN_xmit(sk);
+ /* The CA is already initialized, expect it to set the
+ * appropriate flag to select ECT(1).
+ */
+ __INET_ECN_xmit(sk, tcp_sk(sk)->ecn_flags & TCP_ECN_ECT_1);
else
INET_ECN_dontxmit(sk);
icsk->icsk_ca_initialized = 1;
@@ -257,7 +261,7 @@ static void tcp_reinit_congestion_control(struct sock *sk,
memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
if (ca->flags & TCP_CONG_NEEDS_ECN)
- INET_ECN_xmit(sk);
+ INET_ECN_xmit_wants_ect_1(sk);
else
INET_ECN_dontxmit(sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8af527f4e2b5..27fba14bb3b2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7233,7 +7233,8 @@ static void tcp_ecn_create_request(struct request_sock *req,
u32 ecn_ok_dst;
if (tcp_accecn_syn_requested(th) &&
- READ_ONCE(net->ipv4.sysctl_tcp_ecn) >= 3) {
+ (READ_ONCE(net->ipv4.sysctl_tcp_ecn) >= 3 ||
+ tcp_ca_needs_accecn(listen_sk))) {
inet_rsk(req)->ecn_ok = 1;
tcp_rsk(req)->accecn_ok = 1;
tcp_rsk(req)->syn_ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bb3576ac0ad7..c7b884f173d0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -328,12 +328,17 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
struct tcphdr *th, int tcp_header_len)
{
struct tcp_sock *tp = tcp_sk(sk);
+ bool ecn_ect_1;
if (!tcp_ecn_mode_any(tp))
return;
+ ecn_ect_1 = tp->ecn_flags & TCP_ECN_ECT_1;
+ if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+ __INET_ECN_xmit(sk, true);
+
if (tcp_ecn_mode_accecn(tp)) {
- if (!tcp_accecn_ace_fail_recv(tp))
+ if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
INET_ECN_xmit(sk);
tcp_accecn_set_ace(tp, skb, th);
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
@@ -341,7 +346,8 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
/* Not-retransmitted data segment: set ECT and inject CWR. */
if (skb->len != tcp_header_len &&
!before(TCP_SKB_CB(skb)->seq, tp->snd_nxt)) {
- INET_ECN_xmit(sk);
+ if (!ecn_ect_1)
+ INET_ECN_xmit(sk);
if (tp->ecn_flags & TCP_ECN_QUEUE_CWR) {
tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
th->cwr = 1;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 04/13] tcp: disable RFC3168 fallback identifier for CC modules
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (2 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 03/13] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 05/13] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
` (9 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
When AccECN is not successfully negociated for a TCP flow, it defaults
fallback to classic ECN (RFC3168). However, L4S service will fallback
to non-ECN.
This patch enables congestion control module to control whether it
should not fallback to classic ECN after unsuccessful AccECN negotiation.
A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) identifies this
behavior expected by the CA.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v3:
- Add empty line between variable declarations and code.
---
include/net/tcp.h | 12 +++++++++++-
include/net/tcp_ecn.h | 11 ++++++++---
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_minisocks.c | 7 ++++---
4 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 8467c83e4aee..08ac4433535f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1195,8 +1195,11 @@ enum tcp_ca_ack_event_flags {
#define TCP_CONG_NEEDS_ACCECN BIT(2)
/* Use ECT(1) instead of ECT(0) while the CA is uninitialized */
#define TCP_CONG_WANTS_ECT_1 BIT(3)
+/* Cannot fallback to RFC3168 during AccECN negotiation */
+#define TCP_CONG_NO_FALLBACK_RFC3168 BIT(4)
#define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN | \
- TCP_CONG_NEEDS_ACCECN | TCP_CONG_WANTS_ECT_1)
+ TCP_CONG_NEEDS_ACCECN | TCP_CONG_WANTS_ECT_1 | \
+ TCP_CONG_NO_FALLBACK_RFC3168)
union tcp_cc_info;
@@ -1335,6 +1338,13 @@ static inline bool tcp_ca_needs_accecn(const struct sock *sk)
return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ACCECN;
}
+static inline bool tcp_ca_no_fallback_rfc3168(const struct sock *sk)
+{
+ const struct inet_connection_sock *icsk = inet_csk(sk);
+
+ return icsk->icsk_ca_ops->flags & TCP_CONG_NO_FALLBACK_RFC3168;
+}
+
static inline bool tcp_ca_wants_ect_1(const struct sock *sk)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 0cc698a8438c..a7ba21d298ff 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -507,7 +507,9 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
* | ECN | AccECN | 0 0 1 | Classic ECN |
* +========+========+============+=============+
*/
- if (tcp_ecn_mode_pending(tp))
+ if (tcp_ca_no_fallback_rfc3168(sk))
+ tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
+ else if (tcp_ecn_mode_pending(tp))
/* Downgrade from AccECN, or requested initially */
tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
break;
@@ -531,9 +533,11 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
}
}
-static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
+static inline void tcp_ecn_rcv_syn(struct sock *sk, const struct tcphdr *th,
const struct sk_buff *skb)
{
+ struct tcp_sock *tp = tcp_sk(sk);
+
if (tcp_ecn_mode_pending(tp)) {
if (!tcp_accecn_syn_requested(th)) {
/* Downgrade to classic ECN feedback */
@@ -545,7 +549,8 @@ static inline void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th,
tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
}
}
- if (tcp_ecn_mode_rfc3168(tp) && (!th->ece || !th->cwr))
+ if (tcp_ecn_mode_rfc3168(tp) &&
+ (!th->ece || !th->cwr || tcp_ca_no_fallback_rfc3168(sk)))
tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 27fba14bb3b2..4a6abf536dbe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6828,7 +6828,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
tp->snd_wl1 = TCP_SKB_CB(skb)->seq;
tp->max_window = tp->snd_wnd;
- tcp_ecn_rcv_syn(tp, th, skb);
+ tcp_ecn_rcv_syn(sk, th, skb);
tcp_mtup_init(sk);
tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 2ec8c6f1cdcc..1fade94813c6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -488,9 +488,10 @@ static void tcp_ecn_openreq_child(struct sock *sk,
tp->accecn_opt_demand = 1;
tcp_ecn_received_counters_payload(sk, skb);
} else {
- tcp_ecn_mode_set(tp, inet_rsk(req)->ecn_ok ?
- TCP_ECN_MODE_RFC3168 :
- TCP_ECN_DISABLED);
+ if (inet_rsk(req)->ecn_ok && !tcp_ca_no_fallback_rfc3168(sk))
+ tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
+ else
+ tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 05/13] tcp: accecn: handle unexpected AccECN negotiation feedback
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (3 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 04/13] tcp: disable RFC3168 fallback identifier " chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-16 9:02 ` Paolo Abeni
2025-10-13 17:03 ` [PATCH v4 net-next 06/13] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
` (8 subsequent siblings)
13 siblings, 1 reply; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
According to Section 3.1.2 of AccECN spec (RFC9768), if a TCP Client
has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1)
then receives a SYN/ACK with the currently reserved combination
(AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a
combination, the Client MUST enable AccECN mode as if the SYN/ACK
confirmed that the Server supported AccECN and as if it fed back that
the IP-ECN field on the SYN had arrived unchanged.
This patch fix an incorrect AccECN negoation of commit 3cae34274c79
("tcp: accecn: AccECN negotiation").
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v3:
- Update commit message to fix old AccECN commits.
---
include/net/tcp_ecn.h | 44 ++++++++++++++++++++++++++++++-------------
1 file changed, 31 insertions(+), 13 deletions(-)
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index a7ba21d298ff..c66f0d944e1c 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -473,6 +473,26 @@ static inline u8 tcp_accecn_option_init(const struct sk_buff *skb,
return TCP_ACCECN_OPT_COUNTER_SEEN;
}
+static inline void tcp_ecn_rcv_synack_accecn(struct sock *sk,
+ const struct sk_buff *skb, u8 dsf)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
+ tp->syn_ect_rcv = dsf & INET_ECN_MASK;
+ /* Demand Accurate ECN option in response to the SYN on the SYN/ACK
+ * and the TCP server will try to send one more packet with an AccECN
+ * Option at a later point during the connection.
+ */
+ if (tp->rx_opt.accecn &&
+ tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
+ u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
+
+ tcp_accecn_saw_opt_fail_recv(tp, saw_opt);
+ tp->accecn_opt_demand = 2;
+ }
+}
+
/* See Table 2 of the AccECN draft */
static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb,
const struct tcphdr *th, u8 ip_dsfield)
@@ -495,13 +515,11 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
break;
case 0x1:
- case 0x5:
/* +========+========+============+=============+
* | A | B | SYN/ACK | Feedback |
* | | | B->A | Mode of A |
* | | | AE CWR ECE | |
* +========+========+============+=============+
- * | AccECN | Nonce | 1 0 1 | (Reserved) |
* | AccECN | ECN | 0 0 1 | Classic ECN |
* | Nonce | AccECN | 0 0 1 | Classic ECN |
* | ECN | AccECN | 0 0 1 | Classic ECN |
@@ -509,20 +527,20 @@ static inline void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb
*/
if (tcp_ca_no_fallback_rfc3168(sk))
tcp_ecn_mode_set(tp, TCP_ECN_DISABLED);
- else if (tcp_ecn_mode_pending(tp))
- /* Downgrade from AccECN, or requested initially */
+ else
tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168);
break;
- default:
- tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN);
- tp->syn_ect_rcv = ip_dsfield & INET_ECN_MASK;
- if (tp->rx_opt.accecn &&
- tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
- u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
-
- tcp_accecn_saw_opt_fail_recv(tp, saw_opt);
- tp->accecn_opt_demand = 2;
+ case 0x5:
+ if (tcp_ecn_mode_pending(tp)) {
+ tcp_ecn_rcv_synack_accecn(sk, skb, ip_dsfield);
+ if (INET_ECN_is_ce(ip_dsfield)) {
+ tp->received_ce++;
+ tp->received_ce_pending++;
+ }
}
+ break;
+ default:
+ tcp_ecn_rcv_synack_accecn(sk, skb, ip_dsfield);
if (INET_ECN_is_ce(ip_dsfield) &&
tcp_accecn_validate_syn_feedback(sk, ace,
tp->syn_ect_snt)) {
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 06/13] tcp: accecn: retransmit downgraded SYN in AccECN negotiation
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (4 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 05/13] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 07/13] tcp: move increment of num_retrans chia-yu.chang
` (7 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Based on AccECN spec (RFC9768), if the sender of an AccECN SYN
(the TCP Client) times out before receiving the SYN/ACK, it SHOULD
attempt to negotiate the use of AccECN at least one more time by
continuing to set all three TCP ECN flags (AE,CWR,ECE) = (1,1,1) on
the first retransmitted SYN (using the usual retransmission time-outs).
If this first retransmission also fails to be acknowledged, in
deployment scenarios where AccECN path traversal might be problematic,
the TCP Client SHOULD send subsequent retransmissions of the SYN with
the three TCP-ECN flags cleared (AE,CWR,ECE) = (0,0,0).
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
net/ipv4/tcp_output.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c7b884f173d0..6e6fea8f295b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3563,12 +3563,15 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
tcp_retrans_try_collapse(sk, skb, avail_wnd);
}
- /* RFC3168, section 6.1.1.1. ECN fallback
- * As AccECN uses the same SYN flags (+ AE), this check covers both
- * cases.
- */
- if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) == TCPHDR_SYN_ECN)
- tcp_ecn_clear_syn(sk, skb);
+ if (!tcp_ecn_mode_pending(tp) || icsk->icsk_retransmits > 1) {
+ /* RFC3168, section 6.1.1.1. ECN fallback
+ * As AccECN uses the same SYN flags (+ AE), this check
+ * covers both cases.
+ */
+ if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) ==
+ TCPHDR_SYN_ECN)
+ tcp_ecn_clear_syn(sk, skb);
+ }
/* Update global and local TCP statistics. */
segs = tcp_skb_pcount(skb);
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 07/13] tcp: move increment of num_retrans
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (5 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 06/13] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 08/13] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
` (6 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Before this patch, num_retrans = 0 for the first SYN/ACK and the first
retransmitted SYN/ACK; however, an upcoming change will need to
differentiate between those two conditions. This patch moves the
increment of num_tranns before rtx_syn_ack() so we can distinguish
between these two cases when making SYN/ACK.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
net/ipv4/tcp_output.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6e6fea8f295b..ef008736e3a9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4593,6 +4593,7 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
/* Paired with WRITE_ONCE() in sock_setsockopt() */
if (READ_ONCE(sk->sk_txrehash) == SOCK_TXREHASH_ENABLED)
WRITE_ONCE(tcp_rsk(req)->txhash, net_tx_rndhash());
+ WRITE_ONCE(req->num_retrans, req->num_retrans + 1);
res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_NORMAL,
NULL);
if (!res) {
@@ -4606,7 +4607,8 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
tcp_sk_rw(sk)->total_retrans++;
}
trace_tcp_retransmit_synack(sk, req);
- WRITE_ONCE(req->num_retrans, req->num_retrans + 1);
+ } else {
+ WRITE_ONCE(req->num_retrans, req->num_retrans - 1);
}
return res;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 08/13] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (6 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 07/13] tcp: move increment of num_retrans chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-16 9:13 ` Paolo Abeni
2025-10-13 17:03 ` [PATCH v4 net-next 09/13] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion chia-yu.chang
` (5 subsequent siblings)
13 siblings, 1 reply; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
If the TCP Server has not received an ACK to acknowledge its SYN/ACK
after the normal TCP timeout or it receives a second SYN with a
request for AccECN support, then either the SYN/ACK might just have
been lost, e.g. due to congestion, or a middlebox might be blocking
AccECN Options. To expedite connection setup in deployment scenarios
where AccECN path traversal might be problematic, the TCP Server SHOULD
retransmit the SYN/ACK, but with no AccECN Option.
If this retransmission times out, to expedite connection setup, the TCP
Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0)
and no AccECN Option, but it remains in AccECN feedback mode.
This follows Section 3.2.3.2.2 of AccECN spec (RFC9768).
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
include/net/tcp_ecn.h | 14 ++++++++++----
net/ipv4/tcp_output.c | 2 +-
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c66f0d944e1c..97a3a7f36aff 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -651,10 +651,16 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
static inline void
tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
{
- if (tcp_rsk(req)->accecn_ok)
- tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
- else if (inet_rsk(req)->ecn_ok)
- th->ece = 1;
+ if (req->num_retrans < 1 || req->num_timeout < 1) {
+ if (tcp_rsk(req)->accecn_ok)
+ tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
+ else if (inet_rsk(req)->ecn_ok)
+ th->ece = 1;
+ } else if (tcp_rsk(req)->accecn_ok) {
+ th->ae = 0;
+ th->cwr = 0;
+ th->ece = 0;
+ }
}
static inline bool tcp_accecn_option_beacon_check(const struct sock *sk)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ef008736e3a9..f2e8a068f1d3 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1109,7 +1109,7 @@ static unsigned int tcp_synack_options(const struct sock *sk,
if (treq->accecn_ok &&
READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn_option) &&
- req->num_timeout < 1 && remaining >= TCPOLEN_ACCECN_BASE) {
+ req->num_retrans < 1 && remaining >= TCPOLEN_ACCECN_BASE) {
opts->use_synack_ecn_bytes = 1;
remaining -= tcp_options_fit_accecn(opts, 0, remaining);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 09/13] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (7 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 08/13] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 10/13] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
` (4 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Based on specification:
https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
Based on Section 3.1.5 of AccECN spec (RFC9768), a TCP Server in
AccECN mode MUST NOT set ECT on any packet for the rest of the connection,
if it has received or sent at least one valid SYN or Acceptable SYN/ACK
with (AE,CWR,ECE) = (0,0,0) during the handshake.
In addition, a host in AccECN mode that is feeding back the IP-ECN
field on a SYN or SYN/ACK MUST feed back the IP-ECN field on the
latest valid SYN or acceptable SYN/ACK to arrive.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
include/net/tcp_ecn.h | 4 +++-
net/ipv4/tcp_input.c | 2 ++
net/ipv4/tcp_minisocks.c | 33 +++++++++++++++++++++++----------
net/ipv4/tcp_output.c | 8 +++++---
4 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 97a3a7f36aff..50551430b1fa 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -649,7 +649,8 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
}
static inline void
-tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
+tcp_ecn_make_synack(struct sock *sk, const struct request_sock *req,
+ struct tcphdr *th)
{
if (req->num_retrans < 1 || req->num_timeout < 1) {
if (tcp_rsk(req)->accecn_ok)
@@ -660,6 +661,7 @@ tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
th->ae = 0;
th->cwr = 0;
th->ece = 0;
+ tcp_accecn_fail_mode_set(tcp_sk(sk), TCP_ACCECN_ACE_FAIL_SEND);
}
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4a6abf536dbe..f1e73e264b19 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6207,6 +6207,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
if (th->syn) {
if (tcp_ecn_mode_accecn(tp)) {
accecn_reflector = true;
+ tp->syn_ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
+ INET_ECN_MASK;
if (tp->rx_opt.accecn &&
tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) {
u8 saw_opt = tcp_accecn_option_init(skb, tp->rx_opt.accecn);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 1fade94813c6..ecead2b771fd 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -751,16 +751,29 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
*/
if (!tcp_oow_rate_limited(sock_net(sk), skb,
LINUX_MIB_TCPACKSKIPPEDSYNRECV,
- &tcp_rsk(req)->last_oow_ack_time) &&
-
- !tcp_rtx_synack(sk, req)) {
- unsigned long expires = jiffies;
-
- expires += reqsk_timeout(req, TCP_RTO_MAX);
- if (!fastopen)
- mod_timer_pending(&req->rsk_timer, expires);
- else
- req->rsk_timer.expires = expires;
+ &tcp_rsk(req)->last_oow_ack_time)) {
+ if (tcp_rsk(req)->accecn_ok) {
+ u8 ect_rcv = TCP_SKB_CB(skb)->ip_dsfield &
+ INET_ECN_MASK;
+
+ tcp_rsk(req)->syn_ect_rcv = ect_rcv;
+ if (tcp_accecn_ace(tcp_hdr(skb)) == 0x0) {
+ u8 fail_mode = TCP_ACCECN_ACE_FAIL_RECV;
+
+ tcp_accecn_fail_mode_set(tcp_sk(sk),
+ fail_mode);
+ }
+ }
+ if (!tcp_rtx_synack(sk, req)) {
+ unsigned long expires = jiffies;
+
+ expires += reqsk_timeout(req, TCP_RTO_MAX);
+ if (!fastopen)
+ mod_timer_pending(&req->rsk_timer,
+ expires);
+ else
+ req->rsk_timer.expires = expires;
+ }
}
return NULL;
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f2e8a068f1d3..bd718e342638 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -334,11 +334,13 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
return;
ecn_ect_1 = tp->ecn_flags & TCP_ECN_ECT_1;
- if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+ if (ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp) &&
+ !tcp_accecn_ace_fail_send(tp))
__INET_ECN_xmit(sk, true);
if (tcp_ecn_mode_accecn(tp)) {
- if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp))
+ if (!ecn_ect_1 && !tcp_accecn_ace_fail_recv(tp) &&
+ !tcp_accecn_ace_fail_send(tp))
INET_ECN_xmit(sk);
tcp_accecn_set_ace(tp, skb, th);
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
@@ -3990,7 +3992,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
memset(th, 0, sizeof(struct tcphdr));
th->syn = 1;
th->ack = 1;
- tcp_ecn_make_synack(req, th);
+ tcp_ecn_make_synack((struct sock *)sk, req, th);
th->source = htons(ireq->ir_num);
th->dest = ireq->ir_rmt_port;
skb->mark = ireq->ir_mark;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 10/13] tcp: accecn: fallback outgoing half link to non-AccECN
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (8 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 09/13] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 11/13] tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation chia-yu.chang
` (3 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server
is in AccECN mode and in SYN-RCVD state, and if it receives a value of
zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the
connection the Server MUST NOT set ECT on outgoing packets and MUST
NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it
MUST NOT disable AccECN feedback.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
v3:
- Remove unnecessary brackets.
---
include/net/tcp_ecn.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 50551430b1fa..2dfd2fda085d 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -175,7 +175,9 @@ static inline void tcp_accecn_third_ack(struct sock *sk,
switch (ace) {
case 0x0:
/* Invalid value */
- tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV);
+ if (!TCP_SKB_CB(skb)->sacked)
+ tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV |
+ TCP_ACCECN_OPT_FAIL_RECV);
break;
case 0x7:
case 0x5:
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 11/13] tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (9 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 10/13] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 12/13] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
` (2 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
After successfully negotiating AccECN mode in the handshake, check
the ACE field of the first data ACK. If zero (0b000), non-ECT packets
are sent and any response to CE marking feedback is disabled. This
follows Section 3.2.2.4 of AccECN spec (RFC9768).
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
net/ipv4/tcp_input.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f1e73e264b19..61aada9f3a6f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -495,7 +495,7 @@ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
/* Returns the ECN CE delta */
static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
u32 delivered_pkts, u32 delivered_bytes,
- int flag)
+ u64 prior_bytes_acked, int flag)
{
u32 old_ceb = tcp_sk(sk)->delivered_ecn_bytes[INET_ECN_CE - 1];
const struct tcphdr *th = tcp_hdr(skb);
@@ -534,6 +534,16 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
if (flag & FLAG_SYN_ACKED)
return 0;
+ /* Verify ACE!=0 in the 1st data ACK after AccECN negotiation */
+ if ((flag & FLAG_DATA_ACKED) && prior_bytes_acked <= tp->mss_cache) {
+ if (tcp_accecn_ace(tcp_hdr(skb)) == 0x0) {
+ INET_ECN_dontxmit(sk);
+ tcp_accecn_fail_mode_set(tp, TCP_ACCECN_ACE_FAIL_RECV |
+ TCP_ACCECN_OPT_FAIL_RECV);
+ return 0;
+ }
+ }
+
if (tp->received_ce_pending >= TCP_ACCECN_ACE_MAX_DELTA)
inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW;
@@ -580,13 +590,13 @@ static u32 __tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
static u32 tcp_accecn_process(struct sock *sk, const struct sk_buff *skb,
u32 delivered_pkts, u32 delivered_bytes,
- int *flag)
+ u64 prior_bytes_acked, int *flag)
{
struct tcp_sock *tp = tcp_sk(sk);
u32 delta;
delta = __tcp_accecn_process(sk, skb, delivered_pkts,
- delivered_bytes, *flag);
+ delivered_bytes, prior_bytes_acked, *flag);
if (delta > 0) {
tcp_count_delivered_ce(tp, delta);
*flag |= FLAG_ECE;
@@ -3997,6 +4007,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_sacktag_state sack_state;
struct rate_sample rs = { .prior_delivered = 0 };
+ u64 prior_bytes_acked = tp->bytes_acked;
u32 prior_snd_una = tp->snd_una;
bool is_sack_reneg = tp->is_sack_reneg;
u32 ack_seq = TCP_SKB_CB(skb)->seq;
@@ -4120,7 +4131,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
ecn_count = tcp_accecn_process(sk, skb,
tp->delivered - delivered,
sack_state.delivered_bytes,
- &flag);
+ prior_bytes_acked, &flag);
tcp_in_ack_event(sk, flag);
@@ -4160,7 +4171,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
ecn_count = tcp_accecn_process(sk, skb,
tp->delivered - delivered,
sack_state.delivered_bytes,
- &flag);
+ prior_bytes_acked, &flag);
+
tcp_in_ack_event(sk, flag);
/* If data was DSACKed, see if we can undo a cwnd reduction. */
if (flag & FLAG_DSACKING_ACK) {
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 12/13] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (10 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 11/13] tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-16 9:25 ` Paolo Abeni
2025-10-13 17:03 ` [PATCH v4 net-next 13/13] tcp: accecn: enable AccECN chia-yu.chang
2025-10-16 10:08 ` [PATCH v4 net-next 00/13] AccECN protocol case handling series Jakub Sitnicki
13 siblings, 1 reply; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Detect spurious retransmission of a previously sent ACK carrying the
AccECN option after the second retransmission. Since this might be caused
by the middlebox dropping ACK with options it does not recognize, disable
the sending of the AccECN option in all subsequent ACKs. This patch
follows Section 3.2.3.2.2 of AccECN spec (RFC9768).
Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl:
(TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and
persistently sends AccECN option once it fits into TCP option space.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
Documentation/networking/ip-sysctl.rst | 4 +++-
include/linux/tcp.h | 3 ++-
include/net/tcp_ecn.h | 2 ++
net/ipv4/sysctl_net_ipv4.c | 2 +-
net/ipv4/tcp_input.c | 9 +++++++++
net/ipv4/tcp_output.c | 7 ++++++-
6 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index a06cb99d66dc..88ce54bf2a02 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -482,7 +482,9 @@ tcp_ecn_option - INTEGER
1 Send AccECN option sparingly according to the minimum option
rules outlined in draft-ietf-tcpm-accurate-ecn.
2 Send AccECN option on every packet whenever it fits into TCP
- option space.
+ option space except when AccECN fallback is triggered.
+ 3 Send AccECN option on every packet whenever it fits into TCP
+ option space even when AccECN fallback is triggered.
= ============================================================
Default: 2
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 683f38362977..32b031d09294 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -294,7 +294,8 @@ struct tcp_sock {
u8 nonagle : 4,/* Disable Nagle algorithm? */
rate_app_limited:1; /* rate_{delivered,interval_us} limited? */
u8 received_ce_pending:4, /* Not yet transmit cnt of received_ce */
- unused2:4;
+ accecn_opt_sent:1,/* Sent AccECN option in previous ACK */
+ unused2:3;
u8 accecn_minlen:2,/* Minimum length of AccECN option sent */
est_ecnfield:2,/* ECN field for AccECN delivered estimates */
accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index 2dfd2fda085d..13315adb5bc7 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -29,6 +29,7 @@ enum tcp_accecn_option {
TCP_ACCECN_OPTION_DISABLED = 0,
TCP_ACCECN_OPTION_MINIMUM = 1,
TCP_ACCECN_OPTION_FULL = 2,
+ TCP_ACCECN_OPTION_PERSIST = 3,
};
/* Apply either ECT(0) or ECT(1) based on TCP_CONG_WANTS_ECT_1 flag */
@@ -406,6 +407,7 @@ static inline void tcp_accecn_init_counters(struct tcp_sock *tp)
tp->received_ce_pending = 0;
__tcp_accecn_init_bytes_counters(tp->received_ecn_bytes);
__tcp_accecn_init_bytes_counters(tp->delivered_ecn_bytes);
+ tp->accecn_opt_sent = 0;
tp->accecn_minlen = 0;
tp->accecn_opt_demand = 0;
tp->est_ecnfield = 0;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 24dbc603cc44..91a90c34a810 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -738,7 +738,7 @@ static struct ctl_table ipv4_net_table[] = {
.mode = 0644,
.proc_handler = proc_dou8vec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_TWO,
+ .extra2 = SYSCTL_THREE,
},
{
.procname = "tcp_ecn_option_beacon",
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 61aada9f3a6f..edfcce235d2c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4808,6 +4808,7 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
{
+ struct tcp_sock *tp = tcp_sk(sk);
/* When the ACK path fails or drops most ACKs, the sender would
* timeout and spuriously retransmit the same segment repeatedly.
* If it seems our ACKs are not reaching the other side,
@@ -4827,6 +4828,14 @@ static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
/* Save last flowlabel after a spurious retrans. */
tcp_save_lrcv_flowlabel(sk, skb);
#endif
+ /* Check DSACK info to detect that the previous ACK carrying the
+ * AccECN option was lost after the second retransmision, and then
+ * stop sending AccECN option in all subsequent ACKs.
+ */
+ if (tcp_ecn_mode_accecn(tp) &&
+ TCP_SKB_CB(skb)->seq == tp->duplicate_sack[0].start_seq &&
+ tp->accecn_opt_sent)
+ tcp_accecn_fail_mode_set(tp, TCP_ACCECN_OPT_FAIL_SEND);
}
static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bd718e342638..207b45a86938 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -719,9 +719,12 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
if (tp) {
tp->accecn_minlen = 0;
tp->accecn_opt_tstamp = tp->tcp_mstamp;
+ tp->accecn_opt_sent = 1;
if (tp->accecn_opt_demand)
tp->accecn_opt_demand--;
}
+ } else if (tp) {
+ tp->accecn_opt_sent = 0;
}
if (unlikely(OPTION_SACK_ADVERTISE & options)) {
@@ -1191,7 +1194,9 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
if (tcp_ecn_mode_accecn(tp)) {
int ecn_opt = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_ecn_option);
- if (ecn_opt && tp->saw_accecn_opt && !tcp_accecn_opt_fail_send(tp) &&
+ if (ecn_opt && tp->saw_accecn_opt &&
+ (ecn_opt >= TCP_ACCECN_OPTION_PERSIST ||
+ !tcp_accecn_opt_fail_send(tp)) &&
(ecn_opt >= TCP_ACCECN_OPTION_FULL || tp->accecn_opt_demand ||
tcp_accecn_option_beacon_check(sk))) {
opts->use_synack_ecn_bytes = 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 net-next 13/13] tcp: accecn: enable AccECN
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (11 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 12/13] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
@ 2025-10-13 17:03 ` chia-yu.chang
2025-10-16 10:08 ` [PATCH v4 net-next 00/13] AccECN protocol case handling series Jakub Sitnicki
13 siblings, 0 replies; 27+ messages in thread
From: chia-yu.chang @ 2025-10-13 17:03 UTC (permalink / raw)
To: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
Cc: Chia-Yu Chang
From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Enable Accurate ECN negotiation and request for incoming and
outgoing connection by setting sysctl_tcp_ecn:
+==============+===========================================+
| | Highest ECN variant (Accurate ECN, ECN, |
| tcp_ecn | or no ECN) to be negotiated & requested |
| +---------------------+---------------------+
| | Incoming connection | Outgoing connection |
+==============+=====================+=====================+
| 0 | No ECN | No ECN |
| 1 | ECN | ECN |
| 2 | ECN | No ECN |
+--------------+---------------------+---------------------+
| 3 | Accurate ECN | Accurate ECN |
| 4 | Accurate ECN | ECN |
| 5 | Accurate ECN | No ECN |
+==============+=====================+=====================+
Refer Documentation/networking/ip-sysctl.rst for more details.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
net/ipv4/sysctl_net_ipv4.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 91a90c34a810..9072584e6617 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -47,7 +47,7 @@ static unsigned int udp_child_hash_entries_max = UDP_HTABLE_SIZE_MAX;
static int tcp_plb_max_rounds = 31;
static int tcp_plb_max_cong_thresh = 256;
static unsigned int tcp_tw_reuse_delay_max = TCP_PAWS_MSL * MSEC_PER_SEC;
-static int tcp_ecn_mode_max = 2;
+static int tcp_ecn_mode_max = 5;
/* obsolete */
static int sysctl_tcp_low_latency __read_mostly;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 net-next 05/13] tcp: accecn: handle unexpected AccECN negotiation feedback
2025-10-13 17:03 ` [PATCH v4 net-next 05/13] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
@ 2025-10-16 9:02 ` Paolo Abeni
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Abeni @ 2025-10-16 9:02 UTC (permalink / raw)
To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> According to Section 3.1.2 of AccECN spec (RFC9768), if a TCP Client
> has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1)
> then receives a SYN/ACK with the currently reserved combination
> (AE,CWR,ECE) = (1,0,1) but it does not have logic specific to such a
> combination, the Client MUST enable AccECN mode as if the SYN/ACK
> confirmed that the Server supported AccECN and as if it fed back that
> the IP-ECN field on the SYN had arrived unchanged.
>
> This patch fix an incorrect AccECN negoation of commit 3cae34274c79
> ("tcp: accecn: AccECN negotiation").
Minor nit: with my previous feedback I asked a formal fixes tag here.
Yes, we can have fixes tag on net-next.
No need to re-submit just for this.
/P
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 net-next 08/13] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
2025-10-13 17:03 ` [PATCH v4 net-next 08/13] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
@ 2025-10-16 9:13 ` Paolo Abeni
2025-10-18 16:06 ` Chia-Yu Chang (Nokia)
0 siblings, 1 reply; 27+ messages in thread
From: Paolo Abeni @ 2025-10-16 9:13 UTC (permalink / raw)
To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>
> If the TCP Server has not received an ACK to acknowledge its SYN/ACK
> after the normal TCP timeout or it receives a second SYN with a
> request for AccECN support, then either the SYN/ACK might just have
> been lost, e.g. due to congestion, or a middlebox might be blocking
> AccECN Options. To expedite connection setup in deployment scenarios
> where AccECN path traversal might be problematic, the TCP Server SHOULD
> retransmit the SYN/ACK, but with no AccECN Option.
>
> If this retransmission times out, to expedite connection setup, the TCP
> Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0)
> and no AccECN Option, but it remains in AccECN feedback mode.
>
> This follows Section 3.2.3.2.2 of AccECN spec (RFC9768).
>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> ---
> include/net/tcp_ecn.h | 14 ++++++++++----
> net/ipv4/tcp_output.c | 2 +-
> 2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
> index c66f0d944e1c..97a3a7f36aff 100644
> --- a/include/net/tcp_ecn.h
> +++ b/include/net/tcp_ecn.h
> @@ -651,10 +651,16 @@ static inline void tcp_ecn_clear_syn(struct sock *sk, struct sk_buff *skb)
> static inline void
> tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
> {
> - if (tcp_rsk(req)->accecn_ok)
> - tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
> - else if (inet_rsk(req)->ecn_ok)
> - th->ece = 1;
> + if (req->num_retrans < 1 || req->num_timeout < 1) {
I think the above condition does not match the commit message. Should be:
if (!req->num_retrans && !req->num_timeout) {
/P
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
2025-10-13 17:03 ` [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
@ 2025-10-16 9:17 ` Paolo Abeni
2025-10-16 20:26 ` Ilpo Järvinen
0 siblings, 1 reply; 27+ messages in thread
From: Paolo Abeni @ 2025-10-16 9:17 UTC (permalink / raw)
To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Ilpo Järvinen <ij@kernel.org>
>
> As AccECN may keep CWR bit asserted due to different
> interpretation of the bit, flushing with GRO because of
> CWR may effectively disable GRO until AccECN counter
> field changes such that CWR-bit becomes 0.
>
> There is no harm done from not immediately forwarding the
> CWR'ed segment with RFC3168 ECN.
I guess this change could introduce additional latency for RFC3168
notification, which sounds not good. On the flip side adding too much
AccECN logic to GRO (i.e. to allow aggregation only for AccECN enabled
flows) looks overkill.
@Eric: WDYT?
Thanks,
Paolo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 net-next 12/13] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
2025-10-13 17:03 ` [PATCH v4 net-next 12/13] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
@ 2025-10-16 9:25 ` Paolo Abeni
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Abeni @ 2025-10-16 9:25 UTC (permalink / raw)
To: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ij, ncardwell,
koen.de_schepper, g.white, ingemar.s.johansson, mirja.kuehlewind,
cheshire, rs.ietf, Jason_Livingood, vidhi_goel
On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 61aada9f3a6f..edfcce235d2c 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4808,6 +4808,7 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
>
> static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
> {
> + struct tcp_sock *tp = tcp_sk(sk);
Minot nit: An empty line is needed here.
/P
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 net-next 00/13] AccECN protocol case handling series
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
` (12 preceding siblings ...)
2025-10-13 17:03 ` [PATCH v4 net-next 13/13] tcp: accecn: enable AccECN chia-yu.chang
@ 2025-10-16 10:08 ` Jakub Sitnicki
13 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2025-10-16 10:08 UTC (permalink / raw)
To: chia-yu.chang
Cc: pabeni, edumazet, linux-doc, corbet, horms, dsahern, kuniyu, bpf,
netdev, dave.taht, jhs, kuba, stephen, xiyou.wangcong, jiri,
davem, andrew+netdev, donald.hunter, ast, liuhangbin, shuah,
linux-kselftest, ij, ncardwell, koen.de_schepper, g.white,
ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
On Mon, Oct 13, 2025 at 07:03 PM +02, chia-yu.chang@nokia-bell-labs.com wrote:
> From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> Plesae find the v4 AccECN case handling patch series, which covers
> several excpetional case handling of Accurate ECN spec (RFC9768),
> adds new identifiers to be used by CC modules, adds ecn_delta into
> rate_sample, and keeps the ACE counter for computation, etc.
The latest draft is here, if anyone else is looking for it:
https://datatracker.ietf.org/doc/draft-ietf-tcpm-accurate-ecn/
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
2025-10-16 9:17 ` Paolo Abeni
@ 2025-10-16 20:26 ` Ilpo Järvinen
2025-10-20 15:26 ` Chia-Yu Chang (Nokia)
0 siblings, 1 reply; 27+ messages in thread
From: Ilpo Järvinen @ 2025-10-16 20:26 UTC (permalink / raw)
To: Paolo Abeni
Cc: chia-yu.chang, edumazet, linux-doc, corbet, horms, dsahern,
kuniyu, bpf, netdev, dave.taht, jhs, kuba, stephen,
xiyou.wangcong, jiri, davem, andrew+netdev, donald.hunter, ast,
liuhangbin, shuah, linux-kselftest, ncardwell, koen.de_schepper,
g.white, ingemar.s.johansson, mirja.kuehlewind, cheshire, rs.ietf,
Jason_Livingood, vidhi_goel
[-- Attachment #1: Type: text/plain, Size: 3453 bytes --]
On Thu, 16 Oct 2025, Paolo Abeni wrote:
> On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > From: Ilpo Järvinen <ij@kernel.org>
> >
> > As AccECN may keep CWR bit asserted due to different
> > interpretation of the bit, flushing with GRO because of
> > CWR may effectively disable GRO until AccECN counter
> > field changes such that CWR-bit becomes 0.
> >
> > There is no harm done from not immediately forwarding the
> > CWR'ed segment with RFC3168 ECN.
>
> I guess this change could introduce additional latency for RFC3168
> notification, which sounds not good.
>
> @Eric: WDYT?
I'm not Eric but I want to add I foresaw somebody making this argument
and thus wanted to not hide this change into some other patch so it can be
properly discussed and rejected if so preferred, either way it's not a
correctness issue.
I agree it's possible for some delay be added but the question is why
would that matter? "CWR" tells sender did already reduce its sending rate
which is where congestion control aims to. So the reaction to congestion
is already done when GRO sees CWR (some might have a misconception that
delivering CWR causes sender to reduce sending rate but that's not the
case). With RFC 3168 ECN, CWR only tells the receiving end to stop sending
ECE. Why does it matter if that information arrives a bit later?
If there are other segments, they normally don't have CWR with RFC 3168
ECN which normally set CWR once per RTT. A non-CWR'ed segment results in
flush after an inter-packet delay due to flags difference. That delay is
nothing compared to GRO aggregating non-CWR segments en masse which is
in n times the inter-packet delay (simplification, ignores burstiness,
etc.).
If there are no other segments, the receiver won't be sending any ECEs
either, so the extra delay does not seem that impactful.
Some might argue that with this "special delivery" for CWR the segment
could trigger an ACK "sooner", but GRO shouldn't hold the segment forever
either (though I don't recall the details anymore). But if we make that
argument (which is no longer ECN signalling related at all, BTW), why use
GRO at all as it add delay for other segments too delaying other ACKs, why
is this CWR'ed segment so special that it in particular must elicit ACK
ASAP? It's hard to justify that distinction/CWR speciality, unless one has
that misconception CWR must arrive ASAP to expedite congestion reaction
which is based on misunderstanding how RFC 3168 ECN works.
Thus, what I wrote to the changelog about the delay not being harmful
seems well justified.
> On the flip side adding too much
> AccECN logic to GRO (i.e. to allow aggregation only for AccECN enabled
> flows) looks overkill.
The usual aggregation works on header bits remaining identical which
just happens to also suit AccECN better here. The RFC 3168 CWR trickery is
what is an expection to the rule, and as explained above, it does not seem
even that useful.
This CWR special delivery rule, on the other hand, is clearly harmful for
aggregating AccECN segments which may have long row of CWR flagged
segments if ACE field remains unchanging. None of them can be aggregated
by GRO if this particular change is not accepted. Not an end of the world
but if we weight the pros and cons, it seems to clearly favor not keeping
this special delivery rule.
--
i.
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH v4 net-next 08/13] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
2025-10-16 9:13 ` Paolo Abeni
@ 2025-10-18 16:06 ` Chia-Yu Chang (Nokia)
0 siblings, 0 replies; 27+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-10-18 16:06 UTC (permalink / raw)
To: Paolo Abeni, edumazet@google.com, linux-doc@vger.kernel.org,
corbet@lwn.net, horms@kernel.org, dsahern@kernel.org,
kuniyu@amazon.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
dave.taht@gmail.com, jhs@mojatatu.com, kuba@kernel.org,
stephen@networkplumber.org, xiyou.wangcong@gmail.com,
jiri@resnulli.us, davem@davemloft.net, andrew+netdev@lunn.ch,
donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
shuah@kernel.org, linux-kselftest@vger.kernel.org, ij@kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, October 16, 2025 11:14 AM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v4 net-next 08/13] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
>
>
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
> On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > If the TCP Server has not received an ACK to acknowledge its SYN/ACK
> > after the normal TCP timeout or it receives a second SYN with a
> > request for AccECN support, then either the SYN/ACK might just have
> > been lost, e.g. due to congestion, or a middlebox might be blocking
> > AccECN Options. To expedite connection setup in deployment scenarios
> > where AccECN path traversal might be problematic, the TCP Server
> > SHOULD retransmit the SYN/ACK, but with no AccECN Option.
> >
> > If this retransmission times out, to expedite connection setup, the
> > TCP Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0)
> > and no AccECN Option, but it remains in AccECN feedback mode.
> >
> > This follows Section 3.2.3.2.2 of AccECN spec (RFC9768).
> >
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > ---
> > include/net/tcp_ecn.h | 14 ++++++++++---- net/ipv4/tcp_output.c | 2
> > +-
> > 2 files changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h index
> > c66f0d944e1c..97a3a7f36aff 100644
> > --- a/include/net/tcp_ecn.h
> > +++ b/include/net/tcp_ecn.h
> > @@ -651,10 +651,16 @@ static inline void tcp_ecn_clear_syn(struct sock
> > *sk, struct sk_buff *skb) static inline void
> > tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th)
> > {
> > - if (tcp_rsk(req)->accecn_ok)
> > - tcp_accecn_echo_syn_ect(th, tcp_rsk(req)->syn_ect_rcv);
> > - else if (inet_rsk(req)->ecn_ok)
> > - th->ece = 1;
> > + if (req->num_retrans < 1 || req->num_timeout < 1) {
>
> I think the above condition does not match the commit message. Should be:
> if (!req->num_retrans && !req->num_timeout) {
>
> /P
Hi Paolo,
This patch includes two differetn SYN/ACK retransmissions:
In the first retransmited SYN/ACK, the retransmitted SYN/ACK will not include AccECN option.
This uses the condition of "req->num_retrans >1" in tcp_synack_options().
In the second retransmitted SYN/ACK, the retransmitted SYN/ACK will further set ACE=0.
This uses the condition of "req->num_retrans>1 && req->num_timeout>1" in tcp_ecn_make_synack().
I was thinking, in the next version, I could update the commit message to clarify it.
Chia-Yu
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
2025-10-16 20:26 ` Ilpo Järvinen
@ 2025-10-20 15:26 ` Chia-Yu Chang (Nokia)
2025-10-20 15:31 ` Eric Dumazet
0 siblings, 1 reply; 27+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-10-20 15:26 UTC (permalink / raw)
To: Ilpo Järvinen, Paolo Abeni
Cc: edumazet@google.com, linux-doc@vger.kernel.org, corbet@lwn.net,
horms@kernel.org, dsahern@kernel.org, kuniyu@amazon.com,
bpf@vger.kernel.org, netdev@vger.kernel.org, dave.taht@gmail.com,
jhs@mojatatu.com, kuba@kernel.org, stephen@networkplumber.org,
xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net,
liuhangbin@gmail.com, shuah@kernel.org,
linux-kselftest@vger.kernel.org, ncardwell@google.com,
Koen De Schepper (Nokia), g.white@cablelabs.com,
ingemar.s.johansson@ericsson.com, mirja.kuehlewind@ericsson.com,
cheshire, rs.ietf@gmx.at, Jason_Livingood@comcast.com, Vidhi Goel
> -----Original Message-----
> From: Ilpo Järvinen <ij@kernel.org>
> Sent: Thursday, October 16, 2025 10:27 PM
> To: Paolo Abeni <pabeni@redhat.com>
> Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
>
>
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
> On Thu, 16 Oct 2025, Paolo Abeni wrote:
> > On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > > From: Ilpo Järvinen <ij@kernel.org>
> > >
> > > As AccECN may keep CWR bit asserted due to different interpretation
> > > of the bit, flushing with GRO because of CWR may effectively disable
> > > GRO until AccECN counter field changes such that CWR-bit becomes 0.
> > >
> > > There is no harm done from not immediately forwarding the CWR'ed
> > > segment with RFC3168 ECN.
> >
> > I guess this change could introduce additional latency for RFC3168
> > notification, which sounds not good.
> >
> > @Eric: WDYT?
>
> I'm not Eric but I want to add I foresaw somebody making this argument and thus wanted to not hide this change into some other patch so it can be properly discussed and rejected if so preferred, either way it's not a correctness issue.
>
> I agree it's possible for some delay be added but the question is why would that matter? "CWR" tells sender did already reduce its sending rate which is where congestion control aims to. So the reaction to congestion is already done when GRO sees CWR (some might have a misconception that delivering CWR causes sender to reduce sending rate but that's not the case). With RFC 3168 ECN, CWR only tells the receiving end to stop sending ECE. Why does it matter if that information arrives a bit later?
>
> If there are other segments, they normally don't have CWR with RFC 3168 ECN which normally set CWR once per RTT. A non-CWR'ed segment results in flush after an inter-packet delay due to flags difference. That delay is nothing compared to GRO aggregating non-CWR segments en masse which is in n times the inter-packet delay (simplification, ignores burstiness, etc.).
>
> If there are no other segments, the receiver won't be sending any ECEs either, so the extra delay does not seem that impactful.
>
> Some might argue that with this "special delivery" for CWR the segment could trigger an ACK "sooner", but GRO shouldn't hold the segment forever either (though I don't recall the details anymore). But if we make that argument (which is no longer ECN signalling related at all, BTW), why use GRO at all as it add delay for other segments too delaying other ACKs, why is this CWR'ed segment so special that it in particular must elicit ACK ASAP? It's hard to justify that distinction/CWR speciality, unless one has that misconception CWR must arrive ASAP to expedite congestion reaction which is based on misunderstanding how RFC 3168 ECN works.
>
> Thus, what I wrote to the changelog about the delay not being harmful seems well justified.
>
> > On the flip side adding too much
> > AccECN logic to GRO (i.e. to allow aggregation only for AccECN enabled
> > flows) looks overkill.
>
> The usual aggregation works on header bits remaining identical which just happens to also suit AccECN better here. The RFC 3168 CWR trickery is what is an expection to the rule, and as explained above, it does not seem even that useful.
>
> This CWR special delivery rule, on the other hand, is clearly harmful for aggregating AccECN segments which may have long row of CWR flagged segments if ACE field remains unchanging. None of them can be aggregated by GRO if this particular change is not accepted. Not an end of the world but if we weight the pros and cons, it seems to clearly favor not keeping this special delivery rule.
Hi Paolo,
I agree with what was mentioned by Ilpo above.
But if Eric can share extra comments or some particular cases would be helpful.
Shall we submit all patches with changes (and keep this patch unchanged)? Or please suggest other ways to move forward.
Chia-Yu
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
2025-10-20 15:26 ` Chia-Yu Chang (Nokia)
@ 2025-10-20 15:31 ` Eric Dumazet
2025-10-20 16:46 ` Chia-Yu Chang (Nokia)
0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2025-10-20 15:31 UTC (permalink / raw)
To: Chia-Yu Chang (Nokia)
Cc: Ilpo Järvinen, Paolo Abeni, linux-doc@vger.kernel.org,
corbet@lwn.net, horms@kernel.org, dsahern@kernel.org,
kuniyu@amazon.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
dave.taht@gmail.com, jhs@mojatatu.com, kuba@kernel.org,
stephen@networkplumber.org, xiyou.wangcong@gmail.com,
jiri@resnulli.us, davem@davemloft.net, andrew+netdev@lunn.ch,
donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
shuah@kernel.org, linux-kselftest@vger.kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
On Mon, Oct 20, 2025 at 8:26 AM Chia-Yu Chang (Nokia)
<chia-yu.chang@nokia-bell-labs.com> wrote:
>
> > -----Original Message-----
> > From: Ilpo Järvinen <ij@kernel.org>
> > Sent: Thursday, October 16, 2025 10:27 PM
> > To: Paolo Abeni <pabeni@redhat.com>
> > Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> > Subject: Re: [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
> >
> >
> > CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> >
> >
> >
> > On Thu, 16 Oct 2025, Paolo Abeni wrote:
> > > On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > > > From: Ilpo Järvinen <ij@kernel.org>
> > > >
> > > > As AccECN may keep CWR bit asserted due to different interpretation
> > > > of the bit, flushing with GRO because of CWR may effectively disable
> > > > GRO until AccECN counter field changes such that CWR-bit becomes 0.
> > > >
> > > > There is no harm done from not immediately forwarding the CWR'ed
> > > > segment with RFC3168 ECN.
> > >
> > > I guess this change could introduce additional latency for RFC3168
> > > notification, which sounds not good.
> > >
> > > @Eric: WDYT?
> >
> > I'm not Eric but I want to add I foresaw somebody making this argument and thus wanted to not hide this change into some other patch so it can be properly discussed and rejected if so preferred, either way it's not a correctness issue.
> >
> > I agree it's possible for some delay be added but the question is why would that matter? "CWR" tells sender did already reduce its sending rate which is where congestion control aims to. So the reaction to congestion is already done when GRO sees CWR (some might have a misconception that delivering CWR causes sender to reduce sending rate but that's not the case). With RFC 3168 ECN, CWR only tells the receiving end to stop sending ECE. Why does it matter if that information arrives a bit later?
> >
> > If there are other segments, they normally don't have CWR with RFC 3168 ECN which normally set CWR once per RTT. A non-CWR'ed segment results in flush after an inter-packet delay due to flags difference. That delay is nothing compared to GRO aggregating non-CWR segments en masse which is in n times the inter-packet delay (simplification, ignores burstiness, etc.).
> >
> > If there are no other segments, the receiver won't be sending any ECEs either, so the extra delay does not seem that impactful.
> >
> > Some might argue that with this "special delivery" for CWR the segment could trigger an ACK "sooner", but GRO shouldn't hold the segment forever either (though I don't recall the details anymore). But if we make that argument (which is no longer ECN signalling related at all, BTW), why use GRO at all as it add delay for other segments too delaying other ACKs, why is this CWR'ed segment so special that it in particular must elicit ACK ASAP? It's hard to justify that distinction/CWR speciality, unless one has that misconception CWR must arrive ASAP to expedite congestion reaction which is based on misunderstanding how RFC 3168 ECN works.
> >
> > Thus, what I wrote to the changelog about the delay not being harmful seems well justified.
> >
> > > On the flip side adding too much
> > > AccECN logic to GRO (i.e. to allow aggregation only for AccECN enabled
> > > flows) looks overkill.
> >
> > The usual aggregation works on header bits remaining identical which just happens to also suit AccECN better here. The RFC 3168 CWR trickery is what is an expection to the rule, and as explained above, it does not seem even that useful.
> >
> > This CWR special delivery rule, on the other hand, is clearly harmful for aggregating AccECN segments which may have long row of CWR flagged segments if ACE field remains unchanging. None of them can be aggregated by GRO if this particular change is not accepted. Not an end of the world but if we weight the pros and cons, it seems to clearly favor not keeping this special delivery rule.
>
> Hi Paolo,
>
> I agree with what was mentioned by Ilpo above.
>
> But if Eric can share extra comments or some particular cases would be helpful.
>
> Shall we submit all patches with changes (and keep this patch unchanged)? Or please suggest other ways to move forward.
Hmm... maybe now is a good time to amend tools/testing/selftests/net/gro.c
In general, the lack of tests in your series is not really appealing to me.
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
2025-10-20 15:31 ` Eric Dumazet
@ 2025-10-20 16:46 ` Chia-Yu Chang (Nokia)
0 siblings, 0 replies; 27+ messages in thread
From: Chia-Yu Chang (Nokia) @ 2025-10-20 16:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: Ilpo Järvinen, Paolo Abeni, linux-doc@vger.kernel.org,
corbet@lwn.net, horms@kernel.org, dsahern@kernel.org,
kuniyu@amazon.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
dave.taht@gmail.com, jhs@mojatatu.com, kuba@kernel.org,
stephen@networkplumber.org, xiyou.wangcong@gmail.com,
jiri@resnulli.us, davem@davemloft.net, andrew+netdev@lunn.ch,
donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com,
shuah@kernel.org, linux-kselftest@vger.kernel.org,
ncardwell@google.com, Koen De Schepper (Nokia),
g.white@cablelabs.com, ingemar.s.johansson@ericsson.com,
mirja.kuehlewind@ericsson.com, cheshire, rs.ietf@gmx.at,
Jason_Livingood@comcast.com, Vidhi Goel
> -----Original Message-----
> From: Eric Dumazet <edumazet@google.com>
> Sent: Monday, October 20, 2025 5:32 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>
> Cc: Ilpo Järvinen <ij@kernel.org>; Paolo Abeni <pabeni@redhat.com>; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>; rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN
>
>
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
> On Mon, Oct 20, 2025 at 8:26 AM Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com> wrote:
> >
> > > -----Original Message-----
> > > From: Ilpo Järvinen <ij@kernel.org>
> > > Sent: Thursday, October 16, 2025 10:27 PM
> > > To: Paolo Abeni <pabeni@redhat.com>
> > > Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>;
> > > edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net;
> > > horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com;
> > > bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com;
> > > jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org;
> > > xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net;
> > > andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net;
> > > liuhangbin@gmail.com; shuah@kernel.org;
> > > linux-kselftest@vger.kernel.org; ncardwell@google.com; Koen De
> > > Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>;
> > > g.white@cablelabs.com; ingemar.s.johansson@ericsson.com;
> > > mirja.kuehlewind@ericsson.com; cheshire <cheshire@apple.com>;
> > > rs.ietf@gmx.at; Jason_Livingood@comcast.com; Vidhi Goel
> > > <vidhi_goel@apple.com>
> > > Subject: Re: [PATCH v4 net-next 02/13] gro: flushing when CWR is set
> > > negatively affects AccECN
> > >
> > >
> > > CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> > >
> > >
> > >
> > > On Thu, 16 Oct 2025, Paolo Abeni wrote:
> > > > On 10/13/25 7:03 PM, chia-yu.chang@nokia-bell-labs.com wrote:
> > > > > From: Ilpo Järvinen <ij@kernel.org>
> > > > >
> > > > > As AccECN may keep CWR bit asserted due to different
> > > > > interpretation of the bit, flushing with GRO because of CWR may
> > > > > effectively disable GRO until AccECN counter field changes such that CWR-bit becomes 0.
> > > > >
> > > > > There is no harm done from not immediately forwarding the CWR'ed
> > > > > segment with RFC3168 ECN.
> > > >
> > > > I guess this change could introduce additional latency for RFC3168
> > > > notification, which sounds not good.
> > > >
> > > > @Eric: WDYT?
> > >
> > > I'm not Eric but I want to add I foresaw somebody making this argument and thus wanted to not hide this change into some other patch so it can be properly discussed and rejected if so preferred, either way it's not a correctness issue.
> > >
> > > I agree it's possible for some delay be added but the question is why would that matter? "CWR" tells sender did already reduce its sending rate which is where congestion control aims to. So the reaction to congestion is already done when GRO sees CWR (some might have a misconception that delivering CWR causes sender to reduce sending rate but that's not the case). With RFC 3168 ECN, CWR only tells the receiving end to stop sending ECE. Why does it matter if that information arrives a bit later?
> > >
> > > If there are other segments, they normally don't have CWR with RFC 3168 ECN which normally set CWR once per RTT. A non-CWR'ed segment results in flush after an inter-packet delay due to flags difference. That delay is nothing compared to GRO aggregating non-CWR segments en masse which is in n times the inter-packet delay (simplification, ignores burstiness, etc.).
> > >
> > > If there are no other segments, the receiver won't be sending any ECEs either, so the extra delay does not seem that impactful.
> > >
> > > Some might argue that with this "special delivery" for CWR the segment could trigger an ACK "sooner", but GRO shouldn't hold the segment forever either (though I don't recall the details anymore). But if we make that argument (which is no longer ECN signalling related at all, BTW), why use GRO at all as it add delay for other segments too delaying other ACKs, why is this CWR'ed segment so special that it in particular must elicit ACK ASAP? It's hard to justify that distinction/CWR speciality, unless one has that misconception CWR must arrive ASAP to expedite congestion reaction which is based on misunderstanding how RFC 3168 ECN works.
> > >
> > > Thus, what I wrote to the changelog about the delay not being harmful seems well justified.
> > >
> > > > On the flip side adding too much
> > > > AccECN logic to GRO (i.e. to allow aggregation only for AccECN
> > > > enabled
> > > > flows) looks overkill.
> > >
> > > The usual aggregation works on header bits remaining identical which just happens to also suit AccECN better here. The RFC 3168 CWR trickery is what is an expection to the rule, and as explained above, it does not seem even that useful.
> > >
> > > This CWR special delivery rule, on the other hand, is clearly harmful for aggregating AccECN segments which may have long row of CWR flagged segments if ACE field remains unchanging. None of them can be aggregated by GRO if this particular change is not accepted. Not an end of the world but if we weight the pros and cons, it seems to clearly favor not keeping this special delivery rule.
> >
> > Hi Paolo,
> >
> > I agree with what was mentioned by Ilpo above.
> >
> > But if Eric can share extra comments or some particular cases would be helpful.
> >
> > Shall we submit all patches with changes (and keep this patch unchanged)? Or please suggest other ways to move forward.
>
> Hmm... maybe now is a good time to amend tools/testing/selftests/net/gro.c
>
> In general, the lack of tests in your series is not really appealing to me.
Thanks for the feedback, and I will look into this tool (Anything I could refer to use/update this tool?).
Also, maybe we could move this patch to a latter series and make other patches being reviewed?
Chia-Yu
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2025-10-20 16:46 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-13 17:03 [PATCH v4 net-next 00/13] AccECN protocol case handling series chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 01/13] tcp: try to avoid safer when ACKs are thinned chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 02/13] gro: flushing when CWR is set negatively affects AccECN chia-yu.chang
2025-10-16 9:17 ` Paolo Abeni
2025-10-16 20:26 ` Ilpo Järvinen
2025-10-20 15:26 ` Chia-Yu Chang (Nokia)
2025-10-20 15:31 ` Eric Dumazet
2025-10-20 16:46 ` Chia-Yu Chang (Nokia)
2025-10-13 17:03 ` [PATCH v4 net-next 03/13] tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 04/13] tcp: disable RFC3168 fallback identifier " chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 05/13] tcp: accecn: handle unexpected AccECN negotiation feedback chia-yu.chang
2025-10-16 9:02 ` Paolo Abeni
2025-10-13 17:03 ` [PATCH v4 net-next 06/13] tcp: accecn: retransmit downgraded SYN in AccECN negotiation chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 07/13] tcp: move increment of num_retrans chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 08/13] tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK chia-yu.chang
2025-10-16 9:13 ` Paolo Abeni
2025-10-18 16:06 ` Chia-Yu Chang (Nokia)
2025-10-13 17:03 ` [PATCH v4 net-next 09/13] tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 10/13] tcp: accecn: fallback outgoing half link to non-AccECN chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 11/13] tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation chia-yu.chang
2025-10-13 17:03 ` [PATCH v4 net-next 12/13] tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST chia-yu.chang
2025-10-16 9:25 ` Paolo Abeni
2025-10-13 17:03 ` [PATCH v4 net-next 13/13] tcp: accecn: enable AccECN chia-yu.chang
2025-10-16 10:08 ` [PATCH v4 net-next 00/13] AccECN protocol case handling series Jakub Sitnicki
-- strict thread matches above, loose matches on Subject: below --
2025-10-10 13:17 chia-yu.chang
2025-10-10 14:53 ` Paolo Abeni
2025-10-10 14:55 ` Chia-Yu Chang (Nokia)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).