* [PATCH net-next] tcp: better handle TCP_TX_DELAY on established flows
From: Eric Dumazet @ 2025-10-13 14:59 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Neal Cardwell, Willem de Bruijn, Kuniyuki Iwashima,
	netdev, eric.dumazet, Eric Dumazet

Some applications use the TCP_TX_DELAY socket option after the TCP flow
is established.

Some metrics then need to be updated; otherwise TCP might take a long
time to adapt to the new (emulated) RTT.

This patch adjusts tp->srtt_us, tp->rtt_min, icsk_rto
and sk->sk_pacing_rate.
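
As a worked example of the srtt adjustment: tp->srtt_us holds the
smoothed RTT in usec, left-shifted by 3, so changing the delay from
old to new shifts it by (new - old) << 3. For instance, raising the
delay from 0 to 10000 usec (10 ms) adds 10000 << 3 = 80000 to
tp->srtt_us, i.e. the smoothed RTT estimate grows by 10 ms at once.
This is also why setsockopt() now rejects val >= (1U << (31 - 3)):
the shifted value must fit in 31 bits.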

This is best effort; for instance, icsk_rto is reset without taking
backoff into account.
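
For reference, a minimal userspace sketch of changing the delay on a
live connection (assumptions: fd is an already-connected TCP socket,
the fallback define matches include/uapi/linux/tcp.h, and
TCP_TX_DELAY takes the delay in microseconds):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

#ifndef TCP_TX_DELAY
#define TCP_TX_DELAY 37	/* from include/uapi/linux/tcp.h */
#endif

/* Emulate extra RTT on an established flow. With this patch the
 * kernel also rescales tp->srtt_us, resets tp->rtt_min, and
 * recomputes the RTO and the pacing rate, instead of waiting for
 * fresh RTT samples to absorb the change.
 */
static int set_tx_delay(int fd, int delay_usec)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_TX_DELAY,
		       &delay_usec, sizeof(delay_usec)) < 0) {
		perror("setsockopt(TCP_TX_DELAY)");
		return -1;
	}
	return 0;
}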

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h    |  2 ++
 net/ipv4/tcp.c       | 31 +++++++++++++++++++++++++++----
 net/ipv4/tcp_input.c |  4 ++--
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5ca230ed526ae02711e8d2a409b91664b73390f2..1e547138f4fb7f5c47d15990954d4d135f465f73 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -461,6 +461,8 @@ enum skb_drop_reason tcp_child_process(struct sock *parent, struct sock *child,
 void tcp_enter_loss(struct sock *sk);
 void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked, int newly_lost, int flag);
 void tcp_clear_retrans(struct tcp_sock *tp);
+void tcp_update_pacing_rate(struct sock *sk);
+void tcp_set_rto(struct sock *sk);
 void tcp_update_metrics(struct sock *sk);
 void tcp_init_metrics(struct sock *sk);
 void tcp_metrics_init(void);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a18aeca7ab07480844946120f51a0555699b4c3..84662904ca96ed5685e56a827d067b62fdac3063 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3583,9 +3583,12 @@ static int tcp_repair_options_est(struct sock *sk, sockptr_t optbuf,
 DEFINE_STATIC_KEY_FALSE(tcp_tx_delay_enabled);
 EXPORT_IPV6_MOD(tcp_tx_delay_enabled);
 
-static void tcp_enable_tx_delay(void)
+static void tcp_enable_tx_delay(struct sock *sk, int val)
 {
-	if (!static_branch_unlikely(&tcp_tx_delay_enabled)) {
+	struct tcp_sock *tp = tcp_sk(sk);
+	s32 delta = (val - tp->tcp_tx_delay) << 3;
+
+	if (val && !static_branch_unlikely(&tcp_tx_delay_enabled)) {
 		static int __tcp_tx_delay_enabled = 0;
 
 		if (cmpxchg(&__tcp_tx_delay_enabled, 0, 1) == 0) {
@@ -3593,6 +3596,22 @@ static void tcp_enable_tx_delay(void)
 			pr_info("TCP_TX_DELAY enabled\n");
 		}
 	}
+	/* If we change tcp_tx_delay on a live flow, adjust tp->srtt_us,
+	 * tp->rtt_min, icsk_rto and sk->sk_pacing_rate.
+	 * This is best effort.
+	 */
+	if (delta && sk->sk_state == TCP_ESTABLISHED) {
+		s64 srtt = (s64)tp->srtt_us + delta;
+
+		tp->srtt_us = clamp_t(s64, srtt, 1, ~0U);
+
+		/* Note: does not deal with non zero icsk_backoff */
+		tcp_set_rto(sk);
+
+		minmax_reset(&tp->rtt_min, tcp_jiffies32, ~0U);
+
+		tcp_update_pacing_rate(sk);
+	}
 }
 
 /* When set indicates to always queue non-full frames.  Later the user clears
@@ -4119,8 +4138,12 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 			tp->recvmsg_inq = val;
 		break;
 	case TCP_TX_DELAY:
-		if (val)
-			tcp_enable_tx_delay();
+		/* tp->srtt_us is u32, and is shifted by 3 */
+		if (val < 0 || val >= (1U << (31 - 3)) ) {
+			err = -EINVAL;
+			break;
+		}
+		tcp_enable_tx_delay(sk, val);
 		WRITE_ONCE(tp->tcp_tx_delay, val);
 		break;
 	default:
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 31ea5af49f2dc8a6f95f3f8c24065369765b8987..8fc97f4d8a6b2f8e39cabf6c9b3e6cdae294a5f5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1095,7 +1095,7 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
 	tp->srtt_us = max(1U, srtt);
 }
 
-static void tcp_update_pacing_rate(struct sock *sk)
+void tcp_update_pacing_rate(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	u64 rate;
@@ -1132,7 +1132,7 @@ static void tcp_update_pacing_rate(struct sock *sk)
 /* Calculate rto without backoff.  This is the second half of Van Jacobson's
  * routine referred to above.
  */
-static void tcp_set_rto(struct sock *sk)
+void tcp_set_rto(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	/* Old crap is replaced with new one. 8)
-- 
2.51.0.740.g6adb054d12-goog




Thread overview: 13+ messages
2025-10-13 14:59 [PATCH net-next] tcp: better handle TCP_TX_DELAY on established flows Eric Dumazet
2025-10-14  8:22 ` Paolo Abeni
2025-10-14  8:29   ` Eric Dumazet
2025-10-14  8:54     ` Eric Dumazet
2025-10-14  9:37       ` Paolo Abeni
2025-10-14  9:40         ` Eric Dumazet
2025-10-14 16:06           ` Jakub Kicinski
2025-10-14 16:16             ` Eric Dumazet
2025-10-14 17:04               ` Jakub Kicinski
2025-10-14  9:38       ` Eric Dumazet
2025-10-15  2:34 ` Jakub Kicinski
2025-10-15  5:17   ` Eric Dumazet
2025-10-15 16:00 ` patchwork-bot+netdevbpf
