* [PATCH net 0/2] tcp_metrics: fix hanlding of route options
@ 2025-06-13 10:20 Petr Tesarik
2025-06-13 10:20 ` [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry Petr Tesarik
2025-06-13 10:20 ` [PATCH net 2/2] tcp_metrics: use ssthresh value from dst if there is no metrics Petr Tesarik
0 siblings, 2 replies; 9+ messages in thread
From: Petr Tesarik @ 2025-06-13 10:20 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
open list:NETWORKING [TCP]
Cc: David Ahern, Jakub Kicinski, Paolo Abeni, open list, Petr Tesarik
I ran into a couple of issues while trying to tweak TCP congestion
avoidance to analyze a potential performance regression. It turns out
that overriding the parameters with ip-route(8) does not work as
expected and appears to be buggy.
Petr Tesarik (2):
tcp_metrics: set maximum cwnd from the dst entry
tcp_metrics: use ssthresh value from dst if there is no metrics
net/ipv4/tcp_metrics.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
2025-06-13 10:20 [PATCH net 0/2] tcp_metrics: fix hanlding of route options Petr Tesarik
@ 2025-06-13 10:20 ` Petr Tesarik
2025-06-17 11:00 ` Paolo Abeni
2025-06-13 10:20 ` [PATCH net 2/2] tcp_metrics: use ssthresh value from dst if there is no metrics Petr Tesarik
1 sibling, 1 reply; 9+ messages in thread
From: Petr Tesarik @ 2025-06-13 10:20 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
open list:NETWORKING [TCP]
Cc: David Ahern, Jakub Kicinski, Paolo Abeni, open list, Petr Tesarik
Always initialize tp->snd_cwnd_clamp from the corresponding dst entry.
There are two issues with setting it from the TCP metrics:
1. If the cwnd option is changed in the routing table, the new value is not
used as long as there is a cached TCP metric for the destination.
2. After evicting the cached TCP metric entry, the next connection will use
the default value (i.e. no limit). Only after this connection is
finished, a new entry is created, and this entry gets the locked value
from the routing table.
As a result, the following shenanigan is required to set a new locked cwnd
value:
- update the route (``ip route replace ... cwnd lock $value``)
- flush any existing TCP metric entry (``ip tcp_metrics flush $dest``)
- create and finish a dummy connection to the destination to create a TCP
metric entry with the new value
- *next* connection to this destination will use the new value
It does not seem to be intentional.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
---
net/ipv4/tcp_metrics.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 4251670e328c8..dd8f3457bd72e 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
if (!dst)
goto reset;
+ if (dst_metric_locked(dst, RTAX_CWND))
+ tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
+
rcu_read_lock();
tm = tcp_get_metrics(sk, dst, false);
if (!tm) {
@@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
goto reset;
}
- if (tcp_metric_locked(tm, TCP_METRIC_CWND))
- tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
-
val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
if (val) {
--
2.49.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net 2/2] tcp_metrics: use ssthresh value from dst if there is no metrics
2025-06-13 10:20 [PATCH net 0/2] tcp_metrics: fix hanlding of route options Petr Tesarik
2025-06-13 10:20 ` [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry Petr Tesarik
@ 2025-06-13 10:20 ` Petr Tesarik
2025-06-17 10:48 ` Paolo Abeni
1 sibling, 1 reply; 9+ messages in thread
From: Petr Tesarik @ 2025-06-13 10:20 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
open list:NETWORKING [TCP]
Cc: David Ahern, Jakub Kicinski, Paolo Abeni, open list, Petr Tesarik
If there is no cached TCP metrics entry for a connection, initialize
tp->snd_ssthresh from the corresponding dst entry. Also move the check
against tp->snd_cwnd_clamp to the common path to ensure that the ssthresh
value is never greater than maximum cwnd, regardless where it came from.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
---
net/ipv4/tcp_metrics.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index dd8f3457bd72e..b08920abec0e6 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -479,6 +479,9 @@ void tcp_init_metrics(struct sock *sk)
if (dst_metric_locked(dst, RTAX_CWND))
tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
+ val = dst_metric(dst, RTAX_SSTHRESH);
+ if (val)
+ tp->snd_ssthresh = val;
rcu_read_lock();
tm = tcp_get_metrics(sk, dst, false);
@@ -489,11 +492,8 @@ void tcp_init_metrics(struct sock *sk)
val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
- if (val) {
+ if (val)
tp->snd_ssthresh = val;
- if (tp->snd_ssthresh > tp->snd_cwnd_clamp)
- tp->snd_ssthresh = tp->snd_cwnd_clamp;
- }
val = tcp_metric_get(tm, TCP_METRIC_REORDERING);
if (val && tp->reordering != val)
tp->reordering = val;
@@ -537,6 +537,9 @@ void tcp_init_metrics(struct sock *sk)
inet_csk(sk)->icsk_rto = TCP_TIMEOUT_FALLBACK;
}
+
+ if (tp->snd_ssthresh > tp->snd_cwnd_clamp)
+ tp->snd_ssthresh = tp->snd_cwnd_clamp;
}
bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst)
--
2.49.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net 2/2] tcp_metrics: use ssthresh value from dst if there is no metrics
2025-06-13 10:20 ` [PATCH net 2/2] tcp_metrics: use ssthresh value from dst if there is no metrics Petr Tesarik
@ 2025-06-17 10:48 ` Paolo Abeni
2025-06-17 11:56 ` Petr Tesarik
0 siblings, 1 reply; 9+ messages in thread
From: Paolo Abeni @ 2025-06-17 10:48 UTC (permalink / raw)
To: Petr Tesarik, David S. Miller, Eric Dumazet, Neal Cardwell,
Kuniyuki Iwashima, open list:NETWORKING [TCP]
Cc: David Ahern, Jakub Kicinski, open list
On 6/13/25 12:20 PM, Petr Tesarik wrote:
> @@ -537,6 +537,9 @@ void tcp_init_metrics(struct sock *sk)
>
> inet_csk(sk)->icsk_rto = TCP_TIMEOUT_FALLBACK;
> }
> +
> + if (tp->snd_ssthresh > tp->snd_cwnd_clamp)
> + tp->snd_ssthresh = tp->snd_cwnd_clamp;
I don't think we can do this unconditionally, as other parts of the TCP
stack check explicitly for TCP_INFINITE_SSTHRESH.
/P
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
2025-06-13 10:20 ` [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry Petr Tesarik
@ 2025-06-17 11:00 ` Paolo Abeni
2025-06-17 11:39 ` Petr Tesarik
0 siblings, 1 reply; 9+ messages in thread
From: Paolo Abeni @ 2025-06-17 11:00 UTC (permalink / raw)
To: Petr Tesarik, David S. Miller, Eric Dumazet, Neal Cardwell,
Kuniyuki Iwashima, open list:NETWORKING [TCP]
Cc: David Ahern, Jakub Kicinski, open list
On 6/13/25 12:20 PM, Petr Tesarik wrote:
> diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
> index 4251670e328c8..dd8f3457bd72e 100644
> --- a/net/ipv4/tcp_metrics.c
> +++ b/net/ipv4/tcp_metrics.c
> @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
> if (!dst)
> goto reset;
>
> + if (dst_metric_locked(dst, RTAX_CWND))
> + tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
> +
> rcu_read_lock();
> tm = tcp_get_metrics(sk, dst, false);
> if (!tm) {
> @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
> goto reset;
> }
>
> - if (tcp_metric_locked(tm, TCP_METRIC_CWND))
> - tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
> -
> val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
> 0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
> if (val) {
It's unclear to me why you drop the tcp_metric_get() here. It looks like
the above will cause a functional regression, with unlocked cached
metrics no longer taking effects?
/P
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
2025-06-17 11:00 ` Paolo Abeni
@ 2025-06-17 11:39 ` Petr Tesarik
2025-06-18 17:01 ` Petr Tesarik
0 siblings, 1 reply; 9+ messages in thread
From: Petr Tesarik @ 2025-06-17 11:39 UTC (permalink / raw)
To: Paolo Abeni
Cc: David S. Miller, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
open list:NETWORKING [TCP], David Ahern, Jakub Kicinski,
open list
On Tue, 17 Jun 2025 13:00:53 +0200
Paolo Abeni <pabeni@redhat.com> wrote:
> On 6/13/25 12:20 PM, Petr Tesarik wrote:
> > diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
> > index 4251670e328c8..dd8f3457bd72e 100644
> > --- a/net/ipv4/tcp_metrics.c
> > +++ b/net/ipv4/tcp_metrics.c
> > @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
> > if (!dst)
> > goto reset;
> >
> > + if (dst_metric_locked(dst, RTAX_CWND))
> > + tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
> > +
> > rcu_read_lock();
> > tm = tcp_get_metrics(sk, dst, false);
> > if (!tm) {
> > @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
> > goto reset;
> > }
> >
> > - if (tcp_metric_locked(tm, TCP_METRIC_CWND))
> > - tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
> > -
> > val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
> > 0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
> > if (val) {
>
> It's unclear to me why you drop the tcp_metric_get() here. It looks like
> the above will cause a functional regression, with unlocked cached
> metrics no longer taking effects?
Unlocked cached TCP_METRIC_CWND has never taken effects. As you can
see, tcp_metric_get() was executed only if the metric was locked.
In fact, the cwnd parameter in the route does not have any effect
either. It's even documented in the manual page of ip-route(8):
cwnd NUMBER (Linux 2.3.15+ only)
the clamp for congestion window. It is ignored if
the lock flag is not used.
Note that here is also an initcwnd parameter, and I'm not changing
anything about the handling of that one.
Now, if you think that this TCP_METRIC_CWND is quite useless, then I
wholeheartedly agree with you, but we cannot simply remove it, as it
has become part of uapi, defined in include/uapi/linux/tcp_metrics.h.
Petr T
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net 2/2] tcp_metrics: use ssthresh value from dst if there is no metrics
2025-06-17 10:48 ` Paolo Abeni
@ 2025-06-17 11:56 ` Petr Tesarik
0 siblings, 0 replies; 9+ messages in thread
From: Petr Tesarik @ 2025-06-17 11:56 UTC (permalink / raw)
To: Paolo Abeni
Cc: David S. Miller, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
open list:NETWORKING [TCP], David Ahern, Jakub Kicinski,
open list
On Tue, 17 Jun 2025 12:48:30 +0200
Paolo Abeni <pabeni@redhat.com> wrote:
> On 6/13/25 12:20 PM, Petr Tesarik wrote:
> > @@ -537,6 +537,9 @@ void tcp_init_metrics(struct sock *sk)
> >
> > inet_csk(sk)->icsk_rto = TCP_TIMEOUT_FALLBACK;
> > }
> > +
> > + if (tp->snd_ssthresh > tp->snd_cwnd_clamp)
> > + tp->snd_ssthresh = tp->snd_cwnd_clamp;
>
> I don't think we can do this unconditionally, as other parts of the TCP
> stack check explicitly for TCP_INFINITE_SSTHRESH.
Good catch! I noticed that the condition can never be true unless the
congestion window is explicitly clamped, but you're right that it is a
valid combination to lock the maximum cwnd but keep the initial TCP Slow
Start.
I'll fix that in v2.
Thank you,
Petr T
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
2025-06-17 11:39 ` Petr Tesarik
@ 2025-06-18 17:01 ` Petr Tesarik
2025-06-19 8:22 ` Paolo Abeni
0 siblings, 1 reply; 9+ messages in thread
From: Petr Tesarik @ 2025-06-18 17:01 UTC (permalink / raw)
To: Paolo Abeni
Cc: David S. Miller, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
open list:NETWORKING [TCP], David Ahern, Jakub Kicinski,
open list
On Tue, 17 Jun 2025 13:39:35 +0200
Petr Tesarik <ptesarik@suse.com> wrote:
> On Tue, 17 Jun 2025 13:00:53 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
>
> > On 6/13/25 12:20 PM, Petr Tesarik wrote:
> > > diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
> > > index 4251670e328c8..dd8f3457bd72e 100644
> > > --- a/net/ipv4/tcp_metrics.c
> > > +++ b/net/ipv4/tcp_metrics.c
> > > @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
> > > if (!dst)
> > > goto reset;
> > >
> > > + if (dst_metric_locked(dst, RTAX_CWND))
> > > + tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
> > > +
> > > rcu_read_lock();
> > > tm = tcp_get_metrics(sk, dst, false);
> > > if (!tm) {
> > > @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
> > > goto reset;
> > > }
> > >
> > > - if (tcp_metric_locked(tm, TCP_METRIC_CWND))
> > > - tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
> > > -
> > > val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
> > > 0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
> > > if (val) {
> >
> > It's unclear to me why you drop the tcp_metric_get() here. It looks like
> > the above will cause a functional regression, with unlocked cached
> > metrics no longer taking effects?
>
> Unlocked cached TCP_METRIC_CWND has never taken effects. As you can
> see, tcp_metric_get() was executed only if the metric was locked.
>
> In fact, the cwnd parameter in the route does not have any effect
> either. It's even documented in the manual page of ip-route(8):
>
> cwnd NUMBER (Linux 2.3.15+ only)
> the clamp for congestion window. It is ignored if
> the lock flag is not used.
>
> Note that here is also an initcwnd parameter, and I'm not changing
> anything about the handling of that one.
>
> Now, if you think that this TCP_METRIC_CWND is quite useless, then I
> wholeheartedly agree with you, but we cannot simply remove it, as it
> has become part of uapi, defined in include/uapi/linux/tcp_metrics.h.
As an afterthought, I'm not quite sure about the semantics of this
metric. The value calculated in tcp_update_metrics() has never been
used for anything since it was introduced in 2.3.15. So there is:
- either a locked cwnd value, which is used to clamp cwnd on a
route and never updated,
- or an unlocked cwnd value, which is updated upon connection
termination but never used for anything by the kernel.
OK, the unlocked value can be read by userspace, but what is it
supposed to mean? The manual page for route-tcp_metrics(8) says: “CWND
metric value”, which sounds like the author did not have a clue either.
Unless someone here _has_ a clue, I'll just leave it as is, except the
clamp value will be taken from the routing table, as it makes no sense
to wait until the very same value propagates to a tcp_metrics_block
(where it is then never updated).
Petr T
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
2025-06-18 17:01 ` Petr Tesarik
@ 2025-06-19 8:22 ` Paolo Abeni
0 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2025-06-19 8:22 UTC (permalink / raw)
To: Petr Tesarik
Cc: David S. Miller, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
open list:NETWORKING [TCP], David Ahern, Jakub Kicinski,
open list
On 6/18/25 7:01 PM, Petr Tesarik wrote:
> On Tue, 17 Jun 2025 13:39:35 +0200 Petr Tesarik <ptesarik@suse.com> wrote:
>> On Tue, 17 Jun 2025 13:00:53 +0200 Paolo Abeni <pabeni@redhat.com> wrote:
>>> On 6/13/25 12:20 PM, Petr Tesarik wrote:
>>>> diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
>>>> index 4251670e328c8..dd8f3457bd72e 100644
>>>> --- a/net/ipv4/tcp_metrics.c
>>>> +++ b/net/ipv4/tcp_metrics.c
>>>> @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
>>>> if (!dst)
>>>> goto reset;
>>>>
>>>> + if (dst_metric_locked(dst, RTAX_CWND))
>>>> + tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
>>>> +
>>>> rcu_read_lock();
>>>> tm = tcp_get_metrics(sk, dst, false);
>>>> if (!tm) {
>>>> @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
>>>> goto reset;
>>>> }
>>>>
>>>> - if (tcp_metric_locked(tm, TCP_METRIC_CWND))
>>>> - tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
>>>> -
>>>> val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
>>>> 0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
>>>> if (val) {
>>>
>>> It's unclear to me why you drop the tcp_metric_get() here. It looks like
>>> the above will cause a functional regression, with unlocked cached
>>> metrics no longer taking effects?
>>
>> Unlocked cached TCP_METRIC_CWND has never taken effects. As you can
>> see, tcp_metric_get() was executed only if the metric was locked.
Uhm... the locking propagation from dst to tcp storage was not so
straight forward to me, I missed it. Please be a little more verbose
about this part in the commit message.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-06-19 8:22 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-13 10:20 [PATCH net 0/2] tcp_metrics: fix hanlding of route options Petr Tesarik
2025-06-13 10:20 ` [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry Petr Tesarik
2025-06-17 11:00 ` Paolo Abeni
2025-06-17 11:39 ` Petr Tesarik
2025-06-18 17:01 ` Petr Tesarik
2025-06-19 8:22 ` Paolo Abeni
2025-06-13 10:20 ` [PATCH net 2/2] tcp_metrics: use ssthresh value from dst if there is no metrics Petr Tesarik
2025-06-17 10:48 ` Paolo Abeni
2025-06-17 11:56 ` Petr Tesarik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).