* [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us
@ 2024-05-30 15:34 Kevin Yang
2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
2024-05-30 15:34 ` [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
0 siblings, 2 replies; 6+ messages in thread
From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Jakub Kicinski
Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu,
Kevin Yang
Add a sysctl knob to allow users to specify a default
rto_min at socket init time.
After this patch series, rto_min will have multiple sources:
the rto_min route option has the highest precedence, followed
by the TCP_BPF_RTO_MIN socket option, followed by this new
tcp_rto_min_us sysctl.
v2:
wrap lines at 80 columns.
v1: https://lore.kernel.org/netdev/20240528171320.1332292-1-yyd@google.com/
Kevin Yang (2):
tcp: derive delack_max with tcp_rto_min helper
tcp: add sysctl_tcp_rto_min_us
Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
include/net/netns/ipv4.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 8 ++++++++
net/ipv4/tcp.c | 4 +++-
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/tcp_output.c | 11 ++---------
6 files changed, 28 insertions(+), 10 deletions(-)
--
2.45.1.288.g0e0cd299f1-goog
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper
2024-05-30 15:34 [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
@ 2024-05-30 15:34 ` Kevin Yang
2024-06-01 13:28 ` Simon Horman
2024-06-01 14:56 ` David Laight
2024-05-30 15:34 ` [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
1 sibling, 2 replies; 6+ messages in thread
From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Jakub Kicinski
Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu,
Kevin Yang
Rto_min now has multiple sources, ordered by precedence from high to
low: ip route option rto_min, icsk->icsk_rto_min.
When deriving delack_max from rto_min, we should not consult only the
ip route option; we should use the tcp_rto_min() helper to get the
effective rto_min.
Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
---
net/ipv4/tcp_output.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f97e098f18a5..b44f639a9fa6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4163,16 +4163,9 @@ EXPORT_SYMBOL(tcp_connect);
 
 u32 tcp_delack_max(const struct sock *sk)
 {
-	const struct dst_entry *dst = __sk_dst_get(sk);
-	u32 delack_max = inet_csk(sk)->icsk_delack_max;
-
-	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) {
-		u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
-		u32 delack_from_rto_min = max_t(int, 1, rto_min - 1);
+	u32 delack_from_rto_min = max_t(int, 1, tcp_rto_min(sk) - 1);
 
-		delack_max = min_t(u32, delack_max, delack_from_rto_min);
-	}
-	return delack_max;
+	return min_t(u32, inet_csk(sk)->icsk_delack_max, delack_from_rto_min);
 }
 
 /* Send out a delayed ack, the caller does the policy checking
--
2.45.1.288.g0e0cd299f1-goog
* [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us
2024-05-30 15:34 [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
@ 2024-05-30 15:34 ` Kevin Yang
1 sibling, 0 replies; 6+ messages in thread
From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Jakub Kicinski
Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu,
Kevin Yang
Add a sysctl knob to allow users to specify a default
rto_min at socket init time, instead of using the hard-coded
200ms default rto_min.
Note that the rto_min route option has the highest precedence
for configuring this setting, followed by the TCP_BPF_RTO_MIN
socket option, followed by the tcp_rto_min_us sysctl.
Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
include/net/netns/ipv4.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 8 ++++++++
net/ipv4/tcp.c | 4 +++-
net/ipv4/tcp_ipv4.c | 1 +
5 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index bd50df6a5a42..6e99eccdb837 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1196,6 +1196,19 @@ tcp_pingpong_thresh - INTEGER
 
 	Default: 1
 
+tcp_rto_min_us - INTEGER
+	Minimal TCP retransmission timeout (in microseconds). Note that the
+	rto_min route option has the highest precedence for configuring this
+	setting, followed by the TCP_BPF_RTO_MIN socket option, followed by
+	this tcp_rto_min_us sysctl.
+
+	The recommended practice is to use a value less than or equal to
+	200000 microseconds.
+
+	Possible Values: 1 - INT_MAX
+
+	Default: 200000
+
 UDP variables
 =============
 
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index c356c458b340..a91bb971f901 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -170,6 +170,7 @@ struct netns_ipv4 {
 	u8 sysctl_tcp_sack;
 	u8 sysctl_tcp_window_scaling;
 	u8 sysctl_tcp_timestamps;
+	int sysctl_tcp_rto_min_us;
 	u8 sysctl_tcp_recovery;
 	u8 sysctl_tcp_thin_linear_timeouts;
 	u8 sysctl_tcp_slow_start_after_idle;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d7892f34a15b..bb64c0ef092d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1503,6 +1503,14 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dou8vec_minmax,
 		.extra1		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "tcp_rto_min_us",
+		.data		= &init_net.ipv4.sysctl_tcp_rto_min_us,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE,
+	},
 };
 
 static __net_init int ipv4_sysctl_init_net(struct net *net)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5fa68e7f6ddb..fa43aaacd92b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -420,6 +420,7 @@ void tcp_init_sock(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
+	int rto_min_us;
 
 	tp->out_of_order_queue = RB_ROOT;
 	sk->tcp_rtx_queue = RB_ROOT;
@@ -428,7 +429,8 @@ void tcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&tp->tsorted_sent_queue);
 
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
-	icsk->icsk_rto_min = TCP_RTO_MIN;
+	rto_min_us = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rto_min_us);
+	icsk->icsk_rto_min = usecs_to_jiffies(rto_min_us);
 	icsk->icsk_delack_max = TCP_DELACK_MAX;
 	tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
 	minmax_reset(&tp->rtt_min, tcp_jiffies32, ~0U);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 041c7eda9abe..49a5e2c4ec18 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3506,6 +3506,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_shrink_window = 0;
 
 	net->ipv4.sysctl_tcp_pingpong_thresh = 1;
+	net->ipv4.sysctl_tcp_rto_min_us = jiffies_to_usecs(TCP_RTO_MIN);
 
 	return 0;
 }
--
2.45.1.288.g0e0cd299f1-goog
* Re: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper
2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
@ 2024-06-01 13:28 ` Simon Horman
2024-06-01 14:56 ` David Laight
1 sibling, 0 replies; 6+ messages in thread
From: Simon Horman @ 2024-06-01 13:28 UTC (permalink / raw)
To: Kevin Yang
Cc: David Miller, Eric Dumazet, Jakub Kicinski, netdev, ncardwell,
ycheng, kerneljasonxing, pabeni, tonylu
On Thu, May 30, 2024 at 03:34:35PM +0000, Kevin Yang wrote:
> Rto_min now has multiple souces, ordered by preprecedence high to
> low: ip route option rto_min, icsk->icsk_rto_min.
Hi Kevin,
If you have to respin for some other reason: souces -> sources
Flagged by checkpatch.pl --codespell
...
* RE: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper
2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
2024-06-01 13:28 ` Simon Horman
@ 2024-06-01 14:56 ` David Laight
2024-06-03 21:34 ` Kevin Yang
1 sibling, 1 reply; 6+ messages in thread
From: David Laight @ 2024-06-01 14:56 UTC (permalink / raw)
To: 'Kevin Yang', David Miller, Eric Dumazet, Jakub Kicinski
Cc: netdev@vger.kernel.org, ncardwell@google.com, ycheng@google.com,
kerneljasonxing@gmail.com, pabeni@redhat.com,
tonylu@linux.alibaba.com
From: Kevin Yang
> Sent: 30 May 2024 16:35
> To: David Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski
>
> Rto_min now has multiple souces, ordered by preprecedence high to
> low: ip route option rto_min, icsk->icsk_rto_min.
>
> When derive delack_max from rto_min, we should not only use ip
> route option, but should use tcp_rto_min helper to get the correct
> rto_min.
...
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f97e098f18a5..b44f639a9fa6 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -4163,16 +4163,9 @@ EXPORT_SYMBOL(tcp_connect);
>
> u32 tcp_delack_max(const struct sock *sk)
> {
> - const struct dst_entry *dst = __sk_dst_get(sk);
> - u32 delack_max = inet_csk(sk)->icsk_delack_max;
> -
> - if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) {
> - u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
> - u32 delack_from_rto_min = max_t(int, 1, rto_min - 1);
> + u32 delack_from_rto_min = max_t(int, 1, tcp_rto_min(sk) - 1);
That max_t() is more horrid than most.
Perhaps:
= max(tcp_rto_min(sk), 2) - 1;
>
> - delack_max = min_t(u32, delack_max, delack_from_rto_min);
> - }
> - return delack_max;
> + return min_t(u32, inet_csk(sk)->icsk_delack_max, delack_from_rto_min);
Can that just be a min() ??
David
> }
>
> /* Send out a delayed ack, the caller does the policy checking
> --
> 2.45.1.288.g0e0cd299f1-goog
>
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
* Re: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper
2024-06-01 14:56 ` David Laight
@ 2024-06-03 21:34 ` Kevin Yang
0 siblings, 0 replies; 6+ messages in thread
From: Kevin Yang @ 2024-06-03 21:34 UTC (permalink / raw)
To: David Laight
Cc: David Miller, Eric Dumazet, Jakub Kicinski,
netdev@vger.kernel.org, ncardwell@google.com, ycheng@google.com,
kerneljasonxing@gmail.com, pabeni@redhat.com,
tonylu@linux.alibaba.com
thanks for the nice suggestions, sent v3
https://lore.kernel.org/netdev/20240603213054.3883725-1-yyd@google.com/