* [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us
@ 2024-05-30 15:34 Kevin Yang
2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
2024-05-30 15:34 ` [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
0 siblings, 2 replies; 6+ messages in thread
From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Jakub Kicinski
Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu,
Kevin Yang
Adding a sysctl knob to allow user to specify a default
rto_min at socket init time.
After this patch series, the rto_min will has multiple sources:
route option has the highest precedence, followed by the
TCP_BPF_RTO_MIN socket option, followed by this new
tcp_rto_min_us sysctl.
v2:
fit line width to 80 column.
v1: https://lore.kernel.org/netdev/20240528171320.1332292-1-yyd@google.com/
Kevin Yang (2):
tcp: derive delack_max with tcp_rto_min helper
tcp: add sysctl_tcp_rto_min_us
Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
include/net/netns/ipv4.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 8 ++++++++
net/ipv4/tcp.c | 4 +++-
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/tcp_output.c | 11 ++---------
6 files changed, 28 insertions(+), 10 deletions(-)
--
2.45.1.288.g0e0cd299f1-goog
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper 2024-05-30 15:34 [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang @ 2024-05-30 15:34 ` Kevin Yang 2024-06-01 13:28 ` Simon Horman 2024-06-01 14:56 ` David Laight 2024-05-30 15:34 ` [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang 1 sibling, 2 replies; 6+ messages in thread From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw) To: David Miller, Eric Dumazet, Jakub Kicinski Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu, Kevin Yang Rto_min now has multiple souces, ordered by preprecedence high to low: ip route option rto_min, icsk->icsk_rto_min. When derive delack_max from rto_min, we should not only use ip route option, but should use tcp_rto_min helper to get the correct rto_min. Signed-off-by: Kevin Yang <yyd@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> --- net/ipv4/tcp_output.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index f97e098f18a5..b44f639a9fa6 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -4163,16 +4163,9 @@ EXPORT_SYMBOL(tcp_connect); u32 tcp_delack_max(const struct sock *sk) { - const struct dst_entry *dst = __sk_dst_get(sk); - u32 delack_max = inet_csk(sk)->icsk_delack_max; - - if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) { - u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN); - u32 delack_from_rto_min = max_t(int, 1, rto_min - 1); + u32 delack_from_rto_min = max_t(int, 1, tcp_rto_min(sk) - 1); - delack_max = min_t(u32, delack_max, delack_from_rto_min); - } - return delack_max; + return min_t(u32, inet_csk(sk)->icsk_delack_max, delack_from_rto_min); } /* Send out a delayed ack, the caller does the policy checking -- 2.45.1.288.g0e0cd299f1-goog ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper 2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang @ 2024-06-01 13:28 ` Simon Horman 2024-06-01 14:56 ` David Laight 1 sibling, 0 replies; 6+ messages in thread From: Simon Horman @ 2024-06-01 13:28 UTC (permalink / raw) To: Kevin Yang Cc: David Miller, Eric Dumazet, Jakub Kicinski, netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu On Thu, May 30, 2024 at 03:34:35PM +0000, Kevin Yang wrote: > Rto_min now has multiple souces, ordered by preprecedence high to > low: ip route option rto_min, icsk->icsk_rto_min. Hi Kevin, If you have to respin for some other reason: souces -> sources Flagged by checkpatch.pl --codespell ... ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper 2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang 2024-06-01 13:28 ` Simon Horman @ 2024-06-01 14:56 ` David Laight 2024-06-03 21:34 ` Kevin Yang 1 sibling, 1 reply; 6+ messages in thread From: David Laight @ 2024-06-01 14:56 UTC (permalink / raw) To: 'Kevin Yang', David Miller, Eric Dumazet, Jakub Kicinski Cc: netdev@vger.kernel.org, ncardwell@google.com, ycheng@google.com, kerneljasonxing@gmail.com, pabeni@redhat.com, tonylu@linux.alibaba.com From: Kevin Yang > Sent: 30 May 2024 16:35 > To: David Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski > > Rto_min now has multiple souces, ordered by preprecedence high to > low: ip route option rto_min, icsk->icsk_rto_min. > > When derive delack_max from rto_min, we should not only use ip > route option, but should use tcp_rto_min helper to get the correct > rto_min. ... > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index f97e098f18a5..b44f639a9fa6 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -4163,16 +4163,9 @@ EXPORT_SYMBOL(tcp_connect); > > u32 tcp_delack_max(const struct sock *sk) > { > - const struct dst_entry *dst = __sk_dst_get(sk); > - u32 delack_max = inet_csk(sk)->icsk_delack_max; > - > - if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) { > - u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN); > - u32 delack_from_rto_min = max_t(int, 1, rto_min - 1); > + u32 delack_from_rto_min = max_t(int, 1, tcp_rto_min(sk) - 1); That max_t() is more horrid than most. Perhaps: = max(tcp_rto_min(sk), 2) - 1; > > - delack_max = min_t(u32, delack_max, delack_from_rto_min); > - } > - return delack_max; > + return min_t(u32, inet_csk(sk)->icsk_delack_max, delack_from_rto_min); Can that just be a min() ?? David > } > > /* Send out a delayed ack, the caller does the policy checking > -- > 2.45.1.288.g0e0cd299f1-goog > - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper 2024-06-01 14:56 ` David Laight @ 2024-06-03 21:34 ` Kevin Yang 0 siblings, 0 replies; 6+ messages in thread From: Kevin Yang @ 2024-06-03 21:34 UTC (permalink / raw) To: David Laight Cc: David Miller, Eric Dumazet, Jakub Kicinski, netdev@vger.kernel.org, ncardwell@google.com, ycheng@google.com, kerneljasonxing@gmail.com, pabeni@redhat.com, tonylu@linux.alibaba.com thanks for the nice suggestions, sent v3 https://lore.kernel.org/netdev/20240603213054.3883725-1-yyd@google.com/ On Sat, Jun 1, 2024 at 10:56 AM David Laight <David.Laight@aculab.com> wrote: > > From: Kevin Yang > > Sent: 30 May 2024 16:35 > > To: David Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski > > > > Rto_min now has multiple souces, ordered by preprecedence high to > > low: ip route option rto_min, icsk->icsk_rto_min. > > > > When derive delack_max from rto_min, we should not only use ip > > route option, but should use tcp_rto_min helper to get the correct > > rto_min. > ... > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > index f97e098f18a5..b44f639a9fa6 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -4163,16 +4163,9 @@ EXPORT_SYMBOL(tcp_connect); > > > > u32 tcp_delack_max(const struct sock *sk) > > { > > - const struct dst_entry *dst = __sk_dst_get(sk); > > - u32 delack_max = inet_csk(sk)->icsk_delack_max; > > - > > - if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) { > > - u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN); > > - u32 delack_from_rto_min = max_t(int, 1, rto_min - 1); > > + u32 delack_from_rto_min = max_t(int, 1, tcp_rto_min(sk) - 1); > > That max_t() is more horrid than most. > Perhaps: > = max(tcp_rto_min(sk), 2) - 1; > > > > > - delack_max = min_t(u32, delack_max, delack_from_rto_min); > > - } > > - return delack_max; > > + return min_t(u32, inet_csk(sk)->icsk_delack_max, delack_from_rto_min); > > Can that just be a min() ?? > > David > > > } > > > > /* Send out a delayed ack, the caller does the policy checking > > -- > > 2.45.1.288.g0e0cd299f1-goog > > > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us 2024-05-30 15:34 [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang 2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang @ 2024-05-30 15:34 ` Kevin Yang 1 sibling, 0 replies; 6+ messages in thread From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw) To: David Miller, Eric Dumazet, Jakub Kicinski Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu, Kevin Yang Adding a sysctl knob to allow user to specify a default rto_min at socket init time, other than using the hard coded 200ms default rto_min. Note that the rto_min route option has the highest precedence for configuring this setting, followed by the TCP_BPF_RTO_MIN socket option, followed by the tcp_rto_min_us sysctl. Signed-off-by: Kevin Yang <yyd@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> --- Documentation/networking/ip-sysctl.rst | 13 +++++++++++++ include/net/netns/ipv4.h | 1 + net/ipv4/sysctl_net_ipv4.c | 8 ++++++++ net/ipv4/tcp.c | 4 +++- net/ipv4/tcp_ipv4.c | 1 + 5 files changed, 26 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index bd50df6a5a42..6e99eccdb837 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -1196,6 +1196,19 @@ tcp_pingpong_thresh - INTEGER Default: 1 +tcp_rto_min_us - INTEGER + Minimal TCP retransmission timeout (in microseconds). Note that the + rto_min route option has the highest precedence for configuring this + setting, followed by the TCP_BPF_RTO_MIN socket option, followed by + this tcp_rto_min_us sysctl. + + The recommended practice is to use a value less or equal to 200000 + microseconds. + + Possible Values: 1 - INT_MAX + + Default: 200000 + UDP variables ============= diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index c356c458b340..a91bb971f901 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -170,6 +170,7 @@ struct netns_ipv4 { u8 sysctl_tcp_sack; u8 sysctl_tcp_window_scaling; u8 sysctl_tcp_timestamps; + int sysctl_tcp_rto_min_us; u8 sysctl_tcp_recovery; u8 sysctl_tcp_thin_linear_timeouts; u8 sysctl_tcp_slow_start_after_idle; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index d7892f34a15b..bb64c0ef092d 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -1503,6 +1503,14 @@ static struct ctl_table ipv4_net_table[] = { .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ONE, }, + { + .procname = "tcp_rto_min_us", + .data = &init_net.ipv4.sysctl_tcp_rto_min_us, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ONE, + }, }; static __net_init int ipv4_sysctl_init_net(struct net *net) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5fa68e7f6ddb..fa43aaacd92b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -420,6 +420,7 @@ void tcp_init_sock(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); + int rto_min_us; tp->out_of_order_queue = RB_ROOT; sk->tcp_rtx_queue = RB_ROOT; @@ -428,7 +429,8 @@ void tcp_init_sock(struct sock *sk) INIT_LIST_HEAD(&tp->tsorted_sent_queue); icsk->icsk_rto = TCP_TIMEOUT_INIT; - icsk->icsk_rto_min = TCP_RTO_MIN; + rto_min_us = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rto_min_us); + icsk->icsk_rto_min = usecs_to_jiffies(rto_min_us); icsk->icsk_delack_max = TCP_DELACK_MAX; tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT); minmax_reset(&tp->rtt_min, tcp_jiffies32, ~0U); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 041c7eda9abe..49a5e2c4ec18 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -3506,6 +3506,7 @@ static int __net_init tcp_sk_init(struct net *net) net->ipv4.sysctl_tcp_shrink_window = 0; net->ipv4.sysctl_tcp_pingpong_thresh = 1; + net->ipv4.sysctl_tcp_rto_min_us = jiffies_to_usecs(TCP_RTO_MIN); return 0; } -- 2.45.1.288.g0e0cd299f1-goog ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-06-03 21:34 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-30 15:34 [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang 2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang 2024-06-01 13:28 ` Simon Horman 2024-06-01 14:56 ` David Laight 2024-06-03 21:34 ` Kevin Yang 2024-05-30 15:34 ` [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).