netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us
@ 2024-05-30 15:34 Kevin Yang
  2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
  2024-05-30 15:34 ` [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
  0 siblings, 2 replies; 6+ messages in thread
From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw)
  To: David Miller, Eric Dumazet, Jakub Kicinski
  Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu,
	Kevin Yang

Adding a sysctl knob to allow user to specify a default
rto_min at socket init time.

After this patch series, the rto_min will has multiple sources:
route option has the highest precedence, followed by the
TCP_BPF_RTO_MIN socket option, followed by this new
tcp_rto_min_us sysctl.

v2:
    fit line width to 80 column.

v1: https://lore.kernel.org/netdev/20240528171320.1332292-1-yyd@google.com/

Kevin Yang (2):
  tcp: derive delack_max with tcp_rto_min helper
  tcp: add sysctl_tcp_rto_min_us

 Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
 include/net/netns/ipv4.h               |  1 +
 net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
 net/ipv4/tcp.c                         |  4 +++-
 net/ipv4/tcp_ipv4.c                    |  1 +
 net/ipv4/tcp_output.c                  | 11 ++---------
 6 files changed, 28 insertions(+), 10 deletions(-)

-- 
2.45.1.288.g0e0cd299f1-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper
  2024-05-30 15:34 [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
@ 2024-05-30 15:34 ` Kevin Yang
  2024-06-01 13:28   ` Simon Horman
  2024-06-01 14:56   ` David Laight
  2024-05-30 15:34 ` [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
  1 sibling, 2 replies; 6+ messages in thread
From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw)
  To: David Miller, Eric Dumazet, Jakub Kicinski
  Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu,
	Kevin Yang

Rto_min now has multiple souces, ordered by preprecedence high to
low: ip route option rto_min, icsk->icsk_rto_min.

When derive delack_max from rto_min, we should not only use ip
route option, but should use tcp_rto_min helper to get the correct
rto_min.

Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
---
 net/ipv4/tcp_output.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f97e098f18a5..b44f639a9fa6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4163,16 +4163,9 @@ EXPORT_SYMBOL(tcp_connect);
 
 u32 tcp_delack_max(const struct sock *sk)
 {
-	const struct dst_entry *dst = __sk_dst_get(sk);
-	u32 delack_max = inet_csk(sk)->icsk_delack_max;
-
-	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) {
-		u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
-		u32 delack_from_rto_min = max_t(int, 1, rto_min - 1);
+	u32 delack_from_rto_min = max_t(int, 1, tcp_rto_min(sk) - 1);
 
-		delack_max = min_t(u32, delack_max, delack_from_rto_min);
-	}
-	return delack_max;
+	return min_t(u32, inet_csk(sk)->icsk_delack_max, delack_from_rto_min);
 }
 
 /* Send out a delayed ack, the caller does the policy checking
-- 
2.45.1.288.g0e0cd299f1-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us
  2024-05-30 15:34 [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
  2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
@ 2024-05-30 15:34 ` Kevin Yang
  1 sibling, 0 replies; 6+ messages in thread
From: Kevin Yang @ 2024-05-30 15:34 UTC (permalink / raw)
  To: David Miller, Eric Dumazet, Jakub Kicinski
  Cc: netdev, ncardwell, ycheng, kerneljasonxing, pabeni, tonylu,
	Kevin Yang

Adding a sysctl knob to allow user to specify a default
rto_min at socket init time, other than using the hard
coded 200ms default rto_min.

Note that the rto_min route option has the highest precedence
for configuring this setting, followed by the TCP_BPF_RTO_MIN
socket option, followed by the tcp_rto_min_us sysctl.

Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
 include/net/netns/ipv4.h               |  1 +
 net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
 net/ipv4/tcp.c                         |  4 +++-
 net/ipv4/tcp_ipv4.c                    |  1 +
 5 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index bd50df6a5a42..6e99eccdb837 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1196,6 +1196,19 @@ tcp_pingpong_thresh - INTEGER
 
 	Default: 1
 
+tcp_rto_min_us - INTEGER
+	Minimal TCP retransmission timeout (in microseconds). Note that the
+	rto_min route option has the highest precedence for configuring this
+	setting, followed by the TCP_BPF_RTO_MIN socket option, followed by
+	this tcp_rto_min_us sysctl.
+
+	The recommended practice is to use a value less or equal to 200000
+	microseconds.
+
+	Possible Values: 1 - INT_MAX
+
+	Default: 200000
+
 UDP variables
 =============
 
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index c356c458b340..a91bb971f901 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -170,6 +170,7 @@ struct netns_ipv4 {
 	u8 sysctl_tcp_sack;
 	u8 sysctl_tcp_window_scaling;
 	u8 sysctl_tcp_timestamps;
+	int sysctl_tcp_rto_min_us;
 	u8 sysctl_tcp_recovery;
 	u8 sysctl_tcp_thin_linear_timeouts;
 	u8 sysctl_tcp_slow_start_after_idle;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d7892f34a15b..bb64c0ef092d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1503,6 +1503,14 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dou8vec_minmax,
 		.extra1		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "tcp_rto_min_us",
+		.data		= &init_net.ipv4.sysctl_tcp_rto_min_us,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE,
+	},
 };
 
 static __net_init int ipv4_sysctl_init_net(struct net *net)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5fa68e7f6ddb..fa43aaacd92b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -420,6 +420,7 @@ void tcp_init_sock(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
+	int rto_min_us;
 
 	tp->out_of_order_queue = RB_ROOT;
 	sk->tcp_rtx_queue = RB_ROOT;
@@ -428,7 +429,8 @@ void tcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&tp->tsorted_sent_queue);
 
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
-	icsk->icsk_rto_min = TCP_RTO_MIN;
+	rto_min_us = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rto_min_us);
+	icsk->icsk_rto_min = usecs_to_jiffies(rto_min_us);
 	icsk->icsk_delack_max = TCP_DELACK_MAX;
 	tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
 	minmax_reset(&tp->rtt_min, tcp_jiffies32, ~0U);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 041c7eda9abe..49a5e2c4ec18 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3506,6 +3506,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_shrink_window = 0;
 
 	net->ipv4.sysctl_tcp_pingpong_thresh = 1;
+	net->ipv4.sysctl_tcp_rto_min_us = jiffies_to_usecs(TCP_RTO_MIN);
 
 	return 0;
 }
-- 
2.45.1.288.g0e0cd299f1-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper
  2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
@ 2024-06-01 13:28   ` Simon Horman
  2024-06-01 14:56   ` David Laight
  1 sibling, 0 replies; 6+ messages in thread
From: Simon Horman @ 2024-06-01 13:28 UTC (permalink / raw)
  To: Kevin Yang
  Cc: David Miller, Eric Dumazet, Jakub Kicinski, netdev, ncardwell,
	ycheng, kerneljasonxing, pabeni, tonylu

On Thu, May 30, 2024 at 03:34:35PM +0000, Kevin Yang wrote:
> Rto_min now has multiple souces, ordered by preprecedence high to
> low: ip route option rto_min, icsk->icsk_rto_min.

Hi Kevin,

If you have to respin for some other reason: souces -> sources

Flagged by checkpatch.pl --codespell

...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper
  2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
  2024-06-01 13:28   ` Simon Horman
@ 2024-06-01 14:56   ` David Laight
  2024-06-03 21:34     ` Kevin Yang
  1 sibling, 1 reply; 6+ messages in thread
From: David Laight @ 2024-06-01 14:56 UTC (permalink / raw)
  To: 'Kevin Yang', David Miller, Eric Dumazet, Jakub Kicinski
  Cc: netdev@vger.kernel.org, ncardwell@google.com, ycheng@google.com,
	kerneljasonxing@gmail.com, pabeni@redhat.com,
	tonylu@linux.alibaba.com

From: Kevin Yang
> Sent: 30 May 2024 16:35
> To: David Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski
> 
> Rto_min now has multiple souces, ordered by preprecedence high to
> low: ip route option rto_min, icsk->icsk_rto_min.
> 
> When derive delack_max from rto_min, we should not only use ip
> route option, but should use tcp_rto_min helper to get the correct
> rto_min.
...
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f97e098f18a5..b44f639a9fa6 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -4163,16 +4163,9 @@ EXPORT_SYMBOL(tcp_connect);
> 
>  u32 tcp_delack_max(const struct sock *sk)
>  {
> -	const struct dst_entry *dst = __sk_dst_get(sk);
> -	u32 delack_max = inet_csk(sk)->icsk_delack_max;
> -
> -	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) {
> -		u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
> -		u32 delack_from_rto_min = max_t(int, 1, rto_min - 1);
> +	u32 delack_from_rto_min = max_t(int, 1, tcp_rto_min(sk) - 1);

That max_t() is more horrid than most.
Perhaps:
		= max(tcp_rto_min(sk), 2) - 1;

> 
> -		delack_max = min_t(u32, delack_max, delack_from_rto_min);
> -	}
> -	return delack_max;
> +	return min_t(u32, inet_csk(sk)->icsk_delack_max, delack_from_rto_min);

Can that just be a min() ??

	David

>  }
> 
>  /* Send out a delayed ack, the caller does the policy checking
> --
> 2.45.1.288.g0e0cd299f1-goog
> 

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper
  2024-06-01 14:56   ` David Laight
@ 2024-06-03 21:34     ` Kevin Yang
  0 siblings, 0 replies; 6+ messages in thread
From: Kevin Yang @ 2024-06-03 21:34 UTC (permalink / raw)
  To: David Laight
  Cc: David Miller, Eric Dumazet, Jakub Kicinski,
	netdev@vger.kernel.org, ncardwell@google.com, ycheng@google.com,
	kerneljasonxing@gmail.com, pabeni@redhat.com,
	tonylu@linux.alibaba.com

thanks for the nice suggestions, sent v3
https://lore.kernel.org/netdev/20240603213054.3883725-1-yyd@google.com/

On Sat, Jun 1, 2024 at 10:56 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Kevin Yang
> > Sent: 30 May 2024 16:35
> > To: David Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski
> >
> > Rto_min now has multiple souces, ordered by preprecedence high to
> > low: ip route option rto_min, icsk->icsk_rto_min.
> >
> > When derive delack_max from rto_min, we should not only use ip
> > route option, but should use tcp_rto_min helper to get the correct
> > rto_min.
> ...
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index f97e098f18a5..b44f639a9fa6 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -4163,16 +4163,9 @@ EXPORT_SYMBOL(tcp_connect);
> >
> >  u32 tcp_delack_max(const struct sock *sk)
> >  {
> > -     const struct dst_entry *dst = __sk_dst_get(sk);
> > -     u32 delack_max = inet_csk(sk)->icsk_delack_max;
> > -
> > -     if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) {
> > -             u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
> > -             u32 delack_from_rto_min = max_t(int, 1, rto_min - 1);
> > +     u32 delack_from_rto_min = max_t(int, 1, tcp_rto_min(sk) - 1);
>
> That max_t() is more horrid than most.
> Perhaps:
>                 = max(tcp_rto_min(sk), 2) - 1;
>
> >
> > -             delack_max = min_t(u32, delack_max, delack_from_rto_min);
> > -     }
> > -     return delack_max;
> > +     return min_t(u32, inet_csk(sk)->icsk_delack_max, delack_from_rto_min);
>
> Can that just be a min() ??
>
>         David
>
> >  }
> >
> >  /* Send out a delayed ack, the caller does the policy checking
> > --
> > 2.45.1.288.g0e0cd299f1-goog
> >
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-06-03 21:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-30 15:34 [PATCH net-next v2 0/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang
2024-05-30 15:34 ` [PATCH net-next v2 1/2] tcp: derive delack_max with tcp_rto_min helper Kevin Yang
2024-06-01 13:28   ` Simon Horman
2024-06-01 14:56   ` David Laight
2024-06-03 21:34     ` Kevin Yang
2024-05-30 15:34 ` [PATCH net-next v2 2/2] tcp: add sysctl_tcp_rto_min_us Kevin Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).