* [PATCH net-next v3 0/2] support TCP_RTO_MIN_US and TCP_DELACK_MAX_US for set/getsockopt
From: Jason Xing @ 2025-03-16 2:27 UTC
To: davem, edumazet, kuba, pabeni, dsahern, horms, kuniyu, ncardwell
Cc: netdev, Jason Xing
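Add set/getsockopt support for TCP_RTO_MIN_US and TCP_DELACK_MAX_US, so that the minimum RTO and the maximum delayed ACK timeout can be tuned per socket, in microseconds.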
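As a rough user-space illustration of the two new options, the sketch below sets both timers on a TCP socket and reads back what the kernel stored. It assumes the option numbers 45 and 46 that this series adds to include/uapi/linux/tcp.h (defined locally in case the installed headers predate them); the helper names are hypothetical. Because the kernel keeps both values in jiffies, the read-back value may be rounded up to a whole tick; per the patches, values outside roughly 2 jiffies to 200 ms should fail with EINVAL, and kernels without this series should reject the options with ENOPROTOOPT.

```c
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Option numbers as added to include/uapi/linux/tcp.h by this series;
 * installed libc headers may not define them yet. */
#ifndef TCP_RTO_MIN_US
#define TCP_RTO_MIN_US 45
#endif
#ifndef TCP_DELACK_MAX_US
#define TCP_DELACK_MAX_US 46
#endif

/* Set one microsecond-valued TCP option, then read back the stored value
 * (after jiffies rounding). Returns 0 or a negative errno. */
static int set_and_read_us(int fd, int optname, int usecs, int *readback)
{
	socklen_t len = sizeof(*readback);

	if (setsockopt(fd, IPPROTO_TCP, optname, &usecs, sizeof(usecs)) < 0)
		return -errno;
	if (getsockopt(fd, IPPROTO_TCP, optname, readback, &len) < 0)
		return -errno;
	return 0;
}

/* Hypothetical helper: apply both per-socket timers, stopping at the
 * first failure (-ENOPROTOOPT on older kernels, -EINVAL if out of range). */
int tcp_set_fast_timers(int fd, int rto_min_us, int delack_max_us,
			int *rto_min_out, int *delack_max_out)
{
	int err = set_and_read_us(fd, TCP_RTO_MIN_US, rto_min_us, rto_min_out);

	if (err)
		return err;
	return set_and_read_us(fd, TCP_DELACK_MAX_US, delack_max_us,
			       delack_max_out);
}
```

For example, calling it on a socket from socket(AF_INET, SOCK_STREAM, 0) with 20000 and 20000 requests 20 ms for both timers; that is a whole number of jiffies at HZ=100, 250 and 1000, so the read-back values should equal the request on those configurations.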
Jason Xing (2):
tcp: support TCP_RTO_MIN_US for set/getsockopt use
tcp: support TCP_DELACK_MAX_US for set/getsockopt use
Documentation/networking/ip-sysctl.rst | 4 ++--
include/net/tcp.h | 2 +-
include/uapi/linux/tcp.h | 2 ++
net/ipv4/tcp.c | 32 ++++++++++++++++++++++++--
net/ipv4/tcp_output.c | 2 +-
5 files changed, 36 insertions(+), 6 deletions(-)
--
2.43.5
* [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use
From: Jason Xing @ 2025-03-16 2:27 UTC
To: davem, edumazet, kuba, pabeni, dsahern, horms, kuniyu, ncardwell
Cc: netdev, Jason Xing
Support adjusting RTO MIN for socket level by using setsockopt().
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
Documentation/networking/ip-sysctl.rst | 4 ++--
include/net/tcp.h | 2 +-
include/uapi/linux/tcp.h | 1 +
net/ipv4/tcp.c | 16 +++++++++++++++-
4 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 054561f8dcae..5c63ab928b97 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1229,8 +1229,8 @@ tcp_pingpong_thresh - INTEGER
tcp_rto_min_us - INTEGER
Minimal TCP retransmission timeout (in microseconds). Note that the
rto_min route option has the highest precedence for configuring this
- setting, followed by the TCP_BPF_RTO_MIN socket option, followed by
- this tcp_rto_min_us sysctl.
+ setting, followed by the TCP_BPF_RTO_MIN and TCP_RTO_MIN_US socket
+ options, followed by this tcp_rto_min_us sysctl.
The recommended practice is to use a value less or equal to 200000
microseconds.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7207c52b1fc9..6a7aab854b86 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -806,7 +806,7 @@ u32 tcp_delack_max(const struct sock *sk);
static inline u32 tcp_rto_min(const struct sock *sk)
{
const struct dst_entry *dst = __sk_dst_get(sk);
- u32 rto_min = inet_csk(sk)->icsk_rto_min;
+ u32 rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 32a27b4a5020..b2476cf7058e 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -137,6 +137,7 @@ enum {
#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */
#define TCP_RTO_MAX_MS 44 /* max rto time in ms */
+#define TCP_RTO_MIN_US 45 /* min rto time in us */
#define TCP_REPAIR_ON 1
#define TCP_REPAIR_OFF 0
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 46951e749308..f2249d712fcc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3352,7 +3352,7 @@ int tcp_disconnect(struct sock *sk, int flags)
icsk->icsk_probes_out = 0;
icsk->icsk_probes_tstamp = 0;
icsk->icsk_rto = TCP_TIMEOUT_INIT;
- icsk->icsk_rto_min = TCP_RTO_MIN;
+ WRITE_ONCE(icsk->icsk_rto_min, TCP_RTO_MIN);
icsk->icsk_delack_max = TCP_DELACK_MAX;
tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
tcp_snd_cwnd_set(tp, TCP_INIT_CWND);
@@ -3833,6 +3833,14 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
WRITE_ONCE(inet_csk(sk)->icsk_rto_max, msecs_to_jiffies(val));
return 0;
+ case TCP_RTO_MIN_US: {
+ int rto_min = usecs_to_jiffies(val);
+
+ if (rto_min > TCP_RTO_MIN || rto_min < TCP_TIMEOUT_MIN)
+ return -EINVAL;
+ WRITE_ONCE(inet_csk(sk)->icsk_rto_min, rto_min);
+ return 0;
+ }
}
sockopt_lock_sock(sk);
@@ -4672,6 +4680,12 @@ int do_tcp_getsockopt(struct sock *sk, int level,
case TCP_RTO_MAX_MS:
val = jiffies_to_msecs(tcp_rto_max(sk));
break;
+ case TCP_RTO_MIN_US: {
+ int rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
+
+ val = jiffies_to_usecs(rto_min);
+ break;
+ }
default:
return -ENOPROTOOPT;
}
--
2.43.5
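As a rough model of the TCP_RTO_MIN_US setsockopt() case in patch 1/2: the value is converted from microseconds to jiffies before the range check, so both the accepted range and the value reported back by getsockopt() depend on HZ. The sketch assumes the kernel definitions TCP_RTO_MIN = HZ/5 and TCP_TIMEOUT_MIN = 2 jiffies, and mirrors the round-up division used by usecs_to_jiffies() and the multiplication used by jiffies_to_usecs() when HZ divides 1,000,000 (as 100, 250 and 1000 do). Overflow handling and negative inputs are ignored, and the function names are hypothetical.

```c
#include <errno.h>

/* Round-up microseconds -> jiffies, as usecs_to_jiffies() does when
 * HZ divides 1,000,000. */
unsigned long model_us_to_jiffies(unsigned int us, unsigned int hz)
{
	unsigned long us_per_tick = 1000000UL / hz;

	return (us + us_per_tick - 1) / us_per_tick;
}

/* Jiffies -> microseconds, as jiffies_to_usecs() does for such HZ. */
unsigned int model_jiffies_to_us(unsigned long j, unsigned int hz)
{
	return (unsigned int)(j * (1000000UL / hz));
}

/* Mirrors the patch: accept only [TCP_TIMEOUT_MIN, TCP_RTO_MIN] jiffies. */
int model_set_rto_min_us(unsigned int us, unsigned int hz,
			 unsigned long *jiffies_out)
{
	unsigned long tcp_rto_min = hz / 5;	/* TCP_RTO_MIN: 200 ms */
	unsigned long tcp_timeout_min = 2;	/* TCP_TIMEOUT_MIN: 2 jiffies */
	unsigned long j = model_us_to_jiffies(us, hz);

	if (j > tcp_rto_min || j < tcp_timeout_min)
		return -EINVAL;
	*jiffies_out = j;
	return 0;
}
```

Under these assumptions, asking for 5000 us on an HZ=250 kernel is accepted but stored as 2 jiffies, so getsockopt(TCP_RTO_MIN_US) would report 8000 us, while the same request on an HZ=100 kernel rounds to a single jiffy and is rejected.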
* [PATCH net-next v3 2/2] tcp: support TCP_DELACK_MAX_US for set/getsockopt use
From: Jason Xing @ 2025-03-16 2:27 UTC
To: davem, edumazet, kuba, pabeni, dsahern, horms, kuniyu, ncardwell
Cc: netdev, Jason Xing
Support adjusting the maximum delayed ACK timeout per socket via setsockopt(), and reading it back via getsockopt().
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
include/uapi/linux/tcp.h | 1 +
net/ipv4/tcp.c | 16 +++++++++++++++-
net/ipv4/tcp_output.c | 2 +-
3 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index b2476cf7058e..2377e22f2c4b 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -138,6 +138,7 @@ enum {
#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */
#define TCP_RTO_MAX_MS 44 /* max rto time in ms */
#define TCP_RTO_MIN_US 45 /* min rto time in us */
+#define TCP_DELACK_MAX_US 46 /* max delayed ack time in us */
#define TCP_REPAIR_ON 1
#define TCP_REPAIR_OFF 0
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f2249d712fcc..d12a663e13be 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3353,7 +3353,7 @@ int tcp_disconnect(struct sock *sk, int flags)
icsk->icsk_probes_tstamp = 0;
icsk->icsk_rto = TCP_TIMEOUT_INIT;
WRITE_ONCE(icsk->icsk_rto_min, TCP_RTO_MIN);
- icsk->icsk_delack_max = TCP_DELACK_MAX;
+ WRITE_ONCE(icsk->icsk_delack_max, TCP_DELACK_MAX);
tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
tcp_snd_cwnd_set(tp, TCP_INIT_CWND);
tp->snd_cwnd_cnt = 0;
@@ -3841,6 +3841,14 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
WRITE_ONCE(inet_csk(sk)->icsk_rto_min, rto_min);
return 0;
}
+ case TCP_DELACK_MAX_US: {
+ int delack_max = usecs_to_jiffies(val);
+
+ if (delack_max > TCP_DELACK_MAX || delack_max < TCP_TIMEOUT_MIN)
+ return -EINVAL;
+ WRITE_ONCE(inet_csk(sk)->icsk_delack_max, delack_max);
+ return 0;
+ }
}
sockopt_lock_sock(sk);
@@ -4686,6 +4694,12 @@ int do_tcp_getsockopt(struct sock *sk, int level,
val = jiffies_to_usecs(rto_min);
break;
}
+ case TCP_DELACK_MAX_US: {
+ int delack_max = READ_ONCE(inet_csk(sk)->icsk_delack_max);
+
+ val = jiffies_to_usecs(delack_max);
+ break;
+ }
default:
return -ENOPROTOOPT;
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 24e56bf96747..65aa26d65987 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4179,7 +4179,7 @@ u32 tcp_delack_max(const struct sock *sk)
{
u32 delack_from_rto_min = max(tcp_rto_min(sk), 2) - 1;
- return min(inet_csk(sk)->icsk_delack_max, delack_from_rto_min);
+ return min(READ_ONCE(inet_csk(sk)->icsk_delack_max), delack_from_rto_min);
}
/* Send out a delayed ack, the caller does the policy checking
--
2.43.5
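Patch 2/2 changes how icsk_delack_max is set, but the effective ceiling still comes from tcp_delack_max() in the tcp_output.c hunk, which caps it at one jiffy below the effective minimum RTO (treating that minimum as at least 2 jiffies). The model below reproduces that arithmetic in jiffies; the function name is hypothetical. Note that getsockopt(TCP_DELACK_MAX_US) in this patch returns the stored icsk_delack_max, not this clamped value.

```c
/* Model of tcp_delack_max(); all values are in jiffies. */
unsigned int model_delack_max(unsigned int icsk_delack_max,
			      unsigned int rto_min)
{
	/* max(tcp_rto_min(sk), 2) - 1 */
	unsigned int floor2 = rto_min > 2 ? rto_min : 2;
	unsigned int delack_from_rto_min = floor2 - 1;

	/* min(icsk_delack_max, delack_from_rto_min) */
	return icsk_delack_max < delack_from_rto_min ? icsk_delack_max
						     : delack_from_rto_min;
}
```

For example, with an effective RTO minimum of 5 jiffies, a stored maximum of 200 jiffies yields an effective 4 jiffies, while a stored 40 jiffies under a 200-jiffy RTO minimum stays 40.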
* Re: [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use
From: Eric Dumazet @ 2025-03-17 8:19 UTC
To: Jason Xing; +Cc: davem, kuba, pabeni, dsahern, horms, kuniyu, ncardwell, netdev
On Sun, Mar 16, 2025 at 3:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Support adjusting RTO MIN for socket level by using setsockopt().
This changelog is small :/
You should clearly state that this option has no effect if the route
has a RTAX_RTO_MIN attribute set.
Also document what is the default socket value after a socket() system
call and/or accept() in the changelog.
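The RTAX_RTO_MIN point follows from tcp_rto_min() as modified in this patch: when the cached route carries a locked RTAX_RTO_MIN metric, that value replaces whatever icsk_rto_min holds, so TCP_RTO_MIN_US has no effect on the RTO for such routes. The sketch below models that lookup with hypothetical struct and field names, all values in jiffies. It assumes, in line with the precedence described in the ip-sysctl.rst hunk, that the tcp_rto_min_us sysctl only supplies the initial icsk_rto_min and that TCP_BPF_RTO_MIN and TCP_RTO_MIN_US both overwrite that same field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the route metric state tcp_rto_min() checks. */
struct route_model {
	bool rto_min_locked;	/* dst_metric_locked(dst, RTAX_RTO_MIN) */
	unsigned int rto_min;	/* dst_metric_rtt(dst, RTAX_RTO_MIN) */
};

/* Hypothetical stand-in for the socket side. */
struct sock_model {
	unsigned int icsk_rto_min;	/* sysctl default or socket option */
	const struct route_model *dst;	/* NULL when no route is cached */
};

/* Mirrors tcp_rto_min(): a locked route metric wins over the socket value. */
unsigned int model_tcp_rto_min(const struct sock_model *sk)
{
	unsigned int rto_min = sk->icsk_rto_min;

	if (sk->dst != NULL && sk->dst->rto_min_locked)
		rto_min = sk->dst->rto_min;
	return rto_min;
}
```

Since getsockopt(TCP_RTO_MIN_US) in this patch reads icsk_rto_min directly rather than calling tcp_rto_min(), it would still report the socket value even when a locked route metric overrides it.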
> [...]
> @@ -4672,6 +4680,12 @@ int do_tcp_getsockopt(struct sock *sk, int level,
> case TCP_RTO_MAX_MS:
> val = jiffies_to_msecs(tcp_rto_max(sk));
> break;
> + case TCP_RTO_MIN_US: {
> + int rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
> +
> + val = jiffies_to_usecs(rto_min);
Reuse val directly, no need for a temporary variable, there is no
fancy computation on it.

		val = jiffies_to_usecs(READ_ONCE(inet_csk(sk)->icsk_rto_min));
		break;
> + break;
> + }
> default:
> return -ENOPROTOOPT;
> }
> --
> 2.43.5
>
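Both new setsockopt() cases sit before sockopt_lock_sock() in do_tcp_setsockopt(), so the writer does not hold the socket lock and readers such as tcp_rto_min() can race with it; that is why the patch annotates the accesses with WRITE_ONCE() and READ_ONCE(). The sketch below is only a user-space analogue, assuming C11 relaxed atomics as a stand-in for those kernel macros (they are not equivalent under the kernel memory model) and assuming HZ=1000. It also shows the getsockopt side in the single-expression shape suggested above. All names are hypothetical.

```c
#include <stdatomic.h>

/* Assumption for the model: HZ=1000, so one jiffy is 1000 us. */
#define MODEL_HZ 1000u
#define MODEL_USEC_PER_JIFFY (1000000u / MODEL_HZ)

/* Stand-in for icsk_rto_min (jiffies), starting at TCP_RTO_MIN = HZ/5. */
static _Atomic unsigned int model_icsk_rto_min = MODEL_HZ / 5;

/* setsockopt side: a relaxed store as the WRITE_ONCE() analogue. */
void model_set_rto_min(unsigned int jiffies)
{
	atomic_store_explicit(&model_icsk_rto_min, jiffies,
			      memory_order_relaxed);
}

/* getsockopt side: convert the single READ_ONCE()-style load directly,
 * without a temporary variable. */
int model_get_rto_min_us(void)
{
	return (int)(atomic_load_explicit(&model_icsk_rto_min,
					  memory_order_relaxed) *
		     MODEL_USEC_PER_JIFFY);
}
```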
* Re: [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use
From: Jason Xing @ 2025-03-17 12:04 UTC
To: Eric Dumazet
Cc: davem, kuba, pabeni, dsahern, horms, kuniyu, ncardwell, netdev
On Mon, Mar 17, 2025 at 4:19 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sun, Mar 16, 2025 at 3:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Support adjusting RTO MIN for socket level by using setsockopt().
>
>
> This changelog is small :/
>
> You should clearly state that this option has no effect if the route
> has a RTAX_RTO_MIN attribute set.
>
> Also document what is the default socket value after a socket() system
> call and/or accept() in the changelog.
Thanks for the review.
Will do that in the other patch as well.
> [...]
> > @@ -4672,6 +4680,12 @@ int do_tcp_getsockopt(struct sock *sk, int level,
> > case TCP_RTO_MAX_MS:
> > val = jiffies_to_msecs(tcp_rto_max(sk));
> > break;
> > + case TCP_RTO_MIN_US: {
> > + int rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
> > +
> > + val = jiffies_to_usecs(rto_min);
>
> Reuse val directly, no need for a temporary variable, there is no
> fancy computation on it.
>
> 		val = jiffies_to_usecs(READ_ONCE(inet_csk(sk)->icsk_rto_min));
> 		break;
>
Thanks, I will handle it.
Thanks,
Jason
> > + break;
> > + }
> > default:
> > return -ENOPROTOOPT;
> > }
> > --
> > 2.43.5
> >