* [PATCH net-next v3 0/2] support TCP_RTO_MIN_US and TCP_DELACK_MAX_US for set/getsockopt
@ 2025-03-16  2:27 Jason Xing
  2025-03-16  2:27 ` [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use Jason Xing
  2025-03-16  2:27 ` [PATCH net-next v3 2/2] tcp: support TCP_DELACK_MAX_US " Jason Xing
From: Jason Xing @ 2025-03-16  2:27 UTC
  To: davem, edumazet, kuba, pabeni, dsahern, horms, kuniyu, ncardwell
  Cc: netdev, Jason Xing

Add set/getsockopt supports for TCP_RTO_MIN_US and TCP_DELACK_MAX_US.

Jason Xing (2):
  tcp: support TCP_RTO_MIN_US for set/getsockopt use
  tcp: support TCP_DELACK_MAX_US for set/getsockopt use

 Documentation/networking/ip-sysctl.rst |  4 ++--
 include/net/tcp.h                      |  2 +-
 include/uapi/linux/tcp.h               |  2 ++
 net/ipv4/tcp.c                         | 32 ++++++++++++++++++++++++--
 net/ipv4/tcp_output.c                  |  2 +-
 5 files changed, 36 insertions(+), 6 deletions(-)

-- 
2.43.5

* [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use
  2025-03-16  2:27 [PATCH net-next v3 0/2] support TCP_RTO_MIN_US and TCP_DELACK_MAX_US for set/getsockopt Jason Xing
@ 2025-03-16  2:27 ` Jason Xing
  2025-03-17  8:19   ` Eric Dumazet
  2025-03-16  2:27 ` [PATCH net-next v3 2/2] tcp: support TCP_DELACK_MAX_US " Jason Xing
From: Jason Xing @ 2025-03-16  2:27 UTC
  To: davem, edumazet, kuba, pabeni, dsahern, horms, kuniyu, ncardwell
  Cc: netdev, Jason Xing

Support adjusting RTO MIN for socket level by using setsockopt().

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 Documentation/networking/ip-sysctl.rst |  4 ++--
 include/net/tcp.h                      |  2 +-
 include/uapi/linux/tcp.h               |  1 +
 net/ipv4/tcp.c                         | 16 +++++++++++++++-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 054561f8dcae..5c63ab928b97 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1229,8 +1229,8 @@ tcp_pingpong_thresh - INTEGER
 tcp_rto_min_us - INTEGER
 	Minimal TCP retransmission timeout (in microseconds). Note that the
 	rto_min route option has the highest precedence for configuring this
-	setting, followed by the TCP_BPF_RTO_MIN socket option, followed by
-	this tcp_rto_min_us sysctl.
+	setting, followed by the TCP_BPF_RTO_MIN and TCP_RTO_MIN_US socket
+	options, followed by this tcp_rto_min_us sysctl.
 
 	The recommended practice is to use a value less or equal to 200000
 	microseconds.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7207c52b1fc9..6a7aab854b86 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -806,7 +806,7 @@ u32 tcp_delack_max(const struct sock *sk);
 static inline u32 tcp_rto_min(const struct sock *sk)
 {
 	const struct dst_entry *dst = __sk_dst_get(sk);
-	u32 rto_min = inet_csk(sk)->icsk_rto_min;
+	u32 rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
 
 	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
 		rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 32a27b4a5020..b2476cf7058e 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -137,6 +137,7 @@ enum {
 
 #define TCP_IS_MPTCP		43	/* Is MPTCP being used? */
 #define TCP_RTO_MAX_MS		44	/* max rto time in ms */
+#define TCP_RTO_MIN_US		45	/* min rto time in us */
 
 #define TCP_REPAIR_ON		1
 #define TCP_REPAIR_OFF		0
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 46951e749308..f2249d712fcc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3352,7 +3352,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	icsk->icsk_probes_out = 0;
 	icsk->icsk_probes_tstamp = 0;
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
-	icsk->icsk_rto_min = TCP_RTO_MIN;
+	WRITE_ONCE(icsk->icsk_rto_min, TCP_RTO_MIN);
 	icsk->icsk_delack_max = TCP_DELACK_MAX;
 	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
 	tcp_snd_cwnd_set(tp, TCP_INIT_CWND);
@@ -3833,6 +3833,14 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 			return -EINVAL;
 		WRITE_ONCE(inet_csk(sk)->icsk_rto_max, msecs_to_jiffies(val));
 		return 0;
+	case TCP_RTO_MIN_US: {
+		int rto_min = usecs_to_jiffies(val);
+
+		if (rto_min > TCP_RTO_MIN || rto_min < TCP_TIMEOUT_MIN)
+			return -EINVAL;
+		WRITE_ONCE(inet_csk(sk)->icsk_rto_min, rto_min);
+		return 0;
+	}
 	}
 
 	sockopt_lock_sock(sk);
@@ -4672,6 +4680,12 @@ int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_RTO_MAX_MS:
 		val = jiffies_to_msecs(tcp_rto_max(sk));
 		break;
+	case TCP_RTO_MIN_US: {
+		int rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
+
+		val = jiffies_to_usecs(rto_min);
+		break;
+	}
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
2.43.5

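Editorial illustration, not part of the posted series: a minimal userspace sketch of how an application might set and read back the option this patch adds. It assumes the option number 45 from the uapi hunk of this v3 posting, which could change before the series is merged and may be missing from installed headers. The helper names set_tcp_rto_min_us() and get_tcp_rto_min_us() are hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Assumption: value copied from the uapi hunk of this v3 patch. Installed
 * headers may not define it, and the number may differ once merged. */
#ifndef TCP_RTO_MIN_US
#define TCP_RTO_MIN_US 45
#endif

/* Hypothetical helper: request a per-socket minimum RTO in microseconds.
 * Returns 0 on success or a negative errno value on failure. */
static int set_tcp_rto_min_us(int fd, int usec)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_RTO_MIN_US, &usec, sizeof(usec)) < 0)
		return -errno;
	return 0;
}

/* Hypothetical helper: read the per-socket value back, in microseconds.
 * Returns 0 on success or a negative errno value on failure. */
static int get_tcp_rto_min_us(int fd, int *usec)
{
	socklen_t len = sizeof(*usec);

	if (getsockopt(fd, IPPROTO_TCP, TCP_RTO_MIN_US, usec, &len) < 0)
		return -errno;
	return 0;
}
```

On a kernel without this patch the calls should fail with ENOPROTOOPT. Per the setsockopt hunk, a value whose jiffies conversion falls outside the range TCP_TIMEOUT_MIN to TCP_RTO_MIN is rejected with EINVAL, so a caller would likely want to treat both errors as "keep the default".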
* Re: [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use
  2025-03-16  2:27 ` [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use Jason Xing
@ 2025-03-17  8:19   ` Eric Dumazet
  2025-03-17 12:04     ` Jason Xing
From: Eric Dumazet @ 2025-03-17  8:19 UTC
  To: Jason Xing; +Cc: davem, kuba, pabeni, dsahern, horms, kuniyu, ncardwell, netdev

On Sun, Mar 16, 2025 at 3:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Support adjusting RTO MIN for socket level by using setsockopt().

This changelog is small :/

You should clearly state that this option has no effect if the route
has a RTAX_RTO_MIN attribute set.

Also document what is the default socket value after a socket() system
call and/or accept() in the changelog.

>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
>  Documentation/networking/ip-sysctl.rst |  4 ++--
>  include/net/tcp.h                      |  2 +-
>  include/uapi/linux/tcp.h               |  1 +
>  net/ipv4/tcp.c                         | 16 +++++++++++++++-
>  4 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index 054561f8dcae..5c63ab928b97 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1229,8 +1229,8 @@ tcp_pingpong_thresh - INTEGER
>  tcp_rto_min_us - INTEGER
>  	Minimal TCP retransmission timeout (in microseconds). Note that the
>  	rto_min route option has the highest precedence for configuring this
> -	setting, followed by the TCP_BPF_RTO_MIN socket option, followed by
> -	this tcp_rto_min_us sysctl.
> +	setting, followed by the TCP_BPF_RTO_MIN and TCP_RTO_MIN_US socket
> +	options, followed by this tcp_rto_min_us sysctl.
>
>  	The recommended practice is to use a value less or equal to 200000
>  	microseconds.
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 7207c52b1fc9..6a7aab854b86 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -806,7 +806,7 @@ u32 tcp_delack_max(const struct sock *sk);
>  static inline u32 tcp_rto_min(const struct sock *sk)
>  {
>  	const struct dst_entry *dst = __sk_dst_get(sk);
> -	u32 rto_min = inet_csk(sk)->icsk_rto_min;
> +	u32 rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
>
>  	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
>  		rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 32a27b4a5020..b2476cf7058e 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -137,6 +137,7 @@ enum {
>
>  #define TCP_IS_MPTCP		43	/* Is MPTCP being used? */
>  #define TCP_RTO_MAX_MS		44	/* max rto time in ms */
> +#define TCP_RTO_MIN_US		45	/* min rto time in us */
>
>  #define TCP_REPAIR_ON		1
>  #define TCP_REPAIR_OFF		0
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 46951e749308..f2249d712fcc 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3352,7 +3352,7 @@ int tcp_disconnect(struct sock *sk, int flags)
>  	icsk->icsk_probes_out = 0;
>  	icsk->icsk_probes_tstamp = 0;
>  	icsk->icsk_rto = TCP_TIMEOUT_INIT;
> -	icsk->icsk_rto_min = TCP_RTO_MIN;
> +	WRITE_ONCE(icsk->icsk_rto_min, TCP_RTO_MIN);
>  	icsk->icsk_delack_max = TCP_DELACK_MAX;
>  	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
>  	tcp_snd_cwnd_set(tp, TCP_INIT_CWND);
> @@ -3833,6 +3833,14 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>  			return -EINVAL;
>  		WRITE_ONCE(inet_csk(sk)->icsk_rto_max, msecs_to_jiffies(val));
>  		return 0;
> +	case TCP_RTO_MIN_US: {
> +		int rto_min = usecs_to_jiffies(val);
> +
> +		if (rto_min > TCP_RTO_MIN || rto_min < TCP_TIMEOUT_MIN)
> +			return -EINVAL;
> +		WRITE_ONCE(inet_csk(sk)->icsk_rto_min, rto_min);
> +		return 0;
> +	}
>  	}
>
>  	sockopt_lock_sock(sk);
> @@ -4672,6 +4680,12 @@ int do_tcp_getsockopt(struct sock *sk, int level,
>  	case TCP_RTO_MAX_MS:
>  		val = jiffies_to_msecs(tcp_rto_max(sk));
>  		break;
> +	case TCP_RTO_MIN_US: {
> +		int rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
> +
> +		val = jiffies_to_usecs(rto_min);

Reuse val directly, no need for a temporary variable, there is no
fancy computation on it.

           val = jiffies_to_usecs(READ_ONCE(inet_csk(sk)->icsk_rto_min));
           break;

> +		break;
> +	}
>  	default:
>  		return -ENOPROTOOPT;
>  	}
> --
> 2.43.5
>

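Editorial illustration, not part of the thread: the review point about RTAX_RTO_MIN follows from the tcp_rto_min() hunk quoted above, where a locked route metric replaces the per-socket value. The sketch below is a hypothetical userspace model of only the lines visible in that hunk; the struct and function names are invented for the example, and the booleans stand in for the kernel's dst lookups.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs standing in for kernel state (all values in jiffies). */
struct rto_min_inputs {
	uint32_t sock_rto_min;  /* models inet_csk(sk)->icsk_rto_min */
	bool route_locked;      /* models dst && dst_metric_locked(dst, RTAX_RTO_MIN) */
	uint32_t route_rto_min; /* models dst_metric_rtt(dst, RTAX_RTO_MIN) */
};

/* Model of the selection visible in the tcp_rto_min() hunk: start from the
 * socket value, then let a locked route metric override it. */
static uint32_t model_tcp_rto_min(const struct rto_min_inputs *in)
{
	uint32_t rto_min = in->sock_rto_min;

	if (in->route_locked)
		rto_min = in->route_rto_min;
	return rto_min;
}
```

In this model a value stored through TCP_RTO_MIN_US only reaches the result when route_locked is false, which is the behaviour the review asks the changelog to spell out.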
* Re: [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use
  2025-03-17  8:19   ` Eric Dumazet
@ 2025-03-17 12:04     ` Jason Xing
From: Jason Xing @ 2025-03-17 12:04 UTC
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, dsahern, horms, kuniyu, ncardwell, netdev

On Mon, Mar 17, 2025 at 4:19 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sun, Mar 16, 2025 at 3:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Support adjusting RTO MIN for socket level by using setsockopt().
>
>
> This changelog is small :/
>
> You should clearly state that this option has no effect if the route
> has a RTAX_RTO_MIN attribute set.
>
> Also document what is the default socket value after a socket() system
> call and/or accept() in the changelog.

Thanks for the review. Will do it in another patch as well.

>
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> > ---
> >  Documentation/networking/ip-sysctl.rst |  4 ++--
> >  include/net/tcp.h                      |  2 +-
> >  include/uapi/linux/tcp.h               |  1 +
> >  net/ipv4/tcp.c                         | 16 +++++++++++++++-
> >  4 files changed, 19 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> > index 054561f8dcae..5c63ab928b97 100644
> > --- a/Documentation/networking/ip-sysctl.rst
> > +++ b/Documentation/networking/ip-sysctl.rst
> > @@ -1229,8 +1229,8 @@ tcp_pingpong_thresh - INTEGER
> >  tcp_rto_min_us - INTEGER
> >  	Minimal TCP retransmission timeout (in microseconds). Note that the
> >  	rto_min route option has the highest precedence for configuring this
> > -	setting, followed by the TCP_BPF_RTO_MIN socket option, followed by
> > -	this tcp_rto_min_us sysctl.
> > +	setting, followed by the TCP_BPF_RTO_MIN and TCP_RTO_MIN_US socket
> > +	options, followed by this tcp_rto_min_us sysctl.
> >
> >  	The recommended practice is to use a value less or equal to 200000
> >  	microseconds.
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 7207c52b1fc9..6a7aab854b86 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -806,7 +806,7 @@ u32 tcp_delack_max(const struct sock *sk);
> >  static inline u32 tcp_rto_min(const struct sock *sk)
> >  {
> >  	const struct dst_entry *dst = __sk_dst_get(sk);
> > -	u32 rto_min = inet_csk(sk)->icsk_rto_min;
> > +	u32 rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
> >
> >  	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
> >  		rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
> > diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> > index 32a27b4a5020..b2476cf7058e 100644
> > --- a/include/uapi/linux/tcp.h
> > +++ b/include/uapi/linux/tcp.h
> > @@ -137,6 +137,7 @@ enum {
> >
> >  #define TCP_IS_MPTCP		43	/* Is MPTCP being used? */
> >  #define TCP_RTO_MAX_MS		44	/* max rto time in ms */
> > +#define TCP_RTO_MIN_US		45	/* min rto time in us */
> >
> >  #define TCP_REPAIR_ON		1
> >  #define TCP_REPAIR_OFF		0
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 46951e749308..f2249d712fcc 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -3352,7 +3352,7 @@ int tcp_disconnect(struct sock *sk, int flags)
> >  	icsk->icsk_probes_out = 0;
> >  	icsk->icsk_probes_tstamp = 0;
> >  	icsk->icsk_rto = TCP_TIMEOUT_INIT;
> > -	icsk->icsk_rto_min = TCP_RTO_MIN;
> > +	WRITE_ONCE(icsk->icsk_rto_min, TCP_RTO_MIN);
> >  	icsk->icsk_delack_max = TCP_DELACK_MAX;
> >  	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
> >  	tcp_snd_cwnd_set(tp, TCP_INIT_CWND);
> > @@ -3833,6 +3833,14 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
> >  			return -EINVAL;
> >  		WRITE_ONCE(inet_csk(sk)->icsk_rto_max, msecs_to_jiffies(val));
> >  		return 0;
> > +	case TCP_RTO_MIN_US: {
> > +		int rto_min = usecs_to_jiffies(val);
>
> > +
> > +		if (rto_min > TCP_RTO_MIN || rto_min < TCP_TIMEOUT_MIN)
> > +			return -EINVAL;
> > +		WRITE_ONCE(inet_csk(sk)->icsk_rto_min, rto_min);
> > +		return 0;
> > +	}
> >  	}
> >
> >  	sockopt_lock_sock(sk);
> > @@ -4672,6 +4680,12 @@ int do_tcp_getsockopt(struct sock *sk, int level,
> >  	case TCP_RTO_MAX_MS:
> >  		val = jiffies_to_msecs(tcp_rto_max(sk));
> >  		break;
> > +	case TCP_RTO_MIN_US: {
> > +		int rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min);
> > +
> > +		val = jiffies_to_usecs(rto_min);
>
> Reuse val directly, no need for a temporary variable, there is no
> fancy computation on it.
>
>            val =
> jiffies_to_usecs(READ_ONCE(inet_csk(sk)->icsk_rto_min));
>            break;
>

Thanks, I will handle it.

Thanks,
Jason

> > +		break;
> > +	}
> >  	default:
> >  		return -ENOPROTOOPT;
> >  	}
> > --
> > 2.43.5
> >

* [PATCH net-next v3 2/2] tcp: support TCP_DELACK_MAX_US for set/getsockopt use
  2025-03-16  2:27 [PATCH net-next v3 0/2] support TCP_RTO_MIN_US and TCP_DELACK_MAX_US for set/getsockopt Jason Xing
  2025-03-16  2:27 ` [PATCH net-next v3 1/2] tcp: support TCP_RTO_MIN_US for set/getsockopt use Jason Xing
@ 2025-03-16  2:27 ` Jason Xing
From: Jason Xing @ 2025-03-16  2:27 UTC
  To: davem, edumazet, kuba, pabeni, dsahern, horms, kuniyu, ncardwell
  Cc: netdev, Jason Xing

Support adjusting delayed ack max for socket level by using
setsockopt().

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
 include/uapi/linux/tcp.h |  1 +
 net/ipv4/tcp.c           | 16 +++++++++++++++-
 net/ipv4/tcp_output.c    |  2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index b2476cf7058e..2377e22f2c4b 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -138,6 +138,7 @@ enum {
 #define TCP_IS_MPTCP		43	/* Is MPTCP being used? */
 #define TCP_RTO_MAX_MS		44	/* max rto time in ms */
 #define TCP_RTO_MIN_US		45	/* min rto time in us */
+#define TCP_DELACK_MAX_US	46	/* max delayed ack time in us */
 
 #define TCP_REPAIR_ON		1
 #define TCP_REPAIR_OFF		0
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f2249d712fcc..d12a663e13be 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3353,7 +3353,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	icsk->icsk_probes_tstamp = 0;
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
 	WRITE_ONCE(icsk->icsk_rto_min, TCP_RTO_MIN);
-	icsk->icsk_delack_max = TCP_DELACK_MAX;
+	WRITE_ONCE(icsk->icsk_delack_max, TCP_DELACK_MAX);
 	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
 	tcp_snd_cwnd_set(tp, TCP_INIT_CWND);
 	tp->snd_cwnd_cnt = 0;
@@ -3841,6 +3841,14 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 		WRITE_ONCE(inet_csk(sk)->icsk_rto_min, rto_min);
 		return 0;
 	}
+	case TCP_DELACK_MAX_US: {
+		int delack_max = usecs_to_jiffies(val);
+
+		if (delack_max > TCP_DELACK_MAX || delack_max < TCP_TIMEOUT_MIN)
+			return -EINVAL;
+		WRITE_ONCE(inet_csk(sk)->icsk_delack_max, delack_max);
+		return 0;
+	}
 	}
 
 	sockopt_lock_sock(sk);
@@ -4686,6 +4694,12 @@ int do_tcp_getsockopt(struct sock *sk, int level,
 		val = jiffies_to_usecs(rto_min);
 		break;
 	}
+	case TCP_DELACK_MAX_US: {
+		int delack_max = READ_ONCE(inet_csk(sk)->icsk_delack_max);
+
+		val = jiffies_to_usecs(delack_max);
+		break;
+	}
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 24e56bf96747..65aa26d65987 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4179,7 +4179,7 @@ u32 tcp_delack_max(const struct sock *sk)
 {
 	u32 delack_from_rto_min = max(tcp_rto_min(sk), 2) - 1;
 
-	return min(inet_csk(sk)->icsk_delack_max, delack_from_rto_min);
+	return min(READ_ONCE(inet_csk(sk)->icsk_delack_max), delack_from_rto_min);
 }
 
 /* Send out a delayed ack, the caller does the policy checking
-- 
2.43.5

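Editorial illustration, not part of the posted series: a hypothetical userspace model of the two pieces of arithmetic in this patch, namely the range check in the TCP_DELACK_MAX_US setsockopt case and the clamp in tcp_delack_max(). Several things here are assumptions rather than facts from the thread: HZ is taken to be 1000, TCP_DELACK_MAX is taken to be HZ/5 and TCP_TIMEOUT_MIN to be 2 jiffies (please check include/net/tcp.h in your tree), and usecs_to_jiffies() is modelled as a plain round-up division for non-negative inputs, ignoring the kernel's handling of very large values. All MODEL_ and model_ names are invented for the example.

```c
#include <stdint.h>

#define MODEL_HZ 1000U                      /* assumption: CONFIG_HZ=1000 */
#define MODEL_TCP_DELACK_MAX (MODEL_HZ / 5) /* assumption: HZ/5, i.e. 200 ms */
#define MODEL_TCP_TIMEOUT_MIN 2U            /* assumption: 2 jiffies */

/* Simplified stand-in for usecs_to_jiffies(): round up to whole jiffies. */
static uint64_t model_usecs_to_jiffies(uint32_t usecs)
{
	const uint64_t usecs_per_jiffy = 1000000U / MODEL_HZ;

	return ((uint64_t)usecs + usecs_per_jiffy - 1) / usecs_per_jiffy;
}

/* Model of the setsockopt range check. Returns the jiffies value that would
 * be stored, or -1 where the patch returns -EINVAL. */
static int model_delack_max_check(uint32_t usecs)
{
	uint64_t delack_max = model_usecs_to_jiffies(usecs);

	if (delack_max > MODEL_TCP_DELACK_MAX || delack_max < MODEL_TCP_TIMEOUT_MIN)
		return -1;
	return (int)delack_max;
}

/* Model of tcp_delack_max(): the stored value is further capped at
 * max(rto_min, 2) - 1, with both arguments in jiffies. */
static uint32_t model_tcp_delack_max(uint32_t icsk_delack_max, uint32_t rto_min)
{
	uint32_t delack_from_rto_min = (rto_min > 2 ? rto_min : 2) - 1;

	return icsk_delack_max < delack_from_rto_min ?
	       icsk_delack_max : delack_from_rto_min;
}
```

Under these assumptions, a request of 40000 us is stored as 40 jiffies, while 1000 us (1 jiffy) and 200001 us (201 jiffies) fall outside the accepted range. The clamp also means the effective delayed-ACK ceiling stays below the effective minimum RTO: with the default 200-jiffy value for both, the model yields 199.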