netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Soheil Hassas Yeganeh <soheil@google.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>,
	Neal Cardwell <ncardwell@google.com>,
	 Yuchung Cheng <ycheng@google.com>,
	netdev@vger.kernel.org, eric.dumazet@gmail.com
Subject: Re: [PATCH net-next 3/3] tcp: derive delack_max from rto_min
Date: Wed, 20 Sep 2023 13:34:52 -0400	[thread overview]
Message-ID: <CACSApvZg8soR6bsMv9NqHST2+FLX2RFGRORY-DbTDfhCcUURBQ@mail.gmail.com> (raw)
In-Reply-To: <20230920172943.4135513-4-edumazet@google.com>

On Wed, Sep 20, 2023 at 1:29 PM Eric Dumazet <edumazet@google.com> wrote:
>
> While BPF allows to set icsk->->icsk_delack_max

nit pick in the commit message:  redundant ->->.

> and/or icsk->icsk_rto_min, we have an ip route
> attribute (RTAX_RTO_MIN) to be able to tune rto_min,
> but nothing to consequently adjust max delayed ack,
> which vary from 40ms to 200 ms (TCP_DELACK_{MIN|MAX}).
>
> This makes RTAX_RTO_MIN of almost no practical use,
> unless customers are in big trouble.
>
> Modern days datacenter communications want to set
> rto_min to ~5 ms, and the max delayed ack one jiffie
> smaller to avoid spurious retransmits.
>
> After this patch, an "rto_min 5" route attribute will
> effectively lower max delayed ack timers to 4 ms.
>
> Note in the following ss output, "rto:6 ... ato:4"
>
> $ ss -temoi dst XXXXXX
> State Recv-Q Send-Q           Local Address:Port       Peer Address:Port  Process
> ESTAB 0      0        [2002:a05:6608:295::]:52950   [2002:a05:6608:297::]:41597
>      ino:255134 sk:1001 <->
>          skmem:(r0,rb1707063,t872,tb262144,f0,w0,o0,bl0,d0) ts sack
>  cubic wscale:8,8 rto:6 rtt:0.02/0.002 ato:4 mss:4096 pmtu:4500
>  rcvmss:536 advmss:4096 cwnd:10 bytes_sent:54823160 bytes_acked:54823121
>  bytes_received:54823120 segs_out:1370582 segs_in:1370580
>  data_segs_out:1370579 data_segs_in:1370578 send 16.4Gbps
>  pacing_rate 32.6Gbps delivery_rate 1.72Gbps delivered:1370579
>  busy:26920ms unacked:1 rcv_rtt:34.615 rcv_space:65920
>  rcv_ssthresh:65535 minrtt:0.015 snd_wnd:65536
>
> While we could argue this patch fixes a bug with RTAX_RTO_MIN,
> I do not add a Fixes: tag, so that we can soak it a bit before
> asking backports to stable branches.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

So great that this is now upstream :-) Thank you!

> ---
>  include/net/tcp.h     |  2 ++
>  net/ipv4/tcp.c        |  3 ++-
>  net/ipv4/tcp_output.c | 16 +++++++++++++++-
>  3 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index a8db7d43fb6215197af4a80e270b8c82070d55cb..af9cb37fbe53ec60b4e545e8aa0740bbf30da7b6 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -718,6 +718,8 @@ static inline void tcp_fast_path_check(struct sock *sk)
>                 tcp_fast_path_on(tp);
>  }
>
> +u32 tcp_delack_max(const struct sock *sk);
> +
>  /* Compute the actual rto_min value */
>  static inline u32 tcp_rto_min(const struct sock *sk)
>  {
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 69b8d707370844020770438cc4f31aeda4830b3d..e54f91eb943b2f09f303951cc72cbea61ada519d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3762,7 +3762,8 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
>                 info->tcpi_options |= TCPI_OPT_SYN_DATA;
>
>         info->tcpi_rto = jiffies_to_usecs(icsk->icsk_rto);
> -       info->tcpi_ato = jiffies_to_usecs(icsk->icsk_ack.ato);
> +       info->tcpi_ato = jiffies_to_usecs(min(icsk->icsk_ack.ato,
> +                                             tcp_delack_max(sk)));
>         info->tcpi_snd_mss = tp->mss_cache;
>         info->tcpi_rcv_mss = icsk->icsk_ack.rcv_mss;
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 1fc1f879cfd6c28cd655bb8f02eff6624eec2ffc..2d1e4b5ac1ca41ff3db8dc58458d4e922a2c4999 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3977,6 +3977,20 @@ int tcp_connect(struct sock *sk)
>  }
>  EXPORT_SYMBOL(tcp_connect);
>
> +u32 tcp_delack_max(const struct sock *sk)
> +{
> +       const struct dst_entry *dst = __sk_dst_get(sk);
> +       u32 delack_max = inet_csk(sk)->icsk_delack_max;
> +
> +       if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) {
> +               u32 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
> +               u32 delack_from_rto_min = max_t(int, 1, rto_min - 1);
> +
> +               delack_max = min_t(u32, delack_max, delack_from_rto_min);
> +       }
> +       return delack_max;
> +}
> +
>  /* Send out a delayed ack, the caller does the policy checking
>   * to see if we should even be here.  See tcp_input.c:tcp_ack_snd_check()
>   * for details.
> @@ -4012,7 +4026,7 @@ void tcp_send_delayed_ack(struct sock *sk)
>                 ato = min(ato, max_ato);
>         }
>
> -       ato = min_t(u32, ato, inet_csk(sk)->icsk_delack_max);
> +       ato = min_t(u32, ato, tcp_delack_max(sk));
>
>         /* Stay within the limit we were given */
>         timeout = jiffies + ato;
> --
> 2.42.0.459.ge4e396fd5e-goog
>

  reply	other threads:[~2023-09-20 17:35 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-20 17:29 [PATCH net-next 0/3] tcp: add tcp_delack_max() Eric Dumazet
2023-09-20 17:29 ` [PATCH net-next 1/3] net: constify sk_dst_get() and __sk_dst_get() argument Eric Dumazet
2023-09-20 17:29 ` [PATCH net-next 2/3] tcp: constify tcp_rto_min() and tcp_rto_min_us() argument Eric Dumazet
2023-09-20 17:29 ` [PATCH net-next 3/3] tcp: derive delack_max from rto_min Eric Dumazet
2023-09-20 17:34   ` Soheil Hassas Yeganeh [this message]
2023-09-20 19:06   ` Neal Cardwell
2023-09-20 21:57   ` David Ahern
2023-09-21  2:16     ` Eric Dumazet
2023-09-21 12:37       ` David Ahern
2023-09-21 12:58         ` Eric Dumazet
2023-09-22  9:59           ` David Laight
2023-09-22 10:53             ` Eric Dumazet
2023-09-22 16:51               ` David Laight
2023-10-01 12:20 ` [PATCH net-next 0/3] tcp: add tcp_delack_max() patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACSApvZg8soR6bsMv9NqHST2+FLX2RFGRORY-DbTDfhCcUURBQ@mail.gmail.com \
    --to=soheil@google.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kuba@kernel.org \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).