* [PATCH net-next 0/3] tcp: better receiver autotuning
@ 2017-12-11 1:55 Eric Dumazet
2017-12-11 1:55 ` [PATCH net-next 1/3] tcp: do not overshoot window_clamp in tcp_rcv_space_adjust() Eric Dumazet
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Eric Dumazet @ 2017-12-11 1:55 UTC (permalink / raw)
To: David S . Miller, Neal Cardwell, Yuchung Cheng,
Soheil Hassas Yeganeh, Wei Wang, Priyaranjan Jha
Cc: netdev, Eric Dumazet, Eric Dumazet
Now that TCP senders no longer back off when a drop is detected,
it appears we are very often receive-window limited.
This series makes tcp_rcv_space_adjust() slightly more robust
and responsive.
Eric Dumazet (3):
tcp: do not overshoot window_clamp in tcp_rcv_space_adjust()
tcp: avoid integer overflows in tcp_rcv_space_adjust()
tcp: smoother receiver autotuning
include/linux/tcp.h | 2 +-
net/ipv4/tcp_input.c | 31 ++++++++++++-------------------
2 files changed, 13 insertions(+), 20 deletions(-)
--
2.15.1.424.g9478a66081-goog
* [PATCH net-next 1/3] tcp: do not overshoot window_clamp in tcp_rcv_space_adjust()
2017-12-11 1:55 [PATCH net-next 0/3] tcp: better receiver autotuning Eric Dumazet
@ 2017-12-11 1:55 ` Eric Dumazet
2017-12-11 17:47 ` Neal Cardwell
2017-12-11 1:55 ` [PATCH net-next 2/3] tcp: avoid integer overflows " Eric Dumazet
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-12-11 1:55 UTC (permalink / raw)
To: David S . Miller, Neal Cardwell, Yuchung Cheng,
Soheil Hassas Yeganeh, Wei Wang, Priyaranjan Jha
Cc: netdev, Eric Dumazet, Eric Dumazet
While rcvbuf is properly clamped by tcp_rmem[2], rcvwin
can be left at a value that is too big.
It has no serious effect, since:
1) tcp_grow_window() has very strict checks.
2) window_clamp can be mangled by user space to any value anyway.
tcp_init_buffer_space() and companions use tcp_full_space();
here we use tcp_win_from_space() to avoid reloading sk->sk_rcvbuf.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9550cc42de2d9ba4cca6d961a2a3bca501755a69..746a6773c482f5d419ddc9d7c9d52949cbb74cfb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -631,7 +631,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
sk->sk_rcvbuf = rcvbuf;
/* Make the window clamp follow along. */
- tp->window_clamp = rcvwin;
+ tp->window_clamp = tcp_win_from_space(sk, rcvbuf);
}
}
tp->rcvq_space.space = copied;
--
2.15.1.424.g9478a66081-goog
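The effect of patch 1/3 can be sketched in userspace C. This is a simplified model, not the kernel code: win_from_space() mirrors the usual tcp_adv_win_scale arithmetic, while clamp_rcvbuf(), window_clamp_new() and all numeric values in the usage note are hypothetical names and inputs chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace model of tcp_win_from_space(): the part of a
 * buffer of `space` bytes that may be advertised as receive window,
 * for a given tcp_adv_win_scale value (assumed within -31..31).
 */
static int win_from_space(int space, int adv_win_scale)
{
	assert(space >= 0);
	assert(adv_win_scale >= -31 && adv_win_scale <= 31);
	return adv_win_scale <= 0 ?
		space >> -adv_win_scale :
		space - (space >> adv_win_scale);
}

/* Hypothetical helper: limit the requested buffer size to tcp_rmem[2]. */
static int clamp_rcvbuf(uint64_t wanted, int rmem_max)
{
	assert(rmem_max >= 0);
	return wanted < (uint64_t)rmem_max ? (int)wanted : rmem_max;
}

/* Behaviour after the patch: the window clamp is derived from the
 * already clamped rcvbuf rather than from the raw rcvwin estimate.
 */
static int window_clamp_new(int rcvbuf, int adv_win_scale)
{
	return win_from_space(rcvbuf, adv_win_scale);
}
```

With an assumed tcp_rmem[2] of 6291456 bytes and tcp_adv_win_scale of 1, a request for 20000000 bytes is clamped to 6291456, and window_clamp_new() yields 3145728. Before the patch, window_clamp would have been set to the raw rcvwin estimate, which may be far larger than the buffer can back.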
* [PATCH net-next 2/3] tcp: avoid integer overflows in tcp_rcv_space_adjust()
2017-12-11 1:55 [PATCH net-next 0/3] tcp: better receiver autotuning Eric Dumazet
2017-12-11 1:55 ` [PATCH net-next 1/3] tcp: do not overshoot window_clamp in tcp_rcv_space_adjust() Eric Dumazet
@ 2017-12-11 1:55 ` Eric Dumazet
2017-12-11 17:47 ` Neal Cardwell
2017-12-11 1:55 ` [PATCH net-next 3/3] tcp: smoother receiver autotuning Eric Dumazet
2017-12-12 15:53 ` [PATCH net-next 0/3] tcp: better " David Miller
3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-12-11 1:55 UTC (permalink / raw)
To: David S . Miller, Neal Cardwell, Yuchung Cheng,
Soheil Hassas Yeganeh, Wei Wang, Priyaranjan Jha
Cc: netdev, Eric Dumazet, Eric Dumazet
When using large tcp_rmem[2] values (I did tests with 500 MB),
I noticed overflows while computing rcvwin.
Let's fix this before the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
---
include/linux/tcp.h | 2 +-
net/ipv4/tcp_input.c | 12 +++++++-----
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index ca4a6361389b8a3b268ca5b0f4778662a1f7d315..4f93f0953c411322dc6403af7d1b9b6c3e30bd4f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -344,7 +344,7 @@ struct tcp_sock {
/* Receiver queue space */
struct {
- int space;
+ u32 space;
u32 seq;
u64 time;
} rcvq_space;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 746a6773c482f5d419ddc9d7c9d52949cbb74cfb..2900e58738cde0ad1ab4a034b6300876ac276edb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -576,8 +576,8 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
void tcp_rcv_space_adjust(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
+ u32 copied;
int time;
- int copied;
tcp_mstamp_refresh(tp);
time = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcvq_space.time);
@@ -600,12 +600,13 @@ void tcp_rcv_space_adjust(struct sock *sk)
if (sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf &&
!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
- int rcvwin, rcvmem, rcvbuf;
+ int rcvmem, rcvbuf;
+ u64 rcvwin;
/* minimal window to cope with packet losses, assuming
* steady state. Add some cushion because of small variations.
*/
- rcvwin = (copied << 1) + 16 * tp->advmss;
+ rcvwin = ((u64)copied << 1) + 16 * tp->advmss;
/* If rate increased by 25%,
* assume slow start, rcvwin = 3 * copied
@@ -625,8 +626,9 @@ void tcp_rcv_space_adjust(struct sock *sk)
while (tcp_win_from_space(sk, rcvmem) < tp->advmss)
rcvmem += 128;
- rcvbuf = min(rcvwin / tp->advmss * rcvmem,
- sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+ do_div(rcvwin, tp->advmss);
+ rcvbuf = min_t(u64, rcvwin * rcvmem,
+ sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
if (rcvbuf > sk->sk_rcvbuf) {
sk->sk_rcvbuf = rcvbuf;
--
2.15.1.424.g9478a66081-goog
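The overflow addressed by patch 2/3 can be reproduced with a small userspace sketch. It models only the expression rcvwin / advmss * rcvmem in 32-bit and 64-bit arithmetic. The function names, the rcvmem value of 2304 (a stand-in for SKB_TRUESIZE(advmss + MAX_TCP_HEADER)) and the inputs in the usage note are assumptions for illustration. The 32-bit variant uses unsigned wraparound and then reinterprets the result as signed, to model wrapping int arithmetic without relying on undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 32 bits of v as a two's-complement signed value. */
static int64_t wrap_s32(uint64_t v)
{
	uint32_t low = (uint32_t)v;

	return low > INT32_MAX ? (int64_t)low - ((int64_t)1 << 32)
			       : (int64_t)low;
}

/* Model of the old computation: the product is truncated to 32 bits. */
static int64_t rcvbuf_request_32(int32_t rcvwin, int32_t advmss,
				 int32_t rcvmem)
{
	assert(rcvwin >= 0 && advmss > 0 && rcvmem > 0);
	/* Unsigned multiply wraps modulo 2^32, which is well defined. */
	uint32_t product = (uint32_t)(rcvwin / advmss) * (uint32_t)rcvmem;

	return wrap_s32(product);
}

/* Model of the new computation: everything is carried in 64 bits. */
static uint64_t rcvbuf_request_64(uint64_t rcvwin, uint32_t advmss,
				  uint32_t rcvmem)
{
	assert(advmss > 0);
	return rcvwin / advmss * rcvmem;
}

/* min() of the request and tcp_rmem[2], as in the rcvbuf assignment. */
static int64_t clamp_request(int64_t request, int64_t rmem_max)
{
	return request < rmem_max ? request : rmem_max;
}
```

For an illustrative rcvwin of 2000023360 with advmss 1460 and rcvmem 2304, the 64-bit request is 3156201216 bytes, which clamps to an assumed tcp_rmem[2] of 524288000. The 32-bit model yields a negative value instead, so the clamped result stays negative and a later "rcvbuf > sk->sk_rcvbuf" test would never grow the buffer.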
* [PATCH net-next 3/3] tcp: smoother receiver autotuning
2017-12-11 1:55 [PATCH net-next 0/3] tcp: better receiver autotuning Eric Dumazet
2017-12-11 1:55 ` [PATCH net-next 1/3] tcp: do not overshoot window_clamp in tcp_rcv_space_adjust() Eric Dumazet
2017-12-11 1:55 ` [PATCH net-next 2/3] tcp: avoid integer overflows " Eric Dumazet
@ 2017-12-11 1:55 ` Eric Dumazet
2017-12-11 17:59 ` Neal Cardwell
2017-12-12 15:53 ` [PATCH net-next 0/3] tcp: better " David Miller
3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-12-11 1:55 UTC (permalink / raw)
To: David S . Miller, Neal Cardwell, Yuchung Cheng,
Soheil Hassas Yeganeh, Wei Wang, Priyaranjan Jha
Cc: netdev, Eric Dumazet, Eric Dumazet
Back in linux-3.13 (commit b0983d3c9b13 ("tcp: fix dynamic right sizing"))
I addressed the pressing issues we had with receiver autotuning.
But DRS suffers from extra latencies caused by rcv_rtt_est.rtt_us
drifts. One common problem happens during slow start, since the
apparent RTT measured by the receiver can be inflated by ~50%,
at the end of one packet train.
Also, a single drop can delay read() calls by one RTT, meaning
tcp_rcv_space_adjust() can be called one RTT too late.
By replacing the tri-modal heuristic with a continuous function,
we can offset the effects of not growing 'at the optimal time'.
The curve of the function matches prior behavior when the space
increased by exactly 25% or 50%.
The cost of the added multiply/divide is small, considering that a TCP
flow typically runs this part of the code only a few times in its life.
I tested this patch with 100 ms RTT / 1% loss link, 100 runs
of (netperf -l 5), and got an average throughput of 4600 Mbit
instead of 1700 Mbit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
---
net/ipv4/tcp_input.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2900e58738cde0ad1ab4a034b6300876ac276edb..fefb46c16de7b1da76443f714a3f42faacca708d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -601,26 +601,17 @@ void tcp_rcv_space_adjust(struct sock *sk)
if (sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf &&
!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
int rcvmem, rcvbuf;
- u64 rcvwin;
+ u64 rcvwin, grow;
/* minimal window to cope with packet losses, assuming
* steady state. Add some cushion because of small variations.
*/
rcvwin = ((u64)copied << 1) + 16 * tp->advmss;
- /* If rate increased by 25%,
- * assume slow start, rcvwin = 3 * copied
- * If rate increased by 50%,
- * assume sender can use 2x growth, rcvwin = 4 * copied
- */
- if (copied >=
- tp->rcvq_space.space + (tp->rcvq_space.space >> 2)) {
- if (copied >=
- tp->rcvq_space.space + (tp->rcvq_space.space >> 1))
- rcvwin <<= 1;
- else
- rcvwin += (rcvwin >> 1);
- }
+ /* Accommodate for sender rate increase (eg. slow start) */
+ grow = rcvwin * (copied - tp->rcvq_space.space);
+ do_div(grow, tp->rcvq_space.space);
+ rcvwin += (grow << 1);
rcvmem = SKB_TRUESIZE(tp->advmss + MAX_TCP_HEADER);
while (tcp_win_from_space(sk, rcvmem) < tp->advmss)
--
2.15.1.424.g9478a66081-goog
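The claim in patch 3/3 that the continuous function matches the old heuristic at exactly 25% and 50% growth can be checked with a userspace sketch. Both functions below are transcriptions of the removed and added hunks into plain C, with do_div() replaced by ordinary 64-bit division. The function names and the sample values are illustrative, and the sketch assumes copied > space, since the unsigned subtraction would wrap otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Removed heuristic: three discrete growth levels (1x, 1.5x, 2x). */
static uint64_t rcvwin_trimodal(uint32_t copied, uint32_t space,
				uint32_t advmss)
{
	uint64_t rcvwin = ((uint64_t)copied << 1) + 16 * advmss;

	if (copied >= space + (space >> 2)) {
		if (copied >= space + (space >> 1))
			rcvwin <<= 1;
		else
			rcvwin += rcvwin >> 1;
	}
	return rcvwin;
}

/* Added computation: growth is proportional to the relative increase
 * of copied over the previous measurement (space).
 */
static uint64_t rcvwin_continuous(uint32_t copied, uint32_t space,
				  uint32_t advmss)
{
	assert(space > 0 && copied > space);
	uint64_t rcvwin = ((uint64_t)copied << 1) + 16 * advmss;
	uint64_t grow = rcvwin * (copied - space) / space;

	return rcvwin + (grow << 1);
}
```

With space = 100000 and advmss = 1460, both functions return 410040 for copied = 125000 and 646720 for copied = 150000. For a 10% increase (copied = 110000) the old heuristic leaves rcvwin at 243360, while the continuous version already grows it to 292032, which is how it compensates for measurements taken later than the optimal time.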
* Re: [PATCH net-next 1/3] tcp: do not overshoot window_clamp in tcp_rcv_space_adjust()
2017-12-11 1:55 ` [PATCH net-next 1/3] tcp: do not overshoot window_clamp in tcp_rcv_space_adjust() Eric Dumazet
@ 2017-12-11 17:47 ` Neal Cardwell
0 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2017-12-11 17:47 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Yuchung Cheng, Soheil Hassas Yeganeh, Wei Wang,
Priyaranjan Jha, netdev, Eric Dumazet
On Sun, Dec 10, 2017 at 8:55 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> While rcvbuf is properly clamped by tcp_rmem[2], rcvwin
> can be left at a value that is too big.
>
> It has no serious effect, since:
> 1) tcp_grow_window() has very strict checks.
> 2) window_clamp can be mangled by user space to any value anyway.
>
> tcp_init_buffer_space() and companions use tcp_full_space();
> here we use tcp_win_from_space() to avoid reloading sk->sk_rcvbuf.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Wei Wang <weiwan@google.com>
> ---
Acked-by: Neal Cardwell <ncardwell@google.com>
Thanks, Eric!
neal
* Re: [PATCH net-next 2/3] tcp: avoid integer overflows in tcp_rcv_space_adjust()
2017-12-11 1:55 ` [PATCH net-next 2/3] tcp: avoid integer overflows " Eric Dumazet
@ 2017-12-11 17:47 ` Neal Cardwell
0 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2017-12-11 17:47 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Yuchung Cheng, Soheil Hassas Yeganeh, Wei Wang,
Priyaranjan Jha, netdev, Eric Dumazet
On Sun, Dec 10, 2017 at 8:55 PM, Eric Dumazet <edumazet@google.com> wrote:
> When using large tcp_rmem[2] values (I did tests with 500 MB),
> I noticed overflows while computing rcvwin.
>
> Let's fix this before the following patch.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Wei Wang <weiwan@google.com>
> ---
Acked-by: Neal Cardwell <ncardwell@google.com>
Thanks, Eric!
neal
* Re: [PATCH net-next 3/3] tcp: smoother receiver autotuning
2017-12-11 1:55 ` [PATCH net-next 3/3] tcp: smoother receiver autotuning Eric Dumazet
@ 2017-12-11 17:59 ` Neal Cardwell
0 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2017-12-11 17:59 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Yuchung Cheng, Soheil Hassas Yeganeh, Wei Wang,
Priyaranjan Jha, netdev, Eric Dumazet
On Sun, Dec 10, 2017 at 8:55 PM, Eric Dumazet <edumazet@google.com> wrote:
> Back in linux-3.13 (commit b0983d3c9b13 ("tcp: fix dynamic right sizing"))
> I addressed the pressing issues we had with receiver autotuning.
>
> But DRS suffers from extra latencies caused by rcv_rtt_est.rtt_us
> drifts. One common problem happens during slow start, since the
> apparent RTT measured by the receiver can be inflated by ~50%,
> at the end of one packet train.
>
> Also, a single drop can delay read() calls by one RTT, meaning
> tcp_rcv_space_adjust() can be called one RTT too late.
>
> By replacing the tri-modal heuristic with a continuous function,
> we can offset the effects of not growing 'at the optimal time'.
>
> The curve of the function matches prior behavior when the space
> increased by exactly 25% or 50%.
>
> The cost of the added multiply/divide is small, considering that a TCP
> flow typically runs this part of the code only a few times in its life.
>
> I tested this patch with 100 ms RTT / 1% loss link, 100 runs
> of (netperf -l 5), and got an average throughput of 4600 Mbit
> instead of 1700 Mbit.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Wei Wang <weiwan@google.com>
> ---
Acked-by: Neal Cardwell <ncardwell@google.com>
Thanks, Eric!
neal
* Re: [PATCH net-next 0/3] tcp: better receiver autotuning
2017-12-11 1:55 [PATCH net-next 0/3] tcp: better receiver autotuning Eric Dumazet
` (2 preceding siblings ...)
2017-12-11 1:55 ` [PATCH net-next 3/3] tcp: smoother receiver autotuning Eric Dumazet
@ 2017-12-12 15:53 ` David Miller
3 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2017-12-12 15:53 UTC (permalink / raw)
To: edumazet; +Cc: ncardwell, ycheng, soheil, weiwan, priyarjha, netdev,
eric.dumazet
From: Eric Dumazet <edumazet@google.com>
Date: Sun, 10 Dec 2017 17:55:01 -0800
> Now that TCP senders no longer back off when a drop is detected,
> it appears we are very often receive-window limited.
>
> This series makes tcp_rcv_space_adjust() slightly more robust
> and responsive.
Series applied, thanks.
Thread overview: 8+ messages
2017-12-11 1:55 [PATCH net-next 0/3] tcp: better receiver autotuning Eric Dumazet
2017-12-11 1:55 ` [PATCH net-next 1/3] tcp: do not overshoot window_clamp in tcp_rcv_space_adjust() Eric Dumazet
2017-12-11 17:47 ` Neal Cardwell
2017-12-11 1:55 ` [PATCH net-next 2/3] tcp: avoid integer overflows " Eric Dumazet
2017-12-11 17:47 ` Neal Cardwell
2017-12-11 1:55 ` [PATCH net-next 3/3] tcp: smoother receiver autotuning Eric Dumazet
2017-12-11 17:59 ` Neal Cardwell
2017-12-12 15:53 ` [PATCH net-next 0/3] tcp: better " David Miller