From: Florian Westphal <fw@strlen.de>
To: netdev@vger.kernel.org
Cc: Florian Westphal <fw@strlen.de>,
Neal Cardwell <ncardwell@google.com>,
Yuchung Cheng <ycheng@google.com>
Subject: [PATCH next v3] tcp: use zero-window when free_space is low
Date: Wed, 19 Feb 2014 12:51:10 +0100 [thread overview]
Message-ID: <1392810670-3543-1-git-send-email-fw@strlen.de> (raw)
Currently the kernel tries to announce a zero window when free_space
is below the current receiver mss estimate.
When a sender is transmitting small packets and reader consumes data
slowly (or not at all), receiver might be unable to shrink the receive
win because
a) we cannot withdraw already-commited receive window, and,
b) we have to round the current rwin up to a multiple of the wscale
factor, else we would shrink the current window.
This causes the receive buffer to fill up until the rmem limit is hit.
When this happens, we start dropping packets.
Moreover, tcp_clamp_window may continue to grow sk_rcvbuf towards rmem[2]
even if socket is not being read from.
As we cannot avoid the "current_win is rounded up to multiple of mss"
issue [we would violate a) above] at least try to prevent the receive buf
growth towards tcp_rmem[2] limit by attempting to move to zero-window
announcement when free_space becomes less than 1/16 of the current
allowed receive buffer maximum. If tcp_rmem[2] is large, this will
increase our chances to get a zero-window announcement out in time.
Reproducer:
On server:
$ nc -l -p 12345
<suspend it: CTRL-Z>
Client:
#!/usr/bin/env python
import socket
import time
sock = socket.socket()
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(("192.168.4.1", 12345));
while True:
sock.send('A' * 23)
time.sleep(0.005)
socket buffer on server-side will grow until tcp_rmem[2] is hit,
at which point the client rexmits data until -EDTIMEOUT:
tcp_data_queue invokes tcp_try_rmem_schedule which will call
tcp_prune_queue which calls tcp_clamp_window(). And that function will
grow sk->sk_rcvbuf up until it eventually hits tcp_rmem[2].
Thanks to Eric Dumazet for running regression tests.
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
no changes since v2; resend with Erics Ack/Tested-by tags
V1 of this patch was deferred, resending to get discussion going again.
Changes since v1:
- add reproducer to commit message
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2a69f42..fd8d821 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2145,7 +2145,8 @@ u32 __tcp_select_window(struct sock *sk)
*/
int mss = icsk->icsk_ack.rcv_mss;
int free_space = tcp_space(sk);
- int full_space = min_t(int, tp->window_clamp, tcp_full_space(sk));
+ int allowed_space = tcp_full_space(sk);
+ int full_space = min_t(int, tp->window_clamp, allowed_space);
int window;
if (mss > full_space)
@@ -2158,7 +2159,19 @@ u32 __tcp_select_window(struct sock *sk)
tp->rcv_ssthresh = min(tp->rcv_ssthresh,
4U * tp->advmss);
- if (free_space < mss)
+ /* free_space might become our new window, make sure we don't
+ * increase it due to wscale.
+ */
+ free_space = round_down(free_space, 1 << tp->rx_opt.rcv_wscale);
+
+ /* if free space is less than mss estimate, or is below 1/16th
+ * of the maximum allowed, try to move to zero-window, else
+ * tcp_clamp_window() will grow rcv buf up to tcp_rmem[2], and
+ * new incoming data is dropped due to memory limits.
+ * With large window, mss test triggers way too late in order
+ * to announce zero window in time before rmem limit kicks in.
+ */
+ if (free_space < (allowed_space >> 4) || free_space < mss)
return 0;
}
--
1.8.1.5
next reply other threads:[~2014-02-19 11:55 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-19 11:51 Florian Westphal [this message]
2014-02-19 21:48 ` [PATCH next v3] tcp: use zero-window when free_space is low David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1392810670-3543-1-git-send-email-fw@strlen.de \
--to=fw@strlen.de \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=ycheng@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).