From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>,
Eric Dumazet <edumazet@google.com>,
Soheil Hassas Yeganeh <soheil@google.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 5.4 11/34] tcp: select sane initial rcvq_space.space for big MSS
Date: Sat, 19 Dec 2020 14:03:08 +0100 [thread overview]
Message-ID: <20201219125341.934325259@linuxfoundation.org> (raw)
In-Reply-To: <20201219125341.384025953@linuxfoundation.org>
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 72d05c00d7ecda85df29abd046da7e41cc071c17 ]
Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.
This is no longer the case, and Hazem Mohamed Abuelfotoh reported
that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.
Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
of rcvq_space.space computation, while it can select later a smaller
value for tp->rcv_ssthresh and tp->window_clamp.
ss -temoi on receiver would show :
skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596
This means that TCP can not increase its window in tcp_grow_window(),
and that DRS can never kick.
Fix this by making sure that rcvq_space.space is not bigger than number of bytes
that can be held in TCP receive queue.
People unable/unwilling to change their kernel can work around this issue by
selecting a bigger tcp_rmem[1] value as in :
echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem
Based on an initial report and patch from Hazem Mohamed Abuelfotoh
https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/
Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_input.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -446,7 +446,6 @@ void tcp_init_buffer_space(struct sock *
if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK))
tcp_sndbuf_expand(sk);
- tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss);
tcp_mstamp_refresh(tp);
tp->rcvq_space.time = tp->tcp_mstamp;
tp->rcvq_space.seq = tp->copied_seq;
@@ -470,6 +469,8 @@ void tcp_init_buffer_space(struct sock *
tp->rcv_ssthresh = min(tp->rcv_ssthresh, tp->window_clamp);
tp->snd_cwnd_stamp = tcp_jiffies32;
+ tp->rcvq_space.space = min3(tp->rcv_ssthresh, tp->rcv_wnd,
+ (u32)TCP_INIT_CWND * tp->advmss);
}
/* 4. Recalculate window clamp after socket hit its memory bounds. */
next prev parent reply other threads:[~2020-12-19 13:03 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-19 13:02 [PATCH 5.4 00/34] 5.4.85-rc1 review Greg Kroah-Hartman
2020-12-19 13:02 ` [PATCH 5.4 01/34] ptrace: Prevent kernel-infoleak in ptrace_get_syscall_info() Greg Kroah-Hartman
2020-12-19 13:02 ` [PATCH 5.4 02/34] ipv4: fix error return code in rtm_to_fib_config() Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 03/34] mac80211: mesh: fix mesh_pathtbl_init() error path Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 04/34] net: bridge: vlan: fix error return code in __vlan_add() Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 05/34] vrf: packets with lladdr src needs dst at input with orig_iif when needs strict Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 06/34] net: hns3: remove a misused pragma packed Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 07/34] udp: fix the proto value passed to ip_protocol_deliver_rcu for the segments Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 08/34] enetc: Fix reporting of h/w packet counters Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 09/34] bridge: Fix a deadlock when enabling multicast snooping Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 10/34] net: stmmac: free tx skb buffer in stmmac_resume() Greg Kroah-Hartman
2020-12-19 13:03 ` Greg Kroah-Hartman [this message]
2020-12-19 13:03 ` [PATCH 5.4 12/34] tcp: fix cwnd-limited bug for TSO deferral where we send nothing Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 13/34] net/mlx4_en: Avoid scheduling restart task if it is already running Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 14/34] lan743x: fix for potential NULL pointer dereference with bare card Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 15/34] net/mlx4_en: Handle TX error CQE Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 16/34] net: ll_temac: Fix potential NULL dereference in temac_probe() Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 17/34] net: stmmac: dwmac-meson8b: fix mask definition of the m250_sel mux Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 18/34] net: stmmac: delete the eee_ctrl_timer after napi disabled Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 19/34] ktest.pl: If size of log is too big to email, email error message Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 20/34] USB: dummy-hcd: Fix uninitialized array use in init() Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 21/34] USB: add RESET_RESUME quirk for Snapscan 1212 Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 22/34] ALSA: usb-audio: Fix potential out-of-bounds shift Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 23/34] ALSA: usb-audio: Fix control access overflow errors from chmap Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 24/34] xhci: Give USB2 ports time to enter U3 in bus suspend Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 25/34] xhci-pci: Allow host runtime PM as default for Intel Alpine Ridge LP Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 26/34] USB: UAS: introduce a quirk to set no_write_same Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 27/34] USB: sisusbvga: Make console support depend on BROKEN Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 28/34] ALSA: pcm: oss: Fix potential out-of-bounds shift Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 29/34] serial: 8250_omap: Avoid FIFO corruption caused by MDR1 access Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 30/34] KVM: mmu: Fix SPTE encoding of MMIO generation upper half Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 31/34] Revert "selftests/ftrace: check for do_sys_openat2 in user-memory test" Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 32/34] membarrier: Explicitly sync remote cores when SYNC_CORE is requested Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 33/34] x86/resctrl: Remove unused struct mbm_state::chunks_bw Greg Kroah-Hartman
2020-12-19 13:03 ` [PATCH 5.4 34/34] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled Greg Kroah-Hartman
2020-12-19 21:49 ` [PATCH 5.4 00/34] 5.4.85-rc1 review Guenter Roeck
2020-12-20 3:58 ` Naresh Kamboju
2020-12-20 13:18 ` Jon Hunter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201219125341.934325259@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=abuehaze@amazon.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=soheil@google.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).