Archive-only list for patches
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Neal Cardwell <ncardwell@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Xin Guo <guoxin0309@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 81/91] tcp: fix delayed ACKs for MSS boundary condition
Date: Mon,  9 Oct 2023 15:06:53 +0200	[thread overview]
Message-ID: <20231009130114.361204842@linuxfoundation.org> (raw)
In-Reply-To: <20231009130111.518916887@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <ncardwell@google.com>

[ Upstream commit 4720852ed9afb1c5ab84e96135cb5b73d5afde6f ]

This commit fixes poor delayed ACK behavior that can cause poor TCP
latency in a particular boundary condition: when an application makes
a TCP socket write that is an exact multiple of the MSS size.

The problem is that there is painful boundary discontinuity in the
current delayed ACK behavior. With the current delayed ACK behavior,
we have:

(1) If an app reads data when > 1*MSS is unacknowledged, then
    tcp_cleanup_rbuf() ACKs immediately because of:

     tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||

(2) If an app reads all received data, and the packets were < 1*MSS,
    and either (a) the app is not ping-pong or (b) we received two
    packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately beecause
    of:

     ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) ||
      ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
       !inet_csk_in_pingpong_mode(sk))) &&

(3) *However*: if an app reads exactly 1*MSS of data,
    tcp_cleanup_rbuf() does not send an immediate ACK. This is true
    even if the app is not ping-pong and the 1*MSS of data had the PSH
    bit set, suggesting the sending application completed an
    application write.

Thus if the app is not ping-pong, we have this painful case where
>1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a
write whose last skb is an exact multiple of 1*MSS can get a 40ms
delayed ACK. This means that any app that transfers data in one
direction and takes care to align write size or packet size with MSS
can suffer this problem. With receive zero copy making 4KB MSS values
more common, it is becoming more common to have application writes
naturally align with MSS, and more applications are likely to
encounter this delayed ACK problem.

The fix in this commit is to refine the delayed ACK heuristics with a
simple check: immediately ACK a received 1*MSS skb with PSH bit set if
the app reads all data. Why? If an skb has a len of exactly 1*MSS and
has the PSH bit set then it is likely the end of an application
write. So more data may not be arriving soon, and yet the data sender
may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we
set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send
an ACK immediately if the app reads all of the data and is not
ping-pong. Note that this logic is also executed for the case where
len > MSS, but in that case this logic does not matter (and does not
hurt) because tcp_cleanup_rbuf() will always ACK immediately if the
app reads data and there is more than an MSS of unACKed data.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Guo <guoxin0309@gmail.com>
Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/tcp_input.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9e1ec69fe5b46..0052a6194cc1a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -172,6 +172,19 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 		if (unlikely(len > icsk->icsk_ack.rcv_mss +
 				   MAX_TCP_OPTION_SPACE))
 			tcp_gro_dev_warn(sk, skb, len);
+		/* If the skb has a len of exactly 1*MSS and has the PSH bit
+		 * set then it is likely the end of an application write. So
+		 * more data may not be arriving soon, and yet the data sender
+		 * may be waiting for an ACK if cwnd-bound or using TX zero
+		 * copy. So we set ICSK_ACK_PUSHED here so that
+		 * tcp_cleanup_rbuf() will send an ACK immediately if the app
+		 * reads all of the data and is not ping-pong. If len > MSS
+		 * then this logic does not matter (and does not hurt) because
+		 * tcp_cleanup_rbuf() will always ACK immediately if the app
+		 * reads data and there is more than an MSS of unACKed data.
+		 */
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_PSH)
+			icsk->icsk_ack.pending |= ICSK_ACK_PUSHED;
 	} else {
 		/* Otherwise, we make more careful check taking into account,
 		 * that SACKs block is variable.
-- 
2.40.1




  parent reply	other threads:[~2023-10-09 13:53 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-09 13:05 [PATCH 4.19 00/91] 4.19.296-rc1 review Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 01/91] NFS/pNFS: Report EINVAL errors from connect() to the server Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 02/91] ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 03/91] ata: libahci: clear pending interrupt status Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 04/91] netfilter: nf_tables: disallow element removal on anonymous sets Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 05/91] selftests/tls: Add {} to avoid static checker warning Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 06/91] selftests: tls: swap the TX and RX sockets in some tests Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 07/91] ipv4: fix null-deref in ipv4_link_failure Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 08/91] powerpc/perf/hv-24x7: Update domain value check Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 09/91] net: hns3: add 5ms delay before clear firmware reset irq source Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 10/91] net: add atomic_long_t to net_device_stats fields Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 11/91] net: bridge: use DEV_STATS_INC() Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 12/91] team: fix null-ptr-deref when team device type is changed Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 13/91] gpio: tb10x: Fix an error handling path in tb10x_gpio_probe() Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 14/91] i2c: mux: demux-pinctrl: check the return value of devm_kstrdup() Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 15/91] Input: i8042 - add quirk for TUXEDO Gemini 17 Gen1/Clevo PD70PN Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 16/91] scsi: qla2xxx: Add protection mask module parameters Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 17/91] scsi: qla2xxx: Remove unsupported ql2xenabledif option Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 18/91] scsi: megaraid_sas: Load balance completions across all MSI-X Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 19/91] scsi: megaraid_sas: Fix deadlock on firmware crashdump Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 20/91] ext4: remove the group parameter of ext4_trim_extent Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 21/91] ext4: add new helper interface ext4_try_to_trim_range() Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 22/91] ext4: scope ret locally in ext4_try_to_trim_range() Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 23/91] ext4: change s_last_trim_minblks type to unsigned long Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 24/91] ext4: mark group as trimmed only if it was fully scanned Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 25/91] ext4: replace the traditional ternary conditional operator with with max()/min() Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 26/91] ext4: move setting of trimmed bit into ext4_try_to_trim_range() Greg Kroah-Hartman
2023-10-09 13:05 ` [PATCH 4.19 27/91] ext4: do not let fstrim block system suspend Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 28/91] MIPS: Alchemy: only build mmc support helpers if au1xmmc is enabled Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 29/91] clk: tegra: fix error return case for recalc_rate Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 30/91] ARM: dts: ti: omap: motorola-mapphone: Fix abe_clkctrl warning on boot Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 31/91] gpio: pmic-eic-sprd: Add can_sleep flag for PMIC EIC chip Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 32/91] parisc: sba: Fix compile warning wrt list of SBA devices Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 33/91] parisc: iosapic.c: Fix sparse warnings Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 34/91] parisc: drivers: Fix sparse warning Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 35/91] parisc: irq: Make irq_stack_union static to avoid " Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 36/91] selftests/ftrace: Correctly enable event in instance-event.tc Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 37/91] ring-buffer: Avoid softlockup in ring_buffer_resize() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 38/91] ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 39/91] bpf: Clarify error expectations from bpf_clone_redirect Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 40/91] fbdev/sh7760fb: Depend on FB=y Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 41/91] nvme-pci: do not set the NUMA node of device if it has none Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 42/91] watchdog: iTCO_wdt: No need to stop the timer in probe Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 43/91] watchdog: iTCO_wdt: Set NO_REBOOT if the watchdog is not already running Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 44/91] net: Fix unwanted sign extension in netdev_stats_to_stats64() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 45/91] scsi: megaraid_sas: Enable msix_load_balance for Invader and later controllers Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 46/91] Smack:- Use overlay inode label in smack_inode_copy_up() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 47/91] smack: Retrieve transmuting information in smack_inode_getsecurity() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 48/91] smack: Record transmuting in smk_transmuted Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 49/91] serial: 8250_port: Check IRQ data before use Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 50/91] nilfs2: fix potential use after free in nilfs_gccache_submit_read_data() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 51/91] ALSA: hda: Disable power save for solving pop issue on Lenovo ThinkCentre M70q Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 52/91] ata: libata-scsi: ignore reserved bits for REPORT SUPPORTED OPERATION CODES Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 53/91] i2c: i801: unregister tco_pdev in i801_probe() error path Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 54/91] btrfs: properly report 0 avail for very full file systems Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 55/91] net: thunderbolt: Fix TCPv6 GSO checksum calculation Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 56/91] ata: libata-core: Fix ata_port_request_pm() locking Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 57/91] ata: libata-core: Fix port and device removal Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 58/91] ata: libata-core: Do not register PM operations for SAS ports Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 59/91] ata: libata-sata: increase PMP SRST timeout to 10s Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 60/91] fs: binfmt_elf_efpic: fix personality for ELF-FDPIC Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 61/91] ext4: fix rec_len verify error Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 62/91] ata: libata: disallow dev-initiated LPM transitions to unsupported states Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 63/91] Revert "drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions" Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 64/91] media: dvb: symbol fixup for dvb_attach() - again Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 65/91] Revert "PCI: qcom: Disable write access to read only registers for IP v2.3.3" Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 66/91] scsi: zfcp: Fix a double put in zfcp_port_enqueue() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 67/91] qed/red_ll2: Fix undefined behavior bug in struct qed_ll2_info Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 68/91] wifi: mwifiex: Fix tlv_buf_left calculation Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 69/91] net: replace calls to sock->ops->connect() with kernel_connect() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 70/91] btrfs: reject unknown mount options early Greg Kroah-Hartman
2023-10-10  8:53   ` Qu Wenruo
2023-10-10 11:27     ` Greg Kroah-Hartman
2023-10-10 12:59       ` David Sterba
2023-10-10 15:25         ` Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 71/91] ubi: Refuse attaching if mtds erasesize is 0 Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 72/91] wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 73/91] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 74/91] regmap: rbtree: Fix wrong register marked as in-cache when creating new node Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 75/91] scsi: target: core: Fix deadlock due to recursive locking Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 76/91] modpost: add missing else to the "of" check Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 77/91] ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 78/91] net: usb: smsc75xx: Fix uninit-value access in __smsc75xx_read_reg Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 79/91] net: stmmac: dwmac-stm32: fix resume on STM32 MCU Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 80/91] tcp: fix quick-ack counting to count actual ACKs of new data Greg Kroah-Hartman
2023-10-09 13:06 ` Greg Kroah-Hartman [this message]
2023-10-09 13:06 ` [PATCH 4.19 82/91] sctp: update transport state when processing a dupcook packet Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 83/91] sctp: update hb timer immediately after users change hb_interval Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 84/91] cpupower: add Makefile dependencies for install targets Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 85/91] IB/mlx4: Fix the size of a buffer in add_port_entries() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 86/91] gpio: aspeed: fix the GPIO number passed to pinctrl_gpio_set_config() Greg Kroah-Hartman
2023-10-09 13:06 ` [PATCH 4.19 87/91] gpio: pxa: disable pinctrl calls for MMP_GPIO Greg Kroah-Hartman
2023-10-09 13:07 ` [PATCH 4.19 88/91] RDMA/cma: Fix truncation compilation warning in make_cma_ports Greg Kroah-Hartman
2023-10-09 13:07 ` [PATCH 4.19 89/91] RDMA/mlx5: Fix NULL string error Greg Kroah-Hartman
2023-10-09 13:07 ` [PATCH 4.19 90/91] parisc: Restore __ldcw_align for PA-RISC 2.0 processors Greg Kroah-Hartman
2023-10-09 13:07 ` [PATCH 4.19 91/91] dccp: fix dccp_v4_err()/dccp_v6_err() again Greg Kroah-Hartman
2023-10-09 23:03 ` [PATCH 4.19 00/91] 4.19.296-rc1 review Shuah Khan
2023-10-10  9:57 ` Jon Hunter
2023-10-10  9:57 ` Jon Hunter
2023-10-10 18:17 ` Guenter Roeck
2023-10-11  1:30 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231009130114.361204842@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=edumazet@google.com \
    --cc=guoxin0309@gmail.com \
    --cc=kuba@kernel.org \
    --cc=ncardwell@google.com \
    --cc=patches@lists.linux.dev \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox