From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Eric Dumazet <edumazet@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Tom Herbert <therbert@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [ 19/53] tcp: change tcp_adv_win_scale and tcp_rmem[2]
Date: Fri, 18 May 2012 03:33:13 +0100
Message-ID: <20120518023256.905701155@decadent.org.uk>
In-Reply-To: <20120518023254.339945758@decadent.org.uk>

3.2.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit b49960a05e32121d29316cfdf653894b88ac9190 ]

The default value of tcp_adv_win_scale is 2, meaning we expect a
good-citizen skb to have a skb->len / skb->truesize ratio of at least
75% (3/4).

In 2.6 kernels we (mis)accounted a typical MSS=1460 frame as
1536 + 64 + 256 = 1856 bytes of 'estimated truesize', and
1856 * 3/4 = 1392. So these skbs were not considered bloated.

With the recent truesize fixes, the truesize of a typical MSS=1460 frame
is now the more precise 2048 + 256 = 2304. But 2304 * 3/4 = 1728,
so these skbs are no longer good citizens, because 1460 < 1728.
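
As a rough illustration of the arithmetic above, the check can be
written as a small C sketch. The helper names min_len_for_truesize()
and is_good_citizen() are hypothetical and are not kernel functions;
the sketch only assumes a positive tcp_adv_win_scale, where the
expected payload share is truesize - truesize / 2^scale.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel function): the minimum payload an
 * skb of the given truesize must carry to meet the len/truesize ratio
 * implied by a positive tcp_adv_win_scale.
 */
static int min_len_for_truesize(int truesize, int adv_win_scale)
{
	return truesize - (truesize >> adv_win_scale);
}

/* Hypothetical helper: does a frame with this payload meet the ratio? */
static bool is_good_citizen(int len, int truesize, int adv_win_scale)
{
	return len >= min_len_for_truesize(truesize, adv_win_scale);
}
```

With scale 2, min_len_for_truesize(1856, 2) gives 1392 and
min_len_for_truesize(2304, 2) gives 1728, so a 1460-byte payload
passes against the old estimate and fails against the corrected one.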

(GRO can escape this problem because it builds skbs with a too-low
truesize.)

This also means TCP advertises too optimistic a window for a given
allocated rcvspace: when receiving frames, sk_rmem_alloc can hit the
sk_rcvbuf limit, and we call tcp_prune_queue()/tcp_collapse() too often,
especially when the application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major source of
latency.

We should adjust the len/truesize ratio to 50% instead of 75%.
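
To see why the 75% ratio overcommits the receive buffer, here is a
minimal sketch, not the kernel implementation. win_from_space()
follows the formula documented for tcp_adv_win_scale; mem_for_window()
is a hypothetical helper that assumes every frame carries a full MSS
and is charged exactly the truesize quoted above.

```c
/*
 * Window advertised for `space` bytes of receive buffer, following the
 * formula documented for tcp_adv_win_scale in ip-sysctl.txt.
 */
static long win_from_space(long space, int adv_win_scale)
{
	return adv_win_scale <= 0 ?
		space >> -adv_win_scale :
		space - (space >> adv_win_scale);
}

/*
 * Hypothetical helper: buffer memory charged if the peer fills `window`
 * bytes of payload using frames of `mss` payload bytes, each costing
 * `truesize` bytes. Assumes full-sized frames and a fixed truesize.
 */
static long mem_for_window(long window, int mss, int truesize)
{
	long frames = (window + mss - 1) / mss;	/* round up */

	return frames * truesize;
}
```

For an 87380-byte buffer, scale 2 advertises 65535 bytes, which costs
103680 bytes of truesize at 2304 per frame, more than the buffer
holds. Scale 1 advertises 43690 bytes, which costs 69120 bytes and
fits.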

This patch:

1) changes the tcp_adv_win_scale default to 1 instead of 2

2) increases the tcp_rmem[2] limit from 4MB to 6MB, to take into account
better truesize tracking and to allow autotuning of the TCP receive
window to reach the same value as before. Note that the same amount of
kernel memory is consumed as in 2.6 kernels.
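
The pairing of the two changes can be checked with a short sketch.
adv_window() is a hypothetical name for the window formula documented
for tcp_adv_win_scale, and the sketch assumes the tcp_rmem[2] ceiling
is not clamped lower by the RAM-dependent limit.

```c
/*
 * Hypothetical helper: largest window that can be advertised for
 * `rcvbuf` bytes of receive buffer under a given tcp_adv_win_scale.
 */
static long adv_window(long rcvbuf, int adv_win_scale)
{
	return adv_win_scale <= 0 ?
		rcvbuf >> -adv_win_scale :
		rcvbuf - (rcvbuf >> adv_win_scale);
}

/* Old defaults: 4MB ceiling with scale 2 (75% of the buffer). */
static long old_max_window(void)
{
	return adv_window(4L * 1024 * 1024, 2);
}

/* New defaults: 6MB ceiling with scale 1 (50% of the buffer). */
static long new_max_window(void)
{
	return adv_window(6L * 1024 * 1024, 1);
}
```

Both functions return 3145728 (3MB), so the maximum window that
autotuning can reach is unchanged under these assumptions.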

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 Documentation/networking/ip-sysctl.txt |    4 ++--
 net/ipv4/tcp.c                         |    9 +++++----
 net/ipv4/tcp_input.c                   |    2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 589f2da..a4399f5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -137,7 +137,7 @@ tcp_adv_win_scale - INTEGER
 	(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
 	if it is <= 0.
 	Possible values are [-31, 31], inclusive.
-	Default: 2
+	Default: 1
 
 tcp_allowed_congestion_control - STRING
 	Show/set the congestion control choices available to non-privileged
@@ -397,7 +397,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
 	net.core.rmem_max.  Calling setsockopt() with SO_RCVBUF disables
 	automatic tuning of that socket's receive buffer size, in which
 	case this value is ignored.
-	Default: between 87380B and 4MB, depending on RAM size.
+	Default: between 87380B and 6MB, depending on RAM size.
 
 tcp_sack - BOOLEAN
 	Enable select acknowledgments (SACKS).
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 34f5db1..0237ad3 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3216,7 +3216,7 @@ void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
 	unsigned long limit;
-	int i, max_share, cnt;
+	int i, max_rshare, max_wshare, cnt;
 	unsigned long jiffy = jiffies;
 
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
@@ -3280,15 +3280,16 @@ void __init tcp_init(void)
 
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7);
-	max_share = min(4UL*1024*1024, limit);
+	max_wshare = min(4UL*1024*1024, limit);
+	max_rshare = min(6UL*1024*1024, limit);
 
 	sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_wmem[1] = 16*1024;
-	sysctl_tcp_wmem[2] = max(64*1024, max_share);
+	sysctl_tcp_wmem[2] = max(64*1024, max_wshare);
 
 	sysctl_tcp_rmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_rmem[1] = 87380;
-	sysctl_tcp_rmem[2] = max(87380, max_share);
+	sysctl_tcp_rmem[2] = max(87380, max_rshare);
 
 	printk(KERN_INFO "TCP: Hash tables configured "
 	       "(established %u bind %u)\n",
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b9c6567..06a4052 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -83,7 +83,7 @@ int sysctl_tcp_ecn __read_mostly = 2;
 EXPORT_SYMBOL(sysctl_tcp_ecn);
 int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
-int sysctl_tcp_adv_win_scale __read_mostly = 2;
+int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
 
 int sysctl_tcp_stdurg __read_mostly;
-- 
1.7.10.1



