stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Eric Dumazet <edumazet@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Tom Herbert <therbert@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [ 19/54] tcp: change tcp_adv_win_scale and tcp_rmem[2]
Date: Fri, 18 May 2012 14:16:18 -0700	[thread overview]
Message-ID: <20120518211601.206387642@linuxfoundation.org> (raw)
In-Reply-To: <20120518212656.GA4992@kroah.com>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------


From: Eric Dumazet <edumazet@google.com>

[ Upstream commit b49960a05e32121d29316cfdf653894b88ac9190 ]

tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb->len / skb->truesize ratio of 75% (3/4)

In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 < 1728

(GRO can escape this problem because it build skbs with a too low
truesize.)

This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.

We should adjust the len/truesize ratio to 50% instead of 75%

This patch :

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/networking/ip-sysctl.txt |    4 ++--
 net/ipv4/tcp.c                         |    9 +++++----
 net/ipv4/tcp_input.c                   |    2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -147,7 +147,7 @@ tcp_adv_win_scale - INTEGER
 	(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
 	if it is <= 0.
 	Possible values are [-31, 31], inclusive.
-	Default: 2
+	Default: 1
 
 tcp_allowed_congestion_control - STRING
 	Show/set the congestion control choices available to non-privileged
@@ -407,7 +407,7 @@ tcp_rmem - vector of 3 INTEGERs: min, de
 	net.core.rmem_max.  Calling setsockopt() with SO_RCVBUF disables
 	automatic tuning of that socket's receive buffer size, in which
 	case this value is ignored.
-	Default: between 87380B and 4MB, depending on RAM size.
+	Default: between 87380B and 6MB, depending on RAM size.
 
 tcp_sack - BOOLEAN
 	Enable select acknowledgments (SACKS).
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3221,7 +3221,7 @@ void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
 	unsigned long limit;
-	int i, max_share, cnt;
+	int i, max_rshare, max_wshare, cnt;
 	unsigned long jiffy = jiffies;
 
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
@@ -3285,15 +3285,16 @@ void __init tcp_init(void)
 
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7);
-	max_share = min(4UL*1024*1024, limit);
+	max_wshare = min(4UL*1024*1024, limit);
+	max_rshare = min(6UL*1024*1024, limit);
 
 	sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_wmem[1] = 16*1024;
-	sysctl_tcp_wmem[2] = max(64*1024, max_share);
+	sysctl_tcp_wmem[2] = max(64*1024, max_wshare);
 
 	sysctl_tcp_rmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_rmem[1] = 87380;
-	sysctl_tcp_rmem[2] = max(87380, max_share);
+	sysctl_tcp_rmem[2] = max(87380, max_rshare);
 
 	printk(KERN_INFO "TCP: Hash tables configured "
 	       "(established %u bind %u)\n",
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -83,7 +83,7 @@ int sysctl_tcp_ecn __read_mostly = 2;
 EXPORT_SYMBOL(sysctl_tcp_ecn);
 int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
-int sysctl_tcp_adv_win_scale __read_mostly = 2;
+int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
 
 int sysctl_tcp_stdurg __read_mostly;



  parent reply	other threads:[~2012-05-18 21:16 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-18 21:26 [ 00/54] 3.0.32-stable review Greg KH
2012-05-18 21:16 ` [ 01/54] smsc95xx: mark link down on startup and let PHY interrupt deal with carrier changes Greg KH
2012-05-18 21:16 ` [ 02/54] xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEs Greg KH
2012-05-18 21:16 ` [ 03/54] xen/pci: dont use PCI BIOS service for configuration space accesses Greg KH
2012-05-18 21:16 ` [ 04/54] percpu, x86: dont use PMD_SIZE as embedded atom_size on 32bit Greg KH
2012-05-18 21:16 ` [ 05/54] asm-generic: Use __BITS_PER_LONG in statfs.h Greg KH
2012-05-18 21:16 ` [ 06/54] Fix __read_seqcount_begin() to use ACCESS_ONCE for sequence value read Greg KH
2012-05-18 21:16 ` [ 07/54] ARM: 7410/1: Add extra clobber registers for assembly in kernel_execve Greg KH
2012-05-18 21:16 ` [ 08/54] ARM: 7414/1: SMP: prevent use of the console when using idmap_pgd Greg KH
2012-05-18 21:16 ` [ 09/54] regulator: Fix the logic to ensure new voltage setting in valid range Greg KH
2012-05-18 21:16 ` [ 10/54] ARM: orion5x: Fix GPIO enable bits for MPP9 Greg KH
2012-05-18 21:16 ` [ 11/54] asix: Fix tx transfer padding for full-speed USB Greg KH
2012-05-18 21:16 ` [ 12/54] netem: fix possible skb leak Greg KH
2012-05-18 21:16 ` [ 13/54] net: In unregister_netdevice_notifier unregister the netdevices Greg KH
2012-05-21 17:35   ` Herton Ronaldo Krzesinski
2012-05-27  0:13     ` Greg KH
2012-05-27  0:18       ` David Miller
2012-05-27  0:22         ` Greg KH
2012-05-18 21:16 ` [ 14/54] net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg Greg KH
2012-05-18 21:16 ` [ 15/54] sky2: propogate rx hash when packet is copied Greg KH
2012-05-18 21:16 ` [ 16/54] sky2: fix receive length error in mixed non-VLAN/VLAN traffic Greg KH
2012-05-18 21:16 ` [ 17/54] tg3: Avoid panic from reserved statblk field access Greg KH
2012-05-18 21:16 ` [ 18/54] sungem: Fix WakeOnLan Greg KH
2012-05-18 21:16 ` Greg KH [this message]
2012-05-18 21:16 ` [ 20/54] sony-laptop: Enable keyboard backlight by default Greg KH
2012-05-18 21:16 ` [ 21/54] ALSA: echoaudio: Remove incorrect part of assertion Greg KH
2012-05-18 21:16 ` [ 22/54] ALSA: HDA: Lessen CPU usage when waiting for chip to respond Greg KH
2012-05-18 21:16 ` [ 23/54] usbnet: fix skb traversing races during unlink(v2) Greg KH
2012-05-18 21:16 ` [ 24/54] namespaces, pid_ns: fix leakage on fork() failure Greg KH
2012-05-18 21:16 ` [ 25/54] sparc64: Do not clobber %g2 in xcall_fetch_glob_regs() Greg KH
2012-05-18 21:16 ` [ 26/54] ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS Greg KH
2012-05-18 21:16 ` [ 27/54] media: rc: Postpone ISR registration Greg KH
2012-05-18 21:16 ` [ 28/54] cdc_ether: Ignore bogus union descriptor for RNDIS devices Greg KH
2012-05-18 21:16 ` [ 29/54] cdc_ether: add Novatel USB551L device IDs for FLAG_WWAN Greg KH
2012-05-18 21:16 ` [ 30/54] percpu: pcpu_embed_first_chunk() should free unused parts after all allocs are complete Greg KH
2012-05-18 21:16 ` [ 31/54] kmemleak: Fix the kmemleak tracking of the percpu areas with !SMP Greg KH
2012-05-19 13:27   ` Christoph Biedl
2012-05-19 14:46     ` Greg KH
2012-05-19 15:45       ` Christoph Biedl
2012-05-19 21:45       ` Catalin Marinas
2012-05-18 21:16 ` [ 32/54] hugetlb: prevent BUG_ON in hugetlb_fault() -> hugetlb_cow() Greg KH
2012-05-18 21:16 ` [ 33/54] mm: nobootmem: fix sign extend problem in __free_pages_memory() Greg KH
2012-05-18 21:16 ` [ 34/54] jffs2: Fix lock acquisition order bug in gc path Greg KH
2012-05-18 21:16 ` [ 35/54] arch/tile: apply commit 74fca9da0 to the compat signal handling as well Greg KH
2012-05-18 21:16 ` [ 36/54] crypto: mv_cesa requires on CRYPTO_HASH to build Greg KH
2012-05-18 21:16 ` [ 37/54] MD: Add del_timer_sync to mddev_suspend (fix nasty panic) Greg KH
2012-05-18 21:16 ` [ 38/54] tcp: do_tcp_sendpages() must try to push data out on oom conditions Greg KH
2012-05-18 21:16 ` [ 39/54] init: dont try mounting device as nfs root unless type fully matches Greg KH
2012-05-18 21:16 ` [ 40/54] ext4: avoid deadlock on sync-mounted FS w/o journal Greg KH
2012-05-18 21:16 ` [ 41/54] NFSv4: Revalidate uid/gid after open Greg KH
2012-05-18 21:16 ` [ 42/54] memcg: free spare array to avoid memory leak Greg KH
2012-05-18 21:16 ` [ 43/54] compat: Fix RT signal mask corruption via sigprocmask Greg KH
2012-05-18 21:16 ` [ 44/54] ext3: Fix error handling on inode bitmap corruption Greg KH
2012-05-18 21:16 ` [ 45/54] ext4: fix " Greg KH
2012-05-18 21:16 ` [ 46/54] ACPI / PM: Add Sony Vaio VPCCW29FX to nonvs blacklist Greg KH
2012-05-18 21:16 ` [ 47/54] SCSI: hpsa: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler Greg KH
2012-05-18 21:16 ` [ 48/54] wake up s_wait_unfrozen when ->freeze_fs fails Greg KH
2012-05-18 21:16 ` [ 49/54] pch_gpio: Support new device LAPIS Semiconductor ML7831 IOH Greg KH
2012-05-18 21:16 ` [ 50/54] pch_gbe: fixed the issue which receives an unnecessary packet Greg KH
2012-05-18 21:16 ` [ 51/54] pch_gbe: support ML7831 IOH Greg KH
2012-05-18 21:16 ` [ 52/54] pch_gbe: Fixed the issue on which PC was frozen when link was downed Greg KH
2012-05-18 21:16 ` [ 53/54] pch_gbe: Do not abort probe on bad MAC Greg KH
2012-05-18 21:16 ` [ 54/54] pch_gbe: memory corruption calling pch_gbe_validate_option() Greg KH
2012-05-19  1:01 ` [ 00/54] 3.0.32-stable review Steven Rostedt
2012-05-19  4:20   ` Greg KH
2012-05-20  2:01     ` [PATCH] pidmap: Use GFP_ATOMIC to allocate page (was: Re: [ 00/54] 3.0.32-stable review) Steven Rostedt
2012-05-20  2:32       ` David Rientjes
2012-05-20 19:03         ` Linus Torvalds
2012-05-20 23:22           ` David Rientjes
2012-05-20 23:35             ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120518211601.206387642@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ncardwell@google.com \
    --cc=stable@vger.kernel.org \
    --cc=therbert@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).