netdev.vger.kernel.org archive mirror
From: "David S. Miller" <davem@davemloft.net>
To: "David S. Miller" <davem@davemloft.net>
Cc: herbert@gondor.apana.org.au, jheffner@psc.edu, ak@suse.de,
	niv@us.ibm.com, andy.grover@gmail.com, anton@samba.org,
	netdev@oss.sgi.com
Subject: Re: bad TSO performance in 2.6.9-rc2-BK
Date: Thu, 30 Sep 2004 18:12:48 -0700
Message-ID: <20040930181248.48185e41.davem@davemloft.net>
In-Reply-To: <20040930173439.3e0d2799.davem@davemloft.net>

On Thu, 30 Sep 2004 17:34:39 -0700
"David S. Miller" <davem@davemloft.net> wrote:

> If I disable /proc/sys/net/tcp_moderate_rcvbuf performance
> goes down from ~634Mbit/sec to ~495Mbit/sec.
> 
> Andi, I know you said that with TSO disabled things go 
> more smoothly.  But could you try upping the TCP socket
> receive buffer sizes on the 2.6.5 box to see if that gives
> you the performance back with TSO enabled?

Ok, here is something to play with.  This adds a sysctl
that sets the fraction of the congestion window (as a
right shift) we'll limit TSO segmenting to.

It defaults to 2 (i.e. snd_cwnd >> 2), but settings of 3
or 4 seem to make Andi's case behave much better.
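
To make the arithmetic concrete, here is a rough standalone
sketch of the clamp the patch below applies.  tso_factor() and
max_int() are hypothetical names used only for illustration
(in the kernel this logic sits inline in tcp_output.c), and the
numbers in the note after the code are assumed, not measured.

```c
#include <assert.h>

/* Stand-in for the kernel's max() on plain ints. */
static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/*
 * How many MSS-sized segments one TSO frame may cover.
 * large_mss:  largest TSO payload in bytes (assumed given)
 * mss_now:    current MSS in bytes
 * snd_cwnd:   congestion window in packets
 * cwnd_shift: value of the tcp_tso_cwnd_shift sysctl
 */
static int tso_factor(int large_mss, int mss_now, int snd_cwnd,
		      int cwnd_shift)
{
	int factor, limit;

	assert(mss_now > 0 && cwnd_shift >= 0 && cwnd_shift < 31);

	factor = large_mss / mss_now;

	/* Never let one TSO frame eat more than cwnd >> shift packets,
	 * but always allow at least one segment.
	 */
	limit = max_int(1, snd_cwnd >> cwnd_shift);
	if (factor > limit)
		factor = limit;

	return factor;
}
```

Assuming large_mss = 65483 and mss_now = 1448, a snd_cwnd of 64
gives a factor of 16 with shift 2, 8 with shift 3 and 4 with
shift 4, so a larger shift means smaller TSO bursts.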

With such small receive buffers, netperf simply can't clear
the receive queue fast enough when a burst of TSO-created
frames comes in.

This is also where the stretch ACKs come from.  We defer
the ACK until recvmsg makes progress, because we cannot
advertise a larger window and thus the connection is
application limited.

I'm also thinking about whether this sysctl should be
a divisor instead of a shift, and also whether it should
be in terms of snd_cwnd or the advertised receiver
window, whichever is smaller.
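
Purely as a sketch of that alternative (it is not what the patch
below implements), a divisor-based limit taken from the smaller
of the two windows might look like this.  tso_limit_div() and
all of its parameter names are hypothetical; the advertised
window is assumed to be in bytes and is converted to segments
before comparing it with the congestion window.

```c
#include <assert.h>

/*
 * Hypothetical variant: limit = min(cwnd, rwnd in segments) / divisor.
 * snd_cwnd:      congestion window in packets
 * snd_wnd_bytes: receiver's advertised window in bytes
 * mss_now:       current MSS in bytes
 * divisor:       would-be sysctl value, must be > 0
 */
static int tso_limit_div(int snd_cwnd, int snd_wnd_bytes, int mss_now,
			 int divisor)
{
	int wnd_segs, base, limit;

	assert(mss_now > 0 && divisor > 0);

	/* Express the advertised window in whole segments. */
	wnd_segs = snd_wnd_bytes / mss_now;

	/* Use whichever window is smaller. */
	base = snd_cwnd < wnd_segs ? snd_cwnd : wnd_segs;

	/* Always allow at least one segment per TSO frame. */
	limit = base / divisor;
	return limit > 0 ? limit : 1;
}
```

With a 32768 byte advertised window and an MSS of 1448 (assumed
values), the receiver window is 22 segments, so a snd_cwnd of 64
no longer matters and a divisor of 4 caps a TSO frame at 5
segments.  A divisor also allows steps such as 3 that a shift
cannot express.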

Basically, receivers with too small socket receive buffers
crap out if TSO bursts are too large.  This effect is
minimized the further the receiver is (RTT-wise) from
the sender, since the path tends to smooth out the bursts.
But on local gigabit LANs, the effect is quite pronounced.

Ironically, this case is a great example of how powerful
and incredibly effective John's receive buffer moderation
code is.  2.6.5 performance is severely hampered due to lack
of this code.

===== include/linux/sysctl.h 1.88 vs edited =====
--- 1.88/include/linux/sysctl.h	2004-09-23 14:34:12 -07:00
+++ edited/include/linux/sysctl.h	2004-09-30 17:17:49 -07:00
@@ -341,6 +341,7 @@
 	NET_TCP_BIC_LOW_WINDOW=104,
 	NET_TCP_DEFAULT_WIN_SCALE=105,
 	NET_TCP_MODERATE_RCVBUF=106,
+	NET_TCP_TSO_CWND_SHIFT=107,
 };
 
 enum {
===== include/net/tcp.h 1.92 vs edited =====
--- 1.92/include/net/tcp.h	2004-09-29 21:11:52 -07:00
+++ edited/include/net/tcp.h	2004-09-30 17:18:02 -07:00
@@ -609,6 +609,7 @@
 extern int sysctl_tcp_bic_fast_convergence;
 extern int sysctl_tcp_bic_low_window;
 extern int sysctl_tcp_moderate_rcvbuf;
+extern int sysctl_tcp_tso_cwnd_shift;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
===== net/ipv4/sysctl_net_ipv4.c 1.25 vs edited =====
--- 1.25/net/ipv4/sysctl_net_ipv4.c	2004-08-26 13:55:36 -07:00
+++ edited/net/ipv4/sysctl_net_ipv4.c	2004-09-30 17:19:32 -07:00
@@ -674,6 +674,14 @@
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= NET_TCP_TSO_CWND_SHIFT,
+		.procname	= "tcp_tso_cwnd_shift",
+		.data		= &sysctl_tcp_tso_cwnd_shift,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 	{ .ctl_name = 0 }
 };
 
===== net/ipv4/tcp_output.c 1.65 vs edited =====
--- 1.65/net/ipv4/tcp_output.c	2004-09-29 21:11:53 -07:00
+++ edited/net/ipv4/tcp_output.c	2004-09-30 17:27:32 -07:00
@@ -44,6 +44,7 @@
 
 /* People can turn this off for buggy TCP's found in printers etc. */
 int sysctl_tcp_retrans_collapse = 1;
+int sysctl_tcp_tso_cwnd_shift = 2;
 
 static __inline__
 void update_send_head(struct sock *sk, struct tcp_opt *tp, struct sk_buff *skb)
@@ -673,7 +674,7 @@
 		    !tp->urg_mode);
 
 	if (do_large) {
-		int large_mss, factor;
+		int large_mss, factor, limit;
 
 		large_mss = 65535 - tp->af_specific->net_header_len -
 			tp->ext_header_len - tp->ext2_header_len -
@@ -688,8 +689,10 @@
 		 * can keep the ACK clock ticking.
 		 */
 		factor = large_mss / mss_now;
-		if (factor > (tp->snd_cwnd >> 2))
-			factor = max(1, tp->snd_cwnd >> 2);
+		limit = tp->snd_cwnd >> sysctl_tcp_tso_cwnd_shift;
+		limit = max(1, limit);
+		if (factor > limit)
+			factor = limit;
 
 		tp->mss_cache = mss_now * factor;
 
