From: "David S. Miller" <davem@davemloft.net>
To: "David S. Miller" <davem@davemloft.net>
Cc: herbert@gondor.apana.org.au, jheffner@psc.edu, ak@suse.de,
	niv@us.ibm.com, andy.grover@gmail.com, anton@samba.org,
	netdev@oss.sgi.com
Subject: Re: bad TSO performance in 2.6.9-rc2-BK
Date: Thu, 30 Sep 2004 20:40:05 -0700
Message-ID: <20040930204005.69115c0e.davem@davemloft.net>
In-Reply-To: <20040930181248.48185e41.davem@davemloft.net>

On Thu, 30 Sep 2004 18:12:48 -0700
"David S. Miller" <davem@davemloft.net> wrote:

> Ok, here is something to play with.  This adds a sysctl
> to moderate the percentage of the congestion window we'll
> limit TSO segmenting to.

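I've done some tweaking and this is the patch I actually
checked into my tree.  I made the sysctl a divisor rather
than a percentage, and the default is 8, so a single TSO
frame is limited to 1/8 of the congestion window.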

I tried to play around with taking the send window and the
congestion window both into account, but that did not help
at all.

My current setup is Ultra-III 750MHz w/tg3 sending to
Ultra-II 360MHz w/tg3 through a D-Link DGS 1008-T gigabit
switch.  I'm using 32-bit binaries of netperf 2.3pl1
built with -DUSE_PROC_STAT and -DHAVE_SENDFILE.

The MTU being used is 1500.

Each run is made via "netperf -fM -H ${IP_OF_ULTRA-II}".
I did 3 runs each for 4 different configurations.  The
parameters are "TSO on/off" (sender side) and "TCP rcvbuf
moderation on/off" (receiver side).

With this patch I'm seeing these results:

TSO off + rbuf off:	63.15 MBytes/sec
			64.78 MBytes/sec
			64.53 MBytes/sec

TSO on  + rbuf off:	62.76 MBytes/sec
			63.36 MBytes/sec
			63.79 MBytes/sec

TSO off + rbuf on:	71.98 MBytes/sec
			73.52 MBytes/sec
			73.57 MBytes/sec

TSO on  + rbuf on:	75.70 MBytes/sec
			76.05 MBytes/sec
			75.42 MBytes/sec

The "rbuf off" cases are meant to emulate Andi's 2.6.5
case, and "rbuf on" is current 2.6.x.
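
To compare the four configurations at a glance, the three runs of
each can be averaged.  The following is only a sketch for doing that
arithmetic; the array names and the helpers mean() and pct_change()
are made up for illustration and are not part of netperf or the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MBytes/sec, copied from the netperf runs listed above.
 * The array names are hypothetical, chosen only for this sketch. */
static const double tso_off_rbuf_off[] = { 63.15, 64.78, 64.53 };
static const double tso_on_rbuf_off[]  = { 62.76, 63.36, 63.79 };
static const double tso_off_rbuf_on[]  = { 71.98, 73.52, 73.57 };
static const double tso_on_rbuf_on[]   = { 75.70, 76.05, 75.42 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change from base to value, in percent. */
static double pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}
```

Averaged this way, the runs come out at roughly 64.2 (TSO off, rbuf
off), 63.3 (TSO on, rbuf off), 73.0 (TSO off, rbuf on) and 75.7
(TSO on, rbuf on) MBytes/sec.  That is about a 3.7% gain from TSO with
rcvbuf moderation on, and about a 1.3% loss with it off.  With only
three runs per configuration, treat these as rough indications.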

How do things look for you with this change, Andi?
If things are still out of whack, play around with
different values of /proc/sys/net/ipv4/tcp_tso_win_divisor.
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/09/30 20:09:28-07:00 davem@nuts.davemloft.net 
#   [TCP]: Add tcp_tso_win_divisor sysctl.
#   
#   This allows control over what percentage of
#   the congestion window can be consumed by a
#   single TSO frame.
#   
#   The setting of this parameter is a choice
#   between burstiness and building larger TSO
#   frames.
#   
#   Signed-off-by: David S. Miller <davem@davemloft.net>
# 
# net/ipv4/tcp_output.c
#   2004/09/30 20:07:20-07:00 davem@nuts.davemloft.net +19 -7
#   [TCP]: Add tcp_tso_win_divisor sysctl.
# 
# net/ipv4/sysctl_net_ipv4.c
#   2004/09/30 20:07:20-07:00 davem@nuts.davemloft.net +8 -0
#   [TCP]: Add tcp_tso_win_divisor sysctl.
# 
# include/net/tcp.h
#   2004/09/30 20:07:20-07:00 davem@nuts.davemloft.net +1 -0
#   [TCP]: Add tcp_tso_win_divisor sysctl.
# 
# include/linux/sysctl.h
#   2004/09/30 20:07:20-07:00 davem@nuts.davemloft.net +1 -0
#   [TCP]: Add tcp_tso_win_divisor sysctl.
# 
diff -Nru a/include/linux/sysctl.h b/include/linux/sysctl.h
--- a/include/linux/sysctl.h	2004-09-30 20:19:49 -07:00
+++ b/include/linux/sysctl.h	2004-09-30 20:19:49 -07:00
@@ -341,6 +341,7 @@
 	NET_TCP_BIC_LOW_WINDOW=104,
 	NET_TCP_DEFAULT_WIN_SCALE=105,
 	NET_TCP_MODERATE_RCVBUF=106,
+	NET_TCP_TSO_WIN_DIVISOR=107,
 };
 
 enum {
diff -Nru a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h	2004-09-30 20:19:49 -07:00
+++ b/include/net/tcp.h	2004-09-30 20:19:49 -07:00
@@ -609,6 +609,7 @@
 extern int sysctl_tcp_bic_fast_convergence;
 extern int sysctl_tcp_bic_low_window;
 extern int sysctl_tcp_moderate_rcvbuf;
+extern int sysctl_tcp_tso_win_divisor;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
diff -Nru a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
--- a/net/ipv4/sysctl_net_ipv4.c	2004-09-30 20:19:49 -07:00
+++ b/net/ipv4/sysctl_net_ipv4.c	2004-09-30 20:19:49 -07:00
@@ -674,6 +674,14 @@
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= NET_TCP_TSO_WIN_DIVISOR,
+		.procname	= "tcp_tso_win_divisor",
+		.data		= &sysctl_tcp_tso_win_divisor,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 	{ .ctl_name = 0 }
 };
 
diff -Nru a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c	2004-09-30 20:19:49 -07:00
+++ b/net/ipv4/tcp_output.c	2004-09-30 20:19:49 -07:00
@@ -45,6 +45,12 @@
 /* People can turn this off for buggy TCP's found in printers etc. */
 int sysctl_tcp_retrans_collapse = 1;
 
+/* This limits the percentage of the congestion window which we
+ * will allow a single TSO frame to consume.  Building TSO frames
+ * which are too large can cause TCP streams to be bursty.
+ */
+int sysctl_tcp_tso_win_divisor = 8;
+
 static __inline__
 void update_send_head(struct sock *sk, struct tcp_opt *tp, struct sk_buff *skb)
 {
@@ -658,7 +664,7 @@
 {
 	struct tcp_opt *tp = tcp_sk(sk);
 	struct dst_entry *dst = __sk_dst_get(sk);
-	int do_large, mss_now;
+	unsigned int do_large, mss_now;
 
 	mss_now = tp->mss_cache_std;
 	if (dst) {
@@ -673,7 +679,7 @@
 		    !tp->urg_mode);
 
 	if (do_large) {
-		int large_mss, factor;
+		unsigned int large_mss, factor, limit;
 
 		large_mss = 65535 - tp->af_specific->net_header_len -
 			tp->ext_header_len - tp->ext2_header_len -
@@ -683,13 +689,19 @@
 			large_mss = max((tp->max_window>>1),
 					68U - tp->tcp_header_len);
 
+		factor = large_mss / mss_now;
+
 		/* Always keep large mss multiple of real mss, but
-		 * do not exceed 1/4 of the congestion window so we
-		 * can keep the ACK clock ticking.
+		 * do not exceed 1/tso_win_divisor of the congestion window
+		 * so we can keep the ACK clock ticking and minimize
+		 * bursting.
 		 */
-		factor = large_mss / mss_now;
-		if (factor > (tp->snd_cwnd >> 2))
-			factor = max(1, tp->snd_cwnd >> 2);
+		limit = tp->snd_cwnd;
+		if (sysctl_tcp_tso_win_divisor)
+			limit /= sysctl_tcp_tso_win_divisor;
+		limit = max(1U, limit);
+		if (factor > limit)
+			factor = limit;
 
 		tp->mss_cache = mss_now * factor;
 
