* using software TSO on non-TSO capable netdevices
@ 2008-07-30 23:50 Lennert Buytenhek
2008-07-30 23:56 ` David Miller
2008-07-31 17:00 ` Rick Jones
0 siblings, 2 replies; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-30 23:50 UTC (permalink / raw)
To: netdev; +Cc: Ashish Karkare, Nicolas Pitre
Hi,
I've been doing some network throughput tests with a NIC (mv643xx_eth)
that does not support TSO/GSO in hardware. The host CPU is an ARM CPU
that is pretty fast as far as ARM CPUs go (1.2 GHz), but not so fast
when compared to x86s.
When using sendfile() to send a GiB worth of zeroes over a single TCP
connection to another host on a 100 Mb/s network, with a vanilla
2.6.27-rc1 kernel, this runs as expected at wire speed, taking the
following amount of CPU time per test:
sys 0m5.410s
sys 0m5.380s
sys 0m5.620s
sys 0m5.360s
With this patch:
Index: linux-2.6.27-rc1/include/net/sock.h
===================================================================
--- linux-2.6.27-rc1.orig/include/net/sock.h
+++ linux-2.6.27-rc1/include/net/sock.h
@@ -1085,7 +1085,8 @@ extern struct dst_entry *sk_dst_check(st
static inline int sk_can_gso(const struct sock *sk)
{
- return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type);
+// return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type);
+ return 1;
}
extern void sk_setup_caps(struct sock *sk, struct dst_entry *dst);
The CPU utilisation numbers drop to:
sys 0m3.280s
sys 0m3.230s
sys 0m3.220s
sys 0m3.350s
Putting some debug code in net/core/dev.c:dev_hard_start_xmit(), I can
see that pretty much all of the segments that enter there to be GSOd in
software are full-sized (64 KiB-ish).
When the ethernet link is in 1000 Mb/s mode, the test seems CPU-bound,
and things look a little different. With vanilla 2.6.27-rc1, I get
these numbers for the same 1 GiB sendfile() test, where real time ~=
sys time:
sys 0m18.200s
sys 0m18.260s
sys 0m17.830s
sys 0m17.670s
sys 0m17.840s
sys 0m17.670s
sys 0m17.300s
sys 0m17.860s
sys 0m18.260s
sys 0m17.150s
sys 0m17.950s
With the patch above applied once again, I get:
real 0m16.319s sys 0m13.930s
real 0m15.680s sys 0m14.900s
real 0m15.538s sys 0m10.410s
real 0m15.325s sys 0m8.440s
real 0m16.147s sys 0m12.680s
real 0m15.549s sys 0m12.840s
real 0m15.667s sys 0m13.860s
real 0m15.509s sys 0m14.980s
real 0m15.237s sys 0m10.850s
While the wall clock time isn't much improved (hitting some kind of
internal bus bandwidth or DMA latency limitation in the hardware?),
the system time is improved, although the improvement is jittery.
In general, when the link is at 1000 Mb/s, skb_shinfo(skb)->gso_segs
is either 2 or 3 for 99.99% of the skbs sent to
net/core/dev.c:dev_hard_start_xmit() (which seems to be cwnd
limited), unlike the 44 I see when the link is in 100 Mb/s mode.
I.e. with the debug patch below, at 100 Mb/s, the steady-state output
always looks something like this (skb_shinfo(skb)->gso_segs is
always 44):
Jul 31 00:12:59 kw kernel: 10k seg: 44:10000
Jul 31 00:12:59 kw kernel: 10k size: 127:10000
Jul 31 00:13:00 kw kernel: 10k seg: 44:10000
Jul 31 00:13:00 kw kernel: 10k size: 127:10000
Jul 31 00:13:02 kw kernel: 10k seg: 44:10000
Jul 31 00:13:02 kw kernel: 10k size: 127:10000
Jul 31 00:13:04 kw kernel: 10k seg: 44:10000
Jul 31 00:13:04 kw kernel: 10k size: 127:10000
Jul 31 00:13:05 kw kernel: 10k seg: 44:10000
Jul 31 00:13:05 kw kernel: 10k size: 127:10000
With the same patch, at 1000 Mb/s, the output is something like this
(the 2-seg:3-seg ratio varies between runs but is typically pretty
constant within a run; this is from one particular run):
Jul 31 00:57:56 kw kernel: 10k seg: 2:4592 3:5408
Jul 31 00:57:56 kw kernel: 10k size: 5:4592 8:5408
Jul 31 00:57:56 kw kernel: 10k seg: 2:4513 3:5487
Jul 31 00:57:56 kw kernel: 10k size: 5:4513 8:5487
Jul 31 00:57:57 kw kernel: 10k seg: 2:4575 3:5425
Jul 31 00:57:57 kw kernel: 10k size: 5:4575 8:5425
Jul 31 00:57:58 kw kernel: 10k seg: 2:4569 3:5431
Jul 31 00:57:58 kw kernel: 10k size: 5:4569 8:5431
Jul 31 00:57:58 kw kernel: 10k seg: 2:4581 3:5419
Jul 31 00:57:58 kw kernel: 10k size: 5:4581 8:5419
Jul 31 00:57:59 kw kernel: 10k seg: 2:4583 3:5417
Jul 31 00:57:59 kw kernel: 10k size: 5:4583 8:5417
Given this, I'm wondering about the following:
1. Considering the drop in CPU utilisation, are there reasons not
to use software GSO on non-hardware-GSO-capable netdevices (apart
from GSO possibly confusing tcpdump/iptables/qdiscs/etc)?
2. Why is the number of cycles necessary to send 1 GiB of data so
much higher (~3.5x higher) in 1000 Mb/s mode than in 100 Mb/s mode?
(Is this maybe just because time(1) is inaccurate w.r.t. time spent
in interrupts and such?)
3. Why does dev_hard_start_xmit() get sent 64 KiB segments when the
link is in 100 Mb/s mode but gso_segs never grows beyond 3 when
the link is in 1000 Mb/s mode?
Any more thoughts about this or things I can try? Any other ideas
to speed up the 1000 Mb/s case?
thanks,
Lennert
Index: linux-2.6.27-rc1/net/core/dev.c
===================================================================
--- linux-2.6.27-rc1.orig/net/core/dev.c
+++ linux-2.6.27-rc1/net/core/dev.c
@@ -1633,6 +1633,58 @@ int dev_hard_start_xmit(struct sk_buff *
}
gso:
+ if (1) {
+ static int samples;
+ static int segment_histo[45];
+ int segments = 0;
+
+ segments = skb_shinfo(skb)->gso_segs;
+ if (segments > 44)
+ segments = 44;
+ segment_histo[segments]++;
+
+ if (++samples == 10000) {
+ int i;
+
+ samples = 0;
+
+ printk(KERN_CRIT "10k seg: ");
+ for (i = 0; i < 45; i++) {
+ if (segment_histo[i]) {
+ printk("%d:%d ", i, segment_histo[i]);
+ segment_histo[i] = 0;
+ }
+ }
+ printk("\n");
+ }
+ }
+
+ if (1) {
+ static int samples;
+ static int size_histo[150];
+ int len = 0;
+
+ len = skb->len >> 9;
+ if (len > 149)
+ len = 149;
+ size_histo[len]++;
+
+ if (++samples == 10000) {
+ int i;
+
+ samples = 0;
+
+ printk(KERN_CRIT "10k size: ");
+ for (i = 0; i < 150; i++) {
+ if (size_histo[i]) {
+ printk("%d:%d ", i, size_histo[i]);
+ size_histo[i] = 0;
+ }
+ }
+ printk("\n");
+ }
+ }
+
do {
struct sk_buff *nskb = skb->next;
int rc;
* Re: using software TSO on non-TSO capable netdevices
2008-07-30 23:50 using software TSO on non-TSO capable netdevices Lennert Buytenhek
@ 2008-07-30 23:56 ` David Miller
2008-07-31 0:41 ` Lennert Buytenhek
2008-07-31 17:00 ` Rick Jones
1 sibling, 1 reply; 35+ messages in thread
From: David Miller @ 2008-07-30 23:56 UTC (permalink / raw)
To: buytenh; +Cc: netdev, akarkare, nico
From: Lennert Buytenhek <buytenh@wantstofly.org>
Date: Thu, 31 Jul 2008 01:50:04 +0200
Thanks for all the great data and testing.
> Given this, I'm wondering about the following:
>
> 1. Considering the drop in CPU utilisation, are there reasons not
> to use software GSO on non-hardware-GSO-capable netdevices (apart
> from GSO possibly confusing tcpdump/iptables/qdiscs/etc)?
We should probably enable software GSO whenever the device can
do scatter-gather and checksum offload.
> 2. Why is the number of cycles necessary to send 1 GiB of data so
> much higher (~3.5x higher) in 1000 Mb/s mode than in 100 Mb/s mode?
> (Is this maybe just because time(1) is inaccurate w.r.t. time spent
> in interrupts and such?)
This I have no idea about.
> 3. Why does dev_hard_start_xmit() get sent 64 KiB segments when the
> link is in 100 Mb/s mode but gso_segs never grows beyond 3 when
> the link is in 1000 Mb/s mode?
Because the link can empty the socket send buffer fast enough such
that there is often not enough data to coalesce into larger GSO frames.
At least that's my guess.
* Re: using software TSO on non-TSO capable netdevices
2008-07-30 23:56 ` David Miller
@ 2008-07-31 0:41 ` Lennert Buytenhek
2008-07-31 1:10 ` David Miller
2008-07-31 2:29 ` Herbert Xu
0 siblings, 2 replies; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 0:41 UTC (permalink / raw)
To: David Miller; +Cc: netdev, akarkare, nico
On Wed, Jul 30, 2008 at 04:56:21PM -0700, David Miller wrote:
> Thanks for all the great data and testing.
Thanks for taking the time to look at this and replying so quickly!
> > Given this, I'm wondering about the following:
> >
> > 1. Considering the drop in CPU utilisation, are there reasons not
> > to use software GSO on non-hardware-GSO-capable netdevices (apart
> > from GSO possibly confusing tcpdump/iptables/qdiscs/etc)?
>
> We should probably enable software GSO whenever the device can
> do scatter-gather and checksum offload.
OK.
> > 3. Why does dev_hard_start_xmit() get sent 64 KiB segments when the
> > link is in 100 Mb/s mode but gso_segs never grows beyond 3 when
> > the link is in 1000 Mb/s mode?
>
> Because the link can empty the socket send buffer fast enough such
> that there is often not enough data to coalesce into larger GSO frames.
> At least that's my guess.
Hmmmm.
The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
real 0m16.319s sys 0m13.930s
real 0m15.680s sys 0m14.900s
real 0m15.538s sys 0m10.410s
real 0m15.325s sys 0m8.440s
real 0m16.147s sys 0m12.680s
real 0m15.549s sys 0m12.840s
real 0m15.667s sys 0m13.860s
real 0m15.509s sys 0m14.980s
real 0m15.237s sys 0m10.850s
to:
real 0m14.643s sys 0m3.260s
real 0m14.547s sys 0m3.100s
real 0m14.932s sys 0m3.290s
real 0m14.557s sys 0m3.160s
real 0m14.712s sys 0m3.260s
real 0m14.827s sys 0m3.360s
real 0m14.495s sys 0m3.200s
real 0m14.575s sys 0m3.220s
real 0m14.552s sys 0m3.420s
(I'm sure there's a better way to enforce larger GSO frames; I don't
know the TCP stack too well.)
I.e. dramatic CPU time improvements, and some overall speedup as well.
I wonder if something like this can be done in a less hacky fashion --
the hard part I guess is deciding when to keep coalescing (to reduce
CPU overhead) vs. when to push out what has been coalesced so far (in
order to keep the pipe filled), and I'm not sure I have good ideas
about how to make that decision.
Index: linux-2.6.27-rc1/net/ipv4/tcp_output.c
===================================================================
--- linux-2.6.27-rc1.orig/net/ipv4/tcp_output.c
+++ linux-2.6.27-rc1/net/ipv4/tcp_output.c
@@ -1544,7 +1544,7 @@ static int tcp_write_xmit(struct sock *s
break;
if (tso_segs == 1) {
- if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
+ if (unlikely(!tcp_nagle_test(tp, skb, 5 * mss_now,
(tcp_skb_is_last(sk, skb) ?
nonagle : TCP_NAGLE_PUSH))))
break;
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 0:41 ` Lennert Buytenhek
@ 2008-07-31 1:10 ` David Miller
2008-07-31 1:45 ` Lennert Buytenhek
2008-07-31 7:34 ` Ilpo Järvinen
2008-07-31 2:29 ` Herbert Xu
1 sibling, 2 replies; 35+ messages in thread
From: David Miller @ 2008-07-31 1:10 UTC (permalink / raw)
To: buytenh; +Cc: netdev, akarkare, nico, herbert
From: Lennert Buytenhek <buytenh@wantstofly.org>
Date: Thu, 31 Jul 2008 02:41:23 +0200
> The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
> sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
...
> I.e. dramatic CPU time improvements, and some overall speedup as well.
>
> I wonder if something like this can be done in a less hacky fashion --
> the hard part I guess is deciding when to keep coalescing (to reduce
> CPU overhead) vs. when to push out what has been coalesced so far (in
> order to keep the pipe filled), and I'm not sure I have good ideas
> about how to make that decision.
Interesting, I'll take a closer look at this.
Actually your patch is less of a surprise, because one of the issues I
had to surmount constantly when rewriting the TSO output path was the
implicit conflict between TSO deferral (to accumulate segments) and
the nagle logic.
Anyways, thanks, I'll think about this patch and your data and see
where we can go with that part.
In the meantime, could you possibly cook up a cleaned up "use software
GSO for SG+CSUM capable devices" patch? I think I want to apply it,
especially since all of the things Herbert and I have suspected are
completely confirmed by your data and tests.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 1:10 ` David Miller
@ 2008-07-31 1:45 ` Lennert Buytenhek
2008-07-31 3:54 ` Herbert Xu
2008-07-31 7:34 ` Ilpo Järvinen
1 sibling, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 1:45 UTC (permalink / raw)
To: David Miller; +Cc: netdev, akarkare, nico, herbert
On Wed, Jul 30, 2008 at 06:10:47PM -0700, David Miller wrote:
> > The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
> > sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
> ...
> > I.e. dramatic CPU time improvements, and some overall speedup as well.
> >
> > I wonder if something like this can be done in a less hacky fashion --
> > the hard part I guess is deciding when to keep coalescing (to reduce
> > CPU overhead) vs. when to push out what has been coalesced so far (in
> > order to keep the pipe filled), and I'm not sure I have good ideas
> > about how to make that decision.
>
> Interesting, I'll take a closer look at this.
>
> Actually your patch is less of a surprise, because one of the issues I
> had to surmount constantly when rewriting the TSO output path was the
> implicit conflict between TSO deferral (to accumulate segments) and
> the nagle logic.
>
> Anyways, thanks, I'll think about this patch and your data and see
> where we can go with that part.
>
> In the meantime, could you possibly cook up a cleaned up "use software
> GSO for SG+CSUM capable devices" patch? I think I want to apply it,
> especially since all of the things Herbert and I have suspected are
> completely confirmed by your data and tests.
OK, how about:
From: Lennert Buytenhek <buytenh@wantstofly.org>
Subject: [NET] use software GSO for SG+CSUM capable netdevices
If a netdevice does not support hardware GSO, allowing the stack to
use GSO anyway and then splitting the GSO skb into MSS-sized pieces
as it is handed to the netdevice for transmitting is likely still
a win at least as far as CPU usage is concerned, since it reduces
the number of trips through the output path.
This patch enables the use of GSO on any netdevice that supports SG
and hardware checksumming. If a GSO skb is then sent to a netdevice
that supports SG and checksumming but does not support hardware GSO,
net/core/dev.c:dev_hard_start_xmit() will take care of doing the
necessary GSO segmentation in software.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Index: linux-2.6.27-rc1/include/net/sock.h
===================================================================
--- linux-2.6.27-rc1.orig/include/net/sock.h
+++ linux-2.6.27-rc1/include/net/sock.h
@@ -1085,7 +1085,13 @@ extern struct dst_entry *sk_dst_check(st
static inline int sk_can_gso(const struct sock *sk)
{
- return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type);
+ int caps = sk->sk_route_caps;
+ int type = sk->sk_gso_type;
+
+ return (caps & NETIF_F_SG) &&
+ (((type == SKB_GSO_TCPV4 || type == SKB_GSO_UDP) &&
+ (caps & NETIF_F_V4_CSUM)) ||
+ (type == SKB_GSO_TCPV6 && (caps & NETIF_F_V6_CSUM)));
}
extern void sk_setup_caps(struct sock *sk, struct dst_entry *dst);
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 1:45 ` Lennert Buytenhek
@ 2008-07-31 3:54 ` Herbert Xu
2008-07-31 9:45 ` Lennert Buytenhek
0 siblings, 1 reply; 35+ messages in thread
From: Herbert Xu @ 2008-07-31 3:54 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: David Miller, netdev, akarkare, nico
On Thu, Jul 31, 2008 at 03:45:06AM +0200, Lennert Buytenhek wrote:
>
> + return (caps & NETIF_F_SG) &&
> + (((type == SKB_GSO_TCPV4 || type == SKB_GSO_UDP) &&
Nobody has written software UFO yet. Let's just stick to TCP for
now.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 3:54 ` Herbert Xu
@ 2008-07-31 9:45 ` Lennert Buytenhek
2008-07-31 10:55 ` Herbert Xu
0 siblings, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 9:45 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev, akarkare, nico
On Thu, Jul 31, 2008 at 11:54:10AM +0800, Herbert Xu wrote:
> > + return (caps & NETIF_F_SG) &&
> > + (((type == SKB_GSO_TCPV4 || type == SKB_GSO_UDP) &&
>
> Nobody has written software UFO yet. Let's just stick to TCP for
> now.
OK, how about:
From: Lennert Buytenhek <buytenh@wantstofly.org>
Subject: [NET] use software GSO for SG+CSUM capable netdevices
If a netdevice does not support hardware GSO, allowing the stack to
use GSO anyway and then splitting the GSO skb into MSS-sized pieces
as it is handed to the netdevice for transmitting is likely still
a win at least as far as CPU usage is concerned, since it reduces
the number of trips through the output path.
This patch enables the use of GSO on any netdevice that supports SG
and hardware checksumming. If a GSO skb is then sent to a netdevice
that supports SG and checksumming but does not support hardware GSO,
net/core/dev.c:dev_hard_start_xmit() will take care of doing the
necessary GSO segmentation in software.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Index: linux-2.6.27-rc1/include/net/sock.h
===================================================================
--- linux-2.6.27-rc1.orig/include/net/sock.h
+++ linux-2.6.27-rc1/include/net/sock.h
@@ -1085,7 +1085,12 @@ extern struct dst_entry *sk_dst_check(st
static inline int sk_can_gso(const struct sock *sk)
{
- return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type);
+ int caps = sk->sk_route_caps;
+ int type = sk->sk_gso_type;
+
+ return (caps & NETIF_F_SG) &&
+ ((type == SKB_GSO_TCPV4 && (caps & NETIF_F_V4_CSUM)) ||
+ (type == SKB_GSO_TCPV6 && (caps & NETIF_F_V6_CSUM)));
}
extern void sk_setup_caps(struct sock *sk, struct dst_entry *dst);
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 9:45 ` Lennert Buytenhek
@ 2008-07-31 10:55 ` Herbert Xu
2008-07-31 12:37 ` Lennert Buytenhek
0 siblings, 1 reply; 35+ messages in thread
From: Herbert Xu @ 2008-07-31 10:55 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: David Miller, netdev, akarkare, nico
On Thu, Jul 31, 2008 at 11:45:36AM +0200, Lennert Buytenhek wrote:
>
> From: Lennert Buytenhek <buytenh@wantstofly.org>
> Subject: [NET] use software GSO for SG+CSUM capable netdevices
>
> If a netdevice does not support hardware GSO, allowing the stack to
> use GSO anyway and then splitting the GSO skb into MSS-sized pieces
> as it is handed to the netdevice for transmitting is likely still
> a win at least as far as CPU usage is concerned, since it reduces
> the number of trips through the output path.
>
> This patch enables the use of GSO on any netdevice that supports SG
> and hardware checksumming. If a GSO skb is then sent to a netdevice
> that supports SG and checksumming but does not support hardware GSO,
> net/core/dev.c:dev_hard_start_xmit() will take care of doing the
> necessary GSO segmentation in software.
>
> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
>
> Index: linux-2.6.27-rc1/include/net/sock.h
> ===================================================================
> --- linux-2.6.27-rc1.orig/include/net/sock.h
> +++ linux-2.6.27-rc1/include/net/sock.h
> @@ -1085,7 +1085,12 @@ extern struct dst_entry *sk_dst_check(st
>
> static inline int sk_can_gso(const struct sock *sk)
> {
> - return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type);
> + int caps = sk->sk_route_caps;
> + int type = sk->sk_gso_type;
> +
> + return (caps & NETIF_F_SG) &&
> + ((type == SKB_GSO_TCPV4 && (caps & NETIF_F_V4_CSUM)) ||
> + (type == SKB_GSO_TCPV6 && (caps & NETIF_F_V6_CSUM)));
I think you've lost the hardware UFO support.
In any case, this is really the wrong place to do this as the
user will no longer be able to disable it.
Please do it in the netdev registration function instead. The code
should enable NETIF_F_GSO if NETIF_F_SG is on.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 10:55 ` Herbert Xu
@ 2008-07-31 12:37 ` Lennert Buytenhek
2008-07-31 12:59 ` Herbert Xu
0 siblings, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 12:37 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev, akarkare, nico
On Thu, Jul 31, 2008 at 06:55:22PM +0800, Herbert Xu wrote:
> > From: Lennert Buytenhek <buytenh@wantstofly.org>
> > Subject: [NET] use software GSO for SG+CSUM capable netdevices
> >
> > If a netdevice does not support hardware GSO, allowing the stack to
> > use GSO anyway and then splitting the GSO skb into MSS-sized pieces
> > as it is handed to the netdevice for transmitting is likely still
> > a win at least as far as CPU usage is concerned, since it reduces
> > the number of trips through the output path.
> >
> > This patch enables the use of GSO on any netdevice that supports SG
> > and hardware checksumming. If a GSO skb is then sent to a netdevice
> > that supports SG and checksumming but does not support hardware GSO,
> > net/core/dev.c:dev_hard_start_xmit() will take care of doing the
> > necessary GSO segmentation in software.
> >
> > Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
> >
> > Index: linux-2.6.27-rc1/include/net/sock.h
> > ===================================================================
> > --- linux-2.6.27-rc1.orig/include/net/sock.h
> > +++ linux-2.6.27-rc1/include/net/sock.h
> > @@ -1085,7 +1085,12 @@ extern struct dst_entry *sk_dst_check(st
> >
> > static inline int sk_can_gso(const struct sock *sk)
> > {
> > - return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type);
> > + int caps = sk->sk_route_caps;
> > + int type = sk->sk_gso_type;
> > +
> > + return (caps & NETIF_F_SG) &&
> > + ((type == SKB_GSO_TCPV4 && (caps & NETIF_F_V4_CSUM)) ||
> > + (type == SKB_GSO_TCPV6 && (caps & NETIF_F_V6_CSUM)));
>
> I think you've lost the hardware UFO support.
>
> In any case, this is really the wrong place to do this as the
> user will no longer be able to disable it.
>
> Please do it in the netdev registration function instead. The code
> should enable NETIF_F_GSO if NETIF_F_SG is on.
Like this?
From: Lennert Buytenhek <buytenh@wantstofly.org>
Subject: [NET] use software GSO for SG+CSUM capable netdevices
If a netdevice does not support hardware GSO, allowing the stack to
use GSO anyway and then splitting the GSO skb into MSS-sized pieces
as it is handed to the netdevice for transmitting is likely still
a win as far as throughput and/or CPU usage are concerned, since it
reduces the number of trips through the output path.
This patch enables the use of GSO on any netdevice that supports SG.
If a GSO skb is then sent to a netdevice that supports SG but does not
support hardware GSO, net/core/dev.c:dev_hard_start_xmit() will take
care of doing the necessary GSO segmentation in software.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Index: linux-2.6.27-rc1/net/core/dev.c
===================================================================
--- linux-2.6.27-rc1.orig/net/core/dev.c
+++ linux-2.6.27-rc1/net/core/dev.c
@@ -3988,6 +3988,10 @@ int register_netdevice(struct net_device
}
}
+ /* Enable software GSO if SG is supported. */
+ if (dev->features & NETIF_F_SG)
+ dev->features |= NETIF_F_GSO;
+
netdev_initialize_kobject(dev);
ret = netdev_register_kobject(dev);
if (ret)
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 12:37 ` Lennert Buytenhek
@ 2008-07-31 12:59 ` Herbert Xu
2008-08-03 8:23 ` David Miller
0 siblings, 1 reply; 35+ messages in thread
From: Herbert Xu @ 2008-07-31 12:59 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: David Miller, netdev, akarkare, nico
On Thu, Jul 31, 2008 at 02:37:31PM +0200, Lennert Buytenhek wrote:
>
> Like this?
Yes that ought to do the trick.
> From: Lennert Buytenhek <buytenh@wantstofly.org>
> Subject: [NET] use software GSO for SG+CSUM capable netdevices
>
> If a netdevice does not support hardware GSO, allowing the stack to
> use GSO anyway and then splitting the GSO skb into MSS-sized pieces
> as it is handed to the netdevice for transmitting is likely still
> a win as far as throughput and/or CPU usage are concerned, since it
> reduces the number of trips through the output path.
>
> This patch enables the use of GSO on any netdevice that supports SG.
> If a GSO skb is then sent to a netdevice that supports SG but does not
> support hardware GSO, net/core/dev.c:dev_hard_start_xmit() will take
> care of doing the necessary GSO segmentation in software.
>
> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
2008-07-31 12:59 ` Herbert Xu
@ 2008-08-03 8:23 ` David Miller
0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2008-08-03 8:23 UTC (permalink / raw)
To: herbert; +Cc: buytenh, netdev, akarkare, nico
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 31 Jul 2008 20:59:03 +0800
> On Thu, Jul 31, 2008 at 02:37:31PM +0200, Lennert Buytenhek wrote:
> >
> > Like this?
>
> Yes that ought to do the trick.
>
> > From: Lennert Buytenhek <buytenh@wantstofly.org>
> > Subject: [NET] use software GSO for SG+CSUM capable netdevices
> >
> > If a netdevice does not support hardware GSO, allowing the stack to
> > use GSO anyway and then splitting the GSO skb into MSS-sized pieces
> > as it is handed to the netdevice for transmitting is likely still
> > a win as far as throughput and/or CPU usage are concerned, since it
> > reduces the number of trips through the output path.
> >
> > This patch enables the use of GSO on any netdevice that supports SG.
> > If a GSO skb is then sent to a netdevice that supports SG but does not
> > support hardware GSO, net/core/dev.c:dev_hard_start_xmit() will take
> > care of doing the necessary GSO segmentation in software.
> >
> > Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Applied, thanks everyone!
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 1:10 ` David Miller
2008-07-31 1:45 ` Lennert Buytenhek
@ 2008-07-31 7:34 ` Ilpo Järvinen
2008-07-31 9:50 ` Lennert Buytenhek
1 sibling, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-07-31 7:34 UTC (permalink / raw)
To: David Miller; +Cc: buytenh, Netdev, akarkare, nico, Herbert Xu
On Wed, 30 Jul 2008, David Miller wrote:
> From: Lennert Buytenhek <buytenh@wantstofly.org>
> Date: Thu, 31 Jul 2008 02:41:23 +0200
>
> > The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
> > sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
> ...
> > I.e. dramatic CPU time improvements, and some overall speedup as well.
> >
> > I wonder if something like this can be done in a less hacky fashion --
> > the hard part I guess is deciding when to keep coalescing (to reduce
> > CPU overhead) vs. when to push out what has been coalesced so far (in
> > order to keep the pipe filled), and I'm not sure I have good ideas
> > about how to make that decision.
>
> Interesting, I'll take a closer look at this.
>
> Actually your patch is less of a surprise, because one of the issues I
> had to surmount constantly when rewriting the TSO output path was the
> implicit conflict between TSO deferral (to accumulate segments) and
> the nagle logic.
Your statement makes very little sense to me (though I had to look up
the meaning of "surmount", that doesn't seem so significant
anyway)... They both work in the same direction, i.e., delaying sending
to prevent excessive processing of small bits, but their regions of
operation shouldn't overlap (Nagle works with <mss, and the TSO
deferral logic basically begins where Nagle ends)?
It seems to me that this is not about a conflict between TSO deferral
and the Nagle sub-mss logic at all (perhaps the relation to this issue
wasn't as direct as I read it...?). AFAICT, the change only makes the
(!nonagle && tp->packets_out && tcp_minshall_check(tp)) test in
tcp_nagle_check() more likely to trigger (and return false), i.e.,
basically we end up using the Nagle test to also prevent sending of
>= mss skbs, besides its usual function of preventing sending of
< mss sized ones.
...Which seems just an extension of what we check for in
tcp_tso_should_defer().
So I guess the results just show that there's benefit in deferring
even more than we currently do.
--
i.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 7:34 ` Ilpo Järvinen
@ 2008-07-31 9:50 ` Lennert Buytenhek
2008-07-31 10:27 ` Ilpo Järvinen
0 siblings, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 9:50 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: David Miller, Netdev, akarkare, nico, Herbert Xu
On Thu, Jul 31, 2008 at 10:34:13AM +0300, Ilpo Järvinen wrote:
> > > The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
> > > sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
> > ...
> > > I.e. dramatic CPU time improvements, and some overall speedup as well.
> > >
> > > I wonder if something like this can be done in a less hacky fashion --
> > > the hard part I guess is deciding when to keep coalescing (to reduce
> > > CPU overhead) vs. when to push out what has been coalesced so far (in
> > > order to keep the pipe filled), and I'm not sure I have good ideas
> > > about how to make that decision.
> >
> > Interesting, I'll take a closer look at this.
> >
> > Actually your patch is less of a surprise, because one of the issues I
> > had to surmount constantly when rewriting the TSO output path was the
> > implicit conflict between TSO deferral (to accumulate segments) and
> > the nagle logic.
>
> I think your statement makes very little sense to me (though I had to
> lookup the meaning of surmount but that seems not so significant
> anyway)... They both work into the same direction, ie., to delay sending
> to prevent excessive processing of small bits, but the region of operation
> shouldn't overlap (nagle works with <mss, and tso deferring logic
> basically begins from where the nagle ends)?
>
> It seems to me that this not about conflict between TSO deferring and
> nagle sub-mss logic at all (perhaps there wasn't as direct relation to
> this issue as I read...?) AFAICT, the change only makes (!nonagle &&
> tp->packets_out && tcp_minshall_check(tp)) test in tcp_nagle_check more
> likely to occur (and result in false), ie., basically we end up using
> nagle test also to prevent sending of >= mss skbs, besides the usual
> functionality which is to prevent sending in case of < mss sized ones.
> ...Which seems just an extension to what we checked for in
> tcp_tso_should_defer().
I wanted a way to get larger GSO segments, and the idea was to rig
the nagle check to consider sub-N*mss frames as small frames and not
let more than one of them into the pipe at any given time. I don't
know whether the change I made accomplishes exactly that, but it did
end up giving me larger GSO segments, which was the goal.
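The intended effect can be modelled as a Minshall-style nagle check whose "small frame" threshold is raised from one MSS to N*mss. A simplified Python rendering of the idea, not the actual kernel logic — the function and all parameter names are illustrative:

```python
def nagle_allows_send(skb_len, cur_mss, nonagle_push,
                      packets_out, small_frame_in_flight, n=5):
    """With the hack, frames shorter than n * cur_mss count as 'small',
    and only one unacked small frame is allowed in flight at a time."""
    if nonagle_push or skb_len >= n * cur_mss:
        return True
    # Minshall-style test: hold back while a small frame is still unacked.
    return not (packets_out and small_frame_in_flight)

# A 7000-byte frame is below 5 * 1448 = 7240, so it is held back while
# another sub-threshold frame is unacked, coalescing into larger segments.
print(nagle_allows_send(7000, 1448, False, 1, True))
```

With n=1 this degenerates to the usual nagle/minshall behaviour; raising n is what forces coalescing into larger GSO segments.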
It makes the GSO segment size distribution pretty chaotic, though:
10k seg: 2:851 3:430 4:3385 5:330 6:3611 7:382 8:949 9:18 10:43 11:1
10k size: 5:851 8:430 11:3385 14:330 17:3611 19:382 22:949 25:18 28:43 31:1
10k seg: 2:1952 3:410 4:2855 5:340 6:2956 7:356 8:1059 9:24 10:48
10k size: 5:1952 8:410 11:2855 14:340 17:2956 19:356 22:1059 25:24 28:48
10k seg: 2:1036 3:569 4:4824 5:369 6:2241 7:251 8:643 9:20 10:46 11:1
10k size: 5:1036 8:569 11:4824 14:369 17:2241 19:251 22:643 25:20 28:46 31:1
10k seg: 2:1270 3:408 4:3686 5:350 6:2910 7:319 8:988 9:15 10:54
10k size: 5:1270 8:408 11:3686 14:350 17:2910 19:319 22:988 25:15 28:54
10k seg: 2:870 3:407 4:4211 5:380 6:3392 7:286 8:389 9:20 10:45
10k size: 5:870 8:407 11:4211 14:380 17:3392 19:286 22:389 25:20 28:45
10k seg: 2:1217 3:411 4:3542 5:315 6:3263 7:348 8:832 9:23 10:48 11:1
10k size: 5:1217 8:411 11:3542 14:315 17:3263 19:348 22:832 25:23 28:48 31:1
("10k seg" numbers are the distribution of gso_segs for 10k skbuffs,
and "10k size" are the distribution of skb->len >> 9 for 10k skbuffs.)
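For reference, the "value:count" tallies above can be produced from raw per-skb samples with a few lines of Python. This is a post-processing sketch; it assumes you have already extracted the per-skb gso_segs and skb->len values (e.g. from debug printk output), and the sample values below are made up:

```python
from collections import Counter

def distribution(samples):
    """Render samples in the 'value:count' format used above."""
    counts = Counter(samples)
    return " ".join(f"{v}:{c}" for v, c in sorted(counts.items()))

# Hypothetical gso_segs values for a handful of skbuffs.
segs = [2, 4, 4, 6, 6, 6, 8]
print("seg: ", distribution(segs))
# skb->len >> 9 expresses each skb's length in 512-byte units.
lens = [2816, 5632, 5632, 8448, 8448, 8448, 11264]
print("size:", distribution(l >> 9 for l in lens))
```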
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 9:50 ` Lennert Buytenhek
@ 2008-07-31 10:27 ` Ilpo Järvinen
0 siblings, 0 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-07-31 10:27 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: David Miller, Netdev, akarkare, nico, Herbert Xu
On Thu, 31 Jul 2008, Lennert Buytenhek wrote:
> On Thu, Jul 31, 2008 at 10:34:13AM +0300, Ilpo Järvinen wrote:
>
> > > > The hacky patch below (on top of 2.6.27-rc1 + stubbing out the
> > > > sk_can_gso() check) reduces the 1 GiB 1000 Mb/s sendfile test from:
> > > ...
> > > > I.e. dramatic CPU time improvements, and some overall speedup as well.
> > > >
> > > > I wonder if something like this can be done in a less hacky fashion --
> > > > the hard part I guess is deciding when to keep coalescing (to reduce
> > > > CPU overhead) vs. when to push out what has been coalesced so far (in
> > > > order to keep the pipe filled), and I'm not sure I have good ideas
> > > > about how to make that decision.
> > >
> > > Interesting, I'll take a closer look at this.
> > >
> > > Actually your patch is less of a surprise, because one of the issues I
> > > had to surmount constantly when rewriting the TSO output path was the
> > > implicit conflict between TSO deferral (to accumulate segments) and
> > > the nagle logic.
> >
> > I think your statement makes very little sense to me (though I had to
> > look up the meaning of surmount, but that seems not so significant
> > anyway)... They both work in the same direction, i.e., to delay sending
> > to prevent excessive processing of small bits, but their regions of
> > operation shouldn't overlap (nagle works with < mss, and the tso
> > deferring logic basically begins where nagle ends)?
> >
> > It seems to me that this is not about a conflict between TSO deferring
> > and the nagle sub-mss logic at all (perhaps the relation to this issue
> > wasn't as direct as I read it...?) AFAICT, the change only makes the
> > (!nonagle && tp->packets_out && tcp_minshall_check(tp)) test in
> > tcp_nagle_check more likely to trigger (and result in false), i.e.,
> > basically we end up using the nagle test also to prevent the sending
> > of >= mss skbs, besides its usual functionality, which is to prevent
> > sending of < mss sized ones. ...Which seems just an extension of what
> > we checked for in tcp_tso_should_defer().
>
> I wanted a way to get larger GSO segments, and the idea was to rig
> the nagle check to consider sub-N*mss frames as small frames and not
> let more than one of them into the pipe at any given time. I don't
> know whether the change I made accomplishes exactly that, but it did
> end up giving me larger GSO segments, which was the goal.
>
> It makes the GSO segment size distribution pretty chaotic, though:
Your test accomplishes that only if there's a small segment in the
outstanding window, i.e., snd_sml points into the outstanding window
(or packets_out is zero, but that's probably not relevant).
Why not experiment with modifying tcp_tso_should_defer instead, making
it fully independent of snd_sml (the existence of a sub-mss skb
in flight)? Just make sure you don't try to defer past what
min(tp->snd_cwnd, tcp_wnd_end(tp)) can give you at most (in theory you
could apply some optimism and go even beyond that in slow start, but
that's not going to be a very robust approach :-)).
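Ilpo's suggested bound can be sketched as follows. This is a userspace Python model of the arithmetic only, not kernel code; the field names mirror struct tcp_sock, but the helper itself is hypothetical:

```python
def max_deferrable_bytes(snd_cwnd, snd_una, snd_nxt, snd_wnd, mss):
    """Most a deferral could ever accumulate: never defer past what
    min(tp->snd_cwnd, tcp_wnd_end(tp)) can give you."""
    in_flight = snd_nxt - snd_una
    cwnd_room = max(0, snd_cwnd * mss - in_flight)  # congestion window room
    wnd_end = snd_una + snd_wnd                     # ~ tcp_wnd_end(tp)
    rwnd_room = max(0, wnd_end - snd_nxt)           # receiver window room
    return min(cwnd_room, rwnd_room)

# With 4 segments in flight, a 10-segment cwnd and a 64 KiB receiver
# window, at most ~6 more segments' worth could be accumulated.
print(max_deferrable_bytes(10, 0, 4 * 1448, 65535, 1448))
```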
--
i.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 0:41 ` Lennert Buytenhek
2008-07-31 1:10 ` David Miller
@ 2008-07-31 2:29 ` Herbert Xu
2008-07-31 2:36 ` Lennert Buytenhek
1 sibling, 1 reply; 35+ messages in thread
From: Herbert Xu @ 2008-07-31 2:29 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: davem, netdev, akarkare, nico
Lennert Buytenhek <buytenh@wantstofly.org> wrote:
>
> Index: linux-2.6.27-rc1/net/ipv4/tcp_output.c
> ===================================================================
> --- linux-2.6.27-rc1.orig/net/ipv4/tcp_output.c
> +++ linux-2.6.27-rc1/net/ipv4/tcp_output.c
> @@ -1544,7 +1544,7 @@ static int tcp_write_xmit(struct sock *s
> break;
>
> if (tso_segs == 1) {
> - if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
> + if (unlikely(!tcp_nagle_test(tp, skb, 5 * mss_now,
> (tcp_skb_is_last(sk, skb) ?
> nonagle : TCP_NAGLE_PUSH))))
> break;
What's the size of your application's write calls?
But yeah, we should at least offer something like this patch (with
5 * mss_now replaced by the TSO aggregation limit) as an option in
addition to nagle itself for those apps that do small writes.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 2:29 ` Herbert Xu
@ 2008-07-31 2:36 ` Lennert Buytenhek
2008-07-31 3:03 ` Herbert Xu
0 siblings, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 2:36 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, netdev, akarkare, nico
On Thu, Jul 31, 2008 at 10:29:24AM +0800, Herbert Xu wrote:
> > Index: linux-2.6.27-rc1/net/ipv4/tcp_output.c
> > ===================================================================
> > --- linux-2.6.27-rc1.orig/net/ipv4/tcp_output.c
> > +++ linux-2.6.27-rc1/net/ipv4/tcp_output.c
> > @@ -1544,7 +1544,7 @@ static int tcp_write_xmit(struct sock *s
> > break;
> >
> > if (tso_segs == 1) {
> > - if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
> > + if (unlikely(!tcp_nagle_test(tp, skb, 5 * mss_now,
> > (tcp_skb_is_last(sk, skb) ?
> > nonagle : TCP_NAGLE_PUSH))))
> > break;
>
> What's the size of your application's write calls?
The maximum that splice will allow -- PIPE_BUFFERS * PAGE_SIZE.
I.e. the writes should be large enough.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 2:36 ` Lennert Buytenhek
@ 2008-07-31 3:03 ` Herbert Xu
2008-07-31 6:55 ` Ilpo Järvinen
2008-07-31 10:14 ` Lennert Buytenhek
0 siblings, 2 replies; 35+ messages in thread
From: Herbert Xu @ 2008-07-31 3:03 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: davem, netdev, akarkare, nico
On Thu, Jul 31, 2008 at 04:36:52AM +0200, Lennert Buytenhek wrote:
>
> The maximum that splice will allow -- PIPE_BUFFERS * PAGE_SIZE.
>
> I.e. the writes should be large enough.
Have you looked at the packet dump? Maybe it was being constrained
by the receiver/congestion windows.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 3:03 ` Herbert Xu
@ 2008-07-31 6:55 ` Ilpo Järvinen
2008-07-31 9:39 ` Lennert Buytenhek
2008-07-31 10:14 ` Lennert Buytenhek
1 sibling, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-07-31 6:55 UTC (permalink / raw)
To: Herbert Xu; +Cc: Lennert Buytenhek, David Miller, Netdev, akarkare, nico
On Thu, 31 Jul 2008, Herbert Xu wrote:
> On Thu, Jul 31, 2008 at 04:36:52AM +0200, Lennert Buytenhek wrote:
> >
> > The maximum that splice will allow -- PIPE_BUFFERS * PAGE_SIZE.
> >
> > I.e. the writes should be large enough.
>
> Have you looked at the packet dump?
...That would be helpful indeed.
> Maybe it was being constrained by the receiver/congestion windows.
The thing that came to my mind is that the link might still be fast
enough to make the transfer application-limited. Can you try setting
/proc/sys/net/ipv4/tcp_slow_start_after_idle to zero and retesting?
--
i.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 6:55 ` Ilpo Järvinen
@ 2008-07-31 9:39 ` Lennert Buytenhek
0 siblings, 0 replies; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 9:39 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Herbert Xu, David Miller, Netdev, akarkare, nico
On Thu, Jul 31, 2008 at 09:55:57AM +0300, Ilpo Järvinen wrote:
> > > The maximum that splice will allow -- PIPE_BUFFERS * PAGE_SIZE.
> > >
> > > I.e. the writes should be large enough.
> >
> > Have you looked at the packet dump?
>
> ...That would be helpful indeed.
>
> > Maybe it was being constrained by the receiver/congestion windows.
>
> The thing that came into my mind is that the link might still be fast
> enough to make the transfer application limited, can you try setting
> /proc/sys/net/ipv4/tcp_slow_start_after_idle to zero and retest.
From a couple of quick tests, that doesn't seem to change anything
as far as transfer speed or CPU usage are concerned.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 3:03 ` Herbert Xu
2008-07-31 6:55 ` Ilpo Järvinen
@ 2008-07-31 10:14 ` Lennert Buytenhek
2008-07-31 10:16 ` David Miller
1 sibling, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 10:14 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, netdev, akarkare, nico
On Thu, Jul 31, 2008 at 11:03:39AM +0800, Herbert Xu wrote:
> > The maximum that splice will allow -- PIPE_BUFFERS * PAGE_SIZE.
> >
> > I.e. the writes should be large enough.
>
> Have you looked at the packet dump? Maybe it was being constrained
> by the receiver/congestion windows.
The receiver seems to advertise a large enough window (700-ish KiB,
while the RTT is ~ 0.1 ms). As to the congestion window, I had the
idea that it's not increasing beyond ~2-3 because the RTT is so low that
it doesn't need much data to fill the pipe, but I'm not a TCP expert.
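That intuition checks out on the back of an envelope: the bandwidth-delay product at these numbers is tiny. A rough calculation (1448-byte MSS assumed; the helper is illustrative):

```python
def bdp(link_bits_per_sec, rtt_sec, mss=1448):
    """Bandwidth-delay product: bytes that must be in flight to keep
    the pipe full, and the equivalent number of MSS-sized segments."""
    bdp_bytes = link_bits_per_sec / 8 * rtt_sec
    return bdp_bytes, -(-bdp_bytes // mss)  # ceil division

nbytes, nsegs = bdp(1_000_000_000, 0.0001)
# ~12.5 KB, i.e. only a handful of segments: on a 1 Gb/s link with a
# 0.1 ms RTT, the congestion window never needs to grow very large.
print(f"{nbytes:.0f} bytes, {nsegs:.0f} segments")
```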
I've put a tcpdump taken on the receiving end (192.168.42.10) here:
http://www.wantstofly.org/~buytenh/dump.bz2
I'll be happy to try other things. :-)
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 10:14 ` Lennert Buytenhek
@ 2008-07-31 10:16 ` David Miller
2008-07-31 12:25 ` Lennert Buytenhek
0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2008-07-31 10:16 UTC (permalink / raw)
To: buytenh; +Cc: herbert, netdev, akarkare, nico
From: Lennert Buytenhek <buytenh@wantstofly.org>
Date: Thu, 31 Jul 2008 12:14:25 +0200
> As to the congestion window, I had the idea that it's not increasing
> beyond ~2-3 because the RTT is so low that it doesn't need much data
> to fill the pipe, but I'm not a TCP expert.
Local network 10GB needs a pretty decent congestion window.
Well, it needs to be at least as big as the largest amount of
non-retransmitted data in-flight, and you've stated here that
the receiver has grown its window to 700K.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 10:16 ` David Miller
@ 2008-07-31 12:25 ` Lennert Buytenhek
2008-07-31 12:35 ` David Miller
0 siblings, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 12:25 UTC (permalink / raw)
To: David Miller; +Cc: herbert, netdev, akarkare, nico, dale
On Thu, Jul 31, 2008 at 03:16:54AM -0700, David Miller wrote:
> > As to the congestion window, I had the idea that it's not increasing
> > beyond ~2-3 because the RTT is so low that it doesn't need much data
> > to fill the pipe, but I'm not a TCP expert.
>
> Local network 10GB needs a pretty decent congestion window.
>
> Well, it needs to be at least as big as the largest amount of
> non-retransmitted data in-flight, and you've stated here that
> the receiver has grown its window to 700K.
As Herbert Xu suspected, these tests were bandwidth-limited by the receiver
(a 2.4 GHz Core 2 Quad, but with a non-PCIe NIC), at ~70 MiB/s. D'oh.
If I put a (x1) PCIe NIC in the receiver and do not change the sender
(which is still the same 1.2 GHz ARM box with the puny 16 bit memory
bus), I get ~95 MiB/s.
At this point things seem to be CPU limited at the sender again. E.g.
by simply dropping IRQF_SAMPLE_RANDOM from mv643xx_eth.c (the driver
used on the sender), throughput jumps to ~108 MiB/s, and I get:
real 0m9.531s sys 0m9.350s
real 0m9.603s sys 0m9.460s
real 0m9.566s sys 0m9.380s
real 0m9.587s sys 0m9.370s
real 0m9.552s sys 0m9.350s
real 0m9.525s sys 0m9.330s
Putting the 5 * mss_now nagle hack back in doesn't seem to change
the gso_size distribution anymore at this point, and it doesn't
change the numbers much:
real 0m9.565s sys 0m9.340s
real 0m9.555s sys 0m9.400s
real 0m9.594s sys 0m9.430s
real 0m9.503s sys 0m9.320s
real 0m9.563s sys 0m9.420s
real 0m9.539s sys 0m9.310s
The throughput with software GSO off again seems to be about ~93 MiB/s:
real 0m11.327s sys 0m11.020s
real 0m11.160s sys 0m11.000s
real 0m11.517s sys 0m11.400s
real 0m11.116s sys 0m10.970s
real 0m11.513s sys 0m11.400s
real 0m11.151s sys 0m11.050s
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 12:25 ` Lennert Buytenhek
@ 2008-07-31 12:35 ` David Miller
2008-07-31 13:19 ` Ben Hutchings
0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2008-07-31 12:35 UTC (permalink / raw)
To: buytenh; +Cc: herbert, netdev, akarkare, nico, dale
From: Lennert Buytenhek <buytenh@wantstofly.org>
Date: Thu, 31 Jul 2008 14:25:41 +0200
> At this point things seem to be CPU limited at the sender again. E.g.
> by simply dropping IRQF_SAMPLE_RANDOM from mv643xx_eth.c (the driver
> used on the sender), throughput jumps to ~108 MiB/s, and I get:
...
> Putting the 5 * mss_now nagle hack back in doesn't seem to change
> the gso_size distribution anymore at this point, and it doesn't
> change the numbers much:
...
> The throughput with software GSO off again seems to be about ~93 MiB/s:
So I would conclude that at the moment we should just do the software
GSO enabling thing (with the recent suggestions made by Herbert) and
for the time being the nagle hack isn't something to consider closely.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 12:35 ` David Miller
@ 2008-07-31 13:19 ` Ben Hutchings
2008-07-31 13:27 ` Herbert Xu
0 siblings, 1 reply; 35+ messages in thread
From: Ben Hutchings @ 2008-07-31 13:19 UTC (permalink / raw)
To: David Miller; +Cc: buytenh, herbert, netdev, akarkare, nico, dale
David Miller wrote:
> From: Lennert Buytenhek <buytenh@wantstofly.org>
> Date: Thu, 31 Jul 2008 14:25:41 +0200
>
> > At this point things seem to be CPU limited at the sender again. E.g.
> > by simply dropping IRQF_SAMPLE_RANDOM from mv643xx_eth.c (the driver
> > used on the sender), throughput jumps to ~108 MiB/s, and I get:
> ...
> > Putting the 5 * mss_now nagle hack back in doesn't seem to change
> > the gso_size distribution anymore at this point, and it doesn't
> > change the numbers much:
> ...
> > The throughput with software GSO off again seems to be about ~93 MiB/s:
>
> So I would conclude that at the moment we should just do the software
> GSO enabling thing (with the recent suggestions made by Herbert) and
> for the time being the nagle hack isn't something to consider closely.
You might want to think about providing a way for soft-GSO to generate
more lightweight structures than skbs. The overhead for skb allocation
becomes quite significant beyond 1 Gbit/s, which is why we added the soft-
TSO implementation in sfc using per-interface pools of header buffers. I
would guess niu would benefit from this sort of approach, though it looks
like all the other 10G NICs do TSO in hardware/firmware.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 13:19 ` Ben Hutchings
@ 2008-07-31 13:27 ` Herbert Xu
2008-08-03 8:19 ` David Miller
0 siblings, 1 reply; 35+ messages in thread
From: Herbert Xu @ 2008-07-31 13:27 UTC (permalink / raw)
To: Ben Hutchings; +Cc: David Miller, buytenh, netdev, akarkare, nico, dale
On Thu, Jul 31, 2008 at 02:19:30PM +0100, Ben Hutchings wrote:
>
> You might want to think about providing a way for soft-GSO to generate
> more lightweight structures than skbs. The overhead for skb allocation
> becomes quite significant beyond 1 Gbit/s, which is why we added the soft-
> TSO implementation in sfc using per-interface pools of header buffers. I
> would guess niu would benefit from this sort of approach, though it looks
> like all the other 10G NICs do TSO in hardware/firmware.
We could always provide a library that makes it easier for the
drivers to do TSO in software without allocating skb's.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 13:27 ` Herbert Xu
@ 2008-08-03 8:19 ` David Miller
2008-08-03 8:55 ` Herbert Xu
0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2008-08-03 8:19 UTC (permalink / raw)
To: herbert; +Cc: bhutchings, buytenh, netdev, akarkare, nico, dale
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 31 Jul 2008 21:27:35 +0800
> On Thu, Jul 31, 2008 at 02:19:30PM +0100, Ben Hutchings wrote:
> >
> > You might want to think about providing a way for soft-GSO to generate
> > more lightweight structures than skbs. The overhead for skb allocation
> > becomes quite significant beyond 1 Gbit/s, which is why we added the soft-
> > TSO implementation in sfc using per-interface pools of header buffers. I
> > would guess niu would benefit from this sort of approach, though it looks
> > like all the other 10G NICs do TSO in hardware/firmware.
>
> We could always provide a library that makes it easier for the
> drivers to do TSO in software without allocating skb's.
I took a brief look into this, and yes, NIU would benefit a lot from
what the sfc driver is doing, and from using sw GSO in general.
I think that, in order to work out, the driver has to provide a pool
of DMA buffers to use in some generic fashion.
It seems likely that it's best to give the driver the largest amount
of flexibility wrt. the DMA bits. There are two reasonable ways for
them to implement a header buffer pool:
1) A big coherent DMA block that gets chopped up into fixed size pieces.
2) A free list of kmalloc() buffers that get DMA mapped dynamically
(because such dynamic DMA mappings transfer faster than coherent ones
on some systems).
But anyways, we don't want to be in the business of enforcing one way
or the other in whatever interface we come up with.
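Option 1 could look something like the following. This is a userspace Python model of the allocation discipline only; in a real driver the block would be a coherent DMA allocation and the returned offsets would be DMA addresses — the class and its names are illustrative:

```python
class HeaderPool:
    """One big block chopped into fixed-size header chunks, handed out
    from a free list of chunk indices."""
    def __init__(self, chunk_size, nchunks):
        self.chunk_size = chunk_size
        self.free = list(range(nchunks))

    def alloc(self):
        """Return an offset into the block, or None when exhausted
        (at which point a real driver would stop the TX queue)."""
        if not self.free:
            return None
        return self.free.pop() * self.chunk_size

    def release(self, offset):
        self.free.append(offset // self.chunk_size)

pool = HeaderPool(chunk_size=128, nchunks=4)
offsets = [pool.alloc() for _ in range(4)]
assert pool.alloc() is None          # pool exhausted
pool.release(offsets[0])
assert pool.alloc() == offsets[0]    # chunk reusable after release
```

Option 2 would replace the free list of offsets with a free list of kmalloc'd buffers that get DMA-mapped on the fly; the bookkeeping shape is the same.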
So likely what we'll do is have the driver say it can do hw TSO and
then at ->hard_start_xmit() time it calls into the sw GSO engine,
passing header buffers in along the way.
I would start hacking on this beast but I haven't yet come up with
a clean way to share a lot of code with the existing sw GSO engine.
That's the key to implementing this properly.
* Re: using software TSO on non-TSO capable netdevices
2008-08-03 8:19 ` David Miller
@ 2008-08-03 8:55 ` Herbert Xu
2008-08-07 6:07 ` David Miller
0 siblings, 1 reply; 35+ messages in thread
From: Herbert Xu @ 2008-08-03 8:55 UTC (permalink / raw)
To: David Miller; +Cc: bhutchings, buytenh, netdev, akarkare, nico, dale
On Sun, Aug 03, 2008 at 01:19:45AM -0700, David Miller wrote:
>
> I would start hacking on this beast but I haven't yet come up with
> a clean way to share a lot of code with the existing sw GSO engine.
> That's the key to implementing this properly.
I think it's doable. We could refactor the software GSO so that
it spits out one fragment at a time and the output could either
be written to some memory provided by the caller or fed through
a callback.
BTW, longer term we should start thinking about breaking the 64K
barrier.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: using software TSO on non-TSO capable netdevices
2008-08-03 8:55 ` Herbert Xu
@ 2008-08-07 6:07 ` David Miller
2008-08-07 6:15 ` Herbert Xu
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: David Miller @ 2008-08-07 6:07 UTC (permalink / raw)
To: herbert; +Cc: bhutchings, buytenh, netdev, akarkare, nico, dale
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 3 Aug 2008 16:55:53 +0800
> On Sun, Aug 03, 2008 at 01:19:45AM -0700, David Miller wrote:
> >
> > I would start hacking on this beast but I haven't yet come up with
> > a clean way to share a lot of code with the existing sw GSO engine.
> > That's the key to implementing this properly.
>
> I think it's doable. We could refactor the software GSO so that
> it spits out one fragment at a time and the output could either
> be written to some memory provided by the caller or fed through
> a callback.
>
> BTW, longer term we should start thinking about breaking the 64K
> barrier.
So I had this idea. My goal is to minimize the number of DMA
mappings the driver has to make.
We don't touch anything in the original TSO skb. However we expand
the headroom (if necessary) and in the area in front of skb->data we
build the header areas for the sub-TSO frames, one by one.
We give the driver some iterator functions that walk through the
header areas and compute offset/length pairs into the
skb_shared_info() page list.
So basically the number of DMA mappings to make would be identical
to the number necessary for TSO capable hardware. And at the
top level we can arrange it such that the headroom will be large
enough already in the cases that matter.
The only fly in the ointment is that the driver has to store these
DMA mapping cookies away somewhere, because what's going to happen
is the driver will directly DMA map the skb_shared_info() area pages
but then slice and adjust DMA addresses as it unpacks the TSO frame
into the TX ring.
This might be where we get pushed over the edge and have to add a
dma_addr_t to sk_buff and skb_frag_struct. And that might not
be such a bad thing because it will allow other things that
we've always wanted to do.
Another nice aspect of this idea is that we can make the existing GSO
code just build this funny "TSO plus hidden headers" SKB, and then do
the by-hand unpacking into new SKB chunks that we will let smart
drivers do directly into their TX rings.
Herbert what do you think?
* Re: using software TSO on non-TSO capable netdevices
2008-08-07 6:07 ` David Miller
@ 2008-08-07 6:15 ` Herbert Xu
2008-09-12 4:08 ` David Miller
2008-08-07 11:50 ` Lennert Buytenhek
2008-08-07 20:32 ` Rick Jones
2 siblings, 1 reply; 35+ messages in thread
From: Herbert Xu @ 2008-08-07 6:15 UTC (permalink / raw)
To: David Miller; +Cc: bhutchings, buytenh, netdev, akarkare, nico, dale
On Wed, Aug 06, 2008 at 11:07:41PM -0700, David Miller wrote:
>
> We don't touch anything in the original TSO skb. However we expand
> the headroom (if necessary) and in the area in front of skb->data we
> build the header areas for the sub-TSO frames, one by one.
Or we could just allocate them beforehand; either way, it's one
operation per superpacket, so it's cheap.
> This might be where we get pushed over the edge and have to add a
> dma_addr_t to sk_buff and skb_frag_struct. And that might not
> be such a bad thing because it will allow other things that
> we've always wanted to do.
Since the skb_frag struct is in the shared area where we have
to pad up to a power-of-two for kmalloc we usually have plenty
of free space anyway.
> Another nice aspect of this idea is that we can make the existing GSO
> code just build this funny "TSO plus hidden headers" SKB, and then do
> the by-hand unpacking into new SKB chunks that we will let smart
> drivers do directly into their TX rings.
>
> Herbert what do you think?
Yes this idea sounds perfect :)
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: using software TSO on non-TSO capable netdevices
2008-08-07 6:15 ` Herbert Xu
@ 2008-09-12 4:08 ` David Miller
0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2008-09-12 4:08 UTC (permalink / raw)
To: herbert; +Cc: bhutchings, buytenh, netdev, akarkare, nico, dale
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 7 Aug 2008 14:15:35 +0800
> On Wed, Aug 06, 2008 at 11:07:41PM -0700, David Miller wrote:
> >
> > We don't touch anything in the original TSO skb. However we expand
> > the headroom (if necessary) and in the area in front of skb->data we
> > build the header areas for the sub-TSO frames, one by one.
>
> Or we could just allocate them beforehand; either way, it's one
> operation per superpacket, so it's cheap.
...
> > Another nice aspect of this idea is that we can make the existing GSO
> > code just build this funny "TSO plus hidden headers" SKB, and then do
> > the by-hand unpacking into new SKB chunks that we will let smart
> > drivers do directly into their TX rings.
> >
> > Herbert what do you think?
>
> Yes this idea sounds perfect :)
So I started studying how to do this. I'm still not exactly sure what I
want the driver usage to look like. But I did draw an ASCII diagram :)
We'll call them QGSO or "Quick GSO" frames.
TSO frame
---------
skb->data -> link level header
IP header
TCP header
all data (always paged)
QGSO frame
----------
------------------\
link level header |
IP header |--- * ->gso_segs
TCP header |
------------------/
skb->data -> orig link level header
orig IP Header
orig TCP header
all data (always paged)
So when the driver gets this QGSO thing, it uses skb_shinfo(skb)->gso_segs
to figure out how far back in front of skb->data to start reading header
sections.
It could use two pointers, one to the header array and one to the data.
And then it would advance each as it fills in the TX descriptors.
However, every time I try to codify the driver part, it's super clumsy. :)
But, anyways, one thing we can let the driver do is, assuming "header_size"
is the computed size of each entry in the header array:
__skb_push(skb, header_size * skb_shinfo(skb)->gso_segs);
skb_dma_map(priv->dev, skb, DMA_TO_DEVICE);
which makes things a little simpler.
The driver starts its header DMA address at skb_shinfo(skb)->dma_maps[0],
and this is linear, so it can just advance using simple increments.
The data DMA address starts at ->dma_maps[1] and the length is obtained
from the skb_frag_t. Once the length is exhausted, we pick up the
next ->dma_maps[] array entry, and fetch the next skb_frag_t's length.
This process continues as the descriptors are filled in.
Then later skb_dma_unmap() is called.
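The two-pointer walk could look roughly like this. It is a userspace Python model of the descriptor-filling loop; frag_maps stands in for the per-frag DMA mapping cookies plus their skb_frag_t lengths, and everything here is illustrative rather than a real driver API:

```python
def fill_tx_descriptors(hdr_base, header_size, frag_maps, mss):
    """Emit (kind, dma_addr, length) descriptors for one QGSO frame:
    one header per segment from the linear header array, with payload
    sliced out of the frag DMA mappings, advancing both pointers."""
    total = sum(length for _, length in frag_maps)
    gso_segs = -(-total // mss)  # ceil division
    descs = []
    hdr = hdr_base
    frags = iter(frag_maps)
    addr, left = next(frags)
    for _ in range(gso_segs):
        descs.append(("hdr", hdr, header_size))
        hdr += header_size              # next entry in the header array
        payload = min(mss, total)
        total -= payload
        while payload:
            take = min(payload, left)
            descs.append(("data", addr, take))
            addr += take
            left -= take
            payload -= take
            if left == 0:               # pick up the next dma_maps[] entry
                addr, left = next(frags, (None, 0))
    return descs

# 3000 bytes of paged data, 1000-byte MSS: three header/data pairs.
descs = fill_tx_descriptors(0x100, 64, [(0x1000, 3000)], 1000)
```

Note how the data address is sliced and advanced within a single mapping, which is exactly the "slice and adjust DMA addresses as it unpacks the TSO frame" step described above.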
I suspect that this __skb_push() thing is legal, because we have a unique
reference to a clone or similar.
I don't know, a lot of details to work out, which is why I'm writing this
8-)
* Re: using software TSO on non-TSO capable netdevices
2008-08-07 6:07 ` David Miller
2008-08-07 6:15 ` Herbert Xu
@ 2008-08-07 11:50 ` Lennert Buytenhek
2008-08-07 20:32 ` Rick Jones
2 siblings, 0 replies; 35+ messages in thread
From: Lennert Buytenhek @ 2008-08-07 11:50 UTC (permalink / raw)
To: David Miller; +Cc: herbert, bhutchings, netdev, akarkare, nico, dale
On Wed, Aug 06, 2008 at 11:07:41PM -0700, David Miller wrote:
> > > I would start hacking on this beast but I haven't yet come up with
> > > a clean way to share a lot of code with the existing sw GSO engine.
> > > That's the key to implementing this properly.
> >
> > I think it's doable. We could refactor the software GSO so that
> > it spits out one fragment at a time and the output could either
> > be written to some memory provided by the caller or fed through
> > a callback.
> >
> > BTW, longer term we should start thinking about breaking the 64K
> > barrier.
>
> So I had this idea. My goal is to minimize the number of DMA
> mappings the driver has to make.
FWIW, this wouldn't make much of a difference in my case..
> Another nice aspect of this idea is that we can make the existing GSO
> code just build this funny "TSO plus hidden headers" SKB, and then do
> the by-hand unpacking into new SKB chunks that we will let smart
> drivers do directly into their TX rings.
..but I'm pretty sure that this would.
* Re: using software TSO on non-TSO capable netdevices
2008-08-07 6:07 ` David Miller
2008-08-07 6:15 ` Herbert Xu
2008-08-07 11:50 ` Lennert Buytenhek
@ 2008-08-07 20:32 ` Rick Jones
2008-08-07 22:44 ` David Miller
2 siblings, 1 reply; 35+ messages in thread
From: Rick Jones @ 2008-08-07 20:32 UTC (permalink / raw)
To: David Miller; +Cc: herbert, bhutchings, buytenh, netdev, akarkare, nico, dale
David Miller wrote:
> So I had this idea. My goal is to minimize the number of DMA
> mappings the driver has to make.
>
> We don't touch anything in the original TSO skb. However we expand
> the headroom (if necessary) and in the area in front of skb->data we
> build the header areas for the sub-TSO frames, one by one.
>
> We give the driver some iterator functions that walk through the
> header areas and compute offset/length pairs into the
> skb_shared_info() page list.
Is that like Solaris Multi Data Transmit?
rick jones
* Re: using software TSO on non-TSO capable netdevices
2008-08-07 20:32 ` Rick Jones
@ 2008-08-07 22:44 ` David Miller
0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2008-08-07 22:44 UTC (permalink / raw)
To: rick.jones2; +Cc: herbert, bhutchings, buytenh, netdev, akarkare, nico, dale
From: Rick Jones <rick.jones2@hp.com>
Date: Thu, 07 Aug 2008 13:32:05 -0700
> David Miller wrote:
> > So I had this idea. My goal is to minimize the number of DMA
> > mappings the driver has to make.
> >
> > We don't touch anything in the original TSO skb. However we expand
> > the headroom (if necessary) and in the area in front of skb->data we
> > build the header areas for the sub-TSO frames, one by one.
> >
> > We give the driver some iterator functions that walk through the
> > header areas and compute offset/length pairs into the
> > skb_shared_info() page list.
>
> Is that like Solaris Multi Data Transmit?
No, it's slightly different.
Solaris just accumulates a list of packets and gives them all to the
device at once. It doesn't do anything interesting to optimize
the DMA mappings or anything clever like we'll be doing here.
Here, the TCP stack will be working with TSO frames, which cuts down
per-packet overhead and whatnot. Solaris works with just normal MSS
sized frames when it does its batching thing. And that's all it is,
batching.
* Re: using software TSO on non-TSO capable netdevices
2008-07-30 23:50 using software TSO on non-TSO capable netdevices Lennert Buytenhek
2008-07-30 23:56 ` David Miller
@ 2008-07-31 17:00 ` Rick Jones
2008-07-31 17:45 ` Lennert Buytenhek
1 sibling, 1 reply; 35+ messages in thread
From: Rick Jones @ 2008-07-31 17:00 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: netdev, Ashish Karkare, Nicolas Pitre
Lennert Buytenhek wrote:
> When using sendfile() to send a GiB worth of zeroes over a single TCP
> connection to another host on a 100 Mb/s network, with a vanilla
> 2.6.27-rc1 kernel, this runs as expected at wire speed, taking the
> following amount of CPU time per test:
>
> sys 0m5.410s
> sys 0m5.380s
> sys 0m5.620s
> sys 0m5.360s
That's output from "time" run against your test program, right? Are
folks confident that will account for all the CPU time consumed on
behalf of that program? FWIW, netperf is unwilling to make such an
assumption. Perhaps netperf and I are being too paranoid, but there
you have it :)
rick jones
fwiw, there is a TCP_SENDFILE test in netperf
* Re: using software TSO on non-TSO capable netdevices
2008-07-31 17:00 ` Rick Jones
@ 2008-07-31 17:45 ` Lennert Buytenhek
0 siblings, 0 replies; 35+ messages in thread
From: Lennert Buytenhek @ 2008-07-31 17:45 UTC (permalink / raw)
To: Rick Jones; +Cc: netdev, Ashish Karkare, Nicolas Pitre
On Thu, Jul 31, 2008 at 10:00:32AM -0700, Rick Jones wrote:
> >When using sendfile() to send a GiB worth of zeroes over a single TCP
> >connection to another host on a 100 Mb/s network, with a vanilla
> >2.6.27-rc1 kernel, this runs as expected at wire speed, taking the
> >following amount of CPU time per test:
> >
> > sys 0m5.410s
> > sys 0m5.380s
> > sys 0m5.620s
> > sys 0m5.360s
>
> That's output from "time" run against your test program, right? Are
> folks confident that will account for all the CPU time consumed on
> behalf of that program?
No, but I wasn't drawing absolute conclusions, just comparing runs
without GSO with runs with GSO. :-)
Thread overview: 35+ messages
2008-07-30 23:50 using software TSO on non-TSO capable netdevices Lennert Buytenhek
2008-07-30 23:56 ` David Miller
2008-07-31 0:41 ` Lennert Buytenhek
2008-07-31 1:10 ` David Miller
2008-07-31 1:45 ` Lennert Buytenhek
2008-07-31 3:54 ` Herbert Xu
2008-07-31 9:45 ` Lennert Buytenhek
2008-07-31 10:55 ` Herbert Xu
2008-07-31 12:37 ` Lennert Buytenhek
2008-07-31 12:59 ` Herbert Xu
2008-08-03 8:23 ` David Miller
2008-07-31 7:34 ` Ilpo Järvinen
2008-07-31 9:50 ` Lennert Buytenhek
2008-07-31 10:27 ` Ilpo Järvinen
2008-07-31 2:29 ` Herbert Xu
2008-07-31 2:36 ` Lennert Buytenhek
2008-07-31 3:03 ` Herbert Xu
2008-07-31 6:55 ` Ilpo Järvinen
2008-07-31 9:39 ` Lennert Buytenhek
2008-07-31 10:14 ` Lennert Buytenhek
2008-07-31 10:16 ` David Miller
2008-07-31 12:25 ` Lennert Buytenhek
2008-07-31 12:35 ` David Miller
2008-07-31 13:19 ` Ben Hutchings
2008-07-31 13:27 ` Herbert Xu
2008-08-03 8:19 ` David Miller
2008-08-03 8:55 ` Herbert Xu
2008-08-07 6:07 ` David Miller
2008-08-07 6:15 ` Herbert Xu
2008-09-12 4:08 ` David Miller
2008-08-07 11:50 ` Lennert Buytenhek
2008-08-07 20:32 ` Rick Jones
2008-08-07 22:44 ` David Miller
2008-07-31 17:00 ` Rick Jones
2008-07-31 17:45 ` Lennert Buytenhek