public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
  2004-11-12 23:51 Radheka Godse
@ 2004-11-12  0:00 ` David S. Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David S. Miller @ 2004-11-12  0:00 UTC (permalink / raw)
  To: Radheka Godse; +Cc: bonding-devel, fubar, ctindel, linux-kernel

On Fri, 12 Nov 2004 15:51:49 -0800 (PST)
Radheka Godse <radheka.godse@intel.com> wrote:

> -
> + 
> +	/* We let the bond device publish all hardware
> +	 * acceleration features possible. This is OK,
> +	 * since if an skb is passed from the bond to
> +	 * a slave that doesn't support one of those
> +	 * features, everything is fixed in the
> +	 * dev_queue_xmit() function (e.g. calculate
> +	 * check sum, linearize the skb, etc.).
> +	 */

This is very inefficient if the bond slaves don't
support these features.

I believe you when you say you saw improvement in the
case where the slaves do support TSO, but if you test
a non-TSO slave case I bet you'll see a marked
decrease in system utilization at least.

The upper layers need to know the precise capabilities
of the device in order to optimize the copy from userspace,
the checksumming, and the data gathering into the SKB.

Therefore, if you "fake it out" like this without checking
what the slaves actually support, then a lot of wasted cpu
time will be spent in each dev_queue_xmit() path.  There will
in many cases be multiple passes over the data instead of one,
and it is possible to introduce an extra data copy as well.

I would recommend instead the following algorithm.  Publish
only the capabilities which all slaves support.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
  2004-11-12 21:31 [Bonding-devel][PATCH]Zero Copy Transmit Support (Update) Godse, Radheka
@ 2004-11-12 21:20 ` David S. Miller
  2004-11-12 21:56   ` Jay Vosburgh
  0 siblings, 1 reply; 9+ messages in thread
From: David S. Miller @ 2004-11-12 21:20 UTC (permalink / raw)
  To: Godse, Radheka; +Cc: bonding-devel, fubar, ctindel, linux-kernel

On Fri, 12 Nov 2004 13:31:53 -0800
"Godse, Radheka" <radheka.godse@intel.com> wrote:

> I had similar thoughts but then, the bond device does not have any
> slaves attached to it at load time. By publishing them upfront the bond
> device is able to take advantage of hardware acceleration if it is later
> available...

This is not a problem at all.

You can make ethtool calls on the bonding device to change
settings later.  Nothing prevents you from changing the
SG/CSUM/TSO bits after device registration.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
@ 2004-11-12 21:31 Godse, Radheka
  2004-11-12 21:20 ` David S. Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Godse, Radheka @ 2004-11-12 21:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: bonding-devel, fubar, ctindel, linux-kernel

I had similar thoughts but then, the bond device does not have any
slaves attached to it at load time. By publishing them upfront the bond
device is able to take advantage of hardware acceleration if it is later
available... I will get test data for non-TSO test scenarios to see if
there is significant degradation in performance.

-radheka 
 

-----Original Message-----
From: David S. Miller [mailto:davem@davemloft.net] 
Sent: Thursday, November 11, 2004 4:00 PM
To: Godse, Radheka
Cc: bonding-devel@lists.sourceforge.net; fubar@us.ibm.com;
ctindel@users.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

On Fri, 12 Nov 2004 15:51:49 -0800 (PST)
Radheka Godse <radheka.godse@intel.com> wrote:

> -
> + 
> +	/* We let the bond device publish all hardware
> +	 * acceleration features possible. This is OK,
> +	 * since if an skb is passed from the bond to
> +	 * a slave that doesn't support one of those
> +	 * features, everything is fixed in the
> +	 * dev_queue_xmit() function (e.g. calculate
> +	 * check sum, linearize the skb, etc.).
> +	 */

This is very inefficient if the bond slaves don't
support these features.

I believe you when you say you saw improvement in the
case where the slaves do support TSO, but if you test
a non-TSO slave case I bet you'll see a marked
decrease in system utilization at least.

The upper layers need to know the precise capabilities
of the device in order to optimize the copy from userspace,
the checksumming, and the data gathering into the SKB.

Therefore, if you "fake it out" like this without checking
what the slaves actually support, then a lot of wasted cpu
time will be spent in each dev_queue_xmit() path.  There will
in many cases be multiple passes over the data instead of one,
and it is possible to introduce an extra data copy as well.

I would recommend instead the following algorithm.  Publish
only the capabilities which all slaves support.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
  2004-11-12 21:56   ` Jay Vosburgh
@ 2004-11-12 21:49     ` David S. Miller
  2004-11-12 22:20       ` Jay Vosburgh
  0 siblings, 1 reply; 9+ messages in thread
From: David S. Miller @ 2004-11-12 21:49 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: radheka.godse, bonding-devel, ctindel, linux-kernel

On Fri, 12 Nov 2004 13:56:26 -0800
Jay Vosburgh <fubar@us.ibm.com> wrote:

> 
> David S. Miller <davem@davemloft.net> wrote:
> >> I had similar thoughts but then, the bond device does not have any
> >> slaves attached to it at load time. By publishing them upfront the bond
> >> device is able to take advantage of hardware acceleration if it is later
> >> available...
> 
> 	"Shlomi Yaakobovich" <Shlomi@exanet.com> posted a patch to
> update the features as slaves are added and removed, based on the
> features advertised by the slaves.  His original patch wasn't properly
> based; this is the same change set redone to patch against 2.6.9.

That's definitely a good start.

It does need to be fixed to enforce the usual rules about
illegal combinations.  And his code is going to include
all sorts of weird things like VLAN offload which I wonder
if works correctly with the current bonding driver? :)

The two rules are codified in register_netdevice() as follows:

	/* Fix illegal SG+CSUM combinations. */
	if ((dev->features & NETIF_F_SG) &&
	    !(dev->features & (NETIF_F_IP_CSUM |
			       NETIF_F_NO_CSUM |
			       NETIF_F_HW_CSUM))) {
		printk("%s: Dropping NETIF_F_SG since no checksum feature.\n",
		       dev->name);
		dev->features &= ~NETIF_F_SG;
	}

	/* TSO requires that SG is present as well. */
	if ((dev->features & NETIF_F_TSO) &&
	    !(dev->features & NETIF_F_SG)) {
		printk("%s: Dropping NETIF_F_TSO since no SG feature.\n",
		       dev->name);
		dev->features &= ~NETIF_F_TSO;
	}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
  2004-11-12 21:20 ` David S. Miller
@ 2004-11-12 21:56   ` Jay Vosburgh
  2004-11-12 21:49     ` David S. Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Jay Vosburgh @ 2004-11-12 21:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: Godse, Radheka, bonding-devel, ctindel, linux-kernel


David S. Miller <davem@davemloft.net> wrote:
>> I had similar thoughts but then, the bond device does not have any
>> slaves attached to it at load time. By publishing them upfront the bond
>> device is able to take advantage of hardware acceleration if it is later
>> available...

	"Shlomi Yaakobovich" <Shlomi@exanet.com> posted a patch to
update the features as slaves are added and removed, based on the
features advertised by the slaves.  His original patch wasn't properly
based; this is the same change set redone to patch against 2.6.9.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


Index: linux-2681-bd/drivers/net/bonding/bond_main.c
===================================================================
RCS file: /sda7/CVS/linux-2681-bd/drivers/net/bonding/bond_main.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -u -r1.2 -r1.3
--- linux-2681-bd/drivers/net/bonding/bond_main.c	15 Oct 2004 22:11:14 -0000	1.2
+++ linux-2681-bd/drivers/net/bonding/bond_main.c	15 Oct 2004 23:27:34 -0000	1.3
@@ -1580,6 +1580,32 @@
 	return 0;
 }
 
+enum { /* don't set these here */
+       BOND_MAINTAINED_FEATURES =
+               NETIF_F_VLAN_CHALLENGED|NETIF_F_HW_VLAN_TX |
+               NETIF_F_HW_VLAN_RX |NETIF_F_HW_VLAN_FILTER
+};
+
+static void bond_set_features(struct net_device *bond_dev)
+{
+       struct bonding *bond = bond_dev->priv;
+       struct slave *slave;
+       int i, features = 0;
+
+       bond_for_each_slave(bond, slave, i) {
+               struct net_device *slave_dev = slave->dev;
+               if (i == 0) {
+                       features = slave_dev->features;
+	       } else {
+                       features &= slave_dev->features;
+	       }
+       }
+       /* clear all previous features */
+       bond_dev->features &= BOND_MAINTAINED_FEATURES;
+       /* add common features, don't touch driver maintained ones */
+       bond_dev->features |= (features & ~BOND_MAINTAINED_FEATURES);
+}
+
 /* enslave device <slave> to bond device <master> */
 static int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1972,6 +1998,7 @@
 	       new_slave->link != BOND_LINK_DOWN ? "n up" : " down");
 
 	/* enslave is successful */
+	bond_set_features(bond_dev);
 	return 0;
 
 /* Undo stages on error */
@@ -2177,6 +2204,7 @@
 
 	kfree(slave);
 
+	bond_set_features(bond_dev);
 	return 0;  /* deletion OK */
 }
 
@@ -2292,6 +2320,7 @@
 	printk(KERN_INFO DRV_NAME
 	       ": %s: released all slaves\n",
 	       bond_dev->name);
+	bond_set_features(bond_dev);
 
 out:
 	write_unlock_bh(&bond->lock);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
  2004-11-12 21:49     ` David S. Miller
@ 2004-11-12 22:20       ` Jay Vosburgh
  2004-11-12 22:24         ` David S. Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Jay Vosburgh @ 2004-11-12 22:20 UTC (permalink / raw)
  To: David S. Miller; +Cc: radheka.godse, bonding-devel, ctindel, linux-kernel

David S. Miller <davem@davemloft.net> wrote:
>That's definitely a good start.
>
>It does need to be fixed to enforce the usual rules about
>illegal combinations.  And his code is going to include
>all sorts of weird things like VLAN offload which I wonder
>if works correctly with the current bonding driver? :)

	The existing code should handle VLANs correctly, and the patch
excludes the VLAN related bits from the dev->features update.

>The two rules are codified in register_netdevice() as follows:
[...]

	Would it be preferrable to duplicate that logic in bonding, or
push it out to an inline or some such?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
  2004-11-12 22:20       ` Jay Vosburgh
@ 2004-11-12 22:24         ` David S. Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David S. Miller @ 2004-11-12 22:24 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: radheka.godse, bonding-devel, ctindel, linux-kernel

On Fri, 12 Nov 2004 14:20:51 -0800
Jay Vosburgh <fubar@us.ibm.com> wrote:

> 	Would it be preferrable to duplicate that logic in bonding, or
> push it out to an inline or some such?

It's present also in the ethtool methods used to change
these things, so you might consider making ethtool calls
to change the bits as well.

It's a pretty bad idea for folks to be changing the ->features
bits directly in drivers after device registry.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
@ 2004-11-12 23:51 Radheka Godse
  2004-11-12  0:00 ` David S. Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Radheka Godse @ 2004-11-12 23:51 UTC (permalink / raw)
  To: bonding-devel, fubar, ctindel; +Cc: linux-kernel

Please ignore previous submittal, it has reversed tags ('-'s instead of 
'+'s; I had swapped the order for diff of modified and un-modified 
src trees)

The patch below ADDS Zero Copy Transmit Support to bonding device.

We saw around ~50% system utilization improvement with tcp sendfile() 
function enabled netperf test in 802.3ad mode.

Note that this patch was generated for 2.6.9 kernel after applying 
bond-patch-2.6.1(also attached to this thread) that was submitted last week and 
got accepted into the kernel.

Signed-off-by: Radheka Godse <radheka.godse@intel.com>

diff -uprN -X dontdiff linux-2.6.9_bond-patch-2.6.1/drivers/net/bonding/bonding.h linux-2.6.9/drivers/net/bonding/bonding.h
--- linux-2.6.9_bond-patch-2.6.1/drivers/net/bonding/bonding.h	2004-11-10 15:42:55.000000000 -0800
+++ linux-2.6.9/drivers/net/bonding/bonding.h	2004-11-11 13:05:54.000000000 -0800
@@ -36,8 +36,8 @@
  #include "bond_3ad.h"
  #include "bond_alb.h"

-#define DRV_VERSION	"2.6.1"
-#define DRV_RELDATE	"October 29, 2004"
+#define DRV_VERSION	"2.6.2"
+#define DRV_RELDATE	"November 09, 2004"
  #define DRV_NAME	"bonding"
  #define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"

diff -uprN -X dontdiff linux-2.6.9_bond-patch-2.6.1/drivers/net/bonding/bond_main.c linux-2.6.9/drivers/net/bonding/bond_main.c
--- linux-2.6.9_bond-patch-2.6.1/drivers/net/bonding/bond_main.c	2004-11-10 15:42:55.000000000 -0800
+++ linux-2.6.9/drivers/net/bonding/bond_main.c	2004-11-11 13:05:54.000000000 -0800
@@ -475,6 +475,9 @@
   *        Solution is to move call to dev_remove_pack outside of the
   *        spinlock.
   *        Set version to 2.6.1.
+ * 2004/11/09 - Radheka Godse <radheka.godse at intel dot com>
+ *      - Added Zero Copy Transmit Support by setting appropriate flags.
+ *        Set version to 2.6.2.
   *
   */

@@ -4318,7 +4321,22 @@ static int __init bond_init(struct net_d
  	bond_dev->features |= (NETIF_F_HW_VLAN_TX |
  			       NETIF_F_HW_VLAN_RX |
  			       NETIF_F_HW_VLAN_FILTER);
-
+ 
+	/* We let the bond device publish all hardware
+	 * acceleration features possible. This is OK,
+	 * since if an skb is passed from the bond to
+	 * a slave that doesn't support one of those
+	 * features, everything is fixed in the
+	 * dev_queue_xmit() function (e.g. calculate
+	 * check sum, linearize the skb, etc.).
+	 */
+	bond_dev->features |= (NETIF_F_SG      |
+			       NETIF_F_IP_CSUM |
+			       NETIF_F_NO_CSUM |
+			       NETIF_F_HW_CSUM |
+			       NETIF_F_HIGHDMA |
+			       NETIF_F_FRAGLIST);
+
  #ifdef CONFIG_PROC_FS
  	bond_create_proc_entry(bond);
  #endif

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)
@ 2004-11-18 19:02 Godse, Radheka
  0 siblings, 0 replies; 9+ messages in thread
From: Godse, Radheka @ 2004-11-18 19:02 UTC (permalink / raw)
  To: David S. Miller; +Cc: bonding-devel, fubar, ctindel, linux-kernel

> -
> + 
> +	/* We let the bond device publish all hardware
> +	 * acceleration features possible. This is OK,
> +	 * since if an skb is passed from the bond to
> +	 * a slave that doesn't support one of those
> +	 * features, everything is fixed in the
> +	 * dev_queue_xmit() function (e.g. calculate
> +	 * check sum, linearize the skb, etc.).
> +	 */

This is very inefficient if the bond slaves don't
support these features.

I believe you when you say you saw improvement in the
case where the slaves do support TSO, but if you test
a non-TSO slave case I bet you'll see a marked
decrease in system utilization at least.
...

Note: 2.6.2 patch enables Zero Copy Transmit support by default while
registering the bond device... but, I continue to see better CPU
utilization (~50% improvement) when this patch was tested in various
scenarios(see data below). Test results are from a dual Itanium 2 1.4GHz
system but I saw a similar trend on a PIII Xeon 550MHz x 8 system.
 
bond-patch-2.6.1 and 82546EB Copper NICs with hardware acceleration
support, 
Test results with TSO on and off
ETH1  ETH2  Throughput/Performance  System Utilization
on    off  1870                    35 %
on   on   1870                    35 %
off   off   1871                    34 %

bond-patch-2.6.2 and 82546EB Copper NICs with hardware acceleration
support, 
Test results with TSO on and off
ETH1  ETH2  Throughput/Performance  System Utilization
on    off  1879                    14 %
on   on   1880                    14.08 %
off   off   1879                    13 %
------------------------------------------------------------------

bond-patch-2.6.1 and 82543GC NICs without hardware support for TSO
ETH1  ETH2  Throughput/Performance  System Utilization
off   off   1815                    36.85 %
 
bond-patch-2.6.2 and 82543GC NICs without hardware support for TSO
ETH1  ETH2  Throughput/Performance  System Utilization
off   off   1848                    11.85 %
------------------------------------------------------------------

bond-patch-2.6.1, and one 82546EB NIC with & one 82543GC NIC without
hardware support for TSO 
ETH1  ETH2  Throughput/Performance  System Utilization

on    off   1817                    30.69 %

bond-patch-2.6.2, and one 82546EB NIC with & one 82543GC NIC without
hardware support for TSO 
ETH1  ETH2  Throughput/Performance  System Utilization
on    off   1836                    24.54 %
------------------------------------------------------------------

bond-patch-2.6.1 and 82543GC NICs
Test results with all HW acceleration bits on for 1 NIC and turned off
for the other NIC
ETH1  ETH2  Throughput/Performance  System Utilization
on    off  1851                    35 %

bond-patch-2.6.2 and 82543GC NICs
Test results with all HW acceleration bits on for 1 NIC and turned off
for the other NIC
ETH1  ETH2  Throughput/Performance  System Utilization
on    off  1822                    17 %
------------------------------------------------------------------

bond-patch-2.6.1 and 82543GC NICs
Test results with all HW acceleration bits turned off for both NICs
ETH1  ETH2  Throughput/Performance  System Utilization
off    off  1816                    35 %
 
bond-patch-2.6.2 and 82543GC NICs
Test results with all HW acceleration bits turned off for both NICs
ETH1  ETH2  Throughput/Performance  System Utilization
off    off  1604                    16.23 %

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-11-18 19:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-12 21:31 [Bonding-devel][PATCH]Zero Copy Transmit Support (Update) Godse, Radheka
2004-11-12 21:20 ` David S. Miller
2004-11-12 21:56   ` Jay Vosburgh
2004-11-12 21:49     ` David S. Miller
2004-11-12 22:20       ` Jay Vosburgh
2004-11-12 22:24         ` David S. Miller
  -- strict thread matches above, loose matches on Subject: below --
2004-11-12 23:51 Radheka Godse
2004-11-12  0:00 ` David S. Miller
2004-11-18 19:02 Godse, Radheka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox