Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode
       [not found] <E6F7D288B394A64585E67497E5126BA601F991D0@hasmsx403.iil.intel.com>
@ 2004-01-08 15:33 ` Amir Noam
  2004-01-08 16:43   ` Per Hedeland
  0 siblings, 1 reply; 5+ messages in thread
From: Amir Noam @ 2004-01-08 15:33 UTC (permalink / raw)
  To: Per Hedeland; +Cc: bonding-devel, netdev

On Wednesday 07 January 2004 10:58 pm, Per Hedeland wrote:
>         struct bonding *bond = bond_dev->priv; 
>         struct ethhdr *data = (struct ethhdr *)skb->data; 
>         struct slave *slave, *start_at; 
> -       int slave_no; 
> +       int slave_no = 0; 
>         int i; 
> +       __u32 u; 
>  
>         read_lock(&bond->lock);
>

Please use u32 instead of __u32.


> +static int bond_xmit_xor_mac(struct sk_buff *skb, struct net_device *bond_dev) 
> +{ 
> +       return bond_xmit_xor(skb, bond_dev, 0); 
> +} 
> + 
> +static int bond_xmit_xor_ip(struct sk_buff *skb, struct net_device *bond_dev) 
> +{ 
> +       return bond_xmit_xor(skb, bond_dev, 1); 
> +} 
> +

hmm...

I don't like this. The reason we give different tx function pointers
to dev->hard_start_xmit in different bonding mode is to make the tx
path as fast as possible. Otherwise we might as well use a single tx
function that chooses its exact operation based on the bonding mode.

It might be better to have some code duplication if it results in
faster tx, but I'm not sure what's the optimal solution in this case.

-- 
Amir

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode
  2004-01-08 15:33 ` [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode Amir Noam
@ 2004-01-08 16:43   ` Per Hedeland
  0 siblings, 0 replies; 5+ messages in thread
From: Per Hedeland @ 2004-01-08 16:43 UTC (permalink / raw)
  To: amir.noam; +Cc: bonding-devel, netdev

Amir Noam <amir.noam@intel.com> wrote:

>Please use u32 instead of __u32.

OK.

>hmm...
>
>I don't like this. The reason we give different tx function pointers
>to dev->hard_start_xmit in different bonding mode is to make the tx
>path as fast as possible. Otherwise we might as well use a single tx
>function that chooses its exact operation based on the bonding mode.
>
>It might be better to have some code duplication if it results in
>faster tx, but I'm not sure what's the optimal solution in this case.

Well, I don't really have an opinion since I don't have a good idea
about the cost of a function call relative to "everything else" that is
happening here. I don't see a way to do "limited" duplication without
using function calls though, but I'm quite happy to make it two entirely
separate functions for MAC vs IP. Please advise.

--Per Hedeland
per@hedeland.org

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode
       [not found] <E6F7D288B394A64585E67497E5126BA601F991D2@hasmsx403.iil.intel.com>
@ 2004-01-11 14:50 ` Amir Noam
  2004-01-11 21:45   ` Per Hedeland
  0 siblings, 1 reply; 5+ messages in thread
From: Amir Noam @ 2004-01-11 14:50 UTC (permalink / raw)
  To: Per Hedeland; +Cc: bonding-devel, netdev

On Thursday 08 January 2004 06:43 pm, Per Hedeland wrote:
> Amir Noam <amir.noam@intel.com> wrote:
> >I don't like this. The reason we give different tx function
> > pointers to dev->hard_start_xmit in different bonding mode is to
> > make the tx path as fast as possible. Otherwise we might as well
> > use a single tx function that chooses its exact operation based
> > on the bonding mode.
> >
> >It might be better to have some code duplication if it results in
> >faster tx, but I'm not sure what's the optimal solution in this
> > case.
>
> Well, I don't really have an opinion since I don't have a good idea
> about the cost of a function call relative to "everything else"
> that is happening here. I don't see a way to do "limited"
> duplication without using function calls though, but I'm quite
> happy to make it two entirely separate functions for MAC vs IP.
> Please advise.

A possible way to have "limited" duplication would be to have two 
seperate xmit functions, that call an inline function for the shared 
code. This might be good enough, performance-wise, while avoiding 
some code duplication.

But, like I've said, I'm not sure wout the best solution is. I'd like 
to hear what Jay (and others) thinks about it.

-- 
Amir

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode
  2004-01-11 14:50 ` [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode Amir Noam
@ 2004-01-11 21:45   ` Per Hedeland
  2004-01-24 12:33     ` [PATCH] [bonding 2.4] Fixes for balance-xor and balance-rr Per Hedeland
  0 siblings, 1 reply; 5+ messages in thread
From: Per Hedeland @ 2004-01-11 21:45 UTC (permalink / raw)
  To: amir.noam; +Cc: bonding-devel, netdev

Amir Noam <amir.noam@intel.com> wrote:
>
>A possible way to have "limited" duplication would be to have two 
>seperate xmit functions, that call an inline function for the shared 
>code. This might be good enough, performance-wise, while avoiding 
>some code duplication.

Yes, I actually thought of that (after the previous post:-) - it would
at least avoid some *source* code duplication. And the reasonable thing
to break out into such a function would be the logic to find the slave
to xmit on given slave_no (perhaps also the actual xmit), I guess.

However when looking at this I realized that the xor mode has a pretty
serious bug/flaw in the fault-tolerance department: The modulus is done
with the total number of slaves, and slaves that are down are only
skipped in that find-slave logic. I.e. if you (e.g.) have a bond with 3
links where link 1 is down, link 2 will get 2/3 of the traffic and link
3 get 1/3 - not good I think.

Since fixing this would likely change the interface to the above
function, I guess it would be nice to do it at the same time as (or
before) the balance-xor-ip addition. Assuming I'm not the only one that
thinks it needs fixing, that is.:-)

The superficially obvious fix is to do the modulus with only the number
of slaves that are up, but I'm not sure about the best actual
implementation. That number isn't currently maintained, and traversing
the list of slaves just to find it for every packet doesn't make sense
of course. It seems straightforward to add another field to the bonding
struct for it, and maintain that in the monitoring functions, though.

But in addition to that, the actual slave selection logic would have to
always check the link state of each slave until it has found the
slave_no'th one that is up (though one could optimize for the case of
all slaves being up) - whereas currently it only starts checking at the
slave_no'th one, and if all slaves are up there is only one check. I'm
not sure if this is acceptable?

The only alternative I see is to also maintain a list of the slaves that
are up, but then we're perhaps getting too much complexity for these
"simple" modes. Any comments, other alternatives?

>But, like I've said, I'm not sure wout the best solution is. I'd like 
>to hear what Jay (and others) thinks about it.

Well, Jay said he would be out of email access for three weeks -
starting tomorrow and promising he would look over updated patches
first, but maybe he didn't have the time - or, given that you
objected:-), he (correctly) thought that a second update wouldn't arrive
in time.

I'm in no particular hurry to get this into the "official" source
though, since I'll be using an older kernel that I've already applied
the (original) patch to for a while anyway - it's just that it gets old
pretty quickly to keep reworking the patches against moving (and not
very easily accessible) targets...

--Per Hedeland
per@hedeland.org

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] [bonding 2.4] Fixes for balance-xor and balance-rr
  2004-01-11 21:45   ` Per Hedeland
@ 2004-01-24 12:33     ` Per Hedeland
  0 siblings, 0 replies; 5+ messages in thread
From: Per Hedeland @ 2004-01-24 12:33 UTC (permalink / raw)
  To: bonding-devel, netdev

Per Hedeland <per@hedeland.org> wrote:
>
>However when looking at this I realized that the xor mode has a pretty
>serious bug/flaw in the fault-tolerance department: The modulus is done
>with the total number of slaves, and slaves that are down are only
>skipped in that find-slave logic. I.e. if you (e.g.) have a bond with 3
>links where link 1 is down, link 2 will get 2/3 of the traffic and link
>3 get 1/3 - not good I think.

Enclosed is a fix for this against current netdev-2.4.  I also noticed
that there seems to be another serious bug in bond_xmit_xor(): If all
slaves are down, it will neither dev_queue_xmit() nor dev_kfree_skb().
Ditto for bond_xmit_roundrobin() - the patch should fix both. Comments
appreciated.

Also enclosed are some stats from a balance-xor-ip bond with 4 slaves
(eth2-eth5) where the first two are down - as expected, the patch
changes the transmit load distribution from ~ 75/25 to close to 50/50 on
the two remaining ones (eth4, eth5).

--Per Hedeland
per@hedeland.org

====================================================================
Before fix:

eth4      Link encap:Ethernet  HWaddr 00:C0:95:C9:1A:28
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:451676 errors:0 dropped:0 overruns:0 frame:0
          TX packets:300809 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:192971475 (184.0 Mb)  TX bytes:145298331 (138.5 Mb)
          Interrupt:20 Base address:0x5400

eth5      Link encap:Ethernet  HWaddr 00:C0:95:C9:1A:28
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:169355 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:208 (208.0 b)  TX bytes:51126318 (48.7 Mb)
          Interrupt:21 Base address:0x7000

-------------------------------------------------------------------
After fix:

eth4      Link encap:Ethernet  HWaddr 00:C0:95:C9:1A:28
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:453036 errors:0 dropped:0 overruns:0 frame:0
          TX packets:223884 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:193532059 (184.5 Mb)  TX bytes:93244842 (88.9 Mb)
          Interrupt:20 Base address:0x5400

eth5      Link encap:Ethernet  HWaddr 00:C0:95:C9:1A:28
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:249256 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:208 (208.0 b)  TX bytes:103875515 (99.0 Mb)
          Interrupt:21 Base address:0x7000

====================================================================

diff -ruN a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c	Fri Jan 23 18:03:31 2004
+++ b/drivers/net/bonding/bond_main.c	Fri Jan 23 18:12:23 2004
@@ -2010,6 +2010,7 @@
 	struct slave *slave, *oldcurrent;
 	int do_failover = 0;
 	int delta_in_ticks;
+ 	int up_slaves = 0;
 	int i;
 
 	read_lock(&bond->lock);
@@ -2046,6 +2047,7 @@
 		case BOND_LINK_UP:	/* the link was up */
 			if (link_state == BMSR_LSTATUS) {
 				/* link stays up, nothing more to do */
+				up_slaves++;
 				break;
 			} else { /* link going down */
 				slave->link  = BOND_LINK_FAIL;
@@ -2115,6 +2117,7 @@
 			} else {
 				/* link up again */
 				slave->link  = BOND_LINK_UP;
+				up_slaves++;
 				slave->jiffies = jiffies;
 				printk(KERN_INFO DRV_NAME
 				       ": %s: link status up again after %d "
@@ -2163,6 +2166,7 @@
 				if (slave->delay == 0) {
 					/* now the link has been up for long time enough */
 					slave->link = BOND_LINK_UP;
+					up_slaves++;
 					slave->jiffies = jiffies;
 
 					if (bond->params.mode == BOND_MODE_8023AD) {
@@ -2237,6 +2241,8 @@
 		write_unlock(&bond->curr_slave_lock);
 	}
 
+	bond->up_slave_cnt = up_slaves;
+
 re_arm:
 	if (bond->params.miimon) {
 		mod_timer(&bond->mii_timer, jiffies + delta_in_ticks);
@@ -2270,6 +2276,7 @@
 	struct slave *slave, *oldcurrent;
 	int do_failover = 0;
 	int delta_in_ticks;
+ 	int up_slaves = 0;
 	int i;
 
 	read_lock(&bond->lock);
@@ -2303,6 +2310,7 @@
 
 				slave->link  = BOND_LINK_UP;
 				slave->state = BOND_STATE_ACTIVE;
+				up_slaves++;
 
 				/* primary_slave has no meaning in round-robin
 				 * mode. the window of a slave being up and
@@ -2349,7 +2357,9 @@
 				if (slave == oldcurrent) {
 					do_failover = 1;
 				}
-			}
+			} else {
+				up_slaves++;
+			}				
 		}
 
 		/* note: if switch is in round-robin mode, all links
@@ -2379,6 +2389,8 @@
 		write_unlock(&bond->curr_slave_lock);
 	}
 
+	bond->up_slave_cnt = up_slaves;
+
 re_arm:
 	if (bond->params.arp_interval) {
 		mod_timer(&bond->arp_timer, jiffies + delta_in_ticks);
@@ -3601,14 +3613,14 @@
 		}
 	}
 
-out:
-	read_unlock(&bond->lock);
-	return 0;
-
 free_out:
 	/* no suitable interface, frame not sent */
 	dev_kfree_skb(skb);
 	goto out;
+
+out:
+	read_unlock(&bond->lock);
+	return 0;
 }
 
 /*
@@ -3665,47 +3677,42 @@
 {
 	struct bonding *bond = bond_dev->priv;
 	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct slave *slave, *start_at;
+	struct slave *slave;
+	int slave_cnt;
 	int slave_no;
 	int i;
 
 	read_lock(&bond->lock);
 
-	if (!BOND_IS_OK(bond)) {
+	slave_cnt = bond->up_slave_cnt;
+	if (!BOND_IS_OK(bond) || slave_cnt == 0) {
 		goto free_out;
 	}
 
-	slave_no = (data->h_dest[5]^bond_dev->dev_addr[5]) % bond->slave_cnt;
+	slave_no = (data->h_dest[5]^bond_dev->dev_addr[5]) % slave_cnt;
 
 	bond_for_each_slave(bond, slave, i) {
-		slave_no--;
-		if (slave_no < 0) {
-			break;
-		}
-	}
-
-	start_at = slave;
-
-	bond_for_each_slave_from(bond, slave, i, start_at) {
 		if (IS_UP(slave->dev) &&
 		    (slave->link == BOND_LINK_UP) &&
 		    (slave->state == BOND_STATE_ACTIVE)) {
-			skb->dev = slave->dev;
-			skb->priority = 1;
-			dev_queue_xmit(skb);
+			slave_no--;
+			if (slave_no < 0) {
+				skb->dev = slave->dev;
+				skb->priority = 1;
+				dev_queue_xmit(skb);
 
-			goto out;
+				goto out;
+			}
 		}
 	}
 
-out:
-	read_unlock(&bond->lock);
-	return 0;
-
 free_out:
 	/* no suitable interface, frame not sent */
 	dev_kfree_skb(skb);
-	goto out;
+
+out:
+	read_unlock(&bond->lock);
+	return 0;
 }
 
 /*
diff -ruN a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
--- a/drivers/net/bonding/bonding.h	Fri Jan 23 18:14:51 2004
+++ b/drivers/net/bonding/bonding.h	Fri Jan 23 18:15:39 2004
@@ -180,6 +180,7 @@
 	struct   slave *current_arp_slave;
 	struct   slave *primary_slave;
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
+	s32      up_slave_cnt;	/* never change outside periodic monitors */
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;
 	struct   timer_list mii_timer;

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-01-24 12:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <E6F7D288B394A64585E67497E5126BA601F991D2@hasmsx403.iil.intel.com>
2004-01-11 14:50 ` [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode Amir Noam
2004-01-11 21:45   ` Per Hedeland
2004-01-24 12:33     ` [PATCH] [bonding 2.4] Fixes for balance-xor and balance-rr Per Hedeland
     [not found] <E6F7D288B394A64585E67497E5126BA601F991D0@hasmsx403.iil.intel.com>
2004-01-08 15:33 ` [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode Amir Noam
2004-01-08 16:43   ` Per Hedeland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).