* Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode [not found] <E6F7D288B394A64585E67497E5126BA601F991D2@hasmsx403.iil.intel.com> @ 2004-01-11 14:50 ` Amir Noam 2004-01-11 21:45 ` Per Hedeland 0 siblings, 1 reply; 5+ messages in thread From: Amir Noam @ 2004-01-11 14:50 UTC (permalink / raw) To: Per Hedeland; +Cc: bonding-devel, netdev On Thursday 08 January 2004 06:43 pm, Per Hedeland wrote: > Amir Noam <amir.noam@intel.com> wrote: > >I don't like this. The reason we give different tx function > > pointers to dev->hard_start_xmit in different bonding mode is to > > make the tx path as fast as possible. Otherwise we might as well > > use a single tx function that chooses its exact operation based > > on the bonding mode. > > > >It might be better to have some code duplication if it results in > >faster tx, but I'm not sure what's the optimal solution in this > > case. > > Well, I don't really have an opinion since I don't have a good idea > about the cost of a function call relative to "everything else" > that is happening here. I don't see a way to do "limited" > duplication without using function calls though, but I'm quite > happy to make it two entirely separate functions for MAC vs IP. > Please advise. A possible way to have "limited" duplication would be to have two seperate xmit functions, that call an inline function for the shared code. This might be good enough, performance-wise, while avoiding some code duplication. But, like I've said, I'm not sure wout the best solution is. I'd like to hear what Jay (and others) thinks about it. -- Amir ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode 2004-01-11 14:50 ` [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode Amir Noam @ 2004-01-11 21:45 ` Per Hedeland 2004-01-24 12:33 ` [PATCH] [bonding 2.4] Fixes for balance-xor and balance-rr Per Hedeland 0 siblings, 1 reply; 5+ messages in thread From: Per Hedeland @ 2004-01-11 21:45 UTC (permalink / raw) To: amir.noam; +Cc: bonding-devel, netdev Amir Noam <amir.noam@intel.com> wrote: > >A possible way to have "limited" duplication would be to have two >seperate xmit functions, that call an inline function for the shared >code. This might be good enough, performance-wise, while avoiding >some code duplication. Yes, I actually thought of that (after the previous post:-) - it would at least avoid some *source* code duplication. And the reasonable thing to break out into such a function would be the logic to find the slave to xmit on given slave_no (perhaps also the actual xmit), I guess. However when looking at this I realized that the xor mode has a pretty serious bug/flaw in the fault-tolerance department: The modulus is done with the total number of slaves, and slaves that are down are only skipped in that find-slave logic. I.e. if you (e.g.) have a bond with 3 links where link 1 is down, link 2 will get 2/3 of the traffic and link 3 get 1/3 - not good I think. Since fixing this would likely change the interface to the above function, I guess it would be nice to do it at the same time as (or before) the balance-xor-ip addition. Assuming I'm not the only one that thinks it needs fixing, that is.:-) The superficially obvious fix is to do the modulus with only the number of slaves that are up, but I'm not sure about the best actual implementation. That number isn't currently maintained, and traversing the list of slaves just to find it for every packet doesn't make sense of course. It seems straightforward to add another field to the bonding struct for it, and maintain that in the monitoring functions, though. But in addition to that, the actual slave selection logic would have to always check the link state of each slave until it has found the slave_no'th one that is up (though one could optimize for the case of all slaves being up) - whereas currently it only starts checking at the slave_no'th one, and if all slaves are up there is only one check. I'm not sure if this is acceptable? The only alternative I see is to also maintain a list of the slaves that are up, but then we're perhaps getting too much complexity for these "simple" modes. Any comments, other alternatives? >But, like I've said, I'm not sure wout the best solution is. I'd like >to hear what Jay (and others) thinks about it. Well, Jay said he would be out of email access for three weeks - starting tomorrow and promising he would look over updated patches first, but maybe he didn't have the time - or, given that you objected:-), he (correctly) thought that a second update wouldn't arrive in time. I'm in no particular hurry to get this into the "official" source though, since I'll be using an older kernel that I've already applied the (original) patch to for a while anyway - it's just that it gets old pretty quickly to keep reworking the patches against moving (and not very easily accessible) targets... --Per Hedeland per@hedeland.org ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] [bonding 2.4] Fixes for balance-xor and balance-rr 2004-01-11 21:45 ` Per Hedeland @ 2004-01-24 12:33 ` Per Hedeland 0 siblings, 0 replies; 5+ messages in thread From: Per Hedeland @ 2004-01-24 12:33 UTC (permalink / raw) To: bonding-devel, netdev Per Hedeland <per@hedeland.org> wrote: > >However when looking at this I realized that the xor mode has a pretty >serious bug/flaw in the fault-tolerance department: The modulus is done >with the total number of slaves, and slaves that are down are only >skipped in that find-slave logic. I.e. if you (e.g.) have a bond with 3 >links where link 1 is down, link 2 will get 2/3 of the traffic and link >3 get 1/3 - not good I think. Enclosed is a fix for this against current netdev-2.4. I also noticed that there seems to be another serious bug in bond_xmit_xor(): If all slaves are down, it will neither dev_queue_xmit() nor dev_kfree_skb(). Ditto for bond_xmit_roundrobin() - the patch should fix both. Comments appreciated. Also enclosed are some stats from a balance-xor-ip bond with 4 slaves (eth2-eth5) where the first two are down - as expected, the patch changes the transmit load distribution from ~ 75/25 to close to 50/50 on the two remaining ones (eth4, eth5). --Per Hedeland per@hedeland.org ==================================================================== Before fix: eth4 Link encap:Ethernet HWaddr 00:C0:95:C9:1A:28 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:451676 errors:0 dropped:0 overruns:0 frame:0 TX packets:300809 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:192971475 (184.0 Mb) TX bytes:145298331 (138.5 Mb) Interrupt:20 Base address:0x5400 eth5 Link encap:Ethernet HWaddr 00:C0:95:C9:1A:28 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:2 errors:0 dropped:0 overruns:0 frame:0 TX packets:169355 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:208 (208.0 b) TX bytes:51126318 (48.7 Mb) Interrupt:21 Base address:0x7000 ------------------------------------------------------------------- After fix: eth4 Link encap:Ethernet HWaddr 00:C0:95:C9:1A:28 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:453036 errors:0 dropped:0 overruns:0 frame:0 TX packets:223884 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:193532059 (184.5 Mb) TX bytes:93244842 (88.9 Mb) Interrupt:20 Base address:0x5400 eth5 Link encap:Ethernet HWaddr 00:C0:95:C9:1A:28 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:2 errors:0 dropped:0 overruns:0 frame:0 TX packets:249256 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:208 (208.0 b) TX bytes:103875515 (99.0 Mb) Interrupt:21 Base address:0x7000 ==================================================================== diff -ruN a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c --- a/drivers/net/bonding/bond_main.c Fri Jan 23 18:03:31 2004 +++ b/drivers/net/bonding/bond_main.c Fri Jan 23 18:12:23 2004 @@ -2010,6 +2010,7 @@ struct slave *slave, *oldcurrent; int do_failover = 0; int delta_in_ticks; + int up_slaves = 0; int i; read_lock(&bond->lock); @@ -2046,6 +2047,7 @@ case BOND_LINK_UP: /* the link was up */ if (link_state == BMSR_LSTATUS) { /* link stays up, nothing more to do */ + up_slaves++; break; } else { /* link going down */ slave->link = BOND_LINK_FAIL; @@ -2115,6 +2117,7 @@ } else { /* link up again */ slave->link = BOND_LINK_UP; + up_slaves++; slave->jiffies = jiffies; printk(KERN_INFO DRV_NAME ": %s: link status up again after %d " @@ -2163,6 +2166,7 @@ if (slave->delay == 0) { /* now the link has been up for long time enough */ slave->link = BOND_LINK_UP; + up_slaves++; slave->jiffies = jiffies; if (bond->params.mode == BOND_MODE_8023AD) { @@ -2237,6 +2241,8 @@ write_unlock(&bond->curr_slave_lock); } + bond->up_slave_cnt = up_slaves; + re_arm: if (bond->params.miimon) { mod_timer(&bond->mii_timer, jiffies + delta_in_ticks); @@ -2270,6 +2276,7 @@ struct slave *slave, *oldcurrent; int do_failover = 0; int delta_in_ticks; + int up_slaves = 0; int i; read_lock(&bond->lock); @@ -2303,6 +2310,7 @@ slave->link = BOND_LINK_UP; slave->state = BOND_STATE_ACTIVE; + up_slaves++; /* primary_slave has no meaning in round-robin * mode. the window of a slave being up and @@ -2349,7 +2357,9 @@ if (slave == oldcurrent) { do_failover = 1; } - } + } else { + up_slaves++; + } } /* note: if switch is in round-robin mode, all links @@ -2379,6 +2389,8 @@ write_unlock(&bond->curr_slave_lock); } + bond->up_slave_cnt = up_slaves; + re_arm: if (bond->params.arp_interval) { mod_timer(&bond->arp_timer, jiffies + delta_in_ticks); @@ -3601,14 +3613,14 @@ } } -out: - read_unlock(&bond->lock); - return 0; - free_out: /* no suitable interface, frame not sent */ dev_kfree_skb(skb); goto out; + +out: + read_unlock(&bond->lock); + return 0; } /* @@ -3665,47 +3677,42 @@ { struct bonding *bond = bond_dev->priv; struct ethhdr *data = (struct ethhdr *)skb->data; - struct slave *slave, *start_at; + struct slave *slave; + int slave_cnt; int slave_no; int i; read_lock(&bond->lock); - if (!BOND_IS_OK(bond)) { + slave_cnt = bond->up_slave_cnt; + if (!BOND_IS_OK(bond) || slave_cnt == 0) { goto free_out; } - slave_no = (data->h_dest[5]^bond_dev->dev_addr[5]) % bond->slave_cnt; + slave_no = (data->h_dest[5]^bond_dev->dev_addr[5]) % slave_cnt; bond_for_each_slave(bond, slave, i) { - slave_no--; - if (slave_no < 0) { - break; - } - } - - start_at = slave; - - bond_for_each_slave_from(bond, slave, i, start_at) { if (IS_UP(slave->dev) && (slave->link == BOND_LINK_UP) && (slave->state == BOND_STATE_ACTIVE)) { - skb->dev = slave->dev; - skb->priority = 1; - dev_queue_xmit(skb); + slave_no--; + if (slave_no < 0) { + skb->dev = slave->dev; + skb->priority = 1; + dev_queue_xmit(skb); - goto out; + goto out; + } } } -out: - read_unlock(&bond->lock); - return 0; - free_out: /* no suitable interface, frame not sent */ dev_kfree_skb(skb); - goto out; + +out: + read_unlock(&bond->lock); + return 0; } /* diff -ruN a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h --- a/drivers/net/bonding/bonding.h Fri Jan 23 18:14:51 2004 +++ b/drivers/net/bonding/bonding.h Fri Jan 23 18:15:39 2004 @@ -180,6 +180,7 @@ struct slave *current_arp_slave; struct slave *primary_slave; s32 slave_cnt; /* never change this value outside the attach/detach wrappers */ + s32 up_slave_cnt; /* never change outside periodic monitors */ rwlock_t lock; rwlock_t curr_slave_lock; struct timer_list mii_timer; ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <E6F7D288B394A64585E67497E5126BA601F991D0@hasmsx403.iil.intel.com>]
* Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode [not found] <E6F7D288B394A64585E67497E5126BA601F991D0@hasmsx403.iil.intel.com> @ 2004-01-08 15:33 ` Amir Noam 2004-01-08 16:43 ` Per Hedeland 0 siblings, 1 reply; 5+ messages in thread From: Amir Noam @ 2004-01-08 15:33 UTC (permalink / raw) To: Per Hedeland; +Cc: bonding-devel, netdev On Wednesday 07 January 2004 10:58 pm, Per Hedeland wrote: > struct bonding *bond = bond_dev->priv; > struct ethhdr *data = (struct ethhdr *)skb->data; > struct slave *slave, *start_at; > - int slave_no; > + int slave_no = 0; > int i; > + __u32 u; > > read_lock(&bond->lock); > Please use u32 instead of __u32. > +static int bond_xmit_xor_mac(struct sk_buff *skb, struct net_device *bond_dev) > +{ > + return bond_xmit_xor(skb, bond_dev, 0); > +} > + > +static int bond_xmit_xor_ip(struct sk_buff *skb, struct net_device *bond_dev) > +{ > + return bond_xmit_xor(skb, bond_dev, 1); > +} > + hmm... I don't like this. The reason we give different tx function pointers to dev->hard_start_xmit in different bonding mode is to make the tx path as fast as possible. Otherwise we might as well use a single tx function that chooses its exact operation based on the bonding mode. It might be better to have some code duplication if it results in faster tx, but I'm not sure what's the optimal solution in this case. -- Amir ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode 2004-01-08 15:33 ` [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode Amir Noam @ 2004-01-08 16:43 ` Per Hedeland 0 siblings, 0 replies; 5+ messages in thread From: Per Hedeland @ 2004-01-08 16:43 UTC (permalink / raw) To: amir.noam; +Cc: bonding-devel, netdev Amir Noam <amir.noam@intel.com> wrote: >Please use u32 instead of __u32. OK. >hmm... > >I don't like this. The reason we give different tx function pointers >to dev->hard_start_xmit in different bonding mode is to make the tx >path as fast as possible. Otherwise we might as well use a single tx >function that chooses its exact operation based on the bonding mode. > >It might be better to have some code duplication if it results in >faster tx, but I'm not sure what's the optimal solution in this case. Well, I don't really have an opinion since I don't have a good idea about the cost of a function call relative to "everything else" that is happening here. I don't see a way to do "limited" duplication without using function calls though, but I'm quite happy to make it two entirely separate functions for MAC vs IP. Please advise. --Per Hedeland per@hedeland.org ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2004-01-24 12:33 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <E6F7D288B394A64585E67497E5126BA601F991D2@hasmsx403.iil.intel.com>
2004-01-11 14:50 ` [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode Amir Noam
2004-01-11 21:45 ` Per Hedeland
2004-01-24 12:33 ` [PATCH] [bonding 2.4] Fixes for balance-xor and balance-rr Per Hedeland
[not found] <E6F7D288B394A64585E67497E5126BA601F991D0@hasmsx403.iil.intel.com>
2004-01-08 15:33 ` [Bonding-devel] [PATCH] [bonding 2.4] Add balance-xor-ip bonding mode Amir Noam
2004-01-08 16:43 ` Per Hedeland
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).