* [RFC][PATCH 0/3] bonding support for operation over IPoIB
@ 2006-09-26 10:16 Or Gerlitz
2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-09-26 10:16 UTC (permalink / raw)
To: netdev; +Cc: Roland Dreier
This patch series is an RFC for changes to the bonding driver such that it
would be able to support non ARPHRD_ETHER netdevices for its High-Availability
(active-backup) mode.
My motivation was to enable the bonding driver in its HA mode to work with the
IP over Infiniband (IPoIB) driver. With these patches I was able to enslave
IPoIB netdevices and run TCP, UDP, IP (UDP) Multicast and ICMP traffic with
fail-over and fail-back working fine.
Moreover, as IPoIB is also the IB ARP provider for the RDMA CM driver which
is used by native IB ULPs whose addressing scheme is based on IP (eg iSER, SDP,
Lustre, NFSoRDMA, RDS), bonding support for IPoIB devices **enables** HA for
these ULPs. This holds because when the ULP is informed by the IB HW of the
failure of the current IB connection, it just needs to reconnect, and the
bonding device will then issue the IB ARP over the active IPoIB slave.
The first patch changes some of the bond netdevice attributes and functions
to be that of the active slave for the case of the enslaved device not being
of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(),
which are netdevice **type** dependent and hence might be not appropriate for
devices of other types.
The IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 byte
IB QP (Queue Pair) number and the 16 byte IB port GID (Global ID) of the port this
IPoIB device is bound to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (I have omitted here some details which
are not important for the bonding RFC).
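As a rough illustration of the layout described above, the following user-space sketch packs a QP number and a port GID into an IPoIB hardware address. It assumes the full address is 20 bytes, with one reserved byte in front of the 3 byte QP number (one of the details left out above). The names IPOIB_ALEN_MODEL and ipoib_hwaddr_pack are hypothetical and do not come from the IPoIB driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: 1 reserved byte + 3 byte QPN + 16 byte GID. */
#define IPOIB_ALEN_MODEL 20

/*
 * Pack a 24-bit QP number and a 16-byte port GID into a 20-byte
 * hardware address. Byte 0 is assumed to be a reserved byte and is
 * cleared here.
 */
void ipoib_hwaddr_pack(uint8_t out[IPOIB_ALEN_MODEL], uint32_t qpn,
		       const uint8_t gid[16])
{
	assert(qpn <= 0xffffff);	/* a QP number is only 24 bits wide */

	out[0] = 0;			/* assumed reserved byte */
	out[1] = (qpn >> 16) & 0xff;	/* QPN, most significant byte first */
	out[2] = (qpn >> 8) & 0xff;
	out[3] = qpn & 0xff;
	memcpy(out + 4, gid, 16);	/* port GID fills the remaining bytes */
}
```

Since both the QP number and the GID are assigned by the hardware, an address built this way is not something an administrator picks, which is the property the series relies on.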
Basically the IPoIB spec and impl. do not allow for setting the MAC address of
an IPoIB device and my work was made under this assumption.
The second patch allows for enslaving netdevices which do not support the
set_mac_address() function. In that case the bond mac address is the one
of the active slave, where remote peers are notified on the mac address
(neighbour) change by Gratuitous ARP sent by the bonding code when fail-over
occurs (this was already in the bonding code).
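The policy of the second patch can be sketched as a small user-space model. This is only an illustration under simplifying assumptions: the struct and function names (model_dev, model_bond, model_enslave, model_failover) are hypothetical, and only the do_set_mac_addr flag mirrors a name used in the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_ALEN 32

struct model_dev {
	unsigned char addr[MODEL_MAX_ALEN];
	size_t addr_len;
	bool can_set_mac;	/* models set_mac_address != NULL */
};

struct model_bond {
	unsigned char addr[MODEL_MAX_ALEN];
	size_t addr_len;
	int slave_cnt;
	bool do_set_mac_addr;
};

void model_bond_init(struct model_bond *bond)
{
	memset(bond, 0, sizeof(*bond));
	bond->do_set_mac_addr = true;	/* default, as set at bond init time */
}

/* Returns 0 on success or -EOPNOTSUPP when the slave is rejected. */
int model_enslave(struct model_bond *bond, const struct model_dev *slave)
{
	if (!slave->can_set_mac) {
		if (bond->slave_cnt == 0)
			bond->do_set_mac_addr = false;	/* first slave decides */
		else if (bond->do_set_mac_addr)
			return -EOPNOTSUPP;	/* bond already sets slave MACs */
	}
	bond->slave_cnt++;
	return 0;
}

/* On fail-over the bond takes the address of the new active slave. */
void model_failover(struct model_bond *bond, const struct model_dev *active)
{
	assert(active->addr_len <= MODEL_MAX_ALEN);
	if (!bond->do_set_mac_addr) {
		memcpy(bond->addr, active->addr, active->addr_len);
		bond->addr_len = active->addr_len;
	}
}
```

In this model a bond whose first slave cannot change its address never touches slave addresses afterwards, while a bond that already rewrites slave addresses refuses such a slave.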
The third patch is temporary, I hope, and is currently required to run IP multicast
when bonding IPoIB devices. The problem is that some multicast groups (eg the all-hosts
224.0.0.1) might be set on the bonding device by the net stack **before** any
enslavement takes place.
Since ether_setup() sets the bonding device type to be ARPHRD_ETHER and address
len to be ETH_ALEN, the net core code computes a wrong multicast link address.
Now, the current IPoIB impl. attempts to join on this wrong mcast address and
does not process more join requests. As a result IP multicast over other groups
whose link address is computed correctly would not work.
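To make the mismatch concrete, here is a minimal user-space sketch of the standard IPv4 to Ethernet multicast mapping (01:00:5e followed by the low 23 bits of the group address). The function name eth_mc_map_model and the two length macros are hypothetical. The point is only that the result is a 6 byte address, which cannot be a valid link address for a device whose hardware address is 20 bytes long.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_ETH_ALEN 6	/* Ethernet hardware address length */
#define MODEL_IB_ALEN  20	/* IPoIB hardware address length */

/*
 * Map an IPv4 multicast group (host byte order) to an Ethernet
 * multicast MAC: 01:00:5e followed by the low 23 bits of the group.
 */
void eth_mc_map_model(uint32_t group, uint8_t mac[MODEL_ETH_ALEN])
{
	assert((group >> 28) == 0xe);	/* 224.0.0.0/4 is the multicast range */

	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* top bit of this byte is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}
```

For the all-hosts group 224.0.0.1 this yields 01:00:5e:00:00:01, which is the kind of address a bond still typed as ARPHRD_ETHER would hand down before the first enslavement.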
Or Gerlitz.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
2006-09-26 10:16 [RFC][PATCH 0/3] bonding support for operation over IPoIB Or Gerlitz
@ 2006-09-26 10:17 ` Or Gerlitz
2006-09-26 19:23 ` Jay Vosburgh
2006-09-26 10:17 ` [RFC][PATCH 2/3] enable bonding to enslave netdevices not supporting set_mac_address() Or Gerlitz
2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
2 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-09-26 10:17 UTC (permalink / raw)
To: netdev; +Cc: Roland Dreier

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

Index: net-2.6.19/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.19.orig/drivers/net/bonding/bond_main.c	2006-09-20 14:40:13.000000000 +0300
+++ net-2.6.19/drivers/net/bonding/bond_main.c	2006-09-25 11:43:52.000000000 +0300
@@ -1013,6 +1013,23 @@ static struct slave *bond_find_best_slav
 	return bestslave;
 }
 
+void bond_setup_by_slave(struct bonding *bond, struct slave *new_active)
+{
+	bond->dev->hard_header = new_active->dev->hard_header;
+	bond->dev->rebuild_header = new_active->dev->rebuild_header;
+	bond->dev->hard_header_cache = new_active->dev->hard_header_cache;
+	bond->dev->header_cache_update = new_active->dev->header_cache_update;
+	bond->dev->hard_header_parse = new_active->dev->hard_header_parse;
+
+	bond->dev->type = new_active->dev->type;
+	bond->dev->hard_header_len = new_active->dev->hard_header_len;
+	bond->dev->mtu = new_active->dev->mtu;
+	bond->dev->addr_len = new_active->dev->addr_len;
+
+	memcpy(bond->dev->broadcast, new_active->dev->broadcast,
+		new_active->dev->addr_len);
+}
+
 /**
  * change_active_interface - change the active slave into the specified one
  * @bond: our bonding struct
@@ -1091,6 +1108,14 @@ void bond_change_active_slave(struct bon
 		if (new_active) {
 			bond_set_slave_active_flags(new_active);
 		}
+
+		/* bonding netdevices are created with ether_setup, so when the
+		 * slave type is not ARPHRD_ETHER there is a need to override
+		 * some of the type dependent attributes/functions
+		 */
+		if (new_active && new_active->dev->type != ARPHRD_ETHER)
+			bond_setup_by_slave(bond, new_active);
+
 		bond_send_gratuitous_arp(bond);
 	}
 }

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
@ 2006-09-26 19:23 ` Jay Vosburgh
2006-09-27 19:59 ` Or Gerlitz
0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-09-26 19:23 UTC (permalink / raw)
To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:
[...]
+	bond->dev->mtu = new_active->dev->mtu;

	This won't generate a NETDEV_CHANGEMTU notifier event.

[...]
>+		/* bonding netdevices are created with ether_setup, so when the
>+		 * slave type is not ARPHRD_ETHER there is a need to override
>+		 * some of the type dependent attributes/functions
>+		 */
>+		if (new_active && new_active->dev->type != ARPHRD_ETHER)
>+			bond_setup_by_slave(bond, new_active);
>+

	In this case, if the bond has one slave that's ARPHRD_ETHER and
one that's not, when the active changes from the non-ARPHRD_ETHER slave
to the ARPHRD_ETHER slave, it won't call bond_setup_by_slave() to switch
the hard_header, rebuild_header, et al, back to the ARPHRD_ETHER
settings.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
2006-09-26 19:23 ` Jay Vosburgh
@ 2006-09-27 19:59 ` Or Gerlitz
2006-09-28 17:02 ` Jay Vosburgh
0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-09-27 19:59 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Or Gerlitz, netdev, Roland Dreier

On 9/26/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
> [...]
> +       bond->dev->mtu = new_active->dev->mtu;
>
>         This won't generate a NETDEV_CHANGEMTU notifier event.

What is actually the trigger for the event with the current impl? Is
it the code that actually calls dev_set_mtu() on the bonding device or
dev_set_mtu() itself?

>
> [...]
> >+               /* bonding netdevices are created with ether_setup, so when the
> >+                * slave type is not ARPHRD_ETHER there is a need to override
> >+                * some of the type dependent attributes/functions
> >+                */
> >+               if (new_active && new_active->dev->type != ARPHRD_ETHER)
> >+                       bond_setup_by_slave(bond, new_active);
> >+
>         In this case, if the bond has one slave that's ARPHRD_ETHER and
> one that's not, when the active changes from the non-ARPHRD_ETHER slave
> to the ARPHRD_ETHER slave, it won't call bond_setup_by_slave() to switch
> the hard_header, rebuild_header, et al, back to the ARPHRD_ETHER
> settings.

OK. First, under the assumption that one may enslave ARPHRD_ETHER and
non-ARPHRD_ETHER devices in the same bond, you are correct and the
patch is not complete here.

However, putting devices of different types in the same bond requires
a switch that **both** HW NICs/ports associated with each of the
netdevices can talk to.

If there is no such switch, then the only possible config is two
isolated networks/switches where each NIC/type is connected to a switch
supporting this type, so a local failure/failover on some node requires
the whole subset of nodes talking to this one to do failover. So if the
relation (i,j) which holds if node i talks to node j does not impose a
disjoint partition on the set of all N nodes, you just can't do this
bonding scheme.

Practically, talking on IPoIB vs. "IPoETH" (ie slave devices of type
ARPHRD_INFINIBAND vs slaves of type ARPHRD_ETHER), to have an IPoIB
slave talk to an "IPoETH" slave you need an IB to Ethernet IP router
(actually an IPoIB to IPoETH "bridge") in the middle, where the IB
switch should be connected to the IB ports of the bridge and the
Ethernet switch to the Ethernet ports of the bridge.

All in all, it is a configuration I think we can avoid supporting.

So at the bottom line, I would go on enhancing my patch not to allow
bonding together devices of different types or at least, if you don't
mind, not to allow putting ARPHRD_INFINIBAND with
non-ARPHRD_INFINIBAND devices in the same bond.

Or.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
2006-09-27 19:59 ` Or Gerlitz
@ 2006-09-28 17:02 ` Jay Vosburgh
2006-10-03 12:56 ` Or Gerlitz
0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-09-28 17:02 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Or Gerlitz, netdev, Roland Dreier

Or Gerlitz <or.gerlitz@gmail.com> wrote:

>On 9/26/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>> [...]
>> +       bond->dev->mtu = new_active->dev->mtu;
>>
>>         This won't generate a NETDEV_CHANGEMTU notifier event.
>
>What is actually the trigger for the event with the current impl? is
>the code that actually calls dev_set_mtu() on the bonding device or
>dev_set_mtu() itself?

	My comment wasn't quite totally thought out; pretend you didn't
see it.

	I think what would be better overall is to handle the mtu for
this case the way bonding handles the mtu for other slave devices.
Normally, the mtu is pushed to the slaves from the bonding master, not
the other way around. So, you don't want to assign the master's mtu
here; the slave mtu should already be up to date (and set to whatever
the master's mtu is via the usual mechanism, bond_change_mtu for
changes, or set in the slave at enslavement time).

[...]
>So at the bottom line, i would go on enhancing my patch not to allow
>bonding together devices of different types or at least if you don't
>mind, not to allow putting ARPHRD_INFINIBAND with
>non-ARPHRD_INFINIBAND devices in the same bond.

	I think this (disallowing bonding of dissimilar ARPHRD types) is
the way to go, at least in the short term. Get it to work for the
common case first, then deal with the fringe stuff later.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
2006-09-28 17:02 ` Jay Vosburgh
@ 2006-10-03 12:56 ` Or Gerlitz
0 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-10-03 12:56 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> Or Gerlitz <or.gerlitz@gmail.com> wrote:
>
>> On 9/26/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>>> [...]
>>> +       bond->dev->mtu = new_active->dev->mtu;
>>>
>>>         This won't generate a NETDEV_CHANGEMTU notifier event.

>> What is actually the trigger for the event with the current impl? is
>> the code that actually calls dev_set_mtu() on the bonding device or
>> dev_set_mtu() itself?

> 	My comment wasn't quite totally thought out; pretend you didn't
> see it.

> 	I think what would be better overall is to handle the mtu for
> this case the way bonding handles the mtu for other slave devices.
> Normally, the mtu is pushed to the slaves from the bonding master, not
> the other way around. So, you don't want to assign the master's mtu
> here; the slave mtu should already be up to date (and set to whatever
> the master's mtu is via the usual mechanism, bond_change_mtu for
> changes, or set in the slave at enslavement time).

OK, I think I got you. Today dev_set_mtu() is called on the slave
device only when someone attempts to change the bond MTU. So you
suggest doing it also during enslavement, so that the current master
MTU is propagated to the slaves and not vice versa. This makes sense.

> [...]
>> So at the bottom line, i would go on enhancing my patch not to allow
>> bonding together devices of different types or at least if you don't
>> mind, not to allow putting ARPHRD_INFINIBAND with
>> non-ARPHRD_INFINIBAND devices in the same bond.
>
> 	I think this (disallowing bonding of dissimilar ARPHRD types) is
> the way to go, at least in the short term. Get it to work for the
> common case first, then deal with the fringe stuff later.

OK, as you are fine with it, I will modify the patch to disallow bonding
of dissimilar ARPHRD types.

Or.

^ permalink raw reply [flat|nested] 20+ messages in thread
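The restriction agreed on in this sub-thread could look roughly like the user-space sketch below. It is not taken from any posted patch. The struct and function names are hypothetical, the two type constants mirror the values of ARPHRD_ETHER and ARPHRD_INFINIBAND, and returning -EINVAL for a mismatch is an assumption. The sketch also models the MTU direction discussed above: the slave receives the master's MTU at enslavement time, not the other way around.

```c
#include <assert.h>
#include <errno.h>

/* Mirror the values of ARPHRD_ETHER and ARPHRD_INFINIBAND. */
#define MODEL_ARPHRD_ETHER	1
#define MODEL_ARPHRD_INFINIBAND	32

struct model_netdev {
	unsigned short type;
	unsigned int mtu;
};

struct model_bond {
	struct model_netdev dev;	/* the bond master device */
	int slave_cnt;
};

/*
 * Accept a slave only if it has the same hardware type as the slaves
 * already in the bond. The first slave defines the bond's type.
 * Returns 0 on success, -EINVAL (an assumed error code) on mismatch.
 */
int model_enslave_same_type(struct model_bond *bond, struct model_netdev *slave)
{
	assert(bond->slave_cnt >= 0);

	if (bond->slave_cnt == 0)
		bond->dev.type = slave->type;
	else if (bond->dev.type != slave->type)
		return -EINVAL;

	slave->mtu = bond->dev.mtu;	/* MTU is pushed from master to slave */
	bond->slave_cnt++;
	return 0;
}
```

With every slave sharing one type, the type dependent settings of the bond only need to be taken from a slave once, at the first enslavement, which happens in user context.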
* [RFC][PATCH 2/3] enable bonding to enslave netdevices not supporting set_mac_address()
2006-09-26 10:16 [RFC][PATCH 0/3] bonding support for operation over IPoIB Or Gerlitz
2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
@ 2006-09-26 10:17 ` Or Gerlitz
2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
2 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-09-26 10:17 UTC (permalink / raw)
To: netdev; +Cc: Roland Dreier

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

Index: net-2.6.19/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.19.orig/drivers/net/bonding/bond_main.c	2006-09-25 11:43:52.000000000 +0300
+++ net-2.6.19/drivers/net/bonding/bond_main.c	2006-09-25 11:46:35.000000000 +0300
@@ -1115,7 +1115,14 @@ void bond_change_active_slave(struct bon
 		 */
 		if (new_active && new_active->dev->type != ARPHRD_ETHER)
 			bond_setup_by_slave(bond, new_active);
-
+
+		/* when bonding does not set the slave MAC address, the bond MAC
+		 * address is the one of the active slave.
+		 */
+		if (new_active && !bond->do_set_mac_addr)
+			memcpy(bond->dev->dev_addr, new_active->dev->dev_addr,
+				new_active->dev->addr_len);
+
 		bond_send_gratuitous_arp(bond);
 	}
 }
@@ -1335,14 +1342,23 @@ int bond_enslave(struct net_device *bond
 	}
 
 	if (slave_dev->set_mac_address == NULL) {
-		printk(KERN_ERR DRV_NAME
-			": %s: Error: The slave device you specified does "
-			"not support setting the MAC address. "
-			"Your kernel likely does not support slave "
-			"devices.\n", bond_dev->name);
-		res = -EOPNOTSUPP;
-		goto err_undo_flags;
-	}
+		if (bond->slave_cnt == 0) {
+			printk(KERN_WARNING DRV_NAME
+				": %s: Warning: The first slave device you "
+				"specified does not support setting the MAC "
+				"address. This bond MAC address would be that "
+				"of the active slave.\n", bond_dev->name);
+			bond->do_set_mac_addr = 0;
+		} else if (bond->do_set_mac_addr) {
+			printk(KERN_ERR DRV_NAME
+				": %s: Error: The slave device you specified "
+				"does not support setting the MAC address, "
+				"but this bond uses this practice.\n"
+				, bond_dev->name);
+			res = -EOPNOTSUPP;
+			goto err_undo_flags;
+		}
+	}
 
 	new_slave = kmalloc(sizeof(struct slave), GFP_KERNEL);
 	if (!new_slave) {
@@ -1364,16 +1380,18 @@ int bond_enslave(struct net_device *bond
 	 */
 	memcpy(new_slave->perm_hwaddr, slave_dev->dev_addr, ETH_ALEN);
 
-	/*
-	 * Set slave to master's mac address. The application already
-	 * set the master's mac address to that of the first slave
-	 */
-	memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
-	addr.sa_family = slave_dev->type;
-	res = dev_set_mac_address(slave_dev, &addr);
-	if (res) {
-		dprintk("Error %d calling set_mac_address\n", res);
-		goto err_free;
+	if (bond->do_set_mac_addr) {
+		/*
+		 * Set slave to master's mac address. The application already
+		 * set the master's mac address to that of the first slave
+		 */
+		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
+		addr.sa_family = slave_dev->type;
+		res = dev_set_mac_address(slave_dev, &addr);
+		if (res) {
+			dprintk("Error %d calling set_mac_address\n", res);
+			goto err_free;
+		}
 	}
 
 	/* open the slave since the application closed it */
@@ -1617,9 +1635,11 @@ err_close:
 	dev_close(slave_dev);
 
 err_restore_mac:
-	memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
-	addr.sa_family = slave_dev->type;
-	dev_set_mac_address(slave_dev, &addr);
+	if (bond->do_set_mac_addr) {
+		memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
+		addr.sa_family = slave_dev->type;
+		dev_set_mac_address(slave_dev, &addr);
+	}
 
 err_free:
 	kfree(new_slave);
@@ -1797,10 +1817,12 @@ int bond_release(struct net_device *bond
 	/* close slave before restoring its mac address */
 	dev_close(slave_dev);
 
-	/* restore original ("permanent") mac address */
-	memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
-	addr.sa_family = slave_dev->type;
-	dev_set_mac_address(slave_dev, &addr);
+	if (bond->do_set_mac_addr) {
+		/* restore original ("permanent") mac address */
+		memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
+		addr.sa_family = slave_dev->type;
+		dev_set_mac_address(slave_dev, &addr);
+	}
 
 	slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
 				   IFF_SLAVE_INACTIVE);
@@ -1886,10 +1908,12 @@ static int bond_release_all(struct net_d
 		/* close slave before restoring its mac address */
 		dev_close(slave_dev);
 
-		/* restore original ("permanent") mac address*/
-		memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
-		addr.sa_family = slave_dev->type;
-		dev_set_mac_address(slave_dev, &addr);
+		if (bond->do_set_mac_addr) {
+			/* restore original ("permanent") mac address*/
+			memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
+			addr.sa_family = slave_dev->type;
+			dev_set_mac_address(slave_dev, &addr);
+		}
 
 		slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
 					   IFF_SLAVE_INACTIVE);
@@ -3793,6 +3817,10 @@ static int bond_set_mac_address(struct n
 
 	dprintk("bond=%p, name=%s\n", bond, (bond_dev ? bond_dev->name : "None"));
 
+	if (!bond->do_set_mac_addr) {
+		return -EOPNOTSUPP;
+	}
+
 	if (!is_valid_ether_addr(sa->sa_data)) {
 		return -EADDRNOTAVAIL;
 	}
@@ -4233,6 +4261,9 @@ static int bond_init(struct net_device *
 	bond_create_proc_entry(bond);
 #endif
 
+	/* set do_set_mac_addr to true on startup */
+	bond->do_set_mac_addr = 1;
+
 	list_add_tail(&bond->bond_list, &bond_dev_list);
 
 	return 0;
Index: net-2.6.19/drivers/net/bonding/bonding.h
===================================================================
--- net-2.6.19.orig/drivers/net/bonding/bonding.h	2006-09-25 11:42:28.000000000 +0300
+++ net-2.6.19/drivers/net/bonding/bonding.h	2006-09-25 11:46:35.000000000 +0300
@@ -198,6 +198,7 @@ struct bonding {
 	struct bond_params params;
 	struct list_head vlan_list;
 	struct vlan_group *vlgrp;
+	s8 do_set_mac_addr;
 };
 
 /**

^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-09-26 10:16 [RFC][PATCH 0/3] bonding support for operation over IPoIB Or Gerlitz
2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
2006-09-26 10:17 ` [RFC][PATCH 2/3] enable bonding to enslave netdevices not supporting set_mac_address() Or Gerlitz
@ 2006-09-26 10:18 ` Or Gerlitz
2006-09-26 17:05 ` Stephen Hemminger
2006-09-26 23:40 ` Jay Vosburgh
2 siblings, 2 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-09-26 10:18 UTC (permalink / raw)
To: netdev; +Cc: Roland Dreier

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

Index: net-2.6.19/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.19.orig/drivers/net/bonding/bond_main.c	2006-09-25 11:46:35.000000000 +0300
+++ net-2.6.19/drivers/net/bonding/bond_main.c	2006-09-26 10:54:44.000000000 +0300
@@ -128,6 +128,12 @@ MODULE_PARM_DESC(arp_interval, "arp inte
 module_param_array(arp_ip_target, charp, NULL, 0);
 MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
 
+static int bonding_dev_type = ARPHRD_ETHER;
+static int bonding_dev_addrlen = ETH_ALEN;
+
+module_param(bonding_dev_type, int, 0644);
+module_param(bonding_dev_addrlen, int, 0644);
+
 /*----------------------------- Global variables ----------------------------*/
 
 static const char * const version =
@@ -4606,7 +4612,14 @@ int bond_create(char *name, struct bond_
 		res = -ENOMEM;
 		goto out_rtnl;
 	}
-
+
+	/* XXX set the bond dev type and addr len such that the net core code
+	 * (eg arp_mc_map() in net/ipv4/arp.c) would correctly process multicast
+	 * groups set ***before*** the first enslaveness
+	 */
+	bond_dev->type = bonding_dev_type;
+	bond_dev->addr_len = bonding_dev_addrlen;
+
 	/* bond_init() must be called after dev_alloc_name() (for the
 	 * /proc files), but before register_netdevice(), because we
 	 * need to set function pointers.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
@ 2006-09-26 17:05 ` Stephen Hemminger
2006-09-27 20:16 ` Or Gerlitz
2006-09-26 23:40 ` Jay Vosburgh
1 sibling, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2006-09-26 17:05 UTC (permalink / raw)
To: Or Gerlitz; +Cc: netdev, Roland Dreier

On Tue, 26 Sep 2006 13:18:09 +0300 (IDT)
Or Gerlitz <ogerlitz@voltaire.com> wrote:

> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
>
> Index: net-2.6.19/drivers/net/bonding/bond_main.c
> ===================================================================
> --- net-2.6.19.orig/drivers/net/bonding/bond_main.c	2006-09-25 11:46:35.000000000 +0300
> +++ net-2.6.19/drivers/net/bonding/bond_main.c	2006-09-26 10:54:44.000000000 +0300
> @@ -128,6 +128,12 @@ MODULE_PARM_DESC(arp_interval, "arp inte
>  module_param_array(arp_ip_target, charp, NULL, 0);
>  MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
>
> +static int bonding_dev_type = ARPHRD_ETHER;
> +static int bonding_dev_addrlen = ETH_ALEN;
> +
> +module_param(bonding_dev_type, int, 0644);
> +module_param(bonding_dev_addrlen, int, 0644);

Do you really want to allow changing these values after module load?
If not replace 0644 with 0

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-09-26 17:05 ` Stephen Hemminger
@ 2006-09-27 20:16 ` Or Gerlitz
0 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-09-27 20:16 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Or Gerlitz, netdev, Roland Dreier

On 9/26/06, Stephen Hemminger <shemminger@osdl.org> wrote:
> On Tue, 26 Sep 2006 13:18:09 +0300 (IDT)
> Or Gerlitz <ogerlitz@voltaire.com> wrote:

> > +module_param(bonding_dev_type, int, 0644);
> > +module_param(bonding_dev_addrlen, int, 0644);

> Do you really want to allow changing these values after module load?
> If not replace 0644 with 0

Nope, they are meant to be used only at load time, thanks for the comment.

However, as I mentioned, this patch is a temporary workaround to allow
IP multicast to be supported when bonding non ARPHRD_ETHER devices. I
am seeking better ways to do that.

Or.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
2006-09-26 17:05 ` Stephen Hemminger
@ 2006-09-26 23:40 ` Jay Vosburgh
2006-09-27 20:12 ` Or Gerlitz
1 sibling, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-09-26 23:40 UTC (permalink / raw)
To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:

>+	/* XXX set the bond dev type and addr len such that the net core code
>+	 * (eg arp_mc_map() in net/ipv4/arp.c) would correctly process multicast
>+	 * groups set ***before*** the first enslaveness
>+	 */
>+	bond_dev->type = bonding_dev_type;
>+	bond_dev->addr_len = bonding_dev_addrlen;

	I've been thinking about this a little bit more. The system is
understandably not set up to deal with this situation, since normal
devices won't ever change their hardware type.

	You almost want to have some kind of call to induce a reload
from scratch of the multicast filter settings (along with whatever else
might be necessary to alter the hardware type on the fly), to be called
by bonding at the time the first slave is added (since slave adds happen
in user context, and can therefore hold rtnl as required by most of the
multicast address handling code). That seems less hassle than having to
specify the hardware type and address length at module load time.

	A side effect of this is that bonds would have to be restricted
to consisting only of slaves of one hardware type, since slave changes
(and thus hardware type changes) aren't necessarily restricted to user
context.

	Other random thoughts on how to resolve this include modifying
bonding to accept slaves when the master is down (which would also
require changes to the initscripts that normally configure bonding), so
that the initial setting of the, e.g., 224.0.0.1 multicast hardware
address happens to the already-changed hardware type.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-09-26 23:40 ` Jay Vosburgh
@ 2006-09-27 20:12 ` Or Gerlitz
2006-09-28 17:43 ` Jay Vosburgh
0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-09-27 20:12 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Or Gerlitz, netdev, Roland Dreier

On 9/27/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>
> >+       /* XXX set the bond dev type and addr len such that the net core code
> >+        * (eg arp_mc_map() in net/ipv4/arp.c) would correctly process multicast
> >+        * groups set ***before*** the first enslaveness
> >+        */
> >+       bond_dev->type = bonding_dev_type;
> >+       bond_dev->addr_len = bonding_dev_addrlen;
>
>         I've been thinking about this a little bit more. The system is
> understandably not set up to deal with this situation, since normal
> devices won't ever change their hardware type.
>
>         You almost want to have some kind of call to induce a reload
> from scratch of the multicast filter settings (along with whatever else
> might be necessary to alter the hardware type on the fly), to be called
> by bonding at the time the first slave is added (since slave adds happen
> in user context, and can therefore hold rtnl as required by most of the
> multicast address handling code). That seems less hassle than having to
> specify the hardware type and address length at module load time.

I agree that it would be better to avoid doing it this way.

>
>         A side effect of this is that bonds would have to be restricted
> to consisting only of slaves of one hardware type, since slave changes
> (and thus hardware type changes) aren't necessarily restricted to user
> context.

I have addressed the point of putting slaves of different types (and
specifically a slave of type ARPHRD_INFINIBAND with a slave of another
type in the same bond) in the thread that goes with patch 1/3, let's
close it there...

>         Other random thoughts on how to resolve this include modifying
> bonding to accept slaves when the master is down (which would also
> require changes to the initscripts that normally configure bonding), so
> that the initial setting of the, e.g., 224.0.0.1 multicast hardware
> address happens to the already-changed hardware type.

OK, this is a direction I would like to check. It would be nice if you
could provide me with a 1-2 liner of directions on what needs to be
changed to enable bonding to accept slaves when it is down.

Or.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-09-27 20:12 ` Or Gerlitz
@ 2006-09-28 17:43 ` Jay Vosburgh
2006-10-03 13:06 ` Or Gerlitz
0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-09-28 17:43 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Or Gerlitz, netdev, Roland Dreier

Or Gerlitz <or.gerlitz@gmail.com> wrote:

>On 9/27/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> Or Gerlitz <ogerlitz@voltaire.com> wrote:
[...]
>>         You almost want to have some kind of call to induce a reload
>> from scratch of the multicast filter settings (along with whatever else
>> might be necessary to alter the hardware type on the fly), to be called
>> by bonding at the time the first slave is added (since slave adds happen
>> in user context, and can therefore hold rtnl as required by most of the
>> multicast address handling code). That seems less hassle than having to
>> specify the hardware type and address length at module load time.
>
>I agree that it would be better to avoid doing it this way.

	Actually, it would be ideal to do it this way in all cases, as
the change of hardware type is the biggest hurdle to cross-hardware
bonding instances. The current infrastructure simply won't allow it,
though, since bonding failover events usually occur in a timer context
(if memory serves, timers run in softirq and can't acquire rtnl).

[...]
>>         Other random thoughts on how to resolve this include modifying
>> bonding to accept slaves when the master is down (which would also
>> require changes to the initscripts that normally configure bonding), so
>> that the initial setting of the, e.g., 224.0.0.1 multicast hardware
>> address happens to the already-changed hardware type.
>
>OK, this is a direction i would like to check. Can be nice if you
>provide me with a 1-2 liner of directions on what need to be changed
>to enable bonding to accept slaves when it down.

	I don't think right offhand this would be a particularly
difficult change; the "up" operation for bonding mostly just starts up
various timers. A few minutes poking around doesn't reveal anything
obvious that would hinder enslaving with the master down. You'll have
to change ifenslave and the sysfs code to allow enslaves with the master
down; that might be all that's needed for bonding itself. Changing
/sbin/ifup and friends is a separate problem.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-09-28 17:43 ` Jay Vosburgh
@ 2006-10-03 13:06 ` Or Gerlitz
2006-10-03 23:10 ` Jay Vosburgh
0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-10-03 13:06 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> Or Gerlitz <or.gerlitz@gmail.com> wrote:
>
>> On 9/27/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>> Or Gerlitz <ogerlitz@voltaire.com> wrote:
> [...]
>>>         You almost want to have some kind of call to induce a reload
>>> from scratch of the multicast filter settings (along with whatever else
>>> might be necessary to alter the hardware type on the fly), to be called
>>> by bonding at the time the first slave is added (since slave adds happen
>>> in user context, and can therefore hold rtnl as required by most of the
>>> multicast address handling code). That seems less hassle than having to
>>> specify the hardware type and address length at module load time.
>> I agree that it would be better to avoid doing it this way.
>
> 	Actually, it would be ideal to do it this way in all cases, as
> the change of hardware type is the biggest hurdle to cross-hardware
> bonding instances. The current infrastructure simply won't allow it,
> though, since bonding failover events usually occur in a timer context
> (if memory serves, timers run in softirq and can't acquire rtnl).

Sorry, but I don't follow... by saying "would be ideal to do ***it***
this way in all cases" what exactly is the "it" you are referring to?

>
> [...]
>>>         Other random thoughts on how to resolve this include modifying
>>> bonding to accept slaves when the master is down (which would also
>>> require changes to the initscripts that normally configure bonding), so
>>> that the initial setting of the, e.g., 224.0.0.1 multicast hardware
>>> address happens to the already-changed hardware type.
>> OK, this is a direction i would like to check. Can be nice if you
>> provide me with a 1-2 liner of directions on what need to be changed
>> to enable bonding to accept slaves when it down.
>
> 	I don't think right offhand this would be a particularly
> difficult change; the "up" operation for bonding mostly just starts up
> various timers. A few minutes poking around doesn't reveal anything
> obvious that would hinder enslaving with the master down. You'll have
> to change ifenslave and the sysfs code to allow enslaves with the master
> down; that might be all that's needed for bonding itself. Changing
> /sbin/ifup and friends is a separate problem.

OK, let's see if I follow:

1st, your current recommendation to solve the link layer address
computation of multicast groups joined by the stack before any
enslavement actually takes place, is to instrument the bonding code
such that it would be possible to enslave devices when the bonding
device is not "up" yet.

2nd, the change needs to be worked out in the bonding sysfs code, the
ifenslave program but ***also*** in packages such as /sbin/ifup and
friends.

???

BTW - is the ifenslave program still supported to work with upstream
(2.6.18 and above) kernels, or was it obsoleted at some point?

Or.

^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-10-03 13:06 ` Or Gerlitz
@ 2006-10-03 23:10 ` Jay Vosburgh
2006-10-04 15:25 ` Or Gerlitz
0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-10-03 23:10 UTC
To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:

>Sorry, but I don't follow... by saying "would be ideal to do ***it*** this
>way in all cases" what exactly is the "it" you are referring to?

It refers to:

>>>> You almost want to have some kind of call to induce a reload
>>>> from scratch of the multicast filter settings (along with whatever else
>>>> might be necessary to alter the hardware type on the fly), to be called
>>>> by bonding at the time the first slave is added (since slave adds happen
>>>> in user context, and can therefore hold rtnl as required by most of the
>>>> multicast address handling code). That seems less hassle than having to
>>>> specify the hardware type and address length at module load time.

Having this would eliminate the need to specify the hardware
type at load time, and would allow changing of the hardware type at
enslave time, rather than at device up time. This requires fewer
changes to other things, like the initscripts or ifenslave.

The ideal would be to allow changing of hardware type at
literally any time, allowing failover across dissimilar hardware types.
That's a lot more complicated, and has a smaller pool of potential uses.

>1st, your current recommendation to solve the link layer address
>computation of multicast groups joined by the stack before any enslavement
>actually takes place, is to instrument the bonding code such that it would
>be possible to enslave devices when the bonding device is not "up" yet.
>
>2nd, the change need to be worked out in the bonding sysfs code, the
>ifenslave program but ***also*** in packages such as /sbin/ifup and
>friends.

Correct. The necessary changes to initscript and sysconfig are
probably the most complex piece to organize (not necessarily the hardest
to implement, but rather the most troublesome to deploy, as it
introduces an API change).

>BTW - is the ifenslave program still supported to work with upstream
>(2.6.18 and above) kernel or it was obsoleted at some point.

Yes, ifenslave is still supported. It probably will be
obsoleted some day (or replaced with a script that uses sysfs), but not
anytime soon. As far as I know, all current distros use ifenslave to
configure bonding.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-10-03 23:10 ` Jay Vosburgh
@ 2006-10-04 15:25 ` Or Gerlitz
2006-10-04 17:34 ` Jay Vosburgh
0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-10-04 15:25 UTC
To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>
>> Sorry, but I don't follow... by saying "would be ideal to do ***it*** this
>> way in all cases" what exactly is the "it" you are referring to?
>
> It refers to:
>
>>>>> You almost want to have some kind of call to induce a reload
>>>>> from scratch of the multicast filter settings (along with whatever else
>>>>> might be necessary to alter the hardware type on the fly), to be called
>>>>> by bonding at the time the first slave is added (since slave adds happen
>>>>> in user context, and can therefore hold rtnl as required by most of the
>>>>> multicast address handling code). That seems less hassle than having to
>>>>> specify the hardware type and address length at module load time.
>
> Having this would eliminate the need to specify the hardware
> type at load time, and would allow changing of the hardware type at
> enslave time, rather than at device up time. This requires fewer
> changes to other things, like the initscripts or ifenslave.
>
> The ideal would be to allow changing of hardware type at
> literally any time, allowing failover across dissimilar hardware types.
> That's a lot more complicated, and has a smaller pool of potential uses.

Thanks for the clarification. I would prefer first trying to go in the
direction you suggest below of changing the ifenslave program and the
kernel bonding code to allow for enslaving while the bonding device is
not UP.

>> 1st, your current recommendation to solve the link layer address
>> computation of multicast groups joined by the stack before any enslavement
>> actually takes place, is to instrument the bonding code such that it would
>> be possible to enslave devices when the bonding device is not "up" yet.
>>
>> 2nd, the change need to be worked out in the bonding sysfs code, the
>> ifenslave program but ***also*** in packages such as /sbin/ifup and
>> friends.
>
> Correct. The necessary changes to initscript and sysconfig are
> probably the most complex piece to organize (not necessarily the hardest
> to implement, but rather the most troublesome to deploy, as it
> introduces an API change).

Looking on the sysconfig package, some tools eg /sbin/if{up,down,status}
use ifenslave which is in turn provided by the iputils package.

My understanding is that changing ifenslave and the bonding kernel code to
allow for enslaving while master is not up is enough, so actually no
change is needed to the sysconfig tools, correct?

I have now removed the two assertions in the bonding code on enslaving
while master is not up and manage to work fine with IPoIB slave devices
and ***without*** the two module params!

When you have the most troublesome to deploy, the troubles you refer to is
make sure that the distros would include ***both*** the bonding kernel
changes and use an iputils package which has the ifenslave changes?

> Yes, ifenslave is still supported. It probably will be
> obsoleted some day (or replaced with a script that uses sysfs), but not
> anytime soon. As far as I know, all current distros use ifenslave to
> configure bonding.

Cool, thanks for bringing this into my attention... I understand now my
patch set should also handle the ifenslave.c source that comes with the
kernel (eg to allow for not setting the hw address etc)

Or.

* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-10-04 15:25 ` Or Gerlitz
@ 2006-10-04 17:34 ` Jay Vosburgh
2006-10-05 14:56 ` Or Gerlitz
0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-10-04 17:34 UTC
To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:
[...]
>Looking on the sysconfig package, some tools eg /sbin/if{up,down,status}
>use ifenslave which is in turn provided by the iputils package.
>
>My understanding is that changing ifenslave and the bonding kernel code to
>allow for enslaving while master is not up is enough, so actually no
>change is needed to the sysconfig tools, correct?

Incorrect. The /sbin/ifup included with sysconfig (I'm looking
at version 0.31-0-15.51) has logic to set the bonding master device up
prior to adding any slaves. E.g.,

    # get up the bonding device before enslaving
    # if ! is_iface_up $INTERFACE; then
    ip link set $INTERFACE up 2>&1
    # fi
    # enslave available slave devices; if there is none -> hard break and log
    MESSAGE=`/sbin/ifenslave $BONDING_OPTIONS $INTERFACE $BSINTERFACES 2>&1`

For your purposes, this would cause it to register as an
ethernet hardware type, not an IB type. The /sbin/ifup included with
initscripts operates a little differently, but also sets the bonding
master up prior to adding any slaves.

>I have now removed the two assertions in the bonding code on enslaving
>while master is not up and manage to work fine with IPoIB slave devices
>and ***without*** the two module params!
>
>When you have the most troublesome to deploy, the troubles you refer to is
>make sure that the distros would include ***both*** the bonding kernel
>changes and use an iputils package which has the ifenslave changes?

Yes. Part of the difficulty is that the changes to the
initscripts and sysconfig packages won't be compatible with versions of
bonding prior to the bonding kernel changes (because older versions of
bonding will refuse to add slaves if the master is down). It might
require adding another API version to bonding, and modifying ifenslave
to work both ways (i.e., with the current "enslave with master up" API,
as well as the new "enslave with master down" API).

>> Yes, ifenslave is still supported. It probably will be
>> obsoleted some day (or replaced with a script that uses sysfs), but not
>> anytime soon. As far as I know, all current distros use ifenslave to
>> configure bonding.
>
>Cool, thanks for bringing this into my attention... I understand now my
>patch set should also handle the ifenslave.c source that comes with the
>kernel (eg to allow for not setting the hw address etc)

An alternate approach would be to undertake the more substantial
task of converting the initscripts and sysconfig code to use sysfs to
configure bonding. This would permit changing the logic (to add slaves
while the bonding master is down, then set it up), as well as remove the
current hacks (present only in sysconfig) to load the bonding module
once per configured bonding interface. The initscripts currently don't
do this (as far as I know), so it's generally only possible to have one
bonding interface under initscripts control.

In this case, ifenslave would continue to work as it does now,
and would simply not be supported for the new hardware.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
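
To make the ordering discussed above concrete (add slaves while the bonding master is down, then set it up, all through sysfs), a rough sketch could look like the following. It is only an illustration and assumes a bonding driver that has already been relaxed to accept slaves while the master is down, which is the change under discussion and not the behaviour of the driver at the time. The sysfs file names (bonding_masters, bonding/mode, bonding/slaves) are the ones described in the bonding kernel documentation; the function name and the SYSFS_NET and IP_CMD variables are hypothetical and exist only so the sketch can be pointed at a scratch directory.

```shell
# Sketch only. SYSFS_NET and IP_CMD are illustrative knobs, not part of
# sysconfig, initscripts or ifenslave.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}
IP_CMD=${IP_CMD:-ip}

# bond_setup_down_first BOND MODE SLAVE...
# Create the bond, set its mode, enslave while the master is still down,
# and only then bring the master up.
bond_setup_down_first() {
    bond=$1; mode=$2; shift 2

    # Create the bonding master through sysfs (no extra module load).
    echo "+$bond" > "$SYSFS_NET/bonding_masters" || return 1

    # Set the mode while the bond is still down and has no slaves.
    echo "$mode" > "$SYSFS_NET/$bond/bonding/mode" || return 1

    # Enslave before "up" (assumes the relaxed driver), so that the master
    # can take on the hardware type of its first slave before it is used.
    for slave in "$@"; do
        echo "+$slave" > "$SYSFS_NET/$bond/bonding/slaves" || return 1
    done

    # Finally bring the master up.
    $IP_CMD link set "$bond" up
}
```

On a real system this might be invoked as `bond_setup_down_first bond0 active-backup ib0 ib1`, with root privileges and the bonding module loaded.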

* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-10-04 17:34 ` Jay Vosburgh
@ 2006-10-05 14:56 ` Or Gerlitz
2006-10-05 18:13 ` Jay Vosburgh
0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-10-05 14:56 UTC
To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>> My understanding is that changing ifenslave and the bonding kernel code to
>> allow for enslaving while master is not up is enough, so actually no
>> change is needed to the sysconfig tools, correct?
>
> Incorrect. The /sbin/ifup included with sysconfig (I'm looking
> at version 0.31-0-15.51) has logic to set the bonding master device up
> prior to adding any slaves. E.g.,
>
> # get up the bonding device before enslaving
> # if ! is_iface_up $INTERFACE; then
> ip link set $INTERFACE up 2>&1
> # fi
> # enslave available slave devices; if there is none -> hard break and log
> MESSAGE=`/sbin/ifenslave $BONDING_OPTIONS $INTERFACE $BSINTERFACES 2>&1`
>
> For your purposes, this would cause it to register as an
> ethernet hardware type, not an IB type. The /sbin/ifup included with
> initscripts operates a little differently, but also sets the bonding
> master up prior to adding any slaves.

OK, you are correct, i agree that the /sbin/ifup would attempt to first
bring up the bonding device so it breaks my assumptions...

> Yes. Part of the difficulty is that the changes to the
> initscripts and sysconfig packages won't be compatible with versions of
> bonding prior to the bonding kernel changes (because older versions of
> bonding will refuse to add slaves if the master is down). It might
> require adding another API version to bonding, and modifying ifenslave
> to work both ways (i.e., with the current "enslave with master up" API,
> as well as the new "enslave with master down" API).

Gee, sounds bad

> An alternate approach would be to undertake the more substantial
> task of converting the initscripts and sysconfig code to use sysfs to
> configure bonding. This would permit changing the logic (to add slaves
> while the bonding master is down, then set it up), as well as remove the
> current hacks (present only in sysconfig) to load the bonding module
> once per configured bonding interface. The initscripts currently don't
> do this (as far as I know), so it's generally only possible to have one
> bonding interface under initscripts control.

This sounds like a good idea to get out of all these troubles...

So the direction to have sysconfig and initscripts tools configure bonding
by sysfs and not by the enslave program is something you were considering
regardless of the needs imposed by bonding support for non ARPHRD_ETHER
netdevices? and you think the distro packages owners would like this?

I will look into the current methods used by sysconfig to configure
bonding and see if i can come up with sketch of how to do it with sysfs.

Basically, i use now my own script working with sysfs in my IPoIB bonding
testing where i have followed the directions in the bonding kernel doc.

Thanks again for all the coaching...

Or.

* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-10-05 14:56 ` Or Gerlitz
@ 2006-10-05 18:13 ` Jay Vosburgh
2006-10-09 13:15 ` Or Gerlitz
0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-10-05 18:13 UTC
To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:

>Jay Vosburgh wrote:
[...]
>> Yes. Part of the difficulty is that the changes to the
>> initscripts and sysconfig packages won't be compatible with versions of
>> bonding prior to the bonding kernel changes (because older versions of
>> bonding will refuse to add slaves if the master is down). It might
>> require adding another API version to bonding, and modifying ifenslave
>> to work both ways (i.e., with the current "enslave with master up" API,
>> as well as the new "enslave with master down" API).
>
>Gee, sounds bad

After some reflection, I suspect it wouldn't be all that awful.
The main concern is going to be whether or not the existing ifenslave
binaries supplied with distros will run with the new version of bonding.
Since the new version of bonding that you're proposing is really just
relaxing the rules (rather than imposing a different, incompatible set
of rules), that's probably not a really big deal. I don't think it
would require a revision change to the bonding ifenslave API.

[...]
>So the direction to have sysconfig and initscripts tools configure bonding
>by sysfs and not by the enslave program is something you were considering
>regardless of the needs imposed by bonding support for non ARPHRD_ETHER
>netdevices? and you think the distro packages owners would like this?

Yes, the long term direction is to have the initscripts
configure bonding via sysfs, either directly or via the step of
converting ifenslave to a script that uses sysfs.

I personally find ifenslave to be more convenient to use than
repeated "echo whatever > /sys/this/that/the/other", but there's no
reason that ifenslave couldn't do the various echo things itself under
the covers.

One drawback to sysfs is that there's no real-time error
reporting; you have to look at dmesg to see if your request succeeded or
not. I'm not sure offhand if, e.g., adding a sysfs file to bonding for
"last-request-status" is a kosher sysfs thing to do; if it is, then an
ifenslave script could check such a thing to figure out error returns.

It seems more logical to me to embed all of the bonding sysfs
magic stuff into a separate script, but the maintainers of initscripts or
sysconfig may see things differently.

The main advantage to either of these (initscripts/sysconfig
and/or ifenslave converted to sysfs) is that it eliminates the need to
load the bonding driver module multiple times to have more than one
bonding device with differing module parameters (because the sysfs
interface can create any number of bonding interfaces with arbitrary
settings).

>I will look into the current methods used by sysconfig to configure
>bonding and see if i can come up with sketch of how to do it with sysfs.

It's probably easier to first convert ifenslave to a sysfs-using
script that the existing initscripts can use.

This allows the changes to be published in stages, rather than
requiring a single flag day changeover. The first stage changes the
bonding driver itself to permit enslavement with the master down
(ensuring that existing ifenslave binaries supplied with reasonably
current distros continue to function). Next, ifenslave is changed to
use sysfs (simultaneously removing the adjustment of the master or
slave's up/down state during enslavement). The next stage either
changes the initscripts/sysconfig to use sysfs directly or change its
use of ifenslave to not do multiple loads of the bonding driver.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
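
As an illustration of how an ifenslave-style script on top of sysfs might cope with the missing real-time error reporting mentioned above, one option is to read the slaves file back after each request instead of consulting dmesg. The sketch below is an assumption-laden example, not the planned implementation: the function names and the SYSFS_NET variable are hypothetical, and it relies only on the bonding/slaves file listing the current slaves separated by whitespace. It does not use the "last-request-status" file floated above, since no such file exists.

```shell
# Sketch only. SYSFS_NET is an illustrative knob so the functions can be
# exercised against a scratch directory.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# is_slave BOND SLAVE
# Succeed if SLAVE is currently listed in BOND's slaves file.
is_slave() {
    # The unquoted substitution splits the whitespace-separated list.
    for s in $(cat "$SYSFS_NET/$1/bonding/slaves" 2>/dev/null); do
        [ "$s" = "$2" ] && return 0
    done
    return 1
}

# enslave_checked BOND SLAVE
# Ask the driver to enslave SLAVE, then verify by reading the list back.
enslave_checked() {
    # Whatever the write itself reports, the read-back below decides.
    { echo "+$2" > "$SYSFS_NET/$1/bonding/slaves"; } 2>/dev/null || :

    if is_slave "$1" "$2"; then
        return 0
    fi
    echo "enslave_checked: $2 is not a slave of $1 (check dmesg)" >&2
    return 1
}
```

The read-back only tells the caller whether the request took effect; the reason for a refusal would still have to come from the kernel log.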

* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
2006-10-05 18:13 ` Jay Vosburgh
@ 2006-10-09 13:15 ` Or Gerlitz
0 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-10-09 13:15 UTC
To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> After some reflection, I suspect it wouldn't be all that awful.
> The main concern is going to be whether or not the existing ifenslave
> binaries supplied with distros will run with the new version of bonding.
> Since the new version of bonding that you're proposing is really just
> relaxing the rules (rather than imposing a different, incompatible set
> of rules), that's probably not a really big deal. I don't think it
> would require a revision change to the bonding ifenslave API.

Indeed, makes sense, the modified bonding driver would work with old
ifenslave binaries.

> Yes, the long term direction is to have the initscripts
> configure bonding via sysfs, either directly or via the step of
> converting ifenslave to a script that uses sysfs.
> I personally find ifenslave to be more convenient to use than
> repeated "echo whatever > /sys/this/that/the/other", but there's no
> reason that ifenslave couldn't do the various echo things itself under
> the covers.
> One drawback to sysfs is that there's no real-time error
> reporting; you have to look at dmesg to see if your request succeeded or
> not. I'm not sure offhand if, e.g., adding a sysfs file to bonding for
> "last-request-status" is a kosher sysfs thing to do; if it is, then an
> ifenslave script could check such a thing to figure out error returns.

Can you check that with someone around?

>
> It seems more logical to me to embed all of the bonding sysfs
> magic stuff into a separate script, but the maintainers of initscripts or
> sysconfig may see things differently.
>
> The main advantage to either of these (initscripts/sysconfig
> and/or ifenslave converted to sysfs) is that it eliminates the need to
> load the bonding driver module multiple times to have more than one
> bonding device with differing module parameters (because the sysfs
> interface can create any number of bonding interfaces with arbitrary
> settings).
>
>> I will look into the current methods used by sysconfig to configure
>> bonding and see if i can come up with sketch of how to do it with sysfs.
>
> It's probably easier to first convert ifenslave to a sysfs-using
> script that the existing initscripts can use.
>
> This allows the changes to be published in stages, rather than
> requiring a single flag day changeover. The first stage changes the
> bonding driver itself to permit enslavement with the master down
> (ensuring that existing ifenslave binaries supplied with reasonably
> current distros continue to function). Next, ifenslave is changed to
> use sysfs (simultaneously removing the adjustment of the master or
> slave's up/down state during enslavement). The next stage either
> changes the initscripts/sysconfig to use sysfs directly or change its
> use of ifenslave to not do multiple loads of the bonding driver.

This plan makes much sense! However, one way or another (ie whether
sysconfig tools are modified to use sysfs or ifenslave becomes a script
that uses sysfs) there should be a change to sysconfig tools (specifically
/sbin/ifup) in the place where it first makes the bonding interface UP and
only later enslaves the slave devices (eg the quote below from /sbin/ifup
of sysconfig-0.50.9-13.8 that comes with SLES10), correct?

> # get up the bonding device before enslaving
> # if ! is_iface_up $INTERFACE; then
> ip link set $INTERFACE up 2>&1
> # fi
> # enslave available slave devices; if there is none -> hard break and log
> MESSAGE=`/sbin/ifenslave $BONDING_OPTIONS $INTERFACE $BSINTERFACES 2>&1`

So this becomes the fourth step in the plan. And the most fragile aspect
of the plan is the fact that ***two*** packages need to be changed, as
/sbin/ifenslave is not part of sysconfig but rather of (eg on SLES10)
iputils-ss021109-167.2

Or.
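
For illustration, the reordering of the /sbin/ifup logic asked about above, combined with the earlier idea of working "both ways", might be sketched as follows. This is not the sysconfig code and not a proposed patch: the function name and the IP_CMD and IFENSLAVE variables are hypothetical, and the sketch assumes an ifenslave that no longer brings the master up by itself. The fallback branch only preserves today's behaviour on older bonding drivers, where a master brought up before enslaving still registers with the ethernet hardware type, so it would not help IPoIB slaves.

```shell
# Sketch only. IP_CMD and IFENSLAVE are illustrative knobs that default
# to the real tools.
IP_CMD=${IP_CMD:-ip}
IFENSLAVE=${IFENSLAVE:-/sbin/ifenslave}

# bond_up_compat BOND SLAVE...
bond_up_compat() {
    bond=$1; shift

    # New order: enslave while the master is down, then bring it up.
    if $IFENSLAVE "$bond" "$@" 2>/dev/null; then
        $IP_CMD link set "$bond" up
        return
    fi

    # Older bonding drivers refuse to enslave with the master down:
    # fall back to the current order (master up first, then enslave).
    $IP_CMD link set "$bond" up || return 1
    $IFENSLAVE "$bond" "$@"
}
```

Trying the new order first and falling back keeps a single script usable across kernels with and without the relaxed bonding driver, at the cost of one failed enslave attempt on the older ones.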