[RFC][PATCH 0/3] bonding support for operation over IPoIB

netdev.vger.kernel.org archive mirror

* [RFC][PATCH 0/3] bonding support for operation over IPoIB
@ 2006-09-26 10:16 Or Gerlitz
  2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-09-26 10:16 UTC (permalink / raw)
  To: netdev; +Cc: Roland Dreier

This patch series is an RFC for changes to the bonding driver so that it can
support non-ARPHRD_ETHER netdevices in its High-Availability (active-backup)
mode.

My motivation was to enable the bonding driver, in its HA mode, to work with the
IP over InfiniBand (IPoIB) driver. With these patches I was able to enslave
IPoIB netdevices and run TCP, UDP, IP (UDP) multicast and ICMP traffic, with
fail-over and fail-back working fine.

Moreover, IPoIB is also the IB ARP provider for the RDMA CM driver, which is
used by native IB ULPs whose addressing scheme is based on IP (e.g. iSER, SDP,
Lustre, NFSoRDMA, RDS), so bonding support for IPoIB devices **enables** HA for
these ULPs. This works because when the IB HW notifies the ULP of the failure
of the current IB connection, the ULP just needs to reconnect, and the bonding
device will then issue the IB ARP over the active IPoIB slave.

The first patch sets some of the bond netdevice attributes and functions to
those of the active slave when the enslaved device is not of ARPHRD_ETHER type.
Basically, it overrides the settings made by ether_setup(), which are netdevice
**type** dependent and hence may not be appropriate for devices of other types.

The IPoIB MAC address (see Documentation/infiniband/ipoib.txt) is made of a
3-byte IB QP (Queue Pair) number and the 16-byte GID (Global ID) of the IB port
this IPoIB device is bound to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (I have omitted some details here which
are not important for the bonding RFC).
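The address layout described above can be sketched as a C struct. This is a user-space illustration only, assuming the 20-byte link-layer format of RFC 4391, whose leading reserved/flags byte is one of the details omitted above; the names ipoib_hwaddr and ipoib_hwaddr_qpn() are hypothetical, not taken from the IPoIB driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Length of an IPoIB link-layer address per RFC 4391 (assumption for
 * this sketch): 1 reserved/flags byte + 3-byte QPN + 16-byte GID. */
#define IPOIB_HWADDR_LEN 20

/* Hypothetical view of the address; all members are bytes, so there
 * is no padding and the struct is exactly 20 bytes. */
struct ipoib_hwaddr {
	uint8_t flags;   /* reserved/flags byte, not mentioned in the text */
	uint8_t qpn[3];  /* IB QP number, most significant byte first */
	uint8_t gid[16]; /* GID of the IB port the device is bound to */
};

_Static_assert(sizeof(struct ipoib_hwaddr) == IPOIB_HWADDR_LEN,
	       "IPoIB hardware address must be 20 bytes");

/* Extract the 24-bit QP number from a raw 20-byte address. */
static uint32_t ipoib_hwaddr_qpn(const uint8_t addr[IPOIB_HWADDR_LEN])
{
	struct ipoib_hwaddr a;

	memcpy(&a, addr, sizeof(a));
	return ((uint32_t)a.qpn[0] << 16) | ((uint32_t)a.qpn[1] << 8) |
	       a.qpn[2];
}
```

Since the QPN is allocated by the HW and the GID is burned into the HCA, neither part is something software can freely rewrite, which is why a set_mac_address() operation does not fit this device type.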

Basically, neither the IPoIB spec nor the implementation allows setting the MAC
address of an IPoIB device, and my work was done under this assumption.

The second patch allows enslaving netdevices which do not support the
set_mac_address() function. In that case the bond MAC address is that of the
active slave, and remote peers are notified of the MAC address (neighbour)
change by a Gratuitous ARP sent by the bonding code when fail-over occurs (this
was already in the bonding code).
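The enslavement policy of the second patch can be modeled in user space as below. struct bond_model and bond_model_enslave() are hypothetical stand-ins; the boolean argument plays the role of slave_dev->set_mac_address != NULL, and the flag mirrors bond->do_set_mac_addr, which the patch initializes to 1 in bond_init().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the bond state touched by patch 2/3. */
struct bond_model {
	int slave_cnt;
	bool do_set_mac_addr; /* true when the bond is created */
};

/* Returns 0 if the slave may be enslaved, -EOPNOTSUPP otherwise.
 * slave_can_set_mac models slave_dev->set_mac_address != NULL. */
static int bond_model_enslave(struct bond_model *bond, bool slave_can_set_mac)
{
	if (!slave_can_set_mac) {
		if (bond->slave_cnt == 0) {
			/* first slave: bond MAC will follow the active slave */
			bond->do_set_mac_addr = false;
		} else if (bond->do_set_mac_addr) {
			/* bond already rewrites slave MACs; refuse this one */
			return -EOPNOTSUPP;
		}
	}
	bond->slave_cnt++;
	return 0;
}
```

With this policy, a bond whose first slave cannot have its MAC set never rewrites slave MAC addresses, while a bond that started out rewriting them refuses any later slave that cannot accept one.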

The third patch is, I hope, temporary, and is currently required to run IP
multicast when bonding IPoIB devices. The problem is that some multicast groups
(e.g. the all-hosts group 224.0.0.1) might be added to the bonding device by the
net stack **before** any enslavement takes place.

Since ether_setup() sets the bonding device type to ARPHRD_ETHER and the address
length to ETH_ALEN, the net core code computes a wrong multicast link address.
The current IPoIB implementation then attempts to join this wrong multicast
address and stops processing further join requests. As a result, IP multicast
over other groups, whose link addresses are computed correctly, does not work.
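To illustrate why the address length matters, here is a sketch of two IPv4 multicast mappings: the Ethernet one (RFC 1112) that applies while the bond still looks like ARPHRD_ETHER, and a simplified IPoIB one following RFC 4391. The function names are hypothetical, the group is taken in host byte order, and the IPoIB P_Key bytes are left zero as a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN        6
#define INFINIBAND_ALEN 20

/* IPv4 multicast -> Ethernet: 01:00:5e followed by the low 23 bits
 * of the group address (RFC 1112). */
static void ip_to_eth_mc(uint32_t group, uint8_t out[ETH_ALEN])
{
	out[0] = 0x01;
	out[1] = 0x00;
	out[2] = 0x5e;
	out[3] = (group >> 16) & 0x7f;
	out[4] = (group >> 8) & 0xff;
	out[5] = group & 0xff;
}

/* IPv4 multicast -> IPoIB (simplified RFC 4391 mapping): multicast
 * QPN 0xffffff, then an MGID ff12:401b:<P_Key>:: whose last 28 bits
 * are the low 28 bits of the group.  P_Key left zero (assumption). */
static void ip_to_ib_mc(uint32_t group, uint8_t out[INFINIBAND_ALEN])
{
	memset(out, 0, INFINIBAND_ALEN);
	out[1] = out[2] = out[3] = 0xff; /* multicast QPN */
	out[4] = 0xff;                   /* MGID multicast prefix */
	out[5] = 0x12;                   /* transient flag, link-local scope */
	out[6] = 0x40;                   /* IPv4 signature 0x401b */
	out[7] = 0x1b;
	out[16] = (group >> 24) & 0x0f;  /* low 28 bits of the group */
	out[17] = (group >> 16) & 0xff;
	out[18] = (group >> 8) & 0xff;
	out[19] = group & 0xff;
}
```

For the all-hosts group 224.0.0.1 the Ethernet mapping yields the 6-byte 01:00:5e:00:00:01, which is not a valid 20-byte IPoIB address; this is the kind of mismatch that arises when the group is added while the bond still has the Ethernet type and address length.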

Or Gerlitz.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
  2006-09-26 10:16 [RFC][PATCH 0/3] bonding support for operation over IPoIB Or Gerlitz
@ 2006-09-26 10:17 ` Or Gerlitz
  2006-09-26 19:23   ` Jay Vosburgh
  2006-09-26 10:17 ` [RFC][PATCH 2/3] enable bonding to enslave netdevices not supporting set_mac_address() Or Gerlitz
  2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
  2 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-09-26 10:17 UTC (permalink / raw)
  To: netdev; +Cc: Roland Dreier

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

Index: net-2.6.19/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.19.orig/drivers/net/bonding/bond_main.c	2006-09-20 14:40:13.000000000 +0300
+++ net-2.6.19/drivers/net/bonding/bond_main.c	2006-09-25 11:43:52.000000000 +0300
@@ -1013,6 +1013,23 @@ static struct slave *bond_find_best_slav
 	return bestslave;
 }

+void bond_setup_by_slave(struct bonding *bond, struct slave *new_active)
+{
+	bond->dev->hard_header	        = new_active->dev->hard_header;
+	bond->dev->rebuild_header       = new_active->dev->rebuild_header;
+	bond->dev->hard_header_cache	= new_active->dev->hard_header_cache;
+	bond->dev->header_cache_update  = new_active->dev->header_cache_update;
+	bond->dev->hard_header_parse	= new_active->dev->hard_header_parse;
+
+	bond->dev->type		    = new_active->dev->type;
+	bond->dev->hard_header_len  = new_active->dev->hard_header_len;
+	bond->dev->mtu		    = new_active->dev->mtu;
+	bond->dev->addr_len	    = new_active->dev->addr_len;
+
+	memcpy(bond->dev->broadcast, new_active->dev->broadcast,
+		new_active->dev->addr_len);
+}
+
 /**
  * change_active_interface - change the active slave into the specified one
  * @bond: our bonding struct
@@ -1091,6 +1108,14 @@ void bond_change_active_slave(struct bon
 		if (new_active) {
 			bond_set_slave_active_flags(new_active);
 		}
+
+		/* bonding netdevices are created with ether_setup, so when the
+		 * slave type is not ARPHRD_ETHER there is a need to override
+		 * some of the type dependent attributes/functions
+		 */
+		if (new_active && new_active->dev->type != ARPHRD_ETHER)
+			bond_setup_by_slave(bond, new_active);
+
 		bond_send_gratuitous_arp(bond);
 	}
 }



* [RFC][PATCH 2/3] enable bonding to enslave netdevices not supporting set_mac_address()
  2006-09-26 10:16 [RFC][PATCH 0/3] bonding support for operation over IPoIB Or Gerlitz
  2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
@ 2006-09-26 10:17 ` Or Gerlitz
  2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
  2 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-09-26 10:17 UTC (permalink / raw)
  To: netdev; +Cc: Roland Dreier

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

Index: net-2.6.19/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.19.orig/drivers/net/bonding/bond_main.c	2006-09-25 11:43:52.000000000 +0300
+++ net-2.6.19/drivers/net/bonding/bond_main.c	2006-09-25 11:46:35.000000000 +0300
@@ -1115,7 +1115,14 @@ void bond_change_active_slave(struct bon
 		 */
 		if (new_active && new_active->dev->type != ARPHRD_ETHER)
 			bond_setup_by_slave(bond, new_active);
-
+
+		/* when bonding does not set the slave MAC address, the bond MAC
+		 * address is the one of the active slave.
+		 */
+		if (new_active && !bond->do_set_mac_addr)
+			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
+				new_active->dev->addr_len);
+
 		bond_send_gratuitous_arp(bond);
 	}
 }
@@ -1335,14 +1342,23 @@ int bond_enslave(struct net_device *bond
 	}

 	if (slave_dev->set_mac_address == NULL) {
-		printk(KERN_ERR DRV_NAME
-			": %s: Error: The slave device you specified does "
-			"not support setting the MAC address. "
-			"Your kernel likely does not support slave "
-			"devices.\n", bond_dev->name);
-  		res = -EOPNOTSUPP;
-		goto err_undo_flags;
-	}
+		if (bond->slave_cnt == 0) {
+			printk(KERN_WARNING DRV_NAME
+				": %s: Warning: The first slave device you "
+				"specified does not support setting the MAC "
+				"address. This bond MAC address would be that "
+				"of the active slave.\n", bond_dev->name);
+			bond->do_set_mac_addr = 0;
+		} else if (bond->do_set_mac_addr) {
+			printk(KERN_ERR DRV_NAME
+				": %s: Error: The slave device you specified "
+				"does not support setting the MAC address, "
+				"but this bond requires it.\n",
+				bond_dev->name);
+			res = -EOPNOTSUPP;
+			goto err_undo_flags;
+		}
+	}

 	new_slave = kmalloc(sizeof(struct slave), GFP_KERNEL);
 	if (!new_slave) {
@@ -1364,16 +1380,18 @@ int bond_enslave(struct net_device *bond
 	 */
 	memcpy(new_slave->perm_hwaddr, slave_dev->dev_addr, ETH_ALEN);

-	/*
-	 * Set slave to master's mac address.  The application already
-	 * set the master's mac address to that of the first slave
-	 */
-	memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
-	addr.sa_family = slave_dev->type;
-	res = dev_set_mac_address(slave_dev, &addr);
-	if (res) {
-		dprintk("Error %d calling set_mac_address\n", res);
-		goto err_free;
+	if (bond->do_set_mac_addr) {
+		/*
+		 * Set slave to master's mac address.  The application already
+		 * set the master's mac address to that of the first slave
+		 */
+		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
+		addr.sa_family = slave_dev->type;
+		res = dev_set_mac_address(slave_dev, &addr);
+		if (res) {
+			dprintk("Error %d calling set_mac_address\n", res);
+			goto err_free;
+		}
 	}

 	/* open the slave since the application closed it */
@@ -1617,9 +1635,11 @@ err_close:
 	dev_close(slave_dev);

 err_restore_mac:
-	memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
-	addr.sa_family = slave_dev->type;
-	dev_set_mac_address(slave_dev, &addr);
+	if (bond->do_set_mac_addr) {
+		memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
+		addr.sa_family = slave_dev->type;
+		dev_set_mac_address(slave_dev, &addr);
+	}

 err_free:
 	kfree(new_slave);
@@ -1797,10 +1817,12 @@ int bond_release(struct net_device *bond
 	/* close slave before restoring its mac address */
 	dev_close(slave_dev);

-	/* restore original ("permanent") mac address */
-	memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
-	addr.sa_family = slave_dev->type;
-	dev_set_mac_address(slave_dev, &addr);
+	if (bond->do_set_mac_addr) {
+		/* restore original ("permanent") mac address */
+		memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
+		addr.sa_family = slave_dev->type;
+		dev_set_mac_address(slave_dev, &addr);
+	}

 	slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
 				   IFF_SLAVE_INACTIVE);
@@ -1886,10 +1908,12 @@ static int bond_release_all(struct net_d
 		/* close slave before restoring its mac address */
 		dev_close(slave_dev);

-		/* restore original ("permanent") mac address*/
-		memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
-		addr.sa_family = slave_dev->type;
-		dev_set_mac_address(slave_dev, &addr);
+		if (bond->do_set_mac_addr) {
+			/* restore original ("permanent") mac address*/
+			memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
+			addr.sa_family = slave_dev->type;
+			dev_set_mac_address(slave_dev, &addr);
+		}

 		slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
 					   IFF_SLAVE_INACTIVE);
@@ -3793,6 +3817,10 @@ static int bond_set_mac_address(struct n

 	dprintk("bond=%p, name=%s\n", bond, (bond_dev ? bond_dev->name : "None"));

+	if (!bond->do_set_mac_addr) {
+		return -EOPNOTSUPP;
+	}
+
 	if (!is_valid_ether_addr(sa->sa_data)) {
 		return -EADDRNOTAVAIL;
 	}
@@ -4233,6 +4261,9 @@ static int bond_init(struct net_device *
 	bond_create_proc_entry(bond);
 #endif

+	/* set do_set_mac_addr to true on startup */
+	bond->do_set_mac_addr = 1;
+
 	list_add_tail(&bond->bond_list, &bond_dev_list);

 	return 0;
Index: net-2.6.19/drivers/net/bonding/bonding.h
===================================================================
--- net-2.6.19.orig/drivers/net/bonding/bonding.h	2006-09-25 11:42:28.000000000 +0300
+++ net-2.6.19/drivers/net/bonding/bonding.h	2006-09-25 11:46:35.000000000 +0300
@@ -198,6 +198,7 @@ struct bonding {
 	struct   bond_params params;
 	struct   list_head vlan_list;
 	struct   vlan_group *vlgrp;
+	s8       do_set_mac_addr;
 };

 /**



* [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-09-26 10:16 [RFC][PATCH 0/3] bonding support for operation over IPoIB Or Gerlitz
  2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
  2006-09-26 10:17 ` [RFC][PATCH 2/3] enable bonding to enslave netdevices not supporting set_mac_address() Or Gerlitz
@ 2006-09-26 10:18 ` Or Gerlitz
  2006-09-26 17:05   ` Stephen Hemminger
  2006-09-26 23:40   ` Jay Vosburgh
  2 siblings, 2 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-09-26 10:18 UTC (permalink / raw)
  To: netdev; +Cc: Roland Dreier

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

Index: net-2.6.19/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.19.orig/drivers/net/bonding/bond_main.c	2006-09-25 11:46:35.000000000 +0300
+++ net-2.6.19/drivers/net/bonding/bond_main.c	2006-09-26 10:54:44.000000000 +0300
@@ -128,6 +128,12 @@ MODULE_PARM_DESC(arp_interval, "arp inte
 module_param_array(arp_ip_target, charp, NULL, 0);
 MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");

+static int bonding_dev_type    = ARPHRD_ETHER;
+static int bonding_dev_addrlen = ETH_ALEN;
+
+module_param(bonding_dev_type,    int, 0644);
+module_param(bonding_dev_addrlen, int, 0644);
+
 /*----------------------------- Global variables ----------------------------*/

 static const char * const version =
@@ -4606,7 +4612,14 @@ int bond_create(char *name, struct bond_
 		res = -ENOMEM;
 		goto out_rtnl;
 	}
-
+
+	/* XXX set the bond dev type and addr len such that the net core code
+	* (eg arp_mc_map() in net/ipv4/arp.c) would correctly process multicast
+	* groups set ***before*** the first enslaveness
+	*/
+	bond_dev->type     = bonding_dev_type;
+	bond_dev->addr_len = bonding_dev_addrlen;
+
 	/* bond_init() must be called after dev_alloc_name() (for the
 	 * /proc files), but before register_netdevice(), because we
 	 * need to set function pointers.



* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
@ 2006-09-26 17:05   ` Stephen Hemminger
  2006-09-27 20:16     ` Or Gerlitz
  2006-09-26 23:40   ` Jay Vosburgh
  1 sibling, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2006-09-26 17:05 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, Roland Dreier

On Tue, 26 Sep 2006 13:18:09 +0300 (IDT)
Or Gerlitz <ogerlitz@voltaire.com> wrote:

> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
> 
> Index: net-2.6.19/drivers/net/bonding/bond_main.c
> ===================================================================
> --- net-2.6.19.orig/drivers/net/bonding/bond_main.c	2006-09-25 11:46:35.000000000 +0300
> +++ net-2.6.19/drivers/net/bonding/bond_main.c	2006-09-26 10:54:44.000000000 +0300
> @@ -128,6 +128,12 @@ MODULE_PARM_DESC(arp_interval, "arp inte
>  module_param_array(arp_ip_target, charp, NULL, 0);
>  MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
> 
> +static int bonding_dev_type    = ARPHRD_ETHER;
> +static int bonding_dev_addrlen = ETH_ALEN;
> +
> +module_param(bonding_dev_type,    int, 0644);
> +module_param(bonding_dev_addrlen, int, 0644);

Do you really want to allow changing these values after module load?
If not, replace 0644 with 0.


* Re: [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
  2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
@ 2006-09-26 19:23   ` Jay Vosburgh
  2006-09-27 19:59     ` Or Gerlitz
  0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-09-26 19:23 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:
[...]
+	bond->dev->mtu		    = new_active->dev->mtu;

	This won't generate a NETDEV_CHANGEMTU notifier event.
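A minimal user-space model of the point above: writing the field directly bypasses whatever a setter would notify. In the kernel the corresponding setter is dev_set_mtu(), which raises NETDEV_CHANGEMTU for a device that is up; the model_* names here are hypothetical and the model ignores the up/down condition.

```c
#include <assert.h>

/* Hypothetical event code standing in for NETDEV_CHANGEMTU. */
enum { MODEL_CHANGEMTU = 1 };

struct model_dev {
	int mtu;
	int events_seen; /* number of MTU-change notifications delivered */
};

/* Stand-in for the notifier chain: just count MTU-change events. */
static void model_notify(struct model_dev *dev, int event)
{
	if (event == MODEL_CHANGEMTU)
		dev->events_seen++;
}

/* Setter: updates the MTU and notifies listeners, skipping no-op changes. */
static void model_set_mtu(struct model_dev *dev, int new_mtu)
{
	if (dev->mtu == new_mtu)
		return;
	dev->mtu = new_mtu;
	model_notify(dev, MODEL_CHANGEMTU);
}
```

Assigning `dev.mtu` directly, as bond_setup_by_slave() does, changes the value while `events_seen` stays at zero; going through the setter is what delivers the notification.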

[...]
>+		/* bonding netdevices are created with ether_setup, so when the
>+		 * slave type is not ARPHRD_ETHER there is a need to override
>+		 * some of the type dependent attributes/functions
>+		 */
>+		if (new_active && new_active->dev->type != ARPHRD_ETHER)
>+			bond_setup_by_slave(bond, new_active);
>+

	In this case, if the bond has one slave that's ARPHRD_ETHER and
one that's not, when the active changes from the non-ARPHRD_ETHER slave
to the ARPHRD_ETHER slave, it won't call bond_setup_by_slave() to switch
the hard_header, rebuild_header, et al, back to the ARPHRD_ETHER
settings.
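The scenario described above can be reproduced with a small model. The model_* names are hypothetical, the struct keeps only two of the copied attributes, and the type constants use the values of ARPHRD_ETHER (1) and ARPHRD_INFINIBAND (32). The symmetric variant is shown only as one possible alternative, not as a proposed patch.

```c
#include <assert.h>

#define MODEL_ARPHRD_ETHER      1  /* value of ARPHRD_ETHER */
#define MODEL_ARPHRD_INFINIBAND 32 /* value of ARPHRD_INFINIBAND */

/* Hypothetical subset of the type-dependent netdevice attributes. */
struct model_dev {
	int type;
	int addr_len;
};

/* Rule from patch 1/3: copy from the new active slave only when it
 * is not an Ethernet device. */
static void change_active_one_way(struct model_dev *bond,
				  const struct model_dev *slave)
{
	if (slave->type != MODEL_ARPHRD_ETHER)
		*bond = *slave;
}

/* Alternative: always follow the new active slave. */
static void change_active_symmetric(struct model_dev *bond,
				    const struct model_dev *slave)
{
	*bond = *slave;
}
```

Failing over from an InfiniBand slave to an Ethernet slave with the one-way rule leaves the bond with type 32 and a 20-byte address length, which are stale settings for the now-active Ethernet slave.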

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
  2006-09-26 17:05   ` Stephen Hemminger
@ 2006-09-26 23:40   ` Jay Vosburgh
  2006-09-27 20:12     ` Or Gerlitz
  1 sibling, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-09-26 23:40 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:

>+	/* XXX set the bond dev type and addr len such that the net core code
>+	* (eg arp_mc_map() in net/ipv4/arp.c) would correctly process multicast
>+	* groups set ***before*** the first enslaveness
>+	*/
>+	bond_dev->type     = bonding_dev_type;
>+	bond_dev->addr_len = bonding_dev_addrlen;

	I've been thinking about this a little bit more.  The system is
understandably not set up to deal with this situation, since normal
devices won't ever change their hardware type.

	You almost want to have some kind of call to induce a reload
from scratch of the multicast filter settings (along with whatever else
might be necessary to alter the hardware type on the fly), to be called
by bonding at the time the first slave is added (since slave adds happen
in user context, and can therefore hold rtnl as required by most of the
multicast address handling code).  That seems less hassle than having to
specify the hardware type and address length at module load time.

	A side effect of this is that bonds would have to be restricted
to consisting only of slaves of one hardware type, since slave changes
(and thus hardware type changes) aren't necessarily restricted to user
context.

	Other random thoughts on how to resolve this include modifying
bonding to accept slaves when the master is down (which would also
require changes to the initscripts that normally configure bonding), so
that the initial setting of the, e.g., 224.0.0.1 multicast hardware
address happens to the already-changed hardware type.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
  2006-09-26 19:23   ` Jay Vosburgh
@ 2006-09-27 19:59     ` Or Gerlitz
  2006-09-28 17:02       ` Jay Vosburgh
  0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-09-27 19:59 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Or Gerlitz, netdev, Roland Dreier

On 9/26/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
> [...]
> +       bond->dev->mtu              = new_active->dev->mtu;
>
>         This won't generate a NETDEV_CHANGEMTU notifier event.

What actually triggers the event in the current implementation: the
code that calls dev_set_mtu() on the bonding device, or
dev_set_mtu() itself?

>
> [...]
> >+              /* bonding netdevices are created with ether_setup, so when the
> >+               * slave type is not ARPHRD_ETHER there is a need to override
> >+               * some of the type dependent attributes/functions
> >+               */
> >+              if (new_active && new_active->dev->type != ARPHRD_ETHER)
> >+                      bond_setup_by_slave(bond, new_active);
> >+

>         In this case, if the bond has one slave that's ARPHRD_ETHER and
> one that's not, when the active changes from the non-ARPHRD_ETHER slave
> to the ARPHRD_ETHER slave, it won't call bond_setup_by_slave() to switch
> the hard_header, rebuild_header, et al, back to the ARPHRD_ETHER
> settings.

OK. First, under the assumption that one may enslave ARPHRD_ETHER and
non-ARPHRD_ETHER devices in the same bond, you are correct and the
patch is not complete here.

However, putting devices of different types in the same bond requires
a switch that the HW NICs/ports of **both** netdevices can talk to. If
there is no such switch, then the only possible config is two isolated
networks/switches, where each NIC/type is connected to a switch
supporting its type, so a local failure/failover on some node requires
the whole subset of nodes talking to that node to fail over as well. So
if the relation (i,j), which holds if node i talks to node j, does not
impose a disjoint partition on the set of all N nodes, you just can't
use this bonding scheme.

In practice, for IPoIB vs. "IPoETH" (i.e. slave devices of type
ARPHRD_INFINIBAND vs. slaves of type ARPHRD_ETHER), to have an IPoIB
slave talk to an "IPoETH" slave you need an IB-to-Ethernet IP router
(actually an IPoIB-to-IPoETH "bridge") in the middle, where the IB switch
is connected to the IB ports of the bridge and the Ethernet switch to
the Ethernet ports of the bridge. All in all, I think it is a
configuration we can avoid supporting.

So, bottom line, I would enhance my patch to disallow bonding together
devices of different types or, if you don't mind, at least to disallow
putting ARPHRD_INFINIBAND and non-ARPHRD_INFINIBAND devices in the same
bond.
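A sketch of the enslave-time restriction proposed above, assuming the first slave fixes the bond's hardware type and every later slave must match it. The names and the choice of -EINVAL are hypothetical; the constants use the values of ARPHRD_ETHER and ARPHRD_INFINIBAND.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_ARPHRD_ETHER      1  /* value of ARPHRD_ETHER */
#define MODEL_ARPHRD_INFINIBAND 32 /* value of ARPHRD_INFINIBAND */

/* Hypothetical bond state for the type check. */
struct model_bond {
	int slave_cnt;
	int type; /* meaningful once the first slave is enslaved */
};

/* The first slave fixes the bond type; later slaves must match it.
 * -EINVAL is an assumed error code for a mismatching slave. */
static int model_check_slave_type(struct model_bond *bond, int slave_type)
{
	if (bond->slave_cnt == 0)
		bond->type = slave_type;
	else if (slave_type != bond->type)
		return -EINVAL;
	bond->slave_cnt++;
	return 0;
}
```

A rejected slave leaves slave_cnt unchanged, so the bond keeps the type and members it already had.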

Or.


* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-09-26 23:40   ` Jay Vosburgh
@ 2006-09-27 20:12     ` Or Gerlitz
  2006-09-28 17:43       ` Jay Vosburgh
  0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-09-27 20:12 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Or Gerlitz, netdev, Roland Dreier

On 9/27/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>
> >+      /* XXX set the bond dev type and addr len such that the net core code
> >+      * (eg arp_mc_map() in net/ipv4/arp.c) would correctly process multicast
> >+      * groups set ***before*** the first enslaveness
> >+      */
> >+      bond_dev->type     = bonding_dev_type;
> >+      bond_dev->addr_len = bonding_dev_addrlen;
>
>         I've been thinking about this a little bit more.  The system is
> understandably not set up to deal with this situation, since normal
> devices won't ever change their hardware type.
>
>         You almost want to have some kind of call to induce a reload
> from scratch of the multicast filter settings (along with whatever else
> might be necessary to alter the hardware type on the fly), to be called
> by bonding at the time the first slave is added (since slave adds happen
> in user context, and can therefore hold rtnl as required by most of the
> multicast address handling code).  That seems less hassle than having to
> specify the hardware type and address length at module load time.

I agree that it would be better to avoid doing it this way.
>
>         A side effect of this is that bonds would have to be restricted
> to consisting only of slaves of one hardware type, since slave changes
> (and thus hardware type changes) aren't necessarily restricted to user
> context.

I have addressed the point of putting slaves of different types (and
specifically a slave of type ARPHRD_INFINIBAND with a slave of another type
in the same bond) in the thread that goes with patch 1/3; let's close it
there...

>         Other random thoughts on how to resolve this include modifying
> bonding to accept slaves when the master is down (which would also
> require changes to the initscripts that normally configure bonding), so
> that the initial setting of the, e.g., 224.0.0.1 multicast hardware
> address happens to the already-changed hardware type.

OK, this is a direction I would like to check. It would be nice if you
could give me a one- or two-line pointer to what needs to be changed
to enable bonding to accept slaves while it is down.

Or.


* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-09-26 17:05   ` Stephen Hemminger
@ 2006-09-27 20:16     ` Or Gerlitz
  0 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-09-27 20:16 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Or Gerlitz, netdev, Roland Dreier

On 9/26/06, Stephen Hemminger <shemminger@osdl.org> wrote:
> On Tue, 26 Sep 2006 13:18:09 +0300 (IDT)
> Or Gerlitz <ogerlitz@voltaire.com> wrote:

> > +module_param(bonding_dev_type,    int, 0644);
> > +module_param(bonding_dev_addrlen, int, 0644);

> Do you really want to allow changing these values after module load?
> If not, replace 0644 with 0.

Nope, they are meant to be used only at load time; thanks for the
comment. However, as I mentioned, this patch is a temporary workaround to
get IP multicast supported when bonding non-ARPHRD_ETHER devices. I am
seeking better ways to do that.

Or.


* Re: [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
  2006-09-27 19:59     ` Or Gerlitz
@ 2006-09-28 17:02       ` Jay Vosburgh
  2006-10-03 12:56         ` Or Gerlitz
  0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-09-28 17:02 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Or Gerlitz, netdev, Roland Dreier

Or Gerlitz <or.gerlitz@gmail.com> wrote:

>On 9/26/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>> [...]
>> +       bond->dev->mtu              = new_active->dev->mtu;
>>
>>         This won't generate a NETDEV_CHANGEMTU notifier event.
>
>What actually triggers the event in the current implementation: the
>code that calls dev_set_mtu() on the bonding device, or
>dev_set_mtu() itself?

	My comment wasn't quite totally thought out; pretend you didn't
see it.

	I think what would be better overall is to handle the mtu for
this case the way bonding handles the mtu for other slave devices.
Normally, the mtu is pushed to the slaves from the bonding master, not
the other way around.  So, you don't want to assign the master's mtu
here; the slave mtu should already be up to date (and set to whatever
the master's mtu is via the usual mechanism, bond_change_mtu for
changes, or set in the slave at enslavement time).
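The master-to-slave MTU direction described above can be modeled as follows. This is a hypothetical sketch: model_change_mtu() loosely corresponds to bond_change_mtu() calling dev_set_mtu() on each slave, and model_enslave() shows the slave inheriting the master's MTU at enslavement time; nothing is ever copied from a slave back to the master.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SLAVES 4

/* Hypothetical bond with a fixed-size slave table. */
struct model_bond {
	int mtu;
	size_t slave_cnt;
	int slave_mtu[MODEL_MAX_SLAVES];
};

/* Push a new master MTU down to every current slave. */
static void model_change_mtu(struct model_bond *bond, int new_mtu)
{
	bond->mtu = new_mtu;
	for (size_t i = 0; i < bond->slave_cnt; i++)
		bond->slave_mtu[i] = new_mtu;
}

/* Add a slave, giving it the master's current MTU; -1 if the table is full. */
static int model_enslave(struct model_bond *bond)
{
	if (bond->slave_cnt == MODEL_MAX_SLAVES)
		return -1;
	bond->slave_mtu[bond->slave_cnt++] = bond->mtu;
	return 0;
}
```

Under this scheme the master's MTU is the single source of truth, so a failover has no reason to assign the master's MTU from the new active slave.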

[...]
>So, bottom line, I would enhance my patch to disallow bonding together
>devices of different types or, if you don't mind, at least to disallow
>putting ARPHRD_INFINIBAND and non-ARPHRD_INFINIBAND devices in the same
>bond.

	I think this (disallowing bonding of dissimilar ARPHRD types) is
the way to go, at least in the short term.  Get it to work for the
common case first, then deal with the fringe stuff later.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-09-27 20:12     ` Or Gerlitz
@ 2006-09-28 17:43       ` Jay Vosburgh
  2006-10-03 13:06         ` Or Gerlitz
  0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-09-28 17:43 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Or Gerlitz, netdev, Roland Dreier

Or Gerlitz <or.gerlitz@gmail.com> wrote:

>On 9/27/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> Or Gerlitz <ogerlitz@voltaire.com> wrote:
[...]
>>         You almost want to have some kind of call to induce a reload
>> from scratch of the multicast filter settings (along with whatever else
>> might be necessary to alter the hardware type on the fly), to be called
>> by bonding at the time the first slave is added (since slave adds happen
>> in user context, and can therefore hold rtnl as required by most of the
>> multicast address handling code).  That seems less hassle than having to
>> specify the hardware type and address length at module load time.
>
>I agree that it would be better to avoid doing it this way.

	Actually, it would be ideal to do it this way in all cases, as
the change of hardware type is the biggest hurdle to cross-hardware
bonding instances.  The current infrastructure simply won't allow it,
though, since bonding failover events usually occur in a timer context
(if memory serves, timers run in softirq and can't acquire rtnl).

[...]
>>         Other random thoughts on how to resolve this include modifying
>> bonding to accept slaves when the master is down (which would also
>> require changes to the initscripts that normally configure bonding), so
>> that the initial setting of the, e.g., 224.0.0.1 multicast hardware
>> address happens to the already-changed hardware type.
>
>OK, this is a direction I would like to check. It would be nice if you
>could give me a one- or two-line pointer to what needs to be changed
>to enable bonding to accept slaves while it is down.

	I don't think right offhand this would be a particularly
difficult change; the "up" operation for bonding mostly just starts up
various timers.  A few minutes poking around doesn't reveal anything
obvious that would hinder enslaving with the master down.  You'll have
to change ifenslave and the sysfs code to allow enslaves with the master
down; that might be all that's needed for bonding itself.  Changing
/sbin/ifup and friends is a separate problem.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices
  2006-09-28 17:02       ` Jay Vosburgh
@ 2006-10-03 12:56         ` Or Gerlitz
  0 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-10-03 12:56 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> Or Gerlitz <or.gerlitz@gmail.com> wrote:
> 
>> On 9/26/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>>> [...]
>>> +       bond->dev->mtu              = new_active->dev->mtu;
>>>
>>>         This won't generate a NETDEV_CHANGEMTU notifier event.

>> What actually triggers the event in the current implementation: the
>> code that calls dev_set_mtu() on the bonding device, or
>> dev_set_mtu() itself?

> 	My comment wasn't quite totally thought out; pretend you didn't
> see it.

> 	I think what would be better overall is to handle the mtu for
> this case the way bonding handles the mtu for other slave devices.
> Normally, the mtu is pushed to the slaves from the bonding master, not
> the other way around.  So, you don't want to assign the master's mtu
> here; the slave mtu should already be up to date (and set to whatever
> the master's mtu is via the usual mechanism, bond_change_mtu for
> changes, or set in the slave at enslavement time).

OK, I think I got you. Today dev_set_mtu() is called on the slave
device only when someone attempts to change the bond MTU. So you suggest
also doing it during enslavement, so that the current master MTU is
propagated to the slaves and not vice versa. This makes sense.

> [...]
>> So, bottom line, I would enhance my patch to disallow bonding together
>> devices of different types or, if you don't mind, at least to disallow
>> putting ARPHRD_INFINIBAND and non-ARPHRD_INFINIBAND devices in the same
>> bond.
> 
> 	I think this (disallowing bonding of dissimilar ARPHRD types) is
> the way to go, at least in the short term.  Get it to work for the
> common case first, then deal with the fringe stuff later.

OK, since you are fine with it, I will modify the patch to disallow bonding
of dissimilar ARPHRD types.

Or.



* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-09-28 17:43       ` Jay Vosburgh
@ 2006-10-03 13:06         ` Or Gerlitz
  2006-10-03 23:10           ` Jay Vosburgh
  0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-10-03 13:06 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> Or Gerlitz <or.gerlitz@gmail.com> wrote:
> 
>> On 9/27/06, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>> Or Gerlitz <ogerlitz@voltaire.com> wrote:
> [...]
>>>         You almost want to have some kind of call to induce a reload
>>> from scratch of the multicast filter settings (along with whatever else
>>> might be necessary to alter the hardware type on the fly), to be called
>>> by bonding at the time the first slave is added (since slave adds happen
>>> in user context, and can therefore hold rtnl as required by most of the
>>> multicast address handling code).  That seems less hassle than having to
>>> specify the hardware type and address length at module load time.
>> I agree that it would be better to avoid doing it this way.
> 
> 	Actually, it would be ideal to do it this way in all cases, as
> the change of hardware type is the biggest hurdle to cross-hardware
> bonding instances.  The current infrastructure simply won't allow it,
> though, since bonding failover events usually occur in a timer context
> (if memory serves, timers run in softirq and can't acquire rtnl).

Sorry, but I don't follow. When you say "would be ideal to do ***it*** 
this way in all cases", what exactly does "it" refer to?

> 
> [...]
>>>         Other random thoughts on how to resolve this include modifying
>>> bonding to accept slaves when the master is down (which would also
>>> require changes to the initscripts that normally configure bonding), so
>>> that the initial setting of the, e.g., 224.0.0.1 multicast hardware
>>> address happens to the already-changed hardware type.
>> OK, this is a direction i would like to check. Can be nice if you
>> provide me with a 1-2 liner of directions on what need to be changed
>> to enable bonding to accept slaves when it down.
> 
> 	I don't think right offhand this would be a particularly
> difficult change; the "up" operation for bonding mostly just starts up
> various timers.  A few minutes poking around doesn't reveal anything
> obvious that would hinder enslaving with the master down.  You'll have
> to change ifenslave and the sysfs code to allow enslaves with the master
> down; that might be all that's needed for bonding itself.  Changing
> /sbin/ifup and friends is a separate problem.

OK, let's see if I follow:

1st, your current recommendation for solving the link-layer address 
computation of multicast groups joined by the stack before any 
enslavement takes place is to change the bonding code so that devices 
can be enslaved while the bonding device is not "up" yet.

2nd, the change needs to be worked out in the bonding sysfs code and the 
ifenslave program, but ***also*** in the packages providing /sbin/ifup and friends.

???

BTW, is the ifenslave program still supported with upstream (2.6.18 
and later) kernels, or was it obsoleted at some point?

Or.



* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-10-03 13:06         ` Or Gerlitz
@ 2006-10-03 23:10           ` Jay Vosburgh
  2006-10-04 15:25             ` Or Gerlitz
  0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-10-03 23:10 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:

>Sorry, but I don't follow... by saying "would be ideal to do ***it*** this
>way in all cases" what exactly is the "it" you are referring to?

	It refers to:

>>>>         You almost want to have some kind of call to induce a reload
>>>> from scratch of the multicast filter settings (along with whatever else
>>>> might be necessary to alter the hardware type on the fly), to be called
>>>> by bonding at the time the first slave is added (since slave adds happen
>>>> in user context, and can therefore hold rtnl as required by most of the
>>>> multicast address handling code).  That seems less hassle than having to
>>>> specify the hardware type and address length at module load time.

	Having this would eliminate the need to specify the hardware
type at load time, and would allow changing of the hardware type at
enslave time, rather than at device up time.  This requires fewer
changes to other things, like the initscripts or ifenslave.

	The ideal would be to allow changing of hardware type at
literally any time, allowing failover across dissimilar hardware types.
That's a lot more complicated, and has a smaller pool of potential uses.

>1st, your current recommendation to solve the link layer address
>computation of multicast groups joined by the stack before any enslavement
>actually takes place, is to instrument the bonding code such that it would
>be possible to enslave devices when the bonding device is not "up" yet.
>
>2nd, the change need to be worked out in the bonding sysfs code, the
>ifenslave program but ***also*** in packages such as /sbin/ifup and
>friends.

	Correct.  The necessary changes to initscripts and sysconfig are
probably the most complex piece to organize (not necessarily the hardest
to implement, but rather the most troublesome to deploy, as it
introduces an API change).

>BTW - is the ifenslave program still supported to work with upstream
>(2.6.18 and above) kernel or it was obsoleted at some point.

	Yes, ifenslave is still supported.  It probably will be
obsoleted some day (or replaced with a script that uses sysfs), but not
anytime soon.  As far as I know, all current distros use ifenslave to
configure bonding.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-10-03 23:10           ` Jay Vosburgh
@ 2006-10-04 15:25             ` Or Gerlitz
  2006-10-04 17:34               ` Jay Vosburgh
  0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-10-04 15:25 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
> 
>> Sorry, but I don't follow... by saying "would be ideal to do ***it*** this
>> way in all cases" what exactly is the "it" you are referring to?
> 
> 	It refers to:
> 
>>>>>         You almost want to have some kind of call to induce a reload
>>>>> from scratch of the multicast filter settings (along with whatever else
>>>>> might be necessary to alter the hardware type on the fly), to be called
>>>>> by bonding at the time the first slave is added (since slave adds happen
>>>>> in user context, and can therefore hold rtnl as required by most of the
>>>>> multicast address handling code).  That seems less hassle than having to
>>>>> specify the hardware type and address length at module load time.
> 
> 	Having this would eliminate the need to specify the hardware
> type at load time, and would allow changing of the hardware type at
> enslave time, rather than at device up time.  This requires fewer
> changes to other things, like the initscripts or ifenslave.
> 
> 	The ideal would be to allow changing of hardware type at
> literally any time, allowing failover across dissimilar hardware types.
> That's a lot more complicated, and has a smaller pool of potential uses.

Thanks for the clarification. I would prefer to first try the direction 
you suggest below: changing the ifenslave program and the kernel bonding 
code to allow enslaving while the bonding device is not UP.

>> 1st, your current recommendation to solve the link layer address
>> computation of multicast groups joined by the stack before any enslavement
>> actually takes place, is to instrument the bonding code such that it would
>> be possible to enslave devices when the bonding device is not "up" yet.
>>
>> 2nd, the change need to be worked out in the bonding sysfs code, the
>> ifenslave program but ***also*** in packages such as /sbin/ifup and
>> friends.
> 
> 	Correct.  The necessary changes to initscript and sysconfig are
> probably the most complex piece to organize (not necessarily the hardest
> to implement, but rather the most troublesome to deploy, as it
> introduces an API change).

Looking at the sysconfig package, some tools (e.g. /sbin/if{up,down,status}) 
use ifenslave, which is in turn provided by the iputils package.

My understanding is that changing ifenslave and the bonding kernel code 
to allow enslaving while the master is not up is enough, so actually no 
change is needed to the sysconfig tools, correct?

I have now removed the two assertions in the bonding code that refuse 
enslaving while the master is not up, and it works fine with IPoIB slave 
devices, ***without*** the two module params!

When you say "most troublesome to deploy", is the trouble you refer to 
making sure that the distros include ***both*** the bonding kernel 
changes and an iputils package that has the ifenslave changes?

> 	Yes, ifenslave is still supported.  It probably will be
> obsoleted some day (or replaced with a script that uses sysfs), but not
> anytime soon.  As far as I know, all current distros use ifenslave to
> configure bonding.

Cool, thanks for bringing this to my attention. I now understand that my 
patch set should also handle the ifenslave.c source that comes with the 
kernel (e.g. to allow not setting the hardware address, etc.).

Or.


* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-10-04 15:25             ` Or Gerlitz
@ 2006-10-04 17:34               ` Jay Vosburgh
  2006-10-05 14:56                 ` Or Gerlitz
  0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-10-04 17:34 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:
[...]
>Looking on the sysconfig package, some tools eg /sbin/if{up,down,status}
>use ifenslave which is in turn provided by the iputils package.
>
>My understanding is that changing ifenslave and the bonding kernel code to
>allow for enslaving while master is not up is enough, so actually no
>change is needed to the sysconfig tools, correct?

	Incorrect.  The /sbin/ifup included with sysconfig (I'm looking
at version 0.31-0-15.51) has logic to set the bonding master device up
prior to adding any slaves.  E.g.,

		# get up the bonding device before enslaving
#		if ! is_iface_up $INTERFACE; then
			ip link set $INTERFACE up 2>&1
#		fi
		# enslave available slave devices; if there is none -> hard break and log
		MESSAGE=`/sbin/ifenslave $BONDING_OPTIONS $INTERFACE $BSINTERFACES 2>&1`

	For your purposes, this would cause the bonding master to register
as an Ethernet hardware type, not an IB type.  The /sbin/ifup included with
initscripts operates a little differently, but also sets the bonding
master up prior to adding any slaves.
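A small user-space model of why the ordering matters. All names are hypothetical, and it assumes (for the model) that the all-hosts group 224.0.0.1 is joined and its link-layer address computed when the master comes up. With the sysconfig order (up, then enslave), the group is mapped with the Ethernet rule: 01:00:5e followed by the low 23 bits of the IPv4 group, as the kernel's `ip_eth_mc_map()` does. With the reverse order, the first IPoIB slave has already changed the bond's type. The IPoIB multicast mapping itself is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ARPHRD_ETHER      1   /* value of ARPHRD_ETHER */
#define MODEL_ARPHRD_INFINIBAND 32  /* value of ARPHRD_INFINIBAND */

/* Hypothetical model of the bonding master while it is being configured. */
struct model_bond {
	unsigned short type;     /* starts as Ethernet, as after ether_setup() */
	int up;
	int slave_cnt;
	unsigned short mc_type;  /* hardware type in effect when 224.0.0.1 was joined */
	uint8_t mc_addr[6];      /* its link-layer address (computed for Ethernet only) */
};

/* Ethernet mapping of an IPv4 group in host byte order: 01:00:5e + low 23 bits. */
static void model_eth_mc_map(uint32_t group, uint8_t out[6])
{
	out[0] = 0x01;
	out[1] = 0x00;
	out[2] = 0x5e;
	out[3] = (group >> 16) & 0x7f;
	out[4] = (group >> 8) & 0xff;
	out[5] = group & 0xff;
}

/* Bringing the master up joins 224.0.0.1 using the type the bond has *now*. */
static void model_open(struct model_bond *bond)
{
	bond->up = 1;
	bond->mc_type = bond->type;
	memset(bond->mc_addr, 0, sizeof(bond->mc_addr));
	if (bond->type == MODEL_ARPHRD_ETHER)
		model_eth_mc_map(0xe0000001u, bond->mc_addr); /* 224.0.0.1 */
}

/* The first slave sets the bond's type (roughly what patch 1/3 does). */
static void model_enslave(struct model_bond *bond, unsigned short slave_type)
{
	if (bond->slave_cnt++ == 0)
		bond->type = slave_type;
}
```

In the "up, then enslave" order the bond ends up with an IPoIB type but keeps a 6-byte Ethernet multicast address computed earlier, which is the mismatch discussed in this thread.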

>I have now removed the two assertions in the bonding code on enslaving
>while master is not up and manage to work fine with IPoIB slave devices
>and ***without*** the two module params!
>
>When you have the most troublesome to deploy, the troubles you refer to is
>make sure that the distros would include ***both*** the bonding kernel
>changes and use an iputils package which has the ifenslave changes?

	Yes.  Part of the difficulty is that the changes to the
initscripts and sysconfig packages won't be compatible with versions of
bonding prior to the bonding kernel changes (because older versions of
bonding will refuse to add slaves if the master is down).  It might
require adding another API version to bonding, and modifying ifenslave
to work both ways (i.e., with the current "enslave with master up" API,
as well as the new "enslave with master down" API).
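One way the "work both ways" idea could look inside ifenslave, sketched in user space. The callbacks `try_enslave` and `set_master_up` are hypothetical stand-ins for the real ioctl or sysfs calls, and the error an older bonding driver returns for "master is down" (`MODEL_ERR_MASTER_DOWN`) is an assumption, not taken from the bonding source. The new "enslave with master down" order is tried first; if the driver refuses, ifenslave falls back to the current "master up, then enslave" order. A fake driver is included so the logic can be exercised.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hooks; a real ifenslave would issue ioctls or sysfs writes. */
struct model_ops {
	int (*try_enslave)(void *ctx, const char *master, const char *slave);
	int (*set_master_up)(void *ctx, const char *master);
};

/* Assumed error returned by an older bonding driver when the master is down. */
#define MODEL_ERR_MASTER_DOWN (-EPERM)

/*
 * Enslave all slaves, preferring the "master down" order. Returns 0 if the
 * new order worked, 1 if the legacy "master up first" order was needed,
 * or a negative error.
 */
static int model_enslave_all(const struct model_ops *ops, void *ctx,
			     const char *master, const char **slaves, int n)
{
	int i, err, legacy = 0;

	for (i = 0; i < n; i++) {
		err = ops->try_enslave(ctx, master, slaves[i]);
		if (err == MODEL_ERR_MASTER_DOWN && !legacy) {
			/* Old driver: bring the master up, then retry this slave. */
			err = ops->set_master_up(ctx, master);
			if (err)
				return err;
			legacy = 1;
			err = ops->try_enslave(ctx, master, slaves[i]);
		}
		if (err)
			return err;
	}
	if (!legacy) {
		/* New order: the master comes up only after the slaves are attached. */
		err = ops->set_master_up(ctx, master);
		if (err)
			return err;
	}
	return legacy;
}

/* Fake driver for exercising the logic; relaxed == 0 models today's bonding. */
struct fake_bond { int relaxed; int master_up; int slaves; };

static int fake_try_enslave(void *ctx, const char *m, const char *s)
{
	struct fake_bond *b = ctx;

	(void)m;
	(void)s;
	if (!b->master_up && !b->relaxed)
		return MODEL_ERR_MASTER_DOWN;
	b->slaves++;
	return 0;
}

static int fake_set_master_up(void *ctx, const char *m)
{
	struct fake_bond *b = ctx;

	(void)m;
	b->master_up = 1;
	return 0;
}

static const struct model_ops fake_ops = { fake_try_enslave, fake_set_master_up };
```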

>> 	Yes, ifenslave is still supported.  It probably will be
>> obsoleted some day (or replaced with a script that uses sysfs), but not
>> anytime soon.  As far as I know, all current distros use ifenslave to
>> configure bonding.
>
>Cool, thanks for bringing this into my attention... I understand now my
>patch set should also handle the ifenslave.c source that comes with the
>kernel (eg to allow for not setting the hw address etc)

	An alternate approach would be to undertake the more substantial
task of converting the initscripts and sysconfig code to use sysfs to
configure bonding.  This would permit changing the logic (to add slaves
while the bonding master is down, then set it up), as well as remove the
current hacks (present only in sysconfig) to load the bonding module
once per configured bonding interface.  The initscripts currently don't
do this (as far as I know), so it's generally only possible to have one
bonding interface under initscripts control.

	In this case, ifenslave would continue to work as it does now,
and would simply not be supported for the new hardware.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-10-04 17:34               ` Jay Vosburgh
@ 2006-10-05 14:56                 ` Or Gerlitz
  2006-10-05 18:13                   ` Jay Vosburgh
  0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2006-10-05 14:56 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>> My understanding is that changing ifenslave and the bonding kernel code to
>> allow for enslaving while master is not up is enough, so actually no
>> change is needed to the sysconfig tools, correct?
> 
> 	Incorrect.  The /sbin/ifup included with sysconfig (I'm looking
> at version 0.31-0-15.51) has logic to set the bonding master device up
> prior to adding any slaves.  E.g.,
> 
> 		# get up the bonding device before enslaving
> #		if ! is_iface_up $INTERFACE; then
> 			ip link set $INTERFACE up 2>&1
> #		fi
> 		# enslave available slave devices; if there is none -> hard break and log
> 		MESSAGE=`/sbin/ifenslave $BONDING_OPTIONS $INTERFACE $BSINTERFACES 2>&1`
> 
> 	For your purposes, this would cause it to register as an
> ethernet hardware type, not an IB type.  The /sbin/ifup included with
> initscripts operates a little differently, but also sets the bonding
> master up prior to adding any slaves.

OK, you are correct: /sbin/ifup first brings up the bonding device, 
so it breaks my assumptions...

> 	Yes.  Part of the difficulty is that the changes to the
> initscripts and sysconfig packages won't be compatible with versions of
> bonding prior to the bonding kernel changes (because older versions of
> bonding will refuse to add slaves if the master is down).  It might
> require adding another API version to bonding, and modifying ifenslave
> to work both ways (i.e., with the current "enslave with master up" API,
> as well as the new "enslave with master down" API).

Gee, that sounds bad.

> 	An alternate approach would be to undertake the more substantial
> task of converting the initscripts and sysconfig code to use sysfs to
> configure bonding.  This would permit changing the logic (to add slaves
> while the bonding master is down, then set it up), as well as remove the
> current hacks (present only in sysconfig) to load the bonding module
> once per configured bonding interface.  The initscripts currently don't
> do this (as far as I know), so it's generally only possible to have one
> bonding interface under initscripts control.

This sounds like a good way out of all these troubles...

So is having the sysconfig and initscripts tools configure bonding 
through sysfs, rather than through the ifenslave program, something you 
were considering regardless of the needs imposed by bonding support for 
non-ARPHRD_ETHER netdevices? And do you think the distro package owners 
would like this?

I will look into the methods sysconfig currently uses to configure 
bonding and see if I can come up with a sketch of how to do it with sysfs.

Currently, for my IPoIB bonding testing I use my own script that works 
with sysfs, following the directions in the bonding kernel documentation.

Thanks again for all the coaching...

Or.




* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-10-05 14:56                 ` Or Gerlitz
@ 2006-10-05 18:13                   ` Jay Vosburgh
  2006-10-09 13:15                     ` Or Gerlitz
  0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2006-10-05 18:13 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, Roland Dreier

Or Gerlitz <ogerlitz@voltaire.com> wrote:

>Jay Vosburgh wrote:
[...]
>> 	Yes.  Part of the difficulty is that the changes to the
>> initscripts and sysconfig packages won't be compatible with versions of
>> bonding prior to the bonding kernel changes (because older versions of
>> bonding will refuse to add slaves if the master is down).  It might
>> require adding another API version to bonding, and modifying ifenslave
>> to work both ways (i.e., with the current "enslave with master up" API,
>> as well as the new "enslave with master down" API).
>
>Gee, sounds bad

	After some reflection, I suspect it wouldn't be all that awful.
The main concern is going to be whether or not the existing ifenslave
binaries supplied with distros will run with the new version of bonding.
Since the new version of bonding that you're proposing is really just
relaxing the rules (rather than imposing a different, incompatible set
of rules), that's probably not a really big deal.  I don't think it
would require a revision change to the bonding ifenslave API.

[...]
>So the direction to have sysconfig and initscripts tools configure bonding
>by sysfs and not by the enslave program is something you were considering
>regardless of the needs imposed by bonding support for non ARPHRD_ETHER
>netdevices? and you think the distro packages owners would like this?

	Yes, the long term direction is to have the initscripts
configure bonding via sysfs, either directly or via the step of
converting ifenslave to a script that uses sysfs.  

	I personally find ifenslave to be more convenient to use than
repeated "echo whatever > /sys/this/that/the/other", but there's no
reason that ifenslave couldn't do the various echo things itself under
the covers.  

	One drawback to sysfs is that there's no real-time error
reporting; you have to look at dmesg to see if your request succeeded or
not.  I'm not sure offhand if, e.g., adding a sysfs file to bonding for
"last-request-status" is a kosher sysfs thing to do; if it is, then an
ifenslave script could check such a thing to figure out error returns.

	It seems more logical to me to embed all of the bonding sysfs
magic stuff into a separate script, but the maintainers of initscripts or
sysconfig may see things differently.

	The main advantage to either of these (initscripts/sysconfig
and/or ifenslave converted to sysfs) is that it eliminates the need to
load the bonding driver module multiple times to have more than one
bonding device with differing module parameters (because the sysfs
interface can create any number of bonding interfaces with arbitrary
settings).
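A sketch of what a sysfs-based configuration step could do, expressed as a list of (file, value) writes so the plan can be inspected before anything touches the system. `struct sysfs_write`, `struct plan`, `plan_bond()` and `plan_apply()` are hypothetical helpers, not part of ifenslave or any distro tool. The file names (`/sys/class/net/bonding_masters`, `bonding/mode`, `bonding/slaves`) follow the bonding driver documentation. Two bonds with different modes need only two `+bondN` writes, not two module loads, and the slaves are attached while the master is still down. Bringing the master up afterwards (e.g. with `ip link set ... up`) is outside sysfs and not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PLAN_MAX 32

/* One pending write, equivalent to: echo value > path */
struct sysfs_write {
	char path[96];
	char value[32];
};

struct plan {
	struct sysfs_write w[PLAN_MAX];
	int n;
};

static int plan_add(struct plan *p, const char *path, const char *value)
{
	if (p->n == PLAN_MAX)
		return -1;
	snprintf(p->w[p->n].path, sizeof(p->w[p->n].path), "%s", path);
	snprintf(p->w[p->n].value, sizeof(p->w[p->n].value), "%s", value);
	p->n++;
	return 0;
}

/* Create bond `name` with `mode`, then attach slaves while it is still down. */
static int plan_bond(struct plan *p, const char *name, const char *mode,
		     const char **slaves, int nslaves)
{
	char path[96], value[32];
	int i;

	snprintf(value, sizeof(value), "+%s", name);
	if (plan_add(p, "/sys/class/net/bonding_masters", value))
		return -1;
	/* The mode is set before any slave is added. */
	snprintf(path, sizeof(path), "/sys/class/net/%s/bonding/mode", name);
	if (plan_add(p, path, mode))
		return -1;
	snprintf(path, sizeof(path), "/sys/class/net/%s/bonding/slaves", name);
	for (i = 0; i < nslaves; i++) {
		snprintf(value, sizeof(value), "+%s", slaves[i]);
		if (plan_add(p, path, value))
			return -1;
	}
	return 0;
}

/* Apply the plan: one open/write/close per entry, like a separate `echo`. */
static int plan_apply(const struct plan *p)
{
	int i;

	for (i = 0; i < p->n; i++) {
		FILE *f = fopen(p->w[i].path, "w");

		if (!f)
			return -1;
		if (fprintf(f, "%s\n", p->w[i].value) < 0) {
			fclose(f);
			return -1;
		}
		/* fclose() flushes the buffer, so check it to catch a failed write. */
		if (fclose(f) != 0)
			return -1;
	}
	return 0;
}
```

The example bonds are named bond1 and bond2 because loading the module typically creates bond0 already; a real script would have to cope with a master that already exists.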

>I will look into the current methods used by sysconfig to configure
>bonding and see if i can come up with sketch of how to do it with sysfs.

	It's probably easier to first convert ifenslave to a sysfs-using
script that the existing initscripts can use.  

	This allows the changes to be published in stages, rather than
requiring a single flag day changeover.  The first stage changes the
bonding driver itself to permit enslavement with the master down
(ensuring that existing ifenslave binaries supplied with reasonably
current distros continue to function).  Next, ifenslave is changed to
use sysfs (simultaneously removing the adjustment of the master or
slave's up/down state during enslavement).  The next stage either
changes the initscripts/sysconfig to use sysfs directly or changes their
use of ifenslave so that they no longer load the bonding driver multiple
times.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices
  2006-10-05 18:13                   ` Jay Vosburgh
@ 2006-10-09 13:15                     ` Or Gerlitz
  0 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2006-10-09 13:15 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, Roland Dreier

Jay Vosburgh wrote:
> 	After some reflection, I suspect it wouldn't be all that awful.
> The main concern is going to be whether or not the existing ifenslave
> binaries supplied with distros will run with the new version of bonding.
> Since the new version of bonding that you're proposing is really just
> relaxing the rules (rather than imposing a different, incompatible set
> of rules), that's probably not a really big deal.  I don't think it
> would require a revision change to the bonding ifenslave API.

Indeed, that makes sense: the modified bonding driver would work with 
old ifenslave binaries.

> 	Yes, the long term direction is to have the initscripts
> configure bonding via sysfs, either directly or via the step of
> converting ifenslave to a script that uses sysfs.  

> 	I personally find ifenslave to be more convenient to use than
> repeated "echo whatever > /sys/this/that/the/other", but there's no
> reason that ifenslave couldn't do the various echo things itself under
> the covers.  

> 	One drawback to sysfs is that there's no real-time error
> reporting; you have to look at dmesg to see if your request succeeded or
> not.  I'm not sure offhand if, e.g., adding a sysfs file to bonding for
> "last-request-status" is a kosher sysfs thing to do; if it is, then an
> ifenslave script could check such a thing to figure out error returns.

Could you check that with someone?

> 
> 	It seems more logical to me to embed all of the bonding sysfs
> magic stuff into a separate script, but the maintainers of initscripts or
> sysconfig may see things differently.
> 
> 	The main advantage to either of these (initscripts/sysconfig
> and/or ifenslave converted to sysfs) is that it eliminates the need to
> load the bonding driver module multiple times to have more than one
> bonding device with differing module parameters (because the sysfs
> interface can create any number of bonding interfaces with arbitrary
> settings).
> 
>> I will look into the current methods used by sysconfig to configure
>> bonding and see if i can come up with sketch of how to do it with sysfs.
> 
> 	It's probably easier to first convert ifenslave to a sysfs-using
> script that the existing initscripts can use.  
> 
> 	This allows the changes to be published in stages, rather than
> requiring a single flag day changeover.  The first stage changes the
> bonding driver itself to permit enslavement with the master down
> (ensuring that existing ifenslave binaries supplied with reasonably
> current distros continue to function).  Next, ifenslave is changed to
> use sysfs (simultaneously removing the adjustment of the master or
> slave's up/down state during enslavement).  The next stage either
> changes the initscripts/sysconfig to use sysfs directly or change its
> use of ifenslave to not do multiple loads of the bonding driver. 

This plan makes a lot of sense! However, one way or another (i.e. whether 
the sysconfig tools are modified to use sysfs or ifenslave becomes a script 
that uses sysfs), the sysconfig tools (specifically /sbin/ifup) should be 
changed at the point where they first bring the bonding interface UP and 
only then enslave the slave devices (e.g. the snippet below from /sbin/ifup 
of sysconfig-0.50.9-13.8, which ships with SLES10). Correct?

>                 # get up the bonding device before enslaving
> #               if ! is_iface_up $INTERFACE; then
>                         ip link set $INTERFACE up 2>&1
> #               fi
>                 # enslave available slave devices; if there is none -> hard break and log
>                 MESSAGE=`/sbin/ifenslave $BONDING_OPTIONS $INTERFACE $BSINTERFACES 2>&1`

So this becomes the fourth step of the plan. The most fragile aspect of 
the plan is that ***two*** packages need to be changed, since 
/sbin/ifenslave is not part of sysconfig but of iputils (e.g. 
iputils-ss021109-167.2 on SLES10).

Or.




Thread overview: 20+ messages
2006-09-26 10:16 [RFC][PATCH 0/3] bonding support for operation over IPoIB Or Gerlitz
2006-09-26 10:17 ` [RFC][PATCH 1/3] enable bonding to enslave non ARPHRD_ETHER netdevices Or Gerlitz
2006-09-26 19:23   ` Jay Vosburgh
2006-09-27 19:59     ` Or Gerlitz
2006-09-28 17:02       ` Jay Vosburgh
2006-10-03 12:56         ` Or Gerlitz
2006-09-26 10:17 ` [RFC][PATCH 2/3] enable bonding to enslave netdevices not supporting set_mac_address() Or Gerlitz
2006-09-26 10:18 ` [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices Or Gerlitz
2006-09-26 17:05   ` Stephen Hemminger
2006-09-27 20:16     ` Or Gerlitz
2006-09-26 23:40   ` Jay Vosburgh
2006-09-27 20:12     ` Or Gerlitz
2006-09-28 17:43       ` Jay Vosburgh
2006-10-03 13:06         ` Or Gerlitz
2006-10-03 23:10           ` Jay Vosburgh
2006-10-04 15:25             ` Or Gerlitz
2006-10-04 17:34               ` Jay Vosburgh
2006-10-05 14:56                 ` Or Gerlitz
2006-10-05 18:13                   ` Jay Vosburgh
2006-10-09 13:15                     ` Or Gerlitz
