* [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
@ 2007-07-30 12:37 Moni Shoua
2007-07-30 12:48 ` [ofa-general] [PATCH V3 1/7] IB/ipoib: Bound the net device to the ipoib_neigh structue Moni Shoua
` (7 more replies)
0 siblings, 8 replies; 20+ messages in thread
From: Moni Shoua @ 2007-07-30 12:37 UTC (permalink / raw)
To: rdreier, davem, fubar; +Cc: netdev, general
This patch series is the third version (see below link to V2) of the
suggested changes to the bonding driver so it would be able to support
non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode.
The motivation is to enable the bonding driver on its HA mode to work with
the IP over Infiniband (IPoIB) driver. With these patches I was able to enslave
IPoIB netdevices and run TCP, UDP, IP (UDP) Multicast and ICMP traffic with
fail-over and fail-back working fine. The working environment was the net-2.6 git.
More over, as IPoIB is also the IB ARP provider for the RDMA CM driver which
is used by native IB ULPs whose addressing scheme is based on IP (e.g. iSER,
SDP, Lustre, NFSoRDMA, RDS), bonding support for IPoIB devices **enables** HA
for these ULPs. This holds as when the ULP is informed by the IB HW on the
failure of the current IB connection, it just need to reconnect, where the
bonding device will now issue the IB ARP over the active IPoIB slave.
This series also includes patches to the IPoIB driver that fix some fix
some neighboring related issues.
There are still 2 open issues here:
1. When bonding enslaves an IPoIB device the bonding neighbor holds a
reference to a cleanup function in the IPoIB drives. This makes it unsafe to
unload the IPoIB module if there are bonding neighbors in the air. So, to
avoid this race one must unload bonding before unloading IPoIB.
2. Patch No. 7 is a workaround to a problem where gratuitous were not sent quite often.
I didn't find something better that fixes this and I would
appreciate advices and comments regarding it. However, this doesn't seem to me as
an issue related exclusively to IPoIB.
Links to earlier discussion:
1. A discussion in netdev about bonding support for IPoIB.
http://lists.openwall.net/netdev/2006/11/30/46
2. A discussion in openfabrics regarding changes in the IPoIB that
enable using it as a slave for bonding.
http://lists.openfabrics.org/pipermail/general/2007-March/034033.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] [PATCH V3 1/7] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
@ 2007-07-30 12:48 ` Moni Shoua
2007-07-30 12:49 ` [ofa-general] [PATCH V3 2/7] IB/ipoib: Verify address handle validity on send Moni Shoua
` (6 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Moni Shoua @ 2007-07-30 12:48 UTC (permalink / raw)
Cc: netdev, rdreier, fubar, davem, general
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.
When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.
Under this scheme, ipoib_neigh_destructor assumption that for each struct
neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
can be casted to struct ipoib_dev_priv is buggy.
To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n->dev->flags has the
IFF_MASTER bit set.
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
---
drivers/infiniband/ulp/ipoib/ipoib.h | 4 +++-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 17 +++++++++++++++--
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 2 +-
3 files changed, 19 insertions(+), 4 deletions(-)
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-07-25 14:56:13.000000000 +0300
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h 2007-07-25 14:57:48.095724495 +0300
@@ -328,6 +328,7 @@ struct ipoib_neigh {
struct sk_buff_head queue;
struct neighbour *neighbour;
+ struct net_device *dev;
struct list_head list;
};
@@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip
INFINIBAND_ALEN, sizeof(void *));
}
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+ struct net_device *dev);
void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
extern struct workqueue_struct *ipoib_workqueue;
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 14:56:13.000000000 +0300
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 15:03:11.291291271 +0300
@@ -510,7 +510,7 @@ static void neigh_add_path(struct sk_buf
struct ipoib_path *path;
struct ipoib_neigh *neigh;
- neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+ neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
if (!neigh) {
++priv->stats.tx_dropped;
dev_kfree_skb_any(skb);
@@ -817,6 +817,16 @@ static void ipoib_neigh_cleanup(struct n
unsigned long flags;
struct ipoib_ah *ah = NULL;
+ if (n->dev->flags & IFF_MASTER) {
+ /* n->dev is not an IPoIB device and we have to take priv from elsewhere */
+ neigh = *to_ipoib_neigh(n);
+ if (neigh){
+ priv = netdev_priv(neigh->dev);
+ ipoib_dbg(priv, "neigh_destructor for bonding device: %s\n",
+ n->dev->name);
+ } else
+ return;
+ }
ipoib_dbg(priv,
"neigh_cleanup for %06x " IPOIB_GID_FMT "\n",
IPOIB_QPN(n->ha),
@@ -838,7 +848,9 @@ static void ipoib_neigh_cleanup(struct n
ipoib_put_ah(ah);
}
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+ struct net_device *dev)
+
{
struct ipoib_neigh *neigh;
@@ -847,6 +859,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
return NULL;
neigh->neighbour = neighbour;
+ neigh->dev = dev;
*to_ipoib_neigh(neighbour) = neigh;
skb_queue_head_init(&neigh->queue);
ipoib_cm_set(neigh, NULL);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-07-25 14:56:13.000000000 +0300
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-07-25 14:57:48.097724142 +0300
@@ -727,7 +727,7 @@ out:
if (skb->dst &&
skb->dst->neighbour &&
!*to_ipoib_neigh(skb->dst->neighbour)) {
- struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+ struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
if (neigh) {
kref_get(&mcast->ah->ref);
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] [PATCH V3 2/7] IB/ipoib: Verify address handle validity on send
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
2007-07-30 12:48 ` [ofa-general] [PATCH V3 1/7] IB/ipoib: Bound the net device to the ipoib_neigh structue Moni Shoua
@ 2007-07-30 12:49 ` Moni Shoua
2007-07-30 12:51 ` [ofa-general] [PATCH V3 3/7] net/bonding: Enable bonding to enslave non ARPHRD_ETHER Moni Shoua
` (5 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Moni Shoua @ 2007-07-30 12:49 UTC (permalink / raw)
To: rdreier, davem, fubar; +Cc: netdev, general
When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happenning.
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 14:57:48.000000000 +0300
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 15:02:55.525131034 +0300
@@ -685,9 +685,10 @@ static int ipoib_start_xmit(struct sk_bu
goto out;
}
} else if (neigh->ah) {
- if (unlikely(memcmp(&neigh->dgid.raw,
+ if (unlikely((memcmp(&neigh->dgid.raw,
skb->dst->neighbour->ha + 4,
- sizeof(union ib_gid)))) {
+ sizeof(union ib_gid))) ||
+ (neigh->dev != dev))) {
spin_lock(&priv->lock);
/*
* It's safe to call ipoib_put_ah() inside
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] [PATCH V3 3/7] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
2007-07-30 12:48 ` [ofa-general] [PATCH V3 1/7] IB/ipoib: Bound the net device to the ipoib_neigh structue Moni Shoua
2007-07-30 12:49 ` [ofa-general] [PATCH V3 2/7] IB/ipoib: Verify address handle validity on send Moni Shoua
@ 2007-07-30 12:51 ` Moni Shoua
2007-07-30 12:52 ` [ofa-general] [PATCH V3 4/7] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address() Moni Shoua
` (4 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Moni Shoua @ 2007-07-30 12:51 UTC (permalink / raw)
To: rdreier, davem, fubar; +Cc: netdev, general
This patch changes some of the bond netdevice attributes and functions
to be that of the active slave for the case of the enslaved device not being
of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(),
which are netdevice **type** dependent and hence might be not appropriate for
devices of other types. It also enforces mutual exclusion on bonding slaves
from dissimilar ether types, as was concluded over the v1 discussion.
IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes
IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this
IPoIB device is bounded to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (i have omitted here some details which
are not important for the bonding RFC).
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
---
drivers/net/bonding/bond_main.c | 38 ++++++++++++++++++++++++++++++++++++++
1 files changed, 38 insertions(+)
Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-07-25 15:02:10.000000000 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-29 16:24:30.913343981 +0300
@@ -1277,6 +1277,26 @@ static int bond_compute_features(struct
return 0;
}
+
+static void bond_setup_by_slave(struct net_device *bond_dev,
+ struct net_device *slave_dev)
+{
+ bond_dev->hard_header = slave_dev->hard_header;
+ bond_dev->rebuild_header = slave_dev->rebuild_header;
+ bond_dev->hard_header_cache = slave_dev->hard_header_cache;
+ bond_dev->header_cache_update = slave_dev->header_cache_update;
+ bond_dev->hard_header_parse = slave_dev->hard_header_parse;
+
+ bond_dev->neigh_setup = slave_dev->neigh_setup;
+
+ bond_dev->type = slave_dev->type;
+ bond_dev->hard_header_len = slave_dev->hard_header_len;
+ bond_dev->addr_len = slave_dev->addr_len;
+
+ memcpy(bond_dev->broadcast, slave_dev->broadcast,
+ slave_dev->addr_len);
+}
+
/* enslave device <slave> to bond device <master> */
int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
{
@@ -1351,6 +1371,24 @@ int bond_enslave(struct net_device *bond
goto err_undo_flags;
}
+ /* set bonding device ether type by slave - bonding netdevices are
+ * created with ether_setup, so when the slave type is not ARPHRD_ETHER
+ * there is a need to override some of the type dependent attribs/funcs.
+ *
+ * bond ether type mutual exclusion - don't allow slaves of dissimilar
+ * ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond
+ */
+ if (bond->slave_cnt == 0) {
+ if (slave_dev->type != ARPHRD_ETHER)
+ bond_setup_by_slave(bond_dev, slave_dev);
+ } else if (bond_dev->type != slave_dev->type) {
+ printk(KERN_ERR DRV_NAME ": %s ether type (%d) is different from "
+ "other slaves (%d), can not enslave it.\n", slave_dev->name,
+ slave_dev->type, bond_dev->type);
+ res = -EINVAL;
+ goto err_undo_flags;
+ }
+
if (slave_dev->set_mac_address == NULL) {
printk(KERN_ERR DRV_NAME
": %s: Error: The slave device you specified does "
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] [PATCH V3 4/7] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
` (2 preceding siblings ...)
2007-07-30 12:51 ` [ofa-general] [PATCH V3 3/7] net/bonding: Enable bonding to enslave non ARPHRD_ETHER Moni Shoua
@ 2007-07-30 12:52 ` Moni Shoua
2007-07-30 12:54 ` [ofa-general] [PATCH V3 5/7] net/bonding: Enable IP multicast for bonding IPoIB devices Moni Shoua
` (3 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Moni Shoua @ 2007-07-30 12:52 UTC (permalink / raw)
To: rdreier, davem, fubar; +Cc: netdev, general
This patch allows for enslaving netdevices which do not support
the set_mac_address() function. In that case the bond mac address is the one
of the active slave, where remote peers are notified on the mac address
(neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
---
drivers/net/bonding/bond_main.c | 88 +++++++++++++++++++++++++++-------------
drivers/net/bonding/bonding.h | 1
2 files changed, 61 insertions(+), 28 deletions(-)
Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-07-29 16:24:30.913343981 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-29 16:36:53.234602471 +0300
@@ -1127,6 +1127,14 @@ void bond_change_active_slave(struct bon
if (new_active) {
bond_set_slave_active_flags(new_active);
}
+
+ /* when bonding does not set the slave MAC address, the bond MAC
+ * address is the one of the active slave.
+ */
+ if (new_active && !bond->do_set_mac_addr)
+ memcpy(bond->dev->dev_addr, new_active->dev->dev_addr,
+ new_active->dev->addr_len);
+
bond_send_gratuitous_arp(bond);
}
}
@@ -1390,13 +1398,22 @@ int bond_enslave(struct net_device *bond
}
if (slave_dev->set_mac_address == NULL) {
- printk(KERN_ERR DRV_NAME
- ": %s: Error: The slave device you specified does "
- "not support setting the MAC address. "
- "Your kernel likely does not support slave "
- "devices.\n", bond_dev->name);
- res = -EOPNOTSUPP;
- goto err_undo_flags;
+ if (bond->slave_cnt == 0) {
+ printk(KERN_WARNING DRV_NAME
+ ": %s: Warning: The first slave device you "
+ "specified does not support setting the MAC "
+ "address. This bond MAC address would be that "
+ "of the active slave.\n", bond_dev->name);
+ bond->do_set_mac_addr = 0;
+ } else if (bond->do_set_mac_addr) {
+ printk(KERN_ERR DRV_NAME
+ ": %s: Error: The slave device you specified "
+ "does not support setting the MAC addres,."
+ "but this bond uses this practice. \n"
+ , bond_dev->name);
+ res = -EOPNOTSUPP;
+ goto err_undo_flags;
+ }
}
new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
@@ -1417,16 +1434,18 @@ int bond_enslave(struct net_device *bond
*/
memcpy(new_slave->perm_hwaddr, slave_dev->dev_addr, ETH_ALEN);
- /*
- * Set slave to master's mac address. The application already
- * set the master's mac address to that of the first slave
- */
- memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
- addr.sa_family = slave_dev->type;
- res = dev_set_mac_address(slave_dev, &addr);
- if (res) {
- dprintk("Error %d calling set_mac_address\n", res);
- goto err_free;
+ if (bond->do_set_mac_addr) {
+ /*
+ * Set slave to master's mac address. The application already
+ * set the master's mac address to that of the first slave
+ */
+ memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
+ addr.sa_family = slave_dev->type;
+ res = dev_set_mac_address(slave_dev, &addr);
+ if (res) {
+ dprintk("Error %d calling set_mac_address\n", res);
+ goto err_free;
+ }
}
res = netdev_set_master(slave_dev, bond_dev);
@@ -1651,9 +1670,11 @@ err_close:
dev_close(slave_dev);
err_restore_mac:
- memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
- addr.sa_family = slave_dev->type;
- dev_set_mac_address(slave_dev, &addr);
+ if (bond->do_set_mac_addr) {
+ memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
+ addr.sa_family = slave_dev->type;
+ dev_set_mac_address(slave_dev, &addr);
+ }
err_free:
kfree(new_slave);
@@ -1831,10 +1852,12 @@ int bond_release(struct net_device *bond
/* close slave before restoring its mac address */
dev_close(slave_dev);
- /* restore original ("permanent") mac address */
- memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
- addr.sa_family = slave_dev->type;
- dev_set_mac_address(slave_dev, &addr);
+ if (bond->do_set_mac_addr) {
+ /* restore original ("permanent") mac address */
+ memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
+ addr.sa_family = slave_dev->type;
+ dev_set_mac_address(slave_dev, &addr);
+ }
slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
IFF_SLAVE_INACTIVE | IFF_BONDING |
@@ -1921,10 +1944,12 @@ static int bond_release_all(struct net_d
/* close slave before restoring its mac address */
dev_close(slave_dev);
- /* restore original ("permanent") mac address*/
- memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
- addr.sa_family = slave_dev->type;
- dev_set_mac_address(slave_dev, &addr);
+ if (bond->do_set_mac_addr) {
+ /* restore original ("permanent") mac address*/
+ memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
+ addr.sa_family = slave_dev->type;
+ dev_set_mac_address(slave_dev, &addr);
+ }
slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
IFF_SLAVE_INACTIVE);
@@ -3961,6 +3986,10 @@ static int bond_set_mac_address(struct n
dprintk("bond=%p, name=%s\n", bond, (bond_dev ? bond_dev->name : "None"));
+ if (!bond->do_set_mac_addr) {
+ return -EOPNOTSUPP;
+ }
+
if (!is_valid_ether_addr(sa->sa_data)) {
return -EADDRNOTAVAIL;
}
@@ -4351,6 +4380,9 @@ static int bond_init(struct net_device *
bond_create_proc_entry(bond);
#endif
+ /* set do_set_mac_addr to true on startup */
+ bond->do_set_mac_addr = 1;
+
list_add_tail(&bond->bond_list, &bond_dev_list);
return 0;
Index: net-2.6/drivers/net/bonding/bonding.h
===================================================================
--- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-29 16:25:22.000000000 +0300
+++ net-2.6/drivers/net/bonding/bonding.h 2007-07-29 16:37:13.163056181 +0300
@@ -201,6 +201,7 @@ struct bonding {
struct list_head vlan_list;
struct vlan_group *vlgrp;
struct packet_type arp_mon_pt;
+ s8 do_set_mac_addr;
};
/**
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] [PATCH V3 5/7] net/bonding: Enable IP multicast for bonding IPoIB devices
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
` (3 preceding siblings ...)
2007-07-30 12:52 ` [ofa-general] [PATCH V3 4/7] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address() Moni Shoua
@ 2007-07-30 12:54 ` Moni Shoua
2007-07-30 12:54 ` [ofa-general] [PATCH V3 6/7] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device Moni Shoua
` (2 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Moni Shoua @ 2007-07-30 12:54 UTC (permalink / raw)
To: rdreier, davem, fubar; +Cc: netdev, general
Allow to enslave devices when the bonding device is not up. Over the discussion
held at the previous post this seemed to be the most clean way to go, where it
is not expected to cause instabilities.
Normally, the bonding driver is UP before any enslavement takes place.
Once a netdevice is UP, the network stack acts to have it join some multicast groups
(eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device
type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code
computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called
where for multicast joins taking place after the enslavement another ip_xxx_mc_map()
is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND)
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
---
drivers/net/bonding/bond_main.c | 4 ++--
drivers/net/bonding/bond_sysfs.c | 6 ++----
2 files changed, 4 insertions(+), 6 deletions(-)
Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-07-25 15:04:50.000000000 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-25 15:06:17.175820632 +0300
@@ -1325,8 +1325,8 @@ int bond_enslave(struct net_device *bond
/* bond must be initialized by bond_open() before enslaving */
if (!(bond_dev->flags & IFF_UP)) {
- dprintk("Error, master_dev is not up\n");
- return -EPERM;
+ printk(KERN_WARNING DRV_NAME
+ " %s: master_dev is not up in bond_enslave\n", bond_dev->name);
}
/* already enslaved */
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-07-25 14:18:12.000000000 +0300
+++ net-2.6/drivers/net/bonding/bond_sysfs.c 2007-07-25 15:06:17.176820452 +0300
@@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru
/* Quick sanity check -- is the bond interface up? */
if (!(bond->dev->flags & IFF_UP)) {
- printk(KERN_ERR DRV_NAME
- ": %s: Unable to update slaves because interface is down.\n",
+ printk(KERN_WARNING DRV_NAME
+ ": %s: doing slave updates when interface is down.\n",
bond->dev->name);
- ret = -EPERM;
- goto out;
}
/* Note: We can't hold bond->lock here, as bond_create grabs it. */
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] [PATCH V3 6/7] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
` (4 preceding siblings ...)
2007-07-30 12:54 ` [ofa-general] [PATCH V3 5/7] net/bonding: Enable IP multicast for bonding IPoIB devices Moni Shoua
@ 2007-07-30 12:54 ` Moni Shoua
2007-07-30 12:56 ` [ofa-general] [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure Moni Shoua
2007-07-30 21:20 ` [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Roland Dreier
7 siblings, 0 replies; 20+ messages in thread
From: Moni Shoua @ 2007-07-30 12:54 UTC (permalink / raw)
To: rdreier, davem, fubar; +Cc: netdev, general
bonding sometimes uses Ethernet constants (such as MTU and address length) which
are not good when it enslaves non Ethernet devices (such as InfiniBand).
Signed-off-by: Moni Shoua <monis@voltaire.com>
---
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_sysfs.c | 10 ++++++++--
drivers/net/bonding/bonding.h | 1 +
3 files changed, 10 insertions(+), 3 deletions(-)
Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-07-25 15:06:17.000000000 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-25 15:33:25.012883360 +0300
@@ -1255,7 +1255,7 @@ static int bond_compute_features(struct
unsigned long features = BOND_INTERSECT_FEATURES;
struct slave *slave;
struct net_device *bond_dev = bond->dev;
- unsigned short max_hard_header_len = ETH_HLEN;
+ u16 max_hard_header_len = max((u16)ETH_HLEN, bond_dev->hard_header_len);
int i;
bond_for_each_slave(bond, slave, i) {
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-07-25 15:06:17.000000000 +0300
+++ net-2.6/drivers/net/bonding/bond_sysfs.c 2007-07-25 15:20:10.224527636 +0300
@@ -260,6 +260,7 @@ static ssize_t bonding_store_slaves(stru
char command[IFNAMSIZ + 1] = { 0, };
char *ifname;
int i, res, found, ret = count;
+ u32 original_mtu;
struct slave *slave;
struct net_device *dev = NULL;
struct bonding *bond = to_bond(d);
@@ -325,6 +326,7 @@ static ssize_t bonding_store_slaves(stru
}
/* Set the slave's MTU to match the bond */
+ original_mtu = dev->mtu;
if (dev->mtu != bond->dev->mtu) {
if (dev->change_mtu) {
res = dev->change_mtu(dev,
@@ -339,6 +341,9 @@ static ssize_t bonding_store_slaves(stru
}
rtnl_lock();
res = bond_enslave(bond->dev, dev);
+ bond_for_each_slave(bond, slave, i)
+ if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0)
+ slave->original_mtu=original_mtu;
rtnl_unlock();
if (res) {
ret = res;
@@ -351,6 +356,7 @@ static ssize_t bonding_store_slaves(stru
bond_for_each_slave(bond, slave, i)
if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
dev = slave->dev;
+ original_mtu = slave->original_mtu;
break;
}
if (dev) {
@@ -365,9 +371,9 @@ static ssize_t bonding_store_slaves(stru
}
/* set the slave MTU to the default */
if (dev->change_mtu) {
- dev->change_mtu(dev, 1500);
+ dev->change_mtu(dev, original_mtu);
} else {
- dev->mtu = 1500;
+ dev->mtu = original_mtu;
}
}
else {
Index: net-2.6/drivers/net/bonding/bonding.h
===================================================================
--- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:03:32.000000000 +0300
+++ net-2.6/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.223527810 +0300
@@ -156,6 +156,7 @@ struct slave {
s8 link; /* one of BOND_LINK_XXXX */
s8 state; /* one of BOND_STATE_XXXX */
u32 original_flags;
+ u32 original_mtu;
u32 link_failure_count;
u16 speed;
u8 duplex;
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
` (5 preceding siblings ...)
2007-07-30 12:54 ` [ofa-general] [PATCH V3 6/7] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device Moni Shoua
@ 2007-07-30 12:56 ` Moni Shoua
2007-07-30 20:29 ` [ofa-general] " Jay Vosburgh
2007-07-30 21:20 ` [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Roland Dreier
7 siblings, 1 reply; 20+ messages in thread
From: Moni Shoua @ 2007-07-30 12:56 UTC (permalink / raw)
To: rdreier, davem, fubar; +Cc: netdev, general
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
in dev->state field is on. This improves the chances for the arp packet to
be transmitted.
Signed-off-by: Moni Shoua <monis@voltaire.com>
---
drivers/net/bonding/bond_main.c | 25 +++++++++++++++++++++----
drivers/net/bonding/bonding.h | 1 +
2 files changed, 22 insertions(+), 4 deletions(-)
Index: net-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-07-25 15:33:25.000000000 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-26 18:42:59.296296622 +0300
@@ -1134,8 +1134,13 @@ void bond_change_active_slave(struct bon
if (new_active && !bond->do_set_mac_addr)
memcpy(bond->dev->dev_addr, new_active->dev->dev_addr,
new_active->dev->addr_len);
-
- bond_send_gratuitous_arp(bond);
+ if (bond->curr_active_slave &&
+ test_bit(__LINK_STATE_LINKWATCH_PENDING, &bond->curr_active_slave->dev->state)){
+ dprintk("delaying gratuitous arp on %s\n",bond->curr_active_slave->dev->name);
+ bond->send_grat_arp=1;
+ }else{
+ bond_send_gratuitous_arp(bond);
+ }
}
}
@@ -2120,6 +2125,15 @@ void bond_mii_monitor(struct net_device
* program could monitor the link itself if needed.
*/
+ if (bond->send_grat_arp) {
+ if (bond->curr_active_slave && test_bit(__LINK_STATE_LINKWATCH_PENDING, &bond->curr_active_slave->dev->state))
+ dprintk("Needs to send gratuitous arp but not yet\n",__FUNCTION__);
+ else {
+ dprintk("sending delayed gratuitous arp on ond->curr_active_slave->dev->name\n");
+ bond_send_gratuitous_arp(bond);
+ bond->send_grat_arp=0;
+ }
+ }
read_lock(&bond->curr_slave_lock);
oldcurrent = bond->curr_active_slave;
read_unlock(&bond->curr_slave_lock);
@@ -2513,6 +2527,7 @@ static void bond_send_gratuitous_arp(str
struct slave *slave = bond->curr_active_slave;
struct vlan_entry *vlan;
struct net_device *vlan_dev;
+ int i;
dprintk("bond_send_grat_arp: bond %s slave %s\n", bond->dev->name,
slave ? slave->dev->name : "NULL");
@@ -2520,8 +2535,9 @@ static void bond_send_gratuitous_arp(str
return;
if (bond->master_ip) {
- bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
- bond->master_ip, 0);
+ for (i=0;i<3;i++)
+ bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
+ bond->master_ip, 0);
}
list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
@@ -4331,6 +4347,7 @@ static int bond_init(struct net_device *
bond->current_arp_slave = NULL;
bond->primary_slave = NULL;
bond->dev = bond_dev;
+ bond->send_grat_arp=0;
INIT_LIST_HEAD(&bond->vlan_list);
/* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===================================================================
--- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.000000000 +0300
+++ net-2.6/drivers/net/bonding/bonding.h 2007-07-26 18:42:43.652087660 +0300
@@ -203,6 +203,7 @@ struct bonding {
struct vlan_group *vlgrp;
struct packet_type arp_mon_pt;
s8 do_set_mac_addr;
+ int send_grat_arp;
};
/**
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] Re: [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure
2007-07-30 12:56 ` [ofa-general] [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure Moni Shoua
@ 2007-07-30 20:29 ` Jay Vosburgh
2007-07-31 13:33 ` Moni Shoua
0 siblings, 1 reply; 20+ messages in thread
From: Jay Vosburgh @ 2007-07-30 20:29 UTC (permalink / raw)
To: Moni Shoua; +Cc: netdev, rdreier, davem, general
Moni Shoua <monis@voltaire.com> wrote:
>Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
>in dev->state field is on. This improves the chances for the arp packet to
>be transmitted.
Under what circumstances were you seeing problems that delaying
the gratuitous ARP until linkwatch is done improves things? Is this
really an IB thing, or did you experience problems here over regular
ethernet?
>Signed-off-by: Moni Shoua <monis@voltaire.com>
>---
> drivers/net/bonding/bond_main.c | 25 +++++++++++++++++++++----
> drivers/net/bonding/bonding.h | 1 +
> 2 files changed, 22 insertions(+), 4 deletions(-)
>
>Index: net-2.6/drivers/net/bonding/bond_main.c
>===================================================================
>--- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-07-25 15:33:25.000000000 +0300
>+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-26 18:42:59.296296622 +0300
>@@ -1134,8 +1134,13 @@ void bond_change_active_slave(struct bon
> if (new_active && !bond->do_set_mac_addr)
> memcpy(bond->dev->dev_addr, new_active->dev->dev_addr,
> new_active->dev->addr_len);
>-
>- bond_send_gratuitous_arp(bond);
>+ if (bond->curr_active_slave &&
>+ test_bit(__LINK_STATE_LINKWATCH_PENDING, &bond->curr_active_slave->dev->state)){
>+ dprintk("delaying gratuitous arp on %s\n",bond->curr_active_slave->dev->name);
>+ bond->send_grat_arp=1;
>+ }else{
>+ bond_send_gratuitous_arp(bond);
>+ }
Style issues throughout the patch series: many lines are too
long, many things are all smashed together, e.g., "}else{" instead of
"} else {", "send_grat_arp=1" instead of "send_grat_arp = 1", and so on.
> }
> }
>
>@@ -2120,6 +2125,15 @@ void bond_mii_monitor(struct net_device
> * program could monitor the link itself if needed.
> */
>
>+ if (bond->send_grat_arp) {
>+ if (bond->curr_active_slave && test_bit(__LINK_STATE_LINKWATCH_PENDING, &bond->curr_active_slave->dev->state))
>+ dprintk("Needs to send gratuitous arp but not yet\n",__FUNCTION__);
>+ else {
>+ dprintk("sending delayed gratuitous arp on ond->curr_active_slave->dev->name\n");
>+ bond_send_gratuitous_arp(bond);
>+ bond->send_grat_arp=0;
>+ }
>+ }
> read_lock(&bond->curr_slave_lock);
> oldcurrent = bond->curr_active_slave;
> read_unlock(&bond->curr_slave_lock);
>@@ -2513,6 +2527,7 @@ static void bond_send_gratuitous_arp(str
> struct slave *slave = bond->curr_active_slave;
> struct vlan_entry *vlan;
> struct net_device *vlan_dev;
>+ int i;
>
> dprintk("bond_send_grat_arp: bond %s slave %s\n", bond->dev->name,
> slave ? slave->dev->name : "NULL");
>@@ -2520,8 +2535,9 @@ static void bond_send_gratuitous_arp(str
> return;
>
> if (bond->master_ip) {
>- bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
>- bond->master_ip, 0);
>+ for (i=0;i<3;i++)
>+ bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
>+ bond->master_ip, 0);
> }
If you delay the grat ARP until linkwatch is done, why is it
also necessary to shotgun several ARPs instead of one? Why are the ARPs
sent for VLANs not also shotgunned in a similar fashion?
If shotgunning like this really is useful, would it not make
more sense to space them out a bit, e.g., over successive monitor
passes?
> list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
>@@ -4331,6 +4347,7 @@ static int bond_init(struct net_device *
> bond->current_arp_slave = NULL;
> bond->primary_slave = NULL;
> bond->dev = bond_dev;
>+ bond->send_grat_arp=0;
> INIT_LIST_HEAD(&bond->vlan_list);
>
> /* Initialize the device entry points */
>Index: net-2.6/drivers/net/bonding/bonding.h
>===================================================================
>--- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.000000000 +0300
>+++ net-2.6/drivers/net/bonding/bonding.h 2007-07-26 18:42:43.652087660 +0300
>@@ -203,6 +203,7 @@ struct bonding {
> struct vlan_group *vlgrp;
> struct packet_type arp_mon_pt;
> s8 do_set_mac_addr;
>+ int send_grat_arp;
This need not be a full int, and (this applies to
do_set_mac_addr, also) could probably be squeezed into gaps already
existing within the struct bonding somewhere.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
` (6 preceding siblings ...)
2007-07-30 12:56 ` [ofa-general] [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure Moni Shoua
@ 2007-07-30 21:20 ` Roland Dreier
2007-07-31 13:44 ` [ofa-general] " Moni Shoua
7 siblings, 1 reply; 20+ messages in thread
From: Roland Dreier @ 2007-07-30 21:20 UTC (permalink / raw)
To: Moni Shoua; +Cc: davem, fubar, netdev, general
> 1. When bonding enslaves an IPoIB device the bonding neighbor holds a
> reference to a cleanup function in the IPoIB drives. This makes it unsafe to
> unload the IPoIB module if there are bonding neighbors in the air. So, to
> avoid this race one must unload bonding before unloading IPoIB.
I think we really want to resolve this somehow. Getting an oops by
doing "modprobe -r ipoib" isn't that friendly.
Also, what happened to the problem of having an address handle
belonging to the wrong device on bond failover? Did you figure out a
way to fix that one?
- R.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure
2007-07-30 20:29 ` [ofa-general] " Jay Vosburgh
@ 2007-07-31 13:33 ` Moni Shoua
0 siblings, 0 replies; 20+ messages in thread
From: Moni Shoua @ 2007-07-31 13:33 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: rdreier, davem, general, netdev
Jay Vosburgh wrote:
> Moni Shoua <monis@voltaire.com> wrote:
>
>> Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
>> in dev->state field is on. This improves the chances for the arp packet to
>> be transmitted.
>
> Under what circumstances were you seeing problems that delaying
> the gratuitous ARP until linkwatch is done improves things? Is this
> really an IB thing, or did you experience problems here over regular
> ethernet?
>
I tried to figure out what is the difference in the state/flags of the device when
grat. ARP send succeeds and when it fails. I found exact correlation with the
LINK_STATE_LINKWATCH_PENDING bit on.
I don't think that this is an IB issue but I can't be sure. I didn't run tests
for Ethernet.
>> Signed-off-by: Moni Shoua <monis@voltaire.com>
>> ---
>> drivers/net/bonding/bond_main.c | 25 +++++++++++++++++++++----
>> drivers/net/bonding/bonding.h | 1 +
>> 2 files changed, 22 insertions(+), 4 deletions(-)
>>
>> Index: net-2.6/drivers/net/bonding/bond_main.c
>> ===================================================================
>> --- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-07-25 15:33:25.000000000 +0300
>> +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-26 18:42:59.296296622 +0300
>> @@ -1134,8 +1134,13 @@ void bond_change_active_slave(struct bon
>> if (new_active && !bond->do_set_mac_addr)
>> memcpy(bond->dev->dev_addr, new_active->dev->dev_addr,
>> new_active->dev->addr_len);
>> -
>> - bond_send_gratuitous_arp(bond);
>> + if (bond->curr_active_slave &&
>> + test_bit(__LINK_STATE_LINKWATCH_PENDING, &bond->curr_active_slave->dev->state)){
>> + dprintk("delaying gratuitous arp on %s\n",bond->curr_active_slave->dev->name);
>> + bond->send_grat_arp=1;
>> + }else{
>> + bond_send_gratuitous_arp(bond);
>> + }
>
> Style issues throughout the patch series: many lines are too
> long, many things are all smashed together, e.g., "}else{" instead of
> "} else {", "send_grat_arp=1" instead of "send_grat_arp = 1", and so on.
>
OK thanks. I'll fix and repost.
>> }
>> }
>>
>> @@ -2120,6 +2125,15 @@ void bond_mii_monitor(struct net_device
>> * program could monitor the link itself if needed.
>> */
>>
>> + if (bond->send_grat_arp) {
>> + if (bond->curr_active_slave && test_bit(__LINK_STATE_LINKWATCH_PENDING, &bond->curr_active_slave->dev->state))
>> + dprintk("Needs to send gratuitous arp but not yet\n",__FUNCTION__);
>> + else {
>> + dprintk("sending delayed gratuitous arp on ond->curr_active_slave->dev->name\n");
>> + bond_send_gratuitous_arp(bond);
>> + bond->send_grat_arp=0;
>> + }
>> + }
>
>
>> read_lock(&bond->curr_slave_lock);
>> oldcurrent = bond->curr_active_slave;
>> read_unlock(&bond->curr_slave_lock);
>> @@ -2513,6 +2527,7 @@ static void bond_send_gratuitous_arp(str
>> struct slave *slave = bond->curr_active_slave;
>> struct vlan_entry *vlan;
>> struct net_device *vlan_dev;
>> + int i;
>>
>> dprintk("bond_send_grat_arp: bond %s slave %s\n", bond->dev->name,
>> slave ? slave->dev->name : "NULL");
>> @@ -2520,8 +2535,9 @@ static void bond_send_gratuitous_arp(str
>> return;
>>
>> if (bond->master_ip) {
>> - bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
>> - bond->master_ip, 0);
>> + for (i=0;i<3;i++)
>> + bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
>> + bond->master_ip, 0);
>> }
>
> If you delay the grat ARP until linkwatch is done, why is it
> also necessary to shotgun several ARPs instead of one? Why are the ARPs
> sent for VLANs not also shotgunned in a similar fashion?
Besides the linkwatch issue I also noticed that on rare occasions, grat. ARPs
that found their way to the slave's xmit function were not xmitted.
The 3 times send is just an another attempt to improve chances.
I'd like to emphasize here that with IB slaves, grat. ARP is much more crucial to
a successful change of slaves and that was my focus.
> If shotgunning like this really is useful, would it not make
> more sense to space them out a bit, e.g., over successive monitor
> passes?
>
I guess you are right about that.
>> list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
>> @@ -4331,6 +4347,7 @@ static int bond_init(struct net_device *
>> bond->current_arp_slave = NULL;
>> bond->primary_slave = NULL;
>> bond->dev = bond_dev;
>> + bond->send_grat_arp=0;
>> INIT_LIST_HEAD(&bond->vlan_list);
>>
>> /* Initialize the device entry points */
>> Index: net-2.6/drivers/net/bonding/bonding.h
>> ===================================================================
>> --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.000000000 +0300
>> +++ net-2.6/drivers/net/bonding/bonding.h 2007-07-26 18:42:43.652087660 +0300
>> @@ -203,6 +203,7 @@ struct bonding {
>> struct vlan_group *vlgrp;
>> struct packet_type arp_mon_pt;
>> s8 do_set_mac_addr;
>> + int send_grat_arp;
>
> This need not be a full int, and (this applies to
> do_set_mac_addr, also) could probably be squeezed into gaps already
> existing within the struct bonding somewhere.
Thanks. Will be fixed.
>
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> _______________________________________________
> general mailing list
> general@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-30 21:20 ` [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Roland Dreier
@ 2007-07-31 13:44 ` Moni Shoua
2007-07-31 14:04 ` [ofa-general] " Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Moni Shoua @ 2007-07-31 13:44 UTC (permalink / raw)
To: Roland Dreier; +Cc: fubar, davem, general, netdev
Roland Dreier wrote:
> > 1. When bonding enslaves an IPoIB device the bonding neighbor holds a
> > reference to a cleanup function in the IPoIB drives. This makes it unsafe to
> > unload the IPoIB module if there are bonding neighbors in the air. So, to
> > avoid this race one must unload bonding before unloading IPoIB.
>
> I think we really want to resolve this somehow. Getting an oops by
> doing "modprobe -r ipoib" isn't that friendly.
>
You are right and we want to resolve that.
One way is to clean the neigh destructor function from all IPoIB neighs.
The other way is to prevent ipoib unload if device is a slave or is referenced from
somewhere else.
I guess I would like an advice here.
> Also, what happened to the problem of having an address handle
> belonging to the wrong device on bond failover? Did you figure out a
> way to fix that one?
This is what patch 2 handles.
>
> - R.
> _______________________________________________
> general mailing list
> general@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-31 13:44 ` [ofa-general] " Moni Shoua
@ 2007-07-31 14:04 ` Michael S. Tsirkin
2007-07-31 14:19 ` Or Gerlitz
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2007-07-31 14:04 UTC (permalink / raw)
To: Moni Shoua; +Cc: netdev, Roland Dreier, fubar, davem, general
> Quoting Moni Shoua <monisonlists@gmail.com>:
> Subject: Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for?the bonding driver
>
> Roland Dreier wrote:
> > > 1. When bonding enslaves an IPoIB device the bonding neighbor holds a
> > > reference to a cleanup function in the IPoIB drives. This makes it unsafe to
> > > unload the IPoIB module if there are bonding neighbors in the air. So, to
> > > avoid this race one must unload bonding before unloading IPoIB.
> >
> > I think we really want to resolve this somehow. Getting an oops by
> > doing "modprobe -r ipoib" isn't that friendly.
> >
> You are right and we want to resolve that.
> One way is to clean the neigh destructor function from all IPoIB neighs.
> The other way is to prevent ipoib unload if device is a slave or is referenced from
> somewhere else.
>
> I guess I would like an advice here.
I had this idea:
Maybe we could use hard_header_cache/header_cache_update methods instead of
neighbour cleanup calls.
To do this, it is possible that we'll have to switch from storing pointers
inside the neighbour to keeping an index there, but I expect the
performance impact to be minimal.
As a result, bonding would not have to copy pointers into ipoib module
and module removal would get fixed.
--
MST
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-31 14:04 ` [ofa-general] " Michael S. Tsirkin
@ 2007-07-31 14:19 ` Or Gerlitz
2007-07-31 14:22 ` [ofa-general] " Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2007-07-31 14:19 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, Roland Dreier, fubar, general, davem
Michael S. Tsirkin wrote:
> Maybe we could use hard_header_cache/header_cache_update methods instead of
> neighbour cleanup calls.
> To do this, it is possible that we'll have to switch from storing pointers
> inside the neighbour to keeping an index there, but I expect the
> performance impact to be minimal.
>
> As a result, bonding would not have to copy pointers into ipoib module
> and module removal would get fixed.
To be precise, bonding will copy all the symbols it copies today from
the slave module (ipoib), see bond_setup_by_slave() in patch 3/7, except
for the neighbour cleanup callback which is copied through coping the
neigh_setup function.
Or.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] Re: Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-31 14:19 ` Or Gerlitz
@ 2007-07-31 14:22 ` Michael S. Tsirkin
2007-07-31 14:36 ` Or Gerlitz
2007-08-01 14:12 ` [ofa-general] Re: " Moni Shoua
0 siblings, 2 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2007-07-31 14:22 UTC (permalink / raw)
To: Or Gerlitz
Cc: netdev, Roland Dreier, fubar, Michael S. Tsirkin, general, davem
> Quoting Or Gerlitz <ogerlitz@voltaire.com>:
> Subject: Re: Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for?the bonding driver
>
> Michael S. Tsirkin wrote:
> >Maybe we could use hard_header_cache/header_cache_update methods instead of
> >neighbour cleanup calls.
>
> >To do this, it is possible that we'll have to switch from storing pointers
> >inside the neighbour to keeping an index there, but I expect the
> >performance impact to be minimal.
> >
> >As a result, bonding would not have to copy pointers into ipoib module
> >and module removal would get fixed.
>
> To be precise, bonding will copy all the symbols it copies today from
> the slave module (ipoib),
> see bond_setup_by_slave() in patch 3/7, except
> for the neighbour cleanup callback which is copied through coping the
> neigh_setup function.
Not really.
This copying of symbols is something that you added, isn't it?
So with this approach, it won't be needed.
It's always wrong to copy symbols from another module without
referencing it.
--
MST
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-31 14:22 ` [ofa-general] " Michael S. Tsirkin
@ 2007-07-31 14:36 ` Or Gerlitz
2007-07-31 14:48 ` Michael S. Tsirkin
2007-08-01 14:12 ` [ofa-general] Re: " Moni Shoua
1 sibling, 1 reply; 20+ messages in thread
From: Or Gerlitz @ 2007-07-31 14:36 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, Roland Dreier, fubar, general, davem
Michael S. Tsirkin wrote:
>> Quoting Or Gerlitz <ogerlitz@voltaire.com>:
>> To be precise, bonding will copy all the symbols it copies today from
>> the slave module (ipoib), see bond_setup_by_slave() in patch 3/7
> Not really.
> This copying of symbols is something that you added, isn't it?
> So with this approach, it won't be needed.
> It's always wrong to copy symbols from another module without
> referencing it.
Its the --first-- time you make this comment, please suggest a different
approach, the relevant code is below.
> +static void bond_setup_by_slave(struct net_device *bond_dev,
> + struct net_device *slave_dev)
> +{
> + bond_dev->hard_header = slave_dev->hard_header;
> + bond_dev->rebuild_header = slave_dev->rebuild_header;
> + bond_dev->hard_header_cache = slave_dev->hard_header_cache;
> + bond_dev->header_cache_update = slave_dev->header_cache_update;
> + bond_dev->hard_header_parse = slave_dev->hard_header_parse;
> +
> + bond_dev->neigh_setup = slave_dev->neigh_setup;
> +
> + bond_dev->type = slave_dev->type;
> + bond_dev->hard_header_len = slave_dev->hard_header_len;
> + bond_dev->addr_len = slave_dev->addr_len;
> +
> + memcpy(bond_dev->broadcast, slave_dev->broadcast,
> + slave_dev->addr_len);
> +}
> +
> /* enslave device <slave> to bond device <master> */
> int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> {
> @@ -1351,6 +1371,24 @@ int bond_enslave(struct net_device *bond
> goto err_undo_flags;
> }
>
> + /* set bonding device ether type by slave - bonding netdevices are
> + * created with ether_setup, so when the slave type is not ARPHRD_ETHER
> + * there is a need to override some of the type dependent attribs/funcs.
> + *
> + * bond ether type mutual exclusion - don't allow slaves of dissimilar
> + * ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond
> + */
> + if (bond->slave_cnt == 0) {
> + if (slave_dev->type != ARPHRD_ETHER)
> + bond_setup_by_slave(bond_dev, slave_dev);
> + } else if (bond_dev->type != slave_dev->type) {
> + printk(KERN_ERR DRV_NAME ": %s ether type (%d) is different from "
> + "other slaves (%d), can not enslave it.\n", slave_dev->name,
> + slave_dev->type, bond_dev->type);
> + res = -EINVAL;
> + goto err_undo_flags;
> + }
> +
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-31 14:36 ` Or Gerlitz
@ 2007-07-31 14:48 ` Michael S. Tsirkin
2007-07-31 14:57 ` [ofa-general] " Or Gerlitz
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2007-07-31 14:48 UTC (permalink / raw)
To: Or Gerlitz
Cc: Michael S. Tsirkin, netdev, Roland Dreier, fubar, davem, general
> Quoting Or Gerlitz <ogerlitz@voltaire.com>:
> Subject: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for?the?bonding driver
>
> Michael S. Tsirkin wrote:
> >>Quoting Or Gerlitz <ogerlitz@voltaire.com>:
>
> >>To be precise, bonding will copy all the symbols it copies today from
> >>the slave module (ipoib), see bond_setup_by_slave() in patch 3/7
>
> >Not really.
> >This copying of symbols is something that you added, isn't it?
> >So with this approach, it won't be needed.
>
> >It's always wrong to copy symbols from another module without
> >referencing it.
>
> Its the --first-- time you make this comment,
It's really a well known fact. That's where the crash
with modprobe -r comes from, right?
> please suggest a different approach,
I don't know, really - if you want to access a module, you really must get
a reference to it, or to the device.
How about adding the module pointer to struct net_device?
>the relevant code is below.
>+static void bond_setup_by_slave(struct net_device *bond_dev,
>+ struct net_device *slave_dev)
>+{
>+ bond_dev->hard_header = slave_dev->hard_header;
>+ bond_dev->rebuild_header = slave_dev->rebuild_header;
>+ bond_dev->hard_header_cache = slave_dev->hard_header_cache;
>+ bond_dev->header_cache_update = slave_dev->header_cache_update;
>+ bond_dev->hard_header_parse = slave_dev->hard_header_parse;
>+
>+ bond_dev->neigh_setup = slave_dev->neigh_setup;
>+
>+ bond_dev->type = slave_dev->type;
>+ bond_dev->hard_header_len = slave_dev->hard_header_len;
>+ bond_dev->addr_len = slave_dev->addr_len;
>+
>+ memcpy(bond_dev->broadcast, slave_dev->broadcast,
>+ slave_dev->addr_len);
>+}
>+
Hmm, it seems that switching to hard_header_cache as I suggested won't help at all.
I wonder: is bonding currently broken with devices that implement
hard_header_cache/header_cache_update?
--
MST
^ permalink raw reply [flat|nested] 20+ messages in thread
* [ofa-general] Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-31 14:48 ` Michael S. Tsirkin
@ 2007-07-31 14:57 ` Or Gerlitz
0 siblings, 0 replies; 20+ messages in thread
From: Or Gerlitz @ 2007-07-31 14:57 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, Roland Dreier, fubar, davem, general
Michael S. Tsirkin wrote:
>> Quoting Or Gerlitz <ogerlitz@voltaire.com>:
>> Michael S. Tsirkin wrote:
>>> It's always wrong to copy symbols from another module without
>>> referencing it.
>> Its the --first-- time you make this comment,
> It's really a well known fact. That's where the crash
> with modprobe -r comes from, right?
no, the crash --only-- comes from the neighbour cleanup function being
called while ipoib is now probed out of the kernel. The other symbols
are not problematic. I got positive feedback that this --is-- the
problem in the previous posts and from Roland during my Sonoma presentation.
>> please suggest a different approach,
> I don't know, really - if you want to access a module, you really must get
> a reference to it, or to the device.
> How about adding the module pointer to struct net_device?
I think there used to be there owner field of type struct module and it
was removed... we will check that.
>> the relevant code is below.
>
>> +static void bond_setup_by_slave(struct net_device *bond_dev,
>> + struct net_device *slave_dev)
>> +{
>> + bond_dev->hard_header = slave_dev->hard_header;
>> + bond_dev->rebuild_header = slave_dev->rebuild_header;
>> + bond_dev->hard_header_cache = slave_dev->hard_header_cache;
>> + bond_dev->header_cache_update = slave_dev->header_cache_update;
>> + bond_dev->hard_header_parse = slave_dev->hard_header_parse;
>> +
>> + bond_dev->neigh_setup = slave_dev->neigh_setup;
>> +
>> + bond_dev->type = slave_dev->type;
>> + bond_dev->hard_header_len = slave_dev->hard_header_len;
>> + bond_dev->addr_len = slave_dev->addr_len;
>> +
>> + memcpy(bond_dev->broadcast, slave_dev->broadcast,
>> + slave_dev->addr_len);
>> +}
>> +
>
> Hmm, it seems that switching to hard_header_cache as I suggested won't help at all.
why? please clarify.
> I wonder: is bonding currently broken with devices that implement
> hard_header_cache/header_cache_update?
I don't think so. Note that bond_setup_by_slave is only called for
slaves whose ether type is --not-- Ethernet.
Or.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-07-31 14:22 ` [ofa-general] " Michael S. Tsirkin
2007-07-31 14:36 ` Or Gerlitz
@ 2007-08-01 14:12 ` Moni Shoua
2007-08-01 16:10 ` Michael S. Tsirkin
1 sibling, 1 reply; 20+ messages in thread
From: Moni Shoua @ 2007-08-01 14:12 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Or Gerlitz, netdev, Roland Dreier, fubar, general, davem
> It's always wrong to copy symbols from another module without
> referencing it.
Michael,
It seems like the preferred approach is to prevent ib_ipoib from being
unloaded while bonding is on top it, right?
It seems like it would handle all safety issues (not just neigh cleanup).
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
2007-08-01 14:12 ` [ofa-general] Re: " Moni Shoua
@ 2007-08-01 16:10 ` Michael S. Tsirkin
0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2007-08-01 16:10 UTC (permalink / raw)
To: Moni Shoua
Cc: netdev, Roland Dreier, fubar, Michael S. Tsirkin, general, davem
> Quoting Moni Shoua <monisonlists@gmail.com>:
> Subject: Re: [ofa-general] Re: Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for?the bonding driver
>
>
> > It's always wrong to copy symbols from another module without
> > referencing it.
>
> Michael,
> It seems like the preferred approach is to prevent ib_ipoib from being
> unloaded while bonding is on top it, right?
> It seems like it would handle all safety issues (not just neigh cleanup).
Donnu about preferred, but that'll work I think.
--
MST
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2007-08-01 16:10 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-30 12:37 [ofa-general] [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Moni Shoua
2007-07-30 12:48 ` [ofa-general] [PATCH V3 1/7] IB/ipoib: Bound the net device to the ipoib_neigh structue Moni Shoua
2007-07-30 12:49 ` [ofa-general] [PATCH V3 2/7] IB/ipoib: Verify address handle validity on send Moni Shoua
2007-07-30 12:51 ` [ofa-general] [PATCH V3 3/7] net/bonding: Enable bonding to enslave non ARPHRD_ETHER Moni Shoua
2007-07-30 12:52 ` [ofa-general] [PATCH V3 4/7] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address() Moni Shoua
2007-07-30 12:54 ` [ofa-general] [PATCH V3 5/7] net/bonding: Enable IP multicast for bonding IPoIB devices Moni Shoua
2007-07-30 12:54 ` [ofa-general] [PATCH V3 6/7] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device Moni Shoua
2007-07-30 12:56 ` [ofa-general] [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure Moni Shoua
2007-07-30 20:29 ` [ofa-general] " Jay Vosburgh
2007-07-31 13:33 ` Moni Shoua
2007-07-30 21:20 ` [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver Roland Dreier
2007-07-31 13:44 ` [ofa-general] " Moni Shoua
2007-07-31 14:04 ` [ofa-general] " Michael S. Tsirkin
2007-07-31 14:19 ` Or Gerlitz
2007-07-31 14:22 ` [ofa-general] " Michael S. Tsirkin
2007-07-31 14:36 ` Or Gerlitz
2007-07-31 14:48 ` Michael S. Tsirkin
2007-07-31 14:57 ` [ofa-general] " Or Gerlitz
2007-08-01 14:12 ` [ofa-general] Re: " Moni Shoua
2007-08-01 16:10 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).