Netdev List
* [PATCH 01/11] IB/ipoib: Export call to call_netdevice_notifiers and add new private flag
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <11898132301664-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monisonlists@gmail.com>

Export call_netdevice_notifiers(), the wrapper around raw_notifier_call_chain(),
so that modules can send notifications of netdev events to the netdev_chain.
Add IFF_SLAVE_DETACH to the list of priv_flags for net_device.
This flag is set by a slave that is about to unregister from the kernel.

Both changes are used by bonding slaves that wish to inform the bonding master
of their upcoming detachment.
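
The flow described here can be sketched in user space as follows. This is only
an illustration, not kernel code: the toy_* names, the fixed-size chain and the
TOY_NETDEV_CHANGE event code are assumptions made for the sketch; only the
IFF_SLAVE_DETACH value comes from the patch.

```c
#include <stddef.h>

#define IFF_SLAVE_DETACH  0x80	/* value from the patch */
#define TOY_NETDEV_CHANGE 4UL	/* hypothetical event code for this sketch */
#define TOY_MAX_NOTIFIERS 8

struct toy_netdev {
	const char *name;
	unsigned short priv_flags;
};

typedef int (*toy_notifier_fn)(unsigned long event, void *dev);

static toy_notifier_fn toy_chain[TOY_MAX_NOTIFIERS];
static size_t toy_chain_len;
static int toy_released;	/* detach notices seen by the "master" */

/* Add a listener to the chain; returns -1 when the chain is full. */
int toy_register_notifier(toy_notifier_fn fn)
{
	if (toy_chain_len == TOY_MAX_NOTIFIERS)
		return -1;
	toy_chain[toy_chain_len++] = fn;
	return 0;
}

/* Stand-in for the exported call: run every listener on the chain. */
void toy_call_notifiers(unsigned long event, void *dev)
{
	size_t i;

	for (i = 0; i < toy_chain_len; i++)
		toy_chain[i](event, dev);
}

/* Master side: react only when the slave has flagged its detachment. */
int toy_bond_event(unsigned long event, void *v)
{
	struct toy_netdev *dev = v;

	if (event == TOY_NETDEV_CHANGE && (dev->priv_flags & IFF_SLAVE_DETACH))
		toy_released++;
	return 0;
}

/* Slave side: set the flag first, then notify the chain. */
void toy_slave_announce_detach(struct toy_netdev *dev)
{
	dev->priv_flags |= IFF_SLAVE_DETACH;
	toy_call_notifiers(TOY_NETDEV_CHANGE, dev);
}
```

A change event without the flag is ignored by the listener; only the flagged
event counts as a detach notice.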

Signed-off-by: Moni Shoua <monis@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 include/linux/if.h |    1 +
 net/core/dev.c     |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/if.h b/include/linux/if.h
index 32bf419..b302b22 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -61,6 +61,7 @@
 #define IFF_MASTER_ALB	0x10		/* bonding master, balance-alb.	*/
 #define IFF_BONDING	0x20		/* bonding master or slave	*/
 #define IFF_SLAVE_NEEDARP 0x40		/* need ARPs for validation	*/
+#define IFF_SLAVE_DETACH 0x80		/* slave is about to unregister */
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
diff --git a/net/core/dev.c b/net/core/dev.c
index a76021c..5322add 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1148,6 +1148,7 @@ int call_netdevice_notifiers(unsigned long val, void *v)
 {
 	return raw_notifier_call_chain(&netdev_chain, val, v);
 }
+EXPORT_SYMBOL(call_netdevice_notifiers);
 
 /* When > 0 there are consumers of rx skb time stamps */
 static atomic_t netstamp_needed = ATOMIC_INIT(0);
-- 
1.5.2-rc2.GIT



* [PATCH 03/11] IB/ipoib: Bind the net device to the ipoib_neigh structure
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <1189813234208-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monis@voltaire.com>

IPoIB uses a two-layer neighbouring scheme: for each struct neighbour whose
device is an ipoib one, there is a companion struct ipoib_neigh, which is
created on demand in the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When the bonding driver is used, neighbours are created by the net stack on
behalf of the bonding (master) device. In the tx flow the bonding code gets an
skb whose skb->dev points to the master device; it changes the skb to point to
the slave device and calls the slave's hard_start_xmit function.

Under this scheme, the assumption made by ipoib_neigh_destructor, that for each
struct neighbour it gets, n->dev is an ipoib device and hence
netdev_priv(n->dev) can be cast to struct ipoib_dev_priv, is wrong.

To fix it, this patch adds a dev field to struct ipoib_neigh, which is used
instead of the struct neighbour dev field when n->dev->flags has the
IFF_MASTER bit set.
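
The device selection described above can be sketched in user space as follows.
The toy_* types and TOY_IFF_MASTER are stand-ins invented for this sketch; they
only mirror the shape of the kernel structures and are not the real
definitions.

```c
#include <stddef.h>

#define TOY_IFF_MASTER 0x400	/* stand-in for the kernel's IFF_MASTER */

struct toy_dev {
	const char *name;
	unsigned int flags;
};

/* Second-layer neighbour: remembers the device it was created on. */
struct toy_ipoib_neigh {
	struct toy_dev *dev;
};

/* First-layer neighbour: may belong to the bonding master. */
struct toy_neighbour {
	struct toy_dev *dev;
	struct toy_ipoib_neigh *ipoib;	/* NULL if never allocated */
};

/*
 * Return the device whose private data may be used for cleanup, or NULL
 * when the neighbour belongs to a master and has no ipoib_neigh attached.
 */
struct toy_dev *toy_cleanup_dev(const struct toy_neighbour *n)
{
	if (n->dev->flags & TOY_IFF_MASTER) {
		if (!n->ipoib)
			return NULL;	/* nothing to clean up */
		return n->ipoib->dev;	/* the slave, not the master */
	}
	return n->dev;
}
```

For a neighbour created on behalf of a bond, the function returns the slave
recorded at allocation time rather than the master.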

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |    4 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   17 +++++++++++++++--
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    3 ++-
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 285c143..a13730c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -328,6 +328,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 
 	struct neighbour   *neighbour;
+	struct net_device *dev;
 
 	struct list_head    list;
 };
@@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh)
 				     INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+				      struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 97a9661..cb26cfd 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -511,7 +511,7 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev)
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
 
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -830,6 +830,17 @@ static void ipoib_neigh_cleanup(struct neighbour *n)
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
 
+	if (n->dev->flags & IFF_MASTER) {
+		/* n->dev is not an IPoIB device and we have
+			to take priv from elsewhere */
+		neigh = *to_ipoib_neigh(n);
+		if (neigh) {
+			priv = netdev_priv(neigh->dev);
+			ipoib_dbg(priv, "neigh_destructor for bonding device: %s\n",
+				  n->dev->name);
+		} else
+			return;
+	}
 	ipoib_dbg(priv,
 		  "neigh_cleanup for %06x " IPOIB_GID_FMT "\n",
 		  IPOIB_QPN(n->ha),
@@ -851,7 +862,8 @@ static void ipoib_neigh_cleanup(struct neighbour *n)
 		ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+				      struct net_device *dev)
 {
 	struct ipoib_neigh *neigh;
 
@@ -860,6 +872,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
 		return NULL;
 
 	neigh->neighbour = neighbour;
+	neigh->dev = dev;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
 	ipoib_cm_set(neigh, NULL);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index aae3670..ed0f0bb 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -727,7 +727,8 @@ out:
 		if (skb->dst            &&
 		    skb->dst->neighbour &&
 		    !*to_ipoib_neigh(skb->dst->neighbour)) {
-			struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+			struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour,
+									skb->dev);
 
 			if (neigh) {
 				kref_get(&mcast->ah->ref);
-- 
1.5.2-rc2.GIT



* [PATCH 04/11] IB/ipoib: Verify address handle validity on send
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <11898132352341-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monis@voltaire.com>

When the bonding device senses a carrier loss on its active slave, it replaces
that slave with a new one. Between the time the carrier of an IPoIB device goes
down and the time its ipoib_neigh is destroyed, the bonding driver may send a
packet on the new slave using the old ipoib_neigh.
This patch detects that case and prevents it from happening.
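
A user-space sketch of the check, assuming a 20-byte hardware address whose
last 16 bytes carry the GID (the diff below compares from offset 4). The toy_*
names are hypothetical and exist only for this illustration.

```c
#include <string.h>

#define TOY_GID_LEN 16

struct toy_dev {
	const char *name;
};

struct toy_neigh {
	unsigned char dgid[TOY_GID_LEN];	/* GID cached at creation */
	const struct toy_dev *dev;		/* device it was created on */
};

/*
 * A cached neighbour entry is stale when either the destination GID
 * changed or the packet is being sent on a different device than the
 * one the entry was created on (e.g. after a bonding fail-over).
 */
int toy_neigh_is_stale(const struct toy_neigh *neigh,
		       const unsigned char *ha,
		       const struct toy_dev *xmit_dev)
{
	return memcmp(neigh->dgid, ha + 4, TOY_GID_LEN) != 0 ||
	       neigh->dev != xmit_dev;
}
```

The second condition is the part this patch adds; the GID comparison was
already in place.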

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cb26cfd..6c4e9fb 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -686,9 +686,10 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 				goto out;
 			}
 		} else if (neigh->ah) {
-			if (unlikely(memcmp(&neigh->dgid.raw,
+			if (unlikely((memcmp(&neigh->dgid.raw,
 					    skb->dst->neighbour->ha + 4,
-					    sizeof(union ib_gid)))) {
+					    sizeof(union ib_gid))) ||
+						 (neigh->dev != dev))) {
 				spin_lock(&priv->lock);
 				/*
 				 * It's safe to call ipoib_put_ah() inside
-- 
1.5.2-rc2.GIT



* [PATCH 05/11] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <11898132372856-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monis@voltaire.com>

This patch changes some of the bond netdevice attributes and functions
to be those of the active slave when the enslaved device is not of
ARPHRD_ETHER type. Basically it overrides the settings made by ether_setup(),
which are netdevice **type** dependent and hence might not be appropriate for
devices of other types. It also enforces mutual exclusion between bonding
slaves of dissimilar ether types, as was concluded in the v1 discussion.
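
The enslave-time rule can be sketched in user space like this. The toy_bond
structure and toy_enslave() are invented for the illustration and model only
the type check, not the rest of bond_enslave().

```c
#include <errno.h>

#define TOY_ARPHRD_ETHER	1
#define TOY_ARPHRD_INFINIBAND	32

struct toy_bond {
	unsigned short type;	/* starts as TOY_ARPHRD_ETHER */
	int slave_cnt;
};

/*
 * The first slave decides the bond type; every later slave must match
 * it, otherwise the enslave request is rejected.
 */
int toy_enslave(struct toy_bond *bond, unsigned short slave_type)
{
	if (bond->slave_cnt == 0) {
		if (slave_type != TOY_ARPHRD_ETHER)
			bond->type = slave_type;	/* set up by slave */
	} else if (bond->type != slave_type) {
		return -EINVAL;
	}
	bond->slave_cnt++;
	return 0;
}
```

With an InfiniBand device as first slave, a later Ethernet device is refused
and the slave count stays unchanged.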

An IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a
3-byte IB QP (Queue Pair) number and the 16-byte IB port GID (Global ID) of the
port this IPoIB device is bound to. The QP is a resource created by the IB HW
and the GID is an identifier burned into the HCA (I have omitted here some
details which are not important for the bonding RFC).
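
As an illustration of that layout, the following user-space sketch splits a
20-byte address into its parts. It assumes that the QP number occupies bytes
1..3 and the GID bytes 4..19, which matches the "ha + 4" offset used for the
GID in patch 04; the leading byte is one of the omitted details and is ignored
here. All toy_* names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define TOY_IPOIB_ALEN	20
#define TOY_GID_LEN	16

struct toy_ipoib_addr {
	uint32_t qpn;				/* 24-bit QP number */
	unsigned char gid[TOY_GID_LEN];		/* port GID */
};

/* Split a raw hardware address into QP number and GID. */
void toy_parse_ipoib_addr(const unsigned char ha[TOY_IPOIB_ALEN],
			  struct toy_ipoib_addr *out)
{
	/* bytes 1..3 hold the QPN in big-endian order (assumed layout) */
	out->qpn = ((uint32_t)ha[1] << 16) | ((uint32_t)ha[2] << 8) | ha[3];
	memcpy(out->gid, ha + 4, TOY_GID_LEN);
}
```

The point for bonding is simply that such an address is 20 bytes long, so none
of the 6-byte Ethernet assumptions made by ether_setup() hold for it.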

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/bonding/bond_main.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1afda32..13ec73d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1237,6 +1237,26 @@ static int bond_compute_features(struct bonding *bond)
 	return 0;
 }
 
+
+static void bond_setup_by_slave(struct net_device *bond_dev,
+				struct net_device *slave_dev)
+{
+	bond_dev->hard_header	        = slave_dev->hard_header;
+	bond_dev->rebuild_header        = slave_dev->rebuild_header;
+	bond_dev->hard_header_cache	= slave_dev->hard_header_cache;
+	bond_dev->header_cache_update   = slave_dev->header_cache_update;
+	bond_dev->hard_header_parse	= slave_dev->hard_header_parse;
+
+	bond_dev->neigh_setup           = slave_dev->neigh_setup;
+
+	bond_dev->type		    = slave_dev->type;
+	bond_dev->hard_header_len   = slave_dev->hard_header_len;
+	bond_dev->addr_len	    = slave_dev->addr_len;
+
+	memcpy(bond_dev->broadcast, slave_dev->broadcast,
+		slave_dev->addr_len);
+}
+
 /* enslave device <slave> to bond device <master> */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		goto err_undo_flags;
 	}
 
+	/* set bonding device ether type by slave - bonding netdevices are
+	 * created with ether_setup, so when the slave type is not ARPHRD_ETHER
+	 * there is a need to override some of the type dependent attribs/funcs.
+	 *
+	 * bond ether type mutual exclusion - don't allow slaves of dissimilar
+	 * ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond
+	 */
+	if (bond->slave_cnt == 0) {
+		if (slave_dev->type != ARPHRD_ETHER)
+			bond_setup_by_slave(bond_dev, slave_dev);
+	} else if (bond_dev->type != slave_dev->type) {
+		printk(KERN_ERR DRV_NAME ": %s ether type (%d) is different "
+			"from other slaves (%d), can not enslave it.\n",
+			slave_dev->name,
+			slave_dev->type, bond_dev->type);
+			res = -EINVAL;
+			goto err_undo_flags;
+	}
+
 	if (slave_dev->set_mac_address == NULL) {
 		printk(KERN_ERR DRV_NAME
 			": %s: Error: The slave device you specified does "
-- 
1.5.2-rc2.GIT



* [PATCH 06/11] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <11898132411426-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monis@voltaire.com>

This patch allows enslaving netdevices which do not support
the set_mac_address() function. In that case the bond MAC address is that
of the active slave, and remote peers are notified of the MAC address
(neighbour) change by a gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).
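
The policy can be sketched in user space as below. The toy_* names and the
reduced toy_bond structure are assumptions of the sketch; only the
do_set_mac_addr flag and the error code mirror the patch.

```c
#include <errno.h>
#include <string.h>

#define TOY_MAX_ADDR_LEN 32

struct toy_bond {
	int slave_cnt;
	int do_set_mac_addr;	/* 1 on startup */
	unsigned char dev_addr[TOY_MAX_ADDR_LEN];
};

/*
 * Decide whether a slave may join. A first slave that cannot set its
 * MAC address switches the bond to "follow the active slave" mode; a
 * later one is refused if the bond still rewrites slave addresses.
 */
int toy_check_mac_policy(struct toy_bond *bond, int slave_can_set_mac)
{
	if (slave_can_set_mac)
		return 0;
	if (bond->slave_cnt == 0) {
		bond->do_set_mac_addr = 0;
		return 0;
	}
	return bond->do_set_mac_addr ? -EOPNOTSUPP : 0;
}

/* On fail-over the bond takes over the new active slave's address. */
void toy_failover(struct toy_bond *bond, const unsigned char *new_addr,
		  size_t addr_len)
{
	if (!bond->do_set_mac_addr)
		memcpy(bond->dev_addr, new_addr, addr_len);
}
```

A bond that rewrites slave addresses keeps its own address across fail-over;
one that does not adopts the address of whichever slave becomes active.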

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/bonding/bond_main.c |   87 ++++++++++++++++++++++++++------------
 drivers/net/bonding/bonding.h   |    1 +
 2 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 13ec73d..d937bae 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1095,6 +1095,14 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 		if (new_active) {
 			bond_set_slave_active_flags(new_active);
 		}
+
+		/* when bonding does not set the slave MAC address, the bond MAC
+		 * address is the one of the active slave.
+		 */
+		if (new_active && !bond->do_set_mac_addr)
+			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
+				new_active->dev->addr_len);
+
 		bond_send_gratuitous_arp(bond);
 	}
 }
@@ -1351,13 +1359,22 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	}
 
 	if (slave_dev->set_mac_address == NULL) {
-		printk(KERN_ERR DRV_NAME
-			": %s: Error: The slave device you specified does "
-			"not support setting the MAC address. "
-			"Your kernel likely does not support slave "
-			"devices.\n", bond_dev->name);
-  		res = -EOPNOTSUPP;
-		goto err_undo_flags;
+		if (bond->slave_cnt == 0) {
+			printk(KERN_WARNING DRV_NAME
+				": %s: Warning: The first slave device you "
+				"specified does not support setting the MAC "
+				"address. This bond MAC address would be that "
+				"of the active slave.\n", bond_dev->name);
+			bond->do_set_mac_addr = 0;
+		} else if (bond->do_set_mac_addr) {
+			printk(KERN_ERR DRV_NAME
+				": %s: Error: The slave device you specified "
+				"does not support setting the MAC address, "
+				"but this bond uses this practice. \n"
+				, bond_dev->name);
+			res = -EOPNOTSUPP;
+			goto err_undo_flags;
+		}
 	}
 
 	new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
@@ -1378,16 +1395,18 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	 */
 	memcpy(new_slave->perm_hwaddr, slave_dev->dev_addr, ETH_ALEN);
 
-	/*
-	 * Set slave to master's mac address.  The application already
-	 * set the master's mac address to that of the first slave
-	 */
-	memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
-	addr.sa_family = slave_dev->type;
-	res = dev_set_mac_address(slave_dev, &addr);
-	if (res) {
-		dprintk("Error %d calling set_mac_address\n", res);
-		goto err_free;
+	if (bond->do_set_mac_addr) {
+		/*
+		 * Set slave to master's mac address.  The application already
+		 * set the master's mac address to that of the first slave
+		 */
+		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
+		addr.sa_family = slave_dev->type;
+		res = dev_set_mac_address(slave_dev, &addr);
+		if (res) {
+			dprintk("Error %d calling set_mac_address\n", res);
+			goto err_free;
+		}
 	}
 
 	res = netdev_set_master(slave_dev, bond_dev);
@@ -1612,9 +1631,11 @@ err_close:
 	dev_close(slave_dev);
 
 err_restore_mac:
-	memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
-	addr.sa_family = slave_dev->type;
-	dev_set_mac_address(slave_dev, &addr);
+	if (bond->do_set_mac_addr) {
+		memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
+		addr.sa_family = slave_dev->type;
+		dev_set_mac_address(slave_dev, &addr);
+	}
 
 err_free:
 	kfree(new_slave);
@@ -1792,10 +1813,12 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 	/* close slave before restoring its mac address */
 	dev_close(slave_dev);
 
-	/* restore original ("permanent") mac address */
-	memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
-	addr.sa_family = slave_dev->type;
-	dev_set_mac_address(slave_dev, &addr);
+	if (bond->do_set_mac_addr) {
+		/* restore original ("permanent") mac address */
+		memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
+		addr.sa_family = slave_dev->type;
+		dev_set_mac_address(slave_dev, &addr);
+	}
 
 	slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
 				   IFF_SLAVE_INACTIVE | IFF_BONDING |
@@ -1882,10 +1905,12 @@ static int bond_release_all(struct net_device *bond_dev)
 		/* close slave before restoring its mac address */
 		dev_close(slave_dev);
 
-		/* restore original ("permanent") mac address*/
-		memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
-		addr.sa_family = slave_dev->type;
-		dev_set_mac_address(slave_dev, &addr);
+		if (bond->do_set_mac_addr) {
+			/* restore original ("permanent") mac address*/
+			memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
+			addr.sa_family = slave_dev->type;
+			dev_set_mac_address(slave_dev, &addr);
+		}
 
 		slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
 					   IFF_SLAVE_INACTIVE);
@@ -3922,6 +3947,9 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 
 	dprintk("bond=%p, name=%s\n", bond, (bond_dev ? bond_dev->name : "None"));
 
+	if (!bond->do_set_mac_addr)
+		return -EOPNOTSUPP;
+
 	if (!is_valid_ether_addr(sa->sa_data)) {
 		return -EADDRNOTAVAIL;
 	}
@@ -4312,6 +4340,9 @@ static int bond_init(struct net_device *bond_dev, struct bond_params *params)
 	bond_create_proc_entry(bond);
 #endif
 
+	/* set do_set_mac_addr to true on startup */
+	bond->do_set_mac_addr = 1;
+
 	list_add_tail(&bond->bond_list, &bond_dev_list);
 
 	return 0;
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 6dcbd25..700d40a 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -185,6 +185,7 @@ struct bonding {
 	struct   timer_list mii_timer;
 	struct   timer_list arp_timer;
 	s8       kill_timers;
+	s8       do_set_mac_addr;
 	struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
 	struct   proc_dir_entry *proc_entry;
-- 
1.5.2-rc2.GIT



* [PATCH 07/11] net/bonding: Enable IP multicast for bonding IPoIB devices
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <1189813242354-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monis@voltaire.com>

Allow enslaving devices when the bonding device is not up. In the discussion
of the previous post this seemed to be the cleanest way to go, and it is not
expected to cause instabilities.

Normally, the bonding driver is UP before any enslavement takes place.
Once a netdevice is UP, the network stack has it join some multicast groups
(e.g. the all-hosts group 224.0.0.1). Now, since ether_setup() has set the
bonding device type to ARPHRD_ETHER and the address length to ETH_ALEN, the net
core code computes a wrong multicast link address. This is because
ip_eth_mc_map() is called for those joins, whereas for multicast joins taking
place after the enslavement another ip_xxx_mc_map() is called (e.g.
ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND).
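
To make the mismatch concrete, here is a user-space sketch of the standard
IPv4-to-Ethernet multicast mapping (01:00:5e followed by the low 23 bits of
the group address). It is a re-implementation for illustration, not the
kernel's ip_eth_mc_map(); in particular it takes the group in host byte
order, and the toy_ name is hypothetical.

```c
#include <stdint.h>

/*
 * Map an IPv4 multicast group (host byte order) to a 6-byte Ethernet
 * multicast address: fixed prefix 01:00:5e plus the low 23 bits.
 */
void toy_ip_eth_mc_map(uint32_t group, unsigned char mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* top bit of this byte is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}
```

The result is always a 6-byte address, which cannot be a valid link-layer
multicast address for a bond whose slaves use 20-byte IPoIB addresses. That is
why joins made before the enslavement end up with the wrong mapping.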

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/bonding/bond_main.c  |    5 +++--
 drivers/net/bonding/bond_sysfs.c |    6 ++----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d937bae..a1fe87a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1285,8 +1285,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 
 	/* bond must be initialized by bond_open() before enslaving */
 	if (!(bond_dev->flags & IFF_UP)) {
-		dprintk("Error, master_dev is not up\n");
-		return -EPERM;
+		printk(KERN_WARNING DRV_NAME
+			" %s: master_dev is not up in bond_enslave\n",
+			bond_dev->name);
 	}
 
 	/* already enslaved */
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 9afd172..073841f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -265,11 +265,9 @@ static ssize_t bonding_store_slaves(struct device *d,
 
 	/* Quick sanity check -- is the bond interface up? */
 	if (!(bond->dev->flags & IFF_UP)) {
-		printk(KERN_ERR DRV_NAME
-		       ": %s: Unable to update slaves because interface is down.\n",
+		printk(KERN_WARNING DRV_NAME
+		       ": %s: doing slave updates when interface is down.\n",
 		       bond->dev->name);
-		ret = -EPERM;
-		goto out;
 	}
 
 	/* Note:  We can't hold bond->lock here, as bond_create grabs it. */
-- 
1.5.2-rc2.GIT



* [PATCH 08/11] net/bonding: Handle wrong assumptions that slave is always an Ethernet device
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <11898132441599-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monis@voltaire.com>

bonding sometimes uses Ethernet constants (such as MTU and address length) which
are wrong when it enslaves non-Ethernet devices (such as InfiniBand).

Signed-off-by: Moni Shoua <monis@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/bonding/bond_main.c  |    3 ++-
 drivers/net/bonding/bond_sysfs.c |   19 +++++++++++++------
 drivers/net/bonding/bonding.h    |    1 +
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a1fe87a..9ff2cf6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1224,7 +1224,8 @@ static int bond_compute_features(struct bonding *bond)
 	struct slave *slave;
 	struct net_device *bond_dev = bond->dev;
 	unsigned long features = bond_dev->features;
-	unsigned short max_hard_header_len = ETH_HLEN;
+	unsigned short max_hard_header_len = max((u16)ETH_HLEN,
+						bond_dev->hard_header_len);
 	int i;
 
 	features &= ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES);
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 073841f..71db5d9 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -163,9 +163,7 @@ static ssize_t bonding_store_bonds(struct class *cls, const char *buffer, size_t
 				printk(KERN_INFO DRV_NAME
 					": %s is being deleted...\n",
 					bond->dev->name);
-				bond_deinit(bond->dev);
-		        	bond_destroy_sysfs_entry(bond);
-				unregister_netdevice(bond->dev);
+				bond_destroy(bond);
 				rtnl_unlock();
 				goto out;
 			}
@@ -259,6 +257,7 @@ static ssize_t bonding_store_slaves(struct device *d,
 	char command[IFNAMSIZ + 1] = { 0, };
 	char *ifname;
 	int i, res, found, ret = count;
+	u32 original_mtu;
 	struct slave *slave;
 	struct net_device *dev = NULL;
 	struct bonding *bond = to_bond(d);
@@ -324,6 +323,7 @@ static ssize_t bonding_store_slaves(struct device *d,
 		}
 
 		/* Set the slave's MTU to match the bond */
+		original_mtu = dev->mtu;
 		if (dev->mtu != bond->dev->mtu) {
 			if (dev->change_mtu) {
 				res = dev->change_mtu(dev,
@@ -338,6 +338,9 @@ static ssize_t bonding_store_slaves(struct device *d,
 		}
 		rtnl_lock();
 		res = bond_enslave(bond->dev, dev);
+		bond_for_each_slave(bond, slave, i)
+			if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0)
+				slave->original_mtu = original_mtu;
 		rtnl_unlock();
 		if (res) {
 			ret = res;
@@ -350,13 +353,17 @@ static ssize_t bonding_store_slaves(struct device *d,
 		bond_for_each_slave(bond, slave, i)
 			if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
 				dev = slave->dev;
+				original_mtu = slave->original_mtu;
 				break;
 			}
 		if (dev) {
 			printk(KERN_INFO DRV_NAME ": %s: Removing slave %s\n",
 				bond->dev->name, dev->name);
 			rtnl_lock();
-			res = bond_release(bond->dev, dev);
+			if (bond->setup_by_slave)
+				res = bond_release_and_destroy(bond->dev, dev);
+			else
+				res = bond_release(bond->dev, dev);
 			rtnl_unlock();
 			if (res) {
 				ret = res;
@@ -364,9 +371,9 @@ static ssize_t bonding_store_slaves(struct device *d,
 			}
 			/* set the slave MTU to the default */
 			if (dev->change_mtu) {
-				dev->change_mtu(dev, 1500);
+				dev->change_mtu(dev, original_mtu);
 			} else {
-				dev->mtu = 1500;
+				dev->mtu = original_mtu;
 			}
 		}
 		else {
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 700d40a..b7b4f4a 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -156,6 +156,7 @@ struct slave {
 	s8     link;    /* one of BOND_LINK_XXXX */
 	s8     state;   /* one of BOND_STATE_XXXX */
 	u32    original_flags;
+	u32    original_mtu;
 	u32    link_failure_count;
 	u16    speed;
 	u8     duplex;
-- 
1.5.2-rc2.GIT



* [PATCH 9/11] net/bonding: Delay sending of gratuitous ARP to avoid failure
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <11898132452802-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monis@voltaire.com>

Delay sending a gratuitous ARP when the LINK_STATE_LINKWATCH_PENDING bit
in the dev->state field is set. This improves the chances that the ARP packet
is transmitted.
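
The deferral can be sketched as a small user-space state machine. The toy_bond
structure, the link_pending field and the counter are invented for the sketch;
send_grat_arp is the only name taken from the patch.

```c
struct toy_bond {
	int link_pending;	/* models LINK_STATE_LINKWATCH_PENDING */
	int send_grat_arp;	/* 1 while an ARP is owed */
	int arps_sent;		/* counts ARPs actually sent */
};

/* Fail-over path: send now if possible, otherwise remember the debt. */
void toy_on_failover(struct toy_bond *bond)
{
	if (bond->link_pending)
		bond->send_grat_arp = 1;
	else
		bond->arps_sent++;
}

/* Periodic monitor: pay the debt once the link event has settled. */
void toy_monitor_tick(struct toy_bond *bond)
{
	if (!bond->send_grat_arp)
		return;
	if (bond->link_pending)
		return;		/* not yet, try again on the next tick */
	bond->arps_sent++;
	bond->send_grat_arp = 0;
}
```

While the link event is pending, ticks do nothing; the first tick after it
clears sends exactly one ARP and resets the flag.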

Signed-off-by: Moni Shoua <monis@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/bonding/bond_main.c |   24 +++++++++++++++++++++---
 drivers/net/bonding/bonding.h   |    1 +
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9ff2cf6..dfbfb00 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1102,8 +1102,14 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 		if (new_active && !bond->do_set_mac_addr)
 			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
 				new_active->dev->addr_len);
-
-		bond_send_gratuitous_arp(bond);
+		if (bond->curr_active_slave &&
+			test_bit(__LINK_STATE_LINKWATCH_PENDING,
+					&bond->curr_active_slave->dev->state)) {
+			dprintk("delaying gratuitous arp on %s\n",
+				bond->curr_active_slave->dev->name);
+			bond->send_grat_arp = 1;
+		} else
+			bond_send_gratuitous_arp(bond);
 	}
 }
 
@@ -2083,6 +2089,17 @@ void bond_mii_monitor(struct net_device *bond_dev)
 	 * program could monitor the link itself if needed.
 	 */
 
+	if (bond->send_grat_arp) {
+		if (bond->curr_active_slave && test_bit(__LINK_STATE_LINKWATCH_PENDING,
+				&bond->curr_active_slave->dev->state))
+			dprintk("Needs to send gratuitous arp but not yet\n");
+		else {
+			dprintk("sending delayed gratuitous arp on %s\n",
+				bond->curr_active_slave->dev->name);
+			bond_send_gratuitous_arp(bond);
+			bond->send_grat_arp = 0;
+		}
+	}
 	read_lock(&bond->curr_slave_lock);
 	oldcurrent = bond->curr_active_slave;
 	read_unlock(&bond->curr_slave_lock);
@@ -2484,7 +2501,7 @@ static void bond_send_gratuitous_arp(struct bonding *bond)
 
 	if (bond->master_ip) {
 		bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
-				  bond->master_ip, 0);
+				bond->master_ip, 0);
 	}
 
 	list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
@@ -4293,6 +4310,7 @@ static int bond_init(struct net_device *bond_dev, struct bond_params *params)
 	bond->current_arp_slave = NULL;
 	bond->primary_slave = NULL;
 	bond->dev = bond_dev;
+	bond->send_grat_arp = 0;
 	INIT_LIST_HEAD(&bond->vlan_list);
 
 	/* Initialize the device entry points */
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index b7b4f4a..b1cdb1f 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -187,6 +187,7 @@ struct bonding {
 	struct   timer_list arp_timer;
 	s8       kill_timers;
 	s8       do_set_mac_addr;
+	s8	 send_grat_arp;
 	struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
 	struct   proc_dir_entry *proc_entry;
-- 
1.5.2-rc2.GIT



* [PATCH 10/11] net/bonding: Destroy bonding master when last slave is gone
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Moni Shoua
In-Reply-To: <11898132472055-git-send-email-fubar@us.ibm.com>

From: Moni Shoua <monis@voltaire.com>

When bonding enslaves non-Ethernet devices it takes pointers to functions
in the module that owns the slaves. In this case it becomes unsafe
to keep the bonding master registered after the last slave has been released,
because we don't know whether the pointers are still valid. Destroying the bond
when slave_cnt is zero ensures that these functions cannot be used anymore.
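
A user-space sketch of the release logic, under the assumption that "destroy"
can be modelled by a flag. The toy_* functions are hypothetical; slave_cnt and
setup_by_slave are the only names borrowed from the patch.

```c
struct toy_bond {
	int slave_cnt;
	int setup_by_slave;	/* 1 if callbacks were copied from a slave */
	int destroyed;
};

/* Release one slave; fails when there is none left. */
int toy_release(struct toy_bond *bond)
{
	if (bond->slave_cnt == 0)
		return -1;
	bond->slave_cnt--;
	return 0;
}

/* Release one slave and tear the bond down if it was the last one. */
int toy_release_and_destroy(struct toy_bond *bond)
{
	int ret = toy_release(bond);

	if (ret == 0 && bond->slave_cnt == 0)
		bond->destroyed = 1;
	return ret;
}

/* Pick the variant according to how the bond was set up. */
int toy_remove_slave(struct toy_bond *bond)
{
	if (bond->setup_by_slave)
		return toy_release_and_destroy(bond);
	return toy_release(bond);
}
```

An Ethernet bond survives the loss of its last slave, while a bond that copied
its callbacks from a slave is destroyed along with it.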

Signed-off-by: Moni Shoua <monis@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/bonding/bond_main.c |   45 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/bonding/bonding.h   |    3 ++
 2 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index dfbfb00..77caca3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1256,6 +1256,7 @@ static int bond_compute_features(struct bonding *bond)
 static void bond_setup_by_slave(struct net_device *bond_dev,
 				struct net_device *slave_dev)
 {
+	struct bonding *bond = bond_dev->priv;
 	bond_dev->hard_header	        = slave_dev->hard_header;
 	bond_dev->rebuild_header        = slave_dev->rebuild_header;
 	bond_dev->hard_header_cache	= slave_dev->hard_header_cache;
@@ -1270,6 +1271,7 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
 
 	memcpy(bond_dev->broadcast, slave_dev->broadcast,
 		slave_dev->addr_len);
+	bond->setup_by_slave = 1;
 }
 
 /* enslave device <slave> to bond device <master> */
@@ -1838,6 +1840,35 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 }
 
 /*
+* Destroy a bonding device.
+* Must be under rtnl_lock when this function is called.
+*/
+void bond_destroy(struct bonding *bond)
+{
+	bond_deinit(bond->dev);
+	bond_destroy_sysfs_entry(bond);
+	unregister_netdevice(bond->dev);
+}
+
+/*
+* First release a slave and then destroy the bond if no more slaves are left.
+* Must be under rtnl_lock when this function is called.
+*/
+int  bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev)
+{
+	struct bonding *bond = bond_dev->priv;
+	int ret;
+
+	ret = bond_release(bond_dev, slave_dev);
+	if ((ret == 0) && (bond->slave_cnt == 0)) {
+		printk(KERN_INFO DRV_NAME " %s: destroying bond for.\n",
+					bond_dev->name);
+		bond_destroy(bond);
+	}
+	return ret;
+}
+
+/*
  * This function releases all slaves.
  */
 static int bond_release_all(struct net_device *bond_dev)
@@ -3322,7 +3353,11 @@ static int bond_slave_netdev_event(unsigned long event, struct net_device *slave
 	switch (event) {
 	case NETDEV_UNREGISTER:
 		if (bond_dev) {
-			bond_release(bond_dev, slave_dev);
+			dprintk("slave %s unregisters\n", slave_dev->name);
+			if (bond->setup_by_slave)
+				bond_release_and_destroy(bond_dev, slave_dev);
+			else
+				bond_release(bond_dev, slave_dev);
 		}
 		break;
 	case NETDEV_CHANGE:
@@ -3331,6 +3366,13 @@ static int bond_slave_netdev_event(unsigned long event, struct net_device *slave
 		 * sets up a hierarchical bond, then rmmod's
 		 * one of the slave bonding devices?
 		 */
+		if (slave_dev->priv_flags & IFF_SLAVE_DETACH) {
+			dprintk("slave %s detaching\n", slave_dev->name);
+			if (bond->setup_by_slave)
+				bond_release_and_destroy(bond_dev, slave_dev);
+			else
+				bond_release(bond_dev, slave_dev);
+		}
 		break;
 	case NETDEV_DOWN:
 		/*
@@ -4311,6 +4353,7 @@ static int bond_init(struct net_device *bond_dev, struct bond_params *params)
 	bond->primary_slave = NULL;
 	bond->dev = bond_dev;
 	bond->send_grat_arp = 0;
+	bond->setup_by_slave = 0;
 	INIT_LIST_HEAD(&bond->vlan_list);
 
 	/* Initialize the device entry points */
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index b1cdb1f..ed0f587 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -188,6 +188,7 @@ struct bonding {
 	s8       kill_timers;
 	s8       do_set_mac_addr;
 	s8	 send_grat_arp;
+	s8	 setup_by_slave;
 	struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
 	struct   proc_dir_entry *proc_entry;
@@ -295,6 +296,8 @@ static inline void bond_unset_master_alb_flags(struct bonding *bond)
 struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr);
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev);
 int bond_create(char *name, struct bond_params *params, struct bonding **newbond);
+void bond_destroy(struct bonding *bond);
+int  bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev);
 void bond_deinit(struct net_device *bond_dev);
 int bond_create_sysfs(void);
 void bond_destroy_sysfs(void);
-- 
1.5.2-rc2.GIT


^ permalink raw reply related

* [PATCH 11/11] bonding: Optionally allow ethernet slaves to keep own MAC
From: Jay Vosburgh @ 2007-09-14 23:40 UTC (permalink / raw)
  To: netdev, rdreier, monis
  Cc: monisonlists, ogerlitz, jgarzik, davem, general, Jay Vosburgh
In-Reply-To: <11898132492312-git-send-email-fubar@us.ibm.com>

 	Update the "don't change MAC of slaves" functionality added in
previous changes to be a generic option, rather than something tied to IB
devices, as it's occasionally useful for regular ethernet devices as well.

	Adds "fail_over_mac" option (which is automatically enabled for IB
slaves), applicable only to active-backup mode.

	Includes documentation update.

	Updates bonding driver version to 3.2.0.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
---
 Documentation/networking/bonding.txt |   33 +++++++++++++++++++
 drivers/net/bonding/bond_main.c      |   57 +++++++++++++++++++++------------
 drivers/net/bonding/bond_sysfs.c     |   49 +++++++++++++++++++++++++++++
 drivers/net/bonding/bonding.h        |    6 ++--
 4 files changed, 121 insertions(+), 24 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 1da5666..1134062 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -281,6 +281,39 @@ downdelay
 	will be rounded down to the nearest multiple.  The default
 	value is 0.
 
+fail_over_mac
+
+	Specifies whether active-backup mode should set all slaves to
+	the same MAC address (the traditional behavior), or, when
+	enabled, change the bond's MAC address when changing the
+	active interface (i.e., fail over the MAC address itself).
+
+	Fail over MAC is useful for devices that cannot ever alter
+	their MAC address, or for devices that refuse incoming
+	broadcasts with their own source MAC (which interferes with
+	the ARP monitor).
+
+	The down side of fail over MAC is that every device on the
+	network must be updated via gratuitous ARP, vs. just updating
+	a switch or set of switches (which often takes place for any
+	traffic, not just ARP traffic, if the switch snoops incoming
+	traffic to update its tables) for the traditional method.  If
+	the gratuitous ARP is lost, communication may be disrupted.
+
+	When fail over MAC is used in conjunction with the mii monitor,
+	devices which assert link up prior to being able to actually
+	transmit and receive are particularly susceptible to loss of
+	the gratuitous ARP, and an appropriate updelay setting may be
+	required.
+
+	A value of 0 disables fail over MAC, and is the default.  A
+	value of 1 enables fail over MAC.  This option is enabled
+	automatically if the first slave added cannot change its MAC
+	address.  This option may be modified via sysfs only when no
+	slaves are present in the bond.
+
+	This option was added in bonding version 3.2.0.
+
 lacp_rate
 
 	Option specifying the rate in which we'll ask our link partner
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77caca3..c01ff9d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -97,6 +97,7 @@ static char *xmit_hash_policy = NULL;
 static int arp_interval = BOND_LINK_ARP_INTERV;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, };
 static char *arp_validate = NULL;
+static int fail_over_mac = 0;
 struct bond_params bonding_defaults;
 
 module_param(max_bonds, int, 0);
@@ -130,6 +131,8 @@ module_param_array(arp_ip_target, charp, NULL, 0);
 MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
 module_param(arp_validate, charp, 0);
 MODULE_PARM_DESC(arp_validate, "validate src/dst of ARP probes: none (default), active, backup or all");
+module_param(fail_over_mac, int, 0);
+MODULE_PARM_DESC(fail_over_mac, "For active-backup, do not set all slaves to the same MAC.  0 for off (default), 1 for on.");
 
 /*----------------------------- Global variables ----------------------------*/
 
@@ -1099,7 +1102,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 		/* when bonding does not set the slave MAC address, the bond MAC
 		 * address is the one of the active slave.
 		 */
-		if (new_active && !bond->do_set_mac_addr)
+		if (new_active && bond->params.fail_over_mac)
 			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
 				new_active->dev->addr_len);
 		if (bond->curr_active_slave &&
@@ -1371,16 +1374,16 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	if (slave_dev->set_mac_address == NULL) {
 		if (bond->slave_cnt == 0) {
 			printk(KERN_WARNING DRV_NAME
-				": %s: Warning: The first slave device you "
-				"specified does not support setting the MAC "
-				"address. This bond MAC address would be that "
-				"of the active slave.\n", bond_dev->name);
-			bond->do_set_mac_addr = 0;
-		} else if (bond->do_set_mac_addr) {
+			       ": %s: Warning: The first slave device "
+			       "specified does not support setting the MAC "
+			       "address. Enabling the fail_over_mac option.\n",
+			       bond_dev->name);
+			bond->params.fail_over_mac = 1;
+		} else if (!bond->params.fail_over_mac) {
 			printk(KERN_ERR DRV_NAME
-				": %s: Error: The slave device you specified "
-				"does not support setting the MAC addres,."
-				"but this bond uses this practice. \n"
+				": %s: Error: The slave device specified "
+				"does not support setting the MAC address, "
+				"but fail_over_mac is not enabled.\n"
 				, bond_dev->name);
 			res = -EOPNOTSUPP;
 			goto err_undo_flags;
@@ -1405,7 +1408,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	 */
 	memcpy(new_slave->perm_hwaddr, slave_dev->dev_addr, ETH_ALEN);
 
-	if (bond->do_set_mac_addr) {
+	if (!bond->params.fail_over_mac) {
 		/*
 		 * Set slave to master's mac address.  The application already
 		 * set the master's mac address to that of the first slave
@@ -1641,7 +1644,7 @@ err_close:
 	dev_close(slave_dev);
 
 err_restore_mac:
-	if (bond->do_set_mac_addr) {
+	if (!bond->params.fail_over_mac) {
 		memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
 		addr.sa_family = slave_dev->type;
 		dev_set_mac_address(slave_dev, &addr);
@@ -1823,7 +1826,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 	/* close slave before restoring its mac address */
 	dev_close(slave_dev);
 
-	if (bond->do_set_mac_addr) {
+	if (!bond->params.fail_over_mac) {
 		/* restore original ("permanent") mac address */
 		memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
 		addr.sa_family = slave_dev->type;
@@ -1944,7 +1947,7 @@ static int bond_release_all(struct net_device *bond_dev)
 		/* close slave before restoring its mac address */
 		dev_close(slave_dev);
 
-		if (bond->do_set_mac_addr) {
+		if (!bond->params.fail_over_mac) {
 			/* restore original ("permanent") mac address*/
 			memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
 			addr.sa_family = slave_dev->type;
@@ -3066,9 +3069,15 @@ static void bond_info_show_master(struct seq_file *seq)
 	curr = bond->curr_active_slave;
 	read_unlock(&bond->curr_slave_lock);
 
-	seq_printf(seq, "Bonding Mode: %s\n",
+	seq_printf(seq, "Bonding Mode: %s",
 		   bond_mode_name(bond->params.mode));
 
+	if (bond->params.mode == BOND_MODE_ACTIVEBACKUP &&
+	    bond->params.fail_over_mac)
+		seq_printf(seq, " (fail_over_mac)");
+
+	seq_printf(seq, "\n");
+
 	if (bond->params.mode == BOND_MODE_XOR ||
 		bond->params.mode == BOND_MODE_8023AD) {
 		seq_printf(seq, "Transmit Hash Policy: %s (%d)\n",
@@ -4008,8 +4017,12 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 
 	dprintk("bond=%p, name=%s\n", bond, (bond_dev ? bond_dev->name : "None"));
 
-	if (!bond->do_set_mac_addr)
-		return -EOPNOTSUPP;
+	/*
+	 * If fail_over_mac is enabled, do nothing and return success.
+	 * Returning an error causes ifenslave to fail.
+	 */
+	if (bond->params.fail_over_mac)
+		return 0;
 
 	if (!is_valid_ether_addr(sa->sa_data)) {
 		return -EADDRNOTAVAIL;
@@ -4402,10 +4415,6 @@ static int bond_init(struct net_device *bond_dev, struct bond_params *params)
 #ifdef CONFIG_PROC_FS
 	bond_create_proc_entry(bond);
 #endif
-
-	/* set do_set_mac_addr to true on startup */
-	bond->do_set_mac_addr = 1;
-
 	list_add_tail(&bond->bond_list, &bond_dev_list);
 
 	return 0;
@@ -4739,6 +4748,11 @@ static int bond_check_params(struct bond_params *params)
 		primary = NULL;
 	}
 
+	if (fail_over_mac && (bond_mode != BOND_MODE_ACTIVEBACKUP))
+		printk(KERN_WARNING DRV_NAME
+		       ": Warning: fail_over_mac only affects "
+		       "active-backup mode.\n");
+
 	/* fill params struct with the proper values */
 	params->mode = bond_mode;
 	params->xmit_policy = xmit_hashtype;
@@ -4750,6 +4764,7 @@ static int bond_check_params(struct bond_params *params)
 	params->use_carrier = use_carrier;
 	params->lacp_fast = lacp_fast;
 	params->primary[0] = 0;
+	params->fail_over_mac = fail_over_mac;
 
 	if (primary) {
 		strncpy(params->primary, primary, IFNAMSIZ);
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 71db5d9..a907b68 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -567,6 +567,54 @@ static ssize_t bonding_store_arp_validate(struct device *d,
 static DEVICE_ATTR(arp_validate, S_IRUGO | S_IWUSR, bonding_show_arp_validate, bonding_store_arp_validate);
 
 /*
+ * Show and store fail_over_mac.  User only allowed to change the
+ * value when there are no slaves.
+ */
+static ssize_t bonding_show_fail_over_mac(struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct bonding *bond = to_bond(d);
+
+	return sprintf(buf, "%d\n", bond->params.fail_over_mac) + 1;
+}
+
+static ssize_t bonding_store_fail_over_mac(struct device *d, struct device_attribute *attr, const char *buf, size_t count)
+{
+	int new_value;
+	int ret = count;
+	struct bonding *bond = to_bond(d);
+
+	if (bond->slave_cnt != 0) {
+		printk(KERN_ERR DRV_NAME
+		       ": %s: Can't alter fail_over_mac with slaves in bond.\n",
+		       bond->dev->name);
+		ret = -EPERM;
+		goto out;
+	}
+
+	if (sscanf(buf, "%d", &new_value) != 1) {
+		printk(KERN_ERR DRV_NAME
+		       ": %s: no fail_over_mac value specified.\n",
+		       bond->dev->name);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if ((new_value == 0) || (new_value == 1)) {
+		bond->params.fail_over_mac = new_value;
+		printk(KERN_INFO DRV_NAME ": %s: Setting fail_over_mac to %d.\n",
+		       bond->dev->name, new_value);
+	} else {
+		printk(KERN_INFO DRV_NAME
+		       ": %s: Ignoring invalid fail_over_mac value %d.\n",
+		       bond->dev->name, new_value);
+	}
+out:
+	return ret;
+}
+
+static DEVICE_ATTR(fail_over_mac, S_IRUGO | S_IWUSR, bonding_show_fail_over_mac, bonding_store_fail_over_mac);
+
+/*
  * Show and set the arp timer interval.  There are two tricky bits
  * here.  First, if ARP monitoring is activated, then we must disable
  * MII monitoring.  Second, if the ARP timer isn't running, we must
@@ -1390,6 +1438,7 @@ static DEVICE_ATTR(ad_partner_mac, S_IRUGO, bonding_show_ad_partner_mac, NULL);
 static struct attribute *per_bond_attrs[] = {
 	&dev_attr_slaves.attr,
 	&dev_attr_mode.attr,
+	&dev_attr_fail_over_mac.attr,
 	&dev_attr_arp_validate.attr,
 	&dev_attr_arp_interval.attr,
 	&dev_attr_arp_ip_target.attr,
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index ed0f587..9d6153e 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -22,8 +22,8 @@
 #include "bond_3ad.h"
 #include "bond_alb.h"
 
-#define DRV_VERSION	"3.1.3"
-#define DRV_RELDATE	"June 13, 2007"
+#define DRV_VERSION	"3.2.0"
+#define DRV_RELDATE	"September 13, 2007"
 #define DRV_NAME	"bonding"
 #define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
 
@@ -128,6 +128,7 @@ struct bond_params {
 	int arp_interval;
 	int arp_validate;
 	int use_carrier;
+	int fail_over_mac;
 	int updelay;
 	int downdelay;
 	int lacp_fast;
@@ -186,7 +187,6 @@ struct bonding {
 	struct   timer_list mii_timer;
 	struct   timer_list arp_timer;
 	s8       kill_timers;
-	s8       do_set_mac_addr;
 	s8	 send_grat_arp;
 	s8	 setup_by_slave;
 	struct   net_device_stats stats;
-- 
1.5.2-rc2.GIT
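
For readers following the sysfs rules described in the documentation hunk above, here is a minimal user-space sketch of the checks that bonding_store_fail_over_mac applies. The struct model_bond type and the model_store_fail_over_mac name are hypothetical stand-ins for illustration, not driver code, and the sketch returns 0 on an accepted write where the real handler returns count.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical model of the two bond fields the sysfs handler looks at. */
struct model_bond {
	int slave_cnt;
	int fail_over_mac;
};

/*
 * Mirrors the order of checks in the patch:
 *   1. refuse any change while slaves are present (-EPERM),
 *   2. refuse input that does not parse as an integer (-EINVAL),
 *   3. accept 0 or 1; other integers are ignored but the write
 *      still "succeeds", as in the posted patch.
 */
static int model_store_fail_over_mac(struct model_bond *bond, const char *buf)
{
	int new_value;

	if (bond->slave_cnt != 0)
		return -EPERM;

	if (sscanf(buf, "%d", &new_value) != 1)
		return -EINVAL;

	if (new_value == 0 || new_value == 1)
		bond->fail_over_mac = new_value;

	return 0;
}
```

Note that, as posted, an out-of-range value such as 7 is logged and ignored rather than rejected with an error; whether that is the desired behavior is a judgment call left to the patch author.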


^ permalink raw reply related

* [PATCH]: New SO_BINDTODEVICE fix.
From: David Miller @ 2007-09-14 23:42 UTC (permalink / raw)
  To: netdev; +Cc: greearb, kaber


Ok, I changed my mind and decided to retain the optlen==0
intended behavior.  It fell out of fixing the small
string length case.

This is likely what I'll push to Linus and later -stable
as a fix for this stuff.

Thanks.

commit 4878809f711981a602cc562eb47994fc81ea0155
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Fri Sep 14 16:41:03 2007 -0700

    [NET]: Fix two issues wrt. SO_BINDTODEVICE.
    
    1) Comments suggest that setting optlen to zero will unbind
       the socket from whatever device it might be attached to.  This
       hasn't been the case since at least 2.2.x because the first thing
       this function does is return -EINVAL if 'optlen' is less than
       sizeof(int).
    
       This check also means that passing in a two byte string doesn't
       work so well.  It's almost as if this code was testing with "eth?"
       patterned strings and nothing else :-)
    
       Fix this by breaking the logic of this facility out into a
       separate function which validates optlen more appropriately.
    
       The optlen==0 and small string cases now work properly.
    
    2) We should reset the cached route of the socket after we have made
       the device binding changes, not before.
    
    Reported by Ben Greear.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/sock.c b/net/core/sock.c
index cfed7d4..190de61 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -362,6 +362,61 @@ struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie)
 }
 EXPORT_SYMBOL(sk_dst_check);
 
+static int sock_bindtodevice(struct sock *sk, char __user *optval, int optlen)
+{
+	int ret = -ENOPROTOOPT;
+#ifdef CONFIG_NETDEVICES
+	char devname[IFNAMSIZ];
+	int index;
+
+	/* Sorry... */
+	ret = -EPERM;
+	if (!capable(CAP_NET_RAW))
+		goto out;
+
+	ret = -EINVAL;
+	if (optlen < 0)
+		goto out;
+
+	/* Bind this socket to a particular device like "eth0",
+	 * as specified in the passed interface name. If the
+	 * name is "" or the option length is zero the socket
+	 * is not bound.
+	 */
+	if (optlen > IFNAMSIZ - 1)
+		optlen = IFNAMSIZ - 1;
+	memset(devname, 0, sizeof(devname));
+
+	ret = -EFAULT;
+	if (copy_from_user(devname, optval, optlen))
+		goto out;
+
+	if (devname[0] == '\0') {
+		index = 0;
+	} else {
+		struct net_device *dev = dev_get_by_name(devname);
+
+		ret = -ENODEV;
+		if (!dev)
+			goto out;
+
+		index = dev->ifindex;
+		dev_put(dev);
+	}
+
+	lock_sock(sk);
+	sk->sk_bound_dev_if = index;
+	sk_dst_reset(sk);
+	release_sock(sk);
+
+	ret = 0;
+
+out:
+#endif
+
+	return ret;
+}
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -390,6 +445,9 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 	}
 #endif
 
+	if (optname == SO_BINDTODEVICE)
+		return sock_bindtodevice(sk, optval, optlen);
+
 	if (optlen < sizeof(int))
 		return -EINVAL;
 
@@ -578,54 +636,6 @@ set_rcvbuf:
 		ret = sock_set_timeout(&sk->sk_sndtimeo, optval, optlen);
 		break;
 
-#ifdef CONFIG_NETDEVICES
-	case SO_BINDTODEVICE:
-	{
-		char devname[IFNAMSIZ];
-
-		/* Sorry... */
-		if (!capable(CAP_NET_RAW)) {
-			ret = -EPERM;
-			break;
-		}
-
-		/* Bind this socket to a particular device like "eth0",
-		 * as specified in the passed interface name. If the
-		 * name is "" or the option length is zero the socket
-		 * is not bound.
-		 */
-
-		if (!valbool) {
-			sk->sk_bound_dev_if = 0;
-		} else {
-			if (optlen > IFNAMSIZ - 1)
-				optlen = IFNAMSIZ - 1;
-			memset(devname, 0, sizeof(devname));
-			if (copy_from_user(devname, optval, optlen)) {
-				ret = -EFAULT;
-				break;
-			}
-
-			/* Remove any cached route for this socket. */
-			sk_dst_reset(sk);
-
-			if (devname[0] == '\0') {
-				sk->sk_bound_dev_if = 0;
-			} else {
-				struct net_device *dev = dev_get_by_name(devname);
-				if (!dev) {
-					ret = -ENODEV;
-					break;
-				}
-				sk->sk_bound_dev_if = dev->ifindex;
-				dev_put(dev);
-			}
-		}
-		break;
-	}
-#endif
-
-
 	case SO_ATTACH_FILTER:
 		ret = -EINVAL;
 		if (optlen == sizeof(struct sock_fprog)) {
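
The optlen handling in sock_bindtodevice can be modelled in user space as follows. This is only a sketch under stated assumptions: MODEL_IFNAMSIZ and model_bind_name are hypothetical names, memcpy stands in for copy_from_user, and the capability check, device lookup and socket locking are left out.

```c
#include <errno.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16	/* assumed to match the kernel's IFNAMSIZ */

/*
 * Returns a negative errno for a bad length, 1 when the request means
 * "unbind" (optlen == 0 or an empty name), and 0 when devname holds a
 * name that would then be looked up.
 */
static int model_bind_name(const char *optval, int optlen,
			   char devname[MODEL_IFNAMSIZ])
{
	if (optlen < 0)
		return -EINVAL;

	/* Clamp so the buffer always keeps a terminating NUL. */
	if (optlen > MODEL_IFNAMSIZ - 1)
		optlen = MODEL_IFNAMSIZ - 1;
	memset(devname, 0, MODEL_IFNAMSIZ);

	/* Stand-in for copy_from_user(); nothing is copied for optlen == 0. */
	if (optlen > 0)
		memcpy(devname, optval, (size_t)optlen);

	return devname[0] == '\0';
}
```

With this shape, a two byte name such as "lo" and a zero length both take a sensible path, which is what the old optlen < sizeof(int) check prevented.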

^ permalink raw reply related

* Re: [git patches] net driver fixes
From: Kok, Auke @ 2007-09-14 21:09 UTC (permalink / raw)
  To: Dan Williams, netdev, Jan-Bernd Themann
  Cc: Jeff Garzik, Andrew Morton, Linus Torvalds, LKML
In-Reply-To: <1189793598.2508.4.camel@xo-3E-67-34.localdomain>

Dan Williams wrote:
> On Thu, 2007-09-13 at 01:30 -0400, Jeff Garzik wrote:
>> Please pull from 'upstream-linus' branch of
>> master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus
>>
>> to receive the following updates:
>>
>>  drivers/net/atl1/atl1_main.c |   19 +++++++------------
>>  drivers/net/ehea/ehea.h      |    5 ++++-
>>  drivers/net/ehea/ehea_main.c |   16 ++++++++++++++--
>>  drivers/net/phy/phy.c        |    4 ++--
>>  drivers/net/phy/phy_device.c |    4 ++--
>>  drivers/net/sky2.c           |    9 ++++++++-
>>  drivers/net/spider_net.c     |   12 ++++--------
>>  7 files changed, 41 insertions(+), 28 deletions(-)
>>
>> Hans-Jürgen Koch (1):
>>       Fix a lock problem in generic phy code
>>
>> Ishizaki Kou (1):
>>       spidernet: fix interrupt reason recognition
>>
>> Jan-Bernd Themann (2):
>>       ehea: propagate physical port state
>>       ehea: fix last_rx update
>>

maybe a little bit late with this comment:

>>  			ehea_error("Failed setting port speed");
>>  		}
>>  	}
>> -	netif_carrier_on(port->netdev);
>> +	if (!prop_carrier_state || (port->phy_link == EHEA_PHY_LINK_UP))
>> +		netif_carrier_on(port->netdev);
>> +
>>  	kfree(cb4);
>>  out:
>>  	return ret;
>> @@ -869,13 +875,19 @@ static void ehea_parse_eqe(struct ehea_adapter *adapter, u64 eqe)
>>  			}
>>  
>>  		if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PORT_UP, eqe)) {
>> +			port->phy_link = EHEA_PHY_LINK_UP;
>>  			if (netif_msg_link(port))
>>  				ehea_info("%s: Physical port up",
>>  					  port->netdev->name);
>> +			if (prop_carrier_state)
>> +				netif_carrier_on(port->netdev);
>>  		} else {
>> +			port->phy_link = EHEA_PHY_LINK_DOWN;
>>  			if (netif_msg_link(port))
>>  				ehea_info("%s: Physical port down",
>>  					  port->netdev->name);
>> +			if (prop_carrier_state)
>> +				netif_carrier_off(port->netdev);

maybe it was better to code this as 'ehea_carrier_off/on()' which then tests 
(prop_carrier_state) - this now begs for regressions where this isn't properly 
done in future commits, and on top of that there are all these extra conditions now.
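
A rough sketch of what such helpers might look like, written against mock types so it compiles on its own. The struct model_port type, the model_* function names and the bool carrier field are hypothetical; in the driver the carrier field would be calls to netif_carrier_on()/netif_carrier_off() on port->netdev.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's types; not the real ehea structs. */
enum model_phy_link { MODEL_PHY_LINK_DOWN, MODEL_PHY_LINK_UP };

struct model_port {
	enum model_phy_link phy_link;
	bool carrier;		/* stands in for the netdev carrier state */
};

/* Module parameter in the patch; a plain variable in this model. */
static int prop_carrier_state = 1;

/* Record a physical link event and propagate it only if enabled. */
static void model_phy_link_event(struct model_port *port, bool up)
{
	port->phy_link = up ? MODEL_PHY_LINK_UP : MODEL_PHY_LINK_DOWN;
	if (prop_carrier_state)
		port->carrier = up;
}

/* Port setup path: carrier comes up unless propagation says the link is down. */
static void model_carrier_on(struct model_port *port)
{
	if (!prop_carrier_state || port->phy_link == MODEL_PHY_LINK_UP)
		port->carrier = true;
}
```

Keeping the prop_carrier_state test inside the helpers means later call sites cannot forget it.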

Cheers,

Auke


^ permalink raw reply

* Re: [PATCH 1/4] [IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
From: David Miller @ 2007-09-15  0:17 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev
In-Reply-To: <20070913.093039.76110425.yoshfuji@linux-ipv6.org>

From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Date: Thu, 13 Sep 2007 09:30:39 +0900 (JST)

> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Applied, I'll push to -stable after it lands in Linus's tree.

Thank you.

^ permalink raw reply

* Re: [PATCH 2/4] [IPV6]: Fix oops during flushing corked datagrams.
From: David Miller @ 2007-09-15  0:18 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev
In-Reply-To: <20070913.093051.04296781.yoshfuji@linux-ipv6.org>

From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Date: Thu, 13 Sep 2007 09:30:51 +0900 (JST)

> When we are corking sub-datagrams, we do not clone skb->dst for sub-datagrams
> other than the first one, so we get oops if we have multiple sub-datagrams
> here.
> 
> One possible way to fix this is to clone skb->dst for all sub-datagrams,
> but we do not take this approach because skb->dst is not used in other
> places and it is more natural to increment statistics once per a datagram.
> 
> Also applicable for stable releases.
> 
> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

This is already fixed (by you) in Linus's tree, and I already
plan to submit this to -stable.

^ permalink raw reply

* Re: [PATCH 3/4] [IPV6]: Just increment OutDatagrams once per a datagram.
From: David Miller @ 2007-09-15  0:18 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev
In-Reply-To: <20070913.093058.08351675.yoshfuji@linux-ipv6.org>

From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Date: Thu, 13 Sep 2007 09:30:58 +0900 (JST)

> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH 4/4] [IPV4]: Just increment OutDatagrams once per a datagram.
From: David Miller @ 2007-09-15  0:18 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev
In-Reply-To: <20070913.093106.30240091.yoshfuji@linux-ipv6.org>

From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Date: Thu, 13 Sep 2007 09:31:06 +0900 (JST)

> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Also applied, thank you.

^ permalink raw reply

* Re: Question about NAT-T and PF_KEY...
From: David Miller @ 2007-09-15  0:36 UTC (permalink / raw)
  To: sgros; +Cc: netdev, ikev2-devel
In-Reply-To: <1189369814.19024.28.camel@localhost.localdomain>

From: Stjepan Gros <sgros@zemris.fer.hr>
Date: Sun, 09 Sep 2007 22:30:13 +0200

> I'm having problems telling the kernel to do ESP-in-UDP encapsulation.
> Outgoing direction seems to work, but the incoming packets on the other
> side are passed to ikev2 daemon instead of kernel decapsulating them.

The daemon getting the packets on the UDP socket can only mean that
the rule hasn't been setup correctly.

^ permalink raw reply

* Re: e1000 driver and samba
From: L F @ 2007-09-15  0:37 UTC (permalink / raw)
  To: Kok, Auke; +Cc: netdev
In-Reply-To: <46EAF644.1040006@intel.com>

> can you describe your setup a bit more in detail? you're writing from a linux
> client to a windows smb server? or even to a linux server? which end sees the
> connection drop? the samba server? the samba linux client?
Certainly.
I have a LAN, with two switches in a stack. There currently are 7
WinXP clients and one linux machine. The linux machine acts as a samba
server and as a firewall/gateway.
The two ports of the PRO/1000 in the linux box are connected to the
LAN (eth4) and to a Comcast modem (eth3) respectively. Shorewall 3.4.5
is running on the linux machine, with a strong firewall + NAT setup.
Further, the linux machine currently has a tap device bridged into the
LAN side, for virtualbox.
Therefore, eth3 is a plain ethernet interface. br0, on the lan side,
is tap0 + eth4.
If I get any client on the LAN side, I can read from the linux box
without a problem. However, if I attempt to write to the linux box
from a LANside client, it will fail. If traffic is low, the failures
are sporadic. If traffic is high (large file and/or multiple incoming
files) the failure is guaranteed, either in 'delayed write fail' mode
on the client or in silent corruption of the file (much worse). If
read/write activity is combined, for instance when I unzip a zip
archive to its own directory, failure is guaranteed and rapid, with a
'delayed write fail' on the client after 50MB or so.
I can post .config and anything else you may want if you require it. I
tried changing cable as you suggested with little success. I'll try
changing switch port, just to cover all bases.
>
> Auke
>

^ permalink raw reply

* Re: [PATCH] ipg: add IP1000A driver to kernel tree
From: Andrew Morton @ 2007-09-15  2:21 UTC (permalink / raw)
  To: Francois Romieu; +Cc: jeff, Jesse Huang, netdev, shemminger, s.l-h
In-Reply-To: <20070914212105.GA28233@electric-eye.fr.zoreil.com>

On Fri, 14 Sep 2007 23:21:05 +0200 Francois Romieu <romieu@fr.zoreil.com> wrote:

> ...
>
> +
> +static void ipg_dump_tfdlist(struct net_device *dev)
> +{
> +	struct ipg_nic_private *sp = netdev_priv(dev);
> +	void __iomem *ioaddr = sp->ioaddr;
> +	unsigned int i;
> +	u32 offset;
> +
> +	IPG_DEBUG_MSG("_dump_tfdlist\n");
> +
> +	printk(KERN_INFO "tx_current         = %2.2x\n", sp->tx_current);
> +	printk(KERN_INFO "tx_dirty = %2.2x\n", sp->tx_dirty);
> +	printk(KERN_INFO "TFDList start address = %16.16lx\n",
> +	       (unsigned long) sp->txd_map);
> +	printk(KERN_INFO "TFDListPtr register   = %8.8x%8.8x\n",
> +	       ipg_r32(IPG_TFDLISTPTR1), ipg_r32(IPG_TFDLISTPTR0));
> +
> +	for (i = 0; i < IPG_TFDLIST_LENGTH; i++) {
> +		offset = (u32) &sp->txd[i].next_desc - (u32) sp->txd;
> +		printk(KERN_INFO "%2.2x %4.4x TFDNextPtr = %16.16lx\n", i,
> +		       offset, (unsigned long) sp->txd[i].next_desc);
> +
> +		offset = (u32) &sp->txd[i].tfc - (u32) sp->txd;

Is the u32 cast safe here on all architectures?
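
One way to sidestep the pointer-to-u32 casts, which at least draw a warning on 64-bit builds, is to compute the offset from the layout rather than from addresses. A sketch with a hypothetical descriptor layout (struct model_tfd is an assumption for illustration, not the driver's real struct):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor with the three fields the dump prints. */
struct model_tfd {
	uint64_t next_desc;
	uint64_t tfc;
	uint64_t frag_info;
};

/* Byte offset of txd[i].tfc from the start of the txd array. */
static size_t model_tfc_offset(unsigned int i)
{
	return (size_t)i * sizeof(struct model_tfd)
		+ offsetof(struct model_tfd, tfc);
}
```

The same value could also be obtained by subtracting the two addresses as char pointers; either form avoids narrowing a pointer.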

> +		printk(KERN_INFO "%2.2x %4.4x TFC        = %16.16lx\n", i,
> +		       offset, (unsigned long) sp->txd[i].tfc);
> +		offset = (u32) &sp->txd[i].frag_info - (u32) sp->txd;
> +		printk(KERN_INFO "%2.2x %4.4x frag_info   = %16.16lx\n", i,
> +		       offset, (unsigned long) sp->txd[i].frag_info);
> +	}
> +}
>
> ...
>
> +static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
> +{
> +	void __iomem *ioaddr = ipg_ioaddr(dev);
> +	/*
> +	 * The GMII management frame structure for a read is as follows:
> +	 *
> +	 * |Preamble|st|op|phyad|regad|ta|      data      |idle|
> +	 * |< 32 1s>|01|10|AAAAA|RRRRR|z0|DDDDDDDDDDDDDDDD|z   |
> +	 *
> +	 * <32 1s> = 32 consecutive logic 1 values
> +	 * A = bit of Physical Layer device address (MSB first)
> +	 * R = bit of register address (MSB first)
> +	 * z = High impedance state
> +	 * D = bit of read data (MSB first)
> +	 *
> +	 * Transmission order is 'Preamble' field first, bits transmitted
> +	 * left to right (first to last).
> +	 */
> +	struct {
> +		u32 field;
> +		unsigned int len;
> +	} p[] = {
> +		{ GMII_PREAMBLE,	32 },	/* Preamble */
> +		{ GMII_ST,		2  },	/* ST */
> +		{ GMII_READ,		2  },	/* OP */
> +		{ phy_id,		5  },	/* PHYAD */
> +		{ phy_reg,		5  },	/* REGAD */
> +		{ 0x0000,		2  },	/* TA */
> +		{ 0x0000,		16 },	/* DATA */
> +		{ 0x0000,		1  }	/* IDLE */
> +	};

This will be built on the stack at runtime.

> +	unsigned int i, j;
> +	u8 polarity, data;
> +
> +	polarity  = ipg_r8(PHY_CTRL);
> +	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
> +
> +	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
> +	for (j = 0; j < 5; j++) {
> +		for (i = 0; i < p[j].len; i++) {
> +			/* For each variable length field, the MSB must be
> +			 * transmitted first. Rotate through the field bits,
> +			 * starting with the MSB, and move each bit into the
> +			 * 1st (2^1) bit position (this is the bit position
> +			 * corresponding to the MgmtData bit of the PhyCtrl
> +			 * register for the IPG).
> +			 *
> +			 * Example: ST = 01;
> +			 *
> +			 *          First write a '0' to bit 1 of the PhyCtrl
> +			 *          register, then write a '1' to bit 1 of the
> +			 *          PhyCtrl register.
> +			 *
> +			 * To do this, right shift the MSB of ST by the value:
> +			 * [field length - 1 - #ST bits already written]
> +			 * then left shift this result by 1.
> +			 */
> +			data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
> +			data &= IPG_PC_MGMTDATA;
> +			data |= polarity | IPG_PC_MGMTDIR;
> +
> +			ipg_drive_phy_ctl_low_high(ioaddr, data);
> +		}
> +	}
> +
> +	send_three_state(ioaddr, polarity);
> +
> +	read_phy_bit(ioaddr, polarity);
> +
> +	/*
> +	 * For a read cycle, the bits for the next two fields (TA and
> +	 * DATA) are driven by the PHY (the IPG reads these bits).
> +	 */
> +	for (i = 0; i < p[6].len; i++) {
> +		p[6].field |=
> +		    (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
> +	}

Simply because we're using p[6] as a temporary variable.

This can be tightened up.
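
For instance, the DATA field could be accumulated in a local variable instead of in p[6].field. A self-contained sketch of MSB-first accumulation follows; struct model_bits, model_next_bit and model_read_field are hypothetical names, and model_next_bit stands in for read_phy_bit(ioaddr, polarity).

```c
#include <stdint.h>

/* Hypothetical bit source standing in for the PHY. */
struct model_bits {
	const uint8_t *bits;	/* one bit per array element, MSB first */
	unsigned int pos;
};

static int model_next_bit(void *ctx)
{
	struct model_bits *src = ctx;

	return src->bits[src->pos++] & 1;
}

/* Shift in len bits (len <= 16), MSB first, without a scratch table entry. */
static uint16_t model_read_field(int (*read_bit)(void *), void *ctx,
				 unsigned int len)
{
	uint16_t value = 0;
	unsigned int i;

	for (i = 0; i < len; i++)
		value = (uint16_t)((value << 1) | (read_bit(ctx) & 1));

	return value;
}
```

With the temporary gone, the field table no longer needs to be writable, so it could presumably be made static const apart from the phy_id and phy_reg entries.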

> +	send_three_state(ioaddr, polarity);
> +	send_three_state(ioaddr, polarity);
> +	send_three_state(ioaddr, polarity);
> +	send_end(ioaddr, polarity);
> +
> +	/* Return the value of the DATA field. */
> +	return p[6].field;
> +}
> +
> +/*
> + * Write to a register from the Physical Layer device located
> + * on the IPG NIC, using the IPG PHYCTRL register.
> + */
> +static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
> +{
> +	void __iomem *ioaddr = ipg_ioaddr(dev);
> +	/*
> +	 * The GMII management frame structure for a write is as follows:
> +	 *
> +	 * |Preamble|st|op|phyad|regad|ta|      data      |idle|
> +	 * |< 32 1s>|01|01|AAAAA|RRRRR|10|DDDDDDDDDDDDDDDD|z   |
> +	 *
> +	 * <32 1s> = 32 consecutive logic 1 values
> +	 * A = bit of Physical Layer device address (MSB first)
> +	 * R = bit of register address (MSB first)
> +	 * z = High impedance state
> +	 * D = bit of write data (MSB first)
> +	 *
> +	 * Transmission order is 'Preamble' field first, bits transmitted
> +	 * left to right (first to last).
> +	 */
> +	struct {
> +		u32 field;
> +		unsigned int len;
> +	} p[] = {
> +		{ GMII_PREAMBLE,	32 },	/* Preamble */
> +		{ GMII_ST,		2  },	/* ST */
> +		{ GMII_WRITE,		2  },	/* OP */
> +		{ phy_id,		5  },	/* PHYAD */
> +		{ phy_reg,		5  },	/* REGAD */
> +		{ 0x0002,		2  },	/* TA */
> +		{ val & 0xffff,		16 },	/* DATA */
> +		{ 0x0000,		1  }	/* IDLE */
> +	};

similar here

> +	unsigned int i, j;
> +	u8 polarity, data;
> +
> +	polarity  = ipg_r8(PHY_CTRL);
> +	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
> +
> +	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
> +	for (j = 0; j < 7; j++) {
> +		for (i = 0; i < p[j].len; i++) {
> +			/* For each variable length field, the MSB must be
> +			 * transmitted first. Rotate through the field bits,
> +			 * starting with the MSB, and move each bit into the
> +			 * 1st (2^1) bit position (this is the bit position
> +			 * corresponding to the MgmtData bit of the PhyCtrl
> +			 * register for the IPG).
> +			 *
> +			 * Example: ST = 01;
> +			 *
> +			 *          First write a '0' to bit 1 of the PhyCtrl
> +			 *          register, then write a '1' to bit 1 of the
> +			 *          PhyCtrl register.
> +			 *
> +			 * To do this, right shift the MSB of ST by the value:
> +			 * [field length - 1 - #ST bits already written]
> +			 * then left shift this result by 1.
> +			 */
> +			data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
> +			data &= IPG_PC_MGMTDATA;
> +			data |= polarity | IPG_PC_MGMTDIR;
> +
> +			ipg_drive_phy_ctl_low_high(ioaddr, data);
> +		}
> +	}
> +
> +	/* The last cycle is a tri-state, so read from the PHY. */
> +	for (j = 7; j < 8; j++) {
> +		for (i = 0; i < p[j].len; i++) {
> +			ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | polarity);
> +
> +			p[j].field |= ((ipg_r8(PHY_CTRL) &
> +				IPG_PC_MGMTDATA) >> 1) << (p[j].len - 1 - i);
> +
> +			ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_HI | polarity);
> +		}
> +	}

although it might be tricky to avoid
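
For reference, the MSB-first shifting described in the quoted comment
reduces to a one-line helper. This is only a sketch: mdio_field_bit() and
MGMTDATA_BIT are made-up names, and the 0x02 mask is an assumption taken
from the comment's statement that MgmtData is bit 1 of PhyCtrl.

```c
#include <stdint.h>

/* Assumed value: bit 1 of PhyCtrl carries MgmtData, per the quoted comment. */
#define MGMTDATA_BIT 0x02u

/*
 * Return the PhyCtrl data bit for step i (0-based) of a field that is
 * len bits wide, sending the most significant bit first.
 * Hypothetical helper, not part of the posted driver.
 */
static uint8_t mdio_field_bit(uint32_t field, unsigned int len, unsigned int i)
{
	/* Move bit (len - 1 - i) down to bit 0, then up to bit 1, and mask. */
	return (uint8_t)(((field >> (len - 1 - i)) << 1) & MGMTDATA_BIT);
}
```

With ST = 01 (field 0x1, len 2), step 0 yields 0 and step 1 yields 0x02,
which matches the "write a '0', then write a '1'" example in the comment.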

> +}
> +
>
> ...
>
> +static int ipg_reset(struct net_device *dev, u32 resetflags)
> +{
> +	/* Assert functional resets via the IPG AsicCtrl
> +	 * register as specified by the 'resetflags' input
> +	 * parameter.
> +	 */
> +	void __iomem *ioaddr = ipg_ioaddr(dev);	//JES20040127EEPROM:
> +	unsigned int timeout_count = 0;
> +
> +	IPG_DEBUG_MSG("_reset\n");
> +
> +	ipg_w32(ipg_r32(ASIC_CTRL) | resetflags, ASIC_CTRL);
> +
> +	/* Delay added to account for problem with 10Mbps reset. */
> +	mdelay(IPG_AC_RESETWAIT);
> +
> +	while (IPG_AC_RESET_BUSY & ipg_r32(ASIC_CTRL)) {
> +		mdelay(IPG_AC_RESETWAIT);
> +		if (++timeout_count > IPG_AC_RESET_TIMEOUT)
> +			return -ETIME;

Is ETIME an appropriate errno here?  Zillions of drivers use it, but I
think it's for posix timers or something like that?
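
If ETIME does turn out to be the wrong fit, one candidate is ETIMEDOUT.
A minimal userspace sketch of the same bounded-poll pattern follows; the
names wait_not_busy() and countdown_busy() are hypothetical and only
stand in for the register read in the quoted loop.

```c
#include <errno.h>

/* Hypothetical stand-in for reading the busy bit; not part of the driver. */
typedef int (*busy_fn)(void *ctx);

/*
 * Poll until busy() reports 0 or max_polls attempts are used up.
 * Returns 0 on success, -ETIMEDOUT if the device never became idle.
 */
static int wait_not_busy(busy_fn busy, void *ctx, unsigned int max_polls)
{
	unsigned int n;

	for (n = 0; n < max_polls; n++) {
		if (!busy(ctx))
			return 0;
		/* A real driver would delay between polls here. */
	}
	return -ETIMEDOUT;
}

/* Example busy source: reports busy until the counter reaches zero. */
static int countdown_busy(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return 0;
	(*remaining)--;
	return 1;
}
```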

> +	}
> +	/* Set LED Mode in Asic Control JES20040127EEPROM */
> +	ipg_set_led_mode(dev);
> +
> +	/* Set PHYSet Register Value JES20040127EEPROM */
> +	ipg_set_phy_set(dev);
> +	return 0;
> +}
> +
> 
> [vast amounts trimmed]
>

Attention span expired, sorry.  ETIME.


* 2.6.23-rc6: known regressions with patches v2
From: Michal Piotrowski @ 2007-09-15  2:29 UTC
  To: Linus Torvalds
  Cc: Andrew Morton, LKML, Gabriel C, Satyam Sharma, Vitaly Bordug,
	linux-acpi, Len Brown, Chuck Ebbert, Alexey Starikovskiy,
	Kay Sievers, Dmitry Torokhov, Greg KH, Anssi Hannula,
	Christian Kujau, jamal, netdev

Hi all,

Here is a list of some known regressions in 2.6.23-rc6
with patches available.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

Name                    Regressions fixed since 21-Jun-2007
Adrian Bunk                            10
Andi Kleen                             7
Linus Torvalds                         6
Alan Stern                             5
Hugh Dickins                           5
Trond Myklebust                        5
Andrew Morton                          4
David S. Miller                        4
Al Viro                                3
Alexey Starikovskiy                    3
Cornelia Huck                          3
Jens Axboe                             3
Stephen Hemminger                      3
Tejun Heo                              3



Unclassified

Subject         : Oops while modprobing phy fixed module
References      : http://lkml.org/lkml/2007/7/14/63
Last known good : ?
Submitter       : Gabriel C <nix.or.die@googlemail.com>
Caused-By       : ?
Handled-By      : Satyam Sharma <satyam.sharma@gmail.com>
                  Vitaly Bordug <vitb@kernel.crashing.org>
Patch1          : http://lkml.org/lkml/2007/7/18/506
Status          : patch available



ACPI

Subject         : 2.6.23-rc5 hangs on boot, apparently when initializing the EC
References      : http://lkml.org/lkml/2007/9/11/369
Last known good : ?
Submitter       : Chuck Ebbert <cebbert@redhat.com>
Caused-By       : ?
Handled-By      : Alexey Starikovskiy <aystarik@gmail.com>
Patch           : http://bugzilla.kernel.org/attachment.cgi?id=12673
Status          : patch was suggested



Drivercore

Subject         : sysfs change of input/event devices in 2.6.23rc breaks udev
References      : http://lkml.org/lkml/2007/9/8/86
Last known good : ?
Submitter       : Anssi Hannula <anssi.hannula@gmail.com>
Caused-By       : ?
Handled-By      : Dmitry Torokhov <dmitry.torokhov@gmail.com>
Patch           : http://lkml.org/lkml/2007/9/10/3
Status          : patch available



Networking

Subject         : 2.6.23-rc5: possible irq lock inversion dependency detected
References      : http://lkml.org/lkml/2007/9/2/97
Last known good : ?
Submitter       : Christian Kujau <lists@nerdbynature.de>
Caused-By       : ?
Handled-By      : jamal <hadi@cyberus.ca>
Patch           : http://lkml.org/lkml/2007/9/11/159
Status          : patch available



Farewell!
Michal

--
LOGOUT
http://www.stardust.webpages.pl/


* Re: [2/3] 2.6.23-rc6: known regressions v2
From: Michal Piotrowski @ 2007-09-15  2:29 UTC
  To: Linus Torvalds
  Cc: Andrew Morton, LKML, linux-fsdevel, Dave Kleikamp, jfs-discussion,
	Andy Whitcroft, sct, adilger, linux-ext4, netdev, Shish,
	Karl Meyer, Francois Romieu, Daniel Drake, Oliver Neukum,
	linux-usb-devel, Florian Lohoff, Toralf Förster
In-Reply-To: <46EB3E94.7080704@googlemail.com>

Hi all,

Here is a list of some known regressions in 2.6.23-rc6.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

Name                    Regressions fixed since 21-Jun-2007
Adrian Bunk                            10
Andi Kleen                             7
Linus Torvalds                         6
Alan Stern                             5
Hugh Dickins                           5
Trond Myklebust                        5
Andrew Morton                          4
David S. Miller                        4
Al Viro                                3
Alexey Starikovskiy                    3
Cornelia Huck                          3
Jens Axboe                             3
Stephen Hemminger                      3
Tejun Heo                              3



FS

Subject         : hanging ext3 dbench tests
References      : http://lkml.org/lkml/2007/9/11/176
Last known good : ?
Submitter       : Andy Whitcroft <apw@shadowen.org>
Caused-By       : ?
Handled-By      : ?
Status          : under test -- unreproducible at present

Subject         : umount triggers a warning in jfs and takes almost a minute
References      : http://lkml.org/lkml/2007/9/4/73
Last known good : ?
Submitter       : Oliver Neukum <oliver@neukum.org>
Caused-By       : ?
Handled-By      : ?
Status          : unknown



Networking

Subject         : build #301 failed for 2.6.23-rc6-g0d4cbb5 in linux/drivers/net/wireless/libertas/
References      : http://lkml.org/lkml/2007/9/11/150
Last known good : ?
Submitter       : Toralf Förster <toralf.foerster@gmx.de>
Caused-By       : ?
Handled-By      : ?
Status          : unknown

Subject         : zd1211rw regression, device does not enumerate
References      : http://marc.info/?l=linux-usb-devel&m=118854967709322&w=2
                  http://bugzilla.kernel.org/show_bug.cgi?id=8972
Last known good : ?
Submitter       : Oliver Neukum <oliver@neukum.org>
Caused-By       : Daniel Drake <dsd@gentoo.org>
                  commit 74553aedd46b3a2cae986f909cf2a3f99369decc
Handled-By      : ?
Status          : unknown

Subject         : NETDEV WATCHDOG: eth0: transmit timed out
References      : http://lkml.org/lkml/2007/8/13/737
Last known good : ?
Submitter       : Karl Meyer <adhocrocker@gmail.com>
Caused-By       : ?
Handled-By      : Francois Romieu <romieu@fr.zoreil.com>
Status          : problem is being debugged

Subject         : Weird network problems with 2.6.23-rc2
References      : http://lkml.org/lkml/2007/8/11/40
Last known good : ?
Submitter       : Shish <shish@shishnet.org>
Caused-By       : ?
Handled-By      : ?
Status          : unknown



Farewell!
Michal

--
LOGOUT
http://www.stardust.webpages.pl/


* Re: Distributed storage. Move away from char device ioctls.
From: Mike Snitzer @ 2007-09-15  2:54 UTC
  To: Jeff Garzik; +Cc: Evgeniy Polyakov, netdev, linux-kernel, linux-fsdevel
In-Reply-To: <46EADC02.9070409@garzik.org>

On 9/14/07, Jeff Garzik <jeff@garzik.org> wrote:
> Evgeniy Polyakov wrote:
> > Hi.
> >
> > I'm pleased to announce fourth release of the distributed storage
> > subsystem, which allows to form a storage on top of remote and local
> > nodes, which in turn can be exported to another storage as a node to
> > form tree-like storages.
> >
> > This release includes new configuration interface (kernel connector over
> > netlink socket) and number of fixes of various bugs found during move
> > to it (in error path).
> >
> > Further TODO list includes:
> > * implement optional saving of mirroring/linear information on the remote
> >       nodes (simple)
> > * new redundancy algorithm (complex)
> > * some thoughts about distributed filesystem tightly connected to DST
> >       (far-far planes so far)
> >
> > Homepage:
> > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> >
> > Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
>
> My thoughts.  But first a disclaimer:   Perhaps you will recall me as
> one of the people who really reads all your patches, and examines your
> code and proposals closely.  So, with that in mind...
>
> I question the value of distributed block services (DBS), whether it's
> your version or the others out there.  DBS are not very useful, because
> it still relies on a useful filesystem sitting on top of the DBS.  It
> devolves into one of two cases:  (1) multi-path much like today's SCSI,
> with distributed filesystem arbitration to ensure coherency, or (2) the
> filesystem running on top of the DBS is on a single host, and thus, a
> single point of failure (SPOF).

This distributed storage is very much needed; even if it were to act
as a more capable/performant replacement for NBD (or MD+NBD) in the
near term.  Many high availability applications don't _need_ all the
additional complexity of a full distributed filesystem.  So given
that, it's discouraging to see you trying to gently push Evgeniy away
from all the promising work he has published.

Evgeniy, please continue your current work.

Mike


* Re: why does tcp_v[46]_conn_request not inc MIB stats
From: David Miller @ 2007-09-15  3:10 UTC
  To: rick.jones2; +Cc: netdev
In-Reply-To: <46E5900A.3010009@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Mon, 10 Sep 2007 11:42:18 -0700

> I've been digging around to see about inducing /proc/net/tcp to show 
> some "interesting" things for listen sockets (eg backlog depth, its max, 
> and dropped connection requests).  While there I've noticed that both 
> tcp_v[46]_syn_recv_sock and tcp_v[46]_conn_request check that the 
> listen queue is full, but only tcp_v[46]_syn_recv_sock increments some 
> mib stats for dropped connection requests.

They are checking two different things.

tcp_v{4,6}_conn_request is checking whether we are hitting the limit
for allowing the initial SYN and creating a new embryonic mini-socket.
Exceeding that is not a listen overflow.

tcp_v{4,6}_syn_recv_sock() is processing the end of the 3-way
handshake and wants to create a full established state socket to queue
into the listening parent.  This is checking the listening socket
queue limits, and indeed is a listen queue overflow if exceeded.
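
To make the distinction concrete, here is a toy model of the two checks.
It is a sketch, not the kernel code: struct listener_model, its fields,
and both model_*() functions are made up for illustration, and the
counter merely stands in for the listen-overflow MIB statistic.

```c
#include <stdbool.h>

/* Hypothetical model of a listening socket; field names are illustrative. */
struct listener_model {
	unsigned int syn_queue_len;     /* embryonic (half-open) requests */
	unsigned int syn_queue_max;
	unsigned int accept_queue_len;  /* established, waiting for accept() */
	unsigned int accept_queue_max;
	unsigned long listen_overflows; /* stands in for the MIB counter */
};

/* SYN arrival: limited by the embryonic queue; not a listen overflow. */
static bool model_conn_request(struct listener_model *l)
{
	if (l->syn_queue_len >= l->syn_queue_max)
		return false;   /* SYN dropped, no overflow counted */
	l->syn_queue_len++;
	return true;
}

/* End of the 3-way handshake: limited by the listening socket's queue. */
static bool model_syn_recv_sock(struct listener_model *l)
{
	if (l->accept_queue_len >= l->accept_queue_max) {
		l->listen_overflows++;  /* this is the listen queue overflow */
		return false;
	}
	if (l->syn_queue_len > 0)
		l->syn_queue_len--;     /* request leaves the embryonic queue */
	l->accept_queue_len++;
	return true;
}
```

Only the second function touches the overflow counter, which mirrors the
point above: hitting the embryonic limit drops the SYN but is not counted
as a listen overflow.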


* Re: why does tcp_v[46]_conn_request not inc MIB stats
From: David Miller @ 2007-09-15  3:11 UTC
  To: sri; +Cc: rick.jones2, netdev
In-Reply-To: <1189461284.11066.10.camel@w-sridhar2.beaverton.ibm.com>

From: Sridhar Samudrala <sri@us.ibm.com>
Date: Mon, 10 Sep 2007 14:54:43 -0700

> looks like it is a hole in the stats. I think we should increment
> LISTENOVERFLOWS or LISTENDROPS in tcp_v[46]_conn_request too if the
> SYN is dropped.

No we should not.

This is limiting embryonic mini-socket creation.  The listen overflow
should only increment when the 3-way handshake completion is aborted
because the listening socket limit is exceeded, which is entirely
different from the embryonic limit.


* Re: Distributed storage. Move away from char device ioctls.
From: Jeff Garzik @ 2007-09-15  4:08 UTC
  To: J. Bruce Fields; +Cc: netdev, linux-kernel, linux-fsdevel
In-Reply-To: <20070914224212.GJ12444@fieldses.org>

J. Bruce Fields wrote:
> On Fri, Sep 14, 2007 at 06:32:11PM -0400, Jeff Garzik wrote:
>> J. Bruce Fields wrote:
>>> On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote:
>>>> NFSv4.1 adds to the fun, by throwing interoperability completely out the 
>>>> window.

>>> What parts are you worried about in particular?

>> I'm not worried; I'm stating facts as they exist today (draft 13):
>>
>> NFS v4.1 does something completely without precedent in the history of NFS: 
>>  the specification is defined such that interoperability is -impossible- to 
>> guarantee.
>>
>> pNFS permits private and unspecified layout types.  This means it is 
>> impossible to guarantee that one NFSv4.1 implementation will be able to 
>> talk to another NFSv4.1 implementation.

> No, servers are required to support ordinary nfs operations to the
> metadata server.
> 
> At least, that's the way it was last I heard, which was a while ago.  I
> agree that it'd stink (for any number of reasons) if you ever *had* to
> get a layout to access some file.
> 
> Was that your main concern?

I just sorta assumed you could fall back to the NFSv4.0 mode of 
operation, going through the metadata server for all data accesses.

But look at that choice in practice:  you can either ditch pNFS 
completely, or use a proprietary solution.  The market incentives are 
CLEARLY tilted in favor of makers of proprietary solutions.  But it's a 
poor choice (really little choice at all).

Overall, my main concern is that NFSv4.1 is no longer an open 
architecture solution.  The "no-pNFS or proprietary platform" choice 
merely illustrates one of many negative aspects of this architecture.

One of NFS's biggest value propositions is its interoperability.  To 
quote some Wall Street guys, "NFS is like crack.  It Just Works.  We 
love it."

Now, for the first time in NFS's history (AFAIK), the protocol is no 
longer completely specified, completely known.  No longer a "closed 
loop."  Private layout types mean that it is _highly_ unlikely that any 
OS or appliance or implementation will be able to claim "full NFS 
compatibility."

And when the proprietary portion of the spec involves something as basic 
as accessing one's own data, I consider that a fundamental flaw.  NFS is 
no longer completely open.

	Jeff



